MAB
A single-step (one decision point) problem that differs from full RL in that it has no state, so the focus is on balancing exploration and exploitation. Exploitation means choosing the items that have yielded high returns so far; exploration means trying new or rarely tried items to learn what they return.
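As a rough illustration of that trade-off, here is a minimal epsilon-greedy sketch. Epsilon-greedy is one common bandit strategy, not the only one, and this note does not prescribe it. The function name `epsilon_greedy`, the Bernoulli (0/1) reward model, and the parameters `true_means`, `steps`, `epsilon`, and `seed` are all assumptions made for the example.

```python
import random


def epsilon_greedy(true_means, steps=5000, epsilon=0.1, seed=0):
    """Simulate epsilon-greedy on Bernoulli arms (hypothetical setup).

    true_means: success probability of each arm, unknown to the agent.
    Returns (pull counts per arm, estimated value per arm, total reward).
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms    # how many times each arm was pulled
    values = [0.0] * n_arms  # running mean reward observed per arm
    total_reward = 0.0

    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: pick an arm uniformly at random.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pick the arm with the best estimate so far.
            arm = max(range(n_arms), key=lambda a: values[a])

        # No state: the reward depends only on the chosen arm.
        reward = 1.0 if rng.random() < true_means[arm] else 0.0

        counts[arm] += 1
        # Incremental update of the running mean.
        values[arm] += (reward - values[arm]) / counts[arm]
        total_reward += reward

    return counts, values, total_reward


if __name__ == "__main__":
    counts, values, total = epsilon_greedy([0.2, 0.5, 0.8])
    print("pulls per arm:", counts)
    print("estimated values:", [round(v, 3) for v in values])
    print("total reward:", total)
```

With a small `epsilon`, most pulls go to the arm that currently looks best, while the occasional random pull keeps the estimates of the other arms from going stale. Setting `epsilon` to 0 gives pure exploitation, and setting it to 1 gives pure exploration.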
Multi-armed bandit
In probability theory and machine learning, the multi-armed bandit problem is a problem in which a decision maker iteratively selects one of multiple fixed choices when the properties of each choice are only partially known at the time of allocation, and may become better understood as time passes. A fundamental aspect of bandit problems is that choosing an arm does not affect the properties of the arm or other arms.
https://en.wikipedia.org/wiki/Multi-armed_bandit


Seonglae Cho