Multi-Armed Bandit

MAB

A single-step decision problem (each round is one independent decision point) that differs from full reinforcement learning (RL) in that it has no state; the focus is on balancing exploration and exploitation. Exploitation means choosing the items that have produced high returns so far, and exploration means trying new or rarely tried items to learn how good they are.
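One common way to strike this balance is the epsilon-greedy rule, sketched below. This is only an illustration and not something this note prescribes: the function name `epsilon_greedy`, the Bernoulli (0/1) rewards, and the `true_means`, `steps`, `epsilon`, and `seed` parameters are all assumptions made for the example.

```python
import random


def epsilon_greedy(true_means, steps=5000, epsilon=0.1, seed=0):
    """Run epsilon-greedy on a stateless bandit with Bernoulli rewards.

    true_means: hypothetical per-item success probabilities (unknown to the agent).
    Returns (pull counts, estimated means, total reward).
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms        # how often each item was chosen
    estimates = [0.0] * n_arms   # running average reward per item
    total_reward = 0.0

    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: try a uniformly random item.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pick the item with the best estimate so far.
            arm = max(range(n_arms), key=lambda a: estimates[a])

        # There is no state: the reward depends only on the chosen item.
        reward = 1.0 if rng.random() < true_means[arm] else 0.0

        # Incremental update of the running average.
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total_reward += reward

    return counts, estimates, total_reward


if __name__ == "__main__":
    counts, estimates, total = epsilon_greedy([0.1, 0.5, 0.9])
    print("pulls per item:", counts)
    print("estimated means:", [round(e, 3) for e in estimates])
    print("total reward:", total)
```

With a small `epsilon` most rounds exploit the current best estimate, while the occasional random choice keeps the other estimates from going stale. A larger `epsilon` learns faster but gives up more reward while doing so.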

Multi-armed bandit

In probability theory and machine learning, the multi-armed bandit problem is a problem in which a decision maker iteratively selects one of multiple fixed choices (arms) when the properties of each choice are only partially known at the time of allocation and may become better understood as time passes. A fundamental aspect of bandit problems is that choosing an arm does not affect the properties of that arm or of the other arms.
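The two properties in this description can be made concrete with a small simulation. The sketch below assumes each arm pays 0 or 1 with a fixed hidden probability; `BernoulliArm` and `sample_mean` are hypothetical names chosen for the example, not part of any library.

```python
import random


class BernoulliArm:
    """An arm that pays 1 with a fixed probability p and 0 otherwise."""

    def __init__(self, p):
        self.p = p  # fixed property of the arm; pull() never changes it

    def pull(self, rng):
        # Pulling only samples a reward; it has no side effects on any arm.
        return 1 if rng.random() < self.p else 0


def sample_mean(arm, n_pulls, rng):
    """Estimate the arm's payout probability from n_pulls observations."""
    return sum(arm.pull(rng) for _ in range(n_pulls)) / n_pulls


if __name__ == "__main__":
    rng = random.Random(42)
    arms = [BernoulliArm(0.2), BernoulliArm(0.7)]
    # More pulls give a better picture of each arm's hidden property.
    for n in (10, 100, 10000):
        print(n, [round(sample_mean(arm, n, rng), 3) for arm in arms])
```

After only a few pulls the estimates can be far from the true probabilities, which is the "partially known" situation; as the number of pulls grows they settle near the fixed values, because nothing the decision maker does alters the arms themselves.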

https://en.wikipedia.org/wiki/Multi-armed_bandit

