Thompson sampling

Thompson sampling is a method for finding out experimentally which of several options works best (e.g., several slot machines, advertisements, or types of medicine) when the success probability of each option is unknown.

It represents each option's unknown success probability as a probability distribution (e.g., a beta distribution). In each round it randomly samples one value from each option's distribution, tries the option whose sampled value is highest, and then updates that option's distribution based on the observed result.
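The loop described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: it assumes success/failure (Bernoulli) rewards, a uniform Beta(1, 1) prior for every option, and a simulated environment. The function name `thompson_sampling` and the `true_probs` list are hypothetical and exist only for this example.

```python
import random


def thompson_sampling(true_probs, n_rounds, seed=0):
    """Run Beta-Bernoulli Thompson sampling against simulated options.

    true_probs: hidden success probability of each option (simulation only;
                the algorithm itself never reads these values directly).
    Returns (pulls, alpha, beta) for each option.
    """
    rng = random.Random(seed)
    n_arms = len(true_probs)

    # Assumption: Beta(1, 1) prior, i.e. every success probability is
    # considered equally likely before any data is seen.
    alpha = [1] * n_arms  # 1 + number of observed successes
    beta = [1] * n_arms   # 1 + number of observed failures
    pulls = [0] * n_arms

    for _ in range(n_rounds):
        # 1. Sample one plausible success probability per option.
        samples = [rng.betavariate(alpha[i], beta[i]) for i in range(n_arms)]

        # 2. Try the option whose sampled value is highest.
        arm = max(range(n_arms), key=lambda i: samples[i])

        # 3. Observe a simulated success or failure.
        success = rng.random() < true_probs[arm]

        # 4. Update only the distribution of the option that was tried.
        if success:
            alpha[arm] += 1
        else:
            beta[arm] += 1
        pulls[arm] += 1

    return pulls, alpha, beta


if __name__ == "__main__":
    pulls, alpha, beta = thompson_sampling([0.1, 0.5, 0.9], n_rounds=3000)
    for i, n in enumerate(pulls):
        estimate = alpha[i] / (alpha[i] + beta[i])  # posterior mean
        print(f"option {i}: tried {n} times, estimated success rate {estimate:.2f}")
```

Because options with little data have wide distributions, they occasionally produce the highest sample and get tried (exploration), while options with a strong track record win most rounds (exploitation).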

Thompson sampling

Thompson sampling, named after William R. Thompson, is a heuristic for choosing actions that addresses the exploration–exploitation dilemma in the multi-armed bandit problem. It consists of choosing the action that maximizes the expected reward with respect to a randomly drawn belief.

https://en.wikipedia.org/wiki/Thompson_sampling
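
The phrase "maximizes the expected reward with respect to a randomly drawn belief" can be written out in symbols. The notation here is an assumption made for illustration: \mathcal{D} is the data observed so far, \theta the unknown parameters of the reward model, a an action, and r the reward.

```latex
\theta^{*} \sim P(\theta \mid \mathcal{D}),
\qquad
a^{*} = \arg\max_{a} \; \mathbb{E}\left[ r \mid a, \theta^{*} \right]
```

In the success/failure case with one beta distribution per option, \mathbb{E}[r \mid a, \theta^{*}] is simply the sampled success probability \theta^{*}_{a}, so the rule reduces to picking the option with the largest sample.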
