Thompson sampling

Creator: Seonglae Cho
Created: 2025 Jul 10 20:41
Edited: 2025 Jul 10 20:45
Thompson sampling is a way to find out experimentally which option works best. It applies when you have several options (e.g., slot machines, advertisements, or medicines) and don't know the success probability of any of them.
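It represents each option's unknown success probability as a probability distribution (e.g., a beta distribution). Each round has four steps:

1. Draw one random sample from every option's distribution.
2. Try the option whose sample is highest.
3. Observe whether that trial succeeded.
4. Update that option's distribution with the result.

Promising options get chosen more often. Uncertain options still get occasional tries, because their wide distributions sometimes produce a high sample.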
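Here is a minimal sketch of that loop. It assumes binary (success/failure) outcomes, so each option can start from a Beta(1, 1) prior and stay a beta distribution after every update. The function name `thompson_sampling` and the "true" probabilities are hypothetical and exist only for illustration. In a real experiment, the observed outcome would replace the simulated reward.

```python
import random


def thompson_sampling(true_probs, n_rounds, seed=0):
    """Beta-Bernoulli Thompson sampling on simulated options (a sketch).

    true_probs: hypothetical success probability of each option, hidden from the agent.
    Returns each option's Beta parameters (alpha, beta) and how often it was tried.
    """
    rng = random.Random(seed)
    k = len(true_probs)
    # Beta(1, 1) prior: a uniform belief about each option's success probability
    alpha = [1.0] * k
    beta = [1.0] * k
    pulls = [0] * k

    for _ in range(n_rounds):
        # 1. Sample one plausible success probability per option from its distribution
        samples = [rng.betavariate(alpha[i], beta[i]) for i in range(k)]
        # 2. Try the option whose sampled probability is highest
        arm = max(range(k), key=lambda i: samples[i])
        # 3. Simulate the outcome: 1 = success, 0 = failure
        reward = 1 if rng.random() < true_probs[arm] else 0
        # 4. Update only the chosen option: successes add to alpha, failures to beta
        alpha[arm] += reward
        beta[arm] += 1 - reward
        pulls[arm] += 1

    return alpha, beta, pulls


if __name__ == "__main__":
    probs = [0.2, 0.5, 0.7]  # hypothetical true rates, e.g. three ad variants
    alpha, beta, pulls = thompson_sampling(probs, n_rounds=2000)
    for i, p in enumerate(probs):
        mean = alpha[i] / (alpha[i] + beta[i])
        print(f"option {i}: true={p:.2f} tries={pulls[i]} posterior mean={mean:.3f}")
```

Over many rounds, most tries should go to the option with the highest true rate. The weaker options are sampled less and less as their distributions narrow. The beta distribution is used here because it is conjugate to success/failure outcomes, which makes the update a simple count increment.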
 
 
 
Thompson sampling
Thompson sampling, named after William R. Thompson, is a heuristic for choosing actions that address the exploration–exploitation dilemma in the multi-armed bandit problem. It consists of choosing the action that maximizes the expected reward with respect to a randomly drawn belief.
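With success/failure rewards and beta distributions, the rule "choose the action that maximizes the expected reward under a randomly drawn belief" can be written out as follows. This assumes $K$ options with rewards $r_t \in \{0, 1\}$. The symbols $\alpha_k, \beta_k$ (the beta parameters of option $k$) and $\tilde{\theta}_k$ (the drawn belief) are notation introduced here for illustration. Because the reward is Bernoulli, the expected reward of option $k$ under the drawn belief is simply $\tilde{\theta}_k$.

$$
\begin{aligned}
\tilde{\theta}_k &\sim \mathrm{Beta}(\alpha_k, \beta_k), \qquad k = 1, \dots, K \\
a_t &= \arg\max_{k} \; \mathbb{E}\left[r \mid k, \tilde{\theta}\right] = \arg\max_{k} \; \tilde{\theta}_k \\
\alpha_{a_t} &\leftarrow \alpha_{a_t} + r_t, \qquad \beta_{a_t} \leftarrow \beta_{a_t} + (1 - r_t)
\end{aligned}
$$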