Contextual Bandit Model

Created: 2025 Mar 5 12:32
Creator: Seonglae Cho
Edited: 2025 Dec 29 13:14
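A problem where an agent observes a context, selects a single action, and receives a reward that depends on that context and action. Each round is a one-step decision: the reward is observed only for the action that was chosen.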
  • Action space
  • Reward design
    • Short term vs long term objectives
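The observe, act, reward loop can be sketched as a tabular epsilon-greedy learner. This is a minimal illustration, not a reference implementation: the class name, the `simulate` helper, the contexts ("morning", "evening"), the actions ("news", "music"), and the click probabilities are all hypothetical. The action space here is the list passed to the constructor, and the reward is a 0/1 click, which stands in for a short-term objective.

```python
import random


class EpsilonGreedyContextualBandit:
    """Tabular epsilon-greedy: one running mean reward per (context, action)."""

    def __init__(self, actions, epsilon=0.1, seed=0):
        self.actions = list(actions)   # the action space
        self.epsilon = epsilon         # probability of exploring
        self.rng = random.Random(seed)
        self.counts = {}               # (context, action) -> number of pulls
        self.means = {}                # (context, action) -> running mean reward

    def select(self, context):
        # Explore with probability epsilon, otherwise exploit the best estimate.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        return max(self.actions, key=lambda a: self.means.get((context, a), 0.0))

    def update(self, context, action, reward):
        # Incremental mean update for the pair that was actually played.
        key = (context, action)
        n = self.counts.get(key, 0) + 1
        mean = self.means.get(key, 0.0)
        self.counts[key] = n
        self.means[key] = mean + (reward - mean) / n


def simulate(steps=5000, seed=0):
    # Hypothetical environment: click probability for each (context, action).
    click_prob = {
        ("morning", "news"): 0.8, ("morning", "music"): 0.2,
        ("evening", "news"): 0.1, ("evening", "music"): 0.7,
    }
    env_rng = random.Random(seed + 1)
    bandit = EpsilonGreedyContextualBandit(["news", "music"], epsilon=0.1, seed=seed)
    for _ in range(steps):
        context = env_rng.choice(["morning", "evening"])   # observe context
        action = bandit.select(context)                    # select one action
        clicked = env_rng.random() < click_prob[(context, action)]
        bandit.update(context, action, 1.0 if clicked else 0.0)  # observe reward
    return bandit
```

After enough rounds the estimated means should favour "news" in the morning and "music" in the evening. A tabular learner only works for a small, discrete set of contexts; richer contexts usually call for a parametric reward model instead.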
Naively, the bandit has to try every possible {item × explanation} combination many times before it can exploit the best one.
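A rough count shows why the naive approach scales badly. The function names and the numbers below are illustrative assumptions, and the additive model is only one hypothetical way of sharing parameters across combinations, not a method taken from this note.

```python
def naive_pulls(n_items, n_explanations, pulls_per_arm):
    """Pulls needed when every (item, explanation) pair is a separate arm."""
    return n_items * n_explanations * pulls_per_arm


def additive_parameters(n_items, n_explanations):
    """Parameters to estimate if reward is modelled as item effect + explanation effect."""
    return n_items + n_explanations


# Hypothetical catalogue: 1,000 items, 10 explanations, 100 pulls per arm.
print(naive_pulls(1000, 10, 100))     # 1000000
print(additive_parameters(1000, 10))  # 1010
```

With independent arms the exploration cost grows with the product of the two sets, whereas a model that shares parameters grows closer to their sum, at the price of assuming the reward has that structure.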
BaRT: Explore, Exploit, Explain
Distribution aware rewards
 
 
