Adaptive Branching
ARC-AGI-2 score: improved from 23 to 30
At each step, the search chooses between creating a new answer (breadth) and refining an existing answer (depth), and also decides which LLM (model) to assign to the task. Both choices are made with Thompson sampling, which uses each option's performance history to balance trying promising options against exploring less-tested ones.
Each node (answer) receives a score (reward) that reflects how well the answer solved the problem.
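The selection step can be illustrated with a minimal sketch. It is a simplified, flat version of the idea, not the actual implementation: it assumes rewards lie in [0, 1], keeps one Beta posterior per (action, model) pair, and always refines the best-scoring existing node. The names `Arm`, `Node`, `run_search`, `score_fn`, `model_a` and `model_b` are hypothetical, and `score_fn` stands in for calling an LLM and evaluating its answer.

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

ACTIONS = ("new", "refine")        # breadth vs depth
MODELS = ("model_a", "model_b")    # hypothetical model names


@dataclass
class Arm:
    """Beta posterior over the reward of one (action, model) pair."""
    alpha: float = 1.0  # prior pseudo-count of success
    beta: float = 1.0   # prior pseudo-count of failure

    def sample(self, rng: random.Random) -> float:
        return rng.betavariate(self.alpha, self.beta)

    def update(self, reward: float) -> None:
        # Assumes reward in [0, 1]; fractional rewards update both counts.
        self.alpha += reward
        self.beta += 1.0 - reward


@dataclass
class Node:
    """One answer in the search tree."""
    reward: float
    model: str
    parent: Optional[int]  # index of the refined node, None for a new answer


def run_search(
    score_fn: Callable[[str, str, Optional[Node]], float],
    steps: int,
    seed: int = 0,
) -> Tuple[Dict[Tuple[str, str], Arm], List[Node]]:
    rng = random.Random(seed)
    arms = {(a, m): Arm() for a in ACTIONS for m in MODELS}
    nodes: List[Node] = []
    for _ in range(steps):
        # Thompson sampling: draw once from every posterior, take the largest draw.
        action, model = max(arms, key=lambda k: arms[k].sample(rng))
        if action == "refine" and not nodes:
            action = "new"  # nothing to refine yet
        parent = None
        if action == "refine":
            # Simplification: always refine the best answer found so far.
            parent = max(range(len(nodes)), key=lambda i: nodes[i].reward)
        parent_node = nodes[parent] if parent is not None else None
        reward = score_fn(action, model, parent_node)
        arms[(action, model)].update(reward)
        nodes.append(Node(reward=reward, model=model, parent=parent))
    return arms, nodes
```

With a `score_fn` that rewards one pair more than the others, the posterior for that pair concentrates near its true reward and the sampler picks it increasingly often, while the other pairs are still tried occasionally.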