### MCTS

Monte Carlo Tree Search (MCTS) approximates exhaustive tree search by sampling, which makes it far cheaper than brute-force enumeration. MCTS is an inference-time procedure, not a training procedure.

Used in AlphaGo.

#### How to build a tree

We don't train the tree policy here; MCTS is an inference step run at each move of the game.

- Selection: Traverse the tree from the root using the (relatively slow) tree policy until reaching a node that still has untried actions.

- Expansion: Add a new child node for one of the untried actions.

- Simulation: From the new node, run a rollout to a terminal state using a fast default policy (e.g., random moves) and record the reward (win/lose); this evaluates the new state.

- Backpropagation: Propagate the simulation result back up the path to the root, updating the visit and win counts of every node along the way.

After the iterations finish, the most frequently visited action at the root is chosen.
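The four phases above can be sketched end-to-end. Below is a minimal illustration on a toy Nim game (take 1–3 stones; whoever takes the last stone wins) — the choice of game, the exploration constant `c=1.4`, and the iteration count are my own choices for the example, not from the notes:

```python
import math
import random

# Toy game for illustration: one-pile Nim, take 1-3 stones, taking the last stone wins.
def legal_moves(stones):
    return [m for m in (1, 2, 3) if m <= stones]

class Node:
    def __init__(self, stones, player, parent=None, move=None):
        self.stones = stones            # stones remaining
        self.player = player            # player to move (0 or 1)
        self.parent = parent
        self.move = move                # move that led to this node
        self.children = []
        self.untried = legal_moves(stones)
        self.visits = 0
        self.wins = 0.0                 # wins for the player who just moved into this node

    def uct_child(self, c=1.4):
        # Tree policy: UCT balances exploitation (mean win rate)
        # against exploration (visit-count bonus).
        return max(self.children, key=lambda ch:
                   ch.wins / ch.visits + c * math.sqrt(math.log(self.visits) / ch.visits))

def rollout(stones, player):
    # Default policy: play uniformly random moves to the end; return the winner.
    while stones > 0:
        stones -= random.choice(legal_moves(stones))
        player = 1 - player
    return 1 - player  # the player who took the last stone wins

def mcts(root_stones, root_player, iterations=2000):
    root = Node(root_stones, root_player)
    for _ in range(iterations):
        node = root
        # 1. Selection: descend via UCT while the node is fully expanded.
        while not node.untried and node.children:
            node = node.uct_child()
        # 2. Expansion: add one child for an untried action.
        if node.untried:
            m = node.untried.pop()
            node = Node(node.stones - m, 1 - node.player, parent=node, move=m)
            node.parent.children.append(node)
        # 3. Simulation: fast random rollout from the new state (terminal states score directly).
        winner = rollout(node.stones, node.player) if node.stones > 0 else 1 - node.player
        # 4. Backpropagation: update visit/win counts up to the root.
        while node is not None:
            node.visits += 1
            if winner == 1 - node.player:   # a win for the player who moved into this node
                node.wins += 1
            node = node.parent
    # Final decision: the most frequently visited root action.
    return max(root.children, key=lambda ch: ch.visits).move
```

With 5 stones the winning move is to take 1 (leaving a multiple of 4), and MCTS converges to it quickly on a tree this small.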

We give rarely visited nodes a chance by adding an exploration incentive (a visit-count bonus) to their selection score.
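A common concrete form of this incentive is the UCB1/UCT score; a minimal sketch (the exploration constant `c` is the usual default, not something specified in these notes):

```python
import math

def uct_score(total_reward, child_visits, parent_visits, c=1.4):
    # Mean reward (exploitation) plus a bonus that is large for rarely
    # visited children (exploration); c trades the two terms off.
    if child_visits == 0:
        return float("inf")  # unvisited children are always tried first
    return total_reward / child_visits + c * math.sqrt(math.log(parent_visits) / child_visits)
```

The bonus shrinks as a child accumulates visits, so attention shifts toward children with genuinely high win rates.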

### Efficient MCTS

Prune out less likely actions by searching only plausible states (reduced breadth), and reduce depth by estimating the expected result (win/lose) without running a full simulation.

Specifically, AlphaGo reduces depth with a value network and reduces breadth with a policy network.
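A sketch of the two reductions, where `policy_net` and `value_net` are hypothetical stand-ins for trained networks (here just stubs), and `top_k` is an illustrative pruning choice:

```python
# Illustrative stubs only -- in AlphaGo these would be trained neural networks.
def legal_moves(state):
    # Toy state: an integer pile, actions take 1-3 (same toy game as before).
    return [a for a in (1, 2, 3) if a <= state]

def policy_net(state):
    # Stub prior: uniform probability over legal actions.
    moves = legal_moves(state)
    return {m: 1.0 / len(moves) for m in moves}

def value_net(state):
    # Stub value estimate in [-1, 1] for the side to move.
    return 0.0

def expand(state, top_k=2):
    # Reduced breadth: keep only the top-k actions under the policy prior,
    # instead of expanding every legal action.
    priors = policy_net(state)
    return sorted(priors, key=priors.get, reverse=True)[:top_k]

def evaluate(state):
    # Reduced depth: ask the value network for an estimate
    # instead of simulating the game to the end.
    return value_net(state)
```

(AlphaGo actually mixes the value-network estimate with a fast rollout; the stub above shows only the depth-reduction idea.)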

In the selection rule, the value term encourages exploitation, while the prior/visit-count bonus term encourages exploration.