### MCTS

Monte Carlo Tree Search (MCTS) approximates exhaustive tree search by sampling, which makes it far cheaper than brute-force enumeration. MCTS is an inference-time procedure, not a training procedure.

Used in AlphaGo.

#### How to build a tree

We don't train the tree policy here; MCTS is an inference step run at each move of the game.

- Selection: Traverse the tree from the root using the (relatively slow) tree policy until reaching a node that still has untried actions.

- Expansion: Add a new child node for one of the untried actions.

- Simulation: From the new node, run a rollout to a terminal state using a fast default policy (e.g., random moves) and record the reward (win/lose); this evaluates the new state.

- Backpropagation: Propagate the simulation result back up the path to the root, updating the visit and win counts of every node along the way.

After the iterations finish, the most frequently visited action at the root is chosen.
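The four phases above can be sketched end-to-end. Below is a minimal illustration on a toy Nim game (take 1–3 stones; whoever takes the last stone wins) — the choice of game, the exploration constant `c=1.4`, and the iteration count are my own choices for the example, not from the notes:

```python
import math
import random

# Toy game for illustration: one-pile Nim, take 1-3 stones, taking the last stone wins.
def legal_moves(stones):
    return [m for m in (1, 2, 3) if m <= stones]

class Node:
    def __init__(self, stones, player, parent=None, move=None):
        self.stones = stones            # stones remaining
        self.player = player            # player to move (0 or 1)
        self.parent = parent
        self.move = move                # move that led to this node
        self.children = []
        self.untried = legal_moves(stones)
        self.visits = 0
        self.wins = 0.0                 # wins for the player who just moved into this node

    def uct_child(self, c=1.4):
        # Tree policy: UCT balances exploitation (mean win rate)
        # against exploration (visit-count bonus).
        return max(self.children, key=lambda ch:
                   ch.wins / ch.visits + c * math.sqrt(math.log(self.visits) / ch.visits))

def rollout(stones, player):
    # Default policy: play uniformly random moves to the end; return the winner.
    while stones > 0:
        stones -= random.choice(legal_moves(stones))
        player = 1 - player
    return 1 - player  # the player who took the last stone wins

def mcts(root_stones, root_player, iterations=2000):
    root = Node(root_stones, root_player)
    for _ in range(iterations):
        node = root
        # 1. Selection: descend via UCT while the node is fully expanded.
        while not node.untried and node.children:
            node = node.uct_child()
        # 2. Expansion: add one child for an untried action.
        if node.untried:
            m = node.untried.pop()
            node = Node(node.stones - m, 1 - node.player, parent=node, move=m)
            node.parent.children.append(node)
        # 3. Simulation: fast random rollout from the new state (terminal states score directly).
        winner = rollout(node.stones, node.player) if node.stones > 0 else 1 - node.player
        # 4. Backpropagation: update visit/win counts up to the root.
        while node is not None:
            node.visits += 1
            if winner == 1 - node.player:   # a win for the player who moved into this node
                node.wins += 1
            node = node.parent
    # Final decision: the most frequently visited root action.
    return max(root.children, key=lambda ch: ch.visits).move
```

With 5 stones the winning move is to take 1 (leaving a multiple of 4), and MCTS converges to it quickly on a tree this small.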

We give rarely visited nodes a chance by adding an exploration incentive (a visit-count bonus) to their selection score.
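A common concrete form of this incentive is the UCB1/UCT score; a minimal sketch (the exploration constant `c` is the usual default, not something specified in these notes):

```python
import math

def uct_score(total_reward, child_visits, parent_visits, c=1.4):
    # Mean reward (exploitation) plus a bonus that is large for rarely
    # visited children (exploration); c trades the two terms off.
    if child_visits == 0:
        return float("inf")  # unvisited children are always tried first
    return total_reward / child_visits + c * math.sqrt(math.log(parent_visits) / child_visits)
```

The bonus shrinks as a child accumulates visits, so attention shifts toward children with genuinely high win rates.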

### Efficient MCTS

Prune out less likely actions by searching only plausible states (reduced breadth), and reduce depth by estimating the expected result (win/lose) without running a full simulation.

Specifically, AlphaGo reduces depth with a value network and reduces breadth with a policy network.
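A sketch of the two reductions, where `policy_net` and `value_net` are hypothetical stand-ins for trained networks (here just stubs), and `top_k` is an illustrative pruning choice:

```python
# Illustrative stubs only -- in AlphaGo these would be trained neural networks.
def legal_moves(state):
    # Toy state: an integer pile, actions take 1-3 (same toy game as before).
    return [a for a in (1, 2, 3) if a <= state]

def policy_net(state):
    # Stub prior: uniform probability over legal actions.
    moves = legal_moves(state)
    return {m: 1.0 / len(moves) for m in moves}

def value_net(state):
    # Stub value estimate in [-1, 1] for the side to move.
    return 0.0

def expand(state, top_k=2):
    # Reduced breadth: keep only the top-k actions under the policy prior,
    # instead of expanding every legal action.
    priors = policy_net(state)
    return sorted(priors, key=priors.get, reverse=True)[:top_k]

def evaluate(state):
    # Reduced depth: ask the value network for an estimate
    # instead of simulating the game to the end.
    return value_net(state)
```

(AlphaGo actually mixes the value-network estimate with a fast rollout; the stub above shows only the depth-reduction idea.)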

In the selection rule, the value term encourages exploitation, while the prior/visit-count bonus term encourages exploration.