Simple but powerful
Use the current network for action selection and the target network for action evaluation, which de-correlates the errors in action selection from the errors in action evaluation
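The decoupling above (commonly known as Double DQN) can be sketched for a single transition as follows. This is a minimal illustration, not a full training loop: the function name `double_dqn_target`, its parameters, and the example Q-values are all hypothetical, and the Q-values are plain lists standing in for network outputs at the next state.

```python
def double_dqn_target(reward, done, gamma, q_current_next, q_target_next):
    """Bootstrapped target for one transition (hypothetical helper).

    q_current_next: Q-values over actions at the next state, from the current network.
    q_target_next:  Q-values over the same actions, from the target network.
    """
    # Action selection: greedy action according to the current network.
    best_action = max(range(len(q_current_next)), key=lambda a: q_current_next[a])
    # Action evaluation: value of that action according to the target network.
    bootstrap = q_target_next[best_action]
    # No bootstrapping past a terminal state.
    return reward + gamma * (1.0 - float(done)) * bootstrap


# Made-up numbers: the current network prefers action 1,
# so the target network's value for action 1 (0.5) is used,
# not the target network's own maximum (5.0).
target = double_dqn_target(
    reward=1.0,
    done=False,
    gamma=0.9,
    q_current_next=[1.0, 3.0, 2.0],
    q_target_next=[5.0, 0.5, 4.0],
)
```

Because the action is picked by one network and scored by the other, an action that only one of them overestimates is less likely to inflate the target.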
Clipped Double Q-learning
Learn two Q-functions and use the smaller of the two estimates as the target
Rather than decoupling action selection from value estimation, it uses the minimum of the two estimates, reducing the estimate overall
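A minimal sketch of the clipped target for a single transition is shown below. The function name `clipped_double_q_target`, its parameters, and the numbers are hypothetical; the two inputs are assumed to be each Q-function's estimate for the same next state-action pair, and how that next action is chosen is left outside the sketch.

```python
def clipped_double_q_target(reward, done, gamma, q1_next, q2_next):
    """Bootstrapped target using the smaller of two Q estimates (hypothetical helper).

    q1_next, q2_next: the two Q-functions' estimates for the same
    next state-action pair.
    """
    # Take the minimum of the two estimates as the bootstrap value.
    bootstrap = min(q1_next, q2_next)
    # No bootstrapping past a terminal state.
    return reward + gamma * (1.0 - float(done)) * bootstrap


# Made-up numbers: the estimates disagree (4.0 vs 2.0), so 2.0 is used.
clipped_target = clipped_double_q_target(
    reward=1.0, done=False, gamma=0.9, q1_next=4.0, q2_next=2.0
)
```

Both Q-functions would typically be regressed toward this same target, so the target is never larger than either estimate would give on its own.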