DoubleDQN

Creator

Created

2024 Apr 23 18:3

Editor

Edited

2024 May 7 17:14

Refs

Use the current network for action selection and the RL Target Network for action evaluation, to de-correlate errors in action selection and evaluation
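A minimal sketch of that target computation, assuming the Q-values for the next state have already been computed by both networks and are passed in as NumPy arrays of shape (batch, actions). The function name `double_dqn_target` and its parameters are hypothetical, chosen only for illustration.

```python
import numpy as np

def double_dqn_target(q_online_next, q_target_next, reward, done, gamma=0.99):
    """Hypothetical helper: bootstrap target with selection and evaluation split."""
    q_online_next = np.asarray(q_online_next, dtype=float)
    q_target_next = np.asarray(q_target_next, dtype=float)
    reward = np.asarray(reward, dtype=float)
    done = np.asarray(done, dtype=float)

    # Action selection: greedy action according to the current (online) network.
    best_action = np.argmax(q_online_next, axis=1)

    # Action evaluation: value of that action according to the target network.
    batch_index = np.arange(q_target_next.shape[0])
    evaluated = q_target_next[batch_index, best_action]

    # No bootstrapping on terminal transitions (done == 1).
    return reward + gamma * (1.0 - done) * evaluated
```

Because the argmax comes from one network and the value from the other, an action that one network happens to overrate is not automatically scored with the same overrated value.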

Learn 2 Q-functions and choose the minimum as target

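Rather than decoupling action selection from value estimation, this uses the minimum of the two estimates, which lowers the value estimate overall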
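A minimal sketch of the minimum-of-two target, assuming each Q-function has already been evaluated at the next state and next action, so the inputs are arrays of shape (batch,). How the next action is chosen is left outside this sketch, and the name `min_q_target` is hypothetical.

```python
import numpy as np

def min_q_target(q1_next, q2_next, reward, done, gamma=0.99):
    """Hypothetical helper: bootstrap from the smaller of two Q estimates."""
    q1_next = np.asarray(q1_next, dtype=float)
    q2_next = np.asarray(q2_next, dtype=float)
    reward = np.asarray(reward, dtype=float)
    done = np.asarray(done, dtype=float)

    # Element-wise minimum of the two estimates for each transition.
    pessimistic = np.minimum(q1_next, q2_next)

    # No bootstrapping on terminal transitions (done == 1).
    return reward + gamma * (1.0 - done) * pessimistic
```

In this sketch both Q-functions would be trained toward the same target, so the difference from the variant above is only in how the bootstrap value is formed.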
