Implicit Q Learning

Creator

Seonglae Cho

Created

2024 May 1 2:16

Editor

Seonglae Cho

Edited

2024 Jun 16 11:53

Refs

IQL

SARSA-style update that bootstraps only from actions in the dataset, while an asymmetric (expectile) loss biases the value toward the good actions
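For reference, the objectives behind this note can be written roughly as follows. The symbols (value network V_ψ, critic Q_θ, target critic Q_θ̂, dataset D, expectile τ, discount γ) are not defined in these notes and are assumed here from the usual IQL formulation.

```latex
% Asymmetric squared loss (expectile loss), with u = target - prediction
L_2^{\tau}(u) = \left| \tau - \mathbb{1}(u < 0) \right| u^2

% Value loss: V is regressed toward an upper expectile of Q over dataset actions
L_V(\psi) = \mathbb{E}_{(s,a) \sim D} \left[ L_2^{\tau}\big( Q_{\hat{\theta}}(s,a) - V_{\psi}(s) \big) \right]

% Q loss: SARSA-style backup that uses V(s') instead of a max over actions
L_Q(\theta) = \mathbb{E}_{(s,a,s') \sim D} \left[ \big( r(s,a) + \gamma V_{\psi}(s') - Q_{\theta}(s,a) \big)^2 \right]
```

Because the backup goes through V_ψ(s'), no action outside the dataset is ever queried.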

Expectile regression

With an expectile above 0.5, the prediction is pulled toward the higher targets

Prediction is larger than target → small loss → prediction stays large

Prediction is smaller than target → larger loss → prediction is pushed up
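The asymmetry described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the IQL training loop: the function names, the default `tau=0.7`, and the toy gradient-descent fit are all assumptions made for the example.

```python
def expectile_loss(prediction, target, tau=0.7):
    """Asymmetric squared error: weight tau when the target is above the
    prediction, and 1 - tau when it is below."""
    diff = target - prediction
    weight = tau if diff > 0 else 1.0 - tau
    return weight * diff ** 2


def fit_expectile(targets, tau=0.7, lr=0.1, steps=2000):
    """Fit one scalar prediction by gradient descent on the mean expectile loss."""
    prediction = sum(targets) / len(targets)  # start from the mean
    for _ in range(steps):
        grad = 0.0
        for t in targets:
            diff = t - prediction
            weight = tau if diff > 0 else 1.0 - tau
            grad += -2.0 * weight * diff  # d/d(prediction) of weight * diff^2
        prediction -= lr * grad / len(targets)
    return prediction


if __name__ == "__main__":
    data = [0.0, 1.0, 2.0, 3.0, 4.0]
    print(fit_expectile(data, tau=0.5))  # symmetric case: recovers the mean
    print(fit_expectile(data, tau=0.9))  # pulled toward the larger targets
```

With `tau=0.5` the loss is ordinary squared error and the fit is the mean; raising `tau` moves the fitted value toward the upper end of the targets.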

Properties

Avoids training on any OOD actions!

The policy is (still) trained only on actions in the data

Implementation

Two hyperparameters (the expectile τ and the inverse temperature β used for policy extraction), compared with one for CQL, which makes IQL harder to tune

Once the value functions have converged, extract the policy using AWR (advantage-weighted regression)
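A minimal sketch of the AWR extraction step, assuming the learned Q and V values are already available as plain lists. The names `awr_weights` and `awr_policy_loss`, the default `beta=3.0`, and the weight clip of 100 are illustrative assumptions, not values taken from these notes.

```python
import math


def awr_weights(q_values, v_values, beta=3.0, max_weight=100.0):
    """Per-sample weights exp(beta * advantage), clipped for numerical stability."""
    weights = []
    for q, v in zip(q_values, v_values):
        advantage = q - v  # how much better the dataset action is than average
        weights.append(min(math.exp(beta * advantage), max_weight))
    return weights


def awr_policy_loss(log_probs, q_values, v_values, beta=3.0):
    """Weighted negative log-likelihood of the dataset actions.

    log_probs[i] is log pi(a_i | s_i) for the action stored in the dataset,
    so the policy is only ever evaluated on in-data actions.
    """
    weights = awr_weights(q_values, v_values, beta)
    total = sum(w * lp for w, lp in zip(weights, log_probs))
    return -total / len(log_probs)
```

Actions with a positive advantage get a weight above 1 and are imitated more strongly; a larger `beta` makes the extraction greedier, while a `beta` near 0 reduces it to plain behavior cloning.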
