Implicit Q Learning

Creator
Seonglae Cho
Created
2024 May 1 2:16
Editor
Edited
2024 Jun 16 11:53
Refs
CQL

IQL

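SARSA-style backup that only ever uses actions found in the dataset, with a loss weighted so that the value function tracks the good actions among them.

Expectile regression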

With an expectile τ > 0.5, the prediction is pulled toward the higher targets
  • Prediction is larger than target → squared error weighted by 1 − τ (small loss) → prediction stays large
  • Prediction is smaller than target → squared error weighted by τ (larger loss) → prediction is pushed up
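
The asymmetry described above can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation: the names `expectile_loss` and `fit_expectile`, the default τ = 0.7, and the learning rate are assumptions chosen for the example.

```python
def expectile_loss(prediction, target, tau=0.7):
    """Asymmetric squared error for one (prediction, target) pair.

    Weight is tau when the prediction is below the target,
    and 1 - tau when it is above (hypothetical helper).
    """
    diff = target - prediction
    weight = tau if diff > 0 else 1.0 - tau
    return weight * diff ** 2


def fit_expectile(targets, tau=0.7, lr=0.1, steps=2000):
    """Fit a single scalar prediction to the targets by gradient descent."""
    pred = sum(targets) / len(targets)  # start from the mean
    for _ in range(steps):
        grad = 0.0
        for t in targets:
            diff = t - pred
            weight = tau if diff > 0 else 1.0 - tau
            grad += -2.0 * weight * diff  # d/d(pred) of weight * diff^2
        pred -= lr * grad / len(targets)
    return pred
```

With τ = 0.5 the loss is symmetric and `fit_expectile` stays at the mean of the targets; raising τ toward 1 moves the fitted value toward the largest targets.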
 
 

Properties

  • Never evaluates the Q-function on out-of-distribution (OOD) actions during training
  • Policy is (still) trained only on actions in the data
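
A rough sketch of why no OOD action is ever needed: the Q target bootstraps from V(s') rather than from a maximum over actions, and V(s) is regressed toward Q(s, a) for the dataset action a only. The function names, the scalar inputs, and the default γ and τ below are assumptions for illustration; a practical implementation would use batched networks and a target Q-network.

```python
def q_target(reward, next_state_value, done, gamma=0.99):
    """TD target for Q(s, a): r + gamma * V(s'), with no max over next actions."""
    not_done = 0.0 if done else 1.0
    return reward + gamma * not_done * next_state_value


def q_loss(q_sa, reward, next_state_value, done, gamma=0.99):
    """Squared TD error for the dataset transition (s, a, r, s')."""
    return (q_target(reward, next_state_value, done, gamma) - q_sa) ** 2


def value_loss(q_sa, v_s, tau=0.7):
    """Expectile loss pulling V(s) toward high Q(s, a) for the dataset action a."""
    diff = q_sa - v_s
    weight = tau if diff > 0 else 1.0 - tau
    return weight * diff ** 2
```

Every quantity here is computed from a tuple (s, a, r, s') that appears in the dataset, so the learned Q-function is never asked about an action it has not seen.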
 

Implementation

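  • Two hyperparameters (the expectile τ and the AWR inverse temperature β), whereas CQL has one, which makes IQL harder to tune
  • Once the value functions have converged, extract the policy using AWR (advantage-weighted regression)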
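
The extraction step can be sketched as a weighted behavior-cloning loss, where each dataset action is weighted by the exponentiated advantage Q(s, a) − V(s). The helper names, the default β = 3.0, and the weight cap of 100 are illustrative assumptions, not prescribed values.

```python
import math


def awr_weight(q_sa, v_s, beta=3.0, max_weight=100.0):
    """exp(beta * advantage), capped at max_weight (hypothetical helper).

    The exponent is clipped before exponentiating to avoid overflow.
    """
    exponent = min(beta * (q_sa - v_s), math.log(max_weight))
    return math.exp(exponent)


def awr_policy_loss(log_probs, q_values, v_values, beta=3.0, max_weight=100.0):
    """Negative weighted log-likelihood of the dataset actions.

    log_probs[i] is log pi(a_i | s_i) for the i-th dataset transition.
    """
    weights = [
        awr_weight(q, v, beta, max_weight)
        for q, v in zip(q_values, v_values)
    ]
    total = sum(w * lp for w, lp in zip(weights, log_probs))
    return -total / len(log_probs)
```

Actions with a positive advantage get weights above 1 and are imitated more strongly; the policy is still fit only to actions that appear in the data.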
 
 
 
 
 
 
