Optimal Model Design

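Optimal Model Design (OMD) updates the model parameters directly to maximize expected return. The Q-function is treated as an implicit function of the model parameters, defined as the solution of the Bellman equation under the learned model, so its gradient with respect to those parameters is obtained by implicit differentiation.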
https://arxiv.org/pdf/2106.03273
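
To make the implicit differentiation step concrete, here is a minimal sketch on a toy problem. It is not the paper's implementation. Everything in it is an assumption chosen for illustration: a one-state, two-action MDP whose "model parameters" `theta` are just the two rewards, a log-sum-exp (soft) Bellman operator in place of a hard max, the discount `GAMMA = 0.9`, and the helper names `soft_value`, `solve_q`, and `implicit_jacobian`. The point is only that the derivative of the fixed point Q* with respect to `theta` can be computed from the fixed-point condition itself, without differentiating through the solver iterations.

```python
import math

GAMMA = 0.9  # discount factor (assumed value for this toy example)


def soft_value(q):
    """Log-sum-exp over actions: a smooth stand-in for max_a Q(s, a)."""
    m = max(q)
    return m + math.log(sum(math.exp(x - m) for x in q))


def solve_q(theta, iters=2000):
    """Iterate the soft Bellman operator to its fixed point Q*(theta).

    Toy model: one state that loops back to itself, two actions,
    and theta[a] is the (learned) reward of action a.
    """
    q = [0.0, 0.0]
    for _ in range(iters):
        v = soft_value(q)
        q = [theta[a] + GAMMA * v for a in range(2)]
    return q


def implicit_jacobian(q):
    """Return J with J[a][b] = dQ*_a / dtheta_b via the implicit function theorem.

    At the fixed point, Q - T(Q, theta) = 0, so
        dQ*/dtheta = (I - dT/dQ)^{-1} dT/dtheta.
    Here dT/dtheta = I and dT/dQ[a][b] = GAMMA * pi[b], with pi = softmax(Q).
    """
    v = soft_value(q)
    pi = [math.exp(x - v) for x in q]
    # Closed-form inverse of the 2x2 matrix (I - dT/dQ).
    det = 1.0 - GAMMA * (pi[0] + pi[1])
    return [
        [(1.0 - GAMMA * pi[1]) / det, GAMMA * pi[1] / det],
        [GAMMA * pi[0] / det, (1.0 - GAMMA * pi[0]) / det],
    ]


if __name__ == "__main__":
    theta = [1.0, 0.5]
    q_star = solve_q(theta)
    print("Q*:", q_star)
    print("dQ*/dtheta:", implicit_jacobian(q_star))
```

In a full method, the gradient of an outer objective that depends on Q* would be chained through this Jacobian to update `theta`. With function approximation the matrix inverse is typically replaced by a linear solve, which this sketch does not attempt.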
Seonglae Cho