Optimal Model Design OMD optimizes expected rewards by directly updating model parameters, where the Q-function is implicitly differentiated with respect to model parameters through Implicit Differentiation arxiv.orghttps://arxiv.org/pdf/2106.03273