Rapid motor adaptation
Applying domain randomization alone gives a conservative, suboptimal policy.
They introduce an environment factor and then find the best policy for that factor, conditioned on the output of an environment factor encoder.
However, we need a separate environment predictor (an adaptation module trained to minimize a reconstruction loss) that works from the history, since the env factor is not available in the real world.
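
A minimal sketch of the adaptation-module idea, under toy assumptions. Everything here is hypothetical and only for illustration: the encoder is a fixed random linear map (`W_enc`), the "history" is a noisy linear function of the env factor (`A_dyn`), and the adaptation module is a linear least-squares fit rather than a neural network. The point is only to show the data flow: the encoder sees the privileged env factor in simulation, while the adaptation module learns to predict the same latent from history alone.

```python
import numpy as np

rng = np.random.default_rng(0)
ENV_DIM, LATENT_DIM, OBS_DIM, HIST_LEN = 6, 2, 4, 5

# Hypothetical stand-in for an already-trained environment factor encoder.
W_enc = rng.normal(size=(ENV_DIM, LATENT_DIM))
# Hypothetical toy dynamics: how the hidden env factor shows up in the history.
A_dyn = rng.normal(size=(ENV_DIM, OBS_DIM * HIST_LEN))


def encode_env(e):
    """Privileged path (simulation only): env factor e -> latent z."""
    return e @ W_enc


def rollout_history(e, rng):
    """Toy flattened history that depends on e, plus small observation noise."""
    noise = 0.01 * rng.normal(size=(e.shape[0], A_dyn.shape[1]))
    return e @ A_dyn + noise


def add_bias(h):
    """Append a constant column so the linear fit has an intercept."""
    return np.hstack([h, np.ones((h.shape[0], 1))])


def fit_adaptation_module(h, z):
    """Least-squares fit of z_hat = [h, 1] @ W, minimizing ||z_hat - z||^2."""
    W, *_ = np.linalg.lstsq(add_bias(h), z, rcond=None)
    return W


def predict_latent(h, W):
    """Deployment path: predict the latent from history only."""
    return add_bias(h) @ W


# Train in "simulation", where the env factor e is known.
e_train = rng.normal(size=(2000, ENV_DIM))
W_adapt = fit_adaptation_module(rollout_history(e_train, rng), encode_env(e_train))

# Evaluate on held-out environments; e_test is used only to score the prediction.
e_test = rng.normal(size=(500, ENV_DIM))
z_test = encode_env(e_test)
z_hat = predict_latent(rollout_history(e_test, rng), W_adapt)

mse_adapt = float(np.mean((z_hat - z_test) ** 2))
mse_baseline = float(np.mean(z_test ** 2))  # error of always predicting zero
print(f"adaptation MSE: {mse_adapt:.6f}, baseline MSE: {mse_baseline:.6f}")
```

At deployment only `predict_latent` would be called, so the policy gets an estimate of the latent without ever seeing the env factor. In a real setup the history would come from actual rollouts of the policy and both modules would be learned networks.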