Classifier weakness
Idea: add the states that the RL policy visits as negative examples for the classifier
Use final states as positive (success) examples → train a binary classifier
use the learned classifier as the reward
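A minimal sketch of this classifier, assuming PyTorch; `goal_states` and `policy_states` are hypothetical placeholder tensors (not named in the notes) holding success states and states the RL policy visited:

```python
# Minimal sketch, assuming PyTorch and pre-collected state tensors.
# `goal_states` (successful final states) and `policy_states` (states the
# RL policy visited) are hypothetical placeholders, not from the notes.
import torch
import torch.nn as nn

state_dim = 16  # assumed state dimensionality

classifier = nn.Sequential(
    nn.Linear(state_dim, 64), nn.ReLU(),
    nn.Linear(64, 1),  # outputs a logit for p(success | state)
)
optimizer = torch.optim.Adam(classifier.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

def classifier_update(goal_states, policy_states):
    """One gradient step: goal states are positives, policy-visited states negatives."""
    states = torch.cat([goal_states, policy_states], dim=0)
    labels = torch.cat([
        torch.ones(len(goal_states), 1),     # positives: success examples
        torch.zeros(len(policy_states), 1),  # negatives: states RL visited
    ], dim=0)
    loss = bce(classifier(states), labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```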
like GAN
- the policy is generator
- the goal classifier is the discriminator (see the alternation sketch below)
VQ-GAN
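The alternation could look roughly like the following, reusing `classifier` / `classifier_update` from the sketch above; `collect_states` and `rl_update` are hypothetical placeholders for rollout collection and the policy improvement step:

```python
# GAN-style alternation sketch; `collect_states` and `rl_update` are
# assumed helpers, and `classifier` / `classifier_update` come from the
# sketch above.
import torch

def train_adversarially(policy, env, goal_states, num_iters=100):
    for _ in range(num_iters):
        # "Generator" step: run the current policy and improve it using
        # the classifier's success probability as reward.
        policy_states = collect_states(policy, env)
        rewards = torch.sigmoid(classifier(policy_states)).detach()
        rl_update(policy, policy_states, rewards)

        # "Discriminator" step: retrain the goal classifier with the
        # freshly visited states as new negative examples.
        classifier_update(goal_states, policy_states)
```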
We want the policy, trained with the goal classifier as reward, to match the goal state distribution
This is slightly different from Behavior Cloning, which matches the expert state-action distribution
typically sample positives and negatives half and half, so the expected classifier output is 0.5
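A small sketch of that balanced sampling, using the same assumed tensors as above:

```python
# Balanced minibatch sketch: draw half positives, half negatives so the
# classifier's base rate / expected output is 0.5. Tensor names are assumed.
import torch

def balanced_batch(goal_states, policy_states, batch_size=128):
    half = batch_size // 2
    pos_idx = torch.randint(len(goal_states), (half,))
    neg_idx = torch.randint(len(policy_states), (half,))
    states = torch.cat([goal_states[pos_idx], policy_states[neg_idx]], dim=0)
    labels = torch.cat([torch.ones(half, 1), torch.zeros(half, 1)], dim=0)
    return states, labels
```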
General reward classifier
with demonstration trajectories
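With demonstration trajectories, every demo state can serve as a positive rather than only the final states; one common GAN-style choice (an assumption here, not stated in the notes) is to use the classifier's log-odds as the reward:

```python
# Sketch: reward from the learned classifier's log-odds, log D(s) - log(1 - D(s)),
# a standard GAN-style formulation (assumed, not quoted from the notes).
# Reuses `classifier` from the sketch above.
import torch

def classifier_reward(states, eps=1e-6):
    p = torch.sigmoid(classifier(states))
    return torch.log(p + eps) - torch.log(1.0 - p + eps)
```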