Deterministic policy: prone to overfitting to noise in the demonstrations.
Deterministic policy: cannot capture multimodal action distributions well.
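The multimodality point can be illustrated with a toy example. This is a minimal sketch that assumes a single state with hypothetical demonstrations, where experts steer either left (-1) or right (+1) around an obstacle. A deterministic policy fit with mean-squared error outputs the mean of the demonstrated actions. That mean is about 0, which matches neither mode and would steer straight into the obstacle. Sampling from the action distribution recovers both modes. The function names are illustrative only, and sampling from the empirical distribution stands in for a learned stochastic policy.

```python
import random
import statistics

# Hypothetical demonstrations at one state: experts go left (-1.0) or right (+1.0),
# each with small Gaussian noise. Purely illustrative data.
random.seed(0)
demo_actions = [random.choice([-1.0, 1.0]) + random.gauss(0.0, 0.05) for _ in range(1000)]


def fit_deterministic(actions):
    """Constant deterministic policy trained with MSE: the loss minimizer is the mean."""
    return statistics.fmean(actions)


def sample_stochastic(actions, rng):
    """Stand-in stochastic policy: sample from the empirical action distribution."""
    return rng.choice(actions)


det_action = fit_deterministic(demo_actions)
rng = random.Random(1)
stoch_samples = [sample_stochastic(demo_actions, rng) for _ in range(5)]

print(f"deterministic action: {det_action:.3f}")  # near 0.0, between the two modes
print("stochastic samples:", [round(a, 2) for a in stoch_samples])  # each near -1 or +1
```

The averaged action lies in the low-density region between the modes, which is why the MSE-trained deterministic policy fails here even though every individual demonstration is valid.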