YSU RL HW1

Creator
Seonglae Cho
Created
2024 Mar 23 13:7
Edited
2024 Mar 30 12:50
A short eval episode length means the agent fell over and the episode ended as unhealthy.
My tricks
  • activation function choice
  • dropout
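The two tricks above can be illustrated with a minimal sketch. It is written in plain Python so it runs without the homework's deep learning framework; the names `ACTIVATIONS`, `dropout`, and `hidden_layer` are hypothetical and are not part of `run_hw1.py`. In the actual policy network these would be the framework's own layers.

```python
import math
import random

# Hypothetical lookup table of activation functions to switch between.
ACTIVATIONS = {
    "relu": lambda x: max(0.0, x),
    "leaky_relu": lambda x: x if x > 0 else 0.01 * x,
    "sigmoid": lambda x: 1.0 / (1.0 + math.exp(-x)),
    "tanh": math.tanh,
}


def dropout(values, p, rng, training=True):
    """Inverted dropout: zero each unit with probability p, scale survivors by 1/(1-p)."""
    if not training or p == 0.0:
        return list(values)
    keep = 1.0 - p
    return [v / keep if rng.random() < keep else 0.0 for v in values]


def hidden_layer(inputs, weights, biases, activation="tanh",
                 p_drop=0.0, rng=None, training=True):
    """One dense layer: affine transform, chosen activation, then dropout."""
    act = ACTIVATIONS[activation]
    rng = rng or random.Random(0)
    pre = [sum(w * x for w, x in zip(row, inputs)) + b
           for row, b in zip(weights, biases)]
    return dropout([act(z) for z in pre], p_drop, rng, training)
```

Dropout is only active while training; at evaluation time (`training=False`) the layer passes activations through unchanged.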

Ant default

python aai4160/scripts/run_hw1.py \
  --expert_policy_file aai4160/policies/experts/Ant.pkl \
  --env_name Ant-v4 --exp_name bc_ant --n_iter 1 \
  --expert_data aai4160/expert_data/expert_data_Ant-v4.pkl \
  --video_log_freq -1 --ep_len 1000 --eval_batch_size 5000

Cheetah default

python aai4160/scripts/run_hw1.py \
  --expert_policy_file aai4160/policies/experts/HalfCheetah.pkl \
  --env_name HalfCheetah-v4 --exp_name bc_cheetah --n_iter 1 \
  --expert_data aai4160/expert_data/expert_data_HalfCheetah-v4.pkl \
  --video_log_freq -1 --ep_len 1000 --eval_batch_size 5000 \
  --train_batch_size 500

Eval_AverageReturn : 3972.516357421875
Eval_StdReturn : 67.79310607910156
Eval_MaxReturn : 4080.548583984375
Eval_MinReturn : 3896.896484375
Eval_AverageEpLen : 1000.0
Train_AverageReturn : 4205.7783203125
Train_StdReturn : 83.038818359375
Train_MaxReturn : 4288.81689453125
Train_MinReturn : 4122.7392578125
Train_AverageEpLen : 1000.0
Train_EnvstepsSoFar : 0
TimeSinceStart : 6.673825025558472
Training Loss : -1.26580810546875
Initial_DataCollection_AverageReturn : 4205.7783203125

Hopper

python aai4160/scripts/run_hw1.py \
  --expert_policy_file aai4160/policies/experts/Hopper.pkl \
  --env_name Hopper-v4 --exp_name bc_hopper --n_iter 1 \
  --expert_data aai4160/expert_data/expert_data_Hopper-v4.pkl \
  --video_log_freq -1 --ep_len 1000 \
  --eval_batch_size 5000

Eval_AverageReturn : 622.1619262695312
Eval_StdReturn : 226.34414672851562
Eval_MaxReturn : 1207.558837890625
Eval_MinReturn : 423.8189392089844
Eval_AverageEpLen : 215.20833333333334
Train_AverageReturn : 3772.67041015625
Train_StdReturn : 1.9483642578125
Train_MaxReturn : 3774.61865234375
Train_MinReturn : 3770.721923828125
Train_AverageEpLen : 1000.0
Train_EnvstepsSoFar : 0
TimeSinceStart : 7.289942979812622
Training Loss : -1.0678434371948242
Initial_DataCollection_AverageReturn : 3772.67041015625

Eval_AverageReturn : 1332.905029296875
Eval_StdReturn : 187.6711883544922
Eval_MaxReturn : 1683.35986328125
Eval_MinReturn : 1144.32958984375
Eval_AverageEpLen : 1000.0
Train_AverageReturn : 3772.67041015625
Train_StdReturn : 1.9483642578125
Train_MaxReturn : 3774.61865234375
Train_MinReturn : 3770.721923828125
Train_AverageEpLen : 1000.0
Train_EnvstepsSoFar : 0
TimeSinceStart : 6.983166933059692
Training Loss : -1.0678434371948242
Initial_DataCollection_AverageReturn : 3772.67041015625

Walker2d

python aai4160/scripts/run_hw1.py \
  --expert_policy_file aai4160/policies/experts/Walker2d.pkl \
  --env_name Walker2d-v4 --exp_name bc_walker --n_iter 1 \
  --expert_data aai4160/expert_data/expert_data_Walker2d-v4.pkl \
  --video_log_freq -1 --ep_len 1000 --eval_batch_size 5000

Eval_AverageReturn : 551.5746459960938
Eval_StdReturn : 421.64508056640625
Eval_MaxReturn : 1592.2484130859375
Eval_MinReturn : 0.6480514407157898
Eval_AverageEpLen : 166.93333333333334
Train_AverageReturn : 5566.845703125
Train_StdReturn : 9.237548828125
Train_MaxReturn : 5576.08349609375
Train_MinReturn : 5557.6083984375
Train_AverageEpLen : 1000.0
Train_EnvstepsSoFar : 0
TimeSinceStart : 7.33293890953064
Training Loss : -0.7608211636543274
Initial_DataCollection_AverageReturn : 5566.845703125

Eval_AverageReturn : 688.03173828125
Eval_StdReturn : 313.3283996582031
Eval_MaxReturn : 1185.9598388671875
Eval_MinReturn : 292.66357421875
Eval_AverageEpLen : 1000.0
Train_AverageReturn : 5566.845703125
Train_StdReturn : 9.237548828125
Train_MaxReturn : 5576.08349609375
Train_MinReturn : 5557.6083984375
Train_AverageEpLen : 1000.0
Train_EnvstepsSoFar : 0
TimeSinceStart : 7.390901327133179
Training Loss : -0.7608211636543274
Initial_DataCollection_AverageReturn : 5566.845703125
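The default Hopper and Walker2d logs show an Eval_AverageEpLen far below the 1000-step --ep_len, which is the early "unhealthy" termination. A small helper makes that check, and the ratio of eval return to expert return, easy to read off a log. This is only a sketch: `parse_metrics`, `expert_ratio`, and `fell_early` are hypothetical names, and it assumes that in a one-iteration behavior-cloning run Train_AverageReturn reflects the expert data (it matches Initial_DataCollection_AverageReturn in the logs). The sample values are copied from the first default Hopper log.

```python
import re

# A few fields from the first default Hopper log, flattened as the script prints them.
HOPPER_DEFAULT = (
    "Eval_AverageReturn : 622.1619262695312 "
    "Eval_AverageEpLen : 215.20833333333334 "
    "Train_AverageReturn : 3772.67041015625 "
    "Training Loss : -1.0678434371948242"
)


def parse_metrics(log):
    """Parse 'Name : value' pairs; a repeated name keeps its last value."""
    pattern = r"([A-Za-z_ ]+?)\s*:\s*(-?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)"
    return {name.strip(): float(value) for name, value in re.findall(pattern, log)}


def expert_ratio(metrics):
    """Eval return as a fraction of the (assumed) expert return."""
    return metrics["Eval_AverageReturn"] / metrics["Train_AverageReturn"]


def fell_early(metrics, ep_len=1000):
    """True when the average eval episode ended before the ep_len limit."""
    return metrics["Eval_AverageEpLen"] < ep_len


metrics = parse_metrics(HOPPER_DEFAULT)
print(f"expert ratio: {expert_ratio(metrics):.3f}, fell early: {fell_early(metrics)}")
```

With the pattern above, metric names containing a space, such as Training Loss, are parsed as well.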
 

Hyperparameter Optimization

  • action, observation parameters
  • training dataset size
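One way to explore settings like these is to generate the run commands from a grid rather than editing them by hand. The sketch below only builds command strings; it uses flags that appear in the commands in these notes (--n_layers, --size, --train_batch_size), while `BASE`, `sweep_commands`, and the grid values are illustrative assumptions rather than the search that was actually run.

```python
import itertools
import shlex

# Base Hopper behavior-cloning command, copied from the notes.
BASE = [
    "python", "aai4160/scripts/run_hw1.py",
    "--expert_policy_file", "aai4160/policies/experts/Hopper.pkl",
    "--env_name", "Hopper-v4", "--exp_name", "bc_hopper", "--n_iter", "1",
    "--expert_data", "aai4160/expert_data/expert_data_Hopper-v4.pkl",
    "--video_log_freq", "-1", "--ep_len", "1000", "--eval_batch_size", "5000",
]


def sweep_commands(grid):
    """Yield one shell command per combination of flag values in `grid`."""
    flags = list(grid)
    for values in itertools.product(*(grid[f] for f in flags)):
        extra = []
        for flag, value in zip(flags, values):
            extra += [flag, str(value)]
        yield shlex.join(BASE + extra)


# Illustrative grid; the values are assumptions, not recommendations.
GRID = {
    "--n_layers": [1, 2],
    "--size": [64, 100],
    "--train_batch_size": [500, 1000],
}

for command in sweep_commands(GRID):
    print(command)
```

Each printed line can be pasted into a shell or passed to a job runner; giving each combination its own --exp_name would keep the log directories apart.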

Ant

expert dataset 2000
  • step 10000
  • layers 1
  • width 64
python aai4160/scripts/run_hw1.py \
  --expert_policy_file aai4160/policies/experts/Ant.pkl \
  --env_name Ant-v4 --exp_name bc_ant --n_iter 1 \
  --expert_data aai4160/expert_data/expert_data_Ant-v4.pkl \
  --video_log_freq -1 --num_agent_train_steps_per_iter 10000 --n_layers 1 \
  --ep_len 1000 --eval_batch_size 5000

Eval_AverageReturn : 4786.14306640625
Eval_StdReturn : 50.20363235473633
Eval_MaxReturn : 4861.13818359375
Eval_MinReturn : 4705.3603515625
Eval_AverageEpLen : 1000.0
Train_AverageReturn : 4713.6533203125
Train_StdReturn : 12.196533203125
Train_MaxReturn : 4725.849609375
Train_MinReturn : 4701.45654296875
Train_AverageEpLen : 1000.0
Train_EnvstepsSoFar : 0
TimeSinceStart : 35.20740628242493
Training Loss : -2.541884183883667
Initial_DataCollection_AverageReturn : 4713.6533203125

Cheetah

expert dataset 2000
  • layers 3
  • batch 500
  • step 12000
  • lr 1e-3
python aai4160/scripts/run_hw1.py \
  --expert_policy_file aai4160/policies/experts/HalfCheetah.pkl \
  --env_name HalfCheetah-v4 --exp_name bc_cheetah --n_iter 1 \
  --expert_data aai4160/expert_data/expert_data_HalfCheetah-v4.pkl \
  --video_log_freq -1 --num_agent_train_steps_per_iter 10000 \
  --ep_len 1000 --eval_batch_size 5000 --n_layers 3 \
  --train_batch_size 500 -lr 1e-3

Eval_AverageReturn : 4148.51953125
Eval_StdReturn : 37.12542724609375
Eval_MaxReturn : 4215.1220703125
Eval_MinReturn : 4104.57373046875
Eval_AverageEpLen : 1000.0
Train_AverageReturn : 4205.7783203125
Train_StdReturn : 83.038818359375
Train_MaxReturn : 4288.81689453125
Train_MinReturn : 4122.7392578125
Train_AverageEpLen : 1000.0
Train_EnvstepsSoFar : 0
TimeSinceStart : 49.86260509490967
Training Loss : -2.8239476680755615
Initial_DataCollection_AverageReturn : 4205.7783203125

Hopper

expert dataset 2000
  • layers 1
  • batch 1000
  • width 100
  • steps 8800
  • activation leaky_relu
python aai4160/scripts/run_hw1.py \
  --expert_policy_file aai4160/policies/experts/Hopper.pkl \
  --env_name Hopper-v4 --exp_name bc_hopper --n_iter 1 \
  --expert_data aai4160/expert_data/expert_data_Hopper-v4.pkl \
  --video_log_freq -1 --num_agent_train_steps_per_iter 8800 \
  --ep_len 1000 --eval_batch_size 5000 \
  --train_batch_size 1000 --size 100 --n_layers 1

Eval_AverageReturn : 3652.104248046875
Eval_StdReturn : 84.24939727783203
Eval_MaxReturn : 3726.7509765625
Eval_MinReturn : 3493.478515625
Eval_AverageEpLen : 1000.0
Train_AverageReturn : 3772.67041015625
Train_StdReturn : 1.9483642578125
Train_MaxReturn : 3774.61865234375
Train_MinReturn : 3770.721923828125
Train_AverageEpLen : 1000.0
Train_EnvstepsSoFar : 0
TimeSinceStart : 31.361346006393433
Training Loss : -3.103459596633911
Initial_DataCollection_AverageReturn : 3772.67041015625

Walker

  • layers 1
  • batch 500
  • width 100
  • steps 12000
  • activation sigmoid
  • lr 4e-3
python aai4160/scripts/run_hw1.py \
  --expert_policy_file aai4160/policies/experts/Walker2d.pkl \
  --env_name Walker2d-v4 --exp_name bc_walker --n_iter 1 \
  --expert_data aai4160/expert_data/expert_data_Walker2d-v4.pkl \
  --video_log_freq -1 --ep_len 1000 --eval_batch_size 5000 \
  --train_batch_size 500 --size 100 --n_layers 1 \
  --num_agent_train_steps_per_iter 12000 -lr 4e-3

Eval_AverageReturn : 5305.3662109375
Eval_StdReturn : 61.93048858642578
Eval_MaxReturn : 5403.990234375
Eval_MinReturn : 5211.0244140625
Eval_AverageEpLen : 1000.0
Train_AverageReturn : 5566.845703125
Train_StdReturn : 9.237548828125
Train_MaxReturn : 5576.08349609375
Train_MinReturn : 5557.6083984375
Train_AverageEpLen : 1000.0
Train_EnvstepsSoFar : 0
TimeSinceStart : 40.743810415267944
Training Loss : -1.9663548469543457
Initial_DataCollection_AverageReturn : 5566.845703125
 

Dagger

Ant

python aai4160/scripts/run_hw1.py \
  --expert_policy_file aai4160/policies/experts/Ant.pkl \
  --env_name Ant-v4 --exp_name dagger_ant --n_iter 10 \
  --do_dagger --ep_len 1000 --eval_batch_size 5000 \
  --expert_data aai4160/expert_data/expert_data_Ant-v4.pkl \
  --video_log_freq -1

Cheetah

python aai4160/scripts/run_hw1.py -lr 1e-3 \
  --expert_policy_file aai4160/policies/experts/HalfCheetah.pkl \
  --env_name HalfCheetah-v4 --exp_name dagger_cheetah --n_iter 10 \
  --do_dagger --ep_len 1000 --eval_batch_size 5000 \
  --expert_data aai4160/expert_data/expert_data_HalfCheetah-v4.pkl \
  --video_log_freq -1

Hopper

python aai4160/scripts/run_hw1.py -lr 1e-3 \
  --expert_policy_file aai4160/policies/experts/Hopper.pkl \
  --env_name Hopper-v4 --exp_name dagger_hopper --n_iter 10 \
  --do_dagger --ep_len 1000 --eval_batch_size 5000 \
  --expert_data aai4160/expert_data/expert_data_Hopper-v4.pkl \
  --video_log_freq -1 --which_gpu 1
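
The DAgger runs above differ from the behavior-cloning runs mainly by --do_dagger and --n_iter 10. The loop this presumably drives can be sketched on a toy problem: roll out the current learner, have the expert relabel the visited states, aggregate, and retrain. Everything here is a hypothetical stand-in (a 1-D environment, a linear policy, a hand-written expert), not the homework's code, and the toy task is simple enough that it does not demonstrate DAgger's benefit, only its structure.

```python
import random


def expert_action(state):
    """Hypothetical expert: push the state halfway back toward zero."""
    return -0.5 * state


def rollout(policy, rng, steps=20):
    """Run `policy` in a toy 1-D environment and return the visited states."""
    state, states = rng.uniform(-1.0, 1.0), []
    for _ in range(steps):
        states.append(state)
        state = state + policy(state) + rng.gauss(0.0, 0.05)
    return states


def fit_linear(data):
    """Least-squares fit of action = w * state on (state, action) pairs."""
    num = sum(s * a for s, a in data)
    den = sum(s * s for s, _ in data)
    w = num / den if den else 0.0
    return lambda s: w * s


def dagger(n_iter=10, seed=0):
    rng = random.Random(seed)
    # Iteration 0 is plain behavior cloning on expert rollouts.
    data = [(s, expert_action(s)) for s in rollout(expert_action, rng)]
    policy = fit_linear(data)
    for _ in range(n_iter - 1):
        visited = rollout(policy, rng)                    # 1. run the learner
        data += [(s, expert_action(s)) for s in visited]  # 2. expert relabels, aggregate
        policy = fit_linear(data)                         # 3. retrain on all data
    return policy, len(data)


policy, n_samples = dagger()
print(f"samples: {n_samples}, action at state 1.0: {policy(1.0):.3f}")
```

The key difference from behavior cloning is step 1: the states come from the learner's own rollouts, so the dataset covers the situations the learner actually reaches.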

Walker

 
 
 
