UCL DRL Coursework 2

Creator: Seonglae Cho
Created: 2024 Sep 30 14:4
Edited: 2024 Dec 16 20:27

Questions are marked under the heading "Task"; make sure you answer/address every bullet point under these headings.

If you want to change the source code of the package, options include hosting your own version of the package on your own GitHub and pip installing it within the notebook, or copying and pasting the altered functionality into the notebook.
!pip install --force-reinstall git+https://github.com/joshuaspear/pymlrf.git
!pip install wandb
!pip install torchinfo
!pip install jaxtyping
!pip install typeguard==2.13.3
!pip install git+https://github.com/joshuaspear/comp0188_cw2_public.git

Question 1: Supervised learning outperforms the existing model

In Question 1, where you are required to justify the choice of loss function, reference theoretically the range of values the model should predict, i.e., continuous, binary, etc.
If the error is a 'nan', check your loss: it is likely extremely large and overflowing. Training without half precision usually fixes this, because float32 has a much larger range and is far less likely to overflow.
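To see why half precision produces 'nan', the following sketch squares an error of 300 in float16 and in float32. NumPy is used only to keep it self-contained (torch.float16 uses the same IEEE half-precision format), and the helper `squared_error` is hypothetical, not part of the coursework package. Float16 tops out at 65504, so 300² = 90000 becomes `inf`, and any later `inf - inf` (as can happen in gradient or normalization arithmetic) yields `nan`.

```python
import numpy as np


def squared_error(pred, target, dtype):
    """Squared error computed entirely in the given floating-point dtype (hypothetical helper)."""
    diff = np.asarray(pred, dtype=dtype) - np.asarray(target, dtype=dtype)
    return diff * diff


max_half = float(np.finfo(np.float16).max)  # largest finite float16 value (65504)

# Suppress the overflow / invalid-value warnings so the results can be inspected directly.
with np.errstate(over="ignore", invalid="ignore"):
    loss_half = squared_error(300.0, 0.0, np.float16)    # 90000 > 65504, overflows to inf
    loss_single = squared_error(300.0, 0.0, np.float32)  # well within float32 range
    nan_from_inf = loss_half - loss_half                 # inf - inf gives nan
```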
  • Observations are split into 5 attributes:
    • front_cam_ob: observations from the third-person camera
    • mount_cam_ob: observations from the mounted camera
    • ee_cartesian_pos_ob: end-effector Cartesian position. ee_cartesian_pos_ob[0:3] is the position and ee_cartesian_pos_ob[3:7] is the orientation as a quaternion
    • ee_cartesian_vel_ob: end-effector Cartesian velocity. ee_cartesian_vel_ob[0:3] is the change in position and ee_cartesian_vel_ob[3:6] is the change in orientation in roll, pitch, yaw format
    • joint_pos_ob: joint positions of the Jaco arm (only the last 2 elements, which correspond to the gripper joints, are used)
  • actions: the first 3 elements are Cartesian deltas and the 4th element is a label from {0, 1, 2} meaning {open gripper, don't move gripper, close gripper}
  • terminals: 1 at the end of each skill
  • prompts: natural language description of the goal
  • reward: 1 at the end of each skill
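Following the action layout above, here is a minimal sketch of a loss that matches the range of each target, assuming the model outputs 3 continuous deltas plus 3 gripper logits. The Cartesian deltas are unbounded real values, so mean squared error fits. The gripper label is one of 3 discrete classes, so cross-entropy over softmax logits fits (the target is neither binary nor continuous). `split_action`, `bc_loss` and `gripper_weight` are hypothetical names; in PyTorch, `nn.MSELoss` and `nn.CrossEntropyLoss` (which expects raw logits) implement the same two terms.

```python
import numpy as np


def split_action(actions):
    """Split (N, 4) actions into continuous Cartesian deltas and integer gripper labels."""
    actions = np.asarray(actions, dtype=np.float64)
    deltas = actions[:, :3]                   # continuous targets -> regression loss
    gripper = actions[:, 3].astype(np.int64)  # {0: open, 1: don't move, 2: close} -> classification loss
    return deltas, gripper


def log_softmax(logits):
    """Row-wise log-softmax, subtracting the row max for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))


def bc_loss(pred_deltas, gripper_logits, actions, gripper_weight=1.0):
    """MSE on the deltas plus (weighted) cross-entropy on the gripper class."""
    deltas, gripper = split_action(actions)
    mse = np.mean((pred_deltas - deltas) ** 2)
    ce = -np.mean(log_softmax(gripper_logits)[np.arange(len(gripper)), gripper])
    return mse + gripper_weight * ce, mse, ce
```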


Baseline 20
Metrics for Open: Accuracy: 0.8187 Precision: 0.3053 Recall: 0.3217 F1 Score: 0.3133
Metrics for No Move: Accuracy: 0.7360 Precision: 0.8077 Recall: 0.8473 F1 Score: 0.8270
Metrics for Close: Accuracy: 0.8435 Precision: 0.3195 Recall: 0.2098 F1 Score: 0.2533
Baseline 90
Metrics for Open: Accuracy: 0.8118 Precision: 0.1440 Recall: 0.0938 F1 Score: 0.1136
Metrics for No Move: Accuracy: 0.7470 Precision: 0.7892 Recall: 0.9010 F1 Score: 0.8414
Metrics for Close: Accuracy: 0.8614 Precision: 0.4084 Recall: 0.2125 F1 Score: 0.2796
Macro F1 (Baseline 90): 0.4115
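The per-class Accuracy values above are much higher than the per-class Precision and Recall, which is consistent with one-vs-rest scoring, where each gripper class is treated as its own binary problem. Below is a minimal NumPy sketch under that assumption (the coursework's own evaluation code may differ); `one_vs_rest_metrics` is a hypothetical name. The macro F1 is the unweighted mean of the three per-class F1 scores, which reproduces the 0.4115 reported for Baseline 90. scikit-learn's `precision_recall_fscore_support(y_true, y_pred, average=None)` returns the same per-class precision, recall and F1.

```python
import numpy as np

CLASS_NAMES = ["Open", "No Move", "Close"]  # gripper labels 0, 1, 2


def one_vs_rest_metrics(y_true, y_pred):
    """Per-class accuracy, precision, recall and F1, scoring each class as a binary problem."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    results = {}
    for c, name in enumerate(CLASS_NAMES):
        t = y_true == c
        p = y_pred == c
        tp = np.sum(t & p)
        fp = np.sum(~t & p)
        fn = np.sum(t & ~p)
        tn = np.sum(~t & ~p)
        accuracy = (tp + tn) / len(y_true)
        precision = tp / (tp + fp) if tp + fp > 0 else 0.0
        recall = tp / (tp + fn) if tp + fn > 0 else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall > 0 else 0.0
        results[name] = {
            "accuracy": float(accuracy),
            "precision": float(precision),
            "recall": float(recall),
            "f1": float(f1),
        }
    macro_f1 = float(np.mean([m["f1"] for m in results.values()]))
    return results, macro_f1
```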
 
  • Batch normalization
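    • Emphasizing Open/Close: performance was poor and the validation grp (gripper) loss rose immediately, so this seems to be a bad choice
  • Layer normalization: at 90, keeps No Move performance while efficiently improving Open/Close performance somewhat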
    • Metrics for Open: Accuracy: 0.8283 Precision: 0.2568 Recall: 0.1769 F1 Score: 0.2095
    • Metrics for No Move: Accuracy: 0.7656 Precision: 0.8059 Recall: 0.9028 F1 Score: 0.8516
    • Metrics for Close: Accuracy: 0.8731 Precision: 0.4978 Recall: 0.3025 F1 Score: 0.3763
    • Not tried at 20
  • Gradient normalization, Gradient Clipping
    • The step seems off; it does not seem to work
    • Even after tuning, the model just does not learn
  • Learning Rate Scheduler
    • Training is stable, but at 20 performance actually decreases
      • Metrics for Open: Accuracy: 0.8266 Precision: 0.3153 Recall: 0.2976 F1 Score: 0.3062
      • Metrics for No Move: Accuracy: 0.7239 Precision: 0.7990 Recall: 0.8408 F1 Score: 0.8194
      • Metrics for Close: Accuracy: 0.8290 Precision: 0.2655 Recall: 0.1989 F1 Score: 0.2274
    • 90
      • Metrics for Open: Accuracy: 0.8128 Precision: 0.3059 Recall: 0.3592 F1 Score: 0.3305
      • Metrics for No Move: Accuracy: 0.7266 Precision: 0.8132 Recall: 0.8218 F1 Score: 0.8175
      • Metrics for Close: Accuracy: 0.8394 Precision: 0.3226 Recall: 0.2452 F1 Score: 0.2786
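For reference, here is a NumPy sketch of the two normalization schemes compared above and of clipping by global norm. `layer_norm` normalizes each sample over its features, so its output does not depend on the rest of the batch; `batch_norm_train` (training mode, no learned scale/shift) normalizes each feature over the batch, so its output changes with batch composition. `clip_by_global_norm` follows the behaviour of `torch.nn.utils.clip_grad_norm_`: it rescales all gradients when their combined L2 norm exceeds `max_norm`. In PyTorch that call goes after `loss.backward()` and before `optimizer.step()`, which is worth checking given the step issue noted above. All three function names are hypothetical.

```python
import numpy as np


def layer_norm(x, gamma=None, beta=None, eps=1e-5):
    """Normalize each sample over its feature axis (last axis), independent of the batch."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)  # biased variance, as in torch.nn.LayerNorm
    y = (x - mean) / np.sqrt(var + eps)
    if gamma is not None:
        y = y * gamma  # optional learned scale
    if beta is not None:
        y = y + beta   # optional learned shift
    return y


def batch_norm_train(x, eps=1e-5):
    """Training-mode batch norm without affine parameters: normalize each feature over the batch."""
    mean = x.mean(axis=0, keepdims=True)
    var = x.var(axis=0, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def clip_by_global_norm(grads, max_norm):
    """Scale every gradient by the same factor if their combined L2 norm exceeds max_norm."""
    total_norm = float(np.sqrt(sum(np.sum(g ** 2) for g in grads)))
    scale = min(1.0, max_norm / (total_norm + 1e-6))  # never scales gradients up
    return [g * scale for g in grads], total_norm
```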
 

Loss Weighting

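With 212 the results looked imbalanced; with 414 and 818 they looked balanced; with 32 they were balanced with fairly high performance. 16 performed well overall, but the target metrics (recall and F1) were poor, so 32 was the best.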
  • 212 20
    • Metrics for Open: Accuracy: 0.7067 Precision: 0.2208 Recall: 0.5067 F1 Score: 0.3076
    • Metrics for No Move: Accuracy: 0.6653 Precision: 0.8427 Recall: 0.6770 F1 Score: 0.7508
    • Metrics for Close: Accuracy: 0.8166 Precision: 0.2330 Recall: 0.1962 F1 Score: 0.2130
  • 32 20
    • Metrics for Open: Accuracy: 0.7515 Precision: 0.2307 Recall: 0.3995 F1 Score: 0.2924
    • Metrics for No Move: Accuracy: 0.6849 Precision: 0.8692 Recall: 0.6793 F1 Score: 0.7626
    • Metrics for Close: Accuracy: 0.7983 Precision: 0.3074 Recall: 0.4741 F1 Score: 0.3730
  • 414 20
    • Metrics for Open: Accuracy: 0.7184 Precision: 0.1773 Recall: 0.3271 F1 Score: 0.2300
    • Metrics for No Move: Accuracy: 0.6842 Precision: 0.8292 Recall: 0.7256 F1 Score: 0.7739
    • Metrics for Close: Accuracy: 0.8349 Precision: 0.3261 Recall: 0.2861 F1 Score: 0.3048
  • 16 20
    • Metrics for Open: Accuracy: 0.8325 Precision: 0.2930 Recall: 0.2145 F1 Score: 0.2477
    • Metrics for No Move: Accuracy: 0.7608 Precision: 0.8075 Recall: 0.8913 F1 Score: 0.8473
    • Metrics for Close: Accuracy: 0.8532 Precision: 0.3786 Recall: 0.2507 F1 Score: 0.3016
  • 818 20
    • Metrics for Open: Accuracy: 0.7797 Precision: 0.2462 Recall: 0.3458 F1 Score: 0.2876
    • Metrics for No Move: Accuracy: 0.7173 Precision: 0.8241 Recall: 0.7890 F1 Score: 0.8061
    • Metrics for Close: Accuracy: 0.8356 Precision: 0.3214 Recall: 0.2698 F1 Score: 0.2933
  • 32 90
    • Metrics for Open: Accuracy: 0.7756 Precision: 0.2660 Recall: 0.4236 F1 Score: 0.3268
    • Metrics for No Move: Accuracy: 0.6860 Precision: 0.8729 Recall: 0.6770 F1 Score: 0.7626
    • Metrics for Close: Accuracy: 0.7890 Precision: 0.3059 Recall: 0.5259 F1 Score: 0.3868
  • [Model Scaling]
  • Optimal batch size: 64
  • Optimal close weight: 32; optimal open weight: 64
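Here is a sketch of class-weighted cross-entropy, assuming labels such as 212 mean Open/No Move/Close weights of 2/1/2 (an interpretation of the notes, not confirmed). It uses the same 'mean' normalization as `torch.nn.CrossEntropyLoss(weight=...)`, dividing by the summed weights of the target classes rather than by the batch size. As a result, scaling all weights by the same factor leaves the loss unchanged; only their ratios matter. `weighted_cross_entropy` is a hypothetical name.

```python
import numpy as np


def weighted_cross_entropy(logits, labels, class_weights):
    """Class-weighted cross-entropy with torch-style 'mean' reduction (weighted mean)."""
    logits = np.asarray(logits, dtype=np.float64)
    labels = np.asarray(labels, dtype=np.int64)
    w = np.asarray(class_weights, dtype=np.float64)[labels]  # weight of each sample's true class
    shifted = logits - logits.max(axis=1, keepdims=True)     # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    nll = -log_probs[np.arange(len(labels)), labels]
    return float(np.sum(w * nll) / np.sum(w))  # divide by summed weights, not batch size
```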

layer norm + 32

  • Metrics for Open: Accuracy: 0.7973 Precision: 0.2595 Recall: 0.3110 F1 Score: 0.2829
  • Metrics for No Move: Accuracy: 0.7435 Precision: 0.8297 Recall: 0.8251 F1 Score: 0.8274
  • Metrics for Close: Accuracy: 0.8566 Precision: 0.4197 Recall: 0.3488 F1 Score: 0.3810

layer norm + 64 32: not great. 411 was also terrible.

Ideas: remove data normalization, add a model regularizer loss, apply dropout
  • dropout: poor
  • removing data normalization: failed badly
  • model regularizer loss: not great
  • scheduler: hmm... nothing notable
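As context for the 'remove data normalization' result, here is a minimal sketch of per-feature z-score normalization, assuming that is the kind of normalization being removed (the coursework package's own scheme may differ). Statistics are fitted on the training split only and reused for validation and test data, so no information leaks from held-out splits. `Standardizer` is a hypothetical helper.

```python
import numpy as np


class Standardizer:
    """Per-feature z-score normalization fitted on training data only (hypothetical helper)."""

    def __init__(self, eps=1e-8):
        self.eps = eps  # avoids division by zero for constant features
        self.mean = None
        self.std = None

    def fit(self, x):
        x = np.asarray(x, dtype=np.float64)
        self.mean = x.mean(axis=0)
        self.std = x.std(axis=0)
        return self

    def transform(self, x):
        return (np.asarray(x, dtype=np.float64) - self.mean) / (self.std + self.eps)

    def inverse_transform(self, z):
        return np.asarray(z, dtype=np.float64) * (self.std + self.eps) + self.mean
```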

Question 2: VAE → latent representation → downstream model → BC

Design choice
  • normalization
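For the Question 2 pipeline, here is a NumPy sketch of the two VAE-specific pieces: the reparameterization trick and the closed-form KL divergence between a diagonal Gaussian posterior and a standard normal prior. The squared-error reconstruction term (a Gaussian decoder) and the `beta` weight are assumptions, and all function names are hypothetical. After training, the encoder mean `mu` is a common choice of latent representation to feed the downstream BC model.

```python
import numpy as np


def reparameterize(mu, logvar, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, I), keeping z differentiable in mu and logvar."""
    eps = rng.standard_normal(np.shape(mu))
    return mu + np.exp(0.5 * logvar) * eps


def kl_to_standard_normal(mu, logvar):
    """KL(N(mu, diag(exp(logvar))) || N(0, I)), summed over latent dims, averaged over the batch."""
    kl = -0.5 * np.sum(1.0 + logvar - mu ** 2 - np.exp(logvar), axis=-1)
    return float(np.mean(kl))


def vae_loss(x, x_recon, mu, logvar, beta=1.0):
    """Reconstruction (squared error, assumed Gaussian decoder) plus beta-weighted KL."""
    recon = float(np.mean(np.sum((x - x_recon) ** 2, axis=-1)))
    return recon + beta * kl_to_standard_normal(mu, logvar)
```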

Question 3: Evaluate on the test set

Pointers

  • with transition log
  • Evaluation template
  • Preprocessing script for split
  • collect function
 
 
 
 

Recommendations