UCL DRL Coursework 2

Questions are marked under the title `Task`. Ensure that you answer or address all of the bullet points under these headings.

If you want to change the source code of the package, the options include hosting your own version of the package on your own GitHub and pip installing it within the notebook, or copying and pasting the altered functionality into the notebook.


!pip install --force-reinstall git+https://github.com/joshuaspear/pymlrf.git
!pip install wandb
!pip install torchinfo
!pip install jaxtyping
!pip install typeguard==2.13.3
!pip install git+https://github.com/joshuaspear/comp0188_cw2_public.git

Question 1: Supervised learning outperforms the existing model

Within Question 1, where you are required to justify the choice of loss function, please reference theoretically the range of values that the model should predict, i.e., continuous, binary, etc.

If the error mentions 'nan', check your loss: it is likely extremely high and overflowing. Training without half precision usually fixes this, because full precision is less likely to overflow.
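The overflow described above can be illustrated without any deep learning library. The sketch below uses Python's `struct` module, whose `e` format is IEEE 754 half precision, to check whether a value can be stored as a finite float16. The helper name `fits_in_float16` and the sample loss values are hypothetical and are not part of the coursework package.

```python
import math
import struct


def fits_in_float16(value: float) -> bool:
    """Return True if `value` can be stored as a finite half-precision float."""
    if math.isnan(value) or math.isinf(value):
        return False
    try:
        # 'e' is the half-precision format; packing raises OverflowError
        # when the magnitude exceeds the float16 range (largest value: 65504).
        struct.pack("<e", value)
    except OverflowError:
        return False
    return True


# Hypothetical loss values: a moderate one, the float16 maximum, and a large one.
for loss in (12.5, 65504.0, 1.0e5):
    print(loss, fits_in_float16(loss))
```

A loss that fails this check would be stored as infinity in half precision, and later arithmetic on infinities (for example inf - inf) yields nan.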

Observations are split into 5 attributes:

front_cam_ob: observations from 3rd person cam
mount_cam_ob: observations from mounted camera
ee_cartesian_pos_ob: end effector cartesian position. ee_cartesian_pos_ob[0:3] corresponds to position and ee_cartesian_pos_ob[3:7] corresponds to orientation in quaternion format
ee_cartesian_vel_ob: end effector cartesian velocity. ee_cartesian_vel_ob[0:3] corresponds to change in position and ee_cartesian_vel_ob[3:6] corresponds to change in orientation in roll, pitch, yaw format
joint_pos_ob: joint positions of the Jaco arm (we only use the last 2 elements of this, which correspond to the gripper joints)
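The index ranges listed above can be sketched as plain slicing. The function names, the dictionary keys, and the sample values below are hypothetical. The source does not state the quaternion component order or the full length of joint_pos_ob, so neither is assumed here beyond what the slices require.

```python
from typing import Dict, List, Sequence


def split_ee_pos(ee_cartesian_pos_ob: Sequence[float]) -> Dict[str, List[float]]:
    """Split the 7-element end effector pose into position and orientation."""
    if len(ee_cartesian_pos_ob) != 7:
        raise ValueError("expected 7 elements: 3 position + 4 quaternion")
    return {
        "position": list(ee_cartesian_pos_ob[0:3]),
        "orientation_quat": list(ee_cartesian_pos_ob[3:7]),
    }


def gripper_joints(joint_pos_ob: Sequence[float]) -> List[float]:
    """Keep only the last 2 joint positions, which correspond to the gripper."""
    return list(joint_pos_ob[-2:])


# Hypothetical sample values, for illustration only.
pose = split_ee_pos([0.1, 0.2, 0.3, 0.0, 0.0, 0.0, 1.0])
print(pose["position"], pose["orientation_quat"])
print(gripper_joints([0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7]))
```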

actions: first 3 elements are cartesian deltas and 4th element is a label from {0, 1, 2} meaning {open gripper, don't move gripper, close gripper}
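As a minimal sketch of the action layout described above, the snippet below separates the three continuous cartesian deltas from the discrete gripper label. `GRIPPER_LABELS` and `decode_action` are hypothetical names introduced for illustration, and the sample action is made up.

```python
from typing import List, Sequence, Tuple

# Mapping taken from the description above: label -> meaning.
GRIPPER_LABELS = {0: "open gripper", 1: "don't move gripper", 2: "close gripper"}


def decode_action(action: Sequence[float]) -> Tuple[List[float], str]:
    """Return (cartesian deltas, gripper label name) for a 4-element action."""
    if len(action) != 4:
        raise ValueError("expected 4 elements: 3 deltas + 1 gripper label")
    label = int(action[3])
    if label not in GRIPPER_LABELS:
        raise ValueError(f"unknown gripper label: {action[3]}")
    return list(action[:3]), GRIPPER_LABELS[label]


# Hypothetical action: small move along x and z, then close the gripper.
deltas, gripper = decode_action([0.01, 0.0, -0.02, 2])
print(deltas, gripper)
```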

terminals: 1 at the end of each skill

prompts: Natural language description of the goal

reward: 1 at the end of each skill


Baseline 20

Metrics for Open: Accuracy: 0.8187 Precision: 0.3053 Recall: 0.3217 F1 Score: 0.3133
Metrics for No Move: Accuracy: 0.7360 Precision: 0.8077 Recall: 0.8473 F1 Score: 0.8270
Metrics for Close: Accuracy: 0.8435 Precision: 0.3195 Recall: 0.2098 F1 Score: 0.2533

Baseline 90

Metrics for Open: Accuracy: 0.8118 Precision: 0.1440 Recall: 0.0938 F1 Score: 0.1136

Metrics for No Move: Accuracy: 0.7470 Precision: 0.7892 Recall: 0.9010 F1 Score: 0.8414

Metrics for Close: Accuracy: 0.8614 Precision: 0.4084 Recall: 0.2125 F1 Score: 0.2796

0.4115
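The per-class accuracies above differ between classes, which is consistent with one-vs-rest scoring, although these notes do not state how the metrics were computed. Under that assumption, the sketch below shows how accuracy can stay high for a rare class while recall and F1 are low. The function name and the toy labels (0 = Open, 1 = No Move, 2 = Close) are hypothetical.

```python
from typing import Dict, Sequence


def one_vs_rest_metrics(
    y_true: Sequence[int], y_pred: Sequence[int], positive: int
) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 treating `positive` as the only positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Toy imbalanced data: mostly "No Move" (1), and the model never predicts "Close" (2).
y_true = [1, 1, 1, 1, 1, 1, 0, 0, 2, 2]
y_pred = [1, 1, 1, 1, 1, 1, 1, 0, 1, 1]
for name, label in (("Open", 0), ("No Move", 1), ("Close", 2)):
    print(name, one_vs_rest_metrics(y_true, y_pred, label))
```

In this toy example the Close class is never predicted, yet its one-vs-rest accuracy is still 0.8, because the true negatives dominate the count.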

Batch normalization

Emphasizing open/close - performance is poor and val grp loss rises immediately, so this seems bad.

Layer normalization - 90: keeps No Move performance while efficiently improving Open and Close performance somewhat.

Metrics for Open: Accuracy: 0.8283 Precision: 0.2568 Recall: 0.1769 F1 Score: 0.2095
Metrics for No Move: Accuracy: 0.7656 Precision: 0.8059 Recall: 0.9028 F1 Score: 0.8516
Metrics for Close: Accuracy: 0.8731 Precision: 0.4978 Recall: 0.3025 F1 Score: 0.3763
Not tried at 20.
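For reference, layer normalization standardizes each sample across its own features, so unlike batch normalization it does not depend on the other samples in the batch. The sketch below is a plain-Python version without the learnable scale and shift that library layers usually add. The function name and the `eps` default are assumptions for illustration.

```python
import math
from typing import List, Sequence


def layer_norm(features: Sequence[float], eps: float = 1e-5) -> List[float]:
    """Normalize one sample to zero mean and (approximately) unit variance."""
    n = len(features)
    mean = sum(features) / n
    # Biased variance (divide by n), computed over the feature dimension.
    var = sum((v - mean) ** 2 for v in features) / n
    return [(v - mean) / math.sqrt(var + eps) for v in features]


# Hypothetical feature vector for a single sample.
normalized = layer_norm([1.0, 2.0, 3.0, 4.0])
print(normalized)
```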

Gradient normalization, Gradient Clipping

The step seems a bit off; it does not seem to work.
Even after tuning, the model just does not train.
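When debugging clipping, it can help to check the rule in isolation. The sketch below implements clipping by global norm in plain Python: every gradient is scaled by the same factor when the overall L2 norm exceeds `max_norm`. The function name, the nested-list representation of gradients, and the small `1e-6` stabilizer are assumptions for illustration and are not taken from the coursework code.

```python
import math
from typing import List, Tuple


def clip_by_global_norm(
    grads: List[List[float]], max_norm: float
) -> Tuple[List[List[float]], float]:
    """Scale all gradients so their global L2 norm is at most `max_norm`.

    Returns the clipped gradients and the norm measured before clipping.
    """
    total_norm = math.sqrt(sum(g * g for group in grads for g in group))
    # The scale never exceeds 1, so small gradients are left unchanged.
    scale = min(1.0, max_norm / (total_norm + 1e-6))
    clipped = [[g * scale for g in group] for group in grads]
    return clipped, total_norm


# Hypothetical gradients for two parameter groups; the global norm is 5.0.
clipped, norm_before = clip_by_global_norm([[3.0], [4.0]], max_norm=1.0)
print(norm_before, clipped)
```

In a training loop, clipping of this kind is normally applied after the backward pass and before the optimizer step; logging the norm before clipping shows whether the threshold is ever reached.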

Weight Decay

Weight decay has no effect.
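As a reminder of what the setting does, the sketch below applies one plain SGD step with L2-style weight decay, where the decay term `weight_decay * w` is added to the gradient. The function name and the numbers are hypothetical; optimizers with decoupled decay apply the term differently.

```python
from typing import List, Sequence


def sgd_step_with_weight_decay(
    weights: Sequence[float],
    grads: Sequence[float],
    lr: float,
    weight_decay: float,
) -> List[float]:
    """One SGD update: w <- w - lr * (grad + weight_decay * w)."""
    return [w - lr * (g + weight_decay * w) for w, g in zip(weights, grads)]


# With zero gradient the weights still shrink toward zero because of the decay term.
updated = sgd_step_with_weight_decay([1.0, -2.0], [0.0, 0.0], lr=0.1, weight_decay=0.5)
print(updated)
```

The shrinkage per step is `lr * weight_decay * w`, so a small product of the two leaves the weights almost untouched.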

Learning Rate Scheduler

Training is stable, but at 20 performance actually decreases.

Metrics for Open: Accuracy: 0.8266 Precision: 0.3153 Recall: 0.2976 F1 Score: 0.3062
Metrics for No Move: Accuracy: 0.7239 Precision: 0.7990 Recall: 0.8408 F1 Score: 0.8194
Metrics for Close: Accuracy: 0.8290 Precision: 0.2655 Recall: 0.1989 F1 Score: 0.2274

Metrics for Open: Accuracy: 0.8128 Precision: 0.3059 Recall: 0.3592 F1 Score: 0.3305

Metrics for No Move: Accuracy: 0.7266 Precision: 0.8132 Recall: 0.8218 F1 Score: 0.8175

Metrics for Close: Accuracy: 0.8394 Precision: 0.3226 Recall: 0.2452 F1 Score: 0.2786
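These notes do not say which scheduler was used. As one common example, the sketch below computes a step decay schedule, where the learning rate is multiplied by `gamma` every `step_size` epochs. The function name and all values are hypothetical.

```python
def step_decay_lr(base_lr: float, epoch: int, step_size: int, gamma: float) -> float:
    """Learning rate after `epoch` epochs under step decay."""
    if step_size <= 0:
        raise ValueError("step_size must be positive")
    # Integer division counts how many full decay steps have elapsed.
    return base_lr * gamma ** (epoch // step_size)


# Hypothetical schedule: halve the learning rate every 10 epochs.
for epoch in (0, 9, 10, 20):
    print(epoch, step_decay_lr(1e-3, epoch, step_size=10, gamma=0.5))
```

With a short run, a schedule like this may reduce the learning rate before the model has converged, which is one possible reading of the drop observed at 20.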

Loss Weighting

212 shows imbalance, 414 and 818 show balance, and 32 shows fairly high, balanced performance. 16 performed well overall but was poor on the target metrics, recall and F1; 32 was good.

212 20

Metrics for Open: Accuracy: 0.7067 Precision: 0.2208 Recall: 0.5067 F1 Score: 0.3076
Metrics for No Move: Accuracy: 0.6653 Precision: 0.8427 Recall: 0.6770 F1 Score: 0.7508
Metrics for Close: Accuracy: 0.8166 Precision: 0.2330 Recall: 0.1962 F1 Score: 0.2130

32 20

Metrics for Open: Accuracy: 0.7515 Precision: 0.2307 Recall: 0.3995 F1 Score: 0.2924
Metrics for No Move: Accuracy: 0.6849 Precision: 0.8692 Recall: 0.6793 F1 Score: 0.7626
Metrics for Close: Accuracy: 0.7983 Precision: 0.3074 Recall: 0.4741 F1 Score: 0.3730

414 20

Metrics for Open: Accuracy: 0.7184 Precision: 0.1773 Recall: 0.3271 F1 Score: 0.2300
Metrics for No Move: Accuracy: 0.6842 Precision: 0.8292 Recall: 0.7256 F1 Score: 0.7739
Metrics for Close: Accuracy: 0.8349 Precision: 0.3261 Recall: 0.2861 F1 Score: 0.3048

16 20

Metrics for Open: Accuracy: 0.8325 Precision: 0.2930 Recall: 0.2145 F1 Score: 0.2477
Metrics for No Move: Accuracy: 0.7608 Precision: 0.8075 Recall: 0.8913 F1 Score: 0.8473
Metrics for Close: Accuracy: 0.8532 Precision: 0.3786 Recall: 0.2507 F1 Score: 0.3016

818 20

Metrics for Open: Accuracy: 0.7797 Precision: 0.2462 Recall: 0.3458 F1 Score: 0.2876
Metrics for No Move: Accuracy: 0.7173 Precision: 0.8241 Recall: 0.7890 F1 Score: 0.8061
Metrics for Close: Accuracy: 0.8356 Precision: 0.3214 Recall: 0.2698 F1 Score: 0.2933

32 90

Metrics for Open: Accuracy: 0.7756 Precision: 0.2660 Recall: 0.4236 F1 Score: 0.3268
Metrics for No Move: Accuracy: 0.6860 Precision: 0.8729 Recall: 0.6770 F1 Score: 0.7626
Metrics for Close: Accuracy: 0.7890 Precision: 0.3059 Recall: 0.5259 F1 Score: 0.3868
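The loss weighting runs above can be related to a class-weighted cross-entropy. The sketch below is a plain-Python version that averages the per-sample losses by the weight of each sample's target class. Reading a setting such as 212 as the weights (2, 1, 2) for (Open, No Move, Close) is an assumption, as are the function name and the toy logits.

```python
import math
from typing import Sequence


def weighted_cross_entropy(
    logits: Sequence[Sequence[float]],
    targets: Sequence[int],
    class_weights: Sequence[float],
) -> float:
    """Class-weighted cross-entropy, normalized by the sum of the applied weights."""
    total = 0.0
    weight_sum = 0.0
    for row, target in zip(logits, targets):
        # Log-sum-exp with the max subtracted for numerical stability.
        row_max = max(row)
        log_z = row_max + math.log(sum(math.exp(v - row_max) for v in row))
        nll = log_z - row[target]
        total += class_weights[target] * nll
        weight_sum += class_weights[target]
    return total / weight_sum


# Toy batch: one "Open" (0) sample and one "No Move" (1) sample.
toy_logits = [[2.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
toy_targets = [0, 1]
print(weighted_cross_entropy(toy_logits, toy_targets, class_weights=(2.0, 1.0, 2.0)))
```

Raising the weight of a rare class makes its errors count more in the average, which tends to trade precision for recall on that class.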

[Model Scaling]

Optimal batch - 64

Optimal close weight: 32. Optimal open weight: 64.

layer norm + 32

Metrics for Open: Accuracy: 0.7973 Precision: 0.2595 Recall: 0.3110 F1 Score: 0.2829

Metrics for No Move: Accuracy: 0.7435 Precision: 0.8297 Recall: 0.8251 F1 Score: 0.8274

Metrics for Close: Accuracy: 0.8566 Precision: 0.4197 Recall: 0.3488 F1 Score: 0.3810

layer norm + 64 32 - not great. 411 is also rubbish.

idea - remove data normalization; apply model regularizer loss and dropout

dropout: poor

remove data normalization: failed badly

model regularizer loss: not great

scheduler: hmm, nothing notable

Question 2: VAE → latent representation → downstream model → BC

Design choice

normalization
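For the VAE step in the pipeline of Question 2, two standard building blocks are the reparameterization trick and the KL divergence between a diagonal Gaussian posterior and a standard normal prior. The sketch below shows both in plain Python. It assumes the encoder outputs a mean and a log-variance per latent dimension, which these notes do not specify, and the function names and sample values are hypothetical.

```python
import math
import random
from typing import List, Sequence


def reparameterize(
    mu: Sequence[float], log_var: Sequence[float], rng: random.Random
) -> List[float]:
    """Sample z = mu + sigma * eps with eps ~ N(0, 1) and sigma = exp(0.5 * log_var)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu: Sequence[float], log_var: Sequence[float]) -> float:
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over latent dimensions."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, log_var))


# Hypothetical 2-dimensional latent, with a fixed seed for repeatability.
rng = random.Random(0)
z = reparameterize([0.0, 1.0], [0.0, -2.0], rng)
print(z, kl_to_standard_normal([0.0, 1.0], [0.0, -2.0]))
```

The sampled `z` (or the mean `mu` alone) is the kind of latent representation that a downstream model could take as input.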

Question 3: Evaluate on the test set

Pointers

with transition log

Evaluation template

Preprocessing script for split

collect function