GSM8k
- MSE loss performs poorly
1
L/wandb/run-20251119_030313-afy9k4zp/logs ^C%
~/c/code/Users/Seonglae.Cho/ControlRL @a16e6307 ··· 43m 44s azureml_py38 azureuser@a100research 03:46:41
❯ python train.py train --task=gsm8k --num_samples=300 --mask=generation --validate_every=10 --limit=48 --policy_deep --eval --cot
2025-11-19 03:46:49.078581: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████| 2/2 [00:02<00:00, 1.04s/it]
wandb: Currently logged in as: seonglae (texonom) to https://api.wandb.ai. Use `wandb login --relogin` to force relogin
wandb: Tracking run with wandb version 0.23.0
wandb: Run data is saved locally in /mnt/batch/tasks/shared/LS_root/mounts/clusters/a100research/code/Users/Seonglae.Cho/ControlRL/wandb/run-20251119_034700-ilisme39
wandb: Run `wandb offline` to turn off syncing.
wandb: Syncing run gemma2b_gsm8k_20_ppo_1e-05_1119_034700_30.0_cot
wandb: ⭐️ View project at https://wandb.ai/texonom/control_rl
wandb: 🚀 View run at https://wandb.ai/texonom/control_rl/runs/ilisme39
wandb: Detected [huggingface_hub.inference] in use.
wandb: Use W&B Weave for improved LLM call tracing. Install Weave with `pip install weave` then add `import weave` to the top of your script.
wandb: For more information, check out the docs at: https://weave-docs.wandb.ai/
Training Steps: 0%| | 0/38 [00:00<?, ?it/s]
Asking to truncate to max_length but no maximum length is provided and the model has no predefined maximum length. Default to no truncation.
You have set `use_cache` to `False`, but cache_implementation is set to hybrid. cache_implementation will have no effect.
Step 0: Avg Train Acc 0.7500, Val Acc 0.7917, Train Think Len 336.75, Val Think Len 208.10
Layer 20: Policy Loss 27.1982, Critic Loss 0.8358, Grad Norms (P/C) 0.00/0.00, Recon Loss 10.8077, Unique Indices: 1577, Avg Activation: 0.2197, Avg Act Values: 0.8499
Training Steps: 26%|█████████████████▉ | 10/38 [30:16<44:27, 95.28s/it]
Step 10: Avg Train Acc 0.6750, Val Acc 0.7917, Train Think Len 160.62, Val Think Len 207.85
Layer 20: Policy Loss 11.6340, Critic Loss 0.5464, Grad Norms (P/C) 0.00/0.00, Recon Loss 14.0208, Unique Indices: 1556, Avg Activation: 0.6085, Avg Act Values: 0.8533
Training Steps: 53%|██████████████████████████████████▏ | 20/38 [1:00:52<39:56, 133.11s/it]
Step 20: Avg Train Acc 0.5375, Val Acc 0.7917, Train Think Len 321.62, Val Think Len 208.31
Layer 20: Policy Loss 14.8523, Critic Loss 0.6786, Grad Norms (P/C) 0.00/0.00, Recon Loss 11.2051, Unique Indices: 1568, Avg Activation: 0.2310, Avg Act Values: 0.8484
Training Steps: 79%|████████████████████████████████████████████████████ | 30/38 [1:25:35<07:58, 59.78s/it]
Step 30: Avg Train Acc 0.7250, Val Acc 0.7708, Train Think Len 229.62, Val Think Len 209.79
Layer 20: Policy Loss 5.4466, Critic Loss 0.8762, Grad Norms (P/C) 0.00/0.00, Recon Loss 10.9612, Unique Indices: 1557, Avg Activation: 0.2099, Avg Act Values: 0.8437
Training Steps: 100%|█████████████████████████████████████████████████████████████████| 38/38 [1:57:45<00:00, 185.95s/it]
Config
model: gemma2b
task: gsm8k
layers: [20]
select_token: False
decode: False
category: None
cot: True
Evaluating: 100%|████████████████████████████████████████████████████████████████████| 1319/1319 [57:27<00:00, 2.61s/it]
Final gsm8k Accuracy with Steering: 53.68%
Results saved to ./checkpoints/gemma2b_gsm8k_20_ppo_1e-05_1119_034700_30.0_cot/gsm8k_20_steered.json
Stats saved to ./checkpoints/gemma2b_gsm8k_20_ppo_1e-05_1119_034700_30.0_cot/gsm8k_eval.json
Every outputs are saved to the folder ./checkpoints/gemma2b_gsm8k_20_ppo_1e-05_1119_034700_30.0_cot
wandb:
wandb: 🚀 View run gemma2b_gsm8k_20_ppo_1e-05_1119_034700_30.0_cot at: https://wandb.ai/texonom/control_rl/runs/ilisme39
wandb: Find logs at: ../../../../../../../mnt/batch/tasks/shared/LS_root/mounts/clusters/a100research/code/Users/Seonglae.Cho/ControlRL/wandb/run-20251119_034700-ilisme39/logs
~/c/code/U/Seonglae.Cho/ControlRL @a16e6307 ···· 2h 55m 58s azureml_py38 azureuser@a100research 06:42:42
❯ zsh: timeout timed out waiting for input: auto-logout
cloudfiles/code/Users/Seonglae.Cho/ControlRL$
~/cloudfiles/code/Users/Seonglae.Cho/ControlRL @a16e6307 ········ 4h 37m 15s azureuser@a100research 07:12:42
❯ bash
z(azureml_py38) azureuser@a100research:~/cloudfiles/code/Users/Seonglae.Cho/ControlRL$ zsh
~/cloudfiles/code/Users/Seonglae.Cho/ControlRL @a16e6307 ···· azureml_py38 azureuser@a100research 10:04:30
❯ sleep 10000 && python train.py train --task=gsm8k --num_samples=300 --mask=generation --validate_every=10 --limit=48 --policy_deep --eval --cot
^C
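To compare runs without rereading the raw output, the per-step summary lines can be pulled out of a saved log. This is a minimal sketch, assuming the log text keeps the `Step N: Avg Train Acc ..., Val Acc ..., Train Think Len ..., Val Think Len ...` format shown above; `parse_steps` is a hypothetical helper, not part of `train.py`.

```python
import re

# Matches the per-step summary line printed during training (format taken from the log above).
NUM = r"(\d+(?:\.\d+)?)"
STEP_RE = re.compile(
    rf"Step (\d+): Avg Train Acc {NUM}, Val Acc {NUM}, "
    rf"Train Think Len {NUM}, Val Think Len {NUM}"
)


def parse_steps(log_text: str) -> list[dict]:
    """Return one dict per 'Step N: ...' summary found in the log text."""
    rows = []
    for m in STEP_RE.finditer(log_text):
        step, train_acc, val_acc, train_len, val_len = m.groups()
        rows.append(
            {
                "step": int(step),
                "train_acc": float(train_acc),
                "val_acc": float(val_acc),
                "train_think_len": float(train_len),
                "val_think_len": float(val_len),
            }
        )
    return rows


if __name__ == "__main__":
    sample = (
        "Step 0: Avg Train Acc 0.7500, Val Acc 0.7917, Train Think Len 336.75, Val Think Len 208.10 "
        "Step 30: Avg Train Acc 0.7250, Val Acc 0.7708, Train Think Len 229.62, Val Think Len 209.79"
    )
    for row in parse_steps(sample):
        print(row["step"], row["val_acc"])
```

Because the regex uses `finditer`, it works on the log whether or not the progress bar and the summary ended up on the same line.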
2
/mnt/b/t/sh/L/m/c/a/code/U/Seonglae.Cho/ControlRL @a16e6307 · azureml_py38 azureuser@a100research 10:04:31
❯ sleep 10000 && python train.py train --task=gsm8k --num_samples=300 --mask=generation --validate_every=10 --limit=48 --policy_deep --eval --cot
2025-11-19 12:54:18.137335: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████| 2/2 [00:11<00:00, 5.97s/it]
wandb: Currently logged in as: seonglae (texonom) to https://api.wandb.ai. Use `wandb login --relogin` to force relogin
wandb: Tracking run with wandb version 0.23.0
wandb: Run data is saved locally in /mnt/batch/tasks/shared/LS_root/mounts/clusters/a100research/code/Users/Seonglae.Cho/ControlRL/wandb/run-20251119_125444-ump8t0tn
wandb: Run `wandb offline` to turn off syncing.
wandb: Syncing run gemma2b_gsm8k_20_ppo_1e-05_1119_125444_30.0_cot
wandb: ⭐️ View project at https://wandb.ai/texonom/control_rl
wandb: 🚀 View run at https://wandb.ai/texonom/control_rl/runs/ump8t0tn
wandb: Detected [huggingface_hub.inference] in use.
wandb: Use W&B Weave for improved LLM call tracing. Install Weave with `pip install weave` then add `import weave` to the top of your script.
wandb: For more information, check out the docs at: https://weave-docs.wandb.ai/
Training Steps: 0%| | 0/38 [00:00<?, ?it/s]
Asking to truncate to max_length but no maximum length is provided and the model has no predefined maximum length. Default to no truncation.
You have set `use_cache` to `False`, but cache_implementation is set to hybrid. cache_implementation will have no effect.
Step 0: Avg Train Acc 0.7500, Val Acc 0.7292, Train Think Len 336.88, Val Think Len 222.31
Layer 20: Policy Loss 27.1939, Critic Loss 0.8356, Grad Norms (P/C) 0.00/0.00, Recon Loss 10.8082, Unique Indices: 1670, Avg Activation: 0.0773, Avg Act Values: 0.2858
Training Steps: 26%|████████████████▌ | 10/38 [1:20:04<2:09:42, 277.93s/it]
Step 10: Avg Train Acc 0.6500, Val Acc 0.7500, Train Think Len 163.25, Val Think Len 236.90
Layer 20: Policy Loss 11.8022, Critic Loss 0.5458, Grad Norms (P/C) 0.00/0.00, Recon Loss 14.0527, Unique Indices: 1686, Avg Activation: 0.2392, Avg Act Values: 0.2684
Training Steps: 53%|█████████████████████████████████▏ | 20/38 [2:20:20<1:29:32, 298.48s/it]
Step 20: Avg Train Acc 0.5500, Val Acc 0.7500, Train Think Len 213.50, Val Think Len 236.75
Layer 20: Policy Loss 13.8454, Critic Loss 0.7873, Grad Norms (P/C) 0.00/0.00, Recon Loss 14.0390, Unique Indices: 1700, Avg Activation: 0.2805, Avg Act Values: 0.2681
Training Steps: 79%|████████████████████████████████████████████████████ | 30/38 [2:52:09<09:38, 72.27s/it]
Step 30: Avg Train Acc 0.7500, Val Acc 0.7083, Train Think Len 235.75, Val Think Len 253.50
Layer 20: Policy Loss 9.6553, Critic Loss 0.7875, Grad Norms (P/C) 0.00/0.00, Recon Loss 10.9896, Unique Indices: 1692, Avg Activation: 0.0820, Avg Act Values: 0.2517
Training Steps: 100%|█████████████████████████████████████████████████████████████████| 38/38 [3:41:16<00:00, 349.38s/it]
Traceback (most recent call last):
  File "/mnt/batch/tasks/shared/LS_root/mounts/clusters/a100research/code/Users/Seonglae.Cho/ControlRL/train.py", line 802, in <module>
  File "/anaconda/envs/azureml_py38/lib/python3.10/site-packages/fire/core.py", line 135, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/anaconda/envs/azureml_py38/lib/python3.10/site-packages/fire/core.py", line 468, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/anaconda/envs/azureml_py38/lib/python3.10/site-packages/fire/core.py", line 684, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/mnt/batch/tasks/shared/LS_root/mounts/clusters/a100research/code/Users/Seonglae.Cho/ControlRL/train.py", line 796, in train
    if __name__ == "__main__":
  File "/mnt/batch/tasks/shared/LS_root/mounts/clusters/a100research/code/Users/Seonglae.Cho/ControlRL/train.py", line 628, in collect_stats
    batch_size=self.batch_size,
  File "/mnt/batch/tasks/shared/LS_root/mounts/clusters/a100research/code/Users/Seonglae.Cho/ControlRL/eval.py", line 507, in steered
    ckpt = TrainResult.model_validate(torch.load(checkpoint))
  File "/anaconda/envs/azureml_py38/lib/python3.10/site-packages/torch/serialization.py", line 1479, in load
    with _open_file_like(f, "rb") as opened_file:
  File "/anaconda/envs/azureml_py38/lib/python3.10/site-packages/torch/serialization.py", line 759, in _open_file_like
    return _open_file(name_or_buffer, mode)
  File "/anaconda/envs/azureml_py38/lib/python3.10/site-packages/torch/serialization.py", line 740, in __init__
    super().__init__(open(name, mode))
FileNotFoundError: [Errno 2] No such file or directory: './checkpoints/gemma2b_gsm8k_20_ppo_1e-05_1119_125444_30.0_cot/step10_acc75.0.pt'
wandb:
wandb: 🚀 View run gemma2b_gsm8k_20_ppo_1e-05_1119_125444_30.0_cot at: https://wandb.ai/texonom/control_rl/runs/ump8t0tn
wandb: Find logs at: wandb/run-20251119_125444-ump8t0tn/logs
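The run above finished training but crashed at evaluation because `step10_acc75.0.pt` was not on disk. The log does not say why the file is missing, so the following is only a sketch of one defensive option: resolve the checkpoint from the files that actually exist instead of rebuilding the filename. The `step{N}_acc{X}.pt` pattern is taken from the error message; `find_best_checkpoint` and its tie-breaking rule (highest accuracy, then earliest step) are assumptions, not the project's actual logic.

```python
import re
from pathlib import Path
from typing import Optional

# Filename pattern inferred from the error message, e.g. "step10_acc75.0.pt".
CKPT_RE = re.compile(r"^step(\d+)_acc(\d+(?:\.\d+)?)\.pt$")


def find_best_checkpoint(run_dir: str) -> Optional[Path]:
    """Return the existing checkpoint with the highest accuracy in run_dir.

    Ties are broken by the earliest step (an assumption). Returns None when
    no matching file exists, so the caller can skip evaluation or report a
    clear message instead of failing inside torch.load.
    """
    best_key = None
    best_path = None
    for path in Path(run_dir).glob("step*_acc*.pt"):
        match = CKPT_RE.match(path.name)
        if match is None:
            continue  # ignore files that only loosely resemble the pattern
        step, acc = int(match.group(1)), float(match.group(2))
        key = (acc, -step)
        if best_key is None or key > best_key:
            best_key, best_path = key, path
    return best_path


if __name__ == "__main__":
    run_dir = "./checkpoints/gemma2b_gsm8k_20_ppo_1e-05_1119_125444_30.0_cot"
    ckpt = find_best_checkpoint(run_dir)
    if ckpt is None:
        print(f"No step*_acc*.pt checkpoint found in {run_dir}")
    else:
        print(f"Would evaluate with {ckpt}")
```

Checking for a `None` result before calling `torch.load` would turn a crash after 3h 41m of training into a skipped evaluation with a readable message.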
3
Seonglae Cho