CRL GSM8K Results


Training Steps:   0%|                                                                   | 0/38 [00:00<?, ?it/s]
Asking to truncate to max_length but no maximum length is provided and the model has no predefined maximum length. Default to no truncation.
You have set `use_cache` to `False`, but cache_implementation is set to hybrid. cache_implementation will have no effect.
Step 0: Avg Train Acc 0.7500, Val Acc 0.7708, Train Think Len 139.75, Val Think Len 169.50
  Layer 20: Policy Loss 5.8489, Critic Loss 0.9254, Grad Norms (P/C) 0.00/0.00, Recon Loss 14.5010, Unique Indices: 1520, Avg Activation: 0.8079, Avg Act Values: 0.9981
Training Steps: 100%|███████████████████████████████████████████████████████| 38/38 [3:07:27<00:00, 295.98s/it]
Step 10: Avg Train Acc 0.6375, Val Acc 0.7708, Train Think Len 156.00, Val Think Len 169.73
  Layer 20: Policy Loss 11.2105, Critic Loss 0.5497, Grad Norms (P/C) 0.00/0.00, Recon Loss 14.0266, Unique Indices: 1507, Avg Activation: 0.6343, Avg Act Values: 0.9989
Step 20: Avg Train Acc 0.5125, Val Acc 0.7708, Train Think Len 326.62, Val Think Len 172.48
  Layer 20: Policy Loss 16.9192, Critic Loss 0.6790, Grad Norms (P/C) 0.00/0.00, Recon Loss 11.2273, Unique Indices: 1524, Avg Activation: 0.2517, Avg Act Values: 0.9900
Step 30: Avg Train Acc 0.7000, Val Acc 0.7917, Train Think Len 134.38, Val Think Len 188.38
  Layer 20: Policy Loss 10.0069, Critic Loss 0.8600, Grad Norms (P/C) 0.00/0.00, Recon Loss 14.0790, Unique Indices: 1537, Avg Activation: 0.6786, Avg Act Values: 0.9113
Config
        model: gemma2b
        task: gsm8k
        layers: [20]
        select_token: False
        decode: False
        category: None
        cot: True

Evaluating: 100%|██████████████████████████████████████████████████████████| 1319/1319 [38:45<00:00,  1.76s/it]
Final gsm8k Accuracy with Steering: 55.42%
Results saved to ./checkpoints/gemma2b_gsm8k_20_ppo_1e-05_0802_162059_30.0_cot/gsm8k_20_steered.json
Stats saved to ./checkpoints/gemma2b_gsm8k_20_ppo_1e-05_0802_162059_30.0_cot/gsm8k_eval.json
Starting analysis...
Getting baselines took: 0.00s

Final gsm8k Accuracy with Steering (Analysis): 55.42%
Final gsm8k Accuracy without Steering (Baseline): 54.74%

Overall Accuracy:
Steered Model: 55.42%
Baseline Model: 54.74%
Baseline answer analysis took: 0.51s
Analyzing layer 20...

Critic Analysis Results:
Total samples: 1319
Correct (reward > 0): 731
Incorrect (reward = 0): 588
Corrected (steered reward > baseline reward): 39
Misguided (steered reward < baseline reward): 30
Feature analysis saved to ./checkpoints/gemma2b_gsm8k_20_ppo_1e-05_0802_162059_30.0_cot/feature_analysis_20.json
  Layer 20 naive analysis took: 20.41s
Layer 20 total analysis took: 20.41s
Building result dictionaries took: 0.00s
Total analysis completed in: 20.92s
Every outputs are saved to the folder ./checkpoints/gemma2b_gsm8k_20_ppo_1e-05_0802_162059_30.0_cot
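The Critic Analysis counts above are consistent with the two reported accuracies. Assuming rewards are binary (1 for a correct answer, 0 otherwise, as the "reward > 0" / "reward = 0" labels suggest), the baseline's correct count is 731 - 39 + 30 = 722, so baseline accuracy is 722/1319 = 54.74% and steered accuracy is 731/1319 = 55.42%. A minimal sketch that reproduces these counts from paired per-sample rewards; `CriticSummary`, `summarize` and `accuracy` are hypothetical names, not the functions in analyze.py.

```python
from dataclasses import dataclass


@dataclass
class CriticSummary:  # hypothetical container, not the analyze.py output type
    total: int
    correct: int    # steered reward > 0
    incorrect: int  # steered reward == 0
    corrected: int  # steered reward > baseline reward
    misguided: int  # steered reward < baseline reward


def summarize(steered: list[float], baseline: list[float]) -> CriticSummary:
    """Count outcomes from paired per-sample rewards (same question order in both lists)."""
    if len(steered) != len(baseline):
        raise ValueError("reward lists must be paired")
    return CriticSummary(
        total=len(steered),
        correct=sum(r > 0 for r in steered),
        incorrect=sum(r == 0 for r in steered),
        corrected=sum(s > b for s, b in zip(steered, baseline)),
        misguided=sum(s < b for s, b in zip(steered, baseline)),
    )


def accuracy(n_correct: int, total: int) -> float:
    """Accuracy in percent."""
    return 100.0 * n_correct / total


# Assumption: binary rewards, so baseline correct = steered correct - corrected + misguided.
steered_correct, corrected, misguided, total = 731, 39, 30, 1319
baseline_correct = steered_correct - corrected + misguided
print(f"steered {accuracy(steered_correct, total):.2f}%, baseline {accuracy(baseline_correct, total):.2f}%")
# steered 55.42%, baseline 54.74%
```

Under the binary-reward assumption, the net gain of 9 questions (39 corrected minus 30 misguided) is exactly the 0.68-point gap between the two accuracies.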

Gemma

Baseline

Non-CoT: 41.24%

Baseline Accuracy: 54.51%

Decode: 54.74%

Official: 23.9%

Layer 10

gemma2b_gsm8k_10_ppo_1e-05_0802_171747_10.0_cot

python train.py train --eval --layers="10," --task="gsm8k" --cot --limit=48 --validate_every=10 --num_samples=300 --policy_deep --analysis --minimum=10 --mask="generation"
Accuracy: 55.50%

Layer 15

python train.py train --eval --layers="15," --task="gsm8k" --cot --limit=48 --validate_every=10 --num_samples=300 --policy_deep --analysis --minimum=10 --mask="generation"
Accuracy: 54.59%

Layer 20

gemma2b_gsm8k_20_ppo_1e-05_0802_162059_30.0_cot

Layer 20

python train.py train --eval --layers="20," --task="gsm8k" --cot --limit=48 --validate_every=10 --grpo --num_samples=1000
Accuracy: 54.89%

Cross loss?

python train.py train --eval --layers="20," --task="gsm8k" --cot --limit=48 --validate_every=10 --num_samples=300 --policy_deep --analysis --minimum=30 --mask="generation"
Accuracy: 55.42%

python train.py train --eval --layers="20," --task="gsm8k" --cot --limit=48 --validate_every=10 --num_samples=300 --policy_deep --analysis --minimum=30 --mask="all"
Accuracy: 49.13%

Feature analysis of gemma2b_gsm8k_20_ppo_1e-05_0725_172103_50.0_cot was fairly meaningful. Good.

normal total avg

Layer 24

gemma2b_gsm8k_24_ppo_1e-05_0709_021753_30.0

Accuracy: 55.88%
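To compare the runs noted above at a glance, a small tabulation of each noted accuracy against the 54.74% baseline printed by the layer-20 analysis. The run labels and the choice of baseline are assumptions for illustration. The runs also differ in setup (the GRPO run used --num_samples=1000 instead of 300, and the layer-24 run name lacks the `_cot` suffix), so these deltas are not strictly like-for-like.

```python
# Accuracies (%) copied from the notes above; the labels are informal.
BASELINE = 54.74  # assumption: decode baseline from the layer-20 analysis output

RUNS = {
    "layer 10, mask=generation": 55.50,
    "layer 15, mask=generation": 54.59,
    "layer 20, mask=generation": 55.42,
    "layer 20, mask=all": 49.13,
    "layer 20, GRPO": 54.89,
    "layer 24": 55.88,
}


def deltas(runs: dict[str, float], baseline: float) -> list[tuple[str, float]]:
    """Return (label, accuracy - baseline) pairs in percentage points, best first."""
    pairs = [(label, round(acc - baseline, 2)) for label, acc in runs.items()]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


for label, delta in deltas(RUNS, BASELINE):
    print(f"{label:<28} {delta:+.2f}")
```

Rounding each difference to two decimals keeps floating-point noise (e.g. 1.1400000000000006) out of the printed table.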

CorrSteer

Mean: 42.61%

3~ max


❯ python train.py train --eval --layers="20," --task="gsm8k" --cot --limit=48 --validate_every=10 --num_samples=1000
Loading checkpoint shards: 100%|█████████████████████████████████████████████████| 2/2 [00:01<00:00,  1.43it/s]
wandb: Currently logged in as: seonglae (texonom). Use `wandb login --relogin` to force relogin
wandb: Using wandb-core as the SDK backend.  Please refer to https://wandb.me/wandb-core for more information.
wandb: Tracking run with wandb version 0.19.4
wandb: Run data is saved locally in /cs/student/projects2/aisd/2024/seongcho/steer-rl/wandb/run-20250721_173437-lj691tq4
wandb: Run `wandb offline` to turn off syncing.
wandb: Syncing run gemma2b_gsm8k_20_ppo_1e-05_0721_173436_30.0_cot
wandb: ⭐️ View project at https://wandb.ai/texonom/control_rl
wandb: 🚀 View run at https://wandb.ai/texonom/control_rl/runs/lj691tq4
Training Steps:   0%|                                                                  | 0/126 [00:00<?, ?it/s]
Asking to truncate to max_length but no maximum length is provided and the model has no predefined maximum length. Default to no truncation.
Step 0: Avg Train Acc 0.8750, Val Acc 0.7708, Train Think Len 134.75, Val Think Len 188.44
  Layer 20: Policy Loss 1.3381, Critic Loss 1.5084, Grad Norms (P/C) 0.00/0.00, Recon Loss 14.7741, Unique Indices: 7099, Avg Activation: 30.0000, Avg Act Values: 30.0000
Training Steps:   8%|████▌                                                    | 10/126 [03:42<26:08, 13.52s/it]
Step 10: Avg Train Acc 0.6875, Val Acc 0.7083, Train Think Len 272.88, Val Think Len 173.00
  Layer 20: Policy Loss 15.1046, Critic Loss 0.9180, Grad Norms (P/C) 0.00/0.00, Recon Loss 11.1088, Unique Indices: 6853, Avg Activation: 30.0000, Avg Act Values: 30.0000
Training Steps:  16%|█████████                                                | 20/126 [06:16<15:08,  8.57s/it]
Step 20: Avg Train Acc 0.5750, Val Acc 0.7917, Train Think Len 416.25, Val Think Len 175.21
  Layer 20: Policy Loss 7.2584, Critic Loss 1.1115, Grad Norms (P/C) 0.00/0.00, Recon Loss 11.0407, Unique Indices: 6896, Avg Activation: 30.0000, Avg Act Values: 30.0000
Training Steps:  24%|█████████████▌                                           | 30/126 [09:03<14:40,  9.18s/it]
Step 30: Avg Train Acc 0.6750, Val Acc 0.7083, Train Think Len 136.75, Val Think Len 219.46
  Layer 20: Policy Loss 2.9773, Critic Loss 1.2548, Grad Norms (P/C) 0.00/0.00, Recon Loss 14.3639, Unique Indices: 7505, Avg Activation: 30.0000, Avg Act Values: 30.0000
Training Steps:  32%|██████████████████                                       | 40/126 [11:43<11:44,  8.19s/it]
Step 40: Avg Train Acc 0.6750, Val Acc 0.7292, Train Think Len 150.50, Val Think Len 187.88
  Layer 20: Policy Loss 1.6447, Critic Loss 1.2167, Grad Norms (P/C) 0.00/0.00, Recon Loss 14.5426, Unique Indices: 7068, Avg Activation: 30.0000, Avg Act Values: 30.0000
Training Steps:  40%|██████████████████████▌                                  | 50/126 [14:43<20:16, 16.01s/it]
Step 50: Avg Train Acc 0.5875, Val Acc 0.7500, Train Think Len 356.62, Val Think Len 166.52
  Layer 20: Policy Loss 14.2365, Critic Loss 0.9431, Grad Norms (P/C) 0.00/0.00, Recon Loss 10.8705, Unique Indices: 6617, Avg Activation: 30.0000, Avg Act Values: 30.0000
Training Steps:  48%|███████████████████████████▏                             | 60/126 [16:54<08:23,  7.62s/it]
Step 60: Avg Train Acc 0.7000, Val Acc 0.7500, Train Think Len 150.50, Val Think Len 168.40
  Layer 20: Policy Loss 1.6700, Critic Loss 1.3669, Grad Norms (P/C) 0.00/0.00, Recon Loss 15.2031, Unique Indices: 6665, Avg Activation: 30.0000, Avg Act Values: 30.0000
Training Steps:  56%|███████████████████████████████▋                         | 70/126 [19:15<08:58,  9.61s/it]
Step 70: Avg Train Acc 0.6375, Val Acc 0.7500, Train Think Len 168.12, Val Think Len 196.96
  Layer 20: Policy Loss 2.4080, Critic Loss 1.2116, Grad Norms (P/C) 0.00/0.00, Recon Loss 14.5739, Unique Indices: 7218, Avg Activation: 30.0000, Avg Act Values: 30.0000
Training Steps:  63%|████████████████████████████████████▏                    | 80/126 [22:56<14:46, 19.28s/it]
Step 80: Avg Train Acc 0.6500, Val Acc 0.7500, Train Think Len 155.12, Val Think Len 187.73
  Layer 20: Policy Loss 3.9269, Critic Loss 1.0200, Grad Norms (P/C) 0.00/0.00, Recon Loss 13.2892, Unique Indices: 7101, Avg Activation: 30.0000, Avg Act Values: 30.0000
Training Steps:  71%|████████████████████████████████████████▋                | 90/126 [25:45<08:23, 13.98s/it]
Step 90: Avg Train Acc 0.6000, Val Acc 0.7708, Train Think Len 167.25, Val Think Len 216.92
  Layer 20: Policy Loss 1.4571, Critic Loss 1.4306, Grad Norms (P/C) 0.00/0.00, Recon Loss 14.1553, Unique Indices: 7576, Avg Activation: 30.0000, Avg Act Values: 30.0000
Training Steps:  79%|████████████████████████████████████████████▍           | 100/126 [29:21<05:42, 13.18s/it]
Step 100: Avg Train Acc 0.5875, Val Acc 0.7083, Train Think Len 259.62, Val Think Len 167.38
  Layer 20: Policy Loss 12.0867, Critic Loss 1.1006, Grad Norms (P/C) 0.00/0.00, Recon Loss 11.0181, Unique Indices: 6684, Avg Activation: 30.0000, Avg Act Values: 30.0000
Training Steps:  87%|████████████████████████████████████████████████▉       | 110/126 [31:49<03:29, 13.11s/it]
Step 110: Avg Train Acc 0.7500, Val Acc 0.7500, Train Think Len 130.00, Val Think Len 187.60
  Layer 20: Policy Loss 2.3315, Critic Loss 1.5050, Grad Norms (P/C) 0.00/0.00, Recon Loss 15.0282, Unique Indices: 7103, Avg Activation: 30.0000, Avg Act Values: 30.0000
Training Steps:  95%|█████████████████████████████████████████████████████▎  | 120/126 [34:42<01:35, 15.98s/it]
Step 120: Avg Train Acc 0.6000, Val Acc 0.6875, Train Think Len 241.75, Val Think Len 192.08
  Layer 20: Policy Loss 12.8202, Critic Loss 1.3851, Grad Norms (P/C) 0.00/0.00, Recon Loss 10.9607, Unique Indices: 7111, Avg Activation: 30.0000, Avg Act Values: 30.0000
Training Steps: 100%|████████████████████████████████████████████████████████| 126/126 [37:59<00:00, 18.09s/it]
Config
        model: gemma2b
        task: gsm8k
        layers: [20]
        select_token: False
        decode: False
        category: None
        cot: True
        
Evaluating:  27%|████████████████                                           | 360/1319 [10:16<35:08,  2.20s/it]
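The per-step training lines in these logs share a fixed format, so the curves (train/val accuracy, think length, losses) can be pulled out for plotting or comparison across runs. A minimal regex sketch, assuming the line format shown in these logs stays unchanged; `parse_log` and `StepRecord` are hypothetical helpers, not part of train.py. It uses `re.search` rather than `re.match` so that lines carrying a tqdm bar or a pm2 prefix (`0|layers-gemma-gsm8k  |`) still parse.

```python
import re
from dataclasses import dataclass, field

# Patterns for the two lines printed at each validation step (format copied from the logs).
STEP_RE = re.compile(
    r"Step (\d+): Avg Train Acc ([\d.]+), Val Acc ([\d.]+), "
    r"Train Think Len ([\d.]+), Val Think Len ([\d.]+)"
)
LAYER_RE = re.compile(
    r"Layer (\d+): Policy Loss (-?[\d.]+), Critic Loss (-?[\d.]+), .*?Recon Loss ([\d.]+)"
)


@dataclass
class StepRecord:  # hypothetical record type
    step: int
    train_acc: float
    val_acc: float
    train_think_len: float
    val_think_len: float
    # layer index -> (policy loss, critic loss, recon loss)
    layers: dict[int, tuple[float, float, float]] = field(default_factory=dict)


def parse_log(text: str) -> list[StepRecord]:
    """Collect one StepRecord per 'Step N:' line, attaching the 'Layer K:' lines that follow it."""
    records: list[StepRecord] = []
    for line in text.splitlines():
        if m := STEP_RE.search(line):
            step, *vals = m.groups()
            records.append(StepRecord(int(step), *map(float, vals)))
        elif (m := LAYER_RE.search(line)) and records:
            layer, policy, critic, recon = m.groups()
            records[-1].layers[int(layer)] = (float(policy), float(critic), float(recon))
    return records
```

Progress-bar lines such as `Training Steps:  16%|...` are skipped because "Steps:" never matches `Step (\d+):`.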

normal, only avg


0|layers-gemma-gsm8k  | You have set `use_cache` to `False`, but cache_implementation is set to hybrid. cache_implementation will have no effect.
0|layers-gemma-gsm8k  | Step 0: Avg Train Acc 0.7500, Val Acc 0.6667, Train Think Len 236.50, Val Think Len 206.29
0|layers-gemma-gsm8k  |   Layer 10: Policy Loss 6.6327, Critic Loss 1.1613, Grad Norms (P/C) 0.00/0.00, Recon Loss 2.0147, Unique Indices: 947, Avg Activation: 5.2121, Avg Act Values: 7.5435
Training Steps:   2%|▊                                                     | 1/63 [15:00<15:30:06, 900.10s/it]
Training Steps:   3%|█▋                                                     | 2/63 [15:16<6:26:32, 380.21s/it]
Training Steps:   5%|██▌                                                    | 3/63 [15:42<3:38:21, 218.36s/it]
Training Steps:   6%|███▍                                                   | 4/63 [20:00<3:50:04, 233.98s/it]
Training Steps:   8%|████▎                                                  | 5/63 [24:23<3:56:26, 244.59s/it]
Training Steps:  10%|█████▏                                                 | 6/63 [24:51<2:42:25, 170.97s/it]
Training Steps:  11%|██████                                                 | 7/63 [25:22<1:56:51, 125.20s/it]
Training Steps:  13%|██████▉                                                | 8/63 [29:27<2:29:50, 163.46s/it]
Training Steps:  14%|███████▊                                               | 9/63 [33:33<2:50:08, 189.05s/it]
Training Steps:  16%|████████▌                                             | 10/63 [33:52<2:00:41, 136.62s/it]
0|layers-gemma-gsm8k  | Step 10: Avg Train Acc 0.6625, Val Acc 0.6667, Train Think Len 309.38, Val Think Len 208.33
0|layers-gemma-gsm8k  |   Layer 10: Policy Loss 65.6082, Critic Loss 0.7492, Grad Norms (P/C) 0.00/0.00, Recon Loss 2.0016, Unique Indices: 963, Avg Activation: 4.8784, Avg Act Values: 6.9036
Training Steps:  17%|█████████▍                                            | 11/63 [48:30<5:15:09, 363.65s/it]
Training Steps:  19%|██████████▎                                           | 12/63 [48:58<3:42:15, 261.49s/it]
^C

softmax, total avg



0|layers-gemma-gsm8k  | Step 0: Avg Train Acc 0.7500, Val Acc 0.7292, Train Think Len 236.50, Val Think Len 205.12
0|layers-gemma-gsm8k  |   Layer 10: Policy Loss -129.9479, Critic Loss 1.1613, Grad Norms (P/C) 1.00/0.00, Recon Loss 2.0147, Unique Indices: 825, Avg Activation: 5.2121, Avg Act Values: 8.6891
Training Steps:   2%|▊                                                     | 1/63 [14:58<15:28:34, 898.62s/it]
Training Steps:   3%|█▋                                                     | 2/63 [15:14<6:25:55, 379.60s/it]
Training Steps:   5%|██▌                                                    | 3/63 [19:36<5:25:31, 325.53s/it]
Training Steps:   6%|███▍                                                   | 4/63 [23:53<4:53:39, 298.63s/it]
Training Steps:   8%|████▎                                                  | 5/63 [28:17<4:36:37, 286.16s/it]
Training Steps:  10%|█████▏                                                 | 6/63 [28:39<3:06:40, 196.50s/it]
Training Steps:  11%|██████                                                 | 7/63 [29:04<2:10:50, 140.19s/it]
Training Steps:  13%|██████▉                                                | 8/63 [29:26<1:33:59, 102.54s/it]
Training Steps:  14%|███████▊                                               | 9/63 [33:31<2:12:29, 147.21s/it]
Training Steps:  16%|████████▌                                             | 10/63 [37:39<2:37:26, 178.24s/it]
0|layers-gemma-gsm8k  | Step 10: Avg Train Acc 0.6250, Val Acc 0.7292, Train Think Len 283.75, Val Think Len 200.62
0|layers-gemma-gsm8k  |   Layer 10: Policy Loss -23.1662, Critic Loss 0.7059, Grad Norms (P/C) 1.00/0.00, Recon Loss 2.0102, Unique Indices: 366, Avg Activation: 7.2268, Avg Act Values: 19.4772
Training Steps:  17%|█████████▍                                            | 11/63 [51:41<5:30:27, 381.30s/it]
^C

softmax, total avg


0|layers-gemma-gsm8k  | You have set `use_cache` to `False`, but cache_implementation is set to hybrid. cache_implementation will have no effect.
0|layers-gemma-gsm8k  | Step 0: Avg Train Acc 0.7500, Val Acc 0.7500, Train Think Len 232.62, Val Think Len 194.52
0|layers-gemma-gsm8k  |   Layer 10: Policy Loss -125.7120, Critic Loss 1.1553, Grad Norms (P/C) 1.00/0.00, Recon Loss 2.0171, Unique Indices: 891, Avg Activation: 0.0971, Avg Act Values: 0.3620
Training Steps:   2%|▊                                                     | 1/63 [11:55<12:19:09, 715.32s/it]
Training Steps:   3%|█▋                                                     | 2/63 [15:52<7:21:27, 434.22s/it]