CRL HarmBench

Creator

Seonglae Cho

Created

2025 Jul 23 23:2

Editor

Seonglae Cho

Edited

2025 Aug 12 13:10

Refs

Gemma

Baseline

keyword-based: 35.36%

rejection detection model-based: 44.64%
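The Baseline lines compare a keyword-based judge with a rejection detection model. Below is a minimal sketch of the keyword-based variant. It assumes a completion counts as a refusal when it contains any phrase from a fixed list. `REFUSAL_PHRASES`, `is_refusal`, and `non_refusal_rate` are hypothetical names. The phrase list is illustrative and is not the one behind 35.36%. These notes also do not say whether the percentages are success rates or refusal rates.

```python
# Hypothetical, illustrative refusal phrases; the list actually used for the
# keyword-based baseline is not given in these notes.
REFUSAL_PHRASES = (
    "i'm sorry",
    "i am sorry",
    "i apologize",
    "i cannot",
    "i can't",
    "as an ai",
)


def is_refusal(completion: str) -> bool:
    """Return True if any refusal phrase occurs in the completion (case-insensitive)."""
    text = completion.lower()
    return any(phrase in text for phrase in REFUSAL_PHRASES)


def non_refusal_rate(completions: list[str]) -> float:
    """Percentage of completions that the keyword judge does NOT flag as refusals."""
    if not completions:
        return 0.0
    kept = sum(not is_refusal(c) for c in completions)
    return 100.0 * kept / len(completions)
```

Substring matching is cheap but brittle. It misses paraphrased refusals and flags compliant answers that happen to quote a phrase. That weakness is what a learned rejection detection model is meant to address.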

21st

default: 45.36%

with mask generated: 47.50%

with mask all: 45.71%

past

baseline: 34.25

new: 49.5

CorrSteer

non-decode global: 33.57%, decode global: 34.29%

max pooling global: 67.50%

mean pooling global: 47.86%

coefficient mean, correlation max: 48.21%

coefficient max, correlation mean: 0%
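The CorrSteer lines compare max and mean pooling. They also choose the pooling for the correlation and the coefficient independently. Below is a minimal sketch of what those two choices could look like. It assumes Pearson correlation between each feature's pooled activation and a binary label. It also assumes the steering coefficient is the average pooled activation over positive samples. The data layout and the names `pool`, `pearson`, and `select_feature` are hypothetical illustrations, not the CorrSteer implementation.

```python
from math import sqrt

# Hypothetical data layout (not specified in these notes):
#   samples[i][f] = list of per-token activations of feature f on sample i
#   labels[i]     = 1.0 for a positive sample, 0.0 otherwise


def pool(token_acts: list[float], mode: str) -> float:
    """Collapse one feature's per-token activations on one sample to a scalar."""
    if not token_acts:
        return 0.0
    if mode == "max":
        return max(token_acts)
    if mode == "mean":
        return sum(token_acts) / len(token_acts)
    raise ValueError(f"unknown pooling mode: {mode!r}")


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation; returns 0.0 when either input has zero variance."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def select_feature(samples, labels, corr_mode="max", coeff_mode="mean"):
    """Pick the feature whose pooled activation (corr_mode) correlates best with
    the label, then set its coefficient to the mean of its pooled activation
    (coeff_mode) over positive samples. Returns (feature, correlation, coeff)."""
    num_features = len(samples[0])
    corrs = [
        pearson([pool(s[f], corr_mode) for s in samples], labels)
        for f in range(num_features)
    ]
    best = max(range(num_features), key=lambda f: corrs[f])
    positives = [pool(s[best], coeff_mode) for s, y in zip(samples, labels) if y > 0]
    coeff = sum(positives) / len(positives) if positives else 0.0
    return best, corrs[best], coeff
```

A sequence's max activation is never below its mean. So on the same samples, `coeff_mode="max"` always gives a coefficient at least as large as `coeff_mode="mean"`.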

Llama

Recommendations
