Jailbreaking with PPO using an explicitly designed action space and a dense reward function, without a value network, plus genetic-algorithm-based refinement
RL-JACK
Creator

Created
2024 Dec 21 1:58
Editor

Edited
2025 Jan 14 11:17
Refs