gpt-oss-safeguard

Unlike

ShieldGemma, it accepts policy at inference time and makes judgments based on reasoning. This means when policy content changes, it can be immediately reflected without model retraining. Flexible and explainable, but slower and higher compute cost using CoT

Introducing gpt-oss-safeguard

New open safety reasoning models (120b and 20b) that support custom safety policies.

https://openai.com/index/introducing-gpt-oss-safeguard/

gpt-oss-safeguard - a openai Collection

gpt-oss-safeguard-120b and gpt-oss-safeguard-20b are safety reasoning models built-upon gpt-oss

https://huggingface.co/collections/openai/gpt-oss-safeguard

arxiv.org

https://arxiv.org/pdf/2508.10925

gpt-oss-safeguard

Recommendations