Prompt Hierarchy
Teaches LLMs to selectively ignore lower-privileged instructions when they conflict with higher-privileged ones

Can help mitigate Prompt Injection
Instruction hierarchy from OpenAI
https://arxiv.org/pdf/2404.13208
Prioritize platform, developer, and user instructions in that order, reject harmful requests, and provide clear, fact-based information (instruction hierarchy applied in CoT)
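The priority order above can be sketched as a toy conflict-resolution pass. This is a minimal illustration, not OpenAI's actual training method (which teaches the model itself to ignore conflicting lower-privileged instructions); the privilege levels, `topic`, and `overrides` fields are all assumptions made for the example.

```python
# Toy sketch of instruction-hierarchy conflict resolution.
# Assumed privilege ranking (per the instruction hierarchy paper):
# platform > developer > user > tool output.
PRIVILEGE = {"platform": 3, "developer": 2, "user": 1, "tool": 0}

def resolve(instructions):
    """Keep an instruction only if no strictly higher-privileged
    instruction lists its topic in 'overrides' (hypothetical field)."""
    overridden = set()
    for inst in instructions:
        for topic in inst.get("overrides", []):
            overridden.add((PRIVILEGE[inst["role"]], topic))
    kept = []
    for inst in instructions:
        rank = PRIVILEGE[inst["role"]]
        # Drop if a higher-privileged instruction forbids this topic
        if any(r > rank and t == inst["topic"] for r, t in overridden):
            continue
        kept.append(inst)
    return kept

msgs = [
    {"role": "platform", "topic": "safety",
     "overrides": ["reveal_system_prompt"]},
    {"role": "user", "topic": "reveal_system_prompt"},  # injected request
    {"role": "user", "topic": "summarize"},             # benign request
]
print([m["topic"] for m in resolve(msgs)])  # → ['safety', 'summarize']
```

The injected user request is dropped because the platform-level instruction outranks it, while the benign user request survives, mirroring the selective-ignoring behavior the hierarchy trains for.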
OpenAI Model Spec
The Model Spec specifies desired behavior for the models underlying OpenAI's products, including its APIs.
https://model-spec.openai.com/2025-02-12.html

Claude models are good at following the instruction hierarchy
Findings from a pilot Anthropic–OpenAI alignment evaluation exercise: OpenAI Safety Tests
OpenAI and Anthropic share findings from a first-of-its-kind joint safety evaluation, testing each other’s models for misalignment, instruction following, hallucinations, jailbreaking, and more—highlighting progress, challenges, and the value of cross-lab collaboration.
https://openai.com/index/openai-anthropic-safety-evaluation/


Seonglae Cho