Texonom
Texonom
/
Engineering
Engineering
/Data Engineering/Artificial Intelligence/AI Object/NLP/Language Model/LLM/Prompt Engineering/
Instruction hierarchy
Search

Instruction hierarchy

Creator
Creator
Seonglae ChoSeonglae Cho
Created
Created
2024 May 1 3:38
Editor
Editor
Seonglae ChoSeonglae Cho
Edited
Edited
2025 Sep 1 11:58
Refs
Refs
AI Safety
AI Jailbreak
AI Alignment

Prompt Hierarchy

Teaches LLMs to selectively ignore lower-privileged instructions
notion image
Can prevent
Prompt Injection
 
 

Instruction hierarchy from OpenAI

arxiv.org
https://arxiv.org/pdf/2404.13208
Prioritize platform, developer, and user guidelines in that order, reject harmful requests, and provide clear, fact-based information (
Instruction hierarchy
on
CoT
)
OpenAI Model Spec
The Model Spec specifies desired behavior for the models underlying OpenAI's products (including our APIs).
OpenAI Model Spec
https://model-spec.openai.com/2025-02-12.html
OpenAI Model Spec
Claude model is good at following Instruction Hierarchy
Findings from a pilot Anthropic–OpenAI alignment evaluation exercise: OpenAI Safety Tests
OpenAI and Anthropic share findings from a first-of-its-kind joint safety evaluation, testing each other’s models for misalignment, instruction following, hallucinations, jailbreaking, and more—highlighting progress, challenges, and the value of cross-lab collaboration.
Findings from a pilot Anthropic–OpenAI alignment evaluation exercise: OpenAI Safety Tests
https://openai.com/index/openai-anthropic-safety-evaluation/
Findings from a pilot Anthropic–OpenAI alignment evaluation exercise: OpenAI Safety Tests
 
 

Backlinks

AI JailbreakSystem PromptAI RedteamingPreparedness Framework

Recommendations

Texonom
Texonom
/
Engineering
Engineering
/Data Engineering/Artificial Intelligence/AI Object/NLP/Language Model/LLM/Prompt Engineering/
Instruction hierarchy
Copyright Seonglae Cho