Texonom
Texonom
/
Engineering
Engineering
/Data Engineering/Artificial Intelligence/AI Risk/AI Alignment/Emergent Misalignment/
Petri
Search

Petri

Creator
Creator
Seonglae ChoSeonglae Cho
Created
Created
2025 Oct 10 13:13
Editor
Editor
Seonglae ChoSeonglae Cho
Edited
Edited
2025 Oct 14 9:12
Refs
Refs
petri
safety-research • Updated 2025 Oct 12 20:48

Parallel Exploration Tool for Risky Interactions

 
 
 
 
Petri: An open-source auditing tool to accelerate AI safety research
We're releasing Petri (Parallel Exploration Tool for Risky Interactions), an open-source framework for automated auditing that uses AI agents to test the behaviors of target models across diverse scenarios. When applied to 14 frontier models with 111 seed instructions, Petri successfully elicited a broad set of misaligned behaviors including autonomous deception, oversight subversion, whistleblowing, and cooperation with human misuse. The tool is available now at github.com/safety-research/petri.
https://alignment.anthropic.com/2025/petri/
Petri: An open-source auditing tool to accelerate AI safety research
A new automated auditing tool for AI safety research
Petri: An open-source auditing tool to accelerate AI safety research
https://www.anthropic.com/research/petri-open-source-auditing
Petri: An open-source auditing tool to accelerate AI safety research
 
 

Recommendations

Texonom
Texonom
/
Engineering
Engineering
/Data Engineering/Artificial Intelligence/AI Risk/AI Alignment/Emergent Misalignment/
Petri
Copyright Seonglae Cho