Petri

Creator

Seonglae Cho

Created

2025 Oct 10 13:13

Editor

Seonglae Cho

Edited

2025 Oct 14 9:12

Refs

safety-research • Updated 2025 Oct 12 20:48

Parallel Exploration Tool for Risky Interactions

Petri: An open-source auditing tool to accelerate AI safety research

We're releasing Petri (Parallel Exploration Tool for Risky Interactions), an open-source framework for automated auditing that uses AI agents to test the behaviors of target models across diverse scenarios. When applied to 14 frontier models with 111 seed instructions, Petri successfully elicited a broad set of misaligned behaviors including autonomous deception, oversight subversion, whistleblowing, and cooperation with human misuse. The tool is available now at github.com/safety-research/petri.

https://alignment.anthropic.com/2025/petri/

Petri: An open-source auditing tool to accelerate AI safety research

A new automated auditing tool for AI safety research

https://www.anthropic.com/research/petri-open-source-auditing
