Petri: An open-source auditing tool to accelerate AI safety research
We're releasing Petri (Parallel Exploration Tool for Risky Interactions), an open-source framework for
automated auditing that uses AI agents to test the behaviors of target models across diverse
scenarios. Applied to 14 frontier models with 111 seed instructions, Petri elicited a
broad range of misaligned behaviors, including autonomous deception, oversight subversion, whistleblowing,
and cooperation with human misuse. The tool is available now at github.com/safety-research/petri.
https://alignment.anthropic.com/2025/petri/