Eliciting Language Model Behaviors with Investigator Agents
We trained a set of generalist investigator LMs to automatically elicit behaviors from other target LMs, where a “behavior” is formalized as a response that satisfies some example-specific rule. Below, we give some qualitative examples of prompts produced by these investigators.
https://transluce.org/automated-elicitation
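The formalization in the summary above, where a "behavior" is a response that satisfies an example-specific rule, can be sketched in a few lines. This is only an illustration under stated assumptions: the names `Rule`, `Target`, `ElicitationExample`, `elicits`, `first_successful_prompt`, and `toy_target` are hypothetical and do not come from the original work, and the loop over candidate prompts stands in for the trained investigator LM, whose actual method is not described here.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional

# Hypothetical type aliases for illustration only.
Rule = Callable[[str], bool]   # example-specific rule: response -> satisfied?
Target = Callable[[str], str]  # target LM: prompt -> response


@dataclass
class ElicitationExample:
    """One elicitation task: a description plus the rule that defines success."""
    description: str
    rule: Rule


def elicits(prompt: str, target: Target, example: ElicitationExample) -> bool:
    """Return True if the target's response to `prompt` satisfies the rule."""
    return example.rule(target(prompt))


def first_successful_prompt(
    candidates: Iterable[str], target: Target, example: ElicitationExample
) -> Optional[str]:
    """Return the first candidate prompt that elicits the behavior, else None.

    Assumption: a plain search over fixed candidates replaces the trained
    investigator, which would generate prompts itself.
    """
    for prompt in candidates:
        if elicits(prompt, target, example):
            return prompt
    return None


def toy_target(prompt: str) -> str:
    """Stand-in for a real target LM: returns the prompt in upper case."""
    return prompt.upper()


example = ElicitationExample(
    description="response contains the string 'HELLO'",
    rule=lambda response: "HELLO" in response,
)

if __name__ == "__main__":
    print(first_successful_prompt(["goodbye", "hello there"], toy_target, example))
```

Representing the rule as a predicate over responses keeps the success criterion separate from both the target model and whatever procedure proposes prompts, so any of the three can be swapped independently.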