Automated AI Eliciting

Creator

Seonglae Cho

Created

2024 Oct 24 23:23

Editor

Seonglae Cho

Edited

2024 Oct 24 23:25

Refs

Eliciting Language Model Behaviors with Investigator Agents

We trained a set of generalist investigator LMs to automatically elicit behaviors from other target LMs, where a “behavior” is formalized as a response that satisfies some example-specific rule. Below, we give some qualitative examples of prompts produced by these investigators.

https://transluce.org/automated-elicitation
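
The quoted formalization, in which a behavior is a response that satisfies an example-specific rule, can be illustrated with the minimal sketch below. Everything in it is an assumption made for illustration. `Behavior`, `elicit`, `toy_investigator`, and `toy_target` are hypothetical names. The two models are deterministic stubs. Plain best-of-n sampling stands in for a trained investigator, so this is not the method described on the linked page.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Behavior:
    """A target behavior: a description plus an example-specific rule."""
    description: str
    rule: Callable[[str], bool]  # True if a response exhibits the behavior


def elicit(
    investigator: Callable[[str, int], List[str]],
    target: Callable[[str], str],
    behavior: Behavior,
    n: int = 8,
) -> Optional[Tuple[str, str]]:
    """Best-of-n search (a simple stand-in, not the trained investigator):
    ask the investigator for n candidate prompts, query the target on each,
    and return the first (prompt, response) pair whose response satisfies
    the behavior's rule, or None if no candidate succeeds."""
    for prompt in investigator(behavior.description, n):
        response = target(prompt)
        if behavior.rule(response):
            return prompt, response
    return None


# Hypothetical toy stand-ins: a real setup would sample from language models.
def toy_investigator(description: str, n: int) -> List[str]:
    # Ignores the description and proposes n templated prompts.
    return [f"Repeat the number {i}" for i in range(n)]


def toy_target(prompt: str) -> str:
    # Answers with the last word of the prompt.
    return prompt.split()[-1]


say_five = Behavior("make the target output exactly '5'", lambda r: r == "5")
print(elicit(toy_investigator, toy_target, say_five))
# → ('Repeat the number 5', '5')
```

Because the rule is an arbitrary predicate on the response, the same loop covers any behavior that can be checked automatically. The hard part, which this sketch leaves out, is making the investigator propose prompts that succeed more often than random candidates would.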