
Anthropic AI Alignment

Creator: Seonglae Cho
Created: 2024 May 2 6:33
Editor: Seonglae Cho
Edited: 2024 May 2 6:33
Refs
Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training
Studying Large Language Model Generalization with Influence Functions
Debating with More Persuasive LLMs Leads to More Truthful Answers
Language Models (Mostly) Know What They Know
Measuring Progress on Scalable Oversight for Large Language Models
Measuring Faithfulness in Chain-of-Thought Reasoning
Discovering Language Model Behaviors with Model-Written Evaluations

Copyright Seonglae Cho