Goodhart's Law

Creator
Seonglae Cho
Created
2024 Apr 18 14:09
Edited
2024 Nov 18 14:26
Refs

When a measure becomes a target, it ceases to be a good measure

We may be unable to foresee the negative consequences that arise from excessive optimization pressure on a goal, even when that goal would otherwise look well specified to humans.
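As a rough illustration of this point, here is a minimal toy sketch. The functions `true_value` and `proxy` are invented for this example and do not come from any source: the proxy tracks the true value well for small `x` but keeps rising after the true value has peaked. An optimizer that only sees the proxy keeps pushing past the point where things start getting worse.

```python
def true_value(x):
    # Hypothetical "what we actually care about":
    # improves with x at first, then degrades (peak at x = 5).
    return x - x ** 2 / 10.0


def proxy(x):
    # Hypothetical measurable target: close to true_value only for small x.
    return x


def optimize_proxy(steps, step_size=1.0):
    # Naive optimizer: the proxy always increases with x, so it always pushes x up.
    x = 0.0
    history = []
    for _ in range(steps):
        x += step_size
        history.append((x, proxy(x), true_value(x)))
    return history


history = optimize_proxy(steps=12)

# The x at which the true value was highest along the optimization path.
best_true_x = max(history, key=lambda row: row[2])[0]
final_x, final_proxy, final_true = history[-1]

print("true value peaked at x =", best_true_x)
print("final proxy =", final_proxy, "final true value =", final_true)
```

In this toy setup the proxy looks perfectly reasonable near the starting point. The divergence only shows up under sustained optimization pressure, which is the situation the note above describes.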
 
 
 
 
Goodhart's Law - LessWrong
Goodhart's Law states that when a proxy for some value becomes the target of optimization pressure, the proxy will cease to be a good proxy. One form of Goodhart is demonstrated by the Soviet story of a factory graded on how many shoes it produced (a good proxy for productivity): it soon began producing a higher number of tiny shoes. Useless, but the numbers look good.

Goodhart's Law is of particular relevance to AI Alignment. Suppose you have something which is generally a good proxy for "the stuff that humans care about". It would be dangerous to have a powerful AI optimize for the proxy because, in accordance with Goodhart's Law, the proxy will break down.

Goodhart Taxonomy

In Goodhart Taxonomy, Scott Garrabrant identifies four kinds of Goodharting:

* Regressional Goodhart - When selecting for a proxy measure, you select not only for the true goal, but also for the difference between the proxy and the goal.
* Causal Goodhart - When there is a non-causal correlation between the proxy and the goal, intervening on the proxy may fail to intervene on the goal.
* Extremal Goodhart - Worlds in which the proxy takes an extreme value may be very different from the ordinary worlds in which the correlation between the proxy and the goal was observed.
* Adversarial Goodhart - When you optimize for a proxy, you provide an incentive for adversaries to correlate their goal with your proxy, thus destroying the correlation with your goal.

See Also

* Groupthink, Information cascade, Affective death spiral
* Adaptation executers, Superstimulus
* Signaling, Filtered evidence
* Cached thought
* Modesty argument, Egalitarianism
* Rationalization, Dark arts
* Epistemic hygiene
* Scoring rule
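The Regressional Goodhart case from the taxonomy above can be sketched as a small simulation. This is only an illustration under assumed conditions, not something taken from the LessWrong page: each candidate has a hidden `goal` value, and the observable proxy is `goal + noise`, with both drawn independently from a standard normal distribution. The function name `regressional_goodhart` and its parameters are hypothetical.

```python
import random
import statistics


def regressional_goodhart(n=100_000, top_fraction=0.01, seed=0):
    # Assumption: proxy = goal + noise, with goal and noise independent N(0, 1).
    rng = random.Random(seed)
    candidates = []
    for _ in range(n):
        goal = rng.gauss(0.0, 1.0)
        noise = rng.gauss(0.0, 1.0)
        candidates.append((goal + noise, goal))

    # Select the candidates that score highest on the proxy.
    candidates.sort(key=lambda pair: pair[0], reverse=True)
    selected = candidates[: int(n * top_fraction)]

    mean_proxy = statistics.fmean(p for p, _ in selected)
    mean_goal = statistics.fmean(g for _, g in selected)
    return mean_proxy, mean_goal


mean_proxy, mean_goal = regressional_goodhart()
print("mean proxy of selected:", round(mean_proxy, 2))
print("mean goal of selected: ", round(mean_goal, 2))
```

Selecting on the proxy also selects for lucky noise, so the selected group's true goal value is still above average but noticeably lower than its proxy score suggests. Under the equal-variance assumption used here, the expected goal is about half the proxy score.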
 
 
