Goodhart's Law

Creator
Seonglae Cho
Created
2021 May 31 5:44
Edited
2025 May 25 17:10

When a measure becomes a target, it ceases to be a good measure.

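An observed statistical regularity tends to collapse once someone intervenes on it for the purpose of control.
It is the tendency to optimize only locally by "gaming" the system to satisfy a particular metric.
Named after the British economist Charles Goodhart.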
We may be unable to foresee the negative consequences of excessive optimization pressure on a goal that otherwise looks well specified to humans.
Goodhart's Law - LessWrong
Goodhart's Law states that when a proxy for some value becomes the target of optimization pressure, the proxy will cease to be a good proxy. One form of Goodhart is demonstrated by the Soviet story of a factory graded on how many shoes it produced (a good proxy for productivity): it soon began producing a higher number of tiny shoes. Useless, but the numbers look good.

Goodhart's Law is of particular relevance to AI Alignment. Suppose you have something that is generally a good proxy for "the stuff that humans care about". It would be dangerous to have a powerful AI optimize for the proxy because, in accordance with Goodhart's Law, the proxy will break down.

Goodhart Taxonomy

In Goodhart Taxonomy, Scott Garrabrant identifies four kinds of Goodharting:
* Regressional Goodhart - When selecting for a proxy measure, you select not only for the true goal, but also for the difference between the proxy and the goal.
* Causal Goodhart - When there is a non-causal correlation between the proxy and the goal, intervening on the proxy may fail to intervene on the goal.
* Extremal Goodhart - Worlds in which the proxy takes an extreme value may be very different from the ordinary worlds in which the correlation between the proxy and the goal was observed.
* Adversarial Goodhart - When you optimize for a proxy, you provide an incentive for adversaries to correlate their goal with your proxy, thus destroying the correlation with your goal.

See Also
* Groupthink, Information cascade, Affective death spiral
* Adaptation executers, Superstimulus
* Signaling, Filtered evidence
* Cached thought
* Modesty argument, Egalitarianism
* Rationalization, Dark arts
* Epistemic hygiene
* Scoring rule
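
The regressional case in the taxonomy can be illustrated with a small simulation. This is only a sketch under stated assumptions: the true value and the measurement noise are modeled as independent Gaussians, the proxy is their sum, and the function name `simulate_regressional_goodhart` and all its parameters are hypothetical choices made for illustration, not taken from the LessWrong article.

```python
import random
import statistics


def simulate_regressional_goodhart(n=10_000, top_k=100, noise_sd=1.0, seed=0):
    """Pick the top_k candidates by proxy score and compare proxy vs. true value.

    Assumption (illustrative only): proxy = true value + independent Gaussian noise.
    """
    rng = random.Random(seed)
    true_values = [rng.gauss(0.0, 1.0) for _ in range(n)]
    proxies = [v + rng.gauss(0.0, noise_sd) for v in true_values]

    # Rank candidates by the proxy alone, as an optimizer that only
    # observes the proxy would.
    ranked = sorted(range(n), key=lambda i: proxies[i], reverse=True)
    chosen = ranked[:top_k]

    mean_proxy = statistics.fmean([proxies[i] for i in chosen])
    mean_true = statistics.fmean([true_values[i] for i in chosen])
    population_mean_true = statistics.fmean(true_values)
    return mean_proxy, mean_true, population_mean_true


if __name__ == "__main__":
    mean_proxy, mean_true, baseline = simulate_regressional_goodhart()
    print(f"mean proxy of selected:      {mean_proxy:.2f}")
    print(f"mean true value of selected: {mean_true:.2f}")
    print(f"mean true value overall:     {baseline:.2f}")
```

In this setup the selected candidates are still better than average on the true value, but their proxy scores overstate how much better, because selecting on the proxy also selects for favorable noise. Raising `noise_sd` or lowering `top_k` should widen that gap.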

Backlinks

AI Problem