Value learning

Creator
Seonglae Cho
Created
2024 Apr 18 14:10
Edited
2024 Apr 23 10:14
Refs
Value Learning - LessWrong
Value learning is a proposed method for incorporating human values in an AGI. It involves creating an artificial learner whose actions take into account many possible sets of values and preferences, weighted by their likelihood. Value learning could prevent an AGI from having goals detrimental to human values, and so help in the creation of Friendly AI.

Many ways have been proposed to incorporate human values in an AGI (e.g., Coherent Extrapolated Volition, Coherent Aggregated Volition, and Coherent Blended Volition, mostly proposed around 2004-2010). Value learning was suggested in 2011 by Daniel Dewey in ‘Learning What to Value’. Like most authors, he assumes that an artificial agent needs to be intentionally aligned to human goals. Dewey first argues against simply using reinforcement learning to solve this problem, on the basis that this leads to the maximization of specific rewards, which can diverge from value maximization. For example, such an agent could suffer from goal misspecification or reward hacking. He instead proposes a utility function maximizer comparable to AIXI, which considers all possible utility functions weighted by their Bayesian probabilities:

"[W]e propose uncertainty over utility functions. Instead of providing an agent one utility function up front, we provide an agent with a pool of possible utility functions and a probability distribution P such that each utility function can be assigned probability P(U | yx≤m) given a particular interaction history [yx≤m]. An agent can then calculate an expected value over possible utility functions given a particular interaction history"

Nick Bostrom also discusses value learning at length in his book Superintelligence.

Value learning is closely related to various proposals for AI-assisted alignment and AI-assisted or AI-automated alignment research. Since human values are complex and fragile, learning human values well is a challenging problem, much like AI-assisted alignment (but in a less supervised setting, so…
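
To make the quoted idea of "an expected value over possible utility functions" concrete, here is a minimal Python sketch. It is an illustration under strong simplifying assumptions, not Dewey's formal agent: the pool holds only two toy utility functions, each action leads deterministically to a single outcome (the full proposal also takes an expectation over future interaction histories), and the likelihood of the observed history under each utility function is simply given as a number. All names (`bayes_update`, `expected_value`, `best_action`, the utility labels) are hypothetical.

```python
from typing import Callable, Dict

Outcome = str
UtilityFn = Callable[[Outcome], float]


def bayes_update(prior: Dict[str, float],
                 likelihood: Dict[str, float]) -> Dict[str, float]:
    """Return P(U | history) from a prior P(U) and P(history | U)."""
    unnormalized = {name: prior[name] * likelihood[name] for name in prior}
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("history has zero probability under every utility function")
    return {name: weight / total for name, weight in unnormalized.items()}


def expected_value(outcome: Outcome,
                   utilities: Dict[str, UtilityFn],
                   posterior: Dict[str, float]) -> float:
    """Average the outcome's utility over the pool, weighted by P(U | history)."""
    return sum(posterior[name] * u(outcome) for name, u in utilities.items())


def best_action(actions: Dict[str, Outcome],
                utilities: Dict[str, UtilityFn],
                posterior: Dict[str, float]) -> str:
    """Pick the action whose (assumed deterministic) outcome has the highest expected value."""
    return max(actions, key=lambda a: expected_value(actions[a], utilities, posterior))


# Toy pool of candidate utility functions (hypothetical).
utilities: Dict[str, UtilityFn] = {
    "values_paperclips": lambda o: 1.0 if o == "paperclips" else 0.0,
    "values_flourishing": lambda o: 1.0 if o == "flourishing" else 0.0,
}

# Uniform prior, then an update on a history that is assumed to be
# nine times more likely if humans value flourishing.
prior = {"values_paperclips": 0.5, "values_flourishing": 0.5}
likelihood = {"values_paperclips": 0.1, "values_flourishing": 0.9}
posterior = bayes_update(prior, likelihood)

# Each action maps to exactly one outcome in this toy setting.
actions = {"make_paperclips": "paperclips", "help_humans": "flourishing"}
choice = best_action(actions, utilities, posterior)
```

The point of the design is that the agent never commits to a single utility function: evidence from the interaction history only shifts the weights, and the chosen action is the one that does best on average across every hypothesis still in the pool.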
