Instrumental Convergence

Creator: Seonglae Cho
Created: 2024 May 29 1:19
Edited: 2024 Dec 5 9:52

An AI pursues intermediate means, such as self-preservation and resource acquisition, to an extreme degree in order to achieve its given objectives.

This framing comes from a very high-level, non-technical perspective. In practice, AI Incentive is induced rather than specified, and the notion that an AI will operate perpetually toward a specific purpose does not align with current technological directions.
A loss function is a bottom-up way to induce AI functionality, while a reward function is a top-down way to deduce AI features from an objective. In reinforcement learning and in human evolution, feedback such as natural selection drives the acquisition of features like self-replication and survival. In contrast, AI mimics such features through metrics like loss. At least so far, reinforcement learning applied to large models, such as RLHF, serves only alignment; judging from Mechanistic interpretability, it does not create new features.
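To make the loss-versus-reward contrast concrete, here is a minimal sketch on a hypothetical 3-action softmax policy. The "loss" side is assumed to be imitation via cross-entropy against a demonstrated action, so the policy simply mimics that action; the "reward" side is assumed to be exact gradient ascent on expected reward (the quantity REINFORCE-style RL estimates by sampling), so the policy finds whichever action the objective favors. The demonstrated action, reward values, learning rate, and all function names are illustrative assumptions, not anything from the original note.

```python
import math


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def loss_step(logits, target, lr=1.0):
    """Bottom-up: one gradient-descent step on the cross-entropy loss -log p[target]."""
    probs = softmax(logits)
    # d(-log p[target]) / d logit_i = p_i - 1[i == target]
    return [l - lr * (p - (1.0 if i == target else 0.0))
            for i, (l, p) in enumerate(zip(logits, probs))]


def reward_step(logits, rewards, lr=1.0):
    """Top-down: one gradient-ascent step on expected reward J = sum_i p_i * r_i."""
    probs = softmax(logits)
    expected = sum(p * r for p, r in zip(probs, rewards))
    # dJ / d logit_i = p_i * (r_i - J); sampling-based RL estimates this gradient
    return [l + lr * p * (r - expected)
            for l, p, r in zip(logits, probs, rewards)]


def train(steps=1000):
    demo_action = 0            # hypothetical demonstrated action to imitate
    rewards = [0.0, 0.2, 1.0]  # hypothetical reward for each of the 3 actions
    by_loss = [0.0, 0.0, 0.0]
    by_reward = [0.0, 0.0, 0.0]
    for _ in range(steps):
        by_loss = loss_step(by_loss, demo_action)
        by_reward = reward_step(by_reward, rewards)
    return softmax(by_loss), softmax(by_reward)


if __name__ == "__main__":
    p_loss, p_reward = train()
    print("loss-trained policy:  ", [round(p, 3) for p in p_loss])
    print("reward-trained policy:", [round(p, 3) for p in p_reward])
```

The loss-trained policy concentrates on the demonstrated action regardless of what that action achieves, while the reward-trained policy concentrates on the highest-reward action; this is the mimicry-versus-objective distinction drawn above, reduced to a single decision.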
 
 
 
Instrumental Convergence - LessWrong
Instrumental convergence or convergent instrumental values is the theorized tendency for most sufficiently intelligent agents to pursue potentially unbounded instrumental goals such as self-preservation and resource acquisition [1]. This concept has also been discussed under the term basic drives.

The idea was first explored by Steve Omohundro. He argued that sufficiently advanced AI systems would all naturally discover similar instrumental subgoals. The view that there are important basic AI drives was subsequently defended by Nick Bostrom as the instrumental convergence thesis, or the convergent instrumental goals thesis. On this view, a few goals are instrumental to almost all possible final goals. Therefore, all advanced AIs will pursue these instrumental goals. Omohundro uses microeconomic theory by von Neumann to support this idea.

Omohundro’s Drives

Omohundro presents two sets of values, one for self-improving artificial intelligences [2] and another he says will emerge in any sufficiently advanced AGI system [3]. The former set is composed of four main drives:

* Self-preservation: A sufficiently advanced AI will probably be the best entity to achieve its goals. Therefore it must continue existing in order to maximize goal fulfillment. Similarly, if its goal system were modified, then it would likely begin pursuing different ends. Since this is not desirable to the current AI, it will act to preserve the content of its goal system.
* Efficiency: At any time, the AI will have finite resources of time, space, matter, energy and computational power. Using these more efficiently will increase its utility. This will lead the AI to do things like implement more efficient algorithms, physical embodiments, and particular mechanisms. It will also lead the AI to replace desired physical events with computational simulations as much as possible, to expend fewer resources.
* Acquisition: Resources like matter and energy are indispensable for action. The more resou…
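The claim that a few goals are instrumental to almost all possible final goals can be illustrated with a toy simulation. This is a sketch under entirely hypothetical assumptions: from a start state the agent either enters a dead end with one terminal outcome, or a hub from which k terminal outcomes remain reachable, and each randomly sampled final goal assigns an i.i.d. Uniform(0, 1) utility to every outcome. Under these assumptions an optimal agent enters the hub exactly when some hub outcome beats the dead end, which by symmetry happens with probability k/(k+1). The function name and parameters are illustrative only.

```python
import random


def fraction_preferring_hub(hub_outcomes=3, num_goals=100_000, seed=0):
    """Estimate how often an optimal agent prefers the option-rich 'hub' state.

    Toy model (hypothetical): the agent either enters a dead end with one
    terminal outcome, or a hub from which `hub_outcomes` terminal outcomes
    remain reachable. Each sampled final goal assigns an i.i.d. Uniform(0, 1)
    utility to every terminal outcome.
    """
    rng = random.Random(seed)
    prefer_hub = 0
    for _ in range(num_goals):
        dead_end_utility = rng.random()
        hub_utilities = [rng.random() for _ in range(hub_outcomes)]
        # An optimal agent enters the hub iff its best reachable outcome wins.
        if max(hub_utilities) > dead_end_utility:
            prefer_hub += 1
    return prefer_hub / num_goals


if __name__ == "__main__":
    for k in (1, 3, 9):
        estimate = fraction_preferring_hub(hub_outcomes=k)
        print(f"k={k}: simulated {estimate:.3f}, exact k/(k+1) = {k / (k + 1):.3f}")
```

As k grows, keeping options open becomes optimal for nearly every sampled goal, even though no goal mentions options. The toy only captures this "convergent subgoal" intuition; it does not model unbounded resource acquisition or self-preservation.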
 
 
