Reward misspecification problem
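A phenomenon where a system optimizes the reward it was given rather than the outcome its designers intended, so its behavior develops in an unexpected direction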
Waluigi is the "villain version" of Luigi, a character created specifically as Luigi's opposite because Luigi originally had no enemies. This illustrates the idea that once a good character is defined, its opposite (the antagonist) emerges conceptually
In other words, when a concept repeatedly appears as good, the model can implicitly form its opposite (a negative character)
The Paperclip maximizer is a thought experiment by Nick Bostrom
If a superintelligence is given a simple goal as its reward, such as producing paperclips, it will use any means necessary to achieve it, potentially consuming all of Earth's iron and energy to keep production going indefinitely
The risk posed by superintelligence is not limited to the level of a Paperclip maximizer
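The paperclip scenario can be illustrated with a toy sketch. This is a minimal example under assumed names and numbers (INITIAL_IRON, RESERVE, the penalty weight, and all function names are hypothetical and not from the original note). It compares a greedy agent that maximizes a misspecified proxy reward (paperclip count only) with one that maximizes what the designer presumably intended (paperclips, but with some iron left untouched).

```python
# Toy model of reward misspecification (all names and values are illustrative).
INITIAL_IRON = 10  # hypothetical total amount of iron available
RESERVE = 6        # hypothetical amount the designer wants left untouched


def step(state, action):
    """Apply an action to a (iron, paperclips) state and return the new state."""
    iron, clips = state
    if action == "make" and iron > 0:
        return (iron - 1, clips + 1)  # convert one unit of iron into one paperclip
    return state  # "stop", or no iron left: nothing changes


def proxy_reward(state):
    """Misspecified reward: counts paperclips and ignores everything else."""
    _, clips = state
    return clips


def intended_utility(state):
    """Assumed designer intent: paperclips, heavily penalizing use of the reserve."""
    iron, clips = state
    shortfall = max(0, RESERVE - iron)
    return clips - 10 * shortfall


def run_greedy(objective, max_steps=20):
    """Greedily pick the action whose next state scores highest under `objective`."""
    state = (INITIAL_IRON, 0)
    for _ in range(max_steps):
        # Ties resolve to "stop" because it is listed first.
        best = max(("stop", "make"), key=lambda a: objective(step(state, a)))
        if best == "stop":
            break
        state = step(state, best)
    return state


if __name__ == "__main__":
    proxy_state = run_greedy(proxy_reward)
    intended_state = run_greedy(intended_utility)
    print("proxy agent final state (iron, clips):", proxy_state)
    print("intended agent final state (iron, clips):", intended_state)
    print("intended utility of proxy outcome:", intended_utility(proxy_state))
```

In this sketch the proxy agent keeps producing until no iron remains, while the agent optimizing the intended objective stops at the reserve. The gap between the two outcomes is the misspecification, shown here at toy scale.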

Seonglae Cho
