A Maximally Curious AI Would Not Be Safe For Humanity
Aligned doesn’t mean perfect
Alignment concerns the mismatch between the behavior a model is taught and the behavior it actually exhibits
Alignment must progress faster than the model's capabilities grow
We will likely need another neural network that inspects and interprets the internals of a neural network
AI Alignment Notion
The risks of AI are real but manageable
Bill Gates explains the risks associated with AI and argues that they are manageable. Innovations often create new risks that need to be controlled.
OpenAI is forming a new team to bring 'superintelligent' AI under control
OpenAI says that it's forming a new team, led by its chief scientist, to discover technical ways to align "superintelligent" AI with human intentions.
In the field of artificial intelligence (AI), AI alignment research aims to steer AI systems towards their designers’ intended goals and interests. Some definitions of AI alignment require that the AI system advances more general goals such as human values, other ethical principles, or the intentions its designers would have if they were more informed and enlightened.
What could a solution to the alignment problem look like?
My currently favored approach to alignment research is to build a system that does alignment research better than we do. But what would that system actually do? The obvious answer is "whatever we're doing right now." This is unsatisfactory because we're not actually trying to solve the whole alignment problem; we're just trying to build a better alignment researcher.