Induced incentive
- The goal of LLMs is not language itself; it is something that was induced
- ChatGPT is a community platform where the public participates in aligning AGI
Observation
- Computing cost is decreasing exponentially
- Inducing intelligence from a lower-level substrate through an induced incentive requires more computing
- A low-level substrate (the transformer) serves a high-level incentive structure (intelligence)
- Unlike humans, machines operate on a different time budget
Loss is a bottom-up approach to inducing AI functionality, while the reward function is a top-down approach to deducing AI features. In reinforcement learning, as in human evolution, feedback (natural selection in the biological case) drives the acquisition of features like self-replication and survival. In contrast, AI mimics such features through metrics like loss. At least so far, RLHF-style reinforcement learning for large models is used only for alignment, not for creating new features grounded in mechanistic interpretability.
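To make the contrast concrete, here is a toy sketch (all names and numbers are hypothetical, not any particular library's API): a loss gives dense, per-token feedback that induces behavior from the bottom up, while an RLHF-style preference reward gives sparse, whole-response feedback used to align behavior the model already has.

```python
import math

def cross_entropy_loss(predicted_probs, target_index):
    """Bottom-up: a dense, per-token loss that shapes the model
    by penalizing every wrong prediction."""
    return -math.log(predicted_probs[target_index])

def preference_reward(score_a, score_b):
    """Top-down: a sparse, whole-response signal (as in RLHF),
    saying only which complete answer a human preferred."""
    return 1.0 if score_a > score_b else 0.0

# Next-token prediction gets feedback at every position...
probs = [0.1, 0.7, 0.2]                                # toy distribution over a 3-token vocabulary
print(cross_entropy_loss(probs, target_index=1))       # ~0.357

# ...while preference feedback only compares two finished responses.
print(preference_reward(score_a=0.8, score_b=0.3))     # 1.0
```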
Future architecture
Just as every value in society is ultimately converted into money, all rewards in the brain are reduced to dopamine, i.e. happiness. Similarly, in AI models this is currently modeled as a reward or a loss, mostly future prediction. However, this structure is too simple and too easy to reward-hack, and in an era where AI agents and Fast Weights are being introduced, AI could also incorporate more biological incentives, such as running time, additional computation cost, or self-replication.
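As a minimal sketch of that idea (the function name and cost weights below are illustrative assumptions, not a proposal from any existing system), running time and compute cost can be folded into a single scalar alongside the prediction reward:

```python
def composite_reward(prediction_reward, flops_used, seconds_elapsed,
                     flops_cost=1e-12, time_cost=0.01):
    """Combine the task reward with penalties for computation and latency.
    The cost weights are illustrative, not tuned values."""
    return prediction_reward - flops_cost * flops_used - time_cost * seconds_elapsed

# Example step: a correct prediction that spent 3 GFLOPs and 0.25 s of wall-clock time.
print(composite_reward(prediction_reward=1.0, flops_used=3e9, seconds_elapsed=0.25))
# 1.0 - 0.003 - 0.0025 = 0.9945
```

An agent optimized against such a signal is pushed not only to predict well but also to economize on its own compute, which is closer to the time and energy budget constraint noted above.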
Teaching a man how to fish is more valuable than just giving him a fish. Better still, teach him the taste of fish and make him hungry, rather than merely teaching him how to fish.
One step towards building safe AI systems is to remove the need for humans to write goal functions, since using a simple proxy for a complex goal, or getting the complex goal a bit wrong, can lead to undesirable and even dangerous behavior.
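A minimal sketch of that direction (toy data and hypothetical names throughout): instead of hand-writing a goal function, fit a reward model to pairwise human preferences, in the Bradley-Terry style used by RLHF.

```python
import math
import random

random.seed(0)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Toy "human": prefers whichever trajectory has the larger feature sum.
def human_prefers_a(a, b):
    return sum(a) > sum(b)

# Linear reward model r(x) = w . x, trained so preferred trajectories score higher.
w = [0.0, 0.0]
learning_rate = 0.1
for _ in range(2000):
    a = [random.uniform(-1, 1) for _ in range(2)]
    b = [random.uniform(-1, 1) for _ in range(2)]
    winner, loser = (a, b) if human_prefers_a(a, b) else (b, a)
    # Bradley-Terry: probability the model assigns to the human's choice.
    margin = sum(wi * (xi - yi) for wi, xi, yi in zip(w, winner, loser))
    p = sigmoid(margin)
    # Gradient ascent on log p.
    for i in range(2):
        w[i] += learning_rate * (1.0 - p) * (winner[i] - loser[i])

print(w)  # both weights end up positive: the learned reward tracks the hidden preference
```

Both learned weights end up positive, i.e. the reward model recovers the hidden preference without anyone writing the goal function down explicitly.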