Language Model Rewards
Language Model Reward Benchmarks
Reinforcement Learning Transformers
Language Model RL Frameworks
Era of Experience
But it has low sample efficiency at larger sample counts, and it only finds better reasoning paths within the model's existing capacity, which makes its total problem-solving coverage smaller.
While the provocative title is not exactly correct, it provides insight even for multimodality.
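The coverage claim above can be made concrete with a pass@k comparison. The sketch below is an illustration under stated assumptions, not the method of any specific paper. It assumes that "coverage" means the mean pass@k over a problem set at large k, and it uses the standard unbiased pass@k estimator introduced with Codex (Chen et al., 2021). The per-problem correct counts are hypothetical and were chosen only to show the effect: an RL-tuned model that is sharper on problems it can already solve wins at k=1, while a base model that occasionally solves every problem wins at large k.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: probability that at least one of k samples, drawn
    without replacement from n generations of which c are correct, is correct."""
    if n - c < k:
        return 1.0  # every size-k subset must contain a correct sample
    return 1.0 - comb(n - c, k) / comb(n, k)


def coverage(correct_counts: list[int], n: int, k: int) -> float:
    """Mean pass@k over a problem set (assumed definition of 'coverage')."""
    return sum(pass_at_k(n, c, k) for c in correct_counts) / len(correct_counts)


# Hypothetical data: correct answers out of n = 64 samples on four problems.
n = 64
base = [8, 4, 2, 1]    # base model: low per-sample accuracy, solves every problem sometimes
rl = [60, 50, 0, 0]    # hypothetical RL-tuned model: sharp on two problems, never solves the others

for k in (1, 64):
    print(f"k={k}: base={coverage(base, n, k):.4f}  rl={coverage(rl, n, k):.4f}")
```

With these made-up counts, the RL-tuned model is better at k=1 (about 0.43 vs 0.06). At k=64 the base model reaches full coverage (1.0), while the RL-tuned model stays at 0.5, because no amount of resampling recovers problems it never solves.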