Reinforcement Learning with Model-rewarded Thinking
Reasoning model reward, such as verifiable reward
https://www.arxiv.org/pdf/2509.20357