In real-world coding scenarios, we primarily work through multi-turn chat, so improving multi-turn ability is crucial. The biggest issues are compounding error and misalignment between the human's perspective and the AI's understanding. Therefore, when iterating on improvements, it helps to provide a context dump that explicitly states the user's context summary, the identified problems, and the current focus areas. The bias this introduces is small compared to the frustration and inefficiency caused by misalignment.
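As a purely illustrative sketch of this practice (the function name and field labels are assumptions, not a standard format), the helper below assembles such a context dump into a single message before continuing a session:

```python
# Illustrative helper: restate the human's context at each iteration of a
# multi-turn coding chat to counter compounding error and misalignment.
def build_context_dump(context_summary: str, problems: list[str], focus_areas: list[str]) -> str:
    """Pack the user's own summary, known problems, and focus areas into one message."""
    lines = ["Context summary (as the user understands it):", context_summary,
             "Identified problems:"]
    lines += [f"- {p}" for p in problems]
    lines += ["Focus areas for this turn:"]
    lines += [f"- {f}" for f in focus_areas]
    return "\n".join(lines)

print(build_context_dump(
    "Refactoring the auth module; sessions now use short-lived JWTs.",
    ["Token refresh fails after logout"],
    ["Only touch the refresh logic, not the login flow"],
))
```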
Voice input is closer to real-life conversation and can improve response quality by delivering richer context than keyboard input.
Multi-Session Chat (MSC) Dataset
The authors provide an annotated summary of each session and a summarizer model trained on those summaries.
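To make the mechanism concrete, here is a minimal sketch (not the authors' code) of MSC-style session memory, assuming a generic `llm(prompt) -> str` callable as a stand-in for the trained summarizer: each finished session is condensed into a summary, and the concatenated summaries are carried into the next session instead of the full transcripts.

```python
# Minimal sketch of rolling session summaries (MSC-style memory); `llm` is a
# hypothetical text-completion callable standing in for the trained summarizer.
from typing import Callable, List

def summarize_session(llm: Callable[[str], str], turns: List[str]) -> str:
    """Ask the summarizer to compress one finished session into key facts."""
    prompt = ("Summarize the key facts about each speaker in this dialogue:\n"
              + "\n".join(turns))
    return llm(prompt)

def build_next_session_context(llm: Callable[[str], str], past_sessions: List[List[str]]) -> str:
    """Concatenate per-session summaries to serve as memory for the new session."""
    summaries = [summarize_session(llm, s) for s in past_sessions]
    return "Previous session notes:\n" + "\n".join(summaries)

# Usage with a placeholder model call (swap in a real chat-completion client):
fake_llm = lambda prompt: "Speaker A likes hiking; Speaker B just moved to Berlin."
print(build_next_session_context(fake_llm, [["A: I love hiking.", "B: I moved to Berlin."]]))
```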
The Lock-in Hypothesis: Stagnation by Algorithm
In GPT agent-based experiments, repeated interactions consistently lead to belief convergence and a reduction in diversity (entropy). Using Bayesian updates and trust matrices, the authors further show that once mutual trust exceeds a certain threshold, groups become overconfident in factually incorrect beliefs (a toy illustration of this threshold effect follows the link below). In other words, the mutual feedback loop between humans and Large Language Models (LLMs) can reduce the diversity of user beliefs and lock in incorrect ones.
Frontier AI systems, such as large language models (LLMs) (Zhao et al., 2023), are increasingly influencing human beliefs and values (Fisher et al., 2024; Leib et al., 2021; Costello et al., 2024). This creates a self-reinforcing feedback loop: AI systems learn values from human data at pre- and post-training stages (Conneau & Lample, 2019; Bai et al., 2022; Santurkar et al., 2023), influence human opinions through their interactions, and then reabsorb those influenced beliefs, and so on. What equilibrium will this dynamic process reach?
https://arxiv.org/html/2506.06166
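The trust-threshold dynamic can be illustrated with a toy simulation. This is not the paper's model: the single human-model pair, the binary claim, and all numbers are simplifying assumptions, and the sketch only shows the threshold effect, not the paper's population-level entropy analysis. One agent treats the model's assertion about a claim (whose true answer is "false") as evidence with assumed reliability `trust`, also receives weak private evidence favoring the truth, and the model is retrained on the agent's resulting belief each round.

```python
import math, random

def simulate(trust: float, rounds: int = 200, seed: int = 0) -> float:
    """Toy human-AI feedback loop on one binary claim whose true answer is 'false'."""
    rng = random.Random(seed)
    human = math.log(0.6 / 0.4)                   # log-odds: both start mildly believing the false claim
    model = human
    ai_weight = math.log(trust / (1.0 - trust))   # evidential weight the human gives the model's assertion
    signal_weight = math.log(0.7 / 0.3)           # weight of the weak private signal
    for _ in range(rounds):
        human += ai_weight if model > 0 else -ai_weight                    # Bayesian update on the model's assertion
        human += -signal_weight if rng.random() < 0.7 else signal_weight   # evidence pointing to the truth 70% of the time
        model = human                                                      # feedback: model reabsorbs the human's belief
    return 1.0 / (1.0 + math.exp(-human))         # final P(the false claim is true)

for trust in (0.50, 0.55, 0.75, 0.90):
    ps = [simulate(trust, seed=s) for s in range(20)]
    print(f"trust={trust:.2f} -> mean final P(false claim is true) = {sum(ps) / len(ps):.3f}")
```

With these assumed numbers, low trust lets the weak evidence win and both parties converge on the truth, while in this toy a trust above roughly 0.58 makes the model's assertion outweigh the evidence, so the pair locks in on the false claim with growing confidence.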
Multi-turn conversation is the weak joint

Seonglae Cho