Visually continuous focusing aligned with token generation (arxiv.org)
https://arxiv.org/pdf/2312.09237.pdf