QVQ

Creator

Seonglae Cho

Created

2025 Jan 3 22:15

Editor

Seonglae Cho

Edited

2025 Jan 3 22:15

Refs

QVQ: To See the World with Wisdom

Language and vision intertwine in the human mind, shaping how we perceive and understand the world around us. Our ability to reason is deeply rooted in both linguistic thought and visual memory - but what happens when we extend these capabilities to AI? Today’s large language models have demonstrated remarkable reasoning abilities, but we wondered: could they harness the power of visual understanding to reach new heights of cognitive capability?

https://qwenlm.github.io/blog/qvq-72b-preview/
