AI Multimodal Reasoning

Creator
Seonglae Cho
Created
2025 Mar 20 10:12
Edited
2025 Apr 21 16:35
Refs
AI Spatial Reasoning Methods
 
 
 
 
Multimodal LLMs (MLLMs) score below 50% accuracy when asked to visually recognize, or systematically count the edges of, even simple regular polygons. This is attributed to a 'shape-blind' vision encoder that fails to distinguish rarely seen shapes. The models rely on intuition and memorization (System 1 Thinking) rather than logical step-by-step reasoning (System 2 Thinking). However, Visually-Cued CoT prompts, which label each edge of a shape with a number or character and then guide the model step by step, raise GPT-4v's accuracy at counting the edges of irregular polygons from 7% to 93%.
arxiv.org
Spatial reasoning platform | University of Surrey
 
 

 
