AI Spatial Reasoning Methods
Multimodal large language models (MLLMs) achieve less than 50% accuracy when visually recognizing, or systematically counting the edges of, even simple regular polygons. This is attributed to a 'shape-blind' vision encoder that fails to distinguish rare shapes. The models rely on intuition and memorization (System 1 thinking) rather than logical step-by-step reasoning (System 2 thinking). However, with Visually-Cued CoT prompts, which label each edge of the shape with a number or character and guide the model through the count step by step, GPT-4V's accuracy in counting the edges of irregular polygons improves dramatically, from 7% to 93%.
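A minimal sketch of how such a prompt could be prepared, assuming the polygon is available as a list of vertex coordinates. The function names `label_edges` and `build_prompt` and the prompt wording are hypothetical illustrations, not the original authors' implementation; drawing the labels onto the image and calling a model are left out.

```python
from string import ascii_uppercase


def label_edges(vertices):
    """Assign a letter to each edge and return (label, midpoint) pairs.

    Hypothetical helper: the midpoint is where a label could be drawn
    on the image so the model can see one cue per edge.
    """
    n = len(vertices)
    if n > len(ascii_uppercase):
        raise ValueError("this sketch only supports up to 26 edges")
    edges = []
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]  # wrap around to close the polygon
        midpoint = ((x1 + x2) / 2, (y1 + y2) / 2)
        edges.append((ascii_uppercase[i], midpoint))
    return edges


def build_prompt():
    """Return step-by-step instructions that refer to the visual cues.

    The wording is an assumption, not the prompt used in the source.
    """
    return (
        "Each edge of the shape in the image is marked with a letter.\n"
        "Step 1: List every letter you can see on the shape.\n"
        "Step 2: Count the letters you listed.\n"
        "Step 3: State that count as the number of edges."
    )


if __name__ == "__main__":
    square = [(0, 0), (2, 0), (2, 2), (0, 2)]
    for label, midpoint in label_edges(square):
        print(label, midpoint)
    print(build_prompt())
```

The idea is that the model reads off explicit labels instead of estimating the edge count from overall shape appearance, which is the step the summary above associates with System 2 thinking.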
Spatial reasoning platform | University of Surrey
https://www.surrey.ac.uk/spatial-reasoning

Seonglae Cho