AI 3D Memory
AI Spatial Memory Methods
Spatial understanding
VoxRep: Enhancing 3D Spatial Understanding in 2D Vision-Language Models | Menlo Research
VoxRep is a novel framework that enables standard Vision-Language Models to interpret structured 3D voxel data, facilitating richer spatial comprehension vital for advanced AI systems.
https://menlo.ai/blog/voxrep

VLMs excel at semantic understanding but struggle with spatial relationships. Current VLMs have a real spatial blindspot, caused by encoder design and 1D alignment methods. To fix this, spatial grounding itself must be treated as a separate design axis, not just semantic capability.
arxiv.org
https://arxiv.org/pdf/2601.09954

Seonglae Cho