The500Feed.Live

Everything going on in AI - updated daily from 500+ sources

← Back to The 500 Feed
📄 ResearchMay 19, 2026

CaMo: Camera Motion Grounded Evaluation and Training for Vision-Language Models

Vision-Language Models (VLMs) achieve strong performance on spatial question answering benchmarks, yet it remains unclear whether such gains reflect genuine spatial intelligence. We show that existing spatial VLMs lack basic camera motion understanding, a key component of spatial cognition. We propo...

Read Original Article →

Source

http://arxiv.org/abs/2605.20165v1