The500Feed.Live

Everything going on in AI - updated daily from 500+ sources

📄 Research · May 26, 2026

Can Retrieval Heads See Images? Multimodal Retrieval Heads in Long-Context Vision-Language Models

Large vision-language models increasingly rely on long-context modeling to reason over documents, hour-level videos, and long-horizon agent trajectories, requiring them to locate relevant evidence across interleaved text and images. Prior work has studied this behavior using retrieval heads in large...

Source

http://arxiv.org/abs/2605.27243v1