AI News Archive: May 14, 2026 — Part 20

Sourced from 500+ daily AI sources, scored by relevance.

Minimal-Intervention KV Retention: A Design-Space Study and a Diversity-Penalty Survivor
KV-cache compression at small budgets is a crowded design space spanning cache representation, head-wise routing, compression cadence, decoding behavior, and within-budget scoring. We study seven mechanisms across these five families under matched mean cache on long-form mathematical reasoning (MATH...
📄 Research · May 14, 2026 · http://arxiv.org/abs/2605.14292v1
MetaMoE: Diversity-Aware Proxy Selection for Privacy-Preserving Mixture-of-Experts Unification
Mixture-of-Experts (MoE) models scale capacity by combining specialized experts, but most existing approaches assume centralized access to training data. In practice, data are distributed across clients and cannot be shared due to privacy constraints, making unified MoE training challenging. We prop...
📄 Research · May 14, 2026 · http://arxiv.org/abs/2605.14289v1
Exploring Vision-Language Models for Online Signature Verification: A Zero-Shot Capability Study
Recent advancements in Vision-Language Models (VLMs) have demonstrated strong capabilities in general visual reasoning, yet their applicability to rigorous biometric tasks remains unexplored. This work presents an exploratory study evaluating the zero-shot performance of state-of-the-art VLMs (GPT-5...
📄 Research · May 14, 2026 · http://arxiv.org/abs/2605.14845v1
Learning Direct Control Policies with Flow Matching for Autonomous Driving
We present a flow-matching planner for autonomous driving that directly outputs actionable control trajectories defined by acceleration and curvature profiles. The model is conditioned on a bird's-eye-view (BEV) raster of the surrounding scene and generates control sequences in a small number of Ord...
📄 Research · May 14, 2026 · http://arxiv.org/abs/2605.14832v1
Can Visual Mamba Improve AI-Generated Image Detection? An In-Depth Investigation
In recent years, computer vision has witnessed remarkable progress, fueled by the development of innovative architectures such as Convolutional Neural Networks (CNNs), Generative Adversarial Networks (GANs), diffusion-based architectures, Vision Transformers (ViTs), and, more recently, Vision-Langua...
📄 Research · May 14, 2026 · http://arxiv.org/abs/2605.14799v1
Understanding Imbalanced Forgetting in Rehearsal-Based Class-Incremental Learning
Neural networks suffer from catastrophic forgetting in class-incremental learning (CIL) settings. Rehearsal (replaying a subset of past samples) is a well-established mitigation strategy. However, recent results suggest that, despite balanced rehearsal allocation, some ...
📄 Research · May 14, 2026 · http://arxiv.org/abs/2605.14785v1
MonoPRIO: Adaptive Prior Conditioning for Unified Monocular 3D Object Detection
Monocular 3D object detection remains challenging because metric size and depth are underdetermined by single-view evidence, particularly under occlusion, truncation, and projection-induced scale-depth ambiguity. Although recent methods improve depth and geometric reasoning, metric size remains unst...
📄 Research · May 14, 2026 · http://arxiv.org/abs/2605.14781v1
EARL: Towards a Unified Analysis-Guided Reinforcement Learning Framework for Egocentric Interaction Reasoning and Pixel Grounding
Understanding human-environment interactions from egocentric vision is essential for assistive robotics and embodied intelligent agents, yet existing multimodal large language models (MLLMs) still struggle with accurate interaction reasoning and fine-grained pixel grounding. To this end, this paper...
📄 Research · May 14, 2026 · http://arxiv.org/abs/2605.14742v1
Breaking Dual Bottlenecks: Evolving Unified Multimodal Models into Self-Adaptive Interleaved Visual Reasoners
Recent unified models integrate multimodal understanding and generation within a single framework. However, an "understanding-generation gap" persists, where models can capture user intent but often fail to translate this semantic knowledge into precise pixel-level manipulation. This gap results in ...
📄 Research · May 14, 2026 · http://arxiv.org/abs/2605.14709v1
StyleTextGen: Style-Conditioned Multilingual Scene Text Generation
Style-conditioned scene text generation faces unique challenges in extracting precise text styles from complex backgrounds and maintaining fine-grained style consistency across characters, especially for multilingual scripts. We propose StyleTextGen, a novel framework that learns to perceive and rep...
📄 Research · May 14, 2026 · http://arxiv.org/abs/2605.14708v1
EponaV2: Driving World Model with Comprehensive Future Reasoning
Data scaling plays a pivotal role in the pursuit of general intelligence. However, the prevailing perception-planning paradigm in autonomous driving relies heavily on expensive manual annotations to supervise trajectory planning, which severely limits its scalability. Conversely, although existing p...
📄 Research · May 14, 2026 · http://arxiv.org/abs/2605.14696v1
Are Candidate Models Really Needed for Active Learning?
Deep learning has profoundly impacted domains such as computer vision and natural language processing by uncovering complex patterns in vast datasets. However, the reliance on extensive labeled data poses significant challenges, including resource constraints and annotation errors, particularly in t...
📄 Research · May 14, 2026 · http://arxiv.org/abs/2605.14689v1
Beyond Instance-Level Self-Supervision in 3D Multi-Modal Medical Imaging
Self-supervised pre-training methods in medical imaging typically treat each individual as an isolated instance, learning representations through augmentation-based objectives or masked reconstruction. They often do not adequately capitalize on a key characteristic of physiological features: anatomi...
📄 Research · May 14, 2026 · http://arxiv.org/abs/2605.14654v1
Deep Image Segmentation via Discriminant Feature Learning
Accurate image segmentation remains challenging, particularly in generating sharp, confident boundaries. While modern architectures have advanced the field, many of them still rely on standard loss functions like Cross-Entropy and Dice, which often neglect the discriminative structure of learned fea...
📄 Research · May 14, 2026 · http://arxiv.org/abs/2605.14609v1
Towards Accurate Single Panoramic 3D Detection: A Semantic Gaussian Centric Approach
Three-dimensional object detection in panoramic imagery is crucial for comprehensive scene understanding, yet accurately mapping 2D features to 3D remains a significant challenge. Prevailing methods often project 2D features onto discrete 3D grids, which break geometric continuity and limit represen...
📄 Research · May 14, 2026 · http://arxiv.org/abs/2605.14601v1
MechVerse: Evaluating Physical Motion Consistency in Video Generation Models
Text- and image-conditioned video generation models have achieved strong visual fidelity and temporal coherence, but they often fail to generate motion governed by kinematic and geometric constraints. In these settings, object parts must remain rigid, maintain contact or coupling with neighboring co...
📄 Research · May 14, 2026 · http://arxiv.org/abs/2605.14843v1
Multi-proposal Collaboration and Multi-task Training for Weakly-supervised Video Moment Retrieval
This study focuses on weakly-supervised Video Moment Retrieval (VMR), aiming to identify a moment semantically similar to the given query within an untrimmed video using only video-level correspondences, without relying on temporal annotations during training. Previous methods either aggregate predi...
📄 Research · May 14, 2026 · http://arxiv.org/abs/2605.14838v1
HDRFace: Rethinking Face Restoration with High-Dimensional Representation
Face restoration under complex degradations still remains an ill-posed inverse problem due to severe information loss. Although diffusion models benefit from strong generative priors, most methods still condition only on low-quality inputs, making it difficult to recover identity-critical details un...
📄 Research · May 14, 2026 · http://arxiv.org/abs/2605.14821v1
The Velocity Deficit: Initial Energy Injection for Flow Matching
While Flow Matching theoretically guarantees constant-velocity trajectories, we identify a critical breakdown in high-dimensional practice: the Velocity Deficit. We show that the MSE objective systematically underestimates velocity magnitude, causing generated samples to fail to reach the data manif...
📄 Research · May 14, 2026 · http://arxiv.org/abs/2605.14819v1
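The abstract attributes the Velocity Deficit to the MSE objective. One generic way to see how an MSE-trained velocity field can come out too short: the MSE minimizer is the conditional mean of the target velocity, and by Jensen's inequality ‖E[u | x_t]‖ ≤ E[‖u‖ | x_t]. The sketch below checks this numerically on a hypothetical 1-D toy (Gaussian source, two-point data at ±1, linear interpolation path, evaluated only at t = 0). It is an illustration under those assumptions, not the paper's analysis, and it does not model the proposed Initial Energy Injection fix; the function name `velocity_deficit_demo` is made up for this example.

```python
import random
import statistics


def velocity_deficit_demo(n_samples: int = 100_000, seed: int = 0) -> tuple[float, float]:
    """Compare |E[u | x0]| with |u| at t = 0 on a toy 1-D flow-matching problem.

    Toy assumptions (hypothetical, for illustration only):
      - source x0 ~ N(0, 1)
      - data x1 is +1 or -1 with equal probability, independent of x0
      - linear path x_t = (1 - t) * x0 + t * x1, so the target velocity is u = x1 - x0
    At t = 0 the model only sees x_t = x0, so the MSE-optimal prediction is
    E[u | x0] = E[x1] - x0 = -x0.
    """
    rng = random.Random(seed)
    predicted_speeds = []  # |E[u | x0]|: what an ideal MSE regressor would output
    target_speeds = []     # |u| for each individually sampled (x0, x1) pair
    for _ in range(n_samples):
        x0 = rng.gauss(0.0, 1.0)
        x1 = rng.choice((-1.0, 1.0))
        u = x1 - x0
        predicted_speeds.append(abs(-x0))
        target_speeds.append(abs(u))
    return statistics.fmean(predicted_speeds), statistics.fmean(target_speeds)


if __name__ == "__main__":
    pred, target = velocity_deficit_demo()
    print(f"mean |MSE-optimal velocity| = {pred:.3f}, mean |target velocity| = {target:.3f}")
```

With the defaults, the two averages should land close to their analytic values, √(2/π) ≈ 0.80 for the ideal MSE prediction and about 1.17 for individual target speeds, so in this toy the regressed velocity is roughly a third shorter on average.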
Probing into Camera Control of Video Models
Video is a rich and scalable source of 3D/4D visual observations, and camera control is a key capability for video generation models to produce geometrically meaningful content. Existing approaches typically learn a mapping from camera motion to video using additional camera modules and paired data....
📄 Research · May 14, 2026 · http://arxiv.org/abs/2605.14815v1
SuperADD: Training-free Class-agnostic Anomaly Segmentation – CVPR 2026 VAND 4.0 Workshop Challenge Industrial Track
Visual anomaly detection (AD) for industrial inspection is a highly relevant task in modern production environments. The problem becomes particularly challenging when training and deployment data differ due to changes in acquisition conditions during production. In the VAND 4.0 Industrial Track, mod...
📄 Research · May 14, 2026 · http://arxiv.org/abs/2605.14808v1
COAL: Counterfactual and Observation-Enhanced Alignment Learning for Discriminative Referring Multi-Object Tracking
Referring Multi-Object Tracking (RMOT) faces a fundamental structural contradiction between the high-discriminability demand and the sparse semantic supervision. This mismatch is particularly acute in highly homogeneous scenarios that require fine-grained discrimination over complex compositional se...
📄 Research · May 14, 2026 · http://arxiv.org/abs/2605.14795v1
BioHuman: Learning Biomechanical Human Representations from Video
Understanding human motion beyond surface kinematics is crucial for motion analysis, rehabilitation, and injury risk assessment. However, progress in this domain is limited by the lack of large-scale datasets with biomechanical annotations, and by existing approaches that cannot directly infer inter...
📄 Research · May 14, 2026 · http://arxiv.org/abs/2605.14772v1
Video-Zero: Self-Evolution Video Understanding
Self-evolution offers a promising path for improving reasoning models without relying on intensive human annotation. However, extending this paradigm to video understanding remains underexplored and challenging: videos are long, dynamic, and redundant, while the evidence needed for reasoning is ofte...
📄 Research · May 14, 2026 · http://arxiv.org/abs/2605.14733v1
UMo: Unified Sparse Motion Modeling for Real-Time Co-Speech Avatars
Speech-driven gestures and facial animations are fundamental to expressive digital avatars in games, virtual production, and interactive media. However, existing methods are either limited to a single modality for audio motion alignment, failing to fully utilize the potential of massive human motion...
📄 Research · May 14, 2026 · http://arxiv.org/abs/2605.14731v1
Deepheem
Evidence-backed AML & compliance investigations
🧰 Tools · May 14, 2026 · https://www.producthunt.com/products/deepheem?utm_campaign=producthunt-api&utm_medium=api-v2&utm_source=Application%3A+the500feed+%28ID%3A+283491%29
AirDoc AI
What if invoices just... showed up in Xero
🧰 Tools · May 14, 2026 · https://www.producthunt.com/products/airdoc-ai?utm_campaign=producthunt-api&utm_medium=api-v2&utm_source=Application%3A+the500feed+%28ID%3A+283491%29
Nexalearn
AI Mentor for UPSC & Competitive Exam Success
🧰 Tools · May 14, 2026 · https://www.producthunt.com/products/nexalearn?utm_campaign=producthunt-api&utm_medium=api-v2&utm_source=Application%3A+the500feed+%28ID%3A+283491%29
LangXP
Your free AI language exam grader for multiple languages
🧰 Tools · May 14, 2026 · https://www.producthunt.com/products/langxp?utm_campaign=producthunt-api&utm_medium=api-v2&utm_source=Application%3A+the500feed+%28ID%3A+283491%29
CoverRoute
Generate tailored cover letters from your CV
🧰 Tools · May 14, 2026 · https://www.producthunt.com/products/coverroute?utm_campaign=producthunt-api&utm_medium=api-v2&utm_source=Application%3A+the500feed+%28ID%3A+283491%29
Lawcrative
All in one Legal software for AI-powered practice management
🧰 Tools · May 14, 2026 · https://www.producthunt.com/products/lawcrative?utm_campaign=producthunt-api&utm_medium=api-v2&utm_source=Application%3A+the500feed+%28ID%3A+283491%29
LangProtect Guardia
Prompt injection’s least favorite product.
🧰 Tools · May 14, 2026 · https://www.producthunt.com/products/langprotect?utm_campaign=producthunt-api&utm_medium=api-v2&utm_source=Application%3A+the500feed+%28ID%3A+283491%29
Tokavy — Less Tokens. Better Answers.
Your AI prompts, refined in a keystroke
🧰 Tools · May 14, 2026 · https://www.producthunt.com/products/tokavy-less-tokens-better-answers?utm_campaign=producthunt-api&utm_medium=api-v2&utm_source=Application%3A+the500feed+%28ID%3A+283491%29
Selint
Sales meeting assistant with real-time suggestions for teams
🧰 Tools · May 14, 2026 · https://www.producthunt.com/products/selint-ai-co-pilot-for-sales-meetings?utm_campaign=producthunt-api&utm_medium=api-v2&utm_source=Application%3A+the500feed+%28ID%3A+283491%29
Penta-V Kernel
Sub-nanosecond (845ps) Rust safety kernel for AI alignment.
🧰 Tools · May 14, 2026 · https://www.producthunt.com/products/penta-v-kernel?utm_campaign=producthunt-api&utm_medium=api-v2&utm_source=Application%3A+the500feed+%28ID%3A+283491%29
Visual Prompt Builder for GPT — EmojiPT
Build better prompts with clicks, not typing in ChatGPT
🧰 Tools · May 14, 2026 · https://www.producthunt.com/products/visual-prompting-for-gpt-ai-emojipt?utm_campaign=producthunt-api&utm_medium=api-v2&utm_source=Application%3A+the500feed+%28ID%3A+283491%29
OpenInterpretability
Open-source toolkit to audit what your LLM knows
🧰 Tools · May 14, 2026 · https://www.producthunt.com/products/openinterpretability?utm_campaign=producthunt-api&utm_medium=api-v2&utm_source=Application%3A+the500feed+%28ID%3A+283491%29
overlandai
Generate real-world adventures with AI in seconds
🧰 Tools · May 14, 2026 · https://www.producthunt.com/products/overlandai?utm_campaign=producthunt-api&utm_medium=api-v2&utm_source=Application%3A+the500feed+%28ID%3A+283491%29
RedactEngine
Batch Blur sensitive information in images with offline AI
🧰 Tools · May 14, 2026 · https://www.producthunt.com/products/redactengine?utm_campaign=producthunt-api&utm_medium=api-v2&utm_source=Application%3A+the500feed+%28ID%3A+283491%29
X402AgentPay
x402 commerce layer for AI agents — beyond raw HTTP payments
🧰 Tools · May 14, 2026 · https://www.producthunt.com/products/x402agentpay?utm_campaign=producthunt-api&utm_medium=api-v2&utm_source=Application%3A+the500feed+%28ID%3A+283491%29
Mindmesh
AI workspace for focused work and modern teams
🧰 Tools · May 14, 2026 · https://www.producthunt.com/products/mindmesh-3?utm_campaign=producthunt-api&utm_medium=api-v2&utm_source=Application%3A+the500feed+%28ID%3A+283491%29
BrandAuditor
Brand auditor for India - TM, Domain, Name, Social, SEO, AI
🧰 Tools · May 14, 2026 · https://www.producthunt.com/products/brandauditor?utm_campaign=producthunt-api&utm_medium=api-v2&utm_source=Application%3A+the500feed+%28ID%3A+283491%29
Cloe
A realistic AI character living on your desktop
🧰 Tools · May 14, 2026 · https://www.producthunt.com/products/cloe-3?utm_campaign=producthunt-api&utm_medium=api-v2&utm_source=Application%3A+the500feed+%28ID%3A+283491%29
BAIN tools
Finding the best AIs
🧰 Tools · May 14, 2026 · https://www.producthunt.com/products/bain-tools?utm_campaign=producthunt-api&utm_medium=api-v2&utm_source=Application%3A+the500feed+%28ID%3A+283491%29
AI Prism
Gathering the latest AI news from around the world
🧰 Tools · May 14, 2026 · https://www.producthunt.com/products/ai-prism?utm_campaign=producthunt-api&utm_medium=api-v2&utm_source=Application%3A+the500feed+%28ID%3A+283491%29
TubeFilter
AI reads YouTube so you don't have to
🧰 Tools · May 14, 2026 · https://www.producthunt.com/products/tubefilter?utm_campaign=producthunt-api&utm_medium=api-v2&utm_source=Application%3A+the500feed+%28ID%3A+283491%29
MiVE: Multiscale Vision-language features for reference-guided video Editing
Reference-guided video editing takes a source video, a text instruction, and a reference image as inputs, requiring the model to faithfully apply the instructed edits while preserving original motion and unedited content. Existing methods fall into two paradigms, each with inherent limitations: deco...
📄 Research · May 14, 2026 · http://arxiv.org/abs/2605.14664v1
TERRA-CD: Multi-Temporal Framework for Multi-class and Semantic Change Detection
Urban vegetation monitoring plays a vital role in understanding environmental changes, yet comprehensive datasets for this purpose remain limited. To address this gap, we present the Temporal Remote-sensing Repository for Analyzing Change Detection (TERRA-CD), a benchmark dataset comprising 5,221 Se...
📄 Research · May 14, 2026 · http://arxiv.org/abs/2605.14651v1
UniTriGen: Unified Triplet Generation of Aligned Visible-Infrared-Label for Few-Shot RGB-T Semantic Segmentation
RGB-T semantic segmentation requires strictly aligned VIS-IR-Label triplets; however, such aligned triplet data are often scarce in real-world scenarios. Existing generative augmentation methods usually adopt cascaded generation paradigms, decomposing joint triplet generation into local conditional ...
📄 Research · May 14, 2026 · http://arxiv.org/abs/2605.14626v1
ViMU: Benchmarking Video Metaphorical Understanding
Any new medium, once it emerges, is used for more than the transmission of overt content alone. The information it carries typically operates on two levels: one is the content directly presented, while the other is the subtext beneath it, the implicit ideas and intentions the creator seeks to convey ...
📄 Research · May 14, 2026 · http://arxiv.org/abs/2605.14607v1