AI News Archive: May 14, 2026 — Part 20
Sourced from 500+ daily AI sources, scored by relevance.
- Mitigating Data Scarcity in Psychological Defense Classification with Context-Aware Synthetic Augmentation
Psychological defense mechanisms (PDMs) are unconscious cognitive processes that modulate how individuals perceive and respond to emotional distress. Automatically classifying PDMs from text is clinically valuable but severely hindered by data scarcity and class imbalance, challenges which generativ...
- Herculean: An Agentic Benchmark for Financial Intelligence
As AI agents improve, the central question is no longer whether they can solve isolated well-defined financial tasks, but whether they can reliably carry out financial professional work. Existing financial benchmarks offer only a partial view of this ability, as they primarily evaluate static compet...
- Dynamic Latent Routing
We investigate the temporal concatenation of sub-policies in Markov Decision Processes (MDPs) with time-varying reward functions. We introduce General Dijkstra Search (GDS), and prove that globally optimal goal-reaching policies can be recovered through temporal composition of intermediate optimal su...
- Minimal-Intervention KV Retention: A Design-Space Study and a Diversity-Penalty Survivor
KV-cache compression at small budgets is a crowded design space spanning cache representation, head-wise routing, compression cadence, decoding behavior, and within-budget scoring. We study seven mechanisms across these five families under matched mean cache on long-form mathematical reasoning (MATH...
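The "within-budget scoring" family named in the KV-retention entry can be made concrete with a generic baseline: keep the `budget` cached positions with the highest per-position scores (for example, accumulated attention mass) and evict the rest. This is a minimal sketch under that assumption, not the paper's diversity-penalty mechanism, whose details are truncated here; `retain_topk` and its score input are hypothetical.

```python
import heapq


def retain_topk(keys, values, scores, budget):
    """Generic score-based KV retention (illustrative baseline, not the paper's method).

    keys, values: per-position cache entries (any sequence type)
    scores: one importance score per position, e.g. accumulated attention mass
    budget: maximum number of positions to keep
    Returns the surviving keys, values, and their original indices.
    """
    n = len(scores)
    if budget >= n:
        kept = list(range(n))  # under budget: nothing to evict
    else:
        # Pick the `budget` highest-scoring positions...
        kept = heapq.nlargest(budget, range(n), key=lambda i: scores[i])
        # ...then restore positional order so the cache stays sequential.
        kept.sort()
    return [keys[i] for i in kept], [values[i] for i in kept], kept


k, v, idx = retain_topk(list("abcdef"), list(range(6)),
                        [0.1, 0.9, 0.3, 0.8, 0.05, 0.7], budget=3)
print(idx, k, v)
```

With a budget of 3, the positions scoring 0.9, 0.8, and 0.7 survive, so the call keeps indices `[1, 3, 5]`. Scoring rules of this kind are the baseline that design-space studies like the one above vary.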
- MetaMoE: Diversity-Aware Proxy Selection for Privacy-Preserving Mixture-of-Experts Unification
Mixture-of-Experts (MoE) models scale capacity by combining specialized experts, but most existing approaches assume centralized access to training data. In practice, data are distributed across clients and cannot be shared due to privacy constraints, making unified MoE training challenging. We prop...
- Exploring Vision-Language Models for Online Signature Verification: A Zero-Shot Capability Study
Recent advancements in Vision-Language Models (VLMs) have demonstrated strong capabilities in general visual reasoning, yet their applicability to rigorous biometric tasks remains unexplored. This work presents an exploratory study evaluating the zero-shot performance of state-of-the-art VLMs (GPT-5...
- Learning Direct Control Policies with Flow Matching for Autonomous Driving
We present a flow-matching planner for autonomous driving that directly outputs actionable control trajectories defined by acceleration and curvature profiles. The model is conditioned on a bird's-eye-view (BEV) raster of the surrounding scene and generates control sequences in a small number of Ord...
- Can Visual Mamba Improve AI-Generated Image Detection? An In-Depth Investigation
In recent years, computer vision has witnessed remarkable progress, fueled by the development of innovative architectures such as Convolutional Neural Networks (CNNs), Generative Adversarial Networks (GANs), diffusion-based architectures, Vision Transformers (ViTs), and, more recently, Vision-Langua...
- Understanding Imbalanced Forgetting in Rehearsal-Based Class-Incremental Learning
Neural networks suffer from catastrophic forgetting in class-incremental learning (CIL) settings. Rehearsal (replaying a subset of past samples) is a well-established mitigation strategy. However, recent results suggest that, despite balanced rehearsal allocation, some ...
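For the class-incremental learning entry, "balanced rehearsal allocation" can be illustrated with a replay memory that splits its budget evenly across the classes seen so far and keeps a uniform random subset of each class via reservoir sampling. This is a generic sketch under those assumptions; `BalancedReplayBuffer` is a hypothetical name, not the paper's code.

```python
import random


class BalancedReplayBuffer:
    """Class-balanced rehearsal memory (illustrative sketch, hypothetical name)."""

    def __init__(self, capacity, seed=0):
        self.capacity = capacity
        self.rng = random.Random(seed)
        self.store = {}  # class label -> stored samples
        self.seen = {}   # class label -> number of samples observed so far

    def per_class_quota(self):
        # Balanced allocation: equal slots for every class seen so far.
        return self.capacity // max(1, len(self.store))

    def add(self, x, y):
        if y not in self.store:
            self.store[y] = []
            self.seen[y] = 0
        self.seen[y] += 1
        quota = self.per_class_quota()
        # A new class lowers the quota: shrink older classes by keeping a
        # random subset (shuffle, then truncate), which stays uniform.
        for items in self.store.values():
            if len(items) > quota:
                self.rng.shuffle(items)
                del items[quota:]
        bucket = self.store[y]
        if len(bucket) < quota:
            bucket.append(x)
        else:
            # Reservoir sampling: the n-th sample replaces a slot with prob. quota / n.
            j = self.rng.randrange(self.seen[y])
            if j < quota:
                bucket[j] = x

    def sample(self, k):
        # Draw a rehearsal mini-batch uniformly from the whole memory.
        pool = [(x, y) for y, items in self.store.items() for x in items]
        return self.rng.sample(pool, min(k, len(pool)))


buf = BalancedReplayBuffer(capacity=6, seed=1)
for task, label in enumerate(["a", "b", "c"]):
    for i in range(100):
        buf.add((label, i), task)
print({y: len(items) for y, items in buf.store.items()})
```

With capacity 6, the buffer holds 6 samples of the first class, 3 per class once the second class arrives, and 2 per class after the third, so every class gets the same share of replay slots.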
- MonoPRIO: Adaptive Prior Conditioning for Unified Monocular 3D Object Detection
Monocular 3D object detection remains challenging because metric size and depth are underdetermined by single-view evidence, particularly under occlusion, truncation, and projection-induced scale-depth ambiguity. Although recent methods improve depth and geometric reasoning, metric size remains unst...
- EARL: Towards a Unified Analysis-Guided Reinforcement Learning Framework for Egocentric Interaction Reasoning and Pixel Grounding
Understanding human–environment interactions from egocentric vision is essential for assistive robotics and embodied intelligent agents, yet existing multimodal large language models (MLLMs) still struggle with accurate interaction reasoning and fine-grained pixel grounding. To this end, this paper...
- Breaking Dual Bottlenecks: Evolving Unified Multimodal Models into Self-Adaptive Interleaved Visual Reasoners
Recent unified models integrate multimodal understanding and generation within a single framework. However, an "understanding-generation gap" persists, where models can capture user intent but often fail to translate this semantic knowledge into precise pixel-level manipulation. This gap results in ...
- StyleTextGen: Style-Conditioned Multilingual Scene Text Generation
Style-conditioned scene text generation faces unique challenges in extracting precise text styles from complex backgrounds and maintaining fine-grained style consistency across characters, especially for multilingual scripts. We propose StyleTextGen, a novel framework that learns to perceive and rep...
- EponaV2: Driving World Model with Comprehensive Future Reasoning
Data scaling plays a pivotal role in the pursuit of general intelligence. However, the prevailing perception-planning paradigm in autonomous driving relies heavily on expensive manual annotations to supervise trajectory planning, which severely limits its scalability. Conversely, although existing p...
- Are Candidate Models Really Needed for Active Learning?
Deep learning has profoundly impacted domains such as computer vision and natural language processing by uncovering complex patterns in vast datasets. However, the reliance on extensive labeled data poses significant challenges, including resource constraints and annotation errors, particularly in t...
- Beyond Instance-Level Self-Supervision in 3D Multi-Modal Medical Imaging
Self-supervised pre-training methods in medical imaging typically treat each individual as an isolated instance, learning representations through augmentation-based objectives or masked reconstruction. They often do not adequately capitalize on a key characteristic of physiological features: anatomi...
- Deep Image Segmentation via Discriminant Feature Learning
Accurate image segmentation remains challenging, particularly in generating sharp, confident boundaries. While modern architectures have advanced the field, many of them still rely on standard loss functions like Cross-Entropy and Dice, which often neglect the discriminative structure of learned fea...
- Towards Accurate Single Panoramic 3D Detection: A Semantic Gaussian Centric Approach
Three-dimensional object detection in panoramic imagery is crucial for comprehensive scene understanding, yet accurately mapping 2D features to 3D remains a significant challenge. Prevailing methods often project 2D features onto discrete 3D grids, which break geometric continuity and limit represen...
- MechVerse: Evaluating Physical Motion Consistency in Video Generation Models
Text- and image-conditioned video generation models have achieved strong visual fidelity and temporal coherence, but they often fail to generate motion governed by kinematic and geometric constraints. In these settings, object parts must remain rigid, maintain contact or coupling with neighboring co...
- Multi-proposal Collaboration and Multi-task Training for Weakly-supervised Video Moment Retrieval
This study focuses on weakly-supervised Video Moment Retrieval (VMR), aiming to identify a moment semantically similar to the given query within an untrimmed video using only video-level correspondences, without relying on temporal annotations during training. Previous methods either aggregate predi...
- HDRFace: Rethinking Face Restoration with High-Dimensional Representation
Face restoration under complex degradations remains an ill-posed inverse problem due to severe information loss. Although diffusion models benefit from strong generative priors, most methods still condition only on low-quality inputs, making it difficult to recover identity-critical details un...
- The Velocity Deficit: Initial Energy Injection for Flow Matching
While Flow Matching theoretically guarantees constant-velocity trajectories, we identify a critical breakdown in high-dimensional practice: the Velocity Deficit. We show that the MSE objective systematically underestimates velocity magnitude, causing generated samples to fail to reach the data manif...
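For context on the Velocity Deficit entry: with the common linear path x_t = (1 - t) x_0 + t x_1, each noise/data pair has a constant target velocity x_1 - x_0, but an MSE-trained network regresses toward the conditional mean E[x_1 - x_0 | x_t], and averaging vectors that point in different directions shrinks their magnitude. The toy sketch below shows that shrinkage at t = 0. It illustrates a general property of MSE regression, not the paper's own analysis, and the 1-D setup (Gaussian noise x_0, data at +1 or -1) is an assumption.

```python
import random


def velocity_magnitudes(n=100_000, seed=0):
    """Compare mean |target velocity| with mean |MSE-optimal prediction| at t = 0.

    Assumed toy setup: noise x0 ~ N(0, 1), data x1 = +1 or -1 with equal
    probability, linear path x_t = (1 - t) * x0 + t * x1.
    At t = 0, x_t = x0 carries no information about x1, so the MSE-optimal
    velocity is E[x1 - x0 | x0] = E[x1] - x0 = -x0.
    """
    rng = random.Random(seed)
    target_sum = 0.0
    pred_sum = 0.0
    for _ in range(n):
        x0 = rng.gauss(0.0, 1.0)
        x1 = rng.choice((-1.0, 1.0))
        target_sum += abs(x1 - x0)  # per-pair (conditional) target velocity
        pred_sum += abs(-x0)        # conditional-mean velocity an MSE fit converges to
    return target_sum / n, pred_sum / n


target, pred = velocity_magnitudes()
print(f"mean |target| = {target:.3f}, mean |prediction| = {pred:.3f}")
```

Analytically, the target magnitude here is about 1.17 while the MSE-optimal prediction's magnitude is about 0.80 (sqrt(2/pi)), so the regressed velocity field is systematically shorter than the per-pair targets, which is the kind of magnitude shortfall the abstract describes.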
- Probing into Camera Control of Video Models
Video is a rich and scalable source of 3D/4D visual observations, and camera control is a key capability for video generation models to produce geometrically meaningful content. Existing approaches typically learn a mapping from camera motion to video using additional camera modules and paired data....
- SuperADD: Training-free Class-agnostic Anomaly Segmentation (CVPR 2026 VAND 4.0 Workshop Challenge, Industrial Track)
Visual anomaly detection (AD) for industrial inspection is a highly relevant task in modern production environments. The problem becomes particularly challenging when training and deployment data differ due to changes in acquisition conditions during production. In the VAND 4.0 Industrial Track, mod...
- COAL: Counterfactual and Observation-Enhanced Alignment Learning for Discriminative Referring Multi-Object Tracking
Referring Multi-Object Tracking (RMOT) faces a fundamental structural contradiction between the high-discriminability demand and the sparse semantic supervision. This mismatch is particularly acute in highly homogeneous scenarios that require fine-grained discrimination over complex compositional se...
- BioHuman: Learning Biomechanical Human Representations from Video
Understanding human motion beyond surface kinematics is crucial for motion analysis, rehabilitation, and injury risk assessment. However, progress in this domain is limited by the lack of large-scale datasets with biomechanical annotations, and by existing approaches that cannot directly infer inter...
- Video-Zero: Self-Evolution Video Understanding
Self-evolution offers a promising path for improving reasoning models without relying on intensive human annotation. However, extending this paradigm to video understanding remains underexplored and challenging: videos are long, dynamic, and redundant, while the evidence needed for reasoning is ofte...
- UMo: Unified Sparse Motion Modeling for Real-Time Co-Speech Avatars
Speech-driven gestures and facial animations are fundamental to expressive digital avatars in games, virtual production, and interactive media. However, existing methods are either limited to a single modality for audio motion alignment, failing to fully utilize the potential of massive human motion...
- Deepheem
Evidence-backed AML & compliance investigations
- AirDoc AI
What if invoices just... showed up in Xero?
- Nexalearn
AI Mentor for UPSC & Competitive Exam Success
- LangXP
Your free AI language exam grader for multiple languages
- CoverRoute
Generate tailored cover letters from your CV
- Lawcrative
All-in-one legal software for AI-powered practice management
- LangProtect Guardia
Prompt injection’s least favorite product.
- Tokavy — Less Tokens. Better Answers.
Your AI prompts, refined in a keystroke
- Selint
Sales meeting assistant with real-time suggestions for teams
- Penta-V Kernel
Sub-nanosecond (845 ps) Rust safety kernel for AI alignment.
- Visual Prompt Builder for GPT — EmojiPT
Build better prompts in ChatGPT with clicks, not typing
- OpenInterpretability
Open-source toolkit to audit what your LLM knows
- overlandai
Generate real-world adventures with AI in seconds
- RedactEngine
Batch-blur sensitive information in images with offline AI
- X402AgentPay
x402 commerce layer for AI agents — beyond raw HTTP payments
- Mindmesh
AI workspace for focused work and modern teams
- BrandAuditor
Brand auditor for India: trademark, domain, name, social, SEO, AI
- Cloe
A realistic AI character living on your desktop
- BAIN tools
Finding the best AIs
- AI Prism
Gathering the latest AI news from around the world
- TubeFilter
AI reads YouTube so you don't have to
- MiVE: Multiscale Vision-language features for reference-guided video Editing
Reference-guided video editing takes a source video, a text instruction, and a reference image as inputs, requiring the model to faithfully apply the instructed edits while preserving original motion and unedited content. Existing methods fall into two paradigms, each with inherent limitations: deco...