AI News Archive: May 13, 2026 — Part 23

Sourced from 500+ daily AI sources, scored by relevance.

Harnessing Agentic Evolution
Agentic evolution has emerged as a powerful paradigm for improving programs, workflows, and scientific solutions by iteratively generating candidates, evaluating them, and using feedback to guide future search. However, existing methods are typically instantiated either as fixed hand-designed proced...
📄 Research | May 13, 2026 | http://arxiv.org/abs/2605.13821v1
Uncertainty-Driven Anomaly Detection for Psychotic Relapse Using Smartwatches: Forecasting and Multi-Task Learning Fusion
Digital phenotyping enables continuous passive monitoring of behavior and physiology, offering a promising paradigm for early detection of psychotic relapse. In this work, we develop and systematically study two smartwatch-based frameworks for daily relapse detection. The first forecasts cardiac dyn...
📄 Research | May 13, 2026 | http://arxiv.org/abs/2605.13816v1
Di-BiLPS: Denoising induced Bidirectional Latent-PDE-Solver under Sparse Observations
Partial differential equations (PDEs) are fundamental for modeling complex natural and physical phenomena. In many real-world applications, however, observational data are extremely sparse, which severely limits the applicability of both classical numerical solvers and existing neural approaches. Wh...
📄 Research | May 13, 2026 | http://arxiv.org/abs/2605.13790v1
Force-Aware Neural Tangent Kernels for Scalable and Robust Active Learning of MLIPs
Active learning for machine-learning interatomic potentials (MLIPs) must address several challenges to be practical: scaling to large candidate pools, leveraging energy-force supervision, and maintaining robustness when candidate pools are biased relative to the target distribution. In this work, we...
📄 Research | May 13, 2026 | http://arxiv.org/abs/2605.13788v1
Attention Once Is All You Need: Efficient Streaming Inference with Stateful Transformers
Conventional transformer inference engines are request-driven, paying an O(n) prefill cost on every query. In streaming workloads, where data arrives continuously and queries probe an ever-growing context, this cost is prohibitive. We introduce a data-driven computational model centred on stateful s...
📄 Research | May 13, 2026 | http://arxiv.org/abs/2605.13784v1
Toward AI-Driven Digital Twins for Metropolitan Floods: A Conditional Latent Dynamics Network Surrogate of the Shallow Water Equations
AI-driven flood digital twins demand fast hydrodynamic surrogates for ensemble forecasting and observation assimilation. Yet even GPU-accelerated two-dimensional shallow water equation (SWE) solvers still require $\sim 55$ minutes per $96$-hour run on a $\sim 4.2$-million-active-cell metropolitan ba...
📄 Research | May 13, 2026 | http://arxiv.org/abs/2605.13761v1
Min Generalized Sliced Gromov Wasserstein: A Scalable Path to Gromov Wasserstein
We propose min Generalized Sliced Gromov-Wasserstein (min-GSGW), a sliced formulation for the Gromov-Wasserstein (GW) problem using expressive generalized slicers. The key idea is to learn coupled nonlinear slicers that assign compatible push-forward values to both input measures, so that monotone...
📄 Research | May 13, 2026 | http://arxiv.org/abs/2605.13753v1
GHGbench: A Unified Multi-Entity, Multi-Task Benchmark for Carbon Emission Prediction
Open datasets and benchmarks for entity-level carbon-emission prediction remain fragmented across access, scale, granularity, and evaluation. We introduce GHGbench, an open dataset and benchmark for company- and building-level greenhouse-gas prediction. The company track contains 32,000+ company-yea...
📄 Research | May 13, 2026 | http://arxiv.org/abs/2605.13743v1
Learning POMDP World Models from Observations with Language-Model Priors
Whether navigating a building, operating a robot, or playing a game, an agent that acts effectively in an environment must first learn an internal model of how that environment works. Partially-observable Markov decision processes (POMDPs) provide a flexible modeling class for such internal world mo...
📄 Research | May 13, 2026 | http://arxiv.org/abs/2605.13740v1
Robust and Explainable Bicuspid Aortic Valve Diagnosis Using Stacked Ensembles on Echocardiography
Transthoracic echocardiography (TTE) is the first-line imaging modality for diagnosing bicuspid aortic valve (BAV), yet diagnostic performance varies with operator expertise and image quality. We developed an explainable AI model that distinguishes BAV from tricuspid aortic valves (TAV) using routin...
📄 Research | May 13, 2026 | http://arxiv.org/abs/2605.13730v1
Polyhedral Instability Governs Regret in Online Learning
Many online decision problems over combinatorial actions are addressed via convex relaxations, leading to online convex optimization with piecewise linear objectives and induced polyhedral structure. We show that regret in such problems is governed by polyhedral instability: the number of cha...
📄 Research | May 13, 2026 | http://arxiv.org/abs/2605.13692v1
MedCore: Boundary-Preserving Medical Core Pruning for MedSAM
Medical segmentation foundation models such as SAM and MedSAM provide strong prompt-driven segmentation, but their image encoders are still too large for many clinical settings. Compression is also risky in medicine because a model can keep high Dice while losing boundary fidelity. We propose MedCor...
📄 Research | May 13, 2026 | http://arxiv.org/abs/2605.13688v1
Scale-Sensitive Shattering: Learnability and Evaluability at Optimal Scale
We study the optimal scale at which real-valued function classes exhibit uniform convergence and learnability. Our main result establishes a scale-sensitive generalization of the fundamental theorem of PAC learning: for every bounded real-valued class and every $γ>0$, uniform convergence at scale $γ...
📄 Research | May 13, 2026 | http://arxiv.org/abs/2605.13684v1
Sampling from Flow Language Models via Marginal-Conditioned Bridges
Flow Language Models (FLMs) are a recently introduced class of language models which adapt continuous flow matching for one-hot encoded token sequences. Their denoisers have a special structure absent from generic continuous diffusion models: each block of the denoising mean is a posterior marginal ...
📄 Research | May 13, 2026 | http://arxiv.org/abs/2605.13681v1
Three-Stage Learning Unlocks Strong Performance in Simple Models for Long-Term Time Series Forecasting
Recent studies on long-term time series forecasting have shown that simple linear models and MLP-based predictors can achieve strong performance without increasingly complex architectures. However, many competitive baselines still rely on structural priors such as frequency-domain modeling, explicit...
📄 Research | May 13, 2026 | http://arxiv.org/abs/2605.13678v1
Learning Responsibility-Attributed Adversarial Scenarios for Testing Autonomous Vehicles
Establishing trustworthy safety assurance for autonomous driving systems (ADSs) requires evidence that failures arise from avoidable system deficiencies rather than unavoidable traffic conflicts. Current adversarial simulation methods can efficiently expose collisions, but generally lack mechanisms ...
📄 Research | May 13, 2026 | http://arxiv.org/abs/2605.13751v1
Causality-Aware End-to-End Autonomous Driving via Ego-Centric Joint Scene Modeling
End-to-end autonomous driving, which bypasses traditional modular pipelines by directly predicting future trajectories from sensor inputs, has recently achieved substantial progress. However, existing methods often overlook the causal inter-dependencies in ego-vehicle planning, ignoring the reciproc...
📄 Research | May 13, 2026 | http://arxiv.org/abs/2605.13646v1
RotVLA: Rotational Latent Action for Vision-Language-Action Model
Latent Action Models (LAMs) have emerged as an effective paradigm for handling heterogeneous datasets during Vision-Language-Action (VLA) model pretraining, offering a unified action space across embodiments. However, existing LAMs often rely on discrete quantization encode and decode pipelines, whi...
📄 Research | May 13, 2026 | http://arxiv.org/abs/2605.13403v1
Trajectory-Level Data Augmentation for Offline Reinforcement Learning
We propose a data augmentation method for offline reinforcement learning, motivated by active positioning problems. In particular, our approach enables the training of off-policy models from a limited number of suboptimal trajectories. We introduce a trajectory-based augmentation technique that explo...
📄 Research | May 13, 2026 | http://arxiv.org/abs/2605.13401v1
ERPPO: Entropy Regularization-based Proximal Policy Optimization
Multi-Agent Proximal Policy Optimization (MAPPO) is a variant of the Proximal Policy Optimization (PPO) algorithm, specifically tailored for multi-agent reinforcement learning (MARL). MAPPO optimizes cooperative multi-agent settings by employing a centralized critic with decentralized actors. Howeve...
📄 Research | May 13, 2026 | http://arxiv.org/abs/2605.13131v1
What to Ignore, What to React: Visually Robust RL Fine-Tuning of VLA Models
Reinforcement learning (RL) fine-tuning has shown promise for Vision-Language-Action (VLA) models in robotic manipulation, but deployment-time visual shifts pose practical challenges. A key difficulty is that standard task rewards supervise task success, but offer limited guidance on whether a visua...
📄 Research | May 13, 2026 | http://arxiv.org/abs/2605.13105v1
OmniLiDAR: A Unified Diffusion Framework for Multi-Domain 3D LiDAR Generation
LiDAR scene generation is increasingly important for scalable simulation and synthetic data creation, especially under diverse sensing conditions that are costly to capture at scale. Typically, diffusion-based LiDAR generators are developed under single-domain settings, requiring separate models for...
📄 Research | May 13, 2026 | http://arxiv.org/abs/2605.13815v1
LMPath: Language-Mediated Priors and Path Generation for Aerial Exploration
Traditional autonomous UAV search missions rely on geometric coverage patterns that ignore the semantic context of the target, leading to significant time waste in large-scale environments. In this paper we present LMPath, a pipeline for generating language-mediated exploration priors for Unmanned A...
📄 Research | May 13, 2026 | http://arxiv.org/abs/2605.13782v1
Realtime-VLA FLASH: Speculative Inference Framework for Diffusion-based VLAs
Diffusion-based vision-language-action models (dVLAs) are promising for embodied intelligence but are fundamentally limited in real-time deployment by the high latency of full inference. We propose Realtime-VLA FLASH, a speculative inference framework that eliminates most full inference calls during...
📄 Research | May 13, 2026 | http://arxiv.org/abs/2605.13778v1
RoboEvolve: Co-Evolving Planner-Simulator for Robotic Manipulation with Limited Data
The scalability of robotic manipulation is fundamentally bottlenecked by the scarcity of task-aligned physical interaction data. While vision-language models (VLMs) and video generation models (VGMs) hold promise for autonomous data synthesis, they suffer from semantic-spatial misalignment and physi...
📄 Research | May 13, 2026 | http://arxiv.org/abs/2605.13775v1
FrameSkip: Learning from Fewer but More Informative Frames in VLA Training
Vision-Language-Action (VLA) policies are commonly trained from dense robot demonstration trajectories, often collected through teleoperation, by sampling every recorded frame as if it provided equally useful supervision. We argue that this convention creates a temporal supervision imbalance: long l...
📄 Research | May 13, 2026 | http://arxiv.org/abs/2605.13757v1
TinySDP: Real Time Semidefinite Optimization for Certifiable and Agile Edge Robotics
Semidefinite programming (SDP) provides a principled framework for convex relaxations of nonconvex geometric constraints in motion planning, yet existing solvers are too computationally expensive for real-time control, particularly on resource-constrained embedded systems. To address this gap, we in...
📄 Research | May 13, 2026 | http://arxiv.org/abs/2605.13748v1
LEXI-SG: Monocular 3D Scene Graph Mapping with Room-Guided Feed-Forward Reconstruction
Scene graphs are becoming a standard representation for robot navigation, providing hierarchical geometric and semantic scene understanding. However, most scene graph mapping methods rely on depth cameras or LiDAR sensors. In this work, we present LEXI-SG, the first dense monocular visual mapping sy...
📄 Research | May 13, 2026 | http://arxiv.org/abs/2605.13741v1
Guide, Think, Act: Interactive Embodied Reasoning in Vision-Language-Action Models
In this paper, we propose GTA-VLA (Guide, Think, Act), an interactive Vision-Language-Action (VLA) framework that enables spatially steerable embodied reasoning by allowing users to guide robot policies with explicit visual cues. Existing VLA models learn a direct "Sense-to-Act" mapping from multimod...
📄 Research | May 13, 2026 | http://arxiv.org/abs/2605.13632v1
AttenA+: Rectifying Action Inequality in Robotic Foundation Models
Existing robotic foundation models, while powerful, are predicated on an implicit assumption of temporal homogeneity: treating all actions as equally informative during optimization. This "flat" training paradigm, inherited from language modeling, remains indifferent to the underlying physical hiera...
📄 Research | May 13, 2026 | http://arxiv.org/abs/2605.13548v1
Integration of an Agent Model into an Open Simulation Architecture for Scenario-Based Testing of Automated Vehicles
Simulative and scenario-based testing are crucial methods in the safety assurance for automated driving systems. To ensure that simulation results are reliable, the real world must be modeled with sufficient fidelity, including not only the static environment but also the surrounding traffic of a ve...
📄 Research | May 13, 2026 | http://arxiv.org/abs/2605.13539v1
Uncertainty-Aware 3D Position Refinement for Multi-UAV Systems
Reliable real-time 3D localization is essential for multi-UAV navigation, collision avoidance, and coordinated flight, yet onboard estimates can degrade under GNSS multipath, non-line-of-sight reception, vertical drift, and intentional interference. This paper presents a decentralized, lightweight 3...
📄 Research | May 13, 2026 | http://arxiv.org/abs/2605.13500v1
CUBic: Coordinated Unified Bimanual Perception and Control Framework
Recent advances in visuomotor policy learning have enabled robots to perform control directly from visual inputs. Yet, extending such end-to-end learning from single-arm to bimanual manipulation remains challenging due to the need for both independent perception and coordinated interaction between a...
📄 Research | May 13, 2026 | http://arxiv.org/abs/2605.13452v1
SID: Sliding into Distribution for Robust Few-Demonstration Manipulation
Generalizing robotic manipulation across object poses, viewpoints, and dynamic disturbances is difficult, especially with only a few demonstrations. End-to-end visuomotor policies are expressive but data-hungry, while planning and optimization satisfy explicit constraints but do not directly capture...
📄 Research | May 13, 2026 | http://arxiv.org/abs/2605.13428v1
BlockVLA: Accelerating Autoregressive VLA via Block Diffusion Finetuning
While autoregressive (AR) Vision-Language-Action (VLA) models have demonstrated formidable reasoning capabilities in robotic tasks, their sequential decoding process often incurs high inference latency and may amplify error accumulation during long-horizon execution. Discrete Diffusion Language Mode...
📄 Research | May 13, 2026 | http://arxiv.org/abs/2605.13382v1
HCSG: Human-Centric Semantic-Geometric Reasoning for Vision-Language Navigation
VLN has achieved remarkable progress by scaling data and model capacity. However, the assumption of a static environment breaks down in real-world indoor scenarios, where robots inevitably encounter dynamic pedestrians. Existing human-aware approaches typically treat humans merely as moving obstacle...
📄 Research | May 13, 2026 | http://arxiv.org/abs/2605.13321v1
SECOND-Grasp: Semantic Contact-guided Dexterous Grasping
Achieving reliable robotic manipulation, such as dexterous grasping, requires a synergy between physically stable interactions and semantic task guidance, yet these objectives are often treated as separate, disjoint goals. In this paper, we investigate how to integrate dexterous grasping techniques,...
📄 Research | May 13, 2026 | http://arxiv.org/abs/2605.13117v1
TouchAnything: A Dataset and Framework for Bimanual Tactile Estimation from Egocentric Video
Egocentric human video data, which captures rich human-environment interactions and can be collected at scale, has become a key driver of embodied intelligence research. However, existing egocentric datasets typically lack tactile sensing, a critical modality that provides direct cues about contact,...
📄 Research | May 13, 2026 | http://arxiv.org/abs/2605.13083v1
When Absolute State Fails: Evaluating Proprioceptive Encodings for Robust Manipulation
As end-to-end robotic policies are progressively deployed in the real world to solve real tasks, they face a gap between the training and inference conditions. Scaling the amount and diversity of the training data has shown some success in improving zero-shot generalization, yet robots still fail wh...
📄 Research | May 13, 2026 | http://arxiv.org/abs/2605.13067v1
MUJICA: Multi-skill Unified Joint Integration of Control Architecture for Wheeled-Legged Robots
Wheeled-legged robots hold promise for traversing complex terrains and offer superior mobility compared to legged robots. However, wheeled-legged robots must effectively balance both wheeled driving and legged control. Furthermore, due to noisy proprioceptive sensing and real-world motor constraints...
📄 Research | May 13, 2026 | http://arxiv.org/abs/2605.13058v1
Local Conformal Calibration of Dynamics Uncertainty from Semantic Images
We introduce Observation-aware Conformal Uncertainty Local-Calibration (OCULAR), a conformal prediction-based algorithm that uses perception information to provide uncertainty quantification guarantees for unseen test-time environments. While previous conformal approaches lack the ability to discrim...
📄 Research | May 13, 2026 | http://arxiv.org/abs/2605.13028v1
Distributionally Robust Safety Under Arbitrary Uncertainties: A Safety Filtering Approach
In this work, we study how to ensure probabilistic safety for nonlinear systems under distributional ambiguity. Our approach builds on a backup-based safety filtering framework that switches between a high-performance nominal policy and a certified backup policy to ensure safety. To handle arbitrary...
📄 Research | May 13, 2026 | http://arxiv.org/abs/2605.12974v1
DynoJEPP: Joint Estimation, Prediction and Planning in Dynamic Environments
DynoJEPP is a factor-graph-based framework that jointly formulates and simultaneously optimizes estimation, prediction, and planning in dynamic environments. In conventional factor-graph-based approaches that jointly formulate estimation, prediction, and planning, information from prediction and pla...
📄 Research | May 13, 2026 | http://arxiv.org/abs/2605.12897v1
Learning Perturbations to Extrapolate Your LLM
Recent advancements in large language models demonstrate that injecting perturbations can substantially enhance extrapolation performance. However, current approaches often rely on discrete perturbations with fixed designs, which limits their flexibility. In this work, we propose a framework where t...
📄 Research | May 13, 2026 | http://arxiv.org/abs/2605.13284v1
Unified generalization analysis for physics informed neural networks
Physics-Informed Neural Networks (PINNs) and their variational counterparts (VPINNs) are neural networks that incorporate physical laws, making them useful for scientific problems. Existing generalization analyses for PINNs and VPINNs remain limited, often requiring restrictive assumptions such as s...
📄 Research | May 13, 2026 | http://arxiv.org/abs/2605.13260v1
Amortized Neural Clustering of Time Series based on Statistical Features
This paper introduces an algorithm-agnostic approach to feature-based time series clustering via amortized neural inference. By training neural networks to approximate the optimal partitioning rule from simulated data, the proposed framework reduces reliance on conventional clustering methods, such ...
📄 Research | May 13, 2026 | http://arxiv.org/abs/2605.13128v1
Adaptive Kernel Density Estimation with Pre-training
Density estimation in high-dimensional settings is an important and challenging statistical problem. Traditional methods based on kernel smoothing are inefficient in high dimensions due to the difficulties in specifying appropriate location-adaptive kernels. In this work, we introduce pre-training, a...
📄 Research | May 13, 2026 | http://arxiv.org/abs/2605.13092v1
Coreset-Induced Conditional Velocity Flow Matching
We propose Coreset-Induced Conditional Velocity Flow Matching (CCVFM), a generative model that augments hierarchical rectified flow with a data-informed source distribution. Hierarchical flow matching models the full conditional velocity law in velocity space, but its inner flow is asked to transpor...
📄 Research | May 13, 2026 | http://arxiv.org/abs/2605.12951v1
When Should an AI Workflow Release? Always-Valid Inference for Black-Box Generate-Verify Systems
LLM-enabled AI workflows increasingly produce outputs through iterative generate-evaluate-revise loops. Each iteration can improve the candidate, but it also creates a release decision: when to stop and output the current result? This raises a statistical challenge because deployment-time evaluator ...
📄 Research | May 13, 2026 | http://arxiv.org/abs/2605.12947v1
The Sample Complexity of Multiple Change Point Identification under Bandit Feedback
We study multiple change point localization under bandit feedback. An unknown piecewise-constant function on a compact interval can be queried sequentially at adaptively chosen inputs, and each query returns a noisy evaluation of the function. The goal is to identify a prescribed number of discontin...
📄 Research | May 13, 2026 | http://arxiv.org/abs/2605.13252v1