AI News Archive: May 19, 2026 — Part 28

Sourced from 500+ daily AI sources, scored by relevance.

RECIPE: Procedural Planning via Grounding in Instructional Video
Visual planning asks a model to generate the remaining steps of a procedure in natural language given a partial video context and a goal. Progress on this task is bottlenecked by annotation: clean labeled datasets are small, domain-narrow, and encode a single execution trajectory per example, even t...
📄 Research | May 19, 2026 | http://arxiv.org/abs/2605.19976v1
SphericalDreamer: Generating Navigable Immersive 3D Worlds with Panorama Fusion
The generation of immersive and navigable 3D environments is increasingly prevalent with the growing adoption of virtual reality and 3D content. However, recent methods face a fundamental limitation: they cannot produce 3D worlds that simultaneously (i) are navigable over long-range spatial extents ...
📄 Research | May 19, 2026 | http://arxiv.org/abs/2605.19974v1
Feed-Forward Gaussian Splatting from Sparse Aerial Views
Reconstructing large-scale urban scenes from sparse aerial views is a crucial yet challenging task. Due to biased top-down and shallow-oblique camera poses, sparse aerial captures exhibit strong evidence imbalance: roofs and open regions are repeatedly observed, while facades, distant buildings, and...
📄 Research | May 19, 2026 | http://arxiv.org/abs/2605.19949v1
Motion
A video agent for tasteful motion design
🧰 Tools | May 19, 2026 | https://www.producthunt.com/products/motion-8?utm_campaign=producthunt-api&utm_medium=api-v2&utm_source=Application%3A+the500feed+%28ID%3A+283491%29
GoTTA be Diverse: Rethinking Memory Policies for Test-Time Adaptation
Test-time adaptation (TTA) enables a pre-trained model to adapt online to an unlabeled test stream under distribution shift. While most TTA research focuses on the adaptation objective, practical streams also depend critically on the memory used to select which test samples drive adaptation. Existin...
📄 Research | May 19, 2026 | http://arxiv.org/abs/2605.19890v1
GLUT: 3D Gaussian Lookup Table for Continuous Color Transformation
3D Lookup Tables (3D LUTs) are widely used for color mapping, but their grid-based representation requires discretizing the RGB space, leading to a capacity-memory trade-off that becomes prohibitive when storing large numbers of LUTs. Recent approaches adopt implicit neural representations to improv...
📄 Research | May 19, 2026 | http://arxiv.org/abs/2605.19889v1
Structural Energy Guidance for View-Consistent Text-to-3D Generation
Text-to-3D generation based on diffusion models often suffers from the Janus problem, leading to inconsistent geometry across viewpoints. This work identifies viewpoint bias in 2D diffusion priors as the main cause and proposes Structural Energy-Guided Sampling (SEGS), a training-free and plug-and-p...
📄 Research | May 19, 2026 | http://arxiv.org/abs/2605.19876v1
Structured Layout Priors for Robust Out-of-Distribution Visual Document Understanding
Vision-Language Models (VLMs) parse documents end-to-end but frequently break down on layouts unlike those seen in training. We attribute this to a two-hop bottleneck: before the decoder can extract content (Hop 2), it must first classify and localize the enclosing layout entity (Hop 1), and when th...
📄 Research | May 19, 2026 | http://arxiv.org/abs/2605.19866v1
When Preference Labels Fall Short: Aligning Diffusion Models from Real Data
Preference alignment aims to guide generative models by learning from comparisons between preferred and non-preferred samples. In practice, most existing approaches rely on preference pairs constructed from model-generated images. Such supervision is inherently relative and can be ambiguous when bot...
📄 Research | May 19, 2026 | http://arxiv.org/abs/2605.19839v1
Stitched Value Model for Diffusion Alignment
For practical use, diffusion- or flow-based generative models must be aligned with task-specific rewards, such as prompt fidelity or aesthetic preference. That alignment is challenging because the reward is defined for clean output images, but the alignment procedure requires value function estimate...
📄 Research | May 19, 2026 | http://arxiv.org/abs/2605.19804v1
Synergistic Foundation Models for Semi-Supervised Fetal Cardiac Ultrasound Analysis: SAM-Med2D Boundary Refinement and DINOv3 Semantic Enhancement
We present a semi-supervised framework for joint segmentation and classification of fetal cardiac ultrasound images. Built upon the EchoCare multi-task backbone, our method integrates SAM-Med2D for boundary refinement and leverages DINOv3 to enhance pseudo-label quality. We introduce view-specific h...
📄 Research | May 19, 2026 | http://arxiv.org/abs/2605.19799v1
Depth2Pose: A Pose-Based Benchmark for Monocular Depth Estimation without Ground-Truth Depth
Monocular depth estimation has improved significantly in recent years, driven by increasingly powerful models and large-scale training data. Predicted depth is increasingly used as an input signal for downstream tasks such as Structure-from-Motion (SfM), visual localization, and SLAM. However, monoc...
📄 Research | May 19, 2026 | http://arxiv.org/abs/2605.19797v1
Mechanisms of Object Localization in Vision-Language Models
Visually-grounded language models (VLMs) are highly effective in linking visual and textual information, yet they often struggle with basic classification and localization tasks. While classification mechanisms have been studied more extensively, the processes that support object localization remain...
📄 Research | May 19, 2026 | http://arxiv.org/abs/2605.19792v1
When Does Model Collapse Occur in Structured Interactive Learning?
The proliferation of generative artificial intelligence has given rise to an interactive learning environment, where model parameters are continuously updated using not only data generated by natural processes, but also synthetic outputs produced by other models. This paradigm introduces two major c...
📄 Research | May 19, 2026 | http://arxiv.org/abs/2605.20151v1
TrajTok: Adaptive Spatial Tokenization for Trajectory Representation Learning
Learning generalizable trajectory representations from raw GPS traces remains difficult because the data is continuous, noisy, and irregularly sampled. Spatial tokenization is also challenging: fine grids yield sparse cells with weak embeddings, while coarse grids merge heterogeneous movement patter...
📄 Research | May 19, 2026 | http://arxiv.org/abs/2605.20134v1
Optimal Representation Size: High-Dimensional Analysis of Pretraining and Linear Probing
Learning to generalise from limited data is a fundamental challenge for both artificial and biological systems. A common strategy is to extract reusable structure from abundant unlabelled data, enabling efficient adaptation to new tasks from limited labelled data. This two-stage paradigm is now stan...
📄 Research | May 19, 2026 | http://arxiv.org/abs/2605.20105v1
Fine-Tuning Without Forgetting via Loss-Adaptive Learning Rates
Fine-tuning large language models on new data improves task performance but degrades capabilities learned during pretraining, a phenomenon known as catastrophic forgetting. Existing methods mitigate this by modifying the fine-tuning objective to suppress high-loss tokens or sequences, but these toke...
📄 Research | May 19, 2026 | http://arxiv.org/abs/2605.20005v1
Your Neighbors Know: Leveraging Local Neighborhoods for Backdoor Detection in Decentralized Learning
Decentralized learning (DL) is an emerging machine learning paradigm where nodes collaboratively train models without a central server. However, the collaborative nature of DL makes it vulnerable to backdoor attacks, where a model is taught to behave normally on standard inputs while executing hidde...
📄 Research | May 19, 2026 | http://arxiv.org/abs/2605.19969v1
Exploiting Non-Negativity in DAG Structure Learning
This work addresses the problem of learning directed acyclic graphs (DAGs) from nodal observations generated by a linear structural equation model. DAG learning is a central task in signal processing, machine learning, and causal inference, but it remains challenging because acyclicity is a global c...
📄 Research | May 19, 2026 | http://arxiv.org/abs/2605.19947v1
Variance-Reduced Manifold Sampling via Polynomial-Maximization Density Estimation
Uniform sampling on implicitly defined manifolds is a core primitive in motion planning, constrained simulation, and probabilistic machine learning. MASEM addresses this problem by entropy-maximizing resampling, but its resampling weights depend on a local k-nearest-neighbour density estimate whose ...
📄 Research | May 19, 2026 | http://arxiv.org/abs/2605.19938v1
JAXenstein: Accelerated Benchmarking for First-Person Environments
The progression of reinforcement learning algorithms has been driven by challenging benchmarks. The rate at which a researcher can iterate on a problem setting directly impacts the speed of algorithm development. Modern machine learning has produced tools that allow for fast and scalable algorithm ...
📄 Research | May 19, 2026 | http://arxiv.org/abs/2605.19926v1
Hierarchical Contrastive Learning for Multi-Domain Protein-Ligand Binding
Predicting protein-ligand binding affinity remains intractable for multi-domain proteins, where inter-domain dynamics govern molecular recognition. Existing geometric deep learning methods typically treat proteins as monolithic static graphs, suffering from rigid-body assumptions and aleatoric noise...
📄 Research | May 19, 2026 | http://arxiv.org/abs/2605.19902v1
Auditing Privacy in Multi-Tenant RAG under Account Collusion
Multi-tenant retrieval-augmented generation (RAG) services advertise per-account differential privacy as the operative leakage boundary: each account's queries are guaranteed to satisfy $(\varepsilon_{\text{acc}}, \delta_{\text{acc}})$-DP with respect to the index. We identify same-index multi-account co...
📄 Research | May 19, 2026 | http://arxiv.org/abs/2605.19847v1
Multi-axis Analysis of Image Manipulation Localization
Advanced image editing software enables easy creation of highly convincing image manipulations, which has been made even more accessible in recent years due to advances in generative AI. Manipulated images, while often harmless, could spread misinformation, create false narratives, and influence peo...
📄 Research | May 19, 2026 | http://arxiv.org/abs/2605.20174v1
SAGE: Scalable Automatic Gating Ensemble for Confident Negative Harvesting in Fraud Detection
Music streaming fraud, where bad actors artificially inflate stream counts to manipulate chart rankings and royalty payments, poses a significant threat to streaming services and legitimate content creators. Traditional fraud detection approaches struggle with a critical challenge: many legitimate e...
📄 Research | May 19, 2026 | http://arxiv.org/abs/2605.20157v1
Goal-Oriented Lower-Tail Calibration of Gaussian Processes for Bayesian Optimization
Bayesian optimization (BO) selects evaluation points for expensive black-box objectives using Gaussian process (GP) predictive distributions. Kernel choice and hyperparameter selection can lead to miscalibrated predictive distributions and an inappropriate exploration-exploitation trade-off. For min...
📄 Research | May 19, 2026 | http://arxiv.org/abs/2605.20145v1
Optimizing Computational-Statistical Runtime for Wasserstein Distance Estimation
Squared Wasserstein distance is a frequently used tool to measure discrepancy between probability distributions. This distance is typically computed between empirical measures of size $n$ from two underlying random samples. Unfortunately, even in lower dimensional Euclidean space problems $\left( d ...
📄 Research | May 19, 2026 | http://arxiv.org/abs/2605.20122v1
Towards Distillation Guarantees under Algorithmic Alignment for Combinatorial Optimization
Distillation transfers knowledge from a large model trained on broad data to a smaller, more efficient model suitable for deployment. In structured prediction settings, prior knowledge about the task can guide the choice of a target architecture that is algorithmically aligned with the underlying pr...
📄 Research | May 19, 2026 | http://arxiv.org/abs/2605.20074v1
Tail Annealing for Heavy-Tailed Flow Matching
Standard generative models struggle with heavy-tailed data: Lipschitz architectures cannot produce power-law tails from Gaussian noise, and interpolating between heavy-tailed data and Gaussians is ill-posed. We propose a simple fix: apply the soft-log transform $φ(x) = \mathrm{sign}(x) \cdot \log(1 ...
📄 Research | May 19, 2026 | http://arxiv.org/abs/2605.20068v1
Active Context Selection Improves Simple Regret in Contextual Bandits
We study the contextual multi-armed bandit problem with a finite context space (a.k.a. subpopulations), where the learner recommends a best action for each context and is evaluated by context-weighted simple regret. Our guarantees are worst-case over the reward distributions, while remaining instanc...
📄 Research | May 19, 2026 | http://arxiv.org/abs/2605.20040v1
When Critics Disagree: Adaptive Reward Poisoning Attacks in RIS-Aided Wireless Control System
Reward-poisoning attacks present a significant risk to learning-based wireless control systems. Given this, we propose a Disagreement-Guided Reward Poisoning (DGRP) adaptive attack on a Soft Actor-Critic (SAC) agent. In a Cognitive Radio Network (CRN) environment assisted by Reconfigurable Intellige...
📄 Research | May 19, 2026 | http://arxiv.org/abs/2605.20037v1
D$^3$-Subsidy: Online and Sequential Driver Subsidy Decision-Making for Large-Scale Ride-Hailing Market
Ride-hailing platforms like DiDi Chuxing operate in highly dynamic environments where balancing driver supply and passenger demand is critical. Although driver-side subsidies serve as a primary lever to align these forces and improve key KPIs like completed rides (Rides) and gross merchandi...
📄 Research | May 19, 2026 | http://arxiv.org/abs/2605.20036v1
CAMERA: Adapting to Semantic Camouflage in Unsupervised Text-Attributed Graph Fraud Detection
Text-attributed graph fraud detection (TAGFD) plays a critical role in preventing fraudulent activities on online social and e-commerce platforms. However, to evade detection, fraudsters continuously evolve their camouflaging strategies by deliberately mimicking textual responses of benign users, th...
📄 Research | May 19, 2026 | http://arxiv.org/abs/2605.20032v1
Take It or Leave It: Intent-Controlled Partial Optimal Transport
While optimal transport (OT) enforces a rigid constraint by requiring two measures to be matched exactly, partial optimal transport relaxes this requirement by allowing mass to remain unmatched through a global budget, scalar rebate, or uniform rejection rule. However, many applications call for mor...
📄 Research | May 19, 2026 | http://arxiv.org/abs/2605.20030v1
Training-Free Bayesian Filtering with Generative Emulators
Bayesian filtering is a well-known problem that aims to estimate plausible states of a dynamical system from observations. Among existing approaches to solve this problem, particle filters are theoretically exact for non-linear dynamics and observations, but suffer from poor scalability in high dime...
📄 Research | May 19, 2026 | http://arxiv.org/abs/2605.20028v1
Beyond Binary Success: A Diagnostic Meta-Evaluation Framework for Fine-Grained Manipulation
Fine-grained manipulation marks a regime where global scene context no longer suffices, and success hinges on the tight coupling of local attribute grounding, high-fidelity spatial perception, and constraint-respecting motor execution. However, current embodied AI benchmarks collapse these capacitie...
📄 Research | May 19, 2026 | http://arxiv.org/abs/2605.19986v1
Normative Networks for Source Separation via Local Plasticity and Dendritic Computation
Blind source separation (BSS) is a natural framework for studying how latent causes may be recovered from sensory mixtures, but deriving online and biologically plausible algorithms for structured (i.e., constrained to known domains) and potentially correlated sources remains challenging. Recent wor...
📄 Research | May 19, 2026 | http://arxiv.org/abs/2605.19965v1
Learning Orthonormal Bases for Function Spaces
Infinite-dimensional orthonormal basis expansions play a central role in representing and computing with function spaces due to their favorable linear algebraic properties. However, common bases such as Fourier or wavelets are fixed and do not adapt to the structure of a given problem or dataset. In...
📄 Research | May 19, 2026 | http://arxiv.org/abs/2605.19959v1
RoHIL: Robust Human-in-the-Loop Robotic Reinforcement Learning Against Illumination Variations
Human-in-the-loop reinforcement learning systems achieve near-perfect success on the workstation where they are trained, but collapse when the same robot is moved to a workstation a few meters away due to shifts in the visual input distribution caused by new lamp positions and window light. Re-colle...
📄 Research | May 19, 2026 | http://arxiv.org/abs/2605.19924v1
Beyond Action Residuals: Real-World Robot Policy Steering via Bottleneck Latent Reinforcement Learning
Pretrained imitation policies have become a strong foundation for robot manipulation, but they often require online improvement to overcome execution errors, limited dataset coverage, and deployment mismatch. A central question is therefore how reinforcement learning (RL) should adapt policies after...
📄 Research | May 19, 2026 | http://arxiv.org/abs/2605.19919v1
Trajectory Planning and Control near the Limits: an Open Experimental Benchmark on the RoboRacer Platform
We present a modular framework to benchmark new and existing methods for trajectory planning and control in high-acceleration maneuvers that push autonomous driving to the limits. Our framework includes time-optimal raceline generation, online time-optimal velocity replanning, geometric path trackin...
📄 Research | May 19, 2026 | http://arxiv.org/abs/2605.19881v1
Beyond Imitation: Learning Safe End-to-End Autonomous Driving from Hard Negatives
Existing imitation learning methods for end-to-end autonomous driving predominantly learn from successful demonstrations by minimizing geometric deviations from expert trajectories. This paradigm implicitly assumes that spatial proximity implies behavioral safety, leading to a critical objective mis...
📄 Research | May 19, 2026 | http://arxiv.org/abs/2605.19771v1
D-CLING: Prior-Preserving Depth-Conditioned Fine-Tuning for Navigation Foundation Models
Navigation Foundation Models (NFMs) trained on large cross-embodied datasets have demonstrated powerful generalizability in various scenarios. Adopting in-domain fine-tuning for an NFM efficiently calibrates the visuomotor policy, promising further improvement even in a novel scenario. However, the ...
📄 Research | May 19, 2026 | http://arxiv.org/abs/2605.19690v1
HEAT: Heterogeneous End-to-End Autonomous Driving via Trajectory-Guided World Models
End-to-end autonomous driving has emerged as a compelling alternative to traditional modular pipelines by directly mapping raw sensor data to driving actions. While recent approaches achieve strong performance on single-domain datasets, their performance degrades significantly when trained jointly a...
📄 Research | May 19, 2026 | http://arxiv.org/abs/2605.19631v1
Implicit Action Chunking for Smooth Continuous Control
Reinforcement learning often produces high-frequency oscillatory control signals that undermine the safety and stability required for physical deployment. Explicit action chunking addresses this by predicting fixed-horizon trajectories but scales the policy output dimension proportionally with the h...
📄 Research | May 19, 2026 | http://arxiv.org/abs/2605.19592v1
SafeAlign-VLA: A Negative-Enhanced Safe Alignment Framework for Risk-Aware Autonomous Driving
End-to-end autonomous driving systems excel in common scenarios but struggle with safety-critical long-tail cases. Vision-Language-Action (VLA) models are promising due to their strong reasoning capabilities. However, most VLA-based approaches rely on positive expert demonstrations, rarely exploitin...
📄 Research | May 19, 2026 | http://arxiv.org/abs/2605.19524v1
ARC-RL: A Reinforcement Learning Playground Inspired by ARC Raiders
Reinforcement learning for legged locomotion has matured into a stack of multi-component reward functions and physics-engine benchmarks whose morphologies are uniformly derived from real commercial hardware. Game NPCs, however, are bound by stylistic constraints absent from sim-to-real robotics and ...
📄 Research | May 19, 2026 | http://arxiv.org/abs/2605.19503v1
Sampling-Based Safe Reinforcement Learning
Safe exploration remains a fundamental challenge in reinforcement learning (RL), limiting the deployment of RL agents in the real world. We propose Sampling-Based Safe Reinforcement Learning (SBSRL), a model-based RL algorithm that maintains safety throughout the learning process by enforcing constr...
📄 Research | May 19, 2026 | http://arxiv.org/abs/2605.19469v1
Domain-Adaptive Communication-Rate Optimization for Sim-to-Real Humanoid-Robot Wireless XR Teleoperation
Wireless extended reality (XR) teleoperation provides embodied interaction capability for collecting humanoid robot demonstrations, but large-scale adoption is restricted by the overhead of high-frequency motion transmission. This paper develops a system framework that integrates sampling, trans...
📄 Research | May 19, 2026 | http://arxiv.org/abs/2605.19293v1
Graph Neural Planning and Predictive Control for Multi-Robot Communication-Constrained Unlabeled Motion Planning
The multi-robot unlabeled motion planning problem of concurrently assigning robots to goals and generating safe trajectories is central in many collaborative tasks. Recent Graph Neural Network methods offer scalable decentralized solutions but rely on simplified dynamics and simulation environments,...
📄 Research | May 19, 2026 | http://arxiv.org/abs/2605.19209v1