AI News Archive: May 18, 2026 — Part 15

Sourced from 500+ daily AI sources, scored by relevance.

Advancing Narrative Long Video Generation via Training-Free Identity-Aware Memory
Autoregressive video generation has improved rapidly in visual fidelity and interactivity, but it still suffers from long-term inconsistency and memory degradation. Most existing solutions either compress historical frames using predefined strategies or retrieve keyframes based on coarse implicit at...
📄 Research | May 18, 2026 | http://arxiv.org/abs/2605.18733v1
Robo-Cortex: A Self-Evolving Embodied Agent via Dual-Grain Cognitive Memory and Autonomous Knowledge Induction
The ability to navigate and interact with complex environments is central to real-world embodied agents, yet navigation in unseen environments remains challenging due to "experiential amnesia," where existing trajectory-driven or reactive policies fail to synthesize generalizable strategies from pas...
📄 Research | May 18, 2026 | http://arxiv.org/abs/2605.18729v1
CMAG: Concept-Scaffolded Retrieval for Marketplace Avatar Generation
Metaverse platforms rely on creator-driven marketplaces where avatars are assembled from discrete, taxonomy-labeled 3D assets (e.g., tops, bottoms, shoes, accessories) under strict category and topology constraints. While users increasingly expect free-form text control, text-only retrieval is britt...
📄 Research | May 18, 2026 | http://arxiv.org/abs/2605.18680v1
Dance Across Shifts: Forward-Facilitation Continual Test-Time Adaptation through Dynamic Style Bridging
Continual Test-Time Adaptation (CTTA) aims to empower perception systems to handle dynamic distribution shifts encountered after deployment. Existing methods predominantly follow a backward-alignment paradigm, which rigidly aligns incoming data with supervisory surrogates derived from the source dom...
📄 Research | May 18, 2026 | http://arxiv.org/abs/2605.18608v1
Starve to Perceive: Taming Lazy Perception in VLMs with Constrained Visual Bandwidth
Vision-Language Models (VLMs) deployed as situated agents in high-resolution visual environments require active perception -- the ability to dynamically decide where to look through operations like zooming, cropping, and panning. However, current training paradigms produce models that mimic the surf...
📄 Research | May 18, 2026 | http://arxiv.org/abs/2605.18603v1
Incantation: Natural Language as the Action Interface for Multi-Entity Video World Models
Modern interactive video world models have achieved impressive visual fidelity, yet lack fine-grained multi-entity control and cross-entity, cross-world generalization. We trace this gap to the action interface: standard control protocols (e.g. animation IDs, device inputs, scene-level captions) bin...
📄 Research | May 18, 2026 | http://arxiv.org/abs/2605.18601v1
Resolving Representation Ambiguity in Feedforward Novel View Synthesis Transformer via Semantic-Spatial Decoupling
Transformer-based models have advanced feedforward novel view synthesis (NVS). Current architectures such as GS-LRM and LVSM mix semantic information (e.g., RGB) and spatial information (e.g., Plücker rays) into a shared feature space. Since Plücker rays naturally carry lattice-like spatial structur...
📄 Research | May 18, 2026 | http://arxiv.org/abs/2605.18599v1
OmniPro: A Comprehensive Benchmark for Omni-Proactive Streaming Video Understanding
Omni-proactive streaming video understanding, i.e., autonomously deciding when to speak and what to say from continuous audio-visual streams, is an emerging capability of omni-modal large language models. Existing benchmarks fall short in three key aspects: they rely primarily on visual signals, ado...
📄 Research | May 18, 2026 | http://arxiv.org/abs/2605.18577v1
Beyond Morphology: Quantifying the Diagnostic Power of Color Features in Cancer Classification
In histopathology, human experts primarily rely on color as a means of enhancing contrast to interpret tissue morphology, whereas machine vision models process color as raw statistical information. This distinction raises a fundamental question: to what extent can pixel intensity alone, independent ...
📄 Research | May 18, 2026 | http://arxiv.org/abs/2605.18522v1
Weakly Supervised Cross-Modal Learning for 4D Radar Scene Flow Estimation
Due to the difficulty of obtaining ground-truth data for 4D radar scene flow estimation, previous methods typically rely on either self-supervised losses or cross-modal supervision using 3D LiDAR data, 2D images, and odometry. However, self-supervised approaches often yield suboptimal results due to...
📄 Research | May 18, 2026 | http://arxiv.org/abs/2605.18507v1
InstructAV2AV: Instruction-Guided Audio-Video Joint Editing
Recent diffusion-based methods have achieved impressive progress in video content manipulation. However, they typically ignore the accompanying audio, leaving the audio disjointed from the edited results. In this paper, we propose InstructAV2AV, the first end-to-end framework for instruction-guided ...
📄 Research | May 18, 2026 | http://arxiv.org/abs/2605.18467v1
Perceptron Mk1
Frontier video reasoning for the physical world
🧰 Tools | May 18, 2026 | https://www.producthunt.com/products/perceptron-mk1?utm_campaign=producthunt-api&utm_medium=api-v2&utm_source=Application%3A+the500feed+%28ID%3A+283491%29
Code-as-Room: Generating 3D Rooms from Top-Down View Images via Agentic Code Synthesis
Designing realistic and functional 3D indoor rooms is essential for a wide range of applications, including interior design, virtual reality, gaming, and embodied AI. While recent MLLM-based approaches have shown great potential for 3D room synthesis from textual descriptions or reference images, te...
📄 Research | May 18, 2026 | http://arxiv.org/abs/2605.18451v1
NeRF-based Spacecraft Reconstruction from Close-Range Monocular Imagery Under Illumination Variability and Pose Uncertainty
Autonomous rendezvous and proximity operations around uncooperative, unknown spacecraft are critical for active debris removal and on-orbit servicing missions. A key component of such operations is the offline reconstruction of a 3D model of the target from a set of 2D images. This task is challengi...
📄 Research | May 18, 2026 | http://arxiv.org/abs/2605.18447v1
Cracks in the Foundation: A Civil Infrastructure Dataset to Challenge Vision Foundation Models
Automated structural health monitoring is essential to prevent catastrophic infrastructure failures. Precise, pixel-level defect segmentation is needed to accurately assess structural integrity, but progress in defect segmentation for civil infrastructures has been held back by an extreme scarcity o...
📄 Research | May 18, 2026 | http://arxiv.org/abs/2605.18413v1
NEWTON: Agentic Planning for Physically Grounded Video Generation
Video generation models produce visually compelling results but systematically violate physical commonsense -- on VideoPhy-2, the best model achieves only 32.6% joint accuracy. We identify a specification bottleneck: text prompts are lossy compression of the physical world, omitting the parameters t...
📄 Research | May 18, 2026 | http://arxiv.org/abs/2605.18396v1
GeoFlow: Enforcing Implicit Geometric Consistency in Video Generation
Generating geometrically consistent videos remains an open challenge: text-to-video diffusion models trained on web-scale data treat geometry only implicitly, leading to object deformation, texture drift, and non-rigid backgrounds under camera motion. Existing solutions either improve consistency as...
📄 Research | May 18, 2026 | http://arxiv.org/abs/2605.18365v1
Can Adaptive Gradient Methods Converge under Heavy-Tailed Noise? A Case Study of AdaGrad
Many tasks in modern machine learning are observed to involve heavy-tailed gradient noise during the optimization process. To manage this realistic and challenging setting, new mechanisms, such as gradient clipping and gradient normalization, have been introduced to ensure the convergence of first-o...
📄 Research | May 18, 2026 | http://arxiv.org/abs/2605.18694v1
Can machine learning for quantum-gas experiments be explainable?
Virtually all aspects of many-body atomic physics are challenging: experiments are technically demanding, datasets have become enormous, and the memory and CPU requirements for classical simulation of generic quantum systems often scale exponentially with system size. Machine learning (ML) methods a...
📄 Research | May 18, 2026 | http://arxiv.org/abs/2605.18689v1
A No-Defense Defense Against Gradient-Based Adversarial Attacks on ML-NIDS: Is Less More?
Gradient-based adversarial attacks subtly manipulate inputs of Machine Learning (ML) models to induce incorrect predictions. This paper investigates whether careful architectural choices alone can yield an inherently robust Deep Neural Network (DNN)-based Network Intrusion Detection System (NIDS), ...
📄 Research | May 18, 2026 | http://arxiv.org/abs/2605.18666v1
Efficient and Noise-Tolerant PAC Learning of Multiclass Linear Classifiers
Noise-tolerant PAC learning of linear models has been of central interest in the machine learning community since the last century. In recent years, many computationally efficient algorithms have been proposed for the problem of learning linear threshold functions under multiple noise models. Yet, when...
📄 Research | May 18, 2026 | http://arxiv.org/abs/2605.18662v1
Aligned Training: A Parameter-Free Method to Improve Feature Quality and Stability of Sparse Autoencoders (SAE)
Sparse autoencoders (SAEs) are one of the main methods to interpret the inner workings of deep neural networks (DNNs), decomposing activations into higher-dimensional features. However, they exhibit critical shortcomings where a large fraction of features are never activated and are unstable. Despit...
📄 Research | May 18, 2026 | http://arxiv.org/abs/2605.18629v1
Learning to Look Benign: Targeted Evasion of Malware Detectors via API Import Injection
Machine learning-based malware detectors are widely deployed in antivirus and endpoint detection systems, yet their reliance on static features makes them vulnerable to adversarial manipulation. This paper investigates whether a malware sample can be intentionally misclassified as a specific benign ...
📄 Research | May 18, 2026 | http://arxiv.org/abs/2605.18624v1
Perfect Parallelization in Mini-Batch SGD with Classical Momentum Acceleration
Accelerating stochastic gradient methods with classical momentum schemes, such as Polyak's heavy ball, has proven highly successful in training large-scale machine learning models, particularly when combined with the hardware acceleration of large mini-batch computations. Yet, the effect of classica...
📄 Research | May 18, 2026 | http://arxiv.org/abs/2605.18609v1
Pointwise Generalization in Deep Neural Networks
We address the fundamental question of why deep neural networks generalize by establishing a pointwise generalization theory for fully connected networks. This framework resolves long-standing barriers to characterizing the rich nonlinear feature-learning regime and builds a new statistical foundati...
📄 Research | May 18, 2026 | http://arxiv.org/abs/2605.18598v1
S2Aligner: Pair-Efficient and Transferable Pre-Training for Sparse Text-Attributed Graphs
Pre-training on text-attributed graphs (TAGs) is central to building transferable graph foundation models, where LLM-as-Aligner methods align graph and text representations through the semantic knowledge of large language models. However, these methods usually assume that node texts provide sufficie...
📄 Research | May 18, 2026 | http://arxiv.org/abs/2605.18579v1
Self-supervised local learning rules learn the hidden hierarchical structure of high-dimensional data
The brain learns abstract representations of high-dimensional sensory input, but the plasticity rules that enable such learning are unknown. We study biologically plausible algorithms on the Random Hierarchy Model (RHM), an artificial dataset designed to investigate how deep neural networks learn th...
📄 Research | May 18, 2026 | http://arxiv.org/abs/2605.18557v1
NonBioS.ai
The AI Software Dev with its own Computer
🧰 Tools | May 18, 2026 | https://www.producthunt.com/products/nonbios-ai?utm_campaign=producthunt-api&utm_medium=api-v2&utm_source=Application%3A+the500feed+%28ID%3A+283491%29
Protein Fold Classification at Scale: Benchmarking and Pretraining
Classifying protein topology is essential for deciphering biological function, but progress is held back by the lack of large-scale benchmarks that avoid duplicates and by models that do not scale well. We introduce TEDBench, a large-scale, non-redundant benchmark for protein fold classification con...
📄 Research | May 18, 2026 | http://arxiv.org/abs/2605.18552v1
SURGE: Approximation-free Training Free Particle Filter for Diffusion Surrogate
Diffusion-based generative models increasingly rely on inference-time guidance, adding a drift term or reweighting mixture of experts, to improve sample quality on task-specific objectives. However, most existing techniques require repeated score or gradient evaluations, introducing bias, high compu...
📄 Research | May 18, 2026 | http://arxiv.org/abs/2605.18745v1
Learned Memory Attenuation in Sage-Husa Kalman Filters for Robust UAV State Estimation
Unmanned Aerial Vehicles in dynamic environments face telemetry outages, structural vibrations, and regime-dependent noise that invalidate the stationary covariance assumptions of classical Kalman filters. The Sage-Husa Kalman Filter (SHKF) estimates noise statistics online, but its reliance on a st...
📄 Research | May 18, 2026 | http://arxiv.org/abs/2605.18704v1
Learning Normal Representations for Blood Biomarkers
Blood-based biomarkers underpin clinical diagnosis and management, yet their interpretation relies largely on fixed population reference intervals that ignore stable, intra-patient variability. As such, population-based interpretation can mask meaningful deviation from an individual's baseline, risk...
📄 Research | May 18, 2026 | http://arxiv.org/abs/2605.18701v1
Better Together: Evaluating the Complementarity of Earth Embedding Models
Earth embedding models transform Earth observation data into embeddings uniquely tied to locations on the Earth's surface. These models are typically evaluated in isolation, comparing the downstream task performance across different Earth embeddings. However, spatially aligned embeddings can natural...
📄 Research | May 18, 2026 | http://arxiv.org/abs/2605.18667v1
Physics-Aligned Canonical Equivariant Fourier Neural Operator under Symmetry-Induced Shifts
Neural operators approximate PDE solution maps, but they need not respect the symmetries of the governing equation. In out-of-distribution (OOD) regimes, a standard neural operator must often learn coordinate alignment and physical evolution within a single map, which can hurt generalization. We use...
📄 Research | May 18, 2026 | http://arxiv.org/abs/2605.18606v1
PACE: Geometry-Aware Bridge Transport for Single-Cell Trajectory Inference
Single-cell trajectory inference from destructive time-course snapshots is fundamentally ill-posed: neither cross-time cell correspondences nor continuous trajectories are observed, so the snapshot distributions alone do not uniquely determine the underlying dynamics. Existing optimal transport and ...
📄 Research | May 18, 2026 | http://arxiv.org/abs/2605.18587v1
scHelix: Asymmetric Dual-Stream Integration via Explicit Gene-Level Disentanglement
A critical challenge in single-cell RNA sequencing (scRNA-seq) integration is resolving the tension between eliminating batch effects and maintaining biological fidelity. While recent evidence indicates that batch effects manifest heterogeneously across genes, most existing methods process the trans...
📄 Research | May 18, 2026 | http://arxiv.org/abs/2605.18576v1
Probing for Representation Manifolds in Superposition
This paper introduces the Manifold Probe, a supervised method for discovering representation manifolds in superposition. The method generalizes linear regression probes by learning the space of features of a concept that can be linearly predicted from the representations, and then learning the direc...
📄 Research | May 18, 2026 | http://arxiv.org/abs/2605.18537v1
Beyond Scaling: Agents Are Heading to the Edge
The bottleneck of useful agentic intelligence has shifted from compressing world knowledge into a single model to executing a coordinated system. This position paper argues that personal-agent architecture must move to the edge because the core properties of agentic intelligence tasks, particularly ...
📄 Research | May 18, 2026 | http://arxiv.org/abs/2605.18535v1
Unified Walking, Running, and Recovery for Humanoids via State-Dependent Adversarial Motion Priors
We propose a unified reinforcement learning framework that enables a single policy to perform walking, running, and fall recovery on the Unitree G1 humanoid robot, validated on physical hardware without any explicit mode-switching command at deployment. The framework extends Adversarial Motion Prior...
📄 Research | May 18, 2026 | http://arxiv.org/abs/2605.18611v1
Qumus: Realization of An Embodied AI Quantum Material Experimentalist
While modern Large Language Models (LLMs) and agentic artificial intelligence (AI) have demonstrated transformative capabilities in digital domains, the realization of embodied AI capable of real-world scientific discovery remains a difficult frontier. The advancements are hindered by the inherent c...
📄 Research | May 18, 2026 | http://arxiv.org/abs/2605.18407v1
On Improving Multimodal Pedestrian Trajectory Prediction with CVAE: A Study on Benchmark and Robot Data
Accurate pedestrian trajectory prediction is crucial for autonomous systems operating in complex environments, such as modular buses and delivery robots in suburban or semi-structured areas. Social Spatio-Temporal Graph Convolutional Neural Networks (Social-STGCNN) have shown strong performance by m...
📄 Research | May 18, 2026 | http://arxiv.org/abs/2605.18262v1
4DLidarOpen: An Open 4D FMCW Lidar Dataset for Motion-Aware Autonomous Driving
We present 4DLidarOpen, a large-scale open multi-modal dataset for autonomous driving, centered on 4D frequency-modulated continuous-wave (FMCW) Lidar sensing. Unlike conventional time-of-flight Lidar datasets that mainly provide geometric measurements, 4DLidarOpen includes point-wise radial velocit...
📄 Research | May 18, 2026 | http://arxiv.org/abs/2605.18074v1
Bench2Drive-Robust: Benchmarking Closed-Loop Autonomous Driving under Deployment Perturbations
Robustness is a critical requirement for deploying autonomous driving systems in the real world. Existing robustness benchmarks for autonomous driving have made important progress in studying the effects of image-level corruptions, such as adverse weather or camera degradation, on perception modules...
📄 Research | May 18, 2026 | http://arxiv.org/abs/2605.18059v1
Transfer Learning for Customized Car Racing Environments
Transfer Learning, a technique where a model/agent can use the knowledge/expertise that it gained from one task and exploit that to solve another closely-related task, is often used in tackling problems in deep learning. Through this project, we explore transfer learning in the purview of deep reinf...
📄 Research | May 18, 2026 | http://arxiv.org/abs/2605.17928v1
CosFly-Track: A Large-Scale Multi-Modal Dataset for UAV Visual Tracking via Multi-Constraint Trajectory Optimization
Recent aerial vision-language navigation (VLN) datasets have grown rapidly, but they primarily address goal-oriented navigation to static destinations, leaving UAV visual tracking -- continuously following a moving target while maintaining visibility -- largely without dedicated training data. We in...
📄 Research | May 18, 2026 | http://arxiv.org/abs/2605.17776v1
Dexora: Open-source VLA for High-DoF Bimanual Dexterity
Vision-Language-Action (VLA) models have recently become a central direction in embodied AI, but current systems are restricted to either dual-gripper control or single-arm dexterous hand manipulation. While low-dimensional gripper control can often be handled with simpler methods, high-dimensional ...
📄 Research | May 18, 2026 | http://arxiv.org/abs/2605.18722v1
REBAR: Reference Ethical Benchmark for Autonomy Readiness
As autonomous systems grow more advanced, objective metrics to evaluate their ethical and legal compliance are critical for informing end users of their limitations and ensuring accountability of those who misuse them. Current ethical embodied AI frameworks remain mostly qualitative, focusing on sys...
📄 Research | May 18, 2026 | http://arxiv.org/abs/2605.18423v1
PH-Dreamer: A Physics-Driven World Model via Port-Hamiltonian Generative Dynamics
World models built on recurrent state space architectures enable efficient latent imagination, yet remain physically unstructured, producing dynamics that violate conservation and dissipative principles. We introduce a unified Port-Hamiltonian framework that remedies this through three synergistic m...
📄 Research | May 18, 2026 | http://arxiv.org/abs/2605.18303v1
StableVLA: Towards Robust Vision-Language-Action Models without Extra Data
It is infeasible to encompass all possible disturbances within the training dataset. This raises a critical question regarding the robustness of Vision-Language-Action (VLA) models when encountering unseen real-world visual disturbances, particularly under imperfect visual conditions. In this work, ...
📄 Research | May 18, 2026 | http://arxiv.org/abs/2605.18287v1
RGB-only Active 3D Scene Graph Generation for Indoor Mobile Robots
Current approaches to 3D scene graph generation rely on dedicated depth sensors, such as LiDAR or RGB-D cameras, for metric 3D reconstruction. This limits deployment to specialized robotic platforms and excludes settings where only RGB cameras are available, such as fixed external infrastructure. Ex...
📄 Research | May 18, 2026 | http://arxiv.org/abs/2605.18197v1