AI News Archive: May 19, 2026 — Part 26

Sourced from 500+ daily AI sources, scored by relevance.

Google increases SynthID access to identify AI-generated content
The National
🌐 Moves · May 19, 2026 · https://www.thenationalnews.com/future/technology/2026/05/19/google-synthid-identify-ai-content/
Google SynthID comes to Chrome, Search, and ChatGPT. Users can right-click to check for AI content.
At Google I/O 2026, the company announced it is expanding its SynthID digital watermark, and OpenAI announced it is adopting it too.
🌐 Moves · May 19, 2026 · https://mashable.com/article/google-openai-synthid-google-io-2026
Atoms of Thought: Universal EEG Representation Learning with Microstates
Learning universal representations from electroencephalogram (EEG) signals is a cutting-edge approach in the field of neuroinformatics and brain-computer interfaces (BCIs). Conventionally, EEG is treated as a multivariate temporal signal, where time- or frequency-domain features are extracted for re...
📄 Research · May 19, 2026 · http://arxiv.org/abs/2605.20182v1
A Methodology for Selecting and Composing Runtime Architecture Patterns for Production LLM Agents
Production LLM agents combine stochastic model outputs with deterministic software systems, yet the boundary between the two is rarely treated as a first-class architectural object. This paper names that boundary the stochastic-deterministic boundary (SDB): a four-part contract among a proposer, ver...
📄 Research · May 19, 2026 · http://arxiv.org/abs/2605.20173v1
Not Every Rubric Teaches Equally: Policy-Aware Rubric Rewards for RLVR
Reinforcement learning with verifiable rewards has made post-training highly effective when correctness can be checked automatically. However, many important model behaviors require satisfying several qualitative criteria at once. Rubric-based rewards address this setting by grading prompt-specific ...
📄 Research · May 19, 2026 · http://arxiv.org/abs/2605.20164v1
Less Back-and-Forth: A Comparative Study of Structured Prompting
Large language models (LLMs) are widely used for open-ended tasks, but underspecified prompts can lead to low-quality answers and additional interaction. This paper studies whether structured prompt design improves response quality while reducing user effort. We compare three prompt conditions: a ra...
📄 Research · May 19, 2026 · http://arxiv.org/abs/2605.20149v1
Draft Less, Retrieve More: Hybrid Tree Construction for Speculative Decoding
Speculative decoding (SD) accelerates large language model inference by leveraging a draft-then-verify paradigm. To maximize the acceptance rate, recent methods construct expansive draft trees, which unfortunately incur severe VRAM bandwidth and computational overheads that bottleneck end-to-end spe...
📄 Research · May 19, 2026 · http://arxiv.org/abs/2605.20104v1
ThoughtTrace: Understanding User Thoughts in Real-World LLM Interactions
Conversational AI has now reached billions of users, yet existing datasets capture only what people say, not what they think. We introduce ThoughtTrace, the first large-scale dataset that pairs real-world multi-turn human-AI conversations with users' self-reported thoughts: their reasons for sendin...
📄 Research · May 19, 2026 · http://arxiv.org/abs/2605.20087v1
What Do Evolutionary Coding Agents Evolve?
Recent work pairs LLMs with evolutionary search to iteratively generate, modify, and select code using task-specific feedback. These systems have produced strong results in mathematical discovery and algorithm design, yet a fundamental question remains: what do they actually evolve? Progress is typi...
📄 Research · May 19, 2026 · http://arxiv.org/abs/2605.20086v1
BalanceRAG: Joint Risk Calibration for Cascaded Retrieval-Augmented Generation
Large language models (LLMs) can enhance factuality via retrieval-augmented generation (RAG), but applying RAG to every query is unnecessary when the model-only answer is reliable. This motivates cascaded RAG: each query is first handled by an LLM-only branch, escalated to a RAG fallback only if the...
📄 Research · May 19, 2026 · http://arxiv.org/abs/2605.20084v1
VL-DPO: Vision-Language-Guided Finetuning for Preference-Aligned Autonomous Driving
The rapid growth of autonomous driving datasets has enabled the scaling of powerful motion forecasting models. While large-scale pretraining provides strong performance, the standard imitation objective may not fully capture the complex nuances of human driving preferences. Meanwhile, recent advance...
📄 Research · May 19, 2026 · http://arxiv.org/abs/2605.20082v1
CopT: Contrastive On-Policy Thinking with Continuous Spaces for General and Agentic Reasoning
Chain-of-thought (CoT) is a standard approach for eliciting reasoning capabilities from large language models (LLMs). However, the common CoT paradigm treats thinking as a prerequisite for answering, which can delay access to plausible answers and incur unnecessary token costs even when the model is...
📄 Research · May 19, 2026 · http://arxiv.org/abs/2605.20075v1
Probing Embodied LLMs: When Higher Observation Fidelity Hurts Problem Solving
Large Language Models are increasingly proposed as cognitive components for robotic systems, yet their opaque decision processes make it difficult to explain success or failure in closed-loop embodied tasks. Following an empirical AI methodology, we study embodied LLM agents behaviorally by varying ...
📄 Research · May 19, 2026 · http://arxiv.org/abs/2605.20072v1
Towards LLM-Assisted Architecture Recovery for Real-World ROS 2 Systems: An Agent-Based Multi-Level Approach to Hierarchical Structural Architecture Reconstruction
Explicit software architecture models are essential artifacts for communicating, analyzing, and evolving complex software-intensive systems. In ROS 2-based robotic systems, however, structural (de-)composition and integration semantics are often only implicitly encoded across distributed artifacts s...
📄 Research · May 19, 2026 · http://arxiv.org/abs/2605.20055v1
PromptRad: Knowledge-Enhanced Multi-Label Prompt-Tuning for Low-Resource Radiology Report Labeling
Automatic report labeling facilitates the identification of clinical findings from unstructured text and enables large-scale annotation for medical imaging research. Existing rule-based labelers struggle with the diverse descriptions in clinical reports, while fine-tuning pre-trained language models...
📄 Research · May 19, 2026 · http://arxiv.org/abs/2605.20052v1
When Skills Don't Help: A Negative Result on Procedural Knowledge for Tool-Grounded Agents in Offensive Cybersecurity
Agent Skills, structured packages of procedural knowledge loaded into an LLM agent at inference time, are widely reported to improve task pass rates by an average of 16.2 percentage points across diverse domains. Yet the same benchmarks show wide variance, with 16 of 84 tasks suffering negative delt...
📄 Research · May 19, 2026 · http://arxiv.org/abs/2605.20023v1
Training Neural Networks with Optimal Double-Bayesian Learning
Backpropagation with gradient descent is a common optimization strategy employed by most neural network architectures in machine learning. However, finding optimal hyperparameters to guide training has proven challenging. While it is widely acknowledged that selecting appropriate parameters is cruci...
📄 Research · May 19, 2026 · http://arxiv.org/abs/2605.20009v1
LLM Benchmark Datasets Should Be Contamination-Resistant
Benchmark datasets are critical for reproducible, reliable, and discriminative evaluation of LLMs. However, recent studies reveal that many benchmark datasets are included in pretraining corpora, i.e., contaminated, which diminishes their value as reliable measures of model generalization...
📄 Research · May 19, 2026 · http://arxiv.org/abs/2605.19999v1
Block-Sphere Vector Quantization
Vector quantization is a fundamental primitive for scalable machine learning systems, enabling memory-efficient storage, fast retrieval, and compressed inference. Recent rotation-based quantizers such as EDEN, RabitQ, and TurboQuant have introduced strong guarantees and empirical performance, but th...
📄 Research · May 19, 2026 · http://arxiv.org/abs/2605.19972v1
Detecting Fluent Optimization-Based Adversarial Prompts via Sequential Entropy Changes
Optimization-based adversarial suffixes can jailbreak aligned large language models (LLMs) while remaining fluent, weakening static and windowed perplexity-based detectors. We cast adversarial suffix detection as an online change-point detection problem over the token-level next-token entropy stream...
📄 Research · May 19, 2026 · http://arxiv.org/abs/2605.19966v1
A Measure-Theoretic Analysis of Reasoning: Structural Generalization and Approximation Limits
While empirical scaling laws for LLM reasoning are well-documented, the theoretical mechanisms governing out-of-distribution (OOD) generalization remain elusive. We formalize reasoning via optimal transport, projecting discrete trajectories into a continuous metric space to quantify domain shifts us...
📄 Research · May 19, 2026 · http://arxiv.org/abs/2605.19944v1
Probabilistic Tiny Recursive Model
Tiny Recursive Models (TRM) solve complex reasoning tasks with a fraction of the parameters of modern large language models (LLMs) by iteratively refining a latent state and final answer. While powerful, their deterministic recursion can lead to convergence at suboptimal solutions, without escape me...
📄 Research · May 19, 2026 · http://arxiv.org/abs/2605.19943v1
PEEK: Context Map as an Orientation Cache for Long-Context LLM Agents
Large language model (LLM) agents increasingly operate over long and recurring external contexts, like document corpora and code repositories. Across invocations, existing approaches preserve either the agent's trajectory, passive access to raw material, or task-level strategies. None of them preser...
📄 Research · May 19, 2026 · http://arxiv.org/abs/2605.19932v1
Fast and Featureless Node Representation Learning with Partial Pairwise Supervision
We introduce Contrastive FUSE, a fast and unified framework for scalable node representation learning in graphs with partially available pairwise node labels and no available node features. Unlike existing methods, we directly optimize a spectral contrastive objective that integrates community-aware...
📄 Research · May 19, 2026 · http://arxiv.org/abs/2605.19916v1
StableGrad: Backward Scale Control without Batch Normalization
Training very deep neural networks requires controlling the propagation of magnitudes across depth. Without such control, activations and gradients may vanish, explode, or enter unstable regimes that make optimization fail. Modern architectures often mitigate this problem through Batch Normalization...
📄 Research · May 19, 2026 · http://arxiv.org/abs/2605.19856v1
A Framework for Evaluating Zero-Shot Image Generation in Concept-based Explainability
Concept-based Explainable Artificial Intelligence (XAI) interprets deep learning models using human-understandable visual features (e.g., textures or object parts) by linking internal representations to class predictions, thereby bridging the gap between low-level image data and high-level semantics...
📄 Research · May 19, 2026 · http://arxiv.org/abs/2605.19855v1
HaorFloodAlert: Deseasonalized ML Ensemble for 72-Hour Flood Prediction in Bangladesh Haor Wetlands
Flash floods in Bangladesh's haor wetlands show up with almost no warning. They wreck the annual boro rice harvest. Current setups, built for riverine floods, miss backwater dynamics entirely. These basins are flat. Water does not behave like it does on the Brahmaputra. We built HaorFloodAlert, a ...
📄 Research · May 19, 2026 · http://arxiv.org/abs/2605.20167v1
Rethinking Visual Attribution for Chest X-ray Reasoning in Large Vision Language Models
Large Vision Language Models (LVLMs) show promise in medical applications, but their inability to faithfully ground responses in visual evidence raises serious concerns about clinical trustworthiness. While visual attribution methods are widely used to explain LVLM predictions, whether these explana...
📄 Research · May 19, 2026 · http://arxiv.org/abs/2605.20158v1
Beyond Prediction Accuracy: Target-Space Recovery Profiles for Evaluating Model-Brain Alignment
Artificial vision models are often evaluated against the human visual cortex by measuring how accurately their internal representations predict brain responses. However, prediction accuracy alone does not indicate which dimensions of the target brain's response space are recovered. Here, we introduc...
📄 Research · May 19, 2026 · http://arxiv.org/abs/2605.20127v1
Using Aristotle API for AI-Assisted Theorem Proving in Lean 4: A Formalisation Case Study of the Grasshopper Problem
AI-assisted theorem proving can now generate substantial Lean developments for olympiad-level mathematics, but the evidential status of such developments depends on which declarations are actually verified. This paper reports a Lean 4 formalization case study of an Aristotle API proof attempt for th...
📄 Research · May 19, 2026 · http://arxiv.org/abs/2605.20120v1
Toto 2.0: Time Series Forecasting Enters the Scaling Era
We show that time series foundation models scale: a single training recipe produces reliable forecast-quality improvements from 4M to 2.5B parameters. We release Toto 2.0, a family of five open-weights forecasting models trained under this recipe. The Toto 2.0 family sets a new state of the art on t...
📄 Research · May 19, 2026 · http://arxiv.org/abs/2605.20119v1
k-Inductive Neural Barrier Certificates for Unknown Nonlinear Dynamics
While conventional (k=1) discrete-time barrier certificate conditions impose strict safety constraints by requiring the function to be non-increasing at every step, k-inductive barrier certificates relax this by allowing a temporary increase (up to k-1 times, each within a threshold ε) while m...
📄 Research · May 19, 2026 · http://arxiv.org/abs/2605.20108v1
Beyond Isotropy in JEPAs: Hamiltonian Geometry and Symplectic Prediction
JEPAs often regularize one-view embeddings toward an isotropic Gaussian, implicitly baking Euclidean symmetry into the representation. We show that this is not merely a benign default. For a known structured downstream geometry $H\succ0$, the minimax and maximum-entropy covariance under a Hamiltonia...
📄 Research · May 19, 2026 · http://arxiv.org/abs/2605.20107v1
Neurosymbolic Learning for Inference-Time Argumentation
Claim verification is an important problem in high-stakes settings, including health and finance. When information underpinning claims is incomplete or conflicting, uncertain answers may be more appropriate than binary true or false classifications. In all cases, faithful explanations of the conside...
📄 Research · May 19, 2026 · http://arxiv.org/abs/2605.20098v1
INSHAPE: Instance-Level Shapelets for Interpretable Time-Series Classification
Discovering shapelets (i.e., discriminative temporal patterns within time series) has been widely studied to address the inherent complexity of time-series classification (TSC) and to make model decision-making processes more transparent. However, existing methods primarily focus on population-l...
📄 Research · May 19, 2026 · http://arxiv.org/abs/2605.20088v1
Probability-Conserving Flow Guidance
Diffusion and flow-based generative models dominate visual synthesis, with guidance aligning samples to user input and improving perceptual quality. However, Classifier-Free Guidance (CFG) and extrapolation-based methods are heuristic linear combinations of velocities/scores that ignore the generati...
📄 Research · May 19, 2026 · http://arxiv.org/abs/2605.20079v1
AutoResearchClaw: Self-Reinforcing Autonomous Research with Human-AI Collaboration
Automating scientific discovery requires more than generating papers from ideas. Real research is iterative: hypotheses are challenged from multiple perspectives, experiments fail and inform the next attempt, and lessons accumulate across cycles. Existing autonomous research systems often model this...
📄 Research · May 19, 2026 · http://arxiv.org/abs/2605.20025v1
GeoX: Mastering Geospatial Reasoning Through Self-Play and Verifiable Rewards
Geospatial reasoning requires solving image-grounded problems over the complex spatial structure of a scene. However, developing this capability is hindered by the cost of annotating a vast and combinatorial question space. We propose GeoX, a self-play framework that acquires spatial logic through e...
📄 Research · May 19, 2026 · http://arxiv.org/abs/2605.20006v1
A Case for Agentic Tuning: From Documentation to Action in PostgreSQL
Documentation has long guided computer system tuning by distilling expert knowledge into per-parameter recommendations. Yet such guides capture only what experts conclude, discarding how they reason. This fundamental gap manifests in three concrete deficiencies: documentation grows stale as software...
📄 Research · May 19, 2026 · http://arxiv.org/abs/2605.19988v1
Learning with Foresight: Enhancing Neural Routing Policy via Multi-Node Lookahead Prediction
Neural policies have shown promise in solving vehicle routing problems due to their reduced reliance on handcrafted heuristics. However, current training paradigms suffer from a fundamental limitation: they primarily focus on next-node prediction for solution construction, resulting in myopic decisi...
📄 Research · May 19, 2026 · http://arxiv.org/abs/2605.19975v1
World-Ego Modeling for Long-Horizon Evolution in Hybrid Embodied Tasks
World models are widely explored in embodied intelligence, yet they typically predict distinct evolutions of the world and the ego within a single stream, where the world captures persistent instruction-agnostic scene regularities and the ego captures robot-centric instruction-conditioned dynamics. ...
📄 Research · May 19, 2026 · http://arxiv.org/abs/2605.19957v1
GEM: GPU-Variability-Aware Expert to GPU Mapping for MoE Systems
Mixture-of-Expert (MoE) models enable efficient inference by employing smaller experts and activating only a subset of them per token. MoE serving engines distribute experts across multiple GPUs and route tokens to appropriate GPUs at inference time based on experts activated. They process tokens in...
📄 Research · May 19, 2026 · http://arxiv.org/abs/2605.19945v1
Robotics-Inspired Guardrails for Foundation Models in Socially Sensitive Domains
Foundation models are increasingly deployed in socially sensitive domains such as education, mental health, and caregiving, where failures are often cumulative and context-dependent. Existing guardrail approaches, ranging from training-time alignment to prompting, decoding constraints, and post-ho...
📄 Research · May 19, 2026 · http://arxiv.org/abs/2605.19940v1
StruMPL: Multi-task Dense Regression under Disjoint Partial Supervision and MNAR Labels
Estimating forest aboveground biomass (AGB) from Earth observation combines two structurally incompatible label sources: spaceborne lidar provides canopy structure at millions of locations but no biomass estimate, and ground-based plots provide biomass at thousands of biased locations but no metrics...
📄 Research · May 19, 2026 · http://arxiv.org/abs/2605.19931v1
Breaking Modality Heterogeneity in Low-Bit Quantization for Large Vision-Language Models
Low-bit post-training quantization (PTQ) is a pivotal technique for deploying Vision-Language Models (VLMs) on resource-constrained devices. However, existing PTQ methods often degrade VLMs' accuracy due to the heterogeneous activation distributions of text and vision modalities during quantization....
📄 Research · May 19, 2026 · http://arxiv.org/abs/2605.19929v1
Real-Time Parallel Counterfactual Regret Minimization
Counterfactual Regret Minimization (CFR) is the dominant algorithmic family for solving large imperfect-information games, underpinning breakthroughs such as Libratus and Pluribus in No-Limit Texas Hold'em poker. In real-time game-playing systems, the solver must compute a near-equilibrium strategy ...
📄 Research · May 19, 2026 · http://arxiv.org/abs/2605.19928v1
Streamlined Constraint Reasoning via CNN Pattern Recognition on Enumerated Solutions
Constraint programming practitioners accelerate hard problems through a layered set of techniques applied in order of risk. Standard hardening (symmetry-breaking and implied constraints) is applied first and preserves satisfiability. Streamliner constraints, which restrict search to a structural sub...
📄 Research · May 19, 2026 · http://arxiv.org/abs/2605.19895v1
Deep Tech to Space: Space Data Centers and AI Revolution at the Edge
Dramatic cost reductions driven by private sector innovations have led to a rapid increase in the number of satellites in orbit and a corresponding surge in space-generated data. As this trend continues, transmitting large volumes of data to Earth for processing may become increasingly costly and ch...
📄 Research · May 19, 2026 · http://arxiv.org/abs/2605.19892v1
Passive Construction Site Safety Monitoring via Persona-Scaffolded Adversarial Chain-of-Thought VLM Verification
Construction remains the deadliest industry sector in the United States, with 1,055 fatal worker injuries recorded in 2023, and the majority preventable. Existing monitoring approaches are expensive, require real-time human operators, or address only a narrow subset of violations. This paper present...
📄 Research · May 19, 2026 · http://arxiv.org/abs/2605.19869v1
TIDE: Efficient and Lossless MoE Diffusion LLM Inference with I/O-aware Expert Offload
Diffusion Large Language Models (dLLMs) have emerged as a competitive alternative to autoregressive (AR) models, offering better hardware utilization and bidirectional context through parallel block-level decoding. However, as dLLMs continue to scale up with mixture-of-experts (MoE) architectures, t...
📄 Research · May 19, 2026 · http://arxiv.org/abs/2605.20179v1