AI News Archive: May 12, 2026 — Part 17

Sourced from 500+ daily AI sources, scored by relevance.

ToolCUA: Towards Optimal GUI-Tool Path Orchestration for Computer Use Agents
Computer Use Agents (CUAs) can act through both atomic GUI actions, such as click and type, and high-level tool calls, such as API-based file operations, but this hybrid action space often leaves them uncertain about when to continue with GUI actions or switch to tools, leading to suboptimal executi...
📄 Research | May 12, 2026 | http://arxiv.org/abs/2605.12481v1
KV-Fold: One-Step KV-Cache Recurrence for Long-Context Inference
We introduce KV-Fold, a simple, training-free long-context inference protocol that treats the key-value (KV) cache as the accumulator in a left fold over sequence chunks. At each step, the model processes the next chunk conditioned on the accumulated cache, appends the newly produced keys and values...
📄 Research | May 12, 2026 | http://arxiv.org/abs/2605.12471v1
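The left-fold framing in the KV-Fold abstract can be illustrated with a toy sketch. This is not the paper's implementation: `toy_step`, `split_chunks`, and `kv_fold` are hypothetical names, the "cache" is a plain list of (position, token) pairs standing in for keys and values, and anything the truncated abstract goes on to describe after the append step is not modeled.

```python
from functools import reduce


def toy_step(cache, chunk):
    """Toy stand-in for one model pass: read the accumulated cache,
    then append entries for the new chunk (hypothetical, not a real model)."""
    offset = len(cache)  # the new chunk is conditioned on everything cached so far
    new_entries = [(offset + i, tok) for i, tok in enumerate(chunk)]
    return cache + new_entries  # return a new accumulator; the input is not mutated


def split_chunks(tokens, chunk_size):
    """Split a token list into consecutive chunks of at most chunk_size."""
    return [tokens[i:i + chunk_size] for i in range(0, len(tokens), chunk_size)]


def kv_fold(tokens, chunk_size):
    """Left fold over chunks with the cache as the accumulator."""
    return reduce(toy_step, split_chunks(tokens, chunk_size), [])


if __name__ == "__main__":
    folded = kv_fold(list("abcdefg"), chunk_size=3)
    one_shot = toy_step([], list("abcdefg"))
    print(folded == one_shot)
```

In this toy setting the chunked fold and a single pass over the whole sequence build the same cache; whether that holds for a real model depends on details the excerpt does not give.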
Solve the Loop: Attractor Models for Language and Reasoning
Looped Transformers offer a promising alternative to purely feed-forward computation by iteratively refining latent representations, improving language modeling and reasoning. Yet recurrent architectures remain unstable to train, costly to optimize and deploy, and constrained to small, fixed recurre...
📄 Research | May 12, 2026 | http://arxiv.org/abs/2605.12466v1
Enabling AI-Native Mobility in 6G: A Real-World Dataset for Handover, Beam Management, and Timing Advance
To address the issues of high interruption time and measurement report overhead under user equipment (UE) mobility, especially in high-speed 5G use cases, the use of AI/ML techniques (AI/ML beam management and mobility procedures) has been proposed. These techniques rely heavily on data that are most...
📄 Research | May 12, 2026 | http://arxiv.org/abs/2605.12453v1
Classifier Context Rot: Monitor Performance Degrades with Context Length
Monitoring coding agents for dangerous behavior using language models requires classifying transcripts that often exceed 500K tokens, but prior agent monitoring benchmarks rarely contain transcripts longer than 100K tokens. We show that when used as classifiers, current frontier models fail to notic...
📄 Research | May 12, 2026 | http://arxiv.org/abs/2605.12366v1
$δ$-mem: Efficient Online Memory for Large Language Models
Large language models increasingly need to accumulate and reuse historical information in long-term assistants and agent systems. Simply expanding the context window is costly and often fails to ensure effective context utilization. We propose $δ$-mem, a lightweight memory mechanism that augments a ...
📄 Research | May 12, 2026 | http://arxiv.org/abs/2605.12357v1
A New Technique for AI Explainability using Feature Association Map
Lack of transparency in AI systems poses challenges in critical real-life applications. It is important to be able to explain the decisions of an AI system to ensure trust in the system. Explainable AI (XAI) algorithms play a vital role in achieving this objective. In this paper, we propose a ...
📄 Research | May 12, 2026 | http://arxiv.org/abs/2605.12350v1
KAN-CL: Per-Knot Importance Regularization for Continual Learning with Kolmogorov-Arnold Networks
Catastrophic forgetting remains the central obstacle in continual learning (CL): parameters shared across tasks interfere with one another, and existing regularization methods such as EWC and SI apply uniform penalties without awareness of which input region a parameter serves. We propose KAN-CL, a ...
📄 Research | May 12, 2026 | http://arxiv.org/abs/2605.12306v1
TokenRatio: Principled Token-Level Preference Optimization via Ratio Matching
Direct Preference Optimization (DPO) is a widely used RL-free method for aligning language models from pairwise preferences, but it models preferences over full sequences even though generation is driven by per-token decisions. Existing token-level extensions typically decompose a sequence-level Bra...
📄 Research | May 12, 2026 | http://arxiv.org/abs/2605.12288v1
Why Conclusions Diverge from the Same Observations: Formalizing World-Model Non-Identifiability via an Inference
When people share the same documents and observations yet reach different conclusions, the disagreement often shifts into a judgment that the other party is cognitively defective, irrational, or acting in bad faith. This paper argues that such divergence is better described as a form of non-identifi...
📄 Research | May 12, 2026 | http://arxiv.org/abs/2605.12255v1
Pion: A Spectrum-Preserving Optimizer via Orthogonal Equivalence Transformation
We introduce Pion, a spectrum-preserving optimizer for large language model (LLM) training based on orthogonal equivalence transformation. Unlike additive optimizers such as Adam and Muon, Pion updates each weight matrix through left and right orthogonal transformations, preserving its singular valu...
📄 Research | May 12, 2026 | http://arxiv.org/abs/2605.12492v1
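The invariance the Pion abstract relies on, that multiplying a weight matrix by orthogonal matrices on the left and right leaves its singular values unchanged, can be checked on a 2x2 example. The sketch below shows only that property. It does not reproduce Pion's update rule, which the excerpt does not describe, and the function names and rotation angles are illustrative assumptions.

```python
import math


def rotation(theta):
    """2x2 rotation matrix, a simple example of an orthogonal matrix."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]


def matmul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def singular_values(w):
    """Singular values of a 2x2 matrix from the trace and determinant of W^T W."""
    t = sum(x * x for row in w for x in row)              # s1^2 + s2^2
    d = (w[0][0] * w[1][1] - w[0][1] * w[1][0]) ** 2      # s1^2 * s2^2
    disc = math.sqrt(max(t * t - 4.0 * d, 0.0))
    return (math.sqrt((t + disc) / 2.0), math.sqrt(max((t - disc) / 2.0, 0.0)))


def orthogonal_update(w, theta_left, theta_right):
    """Multiplicative update W' = R(theta_left) @ W @ R(theta_right)."""
    return matmul(matmul(rotation(theta_left), w), rotation(theta_right))


if __name__ == "__main__":
    w = [[3.0, 1.0], [0.0, 2.0]]
    updated = orthogonal_update(w, 0.3, -0.7)
    print(singular_values(w))
    print(singular_values(updated))  # matches the line above up to rounding
```

By contrast, an additive step such as adding a small value to one entry generally changes the spectrum, which is the distinction the abstract draws against Adam and Muon.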
Task-Adaptive Embedding Refinement via Test-time LLM Guidance
We explore the effectiveness of an LLM-guided query refinement paradigm for extending the usability of embedding models to challenging zero-shot search and classification tasks. Our approach refines the embedding representation of a user query using feedback from a generative LLM on a small set of d...
📄 Research | May 12, 2026 | http://arxiv.org/abs/2605.12487v1
MEME: Multi-entity & Evolving Memory Evaluation
LLM-based agents increasingly operate in persistent environments where they must store, update, and reason over information across many sessions. While prior benchmarks evaluate only single-entity updates, MEME defines six tasks spanning the full space defined by the multi-entity and evolving axes, ...
📄 Research | May 12, 2026 | http://arxiv.org/abs/2605.12477v1
Approximation Theory of Laplacian-Based Neural Operators for Reaction-Diffusion System
Neural operators provide a framework for learning solution operators of partial differential equations (PDEs), enabling efficient surrogate modeling for complex systems. While universal approximation results are now well understood, approximation analysis specific to nonlinear reaction-diffusion sys...
📄 Research | May 12, 2026 | http://arxiv.org/abs/2605.12025v1
Multi-Stream LLMs: Unblocking Language Models with Parallel Streams of Thoughts, Inputs and Outputs
The continued improvements in language model capability have unlocked their widespread use as drivers of autonomous agents, for example in coding or computer use applications. However, the core of these systems has not changed much since early instruction-tuned models like ChatGPT. Even advanced AI ...
📄 Research | May 12, 2026 | http://arxiv.org/abs/2605.12460v1
TextSeal: A Localized LLM Watermark for Provenance & Distillation Protection
We introduce TextSeal, a state-of-the-art watermark for large language models. Building on Gumbel-max sampling, TextSeal introduces dual-key generation to restore output diversity, along with entropy-weighted scoring and multi-region localization for improved detection. It supports serving optimizat...
📄 Research | May 12, 2026 | http://arxiv.org/abs/2605.12456v1
ORCE: Order-Aware Alignment of Verbalized Confidence in Large Language Models
Large language models (LLMs) often produce answers with high certainty even when they are incorrect, making reliable confidence estimation essential for deployment in real-world scenarios. Verbalized confidence, where models explicitly state their confidence in natural language, provides a flexible ...
📄 Research | May 12, 2026 | http://arxiv.org/abs/2605.12446v1
ORBIT: Preserving Foundational Language Capabilities in GenRetrieval via Origin-Regulated Merging
Despite the rapid advancements in large language model (LLM) development, fine-tuning them for specific tasks often results in the catastrophic forgetting of their general, language-based reasoning abilities. This work investigates and addresses this challenge in the context of the Generative Retrie...
📄 Research | May 12, 2026 | http://arxiv.org/abs/2605.12419v1
Aligning Flow Map Policies with Optimal Q-Guidance
Generative policies based on expressive model classes, such as diffusion and flow matching, are well-suited to complex control problems with highly multimodal action distributions. Their expressivity, however, comes at a significant inference cost: generating each action typically requires simulatin...
📄 Research | May 12, 2026 | http://arxiv.org/abs/2605.12416v1
Model-based Bootstrap of Controlled Markov Chains
We propose and analyze a model-based bootstrap for transition kernels in finite controlled Markov chains (CMCs) with possibly nonstationary or history-dependent control policies, a setting that arises naturally in offline reinforcement learning (RL) when the behavior policy generating the data is un...
📄 Research | May 12, 2026 | http://arxiv.org/abs/2605.12410v1
Trajectory-Agnostic Asteroid Detection in TESS with Deep Learning
We present a novel method for extracting moving objects from TESS data using machine learning. Our approach uses two stacked 3D U-Nets with skip connections, which we call a W-Net, to filter background and identify pixels containing moving objects in TESS image time-series data. By augmenting the tr...
📄 Research | May 12, 2026 | http://arxiv.org/abs/2605.12391v1
Events as Triggers for Behavioral Diversity in Multi-Agent Reinforcement Learning
Effective multi-agent cooperation requires agents to adopt diverse behaviors as task conditions evolve, and to do so at the right moment. Yet, current Multi-Agent Reinforcement Learning (MARL) frameworks that facilitate this diversity are still limited by the fact that they bind fixed behaviors to fi...
📄 Research | May 12, 2026 | http://arxiv.org/abs/2605.12388v1
A Semi-Supervised Framework for Speech Confidence Detection using Whisper
Automatic detection of speaker confidence is critical for adaptive computing but remains constrained by limited labelled data and the subjectivity of paralinguistic annotations. This paper proposes a semi-supervised hybrid framework that fuses deep semantic embeddings from the Whisper encoder with a...
📄 Research | May 12, 2026 | http://arxiv.org/abs/2605.12387v1
MetaColloc: Optimization-Free PDE Solving via Meta-Learned Basis Functions
Solving partial differential equations (PDEs) with machine learning typically requires training a new neural network for every new equation. This optimization is slow. We introduce MetaColloc. It is an optimization-free and data-free framework that removes this bottleneck completely. We decouple bas...
📄 Research | May 12, 2026 | http://arxiv.org/abs/2605.12368v1
From Message-Passing to Linearized Graph Sequence Models
Message-passing based approaches form the default backbone of most learning architectures on graph-structured data. However, the rapid progress of modern deep learning architectures in other domains, particularly sequence modeling, raises the question of how graph learning can benefit from these adv...
📄 Research | May 12, 2026 | http://arxiv.org/abs/2605.12358v1
In-context learning to predict critical transitions in dynamical systems
Critical transitions - abrupt, often irreversible changes in system dynamics - arise across human and natural systems, often with catastrophic consequences. Real-world observations of such shifts remain scarce, preventing the development of reliable early warning systems. Conventional statistical an...
📄 Research | May 12, 2026 | http://arxiv.org/abs/2605.12308v1
Elastic Attention Cores for Scalable Vision Transformers
Vision Transformers (ViTs) achieve strong data-driven scaling by leveraging all-to-all self-attention. However, this flexibility incurs a computational cost that scales quadratically with image resolution, limiting ViTs in high-resolution domains. Underlying this approach is the assumption that pair...
📄 Research | May 12, 2026 | http://arxiv.org/abs/2605.12491v1
Routers Learn the Geometry of Their Experts: Geometric Coupling in Sparse Mixture-of-Experts
Sparse Mixture-of-Experts (SMoE) models enable scaling language models efficiently, but training them remains challenging, as routing can collapse onto few experts and auxiliary load-balancing losses can reduce specialization. Motivated by these hurdles, we study how routing decisions in SMoEs are f...
📄 Research | May 12, 2026 | http://arxiv.org/abs/2605.12476v1
Search Your Block Floating Point Scales!
Quantization has emerged as a standard technique for accelerating inference for generative models by enabling faster low-precision computations and reduced memory transfers. Recently, GPU accelerators have added first-class support for microscaling Block Floating Point (BFP) formats. Standard BFP al...
📄 Research | May 12, 2026 | http://arxiv.org/abs/2605.12464v1
Environment-Adaptive Preference Optimization for Wildfire Prediction
Predicting rare extreme events such as wildfires from meteorological data requires models that remain reliable under evolving environmental conditions. This problem is inherently long-tailed: wildfire events are rare but high-impact, while most observations correspond to non-fire conditions, causing...
📄 Research | May 12, 2026 | http://arxiv.org/abs/2605.12435v1
Attacks and Mitigations for Distributed Governance of Agentic AI under Byzantine Adversaries
Agentic AI governance is a critical component of agentic AI infrastructure ensuring that agents follow their owner's communication and interaction policies, and providing protection against attacks from malicious agents. The state-of-the-art solution, SAGA, assumes a logically centralized point of t...
📄 Research | May 12, 2026 | http://arxiv.org/abs/2605.12364v1
Neural-Schwarz Tiling for Geometry-Universal PDE Solving at Scale
Most learned PDE solvers follow a global-surrogate paradigm: a neural operator is trained to map full problem descriptions to full solution fields for a prescribed distribution of geometries, boundary conditions, and coefficients. This has enabled fast inference within fixed problem families, but li...
📄 Research | May 12, 2026 | http://arxiv.org/abs/2605.12343v1
Grid Games: The Power of Multiple Grids for Quantizing Large Language Models
A major recent advance in quantization is given by microscaled 4-bit formats such as NVFP4 and MXFP4, quantizing values into small groups sharing a scale, assuming a fixed floating-point grid. In this paper, we study the following natural extension: assume that, for each group of values, we are free...
📄 Research | May 12, 2026 | http://arxiv.org/abs/2605.12327v1
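The baseline that the Grid Games abstract starts from, a small group of values sharing one scale and being rounded onto a fixed 4-bit floating-point grid, might look roughly like the sketch below. Several things here are assumptions: the grid lists the magnitudes representable in the E2M1 FP4 element format, the scale is a plain real-valued absmax scale rather than the low-precision scale encodings that NVFP4 and MXFP4 actually store, and the group size is arbitrary. The paper's multi-grid extension is not shown because the excerpt is cut off before describing it.

```python
# Magnitudes representable by an E2M1 4-bit float (sign handled separately).
FP4_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]


def quantize_group(values, grid=FP4_GRID):
    """Quantize one group with a shared scale onto a fixed grid.

    Simplification: the scale is an exact float (absmax / largest grid value),
    not a quantized scale as in real microscaling formats.
    """
    absmax = max(abs(v) for v in values)
    if absmax == 0.0:
        return 1.0, [0.0] * len(values)
    scale = absmax / grid[-1]
    quantized = []
    for v in values:
        target = abs(v) / scale
        nearest = min(grid, key=lambda g: abs(g - target))  # round to nearest grid point
        quantized.append(nearest if v >= 0 else -nearest)
    return scale, quantized


def dequantize_group(scale, quantized):
    """Map grid values back to the original range using the shared scale."""
    return [scale * q for q in quantized]


if __name__ == "__main__":
    scale, q = quantize_group([3.0, -1.5, 0.7, 0.0])
    print(scale, q, dequantize_group(scale, q))
```

In this example only 0.7 incurs rounding error (it is reconstructed as 0.75), which is the kind of per-group error that a different choice of grid could change.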
Autoregressive Learning in Joint KL: Sharp Oracle Bounds and Lower Bounds
We study the fundamental and timely problem of learning long sequences in autoregressive modeling and next-token prediction under model misspecification, measured by the joint Kullback-Leibler (KL) divergence. Our goal is to characterize how the sequence horizon $H$ affects both approximation and...
📄 Research | May 12, 2026 | http://arxiv.org/abs/2605.12316v1
Random-Set Graph Neural Networks
Uncertainty quantification has become an important factor in understanding the data representations produced by Graph Neural Networks (GNNs). Despite their predictive capabilities being ever useful across industrial workspaces, the inherent uncertainty induced by the nature of the data is a huge mit...
📄 Research | May 12, 2026 | http://arxiv.org/abs/2605.11987v1
LOFT: Low-Rank Orthogonal Fine-Tuning via Task-Aware Support Selection
Orthogonal parameter-efficient fine-tuning (PEFT) adapts pretrained weights through structure-preserving multiplicative transformations, but existing methods often conflate two distinct design choices: the subspace in which adaptation occurs and the transformation applied within that subspace. This ...
📄 Research | May 12, 2026 | http://arxiv.org/abs/2605.11872v1
A Composite Activation Function for Learning Stable Binary Representations
Activation functions play a central role in neural networks by shaping internal representations. Recently, learning binary activation representations has attracted significant attention due to their advantages in computational and memory efficiency, as well as interpretability. However, training neu...
📄 Research | May 12, 2026 | http://arxiv.org/abs/2605.11558v1
TOPPO: Rethinking PPO for Multi-Task Reinforcement Learning with Critic Balancing
Soft Actor-Critic (SAC) and its variants dominate Multi-Task Reinforcement Learning (MTRL) due to their off-policy sample efficiency, while on-policy methods such as Proximal Policy Optimization (PPO) remain underexplored. We diagnose that PPO in MTRL suffers from a previously overlooked issue: crit...
📄 Research | May 12, 2026 | http://arxiv.org/abs/2605.11473v1
Causal Bias Detection in Generative Artificial Intelligence
Automated systems built on artificial intelligence (AI) are increasingly deployed across high-stakes domains, raising critical concerns about fairness and the perpetuation of demographic disparities that exist in the world. In this context, causal inference provides a principled framework for reason...
📄 Research | May 12, 2026 | http://arxiv.org/abs/2605.11365v1
Causal Fairness for Survival Analysis
In the data-driven era, large-scale datasets are routinely collected and analyzed using machine learning (ML) and artificial intelligence (AI) to inform decisions in high-stakes domains such as healthcare, employment, and criminal justice, raising concerns about the fairness behavior of these system...
📄 Research | May 12, 2026 | http://arxiv.org/abs/2605.11362v1
A proximal gradient algorithm for composite log-concave sampling
We propose an algorithm to sample from composite log-concave distributions over $\mathbb{R}^d$, i.e., densities of the form $π\propto e^{-f-g}$, assuming access to gradient evaluations of $f$ and a restricted Gaussian oracle (RGO) for $g$. The latter requirement means that we can easily sample from ...
📄 Research | May 12, 2026 | http://arxiv.org/abs/2605.12461v1
Online Learning-to-Defer with Varying Experts
Learning-to-Defer (L2D) methods route each query either to a predictive model or to external experts. While existing work studies this problem in batch settings, real-world deployments require handling streaming data, changing expert availability, and shifting expert distribution. We introduce the f...
📄 Research | May 12, 2026 | http://arxiv.org/abs/2605.12340v1
Optimal Policy Learning under Budget and Coverage Constraints
We study optimal policy learning under combined budget and minimum coverage constraints. We show that the problem admits a knapsack-type structure and that the optimal policy can be characterized by an affine threshold rule involving both budget and coverage shadow prices. We establish that the line...
📄 Research | May 12, 2026 | http://arxiv.org/abs/2605.12235v1
Self-Supervised Laplace Approximation for Bayesian Uncertainty Quantification
Approximate Bayesian inference typically revolves around computing the posterior parameter distribution. In practice, however, the main object of interest is often a model's predictions rather than its parameters. In this work, we propose to bypass the parameter posterior and focus directly on appro...
📄 Research | May 12, 2026 | http://arxiv.org/abs/2605.12208v1
Information-Theoretic Generalization Bounds for Sequential Decision Making
Information-theoretic generalization bounds based on the supersample construction are a central tool for algorithm-dependent generalization analysis in the batch i.i.d. setting. However, existing supersample conditional mutual information (CMI) bounds do not directly apply to sequential decision-mak...
📄 Research | May 12, 2026 | http://arxiv.org/abs/2605.12190v1
Keeping Score: Efficiency Improvements in Neural Likelihood Surrogate Training via Score-Augmented Loss Functions
For stochastic process models, parameter inference is often severely bottlenecked by computationally expensive likelihood functions. Simulation-based inference (SBI) bypasses this restriction by constructing amortized surrogate likelihoods, but most SBI methods assume a black-box data generating pro...
📄 Research | May 12, 2026 | http://arxiv.org/abs/2605.12118v1
QDSB: Quantized Diffusion Schrödinger Bridges
Learning generative models in settings where the source and target distributions are only specified through unpaired samples is gaining in importance. Here, one frequently used model is the Schrödinger bridge (SB), which represents the most likely evolution between the two endpoint distributions. To accel...
📄 Research | May 12, 2026 | http://arxiv.org/abs/2605.11983v1
Variance-aware Reward Modeling with Anchor Guidance
Standard Bradley-Terry (BT) reward models are limited when human preferences are pluralistic. Although soft preference labels preserve disagreement information, BT can only express it by shrinking reward margins. Gaussian reward models provide an alternative by jointly predicting a reward mean and ...
📄 Research | May 12, 2026 | http://arxiv.org/abs/2605.11865v1
Minimax Rates and Spectral Distillation for Tree Ensembles
Tree ensembles such as random forests (RFs) and gradient boosting machines (GBMs) are among the most widely used supervised learners, yet their theoretical properties remain incompletely understood. We adopt a spectral perspective on these algorithms, with two main contributions. First, we derive mi...
📄 Research | May 12, 2026 | http://arxiv.org/abs/2605.11841v1
One-Step Generative Modeling via Wasserstein Gradient Flows
Diffusion models and flow-based methods have shown impressive generative capability, especially for images, but their sampling is expensive because it requires many iterative updates. We introduce W-Flow, a framework for training a generator that transforms samples from a simple reference distributi...
📄 Research | May 12, 2026 | http://arxiv.org/abs/2605.11755v1