AI News Archive: May 7, 2026 — Part 18

Sourced from 500+ daily AI sources, scored by relevance.

Operator-Guided Invariance Learning for Continuous Reinforcement Learning
Reinforcement learning (RL) with continuous time and state/action spaces is often data-intensive and brittle under nuisance variability and shift, motivating methods that exploit value-preserving structures to stabilize and improve learning. Most existing approaches focus on special cases, such as p...
📄 Research May 7, 2026 http://arxiv.org/abs/2605.06500v1
Instrumental Choices: Measuring the Propensity of LLM Agents to Pursue Instrumental Behaviors
AI systems have become increasingly capable of dangerous behaviours in many domains. This raises the question: Do models sometimes choose to violate human instructions in order to perform behaviour that is more useful for certain goals? We introduce a benchmark for measuring model propensity for ins...
📄 Research May 7, 2026 http://arxiv.org/abs/2605.06490v1
3D MRI Image Pretraining via Controllable 2D Slice Navigation Task
Self-supervised pretraining has become the mainstream approach for learning MRI representations from unlabeled scans. However, most existing objectives still treat each scan primarily as static aggregations of slices, patches or volumes. We ask whether there exists an intrinsic form of self-supervis...
📄 Research May 7, 2026 http://arxiv.org/abs/2605.06487v1
Litespark Inference on Consumer CPUs: Custom SIMD Kernels for Ternary Neural Networks
Large language models (LLMs) have transformed artificial intelligence, but their computational requirements remain prohibitive for most users. Standard inference demands expensive datacenter GPUs or cloud API access, leaving over one billion personal computers underutilized for AI workloads. Ternary...
📄 Research May 7, 2026 http://arxiv.org/abs/2605.06485v1
Patch-Effect Graph Kernels for LLM Interpretability
Mechanistic interpretability aims to reverse-engineer transformer computations by identifying causal circuits through activation patching. However, scaling these interventions across diverse prompts and task families produces high-dimensional, unstructured datasets that are difficult to compare syst...
📄 Research May 7, 2026 http://arxiv.org/abs/2605.06480v1
Beyond Task Success: Measuring Workflow Fidelity in LLM-Based Agentic Payment Systems
LLM-based multi-agent systems are increasingly deployed for payment workflows, yet prevailing metrics, Task Success Rate (TSR) and Agent Handoff F1-Score (HF1), capture only final outcomes or unordered routing decisions. We introduce the Agentic Success Rate (ASR), a trajectory-fidelity metric that ...
📄 Research May 7, 2026 http://arxiv.org/abs/2605.06457v1
PrefixGuard: From LLM-Agent Traces to Online Failure-Warning Monitors
Large language model (LLM) agents now execute long, tool-using tasks where final outcome checks can arrive too late for intervention. Online warning requires lightweight prefix monitors over heterogeneous traces, but hand-authored event schemas are brittle and deployment-time LLM judging is costly. ...
📄 Research May 7, 2026 http://arxiv.org/abs/2605.06455v1
ActCam: Zero-Shot Joint Camera and 3D Motion Control for Video Generation
For artistic applications, video generation requires fine-grained control over both performance and cinematography, i.e., the actor's motion and the camera trajectory. We present ActCam, a zero-shot method for video generation that jointly transfers character motion from a driving video into a new s...
📄 Research May 7, 2026 http://arxiv.org/abs/2605.06667v1
UniPool: A Globally Shared Expert Pool for Mixture-of-Experts
Modern Mixture-of-Experts (MoE) architectures allocate expert capacity through a rigid per-layer rule: each transformer layer owns a separate expert set. This convention couples depth scaling with linear expert-parameter growth and assumes that every layer needs isolated expert capacity. However, re...
📄 Research May 7, 2026 http://arxiv.org/abs/2605.06665v1
BAMI: Training-Free Bias Mitigation in GUI Grounding
GUI grounding is a critical capability for enabling GUI agents to execute tasks such as clicking and dragging. However, in complex scenarios like the ScreenSpot-Pro benchmark, existing models often suffer from suboptimal performance. Utilizing the proposed Masked Prediction Distribution (MPD...
📄 Research May 7, 2026 http://arxiv.org/abs/2605.06664v1
AI Co-Mathematician: Accelerating Mathematicians with Agentic AI
We introduce the AI co-mathematician, a workbench for mathematicians to interactively leverage AI agents to pursue open-ended research. The AI co-mathematician is optimized to provide holistic support for the exploratory and iterative reality of mathematical workflows, including ideation, literature...
📄 Research May 7, 2026 http://arxiv.org/abs/2605.06651v1
Superintelligent Retrieval Agent: The Next Frontier of Information Retrieval
Retrieval-augmented agents are increasingly the interface to large organizational knowledge bases, yet most still treat retrieval as a black box: they issue exploratory queries, inspect returned snippets, and iteratively reformulate until useful evidence emerges. This approach resembles how a newcom...
📄 Research May 7, 2026 http://arxiv.org/abs/2605.06647v1
When and Why SignSGD Outperforms SGD: A Theoretical Study Based on $\ell_1$-norm Lower Bounds
Sign-based optimization algorithms, such as SignSGD and Muon, have garnered significant attention for their remarkable performance in training large foundation models. Despite this empirical success, we still lack a theoretical understanding of when and why these sign-based methods outperform vanill...
📄 Research May 7, 2026 http://arxiv.org/abs/2605.06615v1
Patch2Vuln: Agentic Reconstruction of Vulnerabilities from Linux Distribution Binary Patches
Security updates create a short but important window in which defenders and attackers can compare vulnerable and patched software. Yet in many operational settings, the most accessible artifacts are binary packages rather than source patches or advisory text. This paper asks whether a language-model...
📄 Research May 7, 2026 http://arxiv.org/abs/2605.06601v1
Ex Ante Evaluation of AI-Induced Idea Diversity Collapse
Creative AI systems are typically evaluated at the level of individual utility, yet creative outputs are consumed in populations: an idea loses value when many others produce similar ones. This creates an evaluation blind spot, as AI can improve individual outputs while increasing population-level c...
📄 Research May 7, 2026 http://arxiv.org/abs/2605.06540v1
Sparkle: Realizing Lively Instruction-Guided Video Background Replacement via Decoupled Guidance
In recent years, open-source efforts like Senorita-2M have propelled video editing toward natural language instruction. However, current publicly available datasets predominantly focus on local editing or style transfer, which largely preserve the original scene structure and are easier to scale. In...
📄 Research May 7, 2026 http://arxiv.org/abs/2605.06535v1
Market-Alignment Risk in Pricing Agents: Trace Diagnostics and Trace-Prior RL under Hidden Competitor State
Outcome metrics can certify the wrong behavior. We study this failure in a two-hotel revenue-management simulator where Hotel A trains an agent against a fixed rule-based revenue-management competitor, Hotel B. A standard learning agent can obtain near-reference revenue per available room (RevPAR) w...
📄 Research May 7, 2026 http://arxiv.org/abs/2605.06529v1
From Token Lists to Graph Motifs: Weisfeiler-Lehman Analysis of Sparse Autoencoder Features
Sparse autoencoders (SAEs) have become central to mechanistic interpretability, decomposing transformer activations into monosemantic features. Yet existing analyses characterise features almost exclusively through top-activating token lists or decoder weight vectors, leaving the higher-order co-occ...
📄 Research May 7, 2026 http://arxiv.org/abs/2605.06494v1
ReasonSTL: Bridging Natural Language and Signal Temporal Logic via Tool-Augmented Process-Rewarded Learning
Signal Temporal Logic (STL) is an expressive formal language for specifying spatio-temporal requirements over real-valued, real-time signals. It has been widely used for the verification and synthesis of autonomous systems and cyber-physical systems. In practice, however, users often express their r...
📄 Research May 7, 2026 http://arxiv.org/abs/2605.06483v1
Probabilistic Dating of Historical Manuscripts via Evidential Deep Regression on Visual Script Features
We introduce a probabilistic approach for dating historical manuscript pages from visual features alone. Instead of aggregating centuries into classes as is standard in the previous literature, we pose dating as an evidential deep regression problem over a continuous year axis, allowing our neural n...
📄 Research May 7, 2026 http://arxiv.org/abs/2605.06475v1
Q-MMR: Off-Policy Evaluation via Recursive Reweighting and Moment Matching
We present a novel theoretical framework, Q-MMR, for off-policy evaluation in finite-horizon MDPs. Q-MMR learns a set of scalar weights, one for each data point, such that the reweighted rewards approximate the expected return under the target policy. The weights are learned inductively in a top-dow...
📄 Research May 7, 2026 http://arxiv.org/abs/2605.06474v1
EMO: Pretraining Mixture of Experts for Emergent Modularity
Large language models are typically deployed as monolithic systems, requiring the full model even when applications need only a narrow subset of capabilities, e.g., code, math, or domain-specific knowledge. Mixture-of-Experts (MoEs) seemingly offer a potential alternative by activating only a subset...
📄 Research May 7, 2026 http://arxiv.org/abs/2605.06663v1
Beyond Negative Rollouts: Positive-Only Policy Optimization with Implicit Negative Gradients
Reinforcement learning with verifiable rewards (RLVR) has become a dominant paradigm for enhancing the reasoning ability of large language models (LLMs), owing to its deterministic verification. The community has witnessed a rapid shift from Proximal Policy Optimization (PPO) to Group Relative Policy...
📄 Research May 7, 2026 http://arxiv.org/abs/2605.06650v1
Cited but Not Verified: Parsing and Evaluating Source Attribution in LLM Deep Research Agents
Large language models (LLMs) power deep research agents that synthesize information from hundreds of web sources into cited reports, yet these citations cannot be reliably verified. Current approaches either trust models to self-cite accurately, risking bias, or employ retrieval-augmented generation...
📄 Research May 7, 2026 http://arxiv.org/abs/2605.06635v1
Algospeak, Hiding in the Open: The Trade-off Between Legible Meaning and Detection Avoidance
As large language models (LLMs) increasingly mediate both content generation and moderation, linguistic evasion strategies known as Algospeak have intensified the coevolution between evaders and detectors. This research formalizes the underlying dynamics grounded in a joint action model: when Algosp...
📄 Research May 7, 2026 http://arxiv.org/abs/2605.06619v1
Automated Clinical Report Generation for Remote Cognitive Remediation: Comparing Knowledge-Engineered Templates and LLMs in Low-Resource Settings
The growing demand for cognitive remediation therapy, combined with limited speech therapist availability, has accelerated the adoption of remote rehabilitation tools. These systems generate large volumes of interaction data that are difficult for clinicians to review efficiently. This paper investi...
📄 Research May 7, 2026 http://arxiv.org/abs/2605.06594v1
Long Context Pre-Training with Lighthouse Attention
Training causal transformers at extreme sequence lengths is bottlenecked by the quadratic time and memory of scaled dot-product attention (SDPA). In this work, we propose Lighthouse Attention, a training-only symmetrical selection-based hierarchical attention algorithm that wraps around ordinary SDP...
📄 Research May 7, 2026 http://arxiv.org/abs/2605.06554v1
Efficient Pre-Training with Token Superposition
Pre-training of Large Language Models is often prohibitively expensive and inefficient at scale, requiring complex and invasive modifications in order to achieve high data throughput. In this work, we present Token-Superposition Training (TST), a simple drop-in method that significantly improves the...
📄 Research May 7, 2026 http://arxiv.org/abs/2605.06546v1
STALE: Can LLM Agents Know When Their Memories Are No Longer Valid?
Large Language Model (LLM) agents are increasingly expected to maintain coherent, long-term personalized memory, yet current benchmarks primarily measure static fact retrieval, overlooking the ability to revise stored beliefs when new evidence emerges. We identify a critical and underexplored failur...
📄 Research May 7, 2026 http://arxiv.org/abs/2605.06527v1
Cubit: Token Mixer with Kernel Ridge Regression
Since its introduction in 2017, the Transformer has become one of the most widely adopted architectures in modern deep learning. Despite extensive efforts to improve positional encoding, attention mechanisms, and feed-forward networks, the core token-mixing mechanism in Transformers remains attentio...
📄 Research May 7, 2026 http://arxiv.org/abs/2605.06501v1
Towards Emotion Consistency Analysis of Large Language Models in Emotional Conversational Contexts
In this work, we examine the consistency of Large Language Models (LLMs) with respect to their own generated responses in an emotionally driven conversational context. Specifically, the text generated by an LLM is framed as a query to the same model, and its responses are subsequ...
📄 Research May 7, 2026 http://arxiv.org/abs/2605.06476v1
bazarr.ai
Search deals anytime with Baz
🧰 Tools May 7, 2026 https://www.producthunt.com/products/bazarr-ai?utm_campaign=producthunt-api&utm_medium=api-v2&utm_source=Application%3A+the500feed+%28ID%3A+283491%29
COVID-19 Infodemic. Understanding content features in detecting fake news using a machine learning approach
The use of content features, particularly textual and linguistic ones, for fake news detection is under-researched, despite empirical evidence showing that such features can help differentiate real from fake news. To this end, this study investigates a selection of content features such as word bigr...
📄 Research May 7, 2026 http://arxiv.org/abs/2605.06435v1
From 124 Million Tokens to 1,021 Neologisms: A Large-Scale Pipeline for Automatic Neologism Detection
We present a scalable, modular pipeline for automatic neologism detection that combines rule-based filtering with LLM classification. The pipeline is grounded in two complementary word-formation frameworks, grammatical and extra-grammatical morphology, which jointly define the scope of what counts a...
📄 Research May 7, 2026 http://arxiv.org/abs/2605.06426v1
GATHER: Convergence-Centric Hyper-Entity Retrieval for Zero-Shot Cell-Type Annotation
Zero-shot single-cell cell-type annotation aims to determine a cell's type from a given set of expressed genes without any training. Existing knowledge-graph-based RAG approaches retrieve evidence by expanding from source entities and relying on iterative LLM reasoning. However, in this setting each...
📄 Research May 7, 2026 http://arxiv.org/abs/2605.06403v1
Is Escalation Worth It? A Decision-Theoretic Characterization of LLM Cascades
Model cascades, in which a cheap LLM defers to an expensive one on low-confidence queries, are widely used to navigate the cost-quality tradeoff at deployment. Existing approaches largely treat the deferral threshold as an empirical hyperparameter, with limited guidance on the geometry of the result...
📄 Research May 7, 2026 http://arxiv.org/abs/2605.06350v1
Don't Lose Focus: Activation Steering via Key-Orthogonal Projections
Activation steering steers LLM behaviour towards a target behaviour by intervening in internal representations, yet it often degrades reasoning and retrieval performance. We argue that a primary cause of this trade-off is attention rerouting: steering vectors alter query-key matching, shifting atten...
📄 Research May 7, 2026 http://arxiv.org/abs/2605.06342v1
MANTRA: Synthesizing SMT-Validated Compliance Benchmarks for Tool-Using LLM Agents
Tool-using large language model (LLM) agents are increasingly deployed in settings where their reliable behavior is governed by strict procedural manuals. Ensuring that such agents comply with the rules from these manuals is challenging, as they are typically written for humans in natural language w...
📄 Research May 7, 2026 http://arxiv.org/abs/2605.06334v1
Measuring Evaluation-Context Divergence in Open-Weight LLMs: A Paired-Prompt Protocol with Pilot Evidence of Alignment-Pipeline-Specific Heterogeneity
Safety benchmarks are routinely treated as evidence about how a language model will behave once deployed, but this inference is fragile if behavior depends on whether a prompt looks like an evaluation. We define evaluation-context divergence as an observable within-item change in behavior induced by...
📄 Research May 7, 2026 http://arxiv.org/abs/2605.06327v1
Improving the Efficiency of Language Agent Teams with Adaptive Task Graphs
Large language models (LLMs) are increasingly deployed in teams, yet existing coordination approaches often occupy two extremes. Highly structured methods rely on fixed roles, pipelines, or task decompositions assigned a priori. In contrast, fully unstructured teams enable adaptability and explorati...
📄 Research May 7, 2026 http://arxiv.org/abs/2605.06320v1
Who and What? Using Linguistic Features and Annotator Characteristics to Analyze Annotation Variation
Human label variation has been established as a central phenomenon in NLP: the perspectives different annotators have on the same item need to be embraced. Data collection practices thus shifted towards increasing the annotator numbers and releasing disaggregated datasets, harmful language being mos...
📄 Research May 7, 2026 http://arxiv.org/abs/2605.06318v1
Log-Likelihood, Simpson's Paradox, and the Detection of Machine-Generated Text
The ability to reliably distinguish human-written text from that generated by large language models is of profound societal importance. The dominant approach to this problem exploits the likelihood hypothesis: that machine-generated text should appear more probable to a detector language model than ...
📄 Research May 7, 2026 http://arxiv.org/abs/2605.06294v1
LatentRAG: Latent Reasoning and Retrieval for Efficient Agentic RAG
Single-step retrieval-augmented generation (RAG) provides an efficient way to incorporate external information for simple question answering tasks but struggles with complex questions. Agentic RAG extends this paradigm by replacing single-step retrieval with a multi-step process, in which the large ...
📄 Research May 7, 2026 http://arxiv.org/abs/2605.06285v1
Quantifying the Statistical Effect of Rubric Modifications on Human-Autorater Agreement
Autoraters, also referred to as LLM-as-judges, are increasingly used for evaluation and automated content moderation. However, there is limited statistical analysis of how modifications in a rubric presented to both humans and autoraters affect their score agreement. Rubrics that ask for an overall ...
📄 Research May 7, 2026 http://arxiv.org/abs/2605.06283v1
Rethinking RL for LLM Reasoning: It's Sparse Policy Selection, Not Capability Learning
Reinforcement learning has become the standard for improving reasoning in large language models, yet evidence increasingly suggests that RL does not teach new strategies; it redistributes probability mass over solutions the base model already contains. In this work, we ask: if RL merely steers the m...
📄 Research May 7, 2026 http://arxiv.org/abs/2605.06241v1
UniPrefill: Universal Long-Context Prefill Acceleration via Block-wise Dynamic Sparsification
As large language models (LLMs) continue to advance rapidly, they are becoming increasingly capable while simultaneously demanding ever-longer context lengths. To improve the inference efficiency of long-context processing, several novel low-complexity hybrid architectures have recently been propose...
📄 Research May 7, 2026 http://arxiv.org/abs/2605.06221v1
TIDE: Every Layer Knows the Token Beneath the Context
We revisit a universally accepted but under-examined design choice in every modern LLM: a token index is looked up once at the input embedding layer and then permanently discarded. This single-injection assumption induces two structural failures: (i) the Rare Token Problem, where a Zipf-type distrib...
📄 Research May 7, 2026 http://arxiv.org/abs/2605.06216v1
Above Security
AI agents for insider risk detection and response
🧰 Tools May 7, 2026 https://www.producthunt.com/products/above-security?utm_campaign=producthunt-api&utm_medium=api-v2&utm_source=Application%3A+the500feed+%28ID%3A+283491%29
Parser agreement and disagreement in L2 Korean UD: Implications for human-in-the-loop annotation
We propose a simplified human-in-the-loop workflow for second language (L2) Korean morphosyntactic annotation by leveraging agreement between two domain-adapted parsers. We first evaluate whether parser agreement can serve as a proxy for annotation correctness by comparing it with independent human ...
📄 Research May 7, 2026 http://arxiv.org/abs/2605.06625v1
PairAlign: A Framework for Sequence Tokenization via Self-Alignment with Applications to Audio Tokenization
Many operations on sensory data -- comparison, memory, retrieval, and reasoning -- are naturally expressed over discrete symbolic structures. In language this interface is given by tokens; in audio, it must be learned. Existing audio tokenizers rely on quantization, clustering, or codec reconstructi...
📄 Research May 7, 2026 http://arxiv.org/abs/2605.06582v1