AI News Archive: May 12, 2026 — Part 24

Compiled from 500+ daily AI sources and scored by relevance.

Minora AI
Agentic AI for performance marketing and SEO/GEO/AEO
🧰 ToolsMay 12, 2026https://www.producthunt.com/products/minora-ai?utm_campaign=producthunt-api&utm_medium=api-v2&utm_source=Application%3A+the500feed+%28ID%3A+283491%29
Rezivo AI
Never miss a customer call again — 24/7 AI receptionist
🧰 ToolsMay 12, 2026https://www.producthunt.com/products/rezivo-ai?utm_campaign=producthunt-api&utm_medium=api-v2&utm_source=Application%3A+the500feed+%28ID%3A+283491%29
Adaptive TD-Lambda for Cooperative Multi-agent Reinforcement Learning
TD(λ) in value-based MARL algorithms, or Temporal Difference critic learning in Actor-Critic-based (AC-based) algorithms, synergistically integrates elements of Monte-Carlo simulation and Q-function bootstrapping via dynamic programming, which effectively addresses the inherent bias-variance tr...
📄 ResearchMay 12, 2026http://arxiv.org/abs/2605.11880v1
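For readers new to TD(λ), the blend of Monte-Carlo returns and bootstrapping that the abstract describes shows up in the standard λ-return, which can be computed backward over one trajectory. This is a minimal sketch of the textbook (non-adaptive) recursion, not the paper's adaptive scheme; the function name `lambda_returns` and the convention that `values` carries one extra bootstrap entry are assumptions.

```python
from typing import List


def lambda_returns(rewards: List[float], values: List[float],
                   gamma: float, lam: float) -> List[float]:
    """Textbook lambda-returns for a single trajectory (hypothetical helper).

    rewards[t] is the reward received after step t; values[t] is V(s_t).
    `values` has one extra entry, V(s_T), used to bootstrap at the end
    (pass 0.0 for a terminal state).
    """
    if len(values) != len(rewards) + 1:
        raise ValueError("values must have exactly one more entry than rewards")
    returns = [0.0] * len(rewards)
    # Beyond the last step, the only available estimate is the bootstrap value.
    g = values[-1]
    for t in reversed(range(len(rewards))):
        # Weight (1 - lam) on bootstrapping from V(s_{t+1}),
        # weight lam on the longer lambda-return that follows.
        g = rewards[t] + gamma * ((1 - lam) * values[t + 1] + lam * g)
        returns[t] = g
    return returns
```

With `lam=0` every target collapses to the one-step TD target r_t + γV(s_{t+1}); with `lam=1` the intermediate value estimates drop out and each target is the discounted Monte-Carlo return (bootstrapped only from the final state). An adaptive method would choose λ somewhere in between.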
Hierarchical LLM-Driven Control for HAPS-Assisted UAV Networks: Joint Optimization of Flight and Connectivity
Uncrewed aerial vehicles (UAVs) are increasingly deployed in complex networked environments, yet the joint optimization of multi-UAV motion control and connectivity remains a fundamental challenge. In this paper, we study a multi-UAV system operating in an integrated terrestrial and non-terrestrial ...
📄 ResearchMay 12, 2026http://arxiv.org/abs/2605.11509v1
Predictive Maps of Multi-Agent Reasoning: A Successor-Representation Spectrum for LLM Communication Topologies
Practitioners deploying multi-agent large language model (LLM) systems must currently choose between communication topologies such as chain, star, mesh, and richer variants without any pre-inference diagnostic for which topology will amplify drift, converge to consensus, or remain robust under pertu...
📄 ResearchMay 12, 2026http://arxiv.org/abs/2605.11453v1
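As background for the "successor-representation spectrum" in the title, the textbook successor representation of a Markov chain with transition matrix P is M = Σ_k γ^k P^k = (I − γP)^{-1}. The sketch below assumes, purely for illustration, that a communication topology is modelled as a uniform random walk over an undirected graph of agents; how the paper actually builds P and its spectrum is not stated in the excerpt, and the helper names and the three example topologies are hypothetical.

```python
import numpy as np


def random_walk_matrix(adj: np.ndarray) -> np.ndarray:
    """Row-normalise an adjacency matrix into a random-walk transition matrix."""
    deg = adj.sum(axis=1, keepdims=True)
    return adj / deg


def successor_representation(adj: np.ndarray, gamma: float) -> np.ndarray:
    """Textbook SR: M = sum_k gamma^k P^k = (I - gamma * P)^{-1}."""
    p = random_walk_matrix(adj)
    n = p.shape[0]
    return np.linalg.inv(np.eye(n) - gamma * p)


def topology(name: str, n: int) -> np.ndarray:
    """Hypothetical undirected adjacency matrices for n agents."""
    adj = np.zeros((n, n))
    if name == "chain":
        for i in range(n - 1):
            adj[i, i + 1] = adj[i + 1, i] = 1
    elif name == "star":
        # Agent 0 is the hub connected to every other agent.
        adj[0, 1:] = adj[1:, 0] = 1
    elif name == "mesh":
        adj = np.ones((n, n)) - np.eye(n)
    else:
        raise ValueError(f"unknown topology: {name}")
    return adj


for name in ("chain", "star", "mesh"):
    m = successor_representation(topology(name, 5), gamma=0.9)
    # Random walks on undirected graphs have real eigenvalues; sort descending.
    spectrum = np.sort(np.linalg.eigvals(m).real)[::-1]
    print(name, np.round(spectrum, 3))
```

Each row of M sums to 1/(1 − γ), so the topologies differ only in how that discounted occupancy is spread across agents, which is what the eigenvalue spectrum summarises.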
Intermediate Artifacts as First-Class Citizens: A Data Model for Durable Intermediate Artifacts in Agentic Systems
Many AI systems are organized around loops in which models reason, call tools, observe results, and continue until a task is complete. These systems often produce final artifacts such as memos, plans, recommendations, and analyses, while the intermediate work that shaped those outputs remains epheme...
📄 ResearchMay 12, 2026http://arxiv.org/abs/2605.12087v1
AgentDisCo: Towards Disentanglement and Collaboration in Open-ended Deep Research Agents
In this paper, we present AgentDisCo, a novel Disentangled and Collaborative agentic architecture that formulates deep research as an adversarial optimization problem between information exploration and exploitation. Unlike existing approaches that conflate these two processes into a single module, ...
📄 ResearchMay 12, 2026http://arxiv.org/abs/2605.11732v1
A Research Agenda on Agents and Software Engineering: Outcomes from the Rio A2SE Seminar
The rise of agentic AI is reshaping software engineering in two intertwined directions: agents are increasingly applied to support software engineering tasks, and agentic AI systems themselves are complex systems that require rethinking currently established software engineering practices. To chart...
📄 ResearchMay 12, 2026http://arxiv.org/abs/2605.11720v1
Shaping Zero-Shot Coordination via State Blocking
Zero-shot coordination (ZSC) aims to enable agents to cooperate with independently trained partners without prior interaction, a key requirement for real-world multi-agent systems and human-AI collaboration. Existing approaches have largely emphasized increasing partner diversity during training, ye...
📄 ResearchMay 12, 2026http://arxiv.org/abs/2605.11688v1
GeomHerd: A Forward-looking Herding Quantification via Ricci Flow Geometry on Agent Interactive Simulations
Herding, in which agents align their behaviors and act collectively, is a central driver of market fragility and systemic risk. Existing approaches to quantifying herding rely on price-correlation statistics, which inherently lag because they only detect coordination after it has already moved realise...
📄 ResearchMay 12, 2026http://arxiv.org/abs/2605.11645v1
Distance-Constrained Unlabeled Multi-Agent Pathfinding
We study a graph pathfinding problem, Distance-r Independent Unlabeled Multi-Agent Pathfinding: finding a set of collision-free paths between two sets where agents must stay at pairwise distance at least r+1 at all times. This additional constraint, generalizing collision modeling for classical M...
📄 ResearchMay 12, 2026http://arxiv.org/abs/2605.11503v1
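The distance constraint in the abstract can be checked directly on a candidate plan: at every timestep, every pair of agents must be at least r+1 hops apart. This is a minimal validity checker under stated assumptions (unweighted graph as an adjacency dict, a plan given as one tuple of agent positions per timestep); it is not the paper's solver, all names are hypothetical, and it deliberately ignores edge-swap conflicts.

```python
from collections import deque
from itertools import combinations
from typing import Dict, Hashable, List, Sequence

Graph = Dict[Hashable, List[Hashable]]


def bfs_distances(graph: Graph, source: Hashable) -> Dict[Hashable, int]:
    """Shortest-path hop counts from `source` in an unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def satisfies_distance_r(graph: Graph, positions: Sequence[Hashable], r: int) -> bool:
    """True if every pair of agents is at graph distance >= r + 1.

    Unreachable pairs count as infinitely far apart. With r = 0 this reduces
    to the classical vertex-collision rule (no two agents share a vertex).
    """
    for a, b in combinations(positions, 2):
        d = bfs_distances(graph, a).get(b, float("inf"))
        if d < r + 1:
            return False
    return True


def plan_is_valid(graph: Graph, plan: Sequence[Sequence[Hashable]], r: int) -> bool:
    """Check the distance-r constraint at every timestep of a plan."""
    return all(satisfies_distance_r(graph, step, r) for step in plan)
```

On a path graph 0-1-2-3-4, agents at vertices 0 and 2 satisfy the constraint for r = 1 (distance 2) but violate it for r = 2, which shows how raising r shrinks the set of admissible configurations.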
Digital Identity for Agentic Systems: Toward a Portable Authorization Standard for Autonomous Agents
Enterprise AI is shifting from copilots to autonomous agents capable of executing workflows, negotiating outcomes, and making decisions with limited human oversight. As these systems extend across organizational boundaries, identity alone is insufficient: an agent's authority must also be explicit, ...
📄 ResearchMay 12, 2026http://arxiv.org/abs/2605.11487v1
From Model Uncertainty to Human Attention: Localization-Aware Visual Cues for Scalable Annotation Review
High-quality labeled data is essential for training robust machine learning models, yet obtaining annotations at scale remains expensive. AI-assisted annotation has therefore become standard in large-scale labeling workflows. However, in tasks where model predictions carry two independent components...
📄 ResearchMay 12, 2026http://arxiv.org/abs/2605.12303v1
MindMirror: A Local-First Multimodal State-Aware Support System for Digital Workers
Digital workers often experience fatigue, anxiety, reduced attention, and task blockage during prolonged computer-based work. Existing productivity tools mainly focus on task completion, while general-purpose AI chatbots require users to formulate clear prompts before receiving useful help. This pap...
📄 ResearchMay 12, 2026http://arxiv.org/abs/2605.11700v1
A Generative AI Driven Interactive Narrative Serious Game for Stress Relief and Its Randomized Controlled Pilot Study
Background: Stress has become a widespread phenomenon, and serious games are increasingly recognized as engaging tools for stress relief. However, despite the rapid advancement of Generative Artificial Intelligence (Gen-AI), its integration into stress-relief serious games remains insufficiently exp...
📄 ResearchMay 12, 2026http://arxiv.org/abs/2605.11562v1
UNIPO: Unified Interactive Visual Explanation for RL Fine-Tuning Policy Optimization
Reinforcement learning has emerged as a dominant technique for fine-tuning the behavior of large language models, with policy optimization (PO) algorithms such as GRPO, DAPO, and Dr. GRPO appearing in rapid succession to advance state-of-the-art reasoning and alignment performance. However, the modul...
📄 ResearchMay 12, 2026http://arxiv.org/abs/2605.11549v1
Reimagining Assessment in the Age of Generative AI: Lessons from Open-Book Exams with ChatGPT
Generative AI systems such as ChatGPT challenge traditional assumptions about academic assessment by enabling students to generate explanations, code, and solutions in real time. Rather than attempting to restrict AI use, this study investigates how students actually interact with such systems durin...
📄 ResearchMay 12, 2026http://arxiv.org/abs/2605.12363v1
Optimized but Unowned: How AI-Authored Goals Undermine the Motivation They Are Meant to Drive
As AI tools become embedded in productivity and self-improvement contexts, a pressing question emerges: what happens when AI does the goal-setting for us? While large language models can generate goals that are objectively well-formed, the motivational consequences of delegating this cognitively and...
📄 ResearchMay 12, 2026http://arxiv.org/abs/2605.12344v1
COSMIC 1001: Engaging Future Speculation on Space Exploration with Generative AI
Cosmic 1001 is an interactive installation that transforms space exploration history into a speculative news experience. Participants first browse a news-based archive of major space events, then pose future-oriented questions or specify conditions such as year, celestial body, or mission name. In r...
📄 ResearchMay 12, 2026http://arxiv.org/abs/2605.11827v1
Psychological Benefits and Costs of Diversifying Algorithmic Recourse
Algorithmic recourse provides counterfactual action plans that help people overturn unfavorable AI decisions. While diverse recourse sets may improve transparency and motivation, they may also impose cognitive load and negative emotions by increasing counterfactual reasoning demands. To examine this...
📄 ResearchMay 12, 2026http://arxiv.org/abs/2605.11793v1
The Evaluation Differential: When Frontier AI Models Recognise They Are Being Tested
Recent published evidence from frontier laboratories shows that contemporary AI models can recognise evaluation contexts, latently represent them, and behave differently under those contexts than under deployment-continuous conditions. Anthropic's BrowseComp incident, the Natural Language Autoencode...
📄 ResearchMay 12, 2026http://arxiv.org/abs/2605.11496v1
Hedwig: Dynamic Autonomy for Coding Agents Under Local Oversight
Despite coding agents' advances in handling increasingly complex tasks, their continued tendency to introduce unintended edits, subtle bugs, and scope drift that slip past code review means developers must still decide how much autonomy to grant them. However, existing approaches for setting an agen...
📄 ResearchMay 12, 2026http://arxiv.org/abs/2605.11495v1
Proteus: A Self-Evolving Red Team for Agent Skill Ecosystems
Agent skills extend LLM agents with reusable instructions, tool interfaces, and executable code, and users increasingly install third-party skills from marketplaces, repositories, and community channels. Because a skill exposes both executable behavior and context-setting documentation, its deployme...
📄 ResearchMay 12, 2026http://arxiv.org/abs/2605.11891v1
Behavioral Integrity Verification for AI Agent Skills
Agent skills extend LLM agents with privileged third-party capabilities such as filesystem access, credentials, network calls, and shell execution. Existing safety work catches malicious prompts and risky runtime actions, but the skill artifact itself goes unverified. We formalize this as the behavi...
📄 ResearchMay 12, 2026http://arxiv.org/abs/2605.11770v1
Persona-Conditioned Adversarial Prompting: Multi-Identity Red-Teaming for Adversarial Discovery and Mitigation
Automated red-teaming for LLMs often discovers narrow attack slices, missing diverse real-world threats and yielding insufficient data for safety fine-tuning. We introduce Persona-Conditioned Adversarial Prompting (PCAP), which conditions adversarial search on diverse attacker personas (e.g., docto...
📄 ResearchMay 12, 2026http://arxiv.org/abs/2605.11730v1
Cochise: A Reference Harness for Autonomous Penetration Testing
Recent work on LLM-driven autonomous penetration testing reports promising results, but existing systems often combine many architectural, prompting, and tool-integration choices, making it difficult to tell what is gained over a simple agent scaffold. We present cochise, a 597-LOC Python reference ...
📄 ResearchMay 12, 2026http://arxiv.org/abs/2605.11671v1
Every Bit, Everywhere, All at Once: A Binomial Multibit LLM Watermark
With LLM watermarking already being deployed commercially, practical applications increasingly require multibit watermarks that encode more complex payloads, such as user IDs or timestamps, into the generated text. In this work, we propose a fundamentally new approach for multibit watermarking: intr...
📄 ResearchMay 12, 2026http://arxiv.org/abs/2605.11653v1
Convolutional-Neural-Networks for Deanonymisation of I2P Traffic
This study investigates the potential for deanonymizing services within the Invisible Internet Project (I2P) network through passive traffic analysis and machine learning techniques. The primary objective is to identify distinctive patterns in I2P traffic despite the encryption of its payload. To ac...
📄 ResearchMay 12, 2026http://arxiv.org/abs/2605.11606v1
SoK: Unlearnability and Unlearning for Model Dememorization
Advanced model dememorization methods, including availability poisoning (unlearnability) and machine unlearning, are emerging as key safeguards against data misuse in machine learning (ML). At the training stage, unlearnability embeds imperceptible perturbations into data before release to reduce le...
📄 ResearchMay 12, 2026http://arxiv.org/abs/2605.11592v1
FlowSteer: Prompt-Only Workflow Steering Exposes Planning-Time Vulnerabilities in Multi-Agent LLM Systems
Multi-agent systems (MAS) powered by large language models (LLMs) increasingly adopt planner-executor architectures, where planners convert prompts into subtasks, roles, dependencies, and routing paths. This flexibility enables adaptive coordination, but exposes an attack surface in workflow format...
📄 ResearchMay 12, 2026http://arxiv.org/abs/2605.11514v1
CTFusion: A CTF-based Benchmark for LLM Agent Evaluation
Recent advances in Large Language Models (LLMs) have enabled agentic systems for complex, multi-step tasks; cybersecurity is emerging as a prominent application. To evaluate such agents, researchers widely adopt Capture The Flag (CTF) benchmarks. However, current CTF benchmarks reuse existing challe...
📄 ResearchMay 12, 2026http://arxiv.org/abs/2605.11504v1
Can a Single Message Paralyze the AI Infrastructure? The Rise of AbO-DDoS Attacks through Targeted Mobius Injection
Large Language Model (LLM) agents have emerged as key intermediaries, orchestrating complex interactions between human users and a wide range of digital services and LLM infrastructures. While prior research has extensively examined the security of LLMs and agents in isolation, the systemic risk of ...
📄 ResearchMay 12, 2026http://arxiv.org/abs/2605.11442v1
More Than Meets the Eye: A Semantics-Aware Traffic Augmentation Framework for Generalizable Website Fingerprinting
Deep learning-based website fingerprinting has emerged as an effective technique for inferring the websites users visit. Although existing methods achieve strong performance on closed-world datasets, they often fail to generalize to real-world environments, especially under geographic and temporal s...
📄 ResearchMay 12, 2026http://arxiv.org/abs/2605.11402v1
Options, Not Clicks: Lattice Refinement for Consent-Driven MCP Authorization
As Model Context Protocol adoption grows, securing tool invocations through meaningful user consent has become a critical challenge: existing methods, such as broad "always allow" toggles or opaque LLM-based decisions, fail to account for dangerous call arguments and often lead to consent fatigue. In this work...
📄 ResearchMay 12, 2026http://arxiv.org/abs/2605.11360v1
IPI-proxy: An Intercepting Proxy for Red-Teaming Web-Browsing AI Agents Against Indirect Prompt Injection
Web-browsing AI agents are increasingly deployed in enterprise settings under strict whitelists of approved domains, yet adversaries can still influence them by embedding hidden instructions in the HTML pages those domains serve. Existing red-teaming resources fall short of this scenario: prompt-inj...
📄 ResearchMay 12, 2026http://arxiv.org/abs/2605.11868v1
Safety Context Injection: Inference-Time Safety Alignment via Static Filtering and Agentic Analysis
Large Reasoning Models (LRMs) improve performance on complex tasks, but they also make safety control harder at deployment time. In black-box settings, defenders cannot modify model weights and must instead intervene at inference time. This setting creates three practical challenges: harmful intent ...
📄 ResearchMay 12, 2026http://arxiv.org/abs/2605.11664v1
FERMI: Exploiting Relations for Membership Inference Against Tabular Diffusion Models
Diffusion models are the leading approach for tabular data synthesis and are increasingly used to share sensitive records. Whether they actually protect privacy has become a pressing question. Membership inference attacks are the standard tool for this purpose, yet existing attacks assume a single-t...
📄 ResearchMay 12, 2026http://arxiv.org/abs/2605.11527v1
Decaf: Improving Neural Decompilation with Automatic Feedback and Search
Decompilers are reverse-engineering tools used to understand compiled programs. Reconstructing source code from compiled binaries is a challenging task, because high-level syntax, identifiers, and custom data types are generally lost as the compiler translates human-readable code to low...
📄 ResearchMay 12, 2026http://arxiv.org/abs/2605.11501v1
Under the Hood of SKILL.md: Semantic Supply-chain Attacks on AI Agent Skill Registry
Autonomous AI agents increasingly extend their capabilities through Agent Skills: modular filesystem packages whose SKILL.md files describe when and how agents should use them. While this design enables scalable, on-demand capability expansion, it also introduces a semantic supply-chain risk in whic...
📄 ResearchMay 12, 2026http://arxiv.org/abs/2605.11418v1
The SMC Blind Spot: A Failure Mode Analysis of State-of-the-Art Beat Tracking
Over the past two decades, the task of musical beat tracking has transitioned from heuristic onset detection algorithms to highly capable deep neural networks (DNN). Although DNN-based beat tracking models achieve near-perfect performance on mainstream, percussive datasets, the SMC dataset has stubb...
📄 ResearchMay 12, 2026http://arxiv.org/abs/2605.12287v1
Too Good to Be True: A Study on Modern Automatic Speech Recognition for the Evaluation of Speech Enhancement
Speech enhancement (SE) systems are typically evaluated using a variety of instrumental metrics. Using automatic speech recognition (ASR) systems to evaluate SE performance is common in the literature, usually in terms of word error rate (WER). However, WER scores depend heavily on the choice of AS...
📄 ResearchMay 12, 2026http://arxiv.org/abs/2605.12107v1
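WER, the metric at the center of this study, is the word-level edit distance between a reference transcript and an ASR hypothesis, divided by the number of reference words. The minimal sketch below makes explicit why the score depends on the recognizer: the enhanced audio never enters the formula, only the transcript a given ASR system produces from it. The function name is hypothetical, and real toolkits also normalize case and punctuation first, which this sketch skips.

```python
from typing import List


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count,
    computed as a word-level Levenshtein distance."""
    ref: List[str] = reference.split()
    hyp: List[str] = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the reference prefix seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h_word in enumerate(hyp, start=1):
            cost = 0 if r_word == h_word else 1
            curr[j] = min(prev[j] + 1,          # deletion
                          curr[j - 1] + 1,      # insertion
                          prev[j - 1] + cost)   # substitution or match
        prev = curr
    return prev[-1] / len(ref)
```

Scoring the same enhanced utterance with two different ASR systems yields two hypotheses and therefore two WERs, so an unusually strong recognizer can mask differences between SE systems.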
Towards Fine-Grained Multi-Dimensional Speech Understanding: Data Pipeline, Benchmark, and Model
While speech Large Language Models (LLMs) excel at conventional tasks like basic speech recognition, they lack fine-grained, multi-dimensional perception. This deficiency is evident in their struggle to disentangle complex features like micro-acoustic cues, acoustic scenes, and paralinguistic signal...
📄 ResearchMay 12, 2026http://arxiv.org/abs/2605.12036v1
Chunkwise Aligners for Streaming Speech Recognition
We propose the Chunkwise Aligner, a novel architecture for streaming automatic speech recognition (ASR). While the Transducer is the standard model for streaming ASR, its training is costly due to the need to compute all possible audio-label alignments. The recently introduced Aligner reduces this c...
📄 ResearchMay 12, 2026http://arxiv.org/abs/2605.11422v1
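To give a sense of why the abstract calls Transducer training costly, the sketch below counts the monotonic alignments in a standard RNN-T lattice with T frames and U labels. It assumes the common convention in which a label emission advances u at the same frame, a blank advances to the next frame, and every path ends with a final blank from the last node; the helper name is hypothetical, and nothing here describes the paper's Chunkwise Aligner.

```python
from math import comb


def count_transducer_alignments(num_frames: int, num_labels: int) -> int:
    """Count paths through a standard RNN-T lattice (assumed convention).

    Nodes are (t, u) for 0 <= t < T and 0 <= u <= U. Emitting a label moves
    u -> u + 1 at the same frame; emitting blank moves t -> t + 1. Paths start
    at (0, 0) and finish with a forced blank from (T - 1, U).
    """
    if num_frames < 1 or num_labels < 0:
        raise ValueError("need at least one frame and a non-negative label count")
    T, U = num_frames, num_labels
    # paths[t][u] = number of ways to reach node (t, u) from (0, 0)
    paths = [[0] * (U + 1) for _ in range(T)]
    paths[0][0] = 1
    for t in range(T):
        for u in range(U + 1):
            if t > 0:
                paths[t][u] += paths[t - 1][u]  # arrived via a blank
            if u > 0:
                paths[t][u] += paths[t][u - 1]  # arrived via a label
    # The final blank is forced, so it adds no extra choices.
    return paths[T - 1][U]
```

The count equals `comb(T + U - 1, U)`, which grows combinatorially (for 100 frames and 20 labels it is `comb(119, 20)`), so the Transducer loss sums over all alignments with a dynamic program over the T × (U + 1) grid rather than enumerating them; that grid is the cost the abstract refers to.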
STRUM: A Spectral Transcription and Rhythm Understanding Model for End-to-End Generation of Playable Rhythm-Game Charts
We present STRUM (Spectral Transcription and Rhythm Understanding Model), an audio-to-chart pipeline that converts raw recordings into playable Clone Hero / YARG charts for drums, guitar, bass, vocals, and keys without any oracle metadata. STRUM is a multi-stage hybrid: a two-stage CRNN onset detect...
📄 ResearchMay 12, 2026http://arxiv.org/abs/2605.12135v1
Question Difficulty Estimation for Large Language Models via Answer Plausibility Scoring
Estimating question difficulty is a critical component in evaluating and improving large language models (LLMs) for question answering (QA). Existing approaches often rely on readability formulas, retrieval-based signals, or popularity statistics, which may not fully capture the reasoning challenges...
📄 ResearchMay 12, 2026http://arxiv.org/abs/2605.12398v1
Context Convergence Improves Answering Inferential Questions
While Large Language Models (LLMs) are widely used in open-domain Question Answering (QA), their ability to handle inferential questions, where answers must be derived rather than directly retrieved, remains underexplored. This study investigates how the structure and quality of passages influen...
📄 ResearchMay 12, 2026http://arxiv.org/abs/2605.12370v1
Overview of the MedHopQA track at BioCreative IX: track description, participation and evaluation of systems for multi-hop medical question answering
Multi-hop question answering (QA) remains a significant challenge in the biomedical domain, requiring systems to integrate information across multiple sources to answer complex questions. To address this problem, the BioCreative IX MedHopQA shared task was designed to benchmark in multi-hop reasonin...
📄 ResearchMay 12, 2026http://arxiv.org/abs/2605.12313v1
BatchBench: Toward a Workload-Aware Benchmark for Autoscaling Policies in Big Data Batch Processing -- A Proposed Framework
Autoscaling has become a baseline expectation for cloud-native big data processing, and the design space has expanded beyond rule-based heuristics to include learned controllers and, most recently, large language model (LLM) agents. Yet despite a growing body of work spanning these paradigms, the co...
📄 ResearchMay 12, 2026http://arxiv.org/abs/2605.12272v1
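As a reference point for the "rule-based heuristics" that the abstract contrasts with learned controllers and LLM agents, here is a target-tracking rule in the style of the Kubernetes Horizontal Pod Autoscaler's documented formula, desired = ceil(current × currentMetric / targetMetric). The function name, the tolerance band, and the min/max clamps are illustrative assumptions; this is not part of the BatchBench framework.

```python
import math


def target_tracking_replicas(current_replicas: int, current_util: float,
                             target_util: float, tolerance: float = 0.1,
                             min_replicas: int = 1, max_replicas: int = 100) -> int:
    """HPA-style rule-based scaling decision (hypothetical helper).

    Scale proportionally to how far observed utilisation is from the target,
    skip changes inside the tolerance band, then clamp to [min, max].
    """
    if target_util <= 0:
        raise ValueError("target_util must be positive")
    ratio = current_util / target_util
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas  # close enough to target: avoid flapping
    desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))
```

A rule like this reacts only to the current metric, with no notion of a batch job's remaining stages or data volume, which is the kind of workload awareness a benchmark for autoscaling policies would need to exercise.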
Unlocking Crowdsourcing for Ontology Matching Validation
Recent advances in large language models (LLMs) pose new challenges for ontology matching (OM). While OM systems built on LLMs have shown remarkable capabilities in discovering more mappings, traditional OM validation that relies on domain experts has become overwhelming. In this study, we explore t...
📄 ResearchMay 12, 2026http://arxiv.org/abs/2605.12226v1
Design Your Ad: Personalized Advertising Image and Text Generation with Unified Autoregressive Models
Generating realistic and user-preferred advertisements is a key challenge in e-commerce. Existing approaches utilize multiple independent models driven by click-through-rate (CTR) to controllably create attractive image or text advertisements. However, their pipelines lack cross-modal perception and...
📄 ResearchMay 12, 2026http://arxiv.org/abs/2605.12138v1