AI News Archive: May 27, 2026 — Part 13

Sourced from 500+ daily AI sources, scored by relevance.

BatchlyAI
Batch-generate AI images from multi-variable prompts.
🧰 Tools · May 27, 2026 · https://theresanaiforthat.com/ai/batchlyai/
SharePost AI
Automate Your Social Media. Grow Your Business.
🧰 Tools · May 27, 2026 · https://theresanaiforthat.com/ai/sharepost-ai/
FN2 AI
AI agents that watch your portfolio and deliver briefings.
🧰 Tools · May 27, 2026 · https://theresanaiforthat.com/ai/fn2-ai/
The Influencer AI
Create your own consistent AI influencer in minutes.
🧰 Tools · May 27, 2026 · https://theresanaiforthat.com/ai/the-influencer-ai/
Llamaroo
Turn any topic into gamified lessons kids want.
🧰 Tools · May 27, 2026 · https://theresanaiforthat.com/ai/llamaroo/
Explaining is Harder Than Predicting Alone: Evaluating Concept-based Explanations of MLLMs as ICL Visual Classifiers
In-context learning (ICL) enables multimodal large language models (MLLMs) to classify images from a few labelled examples. Yet, how these models use the provided context remains opaque. While Chain-of-Thought prompting is widely used, recent work argues that it may not reflect true internal computa...
📄 Research · May 27, 2026 · http://arxiv.org/abs/2605.28215v1
LegalGraphRAG: Multi-Agent Graph Retrieval-Augmented Generation for Reliable Legal Reasoning
Graph-based Retrieval-Augmented Generation (GraphRAG) advances flat document retrieval by structuring knowledge as relational graphs, enabling more coherent and effective reasoning. However, applying it to specific domains like legal reasoning faces critical challenges. (i) Legal corpora are heterog...
📄 Research · May 27, 2026 · http://arxiv.org/abs/2605.28120v1
Out of Sight, Not Out of Mind: Unveiling Latent Attack in Latent-based Multi-Agent Systems
Latent-based multi-agent systems replace parts of explicit inter-agent communication with hidden representations, offering a new direction for efficient and flexible agent collaboration. However, moving coordination into latent space may also move attacks beyond the reach of visible-text inspection....
📄 Research · May 27, 2026 · http://arxiv.org/abs/2605.28214v1
Long Live the Librarian! A Persistent Search Sub-Agent for Energy-Efficient Multi-Agent Software Engineering Systems
Multi-agent systems (MAS) have substantially advanced autonomous software engineering (SWE), but their growing inference energy demands raise sustainability concerns. In this paper, we demonstrate that this cost is concentrated in an overlooked source: redundant output tokens generated across agents...
📄 Research · May 27, 2026 · http://arxiv.org/abs/2605.27787v1
Not All Uncertainty Is Equal: How Uncertainty Granularity Shapes Human Verification in LLM-Assisted Decision Making
Despite warnings that LLMs can make mistakes, users often develop inappropriate trust and accept incorrect answers without critical evaluation. Uncertainty quantification (UQ), displaying LLMs' confidence, has emerged as a promising approach to calibrate user trust. However, prior empirical studies ...
📄 Research · May 27, 2026 · http://arxiv.org/abs/2605.28571v1
The Decision to Verify: How Warmth and User Characteristics Shape Reliance on Conversational Agents for Information Search
Conversational artificial intelligence (AI) provides an efficient and convenient gateway to information access. However, it can cause overreliance when users blindly trust AI and accept its answers without fact-checking. Information search increasingly follows a hybrid interaction paradigm that comb...
📄 Research · May 27, 2026 · http://arxiv.org/abs/2605.28498v1
AI, Take the Wheel: What Drives Delegation and Trust in Human-Computer Cooperative Question Answering?
AI systems are fallible, and humans can make mistakes in deciding whether to trust AI over their own judgment. Thus, improving human-AI collaboration requires understanding when, why, and how humans decide to rely on AI. We study two distinct reliance decisions: the delegation choice -- deciding whe...
📄 Research · May 27, 2026 · http://arxiv.org/abs/2605.28255v1
Building Community-Centred NLP Resources for Puno Quechua
The preservation of under-resourced languages requires digital tools and resources shaped by and for their speakers. We present the first dedicated ASR resources for Puno Quechua (ISO 639-3: qxp): (1) the largest speech corpus for any single Quechua variety, consisting of 66 hours of recordings for ...
📄 Research · May 27, 2026 · http://arxiv.org/abs/2605.28253v1
Why Meditation Wearables Fail: Reward Misspecification in Closed-Loop EEG and Biofeedback Systems
Consumer EEG headbands, HRV biofeedback devices, and closed-loop neurostimulation systems share a fundamental design flaw: they reward measurable proxy signals rather than the outcomes they claim to produce. When a user optimises for calm EEG, HRV coherence, or breathing resonance, their brain learn...
📄 Research · May 27, 2026 · http://arxiv.org/abs/2605.28223v1
SmartIterator: Visual Analytics Workflows for Supervising Unsupervised Data Grouping
Unsupervised learning methods -- topic modeling, partition-based and density-based clustering -- produce data groupings without human guidance, yet choosing and evaluating those groupings should not itself be unsupervised. We present SmartIterator (SI), a visual analytics approach that treats...
📄 Research · May 27, 2026 · http://arxiv.org/abs/2605.28219v1
The Illusion of Opting in AI-Mediated Consequential Decisions
Drawing on Ullmann-Margalit's concept of opting (transformative, irrevocable, and shadowed by foreclosed alternatives), we show that current AI systems raise a profound ethical problem that existing AI ethics has not fully captured: the illusion of opting, in which persons and groups encounter the d...
📄 Research · May 27, 2026 · http://arxiv.org/abs/2605.28210v1
I Hear, Therefore I Trust: A Socio-Technical Investigation of Humans as Synthetic Speech Detectors
Automatic deepfake detection has received considerable research attention, yet the socio-technical environment in which humans actually encounter synthetic speech remains poorly understood. We investigate voice deepfake detection as a perceptual and contextual process, presenting a localization task...
📄 Research · May 27, 2026 · http://arxiv.org/abs/2605.28064v1
Learning to Assign Prediction Tasks to Agents with Capacity Constraints
We address the problem of learning to assign prediction tasks to one agent from a set of available human or AI agents. In particular, we focus on the sequential learning of agent expertise and assignment policies where each agent is constrained to handle a fraction of tasks. We provide a general the...
📄 Research · May 27, 2026 · http://arxiv.org/abs/2605.27999v1
EyeSpy: Inferring Eye Gaze via Side-Channel Attacks Against Foveated Rendering
While eye tracking provides valuable capabilities for virtual reality, such as gaze interaction and dynamic foveated rendering (DFR), eye-tracking data can inadvertently reveal sensitive user information if not properly protected. Current protections, such as adding permission prompts or gatekeeping...
📄 Research · May 27, 2026 · http://arxiv.org/abs/2605.27939v1
Show, Don't TELL: Explainable AI-Generated Text Detection
Research on AI-generated text detection has presented a number of approaches to discern human from AI prose, some of which achieve high in-distribution performance. However, real-world applicability has stalled because their outputs are misaligned with the needs of users, such as professors, who a...
📄 Research · May 27, 2026 · http://arxiv.org/abs/2605.27921v1
Refusal Before Decoding: Detecting and Exploiting Refusal Signals in Intermediate LLM Activations
In this paper, we investigate whether refusal behavior can be predicted from LLM intermediate activations before decoding using linear probes trained on residual stream activations at each transformer block. We find that refusal is linearly decodable well before the final layer, indicating that safe...
📄 Research · May 27, 2026 · http://arxiv.org/abs/2605.28553v1
Towards Cybersecurity SuperIntelligence (CSI): What's the best harness for cybersecurity?
What is the best harness for cybersecurity AI? Cybersecurity systems are converging on a single execution scaffold per agent, an iterative shell loop driven by a Large Language Model (LLM). However, scaffolds are not interchangeable, rarely interoperable, and no single scaffold dominates across all ...
📄 Research · May 27, 2026 · http://arxiv.org/abs/2605.28334v1
Cybersecurity AI (CAI) Dataset
We present CAI Dataset, a fourteen-month corpus of cybersecurity LLM trajectories collected through the open-source CAI agent framework, built in response to PentestGPT's finding that expert operator trajectories, not base-model capability, are the bottleneck for cybersecurity LLM performance. CAI D...
📄 Research · May 27, 2026 · http://arxiv.org/abs/2605.28146v1
A Wolf in Sheep's Clothing: Targeted Routing Hijacking in Federated RAG
Federated Retrieval-Augmented Generation (FedRAG) is attractive for privacy-sensitive applications because raw data remain local. As a result, routing must rely on client-provided semantic profiles, creating a new opportunity for manipulation. We introduce Routing Hijacking, a routing-stage attack i...
📄 Research · May 27, 2026 · http://arxiv.org/abs/2605.28112v1
SilentRetrieval: Hijacking Retrieval-Augmented Generation via Semantically-Preserving Adversarial Data Poisoning
Retrieval-Augmented Generation (RAG) mitigates LLM hallucinations but introduces a critical vulnerability: corpus integrity. We present SilentRetrieval, a two-stage data poisoning attack that hijacks RAG systems through adversarially crafted yet fluent documents. Stage 1 uses Coordinated Beam Search...
📄 Research · May 27, 2026 · http://arxiv.org/abs/2605.28074v1
AgentGuard: An Attribute-Based Access Control Framework for Tool-Use LLM-Based Agent
LLM-based agents have recently attracted significant attention due to their ability to autonomously invoke relevant tools to accomplish complex tasks. However, recent studies have shown that these agents face severe security risks, which may lead to privacy leakage, financial loss, or even full syst...
📄 Research · May 27, 2026 · http://arxiv.org/abs/2605.28071v1
SPARD: Defending Harmful Fine-Tuning Attack via Safety Projection with Relevance-Diversity Data Selection
Fine-tuning large language models often undermines their safety alignment, a problem further amplified by harmful fine-tuning attacks in which adversarial data removes safeguards and induces unsafe behaviors. We propose SPARD, a defense framework that integrates Safety-Projected Alternating optimiza...
📄 Research · May 27, 2026 · http://arxiv.org/abs/2605.28030v1
Can It Reach the Generator? Investigating the Survival of Prompt-Injection Attacks in Realistic RAG Settings
Recent generative engine optimisation (GEO) research has shown that prompt-injection attacks can push a target product to the top of an LLM's recommendation list, with the strongest attacks reporting around 80% success and raising serious security concerns about RAG-based recommendation. However,...
📄 Research · May 27, 2026 · http://arxiv.org/abs/2605.28017v1
When Think-with-Image Meets Safety: What Determines Multimodal Jailbreak Robustness?
Think-with-image reasoning is emerging as a new inference paradigm for large vision-language models, but its safety implications remain poorly understood. Existing systems already span multiple process designs, including direct response generation, text-only prior turn, visual-state manipulation, an...
📄 Research · May 27, 2026 · http://arxiv.org/abs/2605.27932v1
MRMMIA: Membership Inference Attacks on Memory in Chat Agents
Membership inference attacks (MIAs) test whether a target data record belongs to a system's private data, and have become a standard tool to measure privacy leakage in machine learning systems. Prior work has primarily focused on training corpora or retrieval databases. However, MIAs against agent m...
📄 Research · May 27, 2026 · http://arxiv.org/abs/2605.27825v1
Disentangling Adversarial Prompts: A Semantic-Graph Defense for Robust LLM Security
Large Language Models (LLMs) are increasingly vulnerable to adversarial prompts that exploit semantic ambiguities to bypass safety mechanisms, resulting in harmful or inappropriate outputs. Such attacks, including jailbreaking and prompt injection, pose significant risks to the integrity and availab...
📄 Research · May 27, 2026 · http://arxiv.org/abs/2605.27823v1
Density-aware Sample-specific Attack
Despite recent progress in backdoor attacks, existing methods remain susceptible to post-training defenses that erase the backdoor through fine-tuning or pruning. We revisit the core objectives of backdoor attacks and derive principled criteria characterizing optimal sample-specific trigger construc...
📄 Research · May 27, 2026 · http://arxiv.org/abs/2605.27809v1
Revisiting ML Training under Fully Homomorphic Encryption: Convergence Guarantees, Differential Privacy, and Efficient Algorithms
We present the first theoretical convergence analysis of machine learning training under fully homomorphic encryption (FHE), combined with a differentially private (DP) training algorithm tailored to encrypted computation. Our approach improves computational efficiency over standard differentially p...
📄 Research · May 27, 2026 · http://arxiv.org/abs/2605.27782v1
Technical Report: Exploring the Emerging Threats of the Agent Skill Ecosystem
We analyzed 3,984 AI agent skills from major marketplaces and found 76 confirmed malicious payloads, including credential theft, backdoor installation, and data exfiltration. 13.4% of all skills contain at least one critical-level security issue and at least 8 manually confirmed malicious skills rem...
📄 Research · May 27, 2026 · http://arxiv.org/abs/2605.28588v1
SNARE: Adaptive Scenario Synthesis for Eliciting Overeager Behavior in Coding Agents
A coding agent executes a benign task as a sequence of shell, file, and network actions, any of which can quietly exceed the authorized scope while the task still completes. We call this overeager behavior: the prompt is not adversarial and the run succeeds, yet an out-of-scope step can leak credent...
📄 Research · May 27, 2026 · http://arxiv.org/abs/2605.28122v1
MIRAGE: Context-Aware Prompt Injection against Mobile GUI Agents via User-Generated Content
Mobile graphical user interface (GUI) agents driven by vision-language models (VLMs) perceive the screen as rendered pixels and choose actions from what they see, so they cannot reliably separate trusted interface elements from user-generated content. We present MIRAGE (Mobile Injection of Realistic...
📄 Research · May 27, 2026 · http://arxiv.org/abs/2605.28116v1
An Empirical Audit of k-NAF Budget Accounting for Anchored Decoding
We empirically audit the k-NAF budget-accounting mechanism in Anchored Decoding using (i) a fixed, class-stratified workload (approximately 8,500 randomized executions across six prompt classes) and (ii) an adaptive prompt-search procedure targeting high proxy spend ratios. On the fixed workload, me...
📄 Research · May 27, 2026 · http://arxiv.org/abs/2605.28001v1
Patchlings: Safety-Preserving Flash-Based Hotpatching for Automotive Microcontrollers
The increasing presence of software in modern automobiles has created a growing need to deliver software updates throughout a vehicle's entire lifespan. Traditional update methods are slow and require months of re-validation to comply with stringent safety standards like ISO 26262. Although hotpatch...
📄 Research · May 27, 2026 · http://arxiv.org/abs/2605.27804v1
Do LLMs Favor Their Providers? Measuring Vertical Integration Bias in Code Generation
Large Language Models (LLMs) have become an integral part of software development, especially with the advent of agentic capabilities. Yet, many frontier LLMs are affiliated with specific providers. This raises the question of whether generated code favors the provider's own ecosystem over comparabl...
📄 Research · May 27, 2026 · http://arxiv.org/abs/2605.28515v1
Efficient and Scalable Provenance Tracking for LLM-Generated Code Snippets
Large language models (LLMs) for code completion and generation are increasingly used in software development, yet they may reproduce training examples verbatim and without authorship attribution, raising legal and ethical concerns around plagiarism and license compliance. Classical fingerprint-base...
📄 Research · May 27, 2026 · http://arxiv.org/abs/2605.28510v1
From paper to benchmark: agentic, framework-based reproduction of under-specified methods in machine health intelligence
Industrial Prognostics and Health Management (PHM) provides a representative case study for a broader challenge in applied machine learning: translating published papers into executable, benchmark-ready implementations. Reproducing under-specified methods in PHM is particularly difficult due to rest...
📄 Research · May 27, 2026 · http://arxiv.org/abs/2605.28371v1
Multi-Agent LLM-based Metamorphic Testing for REST APIs
As REST APIs become an increasingly significant part of software systems, their validation is becoming more critical. Hence, testing and uncovering underlying issues are of utmost importance for improving software quality. However, testing REST APIs is challenging mainly due to the difficulty of ass...
📄 Research · May 27, 2026 · http://arxiv.org/abs/2605.28321v1
DeltaMCP: Incremental Regeneration via Spec-Aware Transformation for MCP servers
The rapid development of LLMs coupled with the introduction of Model Context Protocol (MCP) has revolutionized how intelligent agents interact with APIs through deterministic and structured methods. While some existing systems like AutoMCP attempt to automate a p...
📄 Research · May 27, 2026 · http://arxiv.org/abs/2605.28148v1
Beyond pass@k: Redundancy-Aware RLVR for Multi-Sample Code Generation
LLMs for code generation are commonly evaluated in repeated-sampling settings using Pass@k, where multiple candidate programs are executed against unit tests under a finite sampling budget. While recent verifier-based reinforcement learning (RLVR) methods improve executable correctness, how these ob...
📄 Research · May 27, 2026 · http://arxiv.org/abs/2605.28022v1
Tool Forge: A Validation-Carrying Toolchain for Governed Agentic Execution
Large language model agents are increasingly expected to perform operational work: calling APIs, manipulating files, assembling workflows, and acting inside enterprise systems. Yet the tool layer on which this execution depends is still commonly treated as either a hand-written integration artifact ...
📄 Research · May 27, 2026 · http://arxiv.org/abs/2605.28000v1
Confident Learning-based Network for Detecting Bug-Inducing Commits on SZZ with Noisy Labels
The Just-In-Time (JIT) defect prediction model serves as a critical tool for ensuring the quality of software development and enhancing software performance. It assists development teams in promptly identifying and addressing potential issues by predicting whether code submissions may introduce defe...
📄 Research · May 27, 2026 · http://arxiv.org/abs/2605.27880v1
Comprehensive Benchmarking of Long-Form Speech Generation in Diverse Scenarios
Recent advances in speech generation have enabled high-fidelity synthesis, yet systematic evaluation of models under long-context conditions remains largely underexplored. A comprehensive evaluation benchmark for long-form speech is indispensable for two reasons: 1) existing test scenarios are often...
📄 Research · May 27, 2026 · http://arxiv.org/abs/2605.28618v1
Audio-Mind: An Auditable Agentic Framework for Audio Understanding
Audio agents extend large audio-language models (LALMs) by decomposing audio questions into tool calls, intermediate evidence, and iterative reasoning steps. However, as LALMs become stronger, the key challenge shifts from enabling tool use to determining when agentic evidence acquisition genuinely ...
📄 Research · May 27, 2026 · http://arxiv.org/abs/2605.28480v1
LoSATok: Low-dimensional Semantic-Acoustic Tokenizer for Cross-Domain Audio Understanding and Generation
Audio tokenizers are fundamental to unifying audio understanding and generation. Understanding requires high-level semantics, while generation demands semantic and acoustic details. Existing unified tokenizers jointly encode both in high-dimensional continuous latents, which increases the modeling b...
📄 Research · May 27, 2026 · http://arxiv.org/abs/2605.27840v1
Subtraction Gets You More: Gap-Aware Retrieval for Multimodal Multi-Hop QA
In multimodal multi-hop question answering, we focus on the initial retrieval stage via two distinct tasks: (1) evidence set completion, retrieving missing evidence given context, and (2) sequential pool construction, iteratively building the top-K pool from scratch. Under these settings, we p...
📄 Research · May 27, 2026 · http://arxiv.org/abs/2605.28641v1