AI News Archive: May 19, 2026 — Part 18

Sourced from 500+ daily AI sources, scored by relevance.

Adryxa AI Studio
AI-powered workflows and content automation
🧰 Tools May 19, 2026 https://www.producthunt.com/products/adryxa-ai-studio?utm_campaign=producthunt-api&utm_medium=api-v2&utm_source=Application%3A+the500feed+%28ID%3A+283491%29
LLM Agents Make Collective Belief Dynamics Programmable: Challenges and Research Directions
Classical models of opinion dynamics assume human participants with bounded rationality and limited coordination. The rise of LLM-based agents introduces a qualitative shift: agents can now participate in online discussions at scale, maintain consistent persuasion strategies, and coordinate systemat...
📄 Research May 19, 2026 http://arxiv.org/abs/2605.19915v1
Memory-Augmented Reinforcement Learning Agent for CAD Generation
Automatic generation of computer-aided design (CAD) models is a core technology for enabling intelligence in advanced manufacturing. Existing generation methods based on large language models (LLMs) often fall short when handling complex CAD models characterized by long operation sequences, diverse ...
📄 Research May 19, 2026 http://arxiv.org/abs/2605.19748v1
EngiAI: A Multi-Agent Framework and Benchmark Suite for LLM-Driven Engineering Design
Large Language Model (LLM) agents are increasingly applied to engineering design tasks, yet existing evaluation frameworks do not adequately address multi-agent systems that combine simulation, retrieval, and manufacturing preparation. We introduce a benchmark suite with three evaluation dimensions:...
📄 Research May 19, 2026 http://arxiv.org/abs/2605.19743v1
STAR-PólyaMath: Multi-Agent Reasoning under Persistent Meta-Strategic Supervision
Frontier AI models and multi-agent systems have led to significant improvements in mathematical reasoning. However, for problems requiring extended, long-horizon reasoning, existing systems continue to suffer from fundamental reliability issues: hallucination accumulation, memory fragmentation, and ...
📄 Research May 19, 2026 http://arxiv.org/abs/2605.19338v1
AQuaUI: Visual Token Reduction for GUI Agents with Adaptive Quadtrees
Large Multimodal Models (LMMs) have recently emerged as promising backbones for GUI-agent models, where high-resolution GUI screenshots are introduced to the prompts at each iteration step. However, these screenshots exhibit highly non-uniform spatial information density: large regions may carry lit...
📄 Research May 19, 2026 http://arxiv.org/abs/2605.19260v1
CASPIAN: Online Detection and Attribution of Cascade Attacks in LLM Multi-Agent Systems via Cross-Channel Causal Monitoring
Cascade attacks in LLM multi-agent systems (MAS) arise when adversarial influence propagates across agents and leads to escalated system-level failures through complex agent interactions. Detecting such cascades is challenging, as their signals are distributed, tightly coupled across interaction cha...
📄 Research May 19, 2026 http://arxiv.org/abs/2605.19240v1
PAVE: A Cognitive Architecture for Legitimate Violation in Generative Agent Societies
Generative agents based on large language models reproduce believable human behavior in cooperative settings, but how they should reason in situations where rule-breaking may be required, such as fire evacuation or authority-supervised emergency, remains poorly characterized. We propose PAVE (Percep...
📄 Research May 19, 2026 http://arxiv.org/abs/2605.19351v1
AffectAI-Capture: A Reproducible Multimodal Protocol for Small-Group Meeting Research
We present AffectAI-Capture, a protocol for collecting synchronized multimodal data in four-person meeting-like interactions, combining eye tracking, wearable physiology, close-talk and room audio, multi-view video, event logging, and structured self-report. Sessions use fixed task blocks grounded i...
📄 Research May 19, 2026 http://arxiv.org/abs/2605.19794v1
TombWriter: Scaffolding Story Archeology through Beat-Level Interaction in Human-AI Co-Writing
The dominant paradigm for LLM interaction in AI co-writing uses disposable prompts that vanish after use. This may lead to imprecise results, cumbersome workflows, and diminished author agency and ownership. We propose LLM-based story archeology, where prompts serve as a hierarchical story instrumen...
📄 Research May 19, 2026 http://arxiv.org/abs/2605.19681v1
The Accessibility Capability Boundary: Operational Limits and Expansion Potential of AI-Generated Browser-Native Accessibility Systems
As large language models (LLMs) demonstrate increasing competence in synthesizing functional user interfaces, a fundamental question emerges in accessibility computing: how far can AI-driven accessibility systems go? This paper introduces the Accessibility Capability Boundary (ACB)...
📄 Research May 19, 2026 http://arxiv.org/abs/2605.19638v1
Toward User Comprehension Supports for LLM Agent Skill Specifications
Users often interpret and select agent skills through their SKILL.md specifications. To protect users, existing audits mainly focus on malicious or unsafe skills. We study the complementary question of whether specifications help users form bounded expectations about what a skill consumes, ...
📄 Research May 19, 2026 http://arxiv.org/abs/2605.19362v1
From Role to Person: Trust Calibration Challenges in Twin Agents
Agentic AI has taken on the role of assistant, collaborator, and decision-support tool. We argue the next role on that list is more personal: you. These are digital twins of each individual -- twin agents -- representing their knowledge, perspective, and communicative style to colleagues when they a...
📄 Research May 19, 2026 http://arxiv.org/abs/2605.19838v1
Material for Thought: Generative AI as an Active Creative Medium
Human-AI collaboration research has largely positioned the human as a judge of AI output, centering effort on evaluating whether recommendations are reliable enough to accept. This decision-support framing leaves little room for the human as creator. We argue that for creative work, this framing m...
📄 Research May 19, 2026 http://arxiv.org/abs/2605.19832v1
CutVerse: A Compositional GUI Agents Benchmark for Media Post-Production Editing
While GUI agents have made significant progress in web navigation and basic operating system tasks, their capabilities in professional creative workflows remain largely underexplored. To bridge this gap, we introduce CutVerse, a benchmark designed to systematically evaluate autonomous GUI agents in ...
📄 Research May 19, 2026 http://arxiv.org/abs/2605.19484v1
Platform architecture determines whether recommendation algorithms can shape information quality on social media
Social media platforms shape public discourse through two fundamental design choices that naturally co-occur in any field investigation: platform architecture, which defines what types of actors exist and how they interact, and recommendation algorithm, which determines what content is surfaced to u...
📄 Research May 19, 2026 http://arxiv.org/abs/2605.19204v1
BiRD: A Bidirectional Ranking Defense Mechanism for Retrieval Augmented Generation
The growing adoption of Retrieval-Augmented Generation (RAG) has led to a rise in adversarial attacks. Existing defenses, relying on semantic analysis or voting, face a trade-off between high computational cost and limited robustness under strong poisoning attacks. Their fundamental limitation is th...
📄 Research May 19, 2026 http://arxiv.org/abs/2605.20123v1
Awakening the Hydra: Stabilizing Multi-Concept Backdoor Injection in Text-to-Image Diffusion Models
Text-to-image diffusion models are increasingly developed through open-source reuse and repeated downstream fine-tuning, where reused checkpoints are difficult to verify and thus more susceptible to hidden backdoor behaviors. In such ecosystems, a single pretrained model may be sequentially adapted ...
📄 Research May 19, 2026 http://arxiv.org/abs/2605.19698v1
Exposing Functional Fusion: A New Class of Strategic Backdoor in Dynamic Prompt Architectures
Existing ViT backdoor attacks based on backbone-overwriting full-tuning are computationally expensive and inflict performance degradation. This has forced adversaries towards the Visual Parameter-Efficient Fine-Tuning (PEFT) paradigm, dominated by adapter-based (e.g., LoRA) and prompt-based (e.g., V...
📄 Research May 19, 2026 http://arxiv.org/abs/2605.19478v1
XAI FL-IDS: A Federated Learning and SHAP-Based Explainable Framework for Distributed Intrusion Detection Systems
An Intrusion Detection System (IDS) is vital in cybersecurity, detecting unauthorized activity across networks. With attacks on network layers increasing, stronger IDSs are needed. Yet most IDSs rely on centralized detection, forcing IoT nodes to ship data to a server, adding overhead and offering n...
📄 Research May 19, 2026 http://arxiv.org/abs/2605.19448v1
Exploring and Developing a Pre-Model Safeguard with Draft Models
Large Language Model (LLM) alignment remains vulnerable to jailbreak attacks that elicit unsafe responses, motivating pre-model and post-model guards. Pre-model guards audit the safety of prompts before invoking target models. However, relying solely on the prompt often leads to high false-negative ...
📄 Research May 19, 2026 http://arxiv.org/abs/2605.19321v1
Backdooring Masked Diffusion Language Models
Masked diffusion language models (MDLMs) are emerging as a compelling new paradigm for text generation, but their training-time security remains largely unexplored. Existing backdoor attacks on Gaussian diffusion models or autoregressive language models do not directly apply to MDLMs because MDLMs r...
📄 Research May 19, 2026 http://arxiv.org/abs/2605.19262v1
Detecting and Mitigating Backdoor Attacks in OTA-FL Systems: A Two-Stage Robust Aggregation Scheme
Over-the-air federated learning (OTA-FL) improves communication efficiency by exploiting the superposition property of wireless channels, but this same property also creates a critical security vulnerability: the parameter server (PS) cannot access individual local updates, making it difficult to id...
📄 Research May 19, 2026 http://arxiv.org/abs/2605.19253v1
Quantum Machine Learning for Cyber-Physical Anomaly Detection in Unmanned Aerial Vehicles: A Leakage-Free Evaluation with Proxy-Audited Feature Sets
Unmanned aerial vehicles (UAVs) are cyber-physical systems whose attack surface spans networked avionics and on-board sensor fusion: a compromised GPS or battery module can mimic a benign mission segment and evade naive anomaly detectors. We present a leakage-free evaluation of quantum machine learn...
📄 Research May 19, 2026 http://arxiv.org/abs/2605.19233v1
Token by Token, Compromised: Backdoor Vulnerabilities in Unified Autoregressive Models
Unified autoregressive models (UAMs) are transformer models that generate text as well as image tokens within a single autoregressive pass. Shared parameters and a multimodal vocabulary simplify the training pipeline and facilitate flexible multimodal generation, yet might introduce new vulnerabilit...
📄 Research May 19, 2026 http://arxiv.org/abs/2605.19227v1
Hunting Vulnerability Variants in AI Infra: Measurement and Reference-Driven Detection
AI infra has become a shared execution layer for model training, deployment, and agent orchestration. Because many projects reimplement similar model-centric workflows, a vulnerability disclosed in one repository can recur as a variant in another repository with a related design. Yet the prevalence ...
📄 Research May 19, 2026 http://arxiv.org/abs/2605.20051v1
DASM: Domain-Aware Sharpness Minimization for Multi-Domain Voice Stream Steganalysis
The growing use of information hiding in network streaming media for covert communication poses a significant security threat, necessitating the development of robust detection technologies. However, existing steganalysis methods for network voice streams mostly rely on data distributions in specifi...
📄 Research May 19, 2026 http://arxiv.org/abs/2605.19955v1
Measuring Safety Alignment Effects in Autonomous Security Agents
Do stock safety-aligned language models and their uncensored or abliterated derivatives behave differently when run as autonomous security agents? Single-turn refusal benchmarks cannot answer this question: security agents must inspect repositories, call tools, and produce vulnerability evidence ins...
📄 Research May 19, 2026 http://arxiv.org/abs/2605.19722v1
SCARA: A Semantics-Constrained Autonomous Remediation Agent for Opaque Industrial Software Vulnerabilities
Critical-infrastructure operators are increasingly expected to assess and remediate vulnerabilities in deployed industrial software. However, much of this software exists as opaque industrial software (OIS), including stripped firmware, proprietary protocol handlers, and compiled control logic witho...
📄 Research May 19, 2026 http://arxiv.org/abs/2605.19668v1
Inferring Sensitive Attributes from Knowledge Graph Embeddings: Attack and Defense Strategies
Knowledge Graphs (KGs) are a powerful representation of linked data, offering flexibility, semantic richness, and support for knowledge enrichment and reasoning. They help data owners organize and exploit heterogeneous data to provide insightful services (e.g., recommendations), yet real-world KGs a...
📄 Research May 19, 2026 http://arxiv.org/abs/2605.19644v1
Devilray: A Systematic Adversarial Model Revealing Blind Spots in Fake Base Station Detection
Fake Base Station (FBS) detection has been a critical focus of cellular security research for over two decades. However, significant financial and regulatory barriers to accessing commercial FBS (C-FBS) devices have limited direct visibility into real-world operations, forcing detection systems to b...
📄 Research May 19, 2026 http://arxiv.org/abs/2605.19232v1
Can LLMs Produce Better Object-Oriented Designs than Human-Involved Development?
Background: Large Language Models (LLMs) are increasingly used for code generation. However, their ability to generate multi-class projects that require object-oriented design (OOD) remains unclear, especially relative to projects developed with human involvement. Aims: The primary objective of this...
📄 Research May 19, 2026 http://arxiv.org/abs/2605.19901v1
Prior Knowledge or Search? A Study of LLM Agents in Hardware-Aware Code Optimization
LLM discovery and optimization systems are increasingly applied across domains, implementing a common propose-evaluate-revise loop. Such optimization or discovery progresses via context conditioning on received feedback from an environment. However, as modern LLM agents are increasingly complex in t...
📄 Research May 19, 2026 http://arxiv.org/abs/2605.19782v1
CriterAlign: Criterion-Centric Rationale Alignment for Code Preference Judging
Pairwise human preference prediction is central to evaluating code-generation systems, where quality often depends on task-specific trade-offs beyond functional correctness. While rubric-based LLM judges improve interpretability by decomposing evaluation into explicit criteria, most existing pipelin...
📄 Research May 19, 2026 http://arxiv.org/abs/2605.19665v1
Characterizing Real-World Bugs in Tile Programs for Automated Bug Detection
Tile-based programming frameworks are increasingly adopted to write high-performance GPU kernels in domains such as deep learning and scientific computing. While these frameworks enhance productivity and hardware utilization, their multi-stage compilation pipelines introduce distinct code generation...
📄 Research May 19, 2026 http://arxiv.org/abs/2605.19652v1
Provable Fairness Repair for Deep Neural Networks
Deep neural networks (DNNs) are suffering from ethical issues such as individual discrimination. In response, extensive NN repair techniques have been developed to adjust models and mitigate such undesired behaviors. However, existing fairness repair methods are typically data-centric, which often l...
📄 Research May 19, 2026 http://arxiv.org/abs/2605.19549v1
MOCHA: Multi-Objective Chebyshev Annealing for Agent Skill Optimization
LLM agents organize behavior through skills - structured natural-language specifications governing how an agent reasons, retrieves, and responds. Unlike monolithic prompts, skills are multi-field artifacts subject to hard platform constraints: description fields are truncated for routing, instructio...
📄 Research May 19, 2026 http://arxiv.org/abs/2605.19330v1
Does Code Cleanliness Affect Coding Agents? A Controlled Minimal-Pair Study
As autonomous coding agents see rapid adoption, their evaluation has primarily focused on task completion rates holding the target codebase fixed. This leaves a critical question unanswered: does the structural and stylistic quality, or "cleanliness," of the underlying code affect an agent's abilit...
📄 Research May 19, 2026 http://arxiv.org/abs/2605.20049v1
OpenComputer: Verifiable Software Worlds for Computer-Use Agents
We present OpenComputer, a verifier-grounded framework for constructing verifiable software worlds for computer-use agents. OpenComputer integrates four components: (1) app-specific state verifiers that expose structured inspection endpoints over real applications, (2) a self-evolving verification l...
📄 Research May 19, 2026 http://arxiv.org/abs/2605.19769v1
When to Answer and When to Defer: A Decision Framework for Reliable Code Predictions
Code language models are increasingly adopted for both understanding and generative tasks. Despite their success, these models frequently produce overconfident incorrect predictions and underconfident correct predictions, undermining their reliability in deployment. Practical deployment demands thre...
📄 Research May 19, 2026 http://arxiv.org/abs/2605.19369v1
On-the-Fly Input Adaptation for Reliable Code Intelligence
Code language models (CLMs) play a central role in software engineering across both generation and classification tasks. However, these models still exhibit notable mispredictions in real-world applications, even when trained on up-to-date data. Existing solutions address this by retraining the mode...
📄 Research May 19, 2026 http://arxiv.org/abs/2605.19365v1
MuMuTestUp: Mutation-based Multi-Agent Test Case Update
Modern software systems evolve rapidly under CI/CD practices, where tests are critical for quality. However, substantial code changes often render existing test cases obsolete, causing pipeline disruptions, reduced productivity, and compromised quality. Recent automatic test update approaches levera...
📄 Research May 19, 2026 http://arxiv.org/abs/2605.19265v1
When Web Apps Heal Themselves: A MAPE-K Based Approach to Fault Tolerance and Adaptive Recovery
Ensuring the reliability and resilience of modern web applications remains a critical challenge due to increasing system complexity and dynamic runtime environments. This study proposes a modular self-healing framework based on the monitor-analyze-plan-execute over a shared knowledge base (MAPE-K) m...
📄 Research May 19, 2026 http://arxiv.org/abs/2605.19261v1
Fast Multichannel NMF with Block-Diagonal Spatial Covariance Matrices for Efficient Blind Source Separation Using Distributed Microphone Arrays
Distributed microphone arrays composed of multiple subarrays enable blind source separation over a wide spatial area. Directly applying fast multichannel nonnegative matrix factorization (FastMNMF) to all subarrays can exploit observations from all subarrays, but it requires repeated inversions of l...
📄 Research May 19, 2026 http://arxiv.org/abs/2605.19388v1
Divergence Meets Consensus: A Multi-Source Negative Sampling Framework for Sequential Recommendation
Negative sampling is significant for training sequential recommendation models under implicit feedback. The predominant strategy, self-guided hard negative sampling, selects negatives based on the model's current state but suffers from three limitations: (1) the coupling between sampling and model u...
📄 Research May 19, 2026 http://arxiv.org/abs/2605.19651v1
Understanding Wacky Weights: A Dissection of SPLADE's Learned Term Importance
Learned sparse retrieval models such as SPLADE combine the effectiveness of neural architectures with the efficiency of inverted indices. As these models assign weights to terms from a fixed vocabulary, interpretability is often touted as a major benefit of these models. However, the emergence of wa...
📄 Research May 19, 2026 http://arxiv.org/abs/2605.19628v1
The cost of efficiency in flexible neural representations
Working memory depends on the flexible representation of stimulus information in neural activity, which changes dynamically depending on task. Stimulus transformations are thought to be efficient in use of neural resources and optimal for task performance. However, these transformations are often opaque, and efficiency may conflict with optimal performance. Here we show that in a working memory task requiring selective recall of one of two stimuli based on a context cue, the prefrontal cortex of two male monkeys prioritized efficiency by overwriting information within a shared neural subspace rather than maintaining distinct subspaces for each stimulus. In neural activity and recurrent neural networks, such efficiency incurs a cost, in that efficient representations are more prone to errors. Conversely, stimulation of the cholinergic forebrain, which improves behavior, altered this default mechanism by encoding distinct contexts in higher dimensions. These findings demonstrate a fundament
📄 Research May 19, 2026 https://www.biorxiv.org/content/10.64898/2026.05.18.722885v1?rss=1
Developing a multi-modal neuroimaging-based BrainAge model across childhood
BrainAge models hold promise as a clinical biomarker for developmental brain health, especially in childhood when there is the potential for early intervention. To distinguish between normative developmental variance and pathological divergence, BrainAge models should reflect the dynamic and diverse neurodevelopmental processes that occur in distinct developmental windows across childhood. We utilized multi-modal neuroimaging data from three pediatric cohorts covering ages 4 to 13 years (n = 1005, 2126 scans), split into Train and Test datasets. Twelve sex-stratified BrainAge models were built stratified by type and different combinations of neuroimaging features. Model types were 'Full-Span' models covering the full age range, and 'Phase-Specific' models split into early- and late-childhood. We first compared BrainAge estimates in the Test dataset amongst our candidate models, then benchmarked the best-performing model against published pre-trained models and DNA-based biological age
📄 Research May 19, 2026 https://www.biorxiv.org/content/10.64898/2026.05.19.725847v1?rss=1
ProtmRNA: Cross-Modal Knowledge Transfer from Proteins to Messenger RNA
Motivation: According to the central dogma of molecular biology, messenger RNA (mRNA) sequences are directly translated into amino acid sequences, positioning mRNA as the fundamental intermediary between genetic information and functional proteins. This natural correspondence suggests that mRNA sequence analysis could greatly benefit from the rich evolutionary and functional representations learned by large-scale protein language models. Results: ProtmRNA repurposes the pre-trained ESM-2 protein language model for mRNA sequence processing via cross-modal transfer learning. Evaluated on mRNA- and protein-related datasets, along with eight additional benchmarks compiled in this study, ProtmRNA achieves performance comparable or superior to state-of-the-art mRNA language models while using less than half the pre-training computational resources. This work establishes the potential of cross-modal transfer learning between biological sequences by demonstrating that protein-derived knowledge c
📄 Research May 19, 2026 https://www.biorxiv.org/content/10.64898/2026.05.19.726141v1?rss=1
Real-World Validation of Machine Learning Models for HIV Treatment Adherence Prediction and Care Gap Quantification: A Multi-Country Analysis of 192,732 Clinical Records
Delayed diagnosis and poor antiretroviral therapy (ART) adherence remain primary drivers of HIV-related morbidity in low-resource settings, yet real-world AI validation at scale is lacking. We conducted a retrospective validation study using two publicly available, de-identified datasets: a Quality of Care cohort of 27,288 HIV-positive patients on ART across multiple healthcare facilities, and the CEPHIA multi-country assay database comprising 165,444 specimen records from six countries. Four machine learning classifiers were evaluated using 10-fold stratified cross-validation with SMOTE applied strictly to training folds. Explicit data leakage prevention, ablation analysis, calibration assessment, and bootstrap confidence intervals were applied. Economic projections used one-way sensitivity analysis. This study adheres to TRIPOD reporting guidelines. Random Forest achieved AUC-ROC of 0.9753 (95% CI: 0.970-0.975), sensitivity 87.3% (95% CI: 86.4-88.2%), specificity 95.7% (95% CI: 95.2-
📄 Research May 19, 2026 https://www.medrxiv.org/content/10.64898/2026.05.15.26353325v1?rss=1