AI News Archive: May 12, 2026 — Part 24
Sourced from 500+ daily AI sources, scored by relevance.
- Persona-Conditioned Adversarial Prompting: Multi-Identity Red-Teaming for Adversarial Discovery and Mitigation
Automated red-teaming for LLMs often discovers narrow attack slices, missing diverse real-world threats, and yielding insufficient data for safety fine-tuning. We introduce Persona-Conditioned Adversarial Prompting (PCAP), which conditions adversarial search on diverse attacker personas (e.g., docto...
- Cochise: A Reference Harness for Autonomous Penetration Testing
Recent work on LLM-driven autonomous penetration testing reports promising results, but existing systems often combine many architectural, prompting, and tool-integration choices, making it difficult to tell what is gained over a simple agent scaffold. We present cochise, a 597 LOC Python reference ...
- Every Bit, Everywhere, All at Once: A Binomial Multibit LLM Watermark
With LLM watermarking already being deployed commercially, practical applications increasingly require multibit watermarks that encode more complex payloads, such as user IDs or timestamps, into the generated text. In this work, we propose a fundamentally new approach for multibit watermarking: intr...
- Convolutional-Neural-Networks for Deanonymisation of I2P Traffic
This study investigates the potential for deanonymizing services within the Invisible Internet Project (I2P) network through passive traffic analysis and machine learning techniques. The primary objective is to identify distinctive patterns in I2P traffic despite the encryption of its payload. To ac...
- SoK: Unlearnability and Unlearning for Model Dememorization
Advanced model dememorization methods, including availability poisoning (unlearnability) and machine unlearning, are emerging as key safeguards against data misuse in machine learning (ML). At the training stage, unlearnability embeds imperceptible perturbations into data before release to reduce le...
- FlowSteer: Prompt-Only Workflow Steering Exposes Planning-Time Vulnerabilities in Multi-Agent LLM Systems
Multi-agent systems (MAS) powered by large language models (LLMs) increasingly adopt planner-executor architectures, where planners convert prompts into subtasks, roles, dependencies, and routing paths. This flexibility enables adaptive coordination, but exposes an attack surface in workflow format...
- CTFusion: A CTF-based Benchmark for LLM Agent Evaluation
Recent advances in Large Language Models (LLMs) have enabled agentic systems for complex, multi-step tasks; cybersecurity is emerging as a prominent application. To evaluate such agents, researchers widely adopt Capture The Flag (CTF) benchmarks. However, current CTF benchmarks reuse existing challe...
- Can a Single Message Paralyze the AI Infrastructure? The Rise of AbO-DDoS Attacks through Targeted Mobius Injection
Large Language Model (LLM) agents have emerged as key intermediaries, orchestrating complex interactions between human users and a wide range of digital services and LLM infrastructures. While prior research has extensively examined the security of LLMs and agents in isolation, the systemic risk of ...
- More Than Meets the Eye: A Semantics-Aware Traffic Augmentation Framework for Generalizable Website Fingerprinting
Deep learning-based website fingerprinting has emerged as an effective technique for inferring the websites users visit. Although existing methods achieve strong performance on closed-world datasets, they often fail to generalize to real-world environments, especially under geographic and temporal s...
- Options, Not Clicks: Lattice Refinement for Consent-Driven MCP Authorization
As Model Context Protocol adoption grows, securing tool invocations via meaningful user consent has become a critical challenge: existing methods, whether broad "always allow" toggles or opaque LLM-based decisions, fail to account for dangerous call arguments and often lead to consent fatigue. In this work...
- IPI-proxy: An Intercepting Proxy for Red-Teaming Web-Browsing AI Agents Against Indirect Prompt Injection
Web-browsing AI agents are increasingly deployed in enterprise settings under strict whitelists of approved domains, yet adversaries can still influence them by embedding hidden instructions in the HTML pages those domains serve. Existing red-teaming resources fall short of this scenario: prompt-inj...
- Safety Context Injection: Inference-Time Safety Alignment via Static Filtering and Agentic Analysis
Large Reasoning Models (LRMs) improve performance on complex tasks, but they also make safety control harder at deployment time. In black-box settings, defenders cannot modify model weights and must instead intervene at inference time. This setting creates three practical challenges: harmful intent ...
- FERMI: Exploiting Relations for Membership Inference Against Tabular Diffusion Models
Diffusion models are the leading approach for tabular data synthesis and are increasingly used to share sensitive records. Whether they actually protect privacy has become a pressing question. Membership inference attacks are the standard tool for this purpose, yet existing attacks assume a single-t...
- Decaf: Improving Neural Decompilation with Automatic Feedback and Search
Decompilers are used in reverse engineering to understand compiled code. Reconstructing source code from compiled binaries is a challenging task, because high-level syntax, identifiers, and custom data types are generally lost as the compiler translates human-readable code to low...
- Under the Hood of SKILL.md: Semantic Supply-chain Attacks on AI Agent Skill Registry
Autonomous AI agents increasingly extend their capabilities through Agent Skills: modular filesystem packages whose SKILL.md files describe when and how agents should use them. While this design enables scalable, on-demand capability expansion, it also introduces a semantic supply-chain risk in whic...
- The SMC Blind Spot: A Failure Mode Analysis of State-of-the-Art Beat Tracking
Over the past two decades, the task of musical beat tracking has transitioned from heuristic onset detection algorithms to highly capable deep neural networks (DNN). Although DNN-based beat tracking models achieve near-perfect performance on mainstream, percussive datasets, the SMC dataset has stubb...
- Too Good to Be True: A Study on Modern Automatic Speech Recognition for the Evaluation of Speech Enhancement
Speech enhancement (SE) systems are typically evaluated using a variety of instrumental metrics. The use of automatic speech recognition (ASR) systems to evaluate SE performance is common in literature, usually in terms of word error rate (WER). However, WER scores depend heavily on the choice of AS...
- Towards Fine-Grained Multi-Dimensional Speech Understanding: Data Pipeline, Benchmark, and Model
While speech Large Language Models (LLMs) excel at conventional tasks like basic speech recognition, they lack fine-grained, multi-dimensional perception. This deficiency is evident in their struggle to disentangle complex features like micro-acoustic cues, acoustic scenes, and paralinguistic signal...
- Chunkwise Aligners for Streaming Speech Recognition
We propose the Chunkwise Aligner, a novel architecture for streaming automatic speech recognition (ASR). While the Transducer is the standard model for streaming ASR, its training is costly due to the need to compute all possible audio-label alignments. The recently introduced Aligner reduces this c...
- STRUM: A Spectral Transcription and Rhythm Understanding Model for End-to-End Generation of Playable Rhythm-Game Charts
We present STRUM (Spectral Transcription and Rhythm Understanding Model), an audio-to-chart pipeline that converts raw recordings into playable Clone Hero / YARG charts for drums, guitar, bass, vocals, and keys without any oracle metadata. STRUM is a multi-stage hybrid: a two-stage CRNN onset detect...
- Question Difficulty Estimation for Large Language Models via Answer Plausibility Scoring
Estimating question difficulty is a critical component in evaluating and improving large language models (LLMs) for question answering (QA). Existing approaches often rely on readability formulas, retrieval-based signals, or popularity statistics, which may not fully capture the reasoning challenges...
- Context Convergence Improves Answering Inferential Questions
While Large Language Models (LLMs) are widely used in open-domain Question Answering (QA), their ability to handle inferential questions, where answers must be derived rather than directly retrieved, remains underexplored. This study investigates how the structure and quality of passages influen...
- Overview of the MedHopQA track at BioCreative IX: track description, participation and evaluation of systems for multi-hop medical question answering
Multi-hop question answering (QA) remains a significant challenge in the biomedical domain, requiring systems to integrate information across multiple sources to answer complex questions. To address this problem, the BioCreative IX MedHopQA shared task was designed to benchmark in multi-hop reasonin...
- BatchBench: Toward a Workload-Aware Benchmark for Autoscaling Policies in Big Data Batch Processing -- A Proposed Framework
Autoscaling has become a baseline expectation for cloud-native big data processing, and the design space has expanded beyond rule-based heuristics to include learned controllers and, most recently, large language model (LLM) agents. Yet despite a growing body of work spanning these paradigms, the co...
- Unlocking Crowdsourcing for Ontology Matching Validation
Recent advances in large language models (LLMs) pose new challenges for ontology matching (OM). While OM systems built on LLMs have shown remarkable capabilities in discovering more mappings, traditional OM validation that relies on domain experts has become overwhelming. In this study, we explore t...
- Design Your Ad: Personalized Advertising Image and Text Generation with Unified Autoregressive Models
Generating realistic and user-preferred advertisements is a key challenge in e-commerce. Existing approaches utilize multiple independent models driven by click-through-rate (CTR) to controllably create attractive image or text advertisements. However, their pipelines lack cross-modal perception and...
- From Trajectories to Phenotypes: Disease Progression as Structural Priors for Multi-organ Imaging Representation Learning
Imaging-derived phenotypes (IDPs) summarize multi-organ physiology but provide only static snapshots of diseases that evolve over time. In contrast, longitudinal electronic health records encode disease trajectories through temporal dependencies among past diagnosis events and comorbidity structure....
- RecRM-Bench: Benchmarking Multidimensional Reward Modeling for Agentic Recommender Systems
The integration of Large Language Model (LLM) agents is transforming recommender systems from simple query-item matching towards deeply personalized and interactive recommendations. Reinforcement Learning (RL) provides an essential framework for the optimization of these agents in recommendation tas...
- Very Efficient Listwise Multimodal Reranking for Long Documents
Listwise reranking is a key yet computationally expensive component in vision-centric retrieval and multimodal retrieval-augmented generation (M-RAG) over long documents. While recent VLM-based rerankers achieve strong accuracy, their practicality is often limited by long visual-token sequences and ...
- Quality-Aware Collaborative Multi-Positive Contrastive Learning for Sequential Recommendation
The effectiveness of contrastive learning in sequential recommendation hinges on the construction of contrastive views, which ideally should be both semantically consistent and diverse. However, most existing CL-based methods rely on heuristic augmentations that are prone to removing crucial items o...
- HSUGA: LLM-Enhanced Recommendation with Hierarchical Semantic Understanding and Group-Aware Alignment
Large language model (LLM)-enhanced sequential recommendation typically aims to improve two core components: user semantic embedding extraction and utilization. Despite promising results, existing methods still have two limitations: 1) In the extraction stage, most methods directly input long intera...
- FedMM: Federated Collaborative Signal Quantization for Multi-Market CTR Prediction
Online platforms such as Amazon and Netflix serve users across multiple countries and regions, underscoring the importance of multi-market recommendation (MMR). Most MMR methods adopt a pre-training and fine-tuning paradigm, in which a unified model is first trained on centralized, global data and s...
- Test-Time Compute for Dense Retrieval: Agentic Program Generation with Frozen Embedding Models
Test-time compute is widely believed to benefit only large reasoning models. We show it also helps small embedding models. Most modern embedding checkpoints are distilled from large LLM backbones and inherit their representation space; a frozen embedding model should therefore benefit from extra inf...
- Caraman at SemEval-2026 Task 8: Three-Stage Multi-Turn Retrieval with Query Rewriting, Hybrid Search, and Cross-Encoder Reranking
We describe our system for SemEval-2026 Task 8 (MTRAGEval), participating in Task A (Retrieval) across four English-language domains. Our approach employs a three-stage pipeline: (1) query rewriting via a LoRA-fine-tuned Qwen 2.5 7B model that transforms context-dependent follow-up questions into st...
- TwiSTAR: Think Fast, Think Slow, Then Act, Generative Recommendation with Adaptive Reasoning
Generative recommendation with Semantic IDs (SIDs) has emerged as a promising paradigm, yet existing methods apply a fixed inference strategy, either fast direct generation or slow chain-of-thought reasoning, uniformly across all user histories. This approach creates a trade-off: fast recommendation...
- Conditional Memory Enhanced Item Representation for Generative Recommendation
Generative recommendation (GR) has emerged as a promising paradigm that predicts target items by autoregressively generating their semantic identifiers (SID). Most GR methods follow a quantization-representation-generation pipeline, first assigning each item a SID, then constructing input representa...
- Large Language Models for Causal Relations Extraction in Social Media: A Validation Framework for Disaster Intelligence
During disasters, extracting causal relations from social media can strengthen situational awareness by identifying factors linked to casualties, physical damage, infrastructure disruption, and cascading impacts. However, disaster-related posts are often informal, fragmented, and context-dependent, ...
- Google finds first AI-developed zero-day that bypasses 2FA — self-morphing malware and Gemini-powered backdoors signal a new era of cybercrime
Google cybersecurity boffins found at least one AI-developed zero-day exploit
- Environmental Volatility Shifts Visual Search from Capture to Caution
Real-world distractors occur in environments whose states change at different rates. We asked whether such volatility alters early attentional gating or instead changes the criterion for committing to a response. Observers performed an additional-singleton search task with concurrent eye tracking while distractor presence followed high- or low-volatility sequences, with overall distractor prevalence held constant. Trial-pooled oculomotor capture was higher under high volatility, a pattern that appears to indicate altered filtering. That inference did not survive repetition-aware analysis: once the same-location run position was matched, capture did not detectably differ across volatility regimes. The pooled capture effect was therefore consistent with a structural consequence of the volatility manipulation, which enriched high-volatility blocks with early-run positions where capture is intrinsically high. The positive volatility signature appeared on distractor-absent trials, where hig
- A computational model reveals that spatial localization of cancer stem cells increases radioresistance in tumorspheres
Cancer stem cells (CSCs) exhibit increased resistance to radiotherapy, contributing to tumor recurrence and progression. While CSCs are known for their intrinsic resistance, the role of their spatial organization remains poorly understood. We extend a computational model of tumorsphere growth to investigate how the spatial distribution of CSCs influences radiation response. The model explicitly tracks cell lineages and spatial positions, revealing a preferential accumulation of CSCs in the spheroid interior. Because radiosensitivity increases with oxygen availability, and oxygen levels are lowest in the tumor core, this spatial organization confers a protective advantage to the CSC population. We find that this effect is negligible in small, well-oxygenated tumorspheres but becomes pronounced as growth leads to the emergence of hypoxic regions. To isolate the role of spatial structure, we compare these results with control simulations in which CSC positions are randomly reassigned. In
- Spurious correlation inflates performance in single-cell perturbation prediction
The increasing number of computational methods designed to predict the effects of genetic perturbations on cellular gene expression profiles has led to a need for rigorous evaluation metrics. Recent benchmarking studies rely on correlation or cosine similarity of differential expression relative to a shared population of control cells. We show that these metrics are systematically inflated by statistical bias induced by reusing the same control population to define both quantities being compared. As a result, even non-informative methods can appear to perform well, particularly in datasets with limited numbers of control cells. Reanalysis of published datasets using a simple control-splitting procedure that removes this bias leads to a substantial reduction in performance previously attributed to biological signal.
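The bias described in this abstract can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not the authors' procedure: expression is modeled as pure Gaussian noise with no real perturbation effect, the "prediction" is deliberately non-informative, and all function names and parameters (`simulate`, `mean_profile`, `n_control`, and so on) are hypothetical. It compares the correlation of differential expression when both deltas reuse one control mean against the correlation when the controls are split into two disjoint halves.

```python
import random

def pearson(x, y):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def mean_profile(n_cells, n_genes, rng):
    """Per-gene mean over n_cells cells of unit-variance noise (assumed model)."""
    return [sum(rng.gauss(0.0, 1.0) for _ in range(n_cells)) / n_cells
            for _ in range(n_genes)]

def simulate(n_genes=2000, n_control=20, n_pert=100, seed=0):
    rng = random.Random(seed)
    half = n_control // 2
    # Two disjoint halves of the control population, plus their pooled mean.
    ctrl_a = mean_profile(half, n_genes, rng)
    ctrl_b = mean_profile(half, n_genes, rng)
    ctrl_all = [(a + b) / 2 for a, b in zip(ctrl_a, ctrl_b)]
    # Observed perturbed cells and a non-informative "prediction":
    # both are independent noise, so there is no true signal to recover.
    observed = mean_profile(n_pert, n_genes, rng)
    predicted = mean_profile(n_pert, n_genes, rng)
    # Shared controls: the same control mean is subtracted from both sides.
    shared = pearson([o - c for o, c in zip(observed, ctrl_all)],
                     [p - c for p, c in zip(predicted, ctrl_all)])
    # Split controls: each side subtracts a different, independent half.
    split = pearson([o - a for o, a in zip(observed, ctrl_a)],
                    [p - b for p, b in zip(predicted, ctrl_b)])
    return shared, split

if __name__ == "__main__":
    shared, split = simulate()
    print(f"shared-control correlation: {shared:.3f}")
    print(f"split-control correlation:  {split:.3f}")
```

Under these assumptions the shared-control correlation should come out strongly positive (the control-mean noise, with variance 1/20, dominates the perturbed-mean noise, with variance 1/100), while the split-control correlation should sit near zero. Lowering `n_control` makes the inflation larger, in line with the abstract's remark about datasets with few control cells.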
- Temporal-deviation-driven community detection uncovers early-warning signals for critical transitions in complex diseases
Early detection of critical transitions in complex diseases is crucial for timely clinical intervention. However, as patients often provide only a single snapshot, identifying sample-specific early-warning signals (EWS) from a dynamical evolution perspective remains challenging, coupled with high-dimensional noise amplification. Here, we present TD-COM, a framework for detecting personalized EWS of critical transitions via single-sample community detection. By constructing a temporal perturbation map STDN, TD-COM captures latent dynamical perturbations inferred from static individual profiles. Synergizing these temporal-deviation signals with static topological features, TD-COM implements a multi-level node filtering strategy during community detection, effectively suppressing single-sample noise. Validated on hour-scale, multi-year, and multi-decade transcriptomic data, TD-COM robustly detects critical states preceding clinical deterioration and uncovers their underlying molecular mec
- Dual-view Guided Context-aware Network for Automated Bone Lesion Segmentation and Quantification in Whole-body SPECT
Whole-body SPECT bone scintigraphy reflects skeletal metabolic activity throughout the body and plays an indispensable role in the screening, treatment evaluation, and prognostic assessment of bone metastases in tumors. However, the automatic detection and segmentation of hypermetabolic bone lesions remain challenging due to low contrast, limited spatial resolution, and complex lesion distributions. In this study, we proposed Bone-Segnet, a dual-view guided automatic segmentation network for hypermetabolic bone lesions that integrated multi-scale feature modeling, global context modeling, and view-conditioned modulation. Pixel-level annotated anterior and posterior whole-body bone scintigraphy images were used for model training and prediction. The proposed network enhanced the recognition of low-contrast and small-scale lesions through small-lesion enhancement and multi-scale contextual modeling. A Transformer module was further introduced to strengthen global feature representation,
- A Three-Layered Agent-Based Model of Adult Hippocampal Neurogenesis (HANG-AB3L) with Stochastic Cell Fate Determination
Hippocampal adult neurogenesis (HANG) is a highly regulated process where neural stem cells progress through distinct stages, from Type 1 radial glia-like cells to mature neurons, via a complex series of proliferative and differentiative divisions. While recent in vivo imaging has challenged the classical paradigm of asymmetric division, the exact relationship between individual cell-fate decisions and long-term population stability remains difficult to quantify empirically. In this study, we utilized an agent-based (AB) model to simulate the stochastic dynamics of the hippocampal neurogenic niche. Our results demonstrate that while individual progenitor lineages exhibit high variability and probabilistic division symmetries (proliferative symmetric, asymmetric, and differentiative symmetric), the system achieves deterministic stability as the initial progenitor density increases. We found that the Type 1 progenitor pool follows a negative exponential decay profile, with its longevity
- Musk Lawyer’s Question for Sam Altman on the Stand: Are You Trustworthy?
Mr. Altman, the C.E.O. of OpenAI, said on Tuesday that he worried Elon Musk wanted control of the A.I. lab.
- A Multimodal Framework for Organ- and Cell-Resolved Biological Aging and Longevity Intervention Discovery
Aging is the primary driver of chronic disease and mortality, requiring comprehensive frameworks for quantification of aging and nomination of longevity interventions. We developed mAge (multimodal age), a biological aging framework that integrates plasma proteomics, wearables, and mortality hazard to predict biological age, intrinsic capacity, and mortality risk. By combining proteomic and wearable data in UK Biobank samples, mAge exceeds unimodal baseline age prediction to 0.87 test R2 and 2.3 years mean error, and reduces unimodal baseline mortality prediction error by 21%. We further constructed organ- and cell type-specific biological clocks that quantify aging across 49 distinct subsystems, revealing that cardiac, immune, and intracellular protein signatures benefit most from wearable integration. By mapping data to FDA-approved drug targets, we identified interventions, such as GLP-1 receptor agonists, gabapentin, and ACE inhibitors, that are associated with lower overall and su
- Evaluating Genomic Surveillance Methods for Shigella sonnei in a High-Income Setting
Shigella sonnei is a human-adapted enteric pathogen with a very low infectious dose and increasing antimicrobial resistance. In high-income settings, transmission is multimodal including sporadic cases/outbreaks associated with food and travel, as well as sustained transmission among sexual networks of men who have sex with men (MSM). Whole-genome sequencing (WGS) now underpins national shigellosis surveillance in the United Kingdom. Hence, consistent, communicable genotyping is essential for case linkage and trend detection across heterogeneous transmission modes. Here, we evaluate the performance of WGS genotyping approaches for granulating outbreaks of S. sonnei shigellosis, particularly considering differential performance in dense sexual transmission where highly clonal MSM-associated sublineages pose distinct clustering challenges. Specifically, we compare performance of the current practice approach (10 SNP-distance clustering based on SNP address [t10]), allele-based methods (E
- Development and Validation of a Multimodal Clinical, Pathologic, and Genomic Model for Breast Cancer Recurrence
Purpose: To develop and validate a multimodal recurrence-risk model integrating histology, genomic testing, and clinical variables. Methods: We developed AI-Path, a whole-slide image biomarker for recurrence prediction trained in CALGB 9344, and validated it in three independent cohorts: TAILORx, a multi-site Chicago cohort, and the MDX-BRCA cohort. We then integrated AI-Path with Oncotype DX Recurrence Score (RS), tumor size, and nodal status into a Cox model, PathClinRS, fit using 60% of cases from TAILORx, with the remaining 40% held out for validation. The primary end point was distant recurrence-free interval. Performance was assessed using Harrell's concordance index (C-index) and Kaplan-Meier analyses. Results: A total of 12,418 patients were included. In TAILORx, AI-Path outperformed RS for distant recurrence (C-index, 0.682 vs 0.647; P = .038), driven by superior prediction of late recurrence (0.656 vs 0.567; P < .001). In node-negative disease, PathClinRS outperformed RSClin
- Machines with the ability to ‘feel’ currently in development as we enter next frontier of AI
Source: EurekAlert!
- 60% of U.S. teens have tried AI chatbots, 11.4% use them almost daily
Source: EurekAlert!