AI News Archive: May 18, 2026 — Part 23
Sourced from 500+ daily AI sources, scored by relevance.
- Contextualized Dynamic Explanations: A Vision
Asynchronous data-driven explanations often fail because the content and presentation are not tailored to the target audience, and they provide limited opportunities for active audience engagement. We present a vision for Contextualized Dynamic Explanations (CODEX), an agentic approach to dynamicall...
- In-Vehicle Human-Machine Interface to Support Drivers in Conditionally Automated Platooning
Vehicle platooning enables close-gap driving and offers potential benefits for traffic efficiency and safety. In conditionally automated platooning, drivers remain responsible for supervising the system and intervening when necessary, making effective Human-Machine Interfaces (HMIs) critical for mai...
- Exploring Trust Calibration in XAI - The Impact of Exposing Model Limitations to Lay Users
Trust calibration -- aligning user trust judgment with model capability -- is crucial for safe deployment of explainable AI (XAI), yet is often evaluated via global trust ratings detached from objective performance evidence. We present a preregistered, incentivized between-subject online study (N=41...
- See What I Mean: Aligning Vision and Language Representations for Video Fine-grained Object Understanding
We present SWIM (See What I Mean), a novel training strategy that aligns vision and language representations to enable fine-grained object understanding solely from textual prompts. Unlike existing approaches that require explicit visual prompts, such as masks or points, SWIM leverages mask supervis...
- Low Latency Gaze Tracking via Latent Optical Sensing
We present a real-time gaze tracking system that directly acquires task-relevant latent features using a fully passive optical encoder. Instead of forming and processing full-resolution images, our approach leverages a microlens array with a co-designed binary chromium mask to perform spatially mult...
- Agentic Chunking and Bayesian De-chunking of AI Generated Fuzzy Cognitive Maps: A Model of the Thucydides Trap
We automatically generate feedback causal fuzzy cognitive maps (FCMs) from text by teaching large-language-model agents to break the text into overlapping chunks of text. Convex mixing of these chunk FCMs gives a representative cyclic FCM knowledge graph. The text chunks can have different levels of...
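Convex mixing of chunk FCMs can be pictured as a weighted average of their edge-weight matrices, with nonnegative weights that sum to one. The sketch below is illustrative only. It assumes every chunk FCM is already a square matrix over one shared, aligned concept list, with `F[i, j]` being the causal weight from concept i to concept j. The function name `convex_mix` and the equal-weight example are hypothetical and are not taken from the paper.

```python
import numpy as np

def convex_mix(fcms, weights):
    """Convex combination of chunk FCM edge-weight matrices (hypothetical helper)."""
    w = np.asarray(weights, dtype=float)
    if len(w) != len(fcms):
        raise ValueError("need one weight per chunk FCM")
    if np.any(w < 0) or w.sum() <= 0:
        raise ValueError("weights must be nonnegative with a positive sum")
    w = w / w.sum()  # normalize so the weights sum to one (convexity)
    stacked = np.stack([np.asarray(m, dtype=float) for m in fcms])  # shape (k, n, n)
    return np.tensordot(w, stacked, axes=1)  # sum_k w_k * F_k, shape (n, n)

# Two acyclic chunk maps: chunk A has edge 0 -> 1, chunk B has edge 1 -> 0.
A = [[0.0, 1.0], [0.0, 0.0]]
B = [[0.0, 0.0], [-1.0, 0.0]]
mixed = convex_mix([A, B], [1, 1])
```

With equal weights, `mixed` holds both a 0 -> 1 edge (weight 0.5) and a 1 -> 0 edge (weight -0.5). Mixing two acyclic chunk maps therefore already yields a feedback loop, which is the kind of cyclic structure the abstract describes.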
- Towards SocratiCode: Designing a Generative AI-Based Programming Tutor for K-12 Students through a 4-Week Participatory Design Study
Generative AI creates new opportunities for programming education, but many existing systems remain overly directive, producing lengthy explanations and premature solutions that can overwhelm K-12 novices. In this paper, we present a participatory design study of how an adaptive tutorial system, Soc...
- Agents for Experiments, Experiments for Agents: A Design Grammar for AI-Enabled Experimental Science
AI systems are becoming active participants in organizational and knowledge work. They increasingly interact with humans, coordinate workflows, and operate in multi-agent arrangements. Understanding their effects therefore requires more than measuring output accuracy; it requires evidence about mech...
- UST-Hand: An Uncertainty-aware Spatiotemporal Point Cloud Interaction Network for 3D Self-supervised Hand Pose Estimation
Manually annotating accurate 3D hand poses is extremely time-consuming and labor-intensive. Existing self-supervised hand pose estimation methods leverage the discrepancy between input images and rendered outputs, or multi-view consistency constraints, as the driving force to optimize networks and p...
- Federated Naive Bayes with Real Mixture of Gaussians and Institutional Governance Regularization for Network Intrusion Detection
Federated learning for intrusion detection rests on a flawed premise: that every participating institution contributes equally to the shared model. In practice, a financial institution with mature security controls and low vulnerability exposure produces fundamentally different data than a governmen...
- Prompts Don't Protect: Architectural Enforcement via MCP Proxy for LLM Tool Access Control
Large language models increasingly operate as autonomous agents that select and invoke tools from large registries. We identify a critical gap: when unauthorized tools are visible in an agent's context, models select them in adversarial scenarios -- even when explicitly instructed otherwise. We prop...
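The abstract's core claim is that tool access control belongs in the architecture, not in the prompt. Below is a minimal sketch of that idea. It assumes a simple in-process registry and a proxy that does two things: it filters which tools the model can see, and it refuses any call outside the caller's allowlist no matter what the model chose. `ToolProxy`, the role-to-allowlist policy, and the handler dictionary are all hypothetical. This is not the Model Context Protocol SDK API and not the paper's proxy.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Set

@dataclass
class ToolProxy:
    """Hypothetical proxy enforcing a per-role tool allowlist outside the prompt."""
    tools: Dict[str, Callable[..., Any]]   # full upstream registry: name -> handler
    allowlist: Dict[str, Set[str]]         # role -> permitted tool names
    audit_log: List[str] = field(default_factory=list)

    def list_tools(self, role: str) -> List[str]:
        # Only authorized tools are ever placed in the agent's context.
        allowed = self.allowlist.get(role, set())
        return sorted(name for name in self.tools if name in allowed)

    def call_tool(self, role: str, name: str, **kwargs: Any) -> Any:
        # Enforcement happens here, independent of any instruction in the prompt.
        if name not in self.allowlist.get(role, set()) or name not in self.tools:
            self.audit_log.append(f"DENIED {role} -> {name}")
            raise PermissionError(f"role {role!r} may not call {name!r}")
        self.audit_log.append(f"ALLOWED {role} -> {name}")
        return self.tools[name](**kwargs)

proxy = ToolProxy(
    tools={"read_file": lambda path: "r:" + path, "delete_file": lambda path: "d:" + path},
    allowlist={"analyst": {"read_file"}},
)
```

Here an "analyst" agent never sees `delete_file` in its tool list. If an injected instruction still makes the model request it, the proxy rejects the call and records it in the audit log.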
- Safety Geometry Collapse in Multimodal LLMs and Adaptive Drift Correction
Multimodal large language models (MLLMs) often fail to transfer safety capabilities learned in the text modality to semantically equivalent non-text inputs, revealing a persistent multimodal safety gap. We study this gap from a representation-geometric perspective by analyzing a text-aligned refusal...
- Babel: Jailbreaking Safety Attention via Obfuscation Distribution Optimized Sampling
Despite rigorous safety alignment, Large Language Models (LLMs) remain vulnerable to jailbreak attacks. Existing black-box methods often rely on heuristic templates or exhaustive trials, lacking mechanistic interpretability and query efficiency. In this study, we investigate an intrinsic vulnerabili...
- From Detection to Response: A Deep Learning and Retrieval-Augmented Generation Framework for Network Intrusion Mitigation
Machine-learning-based Intrusion Detection Systems (IDS) have achieved impressive accuracy in classifying network attacks, yet they consistently fall short on the question that matters most to a security analyst: what should I do next? This paper presents a unified, end-to-end framework that closes ...
- Explainable Machine Learning for Phishing Detection on Heterogeneous Datasets with MCP-Enabled Deployment
With the growth of digital transformation and Internet usage, social engineering techniques such as phishing have become a major concern for users and organizations. Phishing attacks involve deceptive techniques to trick users into revealing confidential information that causes financial...
- Acoustic Interference: A New Paradigm Weaponizing Acoustic Latent Semantic for Universal Jailbreak against Large Audio Language Models
The integration of audio modality into Large Audio Language Models (LALMs) significantly expands their attack surface. Existing jailbreak paradigms predominantly treat audio as a carrier for malicious payloads, relying on semantic optimization, acoustic parameter control, or additive perturbation to...
- LivePI: More Realistic Benchmarking of Agents Against Indirect Prompt Injection
AI agents such as OpenClaw are increasingly deployed in local workflows with access to external tools. This creates indirect prompt-injection (IPI) risk: an agent may execute harmful instructions embedded in untrusted inputs such as email, downloaded files, webpages, repositories, or group-chat mess...
- Speed Kills: Exploring Confused Deputy Attacks Through Edge AI Accelerators
AI Accelerators (AIAs) are specialized hardware, e.g., Tensor Processing Units (TPUs), that enable optimal and efficient execution of AI applications and on-device inference. The growing demand for AI applications has led to the widespread adoption of AIAs on Edge or embedded devices...
- Same Signal, Different Semantics: A Cross-Framework Behavioral Analysis of Software Engineering Agents
Behavioral studies of LLM-based software engineering agents extract operational rules about which trajectory shapes correlate with higher resolution rates: that a test step follows a code modification, that error cascades are short, or that trajectories are compact. Each rule is typically derived fr...
- CommitDistill: A Lightweight Knowledge-Centric Memory Layer for Software Repositories
Software repositories accumulate large amounts of unstructured knowledge in commit messages, pull-request discussions, and issue threads, but developers and AI coding assistants rarely reuse this history effectively. Recent work on typed-memory architectures for LLM agents (MemGPT, generative agents...
- Three Heads Are Better Than One: A Multi-perspective Reasoning Framework for Enhanced Vulnerability Detection
Automated vulnerability detection is crucial for enhancing software security by identifying potential flaws that attackers could exploit, thereby reducing the reliance on labor-intensive manual code audits. Recent advancements have shifted towards leveraging large language models (LLMs) for vulnerab...
- A-ProS: Towards Reliable Autonomous Programming Through Multi-Model Feedback
Large Language Models (LLMs) demonstrate strong potential for automated code generation, yet their ability to iteratively refine solutions using execution feedback remains underexplored. Competitive programming offers an ideal testbed for this investigation, as it demands end-to-end algorithmic reas...
- LogRouter: Adaptive Two-Level LLM Routing for Log Question Answering in Big Data Systems
Production log analytics in self-hosted, resource-constrained environments requires natural-language access to massive log streams without the cost of routing every query through a large language model. We present LogRouter, an end-to-end log question-answering system deployed on TUBITAK BILGEM's na...
- BLAgent: Agentic RAG for File-Level Bug Localization
Bug localization remains a key bottleneck in downstream software maintenance tasks, including root cause analysis, triage, and automated program repair (APR), despite recent advances in large language model (LLM)-based repair systems. File-level bug localization is especially critical in hierarchica...
- Contextualized Code Pretraining for Code Generation
As code generation becomes increasingly central to improving software development efficiency, modern code models are largely trained and evaluated on code with natural-language descriptions. In real projects, developers often implement missing functions under limited project-specific artifacts, whil...
- LLM-Based Static Verification of Code Against Natural-Language Requirements: An Industrial Experience Report
Large language models (LLMs) are increasingly used to generate requirements specifications, design documents, code, and test cases. In contrast, much less attention has been given to a more difficult assurance problem: statically verifying whether implemented code satisfies requirements written in n...
- One Developer Is All You Need: A Case Study of an AI-Augmented One-Person Squad in a Brownfield Enterprise
AI tools are enabling engineers to absorb roles previously distributed across cross-functional squads, yet there is little structured evidence on how to design or evaluate such a one-person squad in a regulated enterprise setting. Without that evidence, organizations adopting this model lack guidanc...
- Verify-Gated Completion as Admission Control in a Governed Multi-Agent Runtime: A Bounded Architecture Case Study
As multi-agent systems move from short interactions to tool-using workflows with specialized roles and persistent state, completion becomes a runtime-control problem rather than a purely generative one. This preprint studies verify-gated completion as an admission-control pattern for governed multi-...
- Contextual Biasing for Streaming ASR via CTC-based Word Spotting
Contextual biasing is essential to improving the recognition of rare and domain-specific words in an automatic speech recognition (ASR) system. While numerous methods have been proposed in recent years, most of them focus on offline settings and do not explicitly address the challenges of streaming ...
- Sometin Beta Pass Notin (SBPN): Improving Multilingual ASR for Nigerian Languages via Knowledge Distillation
Although modern multilingual Automatic Speech Recognition (ASR) systems support several Nigerian languages, their performance consistently lags behind high-resource languages like English and French. Nigerian languages present unique modelling hurdles, including acute data scarcity, inconsistent ort...
- Flexible Multi-Channel Target Speaker Extraction Using Geometry-Conditioned Spatially Selective Non-linear Filters
Recently, a spatially selective non-linear filter (SSF) has been proposed for target speaker extraction, using the target direction-of-arrival (DOA) as a spatial cue. Since learned intermediate features are tied to the microphone geometry, the performance of the SSF degrades significantly when evalu...
- PIPER: Content-Based Table Search via profiling and LLM-Generated Pseudoqueries
The rapid growth of tabular datasets in data lakes, data spaces, and open data portals makes effective dataset search essential for reuse and analysis. Existing search systems rely mainly on metadata, which is often incomplete or low quality, especially for tables whose meaning depends on both schem...
- Modality-Aware Identity Construction and Counterfactual Structure Learning for ID-Free Multimodal Recommendation
Multimodal recommendation has attracted extensive attention by leveraging heterogeneous modality information to alleviate data sparsity and improve recommendation accuracy. Existing methods have attempted to replace ID embeddings with multimodal features and have achieved promising preliminary resul...
- Traditional statistical representations outperform generative AI in identifying expert peer reviewers
The exponential growth of scientific submissions has strained the peer review system. Despite the rapidly expanding global pool of researchers, this unprecedented scale has rendered the previous approach of manual expert identification infeasible. Therefore, institutions have naturally turned to Lar...
- RCTEA: Richness-guided Co-training for Temporal Entity Alignment
Temporal Entity Alignment (TEA), which aims to identify equivalent entities across Temporal Knowledge Graphs (TKGs), is crucial for integrating knowledge facts from multiple sources. However, existing TEA models often fail to capture the orthogonal yet complementary effects between structural and te...
- Text-Video Retrieval With Global-Local Contrastive Consistency Learning
Text-video retrieval aims to find the videos most semantically similar to given text queries. However, since videos contain more diverse content than texts, the main semantics expressed in each text-video pair are often only partially relevant. The primary methods involve the utilization of language-vid...
- DADF: A Distribution-Aware Debiasing Framework for Watch-Time Regression in Recommender Systems
Watch-time prediction is a central regression task in short-video recommender systems, where labels are highly long-tailed and residual errors vary systematically across observed watch-time regions. In practice, a model may appear globally calibrated while still overestimating short views and undere...
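Below is a minimal sketch of the diagnostic behind this observation. It is not the DADF method itself. The idea is to compare the global mean residual with the mean residual inside regions of observed watch time. The bucket edges and toy numbers are hypothetical, chosen to show a model that is unbiased overall but overestimates short views.

```python
import numpy as np

def residuals_by_region(y_true, y_pred, edges):
    """Global and per-region mean residual (prediction minus label).

    edges: increasing boundaries on the observed watch time, e.g. [5.0]
    splits samples into region 0 (y < 5) and region 1 (y >= 5).
    """
    y = np.asarray(y_true, dtype=float)
    p = np.asarray(y_pred, dtype=float)
    resid = p - y
    region = np.digitize(y, edges)  # region index per sample
    per_region = {int(r): float(resid[region == r].mean()) for r in np.unique(region)}
    return float(resid.mean()), per_region

# Toy data: short views are over-predicted, long views under-predicted.
global_bias, per_region = residuals_by_region([1, 2, 9, 10], [3, 4, 7, 8], edges=[5.0])
```

On this toy data `global_bias` is 0.0, which looks globally calibrated. The short-view region, however, has a mean residual of +2.0, and the long-view region has -2.0.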
- Uncertainty-Calibrated Recommendations for Low-Active Users
A fundamental challenge in recommender systems is balancing reliability for Low-Active Users (LAUs) with diversity for High-Active Users (HAUs). The key to this balance lies in quantifying model uncertainty, which approximates the risk of prediction errors and reveals the limits of the model's curre...
- Transcript architecture predetermines m6A remodeling and sensory neuron vulnerability in chemotherapy-induced peripheral neuropathy
Whether individual transcripts carry intrinsic features that predetermine their response to external perturbations is unknown. Here we used nanopore direct RNA sequencing of male mouse dorsal root ganglia (DRG) to simultaneously profile N6-methyladenosine (m6A) modifications, poly(A) tail dynamics, and full-length isoform identity on individual RNA molecules from mice treated with bortezomib, a proteasome inhibitor that causes painful peripheral neuropathy. Machine learning revealed that transcript-intrinsic features predetermine the magnitude of perturbation-induced m6A loss (R2 = 0.983). Motif composition and spatial distribution alone, without any modification measurement, predicted 80-88% of m6A erosion variance, establishing that perturbation response is encoded in transcript architecture before any modification is deposited. Expression level contributed just 2.6% of predictive importance. Bortezomib removed a fixed ~73.5% fraction of m6A marks, meaning absolute loss scaled linear...
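The fixed-fraction statement implies a simple proportionality. As an illustration only, suppose a transcript carries m0 m6A marks before treatment and a constant fraction f of them is removed. The absolute loss is then proportional to m0:

```latex
\Delta m = f \, m_0, \qquad f \approx 0.735
```

For example, m0 = 100 gives a loss of about 73.5 marks and m0 = 10 gives about 7.35: ten times the starting marks means ten times the absolute loss, while the relative loss stays the same.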
- Structure and Dynamics of the HIV-1 Envelope Protein on the Virion Envelope
HIV-1 buds from infected cells as immature virion particles with a scattered envelope glycoprotein (Env) distribution on their envelope. It then undergoes maturation, during which the viral protease cleaves the Gag polyprotein at multiple sites, leading to structural reorganization of the viral particle and lateral redistribution of Env proteins, ultimately rendering the virion infectious. However, the underlying mechanism of maturation-induced Env reorganization remains elusive. In this study, we combine microsecond-long all-atom (AA), bottom-up coarse-grained (CG) molecular dynamics simulations, and diffusion model-based backmapping to investigate the structural organization and key interactions of Env in viral membranes. AA simulations of fully glycosylated Env embedded in HIV-1 mimetic asymmetric bilayers were first performed to characterize its conformational dynamics and Env-lipid interactions. We then developed a bottom-up CG model of glycosylated Env from that AA data and simul...
- Deep Learning Structural Ensembles as Proxies for Protein Flexibility
Protein dynamics are essential to biological function, yet understanding whether deep learning models contain information about these dynamics remains an open question. In this study, we quantitatively investigate the capacity of deep learning structure generation methods to predict protein flexibilities by directly comparing residue-level mean squared fluctuation (MSF) profiles derived from structural ensembles with experimental or simulation-informed flexibility profiles. We assembled four diverse benchmark datasets representing different types of structural information, including 70 NMR ensembles, 43 X-ray crystallographic protein pairs in two distinct conformational states, 82 high-resolution cryo-EM structures, and molecular dynamics simulations of 10 proteins. Utilizing AlphaFold3, AlphaFold2, and RosettaFold to generate multiple structural models, we applied ranksort normalization to place the profiles on a comparable scale and quantified similarity primarily using cosine and Pe...
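The comparison pipeline in this abstract has three steps: compute a per-residue MSF from an ensemble, rank-normalize the profiles, and score their similarity. The sketch below covers those steps under stated assumptions. The ensemble is taken to be C-alpha coordinates already superimposed on a common frame. "Ranksort" is approximated here as each value's rank divided by the profile length, which may differ from the paper's exact procedure. Only cosine similarity is shown. All function names are hypothetical.

```python
import numpy as np

def msf_profile(ensemble):
    """Per-residue mean squared fluctuation from a pre-superimposed ensemble.

    ensemble: array of shape (n_models, n_residues, 3), assumed already aligned.
    """
    coords = np.asarray(ensemble, dtype=float)
    mean_structure = coords.mean(axis=0)                   # (n_residues, 3)
    sq_dev = ((coords - mean_structure) ** 2).sum(axis=2)  # (n_models, n_residues)
    return sq_dev.mean(axis=0)                             # average over models

def rank_normalize(profile):
    """Map values to ranks scaled into (0, 1]; ties are broken by position."""
    p = np.asarray(profile, dtype=float)
    ranks = np.empty(len(p))
    ranks[np.argsort(p, kind="stable")] = np.arange(1, len(p) + 1)
    return ranks / len(p)

def cosine_similarity(a, b):
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Two-model toy ensemble over three residues.
ensemble = [
    [[0, 0, 0], [0, 0, 0], [0, 0, 0]],
    [[0, 0, 0], [2, 0, 0], [0, 4, 0]],
]
msf = msf_profile(ensemble)
```

Rank normalization makes the score depend only on which residues are more or less flexible, not on the absolute MSF scale. That property matters when ensemble spreads and experimental profiles live on very different scales.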
- Factors Influencing Vitamin D Status in Guiyang, China: A Random Forest and SHAP Analysis
Objective: To assess serum 25-hydroxyvitamin D [25(OH)D] levels in a health examination population in Guiyang, a low-latitude, high-altitude, and cloudy city in southwestern China, and to identify key determinants using machine learning. Methods: This retrospective study included 10,931 adults (>20 years) who underwent health checkups at Guiyang First People's Hospital between February 2019 and April 2025. Beyond conventional statistical comparisons, a two-stage machine learning approach was applied: LASSO regression for feature selection, followed by an optimized Random Forest regression model (mtry = 2). SHapley Additive exPlanations (SHAP) were used to quantify variable importance. Results: The median serum 25(OH)D level was 36.63 (IQR 24.77-53.17) nmol/L. Vitamin D deficiency (<50 nmol/L) was present in 70.98% of participants, while sufficiency (>75 nmol/L) was present in only 7.35%. Significantly lower levels were observed in females, in adults aged <30 years (deficiency rate 85.6%), and during...
- Assessing the reliability of immunofluorescence image analysis with artificial intelligence
In view of the outstanding progress of machine learning (ML) and the growing cost of health systems, incorporating artificial intelligence tools into actual medical practice is a current challenge. Here we explored the feasibility and reliability of using machine learning to perform an important immunological investigation that currently requires experienced biologists: anti-neutrophil cytoplasmic antibodies (ANCAs) are important markers for vasculitis, and they may be evidenced by microscopic examination of cells labeled with patients' sera. The use of a reliable ML classifier to discriminate between positive and negative samples would increase the rapidity and decrease the cost of immunofluorescence-based ANCA detection. Here, we tested seven well-documented ML algorithms, ranging from simple models such as k-nearest neighbors to more complex convolutional neural networks involving millions of adjustable parameters. We studied the feasibility and reliability of classifying 1114 serum sam...
- Large Language Model Performance in UK Advice & Guidance: A Pilot Study in Neurology
Background: Large language models (LLMs) demonstrate strong performance in controlled medical environments such as multiple-choice exams, but their utility in real-world clinical workflows remains unproven. The NHS Advice & Guidance (A&G) service, where Primary Care clinicians can submit text-based queries to specialists, provides an environment for evaluating the clinical performance of LLMs as a specialist. Methods: We compared responses from MedGemma 4B-IT, an open-weight model deployed locally on hospital infrastructure, against specialist neurologist responses across 50 adult neurology A&G cases from University College London Hospital. Two neurologists and two GPs rated 80 blinded and 20 unblinded responses for outcome, safety, efficacy, and feasibility using standardised criteria; outcome was a binary correct/incorrect, while other domains were scored 1-5. Inter-rater reliability was assessed using intraclass correlation coefficients. Results: Although there were no statistically...
- A clinically integrated, frameless human Neuropixels workflow
High-density electrophysiological recording using Neuropixels probes enables single-unit resolution of human neural activity. However, integrating these systems into clinical environments remains challenging. Reported human recordings have been limited to a few centres in the United States utilising variable regulatory, sterilisation and operative techniques. Here, we present human Neuropixels recordings under a nationally managed ethical and regulatory framework in the United Kingdom. We provide a reproducible roadmap to overcome regulatory and equipment constraints. Guided by the IDEAL Stage 2a (Development) framework, we established a frameless intraoperative workflow utilising manufacturer-sterilised probes and a commercially available, clinical-grade setup for Neuropixels insertion including micromanipulator and endoscope holder. We prospectively evaluated this workflow across six participants (mean age 62.5 years) undergoing elective ventriculoperitoneal shunt surgery. Iterative...
- Wearable EEG during gameplay captures a robust P300 cognitive signal in unsupervised home settings
Objective. Continuous, unsupervised monitoring of cognitive brain responses has long been constrained by the demands of laboratory EEG. Whether the P300 event-related potential, an established marker of attention and cognitive processing, can be elicited as an incidental byproduct of genuine gameplay, recorded with a minimal wearable EEG system under unsupervised home conditions, has not been established. Approach. Ten healthy adults played a gamified visual oddball task in which infrequent target stimuli (green gates) were embedded among frequent non-targets (red gates) within a continuous third-person running game. EEG was recorded with a four-channel dry-electrode headband (EEG channels: O1, O2, T3, T4; forehead reference; 250 Hz) with self-mounted electrodes in a home setting, without experimenter supervision. Group-level effects were assessed with cluster-based permutation tests and peak-amplitude tests. Single-trial classification used linear discriminant analysis (LDA) with four...
- Clinical Note Comparison and Data Retrieval Via Embedding Vectors: Model Selection, Metrics, and Convergence
Background: Embedding models are an integral part of generative AI architectures, transforming text into embedding vectors that represent semantic content in numerical form. Despite their central role, their performance in clinical settings remains underexplored. We evaluate embedding models on two tasks: semantic difference detection in clinical texts, and data retrieval from patient records. Methods: Eight models were applied to synthetic discharge summaries in English, Finnish, and Swedish. Semantic sensitivity was assessed by introducing controlled perturbations (deletion, modification, and paraphrasing) at three levels of severity; cosine similarity, L1 distance, and Euclidean distance were computed between the vectors of the original and perturbed texts. Partial vectors were compared to explore dimensionality reduction. The two models with the greatest contrast in semantic difference detection were evaluated on retrieval of relevant information from real Finnish vascular surgery record...
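The three measures named in the Methods can be computed directly from a pair of embedding vectors. The sketch below is a minimal example, not the study's code. The "partial vector" comparison is approximated as keeping only the first `dims` components, which is an assumption since the abstract does not state how partial vectors were formed. The function name and toy vectors are hypothetical.

```python
import numpy as np

def compare_embeddings(original, perturbed, dims=None):
    """Cosine similarity, L1 distance and Euclidean distance between two embeddings.

    If dims is given, only the first `dims` components are compared
    (an assumed stand-in for the study's partial-vector comparison).
    """
    u = np.asarray(original, dtype=float)
    v = np.asarray(perturbed, dtype=float)
    if dims is not None:
        u, v = u[:dims], v[:dims]
    cosine = float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))
    l1 = float(np.abs(u - v).sum())
    euclidean = float(np.linalg.norm(u - v))
    return {"cosine": cosine, "l1": l1, "euclidean": euclidean}

full = compare_embeddings([3, 4, 0], [3, 4, 12])
partial = compare_embeddings([3, 4, 0], [3, 4, 12], dims=2)
```

The toy pair shows why partial vectors need care. The perturbation lives entirely in the third component, so the full vectors differ (cosine 25/65) while the two-component prefixes look identical (cosine 1, distances 0).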
- Improved prostate cancer prediction by combining Prostate-Specific Antigen (PSA) test results with Genetic Risk Scores (GRS/PRS)
Background: Prostate cancer is the second most common cancer in men worldwide. The Prostate-Specific Antigen (PSA) blood test is widely used for prostate cancer detection but suffers from high false-positive rates (up to 80%). Genetic risk scores (GRS/PRS) perform similarly to PSA testing in predicting prostate cancer risk. Method: GRS269 for prostate cancer was derived using 269 known risk variants and applied to UK Biobank participants. We assessed whether GRS269 improved the power to predict prostate cancer diagnosis beyond age and pre-prostatectomy PSA level among 17,380 cases. Longitudinal PSA measurements were summarized as the median, first, last (most recent), and a random PSA value. All models were adjusted for age. Results: Across all PSA measures, the integrated model combining GRS269, PSA, and age consistently outperformed models using GRS269 or PSA alone. The highest predictive performance was observed using the last PSA value combined with GRS269 (AUC = 0.82, 95% CI: 0.81-0.82...
- NextEra’s $67 billion deal pokes the AI bear
Regulators’ approval will test Americans’ anger against AI.
- The US megadeal set to spark a fight over the cost of the AI boom
Proposed deal between NextEra and Dominion would cement control of US ‘data centre alley’