AI News Archive: May 26, 2026 — Part 17

Sourced from 500+ daily AI sources, scored by relevance.

The Compressive Knowledge Graph Hypothesis: Which Graph Facts Matter for Scientific Hypothesis Generation?
Knowledge graphs (KGs) can provide structured scientific context to language models, but it remains unclear which graph facts actually shape the generated hypotheses. We study KG-guided hypothesis generation for battery materials across Mistral-7B, Llama-3.1-70B, and Gemini 2.5 Flash. We perturb loc...
📄 ResearchMay 26, 2026http://arxiv.org/abs/2605.27176v1
An investigation of AI integration in sound designer workflows and experiences
Artificial intelligence is increasingly being integrated into professional audio production workflows, yet a gap persists between the tools developers produce and the requirements of practising sound designers. This paper investigates this gap through a mixed-methods study comprising a survey of 76 ...
📄 ResearchMay 26, 2026http://arxiv.org/abs/2605.27174v1
Query Symbolically or Retrieve Semantically? A Dataset and Method for Semi-Structured Question Answering
Retrieval-Augmented Generation (RAG) systems for question answering typically retrieve evidence by semantic similarity between the query and document chunks. While effective for unstructured text, this approach is less reliable on semi-structured corpora where answering may require exact filtering, ...
📄 ResearchMay 26, 2026http://arxiv.org/abs/2605.27164v1
Detecting Is Not Resolving: The Monitoring Control Gap in Retrieval Augmented LLMs
Retrieval-augmented LLMs are deployed for tasks where evidence quality determines action safety, yet evaluation protocols assume that single-turn robustness predicts robustness when evidence accumulates across turns. We show this assumption is fundamentally incorrect. Models exhibit a monitoring-con...
📄 ResearchMay 26, 2026http://arxiv.org/abs/2605.27157v1
Mira Ai
An AI companion that remembers what you're going through
🧰 ToolsMay 26, 2026https://www.producthunt.com/products/mira-ai-2?utm_campaign=producthunt-api&utm_medium=api-v2&utm_source=Application%3A+the500feed+%28ID%3A+283491%29
LitSeg: Narrative-Aware Document Segmentation for Literary RAG
Retrieval-Augmented Generation (RAG) enhances Large Language Models (LLMs) by incorporating external knowledge, particularly for long-tail domains such as literary works. However, the critical step of document segmentation in RAG remains largely underexplored. Existing strategies are typically seman...
📄 ResearchMay 26, 2026http://arxiv.org/abs/2605.27156v1
Semantic Robustness Probing via Inpainting: An Interactive Tool for Safety-Critical Object Detection
Testing object detectors in safety-critical domains requires semantically meaningful probes beyond pixel-level corruptions. We present SemProbe, a tool for semantic robustness probing: users upload deployment images, create masks manually or automatically, select operational design domain-derived fa...
📄 ResearchMay 26, 2026http://arxiv.org/abs/2605.27155v1
VitaBench 2.0: Evaluating Personalized and Proactive Agents in Long-Term User Interactions
Large language models (LLMs) have evolved into interactive agents that collaborate with users in real-world tasks. Effective collaboration in such settings increasingly depends on understanding the user beyond what is explicitly stated, as user intent is often reflected in fragmented daily interacti...
📄 ResearchMay 26, 2026http://arxiv.org/abs/2605.27141v1
StepOPSD: Step-Aware Online Preference Distillation for Agent Reinforcement Learning
Reinforcement learning for multi-turn agents suffers from a credit-assignment mismatch: rewards are sparse and trajectory-level, while success often hinges on a few local decisions. Existing online policy distillation (OPD) provides denser token-level supervision, but typically treats heterogeneous ...
📄 ResearchMay 26, 2026http://arxiv.org/abs/2605.27140v1
ICCU: In-Context Continual Unlearning via Pattern-Induced Refusal Rules
Machine unlearning aims to remove the influence of specific data from trained language models. In real-world deployments, unlearning requests often arrive sequentially, which challenges existing fine-tuning-based methods: fine-tuning each request is costly, accumulates utility loss, and may cause cr...
📄 ResearchMay 26, 2026http://arxiv.org/abs/2605.27138v1
Deep-layer limit and stability analysis of the basic forward-backward-splitting induced network (II): learning problems
Deep unfolding neural networks derived from iterative optimization schemes and numerical ordinary/partial differential equations (ODEs/PDEs) have attracted much attention in data science over the last decade. Therein, numerous important network architectures were constructed from the basic forward-b...
📄 ResearchMay 26, 2026http://arxiv.org/abs/2605.27133v1
DEI: Diversity in Evolutionary Inference for Quality-Diversity Search
We present DEI: Diversity in Evolutionary Inference, a distributed Quality-Diversity (QD) search framework that assigns heterogeneous large language models (LLMs) as mutation operators across peer nodes communicating with non-blocking collective operations. Unlike homogeneous parallel search, which ...
📄 ResearchMay 26, 2026http://arxiv.org/abs/2605.27130v1
Position: AI Safety Requires Effective Controllability
AI safety is still largely framed as alignment: training models to follow human preferences, safety policies, and normative constraints. That framing has improved the behavior of modern language models, but aligned behavior does not by itself guarantee that a deployed agent can be stopped, overridde...
📄 ResearchMay 26, 2026http://arxiv.org/abs/2605.27117v1
Counteraction-Aware Multi-Teacher On-Policy Distillation for General Capability Recovery with Domain Preservation
Domain specialization can improve LLM behavior in vertical domains, but often weakens the general capabilities inherited from the original model. Recent Multi-Teacher On-Policy Distillation (MOPD) pipelines recover model capabilities by supervising student-generated trajectories with teacher feedbac...
📄 ResearchMay 26, 2026http://arxiv.org/abs/2605.27115v1
ReMoE: Boosting Expert Reuse through Router Fine-Tuning in Memory-Constrained MoE LLM Inference
Fine-grained Mixture-of-Experts (MoE) models sparsely activate only a subset of experts per token, reducing activated computation while maintaining high model capacity. However, in memory-constrained inference scenarios, only a small set of experts can be cached. Experts not in the cache must be fet...
📄 ResearchMay 26, 2026http://arxiv.org/abs/2605.27081v1
Algorithmic Monocultures in Hiring
Many employers screen job applicants with algorithms built by the same few algorithm vendors. We hypothesize that algorithmic monoculture leads to the same individuals and members of the same racial groups facing rejection. We acquire and analyze a novel dataset of 3 million applicants submitting 4 ...
📄 ResearchMay 26, 2026http://arxiv.org/abs/2605.27371v1
LocateAnything: Fast and High-Quality Vision-Language Grounding with Parallel Box Decoding
Vision-language models (VLMs) commonly formulate visual grounding and detection as a coordinate-token generation problem, serializing each 2D box into multiple 1D tokens that are learned and decoded largely independently. This token-by-token decoding mismatches the coupled structure of box geometry ...
📄 ResearchMay 26, 2026http://arxiv.org/abs/2605.27365v1
GENESIS: Harnessing AI Agents for Autonomous 6G RAN Synthesis, Research, and Testing
Cellular research and development (R&D) is throttled by six structural processes that each consume months of manual engineering work per iteration: (i) synthesizing new features from standards or research papers into production code; (ii) conformance and interoperability testing; (iii) hardening aga...
📄 ResearchMay 26, 2026http://arxiv.org/abs/2605.27360v1
MobileMoE: Scaling On-Device Mixture of Experts
Mixture-of-Experts (MoE) has become the de facto architecture for hundred-billion-parameter language models, yet its advantages at sub-billion scales for on-device deployment remain largely unexplored. To close this gap, we present MobileMoE, a family of on-device MoE language models with sub-billio...
📄 ResearchMay 26, 2026http://arxiv.org/abs/2605.27358v1
When Eyes Betray AI: Social Gaze Consistency as a Semantic Cue for AI-Generated Image Detection
Recent generative models have largely closed the gap on low-level artifacts - pixel fingerprints, frequency anomalies, upsampling traces - particularly in person-centric and partial-edit settings where the manipulated region is small and surrounded by photometrically authentic content. We introduce ...
📄 ResearchMay 26, 2026http://arxiv.org/abs/2605.27348v1
Tracing Computation Density in LLMs
Transformer-based large language models (LLMs) are comprised of billions of parameters arranged in deep and wide computational graphs, but it is not clear that they exploit their full capacity for all inputs. We introduce the s-Trace method to efficiently estimate the subgraph of size s that best ap...
📄 ResearchMay 26, 2026http://arxiv.org/abs/2605.27033v1
EdgeFlow: Edge-Map Augmented VLM-Based Flowchart Processing for Industrial Requirements Engineering
Flowcharts are widely used in industrial requirements, but usually remain embedded as static images. Vision Language Models (VLMs) show promise in the conversion of these flowcharts into machine-readable models for RE activities, yet, when directly applied to flowchart conversion, they often fail on...
📄 ResearchMay 26, 2026http://arxiv.org/abs/2605.27332v1
Maat: The Agentic Legal Research Assistant for Competition Protection
Competition law experts conducting legal research must review extensive volumes of cases, decisions, and judicial reports to identify precedents and assess key elements in competition and merger cases. Although general research assistants such as Claude and ChatGPT and legal assistants such as SaulL...
📄 ResearchMay 26, 2026http://arxiv.org/abs/2605.27331v1
Governed Evolution of Agent Runtimes through Executable Operational Cognition
Recent advances in agentic systems increasingly treat code as an executable operational substrate rather than as a disposable output artifact. Prior work such as \emph{Code as Agent Harness} frames validated agent-generated artifacts as runtime entities that can be created, executed, revised, persis...
📄 ResearchMay 26, 2026http://arxiv.org/abs/2605.27328v1
Risk Averse Alert Prioritization for IDS Using Subnormal Gaussian Fuzzy Models
Modern intrusion detection systems generate thousands of alerts daily, but alert fatigue severely limits security operations effectiveness due to too many false positives or low-impact events. We address this by proposing a principled framework for alert prioritization based on subnormal Gaussian fu...
📄 ResearchMay 26, 2026http://arxiv.org/abs/2605.27299v1
FineVLA: Fine-Grained Instruction Alignment for Steerable Vision-Language-Action Policies
Vision-Language-Action (VLA) models are increasingly expected to not only complete robot tasks, but also follow human instructions about how those tasks should be executed. However, existing robot datasets usually pair trajectories with coarse goal-level language, leaving execution-critical details ...
📄 ResearchMay 26, 2026http://arxiv.org/abs/2605.27284v1
SIA: Self Improving AI with Harness & Weight Updates
Humans are the bottleneck in building and improving AI. Both the models and the agents that wrap them are written, tuned, and corrected by people. The long-horizon goal of an AI that can figure out how to improve itself remains open. Two largely disjoint research lines attack this bottleneck. The ha...
📄 ResearchMay 26, 2026http://arxiv.org/abs/2605.27276v1
LUCoS: Latent Unsupervised Context Selection for Tabular Foundation Models
Selecting which instances to label is a key challenge in low-label tabular learning. For recent Tabular Foundation Models such as TabPFN, context selection directly determines predictive performance. Supervised oracle experiments show that carefully chosen labeled context sets can strongly outperfor...
📄 ResearchMay 26, 2026http://arxiv.org/abs/2605.27254v1
Gumbel Machine: Counterfactual Student Writing Generation via Gumbel Noise Steering
An effective method of teaching across disciplines is to provide examples of high-quality work. However, an example may be significantly different from a student's current work, making it challenging for them to emulate. An ideal learning demonstration is a counterfactual version of the student work...
📄 ResearchMay 26, 2026http://arxiv.org/abs/2605.27249v1
Generative Animations: A Multi-Model Pipeline for Prompt-Driven Motion Synthesis
Animation elevates digital documents into immersive experiences, yet creating custom motion paths remains cumbersome, requiring designers to manually select presets, plot Bézier points, and configure timing properties. We introduce Generative Animations, a system that transforms natural language pro...
📄 ResearchMay 26, 2026http://arxiv.org/abs/2605.27203v1
Learning When to Think While Listening in Large Audio-Language Models
Recent advances in Large Audio-Language Models (LALMs) have made real-time, streaming spoken interaction increasingly practical. In this setting, reasoning quality and responsiveness are tightly coupled: delaying reasoning until the speech endpoint can improve answer quality but moves deliberation i...
📄 ResearchMay 26, 2026http://arxiv.org/abs/2605.27190v1
FoundObj: Self-supervised Foundation Models as Rewards for Label-free 3D Object Segmentation
We address the challenging task of 3D object segmentation in complex scene point clouds without relying on any scene-level human annotations during training. Existing methods are typically constrained to identifying simple objects, primarily due to insufficient object priors in the learning process....
📄 ResearchMay 26, 2026http://arxiv.org/abs/2605.27178v1
Grounding Text Embeddings in Stakeholder Associations
Text embeddings are widely used to analyse large corpora of complex texts. However, it is unclear whether the embeddings capture the same semantic distances as the human experts using them. Ensuring alignment between embedding representations and human intentions is essential for valid analyses. We ...
📄 ResearchMay 26, 2026http://arxiv.org/abs/2605.27168v1
Scaling, Benchmarking, and Reasoning of Vision-Language Agents for Mobile GUI Navigation
Vision-Language Models (VLMs) have shown rapid progress in mobile GUI navigation. This paper presents a systematic study of data scaling, benchmarking, and reasoning for VLM-based agents in this domain. To facilitate rigorous evaluation, we introduce HyperTrack, a large-scale dataset with over 16000...
📄 ResearchMay 26, 2026http://arxiv.org/abs/2605.27134v1
Beyond the Data Mesh Illusion: Designing Modern AI-augmented Lakehouses to Bridge the Gap Between Theory and Practice
Enterprise data platforms face an enduring tension between domain self-service and holistic governance. The data mesh paradigm proposed decentralized domain ownership as a remedy, but pure implementations frequently underdeliver: teams inherit new responsibilities without the platform maturity, tool...
📄 ResearchMay 26, 2026http://arxiv.org/abs/2605.27131v1
High-Quality Synthetic Financial Time-Series using a GAN-Diffusion Framework
In recent years, financial institutions and firms have increasingly adopted synthetic data to address data scarcity and to generate counterfactual market scenarios. However, reproducing all the statistical properties of financial time series, commonly known as stylized facts, remains an open challen...
📄 ResearchMay 26, 2026http://arxiv.org/abs/2605.27113v1
Model Merging on Loss Landscape: A Geometry Perspective
Model merging offers a promising avenue for knowledge integration and parallel development without retraining. Yet, existing methods either ignore the geometry of the loss landscape or rely on intractable full-space Hessian approximations. We propose EpiMer, a framework that casts model merging as s...
📄 ResearchMay 26, 2026http://arxiv.org/abs/2605.26693v1
Can Broad Biomedical Knowledge be Contextualized into Scenario-Grounded Propositions?
Biomedical discovery often requires connecting broad biomedical knowledge with specific experimental or clinical data. Background knowledge suggests relevant mechanisms but is usually too general to map directly onto dataset variables, while data-driven patterns can be dataset-specific and hard to i...
📄 ResearchMay 26, 2026http://arxiv.org/abs/2605.27082v1
MATCHA: Matching Text via Contrastive Semantic Alignment
Reliable evaluation is essential for understanding large language model (LLM) performance, yet today's go-to metrics, namely token-overlap scores (e.g., ROUGE) and embedding-based measures (e.g., BERTScore), often misjudge semantic similarity of documents. Our study shows that both token-overlap met...
📄 ResearchMay 26, 2026http://arxiv.org/abs/2605.27345v1
FinHarness: An Inline Lifecycle Safety Harness for Finance LLM Agents
Finance LLM agents must simultaneously block prompt-induced unauthorized actions and approve legitimate multi-step business workflows. However, boundary filters often miss irreversible mid-trajectory tool calls, while post-hoc LLM judges perform auditing only after termination -- too late for interv...
📄 ResearchMay 26, 2026http://arxiv.org/abs/2605.27333v1
Real Images, Worse Judgments: Evaluating Vision-Language Models on Concreteness and Imagery
Visual inputs are often assumed to improve language understanding in multimodal models. We examine this assumption by asking whether vision-language models (VLMs) can distinguish useful visual evidence from incidental image context in lexical judgments. We use human concreteness and imagery ratings ...
📄 ResearchMay 26, 2026http://arxiv.org/abs/2605.27315v1
Probing Cultural Awareness in LLMs: A Case Study of Cross-Culture Aesthetic Stylistics
Large Language Models (LLMs) are increasingly deployed in diverse cultural contexts, yet their ability to master aesthetic stylistics, i.e., the strategic use of language to evoke cultural resonance, remains underexplored. We curate C4STYLI, a benchmark of highly stylized translated movie titles and...
📄 ResearchMay 26, 2026http://arxiv.org/abs/2605.27296v1
Separating Semantic Competition from Context Length in RAG Reading
Retrieval-augmented generation (RAG) systems can respond incorrectly even when the correct passage was retrieved. The model must still read the retrieved passages and identify which one contains the answer among others that look relevant. This passage-reading model is called the reader. Does it fail...
📄 ResearchMay 26, 2026http://arxiv.org/abs/2605.27294v1
The Coverage Illusion: From Pre-retrieval Routing Failure to Post-retrieval Cascades in a Production RAG System
In modern RAG pipelines, query augmentation methods such as HyDE and query expansion are applied to every query, resulting in substantial LLM inference costs and increased end-to-end latency. The empirical justification for this overhead in real production traffic remains largely unexplored. We pres...
📄 ResearchMay 26, 2026http://arxiv.org/abs/2605.27220v1
GraphReview: Scientific Paper Evaluation via LLM-Based Graph Message Passing
Scientific paper evaluation often involves not only assessing a manuscript itself, but also relating it to contemporaneous research and prior literature. However, existing LLM-based methods typically model these signals separately and lack a unified mechanism for propagating review evidence across p...
📄 ResearchMay 26, 2026http://arxiv.org/abs/2605.27204v1
Not All Tokens Matter Equally: Dynamic In-context Vector Distillation with Decisive-Token Supervision for Long-form Medical Report Generation
Distilling demonstration effects into hidden-space interventions offers a lightweight alternative to full finetuning. However, existing multimodal variants are mostly evaluated on short-form tasks, where outputs end after a few tokens. Extending these methods to long-form generation exposes a fundam...
📄 ResearchMay 26, 2026http://arxiv.org/abs/2605.27194v1
LLMs Are Already Good Tutors: Training-Free Prompt Optimization for Pedagogical Math Tutoring
Aligning LLMs for math tutoring typically requires RL-based training with multi-GPU infrastructure. We investigate whether training-free prompt optimization-evolving only the system prompt via API calls-can serve as a practical alternative. We adapt 7 published methods and propose 5 education-specia...
📄 ResearchMay 26, 2026http://arxiv.org/abs/2605.27088v1
On the Hidden Costs of Counterfactual Knowledge Training in LLM Unlearning
Counterfactual tuning (CFT) has emerged as a promising paradigm for Large Language Model (LLM) unlearning by training models to generate alternative fictitious knowledge in place of undesired content. However, in this work, we find that this paradigm still underperforms other paradigms in some aspec...
📄 ResearchMay 26, 2026http://arxiv.org/abs/2605.27083v1
QUACK: Questioning, Understanding, and Auditing Communicated Knowledge in Multimodal Social Deduction Agents
Social deduction games have become a popular testbed for probing reasoning, deception, coordination, and belief modeling in Large Language Model (LLM) agents. However, most environments are scored only by game outcomes such as win rates and largely remain to text-only interaction, making it difficul...
📄 ResearchMay 26, 2026http://arxiv.org/abs/2605.27068v1
Large Language Model-Powered Query-Driven Event Timeline Summarization in Industrial Search
Understanding how events evolve over time is essential for search engines handling queries about trending news. We present QDET (Query-Driven Event Timeline Summarization), a production system deployed on Baidu Search that constructs focused event timelines to explain specific query events. Unlike t...
📄 ResearchMay 26, 2026http://arxiv.org/abs/2605.27066v1