AI News Archive: May 13, 2026 — Part 21

Sourced from 500+ daily AI sources, scored by relevance.

"It became a self-fulfilling prophecy": How Lived Experiences are Entangled with AI Predictions in Menstrual Cycle Tracking Apps
In menstrual cycle tracking apps (MCTAs), AI-based predictions and insights have become increasingly popular. These features enable users to receive personalized information about their bodies and mental states. However, there is currently little research on how these predictive AI features and expl...
📄 Research | May 13, 2026 | http://arxiv.org/abs/2605.13261v1
X-Restormer++: 1st Place Solution for the UG2+ CVPR 2026 All-Weather Restoration Challenge
In this work, we present our winning solution for the 8th UG2+ Challenge (CVPR 2026) Track 1: Image Restoration under All-weather Conditions. Our method is built upon the strong baseline framework X-Restormer, which effectively captures both channel-wise global dependencies and spatially-local struc...
📄 Research | May 13, 2026 | http://arxiv.org/abs/2605.13258v1
An Agentic AI Framework with Large Language Models and Chain-of-Thought for UAV-Assisted Logistics Scheduling with Mobile Edge Computing
In cloud manufacturing, unmanned aerial vehicles (UAVs) can support both product collection and mobile edge computing (MEC). This joint operation forms a hybrid scheduling problem, where physical logistics decisions are coupled with computational task scheduling. In this paper, UAVs collect finished...
📄 Research | May 13, 2026 | http://arxiv.org/abs/2605.13221v1
McCast: Memory-Guided Latent Drift Correction for Long-Horizon Precipitation Nowcasting
Existing precipitation nowcasting methods typically adopt an autoregressive formulation, where future states are predicted from previous outputs. However, such an approach accumulates errors over long rollouts, causing forecasts to drift away from physically plausible evolution trajectories. Althoug...
📄 Research | May 13, 2026 | http://arxiv.org/abs/2605.13197v1
N-vium: Mixture-of-Exits Transformer for Accelerated Exact Generation
Improving the inference efficiency of autoregressive transformers typically means reducing FLOPs per token, usually through approximations that degrade model quality. We introduce N-vium, a mixture-of-exits transformer that partially parallelizes computation across depth on standard hardware, increa...
📄 Research | May 13, 2026 | http://arxiv.org/abs/2605.13190v1
Strikingness-Aware Evaluation for Temporal Knowledge Graph Reasoning
Temporal Knowledge Graph Reasoning (TKGR) aims at inferring missing (especially future) events from historical data. Current evaluation in TKGR uniformly weights all events, ignoring that most are trivial repetitions, which overestimates the true reasoning ability. Therefore, the rare outstanding eve...
📄 Research | May 13, 2026 | http://arxiv.org/abs/2605.13153v1
AcquisitionSynthesis: Targeted Data Generation using Acquisition Functions
Data quality remains a critical bottleneck in developing capable, competitive models. Researchers have explored many ways to generate top quality samples. Some works rely on rejection sampling: generating lots of synthetic samples and filtering out low-quality samples. Other works rely on larger or ...
📄 Research | May 13, 2026 | http://arxiv.org/abs/2605.13149v1
LeanSearch v2: Global Premise Retrieval for Lean 4 Theorem Proving
Proving theorems in Lean 4 often requires identifying a scattered set of library lemmas whose joint use enables a concise proof -- a task we call global premise retrieval. Existing tools address adjacent problems: semantic search engines find individual declarations matching a query, while premise-s...
📄 Research | May 13, 2026 | http://arxiv.org/abs/2605.13137v1
GRACE: Gradient-aligned Reasoning Data Curation for Efficient Post-training
Existing reasoning data curation pipelines score whole samples, treating every intermediate step as equally valuable. In reality, steps within a trace contribute very unevenly, and selecting reasoning data well requires assessing them individually. We present GRACE, a gradient-aligned curation metho...
📄 Research | May 13, 2026 | http://arxiv.org/abs/2605.13130v1
Exploiting Pre-trained Encoder-Decoder Transformers for Sequence-to-Sequence Constituent Parsing
To achieve deep natural language understanding, syntactic constituent parsing plays a crucial role and is widely required by many artificial intelligence systems for processing both text and speech. A recent approach involves using standard sequence-to-sequence models to handle constituent parsing a...
📄 Research | May 13, 2026 | http://arxiv.org/abs/2605.13373v1
What Does LLM Refinement Actually Improve? A Systematic Study on Document-Level Literary Translation
Iterative self-refinement is a simple inference-time strategy for machine translation: an LLM revises its own translation over multiple inference-time passes. Yet document-scale refinement remains poorly understood: 1) which pipelines work best, 2) what quality dimensions improve, and 3) how refiner...
📄 Research | May 13, 2026 | http://arxiv.org/abs/2605.13368v1
LLM-Based Persuasion Enables Guardrail Override in Frontier LLMs
Frontier assistant LLMs ship with strong guardrails: asked directly to write a persuasive essay denying the Holocaust, denying vaccine safety, defending flat-earth cosmology, arguing for racial hierarchies, denying anthropogenic climate change, or replacing evolution with creationism, they refuse. I...
📄 Research | May 13, 2026 | http://arxiv.org/abs/2605.13334v1
FIND: Toward Multimodal Financial Reasoning and Question Answering for Indic Languages
Financial decision-making in multilingual settings demands accurate numerical reasoning grounded in diverse modalities, yet existing benchmarks largely overlook this high-stakes, real-world challenge, especially for Indic languages. We introduce FinVQA, a benchmark for evaluating financial numerical...
📄 Research | May 13, 2026 | http://arxiv.org/abs/2605.13330v1
PRISM-X: Experiments on Personalised Fine-Tuning with Human and Simulated Users
Personalisation is a standard feature of conversational AI systems used by millions; yet, the efficacy of personalisation methods is often evaluated in academic research using simulated users rather than real people. This raises questions about how users and their simulated counterparts differ in in...
📄 Research | May 13, 2026 | http://arxiv.org/abs/2605.13307v1
GAGPO: Generalized Advantage Grouped Policy Optimization
Reinforcement learning has become a powerful paradigm for post-training large language model agents, yet credit assignment in multi-turn environments remains a challenge. Agents often receive sparse, trajectory-level rewards only at the end of an episode, making it difficult to determine which inter...
📄 Research | May 13, 2026 | http://arxiv.org/abs/2605.13217v1
LLMs as Implicit Imputers: Uncertainty Should Scale with Missing Information
Large language models (LLMs) are increasingly deployed in settings where the available context is incomplete or degraded. We argue that an LLM generating answers under incomplete context can be viewed as an implicit imputer, and evaluated against a criterion from the multiple imputation (MI) literat...
📄 Research | May 13, 2026 | http://arxiv.org/abs/2605.13188v1
GeoBuildBench: A Benchmark for Interactive and Executable Geometry Construction from Natural Language
We introduce GeoBuildBench, a benchmark designed to evaluate whether large language models and multimodal agents can ground informal natural-language plane geometry problems into executable geometric constructions. Unlike existing geometry benchmarks that focus on answer correctness or static diagra...
📄 Research | May 13, 2026 | http://arxiv.org/abs/2605.13167v1
STOP: Structured On-Policy Pruning of Long-Form Reasoning in Low-Data Regimes
Long chain-of-thought (Long CoT) reasoning improves performance on multi-step problems, but it also induces overthinking: models often generate low-yield reasoning that increases inference cost and latency. This inefficiency is especially problematic in low-data fine-tuning regimes, where real appli...
📄 Research | May 13, 2026 | http://arxiv.org/abs/2605.13165v1
GateKD: Confidence-Gated Closed-Loop Distillation for Robust Reasoning
Distilling multi-step reasoning abilities from large language models (LLMs) into compact student models remains challenging due to noisy rationales, hallucinated supervision, and static teacher-student interactions. Existing reasoning distillation methods, including mentor-based approaches, predomin...
📄 Research | May 13, 2026 | http://arxiv.org/abs/2605.13136v1
Vividh-ASR: A Complexity-Tiered Benchmark and Optimization Dynamics for Robust Indic Speech Recognition
Fine-tuning multilingual ASR models like Whisper for low-resource languages often improves read speech but degrades spontaneous audio performance, a phenomenon we term studio-bias. To diagnose this mismatch, we introduce Vividh-ASR, a complexity-stratified benchmark for Hindi and Malayalam across fo...
📄 Research | May 13, 2026 | http://arxiv.org/abs/2605.13087v1
TruncProof: A Guardrail for LLM-based JSON Generation under Token-Length Constraints
The LLM-based generation of machine-readable outputs such as JSON has attracted significant attention for integration with external systems. However, existing approaches cannot strictly enforce the maximum number of tokens to be generated, leading to infinite generation or truncated outputs that cau...
📄 Research | May 13, 2026 | http://arxiv.org/abs/2605.13076v1
Context Training with Active Information Seeking
Most existing large language models (LLMs) are expensive to adapt after deployment, especially when a task requires newly produced information or niche domain knowledge. Recent work has shown that, by manipulating and optimizing their context, LLMs can be tailored to downstream tasks without updatin...
📄 Research | May 13, 2026 | http://arxiv.org/abs/2605.13050v1
Large Language Models Lack Temporal Awareness of Medical Knowledge
The existing methods for evaluating the medical knowledge of Large Language Models (LLMs) are largely based on atemporal examination-style benchmarks, while in reality, medical knowledge is inherently dynamic and continuously evolves as new evidence emerges and treatments are approved. Consequently,...
📄 Research | May 13, 2026 | http://arxiv.org/abs/2605.13045v1
Leveraging Multimodal Self-Consistency Reasoning in Coding Motivational Interviewing for Alcohol Use Reduction
BACKGROUND: Coding Motivational Interviewing (MI) sessions is essential for understanding client behaviors and predicting outcomes, but it requires substantial time and labor from trained MI professionals. Recent advances in audio-language models (ALMs) offer new opportunities to automate MI coding ...
📄 Research | May 13, 2026 | http://arxiv.org/abs/2605.12987v1
Controlling Logical Collapse in LLMs via Algebraic Ontology Projection over F2
Do large language models internally encode ontological relations in a formally verifiable algebraic structure? We introduce Algebraic Ontology Projection (AOP), which projects LLM hidden states into the Galois Field F2 under Liskov Substitution Principle constraints, using only 42 relational pairs a...
📄 Research | May 13, 2026 | http://arxiv.org/abs/2605.12968v1
DiM³: Bridging Multilingual and Multimodal Models via Direction- and Magnitude-Aware Merging
Towards more general and human-like intelligence, large language models should seamlessly integrate both multilingual and multimodal capabilities; however, extending an existing multimodal model to many languages typically requires expensive multilingual multimodal data construction and repeated end...
📄 Research | May 13, 2026 | http://arxiv.org/abs/2605.12960v1
From Instance Selection to Fixed-Pool Data Recipe Search for Supervised Fine-Tuning
Supervised fine-tuning (SFT) data selection is commonly formulated as instance ranking: score each example and retain a top-k subset. However, effective SFT training subsets are often produced through ordered curation recipes, where filtering, mixing, and deduplication operators jointly shape the ...
📄 Research | May 13, 2026 | http://arxiv.org/abs/2605.12944v1
When Attention Closes: How LLMs Lose the Thread in Multi-Turn Interaction
Large language models can follow complex instructions in a single turn, yet over long multi-turn interactions they often lose the thread of instructions, persona, and rules. This degradation has been measured behaviorally but not mechanistically explained. We propose a channel-transition account: go...
📄 Research | May 13, 2026 | http://arxiv.org/abs/2605.12922v1
CommonWhy: A Dataset for Evaluating Entity-Based Causal Commonsense Reasoning in Large Language Models
To effectively interact with the real world, Large Language Models (LLMs) require entity-based commonsense reasoning, a challenging task that necessitates integrating factual knowledge about specific entities with commonsense inference. Existing datasets for evaluating LLM entity-based commonsense r...
📄 Research | May 13, 2026 | http://arxiv.org/abs/2605.12918v1
When Do LLMs Generate Realistic Social Networks? A Multi-Dimensional Study of Culture, Language, Scale, and Method
Large language models (LLMs) are increasingly used as substitutes for human subjects in behavioral simulations, including synthetic social network generation. Yet it remains unclear how their relational outputs depend on prompt design, cultural framing, prompt language, and model scale. Building on ...
📄 Research | May 13, 2026 | http://arxiv.org/abs/2605.12898v1
Beyond Cooperative Simulators: Generating Realistic User Personas for Robust Evaluation of LLM Agents
Large Language Model (LLM) agents are increasingly deployed in settings where they interact with a wide variety of people, including users who are unclear, impatient, or reluctant to share information. However, collecting real interaction data at scale remains expensive. The field has turned to LLM-...
📄 Research | May 13, 2026 | http://arxiv.org/abs/2605.12894v1
CiteVQA: Benchmarking Evidence Attribution for Trustworthy Document Intelligence
Multimodal Large Language Models (MLLMs) have significantly advanced document understanding, yet current Doc-VQA evaluations score only the final answer and leave the supporting evidence unchecked. This answer-only approach masks a critical failure mode: a model can land on the correct answer while ...
📄 Research | May 13, 2026 | http://arxiv.org/abs/2605.12882v1
Persona-Model Collapse in Emergent Misalignment
Fine-tuning large language models on narrow data with harmful content produces broadly misaligned behavior on unrelated prompts, a phenomenon known as emergent misalignment. We propose that emergent misalignment involves persona-model collapse: deterioration of the model's internal capacity to simul...
📄 Research | May 13, 2026 | http://arxiv.org/abs/2605.12850v1
Phasor Memory Networks: Stable Backpropagation Through Time for Scalable Explicit Memory
For over a decade, explicit memory architectures like the Neural Turing Machine have remained theoretically appealing yet practically intractable for language modeling due to catastrophic gradient instability during Backpropagation Through Time. In this work, we break this stalemate with Pha...
📄 Research | May 13, 2026 | http://arxiv.org/abs/2605.13370v1
A Hybrid Framework for Natural Language Querying of IFC Models with Relational and Graph Representations
Building Information Modeling (BIM) is widely used in the Architecture, Engineering, and Construction (AEC) industry, but the complexity of Industry Foundation Classes (IFC) limits accessibility for non-expert users. To address this, we introduce IfcLLM, a hybrid framework for natural language inter...
📄 Research | May 13, 2026 | http://arxiv.org/abs/2605.13236v1
Does language matter for spoken word classification? A multilingual generative meta-learning approach
Meta-learning has been shown to have better performance than supervised learning for few-shot monolingual spoken word classification. However, the meta-learning approach remains under-explored in multilingual spoken word classification. In this paper, we apply the Generative Meta-Continual Learning ...
📄 Research | May 13, 2026 | http://arxiv.org/abs/2605.13084v1
Scaling few-shot spoken word classification with generative meta-continual learning
Few-shot spoken word classification has largely been developed for applications where a small number of classes is considered, and so the potential of larger-scale few-shot spoken word classification remains untapped. This paper investigates the potential of a spoken word classifier to sequentially ...
📄 Research | May 13, 2026 | http://arxiv.org/abs/2605.13075v1
The Cost of Perfect English: Pragmatic Flattening and the Erasure of Authorial Voice in L2 Writing Supported by GenAI
The integration of Generative AI (GenAI) into language learning offers second language (L2) writers powerful tools for text optimization. However, pursuing native-like fluency often sacrifices sociopragmatic diversity. Investigating "pragmatic flattening" - the systematic erasure of culturally prefe...
📄 Research | May 13, 2026 | http://arxiv.org/abs/2605.13055v1
RAG-Enhanced Large Language Models for Dynamic Content Expiration Prediction in Web Search
In commercial web search, aligning content freshness with user intent remains challenging due to the highly varied lifespans of information. Traditional industrial approaches rely on static time-window filtering, resulting in "one-size-fits-all" rankings where content may be chronologically recent b...
📄 Research | May 13, 2026 | http://arxiv.org/abs/2605.13052v1
Adaptive Steering and Remasking for Safe Generation in Diffusion Language Models
Diffusion Language Models (DLMs) provide a promising alternative to autoregressive language models by generating text through iterative denoising and bidirectional refinement. However, this iterative generation paradigm also introduces unique safety vulnerabilities when harmful tokens generated at i...
📄 Research | May 13, 2026 | http://arxiv.org/abs/2605.13043v1
Understanding and Accelerating the Training of Masked Diffusion Language Models
Masked diffusion models (MDMs) have emerged as a promising alternative to autoregressive models (ARMs) for language modeling. However, MDMs are known to learn substantially more slowly than ARMs, which may become problematic when scaling MDMs to larger models. Therefore, we ask the following questio...
📄 Research | May 13, 2026 | http://arxiv.org/abs/2605.13026v1
ATD-Trans: A Geographically Grounded Japanese-English Travelogue Translation Dataset
Geographic text, or textual data rich in geographic (geo-) information, is a valuable source for various geographic applications, e.g., tourism management. Making such information accessible to speakers of other languages further enhances its utility; thus, accurate machine translation (MT) is essent...
📄 Research | May 13, 2026 | http://arxiv.org/abs/2605.12933v1
Embodied Multi-Agent Coordination by Aligning World Models Through Dialogue
Effective collaboration between embodied agents requires more than acting in a shared environment; it demands communication grounded in each agent's evolving understanding of the world. When agents can only partially observe their surroundings, coordination without communication is provably hard, bu...
📄 Research | May 13, 2026 | http://arxiv.org/abs/2605.12920v1
Skill-Aligned Annotation for Reliable Evaluation in Text-to-Image Generation
Text-to-image (T2I) generation has advanced rapidly, making reliable evaluation critical as performance differences between models narrow. Existing evaluation practices typically apply uniform annotation mechanisms, such as Likert-scale or binary question answering (BQA), across heterogeneous evalua...
📄 Research | May 13, 2026 | http://arxiv.org/abs/2605.13223v1
Does Engram Do Memory Retrieval in Autoregressive Image Generation?
The Engram module -- a hash-keyed, O(1) associative memory injected into Transformer layers -- was recently shown to improve large language model pretraining, with the appealing interpretation that it provides a content-addressed shortcut to recurring local token patterns. We ask whether this interp...
📄 Research | May 13, 2026 | http://arxiv.org/abs/2605.13179v1
A₃B₂: Adaptive Asymmetric Adapter for Alleviating Branch Bias in Vision-Language Image Classification with Few-Shot Learning
Efficient transfer learning methods for large-scale vision-language models (e.g., CLIP) enable strong few-shot transfer, yet existing adaptation methods follow a fixed fine-tuning paradigm that implicitly assumes a uniform importance of the image and text branches, which has not been systematicall...
📄 Research | May 13, 2026 | http://arxiv.org/abs/2605.13161v1
Dual-Pathway Circuits of Object Hallucination in Vision-Language Models
Vision-language models (VLMs) have demonstrated remarkable capabilities in bridging visual perception and natural language understanding, enabling a wide range of multimodal reasoning tasks. However, they often produce object hallucinations, describing content absent from the input image, which limi...
📄 Research | May 13, 2026 | http://arxiv.org/abs/2605.13156v1
Understanding Generalization through Decision Pattern Shift
Understanding why deep neural networks (DNNs) fail to generalize to unseen samples remains a long-standing challenge. Existing studies mainly examine changes in externally observable factors such as data, representations, or outputs, yet offer limited insight into how a model's internal decision mec...
📄 Research | May 13, 2026 | http://arxiv.org/abs/2605.13148v1
On Hallucinations in Inverse Problems: Fundamental Limits and Provable Assessment Methods
Artificial intelligence (AI) has transformed imaging inverse problems, from medical diagnostics to Earth observation. Yet deep neural networks can produce hallucinations, realistic-looking but incorrect details, undermining their reliability, especially when ground truth data is unavailable. We deve...
📄 Research | May 13, 2026 | http://arxiv.org/abs/2605.13146v1
Multi-Modal Guided Multi-Source Domain Adaptation for Object Detection
General object detection (OD) struggles to detect objects in the target domain that differ from the training distribution. To address this, recent studies demonstrate that training from multiple source domains and explicitly processing them separately for multi-source domain adaptation (MSDA) outper...
📄 Research | May 13, 2026 | http://arxiv.org/abs/2605.13140v1