AI News Archive: May 19, 2026 — Part 27

Sourced from 500+ daily AI sources, scored by relevance.

ClinSeekAgent: Automating Multimodal Evidence Seeking for Agentic Clinical Reasoning
Large language models (LLMs) and agentic systems have shown promise for clinical decision support, but existing works largely assume that evidence has already been curated and handed to the model. Real-world clinical workflows instead require agents to actively seek, iteratively plan, and synthesize...
📄 Research May 19, 2026 http://arxiv.org/abs/2605.20176v1
KoRe: Compact Knowledge Representations for Large Language Models
Modern Large Language Models (LLMs) have shown impressive performances in user-facing tasks such as question answering, as well as consistent improvements in reasoning capabilities. Still, the way these models encode knowledge seems inherently flawed: by design, LLMs encode world-knowledge within th...
📄 Research May 19, 2026 http://arxiv.org/abs/2605.20170v1
MixRea: Benchmarking Explicit-Implicit Reasoning in Large Language Models
Large language models (LLMs) are increasingly integrated into high-stakes decision-making. Inspired by the theory of inattentional blindness in human cognition, we investigate whether LLMs, trained on human-preferred corpora that embed attentional biases, exhibit a similar limitation: f...
📄 Research May 19, 2026 http://arxiv.org/abs/2605.20128v1
Text-to-SPARQL Generation with Reinforcement Learning: A GRPO-based Approach on DBLP
Knowledge graph question answering seeks to translate natural language questions into executable queries over knowledge graphs, but existing approaches often rely on large models or full supervision in the form of gold query annotations. This study examines whether reinforcement learning with outcom...
📄 Research May 19, 2026 http://arxiv.org/abs/2605.20066v1
Rewarding Beliefs, Not Actions: Consistency-Guided Credit Assignment for Long-Horizon Agents
Reinforcement learning from verifiable rewards (RLVR) is a promising paradigm for improving large language model (LLM) agents on long-horizon interactive tasks. However, in partially observable environments, incomplete observations cause agent beliefs to drift over time, while delayed rewards obscur...
📄 Research May 19, 2026 http://arxiv.org/abs/2605.20061v1
Mantle Chat
Collaboration platform where teams work with AI together
🧰 Tools May 19, 2026 https://www.producthunt.com/products/mantle-chat?utm_campaign=producthunt-api&utm_medium=api-v2&utm_source=Application%3A+the500feed+%28ID%3A+283491%29
FlexDraft: Flexible Speculative Decoding via Attention Tuning and Bonus-Guided Calibration
Speculative decoding accelerates memory-bound LLM inference without quality degradation by using a fast drafter to propose multiple candidate tokens and the target model to verify them in parallel. However, conventional sequential speculative decoding suffers from mutual waiting between drafting and...
📄 Research May 19, 2026 http://arxiv.org/abs/2605.20022v1
Rethinking How to Remember: Beyond Atomic Facts in Lifelong LLM Agent Memory
To enable reliable long-term interaction, LLM agents require a memory system that can faithfully store, efficiently retrieve, and deeply reason over accumulated dialogue history. Most existing methods adopt an extracted fact based paradigm: handcrafted static prompts compress raw dialogues into atom...
📄 Research May 19, 2026 http://arxiv.org/abs/2605.19952v1
What Are LLMs Doing to Scientific Communication? Measuring Changes in Writing Practices and Reading Experience
Has the style of scientific communication changed due to the growing use of large language models in the writing process? We address this question in the domain of Natural Language Processing by leveraging two data resources we create: a naturalistic corpus of over 37,000 papers from the ACL Antholo...
📄 Research May 19, 2026 http://arxiv.org/abs/2605.19936v1
Are Tools Always Beneficial? Learning to Invoke Tools Adaptively for Dual-Mode Multimodal LLM Reasoning
Tool-augmented reasoning has emerged as a promising direction for enhancing the reasoning capabilities of multimodal large language models (MLLMs). However, existing studies mainly focus on enabling models to perform tool invocation, while neglecting the necessity of invoking tools. We argue that to...
📄 Research May 19, 2026 http://arxiv.org/abs/2605.19852v1
CLIF: Concept-Level Influence Functions for Transparent Bottleneck Models
In recent years, the black-box nature of deep learning models has limited their application in high-stakes domains such as medical diagnosis and finance, where interpretability is essential. To address this, we propose a novel approach using influence functions to enhance interpretability in NLP mod...
📄 Research May 19, 2026 http://arxiv.org/abs/2605.19848v1
CADENet: Condition-Adaptive Asynchronous Dual-Stream Enhancement Network for Adverse Weather Perception in Autonomous Driving
Adverse weather (rain, fog, sand, and snow) degrades camera-based object detection in autonomous vehicles. Existing enhancement-then-detect approaches stall the safety-critical perception loop, violating hard real-time requirements. Progress on this problem is also constrained by an under-recognized...
📄 Research May 19, 2026 http://arxiv.org/abs/2605.19837v1
Mega-ASR: Towards In-the-wild^2 Speech Recognition via Scaling up Real-world Acoustic Simulation
Despite rapid advances in automatic speech recognition (ASR) and large audio-language models, robust recognition in real-world environments remains limited by an "acoustic robustness bottleneck": models often lose acoustic grounding and produce omissions or hallucinations under severe, compositional...
📄 Research May 19, 2026 http://arxiv.org/abs/2605.19833v1
From Prompts to Pavement Through Time: Temporal Grounding in Agentic Scene-to-Plan Reasoning
Recent attempts to support high-level scene interpretation and planning in Autonomous Vehicles (AVs) using ensembles of Large Language Models (LLMs) and Large Multimodal Models (LMMs) continue to treat time as a secondary property. This lack of temporal grounding leads to inconsistencies in reasonin...
📄 Research May 19, 2026 http://arxiv.org/abs/2605.19824v1
LP-Eval: Rubric and Dataset for Measuring the Quality of Legal Proposition Generation
Legal proposition generation is central to legal reasoning and doctrinal scholarship, yet remains under-examined in Legal NLP. This paper investigates the automatic generation and evaluation of legal propositions from decisions of the Court of Justice of the European Union using large language models...
📄 Research May 19, 2026 http://arxiv.org/abs/2605.19815v1
Chunking German Legal Code
This paper investigates chunking strategies for retrieval-augmented generation on German statutory law, using the German Civil Code as a structured benchmark corpus. We implement and compare a range of segmentation approaches, including structural units (sections, subsections, sentences, proposition...
📄 Research May 19, 2026 http://arxiv.org/abs/2605.19806v1
Towards Trust Calibration in Socially Interactive Agents: Investigating Gendered Multimodal Behaviors Generation with LLMs
As Socially Interactive Agents (SIAs) become increasingly integrated into daily life, the ability to calibrate user trust to an agent's actual capabilities would help ensure appropriate usage of these agents. In this paper, we explore the capacity of Large Language Models (LLMs) to generate multimod...
📄 Research May 19, 2026 http://arxiv.org/abs/2605.19798v1
What Really Improves Mathematical Reasoning: Structured Reasoning Signals Beyond Pure Code
Code has become a standard component of modern foundation language model (LM) training, yet its role beyond programming remains unclear. We revisit the claim that code improves reasoning through controlled pretraining experiments on a 10T-token corpus with fine-grained domain separation. Our finding...
📄 Research May 19, 2026 http://arxiv.org/abs/2605.19762v1
ContextRAG: Extraction-Free Hierarchical Graph Construction for Retrieval-Augmented Generation
Graph-structured retrieval-augmented generation (RAG) systems can improve answer quality on multi-hop questions, but many current systems rely on large language models (LLMs) to extract entities, relations, and summaries during indexing. These calls add token and wall-clock costs that grow with corp...
📄 Research May 19, 2026 http://arxiv.org/abs/2605.19735v1
Mathematical Reasoning in Large Language Models: Benchmarks, Architectures, Evaluation, and Open Challenges
Mathematical reasoning is essential for problem-solving in education, science, and industry, serving as a crucial benchmark for evaluating artificial intelligence systems. As Large Language Models (LLMs) improve their reasoning capabilities, understanding how well they perform mathematical reasoning...
📄 Research May 19, 2026 http://arxiv.org/abs/2605.19723v1
LLM-Based Financial Sentiment Analysis in Arabic: Evidence from Saudi Markets
Investor sentiment shapes financial markets, yet modeling sentiment in Arabic financial contexts remains challenging due to linguistic complexity and limited resources. We present an Arabic NLP framework for large-scale financial sentiment analysis tailored to the Saudi market, integrating official ...
📄 Research May 19, 2026 http://arxiv.org/abs/2605.19714v1
CtrlOps
Deploy, Debug & Manage Linux Servers with AI.
🧰 Tools May 19, 2026 https://www.producthunt.com/products/ctrlops?utm_campaign=producthunt-api&utm_medium=api-v2&utm_source=Application%3A+the500feed+%28ID%3A+283491%29
Can Large Language Models Reliably Correct Errors in Low-Resource ASR? A Contamination-Aware Case Study on West Frisian
Automatic speech recognition (ASR) has improved substantially in recent years, yet performance remains limited for low-resource languages. Large language models (LLMs) have shown promise for improving ASR through generative error correction (GER), but their effectiveness in low-resource settings rem...
📄 Research May 19, 2026 http://arxiv.org/abs/2605.19711v1
OScaR: The Occam's Razor for Extreme KV Cache Quantization in LLMs and Beyond
The rapid advancement toward long-context reasoning and multi-modal intelligence has made the memory footprint of the Key-Value (KV) cache a dominant memory bottleneck for efficient deployment. While the established per-channel quantization effectively accommodates intrinsic channel-wise outliers in...
📄 Research May 19, 2026 http://arxiv.org/abs/2605.19660v1
K-Quantization and its Impact on Output Performance
Recent advancements in large language models (LLMs) have shown their remarkable capacities in many NLP tasks. However, their substantial size often presents challenges for deployment. This necessitates efficient techniques for model compression, with quantization emerging as a prominent solution. De...
📄 Research May 19, 2026 http://arxiv.org/abs/2605.19645v1
optimize_anything: A Universal API for Optimizing any Text Parameter
Can a single LLM-based optimization system match specialized tools across fundamentally different domains? We show that when optimization problems are formulated as improving a text artifact evaluated by a scoring function, a single AI-based optimization system, supporting single-task search, multi-t...
📄 Research May 19, 2026 http://arxiv.org/abs/2605.19633v1
LLMEval-Logic: A Solver-Verified Chinese Benchmark for Logical Reasoning of LLMs with Adversarial Hardening
Evaluating large language models (LLMs) on natural-language logical reasoning is essential because rule-governed tasks require conclusions to follow strictly from stated premises. Many existing logical-reasoning benchmarks are generated by templating natural-language items from sampled formulas, pro...
📄 Research May 19, 2026 http://arxiv.org/abs/2605.19597v1
GoLongRL: Capability-Oriented Long Context Reinforcement Learning with Multitask Alignment
We present GoLongRL, a fully open-source, capability-oriented post-training recipe for long-context reinforcement learning with verifiable rewards (RLVR). Existing long-context RL methods often treat data construction as a matter of designing increasingly complex retrieval paths, leading to homogene...
📄 Research May 19, 2026 http://arxiv.org/abs/2605.19577v1
Library Drift: Diagnosing and Fixing a Silent Failure Mode in Self-Evolving LLM Skill Libraries
Self-evolving skill libraries face a silent failure mode we term "library drift": unbounded skill accumulation without outcome-driven lifecycle management causes retrieval degradation, false-positive injections, and performance stagnation. Recent evaluation confirms the symptom--LLM-authored sk...
📄 Research May 19, 2026 http://arxiv.org/abs/2605.19576v1
Investigating Cross-Modal Skill Injection: Scenarios, Methods, and Hyperparameters
Vision-Language Models (VLMs) have demonstrated remarkable proficiency in general multi-modal understanding; yet they struggle to efficiently acquire continually evolving domain-specific skills. Conventional approaches to enhancing VLM capabilities, such as Supervised Fine-Tuning (SFT), require exte...
📄 Research May 19, 2026 http://arxiv.org/abs/2605.19523v1
From Seeing to Thinking: Decoupling Perception and Reasoning Improves Post-Training of Vision-Language Models
Recent advances in vision-language models (VLMs) emphasize long chain-of-thought reasoning; yet, we find that their performance on visual tasks is primarily limited by a lack of visual perception as opposed to reasoning itself. In this work, we systematically study the interplay between perception a...
📄 Research May 19, 2026 http://arxiv.org/abs/2605.20177v1
Mind Your Moras: Orthography-Aware Error Analysis of Neural Japanese Morphological Generation
We present an orthography-aware error analysis of Japanese past-tense morphological inflection, treating hiragana not merely as a transcriptional medium, but as a representational system encoding morphophonological distinctions that may influence model generalization. We evaluate two character-level...
📄 Research May 19, 2026 http://arxiv.org/abs/2605.20043v1
Where Does Authorship Signal Emerge in Encoder-Based Language Models?
Authorship attribution models fine-tuned with the same pretrained encoder, data, and loss can differ four-fold in performance depending only on their scoring mechanism. We use mechanistic interpretability tools to explain this gap. Stylistic features such as word length, punctuation density, and fun...
📄 Research May 19, 2026 http://arxiv.org/abs/2605.19908v1
FineBench: Benchmarking and Enhancing Vision-Language Models for Fine-grained Human Activity Understanding
Vision-Language Models (VLMs) have demonstrated remarkable capabilities in general video understanding, yet they often struggle with the fine-grained comprehension crucial for real-world applications requiring nuanced interpretation of human actions and interactions. While some recent human-centric ...
📄 Research May 19, 2026 http://arxiv.org/abs/2605.19846v1
Synthesis and Evaluation of Long-term History-aware Medical Dialogue
An effective healthcare agent must be able to recall and reason over a patient's longitudinal medical history. However, the absence of datasets with realistic long-term dialogue timelines limits systematic evaluation. Real clinical text is constrained by privacy and ethics, while existing benchmarks...
📄 Research May 19, 2026 http://arxiv.org/abs/2605.19766v1
TERGAD: Structure-Aware Text-Enhanced Representations for Graph Anomaly Detection
Graph Anomaly Detection (GAD) aims to identify atypical graph entities, such as nodes, edges, or substructures, that deviate significantly from the majority. While existing text-rich approaches typically integrate structural context into the data representation pipeline using raw textual features, t...
📄 Research May 19, 2026 http://arxiv.org/abs/2605.19738v1
DeRegiME: Deep Regime Mixtures for Probabilistic Forecasting under Distribution Shift
We introduce DeRegiME -- Deep Regime Mixture of Experts -- a direct multi-horizon probabilistic forecaster that separates latent uncertainty regimes from the underlying signal and softly assigns each forecast location to learned recurring regimes using a sparse variational Gaussian process (GP) whos...
📄 Research May 19, 2026 http://arxiv.org/abs/2605.19231v1
PollyReach
Give your agent a real number and voice to make calls.
🧰 Tools May 19, 2026 https://www.producthunt.com/products/pollyreach?utm_campaign=producthunt-api&utm_medium=api-v2&utm_source=Application%3A+the500feed+%28ID%3A+283491%29
AffectVerse: Emotional World Models for Multimodal Affective Computing
Humans infer emotions by integrating observed multimodal cues with expectations about how affective states may unfold. Existing multimodal large language models (MLLMs), however, often treat emotion recognition as static fusion over complete audiovisual-text inputs, leaving affective dynamics implic...
📄 Research May 19, 2026 http://arxiv.org/abs/2605.19950v1
WoundFormer: Multi-Scale Spatial Feature Fusion for Multi-Class Wound Tissue Segmentation
Chronic wounds such as diabetic foot ulcers and pressure injuries require accurate tissue-level assessment to guide treatment planning and monitor healing progression. While deep learning methods have advanced automated wound analysis, most existing approaches focus on binary segmentation and inadeq...
📄 Research May 19, 2026 http://arxiv.org/abs/2605.19868v1
Landscape-Awareness for Geometric View Diffusion Model
Accurate camera viewpoint estimation under sparse-view conditions remains challenging, particularly in two-view scenarios. Recent approaches leverage diffusion models such as Zero123 to synthesize novel views conditioned on relative viewpoint, showing promising results when repurposed for viewpoint ...
📄 Research May 19, 2026 http://arxiv.org/abs/2605.19865v1
Eyes on VLM: Benchmarking Gaze Following and Social Gaze Prediction in Vision Language Models
Vision-language models (VLMs) have rapidly evolved into general-purpose multimodal reasoners with strong zero-shot generalization. In this context, VLMs could greatly benefit the analysis of human gaze and attention, a central task in human behavior understanding that requires reasoning about the ph...
📄 Research May 19, 2026 http://arxiv.org/abs/2605.19859v1
LaCoVL-FER: Landmark-Guided Contrastive Learning Network with Vision-Language Enhancement for Facial Expression Recognition
Facial Expression Recognition (FER) in the wild is still challenging due to uncontrolled variations in pose, occlusion, and illumination. Most existing attention-based methods primarily rely on visual appearance cues, suffering from attention redundancy and instability, which limits their performanc...
📄 Research May 19, 2026 http://arxiv.org/abs/2605.19821v1
PiG-Avatar: Hierarchical Neural-Field-Guided Gaussian Avatars
Existing Gaussian avatar methods typically parameterize geometry on a body-template surface, which entangles the avatar's representation space with the template's deformation space and limits the capture of layered, off-body, and non-rigid clothing geometry. We present PiG-Avatar, which addresses th...
📄 Research May 19, 2026 http://arxiv.org/abs/2605.20185v1
MSAVBench: Towards Comprehensive and Reliable Evaluation of Multi-Shot Audio-Video Generation
Video generation is rapidly evolving from single-shot synthesis to complex multi-shot audio-video (MSAV) narratives to meet real-world demands. However, evaluating such frontier models remains a fundamental challenge. Existing benchmarks are limited in scope and data diversity, and rely on rigid eva...
📄 Research May 19, 2026 http://arxiv.org/abs/2605.20183v1
TideGS: Scalable Training of Over One Billion 3D Gaussian Splatting Primitives via Out-of-Core Optimization
Training 3D Gaussian Splatting (3DGS) at billion-primitive scale is fundamentally memory-bound: each Gaussian primitive carries a large attribute vector, and the aggregate parameter table quickly exceeds GPU capacity, limiting prior systems to tens of millions of Gaussians on commodity single-GPU ha...
📄 Research May 19, 2026 http://arxiv.org/abs/2605.20150v1
PixVerve: Advancing Native UHR Image Generation to 100MP with a Large-Scale High-Quality Dataset
Text-to-Image (T2I) models have recently seen notable progress around 1K and 2K resolution. With the extreme desire for better visual experience and the rapid development of imaging technology, the demand for Ultra-High-Resolution (UHR) image generation has grown significantly. However, UHR image ge...
📄 Research May 19, 2026 http://arxiv.org/abs/2605.20147v1
Spatially Prompted Visual Trajectory Prediction for Egocentric Manipulation
Robotic manipulation is often specified through language instructions or task identifiers, yet cluttered environments with similar objects are better handled by spatially indicating what to move and where to place it. Addressing the vision-centric challenge of object and goal specification, we prese...
📄 Research May 19, 2026 http://arxiv.org/abs/2605.20085v1
OP2GS: Object-Aware 3D Gaussian Splatting with Dual-Opacity Primitives
3D Gaussian Splatting (3DGS) provides an explicit and efficient scene representation, but its primitives lack inherent object-level identity, hindering downstream tasks such as open-vocabulary scene understanding. Existing methods typically address this by either distilling high-dimensional feature ...
📄 Research May 19, 2026 http://arxiv.org/abs/2605.20044v1
CogOmniControl: Reasoning-Driven Controllable Video Generation via Creative Intent Cognition
Recent diffusion models achieve strong photorealism and fluency in video generation, yet remain fragile under abstract, sparse or complex conditions, leading to poor performance in professional production workflows such as storyboard sketches and clay render conditions. Existing video generation mod...
📄 Research May 19, 2026 http://arxiv.org/abs/2605.19995v1