AI News Archive: May 27, 2026 — Part 18

Sourced from 500+ daily AI sources, scored by relevance.

Beyond Binary Moral Judgment: Modeling Ethical Pluralism in AI
Critical decision-making in socially consequential spaces is increasingly involving AI systems at varying capacities. Yet, despite the ubiquity of autonomous systems, most approaches to handling autonomous moral decision-making resort to scalar or binary judgments. These methods are insufficient for...
📄 Research | May 27, 2026 | http://arxiv.org/abs/2605.28707v1
VeriTrip: A Verifiable Benchmark for Travel Planning Agents over Unstructured Web Corpora
Existing benchmarks have laid the foundation for travel planning agents by establishing API-centric paradigms. However, as the capabilities of Autonomous Agents continue to advance, their evaluation must evolve beyond simple tool execution toward handling the inherent complexities of the open web. C...
📄 Research | May 27, 2026 | http://arxiv.org/abs/2605.28683v1
Sense Representations Are Inducible Interfaces
Sense representations (explicit, per-token meaning decompositions) are useful for disambiguation, steering, and cross-lingual alignment, but existing approaches require models to be pretrained with sense structure baked in. We introduce ACROS, which induces an explicit sense pathway into a frozen pr...
📄 Research | May 27, 2026 | http://arxiv.org/abs/2605.28669v1
AutoScientists: Self-Organizing Agent Teams for Long-Running Scientific Experimentation
Scientific research proceeds through iterative cycles of hypothesis generation, experiment design, execution, and revision. AI agents can automate parts of this process, but existing approaches typically follow a single research trajectory or coordinate through a central planner with fixed objective...
📄 Research | May 27, 2026 | http://arxiv.org/abs/2605.28655v1
The Attentional White Bear Effect in Transformer Language Models
Instruction-based suppression is widely used to prevent language models from generating prohibited content, yet it remains unclear whether suppression reduces internal representation or merely suppresses expression. We investigate this question through representational probing, attention analysis, a...
📄 Research | May 27, 2026 | http://arxiv.org/abs/2605.28639v1
Measuring Form and Function in Language Models
We introduce quantitative metrics for child language acquisition to evaluate language models. Our focus is on the formal syntactic and functional discourse properties of determiners in English, which young children acquire early and accurately. We propose Contextual Alternative Choice (CAC), a new p...
📄 Research | May 27, 2026 | http://arxiv.org/abs/2605.28616v1
Online Irregular Multivariate Time Series Forecasting via Uncertainty-Driven Dual-Expert Calibration
Irregular multivariate time series forecasting is critical in many real-world applications, where time series are irregularly sampled and exhibit dynamically evolving missingness patterns. Although existing methods perform well in offline settings, they often suffer from significant performance degr...
📄 Research | May 27, 2026 | http://arxiv.org/abs/2605.28603v1
Position: Retire the "Positive Backdoor" Label -- Secret Alignment Requires Strict and Systematic Evaluation
This position paper argues that the AI/ML community should stop overclaiming and retire the label "positive backdoor," and instead treat trigger-activated hidden behaviors as Secret Alignment. Crucially, protective claims based on Secret Alignment should be presumed not secure by default unless supp...
📄 Research | May 27, 2026 | http://arxiv.org/abs/2605.28597v1
Functional Entropy: Predicting Functional Correctness in LLM-Generated Code with Uncertainty Quantification
Large language models have shown impressive capabilities in code generation, yet they often produce functionally incorrect code. Uncertainty quantification (UQ) methods have emerged as a promising approach for detecting hallucinations in natural language generation, but their effectiveness for code ...
📄 Research | May 27, 2026 | http://arxiv.org/abs/2605.28500v1
Thermodynamic properties of chemically disordered compounds via AI-driven estimation of partition function with the PULSE method
In this article, we present an improved version of the PULSE method (Partition function Unsupervised Learning Sampling and Evaluation) for estimating the thermodynamic properties of chemically disordered compounds. The aim is to reduce the computational cost of Monte Carlo approaches for this type o...
📄 Research | May 27, 2026 | http://arxiv.org/abs/2605.28594v1
VLMs May Not Globally Enhance Human Alignment over LLMs During Natural Reading
Large language models (LLMs) have become increasingly useful computational models of human language processing, but it remains unclear whether vision-language learning makes text representations more human-like during natural reading. Here, we address this question by comparing tightly matched LLM a...
📄 Research | May 27, 2026 | http://arxiv.org/abs/2605.28818v1
Human Label Variation as Stable Signal: Learning Annotator-Specific Explanation Behavior via Cross-Annotator Preference Optimization
Free-text explanations extend human label variation (HLV) beyond label disagreement by revealing the reasoning and preferences behind annotators' decisions. We study whether large language models (LLMs) can learn and reproduce such annotator-specific label-explanation behavior. Using two sentence-pa...
📄 Research | May 27, 2026 | http://arxiv.org/abs/2605.28802v1
Can Large Language Models Handle Discourse Particles? A Case Study of Colloquial Malay
Discourse particles, such as "well" and "kind of", are crucial components that enable LLMs to "speak" more like humans. They are used to convey emotions, intentions, and interpersonal meanings. However, existing studies have not yet built a comprehensive understanding of LLMs' capabi...
📄 Research | May 27, 2026 | http://arxiv.org/abs/2605.28782v1
Can LLMs Use Linguistic Uncertainty Markers to Reliably Reflect Intrinsic Confidence?
LLMs' linguistically expressed confidence should faithfully reflect their intrinsic uncertainty. While recent work shows LLMs struggle to use epistemic markers (e.g., "it is likely...") in a human-aligned fashion, it remains unclear whether models can apply their own linguistic confidence framework ...
📄 Research | May 27, 2026 | http://arxiv.org/abs/2605.28778v1
Agent Explorative Policy Optimization for Multimodal Agentic Reasoning
Vision-language models with extended reasoning succeed on complex problems, but many real-world problems require external tools that internal reasoning alone often cannot resolve. Agentic reasoning therefore interleaves two behaviors with a structural asymmetry: thinking (the self-contained default)...
📄 Research | May 27, 2026 | http://arxiv.org/abs/2605.28774v1
Code as a Weapon: A Consensus-Labeled Prompt Bank for Measuring Coding-Model Compliance with Malicious-Code Requests
A general-purpose language model that answers a harmful question returns text; a coding model that complies with a malicious request can return a working weapon: a keylogger, a ransomware stub, an exploit that runs as written. This asymmetry in the severity of a single act of compliance implies co...
📄 Research | May 27, 2026 | http://arxiv.org/abs/2605.28734v1
Interpretability-Guided Layer Selection over Subspace Projection: SAEs as Stethoscopes, Not Scalpels, for Raw Task Vector Model Editing
LLMs increasingly require surgical model editing to enhance domain-specific capabilities without incurring the computational cost or catastrophic forgetting associated with full fine-tuning. Sparse Autoencoders (SAEs) have emerged as a promising tool in this setting, in principle allowing for featur...
📄 Research | May 27, 2026 | http://arxiv.org/abs/2605.28649v1
GraphSteal: Structural Knowledge Stealing from Graph RAG via Traversal Reconstruction
Retrieval-Augmented Generation (RAG) enhances LLMs by grounding generation in query-relevant external evidence. Beyond unstructured text corpora, Graph RAG integrates knowledge graphs into the retrieval pipeline, enabling LLMs to access entities, relations, and multi-hop dependencies encoded in stru...
📄 Research | May 27, 2026 | http://arxiv.org/abs/2605.28645v1
Mobile-Aptus: Confidence-Driven Proactive and Robust Interaction in MLLM-based Mobile-Using Agents
Recent advancements in multimodal large language models (MLLMs) have shown exceptional potential in enabling mobile-using agents to autonomously execute human instructions. However, fully automated agents often try to execute tasks even when they are unable to resolve them, leading to the problem of...
📄 Research | May 27, 2026 | http://arxiv.org/abs/2605.28629v1
Verified Misguidance: Measuring Structural Citation Failures in Search-Augmented LLMs
Users of search-augmented LLMs rely on citations as evidence that responses are grounded in real sources, and rarely verify the cited pages themselves. Millions of queries per day now pass through these systems, making citation quality a silent determinant of whether users are informed or misled, yet...
📄 Research | May 27, 2026 | http://arxiv.org/abs/2605.28565v1
Soft-SVeRL: Self-Verified Reinforcement Learning with Soft Rewards
Reinforcement Learning from Verifiable Rewards (RLVR) has improved language models in domains such as mathematics and code, where correctness can be checked automatically. However, many important tasks are only partially verifiable: prompts contain multiple requirements, responses may satisfy some b...
📄 Research | May 27, 2026 | http://arxiv.org/abs/2605.28561v1
Cultural Binding Heads in Language Models
LLMs often default to equal treatment across cultural groups, even though context warrants differentiation: this is a lack of difference awareness. Using mechanistic interpretability and a factorial design on the N4 cultural appropriation benchmark from Wang et al. (2025), we identify 2-3 mid-layer ...
📄 Research | May 27, 2026 | http://arxiv.org/abs/2605.28543v1
GUI-CIDER: Mid-training GUI Agents via Causal Internalization and Density-aware Exemplar Reselection
Despite the rapid progress of multimodal large language models in building Graphical User Interface (GUI) agents, their real-world task completion is fundamentally bottlenecked by a lack of world knowledge about GUI operations. Existing solutions typically rely on expensive multi-agent scaffolding o...
📄 Research | May 27, 2026 | http://arxiv.org/abs/2605.28534v1
Entropy-aware Masking for Masked Language Modeling
Masked language modeling has become a standard pretraining objective for training encoder-based language models. In this approach, certain tokens in the input are masked, and the model learns to predict them using the surrounding context. This process enables the model to capture both syntactic and ...
📄 Research | May 27, 2026 | http://arxiv.org/abs/2605.28526v1
Curlo
Local AI search to find SFX and music by describing it
🧰 Tools | May 27, 2026 | https://www.producthunt.com/products/curlo?utm_campaign=producthunt-api&utm_medium=api-v2&utm_source=Application%3A+the500feed+%28ID%3A+283491%29
Beyond One Path: Evaluating and Enhancing Divergent Thinking in Interactive LLM Agents
Divergent thinking is a core dimension of creativity, yet existing evaluations of Large Language Models (LLMs) treat them as single-turn text generations, failing to capture how an agent reasons through iterative interaction. To address this, we introduce MUTATE, an interactive benchmark designed to...
📄 Research | May 27, 2026 | http://arxiv.org/abs/2605.28465v1
AdaDPO: Self-Adaptive Direct Preference Optimization with Balanced Gradient Updates
DPO has become a widely adopted alternative to RLHF for aligning LLMs with human preferences, eliminating the need for a separate reward model or RL loop. Recent theoretical analysis uncovers an asymmetric gradient behavior in DPO: the loss suppresses dispreferred responses substantially faster than...
📄 Research | May 27, 2026 | http://arxiv.org/abs/2605.28440v1
Breaking the Script Barrier: Enabling Automatic Alignment for PoS-based ASR Error Analysis in Non-Latin Scripts
Automatic Speech Recognition (ASR) systems are commonly evaluated using aggregate metrics such as Word Error Rate (WER), which do not capture the linguistic structure of errors. Fine-grained analysis, such as Part-of-Speech (PoS)-wise error characterization, requires accurate alignment between ASR h...
📄 Research | May 27, 2026 | http://arxiv.org/abs/2605.28438v1
Roles with Rails: Contract-Preserving Role Evolution in Multi-Agent Structured Reasoning
Role-based LLM multi-agent systems need adaptive role pools, yet adapting such systems is not merely a matter of prompt optimization: roles often carry structural obligations, including capability coverage, message compatibility, validation, final-answer aggregation, and parser-compatible output pro...
📄 Research | May 27, 2026 | http://arxiv.org/abs/2605.28433v1
Skill0.5: Joint Skill Internalization and Utilization for Out-of-Distribution Generalization in Agentic Reinforcement Learning
Equipping large language models with explicit skills has emerged as a promising paradigm for enabling autonomous agents to solve complex tasks. Agent skills can be inherently divided into general skills for broad cognitive transfer and task-specific skills for dynamic execution. However, existing sk...
📄 Research | May 27, 2026 | http://arxiv.org/abs/2605.28424v1
PEFT-Arena: Understanding Parameter-Efficient Finetuning from a Stability-Plasticity Perspective
Parameter-efficient finetuning (PEFT) has become the standard approach for adapting large language models, yet evaluations largely emphasize downstream accuracy while overlooking the retention of pretrained capabilities. We argue that PEFT should be assessed through the stability-plasticity dilemma:...
📄 Research | May 27, 2026 | http://arxiv.org/abs/2605.28819v1
Self-Improving Language Models with Bidirectional Evolutionary Search
Search has been proposed as an effective method for self-improving language models and agentic systems, both for post-training sample generation and for inference. However, widely used methods such as best-of-N sampling and tree search face two fundamental limitations: they are guided by sparse veri...
📄 Research | May 27, 2026 | http://arxiv.org/abs/2605.28814v1
The Abstraction Gap in Vision-Language Causal Reasoning
Vision-language models (VLMs) generate fluent causal explanations, but current evaluations cannot distinguish linguistic plausibility from faithful causal reasoning. We introduce a dual-probe methodology that isolates these properties. The Text-Only Probe measures linguistic quality. The Chain-Text ...
📄 Research | May 27, 2026 | http://arxiv.org/abs/2605.28779v1
Activation Steering for Synthetic Data Generation: The Role of Diversity in Downstream Safety Detection
Safety detection models require examples of HHH (Helpful, Harmless, Honest)-violating outputs for robust generalization; however, such examples are scarce. Activation Steering (AS) has emerged as a data-efficient method for generating target-concept-aligned responses. We investigate whether AS can ge...
📄 Research | May 27, 2026 | http://arxiv.org/abs/2605.28664v1
MaskClaw: Edge-Side Personalized Privacy Arbitration for GUI Agents with Behavior-Driven Skill Evolution
GUI agents rely on screenshots to infer intent and operate across applications, but these screenshots often contain private messages, medical records, payment credentials, and workplace-specific workflows. Privacy decisions in this setting depend on task, recipient, application state, and user role,...
📄 Research | May 27, 2026 | http://arxiv.org/abs/2605.28646v1
GraphLit: Learning Text-Enriched Dynamic Character Network Representations for Literary Study
Methods to represent literary texts as graphs or sequences of graphs mainly focus on representing character interactions, and often overlook another crucial aspect: the textual context in which characters interact. We introduce Dynamic Heterogeneous Character Networks (DHCNs), which organize long no...
📄 Research | May 27, 2026 | http://arxiv.org/abs/2605.28643v1
ClinicalEncoder26AM: A Multilingual Diagnosable ColBERT Model; Evidences from the MultiClinNER Shared Task
ClinicalEncoder26AM is a multilingual Diagnosable ColBERT for clinical and biomedical texts, which aligns its token-level semantics at multiple levels with ClinicalMap25, a clinical latent space inspired by BioLORD-2023 and enriched with synthetic and annotated supervision. The post-training recipe b...
📄 Research | May 27, 2026 | http://arxiv.org/abs/2605.28521v1
On Compositional Learning Behaviours in Formal Mathematics
Self-evolving scientific agents capable of conquering the hard tail of formal mathematics require Compositional Learning Behaviours (CLBs), the capacity to ground and recombine novel symbolic structures in context, beyond mere recombination of prelearned atoms. We propose S2B-LM, an adapt...
📄 Research | May 27, 2026 | http://arxiv.org/abs/2605.28512v1
AREA: Attribute Extraction and Aggregation for CLIP-Based Class-Incremental Learning
Class-Incremental Learning (CIL) is important in building real-world learning systems. In CLIP-based CIL, the model performs classification by comparing similarity between visual and textual embeddings obtained from template prompts, e.g., "a photo of a [CLASS]". This seemingly monolithic matching...
📄 Research | May 27, 2026 | http://arxiv.org/abs/2605.28809v1
Self-Prophetic Decoding to Unlock Visual Search in LVLMs
Large Vision-Language Models (LVLMs) are rapidly evolving toward true multimodal reasoning, with visual search representing a concrete instantiation of the thinking-with-images paradigm. However, LVLM visual search faces two key challenges: incompatibility among intrinsic capabilities after post-tra...
📄 Research | May 27, 2026 | http://arxiv.org/abs/2605.28741v1
Harbor
CLI + companion App to spin up complete local LLM stacks
🧰 Tools | May 27, 2026 | https://www.producthunt.com/products/habor?utm_campaign=producthunt-api&utm_medium=api-v2&utm_source=Application%3A+the500feed+%28ID%3A+283491%29
OSP-Next: Efficient High-Quality Video Generation with Sparse Sequence Parallelism, HiF8 Quantization, and Reinforcement Learning
Diffusion Transformers achieve strong video generation quality, but the quadratic cost of full attention limits efficiency. We introduce OSP-Next, an efficient text-to-video generation model that integrates sparse attention, parallelism, quantization, and reinforcement learning. OSP-Next uses a hybr...
📄 Research | May 27, 2026 | http://arxiv.org/abs/2605.28691v1
EntroAD: Structural Entropy-Guided Prompt Adaptation for Zero-Shot Anomaly Detection
Zero-Shot Anomaly Detection (ZSAD) aims to detect anomalies in unseen domains without target-domain adaptation. Recent CLIP-based methods have shown promising performance by leveraging prompt learning and visual-text alignment. However, most existing approaches rely on a single adaptation pathway, w...
📄 Research | May 27, 2026 | http://arxiv.org/abs/2605.28630v1
A Multiscale Kinetic Framework for Image Segmentation: From Particle Systems to Continuum Models
In this work, we present a multiscale kinetic framework for consensus-based image segmentation. By interpreting an image as a system of interacting particles, each pixel is characterised by its spatial position and an internal feature encoding color information. We introduce a coupled interaction sc...
📄 Research | May 27, 2026 | http://arxiv.org/abs/2605.28619v1
Deformable Gaussian Occupancy: Decoupling Rigid and Nonrigid Motion with Factorized Distillation
Understanding dynamic 3D environments is essential for safe autonomous driving, particularly when reasoning about human-centric, nonrigid agents. However, existing weakly supervised occupancy prediction frameworks predominantly assume rigid-body motion and rely on simple frame-to-frame offsets, limi...
📄 Research | May 27, 2026 | http://arxiv.org/abs/2605.28587v1
GEM: Generative Supervision Helps Embodied Intelligence
Embodied Vision-Language Models (VLMs) have demonstrated impressive performance and generalization in robotics, particularly within Vision-Language-Action frameworks. However, a significant gap remains between the high-level semantic focus of standard text-guided pre-training paradigms and the low-l...
📄 Research | May 27, 2026 | http://arxiv.org/abs/2605.28548v1
DriveWAM: Video Generative Priors Enable Scalable World-Action Modeling for Autonomous Driving
Pretrained foundation models have become an important basis for end-to-end autonomous driving. In contrast to vision-language models pretrained primarily on static image-text pairs, video generative models capture temporal dynamics and motion priors that are naturally suited for driving. We present ...
📄 Research | May 27, 2026 | http://arxiv.org/abs/2605.28544v1
SSR3D-LLM: Structured Spatial Reasoning via Latent Steps for Fine-Grained Grounding in Unified 3D-LLMs
3D object grounding localizes referred objects in a 3D scene from natural language. Unified instance-centric 3D-LLMs aim to solve grounding together with dialog, QA, and captioning, yet many rely on a single pointer-style grounding decision that compresses a relational instruction into one selection...
📄 Research | May 27, 2026 | http://arxiv.org/abs/2605.28490v1
REVEAL: Reference-Grounded Reasoning for Multimodal Manipulation Detection
Multimodal manipulation detection aims to simultaneously identify forged image-text pairs and localize tampered regions, yet existing methods typically rely on memorizing isolated artifacts and struggle with imperceptible manipulation traces or domain shifts. Inspired by human comparative reasoning...
📄 Research | May 27, 2026 | http://arxiv.org/abs/2605.28459v1
Diffusion Large Language Models for Visual Speech Recognition
Existing Visual Speech Recognition (VSR) systems commonly rely on left-to-right autoregressive decoding, which can force premature decisions on visually ambiguous tokens before sufficient context is available. We propose DLLM-VSR, to the best of our knowledge, the first Diffusion Large Language Mode...
📄 Research | May 27, 2026 | http://arxiv.org/abs/2605.28456v1