AI News Archive: May 26, 2026 — Part 18

Sourced from 500+ daily AI sources, scored by relevance.

FalAR: A Large-scale Speaker-Annotated European Portuguese Speech Corpus of Parliamentary Sessions
State-of-the-art performance for Automatic Speech Recognition (ASR) largely depends on the availability of large-scale labeled corpora. This creates a demand for increased data collection efforts, particularly for under-represented languages and dialectal varieties. Due to having considerably fewer ...
📄 Research | May 26, 2026 | http://arxiv.org/abs/2605.27062v1
ExTax: Explainable Disinformation Detection via Persuasion, Emotion, and Narrative Role Taxonomies
The democratization of LLMs has accelerated the generation and circulation of highly fluent disinformation, making traditional syntax-semantic verification increasingly insufficient. Such deception rarely relies solely on surface-level falsity; instead, it often combines persuasive rhetoric, emotion...
📄 Research | May 26, 2026 | http://arxiv.org/abs/2605.27045v1
AI Ad Workflows
Generate 100s of ads a day with AI workflows
🧰 Tools | May 26, 2026 | https://www.producthunt.com/products/makeugc-2?utm_campaign=producthunt-api&utm_medium=api-v2&utm_source=Application%3A+the500feed+%28ID%3A+283491%29
Attribute-Based Diagnosis of LLM Alignment with Hate Speech Annotations
Hate speech annotation is costly, subjective, and prone to annotator disagreement, making large-scale dataset construction challenging. We systematically analyze how well large language models (LLMs) align with human judgments across ten theoretically grounded subjective attributes, such as dehumani...
📄 Research | May 26, 2026 | http://arxiv.org/abs/2605.27025v1
Evaluating the Relevance of Uncertainty Estimators for LLM Hallucination
Large language models (LLMs) are prone to hallucinations, i.e., statements unsupported by the input or training data, hindering reliable deployment. In parallel, numerous uncertainty estimation (UE) methods have been proposed to quantify model confidence and are often implicitly treated as proxies f...
📄 Research | May 26, 2026 | http://arxiv.org/abs/2605.27016v1
PersLitEval: Fine-grained Benchmark and Evaluation of LLMs on Persian Literature Questions
Despite impressive multilingual capabilities, large language models (LLMs) remain poorly evaluated on literary knowledge in non-English languages. We introduce PersLitEval, a benchmark of 4,514 Persian literature multiple-choice questions across eight fine-grained categories spanning spelling, liter...
📄 Research | May 26, 2026 | http://arxiv.org/abs/2605.27015v1
Semantic Gradients Interactions in SSD: A Case Study in Racial Identity and Hate Speech
We introduce interaction SSD, an extension of Supervised Semantic Differential that models how semantic meaning varies across moderators such as groups, traits, or conditions, making this variation testable and interpretable. The method estimates a main semantic gradient, an interaction gradient, and...
📄 Research | May 26, 2026 | http://arxiv.org/abs/2605.27322v1
When Does Demographic Information Help? Data and Modeling Regimes for Perspective-Aware Hate Speech Detection
Demographic information is often used to model annotator perspectives in subjective tasks such as hate speech detection, but its benefit is inconsistent: it improves performance in some settings and behaves as noise in others. This paper asks when demographic features help. We analyze demographic ga...
📄 Research | May 26, 2026 | http://arxiv.org/abs/2605.27313v1
Chartographer: Counterfactual Chart Generation for Evaluating Vision-Language Models
Chart question-answering (QA) benchmarks aim to pose questions that require visual reasoning to correctly answer, but models can often reach solutions through shortcuts or prior familiarity with a chart based on their own background knowledge. To strictly evaluate visual reasoning, we propose counte...
📄 Research | May 26, 2026 | http://arxiv.org/abs/2605.27311v1
Self-Ensembling Vision-Language Models for Chart Data Extraction
Charts effectively convey quantitative information, but the underlying data are often locked in image form, hindering reuse and analysis. Manually digitizing charts is time-consuming and error-prone, motivating automatic chart-to-table extraction. Recent approaches use specialized vision-language mo...
📄 Research | May 26, 2026 | http://arxiv.org/abs/2605.27298v1
ENPMR-Bench: Benchmarking Proactive Memory Retrieval for Emotional Support Agents
Memory-augmented language agents are increasingly deployed in affective applications such as emotional support, where understanding and responding to users' latent emotional needs is critical. However, existing research often treats memory as a tool for factual retrieval, overlooking its role in sha...
📄 Research | May 26, 2026 | http://arxiv.org/abs/2605.27240v1
EpiCurveBench: Evaluating VLMs on Epidemic Curve Digitization
Chart-to-data extraction with vision-language models (VLMs) is increasingly evaluated on benchmarks that show diminishing headroom (frontier VLMs exceed 89% on ChartQA) and with metrics that treat extracted points as unordered key-value pairs, ignoring the temporal structure of time series and penal...
📄 Research | May 26, 2026 | http://arxiv.org/abs/2605.27195v1
Beyond Binary: Speech Representations Across the Cognitive Score Hierarchy
This study examines the relationship between speech representations and the hierarchical structure of cognitive assessment in mild cognitive impairment. Utilizing 5,754 German neuropsychological assessment recordings, we evaluate six cognitive tasks across three score levels: task, domain, and globa...
📄 Research | May 26, 2026 | http://arxiv.org/abs/2605.27189v1
MAIGO: Mitigating Lost-in-Conversation with History-Cleaned On-Policy Self-Distillation
Large language models often solve tasks from a fully specified prompt but degrade when the same requirements unfold over multiple turns, known as the lost-in-conversation (LiC) gap. We trace part of this degradation to self-contamination: intermediate assistant replies enter later context and carry ...
📄 Research | May 26, 2026 | http://arxiv.org/abs/2605.27186v1
BAIT: Boundary-Guided Disclosure Escalation via Self-Conditioned Reasoning
In this work, we propose BAIT (Boundary-Aware Iterative Trap), a three-step jailbreak framework that approaches malicious goals through internal disclosure. BAIT first asks the model to identify the protection boundary, then requires it to refine that boundary, and finally requests a detailed exampl...
📄 Research | May 26, 2026 | http://arxiv.org/abs/2605.27110v1
Pop-Up Distractions Reveal Bag-of-Events Behavior in Video Large Language Models
A key capability for video understanding is reliably linking subjects to events across time, yet whether Video Large Language Models (VideoLLMs) actually achieve this remains unclear. In this work, we introduce DistractionBench to evaluate whether VideoLLMs can robustly link subjects and events in t...
📄 Research | May 26, 2026 | http://arxiv.org/abs/2605.27101v1
E3: Issue-Level Backtesting for Automated Research Critique
We present E3, an automated review assistant that augments reviewers and engineering teams by identifying decision-relevant technical concerns in research papers. For each concern, E3 reports its nature, its location, its bearing on the contribution, and the analysis or evidence that would resolve i...
📄 Research | May 26, 2026 | http://arxiv.org/abs/2605.27072v1
BhashaSetu: A Data-Centric Approach to Low-Resource Machine Translation
We present BhashaSetu, a linguistically enriched English–Marathi parallel dataset addressing persistent data limitations in low-resource neural machine translation (NMT). Marathi, spoken by over 95 million people, remains underrepresented in high-quality parallel corpora across diverse domains. Our...
📄 Research | May 26, 2026 | http://arxiv.org/abs/2605.27050v1
SoftCap: Soft-Budget Control for Diffusion Transformer Acceleration
Diffusion Transformers (DiTs) achieve strong visual quality, but their iterative denoising process requires many costly Transformer evaluations. Training-free acceleration methods reduce this cost by caching, forecasting, or verifying intermediate features, yet the runtime decision of when to execut...
📄 Research | May 26, 2026 | http://arxiv.org/abs/2605.27075v1
Share More, Search Less: Collaborative Parallel Thinking for Efficient Test-Time Scaling
Test-Time Scaling (TTS) enhances the reasoning capabilities of large language models by allocating additional inference compute to explore the solution space. However, existing parallel TTS methods typically keep branches isolated during search: intermediate discoveries remain branch-private and can...
📄 Research | May 26, 2026 | http://arxiv.org/abs/2605.27030v1
Cast a Wider Net: Coordinated Pass@K Policy Optimization for Code Reasoning
Repeated sampling with a verifier is the standard way to allocate test-time compute for code generation, with pass@K as the canonical metric. Yet the standard policy class draws K independent samples from a single answer distribution, so attempts often collapse onto near-duplicate reasoning path...
📄 Research | May 26, 2026 | http://arxiv.org/abs/2605.27000v1
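For readers unfamiliar with the metric named in the entry above, here is a minimal sketch of how pass@K is commonly estimated from repeated sampling. It is a generic illustration, not the paper's method: the function name `pass_at_k` and its parameters are hypothetical, and it assumes `n` attempts were sampled per problem, of which `c` passed the verifier.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k from n sampled attempts, c of which passed the verifier.

    Hypothetical helper for illustration only. It returns the probability
    that a uniformly random size-k subset of the n attempts contains at
    least one passing attempt.
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # comb(n - c, k) counts the size-k subsets made only of failing attempts;
    # it is 0 whenever fewer than k attempts failed.
    return 1.0 - comb(n - c, k) / comb(n, k)
```

For example, with 4 attempts of which 2 pass, `pass_at_k(4, 2, 2)` is 1 - 1/6, about 0.83. Near-duplicate attempts add little to this probability, which is the concern the abstract raises about independent sampling.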
Prompt Injection Detection is Regime-Dependent: A Deployment-Aware Evaluation with Interpretable Structural Signals
Prompt injection poses a critical threat to the safe deployment of large language models, yet existing detection approaches are typically evaluated under limited settings that do not reflect real-world operating constraints. In this work, we present a deployment-aware evaluation of prompt injection ...
📄 Research | May 26, 2026 | http://arxiv.org/abs/2605.26999v1
PashtoTTS-Bench: automated screening for low-resource non-Latin-script text-to-speech
Text-to-speech (TTS) evaluation for low-resource non-Latin-script languages can fail when it relies on a single ASR round-trip word error rate (WER). A system may produce no audio, speak a neighbouring language, preserve target script text only in an ASR transcript, or sound unnatural to native list...
📄 Research | May 26, 2026 | http://arxiv.org/abs/2605.26978v1
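As background for the entry above, here is a minimal sketch of the word error rate (WER) quantity it mentions: word-level edit distance between a reference text and an ASR transcript, divided by the reference length. The function name `word_error_rate` is hypothetical, whitespace tokenization is an assumption, and the benchmark's own scoring and text normalization are not shown in this excerpt.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference word count.

    Hypothetical helper for illustration; tokenizes on whitespace only.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)
```

An empty transcript scores 1.0 under this definition, and a transcript with many inserted words can exceed 1.0. The single number therefore does not say which kind of failure produced it.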
Recon: Reconstruction-Guided Reasoning Synthesis for User Modeling
User modeling aims to use language models (LMs) to mimic an individual's behavior from a corpus of past context-action pairs (e.g., conversation turns), enabling the simulation of users in settings like behavioral science, human-AI collaboration, and market research. Recent approaches augment these ...
📄 Research | May 26, 2026 | http://arxiv.org/abs/2605.26969v1
SpatialBench: Is Your Spatial Foundation Model an All-Round Player?
While spatial foundation models have demonstrated impressive performance on standard datasets, a critical question remains: are they truly all-round players capable of generalizing robustly across diverse downstream tasks, arbitrary viewpoints, shifting scene domains, varying input densities, and sp...
📄 Research | May 26, 2026 | http://arxiv.org/abs/2605.27367v1
How and What to Imagine? Visual Thinking in Unified Multimodal Models for Cross-View Spatial Reasoning
Cross-view spatial reasoning remains a weak spot for vision-language models (VLMs): they often reason in language and lose the fine-grained geometry needed for the task. Thinking with images aims to address this by generating an intermediate thinking image, but recent work shows that models often ig...
📄 Research | May 26, 2026 | http://arxiv.org/abs/2605.27310v1
Gemini Embedding 2: A Native Multimodal Embedding Model from Gemini
We introduce Gemini Embedding 2, a native multimodal embedding model that allows embedding video, audio, image, and text modalities in a unified representation space. We leverage the multimodal capabilities of Gemini to produce embeddings for arbitrary combinations of interleaved inputs across all t...
📄 Research | May 26, 2026 | http://arxiv.org/abs/2605.27295v1
A Dynamic Programming Framework for Discovering Count and Values of Multilevel Image Thresholding
Multilevel image thresholding is an important preprocessing algorithm in computer vision applications. Since most common thresholding methods take the desired count of thresholds as user input, thresholding methods that automatically determine a suitable count of thresholds from the...
📄 Research | May 26, 2026 | http://arxiv.org/abs/2605.27287v1
Can Retrieval Heads See Images? Multimodal Retrieval Heads in Long-Context Vision-Language Models
Large vision-language models increasingly rely on long-context modeling to reason over documents, hour-level videos, and long-horizon agent trajectories, requiring them to locate relevant evidence across interleaved text and images. Prior work has studied this behavior using retrieval heads in large...
📄 Research | May 26, 2026 | http://arxiv.org/abs/2605.27243v1
Touch-R1: Reinforcing Touch Reasoning in MLLMs
While rule-based reinforcement learning has recently catalyzed explicit reasoning in multimodal models, tactile reasoning remains largely underexplored. Existing tactile-language models primarily rely on supervised or contrastive objectives, which limits their capacity to ground predictions in physi...
📄 Research | May 26, 2026 | http://arxiv.org/abs/2605.27154v1
Chaos-SSL: An Attention-Based Self-Supervised Learning Framework with Chaotic Transformation for Medical Image Classification
Self-Supervised Learning (SSL) has emerged as a powerful paradigm to mitigate the reliance on large, annotated datasets, a common bottleneck in medical image analysis. However, standard SSL methods, which rely on simple geometric and color augmentations, may fail to capture the fine-grained, complex...
📄 Research | May 26, 2026 | http://arxiv.org/abs/2605.27146v1
Is an Image Also Worth 16x16=256 Superpixels? A Framework for Attentional Image Classification
Superpixel-based image classification has traditionally leveraged graph neural networks (GNNs) for processing irregular image representations. Recent advances in computer vision, driven by Vision Transformers (ViTs), have introduced new paradigms in self-attentional models, surpassing convolutional ...
📄 Research | May 26, 2026 | http://arxiv.org/abs/2605.27144v1
Leveraging Visual Signals for Robust Token-Level Uncertainty in Vision-Language Generation
Uncertainty quantification (UQ) remains a critical challenge in Large Vision Language Models (LVLMs) for reliable predictions and real-world deployment. However, most existing methods are adapted from the LLM literature and primarily focus on the language modality, leaving the contribution of visual...
📄 Research | May 26, 2026 | http://arxiv.org/abs/2605.27136v1
Do Modern Post-Hoc Watermarking Methods Beat Broken-Arrows?
With the rapid proliferation of generative models, such as diffusion models, digital watermarking has emerged as a crucial solution for identifying AI-generated images. Modern post-hoc watermarking schemes use neural networks to achieve an extremely low false-alarm rate while remaining robust to com...
📄 Research | May 26, 2026 | http://arxiv.org/abs/2605.27135v1
Gateplex
Real-time governance firewall for autonomous AI agents
🧰 Tools | May 26, 2026 | https://www.producthunt.com/products/gateplex?utm_campaign=producthunt-api&utm_medium=api-v2&utm_source=Application%3A+the500feed+%28ID%3A+283491%29
YOLO26-RipeLoc Lite: A lightweight architecture for tomato ripeness detection and picking point localization in greenhouse robotic harvesting
In greenhouse tomato production, automated harvesting requires accurate detection of ripe tomatoes, ripeness classification, and precise picking-point localization for robotic end-effectors. This paper proposes YOLO26-RipeLoc Lite, a lightweight deep learning architecture based on YOLO26 for simulta...
📄 Research | May 26, 2026 | http://arxiv.org/abs/2605.27129v1
COVD: Continual Open-Vocabulary Object Detection with Novel Concept Injection
Open-vocabulary object detection (OVD) has made significant progress, enabling detectors to generalize from seen to unseen categories. However, real-world category spaces continually evolve, and existing OVD models still struggle with newly emerging concepts, while repeated full retraining is prohib...
📄 Research | May 26, 2026 | http://arxiv.org/abs/2605.27116v1
Semi-Supervised Gaze Estimation via Disentangled Subspace Contrastive Learning
Appearance-based gaze estimation always suffers from poor generalization due to limited annotated samples and insufficient dataset diversity. Leading approaches adopt weakly supervised learning to generate large-scale pseudo-labeled data from unconstrained real-world scenarios, aiming to mitigate th...
📄 Research | May 26, 2026 | http://arxiv.org/abs/2605.27080v1
IPIBench: Evaluating Interactive Proactive Intelligence of MLLMs under Continuous Streams
Recent multimodal large language models (MLLMs) achieve strong performance on reactive question answering, but real-world streaming assistants require proactive reasoning over continuous visual inputs. Existing benchmarks mainly study reactive or proactive interactions in isolated single-turn settin...
📄 Research | May 26, 2026 | http://arxiv.org/abs/2605.27074v1
Black-box Membership Inference Attacks on the Pre-training Data of Image-generation Models
The rapid advancement of diffusion-based image generation models has raised serious concerns regarding potential copyright and privacy infringements involving human-created data. Membership inference attacks (MIAs) have emerged as a promising tool for identifying unauthorized data usage during model...
📄 Research | May 26, 2026 | http://arxiv.org/abs/2605.27020v1
Revealing the core dimensions underlying representations in brains, behavior and AI
The study of representations is widespread across fields, including neuroscience, psychology, and artificial intelligence. While representations are often studied and compared through similarities between stimuli, current methods provide only limited access to the dimensions that shape these represe...
📄 Research | May 26, 2026 | http://arxiv.org/abs/2605.26921v1
Feedforward 3D Editing Learns from Semantic-Part Transformation
3D editing is a fundamental capability for scalable 3D content creation. While image editing has rapidly evolved toward large-scale feedforward generative paradigms, 3D AI generation remains dominated by training-free editing pipelines. A central challenge of feedforward 3D editing lies in the lack ...
📄 Research | May 26, 2026 | http://arxiv.org/abs/2605.27351v1
Towards Controllable Image Generation through Representation-Conditioned Diffusion Models
Diffusion models have emerged as powerful tools for high-quality image generation and editing, but guiding these models to produce specific outputs remains a challenge. Conventional approaches rely on conditioning mechanisms, such as text prompts or semantic maps, which require extensively annotated...
📄 Research | May 26, 2026 | http://arxiv.org/abs/2605.27343v1
PARE: Pruning and Adaptive Routing for Efficient Video Generation
Video Diffusion Transformers (DiTs) generate high-quality videos but demand substantial compute due to wide blocks, deep architectures, and iterative sampling. Recent methods reduce cost by compressing width, depth, or sampling steps, but typically commit to a fixed architecture that cannot adapt to...
📄 Research | May 26, 2026 | http://arxiv.org/abs/2605.27336v1
Q-GeoMem: Question-Guided Geometric Memory for Video Spatial Reasoning
Video spatial reasoning requires accumulating viewpoint-dependent evidence over time while retaining information useful to the question being asked. Existing spatial video-language models improve geometric perception and long-range context modeling, but often treat memory as a generic temporal cache...
📄 Research | May 26, 2026 | http://arxiv.org/abs/2605.27318v1
PlayClass: Automated Play Behaviour Classification in Poultry
Automated monitoring of animal welfare has largely targeted negative indicators, leaving positive welfare behaviours such as play underexplored. To address this gap, we present PlayClass, a pipeline for play-behaviour classification in poultry from top-down pen video. The pipeline leverages long-dur...
📄 Research | May 26, 2026 | http://arxiv.org/abs/2605.27304v1
MRT: Masked Region Transformer for Layered Image Generation and Editing at Scale
Layered image generation and editing is a fundamental capability that enables layer-wise reuse, editing, and composition of generated visual content, analogous to word-level editing in natural language. Despite its importance, this remains an underexplored area at scale. To address this gap, we pres...
📄 Research | May 26, 2026 | http://arxiv.org/abs/2605.27235v1
Unsupervised Deep Image Prior for Sparse-View and Limited-Angle Electron Tomography
Electron tomography (ET) plays an important role in the three-dimensional (3D) characterization of nanomaterials. However, under limited-angle and sparse-view conditions, conventional algorithms produce degraded reconstructions, which compromise the quality and interpretability of resulting 3D data....
📄 Research | May 26, 2026 | http://arxiv.org/abs/2605.27139v1
PILOT: A Data-Free Continual Learning Approach for Real-Time Semantic Segmentation via Boundary Guidance
Real-time semantic segmentation models offer an excellent balance between accuracy and inference speed. However, deploying these models in dynamic real-world environments often requires the ability to learn novel classes incrementally without retraining on the entire dataset. This capability is know...
📄 Research | May 26, 2026 | http://arxiv.org/abs/2605.27128v1
JLT: Clean-Latent Prediction in Latent Diffusion Transformers
Flow matching with clean-data prediction has shown that regressing the clean point can exploit low-dimensional structure more effectively than predicting an ambient noised quantity. We ask whether this principle remains useful after images are mapped into a learned latent space, where compression ha...
📄 Research | May 26, 2026 | http://arxiv.org/abs/2605.27102v1