AI News Archive: May 27, 2026 — Part 19

Sourced from 500+ daily AI sources, scored by relevance.

BiasEdit: A Training-Free Bias-Detect-and-Edit Framework for Learning Fair Visual Classifiers
Visual data from the Web power image classifiers, which often underpin many web services, such as recommendation and content moderation. However, the raw Web data often contain spurious correlations and social biases, and neural networks are known for their tendency to learn biases present in data. ...
📄 Research | May 27, 2026 | http://arxiv.org/abs/2605.28450v1
Bayesian Gated Non-Negative Contrastive Learning
While Contrastive Learning (CL) has revolutionized self-supervised representation learning, its latent representations remain highly entangled and opaque, limiting their interpretability in safety-critical applications. We identify that a fundamental cause of this entanglement is the reliance on det...
📄 Research | May 27, 2026 | http://arxiv.org/abs/2605.28441v1
Adaptive Temporal Gating of Longitudinal Magnetic Resonance Imaging for Alzheimer's Prediction
Predicting conversion from Mild Cognitive Impairment (MCI) to Alzheimer's Disease (AD) is critical for early intervention. Current deep learning paradigms predominantly rely on cross-sectional structural MRI, neglecting prognostic value in patient-specific anatomical trajectories. We introduce the T...
📄 Research | May 27, 2026 | http://arxiv.org/abs/2605.28397v1
Transfer learning RGB models to hyperspectral images with trainable tensor decompositions
Transfer learning makes it possible to use large vision networks on a variety of domains, by specializing their models' general filters to new tasks. However, these networks assume the input images to have 3 input channels, making them incompatible with multi- or hyperspectral images. Current approa...
📄 Research | May 27, 2026 | http://arxiv.org/abs/2605.28331v1
LV-OSD: Language-Vision-Complementary Open-Set Object Detection
Object detection is an important task in computer vision, which aims to detect the objects of interest through the given category list or query images. In this work, we propose a new problem of language-visual-complementary open-set object detection (LV-OSD), i.e., using the flexible text-based and...
📄 Research | May 27, 2026 | http://arxiv.org/abs/2605.28271v1
From Pixels to Words -- Towards Native One-Vision Models at Scale
Current vision-language models (VLMs) typically stitch together separate image encoders and language decoders via multi-stage alignment, a modular framework that inevitably fragments pixel-level signals across frames and scatters early pixel-word interactions. In parallel, native VLMs, despite impre...
📄 Research | May 27, 2026 | http://arxiv.org/abs/2605.28820v1
AgenticCalling AI
Give your AI the power to make phone calls
🧰 Tools | May 27, 2026 | https://www.producthunt.com/products/agenticcalling-ai?utm_campaign=producthunt-api&utm_medium=api-v2&utm_source=Application%3A+the500feed+%28ID%3A+283491%29
Gamma-World: Generative Multi-Agent World Modeling Beyond Two Players
World models for interactive video generation have largely focused on single-agent settings, where future observations are generated from a single control signal. However, many generated environments require multi-agent interaction: multiple players, robots, or embodied agents act simultaneously wit...
📄 Research | May 27, 2026 | http://arxiv.org/abs/2605.28816v1
HarmoVid: Relightful Video Portrait Harmonization
We present a method for harmonizing the lighting of a foreground video to match a target background scene, adjusting shadows, color tone, and illumination intensity (relightful harmonization). Unlike images, acquiring labeled data for videos, where identical motions are recorded under different ligh...
📄 Research | May 27, 2026 | http://arxiv.org/abs/2605.28811v1
Ω-QVLA: Robust Quantization for Vision-Language-Action Models via Composite Rotation and Per-step Scaling
Vision-Language-Action (VLA) models unify perception, reasoning, and control within a single policy, yet their multi-billion-parameter backbones and diffusion-based action heads make on-device deployment prohibitively expensive. Prior quantization efforts offer only partial solutions, compressing th...
📄 Research | May 27, 2026 | http://arxiv.org/abs/2605.28803v1
Bias Leaves a Gradient Trail: Label-Free Bias Identification via Gradient Probes on Concept Decompositions
Vision classifiers can exploit spurious correlations, achieving high in-distribution accuracy yet failing under distribution shift. Existing approaches to bias mitigation and analysis often depend on curated datasets, spurious-attribute or group labels, or retraining, which may be infeasible once a ...
📄 Research | May 27, 2026 | http://arxiv.org/abs/2605.28780v1
SeeGroup: Multi-Layer Depth Estimation of Transparent Surfaces via Self-Determined Grouping
Transparent objects are common in daily life, and it is important to understand their multilayer depth, including the transparent surface and the objects behind it. Existing methods for multilayer depth typically extend single-layer prediction. They define layers by the front-to-back ordering of 3D ...
📄 Research | May 27, 2026 | http://arxiv.org/abs/2605.28735v1
Compositional Text-to-Image Generation Via Region-aware Bimodal Direct Preference Optimization
Despite the rapid progress of text-to-image (T2I) models, generating images that accurately reflect complex compositional prompts (covering attribute bindings, object relationships, and counting) remains challenging. To address this, we propose BiDPO, a framework to enhance T2I models' capability ...
📄 Research | May 27, 2026 | http://arxiv.org/abs/2605.28615v1
JECA^2: Judgment-Explanation Consistent Adversarial Attack against Forensic Vision-Language Models
Forensic vision-language models (VLMs) have recently been developed to detect image tampering and provide natural-language explanations. However, their robustness against adversarial manipulation remains underexplored. Existing adversarial attacks typically aim to flip the model's binary judgment, w...
📄 Research | May 27, 2026 | http://arxiv.org/abs/2605.28609v1
Internally Referenced Low-Light Enhancement
Self-supervised low-light image enhancement (LLIE) is highly appealing as it eliminates the reliance on external paired data. However, the lack of external references causes networks to struggle with decoupling entangled illumination, delicate textures, and amplified noise. To resolve this challenge...
📄 Research | May 27, 2026 | http://arxiv.org/abs/2605.28605v1
Mining Multi-Modality Spatio-Temporal Cues for Video Important Person Identification
Identifying key individuals in video scenes is essential for applications such as automated video editing and intelligent surveillance. Current methods primarily focus on static images and immediate visual cues, overlooking the rich spatio-temporal information in videos. This leads to the phenomenon...
📄 Research | May 27, 2026 | http://arxiv.org/abs/2605.28604v1
Resolution-free neural surrogates for geometric parameterization and mapping with spatially varying fields
Many imaging problems require computing spatial transformations induced by spatially varying intensity, feature, or density fields. Canonical examples include distortion correction, deformable image registration, atlas-based segmentation, and deformation-driven image analysis. These tasks can be for...
📄 Research | May 27, 2026 | http://arxiv.org/abs/2605.28551v1
Janus-LoRA: A Balanced Low-Rank Adaptation for Continual Learning
Low-Rank Adaptation (LoRA) has emerged as a promising paradigm for Continual Learning. It independently updates its low-rank factors ($A$ and $B$), creating a composite update to the full weight matrix through their interaction. To prevent catastrophic forgetting, this update should remain orthogona...
📄 Research | May 27, 2026 | http://arxiv.org/abs/2605.28495v1
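For readers unfamiliar with the low-rank factors mentioned in the Janus-LoRA abstract, here is a minimal sketch of the standard LoRA update, in which a frozen weight matrix W is adapted by adding the product of two small factors B and A. It is not the Janus-LoRA method itself: the matrix sizes, values, and helper names (`matmul`, `add`) are illustrative assumptions, and the orthogonality constraint the abstract alludes to is not modeled.

```python
# Standard LoRA update W' = W + B @ A, in pure Python.
# Sizes, values, and helper names are illustrative assumptions,
# not taken from the Janus-LoRA paper.

def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]

def add(X, Y):
    """Element-wise sum of two equally shaped matrices."""
    return [[a + b for a, b in zip(rx, ry)] for rx, ry in zip(X, Y)]

d_out, d_in, rank = 4, 3, 1

# Frozen pretrained weights (zeros here so the update is easy to read).
W = [[0.0] * d_in for _ in range(d_out)]

# Trainable low-rank factors: B is d_out x rank, A is rank x d_in.
B = [[1.0], [2.0], [3.0], [4.0]]
A = [[0.5, -1.0, 2.0]]

# The composite update has the full d_out x d_in shape but rank <= 1.
delta_W = matmul(B, A)
W_adapted = add(W, delta_W)

# Parameter counts: the factors are cheaper than a full update.
trainable = d_out * rank + rank * d_in
full = d_out * d_in
print(delta_W)
print(trainable, "trainable parameters vs", full, "for a full update")
```

Because the update to W is only expressed through the product of B and A, changing either factor alone changes the whole composite update, which is the interaction the abstract refers to.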
DiscoForcing: A Unified Framework for Real-Time Audio-Driven Character Control with Diffusion Forcing
We study real-time audio-responsive character control as a deployment-faithful problem: strictly causal, bounded-latency streaming that must generate coherent full-body motion at interactive frame rates while the audio condition can change abruptly, including tempo shifts, drops, or user edits. Prio...
📄 Research | May 27, 2026 | http://arxiv.org/abs/2605.28491v1
SA4Depth: Consistent Pose-Depth Scale Alignment for Self-Supervised Monocular Depth Estimation
Self-supervised depth estimation from monocular sequences relies on the joint learning of a depth and a pose network. Despite abundant research on improving the depth network, efforts on the pose network remain limited. In this context, even when depth is estimated up to scale, we highlight the importanc...
📄 Research | May 27, 2026 | http://arxiv.org/abs/2605.28477v1
Self-Supervised Online Robot-Agnostic Traversability Estimation for Open-World Environments
Self-supervised online traversability estimation enables robots to continuously learn from unlabeled open-world experiences and adapt their navigation behavior toward safe and efficient trajectories. Existing approaches either rely on handcrafted proprioceptive traversability scores, limiting robot-...
📄 Research | May 27, 2026 | http://arxiv.org/abs/2605.28442v1
Anomaly as Non-Conformity via Training-Free Graph Laplacian Energy Minimization
Detecting subtle visual anomalies in images remains challenging, particularly when only normal samples are available a priori. Such unsupervised anomaly detection is typically solved by measuring feature similarity of a query patch to a memory of normal patches. However, similarity alone does not re...
📄 Research | May 27, 2026 | http://arxiv.org/abs/2605.28428v1
Cotypist
Local AI Autocomplete in your voice, anywhere on your Mac
🧰 Tools | May 27, 2026 | https://www.producthunt.com/products/cotypist?utm_campaign=producthunt-api&utm_medium=api-v2&utm_source=Application%3A+the500feed+%28ID%3A+283491%29
VITAL: Visual-Semantic Dual Supervision for Enhanced and Interpretable Latent Reasoning in Medical MLLMs
Latent reasoning enables reasoning over continuous hidden states rather than explicit tokens, avoiding the language bottleneck and inference overhead of chain-of-thought for medical VQA. However, existing methods suffer from modality collapse, insufficient visual supervision, and train-inference mis...
📄 Research | May 27, 2026 | http://arxiv.org/abs/2605.28422v1
EgoRelight: Egocentric Human Capture and Illumination Recovery for Relightable and Photoreal Avatar Rendering
Mixed Reality (MR) headsets promise a future of immersive telepresence where virtual humans blend indistinguishably into real or virtual surroundings. Achieving this vision requires a method for capturing a user's motion, estimating appearance under novel lighting, and understanding the environment ...
📄 Research | May 27, 2026 | http://arxiv.org/abs/2605.28401v1
Sketch2Motion: Text-driven 2D Sketch to 3D Animation via Diffusion-guided Skeleton Optimization
Animation of 2D hand-drawn sketches provides an effective medium for visual communication. However, these sketches pose challenges, particularly in handling occlusions and accurately mapping motion. While 3D animation naturally addresses these challenges, estimating 3D motion remains a very complex ...
📄 Research | May 27, 2026 | http://arxiv.org/abs/2605.28394v1
Toward Semantic-Agnostic and Shape-Aware Vision-Language Segmentation Models
Vision-language segmentation models have recently achieved strong performance by leveraging high-level semantic object categories expressed in natural language. However, this semantic dependence limits their ability to reason about intrinsic visual properties such as shape, geometry, or texture, whi...
📄 Research | May 27, 2026 | http://arxiv.org/abs/2605.28348v1
Inpainting-Style Conditional Diffusion for Multivariable Time Series Forecasting
In this paper, we propose a novel conditional diffusion-based framework for multivariable time-series solar power forecasting. The proposed method reformulates temporal PV data as structured two-dimensional representations (images) using a sliding-window patch construction, enabling the application ...
📄 Research | May 27, 2026 | http://arxiv.org/abs/2605.28324v1
EventShiftFlow: Towards Hardware-efficient FPGA-based Flow Estimation
Event-based vision sensors offer asynchronous, high-temporal-resolution measurements that are attractive for low-latency robotic perception, but many event-based motion estimation methods are computationally intensive and difficult to map to FPGA hardware. We present a streaming velocity estimator t...
📄 Research | May 27, 2026 | http://arxiv.org/abs/2605.28312v1
EchoAvatar: Real-time Generative Avatar Animation from Audio Streams
Real-time synthesis of high-fidelity 3D character motion from audio is a pivotal component for next-generation interactive avatars and virtual assistants. However, most existing approaches are limited to offline processing of complete audio sequences or are constrained to specific domains, rarely ha...
📄 Research | May 27, 2026 | http://arxiv.org/abs/2605.28272v1
Every9D-21M: Large-Scale Real-World 9D Canonicalization of Everyday Objects
Estimating the 9D pose of everyday objects from a single real-world image remains challenging. This is largely due to the lack of large-scale supervision. Most existing datasets either rely heavily on synthetic renderings or provide limited coverage of real-world objects: the largest real-world 9D p...
📄 Research | May 27, 2026 | http://arxiv.org/abs/2605.28270v1
MORI-Seg: Learning Morphological Geometry for Instance Segmentation without Instance Annotations
Instance-level quantification of kidney functional units is essential for morphometric analysis, yet most publicly available pathology datasets provide only semantic segmentation annotations, where adjacent structures of the same class are merged into single regions. This prevents reliable instance-...
📄 Research | May 27, 2026 | http://arxiv.org/abs/2605.28261v1
GUI Agents for Continual Game Generation
Generating a game is not the same as making one that can be played. Despite advances in code generation, existing approaches treat game generation as one-shot translation from prompt to artifact, leaving interaction-level failures undetected. We argue that evaluating and improving game generation re...
📄 Research | May 27, 2026 | http://arxiv.org/abs/2605.28258v1
Category-Level 3D Correspondence in Camera Space via Morphable Object Priors
Understanding 3D objects from images is fundamental to robotics and AR/VR applications. While recent work has made progress in category-level pose estimation, current representations fail to capture the fine-grained semantics needed for reasoning about object parts, functions, and interactions. In t...
📄 Research | May 27, 2026 | http://arxiv.org/abs/2605.28257v1
PointQ-Bench: Benchmarking Diagnostic and Interpretable Point Cloud Quality Assessment
Point cloud quality plays a critical role in 3D acquisition, reconstruction, rendering, and perception, yet existing point cloud quality assessment (PCQA) research remains largely centered on scalar score prediction. In practical inspection scenarios, quality assessment often involves identifying de...
📄 Research | May 27, 2026 | http://arxiv.org/abs/2605.28241v1
Learning to Label: A Reinforced Self-Evolving Framework for Semi-supervised Referring Expression Segmentation
Semi-supervised referring expression segmentation (SS-RES) aims to achieve precise pixel-level language grounding under limited annotation, yet suffers from limited supervision and unreliable pseudo-labels when exploiting unlabeled image-text pairs. In this work, we propose Learning to Label, a rein...
📄 Research | May 27, 2026 | http://arxiv.org/abs/2605.28239v1
LLM Zeroth-Order Fine-Tuning is an Inference Workload
Zeroth-order (ZO) fine-tuning is attractive for large language models because it replaces backpropagation with forward objective evaluations. Existing implementations nevertheless execute ZO algorithms inside conventional training loops, even though their dominant work is repeated scoring under near...
📄 Research | May 27, 2026 | http://arxiv.org/abs/2605.28760v1
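To illustrate what "replacing backpropagation with forward objective evaluations" means in the zeroth-order abstract above, here is a minimal sketch of a generic two-point random-direction gradient estimator driving plain SGD. It is not the paper's system: the toy quadratic `loss`, the function names, and the hyperparameters are all assumptions chosen so the example runs on its own.

```python
import random

def loss(theta):
    """Hypothetical toy objective: squared distance to a fixed target."""
    target = [1.0, -2.0, 0.5]
    return sum((t - g) ** 2 for t, g in zip(theta, target))

def zo_gradient(f, theta, rng, eps=1e-3):
    """Two-point zeroth-order estimate: two forward evaluations, no backprop."""
    z = [rng.gauss(0.0, 1.0) for _ in theta]          # random direction
    plus = [t + eps * zi for t, zi in zip(theta, z)]
    minus = [t - eps * zi for t, zi in zip(theta, z)]
    # Finite-difference slope of f along z, projected back onto z.
    scale = (f(plus) - f(minus)) / (2.0 * eps)
    return [scale * zi for zi in z]

def zo_sgd(f, theta, steps=2000, lr=0.01, seed=0):
    """Plain SGD driven only by the zeroth-order gradient estimate."""
    rng = random.Random(seed)
    for _ in range(steps):
        g = zo_gradient(f, theta, rng)
        theta = [t - lr * gi for t, gi in zip(theta, g)]
    return theta

theta = zo_sgd(loss, [0.0, 0.0, 0.0])
print("final loss:", loss(theta))
```

Each step only calls `loss` twice on slightly perturbed parameters, which is why the dominant cost resembles repeated scoring rather than a conventional training step.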
Beyond Lipschitz: Data-Driven Robustness via Discrete Modulus of Continuity
Robustness of neural networks is commonly quantified via local or global Lipschitz constants. However, Lipschitz continuity can be overly coarse or overly restrictive as a global robustness measure, failing to capture nuanced, data-dependent behavior. We propose a data-driven, architecture-agnostic fr...
📄 Research | May 27, 2026 | http://arxiv.org/abs/2605.28729v1
AI Web Scraper Builder
Lovable for Scrapers
🧰 Tools | May 27, 2026 | https://www.producthunt.com/products/ai-web-scraper-builder?utm_campaign=producthunt-api&utm_medium=api-v2&utm_source=Application%3A+the500feed+%28ID%3A+283491%29
Understanding Generalization and Forgetting in In-Context Continual Learning
In-context learning (ICL) derives its power from enabling Large Language Models to adapt to new tasks via prompt-based reasoning alone, entirely bypassing the need for parameter updates. Existing theories primarily study ICL in single-task settings, while real-world prompts often contain sequences o...
📄 Research | May 27, 2026 | http://arxiv.org/abs/2605.28705v1
Expressive Power of Floating-Point Neural Networks with Arbitrary Reduction Orders and Inexact Activation Implementations
Most existing expressivity theories for neural networks assume exact real arithmetic, whereas practical neural networks are executed under finite-precision floating-point arithmetic with implementation-dependent execution semantics. Recent works have begun studying the expressive power of floating-p...
📄 Research | May 27, 2026 | http://arxiv.org/abs/2605.28704v1
Latent-Conditioned Parameterized Quantum Circuits as Universal Approximators for Distributions over Quantum States
Many applications in quantum simulation, quantum chemistry, and quantum machine learning require not a single quantum state but an ensemble of states characterizing the heterogeneity of a target system. Preparing such ensembles state-by-state is prohibitive in both variational and fault-tolerant set...
📄 Research | May 27, 2026 | http://arxiv.org/abs/2605.28690v1
Optimal Data Acquisition for Reinforcement Learning: A Large Deviations Perspective
Data acquisition efficiency is a central challenge in deploying reinforcement learning in business and healthcare operations, where interactions are costly, slow, and often involve humans in the loop. This paper develops a unified large deviations framework for data acquisition in infinite-horizon r...
📄 Research | May 27, 2026 | http://arxiv.org/abs/2605.28675v1
Applications of temporal graph learning for predicting the dynamics of biological systems
Biological foundation models have shown strong performance in single-cell representation learning by applying transformer architectures directly to gene-expression matrices. However, these approaches predominantly operate in static settings and do not explicitly model the temporal evolution of devel...
📄 Research | May 27, 2026 | http://arxiv.org/abs/2605.28659v1
Single-Rollout Hidden-State Dynamics for Training-Free RLVR Data Selection
Reinforcement learning with verifiable rewards (RLVR) can yield large reasoning gains from very few training instances, yet its strong sensitivity to which instances are used makes data selection a central bottleneck. Most existing selection pipelines rely on training-time optimization signals and/o...
📄 Research | May 27, 2026 | http://arxiv.org/abs/2605.28631v1
Learning High-Dimensional Parity Functions with Product Networks using Gradient Descent
Parity functions are fundamental Boolean operations with critical applications across machine learning, cryptography, and error correction. Yet, learning high-dimensional parity functions poses significant challenges: in a general setting, standard neural network architectures typically require expo...
📄 Research | May 27, 2026 | http://arxiv.org/abs/2605.28612v1
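As background for the parity-function entry above, here is a small sketch of the textbook identity that the parity (XOR) of a set of bits equals a product once each bit is re-encoded as +1 or -1. The function names are illustrative, and the truncated abstract does not say whether the paper's product networks rely on this encoding, so treat the connection as an assumption.

```python
from itertools import product
from math import prod

def parity_xor(bits):
    """Parity as a running XOR over 0/1 bits: 1 if an odd number are set."""
    result = 0
    for b in bits:
        result ^= b
    return result

def parity_product(bits):
    """Parity as a product over +/-1 signs (0 -> +1, 1 -> -1)."""
    signs = [1 - 2 * b for b in bits]
    return prod(signs)  # +1 for even parity, -1 for odd parity

# Exhaustively check that both views agree on every 5-bit input.
n = 5
agree = all(
    parity_product(bits) == 1 - 2 * parity_xor(bits)
    for bits in product([0, 1], repeat=n)
)
print("XOR and product forms agree on all", 2 ** n, "inputs:", agree)
```

Flipping any single input bit flips the output in both forms, which is part of what makes parity hard to learn from local correlations.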
Dark Quest II: A Wide-Coverage Neural Network Emulator of the Nonlinear Matter Power Spectrum Across Extended Cosmologies
DarkEmulator2 is a neural network emulator of the nonlinear matter power spectrum in a nine-dimensional $w_0 w_a νo \mathrm{CDM}$ parameter space, developed as the emulator component of the Dark Quest II (DQ2) program. It is trained on simulations generated with the Ginkaku...
📄 Research | May 27, 2026 | http://arxiv.org/abs/2605.28596v1
PLS in the Mirror of Self-Attention
This note provides an interesting observation on casting partial least squares (PLS) as a linearized self-attention so that PLS may be studied within the neural network paradigm. On the other hand, the dimensionality reduction and selection of predictors in PLS may indicate that self-attention includ...
📄 Research | May 27, 2026 | http://arxiv.org/abs/2605.28592v1
SARAD: LLM-Based Safety-Aware Hybrid Reinforcement Learning with Collision Prediction for Autonomous Driving
Ensuring both safety and efficiency in decision-making for autonomous driving systems remains a fundamental challenge. Traditional Deep Reinforcement Learning (DRL) suffers from unsafe random exploration and slow convergence, while Large Language Models (LLMs) demonstrate inherent latency in real-ti...
📄 Research | May 27, 2026 | http://arxiv.org/abs/2605.28583v1
A Generalized Tikhonov Layer for Interpretable-by-design Graph Neural Networks
We propose the Tikhonov layer, a graph neural network layer that is interpretable by design: once trained, its learned parameters directly reveal which node features and which aspects of the graph topology were leveraged for prediction. In practice, the layer's propagation matrix takes the closed-fo...
📄 Research | May 27, 2026 | http://arxiv.org/abs/2605.28578v1