AI News Archive: May 4, 2026 — Part 12

Sourced from 500+ daily AI sources, scored by relevance.

ARGUS: Policy-Adaptive Ad Governance via Evolving Reinforcement with Adversarial Umpiring
Online advertising governance faces significant challenges due to the non-stationary nature of regulatory policies, where emerging mandates (e.g., restrictions on education or aesthetic anxiety) create severe label inconsistencies and reasoning ambiguities in historical datasets. In this paper, we p...
📄 Research · May 4, 2026 · http://arxiv.org/abs/2605.02200v1
Linearizing Vision Transformer with Test-Time Training
While linear-complexity attention mechanisms offer a promising alternative to Softmax attention for overcoming the quadratic bottleneck, training such models from scratch remains prohibitively expensive. Inheriting weights from pretrained Transformers provides an appealing shortcut, yet the fundamen...
📄 Research · May 4, 2026 · http://arxiv.org/abs/2605.02772v1
Unified Map Prior Encoder for Mapping and Planning
Online mapping and end-to-end (E2E) planning in autonomous driving remain largely sensor-centric, leaving rich map priors, including HD/SD vector maps, rasterized SD maps, and satellite imagery, underused because of heterogeneity, pose drift, and inconsistent availability at test time. We present UM...
📄 Research · May 4, 2026 · http://arxiv.org/abs/2605.02762v1
DynoSLAM: Dynamic SLAM with Generative Graph Neural Networks for Real-World Social Navigation
Traditional Simultaneous Localization and Mapping (SLAM) algorithms rely heavily on the static environment assumption, which severely limits their applicability in real-world spaces populated by moving entities, such as pedestrians. In this work, we propose DynoSLAM, a tightly-coupled Dynamic GraphS...
📄 Research · May 4, 2026 · http://arxiv.org/abs/2605.02759v1
Biological Spatial Priors Regularize Foundation Model Representations for Cross-Site MSI Generalization in Colorectal Cancer
Predicting microsatellite instability (MSI) status from routine hematoxylin and eosin (H&E) whole slide images (WSIs) offers a practical alternative to molecular testing, but models trained at one institution tend to generalize poorly to slides acquired at a different site. Foundation model represen...
📄 Research · May 4, 2026 · http://arxiv.org/abs/2605.02660v1
Mamoda2.5: Enhancing Unified Multimodal Model with DiT-MoE
We present Mamoda2.5, a unified AR-Diffusion framework that seamlessly integrates multimodal understanding and generation within a single architecture. To efficiently enhance the model's generation capability, we equip the Diffusion Transformer backbone with a fine-grained Mixture-of-Experts (MoE) d...
📄 Research · May 4, 2026 · http://arxiv.org/abs/2605.02641v1
Global-Local Feature Decoding with Adapter-Guided SAMv2 for Salient Object Detection
Salient Object Detection (SOD) remains an essential yet underexplored task in the era of large-scale vision models. Although foundation models like SAM exhibit strong generalization, their potential for SOD is not fully realized, and training or fully fine-tuning them is computationally expensive an...
📄 Research · May 4, 2026 · http://arxiv.org/abs/2605.02616v1
Rethinking the Need for Source Models: Source-Free Domain Adaptation from Scratch Guided by a Vision-Language Model
Source-Free Domain Adaptation (SFDA) adapts source models to target domains without accessing source data, addressing privacy and transmission issues. However, existing methods still initialize from a source pre-trained model and thus are not truly source-free. Recent works have introduced Vision-La...
📄 Research · May 4, 2026 · http://arxiv.org/abs/2605.02604v1
JEE-Verse
AI-architected 3D study & simulated market ecosystem.
🧰 Tools · May 4, 2026 · https://www.producthunt.com/products/jee-verse?utm_campaign=producthunt-api&utm_medium=api-v2&utm_source=Application%3A+the500feed+%28ID%3A+283491%29
Representation learning from OCT images
Optical Coherence Tomography (OCT) has become one of the most widely used imaging modalities in ophthalmology. It provides high-resolution, non-invasive visualization of retinal microarchitecture. The automated analysis of OCT images through representation learning has emerged as a central research frontier....
📄 Research · May 4, 2026 · http://arxiv.org/abs/2605.02589v1
Hyp2Former: Hierarchy-Aware Hyperbolic Embeddings for Open-Set Panoptic Segmentation
Recognizing unknown objects is crucial for safety-critical applications such as autonomous driving and robotics. Open-Set Panoptic Segmentation (OPS) aims to segment known thing and stuff classes while identifying valid unknown objects as separate instances. Prior OPS approaches largely treat known ...
📄 Research · May 4, 2026 · http://arxiv.org/abs/2605.02580v1
Automated In-the-Wild Data Collection for Continual AI Generated Image Detection
The rapid advancement of generative Artificial Intelligence (AI) has introduced significant challenges for reliable AI-generated image detection. Existing detectors often suffer from performance degradation under distribution shifts and when encountering newly emerging generative models. In this wor...
📄 Research · May 4, 2026 · http://arxiv.org/abs/2605.02567v1
Low-Latency Embedded Driver Monitoring System with a Multi-Task Neural Network
Road traffic accidents remain a significant global concern, with the majority attributed to human factors such as driver distraction and fatigue. This study proposes a camera-based approach to derive useful indicators to assess driver attentiveness and alertness. The proposed pipeline jointly satisf...
📄 Research · May 4, 2026 · http://arxiv.org/abs/2605.02563v1
Improving Model Safety by Targeted Error Correction
The widespread adoption of machine learning in critical applications demands techniques to mitigate high-consequence errors. Our method utilizes a dual-classifier GBDT pipeline to distinguish routine human-like errors from high-risk non-human misclassifications. Evaluated across three domains, anima...
📄 Research · May 4, 2026 · http://arxiv.org/abs/2605.02544v1
AlbumFill: Album-Guided Reasoning and Retrieval for Personalized Image Completion
Personalized image completion aims to restore occluded regions in personal photos while preserving identity and appearance. Existing methods either rely on generic inpainting models that often fail to maintain identity consistency, or assume that suitable reference images are explicitly provided. In...
📄 Research · May 4, 2026 · http://arxiv.org/abs/2605.02892v1
Laplacian Frequency Interaction Network for Rural Thematic Road Extraction
Rural thematic road network construction aims to extract topological road structures from movement trajectory images of agricultural machinery. However, this task faces challenges where downsampling methods commonly used in existing studies tend to blur the sparse high-frequency road structures, and...
📄 Research · May 4, 2026 · http://arxiv.org/abs/2605.02866v1
Pixel Perfect: Relational Image Quality Assessment with Spatially-Aware Distortions
Traditional image quality assessment (IQA) methods rely on mean opinion scores (MOS), which are resource-intensive to collect and fail to provide interpretable, localized feedback on specific image distortions. We overcome these limitations by shifting from absolute quality prediction to a relationa...
📄 Research · May 4, 2026 · http://arxiv.org/abs/2605.02863v1
Active Sampling for Ultra-Low-Bit-Rate Video Compression via Conditional Controlled Diffusion
Diffusion models provide a powerful generative prior for perceptual reconstruction at ultra-low bitrates, but effective video compression requires controlling the generative process using highly compact conditioning signals. In this work, we present ActDiff-VC, a diffusion-based video compression fr...
📄 Research · May 4, 2026 · http://arxiv.org/abs/2605.02849v1
VideoNet: A Large-Scale Dataset for Domain-Specific Action Recognition
Videos are unique in their ability to capture actions which transcend multiple frames. Accordingly, for many years action recognition was the quintessential task for video understanding. Unfortunately, due to a lack of sufficiently diverse and challenging data, modern vision-language models (VLMs) a...
📄 Research · May 4, 2026 · http://arxiv.org/abs/2605.02834v1
Edge-Efficient Image Restoration: Transformer Distillation into State-Space Models
We propose a modular framework for hybrid image restoration that integrates transformer and state-space model (SSM) blocks with a focus on improving runtime efficiency on edge hardware. While transformers provide strong global modeling through self-attention, their attention kernels incur substantia...
📄 Research · May 4, 2026 · http://arxiv.org/abs/2605.02794v1
HumanSplatHMR: Closing the Loop Between Human Mesh Recovery and Gaussian Splatting Avatar
Accurately recovering human pose and appearance from video is an essential component of scene reconstruction, with applications to motion capture, motion prediction, virtual reality, and digital twinning. Despite significant interest in building realistic human avatars from video, this paper demonst...
📄 Research · May 4, 2026 · http://arxiv.org/abs/2605.02784v1
FoR-Net: Learning to Focus on Hard Regions for Efficient Semantic Segmentation
We present FoR-Net, a lightweight architecture for semantic segmentation that focuses on identifying and enhancing hard regions. Instead of relying on heavy global modeling, FoR-Net adopts an efficient strategy that selectively emphasizes informative regions through a learned importance map and a To...
📄 Research · May 4, 2026 · http://arxiv.org/abs/2605.02764v1
Seeing Realism from Simulation: Efficient Video Transfer for Vision-Language-Action Data Augmentation
Vision-language-action (VLA) models typically rely on large-scale real-world videos, whereas simulated data, despite being inexpensive and highly parallelizable to collect, often suffers from a substantial visual domain gap and limited environmental diversity, resulting in weak real-world generaliza...
📄 Research · May 4, 2026 · http://arxiv.org/abs/2605.02757v1
Does it Really Count? Assessing Semantic Grounding in Text-Guided Class-Agnostic Counting
Open-world text-guided class-agnostic counting (CAC) has emerged as a flexible paradigm for counting arbitrary object classes by using natural language prompts. However, current evaluation protocols primarily focus on standard counting errors within single-category images, overlooking a fundamental ...
📄 Research · May 4, 2026 · http://arxiv.org/abs/2605.02752v1
YeeroAI
Every chat builds knowledge. Every thought branches freely.
🧰 Tools · May 4, 2026 · https://www.producthunt.com/products/yeeroai?utm_campaign=producthunt-api&utm_medium=api-v2&utm_source=Application%3A+the500feed+%28ID%3A+283491%29
SIAM: Head and Brain MRI Segmentation from Few High-Quality Templates via Synthetic Training
Synthetic training has recently advanced brain MRI segmentation by enabling contrast-agnostic models trained entirely on generated data. However, most existing approaches rely on hundreds of automatically labeled templates, introducing systematic biases and limiting their flexibility to incorporate ...
📄 Research · May 4, 2026 · http://arxiv.org/abs/2605.02737v1
Temporally Consistent Object 6D Pose Estimation for Robot Control
Single-view RGB object pose estimators have reached a level of precision and efficiency that makes them good candidates for vision-based robot control. However, off-the-shelf methods lack the temporal consistency and robustness that are mandatory for stable feedback control. In this work, we develop a...
📄 Research · May 4, 2026 · http://arxiv.org/abs/2605.02708v1
AnchorD: Metric Grounding of Monocular Depth Using Factor Graphs
Dense and accurate depth estimation is essential for robotic manipulation, grasping, and navigation, yet currently available depth sensors are prone to errors on transparent, specular, and general non-Lambertian surfaces. To mitigate these errors, large-scale monocular depth estimation approaches pr...
📄 Research · May 4, 2026 · http://arxiv.org/abs/2605.02667v1
Human Activity Recognition Method for Moderate Violence Detection
Physical violence in public spaces is a significant public health concern, with minor incidents such as pushing often serving as precursors to more severe escalations. This research develops an automated system for the real-time detection of moderate physical violence, specifically pushing, in surve...
📄 Research · May 4, 2026 · http://arxiv.org/abs/2605.02659v1
AutoFocus: Uncertainty-Aware Active Visual Search for GUI Grounding
Vision-Language Models (VLMs) have enabled autonomous GUI agents that translate natural language instructions into executable screen coordinates. However, grounding performance degrades in high-resolution interfaces, where dense layouts and small interactive elements expose a resolution gap between ...
📄 Research · May 4, 2026 · http://arxiv.org/abs/2605.02630v1
Retrieving Any Relevant Moments: Benchmark and Models for Generalized Moment Retrieval
Video Moment Retrieval (VMR) aims to localize temporal segments in videos that correspond to a natural language query, but typically assumes only a single matching moment for each query. This assumption does not always hold in real-world scenarios, where queries may correspond to multiple or no mome...
📄 Research · May 4, 2026 · http://arxiv.org/abs/2605.02623v1
StableMind: Source-Free Cross-Subject fMRI Decoding with Regularized Adaptation
Existing cross-subject fMRI decoding methods typically train a model on multiple scanned subjects and then adapt it to a new subject using substantial paired fMRI-image data. However, in realistic scenarios, new-subject fMRI data are often limited due to costly data acquisition, and raw data from pr...
📄 Research · May 4, 2026 · http://arxiv.org/abs/2605.02586v1
Stylistic Attribute Control in Latent Diffusion Models
Text-to-image diffusion models have revolutionized image synthesis and editing, but precise control over stylistic attributes remains a challenge, often causing unintended content modifications. We propose an approach for fine-grained parametric control of stylistic attributes in latent diffusion mo...
📄 Research · May 4, 2026 · http://arxiv.org/abs/2605.02583v1
Self-Supervised Spatial And Zero-Shot Angular Super-Resolution by Spatial-Angular Implicit Representation For Rotating-View SNR-Efficient Diffusion MRI
Rotating-view thick-slice acquisition is highly SNR-efficient for mesoscale diffusion MRI (dMRI) but requires numerous rotating views to satisfy Nyquist sampling, resulting in long scan time. We propose a self-supervised Spatial-Angular Implicit Neural Representation (SA-INR) that reconstructs high-...
📄 Research · May 4, 2026 · http://arxiv.org/abs/2605.02575v1
TemPose-TF-ASF: Two-Stage Bidirectional Stroke Context Fusion for Badminton Stroke Classification
Accurate badminton stroke prediction is crucial for fine-grained sports analysis and tactical decision support. However, existing methods struggle to model rich temporal context. This paper introduces TemPose-TF-ASF (Adjacent-Stroke Fusion), a context-aware extension of TemPose. It enh...
📄 Research · May 4, 2026 · http://arxiv.org/abs/2605.02558v1
MooD: An Efficient VA-Driven Affective Image Editing Framework via Fine-Grained Semantic Control
Affective image editing (AIE) aims to edit visual content to evoke target emotions. However, existing methods often overlook inference efficiency and predominantly depend on discrete emotion representations, which to some extent limits their practical applicability and makes it challenging to captur...
📄 Research · May 4, 2026 · http://arxiv.org/abs/2605.02521v1
ExpoCM: Exposure-Aware One-Step Generative Single-Image HDR Reconstruction
Single-image HDR reconstruction aims to recover high dynamic range radiance from a single low dynamic range (LDR) input, but remains highly ill-posed due to detail saturation in over-exposed regions and noise amplification in under-exposed areas. While recent diffusion-based approaches offer powerfu...
📄 Research · May 4, 2026 · http://arxiv.org/abs/2605.02464v1
M⁴Fuse: Lightweight State-Space MoE with a Cross-Scale Gating Bridge for Brain Tumor Segmentation
Encoder-decoder imbalance and the reliance on large input volumes make many 3D brain tumor segmentation models both compute-heavy and brittle. We present M⁴Fuse, a lightweight network that prioritizes discriminative brain tumor cues over exhaustive appearance reconstruction. Our me...
📄 Research · May 4, 2026 · http://arxiv.org/abs/2605.02444v1
Anomaly-Preference Image Generation
Synthesizing realistic and diverse anomalous samples from limited data is vital for robust model generalization. However, existing methods struggle to reconcile fidelity and diversity, often hampered by distribution misalignment and overfitting, respectively. To mitigate this, we introduce Anomaly Pr...
📄 Research · May 4, 2026 · http://arxiv.org/abs/2605.02439v1
Mixture Prototype Flow Matching for Open-Set Supervised Anomaly Detection
Open-set supervised anomaly detection (OSAD) aims to identify unseen anomalies using limited anomalous supervision. However, existing prototype-based methods typically model normal data via a unimodal Gaussian prior, failing to capture inherent multi-modality and resulting in blurred decision bounda...
📄 Research · May 4, 2026 · http://arxiv.org/abs/2605.02438v1
Viral Boost AI
Your Personal AI Film Director in Your Pocket
🧰 Tools · May 4, 2026 · https://www.producthunt.com/products/viral-boost-ai?utm_campaign=producthunt-api&utm_medium=api-v2&utm_source=Application%3A+the500feed+%28ID%3A+283491%29
Unsupervised Machine Learning for Detecting Structural Anomalies in European Regional Statistics
Ensuring the coherence of regional socio-economic statistics is a central task for national statistical institutes. Traditional validation tools, such as range edits, ratio checks, or univariate outlier detection, are effective for identifying extreme values in individual series but are less suited ...
📄 Research · May 4, 2026 · http://arxiv.org/abs/2605.02884v1
Multi-fidelity surrogates for mechanics of composites: from co-kriging to multi-fidelity neural networks
Composite materials exhibit strongly hierarchical and anisotropic properties governed by coupled mechanisms spanning constituents, plies, laminates, structures, and manufacturing history. This intrinsic complexity makes predictive modeling of composites expensive, because repeated experiments and hi...
📄 Research · May 4, 2026 · http://arxiv.org/abs/2605.02871v1
Trust, but Verify: Peeling Low-Bit Transformer Networks for Training Monitoring
Understanding whether deep neural networks are effectively optimized remains challenging, as training occurs in highly nonconvex landscapes and standard metrics provide limited visibility into layer-wise learning quality. This challenge is particularly acute for transformer-based language models, wh...
📄 Research · May 4, 2026 · http://arxiv.org/abs/2605.02853v1
Universality in Deep Neural Networks: An approach via the Lindeberg exchange principle
We consider the infinite-width limit of a fully connected deep neural network with general weights, and we prove quantitative general bounds on the 2-Wasserstein distance between the network and its infinite-width Gaussian limit, under appropriate regularity assumptions on the activation function....
📄 Research · May 4, 2026 · http://arxiv.org/abs/2605.02771v1
Visual Latents Know More Than They Say: Unsilencing Latent Reasoning in MLLMs
Continuous latent-space reasoning offers a compact alternative to textual chain-of-thought for multimodal models, enabling high-dimensional visual evidence to be integrated without explicit reasoning tokens. However, we identify a previously overlooked optimization pathology in existing latent visua...
📄 Research · May 4, 2026 · http://arxiv.org/abs/2605.02735v1
Federated Reinforcement Learning for Efficient Mobile Crowdsensing under Incomplete Information
Mobile crowdsensing (MCS) is a distributed sensing architecture that utilizes existing sensors on mobile units (MUs) to perform sensing tasks. A mobile crowdsensing platform (MCSP) publishes the sensing tasks and the MUs decide whether to participate in exchange for money. The MCS system is dynamic:...
📄 Research · May 4, 2026 · http://arxiv.org/abs/2605.02705v1
Robust and Fast Training via Per-Sample Clipping
We propose a robust gradient estimator based on per-sample gradient clipping and analyze its properties both theoretically and empirically. We show that the resulting method, per-sample clipped SGD (PS-Clip-SGD), achieves optimal in-expectation convergence rates for non-convex optimization problems ...
📄 Research · May 4, 2026 · http://arxiv.org/abs/2605.02701v1
ParaRNN: An Interpretable and Parallelizable Recurrent Neural Network for Time-Dependent Data
The proliferation of large-scale and structurally complex data has spurred the integration of machine learning methods into statistical modeling. Recurrent neural networks (RNNs), a foundational class of models for time-dependent data, can be viewed as nonlinear extensions of classical autoregressiv...
📄 Research · May 4, 2026 · http://arxiv.org/abs/2605.02692v1
Spectral Model eXplainer: a chemically-grounded explainability framework for spectral-based machine learning models
Spectral-based machine learning models have been increasingly deployed in chemometrics and spectroscopy, where predictive accuracy is as important as explainability. Currently employed eXplainable Artificial Intelligence (XAI) methods are largely adapted from tabular or generic multivariate domains, a...
📄 Research · May 4, 2026 · http://arxiv.org/abs/2605.02684v1