AI News Archive: April 30, 2026 — Part 21

Sourced from 500+ daily AI sources, scored by relevance.

MiniCPM-o 4.5: Towards Real-Time Full-Duplex Omni-Modal Interaction
Recent progress in multimodal large language models (MLLMs) has brought AI capabilities from static offline data processing to real-time streaming interaction, yet they still remain far from human-level multimodal interaction. The key bottlenecks are no longer modality coverage or latency alone, but...
📄 Research | Apr 30, 2026 | http://arxiv.org/abs/2604.27393v1
Leading Across the Spectrum of Human-AI Relationships: A Conceptual Framework for Increasingly Heterogeneous Teams
What shapes a consequential decision when human and artificial intelligence work on it together? The answer is becoming harder to see. A decision may look human-led after AI has set the frame, or appear automated while human judgment still carries decisive force. This paper offers a leadership-facin...
📄 Research | Apr 30, 2026 | http://arxiv.org/abs/2604.27392v1
Geometry-Calibrated Conformal Abstention for Language Models
When language models lack relevant knowledge for a given query, they frequently generate plausible responses that can be hallucinations, rather than admitting uncertainty about the answer. Retraining models to reward admitting ignorance can lead to overly conservative behaviors and poor generaliz...
📄 Research | Apr 30, 2026 | http://arxiv.org/abs/2604.27914v1
Reasoning over Object Descriptions Improves Coreference Resolution in Task-Based Dialogue Systems
Task-based dialogue systems assist users in achieving specific goals, such as executing actions or retrieving information, through natural language interactions. Accurate coreference resolution is essential, as it involves identifying object references within the dialogue - a task that becomes incre...
📄 Research | Apr 30, 2026 | http://arxiv.org/abs/2604.27850v1
EviMem: Evidence-Gap-Driven Iterative Retrieval for Long-Term Conversational Memory
Long-term conversational memory requires retrieving evidence scattered across multiple sessions, yet single-pass retrieval fails on temporal and multi-hop questions. Existing iterative methods refine queries via generated content or document-level signals, but none explicitly diagnoses the evidence ...
📄 Research | Apr 30, 2026 | http://arxiv.org/abs/2604.27695v1
RoadMapper: A Multi-Agent System for Roadmap Generation of Solving Complex Research Problems
People commonly leverage structured content to accelerate knowledge acquisition and research problem solving. Among these, roadmaps guide researchers through hierarchical subtasks to solve complex research problems step by step. Despite progress in structured content generation, the roadmap generati...
📄 Research | Apr 30, 2026 | http://arxiv.org/abs/2604.27616v1
JaiTTS: A Thai Voice Cloning Model
We present JaiTTS-v1.0, a state-of-the-art Thai voice cloning text-to-speech model built through continual training on a large Thai-centric speech corpus. The model architecture is adapted from VoxCPM, a tokenizer-free autoregressive TTS model. JaiTTS-v1.0 directly processes numerals and Thai-Englis...
📄 Research | Apr 30, 2026 | http://arxiv.org/abs/2604.27607v1
Beyond the Training Distribution: Mapping Generalization Boundaries in Neural Program Synthesis
Large-scale transformers achieve impressive results on program synthesis benchmarks, yet their true generalization capabilities remain obscured by data contamination and opaque training corpora. To rigorously assess whether models are truly generalizing or merely retrieving memorized templates, we i...
📄 Research | Apr 30, 2026 | http://arxiv.org/abs/2604.27551v1
AppTek Call-Center Dialogues: A Multi-Accent Long-Form Benchmark for English ASR
Evaluating English ASR systems for conversational AI applications remains difficult, as many publicly available corpora are either pre-segmented into short segments, consist of read or prepared speech, or lack explicit dialect annotations to evaluate robustness for a diverse user base. This work pre...
📄 Research | Apr 30, 2026 | http://arxiv.org/abs/2604.27543v1
HealthBench Professional: Evaluating Large Language Models on Real Clinician Chats
Millions of clinicians use ChatGPT to support clinical care, but evaluations of the most common use cases in model-clinician conversations are limited. We introduce HealthBench Professional, an open benchmark for evaluating large language models on real tasks that clinicians bring to ChatGPT in the ...
📄 Research | Apr 30, 2026 | http://arxiv.org/abs/2604.27470v1
ScaleBox: Enabling High-Fidelity and Scalable Code Verification for Large Language Models
Code sandboxes have emerged as a critical infrastructure for advancing the coding capabilities of large language models, providing verifiable feedback for both RL training and evaluation. However, existing systems fail to provide accurate verification and efficiency under high-concurrency workloads....
📄 Research | Apr 30, 2026 | http://arxiv.org/abs/2604.27467v1
Why Mean Pooling Works: Quantifying Second-Order Collapse in Text Embeddings
For constructing text embeddings, mean pooling, which averages token embeddings, is the standard approach. This paper examines whether mean pooling actually works well in real models. First, we note that mean pooling can collapse information beyond the first-order statistics of the token embeddings,...
📄 Research | Apr 30, 2026 | http://arxiv.org/abs/2604.27398v1
3D Reconstruction Techniques in the Manufacturing Domain: Applications, Research Opportunities and Use Cases
This comprehensive review examines the evolution and the current state of the art in three-dimensional (3D) reconstruction techniques in manufacturing applications. The analysis covers both traditional approaches and emerging deep learning methods, showing a critical research gap in unified 3D recon...
📄 Research | Apr 30, 2026 | http://arxiv.org/abs/2604.28064v1
Echo-α: Large Agentic Multimodal Reasoning Model for Ultrasound Interpretation
Ultrasound interpretation requires both precise lesion localization and holistic clinical reasoning, yet existing methods typically excel at only one of these capabilities: specialized detectors offer strong localization but limited reasoning, whereas multimodal large language models (MLLMs) provide...
📄 Research | Apr 30, 2026 | http://arxiv.org/abs/2604.28011v1
Dynamic Cluster Data Sampling for Efficient and Long-Tail-Aware Vision-Language Pre-training
The computational cost of training a vision-language model (VLM) can be reduced by sampling the training data. Previous work on efficient VLM pre-training has pointed to the importance of semantic data balance, adjusting the distribution of topics in the data to improve VLM accuracy. However, existi...
📄 Research | Apr 30, 2026 | http://arxiv.org/abs/2604.27932v1
Miaw AI secretary
Non-invasive AI secretary to help without context switching
🧰 Tools | Apr 30, 2026 | https://www.producthunt.com/products/miaw-ai-secretary?utm_campaign=producthunt-api&utm_medium=api-v2&utm_source=Application%3A+the500feed+%28ID%3A+283491%29
Noise2Map: End-to-End Diffusion Model for Semantic Segmentation and Change Detection
Semantic segmentation and change detection are two fundamental challenges in remote sensing, requiring models to capture either spatial semantics or temporal differences from satellite imagery. Existing deep learning models often struggle with temporal inconsistencies or in capturing fine-grained sp...
📄 Research | Apr 30, 2026 | http://arxiv.org/abs/2604.27889v1
Parameter-Efficient Architectural Modifications for Translation-Invariant CNNs
Convolutional Neural Networks (CNNs) are widely assumed to be translation-invariant, yet standard architectures exhibit a startling fragility: even a single-pixel shift can drastically degrade performance due to their reliance on spatially dependent fully connected layers. In this work, we resolve t...
📄 Research | Apr 30, 2026 | http://arxiv.org/abs/2604.27870v1
Taming Noise-Induced Prototype Degradation for Privacy-Preserving Personalized Federated Fine-Tuning
Prototype-based Personalized Federated Learning (ProtoPFL) enables efficient multi-domain adaptation by communicating compact class prototypes, but directly sharing them poses privacy risks. A common defense involves per-example $\ell_2$ clipping before prototype computation to bound sensitivity, fo...
📄 Research | Apr 30, 2026 | http://arxiv.org/abs/2604.27833v1
Machine Unlearning for Class Removal through SISA-based Deep Neural Network Architectures
The rapid proliferation of image generation models and other artificial intelligence (AI) systems has intensified concerns regarding data privacy and user consent. As the availability of public datasets declines, major technology companies increasingly rely on proprietary or private user data for mo...
📄 Research | Apr 30, 2026 | http://arxiv.org/abs/2604.27804v1
Learning to Reason: Targeted Knowledge Discovery and Fuzzy Logic Update for Robust Image Recognition
Integrating domain knowledge into deep neural networks is a promising way to improve generalization. Existing methods either encode prior knowledge in the loss function or apply post-processing modules, but both depend on identifying useful symbolic knowledge to integrate. Since such rules are often...
📄 Research | Apr 30, 2026 | http://arxiv.org/abs/2604.27759v1
Improving Calibration in Test-Time Prompt Tuning for Vision-Language Models via Data-Free Flatness-Aware Prompt Pretraining
Test-time prompt tuning (TPT) has emerged as a promising technique for enhancing the adaptability of vision-language models by optimizing textual prompts using unlabeled test data. However, prior studies have observed that TPT often produces poorly calibrated models, raising concerns about the relia...
📄 Research | Apr 30, 2026 | http://arxiv.org/abs/2604.27715v1
A generalised pre-training strategy for deep learning networks in semantic segmentation of remotely sensed images
In the segmentation of remotely sensed images, deep learning models are typically pre-trained using large image databases like ImageNet before being fine-tuned on domain-specific datasets. However, the performance of these fine-tuned models is often hindered by the large domain gaps (i.e., differences in ...
📄 Research | Apr 30, 2026 | http://arxiv.org/abs/2604.27704v1
Deep Learning-Based Segmentation of Peritoneal Cancer Index Regions from CT Imaging
Peritoneal metastases are currently assessed using diagnostic laparoscopy to determine Sugarbaker's Peritoneal Cancer Index (sPCI), which works by dividing the abdomen into 13 regions and scoring each region based on tumor size. A recent consensus study defined 3D regions to facilitate a radiologica...
📄 Research | Apr 30, 2026 | http://arxiv.org/abs/2604.27697v1
FUN: A Focal U-Net Combining Reconstruction and Object Detection for Snapshot Spectral Imaging
Conventional push-broom hyperspectral imaging suffers from slow acquisition speeds, precluding real-time object detection; in contrast, snapshot spectral imaging enables instantaneous capture of hyperspectral images (HSIs), making real-time object detection feasible, yet its potential is often compromi...
📄 Research | Apr 30, 2026 | http://arxiv.org/abs/2604.27653v1
Robot Learning from Human Videos: A Survey
A critical bottleneck hindering further advancement in embodied AI and robotics is the challenge of scaling robot data. To address this, the field of learning robot manipulation skills from human video data has attracted rapidly growing attention in recent years, driven by the abundance of human act...
📄 Research | Apr 30, 2026 | http://arxiv.org/abs/2604.27621v1
Robust Lightweight Crack Classification for Real-Time UAV Bridge Inspection
With the widespread application of Unmanned Aerial Vehicles (UAVs) in bridge structural health monitoring, deep learning-based automatic crack detection has become a major research focus. However, practical UAV inspections still face four key challenges: weak crack features, degraded imaging conditi...
📄 Research | Apr 30, 2026 | http://arxiv.org/abs/2604.27617v1
ClipTBP: Clip-Pair based Temporal Boundary Prediction with Boundary-Aware Learning for Moment Retrieval
Video moment retrieval is the task of retrieving specific segments of a video corresponding to a given text query. Recent studies have been conducted to improve multimodal alignment performance through visual-linguistic similarity learning at the snippet-level and transformer-based temporal boundary...
📄 Research | Apr 30, 2026 | http://arxiv.org/abs/2604.27591v1
Self-Supervised Learning of Plant Image Representations
Automated plant recognition plays a crucial role in biodiversity monitoring and conservation, yet current approaches rely heavily on supervised learning, which is limited by the availability of expert-labeled data. Self-supervised learning (SSL) offers a scalable alternative, but existing methods an...
📄 Research | Apr 30, 2026 | http://arxiv.org/abs/2604.27538v1
FMCL: Class-Aware Client Clustering with Foundation Model Representations for Heterogeneous Federated Learning
Federated Learning (FL) enables collaborative model training across distributed clients without sharing raw data, yet its performance deteriorates under statistical heterogeneity. Clustered Federated Learning addresses this challenge by grouping similar clients and training separate models per clust...
📄 Research | Apr 30, 2026 | http://arxiv.org/abs/2604.27510v1
Leveraging Verifier-Based Reinforcement Learning in Image Editing
While Reinforcement Learning from Human Feedback (RLHF) has become a pivotal paradigm for text-to-image generation, its application to image editing remains largely unexplored. A key bottleneck is the lack of a robust general reward model for all editing tasks. Existing edit reward models usually gi...
📄 Research | Apr 30, 2026 | http://arxiv.org/abs/2604.27505v1
doola MCP for US LLC Formation
Start your business using AI in Claude and Replit
🧰 Tools | Apr 30, 2026 | https://www.producthunt.com/products/doola-mcp?utm_campaign=producthunt-api&utm_medium=api-v2&utm_source=Application%3A+the500feed+%28ID%3A+283491%29
TAFA-GSGC: Group-wise Scalable Point Cloud Geometry Compression with Progressive Residual Refinement
Scalable compression is essential for bandwidth-adaptive transmission, yet most learned codecs are optimized for a fixed rate-distortion point, making rate adaptation costly due to re-encoding or maintaining multiple bitstreams. In this work, we propose TAFA-GSGC, a scalable learned point cloud geom...
📄 Research | Apr 30, 2026 | http://arxiv.org/abs/2604.28045v1
ResiHMR: Residual-Limb Aware Single-Image 3D Human Mesh Recovery for Individuals with Limb Loss
Single-image human mesh recovery provides a compact 3D, person-centric representation that supports analysis, animation, AR and VR, rehabilitation, and human-computer interaction. However, prevailing systems impose an intact-limb prior and degrade on people with limb loss, because fixed-topology mod...
📄 Research | Apr 30, 2026 | http://arxiv.org/abs/2604.28025v1
Are DeepFakes Realistic Enough? Exploring Semantic Mismatch as a Novel Challenge
Current DeepFake detection scenarios are mostly binary, yet data manipulation can vary across audio, video, or both, and this variability is not captured in binary settings. Four-class audio-visual formulations address this by discriminating manipulation type, but introduce an unresolved problem: models...
📄 Research | Apr 30, 2026 | http://arxiv.org/abs/2604.28022v1
Faster 3D Gaussian Splatting Convergence via Structure-Aware Densification
3D Gaussian Splatting has emerged as a powerful scene representation for real-time novel-view synthesis. However, its standard adaptive density control relies on screen-space positional gradients, which do not distinguish between geometric misplacement and frequency aliasing, often leading to either...
📄 Research | Apr 30, 2026 | http://arxiv.org/abs/2604.28016v1
FineState-Bench: Benchmarking State-Conditioned Grounding for Fine-grained GUI State Setting
Despite the rapid progress of large vision-language models (LVLMs), fine-grained, state-conditioned GUI interaction remains challenging. Current evaluations offer limited coverage, imprecise target-state definitions, and an overreliance on final-task success, obscuring where and why agents fail. To ...
📄 Research | Apr 30, 2026 | http://arxiv.org/abs/2604.27974v1
TripVVT: A Large-Scale Triplet Dataset and a Coarse-Mask Baseline for In-the-Wild Video Virtual Try-On
Due to the scarcity of large-scale in-the-wild triplet data and the improper use of masks, the performance of video virtual try-on models remains limited. In this paper, we first introduce TripVVT-10K, the largest and most diverse in-the-wild triplet dataset to date, providing explicit video-lev...
📄 Research | Apr 30, 2026 | http://arxiv.org/abs/2604.27958v1
Training-Free Tunnel Defect Inspection and Engineering Interpretation via Visual Recalibration and Entity Reconstruction
Tunnel inspection requires outputs that can support defect localization, measurement, severity grading, and engineering documentation. Existing training-free foundation-model pipelines usually stop at coarse open-vocabulary proposals, which are difficult to use directly in interference-heavy tunnel ...
📄 Research | Apr 30, 2026 | http://arxiv.org/abs/2604.27928v1
Generate Your Talking Avatar from Video Reference
Existing talking avatar methods typically adopt an image-to-video pipeline conditioned on a static reference image within the same scene as the target generation. This restricted, single-view perspective lacks sufficient temporal and expression cues, limiting the ability to synthesize high-fidelity ...
📄 Research | Apr 30, 2026 | http://arxiv.org/abs/2604.27918v1
HiMix: Hierarchical Artifact-aware Mixup for Generalized Synthetic Image Detection
The rapid evolution of generative models has enabled the creation of highly realistic and diverse synthetic images, posing significant challenges to reliable and generalizable Synthetic Image Detection (SID). However, existing detectors are typically trained on limited and biased datasets, resulting...
📄 Research | Apr 30, 2026 | http://arxiv.org/abs/2604.27903v1
Frequency-Aware Semantic Fusion with Gated Injection for AI-generated Image Detection
AI-generated images are becoming increasingly realistic and diverse, posing significant challenges for generalizable detection. While Vision Foundation Models (VFMs) provide rich semantic representations and frequency-based methods capture complementary artifact cues, existing approaches that combin...
📄 Research | Apr 30, 2026 | http://arxiv.org/abs/2604.27875v1
GourNet: A CNN-Based Model for Mango Leaf Disease Detection
Mango cultivation is crucial in the agricultural sector, significantly contributing to economic development and food security. However, diseases affecting mango leaves can significantly reduce both the production and overall fruit grade. Detecting leaf diseases at an early stage with precision is ke...
📄 Research | Apr 30, 2026 | http://arxiv.org/abs/2604.27764v1
RayFormer: Modeling Inter- and Intra-Ray Similarity for NeRF-Based Video Snapshot Compressive Imaging
Video snapshot compressive imaging (SCI) enables the reconstruction of dynamic scenes from a single snapshot measurement. Recently, NeRF-based methods have shown promising reconstruction performance. However, such methods typically adopt random ray sampling strategies and fail to capture content str...
📄 Research | Apr 30, 2026 | http://arxiv.org/abs/2604.27702v1
MSR: Hybrid Field Modeling for CT-MRI Rigid-Deformable Registration of the Cervical Spine with an Annotated Dataset
Accurate CT-MRI registration of the cervical spine is essential for preoperative planning because this region is anatomically complex, highly variable, and vulnerable to injury of the vertebral arteries and spinal cord. However, cervical CT-MRI registration remains underexplored, particularly for rigid-...
📄 Research | Apr 30, 2026 | http://arxiv.org/abs/2604.27654v1
SpaAct: Spatially-Activated Transition Learning with Curriculum Adaptation for Vision-Language Navigation
Vision-and-Language Navigation (VLN) aims to enable an embodied agent to follow natural-language instructions and navigate to a target location in unseen 3D environments. We argue that adapting VLMs to VLN requires endowing them with two complementary capabilities for acquiring such awareness, namel...
📄 Research | Apr 30, 2026 | http://arxiv.org/abs/2604.27620v1
ZAYAN: Disentangled Contrastive Transformer for Tabular Remote Sensing Data
Learning informative representations from tabular data in remote sensing and environmental science is challenging due to heterogeneity, scarce labels, and redundancy among features. We present ZAYAN (Zero-Anchor dYnamic feAture eNcoding), a self-supervised, feature-centric contrastive framework for ...
📄 Research | Apr 30, 2026 | http://arxiv.org/abs/2604.27606v1
MailToDock
Turn Gmail into Google Tasks with AI-powered
🧰 Tools | Apr 30, 2026 | https://www.producthunt.com/products/mailtodock?utm_campaign=producthunt-api&utm_medium=api-v2&utm_source=Application%3A+the500feed+%28ID%3A+283491%29
Decoding Scientific Experimental Images: The SPUR Benchmark for Perception, Understanding, and Reasoning
We introduce SPUR, a comprehensive benchmark for scientific experimental image perception, understanding, and reasoning, comprising 4,264 question-answering (QA) pairs derived from 1,084 expert-curated images. SPUR features three key innovations: (1) Panel-Level Fine-Grained Perception: evaluating t...
📄 Research | Apr 30, 2026 | http://arxiv.org/abs/2604.27604v1
SECOS: Semantic Capture for Rigorous Classification in Open-World Semi-Supervised Learning
In open-world semi-supervised learning (OWSSL), a model learns from labeled data and unlabeled data containing both known and novel classes. In practical OWSSL applications, models are expected to perform rigorous classification by directly selecting the most semantically relevant label from a candi...
📄 Research | Apr 30, 2026 | http://arxiv.org/abs/2604.27596v1