AI News Archive: June 15, 2026 — Part 13

Sourced from 500+ daily AI sources, scored by relevance.

IMPACTeen: Intentions, Manipulation, Persuasion, Annotations, and Consequences in Teen Communication Dataset
IMPACTeen is a dataset of textual social influence scenarios spanning interpersonal, media-based, and digital settings in an adolescent context. It contains 1,021 texts, 5,100 individual annotation records, and gold labels for social influence techniques, with each text annotated from five distinct ...
📄 Research Jun 15, 2026 http://arxiv.org/abs/2606.16910v1
LESS Is More: Mutual-Stability Sampling for Diffusion Language Models
Diffusion large language models (dLLMs) offer a promising alternative to autoregressive decoding by iteratively refining masked sequences, enabling parallel token updates and bidirectional conditioning. Their practical efficiency, however, is limited by sampling procedures that execute a fixed numbe...
📄 Research Jun 15, 2026 http://arxiv.org/abs/2606.16908v1
Does Traversal Order Matter? A Systematic Study of Tree Traversal Methods in Transformer Grammars
Transformer Grammars (TGs) enhance language modeling by incorporating syntactic tree structures. Despite the potentially significant impact on model performance of how syntactic trees are linearized in TGs, existing studies rely solely on Depth-First Traversal (DFT) for linearization. In this paper,...
📄 Research Jun 15, 2026 http://arxiv.org/abs/2606.16836v1
Connecting Speech to Words through Images
How can we learn the mapping between written words and their spoken counterparts in the absence of explicit textual supervision? We present a visually grounded method for building a vocabulary of spoken words using only images and their spoken descriptions. First, image captioning systems are used t...
📄 Research Jun 15, 2026 http://arxiv.org/abs/2606.16807v1
VEXI
Open-source AI coding agent for your terminal
🧰 Tools Jun 15, 2026 https://www.producthunt.com/products/vexi?utm_campaign=producthunt-api&utm_medium=api-v2&utm_source=Application%3A+the500feed+%28ID%3A+283491%29
MyPCBench: A Benchmark for Personally Intelligent Computer-Use Agents
Current benchmarks for computer-use agents evaluate models in impersonal environments. This leaves a gap between evaluation and deployment where personal assistants are expected to work across a user's whole digital life, including their context, historical data, and logged-in accounts. This gap is ...
📄 Research Jun 15, 2026 http://arxiv.org/abs/2606.16748v1
Multi-Turn Reflective Masking Elicits Reasoning in Mask Diffusion Models
While reasoning on autoregressive (AR) models is often performed by chain-of-thought reasoning and reflection, their refinement of previous outputs still relies on fully sequential generation, even when only local edits are needed. In contrast, the masking mechanism in Mask Diffusion Models (MDMs) n...
📄 Research Jun 15, 2026 http://arxiv.org/abs/2606.16700v1
FraudSMSWalker: Benchmarking Agentic Large Language Models for SMS-to-Webpage Fraud Detection
SMS fraud is increasingly cross-channel: a message directs the user to a webpage, and the final risk depends on how the SMS claim aligns with the page content and requested user action. However, existing evaluations either focus on message-only smishing classification or expose URL and domain cues t...
📄 Research Jun 15, 2026 http://arxiv.org/abs/2606.16659v1
How Far Can Machine Translation Quality Take You? Extrinsic Discourse Evaluation in Goal-Oriented Setups
Existing machine translation (MT) metrics and discourse-focused evaluations primarily assess translation quality intrinsically, without measuring the downstream consequences of translation errors. In this work, we focus on extrinsic discourse evaluation of machine translation under two distinct regi...
📄 Research Jun 15, 2026 http://arxiv.org/abs/2606.16596v1
Uncertainty Is Not a Safety Net for Clinical VQA, but Can It Anticipate Model Failure?
Safe deployment of clinical vision-language models (VLMs) requires reliable uncertainty estimation (UE): a signal indicating when predictions should be trusted or escalated to a clinician. We test whether current UE methods actually deliver this signal. Benchmarking 8 methods across 12 VLMs on clini...
📄 Research Jun 15, 2026 http://arxiv.org/abs/2606.16583v1
Fast When, Careful Who: Dual-Process Multiparty Turn-Taking with Diffusion Augmentation
Reliable turn-taking is essential for spoken dialogue systems. However, most existing methods are designed for two-speaker interaction and struggle with realistic multiparty audio containing overlap and rapid speaker changes. We study multiparty turn-taking on the VoxConverse dataset and propose an ...
📄 Research Jun 15, 2026 http://arxiv.org/abs/2606.16568v1
SkillWiki: A Living Knowledge Infrastructure for Agent Skills
While knowledge is managed through Wikipedia and software through GitHub, agent skills still lack an infrastructure for large-scale production, governance, and evolution. SkillWiki is a living knowledge infrastructure that supports the organization, grounding, and continuous evolution of agent skill...
📄 Research Jun 15, 2026 http://arxiv.org/abs/2606.16523v1
From Awareness to Adherence: Bridging the Context Gap in Spoken Dialogue Systems via Context-Aware Decoding
Despite the success of end-to-end (E2E) spoken dialogue systems, maintaining strict context adherence in multi-round conversations remains a challenge. While prior works attribute these failures to models forgetting dialogue history, we highlight an equally critical but overlooked bottleneck: a gap ...
📄 Research Jun 15, 2026 http://arxiv.org/abs/2606.16472v1
Taylor-Calibrate: Principled Initialization for Hybrid Linear Attention Distillation
Hybrid linear attention models offer an appealing path to faster long-context inference: they reduce the quadratic cost and KV-cache burden of full softmax attention while retaining much of the quality of Transformer models. A practical way to obtain such models is to convert a pretrained Transforme...
📄 Research Jun 15, 2026 http://arxiv.org/abs/2606.16429v1
LectūraAgents: A Multi-Agent Framework for Adaptive Personalized AI-Assisted Learning and Embodied Teaching
Effective personalized AI-assisted learning demands systems that can not only generate accurate learner-specific educational materials, but also dynamically adapt their instruction to diverse learners. However, existing educational agents have primarily focused on lecture content automation and simu...
📄 Research Jun 15, 2026 http://arxiv.org/abs/2606.16428v1
Semantic Flip: Synthetic OOD Generation for Robust Refusal in Embodied Question Answering and Spatial Localization
Detecting unanswerable user queries remains essential for the reliable deployment of real-world embodied agents. However, modern vision-language models (VLMs) often generate overly confident answers even when the available visual memory cannot support the query. Such overconfidence poses various tas...
📄 Research Jun 15, 2026 http://arxiv.org/abs/2606.16898v1
Latent Space Reinforcement Learning for Inverse Material Estimation in Food Fracture Simulation
Realistic visual simulation of food manipulation requires accurate material parameters, yet these are difficult to measure directly and vary across the heterogeneous regions of a single food item. We address the inverse problem of estimating material parameters from a target description of fracture ...
📄 Research Jun 15, 2026 http://arxiv.org/abs/2606.16870v1
Federated Medical Image Segmentation under Real-World Label Noise: A Benchmark Suite for Noisy Label Learning Method Selection
While federated learning (FL) enables collaborative medical image segmentation without centralizing sensitive data, real-world deployment is frequently complicated by cross-site label imperfections such as contour disagreement, missing or additional structures, and confused labels. Federated noisy l...
📄 Research Jun 15, 2026 http://arxiv.org/abs/2606.16868v1
Redirecting the Flow: Image Customization through Attention Distribution Shift
Subject-driven image customization aims to generate images that not only follow textual instructions but also preserve the identity of a given reference subject. Existing approaches, including test-time fine-tuning, encoder-based methods, and token competition in shared attention spaces, suffer from...
📄 Research Jun 15, 2026 http://arxiv.org/abs/2606.16866v1
Decoupling Semantics from Distortions: Multi-Scale Two-Stream Vision-Language Alignment for AI-Generated Image Quality Assessment
Existing vision-language model (VLM)-based AI-generated image quality assessment (AIGIQA) methods suffer from a fundamental semantic-distortion dimensional conflict: monolithic representations optimized for semantic discrimination inherently entangle compositional understanding with low-level percep...
📄 Research Jun 15, 2026 http://arxiv.org/abs/2606.16799v1
LLM-Based Visual Explanation Evaluation Framework for Assessing the Explainability of Facial Skin Disease Classification Models
This study proposes a domain-specific LLM-based Visual Explanation Evaluation Framework for assessing Grad-CAM explanations in facial skin disease diagnosis models. While previous studies have primarily focused on improving classification performance through data augmentation techniques, relatively ...
📄 Research Jun 15, 2026 http://arxiv.org/abs/2606.16794v1
Gen-VCoT: Generative Visual Chain-of-Thought Reasoning via Diffusion-Based RGB Intermediate Representations
Multimodal large language models (MLLMs) excel at visual reasoning but rely on text-based chain-of-thought (CoT), lacking interpretable visual intermediates. Existing methods use opaque tokens or external tools, missing key properties. We propose Gen-VCoT, a framework using expert vision models to g...
📄 Research Jun 15, 2026 http://arxiv.org/abs/2606.16783v1
Revealing Artifacts via Noise Amplification: A Novel Perspective for AI-Generated Video Detection
With the rapid advancement of video generation models, distinguishing between AI-generated and authentic videos has emerged as a challenging endeavor. The majority of existing research endeavors concentrate on the development of detectors for identifying samples generated by generative adversarial n...
📄 Research Jun 15, 2026 http://arxiv.org/abs/2606.16742v1
MMDiff: Extending Diffusion Transformers for Multi-Modal Generation
Diffusion transformers have demonstrated remarkable generative capabilities, yet the rich perceptual representations computed across their denoising trajectory are discarded once the content is rendered. We present MMDiff, a framework that transforms a frozen diffusion transformer into a multi-modal...
📄 Research Jun 15, 2026 http://arxiv.org/abs/2606.16673v1
Look Again Before You Abstain: Budgeted Conformal Evidence Acquisition for Reliable Vision-Language Model
Large vision-language models (LVLMs) hallucinate: they assert visual details that the image does not support. A principled remedy is selective prediction with a distribution-free guarantee: verify each claim and abstain when the claim is not grounded, so that the hallucination rate among asserted cla...
📄 Research Jun 15, 2026 http://arxiv.org/abs/2606.16667v1
SUP-MCRL: Subject-aware Unified Pseudo-feature Coded Multimodal Contrastive Representation Learning for EEG Visual Decoding
Non-invasive brain-computer interfaces suffer severe fidelity degradation in neural visual decoding when generalizing to natural visual experiences. Conventional multimodal contrastive representation learning solely optimizes geometric distance alignment, neglecting semantic consistency and subject ...
📄 Research Jun 15, 2026 http://arxiv.org/abs/2606.16615v1
DifferAD-R1: A Difference-Guided Industrial Anomaly Localization with Multimodal Large Language Models
Industrial anomaly localization aims to accurately identify and localize abnormal regions in industrial products, addressing the critical challenge of detecting unseen defect categories in real-world scenarios. Traditional closed-set methods often suffer from poor cross-scenario generalization, whil...
📄 Research Jun 15, 2026 http://arxiv.org/abs/2606.16601v1
LOCUS: Local Visual Cue Search for Enhancing Fine-Grained Perception in Multimodal Large Language Models
Multimodal Large Language Models (MLLMs) remain unreliable on fine-grained visual perception, even when high-resolution inputs preserve the necessary local details. We identify this limitation as visual context rot: decisive evidence may exist in the full image, yet fail to be reliably selected and ...
📄 Research Jun 15, 2026 http://arxiv.org/abs/2606.16586v1
Multi-Modal Spatio-Temporal Graph Neural Network with Mixture of Experts for Soil Organic Carbon Prediction
Top-soil organic carbon (SOC) prediction is fundamental to agricultural sustainability, land use policy and fertilization planning. Existing approaches face two limitations: they pair hand-crafted covariates with classical ML or single-modal deep models that miss rich spectral and temporal informati...
📄 Research Jun 15, 2026 http://arxiv.org/abs/2606.16580v1
Transformation-driven generation of comparable projection images from multimodal anatomical scenes
This work addresses the computational problem of generating reproducible projection-space observations from heterogeneous anatomical scenes whose components may undergo independent spatial transformations. We propose a transformation-driven framework for synthetic projection imaging from multimodal ...
📄 Research Jun 15, 2026 http://arxiv.org/abs/2606.16573v1
Assessing Reliability of Symbol Detection in Concept Bottleneck Models
Concept Bottleneck Models (CBMs) are a relevant tool for explainable Artificial Intelligence because they make their predictions through human-interpretable symbols. However, high task accuracy does not guarantee that these symbols are detected faithfully: jointly trained CBMs may encode task-specif...
📄 Research Jun 15, 2026 http://arxiv.org/abs/2606.16535v1
Unified Multimodal Model for Brain MRI Imputation and Understanding
Multimodal large language models (MLLMs) hold great potential for medicine, as they inherit knowledge from LLMs and allow multiple data modalities to be integrated, analysed and interpreted in natural language. However, the field of medical MLLMs is constrained by non-trivial challenges, notably the ...
📄 Research Jun 15, 2026 http://arxiv.org/abs/2606.16484v1
ResEdit: Residual embeddings for precise generative image editing
Conditional diffusion image generators can be repurposed for editing through inversion, without the need for large-scale paired fine-tuning data. However, producing high-quality, targeted edits while maintaining image identity and global consistency remains challenging, as weakly conditioned inversi...
📄 Research Jun 15, 2026 http://arxiv.org/abs/2606.16457v1
Hierarchical Fine-Grained Aerial Object Detection
Fine-grained aerial object detection, driven by the intrinsic granularity of real-world object categories, is crucial for advanced scene understanding in remote sensing. Existing methods largely inherit the paradigm of coarse-grained object detection, relying solely on single-label supervision and t...
📄 Research Jun 15, 2026 http://arxiv.org/abs/2606.16448v1
Beer-Lambert Guided Representation Learning for Unsupervised Anomaly Detection in Sub-THz Food Inspection Images
Food manufacturing requires reliable inspection systems to detect foreign material contamination and maintain product safety. Sub-THz transmission imaging provides material-dependent attenuation characteristics that are useful for detecting low-density contaminants in food products. However, existin...
📄 Research Jun 15, 2026 http://arxiv.org/abs/2606.16421v1
Instance-Aware Knowledge Distillation for Semi-Supervised Learning of an On-Board Multi-Task Dense Prediction Model for Collision Avoidance System
Collision avoidance systems have evolved toward camera-based deep learning approaches for driving scene understanding. However, deployment in edge environments such as country clubs is constrained by limited computational resources and unreliable communication infrastructure. Moreover, constructing ...
📄 Research Jun 15, 2026 http://arxiv.org/abs/2606.16414v1
GraphBEV++: Multi-Modal Feature Alignment for Autonomous Driving
Feature misalignment in BEV perception is a critical yet often overlooked challenge in autonomous driving, especially under calibration uncertainties between LiDAR and camera sensors. To address this issue, we propose a robust multi-modal fusion framework, GraphBEV++, which systematically mitigates ...
📄 Research Jun 15, 2026 http://arxiv.org/abs/2606.16354v1
Robust Spoofed Speech Detection via Temporal Pyramid Modeling
Spoofed speech detection is increasingly challenged by realistic synthesis, voice conversion, and replay attacks, with cross-dataset generalization remaining a major limitation. In this work, we propose a Temporal Pyramid Adapter that utilizes parallel temporal convolutions with varying receptive fields ...
📄 Research Jun 15, 2026 http://arxiv.org/abs/2606.16837v1
WaveDINO: Learning-Based Atmospheric Correction of Unwrapped InSAR Interferograms Validated by GNSS: Results at Laguna del Maule and Campi Flegrei Volcanoes
Interferometric Synthetic Aperture Radar (InSAR) enables effective monitoring of volcanic deformation; however, the observed signals are often corrupted by atmospheric phase delays, seasonal surface changes, and decorrelation effects. Existing atmospheric correction methods, such as numerical weathe...
📄 Research Jun 15, 2026 http://arxiv.org/abs/2606.16795v1
Text-Vision Co-Instructed Image Editing
Existing image editing methods can be generally categorized into textual instruction-based and visual prompt-based ones. Textual instructions are semantically expressive, but are limited by the coarse granularity of spatial control of the editing results. In contrast, visual prompts such as drag and...
📄 Research Jun 15, 2026 http://arxiv.org/abs/2606.16767v1
3D Classification of Paramagnetic Rim Lesions in Multiple Sclerosis via Asymmetric QSM-FLAIR Modeling
Paramagnetic rim lesions (Rim+) identified on susceptibility-sensitive MRI have recently emerged as a specific biomarker of chronic active inflammation in Multiple Sclerosis (MS) and are associated with long-term disability progression. However, susceptibility imaging and expert interpretation re...
📄 Research Jun 15, 2026 http://arxiv.org/abs/2606.16756v1
Structure-aware Knowledge-guided Heterogeneous Mamba for Zygomaticomaxillary Suture Assessment
The Zygomaticomaxillary Suture is a key circummaxillary structure that connects the zygomatic bone and the maxilla. It serves as a primary site of resistance during maxillary advancement, and its maturation status directly influences the timing and efficacy of orthopedic interventions. However, a...
📄 Research Jun 15, 2026 http://arxiv.org/abs/2606.16749v1
PATCH: Action-Chunk-Conditioned Latent Patch Innovation Monitoring for Robot Manipulation
Learning-based manipulation policies have made substantial progress in real-world robot manipulation, particularly for short-horizon action generation. However, deployment in open workspaces remains fragile under unexpected local scene dynamics, such as moving objects, transient occlusions, or distu...
📄 Research Jun 15, 2026 http://arxiv.org/abs/2606.16690v1
Sinkhorn-CPD: Robust point cloud registration via unbalanced entropic optimal transport
Coherent Point Drift (CPD) is widely used for rigid point cloud registration because of its soft correspondences and closed-form parameter updates. However, CPD's target-side marginal constraint forces every observation, including outliers, to receive exactly unit probability mass. This assumption d...
📄 Research Jun 15, 2026 http://arxiv.org/abs/2606.16672v1
Vision-Language Models as Zero-Annotation Oracles in Histopathology
Foreground segmentation is the critical first step of every computational pathology pipeline, yet existing methods rely on hand-tuned heuristics or supervised models that overfit to narrow stain and scanner distributions, failing silently on specialised stains such as Jones silver or Elastica van Gi...
📄 Research Jun 15, 2026 http://arxiv.org/abs/2606.16658v1
DCP-Prune: Ultra-Low Token Pruning with Distribution Consistency Preservation
Recent vision token pruning methods effectively preserve model performance under moderate token budgets but become unstable under ultra-low token budgets. Our analysis shows that as the pruning budget decreases, accuracy degradation is often accompanied by larger feature distribution shifts. Critical...
📄 Research Jun 15, 2026 http://arxiv.org/abs/2606.16633v1
Rotational Symmetry based Object Pose Estimation from Point Clouds in the Absence of Known 3D Models
Object pose estimation is crucial to many industrial applications, with one example being automated spray painting using a robot. However, confidentiality concerns often limit access to high-quality 3D models, posing a significant challenge for point-cloud-based pose estimation. In such scenarios, r...
📄 Research Jun 15, 2026 http://arxiv.org/abs/2606.16593v1
PROSE: Training-Free Egocentric Scene Registration with Vision-Language Models
Registering two captures of the same indoor space taken at different times underpins persistent spatial memory for robots and AR systems, yet the realistic version of this task is egocentric and its most scalable form is RGB-only. Head-mounted cameras yield blurry, fast-moving, partially overlapping...
📄 Research Jun 15, 2026 http://arxiv.org/abs/2606.16569v1
Local-GS: Accelerating 3D Gaussian Splatting via Tile-Local Warp Coherence
3D Gaussian Splatting (3DGS) has significantly advanced real-time novel view synthesis by representing scenes as dense collections of anisotropic 3D Gaussian primitives. However, the irregular spatial distribution of Gaussians often leads to poor GPU utilization, as warp divergence and redundant com...
📄 Research Jun 15, 2026 http://arxiv.org/abs/2606.16566v1
Kairos: A Native World Model Stack for Physical AI
World models are transitioning from passive visual generators to foundational, operational infrastructure for Physical AI: they must natively acquire world knowledge from heterogeneous experience, maintain persistent states over long horizons, and execute efficiently within real deployment constrain...
📄 Research Jun 15, 2026 http://arxiv.org/abs/2606.16533v1