AI News Archive: June 16, 2026 — Part 22

Sourced from 500+ daily AI sources, scored by relevance.

Qwen-RobotManip Technical Report: Alignment Unlocks Scale for Robotic Manipulation Foundation Models
Foundation models in language and multimodality achieve strong generalization by aligning heterogeneous data under a unified formulation and training at scale. In this report, we investigate whether this scaling recipe can be applied to robotic manipulation to achieve genuine generalization. This is...
📄 Research Jun 16, 2026 http://arxiv.org/abs/2606.17846v1
MaineCoon: Pursuing A Real-Time Audio-Visual Social World Model
As an increasing majority of global video content is consumed on social platforms for interactive social purposes, video generation models built for social worlds are important but largely overlooked by previous studies. In this work, we define the position of social world models and build a prototy...
📄 Research Jun 16, 2026 http://arxiv.org/abs/2606.17800v1
ED3R: Energy-Aware Distributed Disaster Detection Enabled by Cooperative Robotic Agents
Robots are expected to support environmental monitoring and natural disaster management, where decisions must be made under uncertainty, resource limitations, and strict operational constraints. In critical missions, such as wildfires, robotic agents must not only identify hazardous events with su...
📄 Research Jun 16, 2026 http://arxiv.org/abs/2606.17739v1
ActWorld: From Explorable to Interactive World Model via Action-Aware Memory
Interactive world models aim to simulate environment dynamics under real-time user actions. However, their action vocabulary is largely confined to navigation: most actions correspond to motion (e.g., walk, turn, look around), while interaction with objects in the scene (e.g., pick up plates, open d...
📄 Research Jun 16, 2026 http://arxiv.org/abs/2606.17730v1
Structured Adversarial Camouflage via Voronoi Diagrams
Pixel-wise adversarial patches are computationally heavy and often visually detectable, limiting utility in security-critical systems. We present adversarial Voronoi camouflage that optimizes only seed-point locations under fixed, printable palettes using a soft assignment, producing structured, spl...
📄 Research Jun 16, 2026 http://arxiv.org/abs/2606.17711v1
Do We Really Need Diffusion? A Fast U-Net for Paired Medical Image Translation
Magnetic resonance imaging-signal fat fraction (MRI-SFF) quantifies tissue fat and serves as an established biomarker for metabolic and musculoskeletal disorders. The acquisition requires, however, specialized MRI sequences, which are not available routinely. We investigate whether SFF can be estima...
📄 Research Jun 16, 2026 http://arxiv.org/abs/2606.17675v1
Divide, Deliberate, Decide: A Multi-Agent Framework for Fine-Grained Egocentric Action Recognition
Fine-grained action recognition in egocentric video is challenging for Vision-Language Models (VLMs): actions often differ only in small visual cues, and a single model tends to be biased toward a subset of these cues. We propose Divide, Deliberate, Decide, a fully-local, zero-shot multi-agent frame...
📄 Research Jun 16, 2026 http://arxiv.org/abs/2606.17627v1
RAVA: Retrieval-Augmented Viewpoint Alignment for Subject-Driven Image Generation
Reference-driven image generation has made rapid progress on identity preservation, but reliable viewpoint control across different subjects remains poorly understood. The difficulty is not merely generating a new image of the target subject: the model must infer the implicit viewpoint of one subjec...
📄 Research Jun 16, 2026 http://arxiv.org/abs/2606.17619v1
Rethinking Dataset Distillation for Classification: Do Distilled Sets Outperform Coresets?
Dataset distillation (DD) has emerged as a prominent approach in data centric machine learning, aiming to synthesize compact training sets for efficient training by compressing the information in large datasets into a small number of synthetic samples. However, DD methods are often evaluated under i...
📄 Research Jun 16, 2026 http://arxiv.org/abs/2606.18209v1
Multi-Source Cybersecurity Logs: An ATT&CK-Labeled Dataset and SLM Evaluation
Multi-stage cyberattacks span system, network, and browser logs. Detecting them requires correlating events across all three sources. Machine learning methods can learn these cross-source patterns, but they need labeled multi-source data. Existing public datasets fall short. Network-only datasets su...
📄 Research Jun 16, 2026 http://arxiv.org/abs/2606.18190v1
A Convex Quasilinearization Method for Solving Nonlinear PDEs with Physics-Informed Neural Networks
We present a numerical method for the forward solution of nonlinear partial differential equations (PDEs) in which Bellman-Kalaba quasilinearization reduces the nonlinear problem to a sequence of linear subproblems, each discretized by collocation onto a trial space that is linear in its parameters ...
📄 Research Jun 16, 2026 http://arxiv.org/abs/2606.18175v1
Tadka
Ship 10x more ad creative, without hiring a design team.
🧰 Tools Jun 16, 2026 https://www.producthunt.com/products/tadka?utm_campaign=producthunt-api&utm_medium=api-v2&utm_source=Application%3A+the500feed+%28ID%3A+283491%29
Evaluating Open-Source LLMs for Multi-Label ATT&CK Technique Classification on CTI Reports
Classifying Cyber Threat Intelligence (CTI) using MITRE Adversarial Tactics, Techniques, and Common Knowledge (ATT&CK) is essential for proactive defense, but historically required extensive human effort. Pre-Large Language Model (LLM) automation sped up this process, but could not resolve the compl...
📄 Research Jun 16, 2026 http://arxiv.org/abs/2606.18166v1
Deep Reinforcement Learning for Minimum Zero-Forcing Sets
This paper explores the problem of finding the minimum zero-forcing set on undirected graphs and proposes an adapted machine-learning framework to solve the problem. The minimum zero-forcing set problem is a graph coloring problem where the color of an initial set of nodes propagates throughout a ne...
📄 Research Jun 16, 2026 http://arxiv.org/abs/2606.18106v1
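For readers unfamiliar with zero forcing, the sketch below illustrates the textbook color-change rule (a colored node with exactly one uncolored neighbor forces that neighbor to become colored) together with a brute-force search for a smallest forcing set. It is not the paper's reinforcement-learning method; the function names and the adjacency-dict input format are assumptions made for illustration, and the exhaustive search is only practical on very small graphs.

```python
from itertools import combinations


def zero_forcing_closure(adj, initial):
    """Apply the standard color-change rule until nothing changes.

    adj: dict mapping each node to a list of its neighbors (undirected graph).
    initial: iterable of initially colored nodes.
    """
    colored = set(initial)
    changed = True
    while changed:
        changed = False
        for v in list(colored):
            uncolored = [u for u in adj[v] if u not in colored]
            # A colored node forces its neighbor only if that neighbor
            # is its single remaining uncolored neighbor.
            if len(uncolored) == 1:
                colored.add(uncolored[0])
                changed = True
    return colored


def minimum_zero_forcing_set(adj):
    """Exhaustive baseline (hypothetical helper): try sets by increasing size."""
    nodes = sorted(adj)
    for size in range(len(nodes) + 1):
        for candidate in combinations(nodes, size):
            if zero_forcing_closure(adj, candidate) == set(nodes):
                return set(candidate)
    return set(nodes)


# Path graph 0-1-2-3 and cycle graph 0-1-2-3-0.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
cycle = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
path_set = minimum_zero_forcing_set(path)
cycle_set = minimum_zero_forcing_set(cycle)
```

On a path, a single endpoint suffices, while a cycle needs two adjacent nodes; the exponential cost of this search is the usual motivation for heuristic or learned solvers.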
From Reasoning Traces to Reusable Modules: Understanding Compositional Generalization in Language Model Reasoning
Post-training pipelines that combine supervised fine-tuning (SFT) with reinforcement learning (RL) have emerged as the key recipe for transforming large language models (LLMs) into robust reasoners. We argue that this combined success is driven by compositional generalization, which we formalize thr...
📄 Research Jun 16, 2026 http://arxiv.org/abs/2606.18089v1
Edge Flow: A Tractable and Predictive Continuous-Time Model for Gradient Descent at the Edge of Stability
Gradient descent in deep learning may operate at the edge of stability (EoS), a regime in which the largest eigenvalue of the loss Hessian hovers near the stability threshold $2/η$, where $η$ is the learning rate. Classical analysis tools such as gradient flow and the descent lemma do not apply here...
📄 Research Jun 16, 2026 http://arxiv.org/abs/2606.18080v1
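The stability threshold $2/η$ mentioned in the Edge Flow summary can be seen on a one-dimensional quadratic, where the Hessian is a single number. The sketch below is a toy illustration of that classical fact only, not the Edge Flow model from the paper; the function name and the chosen values are assumptions.

```python
def gd_on_quadratic(sharpness, lr, steps=50, x0=1.0):
    """Run gradient descent on f(x) = 0.5 * sharpness * x**2.

    Each step multiplies x by (1 - lr * sharpness), so the iterates shrink
    when sharpness < 2 / lr and grow when sharpness > 2 / lr.
    """
    x = x0
    for _ in range(steps):
        x -= lr * sharpness * x  # gradient of f is sharpness * x
    return x


lr = 0.1                      # stability threshold is 2 / lr = 20
below = gd_on_quadratic(sharpness=19.0, lr=lr)   # just under the threshold
above = gd_on_quadratic(sharpness=21.0, lr=lr)   # just over the threshold
```

With these values the first run oscillates in sign while shrinking toward zero and the second oscillates while growing, which is why a largest Hessian eigenvalue hovering near $2/η$ sits outside what the descent lemma covers.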
ConTex: Reformulating Counterfactual Generation For Time Series Forecasting
Decision-making with deep learning-based time series forecasting requires not only accurate predictions but also actionable insights. However, current architectures do not inherently provide such information. Specifically, guidance is needed on how current conditions must be modified to shift from a...
📄 Research Jun 16, 2026 http://arxiv.org/abs/2606.18049v1
INI-VPINN: A Variational Physics-Informed Neural Network with Implicit Neumann and Interface Handling for Multi-Material Domains with Geometric Singularities
We propose a new weak-form Physics-Informed Neural Network approach (named INI-VPINN). INI-VPINN naturally incorporates Neumann boundary and interface conditions into the variational formulation. It removes the need for additional loss terms or multiple subdomain networks. This framework employs com...
📄 Research Jun 16, 2026 http://arxiv.org/abs/2606.18032v1
SoftMoE: Soft Differentiable Routing for Mixture-of-Experts in LLMs
Sparse Mixture-of-Experts (MoE) architectures enable scaling LLM parameters under a fixed inference budget by activating only a small subset of experts via top-$k$ routing. While this preserves causality and suits autoregressive language models, the discrete top-$k$ operator is not differentiable, f...
📄 Research Jun 16, 2026 http://arxiv.org/abs/2606.17952v1
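As background for the SoftMoE summary, here is a minimal sketch of conventional hard top-$k$ routing for a single token. It shows the baseline the abstract contrasts against, not the paper's soft routing; the function name is hypothetical, and normalizing with a softmax over only the selected logits is one common variant assumed here.

```python
import math


def top_k_route(logits, k):
    """Pick the k experts with the largest router logits for one token.

    Returns a dict {expert_index: gate_weight}. Gate weights are a softmax
    over the selected logits only (an assumed, commonly used variant).
    """
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    m = max(logits[i] for i in top)  # subtract the max for numerical stability
    exps = {i: math.exp(logits[i] - m) for i in top}
    total = sum(exps.values())
    return {i: e / total for i, e in exps.items()}


# Four experts, two active per token.
gates = top_k_route([2.0, 0.5, 1.0, -1.0], k=2)
```

The set of selected experts changes abruptly when two logits swap order, so the selection step passes no gradient to the experts that were not chosen; this is the non-differentiability the abstract refers to.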
Monotonic Kolmogorov-Arnold Networks: A Theoretical and Empirical Study of Monotonicity as an Inductive Bias
Monotonicity has been a long-running architectural inductive bias for neural networks, motivated by tabular, scientific, and economic settings where outputs are known to respond monotonically to certain inputs. Existing approaches are MLP- or flow-based and lack per-edge functional transparency; the...
📄 Research Jun 16, 2026 http://arxiv.org/abs/2606.17886v1
AnchorKV: Safety-Aware KV Cache Compression via Soft Penalty with a Refusal Anchor
Large language models (LLMs) outperform earlier architectures on generative inference and long-context tasks, but their large size introduces significant challenges in memory usage, energy cost, and on-device deployment. Since scaling pre-trained language models improves downstream capability...
📄 Research Jun 16, 2026 http://arxiv.org/abs/2606.17872v1
Meta-classification of one-class classification models using ranking correlation and nearest neighbor
Machine Learning (ML) techniques have been applied to various problems. However, applying ML to ML models is an unexplored direction. For this purpose, this paper considers a meta-classification of one-class classification (OCC) models, because all ML models could be approximated as OCC models. The ...
📄 Research Jun 16, 2026 http://arxiv.org/abs/2606.17858v1
From Drift to Coherence: Stabilizing Beliefs in LLMs
Large language models (LLMs) are often hypothesized to perform implicit Bayesian inference, yet a key coherence condition, the martingale property of predictive beliefs, has been shown to fail in controlled synthetic in-context learning settings. We revisit this question in a more typical usage regi...
📄 Research Jun 16, 2026 http://arxiv.org/abs/2606.17832v1
A Diffusion Approximation for Temporal-Difference Learning with Linear Features under Markovian Noise
Temporal difference (TD) learning with linear function approximation is a core method for policy evaluation. Its classical continuous-time description is an ordinary differential equation (ODE), which captures the asymptotic mean dynamics but neglects stochastic fluctuations determining the error fl...
📄 Research Jun 16, 2026 http://arxiv.org/abs/2606.18183v1
Tensor-based second-order causal discovery
Causal discovery seeks to uncover the causal dependencies among variables. For this purpose, we propose an algorithm called Tensor-based Second-order Causal Discovery (TSCD). Its input is a tensor obtained from the covariance matrices of observational and interventional data. Assuming the causal dep...
📄 Research Jun 16, 2026 http://arxiv.org/abs/2606.18074v1
NoiseTilt: Noise-Tilted Reverse Kernels for Diffusion Reward Alignment
We introduce the Noise-Tilted Reverse Kernel (NTRK), a reward-guided diffusion sampler that injects reward gradients through the noise term, leaving the pretrained reverse kernel unchanged and requiring only a single sample per step. Reward-guided sampling at inference time has greatly expanded the ...
📄 Research Jun 16, 2026 http://arxiv.org/abs/2606.18066v1
Uncertainty Quantification for Flow-Based Vision-Language-Action Models
Vision-language-action models (VLAs) combine vision-language backbones with expressive generative action heads trained via flow matching on large-scale robotic datasets. Despite their strong empirical performance in robotic manipulation, VLAs lack mechanisms to quantify confidence in their predictio...
📄 Research Jun 16, 2026 http://arxiv.org/abs/2606.18043v1
agentbrowse
Give your AI coding agent the web as a command line
🧰 Tools Jun 16, 2026 https://www.producthunt.com/products/agentbrowse?utm_campaign=producthunt-api&utm_medium=api-v2&utm_source=Application%3A+the500feed+%28ID%3A+283491%29
Half a Link can Be Enough to Predict a Whole Link: Understanding Generalization in Knowledge Graph Foundation Models
Knowledge graph (KG) foundation models (KGFMs) are zero-shot generalizers: trained once, they can predict links on unseen graphs without retraining. However, understanding when and how they can robustly generalize across KGs is still an open question. In this paper, we shed some light on their gener...
📄 Research Jun 16, 2026 http://arxiv.org/abs/2606.18001v1
Differential Privacy of Gaussian Process Posterior Sampling
We study the privacy of releasing posterior sample paths from a Gaussian process (GP) when the entire training set including covariates and responses is private. Unlike standard differential-privacy (DP) mechanisms that add external noise, posterior sampling is random by construction. We show that t...
📄 Research Jun 16, 2026 http://arxiv.org/abs/2606.17995v1
Predictive Analytics in E-Commerce for Customer Behavior Forecasting using hybrid Ret-DNN with XGBoost Model
In recent years, electronic commerce (e-commerce) services have become increasingly common in people's daily lives, helping them to purchase products online. However, retail platforms have struggled to understand customer behavior, which makes it difficult to predict future purchases. To overcome these chal...
📄 Research Jun 16, 2026 http://arxiv.org/abs/2606.17931v1
KANLib -- A Modular, Extensible and Fast Kolmogorov-Arnold Network Implementation
Kolmogorov-Arnold Networks (KANs) have recently emerged as a promising alternative to traditional multilayer perceptrons by replacing linear weights with learnable univariate functions. Despite their theoretical advantages in interpretability and expressiveness, practical research of KANs remains di...
📄 Research Jun 16, 2026 http://arxiv.org/abs/2606.17927v1
Dimensionality Controls When Modularity Helps in Continual Learning
Compositional learning systems must balance plasticity, the ability to acquire new knowledge, with stability, the preservation of previously learned components, especially when tasks share structure and risk interference. We study how modular architecture, task similarity, and representational dimen...
📄 Research Jun 16, 2026 http://arxiv.org/abs/2606.17889v1
Revisiting Structural Dependency in Autoregressive Multi-Task Table Recognition via Order-Independent Cell-Level Representations
Multi-task table recognition jointly addresses table structure prediction, cell localization, and cell content recognition within a unified framework. Existing approaches often rely on autoregressive decoders to generate table structures and reuse their hidden states for cell localization and conten...
📄 Research Jun 16, 2026 http://arxiv.org/abs/2606.17874v1
WAM-RL: World-Action Model Reinforcement Learning with Reconstruction Rewards and Online Video SFT
Recent World-Action (WA) models demonstrate strong generalization ability and data efficiency, but they typically rely on expert trajectories for training. This reliance limits their ability to acquire fine-grained manipulation skills beyond the demonstration distribution and prevents them from cont...
📄 Research Jun 16, 2026 http://arxiv.org/abs/2606.17906v1
Learn to Quantify Social Interaction with Constraints for Pedestrian Walking
Long-term human path forecasting in crowds is critical for autonomous moving platforms (such as self-driving cars and social robots) to avoid collisions and plan effectively. Although current research takes social interactions into account for prediction, it does not reveal the exact k...
📄 Research Jun 16, 2026 http://arxiv.org/abs/2606.17897v1
MuseVLA: An Adaptive Multimodal Sensing Vision-Language-Action Model for Robotic Manipulation
Humans naturally leverage diverse sensing modalities to interact with the physical world, while most Vision-Language-Action (VLA) models for robotics rely solely on RGB observations. This limits their ability to perceive physical properties that are difficult or impossible to infer from RGB cameras,...
📄 Research Jun 16, 2026 http://arxiv.org/abs/2606.17598v1
MagicSim: A Unified Infrastructure for Executable Embodied Interaction
Robot learning and embodied agents now require simulation to serve as a shared execution substrate linking control, skills, and planning, not only as a renderer, controller testbed, or fixed task environment. Existing pipelines split these layers with "magic" actions, disconnected training environme...
📄 Research Jun 16, 2026 http://arxiv.org/abs/2606.17511v1
When Robots Sleep: Offline Skill Consolidation for Shared-Policy Robot Learning
Robots that learn over long deployments must add new skills without losing the shared policy structure that makes earlier skills reusable. We study sequential robot skill learning, where previous trajectories and task losses may be unavailable, and the deployed policy must remain a single shared con...
📄 Research Jun 16, 2026 http://arxiv.org/abs/2606.17493v1
Embodiment Shapes Rolling Behavior in a Multimodal Infant Model
Rolling over is one of the earliest milestones in infant motor development, reflecting the emergence of coordinated, whole-body sensorimotor control. Here, we conduct a computational study of infant rolling using MIMo, a virtual infant embodiment equipped with proprioception and vestibular sensation...
📄 Research Jun 16, 2026 http://arxiv.org/abs/2606.17456v1
TerraTransfer: Learning End-to-End Driving Policies Without Expert Demonstrations
End-to-end autonomous driving has achieved state-of-the-art performance on benchmarks and real-world deployments. Its standard training recipe, however, is expensive across all stages: collecting and labeling millions of driving frames is costly, and closed-loop RL on images is bottlenecked by the p...
📄 Research Jun 16, 2026 http://arxiv.org/abs/2606.17386v1
EgoInfinity: A Web-Scale 4D Hand-Object Interaction Data Engine for Any-View Robot Retargeting and Video-to-Action Robot Learning
Internet videos constitute the largest reservoir of embodied human manipulation knowledge, yet converting arbitrary RGB footage into actionable robot training data remains a major bottleneck. Existing lab- or factory-collected datasets are narrow in scale and diversity, limiting open-world robot lea...
📄 Research Jun 16, 2026 http://arxiv.org/abs/2606.17385v1
Contactless Respiratory Monitoring on Heterogeneous Mobile Robots: A Multimodal Edge-Computing Framework
Respiratory-rate (RR) monitoring is a critical component of remote triage and victim assessment in emergency response, disaster recovery, and infectious-disease scenarios, where minimizing physical contact can reduce responder risk and improve operational safety. However, field deployment of contact...
📄 Research Jun 16, 2026 http://arxiv.org/abs/2606.17376v1
ClientJam
AI-powered lead generation for designers and agencies
🧰 Tools Jun 16, 2026 https://www.producthunt.com/products/clientjam?utm_campaign=producthunt-api&utm_medium=api-v2&utm_source=Application%3A+the500feed+%28ID%3A+283491%29
EBench: Elemental Diagnosis of Generalist Mobile Manipulation Policies
We present EBench, a simulation benchmark that diagnoses generalist mobile manipulation policies beyond a single success-rate scalar. EBench comprises 26 diverse and challenging manipulation tasks annotated along 5 capability dimensions and 4 generalization dimensions. We evaluate state-of-the-art g...
📄 Research Jun 16, 2026 http://arxiv.org/abs/2606.18239v1
Beyond Failure Recovery: An Engagement-Aware Human-in-the-loop Framework for Robotic Systems
Conventional human-in-the-loop approaches typically involve users only when a robot encounters failure or uncertainty, treating humans primarily as tools for improving robot performance. However, in many human-centered robotics settings, interaction should support engagement by keeping users involve...
📄 Research Jun 16, 2026 http://arxiv.org/abs/2606.18189v1
A Hybrid Optimization Framework for Grasp Synthesis under Partial Observations
We propose a hybrid grasp synthesis framework that combines a learning-based Energy-Based Model (EBM) with an analytical Iterative Closest Point (ICP) method to generate robust grasps from partially observed point clouds. The learned energy function acts as a prior within a Stein Variational Gradien...
📄 Research Jun 16, 2026 http://arxiv.org/abs/2606.18053v1
LAGO Policy: Latency-Aware Asynchronous Diffusion Policies with Goal-Directed Collision-Free Planning for Smooth Manipulation
Diffusion-based visuomotor policies deployed with asynchronous inference often exhibit inter-chunk discontinuities and lack explicit mechanisms for obstacle-aware execution, leading to jerky motions and collisions that hinder reliable manipulation in real-world scenes. To address these issues, we pr...
📄 Research Jun 16, 2026 http://arxiv.org/abs/2606.17982v1
ThinkingVLA: Interleaved Vision and Language Reasoning for Robotic Manipulation
Most Vision-Language-Action (VLA) models map observations directly to actions without explicit reasoning, limiting their capacity for reasoning-intensive long-horizon tasks. To address this, existing approaches adopt Chain-of-Thought (CoT) reasoning to enable subgoal decomposition and spatial antici...
📄 Research Jun 16, 2026 http://arxiv.org/abs/2606.17937v1
SPARK: Low Latency Single-Camera 3D Pose Estimation for Autonomous Racing using Keypoints
In autonomous racing, fast detection of other participants' movements is required to plan safe, collision-free trajectories with non-cooperative opponents. LiDAR detection is inherently slower and harder to deploy on edge devices than vision methods, causing delayed detections that limit object trac...
📄 Research Jun 16, 2026 http://arxiv.org/abs/2606.17936v1