AI News Archive: June 16, 2026 — Part 20

Sourced from 500+ daily AI sources, scored by relevance.

Learning Fair Pareto-Optimal Policies in Multi-Objective Reinforcement Learning
Fairness is an important aspect of decision-making in multi-objective reinforcement learning (MORL), where policies must ensure both optimality and equity across multiple, potentially conflicting objectives. While single-policy MORL methods can learn fair policies for fixed user preferences using we...
📄 Research | Jun 16, 2026 | http://arxiv.org/abs/2606.18111v1
Querying an astronomical database using large language models: the ALeRCE text-to-SQL system
We develop a text-to-SQL (structured query language) system based on large language models (LLMs) using in-context learning and apply it to the Automatic Learning for the Rapid Classification of Events (ALeRCE) astronomical database. ALeRCE is a community broker for the Zwicky Transient Facility and...
📄 Research | Jun 16, 2026 | http://arxiv.org/abs/2606.18108v1
IsabeLLM: Automated Theorem Proving Applied to Formally Verifying Consensus
Advances in Artificial Intelligence (AI) have led AI for Theorem Proving to become a promising means of formally verifying computer systems. Whilst formal verification is traditionally reserved for safety-critical systems due to the required amount of expertise and effort, AI can help to automate a ...
📄 Research | Jun 16, 2026 | http://arxiv.org/abs/2606.18098v1
A Unified Framework for Context-Aware and Relation-Aware Graph Retrieval-Augmented Generation
Retrieval-Augmented Generation (RAG) has emerged as a paradigm for enhancing large language models (LLMs) with external knowledge, yet existing graph-based methods face a fundamental limitation: entity-centric and chunk-centric approaches operate on representations anchored to original text without ...
📄 Research | Jun 16, 2026 | http://arxiv.org/abs/2606.18075v1
Agentic AI-based Framework for Mitigating Premature Diagnostic Handoff and Silent Hallucination in Healthcare Applications
Recent advances in Large Language Models (LLMs) and multi-agent systems have driven the rise of Agentic AI, showing promise for medical reasoning. However, open-ended conversational agents remain prone to two critical failure modes: premature diagnostic handoff and silent clinical hallucinations tha...
📄 Research | Jun 16, 2026 | http://arxiv.org/abs/2606.18068v1
When LLMs Analyze Scars: From Images to Clinically-Meaningful Features
Medical image classification faces a fundamental dilemma: while deep learning models achieve remarkable performance at scale, real-world clinical scenarios often suffer from severe data scarcity due to annotation costs, privacy constraints, and disease rarity. This challenge is particularly pronounc...
📄 Research | Jun 16, 2026 | http://arxiv.org/abs/2606.18063v1
Security and Privacy Prompts in the Wild: What Users Ask LLMs and How LLMs Respond
Large language models (LLMs) are widely used to fulfill users' information needs; users ask LLMs about the weather, pose educational questions, and consult them for legal assistance. One particularly understudied area is digital security and privacy (S&P), where users may seek LLMs' help on how to s...
📄 Research | Jun 16, 2026 | http://arxiv.org/abs/2606.18062v1
PseudoBench: Measuring How Agentic Auto-Research Fuels Pseudoscience
As Large Language Model based agents enter autonomous scientific research, their ability to resist pseudoscience becomes increasingly important. Otherwise, such systems may rapidly generate plausible yet misleading studies that contaminate academic literature and erode trust in science. We present P...
📄 Research | Jun 16, 2026 | http://arxiv.org/abs/2606.18060v1
When AI Says "I have been in similar situations": Synthetic Lived Experience in Peer-Like Caregiver Support
Caregivers often turn to online communities for informational and emotional support. In these spaces, peer supporters frequently draw on personal narratives to respond to emotionally complex caregiving situations. As LLMs are increasingly designed as peer-like sources of support, they introduce a cr...
📄 Research | Jun 16, 2026 | http://arxiv.org/abs/2606.18057v1
ProvenanceGuard: Source-Aware Factuality Verification for MCP-Based LLM Agents
Tool-using LLM agents increasingly use the Model Context Protocol (MCP) to answer from heterogeneous evidence sources, including search, APIs, databases, clinical records, and formulary tools. Standard factuality metrics usually test whether an answer is supported by pooled evidence, missing a prove...
📄 Research | Jun 16, 2026 | http://arxiv.org/abs/2606.18037v1
When English Isn't the Best Teacher: Source Language Effects in Cross-Lingual In-Context Learning
Cross-lingual transfer in multilingual NLP has been widely explored in supervised fine-tuning contexts, where factors like data availability and linguistic similarity largely determine transfer quality. As the field shifts toward few-shot In-Context Learning (ICL), it is often presumed that insights...
📄 Research | Jun 16, 2026 | http://arxiv.org/abs/2606.18033v1
LegalHalluLens: Typed Hallucination Auditing and Calibrated Multi-Agent Debate for Trustworthy Legal AI
AI systems deployed in legal workflows hallucinate at rates that aggregate metrics report at ~52%, but this average conceals where errors concentrate and in which direction they run, leaving compliance officers without an actionable signal for trustworthy deployment. We present LegalHalluLens, an au...
📄 Research | Jun 16, 2026 | http://arxiv.org/abs/2606.18021v1
LLM Consumer Behavior Theory: Foundations of a Novel Research Field
Large language models (LLMs) are increasingly deployed as autonomous agents that make consumption decisions on behalf of users. This shift raises fundamental questions for consumer theory, which has traditionally modeled humans as the primary decision-makers. In this paper, we introduce LLM Consumer...
📄 Research | Jun 16, 2026 | http://arxiv.org/abs/2606.18005v1
C2FL: Clustered Continual Federated Learning under Spatial and Temporal Drift
Collective Adaptive Systems (CAS) increasingly rely on machine learning to let each node learn from locally sensed data, aligning its behavior with the surrounding environment. Scaling this intelligence, however, raises fundamental challenges: sensed data is often privacy-sensitive, preventing centr...
📄 Research | Jun 16, 2026 | http://arxiv.org/abs/2606.18003v1
Visual Verification Enables Inference-time Steering and Autonomous Policy Improvement
Robots deployed in the real world should learn from their experience and improve over time. This requires a mechanism of practicing and learning from feedback. In this paper, we propose VERITAS, a generator-verifier framework for generalist robot policies for inference-time policy steering and self-...
📄 Research | Jun 16, 2026 | http://arxiv.org/abs/2606.18247v1
EvolveNav: Proactive Preflection and Self-Evolving Memory for Zero-Shot Object Goal Navigation
Zero-Shot Object-Goal Navigation (ZS-OGN) requires embodied agents to explore and locate target objects without any prior training. To this end, recent methods leverage foundation models. But they typically rely on static priors and lack adaptation, which leads to repeated errors and costly trial an...
📄 Research | Jun 16, 2026 | http://arxiv.org/abs/2606.18235v1
whoburnedmore
Spotify Wrapped for Claude, Codex & a Public leaderboard.
🧰 Tools | Jun 16, 2026 | https://www.producthunt.com/products/whoburnedmore?utm_campaign=producthunt-api&utm_medium=api-v2&utm_source=Application%3A+the500feed+%28ID%3A+283491%29
Looped World Models
Current world models face a fundamental tension: faithful long-horizon simulation demands deep computation, but deeper models are expensive to deploy and prone to compounding errors. We resolve this by introducing Looped World Models (LoopWM), which are the first looped architectures for world model...
📄 Research | Jun 16, 2026 | http://arxiv.org/abs/2606.18208v1
Fixed-Point Reasoners: Stable and Adaptive Deep Looped Transformers
Looped architectures provide an inductive bias toward learning step-by-step procedures for tasks that require compositional reasoning. The number of effective layers reached by looping determines the quality of the solution these models find. Like deep architectures, looped architectures are prone t...
📄 Research | Jun 16, 2026 | http://arxiv.org/abs/2606.18206v1
DRFLOW: A Deep Research Benchmark for Personalized Workflow Prediction
Deep research (DR) systems are increasingly used for complex information-seeking tasks, but existing work mainly focuses on generating reports and summaries. In contrast, many enterprise tasks require an agent to identify a concrete workflow, that is, a sequence of action steps. For example, ra...
📄 Research | Jun 16, 2026 | http://arxiv.org/abs/2606.18191v1
Kolmogorov Regression for Robust Diffusion Policies
Finite-dimensional (FD) diffusion policies exhibit temporal drift owing to discretization artifacts that degrade long-horizon performance (when deployed on physical systems). We introduce a backward Kolmogorov equation that lifts diffusion policies to a Cameron-Martin space -- a subset of the Hilber...
📄 Research | Jun 16, 2026 | http://arxiv.org/abs/2606.18186v1
The Measurement Gap in the Automation of EU Law: Benchmarking Doctrinal Legal Reasoning under the EU AI Act
Large language models now produce legal text of at least median quality, yet no existing benchmark can evaluate whether they perform doctrinal legal reasoning, which forms the interpretive core of legal work, rather than the ancillary, paralegal tasks that most current legal-AI evaluations measure. ...
📄 Research | Jun 16, 2026 | http://arxiv.org/abs/2606.18158v1
Learning Cardiac Electrophysiology Digital Twins Through Agentic Discovery of Hybrid Structure
Building personalized cardiac electrophysiology (EP) digital twins requires identifying the appropriate model structure for each patient, not merely fitting parameters. Traditional methods rely on experts to manually prescribe hybrid physics-neural architectures, which requires deep domain expertise...
📄 Research | Jun 16, 2026 | http://arxiv.org/abs/2606.18154v1
Memory as a Wasting Asset: Pricing Flash Endurance for Embodied Agents, and the Limits of Doing So
A robot's flash endurance is a non-renewable stock: every persisted write spends one of a few thousand program/erase cycles and never refills, yet no fielded robot memory system prices which memories are worth an erase cycle. We treat embodied memory as depreciating capital and price that stock with...
📄 Research | Jun 16, 2026 | http://arxiv.org/abs/2606.18144v1
Your AI Travel Agent Would Book You a Bullfight: An Agentic Benchmark for Implicit Animal Welfare in Frontier AI Models
AI agents are moving from advisors to actors, booking travel, planning menus, and running procurement on behalf of users. Existing benchmarks for AI and animal welfare evaluate model text responses to question-answer prompts, leaving open whether the welfare reasoning surfaced in those responses tra...
📄 Research | Jun 16, 2026 | http://arxiv.org/abs/2606.18142v1
Ternary Mamba: Grouped Quantization-Aware Training of W1.58A16 State Space Models
State Space Models (SSMs) such as Mamba-2 offer linear-time inference but their memory footprint limits edge deployment. Prior ternary SSM work (Slender-Mamba) trains from scratch on 150B tokens; we show a pretrained checkpoint suffices, reducing the marginal token budget by 1,000x. Using grouped qu...
📄 Research | Jun 16, 2026 | http://arxiv.org/abs/2606.18114v1
Trust the Right Teacher: Quality-Aware Self-Distillation for GUI Grounding
Graphical user interface (GUI) grounding requires vision-language models (VLMs) to identify small target elements in high-resolution screenshots and predict precise screen coordinates. On-policy self-distillation (OPSD) is a promising post-training approach for this coordinate-sensitive task, since ...
📄 Research | Jun 16, 2026 | http://arxiv.org/abs/2606.18101v1
S4oP: Operator-level Pruning of Structured State Space Models for Resource-Constrained Devices
Structured State Space Models (SSMs), including the S4 and S4D architectures, have recently emerged as powerful alternatives to attention-based models for capturing long-range dependencies in sequential data. Despite their strong empirical performance, deploying these models in time- and resource-co...
📄 Research | Jun 16, 2026 | http://arxiv.org/abs/2606.18096v1
EAGG: Embodiment-Aligned Grasp Generation via Geometry-Aware Graph Conditioning
Cross-end-effector grasp generation seeks a unified model that generalizes across objects and across embodiments ranging from parallel grippers to dexterous end effectors. Existing grasp generators are typically designed for a fixed embodiment or encode embodiment identity with a static descriptor, ...
📄 Research | Jun 16, 2026 | http://arxiv.org/abs/2606.18092v1
Volterra Generative Models
Score-based diffusion models typically use Brownian perturbations, which provide tractable reverse-time dynamics but impose memoryless noising. We introduce Volterra generative models, a continuous-time score-based framework whose forward process injects path-dependent noise through fractional kerne...
📄 Research | Jun 16, 2026 | http://arxiv.org/abs/2606.18071v1
Catastrophic Forgetting is Low-Rank: A Function-Space Theory for Continual Adaptation
Catastrophic forgetting in continual adaptation is usually studied through parameter drift, replay, or distillation, but these views do not identify which output-space directions are vulnerable. We give a function-space account in the NTK regime: new-task training induces old-task prediction drift t...
📄 Research | Jun 16, 2026 | http://arxiv.org/abs/2606.18024v1
LoopCoder-v2: Only Loop Once for Efficient Test-Time Computation Scaling
Looped Transformers scale latent computation by repeatedly applying shared blocks, but sequential looping increases latency and KV-cache memory with the loop count. Parallel loop Transformers (PLT) alleviate this cost through cross-loop position offsets (CLP) and shared-KV gated sliding-window atten...
📄 Research | Jun 16, 2026 | http://arxiv.org/abs/2606.18023v1
A T-API-Compliant ReAct Agentic Loop for Optical Networks: Generic vs. Domain-Specific Tool Abstractions
Optical networks need intent-driven, closed-loop agentic management, a key enabler for higher autonomy levels. We present the first T-API-compliant reasoning and act (ReAct) loop. We show that domain-specific composite tools achieve 90% oracle-validated correctness with threefold token savings compa...
📄 Research | Jun 16, 2026 | http://arxiv.org/abs/2606.18000v1
Multiple cyclicity and Wavelet Decomposition with Channel Correlation for Long-term Time Series Forecasting
Cyclicity and trend are important components of time series data and many studies based on cyclicity and trend have achieved good results in long-term time series forecasting. However, we believe that current work neglects the influence of real-world inter-channel correlations in time series data wh...
📄 Research | Jun 16, 2026 | http://arxiv.org/abs/2606.17996v1
Recover Semantics First, Generate Better: Improved Latent Modeling for 3D MRI Reconstruction and Cross-Contrast Synthesis
Multi-contrast magnetic resonance imaging (MRI) provides complementary information for clinical diagnosis. However, acquiring all MRI sequences is often time-consuming and costly. Recent generative models perform cross-contrast synthesis to address this issue by inferring absent contrasts from the a...
📄 Research | Jun 16, 2026 | http://arxiv.org/abs/2606.17989v1
STAR: SpatioTemporal Adaptive Reward Allocation for Text-to-Image RL Post-Training
Existing RL post-training methods for text-to-image generation usually convert the final-image reward into a single scalar advantage and apply it with the same strength to the entire generative trajectory. However, text-to-image generation naturally has temporal and spatial structure: different deno...
📄 Research | Jun 16, 2026 | http://arxiv.org/abs/2606.17979v1
Zone of Proximal Policy Optimization: Teacher in Prompts, Not Gradients
Knowledge distillation transfers a teacher's competence to a small student but is brittle in the small-student regime: forcing the student to imitate logits from a much larger teacher concentrates it on the teacher's sharpest modes, hurting generalization on benchmark families beyond the training co...
📄 Research | Jun 16, 2026 | http://arxiv.org/abs/2606.18216v1
Learning from the Self-future: On-policy Self-distillation for dLLMs
On-policy self-distillation (OPSD) has proven effective for post-training large language models (LLMs), yet its application to diffusion LLMs (dLLMs) remains unexplored. Existing OPSD methods are inherently autoregressive-centric. They inject privileged information via left-to-right prefix condition...
📄 Research | Jun 16, 2026 | http://arxiv.org/abs/2606.18195v1
HistoRAG: Embedding Historical Methodology in Retrieval-Augmented Generation Through Critical Technical Practice
Retrieval-Augmented Generation (RAG) is the prevailing architecture for grounding language model outputs in external evidence, yet its dominant evaluation paradigms and default configurations remain oriented toward factual question-answering. For interpretive disciplines such as historical studies, ...
📄 Research | Jun 16, 2026 | http://arxiv.org/abs/2606.18103v1
ConSA: Controllable Sparsity in Hybrid Attention via Learnable Allocation
Hybrid architectures combining full attention (FA) and sliding-window attention (SWA) are a promising paradigm for efficient LLM inference. However, existing methods typically rely on hand-crafted rules or simple post-hoc heuristics for FA/SWA allocation and offer limited analysis of the attention b...
📄 Research | Jun 16, 2026 | http://arxiv.org/abs/2606.18056v1
Compositional Skill Routing for LLM Agents: Decompose, Retrieve, and Compose
LLM agents increasingly rely on external skills -- reusable tool specifications -- but real-world tasks often require composing multiple skills, not just selecting one. We formalize this as the Compositional Skill Routing problem: given a complex user query and a large skill library, decompose the q...
📄 Research | Jun 16, 2026 | http://arxiv.org/abs/2606.18051v1
Reading between the Lines: Leveraging Large Language Models for Global Dementia and Depression Assessment from Clinical Interviews
Dementia and depression are the most prevalent neuropsychiatric disorders in geriatric populations, and their overlapping symptoms pose major challenges for differential diagnosis. In this study, we investigate open-weights Large Language Models (LLMs) for predicting dementia and depression severity...
📄 Research | Jun 16, 2026 | http://arxiv.org/abs/2606.18019v1
VoidPadding: Let [VOID] Handle Padding in Masked Diffusion Language Models so that [EOS] Can Focus on Semantic Termination
MDLMs generate text by denoising a preallocated masked response canvas, making response-length modeling central to instruction tuning. Existing MDLMs often inherit the autoregressive convention of using repeated [EOS] tokens for padding during instruction tuning, giving [EOS] a dua...
📄 Research | Jun 16, 2026 | http://arxiv.org/abs/2606.17999v1
Fine-tuning LLMs for Passive Depression Severity Estimation from AI Mental Health Dialogue
Depression is the leading cause of disability worldwide, and early detection of symptom change is essential for timely intervention. Validated instruments such as the Patient Health Questionnaire-9 (PHQ-9) support symptom monitoring at scale, but real-world completion rates are low, introducing resp...
📄 Research | Jun 16, 2026 | http://arxiv.org/abs/2606.17973v1
Dynamic Rollout Editing for Reducing Overthinking in RL-Trained Reasoning Models
Long-form chain-of-thought reasoning can improve LLM performance on complex tasks, but models often continue generating unnecessary reasoning after a correct answer has emerged. We refer to this behavior as overthinking. We study this phenomenon from the perspective of GRPO-style reinforcement learn...
📄 Research | Jun 16, 2026 | http://arxiv.org/abs/2606.17890v1
Environment-Grounded Automated Prompt Optimization for LLM Game Agents
LLM agents in interactive environments are highly sensitive to their prompts, yet prompt engineering remains a manual, task-specific process. We introduce an automated prompt optimization framework for LLM agents that decomposes the observation-to-action pipeline into a goal-conditioned descriptor a...
📄 Research | Jun 16, 2026 | http://arxiv.org/abs/2606.17838v1
When Multiple Scripts Matter: Evaluating ASR in Clinical Settings
Automatic speech recognition (ASR) in non-English clinical settings is challenged by multiscript variability, where the same term may appear in multiple valid orthographic forms. Conventional string-matching evaluation metrics often underestimate ASR performance by treating orthographic variants as ...
📄 Research | Jun 16, 2026 | http://arxiv.org/abs/2606.17826v1
Improving low-resource ASR using bilingual fine-tuning with language identification: a cross-linguistic evaluation
This study explores how bilingual fine-tuning affects automatic speech recognition (ASR) in low-resource languages. We evaluate this method across nine linguistically and geographically diverse language pairs, covering a range of language families and writing systems. To distinguish the two language...
📄 Research | Jun 16, 2026 | http://arxiv.org/abs/2606.17820v1
ANSUZS
Reinventing AI Chatbots
🧰 Tools | Jun 16, 2026 | https://www.producthunt.com/products/ansuzs?utm_campaign=producthunt-api&utm_medium=api-v2&utm_source=Application%3A+the500feed+%28ID%3A+283491%29
A Framework for Evaluating Agentic Skills at Scale
Agent skills -- structured, reusable knowledge artifacts that augment LLM agent capabilities -- have been rapidly adopted in industry, yet their cross-domain impact and use across commercial and open-source models remain under-studied, and no reusable methodology exists for evaluating an individual ...
📄 Research | Jun 16, 2026 | http://arxiv.org/abs/2606.17819v1