AI News Archive: June 4, 2026 — Part 13

Sourced from 500+ daily AI sources, scored by relevance.

Unsupervised Skill Discovery for Agentic Data Analysis
Inference-time skill augmentation provides a lightweight way to improve data-analytic agents by injecting reusable procedural knowledge without updating model parameters. However, discovering effective skills for data analysis remains challenging, as reliable supervision is expensive and success cri...
📄 Research | Jun 4, 2026 | http://arxiv.org/abs/2606.06416v1
HomeWorld: A Unified Floorplan-to-Furnished Framework for Generating Controllable, Densely Interactive Whole-Home Scenes
Indoor scene generation is crucial for robot simulation and modern interior design. However, complex layouts together with scarce 3D scene data make learning-based generation challenging. Existing methods often rely on hand-crafted rules or focus on isolated sub-tasks (e.g., floorplan synthesis or s...
📄 Research | Jun 4, 2026 | http://arxiv.org/abs/2606.06390v1
Emergent Language as an Approach to Conscious AI
The question of whether artificial systems can be conscious remains open, in part because existing approaches either evaluate systems against theory-derived checklists (discriminative) or engineer consciousness-inspired modules directly (architectural); both leave open whether observed structures ar...
📄 Research | Jun 4, 2026 | http://arxiv.org/abs/2606.06380v1
EasyLens: A Training-Free Plug-and-Play Subtle-Lesion Representation Amplifier for Medical Vision-Language Models
Medical vision-language models (VLMs) have shown increasing potential for clinical image interpretation, including lesion detection and report generation. However, their practical utility remains limited by insufficient sensitivity to subtle lesions, whose visual evidence is often sparse, low-contra...
📄 Research | Jun 4, 2026 | http://arxiv.org/abs/2606.06379v1
F3-Tokenizer: Taming Audio Autoencoder Latents for Understanding and Generation
Continuous audio autoencoders reconstruct waveforms well but often produce latents with weak structure for understanding, while self-supervised audio encoders capture semantics but are not directly decodable. This mismatch complicates the design of a single audio tokenizer that must support both understanding and...
📄 Research | Jun 4, 2026 | http://arxiv.org/abs/2606.06357v1
Boosting Brain-to-Image Decoding with TRIBE v2 Data Augmentation
Brain decoding is limited by the availability of labeled neural data, and remains challenging in low-data regimes. To address this issue, we investigate whether and when brain decoding can be boosted by augmenting small fMRI datasets with synthetic data generated by a pretrained model of fMRI respon...
📄 Research | Jun 4, 2026 | http://arxiv.org/abs/2606.06345v1
Bridging Domain Expertise and Generalization for Performance Estimation
Performance estimation under distribution shift aims to predict how a model behaves on an unlabeled test set whose distribution differs from the training data, a scenario that requires reliable indicators that can faithfully reflect model behavior without ground-truth labels. Existing approaches rel...
📄 Research | Jun 4, 2026 | http://arxiv.org/abs/2606.06335v1
Subspace-Aware Sparse Autoencoders for Effective Mechanistic Interpretability
Sparse Autoencoders (SAEs) are widely used for mechanistic interpretability in large language models, yet their formulation assigns each latent feature a single decoder direction, implicitly assuming features to be one-dimensional. We show that this assumption is at odds with the multi-dimensional s...
📄 Research | Jun 4, 2026 | http://arxiv.org/abs/2606.06333v1
Plug-and-Play Guidance for Discrete Diffusion Models via Gradient-Informed Logit Correction
Controllable generation with discrete diffusion models is often hindered by high computational overhead or the need for retraining. In this paper, we present Gradient-Informed Logit Correction (GILC), a plug...
📄 Research | Jun 4, 2026 | http://arxiv.org/abs/2606.06303v1
Adapting Diffusion Language Models for Lossless Pixel-Level Image Transmission
Lossless pixel-level image transmission is a fundamental regime beyond semantic communications, because exact recovery requires both accurate symbol probability modeling and reliable delivery over noisy channels. This paper proposes DDM-SSCC, a discrete-diffusion-model-based separate source-channel ...
📄 Research | Jun 4, 2026 | http://arxiv.org/abs/2606.06273v1
Your GFlowNet Secretly Learns an Optimal Transport Plan
Generative Flow Networks (GFlowNets) are a framework for sampling structured objects via stochastic trajectories in a directed graph. In this work, we establish a theoretical connection between non-acyclic GFlowNets and optimal transport (OT). We show that fixing the initial flow distribution in a m...
📄 Research | Jun 4, 2026 | http://arxiv.org/abs/2606.06272v1
Human Adults and LLMs as Scientists: Who Benefits from Active Exploration?
A long-standing finding in the causal learning literature is that adults struggle to identify conjunctive causal rules, where an effect requires the simultaneous presence of multiple causes, while performing better in disjunctive settings. However, most demonstrations of this "conjunctive handicap"...
📄 Research | Jun 4, 2026 | http://arxiv.org/abs/2606.06464v1
USAD 2.0: Scaling Representation Distillation for Universal Audio Understanding
Audio encoders are critical to modern audio applications as large language models (LLMs) increasingly rely on a single encoder for diverse inputs. While self-supervised learning (SSL) has yielded strong domain-specific encoders like speech or music experts, multi-domain approaches like USAD and SPEA...
📄 Research | Jun 4, 2026 | http://arxiv.org/abs/2606.06444v1
Revising Context, Shifting Simulated Stance: Auditing LLM-Based Stance Simulation in Online Discussions
Large language models are increasingly used to simulate social media users and infer how individuals may respond to online discussions. However, it remains unclear whether these simulations reflect precise user-specific beliefs or whether they are highly sensitive to semantically independent changes...
📄 Research | Jun 4, 2026 | http://arxiv.org/abs/2606.06443v1
Reinforcement Learning Elicits Contextual Learning of Unseen Language Translation
Prior work has shown that large language models (LLMs) can translate unseen or low-resource languages by undergoing continued training or even by encoding a grammar book in their context. However, both methods typically overfit specific languages, with limited zero-shot transfer at test time. To tra...
📄 Research | Jun 4, 2026 | http://arxiv.org/abs/2606.06428v1
A Komi-Yazva--Russian Parallel Corpus and Evaluation Protocol for Zero- and Few-Shot LLM Translation
We present the first Komi-Yazva--Russian parallel corpus together with an explicit evaluation protocol for studying LLM translation in an endangered, extremely low-resource setting. The dataset contains 457 aligned sentence pairs from 74 narrative texts and is accompanied by documented provenance, s...
📄 Research | Jun 4, 2026 | http://arxiv.org/abs/2606.06420v1
CollabSim: A CSCW-Grounded Methodology for Investigating Collaborative Competence of LLM Agents through Controlled Multi-Agent Experiments
Multi-agent systems (MAS) built on large language models have shown growing promise, with their effectiveness resting on agents' ability to coordinate through text-based channels much as human teams do. Yet recent work suggests that MAS often falter not because agents lack individual task-solving a...
📄 Research | Jun 4, 2026 | http://arxiv.org/abs/2606.06399v1
EDIT: Evidence-Diagnosed Intervention Training for Rule-Faithful LLM Grading
Reliable rubric grading requires more than accurate score prediction. Each judgement must be grounded in the mark scheme and evidence from the student answer. Existing credit-assignment and intervention methods, primarily designed for self-contained reasoning tasks such as mathematical reasoning, str...
📄 Research | Jun 4, 2026 | http://arxiv.org/abs/2606.06350v1
"Chi nas dal soch el sent de legn" -- Auditing Text Corpora for Lombard
Several of the world's languages are still under-resourced in terms of Natural Language Processing (NLP) tools. This is mostly due to the lack of high-quality datasets to train, develop, and evaluate systems and models for several tasks, such as Machine Translation (MT). We conduct a manual audit of...
📄 Research | Jun 4, 2026 | http://arxiv.org/abs/2606.06349v1
Decomposing Factual Sycophancy in Language Models: How Size and Instruction Tuning Shape Robustness
Factual sycophancy occurs when a language model abandons a correct, verifiable answer under social pressure. Because a flip occurs only when pressure toward a false answer exceeds the model's neutral preference for the truth, flip rates conflate two mechanisms: the strength of that baseline preferen...
📄 Research | Jun 4, 2026 | http://arxiv.org/abs/2606.06306v1
FOXGLOVE: Understanding Goal-Oriented and Anchored Writing Feedback from Experts and LLMs on Argumentative Essays
While large language models (LLMs) are increasingly used to generate writing feedback, there remains no systematic comparison of LLM and expert feedback on the dimensions that writing research identifies as central to revision: goal-orientation, anchoring to specific sentences, and prioritization. W...
📄 Research | Jun 4, 2026 | http://arxiv.org/abs/2606.06271v1
Headroom
Cut Claude Code token costs by ~50% with Headroom
🧰 Tools | Jun 4, 2026 | https://www.producthunt.com/products/headroom-3?utm_campaign=producthunt-api&utm_medium=api-v2&utm_source=Application%3A+the500feed+%28ID%3A+283491%29
From Self to Other: Evaluating Demographic Perspective-Taking in LLM Hate Speech Annotation
Hate speech detection is inherently subjective: people from different demographic groups perceive the same content very differently. Collecting enough annotations from multiple demographic groups is costly and difficult to scale. Persona-conditioned Large Language Models (models prompted to adopt a ...
📄 Research | Jun 4, 2026 | http://arxiv.org/abs/2606.06266v1
FiLM-Based Speaker Conditioning of a SpeechLLM for Pathological Speech Recognition
Automatic speech recognition (ASR) has advanced remarkably for standard speech; however, pathological speech from neurological conditions remains a significant challenge. We investigate speaker conditioning via Feature-wise Linear Modulation (FiLM), injecting x-vector-derived information into each t...
📄 Research | Jun 4, 2026 | http://arxiv.org/abs/2606.06211v1
Dense Contexts Are Hard Contexts: Lexical Density Limits Effective Context in LLMs
Input length and the position of relevant information are widely cited as the primary causes of degraded LLM long-context performance. Here, we study lexical density -- the rate at which a context introduces distinct information -- as a third, largely overlooked factor that systematically reduces th...
📄 Research | Jun 4, 2026 | http://arxiv.org/abs/2606.06203v1
Improving Answer Extraction in Context-based Question Answering Systems Using LLMs
Question answering (QA) systems have achieved notable progress with the advent of large language models (LLMs). However, they still face challenges in accurately extracting and generating precise answers from given contexts, particularly when dealing with complex or ambiguous queries. Existing appro...
📄 Research | Jun 4, 2026 | http://arxiv.org/abs/2606.06197v1
The Tell-Tale Norm: $\ell_2$ Magnitude as a Signal for Reasoning Dynamics in Large Language Models
Recent work has sought to understand reasoning in Large Language Models (LLMs), yet a principled, model-intrinsic signal that captures their layer-wise reasoning dynamics remains underexplored. We bridge this gap by demonstrating that the $\ell_2$ norm of hidden states serves as an endogenous signal of the mod...
📄 Research | Jun 4, 2026 | http://arxiv.org/abs/2606.06188v1
Learning to Route LLMs from Implicit Cost-Performance Preferences via Meta-Learning
Large language models (LLMs) present a trade-off between performance and cost, where more powerful models incur greater expense. LLM routing aims to mitigate expenses while maintaining performance by sending queries to the most suitable model. However, existing methods cannot perform well for differ...
📄 Research | Jun 4, 2026 | http://arxiv.org/abs/2606.06178v1
Harnessing Structural Context for Entity Alignment Foundation Models
Entity alignment (EA) aims to identify equivalent entities across heterogeneous knowledge graphs (KGs) and is a key component of knowledge fusion and cross-KG reasoning. The recent EA foundation model demonstrates that alignment knowledge, once pretrained, can be directly applied to diverse previous...
📄 Research | Jun 4, 2026 | http://arxiv.org/abs/2606.06109v1
IR3DE: A Linear Router for Large Language Models
Foundational Large Language Models (LLMs) demonstrate proficiency on a wide range of general tasks, and achieve remarkable results on various specialized tasks via domain-expert LLMs. With the ever-growing list of available LLMs, inference routers are being proposed to select the most appropriate LL...
📄 Research | Jun 4, 2026 | http://arxiv.org/abs/2606.06098v1
LatentSkill: From In-Context Textual Skills to In-Weight Latent Skills for LLM Agents
Agent systems increasingly use textual skills to encode reusable task procedures, but injecting these skills into the prompt at every step incurs substantial context overhead and exposes skill content as plaintext. We present LatentSkill, a framework that converts textual skills into plug-and-play L...
📄 Research | Jun 4, 2026 | http://arxiv.org/abs/2606.06087v1
On Advantage Estimates for Max@K Policy Gradients
Reinforcement learning with verifiable rewards is widely used for post-training reasoning models, but sparse outcome rewards make exploration difficult. A complementary approach is to optimize inference-time objectives such as pass@K and max@K directly, yet existing policy-gradient estimators for th...
📄 Research | Jun 4, 2026 | http://arxiv.org/abs/2606.06080v1
Multi-task Learning is Not Enough: Representational Entanglement in Dual-output Second Language Speech Recognition
Second-language (L2) speech recognition often requires transcriptions of pronunciations and intended meanings. Multi-task learning (MTL) is a natural approach because it assumes that shared representations benefit both outputs. However, this paper shows that this assumption does not hold across Kore...
📄 Research | Jun 4, 2026 | http://arxiv.org/abs/2606.06065v1
MDP-GRPO: Stabilized Group Relative Policy Optimization for Multi-Constraint Instruction Following
Reinforcement learning with verifiable rewards is ideal for multi-constraint instruction following, yet standard group-relative policy optimization (GRPO) becomes unstable under discrete, low-dispersion rewards, where within-group reward distributions are frequently homogeneous. We identify and form...
📄 Research | Jun 4, 2026 | http://arxiv.org/abs/2606.06058v1
IA-RAG: Interval-Algebra-Driven Temporal Reasoning for Dynamic Knowledge Retrieval
Retrieval-Augmented Generation (RAG) has shown strong effectiveness in grounding Large Language Models (LLMs) with external knowledge. However, existing RAG and Graph RAG frameworks largely treat knowledge as static or associate time with coarse-grained timestamps or metadata, failing to capture ric...
📄 Research | Jun 4, 2026 | http://arxiv.org/abs/2606.06044v1
English-to-Prakrit Machine Translation via Multilingual Transfer Learning
We study English-to-Prakrit machine translation in a low-resource setting where the target language is unsupported by IndicTrans2. We adapt the multilingual model by mapping Prakrit to the Hindi language tag (hin_Deva) without modifying the tokenizer, vocabulary, or architecture. Using a 1,474-pair ...
📄 Research | Jun 4, 2026 | http://arxiv.org/abs/2606.06038v1
RedditPersona: A Modular Framework for Community-Conditioned LLM Adaptation from Reddit
Community-conditioned language model adaptation requires choices about data collection, community definition, and evaluation that are currently made independently in each study, making it hard to compare assumptions or reuse artifacts. We present RedditPersona, a modular framework that standardizes ...
📄 Research | Jun 4, 2026 | http://arxiv.org/abs/2606.06027v1
Moneyball
AI finance tracker for multi-bank, multi-currency households
🧰 Tools | Jun 4, 2026 | https://www.producthunt.com/products/moneyball?utm_campaign=producthunt-api&utm_medium=api-v2&utm_source=Application%3A+the500feed+%28ID%3A+283491%29
EGTR-Review: Efficient Evidence-Grounded Scientific Peer Review Generation via Multi-Agent Teacher Distillation
Scientific peer review generation has attracted increasing attention for reducing reviewing burdens and providing timely feedback. However, existing Large Language Model (LLM)-based methods often produce generic comments with insufficient evidence support and weak source traceability, while complex ...
📄 Research | Jun 4, 2026 | http://arxiv.org/abs/2606.06025v1
Scaffold, Not Vocabulary? A Controlled, Two-Tier, Pre-Registered Study of a Popperian Code-Generation Skill
Large language models increasingly write, review, and judge code, and a fast-growing practice equips them with prompt 'skills' that ask the model to reason like a scientist. A prominent example tells the model to act as a Popperian falsificationist, and such skills are reported to improve generated ...
📄 Research | Jun 4, 2026 | http://arxiv.org/abs/2606.06454v1
Latent Reasoning with Normalizing Flows
Large language models often improve reasoning by generating explicit chain-of-thought (CoT), demonstrating the importance of intermediate computation. However, textual CoT forces this computation through a discrete, serial, and communication-oriented token stream: each reasoning step must be verbali...
📄 Research | Jun 4, 2026 | http://arxiv.org/abs/2606.06447v1
Many Circuits, One Mechanism: Input Variation and Evaluation Granularity in Circuit Discovery
Circuit discovery methods identify subgraphs that explain specific model behaviors, and structural differences between discovered circuits are commonly interpreted as evidence of distinct mechanisms. We test this assumption by varying input statistics while holding the task fixed, and show that the ...
📄 Research | Jun 4, 2026 | http://arxiv.org/abs/2606.06267v1
Where does Absolute Position come from in decoder-only Transformers?
RoPE-trained transformers distinguish absolute position in their attention patterns, even though RoPE encodes only relative offsets in the inner product. We trace this leakage to two architectural components. The causal mask is responsible for the first: its per-query softmax denominator depends on ...
📄 Research | Jun 4, 2026 | http://arxiv.org/abs/2606.06160v1
OrderGrad: Optimizing Beyond the Mean with Order-Statistic Policy Gradient Estimation
Policy-gradient methods usually optimize expected return, but many real-world applications care about distributional properties of returns: tail risk, outlier robustness, or best-of-K discovery. We introduce OrderGrad, a family of likelihood-ratio and reparameterization gradient estimators for order...
📄 Research | Jun 4, 2026 | http://arxiv.org/abs/2606.06096v1
SkillComposer: Learning to Evolve Agent Skills for Specification and Generalization
Agent skills, which consist of reusable strategies that guide agent reasoning and action, have shown strong potential for improving model capability at inference time. However, current skill construction methods treat the problem as one-shot extraction, overlooking a fundamental tension: a skill tai...
📄 Research | Jun 4, 2026 | http://arxiv.org/abs/2606.06079v1
NAVIRA: Decoupled Stochastic Remasking for Masked Diffusion Language Models
Masked diffusion language models generate text by iteratively unmasking many tokens in parallel, but this speed comes with a correction problem: tokens generated in the same step are predicted from marginal distributions, and early local dependency errors can later contaminate the context. PRISM add...
📄 Research | Jun 4, 2026 | http://arxiv.org/abs/2606.06031v1
A Vision-language Framework for Comparative Reasoning in Radiology
Medical imaging artificial intelligence has achieved strong performance in isolated image interpretation, but remains poorly aligned with radiological practice, where diagnosis and follow-up rely on comparison across prior studies and analogous reference cases. Here we formulate radiological compari...
📄 Research | Jun 4, 2026 | http://arxiv.org/abs/2606.06407v1
Maximising the Set-Piece Return: Optimising Football Corner Tactics with Graph Reinforcement Learning
Machine learning is increasingly employed for the evaluation of football tactics. However, existing approaches focus on characterising historical actions or analyst-specified counterfactual scenarios. In this work, we seek to go beyond the imitation of historically observed patterns towards discover...
📄 Research | Jun 4, 2026 | http://arxiv.org/abs/2606.06353v1
Performance Evaluation of GraphCast for Medium-Range Weather Forecasting over Brazil
The paradigm of global weather forecasting is rapidly shifting with the emergence of Machine Learning Weather Prediction models (MLWP). While these data-driven architectures demonstrate remarkable global skill, regional benchmarks in the Global South remain scarce, leaving their efficacy in complex,...
📄 Research | Jun 4, 2026 | http://arxiv.org/abs/2606.06348v1
Equivariant Neural Belief Propagation
Probabilistic inference over spatially embedded variables requires beliefs that respect $SE(3)$ symmetry, yet existing equivariant networks produce only scalars and vectors -- not the rank-2 precision tensors needed for anisotropic uncertainty, and single-component messages collapse multi-modal ener...
📄 Research | Jun 4, 2026 | http://arxiv.org/abs/2606.06344v1