AI News Archive: June 11, 2026 — Part 15

Sourced from 500+ daily AI sources, scored by relevance.

ArogyaSutra: A Multi-Agent Framework for Multimodal Medical Reasoning in Indic Languages
Multimodal Large Language Models (MLLMs) have shown promising reasoning capabilities in general domains, yet their performance remains limited in specialized settings such as healthcare, especially in multilingual and low-resource scenarios. This gap is critical in regions like rural India, where pa...
📄 Research · Jun 11, 2026 · http://arxiv.org/abs/2606.13572v1
Is It You or Your Environment? A Bayesian Inference Framework for Genomically-Anchored Personalized Physiological Interpretation
Personalized health AI systems face a fundamental cold-start problem: machine learning models for physiological interpretation require weeks of individual behavioral data before they can distinguish constitutional variation from environmentally driven deviation. We propose a solution grounded in cau...
📄 Research · Jun 11, 2026 · http://arxiv.org/abs/2606.13556v1
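The following is a minimal sketch of how a prior can soften a cold start. It assumes a Gaussian prior on a single physiological quantity (resting heart rate, chosen only for illustration), standing in for a genotype-derived baseline. This is a textbook conjugate Normal-Normal update, not the paper's model. With no personal data the estimate equals the prior, and it moves toward the person's own mean as readings accumulate. The function names, the known-noise assumption, and all numbers are hypothetical.

```python
import math


def update_baseline(prior_mean: float, prior_var: float,
                    obs: list[float], noise_var: float) -> tuple[float, float]:
    """Conjugate Normal-Normal update of a person's unknown mean.

    prior_mean / prior_var: hypothetical genomically informed prior.
    noise_var: assumed known day-to-day measurement variance.
    """
    precision = 1.0 / prior_var + len(obs) / noise_var
    post_var = 1.0 / precision
    post_mean = post_var * (prior_mean / prior_var + sum(obs) / noise_var)
    return post_mean, post_var


def deviation_z(x: float, post_mean: float, post_var: float, noise_var: float) -> float:
    """z-score of a new reading under the posterior predictive N(mean, post_var + noise_var).

    One possible (assumed) way to flag a reading as unusual for this person.
    """
    return (x - post_mean) / math.sqrt(post_var + noise_var)


# Day 0: no personal data, so the estimate is just the prior.
print(update_baseline(60.0, 25.0, [], 100.0))
# After a week of readings the estimate moves toward the person's own mean.
m, v = update_baseline(60.0, 25.0, [68, 70, 69, 71, 70, 68, 69], 100.0)
print(round(m, 1), round(deviation_z(85.0, m, v, 100.0), 2))
```

The prior's weight shrinks in proportion to len(obs) / noise_var, so the more a person is measured, the less the population or genomic baseline matters.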
Uncertainty-Aware Hybrid Retrieval for Long-Document RAG
Retrieval-augmented generation (RAG) depends critically on the quality and granularity of retrieved evidence. Large retrieval units preserve context but often introduce irrelevant content, which can dilute answer-bearing evidence and worsen long-context utilization. Fine-grained units are more compa...
📄 Research · Jun 11, 2026 · http://arxiv.org/abs/2606.13550v1
SupraBench: A Benchmark for Supramolecular Chemistry
Supramolecular chemistry, which includes the study of non-covalent host-guest assemblies, has advanced various applications. However, designing host-guest systems remains time-consuming, requiring days of dry-lab verification per candidate pair. Although LLMs have emerged as a fast alternative with ...
📄 Research · Jun 11, 2026 · http://arxiv.org/abs/2606.13477v1
Ontology Memory-Augmented ASR Correction for Long Text-Speech Interleaved Conversations
Automatic speech recognition (ASR) correction has traditionally focused on isolated utterances or short local contexts. However, as text and speech become increasingly interleaved in long interactions, ASR correction requires conversation-level contextual evidence. Existing ASR correction methods of...
📄 Research · Jun 11, 2026 · http://arxiv.org/abs/2606.13464v1
Why Sampling Is Not Choosing: Intentionality, Agency, and Moral Responsibility in Large Language Models
Recent advances in large language models (LLMs) have prompted claims that such systems exhibit agency or qualify as moral agents. This paper argues that these attributions are misguided. We maintain that moral responsibility requires commitment-bearing agency grounded in intrinsic intentionality and...
📄 Research · Jun 11, 2026 · http://arxiv.org/abs/2606.13441v1
Evaluation Sovereignty in Metadata-Driven Classification: A Multi-Track Framework for Weakly Supervised Information Systems
Evaluation in machine learning is typically treated as a neutral measurement process. However, in operational information systems, evaluation outcomes are often conditioned by the processes used to generate labels. This paper does not seek to improve classification performance. Instead, it examines ...
📄 Research · Jun 11, 2026 · http://arxiv.org/abs/2606.13436v1
Neuro-Symbolic Agents for Regulated Process Automation: Challenges and Research Agenda
LLM-based agents are entering regulated industries, where they automate judgment-intensive quality management processes. We argue that symbolic structures already embedded in these domains, including regulations, typed process models, and compliance constraints, should be treated not merely as extern...
📄 Research · Jun 11, 2026 · http://arxiv.org/abs/2606.13405v1
Journey Now
Learning copilot for human ambition via step-by-step plans
🧰 Tools · Jun 11, 2026 · https://www.producthunt.com/products/journey-now?utm_campaign=producthunt-api&utm_medium=api-v2&utm_source=Application%3A+the500feed+%28ID%3A+283491%29
Mod-Guide: An LLM-based Content Moderation Feedback System to Address Insensitive Speech toward Indigenous Ethnic and Religious Minority Communities
Language operates as a mechanism of both marginalization and resistance, especially for minority communities navigating insensitive and harmful speech online. As content moderation increasingly depends on large language models (LLMs), concerns arise about whether these systems can recognize cultural...
📄 Research · Jun 11, 2026 · http://arxiv.org/abs/2606.13397v1
MiniMax Sparse Attention
Ultra-long-context capability is becoming indispensable for frontier LLMs: agentic workflows, repository-scale code reasoning, and persistent memory all require the model to jointly attend over hundreds of thousands to millions of tokens, yet the quadratic cost of softmax attention makes this untena...
📄 Research · Jun 11, 2026 · http://arxiv.org/abs/2606.13392v1
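The quadratic cost of softmax attention comes from scoring every query against every key. As a point of comparison only, here is a NumPy sketch of dense causal attention alongside a sliding-window variant. The sliding window is one common, generic form of sparse attention: each query sees only its last `window` keys, so the work grows as O(n · window) instead of O(n²). This is not the method in the paper, since the excerpt does not describe it, and the function names are illustrative.

```python
import numpy as np


def softmax(x: np.ndarray) -> np.ndarray:
    # Numerically stable softmax over the last axis.
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)


def dense_causal_attention(q: np.ndarray, k: np.ndarray, v: np.ndarray) -> np.ndarray:
    # Full causal attention: builds an (n x n) score matrix, so O(n^2) work.
    n, d = q.shape
    scores = q @ k.T / np.sqrt(d)
    future = np.triu(np.ones((n, n), dtype=bool), k=1)  # True above the diagonal
    scores = np.where(future, -np.inf, scores)
    return softmax(scores) @ v


def sliding_window_attention(q: np.ndarray, k: np.ndarray, v: np.ndarray,
                             window: int) -> np.ndarray:
    # Query i attends only to keys max(0, i - window + 1) .. i: O(n * window) work.
    n, d = q.shape
    out = np.empty_like(v)
    for i in range(n):
        lo = max(0, i - window + 1)
        s = q[i] @ k[lo:i + 1].T / np.sqrt(d)
        out[i] = softmax(s) @ v[lo:i + 1]
    return out


rng = np.random.default_rng(0)
q, k, v = rng.standard_normal((3, 8, 4))  # 8 tokens, head dimension 4
print(np.allclose(sliding_window_attention(q, k, v, window=8),
                  dense_causal_attention(q, k, v)))
```

With `window >= n` the two functions agree, which makes a convenient sanity check. With `window=1`, each token simply returns its own value vector.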
Who Pays the Price? Stakeholder-Centric Prompt Injection Benchmarking for Real-world Web Agents
Web agents driven by large language models (LLMs) are increasingly deployed in real-world environments, where they operate over untrusted web content and execute actions with direct consequences. This makes them vulnerable to prompt-injection attacks, in which seemingly benign content embeds adversa...
📄 Research · Jun 11, 2026 · http://arxiv.org/abs/2606.13385v1
An LLM System for Autonomous Variational Quantum Circuit Design
The design of high-performing quantum circuits remains largely dependent on human expertise. We introduce an autonomous agentic framework that employs large language models (LLMs) to conduct iterative quantum circuit designs under explicit design constraints. Our system integrates seven components: ...
📄 Research · Jun 11, 2026 · http://arxiv.org/abs/2606.13380v1
A Quantitative Experimental Repeated Measures Study of Training Dynamics in a Small Llama Style Language Model Under a Compute-Aware Token Budget
This study examines training dynamics in a small Llama-style language model trained under a fixed, compute-constrained token budget. Rather than evaluating efficiency solely through endpoint performance, the study uses a quantitative experimental repeated measures design to analyze how validation lo...
📄 Research · Jun 11, 2026 · http://arxiv.org/abs/2606.13370v1
IterCAD: An Iterative Multimodal Agent for Visually-Grounded CAD Generation and Editing
Computer-Aided Design is pivotal in modern manufacturing, yet existing automated methods predominantly rely on open-loop, one-shot generation, creating a mismatch with iterative real-world practices. In this paper, we present IterCAD, a unified multimodal agent framework for closed-loop, interactive...
📄 Research · Jun 11, 2026 · http://arxiv.org/abs/2606.13368v1
Mana: Dexterous Manipulation of Articulated Tools
Articulated tool manipulation remains a major challenge in dexterous robotics due to the need to coordinate internal degrees of freedom and contact-rich interactions. While prior work has largely focused on rigid objects, articulated tool use remains underexplored because of its physical complexity ...
📄 Research · Jun 11, 2026 · http://arxiv.org/abs/2606.13677v1
SpatialClaw: Rethinking Action Interface for Agentic Spatial Reasoning
Spatial reasoning, the ability to determine where objects are, how they relate, and how they move in 3D, remains a fundamental challenge for vision-language models (VLMs). Tool-augmented agents attempt to address this by augmenting VLMs with specialist perception modules, yet their effectiveness is ...
📄 Research · Jun 11, 2026 · http://arxiv.org/abs/2606.13673v1
Automated reproducibility assessments in the social and behavioral sciences using large language models
Reproducibility in the social and behavioral sciences is typically evaluated by independent researchers who reanalyze the original data to assess whether the published findings can be recovered. However, such approaches are resource-intensive and difficult to scale. Here, we show that large language...
📄 Research · Jun 11, 2026 · http://arxiv.org/abs/2606.13670v1
SkMTEB: Slovak Massive Text Embedding Benchmark and Model Adaptation
We introduce SkMTEB, the first comprehensive MTEB-style text embedding benchmark for Slovak, a low-resource West Slavic language. It comprises 31 datasets across 7 task types, nearly 4× the depth of existing multilingual benchmark coverage for Slovak. Our evaluation of 31 embedding models rev...
📄 Research · Jun 11, 2026 · http://arxiv.org/abs/2606.13647v1
Beyond the Commitment Boundary: Probing Epiphenomenal Chain-of-Thought in Large Reasoning Models
Chain-of-thought (CoT) reasoning is the dominant paradigm for inference-time scaling in language models, yet the causal influence of individual steps on the final answer remains poorly understood. We estimate each step's causal importance via early exit and use this measure to study how answers form across ...
📄 Research · Jun 11, 2026 · http://arxiv.org/abs/2606.13603v1
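One possible reading of "causal importance via early exit" is sketched below. It assumes the model can be forced to answer after any prefix of its chain of thought. `answer_prob` is a hypothetical hook that returns the probability the model assigns to its final answer when it sees only that prefix. A step's importance is then the change in that probability caused by adding the step. The paper's exact estimator may differ.

```python
from typing import Callable, Sequence


def early_exit_importance(steps: Sequence[str],
                          answer_prob: Callable[[Sequence[str]], float]) -> list[float]:
    """Per-step importance = change in final-answer probability when the step is added.

    answer_prob(prefix) is a hypothetical early-exit hook: stop after `prefix`,
    force an answer, and return the probability of the full-CoT answer.
    """
    probs = [answer_prob(steps[:i]) for i in range(len(steps) + 1)]
    return [probs[i + 1] - probs[i] for i in range(len(steps))]


# Toy stand-in for a model: probability depends only on how many steps it has seen.
toy_table = {0: 0.1, 1: 0.1, 2: 0.8, 3: 0.9}
steps = ["restate the problem", "compute 17 * 3 = 51", "double-check the product"]
print(early_exit_importance(steps, lambda prefix: toy_table[len(prefix)]))
```

Steps with near-zero importance, like the first one in the toy table, are the kind that would look epiphenomenal under this measure.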
EpiBench: Verifiable Evaluation of AI Agents on Epigenomics Analysis
We introduce EpiBench, a verifiable benchmark for short-horizon epigenomics analysis. EpiBench evaluates whether agents can make well-defined analysis decisions from realistic workflow states and return deterministically gradable answers. The benchmark includes 106 evaluations across CUT&Tag/CUT&R...
📄 Research · Jun 11, 2026 · http://arxiv.org/abs/2606.13602v1
LabVLA: Grounding Vision-Language-Action Models in Scientific Laboratories
Scientific laboratories increasingly rely on AI systems to reason about experiments, but the physical act of doing science remains largely outside their reach. AI can help read literature, generate hypotheses, and plan protocols, yet the execution of those protocols at the bench still requires a hum...
📄 Research · Jun 11, 2026 · http://arxiv.org/abs/2606.13578v1
A Three-Layer Framework for AI in Scientific Discovery
Current discussions of AI in scientific discovery are often dominated by two visible capabilities: search over existing knowledge and execution through optimization, simulation, and automation. Both are important, but neither fully captures the central act of discovery: the formation and evolution o...
📄 Research · Jun 11, 2026 · http://arxiv.org/abs/2606.13566v1
Adaptive Turn-Taking for Real-time Multi-Party Voice Agents
Turn-taking in multi-party spoken conversations remains a fundamental challenge for voice-based agents, particularly under dynamic floor competition and varying user expectations. We propose ModeratorLM, a role-playing voice agent that conditions turn-taking behavior on an explicitly assigned role i...
📄 Research · Jun 11, 2026 · http://arxiv.org/abs/2606.13544v1
AgentRivet: an automated system for producing Rivet routines from journal publications
Particle physics collider experiments provide Rivet routines as part of the analysis preservation strategy for model-independent measurements. Rivet is a C++ toolkit that allows new theoretical models to be compared to the measurements, thus aiding the development and tuning of Monte Carlo event gene...
📄 Research · Jun 11, 2026 · http://arxiv.org/abs/2606.13535v1
MaxProof: Scaling Mathematical Proof with Generative-Verifier RL and Population-Level Test-Time Scaling
We present MaxProof, a population-level test-time scaling framework for competition-level mathematical proof in the MiniMax-M3 series. M3 first trains three proof-oriented capabilities (proof generation, proof verification, and critique-conditioned proof repair) using a defense-in-depth generati...
📄 Research · Jun 11, 2026 · http://arxiv.org/abs/2606.13473v1
Understanding the Rejection of Fixes Generated by Agentic Pull Requests – Insights from the AIDev Dataset
AI coding agents are increasingly used to generate pull requests (PRs) that propose code fixes in software projects. From a first exploration of the AIDev dataset, we find that 46.41% of the fixes proposed by the agents Copilot, Devin, Cursor, and Claude are rejected. This represents a significant ...
📄 Research · Jun 11, 2026 · http://arxiv.org/abs/2606.13468v1
Toward Instructions-as-Code: Understanding the Impact of Instruction Files on Agentic Pull Requests
AI agents (e.g., GitHub Copilot) collaborate as teammates on different software engineering tasks, including code generation proposed through pull requests (Agentic-PRs). To improve agent efficiency, developers create instruction files that guide the AI agents, including how to navigate the project,...
📄 Research · Jun 11, 2026 · http://arxiv.org/abs/2606.13449v1
OmniDirector: General Multi-Shot Camera Cloning without Cross-Paired Data
Cloning camera motion from reference videos is an important task in video generation, as videos provide intuitive and precise control. Existing methods either directly use parametric representations that fail to handle multi-shot generation or synthesize cross-paired data, which suffer from data sca...
📄 Research · Jun 11, 2026 · http://arxiv.org/abs/2606.13432v1
PolyFlow: Safe and Efficient Polytope-Constrained Flow Matching with Constraint Embedding and Projection-free Update
While flow-based generative models have demonstrated strong performance across a wide range of domains, deploying them in safety-critical physical systems remains challenging due to strict constraint requirements. Existing approaches typically enforce safety through post-hoc corrections, which incur...
📄 Research · Jun 11, 2026 · http://arxiv.org/abs/2606.13400v1
SmartFont: Dynamic Condition Allocation for Few-Shot Font Generation
Few-shot font generation simultaneously requires global structural completeness and fine-grained local style fidelity. Existing methods usually either rely on global content-style modeling, which is robust but imperfectly disentangled, or emphasize component/local modeling, which captures fine detai...
📄 Research · Jun 11, 2026 · http://arxiv.org/abs/2606.13382v1
EvoArena: Tracking Memory Evolution for Robust LLM Agents in Dynamic Environments
Large language model (LLM) agents have achieved strong performance on a wide range of benchmarks, yet most evaluations assume static environments. In contrast, real-world deployment is inherently dynamic, requiring agents to continually align their knowledge, skills, and behavior with changing envir...
📄 Research · Jun 11, 2026 · http://arxiv.org/abs/2606.13681v1
Influcoder: Distilling Decoders' Gradient Influence Rankings into an Encoder for Data Attribution
With the growing capabilities of large language models (LLMs), there has been an increasing push to curate high-quality datasets by filtering samples from the training data. In general, Data Attribution (DA) methods aim to estimate how individual samples in a training dataset can precondition a model ...
📄 Research · Jun 11, 2026 · http://arxiv.org/abs/2606.13668v1
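For background on the kind of gradient-based influence score a decoder might produce, here is a TracIn-style sketch at a single checkpoint for a linear model with squared loss. A training example scores high when its loss gradient points in the same direction as the test example's gradient, so one SGD step on it would, to first order, lower the test loss. This is a generic stand-in chosen for illustration. The excerpt does not say which influence estimator Influcoder distills, and the data below is made up.

```python
import numpy as np


def squared_loss_grad(w: np.ndarray, x: np.ndarray, y: float) -> np.ndarray:
    # Gradient of 0.5 * (w . x - y)^2 with respect to w.
    return (w @ x - y) * x


def gradient_influence(w: np.ndarray, train_X: np.ndarray, train_y: np.ndarray,
                       test_x: np.ndarray, test_y: float, lr: float = 1.0) -> np.ndarray:
    # TracIn-style score at one checkpoint: lr * <grad_train_i, grad_test>.
    # Positive: an SGD step on example i would lower the test loss (first order).
    g_test = squared_loss_grad(w, test_x, test_y)
    return np.array([lr * (squared_loss_grad(w, x, y) @ g_test)
                     for x, y in zip(train_X, train_y)])


w = np.zeros(2)
train_X = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]])
train_y = np.array([1.0, 0.0, 1.0])
scores = gradient_influence(w, train_X, train_y, np.array([1.0, 0.0]), 1.0)
ranking = np.argsort(-scores)  # most helpful training examples first
print(scores, ranking)
```

A ranking like `ranking` is the kind of per-sample ordering that a data-filtering pipeline would consume.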
HyperTool: Beyond Step-Wise Tool Calls for Tool-Augmented Agents
Tool-augmented LLM agents commonly rely on step-wise atomic tool calls, where each invocation, observation, and value transfer is exposed in the main reasoning trace. This creates an execution-granularity mismatch: locally deterministic tool workflows are unfolded into repeated model-visible ...
📄 Research · Jun 11, 2026 · http://arxiv.org/abs/2606.13663v1
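The execution-granularity point can be made concrete with a toy example. Below, the same deterministic three-tool workflow runs twice. The first run uses step-wise atomic calls, where each invocation and observation lands in the model-visible trace. The second run uses a single composite call that keeps intermediate values inside the executor. The tools and trace format are hypothetical. This sketch illustrates the mismatch; it is not HyperTool's actual interface.

```python
from typing import Any, Callable, Sequence


def normalize(s: str) -> str:
    return " ".join(s.split()).lower()


def tokenize(s: str) -> list[str]:
    return s.split(" ")


def count(tokens: list[str]) -> int:
    return len(tokens)


def run_stepwise(tools: Sequence[Callable[[Any], Any]], x: Any, trace: list[str]) -> Any:
    # Step-wise atomic calls: every invocation and observation enters the trace.
    for tool in tools:
        trace.append(f"call {tool.__name__}({x!r})")
        x = tool(x)
        trace.append(f"observe {x!r}")
    return x


def run_composite(tools: Sequence[Callable[[Any], Any]], x: Any, trace: list[str]) -> Any:
    # One call for the whole deterministic chain; intermediates stay hidden.
    name = "+".join(tool.__name__ for tool in tools)
    trace.append(f"call {name}({x!r})")
    for tool in tools:
        x = tool(x)
    trace.append(f"observe {x!r}")
    return x


pipeline = [normalize, tokenize, count]
step_trace: list[str] = []
comp_trace: list[str] = []
print(run_stepwise(pipeline, "  Hello   Tool  World ", step_trace), len(step_trace))
print(run_composite(pipeline, "  Hello   Tool  World ", comp_trace), len(comp_trace))
```

Both runs return the same result, but the step-wise trace grows with the number of tools, while the composite trace stays at one call and one observation.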
Operadic consistency: a label-free signal for compositional reasoning failures in LLMs
Detecting LLM reasoning failures at inference time without ground-truth labels has motivated a wide range of confidence baselines, including self-consistency, semantic entropy, and P(True), built on within-question sampling and self-evaluation. Operad theory, the formalism for systems built by itera...
📄 Research · Jun 11, 2026 · http://arxiv.org/abs/2606.13649v1
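Self-consistency, the first baseline named above, is simple enough to show directly. Sample several answers to the same question, return the majority answer, and use its vote share as a label-free confidence score. This is the baseline, not the operadic consistency signal the paper proposes. The sketch assumes that answers have already been normalized to comparable strings.

```python
from collections import Counter
from collections.abc import Hashable, Sequence


def self_consistency(answers: Sequence[Hashable]) -> tuple[Hashable, float]:
    """Majority vote over sampled answers, plus the winner's agreement rate."""
    if not answers:
        raise ValueError("need at least one sampled answer")
    # Counter.most_common breaks ties by first occurrence.
    winner, votes = Counter(answers).most_common(1)[0]
    return winner, votes / len(answers)


print(self_consistency(["42", "42", "41", "42", "17"]))  # → ('42', 0.6)
```

A low agreement rate is the signal such baselines use to flag a likely reasoning failure without ground-truth labels.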
Operads for compositional reasoning in LLMs
Question decomposition, i.e. breaking a complex query into simpler sub-queries whose answers are composed to produce a final answer, is a widely used strategy for improving LLM reasoning, yet it currently lacks a rigorous mathematical foundation. In this paper, we propose operads, mathematical struc...
📄 Research · Jun 11, 2026 · http://arxiv.org/abs/2606.13634v1
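The composition idea can be made concrete with the standard operad of functions. In that operad, an operation with n inputs can have n other operations plugged into those inputs, and the arity of the result is the sum of their arities. If sub-queries are read as inputs and the final answer as the output, this gives a toy picture of question decomposition. The sketch below implements only that textbook composition map. It is not the paper's formalism, and the arithmetic operations stand in for LLM sub-answers purely for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable, Sequence


@dataclass(frozen=True)
class Op:
    """An n-ary operation: one node in a decomposition tree."""
    arity: int
    fn: Callable[..., Any]

    def __call__(self, *args: Any) -> Any:
        if len(args) != self.arity:
            raise ValueError(f"expected {self.arity} inputs, got {len(args)}")
        return self.fn(*args)


def compose(f: Op, gs: Sequence[Op]) -> Op:
    """Operadic composition gamma(f; g_1, ..., g_n).

    The output of g_i feeds input i of f; the composite takes all of the
    g_i's inputs, concatenated in order.
    """
    if len(gs) != f.arity:
        raise ValueError("need exactly one operation per input of f")
    offsets = [0]
    for g in gs:
        offsets.append(offsets[-1] + g.arity)

    def composed(*args: Any) -> Any:
        sub_answers = [g(*args[offsets[i]:offsets[i + 1]]) for i, g in enumerate(gs)]
        return f(*sub_answers)

    return Op(offsets[-1], composed)


add = Op(2, lambda a, b: a + b)
mul = Op(2, lambda a, b: a * b)

# "How many legs do 3 spiders and 2 dogs have?" = (3 * 8) + (2 * 4)
legs = compose(add, [mul, mul])
print(legs.arity, legs(3, 8, 2, 4))  # → 4 32
```

Because composites are themselves `Op`s, decompositions nest to any depth. This closure is what makes the operad a natural bookkeeping device for multi-level sub-queries.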
Beyond Uniform Tokens: Adaptive Compression for Time Series Language Models
Large language models (LLMs) have enabled time series (TS) analysis by jointly modeling numerical observations and textual context through a shared token interface. However, TS tokens and prompt tokens exhibit fundamentally different information structures, making uniform token processing inefficien...
📄 Research · Jun 11, 2026 · http://arxiv.org/abs/2606.13624v1
Leveraging Audio-LLMs to Filter Speech-to-Speech Training Data
Large-scale mined corpora provide abundant training data for end-to-end speech-to-speech translation (S2ST) but may contain noise, misalignment, and semantic errors. Filtering noisy data is crucial to maintain robust speech translation performance. We study how to train an audio-language model to ma...
📄 Research · Jun 11, 2026 · http://arxiv.org/abs/2606.13507v1
S-GBT: Smooth Growth Bound Tensor for Certified Robustness Against Word Substitution Attacks in NLP
Despite recent progress in Natural Language Processing (NLP), models remain vulnerable to word substitution attacks. Most existing defenses focus on first-order sensitivity and measure how much the output changes when the input is slightly perturbed. However, they ignore how this sensitivity evolves...
📄 Research · Jun 11, 2026 · http://arxiv.org/abs/2606.13439v1
SlimSnap
Your AI doesn't know which button you mean
🧰 Tools · Jun 11, 2026 · https://www.producthunt.com/products/slimsnap?utm_campaign=producthunt-api&utm_medium=api-v2&utm_source=Application%3A+the500feed+%28ID%3A+283491%29
An End-to-End Hybrid Framework for Rumour Detection in Low-Resources Algerian Dialect
The rapid growth of social media has intensified the spread of rumours. This issue is more challenging in the Algerian context due to the informal and code-switched nature of dialectal content, the scarcity of annotated resources, and the limited effectiveness of standard Arabic NLP tools on dialect...
📄 Research · Jun 11, 2026 · http://arxiv.org/abs/2606.13411v1
From Passive Generation to Investigation: A Proactive Scientific Peer Review Agent
Large language models (LLMs) have shown promise in automating scientific peer review. However, existing approaches often struggle to generate in-depth reviews supported by concrete evidence. We argue that a key limitation is the lack of flexibility to proactively investigate suspicious parts of a pa...
📄 Research · Jun 11, 2026 · http://arxiv.org/abs/2606.13349v1
IVIE: A Neuro-symbolic Approach to Incremental and Validated Generation of Interactive Fiction Worlds
Computational creativity in Interactive Fiction faces a fundamental tension: Large Language Models (LLMs) may produce creative narratives but struggle with world coherence, while symbolic systems ensure consistency but lack creative flexibility. We present IVIE (Incremental & Validated Interactive Ex...
📄 Research · Jun 11, 2026 · http://arxiv.org/abs/2606.13348v1
Low-Latency Real-Time Audio Game Commentary System via LLM-Based Parallel Text Generation
We present a low-latency real-time audio game commentary system that generates spoken commentary directly from live gameplay video. In this end-to-end setting, a key bottleneck is accumulated waiting time; conventional pipelines capture frames, generate text, and synthesize speech sequentially for e...
📄 Research · Jun 11, 2026 · http://arxiv.org/abs/2606.13322v1
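The cost of running stages sequentially can be quantified with the standard pipelining formula. The formula assumes that each stage (frame capture, text generation, speech synthesis) handles one segment at a time and that stage times are constant. Under those assumptions, running the stages strictly one after another takes n × (sum of stage times) to finish n segments. Overlapping the stages takes one full pass plus (n − 1) × (slowest stage). The stage timings below are made up, and this is a generic model rather than the paper's parallel text generation scheme.

```python
def sequential_latency(stage_times: list[float], n_segments: int) -> float:
    # Each segment waits for every stage of the previous segment to finish.
    return n_segments * sum(stage_times)


def pipelined_latency(stage_times: list[float], n_segments: int) -> float:
    # Stages overlap across segments: after the first segment fills the pipe,
    # one segment completes every max(stage_times) seconds.
    if n_segments == 0:
        return 0.0
    return sum(stage_times) + (n_segments - 1) * max(stage_times)


stages = [0.1, 0.5, 0.3]  # hypothetical capture, text, speech times in seconds
for n in (1, 4, 20):
    print(n, sequential_latency(stages, n), pipelined_latency(stages, n))
```

Once the pipeline is full, the slowest stage alone sets the pace. This is why shortening text generation, for example by parallelizing it, pays off most in such a system.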
SkillCAT: Contrastive Assessment and Topology-Aware Skill Self-Evolution for LLM Agents
Skill self-evolution methods for LLM agents aim to turn execution trajectories into reusable skill documents, but current pipelines typically learn from one trajectory per task, merge candidate skill patches before checking them, and load the full skill corpus before inference. We propose SkillCAT, ...
📄 Research · Jun 11, 2026 · http://arxiv.org/abs/2606.13317v1
Cross-Modal Masked Compositional Concept Modeling for Enhancing Visio-Linguistic Compositionality
Contrastively trained vision-language models like CLIP have made remarkable progress in learning joint image-text representations, but still face challenges in compositional understanding. They often exhibit a "bag-of-words" behavior, struggling to capture the object relations, attribute-object bin...
📄 Research · Jun 11, 2026 · http://arxiv.org/abs/2606.13288v1
TimeLens: On-Device Artifact Recognition with Retrieval-Augmented Question Answering for the Grand Egyptian Museum
TimeLens is an AI-powered bilingual mobile guide for the Grand Egyptian Museum (GEM). When a visitor points a phone at an exhibit, the artifact is recognized in real time and the visitor can ask follow-up questions answered in English or Arabic. The work addresses three problems specific to in-gallery deployment...
📄 Research · Jun 11, 2026 · http://arxiv.org/abs/2606.13267v1
Evaluating Pluralism in LLMs through Latent Perspectives
The growing need to represent diverse perspectives has increased interest in pluralistic LLM generation. Although difficult to operationalize, identifying perspectives expressed in text would provide clear guidance on pluralistic alignment and more clearly articulate the pluralistic gap in LLM gener...
📄 Research · Jun 11, 2026 · http://arxiv.org/abs/2606.13254v1
PolyAlign: Conditional Human-Distribution Alignment
Post-training methods such as supervised fine-tuning (SFT) and preference optimization typically align language models toward a single global assistant behavior. While effective for improving average helpfulness, this can suppress the natural variation of human responses across languages, tasks, and...
📄 Research · Jun 11, 2026 · http://arxiv.org/abs/2606.13227v1
When Similar Means Different: Evaluating LLMs on Arabic–Hebrew Cognates
Arabic and Hebrew, as closely related Semitic languages, share a substantial lexicon of true cognates, misleading false friends, and modern loanwords. This overlap poses a challenge for cross-lingual semantic understanding in large language models (LLMs). To evaluate this capability, we introduce Se...
📄 Research · Jun 11, 2026 · http://arxiv.org/abs/2606.13218v1