AI News Archive: June 10, 2026 — Part 15

Sourced from 500+ daily AI sources, scored by relevance.

Toward Trustworthy AI: Multi-Target Adversarial Attacks and Robust Defenses for Continuous Data Summarization
Trustworthy AI requires reliable data-processing pipelines, not only robust downstream predictive models. As an upstream component, data summarization determines which information is retained and passed to subsequent learning or decision modules. Therefore, adversarial perturbations to the summariza...
📄 Research · Jun 10, 2026 · http://arxiv.org/abs/2606.11804v1
Detecting Sensitive Personal Information in Japanese Pre-Training Corpora for Large Language Models
Sensitive personal information can appear in large-scale pre-training corpora for large language models (LLMs). Detecting and filtering such information is therefore essential to ensure compliance with privacy regulations and prevent unintended information leakage. However, in contrast to English an...
📄 Research · Jun 10, 2026 · http://arxiv.org/abs/2606.12114v1
Debiasing Without Protected Attributes: Latent Concept Erasure from Textual Profiles
Most fairness research in NLP assumes direct access to protected attributes such as gender, race, or nationality. In practice, however, such information is often unavailable due to privacy constraints, missing metadata, or legal restrictions, even though models may infer it from indirect textual cue...
📄 Research · Jun 10, 2026 · http://arxiv.org/abs/2606.12088v1
StanceNakba Shared Task: Actor and Topic-Aware Stance Detection in Public Discourse
We present StanceNakba 2026, a shared task on stance detection in polarized social media discourse related to the Palestinian-Israeli conflict, organized as part of Nakba-NLP 2026 at LREC-COLING 2026. The task introduces two subtasks: Subtask A (Actor-Level Stance Detection), which classifies Englis...
📄 Research · Jun 10, 2026 · http://arxiv.org/abs/2606.12068v1
Agreement in Representation Space for Open-Ended Self-Consistency
Self-consistency improves LLM reasoning by sampling multiple outputs and selecting the most consistent answer, but existing formulations largely rely on exact matching and therefore remain limited to tasks with categorical outputs. In this work, we study self-consistency in open-ended generation tas...
📄 Research · Jun 10, 2026 · http://arxiv.org/abs/2606.12003v1
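For readers unfamiliar with the baseline this abstract refers to, here is a minimal sketch of exact-match self-consistency: sample several answers, normalize them, and keep the most frequent one. It is not the paper's method; the function names and the normalization rule are assumptions made for illustration, and the sampled strings are made up.

```python
from collections import Counter


def normalize(answer: str) -> str:
    """Hypothetical normalization: lowercase and collapse whitespace."""
    return " ".join(answer.strip().lower().split())


def self_consistent_answer(samples: list[str]) -> tuple[str, float]:
    """Return the most frequent normalized answer and its share of the votes."""
    if not samples:
        raise ValueError("need at least one sampled answer")
    votes = Counter(normalize(s) for s in samples)
    winner, count = votes.most_common(1)[0]
    return winner, count / len(samples)


if __name__ == "__main__":
    # Illustrative samples only; in practice these would be model outputs.
    samples = ["42", " 42 ", "41", "42", "Forty-two"]
    print(self_consistent_answer(samples))
```

Note that "Forty-two" is counted as a separate answer even though it means the same thing as "42". That is the limitation of exact matching the abstract points to: the vote only works when outputs fall into a small set of identical strings.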
SportTrack
Live football scores + AI predictions in 5 languages
🧰 Tools · Jun 10, 2026 · https://www.producthunt.com/products/sporttrack?utm_campaign=producthunt-api&utm_medium=api-v2&utm_source=Application%3A+the500feed+%28ID%3A+283491%29
Decoding Multimodal Cues: Unveiling the Implicit Meaning Behind Hateful Videos
Hateful videos have become prevalent on online platforms, highlighting an urgent need for effective detection. However, existing studies primarily focus on binary classification and fail to provide contextual rationales that reveal the implicit meanings behind these judgments, significantly undermin...
📄 Research · Jun 10, 2026 · http://arxiv.org/abs/2606.11953v1
uva-irlab-conv at SemEval-2026 Task 8: Multi-Turn RAG with Learned Sparse Retrieval and Listwise Reranking
This report describes our participation in SemEval-2026 Task 8 on multi-turn retrieval and question answering. The task evaluates conversational systems across four domains (finance, cloud documentation, government, Wikipedia), and includes unanswerable queries where the available collection does no...
📄 Research · Jun 10, 2026 · http://arxiv.org/abs/2606.11945v1
Semantic Grading of Written Answers in Low-Resource Language Bangla Using a Fine-Tuned Lightweight Language Model
Bangla is among the world's most widely spoken languages, yet it remains underserved in educational NLP research. In many remote and rural regions, access to qualified subject teachers is limited, and written answers are consequently graded largely by hand, restricting timely and consistent feedback...
📄 Research · Jun 10, 2026 · http://arxiv.org/abs/2606.11931v1
An Ontology-Guided Multi-Anchor Graph Retrieval Framework for Traffic Legal Liability Determination
Traffic law liability determination is critical for assigning legal penalties, requiring the simultaneous identification of interdependent statutory provisions across multiple legal dimensions. However, existing retrieval-augmented generation methods suffer from a multi-dimensional retrieval bottlen...
📄 Research · Jun 10, 2026 · http://arxiv.org/abs/2606.11910v1
GraspLLM: Towards Zero-Shot Generalization on Text-Attributed Graphs with LLMs
Research on Text-Attributed Graphs (TAGs) has gained significant attention recently due to its broad applications across various real-world data scenarios, such as citation networks, e-commerce platforms, social media, and web pages. Inspired by the remarkable semantic understanding ability of Large...
📄 Research · Jun 10, 2026 · http://arxiv.org/abs/2606.11898v1
External Experience Serving in Production LLM Systems: A Deployment-Oriented Study of Quality-Cost Trade-offs
Production LLM systems accumulate reusable operational experience, but the practical deployment issue is not merely whether such experience can help. It is how different serving strategies trade off quality against online cost under realistic constraints. Injecting external experience can improve ta...
📄 Research · Jun 10, 2026 · http://arxiv.org/abs/2606.11806v1
MultiToP: Learning to Patch Visual Tokens to Mitigate Hallucinations in Video Large Multimodal Models
Video Large Multimodal Models have achieved remarkable progress in video understanding, yet they remain prone to hallucinations, where generated responses are not faithfully supported by the input video. In this paper, we propose MultiToP, a multimodal-context-aware visual token patching framework t...
📄 Research · Jun 10, 2026 · http://arxiv.org/abs/2606.11792v1
Lius: Translation Model Based Instructional Lingustic Using Continual Instruction Tuning In Kupang Malay
Large Language Models (LLMs) offer new potential for translation tasks but often experience performance degradation when handling low-resource languages. To address this limitation, we propose an approach for fine-tuning LLMs on a low-resource language, Kupang Malay. Our approach involves designing ...
📄 Research · Jun 10, 2026 · http://arxiv.org/abs/2606.11786v1
Fast Speech Foundation Model Distillation Using Interleaved Stacking
Distilling a large speech foundation model (SFM) into an efficient student model has been successfully applied to low-resource environments. Although distillation reduces inference latency, it requires an additional student model training. However, the training efficiency of SFM distillation remains...
📄 Research · Jun 10, 2026 · http://arxiv.org/abs/2606.11766v1
Automated Creativity Evaluation of Language Models Across Open-Ended Tasks
Large language models (LLMs) have achieved remarkable progress in language understanding, reasoning, and generation, sparking growing interest in their creative potential. Realizing this potential requires systematic and scalable methods for evaluating creativity across diverse tasks. However, most ...
📄 Research · Jun 10, 2026 · http://arxiv.org/abs/2606.11762v1
Substrate Asymmetry in User-Side Memory: A Diagnostic Framework
User-side memory in LLMs is typically scored as a single "personalization" capability: given a user's history, is the output more user-aware? We show this aggregate metric hides opposite-direction failures. Memory factorises into at least three orthogonal axes -- behavioral consistency (style, voice...
📄 Research · Jun 10, 2026 · http://arxiv.org/abs/2606.11712v1
RLCSD: Reinforcement Learning with Contrastive On-Policy Self-Distillation
On-policy self-distillation (OPSD) provides dense, token-level supervision for reasoning models by aligning a model's own distribution with the distribution it produces under privileged context, typically a verified solution. However, we show that the learning signal drawn from this distributional g...
📄 Research · Jun 10, 2026 · http://arxiv.org/abs/2606.11709v1
MedCTA: A Benchmark for Clinical Tool Agents
To make clinically grounded decisions, medical AI agents are expected to go beyond simple recognition and be capable of tool retrieval, evidence acquisition, and integration. Existing benchmarks largely evaluate isolated perception or single-turn question answering, and therefore provide limited vis...
📄 Research · Jun 10, 2026 · http://arxiv.org/abs/2606.11702v1
Goal-Autopilot: A Verifiable Anti-Fabrication Firewall for Unattended Long-Horizon Agents
Long-horizon LLM agents are not trusted to run unattended: with no human watching, they confidently report success they never verified. We treat honesty -- bounding what an agent may claim at termination -- as a first-class metric for unattended autonomy, distinct from capability. We present Autopil...
📄 Research · Jun 10, 2026 · http://arxiv.org/abs/2606.11688v1
Layer-Isolated Evaluation: Gating the Deterministic Scaffold of a Production LLM Agent with a No-LLM, Regression-Locked Test Harness
End-to-end task-success is the dominant way to evaluate LLM agents, but one aggregate number tells you that an agent regressed, not where. We present layer-isolated evaluation: a deployed ordering agent is decomposed into a fixed taxonomy of layers (ontology, intent, routing, decomposition, escalati...
📄 Research · Jun 10, 2026 · http://arxiv.org/abs/2606.11686v1
AIF Protocol
RSS for AI agents to publish and subscribe
🧰 Tools · Jun 10, 2026 · https://www.producthunt.com/products/aif-protocol?utm_campaign=producthunt-api&utm_medium=api-v2&utm_source=Application%3A+the500feed+%28ID%3A+283491%29
UR-BERT: Scaling Text Encoders for Massively Multilingual TTS Through Universal Romanization and Speech Token Prediction
We propose UR-BERT, a Romanized transcription-based text-to-speech (TTS) encoder for massively multilingual TTS systems. Conventional grapheme-to-phoneme (G2P)-based approaches are limited to around 100 languages due to the availability of reliable G2P resources. In contrast, UR-BERT scales to 495 l...
📄 Research · Jun 10, 2026 · http://arxiv.org/abs/2606.11681v1
Organize then Retrieve: Hierarchical Memory Navigation for Efficient Agents
Large language model (LLM) agents struggle with long-horizon tasks due to their inherent statelessness, requiring all task-relevant information to be encoded in growing input contexts. The resulting degraded reasoning quality, increased inference cost, and higher latency necessitate efficient workin...
📄 Research · Jun 10, 2026 · http://arxiv.org/abs/2606.11680v1
Can AI Reason Like an Urban Planner? Benchmarking Large Language Models Against Professional Judgment
Problem, Research Strategy, and Findings: The rise of large language models (LLMs) raises a key question for urban planning: which forms of professional planning knowledge can AI replicate, and which still require human judgment? Although AI tools are increasingly used in planning practice, there is...
📄 Research · Jun 10, 2026 · http://arxiv.org/abs/2606.11678v1
Dummy Backdoor as a Defense: Removing Unknown Backdoors via Shared Internal Mechanisms for Generative LLMs
Backdoor attacks pose a serious threat to the safety and reliability of Large Language Models (LLMs), as they cause models to behave normally on clean inputs while producing attacker-specified responses when hidden triggers are present. Removing such unknown backdoors is particularly challenging whe...
📄 Research · Jun 10, 2026 · http://arxiv.org/abs/2606.11648v1
Evaluating Bias in Phoneme-Based Automatic Speech Recognition Systems: An Analysis of IPA Transcription Models
The popularization of automatic speech recognition (ASR) systems has increased exploration of the demographic biases related to race, age, gender, and accent, often formed from imbalanced training data. Most of these studies focused on standard grapheme-based ASR systems with comparatively little em...
📄 Research · Jun 10, 2026 · http://arxiv.org/abs/2606.11639v1
Multi-Agent Reasoning with Adaptive Worker Allocation for Stance Detection
Stance detection requires identifying an author's position toward a target, often from short-form texts where stance is implicit, indirect, or rhetorically framed. Although large language models (LLMs) achieve strong performance on this task, single-pass prompting can be brittle when multiple interp...
📄 Research · Jun 10, 2026 · http://arxiv.org/abs/2606.11609v1
When is Your LLM Steerable?
Activation steering offers a lightweight approach to control language models' behavior at inference time, but whether it succeeds or fails heavily depends on the prompt, concept, model, and steering configuration. Finding the regime and boundaries of successful steering typically requires expensive ...
📄 Research · Jun 10, 2026 · http://arxiv.org/abs/2606.11599v1
GraphInfer-Bench: Benchmarking LLM's Inference Capability on Graphs
Graph analysis underlies many applications whose answers cannot be looked up in a single record or retrieved along a path: laundering rings, drug repurposing, user preference, and scientific theme are all inferred from a node together with its neighbourhood. We introduce GraphInfer-Bench, a benchmar...
📄 Research · Jun 10, 2026 · http://arxiv.org/abs/2606.11562v1
Teaching Diffusion to Speculate Left-to-Right
Large language models (LLMs) achieve remarkable performance across a wide range of tasks, but their autoregressive decoding process incurs substantial inference costs due to inherently sequential token generation. Speculative decoding addresses this bottleneck by employing a lightweight draft model ...
📄 Research · Jun 10, 2026 · http://arxiv.org/abs/2606.11552v1
Pretrained self-supervised speech models can recognize unseen consonants
Modern pretrained self-supervised automatic speech recognition models are trained on large-scale audio data to encode speech into contextualized representations. However, their training data are heavily skewed toward high-resource languages with little data from low-resource languages, raising conce...
📄 Research · Jun 10, 2026 · http://arxiv.org/abs/2606.11542v1
FORT-Searcher: Synthesizing Shortcut-Resistant Search Tasks for Training Deep Search Agents
Training deep search agents requires verifiable questions whose answers remain unavailable until sufficient evidence has been acquired through search. Existing synthesis methods often increase apparent difficulty by enriching graph structures, but structural complexity alone does not guarantee reali...
📄 Research · Jun 10, 2026 · http://arxiv.org/abs/2606.12087v1
When Does Language Matter? Multilingual Instructions Reveal Step-wise Language Sensitivity in Vision-Language-Action Models
Vision-Language-Action (VLA) models have shown strong performance in language-conditioned robotic manipulation, yet their robustness to linguistic variation remains poorly understood. In this work, we present the first systematic multilingual evaluation of VLA models by translating the LIBERO benchm...
📄 Research · Jun 10, 2026 · http://arxiv.org/abs/2606.11906v1
Notes2Skills: From Lab Notebooks to Certainty-Aware Scientific Agent Skills
Scientific discovery workflows usually contain and rely heavily on lab notes, where researchers record observations, interpret uncertain results, and plan follow-up experiments. Such informative lab notes preserve evolving scientific reasoning and author uncertainty, rather than polished final resul...
📄 Research · Jun 10, 2026 · http://arxiv.org/abs/2606.11897v1
I Understand How You Feel: Enhancing Deeper Emotional Support Through Multilingual Emotional Validation in Dialogue System
Emotional validation - explicitly acknowledging that a user's feelings make sense - has proven therapeutic value but has received little computational attention. Emotional validation in dialogue systems can be decomposed into (i) validating response identification, (ii) validation timing detection, ...
📄 Research · Jun 10, 2026 · http://arxiv.org/abs/2606.11875v1
Hey Chat, Can You Teach Me? Structuring Socratic Dialogue for Human Learning in the Wild
Large language models are now widely used for everyday learning, but the underlying interactions are typically unstructured chats rather than following a curriculum. Unlike formal online learning systems, these interactions carry no prior record of the student, so any estimate of what the student al...
📄 Research · Jun 10, 2026 · http://arxiv.org/abs/2606.11744v1
WebCLI - Agent Interface for the Web.
Stop doing web chores your agents can do instead.
🧰 Tools · Jun 10, 2026 · https://www.producthunt.com/products/webcli-agent-interface-for-the-web?utm_campaign=producthunt-api&utm_medium=api-v2&utm_source=Application%3A+the500feed+%28ID%3A+283491%29
UniReason-Med: A Shared Grounded Reasoning Interface for 2D-to-3D Transfer in Medical VQA
We study whether grounded reasoning supervision from abundant 2D medical images can improve 3D medical VQA when both input types are aligned through a common reasoning interface. We introduce UniReason-Med, a single-checkpoint framework that processes either a 2D image or a slice-serialized 3D volum...
📄 Research · Jun 10, 2026 · http://arxiv.org/abs/2606.11740v1
ICA Lens: Interpreting Language Models Without Training Another Dictionary
Finding interpretable directions in language-model representations is critical for understanding and controlling model behavior. Sparse autoencoders (SAEs) have become the standard tool for this purpose, but using them as the default first lens often requires training, storing, and evaluating large ...
📄 Research · Jun 10, 2026 · http://arxiv.org/abs/2606.11722v1
Improving Cross-Format Robustness in Language Models with Multi-Format Training
Large language models often remain sensitive to answer format: a question solved correctly in one form may fail in another semantically equivalent form. To study this gap, we define cross-format robustness as the extent to which a model answers the same underlying question consistently across format...
📄 Research · Jun 10, 2026 · http://arxiv.org/abs/2606.11643v1
Kuramoto Attention: Synchronizing Self-Attention on the Torus
We introduce Kuramoto attention, a self-attention layer in which each hidden coordinate is an angle. The layer scores tokens by gated cosine similarity, attends over previous phase states, and updates each token by the tangent component of the attention-weighted circular mean. Because the values are...
📄 Research · Jun 10, 2026 · http://arxiv.org/abs/2606.11585v1
AerialClaw: An Open-Source Framework for LLM-Driven Autonomous Aerial Agents
Unmanned aerial vehicles (UAVs) are increasingly used in inspection, search and rescue, environmental monitoring, and emergency response. However, most UAV applications still rely on pre-defined command sequences or task-specific pipelines, where developers manually connect perception, planning, fli...
📄 Research · Jun 10, 2026 · http://arxiv.org/abs/2606.12142v1
Q-Fold: Query-Aware Focus-Context Spatio-Temporal Folding for Long Video Understanding
Long-video understanding remains challenging for multimodal large language models, because temporally extended videos often contain thousands of frames and are therefore expensive to process exhaustively. Existing methods usually construct compact visual inputs from long videos under a limited visua...
📄 Research · Jun 10, 2026 · http://arxiv.org/abs/2606.12125v1
DAM-VLA: Decoupled Asynchronous Multimodal Vision Language Action model
Vision-language-action (VLA) models inherit a shared synchronous clock from vision-language pretraining, processing every input at one rate. This is misaligned with physical interaction, where a high-frequency modality changes at hundreds of hertz, vision evolves more slowly, and language stays cons...
📄 Research · Jun 10, 2026 · http://arxiv.org/abs/2606.12105v1
Tac-DINO: Learning Vision-Tactile Features with Patch Alignment
Touch is the primary medium through which humans interact with the environment. Currently, tactile learning mainly focuses on image-level pretraining or alignment. However, tactile signals correspond to local object contact, while research into scale alignment and holographic matching remains limite...
📄 Research · Jun 10, 2026 · http://arxiv.org/abs/2606.12069v1
Performance Analysis of YOLOv11 and YOLOv8 for Mixed Traffic Object Detection under Adverse Weather Conditions in Developing Countries
In modern vehicular systems, robust performance under harsh conditions has become a critical problem of autonomous driving. Our study delivers a comprehensive evaluation of the newest iteration of the YOLO series, which is YOLOv11 Nano architecture benchmarked against the widely adopted YOLOv8 Nano ...
📄 Research · Jun 10, 2026 · http://arxiv.org/abs/2606.12066v1
SpikeTAD: Spiking Neural Networks for End-to-End Temporal Action Detection
Video understanding is a crucial part of computer vision, with numerous application scenarios. With the increasing popularity of mobile devices, an increasing number of efforts are trying to deploy video understanding models on them. However, existing video understanding models are difficult to depl...
📄 Research · Jun 10, 2026 · http://arxiv.org/abs/2606.12033v1
ViT-FREE: Efficient Face Recognition via Early Exiting and Synthetic Adaptation
Vision Transformers (ViTs) have gained significant attention in computer vision and shown strong potential for face recognition (FR). However, their high computational cost makes deployment on resource-constrained devices challenging, motivating the need for methods that balance efficiency and accur...
📄 Research · Jun 10, 2026 · http://arxiv.org/abs/2606.12023v1
ParseFixer: An Agentic Framework for Document Parsing via Selective Multimodal Correction
In this report, we present our third-place solution for the DataMFM Challenge Track 1: Document Parsing. This track requires models to recover structured Markdown documents from document page images while preserving textual content and document structure. To address the complementary requirements of...
📄 Research · Jun 10, 2026 · http://arxiv.org/abs/2606.11977v1