AI News Archive: June 25, 2026 — Part 12

Sourced from 500+ daily AI sources, scored by relevance.

The Geometry of Updates: Fisher Alignment at Vocabulary Scale
Training-free source selection for LLM families with shared vocabularies arises in scientific string domains such as SMILES, protein, and genomic sequences, where candidate corpora share a tokenizer but differ in prediction targets. This creates an activation-dark regime: representation-similarity m...
📄 Research · Jun 25, 2026 · http://arxiv.org/abs/2606.27242v1
BrowserBash
CLI that turns plain-English into real browser tests
🧰 Tools · Jun 25, 2026 · https://www.producthunt.com/products/browserbash?utm_campaign=producthunt-api&utm_medium=api-v2&utm_source=Application%3A+the500feed+%28ID%3A+283491%29
Paved with True Intents: Intent-Aware Training Improves LLM Safety Classification Across Training Regimes
We argue that safety classifiers should model user intent as an explicit signal between the prompt and the final label. To study this, we introduce AIMS, a human-annotated dataset of 1,724 difficult safety prompts, each paired with an intent description and harm label. We use AIMS to evaluate intent...
📄 Research · Jun 25, 2026 · http://arxiv.org/abs/2606.27210v1
Forecasting With LLMs: Improved Generalization Through Feature Steering
Successful forecasting involves identifying patterns between historical and future states of the world which generalize to future observations. We apply LLMs to a variety of forecasting tasks and inspect their internal states using sparse autoencoders to understand whether they appear to rely on tim...
📄 Research · Jun 25, 2026 · http://arxiv.org/abs/2606.27199v1
HarmVideoBench: Benchmarking Harmful Video Understanding in Large Multimodal Models
Large vision-language models (LVLMs) have recently shown immense potential in automated content moderation, sparking growing interest in developing harmful-video benchmarks. However, we identify two primary limitations in existing works: 1) The multi-layered characteristics of harmful videos are ove...
📄 Research · Jun 25, 2026 · http://arxiv.org/abs/2606.27187v1
The Riddle Riddle: Testing Flexible Reasoning in Large Language Models and Humans
Humans flexibly adapt their reasoning strategies to the requirements of a given problem. Large language models (LLMs) have performed well on many cognitive tasks; however, it is unclear whether this accuracy is a result of pattern matching from training data or flexible reasoning. Here, we introduce...
📄 Research · Jun 25, 2026 · http://arxiv.org/abs/2606.27103v1
Improving General Role-Playing Agents via Psychology-Grounded Reasoning and Role-Aware Policy Optimization
Building general-purpose role-playing agents that faithfully portray any character from a natural-language profile remains challenging. The dominant paradigm -- supervised fine-tuning -- encourages behavioral mimicry without deep, human-like internal thought processes, resulting in poor out-of-distr...
📄 Research · Jun 25, 2026 · http://arxiv.org/abs/2606.27025v1
Just how sure are you? Improving Verbalized Uncertainty Calibration in Medical VQA
Multimodal large language models (MLLMs) applied to Medical Visual Question Answering (VQA) tend to produce overconfident outputs regardless of actual correctness, and existing verbalized confidence calibration methods, developed primarily for text-only LLMs, do not account for the multimodal nature...
📄 Research · Jun 25, 2026 · http://arxiv.org/abs/2606.27023v1
Jailbreaking for the Average Jane: Choosing Optimal Jailbreaks via Bandit Algorithms for Automatically Enhanced Queries
With a profusion of jailbreaks for LLMs now widely known, a growing concern is that non-expert malicious actors ("the average Jane") could elicit actionable responses to malicious requests. In this work, we examine whether this concern is justified. A non-expert malicious actor requires two ingredie...
📄 Research · Jun 25, 2026 · http://arxiv.org/abs/2606.26936v1
SamaVaani: Auditing and Debiasing Multilingual Clinical ASR for Indian Languages
Automatic Speech Recognition (ASR) is increasingly used to document clinical encounters, yet its reliability in multilingual and demographically diverse Indian healthcare contexts remains largely unknown. In this study, we first conduct a systematic audit of ASR performance on real-world psychiatri...
📄 Research · Jun 25, 2026 · http://arxiv.org/abs/2606.26901v1
Information-Aware KV Cache Compression for Long Reasoning
Reasoning capability has advanced rapidly in large language models (LLMs), leading to an increasing size of key-value (KV) cache in both prefilling and decoding stages. Existing KV cache compression methods mainly rely on attention weights to estimate token importance. While attention effectively ca...
📄 Research · Jun 25, 2026 · http://arxiv.org/abs/2606.26875v1
Cascaded Multi-Granularity Pruning for On-Device LLM Inference in Industrial IoT
Deploying large language models (LLMs) on Industrial Internet of Things (IIoT) edge devices demands extreme compression, yet existing structured pruning methods collapse at high compression ratios due to one-shot importance estimation, and their cross-architecture behavior remains unpredictable. Thi...
📄 Research · Jun 25, 2026 · http://arxiv.org/abs/2606.26861v1
KARLA: Knowledge-base Augmented Retrieval for Language Models
We propose a new method that allows an LLM to automatically pull in factual knowledge from a knowledge base during token generation. This means that (1) factual knowledge in the LLM output can be updated without retraining the LLM, (2) facts in the LLM output can be traced to the knowledge base for ...
📄 Research · Jun 25, 2026 · http://arxiv.org/abs/2606.26807v1
OPID: On-Policy Skill Distillation for Agentic Reinforcement Learning
Outcome-based reinforcement learning provides a stable optimization backbone for language agents, but its sparse trajectory-level rewards provide little guidance on which intermediate decisions should be reinforced or suppressed. On-policy self-distillation offers dense token-level supervision, yet ...
📄 Research · Jun 25, 2026 · http://arxiv.org/abs/2606.26790v1
AIGP: An LLM-Based Framework for Long-Term Value Alignment in E-Commerce Pricing
Traditional dynamic pricing models in large-scale e-commerce suffer from limited interpretability, poor utilization of unstructured information, and misalignment with long-term business objectives such as cumulative Gross Merchandise Value (GMV), Return on Investment (ROI) and milestone achievement....
📄 Research · Jun 25, 2026 · http://arxiv.org/abs/2606.26787v1
Structure Before Collapse: Transient semantic geometry in next-token prediction
Neural Collapse predicts that balanced one-hot classification pushes model representations to be equally far from each other, a symmetric configuration that depends only on the output label and ignores any semantic similarity in the inputs. This creates a puzzle: next-token prediction language model...
📄 Research · Jun 25, 2026 · http://arxiv.org/abs/2606.26749v1
Beyond Logical Forms: LLM-Extracted Patterns for Fallacy Classification
In today's fast-paced information era, logical fallacies, defined as defective patterns of reasoning, inevitably contribute to the growth of information disorder. However, fallacies often appear in nuanced forms that complicate automated classification. In this study, we investigate whether merging ...
📄 Research · Jun 25, 2026 · http://arxiv.org/abs/2606.26698v1
VTT for Mac
Voice-to-text for macOS with a fully on-device option
🧰 Tools · Jun 25, 2026 · https://www.producthunt.com/products/vtt-for-mac?utm_campaign=producthunt-api&utm_medium=api-v2&utm_source=Application%3A+the500feed+%28ID%3A+283491%29
SocialPersona: Benchmarking Personalized Profiling and Response with Multimodal Social-Media Context
Personalized language-model assistants are often evaluated through a memory lens: can a model recall preferences users have explicitly stated in dialogue? More comprehensive personalization demands a harder capability -- inferring what users care about from the multimodal traces they naturally leave...
📄 Research · Jun 25, 2026 · http://arxiv.org/abs/2606.26654v1
CAT-Q: Cost-efficient and Accurate Ternary Quantization for LLMs
In this paper, we present CAT-Q, Cost-efficient and Accurate Ternary Quantization, for compressing and accelerating LLMs. Unlike existing state-of-the-art ternary quantization methods that rely on data-intensive and costly quantization-aware training to mitigate severe performance degradation, CAT-Q...
📄 Research · Jun 25, 2026 · http://arxiv.org/abs/2606.26650v1
DanceOPD: On-Policy Generative Field Distillation
Modern image generation demands a single model that unifies diverse capabilities, including text-to-image (T2I), local editing, and global editing. However, these capabilities are rarely naturally aligned and often conflict. For instance, editing tends to degrade T2I performance, while global and lo...
📄 Research · Jun 25, 2026 · http://arxiv.org/abs/2606.27377v1
Mapping Political-Elite Networks in Europe with a Multilingual Joint Entity-Relation Extraction Pipeline
Whether political elites organise into rent-seeking coalitions that capture public resources or civic networks that sustain governance is a central question in comparative politics. Yet observing these complex, informal, and adversarial ties at scale has historically required intensive manual coding...
📄 Research · Jun 25, 2026 · http://arxiv.org/abs/2606.27347v1
Multilingual Reasoning Cascades Need More Context
Translation cascades for reasoning translate the query from another language to English, reason in English, and translate the answer back to the original language. This is a competitive approach to multilingual reasoning, but structurally lossy, since each stage discards information later stages may...
📄 Research · Jun 25, 2026 · http://arxiv.org/abs/2606.27306v1
LMs as Task-Specific Knowledge Bases: An Interpretability Analysis
Language models (LMs) capture large amounts of factual knowledge applicable to a wide range of tasks, motivating the view of their parameters as a knowledge base. An important property of knowledge bases is that different queries for the same fact return consistent results, drawing on a single sourc...
📄 Research · Jun 25, 2026 · http://arxiv.org/abs/2606.27237v1
Towards Explainable Adjudicative Variance: Quantifying Judicial Discretion via Gated Multi-Task Learning
Legal outcome prediction must disentangle objective case facts from adjudicative context. Merit-based rulings rely on factual evidence while technical disposals may hinge on judicial discretion. We propose a Judge-Aware Gated Multi-Task Learning architecture that explicitly models this distinction. ...
📄 Research · Jun 25, 2026 · http://arxiv.org/abs/2606.27069v1
MinGram: A Minimalist Unigram Tokenizer with High Compression and Competitive Morphological Alignment
The Unigram tokenizer uses an elegant representation which makes it straightforward to edit vocabularies, but its training is comparatively heavy and complex. We introduce MinGram (Minimalist Unigram), which keeps the token-list representation but simplifies training using a BPE-derived seed vocabul...
📄 Research · Jun 25, 2026 · http://arxiv.org/abs/2606.27019v1
RedVox: Safety and Fairness Gaps in Speech Models Across Languages
Speech-capable models are increasingly deployed in real-world applications across languages. Yet their safety and fairness beyond English settings and under naturalistic conditions remain understudied. We survey safety reporting practices across state-of-the-art speech model releases, finding that o...
📄 Research · Jun 25, 2026 · http://arxiv.org/abs/2606.26968v1
Term-Centric Hierarchy Induction from Heterogeneous Corpora
Organizing knowledge from diverse text sources into interpretable hierarchies is crucial for tasks such as policy analysis, innovation monitoring, and exploratory domain mapping. Existing taxonomy induction methods typically rely on document-level representations that capture entire documents rather...
📄 Research · Jun 25, 2026 · http://arxiv.org/abs/2606.26963v1
GAVEL: Grounded Caption Error Verification and Localization
Vision-language models (VLMs) often produce hallucinated or inconsistent outputs, where text and images are not properly aligned. Addressing this issue requires not only detecting misalignment but also explaining the discrepancy and localizing its visual evidence. We introduce GAVEL (Grounded Captio...
📄 Research · Jun 25, 2026 · http://arxiv.org/abs/2606.26923v1
Heterogeneous Neural Predictivity from Language Models During Naturalistic Comprehension
Language-model representations provide structured, high-dimensional annotations of naturalistic language stimuli and can serve as informative neural predictors during comprehension. We analyzed locked derived data from Brain Treebank, MEG-MASC, and Podcast ECoG with eight frozen language models, blo...
📄 Research · Jun 25, 2026 · http://arxiv.org/abs/2606.26880v1
AgentX: Towards Agent-Driven Self-Iteration of Industrial Recommender Systems
Recommendation algorithm iteration is moving from an artisanal, engineer-bound process toward an industrialized research loop, but this transition remains blocked by a structural execution bottleneck: the idea-to-launch cycle still depends on human engineers to generate hypotheses, modify production...
📄 Research · Jun 25, 2026 · http://arxiv.org/abs/2606.26859v1
FBK's Long-form SpeechLLMs for IWSLT 2026 Instruction Following
This paper describes our submission to the IWSLT 2026 Instruction Following shared task. SpeechLLMs are developed for both short-form and long-form speech instruction following under constrained settings. For the short track, strong performance is achieved on MCIF, with a SIFS score of 2.0708. For t...
📄 Research · Jun 25, 2026 · http://arxiv.org/abs/2606.26819v1
Reproducibility Study of "AlphaEdit: Null-Space Constrained Knowledge Editing for Language Models"
Fang et al. (2025) introduced a null-space constrained projection, named AlphaEdit, for locate-then-edit knowledge editing methods, theoretically guaranteeing that edits do not disrupt previously preserved knowledge, and reported substantial gains over existing editing methods on LLaMA3, GPT2-XL, and...
📄 Research · Jun 25, 2026 · http://arxiv.org/abs/2606.26783v1
Evaluation Pitfalls and Challenges in Multimedia Event Extraction
Multimedia event extraction aims to jointly identify events and their arguments across multiple modalities, such as text and images, to support more comprehensive event understanding. While recent work reports steady and substantial progress, the reliability and comparability of these results critic...
📄 Research · Jun 25, 2026 · http://arxiv.org/abs/2606.26775v1
ConvMemory v3: A Validity Context Layer for Conversational Memory via Target-Conditioned Relation Verification
Conversational memory retrieval optimizes relevance, yet a retrieved memory can be relevant and simultaneously outdated: a later turn updates, corrects, or supersedes it. ConvMemory v3 adds a validity context layer that detects and surfaces this update evidence through target-conditioned relation ve...
📄 Research · Jun 25, 2026 · http://arxiv.org/abs/2606.26753v1
HyperDFlash: MHC-Aligned Block Speculative Decoding with Gated Residual Reduction
We present HyperDFlash, a block-parallel speculative decoding framework tailored to the novel multi-hyper-connection (MHC) architecture proposed by DeepSeek-V4. Despite the strong initial-token drafting performance of the native Multi-Token Prediction (MTP) module in DeepSeek-V4, its draft accuracy ...
📄 Research · Jun 25, 2026 · http://arxiv.org/abs/2606.26744v1
Do Safety Guardrails Need to Reason? LeanGuard: A Fast and Light Approach for Robust Moderation
To screen a prompt or a response, recent guardrail methods generate a chain-of-thought (CoT) before they issue a verdict. This design follows a common belief that step-by-step reasoning improves a decision. However, CoT also makes the guard heavy and slow, because the model must generat...
📄 Research · Jun 25, 2026 · http://arxiv.org/abs/2606.26686v1
PhysReflect-VLA: Physical Feasibility and Self-Reflective Regulation for Reliable Vision-Language-Action Policies
Long-horizon robotic manipulation is highly sensitive to physically infeasible transitions, contact-induced disturbances, and the lack of effective self-correction during execution. Although Vision-Language-Action (VLA) models provide strong task grounding through multimodal learning, they typically...
📄 Research · Jun 25, 2026 · http://arxiv.org/abs/2606.27146v1
PAMAE: Phase-Aware-MoE Action Experts Towards Reliable Flow-Matching Vision-Language-Action Policies
Reliable action generation for multi-stage robotic manipulation remains challenging for Vision-Language-Action (VLA) models. While existing flow-matching VLA policies offer strong multimodal grounding and generalization, they typically employ a single shared action expert, limiting their ability to ...
📄 Research · Jun 25, 2026 · http://arxiv.org/abs/2606.27144v1
FlameVQA: A Physically-Grounded UAV Wildfire VQA Benchmark with Radiometric Thermal Supervision
Wildfire monitoring from UAVs requires reliable reasoning over complex aerial scenes, where smoke, scale variation, and occlusions often limit RGB-only interpretation. We introduce FlameVQA, a multiple-choice visual question answering benchmark for UAV-based wildfire intelligence built on FLAME 3, l...
📄 Research · Jun 25, 2026 · http://arxiv.org/abs/2606.27128v1
Risk-Aware Selective Multimodal Driver Monitoring with Driver-State World Modeling
Continuous driver monitoring in automated vehicles requires low-latency inference while avoiding unsafe decisions under uncertain driver states. Large vision-language models provide broad multimodal priors, but their latency and limited reliability in this setting make them unsuitable as always-on i...
📄 Research · Jun 25, 2026 · http://arxiv.org/abs/2606.26922v1
PlanRL: A Trajectory Planning Architecture for Reinforcement Learning-based Driving Experts
Reinforcement learning (RL) has become a prominent framework for developing driving experts in autonomous vehicles. However, most existing RL-based experts are designed to output direct control commands (e.g., throttle, steering), which suffer from a lack of interpretability, high spatial complexity...
📄 Research · Jun 25, 2026 · http://arxiv.org/abs/2606.26858v1
Improving Vision-Language-Action Model Fine-Tuning with Structured Stage and Keyframe Supervision
Vision-Language-Action (VLA) models have shown strong potential for generalizable robotic manipulation. During fine-tuning, however, action supervision applies equally across all timesteps, without structured supervision on which manipulation stage the robot is in or what the next gripper-event targ...
📄 Research · Jun 25, 2026 · http://arxiv.org/abs/2606.26801v1
PressMimic: Pressure-Guided Motion Capture and Control for Humanoid Robot Imitation
Humanoid motion imitation requires not only accurate perception of human kinematics but also faithful reproduction of physical interactions with the environment. However, existing pipelines rely primarily on vision-based motion capture and kinematic imitation, largely ignoring contact dynamics, lead...
📄 Research · Jun 25, 2026 · http://arxiv.org/abs/2606.26741v1
LAMP: Lane-Aligned Motion Primitives for Feasible Trajectory Prediction
Motion forecasting is essential for autonomous driving systems to enable safe decision-making and planning in complex driving scenarios. While existing predictors excel at minimizing standard displacement errors, they often overlook the adherence to lane topology of multimodal predictions, particula...
📄 Research · Jun 25, 2026 · http://arxiv.org/abs/2606.26661v1
Inference-Time Robot Behavior Steering through Physically-Aware Reconfiguration of Task-Structure
A central challenge in deploying learned robot policies is inference-time behavior steering: redirecting a policy at test time to satisfy user preferences not anticipated during training, without retraining. Existing methods fail in two modes: end-to-end methods require fine-tuning or expert-level g...
📄 Research · Jun 25, 2026 · http://arxiv.org/abs/2606.26588v1
Proposal-Conditioned Latent Diffusion for Closed-Loop Traffic Scenario Generation
Closed-loop traffic simulation remains challenging because it must generate interactive multi-agent behaviors that are scene-consistent and controllable throughout rollout. Prior diffusion-based approaches achieve strong realism, but their computational cost can hinder deployment in time-constrained...
📄 Research · Jun 25, 2026 · http://arxiv.org/abs/2606.27123v1
ForesightSafety-VLA: A Unified Diagnostic Safety Benchmark for Vision-Language-Action Models
In embodied intelligence, safety is a prerequisite for reliable robot deployment in the physical world. Current vision-language-action (VLA) models continue to advance toward general-purpose task capability, yet their embodied safety limits remain poorly understood. To address this gap, we introduce...
📄 Research · Jun 25, 2026 · http://arxiv.org/abs/2606.27079v1
RelAfford6D: Relational 6D Affordance Graphs for Constraint-Driven Robotic Manipulation
Bridging abstract semantics and precise physical control remains a fundamental challenge in open-world robotic manipulation. While recent data-driven policies show promise, their reliance on isolated contact points or latent affordance embeddings lacks the rigorous kinematic constraints necessary fo...
📄 Research · Jun 25, 2026 · http://arxiv.org/abs/2606.27036v1
Sidegent
Learn to build AI agents by actually building them
🧰 Tools · Jun 25, 2026 · https://www.producthunt.com/products/sidegent?utm_campaign=producthunt-api&utm_medium=api-v2&utm_source=Application%3A+the500feed+%28ID%3A+283491%29