AI News Archive: June 9, 2026 — Part 25
Sourced from 500+ daily AI sources, scored by relevance.
- Enhancing Multilingual LLM-based ASR with Mixture of Experts and Dynamic Downsampling
The rapid progress of large language models (LLMs) has opened up a new frontier for automatic speech recognition (ASR), making their effective integration a critical and challenging research direction. To this end, this work proposes a projector-based LLM-ASR framework targeting the key challenges o...
- Recovering the Zipfian Distribution in Unsupervised Term Discovery
Unsupervised term discovery involves segmenting unlabelled speech into word- or syllable-like units and clustering these into a lexicon of candidate types. True lexicons follow a Zipfian distribution, yet the dominant centre-based clustering approach -- K-means -- produces a more uniform distributio...
- Anchoring the Unknown: Open-Set Model Attribution via Proxy-Anchor Learning
The proliferation of text-to-speech (TTS) systems capable of generating realistic synthetic speech poses growing challenges for audio forensics. While binary deepfake detection has received considerable attention, source tracing (i.e., identifying which TTS system produced a given audio sample) rema...
- A Lightweight Dual-Factor Acoustic Authentication System via Cascaded GMM-DTW Architecture for Edge Computing
This paper presents a lightweight, cascaded GMM-DTW dual-factor voice lock system for resource-constrained edge environments. By utilizing a shared MFCC feature space, the framework implements a sequential defense mechanism combining GMM speaker screening and DTW passphrase verification. To counter ...
- SSL-GMMVC: Interpretable Voice Conversion via Locally Linear GMM Transforms in Self-Supervised Representation Space
We introduce SSL-GMMVC, an interpretable voice conversion method in self-supervised speech space. The method models paired source-target features with a Gaussian mixture model and performs conversion as a posterior-weighted sum of affine transforms. This yields locally linear transformations that ad...
- miniReranker: Efficient Multimodal Reranking through Visual Cache Reuse and Interaction Sparsity
Multimodal large language models (MLLMs) have recently shown strong potential as point-wise rerankers by directly modeling query--document relevance through next-token prediction. However, point-wise reranking suffers from substantial repeated computation across query--document pairs, while the caus...
- Effective Reinforcement Learning for Agentic Search by Recycling Zero-Variance Queries During Training
The use of GRPO-style algorithms has become the standard strategy for training LLM search agents under outcome-only rewards. With these algorithms, a query contributes to parameter updates only when its rollout group mixes successes and failures; all-correct (too-easy) and all-incorrect (too-hard) g...
- Atomic Intent Reasoning: Bringing LLM Semantics to Industrial Cross-Domain Recommendations
Cross-domain recommendation is a core problem in content-to-e-commerce platforms. Its objective is to leverage user interactions with content to infer potential purchasing intent on the e-commerce side, thereby enhancing conversion rates and commercial value. However, in real industrial scenarios, c...
- From Prompt to Purchase: How AI Brand Recommendations Move Consumers on the Open Web
When a conversational assistant recommends a brand to a user with no recent observed engagement, that user's same-name Google search rises +4.3 percentage points (pp) [3.1, 5.5], visits to the brand's own site +2.4 pp [1.4, 3.5], and brand-specific retailer-page visits +1.0 pp [0.3, 1.7] over matche...
- Beyond Patches: Superpixel Token-based Transformers for Attribute-Specific Fashion Retrieval
Attribute-Specific Fashion Retrieval (ASFR) aims to improve fine-grained image retrieval by focusing on specific attributes. However, existing patch-based attention and Transformer methods often misalign with irregular attribute regions and are prone to background noise, limiting their ability to ca...
- STORM: Stepwise Token Optimization with Reward-Guided Beam Search
Modern retrieval increasingly relies on dense and learned-sparse neural models that are effective but require encoding the entire corpus into a specialized index, rebuilt whenever the model changes. Lexical retrievers like BM25 stay efficient and transparent on a standard inverted index that need no...
- SkillResolve-Bench: Measuring and Resolving Same-Capability Ambiguity in Agent Skill Retrieval
Agent skill libraries are becoming routable software assets: a retrieved skill can contribute instructions, scripts, resource bindings, and execution assumptions to an agent. This makes skill retrieval more than broad relevance matching. A retriever can find the right capability family yet expose th...
- Agentic Hybrid RAG for Evidence-Grounded Muon Collider Analysis
Muon collider research spans accelerator physics, detector instrumentation, and high-energy phenomenology, with relevant evidence scattered across a rapidly expanding and heterogeneous body of scientific literature. As high-energy physics (HEP) increasingly explores agent-assisted analysis workflows...
- SIDInspector: A Mapping-First Diagnostic Resource for Semantic-ID Tokenizers
Semantic-ID (SID) tokenizers are increasingly reused as standalone artifacts in generative recommendation: an exported item-to-code mapping becomes the address space that a later sequence generator must use. These mappings rarely come with a common inspection interface, so coverage gaps, full-code ...
- TileBac: A Benchmark CryoEM Dataset of Bacteria in Ultralow-Dose Montage Tiles
Current segmentation models are capable of routine identification of biological features in noisy cryogenic electron microscopy (cryoEM) images. However, there are still challenges with complete segmentation of high boundary, thin objects such as bacterial cell envelopes and flagella. Moreover, ultralow-dose cryoEM images pose an additional challenge for distinguishing the boundary between object and background. Here, we present TileBac, a benchmark dataset of ultralow-dose montage tiles of Pantoea sp. YR343 to segment bacterial inner and outer membranes for evaluation of model effectiveness. We show that foundation models outperform convolutional neural networks at continuous bacterial cell envelope segmentation despite having lower performance metrics. We release the TileBac benchmark dataset on Hugging Face for further insights into model architecture development.
- PhysiCase: Development and dual-layer validation of synthetic cases for health professional education: A pilot study leveraging Generative AI
High-quality, domain-specific datasets are foundational to advancing educational tools and AI systems in healthcare, yet assembling case repositories from real-world clinical records faces substantial privacy, ethical, and licensing barriers. Synthetic data generation offers a compelling pathway forward, but educational cases require rigorous validation to ensure clinical plausibility and pedagogical utility. This pilot study introduces PhysiCase, a dual-layer validation pipeline for synthetic case generation, and evaluates the feasibility of combining automated LLM-based screening with expert educator review. We generated 128 synthetic musculoskeletal (MSK) cases using four frontier large language models (GPT-4.1, GPT-4o, Google Gemini 2.5 Pro, and Llama 4 Scout) across 28 clinical conditions. Cases underwent automated quality screening using an "LLM-as-judge" framework (DeepEval) assessing prompt alignment, JSON correctness, answer relevance, bias, toxicity, and completeness. Ninety cases (70.3%) passed automated filtering and proceeded to expert evaluation by four MSK physiotherapy educators, who rated medical accuracy, realism, fidelity, relevance, and usability on 5-point Likert scales. GPT-4.1 demonstrated the highest automated pass rate (96%) and strongest expert ratings (medical accuracy 4.10/5, usability 4.38/5), while Llama 4 Scout showed the lowest pass rate (33.3%) and the lowest expert ratings. Expert-evaluated cases achieved strong content validity indices for usability (97.5%), relevance (97.5%), and realism (95%), though medical accuracy showed greater variance (CVI 87.5%). Cross-layer correlation analysis revealed that automated completeness metrics moderately aligned with expert usability ratings, while answer relevance and prompt alignment showed weak or negative correlations with clinical correctness. Qualitative analysis identified three primary failure modes: reductive logic, biomechanical inconsistency, and administrative/contextual gaps.
The dual-layer validation framework proved methodologically viable: automated screening efficiently reduced expert review burden, while human judgment remained indispensable for detecting subtle clinical reasoning failures. LLM-generated synthetic cases have the potential to meet practical educational needs for MSK physiotherapy, but expert validation is essential to safeguard clinical accuracy. These findings support a scalable division of labour for synthetic case development, with targeted improvements to prompting and automated reasoning checks needed to address identified "nuance gaps." The code for this paper is available at https://github.com/kwid-ai/PhysiCase
- Sensor Geometry, Not Signal Processing, Limits Opportunistic Detection of Capillary-Refill-Like Signals by Rule-Based and Language-Model Methods in Archived ICU Waveforms
Background. Capillary refill time is a resuscitation target in septic shock [1-4], but bedside measurement is examiner-dependent. An ICU monitor co-records a photoplethysmogram on the pulse oximeter and intermittent noninvasive blood pressure cuff cycles; if the probe and the cuff share a limb, each cycle is an unplanned vascular occlusion test on the distal microvascular bed. Standard practice places the two on opposite limbs. Objective. To measure how often, in MIMIC-IV-WDB v0.1.0, charted cuff cycles show the photoplethysmographic morphology expected of a same-limb cuff and probe, and to characterize the candidate capillary refill-like signal when that morphology is present. Methods. MIMIC-IV-WDB v0.1.0 [5] was linked to the MIMIC-IV clinical database [6]. A pre-registered rule-based detector identified candidate occlusion-reperfusion signatures on the 1-Hz perfusion-index envelope around each charted cuff timestamp. The primary endpoint was the proportion of cuff cycles suitable for analysis that were detector-positive at a 15-second reperfusion threshold, with 95% confidence intervals estimated by resampling patients at a fixed seed. A secondary analysis used a locally hosted multimodal language model (a Gemma-3 derivative on a non-device server) to adjudicate the same signature on perfusion-index plots; no MIMIC-IV-WDB content left the workstation. Results. Of 9,224 charted cuff cycles, 8,909 had a usable pulse-oximeter waveform, and 268 cycles in 15 patients (4.30% of the 6,236 cuff cycles suitable for analysis, 95% CI 2.60 to 6.03) met the primary 15-second threshold. The language model adjudicated the same cycles and called 1,367 of the 8,909 cycles with a usable waveform (15.34%) signature-present, roughly five times the detector's count. Because no laterality ground truth exists, agreement with a single blinded reader served as the comparator rather than accuracy.
The two methods were about equally concordant with the reader: precision was 0.25 (95% CI 0.14 to 0.39) for the detector and 0.24 (95% CI 0.10 to 0.35) for the language model, although reweighting to the full population of cycles with a usable waveform lowered the language model to 0.030 (95% CI 0.009 to 0.053). These estimates are reference-limited: a blinded re-read of a 150-card subsample showed only moderate intra-rater reliability (Cohen's κ 0.46 to 0.59) with systematic undercalling on the first pass, and rescoring against the corrected re-read roughly doubled precision for both methods. Conclusions. Opportunistic extraction of capillary refill-like signals from archived ICU pulse oximetry is limited in two distinct ways. First, sensor geometry limits how often the signal is recordable: cuff cycles rarely show the morphology expected of a same-limb cuff and probe pair, consistent with opposite-limb placement, so the bottleneck is geometry rather than signal processing. Second, the modest reliability of morphology adjudication limits how well any single flagged cycle can be confirmed: against a blinded reader the detector is a usable screen but a noisy confirmer, the reference is itself only moderately reliable, and the language model is no more concordant despite flagging many more cycles. The minority of cycles in which the morphology appears contain a candidate signal that may merit prospective study under controlled placement with laterality recorded.
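The headline proportions in this abstract can be recomputed from the counts it reports. The sketch below is only an arithmetic check on those published figures; the variable names and the `pct` helper are illustrative and are not taken from the study's code. Note that the two rates use different denominators: cycles suitable for analysis for the detector, and cycles with a usable waveform for the language model.

```python
def pct(numerator: int, denominator: int) -> float:
    """Return a percentage rounded to two decimals."""
    return round(100 * numerator / denominator, 2)


# Counts as reported in the abstract above.
detector_positive = 268          # cycles meeting the 15-second threshold
analyzable_cycles = 6236         # cuff cycles suitable for analysis
llm_positive = 1367              # cycles the language model called signature-present
usable_waveform_cycles = 8909    # cycles with a usable pulse-oximeter waveform

detector_rate = pct(detector_positive, analyzable_cycles)
llm_rate = pct(llm_positive, usable_waveform_cycles)
count_ratio = llm_positive / detector_positive

print(f"detector-positive rate: {detector_rate:.2f}%")
print(f"language-model-positive rate: {llm_rate:.2f}%")
print(f"language model / detector count ratio: {count_ratio:.1f}")
```

The ratio compares raw counts, which is how the abstract phrases "roughly five times"; comparing the two percentages instead would give a smaller factor because the denominators differ.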
- Technology acceptance of machine learning in life sciences: the role of hype perception and journal impact factor.
Machine learning (ML) has emerged as a transformative technology across biomedical and life science sectors, with applications spanning drug discovery, medical imaging, genomics, and clinical decision support (Goecks et al., 2020; Patel et al., 2020). Despite exponential growth in ML-related publications, from fewer than 100 articles in 2003 to nearly 25,000 by 2021 (NCBI, 2022), adoption among industry professionals remains uneven and sector-dependent. Understanding what drives or inhibits this adoption is critical for organisations seeking to leverage ML capabilities in research and clinical practice. Technology adoption in organisational contexts has been extensively studied through the Technology Acceptance Model (TAM), originally proposed by Davis (1989) and subsequently extended to incorporate external variables influencing perceived usefulness (PU) and perceived ease of use (PEU) (Venkatesh & Davis, 1996). While TAM has been applied across multiple industries, its application within biomedical and life science contexts remains limited, and the industry-specific factors that shape ML acceptance in this sector have not been systematically examined. Two external variables are particularly relevant to life science professionals. First, the bibliometric journal impact factor (JIF) functions as a cognitive signal of scientific credibility in a sector where evidence-based decision-making is culturally embedded and publication quality serves as a proxy for technological legitimacy (Garfield, 1996). Second, technology hype, operationalised through the Gartner Hype Cycle framework, represents a social influence variable that shapes organisational expectations and investment decisions around emerging technologies (Gartner Inc., 2018). Whether these variables influence ML acceptance among life science professionals, alongside individual knowledge and experience, has not been empirically tested.
This study addresses that gap by investigating ML technology acceptance among 213 biomedical and life science professionals across EMEA, LATAM, and North America, using a cross-sectional quantitative survey and PLS-SEM analysis. The TAM model is extended with three external variables, JIF, technology hype, and prior knowledge and experience, to test their influence on PU and PEU in this specific professional context. Additionally, the study examines demographic and regional differences in ML acceptance, with particular attention to variation between academic researchers and healthcare professionals. The findings contribute a validated, sector-specific extension of TAM for life sciences, provide actionable insights for organisations seeking to accelerate ML implementation, and establish a framework for future subsector-specific research.
- Characterizing Documented Psychosocial Stressors in Pediatric Psychiatric Emergencies with an Open-Weight Large Language Model
Objective: To evaluate whether a locally hosted open-weight large language model (LLM) can extract documented psychosocial factors from pediatric psychiatric intake notes and apply validated extraction to a large emergency psychiatry cohort. Materials and Methods: We identified emergency department presentations at Cincinnati Children's Hospital Medical Center from January 1, 2016, through December 31, 2024, among patients younger than 18 years with psychiatric billing diagnoses. Using full-text intake notes, gpt-oss:120b classified peer conflict, sleep disruption, and school-related academic, attendance, and disciplinary issues as detected, negated, or indeterminate. Four human raters independently reviewed 50 notes. We compared Fleiss' kappa among humans alone versus humans plus the LLM, assessed repeated-query stability across 50 independent calls per note, and applied the workflow to all eligible notes. Results: Among 37,315 eligible admissions, 22,284 had eligible intake notes; 22,270 produced parseable JSON. In detected-versus-not-detected coding, human-plus-LLM reliability did not differ significantly from human-only reliability across measures (human κ 0.71-0.94; human-plus-LLM κ 0.70-0.93). Stability was associated with human agreement: mean LLM-human agreement increased from 42.6% for classifications with less than 80% stability to 82.7% for classifications with 100% stability (Pearson r = 0.36). Full-cohort extraction showed frequent and overlapping documented factors: sleep disruption was most frequently detected (57.7%), followed by peer conflict (47.2%), academic issues (43.4%), disciplinary issues (43.3%), and attendance issues (16.9%). Discussion: Agreement varied by construct and was strongest when repeated model outputs were stable. Conclusion: Locally hosted open-weight LLMs can support scalable structured extraction of documented psychosocial factors from pediatric psychiatric intake notes after local validation.
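For readers unfamiliar with the agreement statistic named in this abstract, the following is a minimal sketch of Fleiss' kappa using only the standard formula. The `fleiss_kappa` function and the toy ratings are hypothetical illustrations, not the study's code or data; the three labels simply mirror the detected / negated / indeterminate scheme described above.

```python
from collections import Counter


def fleiss_kappa(ratings, categories):
    """Fleiss' kappa for a list of items, each rated by the same number of raters."""
    n_items = len(ratings)
    n_raters = len(ratings[0])
    counts = [Counter(item) for item in ratings]

    # Per-item agreement: share of rater pairs that chose the same category.
    per_item = [
        (sum(c[cat] ** 2 for cat in categories) - n_raters)
        / (n_raters * (n_raters - 1))
        for c in counts
    ]
    observed = sum(per_item) / n_items

    # Chance agreement from the overall category proportions.
    proportions = [
        sum(c[cat] for c in counts) / (n_items * n_raters) for cat in categories
    ]
    expected = sum(p ** 2 for p in proportions)
    return (observed - expected) / (1 - expected)


CATEGORIES = ["detected", "negated", "indeterminate"]

# Hypothetical toy ratings: 5 notes, 4 raters each (NOT data from the study).
toy_ratings = [
    ["detected", "detected", "detected", "detected"],
    ["detected", "detected", "negated", "detected"],
    ["negated", "negated", "negated", "negated"],
    ["indeterminate", "negated", "negated", "detected"],
    ["detected", "detected", "detected", "indeterminate"],
]
toy_kappa = fleiss_kappa(toy_ratings, CATEGORIES)

# Sanity check: complete agreement on every item yields kappa = 1.
perfect_kappa = fleiss_kappa([["detected"] * 4, ["negated"] * 4], CATEGORIES)
print(f"toy kappa: {toy_kappa:.3f}, perfect-agreement kappa: {perfect_kappa:.3f}")
```

Because the function takes the rater count from the data, the same sketch would apply to a four-rater (humans only) or five-rater (humans plus one model output) comparison, assuming every note has the same number of ratings.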
- Multimodal MRI Characterization of Nucleus Basalis of Meynert Degeneration: Structural Atrophy and Free-water Diffusion in Parkinson's Disease Cognitive Impairment
Background: Cognitive impairment in Parkinson's disease (PD) is linked to degeneration of the cholinergic basal forebrain, particularly cholinergic nucleus 4 (Ch4) in the nucleus basalis of Meynert. Structural and diffusion MRI separately detect this degeneration, but few studies have combined these modalities across the PD cognitive spectrum. Methods: We analyzed 92 participants: 14 healthy controls (HC), 35 PD with normal cognition (PD-NC), 33 with mild cognitive impairment (PD-MCI), and 10 with dementia (PDD). For Ch4 and cholinergic nuclei 1, 2, and 3 (Ch1-3) in the medial septal/diagonal band complex, we determined TIV-normalized gray matter density (GMD) and free-water (FW) fraction. We evaluated group differences, cognitive correlations, adjusted multivariable regression, and exploratory ROC discrimination. Results: Ch4 GMD was significantly lower in PDD compared to PD-MCI (p=0.007), PD-NC (p<0.001), and HC (p<0.001). Ch4 GMD was also lower in PD-MCI versus HC (p=0.028); the PD-MCI versus PD-NC difference was not significant after correction (p=0.074). Ch1-3 GMD was lower in PDD versus PD-NC (p=0.008) and HC (p=0.009). Ch4 and Ch1-3 FW were elevated in PDD versus all other groups (all p<0.01). Among PD patients (n=78), MoCA was positively correlated with Ch4 GMD (ρ=0.49) and Ch1-3 GMD (ρ=0.42) and negatively correlated with Ch4 FW (ρ=-0.51) and Ch1-3 FW (ρ=-0.40; all p<0.001). In the full four-metric model, Ch4 GMD and Ch4 FW were the only independent basal forebrain predictors (Ch4 GMD β=+2.04, p<0.001; Ch4 FW β=-1.46, p=0.005) of MoCA score. The combined Ch4 GMD + Ch4 FW model showed high discrimination for PDD versus non-demented PD (AUC=0.934; optimism-corrected AUC=0.925). Conclusions: Structural and free-water diffusion MRI provide complementary information about Ch4 degeneration in PD. The combined Ch4 model showed promising exploratory discrimination of PDD; validation in larger independent samples is needed.
- Acceptability and Perceptions of Artificial Intelligence in Organized Breast Cancer Screening: A Study of French Women
This study aims to assess women's perceptions of artificial intelligence (AI) used in breast cancer screening in France by examining their knowledge of AI and the barriers to their participation in organized screening. The results of a survey conducted in June 2025 among a national sample of 2000 women (aged 40-75) reveal limited participation and persistent concerns among women. Nevertheless, despite a low awareness of specific AI applications, a large majority of the women surveyed are very favorable to the use of AI in breast cancer diagnosis, even considering it a lever to increase screening participation.
- The impact of B1+ inhomogeneity on image quality metrics and morphometric statistical inferences at 7 T MRI
Introduction: Structural neuroimaging relies on T1-weighted (T1w) magnetic resonance imaging (MRI) for brain morphometry, yet at 7 Tesla (7 T) transmit field (B1+) inhomogeneity remains a major source of bias. Although Magnetization Prepared 2 Rapid Acquisition Gradient Echoes (MP2RAGE) improves the tissue contrast, residual B1+ effects may persist and may be exacerbated in aging or clinical populations, where anatomical and physiological factors further challenge image quality and preprocessing. The impact of B1+ inhomogeneity on automated quality assessment and morphometric statistical inference remains insufficiently understood. Methods: Submillimeter 7 T MP2RAGE brain acquisitions from carriers of a mitochondrial gene mutation (m.3243A>G) and controls were retrieved from previous studies. Image quality before and after B1+ inhomogeneity correction was assessed by multiple automated pipelines. Case-control morphometric studies, including regional volume and mean cortical thickness, were analyzed in both registration based and deep learning based segmentation frameworks. Changes in image quality metrics (IQMs) and morphometric statistical significance were evaluated to determine the impact of B1+ inhomogeneity correction. Results: Overall image quality rating and metrics sensitive to intensity non-uniformity and topological integrity consistently improved after B1+ inhomogeneity correction. However, its impact on morphometric statistical inferences was strongly method-dependent. Some pipelines showed redistribution of significant regions, whereas others predominantly demonstrated increased effects in sensitivity. Across methods, B1+ inhomogeneity correction altered the findings of morphometric analyses, particularly in cortical regions. Conclusion: Residual B1+ inhomogeneity at 7 T substantially influences both image quality control and morphometric evaluations. Current automated quality control approaches can hardly capture these effects reliably. 
B1+ inhomogeneity correction will not only improve intensity uniformity, but also change sensitivity of morphometric statistical inferences. To establish reliable morphometric biomarkers at UHF strengths, explicit B1+ correction and customized preprocessing are practically necessary and highly recommended.
- STELLAR: A flexible ensemble learning framework integrating rare variants to enhance polygenic risk prediction
Whole-exome and whole-genome sequencing technology has enabled the discovery of rare genetic variants associated with human health and diseases. However, existing statistical methods used for rare variant association testing are not well-suited for building genetic risk prediction models that jointly incorporate rare and common variants. We propose STELLAR, a flexible ensemble learning-based approach to compute rare variant polygenic risk scores (PRS) using association summary statistics to enhance conventional common variant PRS. Our method combines burden-based and penalty-based rare variant analysis and leverages functional annotation information to prioritize potentially causal variants within the prediction models. In simulation studies, PRS using STELLAR consistently showed the highest prediction accuracy compared to models using common variants alone or rare variant burdens. Applied to UK Biobank whole-exome sequencing data (n=310,831) across eight continuous and five binary traits, STELLAR significantly improved prediction accuracy, refined stratification of individuals at the highest genetic risk beyond common variants, and prioritized biologically relevant genes. STELLAR provides a scalable strategy to incorporate rare variants into PRS in addition to common variants, advancing precision risk prediction and enabling more comprehensive assessment of genetic contributions to complex diseases.
- Subthalamic DBS Engages Right-lateralized Frontal Control to Improve Gait Adaptation in Parkinson's
Adapting ongoing gait patterns to environmental challenges is essential for safe navigation through the environment. Impairment of gait adaptation is common in many neurodegenerative disorders, such as Parkinson's disease (PD), where it hampers mobility and limits quality of life. The neural control of gait adaptation remains largely unclear, thereby limiting the development of targeted treatments, such as deep brain stimulation of the subthalamic nucleus (STN-DBS). We integrated clinical, kinematic, brain metabolic imaging, and electrophysiological data, obtained during a fully immersive virtual reality overground walking task, to characterize the neural underpinnings of gait adaptation performance during dynamic obstacle avoidance and its improvement with STN-DBS. Movement kinematics, brain oscillatory activity, and metabolic activation were simultaneously acquired in 12 patients with PD during rest and gait adaptation, under active or paused STN-DBS, using inertial measurement units, electroencephalography, and three separate [18F]fluorodeoxyglucose positron emission tomography scans. Eight age-matched healthy subjects completed the same task for comparative kinematic analyses. All patients showed significant clinical improvement with STN-DBS. During the gait adaptation task with paused stimulation, patients exhibited increased metabolic activity in the cerebellum and sensorimotor cortex. Active STN-DBS selectively enhanced thalamic and superior frontal gyrus (SFG) metabolism, while concomitantly reducing cerebellar uptake. Right-lateralized SFG metabolism correlated with gait adaptation performance, with DBS-driven shifts toward greater right SFG activity predicting the magnitude of gait adaptation improvement. This correlation was independent of baseline asymmetry in clinical impairment, electrode placement, or structural connectivity to the SFG. Of note, STN-DBS amplitude asymmetry emerged as an independent predictor of right-lateralization of SFG metabolism. 
EEG recordings confirmed this lateralized network modulation, with theta-band asymmetry paralleling PET findings. Our findings identify a lateralized thalamo-cortical network supporting gait adaptation in PD and highlight a distinctive role for the SFG. We further show that effective STN-DBS acts as a lateralized regulator, dynamically rebalancing cortico-thalamic circuits to support context-appropriate gait control. The observed right-hemispheric lateralization may foster novel image-guided programming strategies to enhance the consistency and effectiveness of gait control in PD.
- Aperiodic and oscillatory activity of the human brain during induced emotional states
Normal emotional experience depends on dynamic modulation of neural excitability across limbic and prefrontal circuits, yet the spectral markers that reflect these shifts in humans remain incompletely understood. In this study, we combined a validated video-based emotion induction paradigm with stereotactic electroencephalography (SEEG) in 31 patients with drug-resistant epilepsy to investigate how positive and negative affective states modulate oscillatory and aperiodic (asynchronous) neural activity. Using spectral parameterization to dissociate oscillatory power from the aperiodic 1/f component, we found that emotional valence robustly altered the aperiodic slope in a regionally specific manner: negative valence flattened the slope in thalamus, posterior insula, and posterior cingulate cortex, whereas positive valence produced flattening in dorsolateral prefrontal cortex. Simultaneous oscillatory changes included increased high-frequency activity and decreased alpha/beta power during negative affect, and reduced alpha power during positive affect, which became apparent after adjusting for broadband aperiodic spectral shifts. These effects persisted after controlling for audiovisual stimulus or physiological features and were not evident in simultaneously recorded scalp EEG, underscoring their localization to intracranial sites. Together, these results provide the first direct evidence that active induction of emotional states modulates the aperiodic slope of human intracranial field potentials, reflecting valence-dependent shifts in local circuit excitability. The findings highlight the 1/f slope as a sensitive neural marker of affective brain states and of mood dysregulation.
- Claude Fable 5 and Claude Mythos 5
- Business Brief (June 9): Moonshot AI Said to Seek Funding at $30 Billion Valuation
Source: Caixin Global
- Trump Signs AI Security Order Requiring 30-Day Review of Frontier Models
Trump signs order for 30-day review of frontier AI models
- From Data Heterogeneity to Convergence: A Data-Centric Review of Federated Learning
Federated Learning (FL) has emerged as a promising solution for data hunger in centralized learning. This paradigm preserves privacy by enabling multiple clients to collaboratively train a shared-task model without exposing their local data. While being a key component in any learning system, data is also a ...
- In Defense of Information Leakage in Concept-based Models
Concept-based models (CMs), deep neural networks that ground their predictions on representations aligned with human-understandable concepts (e.g., "round", "stripes", etc.), have been shown to learn representations that leak concept-irrelevant information. As the traditional narrative goes, this le...
- Fingerprinting All AI Cluster I/O Without Mutually Trusted Processors
In preparation for potential international agreements on artificial intelligence, the development of verification infrastructure for AI data centres is vital. We propose a method for cryptographically committing all information entering and leaving a data centre: Hashes are computed by network taps ...
- Do LLMs Make Neural Distinguishers Wise?
Neural distinguishers are a cryptanalysis method for symmetric-key cryptography that trains machine learning models on pairs of plaintexts and ciphertexts with specific differences in order to recover a secret key. To the best of our knowledge, no existing work has explored the use of large language...
- MemVenom: Triggered Poisoning of Multimodal Memories in Web Agents
External memory has become a core component of modern web agents, enabling long-horizon reasoning through the retrieval of past experiences. However, this paradigm introduces a critical vulnerability: malicious content injected into memory can be persistently recalled and repeatedly influence agent ...
- Toward Secure LLM Agents: Threat Surfaces, Attacks, Defenses, and Evaluation
Large language model (LLM) agents are rapidly moving from conversational interfaces to software components that plan, invoke tools, maintain memory, and act on external environments. This transition changes the nature of security risk. In agentic settings, failures are no longer limited to unsafe te...
- Secure Aggregation with Top-K Sparsification in Decentralized Federated Learning
Secure aggregation is a vital component for mitigating gradient leakage in federated learning, but its communication cost conventionally scales with the gradient dimension. This becomes prohibitive for large models and even more pronounced in decentralized federated learning with limited bandwidth a...
- Comparative Analysis of Inference-Time Defense Methods for Multimodal Large Language Models
Multimodal large language models (MLLMs) now appear in safety-critical applications, but the visual channel leaves them open to adversarial attacks that predominantly text-oriented safety alignment addresses only in part. Retraining a model for each new vulnerability class is usually too expensive t...
- OpenPCC: Open and Confidential LLM Serving on Commodity TEEs
Generative AI applications such as personal AI agents, image generators, and chat assistants offer advanced capabilities to improve user experience. Behind the scenes, Large Language Models (LLMs) that power these services require a massive amount of computation and are usually deployed in the cloud...
- Profy: Interpretable Visualization of Expertise-Dependent Motor Skills Toward Supporting Piano Practice
The quality of piano performance depends on nuanced timing, articulation, and dynamic control, but practice feedback is often summary-based and hard to act on. We introduce Profy, a weakly supervised system that learns from take-level labels derived from aggregated listener ratings (expert-labeled v...
- Deploying Speech-Driven 3D Facial Animation in Unreal Engine for Production-Ready Digital Humans
Speech-driven 3D facial animation research has shown promising results, but most methods rely on representations that are not compatible with production pipelines. In this work, we present a deployable system that bridges this gap by enabling speech-driven 3D facial animation directly in Unreal Engi...
- Google's NotebookLM AI Note-Taking Assistant Just Got More Useful
Source: PCMag
- WWDC 2026: Apple's 7 Biggest AI Upgrades, Ranked
Source: PCMag UK
- X-Pilot AI
Turn Documents into Accurate Video Course Series
- Edith
The personal AI that is actually proactive
- Siri AI
Truly helpful AI that's centered around you.
- Better Stack | AI SRE Agent
AI agent with infrastructure knowledge for incidents.
- smallest.ai
Real-time voice AI built to scale
- MP3 to Text.org
Convert MP3 to text online instantly with AI.
- China prepares $295 billion plan to fund nationwide AI buildout, Bloomberg News reports
Source: Reuters
- Apollo, Blackstone lend $35B against AI chips, computing power
Source: PitchBook
- Apple bets on overdue Siri fix to close AI gap
The revamp, unveiled at its annual Worldwide Developers Conference in Cupertino, California, introduces "Siri AI," a more conversational assistant with a standalone app and the ability to analyze what is on a user's screen and pull in information from the web. The update comes two years after Apple first promised major upgrades that were repeatedly delayed.