AI News Archive: June 8, 2026 — Part 15
Sourced from 500+ daily AI sources, scored by relevance.
- Your U-Net Dereverberation Model is Secretly an RIR Encoder
In this work, we analyze the ability of NCSN++ U-Net based audio dereverberation models to capture global room characteristics in their intermediate representations. Through an empirical study of both a state-of-the-art diffusion-based model and a discriminative counterpart, we show that deeper laye...
- Factors affecting ASR performance: A study using state of the art ASR models in Indic Languages
ASR performance varies across languages, speakers, and recording conditions, yet systematic analysis for Indic languages remains limited. We present a large-scale study of decoded outputs from multiple open-source ASR models evaluated on diverse Indian speech datasets in zero-shot settings. We analyz...
- A Comparative Study of Pre-trained Speech Encoders and Training Objectives for Large-Scale Indic Spoken Language Identification
Spoken language identification (LID) for Indian languages is a challenging problem due to the large number of languages, significant phonetic overlap among related varieties, and the scarcity of labeled data for many low-resource languages. In this work, we present a systematic comparative study of ...
- HoliDubber: Holistic Video Dubbing for Complex Acoustic Scenes via Text-Guided Audio Synthesis
Video dubbing is a cornerstone of multimedia content creation, aiming to synthesize synchronized acoustic sequences for visual streams. While Text-to-Speech (TTS) and Text-to-Audio (TTA) generation have each achieved remarkable progress, existing dubbing systems remain confined to isolated speech sy...
- MeanVC 2: Robust Low-Latency Streaming Zero-Shot Voice Conversion
Streaming zero-shot voice conversion (VC) has become increasingly popular due to its potential for real-time applications. The recently proposed MeanVC achieves lightweight streaming zero-shot VC, but it has several limitations: its chunk-wise autoregressive denoising doubles the effective training ...
- BareWave: Waveform-Native Flow-Matching Text-to-Speech
Removing intermediate representations and separately trained decoding stages has become an important direction in generative modeling. In text-to-speech, however, high-quality systems are still commonly built through an intermediate acoustic representation before waveform synthesis. In this work, we...
- Few-shot Class-variable Incremental Audio Classification via Prototype Adaptation and Pseudo Class-variable Training
In the task of few-shot class-incremental audio classification, the number of classes is assumed to always increase without considering the possibility of decrease. However, the number of classes generally increases or decreases in practice. In this paper, we investigate a problem of Few-shot Class-...
- Popcorn: A Configurable Benchmark for Visual Evidence in Multimodal Movie Recommendation
Movies are long-form audiovisual works, yet recommender benchmarks often rely on trailers, thumbnails, or metadata. These sources differ in semantics and scalability: full movies preserve consumption-level evidence, trailers concentrate promotional highlights, and thumbnails provide sparse but catal...
- Closing the Indexing-Decoding Gap in Multimodal Generative Retrieval via Prefix Retention Optimization
Multimodal generative retrieval formulates multimodal retrieval as discrete identifier generation, eliminating the need for explicit similarity search over external embeddings. Existing approaches construct identifiers via residual quantization and decode them with trie-constrained beam search. This...
- Driving Video Retrieval for Complex Queries with Structured Grounding
Video retrieval at scale is central to data curation and safety validation in autonomous driving, where users want to find not only scenes but also dynamic events such as cut-ins and hard braking. Existing vision-language and keyword-based retrieval methods often miss these events because the releva...
- Teach Multimodal Recommendation Model to See via Personalized Visual Extraction and Adaptive Learning
Multimodal sequential recommendation (MSR) incorporates textual and visual information to improve recommendation quality. However, recent studies and our empirical analysis show that visual features are often underutilized, thereby contributing far less than textual signals. We attribute this issue ...
- EviProp: Seeded Relevance Diffusion on Chunk-Page Graphs for Long Multimodal Document Retrieval
Retrieving evidence pages from visually rich long documents is a key challenge in document question answering. Existing page-level visual retrievers operate under an independent matching paradigm: each page is scored in isolation based on query-page similarity. This paradigm can under-rank evidence ...
- Decoy-Calibrated Failure Audits for Language Models
Useful audits reveal not only how often a model fails, but also where its failures concentrate. An auditor may test many candidate explanations: long inputs, indirect questions, distracting evidence, or combinations of these factors. The risk is selection. The largest observed effect may reflect a r...
- White Matter Hyperintensity Burden Modifies the Association Between Atrial Fibrillation and Cerebral Microbleeds
Background: In atrial fibrillation (AF), cerebral microbleed (CMB) burden guides anticoagulation decisions, yet AF is itself inconsistently associated with CMBs, a paradox unexplained by frameworks that treat CMBs as a unitary marker of small vessel disease. We hypothesized that the white matter hyperintensity (WMH) context in which CMBs arise modifies their vascular meaning, and that this context-dependence underlies the inconsistent AF-CMB association. Methods: From a multicenter Korean stroke registry, we analyzed 5,735 first-ever ischemic stroke patients imaged at nine centers using susceptibility-weighted MRI. WMH volume and CMB count were extracted by validated deep learning pipelines. Patients were cross-classified by age-adjusted WMH residual (median split) and CMB count (2) into four groups. The AF-CMB association was estimated by multivariable logistic regression within each WMH stratum with formal interaction testing. Spatial CMB distribution was analyzed against the Automated Anatomical Labeling atlas. Results: In the full cohort (mean age 69.5 years; 57.7% male), AF was not associated with CMBs (OR 1.04; 95% CI 0.87-1.25). Stratification yielded divergent estimates: the adjusted AF OR was 1.46 (1.11-1.93; P = 0.007) in the WMH-low stratum and 0.95 (0.73-1.22; P = 0.665) in the WMH-high stratum, with significant interaction (OR 0.56; P < 0.001). The discordant phenotype (low WMH, high CMB; 8.9%) was enriched for AF (28.0%) and showed fronto-temporal cortical predominance with deep structure sparing. AF independently reduced the proportion of deep CMBs (IRR 0.80; P = 0.040). The interaction was preserved across prespecified sensitivity analyses. Conclusions: The AF-CMB association is confined to patients with low WMH burden relative to age and is accompanied by a topographically distinct CMB distribution. Clinical assessment of small vessel disease based on WMH alone may overlook a CMB phenotype linked to AF.
- Does ECG-Based AI Detect Aortic Stenosis Beyond Conventional LVH Criteria? An Analysis of the CLIDAS Database
Background: Aortic stenosis (AS) is a progressive valvular disease associated with poor prognosis once symptoms develop, yet routine echocardiographic screening is impractical. While artificial intelligence (AI)-based electrocardiogram (ECG) models have shown promise for AS detection, it remains unclear whether they primarily reflect conventional left ventricular hypertrophy (LVH) voltage criteria or capture additional ECG features. Methods and Results: We developed a deep learning model using 244,816 ECGs from 51,713 patients across six academic institutions in Japan (CLIDAS database). AS labels were derived from inpatient Diagnosis Procedure Combination (DPC) codes. The model achieved an area under the receiver operating characteristic curve (AUC) of 0.849 (95% confidence interval 0.832-0.865) in the independent test cohort, with consistent performance across institutions, sex, and age. At a threshold of 0.1, sensitivity was 79.1%, specificity was 73.9%, and negative predictive value (NPV) was 98.0%. Conventional LVH voltage criteria (Sokolow-Lyon AUC 0.706; Cornell AUC 0.692) showed lower performance, and adding them to the AI model conferred no incremental benefit (AUC 0.849 vs. 0.847). Gradient-weighted class activation mapping (Grad-CAM) revealed predominant attention around QRS complexes in limb leads, beyond regions typically assessed in LVH evaluation. Conclusions: This multicenter AI-ECG model demonstrated strong discrimination for AS and captured ECG features beyond conventional LVH voltage criteria. The high NPV supports its use as a rule-out pre-screening tool.
- Next-Generation Skin Cancer Detection Using Efficient Fuzzy Fusion of Genomic and Imaging Data
Skin cancer requires early detection for improved survival rates. Most existing methods rely on deep learning based image classification, which is affected by visual similarity among lesions. Fewer studies use Gene Expression (GE) analysis, which captures molecular characteristics but lacks structural and visual details. To overcome limitations of individual modalities, this paper proposes a multimodal framework integrating dermoscopic images and GE profiles for skin cancer classification. EfficientNet and logistic regression are used for image based analysis and genomic skin lesion profiling, respectively, followed by fuzzy rule based decision systems to reduce uncertainty within individual modalities. Finally, fuzzy fusion combines predictions from both modalities using uncertainty based weighting of classifier outputs. The experimental findings show that both the image based and GE based classification models individually achieved accuracies of nearly 92%. However, the integration of prediction results through the proposed fuzzy fusion strategy further enhanced the classification performance, achieving an overall accuracy of 94.25%. The results obtained outperform contemporary methods, highlighting the effectiveness of combining complementary multimodal information compared with single modality approaches.
- A Comparison of Manual and Automated Approaches to Developing Computable Algorithms for Identifying Acute Pancreatitis
Objective: Clinical phenotyping methods that rely on clinical and informatics expertise can be time-intensive and costly. We tested both manual and highly automated approaches using electronic health record (EHR) data to identify an FDA Sentinel Initiative health outcome of interest, acute pancreatitis. Materials and Methods: We trained and evaluated machine learning algorithms using EHR data with two approaches: a custom approach that included manually curated features and trained on outcomes data validated with medical record review, and a highly automated approach that greatly simplifies and automates feature engineering and relies on low-cost silver-standard outcomes for model training. Results: Custom algorithms using manually curated structured claims data discriminated cases from non-cases with a high degree of accuracy (cv-AUC 0.89 [95% CI 0.84-0.94]); the inclusion of natural language processing (NLP)-derived covariates from clinical notes increased performance slightly (cv-AUC 0.91 [95% CI 0.86-0.97]). The automated algorithm trained on the outcome count of diagnosis codes performed less well (AUC 0.80 [95% CI 0.75-0.85]) but improved using maximum lipase value as an outcome (AUC 0.88 [95% CI 0.84-0.92]). At a positive predictive value of 90%, the custom algorithm had a sensitivity of 92%, the automated algorithm trained on diagnosis code count had a sensitivity of 45%, and the automated algorithm trained on maximum lipase value had a sensitivity of 84%. However, a prediction rule derived by clinicians during chart review was nearly as accurate (maximum lipase value ≥ 3 times upper limit of normal; AUC 0.86, PPV 85%, sensitivity 92%). Discussion: Machine learning algorithms with manually curated structured data and NLP features trained on validated outcomes data successfully identified validated events. Use of an outcome in the automated model based on specific phenotype knowledge (maximum lipase value) allowed for performance similar to the custom model and with considerably fewer resources.
- Multiplexed temporal SWCNT biosensor combined with convolutional autoencoding identifies ALS-specific serum protein corona signatures
Amyotrophic lateral sclerosis (ALS) lacks a validated blood-based diagnostic, and the field is increasingly moving from single-molecule markers toward integrative, multi-component signatures. Here we present a liquid-biopsy strategy that transduces disease-dependent serum-nanoparticle interactions into a learnable near-infrared spectral phenotype. A sensor array of twelve DNA-functionalized single-walled carbon nanotube (SWCNT) chiralities, functionalized with (GT)6 ssDNA and coupled with a deep learning model, was tested on serum from 20 ALS patients and 19 age- and sex-matched controls (n = 39, TargetALS). Our multiplexed sensor design (12 SWCNT chiralities) and data acquisition strategy based on excitation-emission matrices acquired at three timepoints (0, 6, 24 h) was conceived to maximize sensor-carried information. Indeed, we show that the array generates partially independent temporal dynamics across chiralities governed primarily by tube diameter. To decode this multiplexed, time-resolved signal, we trained a dual-objective convolutional autoencoder that jointly optimizes reconstruction and classification, achieving 84.6% cross-validated accuracy (AUC = 0.87). Selected latent features were reproducible across an independent same-subject experimental batch and correlated with serum neurofilament light chain, linking the spectral phenotype to a clinically relevant neurodegeneration marker. Mass spectrometry supported a molecular basis for discrimination, revealing an ALS-biased protein corona enriched in adaptive-immune and inflammatory proteins. Together, these results establish proof of principle that time-resolved, multi-chirality SWCNT spectral sensing can compress complex serum composition into a reproducible near-infrared biomarker signature for ALS.
- Neonatal Brain Network Integration Trajectories Predict Neurodevelopment in Congenital Heart
Background: Infants with critical congenital heart disease (CHD) are at high risk for abnormal brain development and later neurodevelopmental impairment. We hypothesized that the trajectory of perioperative whole-brain network development would predict neurodevelopmental outcomes in early childhood. Methods: This prospective longitudinal cohort of neonates with critical CHD (n = 97) underwent preoperative and/or postoperative brain MRI with diffusion imaging. Whole-brain network measures were derived from structural connectomes. Neurodevelopment was assessed between 1 and 4 years using the Bayley Scales of Infant and Toddler Development. Results: White matter injury was associated with slower perioperative growth in global efficiency (p = 0.013), a measure of network integration, whereas cardiac physiology was not associated with network development. Infants with greater perioperative increases in global efficiency had higher cognitive (p = 0.001), language (p < 0.001), and motor (p = 0.008) scores. For each 1-standard deviation increase in the trajectory of global efficiency, cognitive scores increased by 8.2 points (95% CI, 3.64-12.78), independent of brain injury and socioeconomic factors. Conclusion: In infants with critical CHD, longitudinal whole-brain network development was associated with neurodevelopment across multiple domains. Early network development may represent a candidate biomarker of neurodevelopmental risk and resilience in this population.
- TACR3 variant confers resilience to aging and Alzheimer's disease
Background: While genetic factors strongly influence brain aging trajectories, variants conferring cognitive resilience remain poorly characterized. The neurokinin-3 receptor (NK3-R), encoded by Tachykinin Receptor 3 (TACR3), modulates cholinergic signaling in memory circuits vulnerable to aging. Previous studies linked the non-WT expression of the TACR3 variant rs2765 with cognitive decline and reduced volume of the hippocampus and basal forebrain, but systematic replication and mechanistic validation were lacking. Methods: We investigated rs2765 in the preregistered AgeGain cohort of cognitively healthy older adults (n=188) with independent validation in the ADNI cohort (n=809), which includes persons with and without Alzheimer's disease (AD) who show healthy cognition, mild cognitive impairment or dementia. Analyses integrated structural neuroimaging, longitudinal cognitive assessments, epigenetic aging (PhenoAge), genome-wide methylation profiling, and mechanistic validation through luciferase assays and cross-species protein expression studies. Results: The infrequent protective rs2765 WT variant, found in 12.8% of Europeans, conferred 49% slower cognitive decline (p = 0.002) for amyloid-positive individuals of the ADNI cohort and 3.7 years younger epigenetic age (p = 0.013, 95% CI: 0.79-6.67 years) in the cognitively healthy AgeGain cohort. WT carriers showed larger hippocampal and basal forebrain volumes across cohorts, with Allen Brain Atlas integration revealing these outcomes to occur exclusively in regions where TACR3 expression positively correlated with gray matter volume. Mechanistically, the non-WT variant ameliorated RBMX-mediated post-transcriptional regulation, reducing NK3-R protein expression by 25-40% in vitro and ex vivo murine brain slice models. Senescence-accelerated mice exhibited reduced endogenous NK3-R expression, phenocopying the predicted functional consequences of the variant. In AgeGain participants, genome-wide methylation profiling identified 2,313 differentially methylated CpGs affecting 228 pathways spanning glutamatergic signaling, acetylcholine receptor pathways, chromatin remodeling, and angiogenesis, suggesting coordinated molecular reprogramming from synaptic function to systemic aging. Conclusions: rs2765 WT confers resilience to age- and AD-related cognitive decline through RBMX-dependent regulation of NK3-R expression, with effects of remarkable size cascading from memory to systemic aging. rs2765 genotyping could stratify individuals for NK3-R modulator therapy (e.g., fezolinetant or senktides) and identify those maintaining function despite pathological burden, complementing APOE-based risk assessment in precision geromedicine.
- Convert Claude Code Project To Codex
Convert project from Claude Code to Codex with Claude Skill
- ProcoHQ
AI health intelligence — live longer, live better
- H2HMem: A Multimodal Memory Benchmark for Agents in Human-Human Interactions
Large language model agents are increasingly deployed in human-human interaction settings, such as meeting assistants and clinical documentation systems, where they must observe conversations and retain information for downstream queries. Unlike traditional human-assistant settings, these environmen...
- AbstRAG: Learning to Abstract for Retrieval Problems
Retrieval-augmented generation often fails when the query, the document evidence, and the user's intent are expressed at different levels of abstraction. A query may ask about a class, a relation, or an event, while the document only states specific instances, indirect framings, or scoped formulatio...
- As OpenAI files for IPO, Sam Altman’s eye-scanning company is doing layoffs, report says
Tools for Humanity, Sam Altman's identity verification company, is reportedly struggling to generate revenue and will downsize its staff.
- OpenAI files confidentially for IPO, following Anthropic
The filing comes a little more than a week after its main rival, Anthropic, also filed to go public, ramping up the race between the two AI firms.
- "Chat is dead": OpenAI preps overhaul of ChatGPT
OpenAI to recast hit chatbot as a route to higher-margin products before a potential IPO.
- OpenAI Confidentially Files for IPO on the Heels of SpaceX and Anthropic
The ChatGPT-maker announced it has filed paperwork to go public, just a week after rival Anthropic took the same step.
- OpenAI Files Confidentially for IPO
OpenAI, the maker of ChatGPT, filed confidentially for an IPO, joining artificial intelligence rivals in tapping public markets to fund ambitious growth plans. Bloomberg's Michael Hytha has more. (Source: Bloomberg)
- OpenAI Files Confidentially for IPO as Rivals Race to Market
OpenAI, the maker of ChatGPT, filed confidentially for an IPO, joining artificial intelligence rivals in tapping public markets to fund ambitious growth plans.
- OpenAI files for US IPO after Anthropic as AI giants head to public markets
(Source: Reuters)
- OpenAI confidentially files for IPO, prepping Wall Street for mega AI debut
OpenAI's confidential filing lands days before SpaceX is set to go public and a week after Anthropic announced its confidential disclosure with the SEC.
- OpenAI Files to Go Public in Test of Investor Appetite for Top AI Startups
The ChatGPT maker confidentially filed for an offering, but said ‘it may be a while’ before it goes public
- OpenAI Files for IPO
Plus, the Trump administration’s $100,000 H-1B visa fee is declared unlawful, and private racetracks zoom into view.
- OpenAI Says It Filed Paperwork To Go Public
OpenAI said it has not decided exactly when the IPO would take place.
- AI Companies Are Rapidly Expanding Into Each Other's Markets
(Source: Business Insider)
- OpenAI confidentially files for IPO, but says it 'may be a while' before it goes public
(Source: Business Insider)
- Sam Altman's eye-scanning startup is laying off employees
(Source: Business Insider)
- OpenAI Confidentially Files IPO Paperwork, Plans Separate Employee Share Sale
(Source: The Information)
- OpenAI files IPO paperwork
OpenAI said Monday it has confidentially filed draft paperwork for an IPO, giving itself the option to tap public markets — even as the company says its focus remains on building new AI products and infrastructure rather than preparing for a listing. Why it matters: The race is on between Anthropic and OpenAI to go public and tap investors for tens of billions of dollars. This is breaking news.
- OpenAI files to go public in major test of AI investment boom
The ChatGPT-maker filed confidentially for an IPO one week after its rival Anthropic.
- OpenAI files for IPO, joining Anthropic in a $2 trillion AI listing race
The ChatGPT maker, valued at $852 billion, says it may be a while before it goes public but wants the option to move sooner
- OpenAI confidentially files for initial public offering on US stock market
ChatGPT maker expected to be valued at more than $850bn, one of the most highly valued listings in market history. OpenAI has filed confidentially to go public on the US stock market, according to a company blogpost published on Monday. The artificial intelligence giant’s debut on Wall Street is expected to be one of the most highly valued listings in market history with a valuation at more than $850bn. “We recently submitted a confidential S-1. We expect it to leak so we’re just announcing it,” the company’s post reads. “We have not decided on timing yet; it may be a while because there are things we want to do that are likely easier as a private company. But it’s a complicated set of tradeoffs and this gives us the option to go public sooner if that ends up being best.”
- OpenAI files to go public in blockbuster Wall Street listing
ChatGPT creator has submitted paperwork for IPO expected to value company at more than $1tn
- OpenAI plans to go public, intensifying investment race with Anthropic
The company behind ChatGPT filed its plans one week after Anthropic did the same.
- OpenAI files for public float on Wall Street
(Source: The Telegraph)
- Public ownership of AI? US officials eye stake in tech revolution
From Washington to Brussels, the fight over who controls artificial intelligence is reaching a turning point, and history suggests the public has always had to wrestle essential technology from completely private control.
- OpenAI files IPO paperwork, eyes stock market debut
The AI company behind ChatGPT said it had no date yet for a listing, but it is joining rival Anthropic in the lineup for the stock market.
- Betting early on geospatial tech; OpenAI plans ChatGPT super app
(Source: YourStory.com)
- Washington wants a piece of OpenAI
PLUS: Find five prospects a day with this agentic framework