AI News Archive: June 5, 2026 — Part 15
Sourced from 500+ daily AI sources, scored by relevance.
- GitHub
AI agent with channels, SillyTavern cards, and sandboxing
- GitHub Scout
AI-powered codebase search & explorer for GitHub Repos
- lingua.direct
Real-time AI voice translator for live conversations
- AiToolsay
Free AI Tools (ChatGPT, Gemini, Grok, Claude & More)
- LumixAI
Free AI images and videos generator
- Prompt Optimizer
Turn basic ideas into elite Master Prompts instantly.
- VidPrompt
Where visual inspiration meets instant generative execution.
- AI Code Exposure Monitor
Live % of your code that AI assistants have seen
- FastVids
Free AI Video Generator – No Signup, Unlimited Use
- Free Gemini Omni
Create AI videos from text, images with Gemini Omni.
- 100% FREE AI News for you [AI Today]
Get AI news the way you want, when you want all for FREE
- SmarToken
One API for Chinese frontier AI models
- PhantomOps AI
AI Receptionist & WhatsApp Automation for Clinics
- Solar AI Quotation Maker
An AI tool that helps Pakistan buyers estimate solar system
- ProducerBuddy
From script to screen — AI film production in one tool
- AISkillfy
AI assistants can directly call skills.
- Pilotry
Run AI agents like a real workspace
- comP
Local-first code indexing for AI agents with MCP support
- Huntlo
AI-native Hiring OS for modern recruiting teams
- TotalMedia | Video Enhancer
AI-powered cloud video enhancement for creators.
- LLMBuddy - GEO Agency for B2B SaaS
Get your B2B SaaS brand cited by ChatGPT and Perplexity
- GenXEmpire YouTube Analyzer Pro
Easy YouTube automation with GenXEmpire
- SatsetUI
Build Websites, Resumes & Presentations with AI
- Youtube Transcription Tool
Get transcripts for any YouTube video
- AudioWave
Free online audio converter, editor & AI vocal extractor
- Edumo
Generate, distribute and track language learning materials
- GetFairview
The Operating Intelligence Platform for Modern Operators
- VibeHacking
Secure workflows for AI-powered vibe coding
- PromptKing
Copy, paste, and create: The ultimate prompt library.
- Sound Blog
This product converts voice notes into structured blog posts
- Summi.pro
AI website audits for online stores
- WhisperSub
Transcribe & Translate Video and Audio to Text — All Offline
- Capsule Wardrobe AI
AI try-on for outfits you can actually wear
- PallasAI
Answer Engine Optimization Platform
- MapRanker.ai
Track Google Maps rankings & audit your GBP with AI
- ChapterWave
100% Offline AI Audiobook Converter for Authors
- AIChangeHair
Change your hair color with AI
- GroundPound AI
A coordinated team of agents that runs your business.
- VideoUpscale.net
Upscale video to 4K in minutes.
- NextDoor AI
Pick a template. Make a picture.
- AI Natural Write
Transform AI text into undetectable human writing.
- Reverse Prompt | Video to Prompt
Turn images and videos into AI prompts.
- Robust Multi-Mutant Protein Stability Prediction from a Fine-Tuned Evolutionary Scale Model
Recently, high-throughput experimental techniques have propelled improvements in deep learning-based prediction of mutation effects on protein stability. However, leading stability predictors still struggle to predict the combined effect of multiple mutations and prefer mutations that negatively impact other properties, including expressibility. To mitigate these limitations, we apply Low-Rank Adaptation (LoRA) to specialize ESM3 for stability prediction by fine-tuning on the Megascale protease susceptibility dataset, developing a novel dual-perspective inference mechanism to provide explicit mutant context information. ESM-Mutant Stability Ranker (ESM-MSR) significantly exceeds all contemporary methods tested on the prioritization of stabilizing mutations (ΔNDCG@96 >= +0.12), double mutant ranking (Δρavg >= +0.068) and direct epistasis ranking (Δρavg >= +0.164) within the Megascale test set. Further, it generalizes effectively to heterogeneous thermostability benchmarks, consistently matching or exceeding current approaches across our comprehensive suite. Finally, a single parameter σ enables tunable control of the model's compromise between stability and more general sequence fitness, leading to state-of-the-art performance in the Human Domainome 1 benchmark (Δρavg = 0.573) at σ = 0.5, demonstrating the broad applicability of ESM-MSR as a protein engineering tool.
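For readers unfamiliar with the NDCG@k ranking metric cited in this abstract, here is a minimal sketch of the textbook definition. It assumes non-negative gain values and a standard log2 position discount; the abstract does not say how the authors convert stability measurements into gains, so the function names and example numbers are purely illustrative.

```python
import math


def dcg_at_k(gains, k):
    """Discounted cumulative gain over the first k items of an ordered list."""
    return sum(g / math.log2(pos + 2) for pos, g in enumerate(gains[:k]))


def ndcg_at_k(predicted_scores, true_gains, k):
    """NDCG@k: DCG of the model's ordering divided by DCG of the ideal ordering.

    Assumes true_gains are non-negative (hypothetical preprocessing step).
    """
    # Rank items by predicted score, highest first.
    order = sorted(range(len(predicted_scores)),
                   key=lambda i: predicted_scores[i], reverse=True)
    ranked_gains = [true_gains[i] for i in order]
    ideal_gains = sorted(true_gains, reverse=True)
    ideal = dcg_at_k(ideal_gains, k)
    return dcg_at_k(ranked_gains, k) / ideal if ideal > 0 else 0.0


# Illustrative values only: three mutations with made-up gains.
perfect = ndcg_at_k([0.9, 0.5, 0.1], [3.0, 2.0, 1.0], k=3)
reversed_order = ndcg_at_k([0.1, 0.5, 0.9], [3.0, 2.0, 1.0], k=3)
```

A ΔNDCG@96 figure such as the one reported would then be the difference between two models' NDCG@96 values on the same test set.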
- CarotidMamba: Foundation Model-Enabled CTA Phenotyping of Symptomatic Carotid Plaques in a Multi-Center Retrospective Study
Background: Treatment decisions for carotid atherosclerotic disease rely primarily on luminal stenosis, although plaque vulnerability and symptomatic status better reflect short-term cerebrovascular risk. A scalable CTA tool for automated phenotyping of symptomatic carotid disease is lacking. Materials & Methods: In this multi-institutional retrospective study, 689 patients (mean age, 67.9 ± 7.7 years; 366 men) from four hospitals were analyzed after screening 705 CTA examinations. 423 patients from one center were used for five-fold development and internal validation, and 266 patients from three centers for independent external validation. CarotidMamba, a deep learning framework combining dual foundation-model encoders with Mamba-based sequence modeling, was developed and benchmarked against clinical, radiomics, clinic-radiomics, CNN, and transformer comparators. Results: In the development cohort, CarotidMamba achieved an AUC of 0.839 (95% CI, 0.799-0.879) and accuracy of 0.825 (95% CI, 0.793-0.857), outperforming the strongest comparator by 0.066 and 0.050, respectively. External validation yielded AUCs of 0.897 (95% CI, 0.835-0.959) in YCH, 0.809 (95% CI, 0.720-0.898) in DCH, and 0.762 (95% CI, 0.649-0.875) in GH-NTC. CarotidMamba showed the lowest Brier score and expected calibration error across cohorts, with calibration slopes near 1.0. Conclusion: CarotidMamba provides an interpretable, clinically oriented, and externally validated CTA framework for phenotyping symptomatic carotid plaques, supporting vulnerability-aware imaging assessment beyond stenosis alone.
- EXHEART: A Fairness-Aware Explainable Stacked Ensemble for Cardiovascular Disease Classification with Cross-Instrument Disparity Attribution
Background: Machine learning models trained on population health surveys offer scalable tools for cardiovascular screening, but recurring methodological weaknesses undermine their credibility and equity: data leakage from synthetic oversampling, qualitative rather than quantitative explainability evaluation, and the absence of demographic fairness auditing at the clinical operating threshold. Methods: We present EXHEART, a leakage-free stacked ensemble pipeline trained on BRFSS 2015 (n = 253,680) and validated on BRFSS 2020 (n = 319,795; temporal transport and retrain) and a clinical cardiovascular examination dataset (n = 68,730). The pipeline combines XGBoost, LightGBM, Random Forest, and a multi-layer perceptron as base learners with 5-fold out-of-fold logistic regression stacking and Platt scaling calibration. A quantitative SHAP-LIME consistency framework, based on Kendall-tau rank correlation and Jaccard overlap, accompanies a decision-curve analysis, a subgroup-stratified SHAP interaction analysis, and an intersectional fairness audit (Sex x Age x Income) with threshold-shifting mitigation and a frontier of the fairness-utility trade-off. The framework also adds cross-instrument fairness-disparity attribution, an empirical diagnostic that provides evidence on whether an observed subgroup disparity is more consistent with a measurement-induced or a substantive explanation by re-validating it on a dataset that measures the same clinical construct objectively. On heart disease, this diagnostic associates 89% of the sex TPR gap (95% CI [0.65, 0.99]) with the self-reported survey outcome rather than with a substantive risk difference. Results: On BRFSS 2015, EXHEART achieves AUC-ROC = 0.850, AUPRC = 0.371, Brier score = 0.071, and reduces ECE by 96% (0.256 to 0.011) via Platt scaling. 
Global SHAP-LIME rank agreement is moderate-to-strong (Kendall-tau = 0.580, Spearman-rho = 0.818) with a substantial top-3 divergence (Jaccard@3 = 0.200), where Stroke flips from SHAP rank 8 to LIME rank 1. The Sex TPR gap is 0.124 at the screening threshold; intersectional Sex x Age disparities reach 0.649 among adequately-powered cells, 5.2x the single-attribute gap. Temporal transport to BRFSS 2020 collapses sensitivity from 0.776 to 0.267, while retraining restores AUC = 0.840 and ECE = 0.012. On clinical examination data, the Sex TPR gap collapses to 0.014; the attribution test indicates this gap is instrument-dependent, consistent with a measurement or outcome-definition explanation rather than a substantive risk difference. Cross-domain SHAP analysis identifies four instrument-independent CVD risk factors and two major portability failures. Conclusions: EXHEART combines three practices that population-scale cardiovascular classifiers usually apply in isolation: leakage-free training with calibrated probabilities, a test of whether the model's explanations are stable, and a fairness audit that examines intersecting subgroups rather than single attributes. Bringing them together proved worthwhile. The intersectional audit revealed disparities that single-attribute auditing missed, and the cross-instrument comparison indicated that much of the sex gap reflects how the outcome is measured in survey data rather than a substantive difference in risk. The temporal transport findings indicate that deployed BRFSS models warrant periodic monitoring and retraining to maintain clinical utility. EXHEART is a retrospective methodological evaluation on public de-identified data; it is not validated for direct clinical decision-making, diagnosis, or treatment recommendation without prospective clinical validation.
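Several figures in the EXHEART abstract can be checked with simple arithmetic. The sketch below shows how a Jaccard@3 of 0.200 arises (two top-3 lists sharing one feature out of five distinct ones) and recomputes the reported relative ECE reduction and the intersectional-to-single-attribute gap ratio. Apart from Stroke, the feature names are hypothetical placeholders, since the abstract does not list the rankings.

```python
def jaccard_at_k(ranking_a, ranking_b, k):
    """Jaccard overlap between the top-k items of two ranked feature lists."""
    top_a, top_b = set(ranking_a[:k]), set(ranking_b[:k])
    return len(top_a & top_b) / len(top_a | top_b)


# Hypothetical rankings: only "Stroke" is named in the abstract.
shap_ranking = ["feature_a", "feature_b", "feature_c", "Stroke"]
lime_ranking = ["Stroke", "feature_a", "feature_d", "feature_b"]

# One shared feature among five distinct top-3 features gives 1/5.
overlap = jaccard_at_k(shap_ranking, lime_ranking, k=3)

# Relative ECE reduction from the reported values (0.256 to 0.011).
ece_before, ece_after = 0.256, 0.011
ece_reduction_pct = round(100 * (ece_before - ece_after) / ece_before)

# Ratio of the intersectional gap (0.649) to the single-attribute gap (0.124).
gap_ratio = round(0.649 / 0.124, 1)
```

Under these inputs the overlap is 0.2, the ECE reduction rounds to 96%, and the gap ratio rounds to 5.2, matching the figures quoted in the abstract.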
- The Multimodal Anonymizer: a fully local multi-agent AI system for medical data deidentification
Background: Safe reuse of multimodal hospital data for AI development is limited by the absence of reliable, context-aware deidentification across multimodal data and longitudinal patient data. Existing approaches are largely modality-specific and can indiscriminately remove clinically important information. Methods: We developed the Multimodal Anonymizer, a modular, locally deployable multi-agent framework integrating multimodal large language models, task-specific neural networks and rule-based transformations. We evaluated 16 orchestrator model configurations on a benchmark built from publicly available data and hospital data from our institution. The benchmark dataset included data from different origins: 250 MIMIC-IV patients with synthetically injected personally identifiable information (PII) supplemented with head CT, face images, handwriting, audio, German clinical-text datasets and local data. Primary outcomes were deidentification sensitivity and preservation of clinically important content; secondary analyses examined model characteristics, reproducibility, and performance against leading market and open-source solutions. Results: The best local configuration (the orchestrator being Qwen3-VL-235B-A22B-Thinking) achieved near-complete deidentification across all datasets, with per-patient sensitivity of 98.80% (95%-CI 97.20; 100), and per-PII sensitivity of 99.82% (95%-CI 99.76; 99.88). Critical clinical preservation was 99.60% (95%-CI 98.80; 100) per-patient, and clinical preservation was 99.61% (95%-CI 99.51; 99.71) per-file. All modalities achieved at least 98.30% sensitivity (lower bound 95%-CI). On our local data, the system achieved a deidentification sensitivity of 100% per-patient and per-PII; and a critical clinical preservation of 100% per-patient as well as a clinical preservation of 99.97% (95%-CI 99.91; 100) per-file. 
When comparing orchestrators, the leading local models were similar to proprietary models (GPT-5.2) in deidentification sensitivity while showing higher deidentification specificity. The Multimodal Anonymizer outperformed previous tools on most modalities. Conclusion: Near-complete, utility-preserving deidentification of multimodal clinical data is achievable with a unified, locally deployable multi-agent system, enabling safer large-scale reuse of hospital data for research and AI development.
- Context-Dependent Age-Group performance hierarchies limit fairness interventions in PPG-based heart rate prediction
Background. Fairness-aware machine learning increasingly targets demographic performance disparities in clinical prediction, yet whether standard bias mitigation strategies genuinely improve equity in physiological signal analysis remains unclear. Age-based disparities in photoplethysmography (PPG)-based heart rate prediction present a particular challenge, as age-related performance differences may reflect context-dependent physiological structure rather than correctable artifacts. Methods. We evaluated three fairness interventions, inverse-frequency weighting (IF), Group Distributionally Robust Optimization (GroupDRO), and adversarial debiasing (ADV), applied via fine-tuning of a PPG foundation model across three clinical datasets spanning intensive care unit, laboratory, and consumer wearable contexts. Outcomes were assessed using a 2x2 framework classifying each intervention-dataset combination by the joint direction of change in mean absolute error (MAE) and fairness gap (FG) across age groups, yielding four outcome types: genuine improvement (G), leveling down (L), selective benefit (S), and both worse (W). Results. Across nine intra-domain conditions, no intervention simultaneously improved both MAE and FG (0/9 genuine improvement). The dominant pattern was leveling down (5/9): FG decreased but was accompanied by MAE degradation, indicating that apparent fairness gains were achieved at the cost of overall predictive performance. Age-group difficulty ordering varied across clinical contexts at baseline and was not preserved under intervention. In 18 cross-domain transfer conditions, genuine improvement was rare (4/18) and observed exclusively in non-MIMIC source configurations; models fine-tuned on MIMIC-sourced data yielded no genuine improvements (0/6). Embedding-level representation changes following fine-tuning did not reliably predict fairness outcomes. Conclusions. 
Age-based fairness interventions in PPG heart rate prediction indicate a leveling-down pattern rather than genuine equity improvement, suggesting that age-related performance gaps reflect context-dependent physiological structure not fully addressable through standard bias mitigation. Cross-domain transfer further amplifies this instability. These findings suggest that fairness evaluation frameworks for age-stratified physiological prediction should account for context-dependent performance structure rather than treating observed gaps as correctable bias.
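The 2x2 outcome framework described in this abstract can be sketched as a small classifier over the signs of the two changes. The abstract defines genuine improvement (both MAE and FG improve) and leveling down (FG decreases while MAE degrades); the mappings for selective benefit and the treatment of exactly zero change below are assumptions, and the example deltas are invented for illustration.

```python
from collections import Counter


def classify_outcome(delta_mae, delta_fg):
    """Classify one intervention-dataset pair by the direction of change.

    Negative deltas mean the quantity decreased (improved) relative to baseline.
    Zero change is treated as "not improved" (an assumption, not from the source).
    """
    mae_improved = delta_mae < 0
    fg_improved = delta_fg < 0
    if mae_improved and fg_improved:
        return "G"  # genuine improvement
    if fg_improved:
        return "L"  # leveling down: smaller gap, worse overall error
    if mae_improved:
        return "S"  # selective benefit (assumed: lower error, wider gap)
    return "W"      # both worse


# Hypothetical (delta_mae, delta_fg) pairs, not taken from the study.
example_deltas = [(0.4, -0.2), (-0.1, 0.3), (0.2, 0.1), (-0.3, -0.1)]
tally = Counter(classify_outcome(d_mae, d_fg) for d_mae, d_fg in example_deltas)
```

Tallying these labels across all conditions would yield summary counts of the kind reported, such as 0/9 genuine improvement and 5/9 leveling down.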
- Multimodal sleep stage classification and label-free abnormality scoring in mid-to-older adults
Background: Sleep fragmentation and reduced sleep efficiency are markers of disrupted sleep architecture linked to cognitive and age-related decline. Current assessments rely on subjective reports prone to recall bias, limiting their effectiveness for longitudinal monitoring. Data-driven analysis of sleep using physiological signals such as EEG and EMG remains underutilised, particularly in mid-to-older adults. Objective: We present a deep learning pipeline for automated sleep staging and label-free abnormality scoring, with the primary objective of quantifying deviations in sleep architecture to capture progressive sleep disruption and longitudinal change. Methods: Temporal and attention-based models were benchmarked using datasets from the National Sleep Research Resource and PhysioBank. To improve class-specific performance, we introduce a stacking-based ensemble of sleep stage classifiers, each trained to specialise in a different stage. For longitudinal scoring, we develop a reconstruction loss-based abnormality metric using a temporal convolutional autoencoder trained on hypnograms generated by the sleep staging models. Results: Attention-based models, particularly AttnSleep, achieved the highest performance in both multimodal and single-channel settings (accuracy: 0.85 and 0.83; F1: 0.79 and 0.74, respectively). The encoder-decoder ensemble model improved overall classification accuracy by 3% compared to the best-performing biased base classifier, with a modest gain in N1-stage F1 score (0.444). The proposed abnormality score correlated with Pittsburgh Sleep Quality Index components and showed sensitivity to synthetic hypnogram degradation, highlighting its potential as a label-free indicator of sleep disruption. Conclusion: Automated classification and annotation-free scoring enable an end-to-end multimodal pipeline that supports scalable, objective sleep health monitoring, with relevance for future clinical deployment.
- Quantifying Cancer Clinical Trial Eligibility Using Artificial Intelligence-Based Matching
PURPOSE: To develop and validate an artificial intelligence-enabled platform that converts unstructured cancer trial eligibility criteria into structured queries and quantifies trial eligibility across advanced/metastatic cancer trials. METHODS: We downloaded actively recruiting US interventional treatment trials for advanced/metastatic breast cancer, colon cancer, and non-small cell lung cancer from ClinicalTrials.gov. Medical oncologists created 24 synthetic patient vignettes. A large language model converted trial eligibility criteria into Structured Query Language (SQL) code and patient information into structured records, enabling automated matching. Cancer details and treatment history were considered, but not laboratory results or comorbidities. Validation included physician editing of generated eligibility code for 30 trials, and blinded physician eligibility assessment for five trials. We then evaluated how age, ECOG performance status, sex, and ZIP code affected the number of eligible trials. RESULTS: Of 833 candidate trials, 746 met inclusion criteria. In physician review of 30 trials, edits to generated SQL did not change any of 720 trial-patient eligibility determinations for 24 synthetic patients. In blinded validation across 120 trial-patient pairs, automated matching achieved 97% accuracy. Across synthetic patients, eligible trials ranged from 31 to 258 when there were no geographic restrictions. Eligibility decreased markedly with worse performance status and with geographic restriction (both p<0.001). Later-phase, randomized, and molecularly selective trials had fewer eligible patients. CONCLUSION: AI-based structuring of trial eligibility criteria can support accurate, scalable measurement of potential cancer trial eligibility. In this demonstration, performance status, geography, and age were major determinants of eligibility across the active metastatic trial landscape.
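The matching approach in this abstract, where eligibility criteria become SQL and patients become structured records, can be illustrated with a toy example. This is a minimal sketch using Python's built-in sqlite3 module: the table schema, trial identifiers, criteria, and patient rows are all hypothetical, and the study's actual schema and generated SQL are not described in the abstract.

```python
import sqlite3

# In-memory database holding structured patient records (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE patients ("
    "patient_id INTEGER PRIMARY KEY, cancer_type TEXT, ecog INTEGER, age INTEGER)"
)
conn.executemany(
    "INSERT INTO patients VALUES (?, ?, ?, ?)",
    [(1, "breast", 0, 54), (2, "breast", 2, 61), (3, "colon", 1, 70)],
)

# Each trial's eligibility criteria expressed as a SQL WHERE clause
# (stand-ins for what a language model might generate and a physician review).
trial_criteria = {
    "TRIAL-A": "cancer_type = 'breast' AND ecog <= 1",
    "TRIAL-B": "cancer_type IN ('breast', 'colon') AND ecog <= 2 AND age >= 18",
}


def eligible_patient_ids(connection, where_clause):
    """Return IDs of patients satisfying one trial's criteria.

    The clause is interpolated directly, so it must come from a trusted,
    reviewed source rather than from user input.
    """
    query = f"SELECT patient_id FROM patients WHERE {where_clause} ORDER BY patient_id"
    return [row[0] for row in connection.execute(query)]


def eligible_trial_counts(connection, criteria):
    """Count how many trials each patient is eligible for."""
    counts = {}
    for clause in criteria.values():
        for patient_id in eligible_patient_ids(connection, clause):
            counts[patient_id] = counts.get(patient_id, 0) + 1
    return counts


counts = eligible_trial_counts(conn, trial_criteria)
```

Counting eligible trials per patient in this way mirrors the per-patient eligibility ranges the abstract reports, here on a made-up three-patient, two-trial dataset.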
- We need to stop AI developing without humans, says Anthropic co-founder
Jack Clark tells BBC's Newsnight AI could get to the point where it develops without human input.