AI News Archive: June 3, 2026 — Part 20
Sourced from 500+ daily AI sources, scored by relevance.
- Pilea AI
Agents turn customer signals into prioritised tasks.
- WriteABookAI
The AI-Native Book Writing Platform
- Easy MCP AI - Claude ChatGPT Connector for Wordpress
The Most Complete End-to-End MCP for WordPress
- leania.ai
MRI scan for Businesses: Find What to Cut, Replace, Automate with AI
- Daivenci AI
Predict future babies and generate professional AI art.
- Leania.ai for AI consultants
Stand out and win more clients with personalised AI recommendations
- Integrating Histology with Spatial Molecular Programs Using a Multimodal Foundation Model
Histopathological assessment remains central to cancer diagnosis and stratification, yet its mechanistic interpretation remains limited without molecular context. To address this, we developed SQUALL, a multimodal foundation model integrating histology with spatial molecular programs. For pretraining, we assembled histMol, a large-scale corpus of 1.76 billion paired histology-spatial transcriptomics spots/bins across 33 tissues and 12 platforms from 3,446 tissue sections. Following pretraining, SQUALL enables transcriptome-wide virtual biomarker profiling, discovery of prognostically relevant spatial niches, and integrative disease progression modeling. Leveraging its multimodal embeddings, SQUALL identifies niches associated with tertiary lymphoid structure (TLS) maturation and ovarian cancer relapse, reconstructs molecular trajectories of breast cancer invasion across 325,112 spots, and uncovers underlying transcriptional programs. Applied to whole-slide images from 898 patients, SQUALL outperforms existing pathology foundation models in outcome prediction while enabling interpretable risk stratification. Together, these results establish spatially aligned multimodal pretraining as a new paradigm for extending molecular insights into pathology images.
- ViTAMIn-O: Democratizing computer vision-based machine learning for stem cell research
Deep Learning (DL) holds exciting potential in automating the prediction of organoid differentiation results. Nevertheless, current models lack adaptability, openness, and robustness in performance. Additionally, broad use of predictive models in wet-lab settings requires machine learning expertise, which is often not readily available in biologically oriented laboratories. To offer an intuitive solution, we present ColabViTAMIn-O, a code-free platform, together with ViTAMIn-O. ViTAMIn-O is a fully open organoid-specific DL model trained and tested on a total of 34 organoid categories, incorporating annotated images across transmitted light microscopy (TLM) modalities at single-organoid resolution. It is adaptable to downstream prediction tasks of varying dataset sizes and outperforms established models even with linear probing. It performs reliably within a few-shot framework and is even extensible to human embryo TLM imaging data at single-specimen level. By releasing our platform, centralized model hub, and datasets, we hope to encourage broader deployments of specialized DL models in stem-cell laboratories.
- Automated assessment of neonatal internal capsule maturation on T2-weighted MRI across 7T and 3T
Motivation: Quantitative assessment of neonatal internal capsule (IC) maturation remains largely reliant on qualitative visual evaluation, limiting objectivity and scalability. Approach: We developed a fully automated 3D deep learning framework for anatomically detailed segmentation of IC subregions and PLIC myelin-related signal from structural T2-weighted MRI, trained on both high-resolution 7T and conventional 3T neonatal datasets. Volumetric and intensity-based metrics were derived, and developmental trajectories were modelled using postmenstrual age (PMA) and postnatal age (PNA), with normative modelling used to quantify individual deviations. Results: The pipeline achieved high segmentation accuracy across field strengths (Dice > 0.95, relative volume difference < 5%). IC metrics showed robust age-related changes, with volumetric measures increasing and intensity-based measures decreasing with PMA. PNA effects indicated prematurity-related modulation at equivalent maturational age. These patterns generalized to 3T, where normative modelling revealed significant deviations in preterm infants, particularly for myelin-related intensity measures. Conclusion: Structural T2-weighted MRI, combined with anatomically informed segmentation, enables quantitative and biologically meaningful assessment of neonatal IC maturation. This provides a scalable framework for studying early white matter development and supports potential clinical translation.
- Simple cumulative weighting of routine surveillance data identifies epidemic wave origins more accurately than a large language model: evidence from eight COVID-19 waves in Japan
Identifying the origin of an emerging epidemic wave within days of onset could enable targeted response before national spread, yet current methods rely on genomic sequencing that lags clinical detection by 2-4 weeks. We analysed daily COVID-19 cases from Japan's 47 prefectures across eight waves (2020-2023), aggregated into 11 regional blocks. Wave onset was defined by the first difference of the K-value (K'). Six surveillance indicators were evaluated with and without cumulative historical weighting (λ = 0.75) and benchmarked against a large language model (Claude Haiku), scored by F1 against genomically confirmed origins. At 14 days after onset, cumulative weighting of peak and cumulative incidence (B1+prior, B3+prior) reached mean F1 = 0.622, exceeding the model (0.524); the gap was largest in Wave 7 (1.000 vs 0.333). Simple cumulative weighting of routine surveillance data identified wave origins more accurately than a language model, without proprietary tools or sequencing.
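The abstract does not spell out how the cumulative historical weighting is computed, so the following is only a minimal sketch under an assumed form: a wave that is k waves old contributes its indicator value with weight λ^k. The function names (`weighted_scores`, `top_blocks`, `f1`) and the block labels are hypothetical; only λ = 0.75 and the use of F1 against confirmed origins come from the text.

```python
from typing import Dict, List, Set

LAMBDA = 0.75  # decay factor quoted in the abstract


def weighted_scores(history: List[Dict[str, float]],
                    current: Dict[str, float],
                    lam: float = LAMBDA) -> Dict[str, float]:
    """Add decayed indicator values from past waves to the current wave.

    ASSUMPTION (not from the paper): a wave that is k waves old is
    weighted by lam**k. `history` is ordered oldest to newest.
    """
    scores = dict(current)
    for age, past in enumerate(reversed(history), start=1):
        for block, value in past.items():
            scores[block] = scores.get(block, 0.0) + (lam ** age) * value
    return scores


def top_blocks(scores: Dict[str, float], n: int = 1) -> Set[str]:
    """Return the n highest-scoring regional blocks as candidate origins."""
    return set(sorted(scores, key=scores.get, reverse=True)[:n])


def f1(predicted: Set[str], confirmed: Set[str]) -> float:
    """F1 between predicted origins and genomically confirmed origins."""
    true_pos = len(predicted & confirmed)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(predicted)
    recall = true_pos / len(confirmed)
    return 2 * precision * recall / (precision + recall)
```

With two past waves and one current wave, `top_blocks(weighted_scores(history, current))` yields a candidate set that `f1` can compare against the confirmed origin blocks for that wave.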
- Leveraging Digitization, Archiving and Artificial Intelligence to Re-examine Predictors of Sustained Mental Health Care Engagement in Ugandan First-Episode Psychosis Patients: A Study Protocol
Background: We previously examined the burden and predictors of sustained mental health care engagement in Ugandan first-episode psychosis patients by retrospective chart review methods. However, the extensive requirements of chart reviews meant that we could only extract data from a random 10% sample of 1677 newly enrolled Ugandan first-episode psychosis patients at Butabika National Referral Mental Hospital in 2018. The Hekima Platform has been designed to transform handwritten files into datasets for analysis. Objectives: This study aims to: (1) utilize the Hekima Platform to transform paper-based clinical charts of all 1677 Ugandan psychosis patients enrolled at Butabika Hospital for the first time in 2018 into a standardized, anonymized longitudinal database and (2) re-examine predictors of sustained MHC engagement in this cohort. Methods: We will digitize and archive all patient charts. We will then use the Hekima Platform to extract handwritten clinical data into machine-readable text using user-trained machine learning and deep learning models and natural language processing (NLP) techniques to generate a structured, anonymized database. A minimum 10% random sample of extracted data will be manually validated using Cohen's kappa. For the analytical aim, descriptive statistics, bivariate analysis, and multivariable logistic regression will model predictors of sustained engagement. Exploratory machine learning approaches will be used as a complementary analytical strategy. Ethical approval has been obtained from the Uganda National Council for Science and Technology and Butabika Hospital's Research Ethics Committee. Expected outcomes: Patient clinical charts are a rich data source, but there are extensive requirements to be able to use them for research. This study will generate the first AI-assisted standardized longitudinal database from handwritten psychiatric records in Uganda, enabling well-powered analyses of predictors of MHC engagement.
Findings will inform targeted interventions to improve retention in care and will offer a scalable model for mental health research in low- and middle-income countries.
- Interpretable machine learning for coeliac disease diagnosis: quantitative morphometry of duodenal biopsies
Background Coeliac disease affects approximately 1% of the global population and remains substantially underdiagnosed. Histopathological assessment of duodenal biopsies is the diagnostic gold standard but is subject to approximately 20% inter-observer disagreement. While machine learning approaches show promise, most prior work relies on black-box models with limited interpretability, restricting clinical adoption. Methods We present an interpretable pipeline that follows established histopathological criteria by extracting clinically meaningful morphological features from H&E-stained whole-slide images. Five sequential stages perform pre-processing, semantic segmentation of villi, crypts, intraepithelial lymphocytes (IELs) and enterocytes, crypt morphometry, villus length estimation via a novel polyline-based keypoint model, and coeliac disease classification using three quantitative features: IEL-to-enterocyte ratio, villus-to-crypt area ratio, and villus-length-to-crypt-depth ratio. Training and validation used data from four institutions; independent testing used 1,357 WSIs from two further institutions including one with a previously unseen scanner manufacturer, spanning five diagnostic categories: coeliac disease, normal mucosa, chronic inflammation, gastric metaplasia, and gastric heterotopia. Results Semantic segmentation achieved villus and crypt precision and recall of 87-90%. Villus length estimation correlated strongly with expert annotations (Pearson's r=0.85, mean relative error 13.5% post-calibration). All three morphological features significantly separated coeliac disease from all non-coeliac diagnostic groups across internal and external datasets (p<0.01 in all comparisons). On the test set the diagnostic classifier achieved accuracy 94.5%, PPV 92.9%, NPV 94.7%, and AUC 0.982. 
Conclusions This interpretable framework achieves strong multi-centre diagnostic performance while producing quantitative morphological outputs, villus length, crypt depth, and IEL-to-enterocyte ratios, that directly reflect established histopathological criteria, representing a meaningful step towards standardised AI-assisted coeliac disease diagnosis.
- Audited large language model triage for systematic review screening in national clinical guideline production: validation and prospective deployment
Title and abstract screening limit the timeliness of systematic reviews used for clinical guidelines. We evaluated audited large language model (LLM) triage at Sweden's National Board of Health and Welfare. Ten LLMs from five model families were tested on 419 Cochrane reviews comprising 26,892 records, and the selected ensemble was externally validated on 133 reviews including 8,501 records matched to planned guideline topics. The same locked model pair was then used prospectively across 24 systematic reviews in two national guideline programmes. On the 419-review selection benchmark, the selected Gemini-3-flash plus GPT-5.1 ensemble achieved 98.0% (95% CI, 97.3-98.7) mean review-level sensitivity, while topic-matched validation yielded 96.7% sensitivity (95% CI, 93.7-98.9). Prospective deployment screened 74,679 records, placed 63,858 (85.5%) in the AI-excluded pool and reduced estimated first-pass screening effort from 415 to 34 person-days. Across 600 randomly sampled AI-excluded records from the migraine and dementia programmes, none was confirmed as a final false negative after post-unblinding adjudication; across the completed 680-record audit, all 38 final retained records had been AI flagged, whereas locked blinded human consensus missed seven. These findings support locked, audited LLM triage, with human oversight and programme-specific monitoring, for systematic reviews used in national guidelines.
- Comfort with AI for HIV Prevention Among Cisgender Women in New York City
Background: Long-acting pre-exposure prophylaxis (PrEP) expands HIV prevention options for women. However, PrEP impact depends on addressing persistent gaps in awareness, access, and use. Artificial intelligence (AI) tools, including conversational agents, are being explored to advance PrEP uptake, but comfort with AI may influence their impact. Thus, we examined women's comfort with AI and its association with PrEP awareness. Methods: We analyzed self-reported data from women aged ≥18 years in a cross-sectional survey conducted in New York City from August 2023 to August 2024. We performed descriptive analyses, applied latent class analysis to identify AI knowledge/comfort profiles, and estimated unadjusted and adjusted odds ratios to assess associations between profile membership and PrEP awareness. Results: Among 306 respondents without a diagnosis of HIV who completed AI-related survey items, the median age was 36. Most women identified as Hispanic/Latina (60%) or Non-Hispanic Black (18%), had not completed college (53%), and spoke only English or were bilingual (81%). Latent class analysis identified four AI knowledge/comfort profiles that differed by PrEP awareness, race/ethnicity, borough, prior drug use, and technology utilization. Women with varied AI knowledge, broad AI discomfort, and comfort with clinicians maintaining privacy had lower odds of PrEP awareness (OR: 0.35, 95% CI: 0.16-0.75), but this association did not persist after statistical adjustment. Conclusions: PrEP awareness and AI knowledge were limited, yet many women expressed openness to AI-enabled tools when privacy was assured. AI-enabled HIV prevention tools should prioritize trust, transparency, confidentiality, and the lived contexts of the women they intend to serve.
- To RAG, or Not to RAG? A Comparative Evaluation of Retrieval-Augmented Generation for ICD Coding of German Tumor Diagnoses
Introduction Coding tumor diagnoses from free-text clinical documentation currently requires substantial manual effort. Promising approaches for automating this process include large language models (LLMs), embedding models, and retrieval-augmented generation (RAG). While previous studies often focus on a single method, we directly compare these approaches on a real-world dataset of tumor diagnosis descriptions to assess their strengths and limitations. Methods We evaluated nine different embedding models using similarity search and embedding-based classification, as well as LLM-based coding, with and without RAG, on a real-world dataset of 2,024 unique German tumor diagnosis descriptions labeled with ICD-10 and ICD-O topography codes. The retrieval knowledge base was constructed exclusively from standardized Alpha-ID, ICD-10-GM, and ICD-O-3 classifications. Performance was assessed for exact (full-code) and partial (three-character) code prediction. For RAG, we evaluated base and fine-tuned versions of Llama 3.1 8B and Llama 3.3 70B. Results Qwen3-Embedding-8B, the largest embedding model, yielded the best results. It achieved 47.8% exact-match and 72.1% partial-match accuracy for ICD-10 coding with classification, and 42.7% exact-match and 73.5% partial-match accuracy for ICD-O coding with similarity search. The other embedding models, including medically specialized ones, showed varied but lower performance. RAG improved base LLM performance and outperformed embedding-based approaches on partial-match accuracy (80.6% partial-match accuracy for ICD-10 and 75.0% for ICD-O with Llama 3.3 70B), but not on exact-match accuracy. Conclusion A direct comparison with embedding-based approaches is essential to determine whether the additional effort of RAG is justified. The strong variation in performance also highlights the importance of model selection.
Further advances in embedding-based methods, potentially supported by larger and more diverse training data, may offer a promising direction for future work.
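The distinction between exact (full-code) and partial (three-character) matching in the abstract above can be illustrated with a short sketch. The function name `match_accuracy` and the sample codes are hypothetical, and the normalisation step (trimming whitespace, upper-casing) is an assumption rather than something the study describes.

```python
from typing import List, Tuple


def match_accuracy(predicted: List[str], gold: List[str]) -> Tuple[float, float]:
    """Return (exact-match accuracy, three-character partial-match accuracy).

    A prediction counts as a partial match when its first three characters
    equal those of the gold code, e.g. "C50.9" vs "C50.4".
    """
    if len(predicted) != len(gold) or not gold:
        raise ValueError("predicted and gold must be non-empty and equal in length")
    # ASSUMPTION: codes are compared after trimming and upper-casing.
    pairs = [(p.strip().upper(), g.strip().upper()) for p, g in zip(predicted, gold)]
    exact = sum(p == g for p, g in pairs)
    partial = sum(p[:3] == g[:3] for p, g in pairs)
    return exact / len(pairs), partial / len(pairs)
```

Every exact match is also a partial match, so partial-match accuracy can never fall below exact-match accuracy on the same predictions, which is consistent with the gap between the two figures reported in the abstract.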
- Agentic Authoring of OMOP Concept Sets from Natural Language
Authoring OMOP concept sets from free-text descriptions remains a major bottleneck in scalable computable phenotyping for observational research. Existing tools support parts of this workflow but are designed primarily for interactive expert use rather than autonomous large language model (LLM) agents. We present an agentic framework that automatically generates OMOP concept sets by combining vocabulary tools, ontology extensions (RxClass, LOINC, and Disease Ontology), and procedural guidance. In ablation studies, the best configuration achieved Recall@100 of 0.965 and AP@100 of 0.875 on the development set. Cohort-level validation against OMOP-mapped EHR data yielded precision of 0.970, recall of 0.998, and a Jaccard index of 0.968. On an independent silver-standard benchmark of 457 concept-vocabulary pairs from 15 AD/ADRD target trial emulation studies, Recall@100 reached 0.835 and AP@100 reached 0.786. Task-specific tools outperformed unrestricted SQL access and PHOEBE 2.0, while progressive guidance performed best.
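For readers unfamiliar with the ranking and overlap metrics quoted above, here is a minimal sketch of Recall@k, AP@k, and the Jaccard index. AP@k has several definitions in the literature; this version normalises by min(number of relevant items, k), which is an assumption and may differ from the paper's. The function names and the sample concept IDs are hypothetical.

```python
from typing import List, Set


def recall_at_k(ranked: List[int], relevant: Set[int], k: int = 100) -> float:
    """Fraction of relevant concept IDs that appear in the top k results."""
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & relevant) / len(relevant)


def average_precision_at_k(ranked: List[int], relevant: Set[int], k: int = 100) -> float:
    """Mean of precision values at each rank where a relevant ID appears.

    ASSUMPTION: normalised by min(len(relevant), k); `ranked` has no duplicates.
    """
    hits = 0
    total = 0.0
    for rank, concept_id in enumerate(ranked[:k], start=1):
        if concept_id in relevant:
            hits += 1
            total += hits / rank
    denom = min(len(relevant), k)
    return total / denom if denom else 0.0


def jaccard(a: Set[int], b: Set[int]) -> float:
    """Size of the intersection divided by size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0
```

In the cohort-level validation described above, `jaccard` would be applied to two sets of patient identifiers (the cohort from the generated concept set and the reference cohort) rather than to concept IDs.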
- Medication-Wide Association Study of Alzheimer's Disease and Related Dementias: Identifying Drug Candidates from Electronic Health Records through Explainable AI
Objective: Alzheimer's disease (AD) is a leading cause of death and disability, and treatment options for Alzheimer's disease and related dementias (ADRD) remain limited. We applied a data-driven, mechanism-agnostic Medication-Wide Association Study Plus (MWAS+) framework to identify candidate medications associated with ADRD using longitudinal electronic health record data and explainable artificial intelligence (AI). Methods: We used Veterans Health Administration electronic health record data from January 1999 to May 2022. The initial study population comprised 8,424,715 Veterans aged 65 years or older. Cases were defined by ADRD-related diagnosis codes or ADRD-related medication prescriptions, and controls were free of ADRD diagnosis and ADRD-related medication use. After exclusions and matching on sex, race, age at first encounter, and duration of follow-up, the primary analytic cohort included 505,817 matched case-control pairs (1:1; 1,011,634 Veterans). Longitudinal features were extracted from historical data up to 1 year before the index date and aggregated into 1-year intervals. We developed an upgraded Hybrid Value-Aware Transformer (HVAT 2.0) to jointly learn from longitudinal and nonlongitudinal clinical data while incorporating numerical values associated with clinical concepts, including cumulative medication dose. To enhance interpretability, we applied a medication-specific impact score method to estimate model-derived associations between medication exposure and ADRD risk. Findings: The model demonstrated stable performance across data partitions, with area under the receiver operating characteristic curve values of 0.791 in the training set, 0.772 in the validation set, and 0.775 in the testing set. Metolazone and varenicline were identified as the top 2 candidate medications with negative impact scores, suggesting potentially protective associations with new-onset ADRD. 
The impact score was -0.196 per unit of cumulative dose for metolazone (1800 mg) and -0.134 per unit for varenicline (280 mg). Although individual-level impact scores varied, most exposed patients had negative scores, including 12,020 of 12,480 metolazone users (96%) and 8,341 of 8,786 varenicline users (95%). Implications: This study demonstrates the feasibility of combining a medication-wide association framework, longitudinal dose-aware modeling, and explainable AI to identify candidate medications for ADRD from real-world electronic health record data. The findings should be interpreted as signals for hypothesis generation rather than evidence of causality. This framework may support prioritization of repurposing candidates for expert review, follow-up cohort validation, and future clinical investigation.
- Prognostic performance of an AI-based recurrence risk model in clinically low-risk HR+/HER2- early breast cancer
Objective Accurate prognostication of recurrence risk in HR+/HER2- early breast cancer is central for therapeutic decision-making, including identifying patients who may safely avoid adjuvant systemic therapy. However, the performance of existing prognostic tools remains insufficient for effective clinical stratification, motivating the development of artificial intelligence (AI)-based methods to improve risk stratification. Methods Ataraxis Breast CTX (ATX) is a multi-modal AI test that integrates H&E-stained whole-slide images with clinicopathologic features to predict risk of recurrence for individual patients. This study aims to validate ATX in an external dataset enriched for clinically low-risk patients from Dordrecht, the Netherlands. ATX scores were generated for 892 women diagnosed with early HR+/HER2- breast cancer. Of the 892 patients, 299 did not receive adjuvant systemic therapy. The discriminative performance of ATX was assessed using C-index and its stratification ability was evaluated by log-rank tests comparing Kaplan-Meier survival curves across risk groups. Results ATX achieved a C-index of 0.71 and a 5-year time-dependent AUC of 0.71, demonstrating strong discrimination in predicting recurrence-free survival (RFS). Among 299 patients who received no adjuvant therapy, ATX achieved a C-index and time-dependent AUC of 0.78 and 0.81 respectively, suggesting ATX retains prognostic information in the absence of systemic therapy. ATX scores were used to stratify patients into risk groups using a pre-specified threshold, where 656 (74%) were classified as ATX low-risk and 236 (26%) were classified as high-risk. 
Notably, untreated and treated ATX low-risk patients had comparable 5-year RFS (untreated: 5-year RFS = 96%, 95% CI = 92-97%; treated: 5-year RFS = 96%, 95% CI = 93-97%) with near identical 10-year RFS (86%, 95% CI = 83-92% for both), suggesting ATX low-risk status may identify a subgroup with favorable prognosis independent of treatment exposure. Conclusion ATX provides robust prognostic stratification in an external cohort of clinically low-risk HR+/HER2- early breast cancer and identifies a subgroup of patients who did not receive systemic therapy with favorable observed outcomes. These results support prospective validation of ATX as a decision-support tool for adjuvant therapy de-escalation in HR+/HER2- early breast cancer.
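The C-index reported above measures how often a model ranks pairs of patients in the right order. Below is a minimal sketch of Harrell's concordance index for right-censored data; it is not the study's code, the function name `concordance_index` and the toy values are hypothetical, and pairs with tied follow-up times are simply skipped, which is a simplifying assumption.

```python
from typing import Sequence


def concordance_index(times: Sequence[float],
                      events: Sequence[bool],
                      risks: Sequence[float]) -> float:
    """Harrell's C: share of comparable pairs ordered correctly by risk.

    A pair (i, j) is comparable when patient i had an observed event and
    patient j was still under follow-up at a strictly later time.
    Tied risk scores count as half-concordant.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # censored patients cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which gives context for the 0.71 and 0.78 figures in the abstract.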
- Agentic Chart Review from Longitudinal Clinical Notes: a Lung Cancer Guideline Concordance Use Case
Clinical chart abstraction extracts structured patient variables from longitudinal clinical notes but is labor-intensive and difficult to scale. We evaluated LLM agents for question-guided chart review using lung cancer molecular testing guideline concordance as a use case. Two configurations were compared: (1) sequential note review using metadata and chronology, and (2) the same framework augmented with keyword-based note search. Gold-standard labels were established by human annotators. The search-enabled agent achieved higher accuracy (92.4% vs. 83.5%) and reduced errors by more than half (41 vs. 89) by retrieving evidence from long, heterogeneous note histories. In guideline concordance evaluation, most determinate patient-rule assessments were concordant (80.7%), while most apparent non-concordance reflected missing molecular testing documentation rather than documented care deviations. These results suggest tool-augmented LLM agents can approximate key aspects of human chart review and support scalable information extraction from longitudinal clinical documentation.
- Signal Quality Screening and Automated Sleep Stage Agreement in Home EEG: A Systematic Comparison of Dreamento and YASA on the Wearanize+ Dataset
Wearable EEG devices such as the Zmax headband offer scalable alternatives to laboratory polysomnography (PSG) for sleep monitoring, but their real-world performance in home settings remains poorly characterised. This study presents a systematic validation of automated sleep staging on the Wearanize+ dataset, a unique multimodal resource providing synchronised full PSG, bilateral Zmax EEG (F7-Fpz/F8-Fpz), and psychiatric phenotyping from 100 participants recorded at home. We first developed and applied an automated signal quality screening framework, revealing that 10% of recordings failed completely due to signal dropout and a further 16% showed partial degradation. We then evaluated two automated staging algorithms, Dreamento and YASA, against PSG manual scoring, stratified by signal quality. In technically adequate recordings (N=74), YASA achieved significantly higher agreement than Dreamento (mean κ=0.450 vs 0.371; Δκ=+0.079, p=0.0005), primarily through substantially improved N2 detection (recall: 0.64 vs 0.36). Both algorithms showed systematic N2/N3 boundary confusion, but in opposite directions: Dreamento over-called N3 (37% of N2 epochs mis-staged as N3), while YASA over-called N2 (35% of N3 epochs mis-staged as N2). Critically, Dreamento showed greater robustness than YASA in degraded-quality recordings (WARN group: κ=0.414 vs 0.330), consistent with its training on Zmax-specific data. Signal quality metrics did not predict staging performance within adequate recordings, indicating that channel topology is the primary limiting factor for frontal single-channel staging. These findings establish the Wearanize+ dataset as a benchmark for wearable sleep staging and motivate the use of PSG manual stage labels for downstream physiological analyses.
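The agreement figures above are Cohen's kappa values, which correct raw epoch-by-epoch agreement for the agreement expected by chance. A minimal sketch follows; the function name `cohens_kappa` and the toy hypnograms are hypothetical, and this is not the study's evaluation code.

```python
from collections import Counter
from typing import Sequence


def cohens_kappa(rater_a: Sequence[str], rater_b: Sequence[str]) -> float:
    """Cohen's kappa between two equal-length sequences of stage labels."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("sequences must be non-empty and equal in length")
    n = len(rater_a)
    # Observed agreement: share of epochs given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1.0 - expected)
```

For example, comparing a PSG hypnogram `["W", "N2", "N2", "N3", "REM", "N2"]` with an automated one `["W", "N2", "N3", "N3", "REM", "N1"]` gives raw agreement of 4/6 but a kappa of 17/29, roughly 0.59, because some of that agreement is expected by chance.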
- Trump’s EO Furthers Model Exclusivity, Harming Cyber Defenders
The EO could strengthen relations between model providers and Washington, D.C., but comes with some gaps.
- Trump’s AI order gives Washington a look at frontier models, but not much leverage
The most powerful AI models are now treated, at least in Washington, as potential national-security events. Before companies release them to the public, the government wants a chance to see what they can do: whether they can discover software vulnerabilities, assist cyberattacks, or otherwise introduce risks that federal officials may not fully understand until the models are already in use. President Trump’s new executive order, signed Tuesday, is meant to give the government that chance. But the final version leaves AI companies with considerable control over the process. It asks them to voluntarily submit advanced models for government review 30 days before public release, and it does not make release conditional on what agencies find. That is a softer framework than the White House had been considering just last month. A previous draft had mandated a 90-day window, which tech industry executives opposed. The president nearly signed the first version of the order, but after a phone call with former AI and crypto czar David Sacks, the EO was put on hold. During another White House meeting on Monday, Sacks again stressed that longer wait times would stifle domestic development of AI models. The approach drew predictable praise from free-market groups. “The administration deserves credit for recognizing that innovation, not precautionary regulation, is what made America the global leader in AI,” says Competitive Enterprise Institute fellow Wayne Crews. The EO is careful to note that the government assessment program is voluntary for AI companies, and that public release of new models is not conditional on the outcome of the assessments. Given the potential destructive power of new AI models such as Anthropic’s Mythos, the order puts the government in a limited role: close enough to review the systems, but not necessarily empowered to slow them down, some tech policy analysts observed.
Critics said the voluntary structure leaves too much power in the hands of the companies being reviewed. The consumer rights advocacy group Public Citizen called the arrangement a form of industry self-regulation, while the pro-regulation nonprofit Future of Life Institute argued that highly capable models such as Mythos require more than a “trust the companies” approach. “My impression is that it does not really establish the strong leadership that the federal government has traditionally had in terms of facilitating public-private partnerships and safeguarding responsibilities that have traditionally been left to the government like critical infrastructure,” Jessica Ji, senior research analyst at Georgetown’s Center for Security and Emerging Technology, tells Fast Company. The order does not prescribe a detailed testing regime. Instead, it sets up a framework and directs agencies to build the process. It calls on the National Security Agency and other security-focused agencies to co-design the model assessment framework and determine cyber-risk thresholds, especially around advanced cyber capabilities and what qualifies as a frontier model for the review regime. The Treasury Department will establish an AI cybersecurity clearinghouse to track the discovery and patching of software vulnerabilities exposed by new AI systems. Government agencies will use the 30 days for “cyber capability evaluations, adversarial testing, and national-security review” of large AI models, the EO states. The Commerce Department’s National Institute of Standards and Technology will play a key role, as will the Center for AI Standards and Innovation, formerly the AI Safety Institute, which already evaluates frontier models. Ji believes the influence of AI companies won’t end with the EO. “I’m personally very interested to see what this dynamic might look like in the future when it comes to who will lead on cybersecurity,” Ji says.
“Do the AI companies get to set the terms as they release models, especially with this kind of weakened 30-day voluntary commitment to give the government access ahead of time?” In practice, many AI companies have already begun creating their own versions of early access and pre-release testing. Anthropic gave access to its Mythos model to a modest group of software and cybersecurity partners, and on Tuesday extended access to 150 new partners in more than 15 countries. OpenAI gave early access to its latest GPT-5.5 model to almost 200 trusted partners under its own early testing program, and a cybersecurity-focused version of the model remains available only to trusted partners. Those company-led programs may give some outside experts a look at the most capable new systems before they are widely released. But they also underscore one of the central tensions raised by the EO: whether the government can build an independent assessment process when the companies control much of the access, infrastructure, and technical information needed to evaluate the models. It’s also unclear whether 30 days is enough time for the government to properly assess the risks of an advanced AI model. “It depends on capacity to do evaluations, and I think the organizations best positioned to do those evaluations are the companies themselves,” Ji says. “So obviously we have a bit of a transparency problem: There’s this huge information asymmetry between the companies and everybody else, including the government.” The government might also face challenges in finding the right AI research talent and compute resources, as well as in managing access to the models and working out the details of the partnership with AI companies, Ji says. “I think a month probably does not mean that testers will have 30 days hands-on with the model,” she says. “It might look more like two weeks after they work through all the paperwork. It’s hard to say whether 30 days is adequate.”
- Trump signs order designed to give government early look at powerful AI models
The Washington Post
- Anthropic files for US stock market debut after valuation surge
Computing UK
- Trump demands AI previews
Trump demands AI previews, Microsoft launches Scout personal assistant
- Vexavibes
Instant AI polls & surveys secured by a verifiable ledger
- Knack
Agent Skills Management
- Online Receipt Maker by FDM AI
Generate professional payment receipts instantly.
- React App
Cryptographic identity and post-quantum crypto for AI agents
- QUATTRO — Four tiny tools, one window
AI answers, tasks and calendar without leaving your flow
- AdSights Ads Framework
Programmatic ad production with Claude Code + Remotion
- TripAI
AI travel itineraries with interactive live maps in 30s
- Tarotool
Free AI tarot readings with clear 3-card insights
- Singify
Turn Any Text Into a Song with AI for FREE
- Palette Inspiration
Generate palettes inspired by 3,000 master painters
- Taxorio
Czech invoicing and VAT compliance made simple with AI
- Datailor Preference MCP
Give every AI agent your defaults
- Fiktion
AI Writing Workspace for Fiction Authors
- Pitch N Hire 2.0
The AI-Native ATS, Recruitment & Hiring Infrastructure
- VidGenn · Captions
AI-powered animated captions for videos in 38 styles
- Flowtrace
Watch and steer your AI agent as a live graph
- Rocketship
The only AI app builder with a built-in AI sales team
- FanQuiz
AI personality quizzes for creators, in any language.
- SpeakLearn
Speak English fluently with your personal AI tutor, Lea
- HowFast
The exact steps to do anything, powered by AI
- inbrowser.chat
Private AI chat that runs fully on-device in Chrome
- Strova
AI Standup Manager for Engineering Teams.
- Ones
One supplement, designed by AI from your blood & wearables
- Gen Pen by Obello
Refine designs and images faster with advanced AI editing
- Tubeup
Automate YouTube uploads — unlimited videos, AI Powered.