AI News Archive: June 11, 2026 — Part 22
Sourced from 500+ daily AI sources, scored by relevance.
- OmniCV
AI resume builder
- Savvy HRMS | HR & Payroll Software
AI-powered HR & payroll software for Indian businesses
- Averlon Solutions
Streamline, Automate, and Scale with Averlon
- RemovePK
Remove & Edit Image Backgrounds Instantly
- Ivorycom CRM
Ivorycom CRM · AI-Powered Platform CRM
- AISA - AI Skills Assessment
Measure real life AI skills through conversation.
- LLM-as-an-Investigator: Evidence-First Reasoning for Robust Interactive Problem Diagnosis
Large language models (LLMs) are increasingly used as interactive assistants for technical problem solving. However, when users provide incomplete descriptions or plausible but unverified explanations, LLMs may prematurely align with these assumptions and propose solutions before collecting sufficie...
- $α$-fair heterogeneous agent reinforcement learning
Cooperation in multi-agent systems is typically optimized through utilitarian objectives that maximize overall efficiency but fail to account for reward distribution, often resulting in inequitable "leader-follower" dynamics. While fairness-based approaches encourage pro-social behaviors where every...
- The Internet of Agentic AI: Communication, Coordination, and Collective Intelligence at Scale
The rapid emergence of autonomous AI agents is transforming artificial intelligence from isolated model inference into distributed systems of reasoning, communication, and action. This paper develops the vision of the Internet of Agentic AI (IoAI): an open ecosystem in which heterogeneous agents dis...
- Tuning Agent-Based Predator-Prey Models Toward Lotka-Volterra Dynamics
Recent growth in compute power has made it increasingly feasible to use large-scale agent-based models to simulate complex adaptive systems. A central difficulty is that such models contain many local rules and parameters, where small changes can lead to runaway behaviour, population collapse, or sa...
- See What I See, Know What I Think: Dense Latent Communication Across Heterogeneous Agents
Multi-agent systems communicate mostly through text, paying a lossy and expensive decode and re-encode cost. KV-cache communication is a promising alternative, yet most prior work is homogeneous, using duplicate copies of the same model, and avoids the central challenge of cross-model latent alignme...
- The Illusion of Multi-Agent Advantage
Prevailing wisdom posits that Multi-Agent Systems (MAS) are superior to Single-Agent Systems (SAS), citing advantages like context protection, parallel processing and distributed decision-making. However, empirical support for this claim relies primarily on comparisons with SAS baselines using bench...
- Exploring How Agent Voice Accents Shape Human-AI Collaboration in K-12 Group Learning
Collaboration is widely recognized as a cornerstone of 21st-century education, yet teachers still encounter persistent challenges in fostering productive peer interaction. LLM conversational peer agents introduce new possibilities for mediating in-person group work, raising questions about how perso...
- "Is This Not Enough?": Asymmetries in Institutional Accountability and Collective Sensemaking in the Case of Canada's Algorithmic Visa Triage System
This paper examines how algorithmic accountability in Canada's visa system is articulated institutionally and experienced by applicants across borders. We analyzed Immigration, Refugees and Citizenship Canada (IRCC)'s Algorithmic Impact Assessment (AIA) for the temporary resident visa (TRV) triage s...
- Fault Lines: Navigating Ethics and Responsible AI Where National Policy Meets Local Practice in Public Sector Transformation
The UK government has adopted a pro-AI stance to help transform public service delivery in the face of severe financial pressures, but the path to translate this vision into responsible AI practice remains ill-defined. While UK policy is often set at the national level, local authorities are respons...
- From Prompts to Preferences: An Open-Source Platform for Generative AI-Enhanced Conjoint Analysis
Conjoint analysis is a widely used preference measurement method in marketing research, political science, healthcare, and human-computer interaction. Despite broad adoption, researchers without access to commercial platforms face significant barriers, as existing tools are either expensive or lack ...
- The Emergence of Autonomous Penetration Capabilities in Large Language Model-Powered AI Systems
Nowadays, the autonomous execution of cyberattacks capable of causing substantial real-world harm is widely regarded as one of the critical red lines that frontier AI systems must not cross. Within this broader red-line scenario, autonomous penetration represents a core enabling capability and subta...
- PolicyGuard: Towards Test-time and Step-level Adversary Defense for Reinforcement Learning Agent
While real-world applications of reinforcement learning (RL) are becoming increasingly popular, the security of RL systems deserves more attention and exploration. In particular, recent work has revealed that RL agents are vulnerable to backdoor attacks, where a victim agent behaves normally under st...
- A Privacy-Preserving Framework Using Remote Data Science for Inter-Institutional Student Retention Prediction
This study explores privacy-preserving machine learning (PPML) techniques using the PySyft platform to enable collaborative prediction of student retention between institutions. We developed a remote data science (RDS) framework with a semi-air-gapped architecture consisting of high-side and low-sid...
- Detecting Functional Memorization in Code Language Models
Large language models (LLMs) are increasingly used to generate code at scale. Meanwhile, prior work has investigated whether training data may be recoverable from model outputs, by auditing the textual overlap between training examples and model generations. Code, however, can be functionally equiva...
- Differentially Private Hierarchical Heavy Hitters
The task of finding _Hierarchical_ Heavy Hitters (HHH) was introduced by Cormode et al. [VLDB 2003] as a generalisation of the heavy hitter problem. While finding HHH in data streams has been studied extensively, the question of releasing HHH when the underlying data is private remains unexplored. I...
- Efficient, Robust, and Anti-Collusion Fingerprinting of Image Diffusion Models
Model fingerprinting, embedding user-specific identifiers (fingerprints) into generated outputs, has recently emerged as a popular solution to protect the intellectual property rights (IPR) of generative text-to-image (T2I) models and prevent unauthorized redistribution. In this work, we reveal a pr...
- ViPER: Vision-based Packing-Aware Encoder for Robust Malware Detection
Visualization-based malware detection maps raw binary bytes to grayscale images and applies learned visual classifiers, providing an evasion-resistant and disassembly-free alternative to conventional analysis pipelines. However, executable packing remains a critical failure mode: packed binaries pro...
- MAStrike: Shapley-Guided Collusive Red-Teaming on Multi-Agent Systems
Hierarchical multi-agent systems (MAS) are rapidly being deployed in high-stakes workflows across domains such as finance and software engineering. In these systems, safety and security are inherently distributed across role-specialized agents, significantly expanding the attack surface, particularl...
- Semantic Identification of IoT Devices from Behavioral Primitives
Accurate identification of IoT devices is important for security management and policy enforcement. Existing approaches typically learn device signatures from packets or flow records. These methods operate on low-level communication observations whose traffic patterns may vary across deployments, so...
- The Rise of AI-Native Software Engineering: Implications for Practice, Education, and the Future Workforce
Generative Artificial Intelligence (GenAI), Large Language Models (LLMs), and emerging Agentic AI constitute the most disruptive transformation in the history of software engineering (SE), reshaping development processes, required competencies, professional roles, and the educational outcomes that u...
- Beyond Problem Solving: UOJ-Bench for Evaluating Code Generation, Hacking, and Repair in Competitive Programming
Despite strong performance in competitive programming, the role of Large Language Models (LLMs) in supporting human learning in the same setting remains largely unexplored. In this work, we introduce UOJ-Bench, a benchmark designed to evaluate not only the problem-solving ability of LLMs, but also t...
- Mining Architectural Quality Under Agentic AI Adoption: A Causal Study of Java Repositories
AI coding tools are now used by a majority of developers, and agentic use of these tools has popularized the practice colloquially called "vibe coding". Yet causal evidence on their effect on software architecture is scarce. Prior causal work has measured code-level outcomes (complexity, static anal...
- The End of Code Review: Coding Agents Supersede Human Inspection
Code review has been the primary quality gate in software development since Fagan formalised code inspection in 1976. For five decades, having a human examine and comment on a colleague's changes before merge has been a cornerstone practice at organisations of every size. Coding agents are large lan...
- Generating Training Targets for Real-World Speech Enhancement via Close-to-Distant Microphone Projection
Training neural networks (NNs) for speech enhancement (SE) in distant speech-capturing scenarios requires paired distorted and clean reference speech signals. While such data are often generated through simulation, the mismatch between simulated and real recordings significantly limits SE accuracy. ...
- Balancing ASR and diarization in end-to-end LLMs for multi-talker speech recognition
Multi-talker speech recognition is often addressed by combining automatic speech recognition (ASR) and speaker diarization in a pipeline system. Recently, LLM-based approaches have shown promise by jointly modeling semantic and speaker information, but they typically require large-scale multi-talker...
- Endpoint Anticipation for Low-Latency Spoken Dialogue
While low-latency interaction is critical for spoken dialogue, cascaded architectures are often bottlenecked by reactive turn-completion detection. We propose Endpoint Anticipation, shifting from reactive detection to proactive forecasting of end-of-turn signals. Our speech-based model anticipates e...
- OneRetrieval: Unifying Multi-Branch E-commerce Retrieval with an Editable Generative Model
Industrial e-commerce search serves hundreds of millions of items through a multi-branch retrieval stage fused by hand-tuned merging without joint optimization. Generative retrieval (GR) raises the prospect of collapsing this stage into a single model, yet unification is gated by more than retrieval...
- CQC-RAG: Robust Retrieval-Augmented Generation via Cross-Query Consistency
Retrieval-Augmented Generation (RAG) has become a common approach for improving the factuality of Large Language Models (LLMs), yet its reliability remains highly sensitive to how external evidence is retrieved and used. Semantically equivalent queries with different syntactic forms may lead to diff...
- CFALR: Collaborative Filtering-Augmented Large Language Model for Personalized Fashion Outfit Recommendation
Personalized outfit recommendation poses a significant challenge in e-commerce and social media platforms, requiring systems that balance user preferences with aesthetic compatibility. Collaborative filtering (CF) provides a traditional solution for this, but it struggles with data-sparse scenarios ...
- How Fine-Grained Should a RAG Benchmark Be? A Hierarchical Framework for Synthetic Question Generation
Evaluating retrieval-augmented generation (RAG) systems requires benchmarks that capture diverse question characteristics, yet practitioners lack empirical guidance on which dimensions to vary and at what granularity. We present HieraRAG, a hierarchical framework for studying granularity in RAG benc...
- CoDeR: Local Constraint-Compatible Retrieval Beyond Semantic Similarity
Information retrieval systems have long treated semantic similarity as a proxy for relevance. For constraint-sensitive queries, this proxy can fail when a document is topically close to the query but supports the opposite constraint direction, such as satisfying an attribute that should be excluded ...
- The Clustering Strikes Back: Building Cost-Effective and High-Performance ANNS at Scale with Helmsman
RedNote (a.k.a., Xiaohongshu, a global-scale social network platform) widely adopts approximate nearest neighbor search (ANNS) to power its search, recommendation, and advertising services. Due to the demanding Service Level Agreements (SLAs), we have to rely on in-memory graph-based ANNS (i.e., HNS...
- Multimodal PET Defines a Goldilocks Thermal Window for Focused Ultrasound Ablation and Immunotherapy Combinations
Background: Thermally ablative focused ultrasound (T-FUS) offers a noninvasive, spatially precise strategy for local tumor destruction, with the added potential to remodel tumor architecture and immune dynamics in ways that influence downstream therapeutic delivery and efficacy. Despite promising preclinical and clinical findings, the T-FUS parameters that best balance tumor debulking with preservation of local penetrance of biologics (e.g., immunotherapy) remain unclear. Thermal dose, defined by the relationship between tissue heating, exposure duration, and biological effect, is likely a critical determinant of this balance. Excessive thermal dose may eliminate the vascular and stromal features needed to support immunotherapy access, whereas insufficient thermal dose may fail to achieve meaningful cytoreduction. Here, we deploy multimodal PET, contrast-enhanced ultrasound, and tissue profiling to define a Goldilocks Zone for T-FUS that balances bulk tumor destruction with immunotherapy delivery. Methods: Subtotal T-FUS was applied to 4T1 tumors using three thermal dose regimens resolved by in silico modeling. Ablation was quantified by H&E and TTC staining. Post-ablative perfusion and microvascular coverage were assessed by contrast-enhanced ultrasound and immunofluorescence, respectively. Tumor oxygenation was measured by intravenous hypoxyprobe labeling. After T-FUS, mice underwent dynamic [18F]-FDG PET and immunoPET with a model tumor-targeted antibody, [89Zr]-anti-CD47, to relate cytoreduction to antibody penetrance. ImmunoPET findings were further evaluated by ex vivo biodistribution analysis. Results: In silico modeling established three T-FUS regimens that generated distinct thermal dose profiles and were deployed in vivo in a solid breast tumor model. Histopathology, perfusion imaging, and hypoxia analysis revealed dose-dependent and dose-divergent biological effects that informed a candidate Goldilocks thermal window. 
Low thermal dose produced measurable but limited tumor debulking, whereas high thermal dose caused disproportionate functional perfusion collapse. An intermediate thermal dose achieved robust partial ablation, broad hypoxia relief, and preservation of residual tumor physiology sufficient to support antibody access. Dynamic [18F]-FDG PET confirmed a marked reduction in metabolically active tumor burden after Goldilocks T-FUS. Serial [89Zr]-anti-CD47 immunoPET showed that bulk antibody signal was maintained after ablation, and integration of immunoPET with matched [18F]-FDG PET revealed approximately 3-fold enrichment of antibody exposure within the residual viable tumor compartment of ablated tumors. These findings demonstrate that appropriately tuned thermal ablation can debulk tumor while preserving, and potentially concentrating, immunotherapy access within the remaining targetable tumor niche. Conclusion: This study identifies thermal dose as a critical consideration for T-FUS immunotherapy combinations and establishes a PET-informed framework for balancing cytoreduction with therapeutic delivery. Rather than functioning solely as a local debulking modality, we demonstrate that T-FUS can be tuned to yield a post-ablation tumor state that remains accessible to large biologics. These findings provide timely, translationally relevant guidance for tailoring T-FUS regimens to achieve local tumor destruction while preserving an immunotherapy-permissive niche for combination treatment.
- Controlling metal-carbonate phase, form, and function through de novo protein design
Biomineralization enables living systems to construct hybrid materials by controlling the location, orientation, and polymorph of inorganic crystals with proteins and other biomolecules. Despite decades of study, the molecular principles underlying these processes remain difficult to harness in engineered materials, in part because native biomineralization proteins are often intrinsically disordered, heterogeneous, or insoluble. Here we show that de novo designed protein interfaces can be assembled into reconfigurable two-dimensional arrays which template calcite nanocrystals. By fine-tuning RFdiffusion2 on repeat protein scaffolds, we further enable the design of protein architectures which selectively form aragonite, a metastable polymorph of calcium carbonate, in nucleation conditions that otherwise result in a mixture of phases. Extending beyond inorganics found in biological systems, we show that lattice-matched protein designs template cobalt carbonate formation: a flat helical repeat protein interface promotes unconfined growth, whereas soluble D3 cage assemblies yield more homogenous cobalt carbonate nanocrystals confined to the interior of the cage. These protein-cage cobalt carbonate hybrid materials function as electrocatalysts for alkaline water splitting. Our results demonstrate the potential of deep learning-based methods to unlock the structural and functional activity of protein-mineral composites.
- Multimodal phenotyping defines variant-to-function maps for RBM20 in dilated cardiomyopathy
Multiplex assays of variant effects have linked thousands of genotypes to fitness effects, yet we lack a profound understanding of how variants impact molecular phenotypes. Here, we introduce a deep mutational scanning framework that quantifies disease-determining molecular phenotypes in human cells, allowing readouts of protein localization and splicing regulatory function at scale. Applied to the dilated cardiomyopathy (DCM)-associated protein RBM20, we profiled ~4,300 amino acid substitutions across disease-linked protein domains. Complemented by structure-function investigations of RBM20 bound to its nuclear import receptor TNPO3, we discover new variant hotspots affecting protein function. Finally, we systematically probed nuclear relocalization to identify variants that may be amenable to this therapeutic strategy. Together, we create comprehensive variant-to-function maps that predict variant impact, enhance clinical interpretation, and stratify RBM20-mediated DCM into mechanistically distinct therapeutic classes.
- DLDN-Bench: A Benchmark Framework for Deep Learning de Novo Peptide Sequencing in Proteomics
De novo peptide sequencing is an essential approach for analyzing mass spectrometry data because it enables the identification of novel peptides without relying on protein sequence databases. Recent advances in deep learning have substantially improved the performance of de novo sequencing methods, but the rapid emergence of new models has led to heterogeneous evaluation practices and limited comparability. To address this, we introduce DLDN-Bench, a benchmark framework including a set of benchmark datasets derived from human muscle biopsy mass spectrometry data retrieved from PRIDE and annotated through consensus across multiple widely used database search engines. Using these datasets, we systematically benchmark recent deep learning-based de novo sequencing tools alongside traditional approaches. Performance is assessed using established metrics, including precision and coverage relative to a pseudo-ground truth defined by cross-engine agreement. To demonstrate the utility of DLDN-Bench, we benchmark four recent deep learning models and make all results publicly available. This benchmark framework provides a standardized basis for comparing state-of-the-art methods and offers an extensible resource for evaluating future tools in de novo peptide sequencing.
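The precision and coverage metrics mentioned in the DLDN-Bench abstract can be illustrated with a minimal sketch. The abstract does not give the exact definitions, so this assumes a common peptide-level convention: precision is the fraction of predicted peptides that exactly match the consensus (pseudo-ground-truth) peptide, and coverage is the fraction of reference spectra that received any prediction. The function name, spectrum IDs, and sequences are hypothetical.

```python
def precision_and_coverage(predictions, reference):
    """Peptide-level precision and coverage against a consensus reference.

    predictions: dict mapping spectrum ID -> predicted peptide (or None)
    reference:   dict mapping spectrum ID -> consensus peptide
    Assumed definitions (not taken from the paper):
      precision = exact matches / spectra with a prediction
      coverage  = spectra with a prediction / all reference spectra
    """
    # Spectra in the reference set for which the tool produced a peptide
    predicted = [sid for sid in reference if predictions.get(sid)]
    # Predictions that agree exactly with the consensus peptide
    correct = [sid for sid in predicted if predictions[sid] == reference[sid]]
    precision = len(correct) / len(predicted) if predicted else 0.0
    coverage = len(predicted) / len(reference) if reference else 0.0
    return precision, coverage


# Hypothetical toy data: four spectra, three predictions, two exact matches
reference = {"s1": "PEPTIDE", "s2": "ACDEFGHK", "s3": "LMNPQR", "s4": "STVWYK"}
predictions = {"s1": "PEPTIDE", "s2": "ACDEFGHR", "s3": None, "s4": "STVWYK"}
precision, coverage = precision_and_coverage(predictions, reference)
```

Exact string matching is a simplification; a real evaluation would typically also decide how to treat isobaric residues such as I and L, and post-translational modifications.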
- DyMoTree decodes early cell state transitions and drivers from single-cell transcriptomes using a tree-structured neural network
Inferring early cell fate from single-cell RNA-sequencing data is essential for identifying cellular origins and fate plasticity in development and disease. However, existing methods often fail to exploit tree-structured lineage trajectories, limiting the accuracy and interpretability of fate mapping. Here we present DyMoTree, a computational framework that models cell fate decisions as nonlinear mappings between progenitor and terminal cell states under explicit lineage constraints. By integrating lineage graphs with a tree-structured neural architecture, DyMoTree learns lineage-resolved cell-state transition maps from single-cell transcriptomes, enabling robust inference of early fate bias and identification of fate-specific progenitor substates and driver genes. Across simulations, lineage-tracing experiments, and in vivo systems, DyMoTree outperformed existing methods in resolving early fate biases. Applications to mouse embryogenesis, lung adenocarcinoma progression, and CAR-T immunotherapy revealed regulatory programs underlying developmental and disease-associated transitions. DyMoTree provides a general framework for modeling lineage-resolved cell-state dynamics underlying development and disease progression.
- HalluDesign-NA: Extending HalluDesign for De Novo Nucleic Acid Design
AlphaFold3 has revolutionized the prediction of biomolecular structures and interactions, including atomic-level modeling of nucleic acids. However, the de novo design of structured and functional nucleic acids remains a significant challenge. Here, we extend our HalluDesign framework to nucleic acid design by integrating NA-MPNN for nucleic acid sequence optimization and design. This new framework, HalluDesign-NA, enables iterative sequence-structure co-optimization, facilitating the de novo design of nucleic acids. Computational benchmarking across ssDNA, ssRNA, and aptamer design tasks demonstrates consistent improvements in confidence scores (pLDDT, ipTM), supporting the feasibility of de novo nucleic acid design under various constraints, such as sequence length, symmetry, and protein structure context. We anticipate that HalluDesign-NA will accelerate the de novo design of functional nucleic acids for applications in biotechnology and medicine. The source code for HalluDesign-NA is available at https://github.com/MinchaoFang/HalluDesign_NA.
- Viability of engineered AAVs via protein language models
Capsid engineering has greatly improved the performance of recombinant AAV vectors used for gene therapy. One commonly used strategy is the insertion of a short 7-mer peptide into surface-exposed loops to modify receptor interactions and enhance cell entry. While effective in retargeting receptors and improving transduction, these insertions might destabilize the capsid protein, hinder assembly, and thus limit production. While previous attempts have used deep mutational scanning and AI to predict which insertions are viable, the structural consequences of these peptide insertions at the amino-acid level remain poorly understood. Here we combined experiments, deep sequencing, and large protein language models to gain insight into the impact of 7-mer insertions on the VR-VIII region. We first characterize the biochemical properties of viable insertions, thus identifying which residues are well tolerated and which should instead be avoided. We then focus on the nearby context of those insertions by studying the effect of the linkers, either for highly diverse libraries or for individual variants known for their efficiency. Next, we study the broader context by extending our analysis to the whole capsid sequence and identifying regions that can tolerate insertions without long-range structural deformations that could affect capsid functionality. We conclude with a cross-serotype comparison and a viability analysis of tens of previously engineered variants. Our work showcases how AI can uncover structure-function rules governing the success of engineered AAV capsids.
- PCRAgent: A Multi-Agent Framework for Transforming Noisy Clinical Conversations into Structured Pre-Consultation Medical Records and Reusable Clinical Data Resources
In primary care and outpatient settings, clinically important patient information is often embedded in fragmented, ambiguous, repetitive, and noisy communication between physicians and patients. This limits physicians' ability to obtain a clear preconsultation overview of symptoms, history of present illness, and visit intent, while also preventing real-world clinical dialogues from being reused in hospital information systems and medical artificial intelligence applications. To address this challenge, we developed PCRAgent, a centrally coordinated multi-agent framework for preconsultation clinical information organization. Guided by physician inquiry logic, PCRAgent identifies, extracts, corrects, and standardizes patient-reported information from noisy consultations. Its coordinated modules, including error detection, semantic editing, output control, contextual memory, and intent recognition, enable robust parallel handling of spelling errors, repetitions, grammatical inconsistencies, medical ambiguities, and non-medical interference. A traceable edit list records intermediate corrections and context, allowing iterative refinement without redundant modifications. PCRAgent generates two complementary outputs. One is a PreConsultation Clinical Report for rapid physician review. The other is a Structured Clinical Conversation Dataset for hospital data construction and downstream AI applications. In evaluations using 220,000 strongly perturbed consultations, PCRAgent maintained high robustness, achieving a clinical information accuracy of 4.99 out of 5 and key element completeness of 5 out of 5, outperforming GPT-4o. Expert review of Chinese and English dialogues confirmed high clinical accuracy of 4.85 out of 5 and high safety of 4.79 out of 5. Multicenter validation in real-world outpatient workflows further demonstrated practical utility. 
These findings indicate that PCRAgent can efficiently transform noisy and unstructured consultations into physician-ready reports and AI-ready structured data, improving outpatient efficiency, reducing cognitive burden, ensuring information completeness, supporting precise decision-making, and enabling high-quality reuse of clinical data.
- Computer Vision Scoring of Figure Copy and Recall
Objective. Figure copy and recall tests are sensitive measures of visuoconstruction and visual episodic memory, but their clinical use is constrained by labor-intensive manual scoring. We developed and validated an automated, element-level scoring pipeline using Vertex AI object detection for the tablet-based figure copy and recall tasks in the California Cognitive Assessment Battery (CCAB). The automated scoring pipeline duplicated the scoring procedures used by expert manual raters. Methods. A normative sample of 2,011 community-dwelling adults aged 18-90 completed figure copy and delayed recall trials at baseline, with subsamples retested at 1 day and at 6, 18, and 30 months. Participants completed the drawings with their index finger on a tablet computer, with finger position digitized to analyze the speed and timing of individual drawing strokes. A convolutional object-detection model trained on the Vertex AI AutoML Vision platform identified each of twelve canonical figure elements in rendered drawings. Separate element presence and location scores were computed after homographically warping drawings onto a canonical template to produce trial-level Element, Location, and Total scores. To compare Vertex and human scores, Vertex AI and expert human raters independently scored 1,500 randomly selected drawings to evaluate inter-rater agreement, including a common subset of 100 drawings scored by Vertex AI and all raters. Results. Vertex-human agreement on Total scores (r = 0.966) was virtually indistinguishable from human-human agreement (mean r = 0.971), as was agreement on Element presence scores (mean r = 0.959 vs. r = 0.963). Location-score agreement (r = 0.951) was slightly below the human-human mean (r = 0.972) due to pixel-level analysis by Vertex AI that was impossible for human raters. The Vertex pipeline showed no preferential advantage for the single expert rater who categorized Elements during training. 
Automated scores showed strong demographic gradients: age effects on Recall (r = -0.32) were approximately twice those in Copy conditions (r = -0.16). A Memory Cost score (Recall - Copy) showed a monotonic age-related decline from +0.40 z in the youngest subjects to -0.54 z in the oldest. Kinetic analysis revealed that drawing speed and efficiency showed significant age-related changes. Overnight test-retest reliability was high (Recall r = 0.72) and the Recall trial showed a large overnight learning effect (Δ = +1.18) that continued with repeated tests up to 30 months (Δ = +0.75).
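The Memory Cost score (Recall - Copy) reported in z units can be sketched as a difference of standardized scores. This is a minimal sketch under an assumption the abstract does not confirm: that Recall and Copy totals are each converted to z-scores against the sample's own mean and standard deviation before subtraction. The function names and input values are hypothetical.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Standardize raw scores using the sample's own mean and population SD."""
    mu = mean(values)
    sd = pstdev(values)
    if sd == 0:
        raise ValueError("cannot standardize scores with zero variance")
    return [(v - mu) / sd for v in values]


def memory_cost(recall_scores, copy_scores):
    """Per-participant Memory Cost: z(Recall) minus z(Copy).

    Negative values mean recall was weaker than copy performance,
    relative to the rest of the sample (assumed interpretation).
    """
    recall_z = z_scores(recall_scores)
    copy_z = z_scores(copy_scores)
    return [r - c for r, c in zip(recall_z, copy_z)]


# Hypothetical raw totals for three participants
cost = memory_cost([1, 2, 3], [3, 2, 1])
```

A normative study would more likely standardize against age- or demographics-adjusted norms than against the pooled sample; the pooled version here only keeps the example short.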
- What level of expertise is necessary to generate ACLS training test questions: pre-med students vs. artificial intelligence?
Introduction: In-hospital cardiac arrest carries high mortality despite standardized ACLS training. Educators face increasing time constraints in developing assessment tools for ACLS training. Two possible solutions to this problem are using pre-medical students or using artificial intelligence to generate test questions. This study compared the quality of pre-medical student-generated ACLS test questions vs. AI-generated ACLS test questions, testing the hypothesis that AI-generated questions are non-inferior to student-generated questions. Methods: Ten pre-medical students created ACLS questions following predefined criteria, while an AI model (Northwell's Artificial Intelligence Hub) generated comparable questions. A blinded ACLS-certified physician evaluated questions on the qualities of Alignment, Clarity, Cognitive Level, and Question Design using a standardized rubric (Likert scale: 1 = poor quality, 5 = excellent). Student's t-test and Chi-square analysis were used to compare the quality of questions on different rubric domains within each arm (student vs. AI) and within one domain (e.g., question Clarity) between arms. The Student's t-test was used when 2 comparator groups were compared (e.g., Clarity of student-generated vs. AI-generated questions) within one arm. The ANOVA test was used when comparing more than 2 comparator groups (e.g., Alignment vs. Clarity vs. Cognitive Level) within one arm. Statistical significance was set a priori at p < 0.05. Results: Both student-generated and AI-generated questions were of high quality. AI-generated questions achieved the maximum score in the domains of Alignment, Clarity, and Question Design, but fell short of perfect scores in the domain of Cognitive Level (8 of 50 questions were less than 5). Student-generated questions achieved less-than-perfect scores in each domain. No significant difference was found in overall mean question scores between groups (students = 4.79, AI = 4.81; p = 0.9). 
However, AI-generated questions had significantly greater Clarity (students = 4.8, AI = 5; p = .0461), while Alignment, Cognitive Level, and Question Design showed no significant differences. Conclusion: AI-generated questions demonstrated overall quality comparable to those generated by pre-medical students, supporting the potential role of AI as a scalable tool in ACLS educational assessment development. Further studies are warranted to evaluate additional AI platforms and determine optimal integration of AI in medical education assessment design.
- Visa is connecting with ChatGPT to let AI agents automatically make purchases
ChatGPT can now search for and buy products on your behalf using Visa.
- TCS partners Anthropic, to roll out Claude AI access to 50K employees
Source: Techcircle