Performance of Google NotebookLM for AI-assisted data extraction and consensus statement generation in a heterogeneous systematic review on inflammatory bowel disease, obesity, and cardiometabolic comorbidities: A Methodological Report

Background: Large language models (LLMs) show promise for systematic review data extraction, but their performance in complex multidisciplinary domains and their utility for generating clinical statements remain insufficiently described.

Objectives: To evaluate Google NotebookLM for AI-assisted data extraction and RAND/UCLA consensus statement generation in a systematic review of inflammatory bowel disease (IBD), obesity, and cardiometabolic comorbidities.

Methods: Studies were organized into domain-specific notebooks, and structured prompts generated standardized evidence tables. Two independent reviewers validated outputs against the full-text articles using a four-category error classification. The primary metrics were cell-level accuracy and critical accuracy (the proportion of cells free of major factual errors); workflow time was compared against a published conventional extraction benchmark. Concordance between AI-generated and expert-finalized statements was also assessed.

Results: Across 57 articles, 1,710 data cells were extracted; 151 (8.83%) were flagged, yielding a cell-level accuracy of 91.17%. Major factual errors occurred in only 4 cells (0.23%), for a critical accuracy of 99.77%. Most errors were minor omissions (59.6%) or incomplete extractions (30.5%); error rates by domain ranged from 7.08% to 11.33%. The pipeline required 17.7 person-hours versus a projected 165.1 person-hours for conventional extraction (an 89.3% reduction). PICO-structured prompting generated 70 candidate statements; 58 of the 112 finalized panel statements (51.8%) were AI-derived, and 85.7% were retained in the finalized set.

Conclusion: Google NotebookLM demonstrates feasibility as a primary extraction and synthesis tool in a multidisciplinary systematic review, with extractive incompleteness as its principal limitation and substantial time savings over conventional approaches. Its novel application to RAND/UCLA consensus statement generation extends AI-assisted evidence synthesis to clinical consensus generation workflows.
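
The headline percentages are simple ratios over the reported counts. The sketch below reproduces them, assuming each flagged cell is counted once regardless of how many errors it contains. The names `ExtractionCounts`, `cell_level_accuracy`, `critical_accuracy`, and `time_reduction` are hypothetical illustrations, not part of the study's pipeline.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExtractionCounts:
    """Hypothetical container for the validation tallies reported in the abstract."""
    total_cells: int
    flagged_cells: int      # cells with an error in any of the four categories
    major_error_cells: int  # cells with a major factual error


def cell_level_accuracy(c: ExtractionCounts) -> float:
    # Share of cells with no flagged error of any category.
    return 1 - c.flagged_cells / c.total_cells


def critical_accuracy(c: ExtractionCounts) -> float:
    # Share of cells free of major factual errors (minor issues are tolerated).
    return 1 - c.major_error_cells / c.total_cells


def time_reduction(ai_hours: float, conventional_hours: float) -> float:
    # Relative reduction in person-hours versus the conventional benchmark.
    return 1 - ai_hours / conventional_hours


def as_percent(x: float, digits: int = 2) -> float:
    # Convert a proportion to a rounded percentage.
    return round(100 * x, digits)


counts = ExtractionCounts(total_cells=1710, flagged_cells=151, major_error_cells=4)
print(as_percent(cell_level_accuracy(counts)))          # 91.17
print(as_percent(critical_accuracy(counts)))            # 99.77
print(as_percent(time_reduction(17.7, 165.1), 1))       # 89.3
print(as_percent(58 / 112, 1))                          # 51.8 (AI-derived share)
```

The same ratios could be applied per domain to reproduce the 7.08% to 11.33% range, given per-domain cell and error counts, which the abstract does not report.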


Source

https://www.medrxiv.org/content/10.64898/2026.06.16.26355773v1?rss=1