Enhancing Title and Abstract Priority Screening Through SimEd AI Pipeline

The exhaustive identification of evidence is central to systematic reviews, but screening titles and abstracts remains particularly labor intensive. Priority screening, an active learning approach that ranks records by estimated relevance, has emerged as an effective strategy to reduce screening workload. Its efficiency is commonly quantified using work saved over sampling at 100% recall (WSS@100%), the percentage reduction in screening effort compared with random screening. Although modern priority-screening models achieve high efficiency on many benchmark datasets, some reviews still exhibit low WSS@100%, indicating suboptimal retrieval. Our study sought to improve the retrieval of all relevant articles in challenging datasets so that priority screening generalizes better. Using the SYNERGY benchmark datasets, we first showed that ELAS_h3, the most advanced priority-screening model in the state-of-the-art open-source software ASReview LAB v.2, efficiently retrieved most relevant articles but struggled with the rare, final ones in challenging datasets. To address this, we tested a hybrid approach called SimEd AI: ELAS_h3 is used for early retrieval, and supervised fine-tuning is then applied to the biomedical transformer BioMed-RoBERTa-base with these relevant articles to enhance detection of the remaining difficult cases. We found that fine-tuning the BioMed-RoBERTa-base model on the titles and abstracts of 10 late-identified relevant and 10 hard irrelevant studies enabled faster retrieval of the articles of interest than ELAS_h3 alone. This approach increased WSS@100% from 46.5% (SD 0.0%) to 83.3% (SD 0.4%), while adding only 22 minutes of computational time on average for fine-tuning and inference. The SimEd AI priority screening pipeline could be valuable in situations requiring the highest possible recall. It could be particularly useful in scoping reviews with broad or diverse topics, where traditional priority screening methods may miss subtle relevance signals. Further work should define a data-driven stopping rule for ending screening once the fine-tuned domain-specific transformer is applied at the final stage, and assess generalizability across additional challenging datasets.
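
To make the WSS@100% metric concrete, the following is a minimal sketch of how it can be computed from one screening order, assuming the common definition in which random screening must review essentially every record to reach 100% recall. The function name `wss_at_full_recall` and the toy label list are hypothetical and are not taken from the study, which may compute the metric differently (for example, by averaging over simulation runs or excluding prior-knowledge records).

```python
from typing import Sequence


def wss_at_full_recall(ranked_labels: Sequence[int]) -> float:
    """Return WSS@100% for one screening order.

    ranked_labels holds 1 (relevant) or 0 (irrelevant) for each record,
    in the order the records were screened. The result is the fraction
    of records still unscreened when the last relevant record is found.
    """
    n_total = len(ranked_labels)
    if n_total == 0 or 1 not in ranked_labels:
        raise ValueError("need at least one record and one relevant record")
    # 1-based position of the last relevant record in the screening order
    last_relevant = max(
        position
        for position, label in enumerate(ranked_labels, start=1)
        if label == 1
    )
    return (n_total - last_relevant) / n_total


# Hypothetical toy ranking: 10 records, 3 relevant, the last found at position 4.
example_order = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
print(f"WSS@100% = {wss_at_full_recall(example_order):.1%}")  # WSS@100% = 60.0%
```

Under this definition, a single relevant record ranked near the end of the list drives WSS@100% toward zero, which is why the rare, final records dominate the metric in challenging datasets.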

Source

https://www.medrxiv.org/content/10.64898/2026.06.26.26356718v1?rss=1