The500Feed.Live

Everything going on in AI - updated daily from 500+ sources

📄 Research · June 4, 2026

R-loop Prediction Reveals Generalization Limits of DNA Foundation Models Beyond Regulatory Genomics

DNA foundation models are increasingly proposed as general-purpose representations for genomic prediction and design, yet their evaluation remains largely centered on conventional regulatory tasks. This leaves a critical question unresolved: do DNA foundation models generalize to sequence biology beyond conventional gene regulation? To answer this question, we introduce RloopBench, a systematic benchmark for R-loop-forming sequence prediction as a biophysically distinct, genome-stability-associated task. We compare rule-based methods, task-specific models, classical sequence encodings, and foundation model representations across in-distribution, cross-platform, consensus-level, and cross-species evaluations. Foundation models achieve strong performance when positive and negative sequences are compositionally separable, but this advantage does not consistently transfer to cross-platform and cross-species settings, where they are often comparable to classical k-mer representations. Unexpectedly, a one-hot classifier baseline shows the strongest overall sensitivity to R-loop-forming sequences, exceeding more complex models across several generalization tests. Rule-based and task-specific models also exhibit limited transfer outside their original training regimes. Performance is further shaped by sequence properties, negative-control design, experimental platform, and species-specific genomic context. Together, RloopBench establishes genome-stability-associated sequence prediction as a complementary direction for DNA foundation model development and evaluation, while underscoring that simple sequence encodings remain necessary baselines for assessing model generalization beyond conventional regulatory tasks.
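As a rough illustration of the "simple sequence encodings" the abstract refers to, the sketch below builds a one-hot encoding and a k-mer count vector for a DNA sequence. This is not the RloopBench implementation: the function names, the choice of k = 3, and the handling of ambiguous bases are all assumptions made for this example, since the abstract does not specify them.

```python
from itertools import product

BASES = "ACGT"

def one_hot(seq):
    """Encode a DNA sequence as rows of 4 indicators (A, C, G, T).

    Assumption for this sketch: any other symbol (e.g. N) becomes an all-zero row.
    """
    return [[1 if base == b else 0 for b in BASES] for base in seq.upper()]

def kmer_counts(seq, k=3):
    """Count overlapping k-mers over the ACGT alphabet.

    Assumption for this sketch: windows containing non-ACGT symbols are skipped.
    """
    counts = {"".join(p): 0 for p in product(BASES, repeat=k)}
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:
            counts[kmer] += 1
    return counts

if __name__ == "__main__":
    example = "GGGGA"  # toy sequence, not taken from the benchmark
    print(one_hot(example))
    print({kmer: n for kmer, n in kmer_counts(example).items() if n > 0})
```

Either representation can be flattened into a fixed-length feature vector and passed to a standard classifier, which is what makes such encodings cheap baselines to run alongside foundation model embeddings.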


Source

https://www.biorxiv.org/content/10.64898/2026.06.01.729367v1?rss=1