ModCRE-NN: Interpretable Deep Learning Harnesses Structural and Evolutionary Synergy to Predict Transcription Factor Binding Specificity

We present ModCRE-NN, a machine-learning framework and server that predicts transcription-factor (TF) DNA-binding motifs by integrating structural and evolutionary information. The method combines structure-derived Position Weight Matrices (PWMs) with PWMs of homologous TFs spanning multiple sequence-identity intervals, integrated into a unified 20-channel tensor representation. Benchmark datasets were constructed from experimental databases of TF motifs that capture DNA-binding specificity, with redundancy reduction and strict train/test partitioning to minimize homology leakage. Prediction quality was evaluated on an independent held-out set of TFs by profile-similarity analysis. Three complementary architectures were implemented and evaluated: an interpretable regression-based model, a convolutional neural network (CNN), and a Transformer-based architecture using self-attention. The regression model performed strongly in high-homology regimes dominated by closely related PWMs, whereas the CNN and Transformer architectures were more robust under low evolutionary similarity and increased structural uncertainty. Importantly, AI-generated motifs consistently achieved higher similarity scores and lower prediction variance than the original structural and evolutionary input motifs, indicating that the models denoise heterogeneous motif assemblies and reconstruct stable consensus DNA-binding representations rather than simply transferring PWMs from the nearest homolog. The CNN model exhibited the most balanced attribution profile, suggesting an enhanced ability to combine weak structural and evolutionary signals into coherent motif representations.
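
To make the tensor representation concrete, the following is a minimal sketch of how a set of PWMs could be stacked into a 20-channel array. Only the channel count comes from the abstract; the channel ordering, the fixed motif length, the uniform-background padding, and the helper names `pad_pwm` and `stack_pwms` are all assumptions for illustration and may differ from what ModCRE-NN actually does.

```python
import numpy as np

N_CHANNELS = 20  # channel count stated in the abstract; the layout is assumed
BASES = 4        # columns ordered A, C, G, T (assumed)

def pad_pwm(pwm, length):
    """Crop or pad a (L, 4) PWM to `length` rows using uniform background rows."""
    pwm = np.asarray(pwm, dtype=float)[:length]
    pad = np.full((length - pwm.shape[0], BASES), 1.0 / BASES)
    return np.vstack([pwm, pad])

def stack_pwms(pwms, length):
    """Stack up to N_CHANNELS PWMs into a (N_CHANNELS, length, 4) tensor.

    Channels without an input PWM stay uniform, i.e. uninformative.
    """
    tensor = np.full((N_CHANNELS, length, BASES), 1.0 / BASES)
    for i, pwm in enumerate(pwms[:N_CHANNELS]):
        tensor[i] = pad_pwm(pwm, length)
    return tensor

# Hypothetical two-position PWM standing in for a structure-derived motif.
structural_pwm = [[0.7, 0.1, 0.1, 0.1],
                  [0.1, 0.7, 0.1, 0.1]]
tensor = stack_pwms([structural_pwm], length=5)
```

In this sketch every position of every channel remains a probability distribution over the four bases, so a downstream model would see missing inputs as flat, uninformative motifs rather than as zeros.
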
Additionally, we implemented a prediction-reliability framework combining Random Forest regression, exponential interpolation, and hybrid residual-corrected modeling to estimate the quality and uncertainty of the PWMs as functions of evolutionary similarity, motif-cluster consistency, and TF-family context. Overall, our results demonstrate that integrating structural information with deep learning provides a robust framework for large-scale TF-binding specificity prediction under conditions of substantial evolutionary divergence and motif uncertainty.
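
As an illustration of the exponential-interpolation component only, the sketch below fits a curve of the form score = a * exp(b * identity) by least squares on the log of the score. The functional form, the function names `fit_exponential` and `predict_score`, and the input values are assumptions; the values are synthetic placeholders generated from a known curve, not results from the paper, and the Random Forest and residual-correction steps are not shown.

```python
import numpy as np

def fit_exponential(identity, score):
    """Fit score ~ a * exp(b * identity) via a linear fit to log(score)."""
    slope, intercept = np.polyfit(identity, np.log(score), 1)
    return np.exp(intercept), slope

def predict_score(identity, a, b):
    """Evaluate the fitted curve at one or more sequence-identity values."""
    return a * np.exp(b * np.asarray(identity, dtype=float))

# Synthetic placeholder data drawn from a known curve (a=0.3, b=1.0).
identity = np.linspace(0.2, 1.0, 9)
score = 0.3 * np.exp(1.0 * identity)

a, b = fit_exponential(identity, score)
estimate = predict_score(0.5, a, b)
```

Fitting in log space keeps the example dependency-free beyond NumPy, at the cost of weighting low scores more heavily than a direct nonlinear fit would.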

Source

https://www.biorxiv.org/content/10.64898/2026.05.27.728137v1?rss=1