Explainable machine learning reveals an RBP regulatory logic of exon skipping

RNA binding proteins (RBPs) regulate the life cycle of an mRNA, often through RBP-RNA interactions. This life cycle includes splicing, whereby the intronic sequence of a pre-mRNA is removed and the exons are joined together. However, the patterns of RBP binding that lead to different splicing outcomes are still incompletely understood. Here, we build machine learning models from RBP-RNA binding and knockdown RNA-seq data for over 168 RBPs in two cell lines (HepG2 and K562) to better understand the binding patterns that predict exon skipping, the predominant form of alternative splicing in humans. We show that models trained exclusively on RBP binding patterns are indeed predictive and that a more sophisticated machine learning model (XGBoost) outperforms simpler linear models. We also extract a biologically interpretable logic embedded in these models. We show that SHAP, a machine learning explainability technique, captures position-specific activating and repressive behavior of RBP binding. We further find that SHAP values are predictive of changes in unseen splicing events and that SHAP interactions between pairs of RBPs are predictive of protein-protein interactions. Our results demonstrate that combining machine learning with interpretability techniques can reveal a regulatory logic of RBP binding. By estimating the impact of an RBP binding site on a splicing event, the SHAP values also provide a directly testable scientific hypothesis. We anticipate that models designed around biological processes and focused on interpretability will yield actionable biological insights both in splicing and in genomics generally.
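
To make the idea of a per-binding-site attribution concrete, the sketch below computes exact Shapley values for a tiny toy model. It is only an illustration, not the authors' pipeline: the feature names, the weights in `toy_model`, and the all-zeros baseline are hypothetical, and the closed-form toy function stands in for a trained XGBoost model. In practice SHAP for tree models averages over a background dataset rather than using a single baseline.

```python
from itertools import combinations
from math import factorial

# Hypothetical binary features: 1 if a binding site for that RBP is present
# in the named region around a cassette exon, 0 otherwise.
FEATURES = ["rbpA_upstream_intron", "rbpB_exon", "rbpC_downstream_intron"]


def toy_model(x):
    """Toy exon-skipping score with made-up weights (not from the paper).

    Includes one pairwise term (A with C) to mimic an RBP-RBP interaction.
    """
    a, b, c = x
    return 0.2 + 0.3 * a - 0.25 * b + 0.1 * c + 0.15 * a * c


def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to a single baseline input."""
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size that exclude feature i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                with_i = [x[j] if (j in subset or j == i) else baseline[j]
                          for j in range(n)]
                without_i = [x[j] if j in subset else baseline[j]
                             for j in range(n)]
                phi[i] += weight * (model(with_i) - model(without_i))
    return phi


x = [1, 1, 1]          # all three hypothetical sites bound
baseline = [0, 0, 0]   # assumed reference: no sites bound
phi = shapley_values(toy_model, x, baseline)

for name, value in zip(FEATURES, phi):
    direction = "promotes skipping" if value > 0 else "represses skipping"
    print(f"{name}: {value:+.3f} ({direction})")
```

In this toy setting a positive value marks a site that pushes the score toward skipping and a negative value marks one that pushes against it, and the values sum to the difference between the model output for `x` and for `baseline`. The interaction term is split evenly between the two features involved, which is the kind of signal that pairwise SHAP interaction values isolate.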

Source

https://www.biorxiv.org/content/10.64898/2026.05.29.728731v1?rss=1