Vibe Coding Specificity Foundation Models

Molecular recognition - the determination of which agent binds which target - governs adaptive immunity, gene regulation, signal transduction, RNA silencing, enzyme catalysis, and the selectivity of therapeutics. Determining binding specificity still depends on experimental screening or on domain-specific computational tools that do not generalize across binding modalities. Transformer softmax attention is mathematically identical to the Boltzmann distribution governing molecular binding. This identity, together with five conditions of molecular recognition systems, prescribes a single neural network architecture for cross-modal binding prediction: dual sequence encoders, symmetric contrastive learning, and a learned physical temperature. A Specificity Foundation Model (SFM) is an instance of this physics-derived, sequence-to-sequence architecture that maps any agent-target sequence pair to a binding compatibility score, enabling bidirectional retrieval across molecular recognition domains without requiring structural information. The first SFM, for antibody-antigen binding, demonstrated ~100,000-fold greater data efficiency than comparable vision-language models. Here we report six SFMs across six molecular recognition domains - transcription factor-DNA, enzyme-substrate, peptide-MHC, CRISPR gRNA-off-target genomic DNA, microRNA-mRNA target, and small molecule drug-target protein - all using the identical architecture without modification and trained on publicly available data only. Evaluated by cross-modal retrieval from pools of 512 candidates (random baseline 0.2%), in-distribution R@1 ranges from 27.7% to 98.0% across the six domains. mir-SFM retrieves miRNA targets at 98.0% R@1, including the ~80% of validated interactions that seed-matching tools cannot find. mhcSFM achieves 95.4% R@1 on held-out rare HLA alleles absent from training. Applying crisprSFM to CRISPR off-target prediction raises precision to 94.0%, compared with 33.2% for Hamming distance alone.
All six SFMs were built by a domain expert with no programming experience using vibe coding - natural-language-directed AI coding agents - with numerical claims independently verified by an orthogonal AI auditor. These results establish SFMs as a physics-derived, sequence-native class of model that augments experimental and computational workflows across molecular recognition domains.
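
The abstract's central ideas (softmax read as a Boltzmann distribution, a symmetric contrastive objective, and R@1 retrieval from a candidate pool) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the authors' code: the function names are hypothetical, the score matrix stands in for the output of the two sequence encoders, and the temperature is a fixed argument rather than a learned parameter.

```python
import math


def boltzmann_softmax(scores, temperature):
    """Softmax of scores / temperature.

    Setting energy E_i = -score_i gives exp(-E_i / T) / Z, the Boltzmann form.
    """
    scaled = [s / temperature for s in scores]
    m = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(x - m) for x in scaled]
    z = sum(weights)  # partition function (after the stability shift)
    return [w / z for w in weights]


def symmetric_contrastive_loss(score_matrix, temperature):
    """Average cross-entropy in both directions (agent->target, target->agent).

    Assumes score_matrix[i][j] scores agent i against target j and that the
    true pairs lie on the diagonal.
    """
    n = len(score_matrix)
    loss = 0.0
    for i in range(n):
        row = score_matrix[i]
        col = [score_matrix[j][i] for j in range(n)]
        loss -= math.log(boltzmann_softmax(row, temperature)[i])
        loss -= math.log(boltzmann_softmax(col, temperature)[i])
    return loss / (2 * n)


def recall_at_1(score_matrix):
    """Fraction of rows whose top-scoring column is the matched (diagonal) one."""
    hits = 0
    for i, row in enumerate(score_matrix):
        best = max(range(len(row)), key=row.__getitem__)
        hits += int(best == i)
    return hits / len(score_matrix)


def random_baseline(pool_size):
    """Chance-level R@1 when one of pool_size candidates is correct."""
    return 1.0 / pool_size
```

With a pool of 512 candidates, `random_baseline(512)` is 1/512, about 0.2%, which matches the baseline quoted in the abstract. A score matrix with a dominant diagonal gives `recall_at_1` equal to 1.0 and a loss near zero at low temperature.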

Source

https://www.biorxiv.org/content/10.64898/2026.06.04.730134v1?rss=1