Predicting epistasis across proteins by structural logic
Document Type
Journal Article
Publication Date
1-20-2026
Journal
Proceedings of the National Academy of Sciences of the United States of America
Volume
123
Issue
3
DOI
10.1073/pnas.2516291123
Keywords
epistasis; machine learning; variant effects
Abstract
Accurately predicting the phenotypic consequences of genetic variation is a major challenge for precision medicine. The problem is exacerbated by epistatic interactions, nonadditive effects between genetic variants that produce unexpected phenotypes. Here, we explore an understudied form of positive epistasis: intragenic complementation, in which pairs of loss-of-function variants restore near wild-type protein function. Using mutational scanning in yeast, we identify thousands of such interactions in a clinically important enzyme, human argininosuccinate lyase (ASL). Restoration of protein function is not due to the biochemical properties of the substituted amino acids, but rather to a structural feature of the protein, the active site assembly. We develop a machine learning algorithm that uses protein language model embeddings to predict intragenic complementation in ASL with 99.6% accuracy. Additionally, the model trained on ASL generalizes to a structurally related but sequence-divergent enzyme, fumarase, with accuracy over 90%. Our findings reveal a structural basis for this form of epistasis and provide a predictive framework that could extend to at least 4% of human proteins.
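The definition of intragenic complementation given in the abstract can be illustrated with a minimal sketch. This is not the authors' method or their machine learning model; it only encodes the stated criterion that each variant alone is loss-of-function while the pair restores near wild-type function. The normalized activity scale (0.0 = no function, 1.0 = wild type), the two thresholds, and all names below are assumptions made for illustration.

```python
from dataclasses import dataclass

# Hypothetical thresholds on an assumed normalized activity scale
# (0.0 = no function, 1.0 = wild type); these values are not from the paper.
LOF_MAX = 0.2      # at or below this, a variant is treated as loss-of-function
NEAR_WT_MIN = 0.8  # at or above this, function is treated as near wild type


@dataclass(frozen=True)
class PairMeasurement:
    """Measured activity for two variants alone and in combination."""
    activity_a: float   # variant A alone
    activity_b: float   # variant B alone
    activity_ab: float  # variants A and B combined


def is_loss_of_function(activity: float) -> bool:
    """Return True if the activity falls at or below the assumed LoF threshold."""
    return activity <= LOF_MAX


def is_intragenic_complementation(m: PairMeasurement) -> bool:
    """Both variants are LoF alone, yet the pair restores near wild-type function."""
    both_lof = is_loss_of_function(m.activity_a) and is_loss_of_function(m.activity_b)
    return both_lof and m.activity_ab >= NEAR_WT_MIN


# Illustrative (made-up) measurements, not data from the study.
complementing = PairMeasurement(activity_a=0.05, activity_b=0.10, activity_ab=0.90)
non_complementing = PairMeasurement(activity_a=0.05, activity_b=0.10, activity_ab=0.12)
```

With these made-up inputs, only the first pair satisfies the criterion. In practice the thresholds would have to be calibrated against the assay's wild-type and null controls.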
APA Citation
Tang, Michelle; Cromie, Gareth A.; Kabir, Anowarul; Timour, Martin S.; Ashmead, Julee; Lo, Russell S.; Corley, Nathaniel; DiMaio, Frank; Morizono, Hiroki; Caldovic, Ljubica; Ah Mew, Nicholas; Gropman, Andrea; Shehu, Amarda; and Dudley, Aimée M., "Predicting epistasis across proteins by structural logic" (2026). GW Authored Works. Paper 8547.
https://hsrc.himmelfarb.gwu.edu/gwhpubs/8547
Department
Pediatrics