Predicting epistasis across proteins by structural logic

Document Type

Journal Article

Publication Date

1-20-2026

Journal

Proceedings of the National Academy of Sciences of the United States of America

Volume

123

Issue

3

DOI

10.1073/pnas.2516291123

Keywords

epistasis; machine learning; variant effects

Abstract

Accurately predicting the phenotypic consequences of genetic variation is a major challenge for precision medicine. The problem is exacerbated by epistatic interactions, nonadditive effects between genetic variants that produce unexpected phenotypes. Here, we explore an understudied form of positive epistasis: intragenic complementation, in which pairs of loss-of-function variants restore near wild-type protein function. Using mutational scanning in yeast, we identify thousands of such interactions in a clinically important enzyme, human argininosuccinate lyase (ASL). Restoration of protein function is not due to the biochemical properties of the substituted amino acids, but rather to a structural feature of the protein, the active site assembly. We develop a machine learning algorithm that uses protein language model embeddings to predict intragenic complementation in ASL with 99.6% accuracy. Additionally, the model trained on ASL generalizes to a structurally related but sequence-divergent enzyme, fumarase, with accuracy over 90%. Our findings reveal a structural basis for this form of epistasis and provide a predictive framework that could extend to at least 4% of human proteins.

Department

Pediatrics
