Sequencing of 53,831 diverse genomes from the NHLBI TOPMed Program

Authors

Daniel Taliun, University of Michigan School of Public Health
Daniel N. Harris, University of Maryland School of Medicine
Michael D. Kessler, University of Maryland School of Medicine
Jedidiah Carlson, University of Michigan, Ann Arbor
Zachary A. Szpiech, Pennsylvania State University
Raul Torres, University of California, San Francisco
Sarah A. Gagliano Taliun, University of Michigan School of Public Health
André Corvelo, New York Genome Center
Stephanie M. Gogarten, University of Washington, Seattle
Hyun Min Kang, University of Michigan School of Public Health
Achilleas N. Pitsillides, School of Public Health
Jonathon LeFaive, University of Michigan School of Public Health
Seung-Been Lee, University of Washington, Seattle
Xiaowen Tian, University of Washington, Seattle
Brian L. Browning, University of Washington, Seattle
Sayantan Das, University of Michigan School of Public Health
Anne Katrin Emde, New York Genome Center
Wayne E. Clarke, New York Genome Center
Douglas P. Loesch, University of Maryland School of Medicine
Amol C. Shetty, University of Maryland School of Medicine
Thomas W. Blackwell, University of Michigan School of Public Health
Albert V. Smith, University of Michigan School of Public Health
Quenna Wong, University of Washington, Seattle
Xiaoming Liu, University of South Florida Health
Matthew P. Conomos, University of Washington, Seattle
Dean M. Bobo, Icahn School of Medicine at Mount Sinai
François Aguet, Broad Institute
Christine Albert, Massachusetts General Hospital
Alvaro Alonso, Rollins School of Public Health
Kristin G. Ardlie, Broad Institute
Dan E. Arking, Johns Hopkins School of Medicine
Stella Aslibekyan, The University of Alabama at Birmingham

Document Type

Journal Article

Publication Date

2-11-2021

Journal

Nature

Volume

590

Issue

7845

DOI

10.1038/s41586-021-03205-y

Abstract

© 2021, The Author(s). The Trans-Omics for Precision Medicine (TOPMed) programme seeks to elucidate the genetic architecture and biology of heart, lung, blood and sleep disorders, with the ultimate goal of improving diagnosis, treatment and prevention of these diseases. The initial phases of the programme focused on whole-genome sequencing of individuals with rich phenotypic data and diverse backgrounds. Here we describe the TOPMed goals and design as well as the available resources and early insights obtained from the sequence data. The resources include a variant browser, a genotype imputation server, and genomic and phenotypic data that are available through dbGaP (Database of Genotypes and Phenotypes)1. In the first 53,831 TOPMed samples, we detected more than 400 million single-nucleotide and insertion or deletion variants after alignment with the reference genome. Additional previously undescribed variants were detected through assembly of unmapped reads and customized analysis in highly variable loci. Among the more than 400 million detected variants, 97% have frequencies of less than 1% and 46% are singletons that are present in only one individual (53% among unrelated individuals). These rare variants provide insights into mutational processes and recent human evolutionary history. The extensive catalogue of genetic variation in TOPMed studies provides unique opportunities for exploring the contributions of rare and noncoding sequence variants to phenotypic variation. Furthermore, combining TOPMed haplotypes with modern imputation methods improves the power and reach of genome-wide association studies to include variants down to a frequency of approximately 0.01%.
