AESurv: autoencoder survival analysis for accurate early prediction of coronary heart disease

Authors

Yike Shen, Department of Earth and Environmental Sciences, University of Texas at Arlington, 500 Yates Street, Arlington, TX, 76019, USA.
Arce Domingo-Relloso, Department of Chronic Diseases Epidemiology, National Center for Epidemiology, Carlos III Health Institute, C. de Melchor Fernández Almagro, 5, Fuencarral-El Pardo, 5, Madrid, 28029, Spain.
Allison Kupsco, Department of Environmental Health Sciences, Columbia University Mailman School of Public Health, 722 West 168th Street, New York, NY, 10032, USA.
Marianthi-Anna Kioumourtzoglou, Department of Environmental Health Sciences, Columbia University Mailman School of Public Health, 722 West 168th Street, New York, NY, 10032, USA.
Maria Tellez-Plaza, Department of Chronic Diseases Epidemiology, National Center for Epidemiology, Carlos III Health Institute, C. de Melchor Fernández Almagro, 5, Fuencarral-El Pardo, 5, Madrid, 28029, Spain.
Jason G. Umans, Department of Medicine, Georgetown-Howard Universities Center for Clinical and Translational Science, 4000 Reservoir Road NW, Washington, DC, 20007, USA.
Amanda M. Fretts, Department of Epidemiology, University of Washington, 3980 15th Ave NE, Seattle, WA, 98195, USA.
Ying Zhang, Center for American Indian Health Research, Department of Biostatistics and Epidemiology, The University of Oklahoma Health Sciences Center, 801 N.E. 13th Street, Oklahoma City, OK, 73104, USA.
Peter F. Schnatz, Department of OB/GYN and Internal Medicine, Reading Hospital/Tower Health & Drexel University, 301 S 7th Ave, West Reading, PA, 19611, USA.
Ramon Casanova, Department of Biostatistics and Data Science, Wake Forest University School of Medicine, 475 Vine St, Winston Salem, NC, 27101, USA.
Lisa Warsinger Martin, Department of Medicine, Division of Cardiology, George Washington University, 2300 Eye Street, NW, Washington, DC, 20037, USA.
Steve Horvath, Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles (UCLA), 695 Charles E. Young Drive South, Los Angeles, CA, 90095, USA.
JoAnn E. Manson, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, 900 Commonwealth Ave, Boston, MA, 02215, USA.
Shelley A. Cole, Population Health Program, Texas Biomedical Research Institute, 8715 W. Military Dr., San Antonio, TX, 78227, USA.
Haotian Wu, Department of Environmental Health Sciences, Columbia University Mailman School of Public Health, 722 West 168th Street, New York, NY, 10032, USA.
Eric A. Whitsel, Department of Epidemiology, Gillings School of Global Public Health and Department of Medicine, School of Medicine, University of North Carolina at Chapel Hill, 135 Dauer Drive, Chapel Hill, NC, 27599, USA.
Andrea A. Baccarelli, Department of Environmental Health Sciences, Columbia University Mailman School of Public Health, 722 West 168th Street, New York, NY, 10032, USA.
Ana Navas-Acien, Department of Environmental Health Sciences, Columbia University Mailman School of Public Health, 722 West 168th Street, New York, NY, 10032, USA.
Feng Gao, Department of Environmental Health Sciences, Fielding School of Public Health, University of California Los Angeles (UCLA), 650 Charles E. Young Drive South, Los Angeles, CA, 90095, USA.

Document Type

Journal Article

Publication Date

9-23-2024

Journal

Briefings in bioinformatics

Volume

25

Issue

6

DOI

10.1093/bib/bbae479

Keywords

autoencoder survival analysis; cohort studies; coronary heart disease; deep learning; epigenetics

Abstract

Coronary heart disease (CHD) is one of the leading causes of mortality and morbidity in the United States. Accurate time-to-event CHD prediction models that use high-dimensional DNA methylation and clinical features may support early prediction and intervention strategies. We developed a state-of-the-art deep learning autoencoder survival analysis model (AESurv) that analyzes high-dimensional blood DNA methylation features together with traditional clinical risk factors by learning low-dimensional representations of participants for time-to-event CHD prediction. We demonstrated the utility of our model in two cohort studies: the Strong Heart Study (SHS), a prospective cohort study of cardiovascular disease and its risk factors among American Indian adults; and the Women's Health Initiative (WHI), a prospective cohort study comprising randomized clinical trials and an observational study aimed at improving the health of postmenopausal women, with cardiovascular disease as one of its main focuses. In the SHS, AESurv effectively learned participant representations in a low-dimensional latent space and outperformed the other survival analysis models evaluated (Cox proportional hazards, Cox proportional hazards deep neural network survival analysis, random survival forest, and gradient boosting survival analysis), achieving a concordance index (C-index) of 0.864 ± 0.009 and a time-to-event mean area under the receiver operating characteristic curve (AUROC) of 0.905 ± 0.009. We further validated AESurv in the WHI, where it again achieved the best performance among the models compared. AESurv can be used for accurate CHD prediction and may help health care professionals and patients pursue early intervention strategies. We suggest using the AESurv model for future time-to-event CHD prediction based on DNA methylation features.
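The concordance index (C-index) reported in the abstract measures how well a model's predicted risk ranks participants by their time to CHD: among comparable pairs of participants, it is the fraction in which the participant who had the event earlier was assigned the higher risk, so 0.5 corresponds to random ranking and 1.0 to perfect ranking. The following is a minimal sketch of Harrell's C-index for right-censored data, included only to illustrate the metric. It is not the authors' evaluation code; the function name `harrell_c_index` is hypothetical, and pairs with tied follow-up times are simply skipped, which is a simplification.

```python
from itertools import combinations


def harrell_c_index(times, events, risk_scores):
    """Simplified Harrell's concordance index for right-censored survival data.

    times: observed follow-up times
    events: 1 if the event (e.g., CHD) was observed, 0 if censored
    risk_scores: predicted risk; a higher score means an earlier expected event
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` has the shorter observed follow-up time
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # Skip pairs with tied times (simplification) and pairs whose shorter
        # time is censored, since their true ordering is unknown
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0  # earlier event received the higher risk
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied predictions count as half
    return concordant / comparable if comparable else float("nan")


print(harrell_c_index([5, 8, 3, 10], [1, 0, 1, 1], [0.7, 0.8, 0.9, 0.1]))
```

With these toy inputs there are five comparable pairs (the pair of participants followed for 8 and 10 time units is excluded because the shorter time is censored), and four are ranked correctly, so the sketch prints 0.8. Production libraries such as scikit-survival (`sksurv.metrics.concordance_index_censored`) handle tied times and predictions more carefully.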

Department

Medicine
