Biostatistics and Bioinformatics Faculty Publications

Machine learning approaches to predict lupus disease activity from gene expression data

Brian Kegerreis, RILITE Research Institute and AMPEL BioSolutions
Michelle D. Catalina, RILITE Research Institute and AMPEL BioSolutions
Prathyusha Bachali, RILITE Research Institute and AMPEL BioSolutions
Nicholas S. Geraci, RILITE Research Institute and AMPEL BioSolutions
Adam C. Labonte, RILITE Research Institute and AMPEL BioSolutions
Chen Zeng, The George Washington University
Nathaniel Stearrett, Milken Institute School of Public Health
Keith A. Crandall, Milken Institute School of Public Health
Peter E. Lipsky, RILITE Research Institute and AMPEL BioSolutions
Amrie C. Grammer, RILITE Research Institute and AMPEL BioSolutions

Document Type

Journal Article

Publication Date

12-1-2019

Journal

Scientific Reports

Volume

9

Issue

1

DOI

10.1038/s41598-019-45989-0

Abstract

The integration of gene expression data to predict systemic lupus erythematosus (SLE) disease activity is a significant challenge because of the high degree of heterogeneity among patients and study cohorts, especially those collected on different microarray platforms. Here we deployed machine learning approaches to integrate gene expression data from three SLE data sets and used the integrated data to classify patients as having active or inactive disease, as characterized by standard clinical composite outcome measures. Both raw whole blood gene expression data and informative gene modules generated by Weighted Gene Co-expression Network Analysis from purified leukocyte populations were employed with various classification algorithms. Classifiers were evaluated either by 10-fold cross-validation across the three combined data sets or by training and testing in independent data sets; the latter amplified the effects of technical variation. A random forest classifier achieved a peak classification accuracy of 83 percent under 10-fold cross-validation, but its performance could be severely affected by technical variation among data sets. The use of gene modules rather than raw gene expression was more robust, achieving classification accuracies of approximately 70 percent regardless of how the training and testing sets were formed. Fine-tuning the algorithms and parameter sets may generate sufficient accuracy to be informative as a standalone estimate of disease activity.
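The abstract contrasts two evaluation schemes: 10-fold cross-validation over the pooled samples, and training and testing in independent data sets. The sketch below illustrates only how those two kinds of splits differ; it is not the authors' pipeline. Assumptions: "independent data sets" is modeled here as hold-one-data-set-out, the random forest is replaced by a simple nearest-centroid stand-in so the example runs with the standard library alone, and `make_toy_data`, its per-set offsets, and all other names are hypothetical.

```python
import random
from statistics import mean


def k_fold_splits(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold CV over pooled samples."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k roughly equal, disjoint folds
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


def held_out_dataset_splits(dataset_ids):
    """Yield (held_out, train_idx, test_idx): train on the other data sets, test on one."""
    for held_out in sorted(set(dataset_ids)):
        train = [i for i, d in enumerate(dataset_ids) if d != held_out]
        test = [i for i, d in enumerate(dataset_ids) if d == held_out]
        yield held_out, train, test


def fit_centroids(X, y):
    """Stand-in classifier (NOT a random forest): per-class mean feature vector."""
    centroids = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = [mean(col) for col in zip(*rows)]
    return centroids


def predict(centroids, x):
    """Assign x to the class whose centroid is nearest in squared Euclidean distance."""
    return min(centroids, key=lambda lab: sum((a - b) ** 2 for a, b in zip(x, centroids[lab])))


def accuracy(X, y, train, test):
    """Fit on the training indices; return the fraction of test samples classified correctly."""
    model = fit_centroids([X[i] for i in train], [y[i] for i in train])
    return mean(1.0 if predict(model, X[i]) == y[i] else 0.0 for i in test)


def make_toy_data(n_per_set=40, seed=1):
    """Hypothetical toy data: 3 'data sets' whose first feature carries a per-set offset,
    mimicking a platform/batch effect; label True = 'active', False = 'inactive'."""
    rng = random.Random(seed)
    X, y, dataset_ids = [], [], []
    for d, shift in enumerate([0.0, 1.5, -1.5]):
        for _ in range(n_per_set):
            active = rng.random() < 0.5
            base = 1.0 if active else 0.0
            X.append([base + shift + rng.gauss(0, 0.5), base + rng.gauss(0, 0.5)])
            y.append(active)
            dataset_ids.append(f"set{d}")
    return X, y, dataset_ids


if __name__ == "__main__":
    X, y, ds = make_toy_data()
    cv = mean(accuracy(X, y, tr, te) for tr, te in k_fold_splits(len(X)))
    print(f"pooled 10-fold CV accuracy: {cv:.2f}")
    for held, tr, te in held_out_dataset_splits(ds):
        print(f"train on other sets, test on {held}: {accuracy(X, y, tr, te):.2f}")
```

The design point is that pooled k-fold splits mix samples from every data set into both training and test folds, so a per-data-set offset is partly "seen" during training, whereas the held-out split forces the classifier to generalize across that offset.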

Recommended Citation

Kegerreis, B., Catalina, M., Bachali, P., Geraci, N., Labonte, A., Zeng, C., Stearrett, N., Crandall, K., Lipsky, P., & Grammer, A. (2019). Machine learning approaches to predict lupus disease activity from gene expression data. Scientific Reports, 9 (1). http://dx.doi.org/10.1038/s41598-019-45989-0

This document is currently not available here.