Pathology Faculty Publications

An efficient concordant integrative analysis of multiple large-scale two-sample expression data sets

Yinglei Lai, The George Washington University
Fanni Zhang, The George Washington University
Tapan K. Nayak, The George Washington University
Reza Modarres, The George Washington University
Norman H. Lee, George Washington University Medical Center
Timothy A. McCaffrey, George Washington University Medical Center

Document Type

Journal Article

Publication Date

12-1-2017

Journal

Bioinformatics

Volume

33

Issue

23
DOI

10.1093/bioinformatics/btx061

Abstract

© The Author 2017. Published by Oxford University Press. All rights reserved.

Motivation: We have proposed a mixture model based approach to the concordant integrative analysis of multiple large-scale two-sample expression datasets. Since the mixture model is based on the transformed differential expression test P-values (z-scores), it is generally applicable to the expression data generated by either microarray or RNA-seq platforms. The mixture model is simple with three normal distribution components for each dataset to represent down-regulation, up-regulation and no differential expression. However, when the number of datasets increases, the model parameter space increases exponentially due to the component combination from different datasets.

Results: In this study, motivated by the well-known generalized estimating equations (GEEs) for longitudinal data analysis, we focus on the concordant components and assume that the proportions of non-concordant components follow a special structure. We discuss the exchangeable, multiset coefficient and autoregressive structures for model reduction, and their related expectation-maximization (EM) algorithms. Then, the parameter space is linear with the number of datasets. In our previous study, we have applied the general mixture model to three microarray datasets for lung cancer studies. We show that more gene sets (or pathways) can be detected by the reduced mixture model with the exchangeable structure. Furthermore, we show that more genes can also be detected by the reduced model. The Cancer Genome Atlas (TCGA) data have been increasingly collected. The advantage of incorporating the concordance feature has also been clearly demonstrated based on TCGA RNA sequencing data for studying two closely related types of cancer.
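The exponential growth mentioned in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the function names are hypothetical, and the transformation z = Φ⁻¹(1 − p) from a one-sided (upper-tail) P-value is an assumption, since the abstract only states that P-values are transformed to z-scores. The sketch enumerates the joint component combinations across K datasets (3^K in total) and picks out the concordant ones, in which every dataset is in the same state.

```python
from itertools import product
from statistics import NormalDist

# Per-dataset states from the abstract: down-regulation (-1),
# no differential expression (0), up-regulation (+1).
STATES = (-1, 0, 1)


def p_to_z(p_value):
    """Convert a one-sided (upper-tail) P-value to a z-score.

    Assumption: z = inverse standard normal CDF of (1 - p). The exact
    transformation used in the paper is not given in the abstract.
    """
    return NormalDist().inv_cdf(1.0 - p_value)


def count_joint_components(num_datasets):
    """Number of joint component combinations across datasets: 3 ** K."""
    return len(STATES) ** num_datasets


def concordant_components(num_datasets):
    """Combinations in which every dataset is in the same state."""
    return [
        combo
        for combo in product(STATES, repeat=num_datasets)
        if len(set(combo)) == 1
    ]


if __name__ == "__main__":
    for k in (2, 3, 5, 10):
        print(k, count_joint_components(k), len(concordant_components(k)))
```

Whatever the number of datasets, only three combinations are concordant, while the full set of combinations triples with each added dataset. This is the gap that the structured assumptions on the non-concordant proportions are meant to close.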

APA Citation

Lai, Y., Zhang, F., Nayak, T., Modarres, R., Lee, N., & McCaffrey, T. (2017). An efficient concordant integrative analysis of multiple large-scale two-sample expression data sets. Bioinformatics, 33(23). http://dx.doi.org/10.1093/bioinformatics/btx061

This document is currently not available here.
