Medicine Faculty Publications

Concordant integrative gene set enrichment analysis of multiple large-scale two-sample expression data sets

Yinglei Lai, The George Washington University
Fanni Zhang, The George Washington University
Tapan K. Nayak, The George Washington University
Reza Modarres, The George Washington University
Norman H. Lee, George Washington University Medical Center
Timothy A. McCaffrey, George Washington University Medical Center

Document Type

Journal Article

Publication Date

1-24-2014

Journal

BMC Genomics

Volume

15

DOI

10.1186/1471-2164-15-S1-S6

Abstract

© 2014 Lai et al.

Background: Gene set enrichment analysis (GSEA) is an important approach to the analysis of coordinate expression changes at a pathway level. Although many statistical and computational methods have been proposed for GSEA, the concordant integrative GSEA of multiple expression data sets has not been well addressed. Among related data sets collected for the same or similar study purposes, it is important to identify pathways or gene sets with concordant enrichment.

Methods: We categorize the underlying true states of differential expression into three representative categories: no change, positive change and negative change. Because of data noise, what we observe from experiments may not reflect the underlying truth. Although these categories are not observed in practice, they can be treated as latent components in a mixture model framework. We then define the mathematical concept of concordant gene set enrichment and calculate its probability based on a three-component multivariate normal mixture model. The related false discovery rate can be calculated and used to rank different gene sets.

Results: We used three published lung cancer microarray gene expression data sets to illustrate our proposed method. One analysis, based on the first two data sets, compared our result with a previously published result from a GSEA conducted separately for each individual data set. This comparison illustrates the advantage of our proposed concordant integrative gene set enrichment analysis. Then, with a relatively new and larger pathway collection, we used our method to conduct an integrative analysis of the first two data sets and of all three data sets. Both results showed that many gene sets could be identified with low false discovery rates, and the two results were consistent with each other. A further exploration based on the KEGG cancer pathway collection showed that a majority of these pathways could be identified by our proposed method.

Conclusions: This study illustrates that we can improve detection power and discovery consistency through a concordant integrative analysis of multiple large-scale two-sample gene expression data sets.
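
The abstract states that a false discovery rate is derived from the concordant-enrichment probability and used to rank gene sets, but it does not give the formula. As a minimal sketch only, assuming each gene set already has a posterior probability of concordant enrichment, one common estimate takes the FDR of the top k gene sets to be the average of (1 - probability) over those k sets. The function name `rank_by_fdr` and the example probabilities are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Tuple


def rank_by_fdr(posterior: Dict[str, float]) -> List[Tuple[str, float]]:
    """Rank gene sets by posterior probability of concordant enrichment.

    Returns (gene set name, estimated FDR) pairs, where the FDR at rank k is
    the mean of (1 - posterior) over the k highest-ranked gene sets.
    This is a generic Bayesian FDR estimate, assumed here for illustration.
    """
    # Highest posterior probability first.
    ranked = sorted(posterior.items(), key=lambda kv: kv[1], reverse=True)

    results = []
    cumulative_error = 0.0
    for k, (name, prob) in enumerate(ranked, start=1):
        # Each selected set contributes its probability of being a false call.
        cumulative_error += 1.0 - prob
        results.append((name, cumulative_error / k))
    return results


if __name__ == "__main__":
    # Hypothetical posterior probabilities, for illustration only.
    example = {"set_A": 0.99, "set_B": 0.95, "set_C": 0.60, "set_D": 0.10}
    for name, fdr in rank_by_fdr(example):
        print(f"{name}: estimated FDR = {fdr:.3f}")
```

Under this estimate the FDR never decreases down the ranked list, so choosing a cutoff amounts to keeping every gene set whose estimated FDR stays below the desired level.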

APA Citation

Lai, Y., Zhang, F., Nayak, T., Modarres, R., Lee, N., & McCaffrey, T. (2014). Concordant integrative gene set enrichment analysis of multiple large-scale two-sample expression data sets. BMC Genomics, 15. http://dx.doi.org/10.1186/1471-2164-15-S1-S6
