Batch effects removal for microbiome data via conditional quantile regression

Authors

Wodan Ling, Public Health Sciences Division, Fred Hutchinson Cancer Center, 1100 Fairview Ave N, 98109, Seattle, USA.
Jiuyao Lu, Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, 615 N Wolfe St, 21205, Baltimore, USA.
Ni Zhao, Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, 615 N Wolfe St, 21205, Baltimore, USA. nzhao10@jhu.edu.
Anju Lulla, Nutrition Research Institute and Department of Nutrition, University of North Carolina, 500 Laureate Way, 28081, Kannapolis, USA.
Anna M. Plantinga, Department of Mathematics and Statistics, Williams College, 18 Hoxsey St, 01267, Williamstown, USA.
Weijia Fu, Department of Biostatistics, School of Public Health, University of Washington, 1705 NE Pacific St, 98195, Seattle, USA.
Angela Zhang, Public Health Sciences Division, Fred Hutchinson Cancer Center, 1100 Fairview Ave N, 98109, Seattle, USA.
Hongjiao Liu, Public Health Sciences Division, Fred Hutchinson Cancer Center, 1100 Fairview Ave N, 98109, Seattle, USA.
Hoseung Song, Public Health Sciences Division, Fred Hutchinson Cancer Center, 1100 Fairview Ave N, 98109, Seattle, USA.
Zhigang Li, Department of Biostatistics, College of Public Health & Health Professions, College of Medicine, University of Florida, 2004 Mowry Rd, 32611, Gainesville, USA.
Jun Chen, Division of Biomedical Statistics and Informatics, Department of Health Sciences Research, Mayo Clinic, 200 First St SW, 55905, Rochester, USA.
Timothy W. Randolph, Public Health Sciences Division, Fred Hutchinson Cancer Center, 1100 Fairview Ave N, 98109, Seattle, USA.
Wei Li Koay, Children's National Hospital, 111 Michigan Ave NW, 20010, Washington DC, USA.
James R. White, Resphera Biosciences, 1529 Lancaster St, 21231, Baltimore, USA.
Lenore J. Launer, Laboratory of Epidemiology and Population Science, NIA, NIH, 7201 Wisconsin Ave, 20814, Bethesda, USA.
Anthony A. Fodor, Department of Bioinformatics and Genomics, University of North Carolina at Charlotte, 9201 University City Blvd, 28223, Charlotte, USA.
Katie A. Meyer, Nutrition Research Institute and Department of Nutrition, University of North Carolina, 500 Laureate Way, 28081, Kannapolis, USA.
Michael C. Wu, Public Health Sciences Division, Fred Hutchinson Cancer Center, 1100 Fairview Ave N, 98109, Seattle, USA. mcwu@fredhutch.org.

Document Type

Journal Article

Publication Date

9-15-2022

Journal

Nature Communications

Volume

13

Issue

1

DOI

10.1038/s41467-022-33071-9

Abstract

Batch effects in microbiome data arise from differential processing of specimens and can lead to spurious findings and obscure true signals. Batch-correction strategies designed for genomic data usually fail to accommodate zero-inflated and over-dispersed microbiome data. Most strategies tailored for microbiome data are restricted to association testing or specialized study designs, and so do not support other analytic goals or general designs. Here, we develop the Conditional Quantile Regression (ConQuR) approach to remove microbiome batch effects using a two-part quantile regression model. ConQuR is a comprehensive method that accommodates the complex distributions of microbial read counts by non-parametric modeling, and it generates batch-removed zero-inflated read counts that can be used in, and benefit, standard downstream analyses. We apply ConQuR to simulated and real microbiome datasets and demonstrate its advantages in removing batch effects while preserving the signals of interest.
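The two-part idea in the abstract can be illustrated with a deliberately simplified sketch. This is not the ConQuR algorithm: it uses no covariates and no regression, and it only shows how a zero-inflated count distribution can be split into a point mass at zero plus the quantiles of the nonzero counts. The step of matching each count to a reference batch at the same percentile is an assumption made for illustration and is not stated in the abstract. All function names below are hypothetical.

```python
from bisect import bisect_left, bisect_right


def mid_rank_percentile(batch_counts, y):
    """Percentile of y within batch_counts, using mid-ranks to handle ties."""
    ordered = sorted(batch_counts)
    below = bisect_left(ordered, y)         # values strictly less than y
    at_or_below = bisect_right(ordered, y)  # values less than or equal to y
    return (below + at_or_below) / (2 * len(ordered))


def two_part_quantile(batch_counts, u):
    """Quantile function of a zero-inflated sample (illustrative only).

    Part 1: a point mass at zero with probability p_zero.
    Part 2: the empirical quantiles of the nonzero counts.
    """
    nonzero = sorted(c for c in batch_counts if c > 0)
    p_zero = 1 - len(nonzero) / len(batch_counts)
    if u <= p_zero or not nonzero:
        return 0
    v = (u - p_zero) / (1 - p_zero)  # rescale u onto the nonzero part
    index = min(int(v * len(nonzero)), len(nonzero) - 1)
    return nonzero[index]


def match_to_reference(batch_counts, reference_counts):
    """Map each count to the reference quantile at its own-batch percentile.

    Assumption: unconditional percentile matching stands in for the
    covariate-conditional model described in the abstract.
    """
    return [
        two_part_quantile(reference_counts, mid_rank_percentile(batch_counts, y))
        for y in batch_counts
    ]
```

For example, match_to_reference([0, 0, 10, 20, 30, 40], [0, 0, 1, 2, 3, 4]) returns [0, 0, 1, 2, 3, 4]: the zeros stay zero and each nonzero count moves to the reference value at the same rank. A faithful implementation would instead estimate these distributions conditionally on batch and covariates, which this sketch omits.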

Department

Pediatrics
