Article number 262
The use of sequencing technologies to investigate the microbiome of a sample can positively impact patient healthcare by providing therapeutic targets for personalized disease treatment. However, these samples contain genomic sequences from various sources that complicate the identification of pathogens.
Here we present Clinical PathoScope, a pipeline to rapidly and accurately remove host contamination, isolate microbial reads, and identify potential disease-causing pathogens. We have accomplished three essential tasks in the development of Clinical PathoScope. First, we developed an optimized framework for pathogen identification using a computational subtraction methodology in conjunction with read trimming and ambiguous read reassignment. Second, we have demonstrated the ability of our approach to identify multiple pathogens in a single clinical sample, accurately identify pathogens at the subspecies level, and determine the nearest phylogenetic neighbor of novel or highly mutated pathogens using real clinical sequencing data. Finally, we have shown that Clinical PathoScope outperforms previously published pathogen identification methods with regard to computational speed, sensitivity, and specificity.
Clinical PathoScope is the only pathogen identification method currently available that can identify multiple pathogens from mixed samples and distinguish between very closely related species and strains in samples with very few reads per pathogen. Furthermore, Clinical PathoScope does not rely on genome assembly and thus can more rapidly complete the analysis of a clinical sample when compared with current assembly-based methods. Clinical PathoScope is freely available at: http://sourceforge.net/projects/pathoscope/
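As a rough illustration of the computational subtraction idea named in the abstract, the sketch below keeps reads that align to a microbial target library, discards any that also align to the host, and then separates reads that hit a single genome from those that hit several and would need reassignment. It is a minimal sketch under stated assumptions, not the Clinical PathoScope implementation: the function names, the in-memory data layout, and the toy read and genome IDs are all hypothetical, and the reassignment step itself is not modeled.

```python
def subtract_host_reads(target_hits, host_hits):
    """Keep reads that hit the target library but not the host library.

    target_hits: dict mapping read ID -> list of (genome_id, score) tuples
                 (hypothetical layout, e.g. parsed from an aligner's output)
    host_hits:   set of read IDs that aligned to the host (e.g. human) library
    """
    return {
        read_id: hits
        for read_id, hits in target_hits.items()
        if read_id not in host_hits
    }


def split_unique_and_ambiguous(filtered_hits):
    """Partition reads by whether they align to one genome or to several."""
    unique, ambiguous = {}, {}
    for read_id, hits in filtered_hits.items():
        genomes = {genome_id for genome_id, _score in hits}
        if len(genomes) == 1:
            unique[read_id] = hits
        else:
            # These reads are the candidates for ambiguous read reassignment.
            ambiguous[read_id] = hits
    return unique, ambiguous


# Toy data (hypothetical IDs and scores, for illustration only).
target_hits = {
    "read1": [("virusA", 40)],
    "read2": [("virusA", 38), ("virusB", 37)],
    "read3": [("bacteriumC", 42)],
}
host_hits = {"read3"}  # read3 also aligns to the host, so it is removed

kept = subtract_host_reads(target_hits, host_hits)
unique, ambiguous = split_unique_and_ambiguous(kept)
```

In this toy run, `read3` is subtracted as host-derived, `read1` is uniquely assigned, and `read2` is left as ambiguous between two genomes.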
Creative Commons License
This work is licensed under a Creative Commons Attribution 3.0 License.
Byrd, A.L., Perez-Rogers, J.F., Manimaran, S., Castro-Nallar, E., Toma, I. et al. (2014). Clinical PathoScope: rapid alignment and filtration for accurate pathogen identification in clinical samples using unassembled sequencing data. BMC Bioinformatics, 15:262.
Workflow employed to develop the Clinical PathoScope pipeline
Viral genomes with human ribosomal RNA contamination.txt (1 kB)
Simulated data summary & code.xlsx (37 kB)
Alignment optimization variables and methods.pdf (75 kB)
Commands and versions of alignment algorithms evaluated.docx (21 kB)
Results of all alignment runs.xlsx (40 kB)
Subtraction and filtration optimization methods.pdf (39 kB)
Overview of clinical datasets used to evaluate Clinical PathoScope.xlsx (10 kB)
List of candidate primers and adapters used for quality control filtering.txt (1 kB)
Phylogeny of 16S genes for genera found in clinical samples.pdf (176 kB)
Read coverage for 16S genes and nearest phylogenetic neighbors.pdf (1666 kB)