School of Medicine and Health Sciences Poster Presentations
Poster Number
266
Document Type
Poster
Keywords
HTS, quasispecies
Publication Date
4-2017
Abstract
The high genetic variability of Human Immunodeficiency Virus type 1 (HIV-1) is caused by the low fidelity of its replication machinery. This leads to the evolution of swarm-like viral populations, often described as quasispecies. High-throughput sequencing (HTS) provides higher resolution than Sanger sequencing, enabling detection of low-frequency variant genomes. However, quasispecies analysis remains challenging because of the systematic noise introduced by HTS technology. This noise increases the rate of type I errors (false positives), while the underlying genetic diversity can lead to mathematically unresolvable type II errors (false negatives). We have developed a pipeline using tools in the High-performance Integrated Virtual Environment (HIVE), an HTS platform designed for big data analysis and management, to analyze the viral populations within each sample, classify their subtypes, and determine the recombination patterns of recombinant viruses. RNA was extracted from 70 plasma samples from chronically HIV-1-infected patients. The 3’ half of the HIV-1 genome was amplified by RT-PCR, and the PCR products were sequenced on an Illumina MiSeq. The paired-end reads for each sample were assembled using Geneious software and analyzed for the presence of HIV-1 quasispecies using HIVE tools. Subtype analysis of the 70 samples using Geneious identified 17 subtype A1, 4 subtype B, 30 subtype C, 1 subtype D, 6 CRF02_AG, and 12 unique recombinant forms (URFs). Additionally, we found up to 178 ambiguous bases in the consensus sequences from 41 samples (58.6%), suggesting the presence of viral subpopulations. However, Geneious could not determine the major viral populations in each sample. Analyzing the same HTS reads with the HIV-1 quasispecies analysis pipeline, we found one predominant population in 11 samples (15.7%), two to ten distinct populations in 45 samples (64.3%), 11 to 20 populations in 13 samples (18.6%), and 26 populations in one sample (1.4%).
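The ambiguous-base counts and sample percentages above are straightforward to reproduce. The minimal Python sketch below assumes each consensus sequence is available as a plain string using IUPAC nucleotide codes, where letters such as R (A/G), Y (C/T) and N (any base) mark positions at which the reads disagree. The function names are hypothetical and this is not part of the Geneious or HIVE workflow; the percentage helper simply recomputes the reported proportions from the counts (for example, 13 of 70 samples is 18.6%).

```python
# IUPAC nucleotide ambiguity codes: everything except A, C, G, T/U and gaps.
IUPAC_AMBIGUOUS = set("RYSWKMBDHVN")


def count_ambiguous_bases(consensus: str) -> int:
    """Count IUPAC ambiguity codes in a consensus sequence (case-insensitive)."""
    return sum(1 for base in consensus.upper() if base in IUPAC_AMBIGUOUS)


def percent(part: int, total: int) -> float:
    """Percentage rounded to one decimal place, as reported in the abstract."""
    return round(100 * part / total, 1)


# Population-count categories reported in the abstract (70 samples in total).
TOTAL_SAMPLES = 70
categories = {
    "1 population": 11,
    "2-10 populations": 45,
    "11-20 populations": 13,
    "26 populations": 1,
}

if __name__ == "__main__":
    # Hypothetical toy consensus containing R, Y and N ambiguity codes.
    toy_consensus = "ATGRCAYTTNGG"
    print(count_ambiguous_bases(toy_consensus))
    for label, n in categories.items():
        print(f"{label}: {n} samples ({percent(n, TOTAL_SAMPLES)}%)")
```

Counting ambiguity codes only flags mixed positions; on its own it cannot say how many populations are present or which bases belong together, which is why read-level analysis is needed.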
Interestingly, HIVE identified two equally dominant viral populations, which Geneious did not detect, in five samples (7.1%). The HIV-1 quasispecies analysis pipeline is reliable and more sensitive than Geneious, identifying distinct viral populations and recombination patterns that the Geneious software did not.
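Recognizing "two equally dominant" populations amounts to comparing the frequencies of the two most abundant reconstructed populations in a sample. The Python sketch below assumes per-population read counts are already available from an upstream reconstruction step. The function name and both thresholds (`min_freq`, `max_ratio`) are hypothetical illustration values, not the criteria used by the HIVE pipeline.

```python
from typing import Mapping


def is_codominant(population_reads: Mapping[str, int],
                  min_freq: float = 0.3,
                  max_ratio: float = 1.5) -> bool:
    """Return True if the two most abundant populations are both frequent
    (each at least min_freq of all reads) and similar in size
    (larger / smaller <= max_ratio).

    min_freq and max_ratio are hypothetical thresholds for illustration only.
    """
    total = sum(population_reads.values())
    if total == 0 or len(population_reads) < 2:
        return False
    # Take the two largest read counts.
    top, second = sorted(population_reads.values(), reverse=True)[:2]
    if second == 0:
        return False
    return second / total >= min_freq and top / second <= max_ratio


if __name__ == "__main__":
    # Hypothetical read counts per reconstructed population.
    print(is_codominant({"popA": 480, "popB": 450, "popC": 70}))
    print(is_codominant({"popA": 900, "popB": 80, "popC": 20}))
```

Requiring both a minimum frequency and a maximum size ratio separates a genuine pair of major populations from the common case of one dominant population accompanied by low-frequency variants.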
Creative Commons License
This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 4.0 License.
Open Access
Included in
Biochemistry, Biophysics, and Structural Biology Commons, Bioinformatics Commons, Virus Diseases Commons
Analysis of HIV-1 quasispecies sequences generated by High Throughput Sequencing (HTS) using HIVE
Comments
Poster presented at GW Annual Research Days 2017.