Biochemistry and Molecular Medicine Faculty Publications

Non-synonymous variations in cancer and their effects on the human proteome: workflow for NGS data biocuration and proteome-wide analysis of TCGA data

Charles Cole, George Washington University
Konstantinos Krampis, J. Craig Venter Institute, Rockville, MD
Konstantinos Karagiannis, George Washington University
Jonas Almeida, University of Alberta
William J. Faison, George Washington University
Mona Motwani, George Washington University
Quan Wan, George Washington University
Anton Golikov, US Food and Drug Administration, Rockville, MD
Yang Pan, George Washington University
Vahan Simonyan, US Food and Drug Administration, Rockville, MD
Raja Mazumder, George Washington University

Document Type

Journal Article

Publication Date

1-27-2014

Journal

BMC Bioinformatics

Volume

15

Inclusive Pages

Article number 28

DOI

10.1186/1471-2105-15-28

Keywords

High-Throughput Nucleotide Sequencing--methods; Neoplasms--genetics; Proteome--genetics; Proteomics--methods

Abstract

Background

Next-generation sequencing (NGS) technologies have produced petabytes of scattered data, decentralized in archives, databases, and sometimes isolated hard disks that are inaccessible for browsing and analysis. Curated secondary databases are expected to help organize some of this Big Data, allowing users to better navigate, search, and compute on it.

Results

To address this challenge, we have implemented an NGS biocuration workflow and are analyzing short read sequences and associated metadata from cancer patients to better understand the human variome. Curation of variation and other related information from control (normal tissue) and case (tumor) samples will provide comprehensive background information that can be used in genomic medicine research and application studies. Our approach includes a CloudBioLinux Virtual Machine, which is used upstream of the High-performance Integrated Virtual Environment (HIVE), which in turn encapsulates the Curated Short Read archive (CSR) and a proteome-wide variation effect analysis tool (SNVDis). As a proof of concept, we have curated and analyzed control and case breast cancer datasets from the NCI cancer genomics program, The Cancer Genome Atlas (TCGA). Our efforts include reviewing and recording in CSR the available clinical information on patients, mapping the reads to the reference, identifying non-synonymous single nucleotide variations (nsSNVs), and integrating the data with tools that allow analysis of the effect of nsSNVs on the human proteome. Furthermore, we have developed a novel phylogenetic analysis algorithm that uses SNV positions and can be used to classify the patient population. The workflow described here lays the foundation for analysis of short read sequence data to identify rare and novel SNVs that are not present in dbSNP, and therefore provides a more comprehensive understanding of the human variome. Variation results for single genes as well as the entire study are available from the CSR website (hive.biochemistry.gwu.edu/tools/csr/SRARecords_Curated.php).
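
As a generic illustration of what makes a single nucleotide variation non-synonymous, the sketch below compares the amino acid encoded by a reference codon with the one encoded after a single-base substitution, using the standard genetic code. It is not the HIVE or SNVDis implementation, which this record does not describe; the function name `classify_snv` and its arguments are hypothetical and chosen only for this example.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), with codons enumerated
# in the conventional T, C, A, G order: TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_snv(ref_codon, pos_in_codon, alt_base):
    """Classify a single-base substitution within one codon (hypothetical helper).

    ref_codon    -- three-letter reference codon on the coding strand (DNA)
    pos_in_codon -- 0-based position of the substituted base (0, 1, or 2)
    alt_base     -- the variant base observed at that position

    Returns (ref_aa, alt_aa, label), where label is "synonymous" or
    "non-synonymous". A change to or from a stop codon ('*') alters the
    protein and is therefore labeled non-synonymous here.
    """
    ref_codon = ref_codon.upper()
    alt_base = alt_base.upper()
    if len(ref_codon) != 3 or pos_in_codon not in (0, 1, 2):
        raise ValueError("expected a 3-base codon and a position of 0, 1, or 2")
    if alt_base not in BASES or any(b not in BASES for b in ref_codon):
        raise ValueError("bases must be one of T, C, A, G")
    if ref_codon[pos_in_codon] == alt_base:
        raise ValueError("alternate base is identical to the reference base")

    alt_codon = ref_codon[:pos_in_codon] + alt_base + ref_codon[pos_in_codon + 1:]
    ref_aa = CODON_TABLE[ref_codon]
    alt_aa = CODON_TABLE[alt_codon]
    label = "synonymous" if ref_aa == alt_aa else "non-synonymous"
    return ref_aa, alt_aa, label


if __name__ == "__main__":
    print(classify_snv("GAA", 1, "T"))  # Glu -> Val
    print(classify_snv("GAA", 2, "G"))  # Glu -> Glu
    print(classify_snv("TGG", 2, "A"))  # Trp -> stop
```

In a real workflow the reference codon and the position within it would come from mapping the variant's genomic coordinate onto an annotated coding sequence, a step this sketch leaves out.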

Conclusions

The availability of thousands of sequenced patient samples provides a rich repository of sequence information that can be used to identify individual-level SNVs and their effects on the human proteome beyond what the dbSNP database provides.

Comments

Reproduced with permission of BioMed Central Bioinformatics.

Creative Commons License

This work is licensed under a Creative Commons Attribution 3.0 License.

APA Citation

Cole, C., Krampis, K., Karagiannis, K., Almeida, J., Faison, W.J. et al. (2014). Non-synonymous variations in cancer and their effects on the human proteome: workflow for NGS data biocuration and proteome-wide analysis of TCGA data. BMC Bioinformatics, 15:28.

Peer Reviewed

Open Access

List of proteins with loss of functional sites due to nsSNVs in cancer case and control samples.xlsx (20 kB)
Table S1: List of proteins with loss of functional sites due to nsSNVs in cancer case and control samples


Included in

Biochemistry, Biophysics, and Structural Biology Commons
