Genome-wide analysis of regulatory proteases sequences identified through bioinformatics data mining in Taenia solium
Cysticercosis remains a major neglected tropical disease in many regions, including sub-Saharan Africa and Central America. Owing to emerging drug resistance and the inability of current drugs to prevent re-infection, identification of novel vaccines and chemotherapeutic agents against Taenia solium and related helminth pathogens is a public health priority. The T. solium genome and predicted proteome were reported recently, providing a wealth of information from which new interventional targets might be identified. To characterize and classify the entire repertoire of protease-encoding genes of T. solium, whose products play fundamental biological roles in all life processes, we analyzed the predicted proteins of this cestode through a combination of bioinformatics tools. Functional annotation was performed to yield insights into the signaling processes relevant to the complex developmental cycle of this tapeworm and to highlight a suite of the proteases as potential intervention targets.
Within the genome of this helminth parasite, we identified 200 open reading frames encoding proteases from five clans, corresponding to 1.68% of the 11,902 protein-encoding genes predicted in its genome. These proteases include calpains, cytosolic, mitochondrial signal peptidases, ubiquitylation-related proteins, and others. Many not only show significant similarity to proteases in the Conserved Domain Database but also have conserved active sites and catalytic domains. KEGG Automatic Annotation Server (KAAS) analysis indicated that ~60% of these proteases share strong sequence identity with proteins in the KEGG database that are involved in human diseases, metabolic pathways, genetic information processing, cellular processes, environmental information processing and organismal systems. We also identified signal peptides and transmembrane helices through comparative analysis with classes of important regulatory proteases. Phylogenetic analysis using a Bayesian approach provided support for inferring functional divergence among regulatory cysteine and serine proteases.
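The proportion quoted above can be checked with a short calculation. This is only a sketch using the two counts stated in the abstract; the variable names are illustrative and not taken from the study.

```python
# Counts as reported in the abstract (variable names are hypothetical).
protease_orfs = 200        # open reading frames encoding proteases
predicted_genes = 11_902   # predicted protein-encoding genes

# Share of the predicted gene set that encodes proteases, as a percentage.
share_pct = 100 * protease_orfs / predicted_genes
print(f"{share_pct:.2f}% of predicted protein-encoding genes")
```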
Numerous putative proteases were identified for the first time in T. solium, and important regulatory proteases have been predicted. This comprehensive analysis not only complements the growing knowledge base of proteolytic enzymes, but also provides a platform from which to expand knowledge of cestode proteases and to explore their biochemistry and potential as intervention targets.
Creative Commons License
This work is licensed under a Creative Commons Attribution 3.0 License.
Yan, H., Lou, Z., Li, L., Brindley, P.J., Zheng, Y. et al. (2014). Genome-wide analysis of regulatory proteases sequences identified through bioinformatics data mining in Taenia solium. BMC Genomics, 15:428.
Sequences of Taenia solium proteases that have significant similarity to known proteases and conserved active sites
KAAS analysis: KEGG pathway assignment and KEGG orthology number (KO number) for Taenia solium proteases
C1_S1 family catalytic residues - active sites shown in black or blue
Reproduced with permission of BMC Genomics.