School of Medicine and Health Sciences Poster Presentations

A Machine Learning Approach to Mapping Co-regulated Variant Loci and Gene Expression over Time

Document Type

Poster

Abstract Category

Basic Biomedical Sciences

Keywords

bioinformatics, statistics, cancer

Publication Date

Spring 5-1-2019

Abstract

We developed a machine learning analysis pipeline to discover functional gene variants by examining how RNA containing single nucleotide variants (SNVs) affects gene expression at cis- and trans- genomic locations over time. This reflects a hypothesis of genetic co-regulation: as the relative presence of a particular variant allele in RNA transcripts, measured as the variant allele fraction (VAF), changes over time (due to changing cellular requirements), gene expression elsewhere in the genome is affected as a result. We believe this pipeline can give novel mechanistic insights into a wide range of basic and translational cell biology questions, particularly the evolution of drug resistance in cancer cells. To measure changing cellular requirements in cancer cell lines, we conducted paired-end RNA sequencing on the human melanoma cell line WM164 under 4 experimental conditions: with and without histone deacetylase (HDAC) inhibition, and with and without IFN-gamma treatment. Each condition was sampled at 8 time points, for a total of 32 samples. We then aligned the RNA-sequencing reads to the human genome and called variants. Our pipeline starts with clustering in Graphia Professional, which uses a graph-based approach to build relationships between genes. Each VAF or gene expression “time-course” is linked to another if the two meet a minimum Pearson correlation of 0.96. As a result, genes with similar expression trends and VAF trends are connected to each other in a structured graph. We then used a machine learning method called the Markov Cluster algorithm (MCL) to partition the graph into formal clusters by looking for groups of highly interconnected genes. We then built custom R modules to scan the clusters for pairs of VAF and gene expression profiles that cluster together in two or more samples. Each pair found was categorized as a cis- relationship if the two loci were less than one million base pairs apart, and as trans- otherwise.
All 60,963 VAF and gene expression profiles were grouped into 3,730 clusters, where each cluster represents a particular pattern of RNA regulation through time. Using custom R scripts, we discovered 440 co-regulated VAF-containing positions and gene expression profiles. Further work will include protein-protein interaction (PPI) analyses to validate these findings, especially in a larger data set.
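
The three steps described above (correlation graph, MCL partitioning, cis/trans labeling) can be illustrated with a minimal Python sketch. This is not the authors' implementation, which used Graphia Professional and custom R modules. The feature names, coordinates, and time courses are made-up toy values; the inflation parameter of 2.0 is an assumed default; and requiring the same chromosome for a cis call is an assumption, since the abstract states only the one-million-base-pair distance. The sketch handles a single condition and omits the requirement that a pair co-cluster in two or more samples.

```python
import numpy as np

CORR_THRESHOLD = 0.96      # minimum Pearson correlation stated in the abstract
CIS_WINDOW_BP = 1_000_000  # pairs closer than this are labeled cis


def correlation_graph(profiles, threshold=CORR_THRESHOLD):
    """Adjacency matrix linking time courses whose Pearson r >= threshold."""
    r = np.corrcoef(profiles)            # each row is one time course
    adj = (r >= threshold).astype(float)
    np.fill_diagonal(adj, 0.0)           # self loops are added inside mcl()
    return adj


def mcl(adj, inflation=2.0, iterations=50, eps=1e-6):
    """Basic Markov clustering: alternate expansion and inflation."""
    m = adj + np.eye(len(adj))           # add self loops
    m /= m.sum(axis=0)                   # make columns sum to 1
    for _ in range(iterations):
        m = m @ m                        # expansion: two-step random walks
        m = m ** inflation               # inflation: sharpen strong flows
        m /= m.sum(axis=0)               # renormalize columns
    # Each row with surviving flow lists the members of one cluster.
    clusters = {tuple(int(i) for i in np.flatnonzero(row > eps)) for row in m}
    clusters.discard(())
    return sorted(clusters)


def classify_pairs(clusters, features):
    """Pair each VAF profile with each gene profile in the same cluster."""
    pairs = []
    for cluster in clusters:
        vafs = [i for i in cluster if features[i]["kind"] == "vaf"]
        genes = [i for i in cluster if features[i]["kind"] == "gene"]
        for v in vafs:
            for g in genes:
                fv, fg = features[v], features[g]
                # Assumption: cis also requires the same chromosome.
                same_chrom = fv["chrom"] == fg["chrom"]
                near = abs(fv["pos"] - fg["pos"]) < CIS_WINDOW_BP
                label = "cis" if same_chrom and near else "trans"
                pairs.append((fv["name"], fg["name"], label))
    return pairs


# Hypothetical toy data: 8 time points, two VAF and two expression profiles.
t = np.arange(8, dtype=float)
alt = np.tile([0.0, 1.0], 4)
features = [
    {"name": "snv_A", "kind": "vaf", "chrom": "chr1", "pos": 1_200_000},
    {"name": "GENE_X", "kind": "gene", "chrom": "chr1", "pos": 1_500_000},
    {"name": "snv_B", "kind": "vaf", "chrom": "chr2", "pos": 5_000_000},
    {"name": "GENE_Y", "kind": "gene", "chrom": "chr7", "pos": 5_100_000},
]
profiles = np.vstack([0.1 * t, 2.0 * t + 5.0, 0.1 + 0.8 * alt, 3.0 * alt + 1.0])

clusters = mcl(correlation_graph(profiles))
pairs = classify_pairs(clusters, features)
print(pairs)
```

In this toy input, snv_A and GENE_X rise together and sit 300 kb apart on the same chromosome, so they are labeled cis, while snv_B and GENE_Y share an alternating pattern on different chromosomes and are labeled trans.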

Open Access

1

Comments

Presented at Research Days 2019.
