Final Projects

Bio331 final projects explore some computational aspect of biological networks. They analyze a real-world biological network (or build a biological network from experimental data) with a significant Python programming component.

Fall 2016

Finding Non-LEE Targets of PerC in E. coli - Vikram Chan-Herur

Diarrheal disease is a major cause of mortality worldwide; pathogenic Escherichia coli, such as the enteropathogenic strain (EPEC), are a cause of diarrhea. In an effort to predict more non-LEE targets of EPEC's PerC regulatory protein from existing RNA-seq differential gene expression data, these data were merged, as weights, into a recently built E. coli K-12 protein-protein interaction network. A novel node-choosing step was incorporated into an existing greedy clique partitioning algorithm and used to find cliques in the network, with varying consideration given to the gene expression data and the node degree. Gene ontology enrichment of the largest resulting cliques showed differences in enriched terms with varying incorporation of expression data. Some of these changes, such as cell cycle terms, were consistent with experimental observations, while others, such as a lack of fimbrial genes, ran contrary to experimental and RNA-seq-based expectations. How best to integrate these transcriptomic and proteomic datasets through graphs remains an important open question.
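
The idea of a node-choosing step that trades off expression against degree can be illustrated with a minimal sketch. This is not the author's implementation: the scoring formula, the `alpha` parameter, the function name, and the toy data are all assumptions made for illustration. Here `alpha=1.0` ranks nodes by expression alone and `alpha=0.0` by degree alone.

```python
def greedy_clique_partition(adj, expression, alpha=0.5):
    """Partition nodes into cliques, choosing nodes greedily by a blended score.

    adj: dict mapping node -> set of neighbors (undirected, no self-loops).
    expression: dict mapping node -> expression weight (e.g. a fold change).
    alpha: assumed mixing weight between expression (1.0) and degree (0.0).
    """
    max_deg = max(len(nbrs) for nbrs in adj.values()) or 1
    max_expr = max(abs(e) for e in expression.values()) or 1

    def score(n):
        # Hypothetical score: normalized |expression| blended with normalized degree.
        return alpha * abs(expression[n]) / max_expr + (1 - alpha) * len(adj[n]) / max_deg

    unassigned = set(adj)
    cliques = []
    while unassigned:
        # Node-choosing step: start from the best-scoring unassigned node.
        seed = max(unassigned, key=lambda n: (score(n), n))
        clique = {seed}
        candidates = adj[seed] & unassigned
        while candidates:
            # Add the best candidate, then keep only nodes adjacent to every member.
            best = max(candidates, key=lambda n: (score(n), n))
            clique.add(best)
            candidates &= adj[best]
        cliques.append(clique)
        unassigned -= clique
    return cliques


# Toy network (made up): a triangle A-B-C attached to a path C-D-E.
adj = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C", "E"},
    "E": {"D"},
}
expression = {"A": 0.1, "B": 0.2, "C": 0.3, "D": 2.0, "E": 1.8}

by_expression = greedy_clique_partition(adj, expression, alpha=1.0)
by_degree = greedy_clique_partition(adj, expression, alpha=0.0)
```

On this toy network the two settings give different partitions, which is the kind of sensitivity to expression weighting the abstract describes.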

Mission Control: An Open Source Usability Package for GraphSpace in Python - Nick Franzese

GraphSpace is a highly customizable platform for graph visualization with a suite of helpful features. These features offer great potential for a variety of academic uses, but realizing this potential is hampered by usability issues. Formatting data for visualization with GraphSpace can be daunting for those unfamiliar with coding, and can be a chore even for experienced programmers, as each graph requires custom-built code to visualize. For this reason, I created Mission Control, an open source usability package for GraphSpace in Python. The package aims to significantly lower the usability barrier of GraphSpace while maintaining customizability. In the present paper, I detail the user API of the Mission Control package and showcase a few graphs that I was able to visualize quickly and effortlessly through the package. [report.pdf] [GitHub Repo]

Node Stability of Matrilineal Groups in a Killer Whale Social Network - Amy Rose Lazarte

The focus of this project was to explore how each killer whale matriline contributed to the stability of an entire whale social network. Centrality measures are commonly used to rank the importance of individual nodes on a graph, and this project attempted to combine several different centrality measures in order to definitively compare each matrilineal group's contribution to the stability of the entire network. While the project originally hoped to draw conclusions about common demographics (size, presence of calves, presence of males, etc.) found in the most essential matrilines, the nodes of interest did not appear to show any demographic similarities. However, matrilines that interacted outside of their pod added the most stability to the graph.

Parallel Programming with Prim's Algorithm - Erik Lopez

Parallel programming can be a very useful way to work through large data sets and get results much more quickly than a serial implementation of the same algorithm would. Not only can it be more efficient, but it can also push the architecture of your system to its limits. I explore what Python's multithreading and multiprocessing modules can do on an embarrassingly parallel problem and then scale up to Prim's algorithm, a minimum spanning tree graph algorithm. [report.pdf] [GitHub Repo]
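
For reference, a serial version of Prim's algorithm might look like the sketch below. It is only a baseline of the kind a parallel version would be compared against, not the project's code; the function name, graph representation, and toy graph are assumptions.

```python
import heapq


def prim_mst(graph, start):
    """Return (total_weight, edges) of a minimum spanning tree.

    graph: dict mapping node -> dict of neighbor -> weight
           (assumed undirected and connected).
    """
    visited = {start}
    # Heap of candidate edges (weight, u, v) leaving the visited set.
    frontier = [(w, start, v) for v, w in graph[start].items()]
    heapq.heapify(frontier)
    total, edges = 0, []
    while frontier and len(visited) < len(graph):
        w, u, v = heapq.heappop(frontier)
        if v in visited:
            continue  # Stale entry: v was reached by a cheaper edge already.
        visited.add(v)
        total += w
        edges.append((u, v, w))
        for nxt, w2 in graph[v].items():
            if nxt not in visited:
                heapq.heappush(frontier, (w2, v, nxt))
    return total, edges


# Toy weighted graph (made up for illustration).
toy = {
    "a": {"b": 1, "c": 4},
    "b": {"a": 1, "c": 2, "d": 5},
    "c": {"a": 4, "b": 2, "d": 3},
    "d": {"b": 5, "c": 3},
}
total, edges = prim_mst(toy, "a")
```

Each step depends on the tree built so far, so unlike an embarrassingly parallel problem the outer loop cannot simply be split across workers; the search for the cheapest edge leaving the tree is the part most often parallelized.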

Modularity and the Louvain Algorithm - Yasmina Marden

I applied the Louvain algorithm to two datasets and found that node ordering had a notable effect on the output clusterings. I then ran simulations with random node order to find the most frequently output clustering. This clustering, however, was not the optimum clustering according to modularity; it was a poor clustering with a significantly lower modularity than the optimum. This poor clustering better reflected the optimum clustering when it was parsed for its sub-clusters. To generate more accurate results from the algorithm without parsing, I then ran more simulations with random node order and recorded the node pairs that appeared most frequently in the same cluster. Creating a clustering from these most frequent node pairings led to increased accuracy, although results deteriorated after approximately 100 simulations.
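
One way to turn frequent node pairings into a clustering is sketched below. The abstract does not say how the final clustering was assembled, so the majority threshold and the merging of pairs with union-find are assumptions, as are the function name and the toy runs (which stand in for Louvain outputs under different node orders).

```python
from collections import Counter
from itertools import combinations


def consensus_clusters(runs, threshold=0.5):
    """Build clusters from node pairs that co-occur in more than `threshold` of runs.

    runs: list of clusterings, each a list of sets of nodes.
    """
    pair_counts = Counter()
    nodes = set()
    for clustering in runs:
        for cluster in clustering:
            nodes.update(cluster)
            for u, v in combinations(sorted(cluster), 2):
                pair_counts[(u, v)] += 1

    # Union-find: merge every pair that is co-clustered often enough.
    parent = {n: n for n in nodes}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # Path halving.
            n = parent[n]
        return n

    for (u, v), count in pair_counts.items():
        if count / len(runs) > threshold:
            parent[find(u)] = find(v)

    groups = {}
    for n in nodes:
        groups.setdefault(find(n), set()).add(n)
    return sorted(groups.values(), key=sorted)


# Three hypothetical runs that disagree about node 3.
runs = [
    [{1, 2, 3}, {4, 5}],
    [{1, 2}, {3, 4, 5}],
    [{1, 2, 3}, {4, 5}],
]
result = consensus_clusters(runs)
```

Node 3 is grouped with nodes 1 and 2 in two of the three runs, so the consensus places it with them.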

Grouping Badger Social Networks - Karl Menzel

Understanding badger social networks can be important for understanding the distribution of tuberculosis in badger populations. To understand these networks, I used interaction data from 51 badgers in multiple setts. I calculated both node and edge flow-betweenness for the network using the Ford-Fulkerson method. I also attempted to cluster the badgers into the defined social groups using the Girvan-Newman method. The clustering did not fully reproduce the given social groups using either the raw data or the edge flow-betweenness. [report.pdf] [GitHub Repo]
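
Flow-betweenness is built on maximum flows between pairs of nodes. The sketch below shows only that building block, using the breadth-first variant of Ford-Fulkerson (often called Edmonds-Karp). It is not the project's code; the function name and the toy capacities, which stand in for interaction counts, are assumptions.

```python
from collections import deque


def max_flow(capacity, source, sink):
    """Maximum flow from source to sink via BFS augmenting paths.

    capacity: dict mapping node -> dict of neighbor -> capacity.
    """
    # Residual graph: copy capacities and add zero-capacity reverse edges.
    residual = {u: dict(nbrs) for u, nbrs in capacity.items()}
    for u, nbrs in capacity.items():
        for v in nbrs:
            residual.setdefault(v, {}).setdefault(u, 0)

    flow = 0
    while True:
        # Breadth-first search for a path with spare capacity.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return flow  # No augmenting path left.

        # Find the bottleneck along the path, then push that much flow.
        bottleneck = float("inf")
        v = sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        flow += bottleneck


# Toy undirected network (made up): each edge is listed in both directions.
capacity = {
    "s": {"a": 3, "b": 2},
    "a": {"s": 3, "b": 1, "t": 2},
    "b": {"s": 2, "a": 1, "t": 3},
    "t": {"a": 2, "b": 3},
}
flow_st = max_flow(capacity, "s", "t")
```

A flow-betweenness score would repeat this over many source and sink pairs and record how much of each maximum flow passes through a given node or edge.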

Clique Finding in Tetraselmis Subcordiformis - Eli Spiliotopoulos

In this project, a predicted protein-protein interactome was seeded with transcriptome read data. The seeded network was then run through a maximal clique algorithm to find maximal cliques that included genes known to be more highly expressed. The goal of this project was to identify groups of more highly expressed genes through these more closely related groups of genes within the larger set. [report.pdf] [GitHub Repo]
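
A minimal sketch of this kind of search is shown below, using the Bron-Kerbosch algorithm with pivoting to enumerate maximal cliques and then keeping those that contain at least one seed node. The abstract does not name the maximal clique algorithm used, so Bron-Kerbosch is an assumption here, as are the function name, the toy graph, and the seed set.

```python
def maximal_cliques(adj):
    """Yield every maximal clique of an undirected graph (Bron-Kerbosch with pivoting).

    adj: dict mapping node -> set of neighbors (no self-loops).
    """
    def expand(r, p, x):
        # r: current clique, p: nodes that can still extend it, x: nodes already tried.
        if not p and not x:
            yield set(r)
            return
        # Pivot on the node with the most neighbors in p to prune branches.
        pivot = max(p | x, key=lambda n: len(adj[n] & p))
        for v in list(p - adj[pivot]):
            yield from expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    yield from expand(set(), set(adj), set())


# Toy interactome (made up): a triangle A-B-C attached to a path C-D-E.
adj = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C", "E"},
    "E": {"D"},
}
# Hypothetical "highly expressed" seed genes.
seeds = {"A", "E"}

seeded = [clique for clique in maximal_cliques(adj) if clique & seeds]
```

Here the clique formed by C and D is dropped because it contains no seed, while the two cliques containing A or E are kept.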