Measuring Gene Expression With Next Generation Sequencing Technology
As a PhD student in Dr. Gabor Marth's laboratory, I have had primary responsibility for two projects that used RNA-Seq to measure differential gene expression. In the first project, we used RNA-Seq to identify differentially expressed genes in four yeast species, and I analyzed the findings in terms of the evolution of gene expression. Gene expression was measured using two biological replicates of each species. This project resulted in a highly accessed first-author publication in BMC Genomics in 2011. Although the experiment produced several interesting biological findings, the analysis was complicated by statistical issues caused by the low number of replicates. The cost of sequencing has decreased rapidly since the experiment was designed, and many of these issues can now be avoided in practice by sequencing a greater number of samples. However, the literature offers little guidance on how to design an RNA-Seq experiment in terms of how many replicates are required and how deeply each replicate must be sequenced. My second project, therefore, was to develop Scotty, a web-based program that allows users to perform power analysis for RNA-Seq experiments.
I have structured my dissertation as follows. The first chapter, entitled General Issues in RNA-Seq, synthesizes the themes and issues of RNA-Seq statistical analysis that were common to both papers. In it, I discuss the main findings of the two papers as they relate to analyzing RNA-Seq data. Like the Scotty application, this chapter is designed to be "used" by wet-lab biologists who have a limited background in statistics. Although some background in statistics is required to fully understand the later chapters, the essentials can be gained by reading this first chapter.
The second and third chapters contain the two papers that resulted from the two RNA-Seq projects. Each chapter contains the original manuscript together with its original supplementary methods and data sections. Finally, I include brief summaries of my contributions to two papers on which I was a middle author. The first was a functional analysis of the genomic regions affected by mobile element insertions, performed as part of Chip Stewart's paper with the 1000 Genomes Project. This paper was published in PLoS Genetics. The second was a cluster analysis of microarray gene expression in Toxoplasma gondii, which was included in the paper by Alexander Lorestani et al., "Targeted proteomic dissection of Toxoplasma cytoskeleton sub-compartments using MORN1." This paper is currently under review.
The yeast project was a collaborative effort between Jesse Gray, Michael Springer, and Allen Costa at Harvard Medical School, Jeffery Chuang here at Boston College, and members of the Marth lab. Jesse Gray conceived of the project. I wrote the draft of the manuscript, and many people, particularly Gabor Marth, provided substantial guidance on the final text. I conceived of and implemented Scotty and wrote its manuscript with only editorial assistance from my co-authors. I produced all figures for the two manuscripts. Chip Stewart provided extensive guidance and mentorship on all aspects of my statistical analyses for both projects.