HarvardX: PH525x: Data Analysis for Genomics

School: HarvardX
Course Code: PH525x
Classes Start: 7 Apr 2014
Estimated effort: 2 to 4 hours/week

Data Analysis for Genomics

Data Analysis for Genomics will teach students how to harness the wealth of genomics data arising from new technologies, such as microarrays and next generation sequencing, in order to answer biological questions, both for basic cell biology and clinical applications.

About this Course

The purpose of this course is to enable students to analyze and interpret data generated by modern genomics technology, specifically microarray data and next generation sequencing data. We will focus on applications common in public health and biomedical research: measuring gene expression differences between populations, associating genomic variants with disease, measuring epigenetic marks such as DNA methylation, and transcription factor binding sites.

The course covers the statistical concepts needed to properly design experiments and analyze the high-dimensional data produced by these technologies. These include estimation, hypothesis testing, multiple comparison corrections, modeling, linear models, principal component analysis, clustering, and nonparametric and Bayesian techniques. Along the way, students will learn to analyze data using the R programming language and several packages from the Bioconductor project.

Currently, biomedical research groups around the world are producing more data than they can handle. The training and skills acquired in this course will be of significant practical use to these groups, supporting greater success in making biological discoveries and improving individual and population health.

Ways to take this edX course:

Simply Audit this Course

Audit this course for free and have complete access to all of the course material, tests, and the online discussion forum. You decide what and how much you want to do.

or

Pursue a Verified Certificate of Achievement

Plan to use your completed coursework for job applications, promotions or school applications? Then you may prefer to work towards a verified Certificate of Achievement to document your accomplishment.

Course Staff

  • Rafael Irizarry

    Dr. Irizarry received his bachelor’s in mathematics in 1993 from the University of Puerto Rico and his Ph.D. in statistics in 1998 from the University of California, Berkeley. He joined the faculty of the Department of Biostatistics in the Bloomberg School of Public Health in 1998 and was promoted to Professor in 2007. He is now Professor of Biostatistics and Computational Biology at the Dana-Farber Cancer Institute and a Professor of Biostatistics at Harvard School of Public Health. Dr. Irizarry has worked on the analysis and pre-processing of microarray, next-generation sequencing, and genomic data, and is currently interested in translational work, developing diagnostic tools and discovering biomarkers. Dr. Irizarry is one of the founders of the Bioconductor Project, an open source and open development software project for the analysis of genomic data.

  • Michael Love

    Michael Love is a postdoctoral fellow with Dr. Irizarry in the Department of Biostatistics at the Dana-Farber Cancer Institute and Harvard School of Public Health. Dr. Love received his bachelor’s in mathematics in 2005 from Stanford University, his master’s in statistics in 2010 from Stanford University, and his Ph.D. in Computational Biology in 2013 from the Department of Mathematics and Computer Science of the Freie Universität Berlin. His research focuses on inferring biologically meaningful patterns from high-throughput sequencing read counts. Dr. Love develops open-source statistical software for the analysis of exome sequencing and RNA sequencing experiments for the Bioconductor Project.

Prerequisites

PH207x, programming skills, basic familiarity with the R programming language, and a readiness self-assessment.

We will not cover: population genetics, comparative genomics, sequence alignment algorithms, systems biology, genome assembly, Python, or Perl.