Prerequisites

First courses in statistics, linear algebra, and computing.

About this course

This is an introductory-level course in supervised learning, with a focus on regression and classification methods. The syllabus includes: linear and polynomial regression, logistic regression and linear discriminant analysis; cross-validation and the bootstrap, model selection and regularization methods (ridge and lasso); nonlinear models, splines and generalized additive models; tree-based methods, random forests and boosting; support-vector machines. Some unsupervised learning methods are discussed: principal components and clustering (k-means and hierarchical).

This is not a math-heavy class, so we try to describe the methods without heavy reliance on formulas and complex mathematics. We focus on what we consider to be the important elements of modern data analysis. Computing is done in R. There are lectures devoted to R, starting with tutorials from the ground up and progressing to more detailed sessions that implement the techniques in each chapter.

The lectures cover all the material in An Introduction to Statistical Learning, with Applications in R by James, Witten, Hastie and Tibshirani (Springer, 2013). A PDF of the book is available for free on the book website.

What you'll learn

  • Overview of statistical learning
  • Linear regression
  • Classification
  • Resampling methods
  • Linear model selection and regularization
  • Moving beyond linearity
  • Tree-based methods
  • Support vector machines
  • Unsupervised learning

Meet your instructors

Trevor Hastie
John A. Overdeck Professor, Professor of Statistics and of Biomedical Data Sciences
Stanford University
Robert Tibshirani
Professor of Biomedical Data Science and Statistics
Stanford University

Pursue a Verified Certificate to highlight the knowledge and skills you gain ($50.00)

  • Official and Verified

    Receive an instructor-signed certificate with the institution's logo to verify your achievement and increase your job prospects

  • Easily Shareable

    Add the certificate to your CV or resume, or post it directly on LinkedIn

  • Proven Motivator

    Give yourself an additional incentive to complete the course

  • Support our Mission

    EdX, a non-profit, relies on verified certificates to help fund free education for everyone globally

Frequently asked questions

Do I need to buy a textbook?

No. A free online version of An Introduction to Statistical Learning, with Applications in R by James, Witten, Hastie and Tibshirani (Springer, 2013) is available from the book website. Springer has agreed to this, so there is no need to worry about copyright. Of course, you may not distribute printed versions of this PDF file.

Are R and RStudio available for free?

Yes. You can get R for free from http://cran.us.r-project.org/. Typically it installs with a click. You can get RStudio from http://www.rstudio.com/, also for free, and it is similarly easy to install.

How many hours of effort are expected per week?

We anticipate it will take approximately 3-5 hours per week to go through the materials and exercises in each section.