  • Length:
    14 weeks
  • Effort:
    8–10 hours per week
  • Price:
    Add a Verified Certificate for $99 USD



Prerequisites

  • Machine learning and data mining concepts
  • Proficient programming and system skills in Scala, Python, and Java
  • Proficient knowledge of and experience with data, and an understanding of the ETL process

About this course

Data science plays an important role in many industries. Faced with massive amounts of heterogeneous data, data scientists increasingly depend on scalable machine learning and data mining algorithms and systems. The growing volume, complexity, and speed of data drive the need for scalable data analytic algorithms and systems.

In this course, we study such algorithms and systems in the context of healthcare applications.

In healthcare, large amounts of heterogeneous medical data have become available across healthcare organizations (payers, providers, pharmaceutical companies). These data could be an enabling resource for deriving insights that improve care delivery and reduce waste. However, the size and complexity of these datasets pose significant challenges, both for analysis and for applying the results in practical clinical settings.

In this course, we introduce the characteristics of medical data and the data mining challenges that come with them. We cover a range of algorithms and systems for big data analytics, and we study these techniques in the context of concrete healthcare analytic applications such as predictive modeling, computational phenotyping, and patient similarity.

What you'll learn

  • Health data and big data analytic technology
  • Health data standards
  • Scalable machine learning algorithms such as online learning and fast similarity search
  • Big data analytic systems such as the Hadoop family (Hive, Pig, HBase), Spark, and graph databases
  • Deep learning models and packages such as TensorFlow
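
The following example illustrates "online learning". It is a minimal sketch and is not taken from the course materials. It implements online logistic regression trained by stochastic gradient descent: the model updates its weights after each example in a stream instead of making one pass over a full batch. The toy data, the learning rate, and the function names (`sgd_update`, `predict`) are hypothetical.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights: list[float], x: list[float]) -> float:
    """Probability of the positive class under a logistic model."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)))


def sgd_update(weights: list[float], x: list[float], y: int, lr: float = 0.1) -> list[float]:
    """One online step: the log-loss gradient for a single example is (p - y) * x."""
    p = predict(weights, x)
    return [w - lr * (p - y) * xi for w, xi in zip(weights, x)]


# Hypothetical stream of (features, label) pairs; feature 0 is a constant bias term.
stream = [([1.0, 2.0], 1), ([1.0, -1.5], 0), ([1.0, 1.8], 1), ([1.0, -2.2], 0)] * 50

weights = [0.0, 0.0]
for x, y in stream:
    # Each example is used once, as it arrives, then discarded.
    weights = sgd_update(weights, x, y)

print(predict(weights, [1.0, 2.0]), predict(weights, [1.0, -2.0]))
```

Each update touches only one example, so the same loop still works when the examples arrive from a source that is too large to hold in memory.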

Week 1: Intro to Big Data Analytics/Course Overview
Week 2: Predictive Modeling
Week 3: MapReduce
Week 4/5: Classification evaluation metrics / Classification ensemble methods / Phenotyping & Clustering
Week 6: Spark
Week 7: Medical ontology
Week 8: Graph analysis
Week 9: Dimensionality Reduction
Week 10: Patient similarity
Week 11: AWS
Week 12: Azure
Week 13: Peer Review for Draft
Week 14: Final Project (code + presentation + final paper)
Week 15: Final Exam Week
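
Week 3 covers MapReduce. The following single-process Python sketch shows the map, shuffle, and reduce phases by counting how often each diagnosis code appears in a set of patient events. It only imitates the programming model and uses no Hadoop or other MapReduce API. The records, the codes, and the function names (`map_phase`, `shuffle`, `reduce_phase`) are hypothetical.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical event records: (patient_id, diagnosis_code)
events = [
    ("p1", "E11"), ("p1", "I10"), ("p2", "E11"),
    ("p3", "I10"), ("p3", "E11"), ("p3", "J45"),
]


def map_phase(record):
    """Emit one (key, 1) pair for the diagnosis code in a single event."""
    _patient, code = record
    yield code, 1


def shuffle(pairs):
    """Group intermediate values by key, as a framework would between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values for one key; here, a sum."""
    return key, sum(values)


intermediate = chain.from_iterable(map_phase(r) for r in events)
counts = dict(reduce_phase(k, vs) for k, vs in shuffle(intermediate).items())
print(counts)  # {'E11': 3, 'I10': 2, 'J45': 1}
```

In a real cluster, the map and reduce calls would run in parallel on different machines, and the framework would perform the shuffle over the network. This sketch keeps only the data flow.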

Meet your instructors

Jimeng Sun
Associate Professor
Georgia Institute of Technology

Pursue a Verified Certificate ($99.00) to highlight the knowledge and skills you gain.

  • Official and Verified

    Receive an instructor-signed certificate with the institution's logo to verify your achievement and increase your job prospects

  • Easily Shareable

    Add the certificate to your CV or resume, or post it directly on LinkedIn

  • Proven Motivator

    Give yourself an additional incentive to complete the course

  • Support our Mission

    edX, a nonprofit, relies on verified certificates to help fund free education for everyone, globally