
Data Science and Machine Learning Capstone Project

Provided by IBM
3–4 hours per week, for 6 weeks

$99 USD for graded exams and assignments, plus a certificate

Create a project that you can use to showcase your data science skills to prospective employers. Apply data science and machine learning techniques to analyze and visualize a data set from a real-life business scenario, and build a predictive model.

Before you start

Course opens: Jan 28, 2019
Course ends: Jan 15, 2020

What you will learn

  • Demonstrate knowledge of data science and machine learning
  • Apply the data science process to a real-life scenario
  • Explore the New York City 311 Complaints and Housing datasets
  • Analyze and visualize data using Python
  • Perform a feature engineering exercise using Python
  • Build and validate a predictive machine learning model using Python
  • Create and share actionable insights into real-life data problems


Employers care about how well you can apply your knowledge and skills to solve real-world problems. Now that you've taken several courses on data science and machine learning, it's time to put your learning into practice and work on a data problem involving a real-life scenario.

New Yorkers use the 311 system to report non-emergency problems. These complaints are assigned to various agencies in New York. The data on these complaints is available in the New York City Open Dataset. A look at the data shows that over the last few years the number of 311 complaints received by the Department of Housing Preservation and Development in New York City has increased significantly.

In this capstone project, your task is to answer a set of questions that will help the Department of Housing Preservation and Development in New York City handle the 311 complaints it receives more effectively. You will use Python along with data science and machine learning techniques such as data ingestion, data exploration, data visualization, feature engineering, probabilistic modeling, and model validation.
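As a rough illustration of the first two techniques named above (data ingestion and data exploration), here is a minimal sketch using only the Python standard library. The sample rows and the column names `complaint_type`, `borough`, and `created_year` are invented for illustration. They are assumptions, not the actual layout of the New York City 311 data, and the course itself may use different tools.

```python
import csv
import io
from collections import Counter

# Hypothetical sample rows. The real 311 dataset is far larger, and its
# column names may differ from the ones assumed here.
SAMPLE_CSV = """complaint_type,borough,created_year
HEAT/HOT WATER,BRONX,2017
PLUMBING,BROOKLYN,2017
HEAT/HOT WATER,BROOKLYN,2018
PAINT/PLASTER,MANHATTAN,2018
HEAT/HOT WATER,BRONX,2018
"""


def load_complaints(text):
    """Data ingestion: parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def count_by(rows, column):
    """Data exploration: count how often each value of `column` occurs."""
    return Counter(row[column] for row in rows)


rows = load_complaints(SAMPLE_CSV)
by_type = count_by(rows, "complaint_type")
by_year = count_by(rows, "created_year")

# Most frequent complaint type in the sample, with its count.
top_type, top_count = by_type.most_common(1)[0]
print(top_type, top_count)
print(dict(by_year))
```

Counting complaints by type and by year is one simple way to start on questions such as which problems dominate and whether volume is growing. Visualization, feature engineering, and modeling would build on summaries like these.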

By the end of this course, you will have used real-world data science tools to create a showcase project and demonstrate to employers that you are job-ready and a worthy candidate in the field of data science.

Meet your instructors

Sourav Mazumder
Data Science Thought Leader
Linda Liu
Data Science Architect & Evangelist

Who can take this course?

Unfortunately, learners from one or more of the following countries or regions will not be able to register for this course: Iran, Cuba and the Crimea region of Ukraine. While edX has sought licenses from the U.S. Office of Foreign Assets Control (OFAC) to offer our courses to learners in these countries and regions, the licenses we have received are not broad enough to allow us to offer this course in all locations. EdX truly regrets that U.S. sanctions prevent us from offering all of our courses to everyone, no matter where they live.

This course is part of:

Earn a Professional Certificate in 2-4 months if courses are taken one at a time.

  1. 10–20 hours of effort

    In this course, you will learn how to analyze data in Python using multi-dimensional arrays in NumPy, manipulate DataFrames in pandas, use the SciPy library of mathematical routines, and perform machine learning using scikit-learn.

  2. Data Science and Machine Learning Capstone Project
  3. 10–20 hours of effort

    Data visualization is the graphical representation of data in order to interactively and efficiently convey insights to clients, customers, and stakeholders in general.

  4. 20–30 hours of effort

    Machine Learning can be an incredibly beneficial tool to uncover hidden insights and predict future trends. This Machine Learning with Python course will give you all the tools you need to get started with supervised and unsupervised learning.

  5. 2–5 hours of effort

    This Python course provides a beginner-friendly introduction to Python for Data Science. Practice through lab exercises, and you'll be ready to create your first Python scripts on your own!
