
IBM: Data Engineering Capstone Project

This Capstone Project is designed for you to apply and demonstrate your Data Engineering skills and knowledge in SQL, NoSQL, RDBMS, Bash, Python, ETL, Data Warehousing, BI tools and Big Data.

Data Engineering Capstone Project
6 weeks
2–3 hours per week
Progress at your own speed
Optional upgrade available

There is one session available:

After a course session ends, it will be archived.
Starts Dec 4

About this course


In this Capstone you’ll demonstrate your ability to perform like a Data Engineer. Your mission is to design, implement, and manage a complete data and analytics platform consisting of relational and non-relational databases, data warehouses, data pipelines, big data processing engines, and Business Intelligence (BI) tools.

This Capstone project requires you to apply and sharpen the skills and knowledge you developed in the courses of the IBM Data Engineering Professional Certificate. You will use multiple tools and technologies to design databases, collect data from multiple sources, extract, transform, and load data into a data warehouse, and use a cloud-based BI tool to create analytic reports and visualizations. You will also implement predictive analytics and machine learning models using big data tools and techniques.

This Capstone requires a significant amount of hands-on lab effort throughout the course. You’ll exhibit your knowledge and proficiency working with Python, Bash scripts, SQL, NoSQL, RDBMSes, ETL, MySQL, PostgreSQL, Db2, MongoDB, Apache Airflow, Apache Spark, and Cognos Analytics.

Upon successfully completing this Capstone, you should have the confidence and portfolio to take on real-world data engineering projects and showcase your abilities to perform as an entry-level data engineer.

At a glance

  • Language: English
  • Video Transcript: English
  • Associated skills: Extract Transform Load (ETL), Data Collection, Bash (Scripting Language), MongoDB, Apache Airflow, PostgreSQL, NoSQL, Python (Programming Language), IBM DB2, Machine Learning, Data Engineering, Business Intelligence, MySQL, Apache Spark, Relational Databases, Data Warehousing, Big Data, Relational Database Management Systems, SQL (Programming Language), Predictive Analytics

What you'll learn

  • Build a complete data and analytics platform.
  • Set up, manage, and query relational and NoSQL databases.
  • Create data pipelines and ETL processes using Apache Airflow.
  • Design and populate a star/snowflake schema data warehouse and query it using SQL.
  • Analyze warehouse data using the Business Intelligence (BI) tool Cognos Analytics to create reports and dashboards.
  • Deploy a big data machine learning model using Apache Spark.

Who can take this course?

Unfortunately, learners residing in one or more of the following countries or regions will not be able to register for this course: Iran, Cuba and the Crimea region of Ukraine. While edX has sought licenses from the U.S. Office of Foreign Assets Control (OFAC) to offer our courses to learners in these countries and regions, the licenses we have received are not broad enough to allow us to offer this course in all locations. edX truly regrets that U.S. sanctions prevent us from offering all of our courses to everyone, no matter where they live.

This course is part of Data Engineering Professional Certificate Program

Expert instruction
14 skill-building courses
Progress at your own speed
1 year 2 months
3–4 hours per week
