What you will learn
- Differentiate between the four main categories of NoSQL repositories and work hands-on with MongoDB, Cassandra and IBM Cloudant.
- Apply your knowledge of the characteristics, features, benefits, limitations, and applications of the more popular Big Data processing tools, including Hadoop, HDFS, Hive and HBase.
- Describe parallel programming using Resilient Distributed Datasets (RDDs), DataFrames, and Spark SQL. Understand how the Catalyst optimizer and the Tungsten execution engine benefit Spark programmers, and see how ETL works using DataFrames.
- Acquire real-world data engineering and machine learning skills using Spark Structured Streaming, DataFrames, GraphFrames, and Spark ML. These skills cover regression, classification, and clustering (including the k-means algorithm), as well as ETL using Spark.
- Gain hands-on experience using Spark SQL and Apache Spark on IBM Cloud.
- Learn about scaling out using the IBM Spark Environment in Watson Studio, running Spark on Kubernetes, setting Spark configurations, and performing monitoring and performance tuning.
Data engineers and Big Data professionals are in high demand. NoSQL and Big Data technology skills, such as Apache Spark, are a must-have for modern-day data-driven decision-making. This three-course Professional Certificate from IBM opens the door to data engineering and Big Data careers.
The program starts with NoSQL Database Basics, which introduces you to NoSQL fundamentals, including the four key non-relational database categories. By the end of the course, you will have hands-on skills working with MongoDB, Cassandra, and IBM Cloudant NoSQL databases.
A crucial aspect of data engineering is acquiring and managing Big Data while ensuring the scalability and performance of Big Data analytics. When you enroll in Big Data, Hadoop, and Spark Basics, you'll discover the characteristics, features, benefits, limitations, and applications of some of the more popular Big Data processing tools. You'll explore the open-source ecosystem of Apache tools, including Apache Hadoop, Apache Hive, and Apache Spark, as well as Spark on Kubernetes. Discover how to leverage Spark to deliver reliable insights. You'll gain hands-on data analysis skills using PySpark and Spark SQL, create a streaming analytics application using Spark Streaming, and more.
Then enroll in Apache Spark for Data Engineering and Machine Learning to discover how data and machine learning engineers use Spark Structured Streaming, GraphFrames, regression, classification, and clustering. Learn how to apply the k-means clustering algorithm using Spark MLlib. Extract, transform, and load (ETL) is at the heart of data and machine learning engineering, and you'll gain skills using Spark to perform ETL tasks. This course culminates with a hands-on Spark project.
This Professional Certificate does not require any prior programming or data science skills; however, basic data literacy and SQL skills will prove valuable in completing this program.
A program subscription gives you full verified access to all courses and materials within the program you’ve enrolled in, for as long as your subscription is active. Monthly subscription pricing can help you manage your enrollment costs — instead of paying more up front, you pay a smaller amount per month for only as long as you need access. You can cancel your subscription at any time for no additional fee.
Courses in this program
IBM's NoSQL, Big Data and Spark Fundamentals Professional Certificate
NoSQL Database Basics
- 2–3 hours per week, for 5 weeks
This course introduces you to the fundamentals of NoSQL, including the four key non-relational database categories. By the end of the course, you will have hands-on skills for working with MongoDB, Cassandra, and IBM Cloudant NoSQL databases.
Big Data, Hadoop, and Spark Basics
- 2–3 hours per week, for 6 weeks
This course provides foundational big data practitioner knowledge and analytical skills using popular big data tools, including Hadoop and Spark. Learn and practice your big data skills hands-on.
Apache Spark for Data Engineering and Machine Learning
- 2–3 hours per week, for 3 weeks
This short course introduces you to the fundamentals of Data Engineering and Machine Learning with Apache Spark, including Spark Structured Streaming, ETL for Machine Learning (ML) Pipelines, and Spark ML. By the end of the course, you will have hands-on experience applying Spark skills to ETL and ML workflows.
- The Dice Tech Job Report lists Data Engineering as the fastest-growing tech occupation with year-over-year growth of 50%.
- Data engineering is listed among the top 10 jobs in Glassdoor's Best Jobs in America for 2020.
- Jefferson Parker ranks NoSQL second in its list of the top eight in-demand Big Data skills. Multiple sources report expected NoSQL growth of 30% through 2026, with salaries of more than 107K annually based on PayScale rankings.
- In a 2020 Towards Data Science analysis of job listings on major sites, Apache Spark appears in half of the job listings for data engineers. Spark programming is the third most requested Big Data technology skill among employers.
Meet your instructors from IBM
Experts from IBM who are committed to teaching online
Grow your career. Start your program subscription today.
- Immediate access to all 3 courses in this program
- Course videos, lectures, and readings
- Practice problems and assessments
- Graded assignments and exams
- edX learner support
- Shareable verified certificates after successfully completing a course or program