About this course
Designing a data lake is challenging because of the scale and growth of data. Developers need to understand best practices to avoid common mistakes that could be hard to rectify. In this course, we will cover the foundations of what a Data Lake is and how to ingest and organize data into it. We will then dive into the data processing you can apply to optimize performance and costs when consuming the data at scale. This course is for professionals (Architects, System Administrators, and DevOps) who need to design and build an architecture for secure and scalable Data Lake components. Students will learn about the use cases for a Data Lake and contrast them with a traditional infrastructure of servers and storage.
What you'll learn
- Where to start with a Data Lake?
- How to build a secure and scalable Data Lake?
- What are the common components of a Data Lake?
- Why do you need a Data Lake, and what is its value?
Week 1: Hello World, I mean, Hello Data Lakes!
- Video: Meet the Instructors
- Video: Introduction to Week 1
- Video: Why Data Lakes?
- Video: Characteristics of a Data Lake
- Video: Data Lake Components
- Reading: Data Lake Characteristics and Components
- Video: Comparison of a Data Lake to a Data Warehouse
- Reading: Data Lakes and Data Warehouses
- Video: Discussing sample Data Lake Architectures
- Quiz/Assessment: Week 1 quiz
Week 2: AWS data related services
- Video: Introduction to Week 2
- Video: AWS Data Lake related services
- Video: Amazon S3
- Video: AWS Glue Data Catalog
- Reading: S3 and Glue Data Catalog
- Video: AWS Services used for data movement
- Reading: Kinesis, API Gateway, etc.
- Video: AWS Services for Data processing
- Video: AWS Services for Analytics
- Video: AWS Services used for Predictive Analytics and Machine Learning
- Reading: EMR, Glue Jobs, Lambda, Kinesis Analytics, Redshift
- Video: Introduction to AWS Lake Formation
- Reading: Lake Formation
- Lab: Get familiar with AWS Services and create your first simple data lake
Week 3: Ingesting the rivers
- Video: Introduction to Week 3
- Video: Use the right tool for the job
- Video: Understanding Data Structure and when to process data
- Video: Data Streaming ingestion with Amazon Kinesis Services
- Video: Diving Deep on Amazon Kinesis
- Demo: Batch Data Ingestion with AWS Transfer Family
- Reading: Batch Data Ingestion with AWS Services
- Video: Data Cataloging
- Demo: Using Glue Crawlers
- Reading: The importance of data cataloging
- Video: Reviewing the ingestion part of some Data Lake architectures
- Lab: Ingesting Web Logs
Week 4: Processing and Analyzing data that sits in the Data Lake
- Video: Introduction to Week 4
- Video: Data prep and AWS Glue jobs
- Video: File optimizations
- Demo: Using S3, Glue and Athena to get insights about NYC Taxi data
- Reading: Glue Jobs, Data Prep, Athena, Columnar Data Formats and Amazon Athena Optimizations
- Video: Introduction to Data Lake security
- Reading: Security and compliance
- Video: The power of data visualization
- Video: Introduction to Amazon QuickSight
- Demo: Amazon QuickSight
- Reading: Data visualization, Amazon QuickSight
- Video: Registry of Open Data on AWS
- Lab: Create an end-to-end data lake with AWS Services
- Video: Course wrap-up!
Pursue a Verified Certificate to highlight the knowledge and skills you gain: $169 USD
- Official and Verified: Receive an instructor-signed certificate with the institution's logo to verify your achievement and increase your job prospects.
- Add the certificate to your CV or resume, or post it directly on LinkedIn.
- Give yourself an additional incentive to complete the course.
- Support our Mission: edX, a non-profit, relies on verified certificates to help fund free education for everyone globally.
Frequently asked questions
Q. Are there any costs associated with this course?
A. Learners can register for the course in either the Audit track or the Verified Certificate track. The Audit track is free but limits access to 6 weeks from registration. The Verified Certificate track costs $169 and provides full access to course content for the duration of the course. Please visit edx.org for more information.
In addition to course registration costs, this course provides optional hands-on exercises that may incur charges on your AWS account. Please familiarize yourself with the AWS Free Tier at aws.amazon.com/free/.
Please note that the AWS Free Tier also has a limit on the amount of resources that you can consume before you begin accruing charges. If you perform these hands-on exercises, there is a chance you may incur charges on your AWS account. Please visit the AWS Free Tier page for more information.
Q. Do I need a credit card to create an AWS Account?
A. Yes, you will need a credit card to activate your AWS account.
Q. How much time will this course require?
A. If following the weekly schedule, learners should plan to spend 2-4 hours per week on this course. However, learners may complete the course at their own pace.
Q. Will I receive a certificate for this course?
A. Learners enrolled in the Verified Certificate path will receive a certificate upon successful completion of the course.
Q. What is the grading policy for this course?
A. All learners may take weekly quizzes, which are not graded and allow unlimited retries.
Learners in the Verified Certificate track can take the final course assessment. Passing the final assessment is required to obtain the Verified Certificate.
Learners in the Audit track will not have access to the final assessment, and will not be able to earn a certificate.
Q. How are discussions used in this course?
A. This course has discussion groups aligned to each week of the course. We encourage learners to ask questions or offer suggestions and feedback. AWS instructors will monitor the discussion groups to answer questions specific to the exercises and topics covered in the course.
Q. Will this course help me prepare for an AWS Certification?
A. Earning an AWS Certification typically requires both knowledge and experience. While this course, if taken in isolation, will provide you with relevant information and skills, it likely will not equip you to earn an AWS Certification. For more information about AWS Certifications, including recommended training and experience requirements, visit aws.amazon.com/certification.
Who can take this course?
Unfortunately, learners from one or more of the following countries or regions will not be able to register for this course: Iran, Cuba and the Crimea region of Ukraine. While edX has sought licenses from the U.S. Office of Foreign Assets Control (OFAC) to offer our courses to learners in these countries and regions, the licenses we have received are not broad enough to allow us to offer this course in all locations. edX truly regrets that U.S. sanctions prevent us from offering all of our courses to everyone, no matter where they live.