AWS: Introduction to Designing Data Lakes on AWS

In this class, we will help you understand how to create and operate a data lake in a secure and scalable way, without previous knowledge of data science!

5 weeks
1–4 hours per week
Self-paced
Progress at your own speed
Free
Optional upgrade available

There is one session available:

After a course session ends, it will be archived.
Starts Mar 28

About this course

Designing a data lake is challenging because of the scale and growth of data. Developers need to understand best practices to avoid common mistakes that can be hard to rectify. In this course we will cover the foundations of what a Data Lake is and how to ingest and organize data into it, then dive into the data processing that can be done to optimize performance and costs when consuming the data at scale. This course is for professionals (Architects, System Administrators and DevOps) who need to design and build an architecture for secure and scalable Data Lake components. Students will learn about the use cases for a Data Lake and contrast them with a traditional infrastructure of servers and storage.

At a glance

  • Institution: AWS
  • Subject: Data Analysis & Statistics
  • Level: Intermediate
  • Prerequisites:

    1-3 years of software development experience

    *You will need a credit card to create an AWS account.

  • Language: English
  • Video Transcript: English
  • Associated skills: Infrastructure, Data Lakes, Scalability, Data Processing, Amazon Web Services, DevOps, Data Science

What you'll learn

  • Where to start with a Data Lake?
  • How to build a secure and scalable Data Lake?
  • What are the common components of a Data Lake?
  • Why do you need a Data Lake and what is its value?

Week 1: Hello World, I mean, Hello Data Lakes!

  • Video: Meet the Instructors
  • Video: Introduction to Week 1
  • Video: Why Data Lakes?
  • Video: Characteristics of a Data Lake
  • Video: Data Lake Components
  • Reading: Data Lake Characteristics and Components
  • Video: Comparison of a Data Lake to a Data Warehouse
  • Reading: Data Lakes and Data Warehouses
  • Video: Discussing sample Data Lake Architectures
  • Quiz/Assessment: Week 1 quiz

Week 2: AWS data related services

  • Video: Introduction to Week 2
  • Video: AWS Data Lake related services
  • Video: Amazon S3
  • Video: AWS Glue Data Catalog
  • Reading: S3 and Glue Data Catalog
  • Video: AWS Services used for data movement
  • Reading: Kinesis, API Gateway, etc
  • Video: AWS Services for Data processing
  • Video: AWS Services for Analytics
  • Video: AWS Services used for Predictive Analytics and Machine Learning
  • Reading: EMR, Glue Jobs, Lambda, Kinesis Analytics, Redshift
  • Video: Introduction to AWS Lake Formation
  • Reading: Lake Formation
  • Lab: Get familiar with AWS Services and create your first simple data lake

Week 3: Ingesting the rivers

  • Video: Introduction to Week 3
  • Video: Use the right tool for the job
  • Video: Understanding Data Structure and when to process data
  • Video: Data Streaming ingestion with Amazon Kinesis Services
  • Video: Diving Deep on Amazon Kinesis
  • Demo: Batch Data Ingestion with AWS Transfer Family
  • Reading: Batch Data Ingestion with AWS Services
  • Video: Data Cataloging
  • Demo: Using Glue Crawlers
  • Reading: The importance of data cataloging
  • Video: Reviewing the ingestion part of some Data Lake architectures
  • Lab: Ingesting Web Logs

Week 4: Processing and Analyzing data that sits in the Data Lake

  • Video: Introduction to Week 4
  • Video: Data prep and AWS Glue jobs
  • Video: File optimizations
  • Demo: Using S3, Glue and Athena to get insights about NYC Taxi data
  • Reading: Glue Jobs, Data Prep, Athena, Columnar Data Formats and Amazon Athena Optimizations
  • Video: Introduction to Data Lake security
  • Reading: Security and compliance
  • Video: The power of data visualization
  • Video: Introduction to Amazon QuickSight
  • Demo: Amazon QuickSight
  • Reading: Data visualization, Amazon QuickSight
  • Video: Registry of Open Data on AWS
  • Lab: Create an end-to-end Data Lake with AWS Services
  • Video: Course wrap-up!

Learner testimonials

'This is an excellent introduction to Data Lakes that let me understand the flexibility and power of building a data lake using a serverless approach to achieve two goals: pay for value and seamlessly scale' - Course Alumnus

Frequently Asked Questions

Q. Are there any costs associated with this course?
A. Learners can register for the course in an Audit track or Verified Certificate track. The Audit track is free, but limits the duration of access to 6 weeks from registration. The Verified Certificate track costs $169 and provides full access to course content for the duration. Please visit edx.org for more information.

In addition to course registration costs, this course provides optional hands-on exercises which may have an associated charge in your AWS account. Please familiarize yourself with the AWS Free Tier at aws.amazon.com/free/.

Please note that the AWS Free Tier also has a limit on the amount of resources that you can consume before you begin accruing charges. If you perform these hands-on exercises, there is a chance you may incur charges on your AWS account. Please visit the AWS Free Tier page for more information.

Q. Do I need a credit card to create an AWS Account?
A. Yes, you will need a credit card to activate your AWS account.

Q. How much time will this course require?
A. If following the weekly schedule, learners should plan to spend 2-4 hours per week on this course. However, learners may complete the course at their own pace.

Q. Will I receive a certificate for this course?
A. Learners enrolled in the Verified Certificate path will receive a certificate upon successful completion of the course.

Q. What is the grading policy for this course?
A. All learners may take weekly quizzes, which are not graded and allow unlimited retries.

Learners in the Verified Certificate track can take the final course assessment. Passing the final assessment is required to obtain the Verified Certificate.

Learners in the Audit track will not have access to the final assessment, and will not be able to earn a certificate.

Q. How are discussions used in this course?
A. This course has discussion groups aligned to each week of the course. We encourage learners to ask questions or offer suggestions and feedback. AWS Instructors will monitor the discussion groups to answer questions specific to the exercises and topics covered in the course.

Q. Will this course help me prepare for an AWS Certification?
A. Earning an AWS Certification typically requires both knowledge and experience. While this course, if taken in isolation, will provide you with relevant information and skills, it likely will not equip you to earn an AWS Certification. For more information about AWS Certifications, including recommended training and experience requirements, visit aws.amazon.com/certification.

Who can take this course?

Unfortunately, learners residing in one or more of the following countries or regions will not be able to register for this course: Iran, Cuba and the Crimea region of Ukraine. While edX has sought licenses from the U.S. Office of Foreign Assets Control (OFAC) to offer our courses to learners in these countries and regions, the licenses we have received are not broad enough to allow us to offer this course in all locations. edX truly regrets that U.S. sanctions prevent us from offering all of our courses to everyone, no matter where they live.

This course is part of Cloud Solutions Architecture Professional Certificate Program

Expert instruction
4 skill-building courses
Self-paced
Progress at your own speed
4 months
2 - 4 hours per week

Interested in this course for your business or team?

Train your employees in the most in-demand topics, with edX For Business.