TsinghuaX: Advanced Big Data Systems | 高级大数据系统

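The implementation, optimization, and application of advanced big data systems, including the principles, implementation, and strategy optimization of distributed file systems, MapReduce/Spark, Storm/Spark Streaming, Mahout, and similar systems.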

Advanced Big Data Systems | 高级大数据系统
16 weeks
3–5 hours per week
Self-paced
Progress at your own speed
Free
Optional upgrade available

There is one session available:

After a course session ends, it will be archived.
Starts Mar 28
Ends Sep 1

About this course

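This course focuses on the implementation, optimization, and application of advanced big data systems, covering the principles, implementation, and strategy optimization of distributed file systems, MapReduce/Spark, Storm/Spark Streaming, Mahout, and related systems.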

In recent years, AI technology has spread rapidly into many areas of industry. Big data systems, the foundation of today's data-driven AI, have therefore become critically important. This course introduces students to the basic concepts of big data systems, covering how data is stored, processed, and analyzed effectively. We start from the general principles of distributed system design, then present frameworks for scaling storage, computation, and network capabilities in big data systems. Finally, to make these design principles easier to follow, our case studies use real industrial systems to show how the principles are applied in practice and how the performance and limitations of those systems are analyzed.

At a glance

  • Language: Chinese
  • Video Transcript: Chinese
  • Associated programs:
  • Associated skills: Design Elements And Principles, Big Data, Artificial Intelligence, Apache Spark

What you'll learn

  • Basic concepts of big data systems
  • Principles of designing distributed systems
  • Frameworks for scaling storage, computation, and network capabilities
  • Case studies of recent industrial big data systems, including GFS, MapReduce, and Spark
  • Big data processing pipelines such as NoSQL, streaming, and graph data processing

Who can take this course?

Unfortunately, learners residing in one or more of the following countries or regions will not be able to register for this course: Iran, Cuba and the Crimea region of Ukraine. While edX has sought licenses from the U.S. Office of Foreign Assets Control (OFAC) to offer our courses to learners in these countries and regions, the licenses we have received are not broad enough to allow us to offer this course in all locations. edX truly regrets that U.S. sanctions prevent us from offering all of our courses to everyone, no matter where they live.

This course is part of the Data Science (数据科学) Professional Certificate Program

Expert instruction
6 skill-building courses
Self-paced
Progress at your own speed
2 years
3–5 hours per week
