Learn how to use Spark

XSeries Program in Data Science and Engineering with Spark

What you'll learn

  • How to use Spark and its libraries to solve big data problems
  • How to approach large scale data science and engineering problems
  • Spark's APIs, architecture, and many internal details
  • The trade-offs between communication and computation in a distributed environment
  • Use cases for Spark

The Data Science and Engineering with Spark XSeries, created in partnership with Databricks, teaches students how to perform data science and data engineering at scale using Spark, a cluster computing system well suited to large-scale machine learning tasks. It also presents an integrated view of data processing by highlighting the components of a data analysis pipeline, including exploratory data analysis, feature extraction, supervised learning, and model evaluation. Students will gain hands-on experience building and debugging Spark applications. The series also covers Spark internals and distributed machine learning algorithms, giving students intuition for working with big data and writing code for a distributed environment.

This XSeries requires a programming background and experience with Python (or the ability to learn it quickly). All exercises will use PySpark (the Python API for Spark), but previous experience with Spark or distributed computing is NOT required. Familiarity with basic machine learning concepts and exposure to algorithms, probability, linear algebra and calculus are prerequisites for two of the courses in this series.

Expert instruction
3 high-quality courses
Instructor-led
Assignments and exams have specific due dates
3 months
5–10 hours per week
To complete the full program

Courses in this program

XSeries Program in Data Science and Engineering with Spark from BerkeleyX

  1. Started Aug 15, 2016
    5–10 hours per week for 4 weeks
    Learn how to apply data science techniques using parallel programming in Apache Spark to explore big data.
  2. Started Jul 11, 2016
    5–10 hours per week for 4 weeks
    Learn the underlying principles required to develop scalable machine learning pipelines and gain hands-on experience using Apache Spark.
  3. Started Jun 15, 2016
    5–10 hours per week for 2 weeks
    Learn the fundamentals and architecture of Apache Spark, the leading cluster-computing framework among professionals.

This series is ideally taken in sequence, but each course can be taken individually.

Meet your instructors

from the University of California, Berkeley (BerkeleyX)
Jon Bates
Spark Instructor
University of California, Berkeley
Ameet Talwalkar
Assistant Professor of Computer Science
University of California, Los Angeles
Anthony D. Joseph
Professor in Electrical Engineering and Computer Science
University of California, Berkeley

BerkeleyX experts committed to online learning

Advance your career with university-backed credit programs and verified certificates.

Study and demonstrate your knowledge at your own pace.

Try a course before you pay.

Study with university classmates and colleagues from around the world.