Understanding the World Through Data

Learn how to leverage data and basic machine learning algorithms to understand the world.

There is one session available:

82 already enrolled!
After a course session ends, it will be archived.
Starts Oct 18
Ends Dec 20

9 weeks
3–6 hours per week
Instructor-paced
Instructor-led on a course schedule
Free
Optional upgrade available

About this course

Whether you're a high school student or someone switching careers, all you need to get started in this course is curiosity about machine learning and a willingness to tinker with your computer. The course is organized into modules. Within each module, you'll have access to videos, short exercises, and a final capstone project.

In Module 1, you'll begin by looking at different kinds of data. To help you explore the data, you'll dive right into programming with Python. You don't need any programming background; we will guide you through using Python to explore and visualize data.

One kind of data you'll work with relates one variable to another. Coming up with a relationship between two variables, where one depends on the other, is at the center of Module 2. In that module, you'll build up some core concepts before seeing your first machine learning algorithm. The goal is to use programming to create models that describe the mathematical relationship between variables in the data. You'll be able to measure how good a model is and use it to make predictions about new data.

In Module 3, you'll see a discussion about where imperfections in collected data might come from. You rarely have perfectly “clean” data sets, so it's important to understand how imperfections impact the model that an algorithm might come up with. To this end, we will introduce the notion of data distributions and build up to the concepts of biased and unbiased noise.

Another kind of data you'll work with is data that belongs to different groups (or classes). Creating a model that predicts which group a data point belongs to is at the center of Module 4. You'll work through different ways of thinking about this problem and see three different approaches to such grouping (classification).

At a glance

  • Institution: MITx
  • Subject: Computer Science
  • Level: Introductory
  • Prerequisites:

    High school (grade 8) math

    • equations of lines and polynomial curves
    • finding average and standard deviation
  • Language: English
  • Video Transcript: English

What you'll learn

Module 1:

  • The Python programming language and the Colab notebook programming environment
  • Loading datafiles in Colab as dataframes and performing simple operations (selecting rows or columns, filtering data by specific conditions, grouping data, applying functions on the resulting groups)
  • Finding the correlation between columns of the dataframe
  • Visualizing the data using line plots, scatter plots, histograms, and correlation matrices
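
As a rough illustration of the Module 1 dataframe operations listed above, here is a minimal sketch using the pandas library. The source only says "dataframes" in Colab, so pandas is an assumption, and the small dataset and its column names (`city`, `temperature_c`, `ice_cream_sales`) are hypothetical, not taken from the course materials.

```python
import pandas as pd

# Hypothetical toy dataset (not from the course) built directly in code.
df = pd.DataFrame({
    "city": ["A", "A", "B", "B", "C", "C"],
    "temperature_c": [10.0, 12.0, 20.0, 22.0, 30.0, 32.0],
    "ice_cream_sales": [100, 120, 200, 230, 310, 330],
})

# Select one column, and select the first two rows.
temps = df["temperature_c"]
first_two = df.iloc[:2]

# Filter rows by a condition: keep only the warmer measurements.
warm = df[df["temperature_c"] > 15]

# Group rows by city and apply a function (the mean) to each group.
mean_sales_by_city = df.groupby("city")["ice_cream_sales"].mean()

# Correlation between two columns, and the full correlation matrix.
corr = df["temperature_c"].corr(df["ice_cream_sales"])
corr_matrix = df[["temperature_c", "ice_cream_sales"]].corr()

print(warm)
print(mean_sales_by_city)
print(corr_matrix)
```

A correlation close to 1 here only reflects how the toy numbers were chosen; real data files will usually be messier.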

Module 2:

  • Dependent and independent variables and how they correspond to real life scenarios
  • Creating linear and polynomial regression models
  • Perform linear regression on data using Python libraries
  • Compare the quality of different models using mean squared error (MSE) and R^2 values
  • Learn about the concept of overfitting
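
The Module 2 ideas above can be sketched in a few lines of Python. This is a hedged example, not the course's own code: it assumes NumPy as the library, uses made-up data generated from a hypothetical line `y = 2x + 1` plus noise, and defines its own helper functions `mse` and `r_squared`.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable

# Hypothetical data: x is the independent variable, y depends on x.
x = np.linspace(0, 10, 20)
y = 2 * x + 1 + rng.normal(0, 1, size=x.size)

def mse(y_true, y_pred):
    """Mean squared error: the average of the squared residuals."""
    return float(np.mean((y_true - y_pred) ** 2))

def r_squared(y_true, y_pred):
    """R^2: the fraction of the variance in y explained by the model."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return float(1 - ss_res / ss_tot)

# Fit a linear model (degree 1) and a polynomial model (degree 6).
results = {}
for degree in (1, 6):
    coeffs = np.polyfit(x, y, degree)   # least-squares fit
    y_pred = np.polyval(coeffs, x)      # model predictions at the same x
    results[degree] = (mse(y, y_pred), r_squared(y, y_pred))
    print(degree, results[degree])
```

The higher-degree polynomial fits the points it was trained on at least as closely as the line does, even though the data came from a line. A lower error on the training points does not mean better predictions on new data, which is the intuition behind overfitting.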

Module 3:

  • How to recognize a uniform distribution
  • How to describe a Gaussian distribution
  • Calculate the distribution mean and standard deviation
  • Observe noise in distributions and tell the difference between biased and unbiased noise
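
To make the Module 3 terms above concrete, here is a small sketch that draws samples with NumPy. The parameter values are arbitrary, and the working definitions are assumptions (the source does not spell them out): unbiased noise is taken to average out to zero, while biased noise has a nonzero average that shifts every measurement in the same direction.

```python
import numpy as np

rng = np.random.default_rng(42)  # fixed seed so the run is repeatable
n = 10_000

# Uniform distribution: every value between 0 and 1 is equally likely.
uniform = rng.uniform(0.0, 1.0, size=n)

# Gaussian distribution: a bell curve described by its mean and standard deviation.
gaussian = rng.normal(5.0, 2.0, size=n)

print("uniform  mean/std:", uniform.mean(), uniform.std())
print("gaussian mean/std:", gaussian.mean(), gaussian.std())

# Hypothetical measurements of a quantity whose true value is 100.
true_value = 100.0
unbiased = true_value + rng.normal(0.0, 1.0, size=n)  # noise centred on 0
biased = true_value + rng.normal(0.5, 1.0, size=n)    # noise centred on 0.5

# Averaging many readings cancels unbiased noise but not a systematic offset.
print("unbiased average:", unbiased.mean())
print("biased average:  ", biased.mean())
```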

Module 4:

  • A high-level overview of what it means to create categories given some data
  • Use linear regression to classify a new datapoint as above or below the best fit line
  • Learn how to perform classification using support vector machines
  • Learn how to perform classification using logistic regression
  • Dividing data into training and test sets
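
One way to picture two of the Module 4 items above, classifying a point as above or below a best fit line and dividing data into training and test sets, is the sketch below. It is an illustration under assumptions: the data is generated from a hypothetical line `y = 3x + 2` plus noise, the 80/20 split ratio is arbitrary, and `classify` is a helper defined here, not a library function.

```python
import numpy as np

rng = np.random.default_rng(1)  # fixed seed so the run is repeatable

# Hypothetical data scattered around the line y = 3x + 2.
x = rng.uniform(0, 10, size=50)
y = 3 * x + 2 + rng.normal(0, 2, size=50)

# Divide the data: shuffle the indices, then use 80% to train and 20% to test.
indices = rng.permutation(x.size)
split = int(0.8 * x.size)
train_idx, test_idx = indices[:split], indices[split:]

# Fit the best fit line using the training points only.
slope, intercept = np.polyfit(x[train_idx], y[train_idx], 1)

def classify(x_new, y_new):
    """Label a point by which side of the best fit line it falls on."""
    return "above" if y_new > slope * x_new + intercept else "below"

# The held-out test points show how well the line describes unseen data.
test_pred = slope * x[test_idx] + intercept
test_mse = float(np.mean((y[test_idx] - test_pred) ** 2))

print(classify(5.0, 30.0))
print(classify(5.0, 5.0))
print(test_mse)
```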

Syllabus

Module 1:

  • Examples of numerical data
  • The Python programming language and the Colab notebook programming environment
  • Loading datafiles in Colab as dataframes and performing simple operations (selecting rows or columns, filtering data by specific conditions, grouping data, applying functions on the resulting groups)
  • Finding the correlation between columns of the dataframe
  • Visualizing the data using line plots, scatter plots, histograms, and correlation matrices

Module 2:

  • Dependent and independent variables and how they correspond to real life scenarios
  • Intuition for what a linear model is
  • Intuition for what a polynomial model is
  • Python libraries that can perform linear regression on data
  • Comparing the quality of different models using mean squared error (MSE) and R^2 values
  • Fitting higher order polynomials
  • Overfitting

Module 3:

  • Uniform distributions
  • Gaussian distributions
  • Distribution mean and standard deviation
  • Noise in distributions (biased and unbiased noise)

Module 4:

  • Categorizing data based on particular conditions being met
  • Using linear regression to classify a new datapoint as above or below the best fit line
  • Using a support vector classifier to separate two groups of data and classifying a new datapoint into a group
  • Using logistic regression to classify data into two groups and finding the probabilities of a new datapoint falling into each group
  • Understanding how to divide data into training and test sets
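
The support vector classifier and logistic regression items above could look roughly like this in code. The course description does not name a library, so scikit-learn is an assumption here, and the two groups of points are randomly generated stand-ins for real data.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable

# Hypothetical two-group data: 50 points near (0, 0) and 50 points near (6, 6).
group_0 = rng.normal(loc=0.0, scale=1.0, size=(50, 2))
group_1 = rng.normal(loc=6.0, scale=1.0, size=(50, 2))
X = np.vstack([group_0, group_1])
y = np.array([0] * 50 + [1] * 50)

# Hold out 25% of the points as a test set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Train both classifiers on the training set only.
svc = SVC(kernel="linear").fit(X_train, y_train)
logreg = LogisticRegression().fit(X_train, y_train)

# Accuracy on data the models have not seen.
svc_accuracy = svc.score(X_test, y_test)
logreg_accuracy = logreg.score(X_test, y_test)

# Classify a new datapoint and get the probability of each group.
new_point = np.array([[5.0, 5.5]])
svc_label = int(svc.predict(new_point)[0])
probabilities = logreg.predict_proba(new_point)[0]

print(svc_accuracy, logreg_accuracy)
print(svc_label, probabilities)
```

The support vector classifier returns only a group label, while logistic regression also reports a probability for each group, which matches the distinction drawn in the list above.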

Frequently Asked Questions

Do I need to know any programming to take this course?
No

Is there a textbook for this course?
No

Who can take this course?

Unfortunately, learners residing in one or more of the following countries or regions will not be able to register for this course: Iran, Cuba and the Crimea region of Ukraine. While edX has sought licenses from the U.S. Office of Foreign Assets Control (OFAC) to offer our courses to learners in these countries and regions, the licenses we have received are not broad enough to allow us to offer this course in all locations. edX truly regrets that U.S. sanctions prevent us from offering all of our courses to everyone, no matter where they live.
