Skip to main content

String Processing and Pattern Matching Algorithms

Provided by The University of California, San Diego (UCSanDiegoX)
Intermediate
See prerequisites
8–10 hours
per week, for 4 weeks
Free

$150 USD for graded exams and assignments, plus a certificate

Learn about pattern matching and string processing algorithms and how they apply to interesting applications.

Before you start

Basic knowledge of at least one programming language. Successful completion of the Algorithmic Designs and Techniques and Data Structures Fundamentals courses.

Course opens: Sep 1, 2018
Course ends: Sep 1, 2020

What you will learn

  • Key ideas for pattern matching and suffix trees
  • Suffix arrays
  • Burrows-Wheeler Transform for compression
  • Applications of string algorithms in bioinformatics

Weeks 1 and 2: Suffix Trees
How would you search for a longest repeat in a string in LINEAR time? In 1973, Peter Weiner came up with a surprising solution that was based on suffix trees, the key data structure in pattern matching. Computer scientists were so impressed with his algorithm that they called it the Algorithm of the Year. In this lesson, we will explore some key ideas for pattern matching that will - through a series of trials and errors - bring us to suffix trees.

Week 3 and 4: Burrows-Wheeler Transform and Suffix Arrays
Although EXACT pattern matching with suffix trees is fast, it is not clear how to use suffix trees for APPROXIMATE pattern matching. In 1994, Michael Burrows and David Wheeler invented an ingenious algorithm for text compression that is now known as Burrows-Wheeler Transform. They knew nothing about genomics, and they could not have imagined that 15 years later their algorithm will become the workhorse of biologists searching for genomic mutations. But what text compression has to do with pattern matching??? In this lesson you will learn that the fate of an algorithm is often hard to predict – its applications may appear in a field that has nothing to do with the original plan of its inventors.

Overview

The world and internet are full of textual information. We search for information using textual queries and read websites, books and e-mails.

These are all strings from a computer science point of view. To make sense of all this information and make search efficient, search engines use many string algorithms. Moreover, the emerging field of personalized medicine uses many search algorithms to find disease-causing mutations in the human genome.

In this course, part of the Algorithms and Data Structures MicroMasters program, you will learn about:

  • suffix trees;
  • suffix arrays;
  • how other brilliant algorithmic ideas help doctors to find differences between genomes;
  • power lightning-fast Internet searches.

Meet your instructors

Pavel Pevzner
Ronald R. Taylor Professor of Computer Science
The University of California, San Diego
Michael Levin
Chief Data Scientist
Yandex.Market

Learner testimonials

“This class makes string algorithms a very interesting and enjoyable topic. I would highly suggest taking this course. All the material is very well explained so you can really understand why it works and not only how! The programming assignments give you a hands-on experience implementing the different algorithm.”
-- Previous Student

Who can take this course?

Unfortunately, learners from one or more of the following countries or regions will not be able to register for this course: Iran, Cuba and the Crimea region of Ukraine. While edX has sought licenses from the U.S. Office of Foreign Assets Control (OFAC) to offer our courses to learners in these countries and regions, the licenses we have received are not broad enough to allow us to offer this course in all locations. EdX truly regrets that U.S. sanctions prevent us from offering all of our courses to everyone, no matter where they live.

View Courses
This course is part of:

Earn a MicroMasters® Program Certificate in 1 year if courses are taken one at a time.

View the program
  1. 48–60 hours of effort

    Learn how to design algorithms, solve computational problems and implement solutions efficiently.

  2. 48–60 hours of effort

    Learn about data structures that are used in computational thinking – both basic and advanced.

  3. 48–60 hours of effort

    Learn how to use algorithms to explore graphs, compute shortest distance, min spanning tree, and connected components.

  4. 24–30 hours of effort

    Learn about NP-complete problems, known as hard problems that can’t be solved efficiently, and practice solving them using algorithmic techniques.

  5. String Processing and Pattern Matching Algorithms
  6. 32–40 hours of effort

    Learn how dynamic programming and Hidden Markov Models can be used to compare genetic strings and uncover evolution.

  7. 24–30 hours of effort

    Learn how graphs are used to assemble millions of pieces of DNA into a contiguous genome and use these genomes to construct a Tree of Life.

  8. 32–40 hours of effort

    Synthesize your knowledge of algorithms and biology to build your own software for solving a biological challenge.

Get started in computer science

Browse over 600 computer science courses
Of all edX learners:
73% are employed
Of all edX learners:
45% have children
Based on internal survey results
394,009 people are learning on edX today