IBM: SRE Infrastructure, Resiliency and Deployment Automation

Discover the importance of reliability engineering and resiliency for services, and learn how the deployment pipeline can support automation. Explore various infrastructure types and troubleshoot common service issues, including those on Kubernetes and OpenShift clusters.

6 weeks
2–3 hours per week
Self-paced
Progress at your own speed
Free
Optional upgrade available

There is one session available:

After a course session ends, it will be archived.
Starts Mar 28
Ends Dec 31

About this course

Site Reliability Engineers must have the right tools and strategies to perform in a fast-paced technical environment. Nine competency areas guide the successful practice of IBM Cloud SREs.

Applying Site Reliability Engineering principles

Operations

Monitoring and incident management

Security and compliance

Compute infrastructure

Networking

Storage and data management

Reliability and resiliency

Deployment automation

In this second course of the three-part Professional Certificate in Site Reliability Engineering (SRE), you will focus on the following five SRE competencies:

Compute infrastructure

Networking

Storage and data management

Reliability and resiliency

Deployment automation

NOTE: The remaining four SRE competencies are covered in Course 1: SRE Fundamentals and Security.

This course covers approximately 50% of the required content to help you prepare for the “IBM Certified Professional SRE - Cloud V2” certification exam.

If you are interested in pursuing the “IBM Certified Professional SRE - Cloud V2” certification, we recommend that you complete all three offerings of the Professional Certificate in Site Reliability Engineering (SRE) to improve your chances of passing the exam.

At a glance

  • Institution: IBM
  • Subject: Computer Science
  • Level: Intermediate
  • Prerequisites:

    At least 1 year of experience in SRE or technology.

    Understanding of:

    DevOps practices

    Software engineering principles

    System administration

    Networking and the OSI model

    Incident management

    Root cause analysis

  • Language: English
  • Video Transcript: English
  • Associated skills: Resilience, Automation, Reliability Engineering, IBM Cloud Computing, Resilience Planning, OpenShift, Troubleshooting (Problem Solving), Kubernetes, Reliability, Site Reliability Engineering

What you'll learn

Compute infrastructure

Troubleshoot VMs, IBM Kubernetes Service (IKS), Red Hat OpenShift and serverless services on IBM Cloud

Configure for high availability and scalability

Explain the impact of compute on service performance

Networking

Troubleshoot external connections to IBM Cloud

Troubleshoot interservice connectivity on IBM Cloud

Explain the reliability ramifications of IBM Cloud networking features

Explain the impact of networking on service performance

Storage and data management

Manage storage and data attributes

Manage data replication and retention

Explain the impact of storage on service performance

Monitor data security and compliance

Identify storage data durability and capacity management

Reliability and resiliency

Design and improve reliability for the system/service

Design for failure and recover from failure

Deployment automation

Design non-disruptive deployment

Troubleshoot provisioning of IBM Cloud resources

Implement Infrastructure as Code

Explain the SRE's responsibilities for CI/CD pipelines

Troubleshoot CI/CD pipelines

Module 1: Compute Infrastructure

You will cover the following topics:

IBM Cloud service models: IaaS, PaaS, and FaaS

Troubleshooting VMs on IBM Cloud

Troubleshooting clusters on IBM Kubernetes Service

Troubleshooting clusters on Red Hat OpenShift on IBM Cloud

Troubleshooting serverless services

Module 2: Networking

You will cover the following topics:

Applying IBM Cloud networking features

Implementing and managing virtual networks on IBM Cloud

Configuring name resolution on IBM Cloud

Managing performance on IBM Cloud

Troubleshooting external connections on IBM Cloud

Troubleshooting interservice connectivity on IBM Cloud

Module 3: Storage and data management

You will cover the following topics:

Managing storage and data attributes

Managing storage accounts

Managing data on IBM Cloud

Managing data replication and retention

Module 4: Reliability and resiliency

You will cover the following topics:

Importance of reliability and resiliency for services

Designing and improving reliability for systems and services

Designing for failure and recovering from failure

Module 5: Deployment automation

You will cover the following topics:

Deployment automation

Implementing Infrastructure as Code

SRE responsibilities for the CI/CD pipeline

Who can take this course?

Unfortunately, learners residing in one or more of the following countries or regions will not be able to register for this course: Iran, Cuba and the Crimea region of Ukraine. While edX has sought licenses from the U.S. Office of Foreign Assets Control (OFAC) to offer our courses to learners in these countries and regions, the licenses we have received are not broad enough to allow us to offer this course in all locations. edX truly regrets that U.S. sanctions prevent us from offering all of our courses to everyone, no matter where they live.

This course is part of the Site Reliability Engineering (SRE) Professional Certificate program

Expert instruction
3 skill-building courses
Self-paced
Progress at your own speed
4 months
2–3 hours per week
