IBM: SRE Fundamentals and Security

Learn the foundational principles and terminology needed to understand the new and growing discipline of Site Reliability Engineering. Explore operations strategies and best practices for monitoring and managing service health and security.

SRE Fundamentals and Security
5 weeks
2–3 hours per week
Self-paced
Progress at your own speed
Free
Optional upgrade available

There is one session available:

After a course session ends, it will be archived.
Starts Mar 28
Ends Dec 31

About this course

Site Reliability Engineers must have the right tools and strategies to perform in a technical, fast-paced environment. IBM Cloud SRE is guided by nine competency areas that lead to the successful practice of the discipline:

Applying Site Reliability Engineering principles

Operations

Monitoring and incident management

Security and compliance

Compute infrastructure

Networking

Storage and data management

Reliability and resiliency

Deployment automation

In this first course of the three-part Professional Certificate in Site Reliability Engineering (SRE), you will focus on the first four SRE competencies:

Applying Site Reliability Engineering principles

Operations

Monitoring and incident management

Security and compliance

NOTE: The remaining five SRE competencies are covered in Course 2: SRE Infrastructure, Resiliency and Deployment Automation.

This course covers approximately 50% of the content you need to prepare for the “IBM Certified Professional SRE - Cloud V2” certification exam.

If you are interested in pursuing the “IBM Certified Professional SRE - Cloud V2” certification, we recommend that you complete all three offerings of the Professional Certificate in Site Reliability Engineering (SRE) to ensure a successful certification exam experience.

At a glance

  • Institution: IBM
  • Subject: Computer Science
  • Level: Intermediate
  • Prerequisites:

    At least 1 year of experience in SRE or technology.

    Understanding of:

    DevOps practices

    Software engineering principles

    System administration

    Networking and the OSI model

    Incident management

    Root cause analysis

  • Language: English
  • Video Transcript: English
  • Associated skills: Site Reliability Engineering, Resilience Planning, IBM Cloud Computing

What you'll learn

Applying Site Reliability Engineering principles

Manage the trade-off between change, velocity, and reliability of services

Negotiate service level objectives, service level indicators, and error budgets

Design and deploy automation strategies

Leverage IBM Cloud tools and technology across the software development life cycle

Understand the roles and responsibilities for SRE effectiveness

Operations

Monitor resource utilization

Perform operational readiness reviews (ORR)

Employ cost-optimization strategies

Identify key metrics for service health

Monitoring and incident management

Create and maintain metrics, traces, and alerts

Collect, analyze, and manage logs on IBM Cloud

Manage incidents

Perform post-incident reviews

Recognize and differentiate performance and availability metrics

Perform statistical analysis and create actionable outcomes

Security and compliance

Monitor security threats

Implement and manage security policies

Implement encryption models

Manage role-based access control (RBAC) on IBM Cloud

Define the shared responsibility model

Module 1: Welcome and Introduction

You will cover the following topics:

An introduction to the IBM Professional SRE role

Module 2: SRE Fundamentals and Terminology

You will cover the following topics:

Deeper dive into the SRE role

SRE principles

Managing trade-offs between change, velocity, and reliability

Negotiating service level objectives, service level indicators, error budgets, and the user experience

IBM Cloud tools and technology across the software development life cycle

Applying software engineering principles to drive reliability

Module 3: Operations

You will cover the following topics:

Performing operational readiness reviews (ORR) on IBM Cloud

Creating an ORR checklist

Employing cost-optimization strategies

Managing backups and recoveries on IBM Cloud

Module 4: Monitoring

You will cover the following topics:

Monitoring overview

Creating and maintaining metrics, traces, and alerts on IBM Cloud

Collecting, analyzing, and managing logs on IBM Cloud

Identifying key metrics for service health on IBM Cloud

Using performance and availability metrics to measure the health of services on IBM Cloud

Module 5: Incident Management

You will cover the following topics:

Managing incidents on IBM Cloud

Developing a balanced action plan to mitigate future incidents

Performing the post-incident review

Module 6: Security and Compliance

You will cover the following topics:

Monitoring and managing security threats on IBM Cloud

Implementing and managing security policies on IBM Cloud

Implementing encryption models

Managing role-based access control on IBM Cloud

Who can take this course?

Unfortunately, learners residing in one or more of the following countries or regions will not be able to register for this course: Iran, Cuba and the Crimea region of Ukraine. While edX has sought licenses from the U.S. Office of Foreign Assets Control (OFAC) to offer our courses to learners in these countries and regions, the licenses we have received are not broad enough to allow us to offer this course in all locations. edX truly regrets that U.S. sanctions prevent us from offering all of our courses to everyone, no matter where they live.

This course is part of the Site Reliability Engineering (SRE) Professional Certificate Program

Expert instruction
3 skill-building courses
Self-paced
Progress at your own speed
4 months
2–3 hours per week
