What you will learn
- The core principles of Site Reliability Engineering.
- How to design automation strategies; perform operational readiness reviews; employ cost-optimization strategies; and manage backups and recoveries.
- Approaches to cloud monitoring, identifying key metrics, and measuring service health.
- How to identify and manage incidents; develop action plans to mitigate future risk; and perform post-incident reviews.
- The key concepts to monitor and manage security threats.
- How to troubleshoot common IBM Cloud issues.
- How to design and improve reliability for systems and cloud services and employ best practices to automate deployments.
This Professional Certificate program in Site Reliability Engineering (SRE) helps you build the skills and knowledge required to work independently as a Site Reliability Engineer. Offered by the IBM Center for Cloud Training, the program covers operations, monitoring, troubleshooting, incident management, security, and deployments on IBM Cloud, along with key skills for any SRE professional. You will be trained on SRE principles and the tools that help organizations gain greater resiliency, availability, and reliability for their cloud-based workloads. This program can help you earn the IBM Cloud Professional Site Reliability Engineer certification, which validates your skills and can expand your career opportunities.
The first two courses provide interactive and applied learning on SRE principles; operational readiness; service health monitoring; root cause analysis; implementation and management of compute, networking, and storage options; reliability and resiliency services; and deployment automation.
The third course, the Capstone, includes certification exam preparation study materials and practice exercises in a virtual lab environment.
Upon completion of all three courses, you should have acquired the skills to operate services that sustain service level objectives and engineer scalable, secure, and highly reliable and resilient services in the IBM Cloud. The program will help you prepare for the IBM Cloud Professional Site Reliability Engineer v2 certification exam.
Courses in this program
IBM's Site Reliability Engineering (SRE) Professional Certificate
- 2–3 hours per week, for 5 weeks
Learn the foundational principles and terminology needed to understand the new and growing discipline of Site Reliability Engineering. Explore operational strategies and best practices for monitoring and managing service health and security.
- 2–3 hours per week, for 6 weeks
Discover the importance of reliability engineering and resiliency for services and how the deployment pipeline can be used to help with automation. Explore various infrastructure types and troubleshoot common service issues, including those affecting Kubernetes and OpenShift clusters.
- 2–3 hours per week, for 4 weeks
The SRE Capstone offers interactive study guides and flash cards that will help you prepare for the Professional SRE - Cloud V2 certification exam. All enrolled learners will receive a discount code for 50% off the certification exam cost.
Also included are hands-on lab exercises that allow you to put into action the knowledge you gained from the "SRE Fundamentals and Security" and "SRE Infrastructure, Resiliency and Deployment Automation" courses.
- The Site Reliability Engineer plays a vital role in leading and driving changes to team processes and culture.
- Site Reliability Engineer ranked #5 in LinkedIn’s 2020 Emerging Jobs Report, with 34% annual job growth.
- According to Glassdoor, the average salary for a Site Reliability Engineer in the United States is $127,718 per year.