Speed, Agility, Resilience
Trusted Experts in Microservices, Cloud Native & Chaos Engineering

Want to build systems that evolve fast and that your users can rely on?

The "Building Reliable Systems Masterclass" course covers:

SYNOPSIS:
Users want reliability. Your business wants speed and agility. You need to invest in resilience, and this is the best course to get you rolling.

Teaching patterns, practices and hard-won lessons from the trenches, this course takes you through effective Site Reliability Engineering, Reliable Delivery Strategies, Designing for Failure, Observability, Engineering Resilience and Chaos Engineering.

It's the course for the modern software owner: it gives you the patterns, practices and tools to build your own organisation's Resilience Engineering capability, helping you build systems that are reliable and evolve at speed.

This course is for you if you are:
  • A software developer with a traditional background and you need to start taking responsibility for your code in production.
  • A site reliability engineer (SRE) with a little experience of managing production and you need to be proactive about finding system weaknesses before your customers do.
  • A system administrator who is responsible for the availability of production and you need a proactive technique for surfacing system weaknesses before your customers experience them.
  • A product owner who is responsible for delivering a business-critical product or service and you need to know how to gain trust and confidence in your system’s reliability.

TOPICS COVERED:
  • How to successfully apply the best parts of Site Reliability Engineering to your organisation.
  • How to build a reliable delivery pipeline without slowing down the pace of change.
  • How to Design for Failure and incorporate Observability into your systems.
  • How to Engineer for Resilience through enabling Learning Loops, Blameless Post-mortems and Chaos Engineering.

PREREQUISITES:
No prerequisites are required to get full value out of this course. The samples and practical examples use the Chaos Toolkit and run against a system with Kubernetes as the platform and Spring Boot/Cloud/Java as the sample implementation, but no prior knowledge of these technologies is expected.

FORMAT OF THE COURSE:
The course runs for either 2 or 3 days. It is 50% theory and 50% labs that explore the concepts being discussed. Please bring your laptop.

COURSE OVERVIEW:

  • Introduction
    • The Need for Reliability
    • Reliability and Safety Explained
    • The science of speed and reliability
    • Learning, Resilience and Reliability
    • Ethics and Conditions of Learning
    • Capabilities vs. Maturities
    • Defining Reliability, Resilience and Chaos
    • Cynefin and understanding the pressures on your systems
    • Introducing the Socio-technical system
    • Thinking in Systems and Mental Models
    • Capturing the mental models of your system
    • The Law of Requisite Variety and You
  • Resilience Engineering Explained
    • Being Poised to Adapt
    • Building and Sustaining the Ability to Continuously Adapt
    • Adaptive Capacities
    • Continuous Adaptability, and Being Prepared to Adapt
  • Site Reliability Engineering Distilled
    • DevOps in a Nutshell
    • SRE in a Nutshell
    • Designing for Operability and Reliability
    • The metrics that matter
    • Getting to grips with Capacity Planning
    • Change Management and Automation
    • Readying the Stage for Emergency Response
    • Identifying, Prioritising and Reducing Toil
  • Building Reliable Delivery Pipelines
    • Software Defined Delivery Goals
    • One thing at once
    • Deliver with Confidence with Green/Blue and friends
    • Having a “Big Red Button”
  • Observability & Monitoring for Reliability
    • Defining Observability
    • Why a system needs Observability
    • Who needs Observability?
    • The Three Pillars of Observability
    • The Four Golden Signals of Monitoring
    • How Monitoring and Observability Work Together
    • The Extents of Observability
    • Common Observability Measures and Practices
    • How to Successfully Apply Coding for Failure to a Production System
    • Applying Observability to Your Own System
    • The Dangers of Complacency
  • Engineering Resilience
    • Being Poised to Adapt
    • Building and Sustaining the Ability to Continuously Adapt
    • Adaptive Capacities
    • Continuous Adaptability, and Being Prepared to Adapt
    • Effective Post-Mortem Learning: Being Blameless and Embracing the Human Condition
    • Effective Pre-Mortem Learning: Chaos Engineering


Quotes

  • Adrian J, Nov 27
  • Mark Meehan @MarkMhn
Very useful and got out of it good engineering concepts
Not one to miss - great insights and opportunity to put them in to practice - no prescription of Java :)

© COPYRIGHT 2018. ALL RIGHTS RESERVED.


This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.