January 5, 2015

Post-mortem documentation, or how to build knowledge during failures

A few months ago I tried to get my head around post-mortem documentation. I found it particularly hard to bridge the gap between the publicly available material and my goal: company-internal documentation that teams could use to share knowledge and learn from past mistakes. During my research I came across lots of publicly available information that helped me dive into the topic. Unfortunately, that information was scattered widely, so I thought that sharing my link collection could shorten your way a bit.

Basics

Some good reads if you want to learn what Post-Mortems are:

Postmortem reviews: purpose and approaches in software engineering (Time investment 30 mins)
O’Reilly “Web Operations: Keeping the Data On Time”, Chapter 13 “How to Make Failure Beautiful: The Art and Science of Postmortems” (Time investment 20 mins)
The Project Post-Mortem: A Valuable Tool for Continuous Improvement (Time investment 5-15 mins)

Foundations

If you want to look further into the topic, you will have to deal with human error and failure. The following will give you some idea of how large this topic is:

How Complex Systems Fail - Richard Cook (Time investment 15 mins)
Velocity 2012 (Video): How Complex Systems Fail - Richard Cook (Time investment 30 mins)
Fallible Humans (Time investment 35 mins)
The Human Side of Postmortems (Time investment 45 mins)
Field Guide to Understanding Human Error - Sidney Dekker

Instructions

On top of that, there are lots of blog posts, interviews and descriptions of how post-mortems should be conducted:

Say Goodbye to Post-Mortems. Say Hello to Effective Problem Management (Time investment 30 mins)
(Video) John Allspaw (Etsy) Interview - Velocity Santa Clara 2014 (Time investment 30 mins)
(Slides) How to Run a Post-Mortem With Humans (Not Robots) (Time investment 10 mins)
Don’t Repeat your Mistakes: Conducting Post-mortems (Time investment 7 mins)
Extending Agile Methods: Postmortem Reviews as Extended Feedback (Time investment 20 mins)
(Slides) It’s not your fault (Time investment 5 mins)
(Video) How to write an Incident Report / Postmortem (Time investment 5 mins)
The Three Ingredients of a Great Postmortem (Time investment 5 mins)
Blameless PostMortems and a Just Culture (Time investment 10 mins)
Morgue: Helping Better Understand Events by Building a Post Mortem Tool - Bethany Macri (Time investment 33 mins)
(Slides) Human Factors and PostMortems (Time investment 5 mins)
Blameless Post-Mortems (Time investment 5 mins)
What Adopting Blameless Post-Mortems Has Taught Me About Culture - Mathias Meyer (Time investment 7 mins)
DevOps: To increase reliability you need to have more outages (Time investment 7 mins)
What blameless really means (Time investment 3 mins)
Postmortems, sans finger-pointing: The O’Reilly Radar Podcast (Time investment 30 mins)

Tools

Once you have worked through all of that and want to apply it in your team, there are even some tools available:

Etsy Morgue (GitHub)
Post Mortem Documents (incl. Excel template)
Post Mortem Template

With all of these resources you gain real insight into the kind of culture you should establish in your team. Together they form a solid basis for internal documentation and provide good input for public statements, which more or less filled the gap for me.