Author

Nick Wright


Inside Atlassian: how our site reliability engineers do incident management

“Ohhhhh $#!τ. We broke Confluence.” In one of our first Confluence Cloud releases in 2016, we broke our users’ ability to edit pages. As the head of Atlassian’s site reliability engineering group, I find this kind of thing falls right into my wheelhouse. In this post, I’ll walk you through how we responded to get Confluence working again. I’ll give an insider’s view of our incident management process, as well as how we’ve configured Atlassian tools to support this work.