Łukasz Filut - Large loves simplicity: how to build metrics from a few example incidents (45 minutes)

The talk covers four availability and performance incidents. The problems were detected and resolved in large-scale systems: two at a large payment provider, one at a large bank and one on a large high school portal. Each case is an occasion to introduce the GQM (Goal, Question, Metric) methodology for defining proper, understandable metrics. During the talk I would like to convince the audience, especially developers and DevOps engineers, to take responsibility for metrics, a crucial element of any monitoring ecosystem. Each case will be described, followed by the methods that helped me during the incident. To close, I will present two things: GQM as a method in itself, and how it can help build better, more readable and more reliable monitoring dashboards.


