Microsoft Ignite - The Tour - Amsterdam 2019 - Day 2 - Part 2

Part 2 of my second day at the Microsoft Ignite - The Tour conference in Amsterdam, on Thursday, March 21, at RAI Amsterdam. See my post on the first day here and part one of the second day here.

Diagnosing failure in the cloud

12:10 - 13:10 | SRE30 | Elicium 2 | Jason Hand

“Tailwind Trader’s modern monitoring and alerting processes are working great. So great they have detected some issues with our application and how it is behaving in the cloud. It’s time to make sense of what’s going on and how to resolve trouble. In this module, we will explore the processes and tools available to us to troubleshoot issues as they come up with running applications and infrastructure on Azure. Participants will gain exposure to querying log data found in Log Analytics as well as what to look for within Application Insights and Network Monitor to lead engineers to understanding and solving problems quickly.”

Jason did not forget to tweet a selfie with the audience for this session:

https://twitter.com/jasonhand/status/1108712279623852034

Besides stating the obvious (first check whether it’s only you via https://azure.microsoft.com/en-us/status/), his demo of creating alerts via Service Health was really interesting!

The “Service Health” service (“Personalized alerts and guidance when Azure service issues affect you.”): I had never been there before!

His demo inspired me to create an alert that posts via a webhook to our Operations channel in Teams whenever an Azure service issue affects us. I had tried that before via the RSS feed of Azure status, but it contained too much noise (and not enough actionable alerting) because we couldn’t filter on region and resource type. The webhook has been created, so hopefully we have some Azure outages soon to see the posts in our Teams channel! :-)
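For anyone wanting to try the same, here is a minimal sketch of the formatting step a small relay between the alert and Teams could perform. It is not what was shown in the session. It assumes the alert arrives in the activity log webhook format, with the details under `data.context.activityLog.properties`, and that the Teams incoming webhook accepts a simple JSON body with a `text` field. The function name `to_teams_message` and the chosen fields are hypothetical.

```python
from typing import Optional


def to_teams_message(alert: dict) -> Optional[dict]:
    """Build a simple Teams message body from a Service Health alert payload.

    Assumption: the payload nests its details under
    data.context.activityLog.properties (activity log webhook format).
    Returns None for payloads that are not Service Health events.
    """
    activity_log = alert.get("data", {}).get("context", {}).get("activityLog", {})

    # Skip anything that does not come from Service Health.
    if activity_log.get("eventSource") != "ServiceHealth":
        return None

    props = activity_log.get("properties", {})
    title = props.get("title", "(no title)")
    service = props.get("service", "unknown service")
    region = props.get("region", "unknown region")
    incident_type = props.get("incidentType", "Unknown")
    stage = props.get("stage", "Unknown")

    # Teams renders basic Markdown in the text field; blank lines separate paragraphs.
    text = (
        f"**{incident_type}: {title}**\n\n"
        f"Service: {service}\n\n"
        f"Region: {region}\n\n"
        f"Stage: {stage}"
    )
    return {"text": text}
```

The returned dictionary would then be sent as JSON in a POST to the webhook URL of the Teams channel. That step is left out here so the sketch stays free of network calls.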

Creating the perfect Azure Service Health Alert

This part is not in the PowerPoint, so I hope my notes reflect his points correctly!

  • Don’t create too many or too few - try to find the sweet spot
  • Make sure they don’t overlap
  • Alert on production issues preferentially
  • Create them people-first
  • Separate alerts from planned maintenance and advisory notifications
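To make the last few points a bit more concrete, here is a small sketch of how a relay could route notifications so that only production incidents page someone, while planned maintenance and advisories go to a channel. The incident type names (`Incident`, `Maintenance`, `Informational`) and the two destinations are assumptions for illustration, not something taken from the session.

```python
# Hypothetical destinations for notifications.
PAGE_ON_CALL = "page-on-call"
POST_TO_CHANNEL = "post-to-channel"


def route_notification(incident_type: str, is_production: bool) -> str:
    """Decide where a Service Health notification should go.

    Only incidents that touch production page a person; planned
    maintenance, advisories and non-production issues are posted to a
    channel so they do not drown out real alerts.
    """
    if incident_type == "Incident" and is_production:
        return PAGE_ON_CALL
    return POST_TO_CHANNEL


if __name__ == "__main__":
    for kind, prod in [("Incident", True), ("Incident", False), ("Maintenance", True)]:
        print(kind, prod, "->", route_notification(kind, prod))
```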

Application Map

From my notes: “slow requests by name”. Sounds good, but where is that then dear notes? :-)

Log Analytics Queries

Looking at this demo (and the power of the Kusto query language), I made a mental note that I could use these queries whenever the log queries of our Management Portal let us down!
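I did not write down the queries from the demo, so the following is my own sketch of what a “slow requests by name” query could look like, wrapped in a small Python helper that only builds the query text. It assumes the classic Application Insights `requests` table with the columns `timestamp`, `name` and `duration`. Workspace-based resources name these differently, and the helper `slow_requests_query` is hypothetical.

```python
def slow_requests_query(hours: int = 24, top: int = 10) -> str:
    """Return a Kusto query listing the slowest requests by name.

    Assumption: the classic Application Insights schema, where the
    `requests` table has `timestamp`, `name` and `duration` columns.
    """
    if hours <= 0 or top <= 0:
        raise ValueError("hours and top must be positive")

    return "\n".join(
        [
            "requests",
            f"| where timestamp > ago({hours}h)",
            # Average duration and number of calls per request name.
            "| summarize avg_duration = avg(duration), request_count = count() by name",
            f"| top {top} by avg_duration desc",
        ]
    )


if __name__ == "__main__":
    print(slow_requests_query(hours=1, top=5))
```

The resulting text can be pasted into the Logs blade of the portal as-is.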

Azure Monitor workbooks and troubleshooting guides

  • Combine text, queries, metrics and parameters into rich interactive reports.
  • Create Azure Monitor troubleshooting guides for people who are on call
  • You should deliver the workbook with the actionable alert
  • It supports editing with Markdown! (+1)
  • “One of the greatest best kept secrets of Azure”

The other two sessions of this day are covered in part three of this blog post.