SoundCloud for Developers

Discover, connect and build

We use cookies for various purposes including analytics and personalized marketing. By continuing to use the service, you agree to our use of cookies as described in the Cookie Policy.

Backstage Blog RSS

You're browsing posts of the category Prometheus

  • June 4th, 2019 Monitoring Prometheus SRE Alerting on SLOs like Pros By Björn “Beorn” Rabenstein

    If there is anything like a silver bullet for creating meaningful and actionable alerts with a high signal-to-noise ratio, it is alerting based on service-level objectives (SLOs). Fulfilling a well-defined SLO is the very definition of meeting your users’ expectations. Conversely, a certain level of service errors is OK as long as you stay within the SLO — in other words, if the SLO grants you an error budget. Burning through this error budget too quickly is the ultimate signal that some rectifying action is needed. The faster the budget is burned, the more urgent it is that engineers get involved.

    This post describes how we implemented this concept at SoundCloud, enabling us to fulfill our SLOs without flooding our engineers on call with an unsustainable amount of pages.

    Read more...