SoundCloud for Developers

Discover, connect and build

We use cookies for various purposes including analytics and personalized marketing. By continuing to use the service, you agree to our use of cookies as described in the Cookie Policy.

Backstage Blog RSS

You're browsing posts of the category Monitoring

  • June 4th, 2019 Monitoring Prometheus SRE Alerting on SLOs like Pros By Björn “Beorn” Rabenstein

    If there is anything like a silver bullet for creating meaningful and actionable alerts with a high signal-to-noise ratio, it is alerting based on service-level objectives (SLOs). Fulfilling a well-defined SLO is the very definition of meeting your users’ expectations. Conversely, a certain level of service errors is OK as long as you stay within the SLO — in other words, if the SLO grants you an error budget. Burning through this error budget too quickly is the ultimate signal that some rectifying action is needed. The faster the budget is burned, the more urgent it is that engineers get involved.

    This post describes how we implemented this concept at SoundCloud, enabling us to fulfill our SLOs without flooding our engineers on call with an unsustainable amount of pages.

    Read more...

  • August 31st, 2018 Engineering Monitoring Testing Hands-Off Deployment with Canary By Jorge Creixell and Tobias Schmidt

    At SoundCloud, we follow best practices around continuous delivery, i.e. deploying small incremental changes often (many times a day). In order to improve the user experience, we’ve been exploring different ways of reducing the impact and the Mean Time to Recovery (MTTR) of faulty deployments. Enter canary releases.

    Read more...

  • July 19th, 2016 Announcements Open Source Monitoring Go Prometheus has come of age – a reflection on the development of an open-source project By Björn "Beorn" Rabenstein

    On Monday this week, the Prometheus authors have released version 1.0.0 of the central component of the Prometheus monitoring and alerting system, the Prometheus server. (Other components will follow suit over the next months.) This is a major milestone for the project. Read more about it on the Prometheus blog, and check out the announcement of the CNCF, which has recently accepted Prometheus as a hosted project.

    Read more...

  • January 26th, 2015 Announcements Open Source Monitoring Go Prometheus: Monitoring at SoundCloud By Julius Volz, Björn Rabenstein

    In previous blog posts, we discussed how SoundCloud has been moving towards a microservice architecture. Soon we had hundreds of services, with many thousand instances running and changing at the same time. With our existing monitoring set-up, mostly based on StatsD and Graphite, we ran into a number of serious limitations. What we really needed was a system with the following features:

    • A multi-dimensional data model, so that data can be sliced and diced at will, along dimensions like instance, service, endpoint, and method.

    • Operational simplicity, so that you can spin up a monitoring server where and when you want, even on your local workstation, without setting up a distributed storage backend or reconfiguring the world.

    • Scalable data collection and decentralized architecture, so that you can reliably monitor the many instances of your services, and independent teams can set up independent monitoring servers.

    • Finally, a powerful query language that leverages the data model for meaningful alerting (including easy silencing) and graphing (for dashboards and for ad-hoc exploration).

    All of these features existed in various systems. However, we could not identify a system that combined them all until a colleague started an ambitious pet project in 2012 that aimed to do so. Shortly thereafter, we decided to develop it into SoundCloud's monitoring system: Prometheus was born.

    Read more...