At SoundCloud, we follow best practices around continuous delivery, i.e. deploying small incremental changes often (many times a day). In order to improve the user experience, we’ve been exploring different ways of reducing the impact and the Mean Time to Recovery (MTTR) of faulty deployments. Enter canary releases. Read more...
Backstage Blog RSS
You're browsing posts of the category Monitoring
August 31st, 2018 Engineering Monitoring Testing Hands-Off Deployment with Canary By Jorge Creixell and Tobias Schmidt
July 19th, 2016 Announcements Open Source Monitoring Go Prometheus has come of age – a reflection on the development of an open-source project By Björn "Beorn" Rabenstein
On Monday this week, the Prometheus authors have released version 1.0.0 of the central component of the Prometheus monitoring and alerting system, the Prometheus server. (Other components will follow suit over the next months.) This is a major milestone for the project. Read more about it on the Prometheus blog, and check out the announcement of the CNCF, which has recently accepted Prometheus as a hosted project.
January 26th, 2015 Announcements Open Source Monitoring Go Prometheus: Monitoring at SoundCloud By Julius Volz, Björn Rabenstein
In previous blog posts, we discussed how SoundCloud has been moving towards a microservice architecture. Soon we had hundreds of services, with many thousand instances running and changing at the same time. With our existing monitoring set-up, mostly based on StatsD and Graphite, we ran into a number of serious limitations. What we really needed was a system with the following features:
A multi-dimensional data model, so that data can be sliced and diced at will, along dimensions like instance, service, endpoint, and method.
Operational simplicity, so that you can spin up a monitoring server where and when you want, even on your local workstation, without setting up a distributed storage backend or reconfiguring the world.
Scalable data collection and decentralized architecture, so that you can reliably monitor the many instances of your services, and independent teams can set up independent monitoring servers.
Finally, a powerful query language that leverages the data model for meaningful alerting (including easy silencing) and graphing (for dashboards and for ad-hoc exploration).
All of these features existed in various systems. However, we could not identify a system that combined them all until a colleague started an ambitious pet project in 2012 that aimed to do so. Shortly thereafter, we decided to develop it into SoundCloud's monitoring system: Prometheus was born.