Once upon a time, we had a single monolith of software, one mothership running everything. At SoundCloud, the proliferation of microservices came from moving functionality out of the mothership. There are plenty of benefits to splitting up features in this way. We want the same benefits for our data as well, by defining ownership of datasets and ensuring that the right teams own the right datasets.
Building and operating services distributed across a network is hard. Failures are inevitable. The way forward is having resiliency as a key part of design decisions.
This post talks about two key aspects of resiliency when doing RPC at scale: the circuit breaker pattern, and its power when combined with client-side load balancing.
In a previous series of blog posts, we covered our decision to move away from a monolithic architecture, replacing it with microservices, interacting synchronously with each other over HTTP, and asynchronously using events. In this post, we review our progress toward this goal, and talk about the conditions and strategy required to decommission our monolith.
Since we started breaking up our monolith and introduced a microservices architecture, we have relied heavily on synchronous request-response style communication. In this blog post, we’ll go over our current status and some of the lessons we learned.
In the first two parts of this series, we talked about how SoundCloud started breaking away from a monolithic Ruby on Rails application into a microservices architecture. In this part we will talk a bit more about the platforms and languages in which we tend to write these microservices.
At the same time that we started the process of building systems outside the Mothership (our Rails monolith), we started breaking our large team of engineers into smaller teams that focused on one specific area…
In the previous post, we talked about how we enabled our teams to build microservices in Scala, Clojure, and JRuby without coupling them with our legacy monolithic Rails system. After the architecture changes were made, our teams were free to build their new features and enhancements in a much more flexible environment. An important question remained, though: how do we extract the features from the monolithic Rails application called Mothership?
Splitting a legacy application is never easy, but…
Most of SoundCloud’s products are written in Scala, Clojure, or JRuby. This wasn’t always the case. Like other start-ups, SoundCloud was created as a single, monolithic Ruby on Rails application running on the MRI, Ruby’s official interpreter, and backed by memcached and MySQL.
We affectionately call this system Mothership. Its architecture was a good solution for a new product used by several hundreds of thousands of artists to share their work, collaborate on tracks, and be discovered by the…
SoundCloud has a service-oriented architecture, which allows us to use different languages for different services. With concurrency and scaling in mind, we started to build some services in Clojure because of its interoperability with the JVM, the availability of good-quality libraries, and the fact that we just plain like it as a language.
How do you build distributed, robust, and scalable micro-services in Clojure? Read what Joseph Wilk, an engineer and Clojure enthusiast at SoundCloud, has to say.
Search is front-and-center in the new SoundCloud, key to the consumer experience. We’ve made the search box one of the first things you see, and beefed it up with suggestions that allow you to jump directly to people, sounds, groups, and sets of interest. We’ve also added a brand-new Explore section that guides you through the huge and dynamic landscape of sounds on SoundCloud. We’ve also completely overhauled our search infrastructure, which helps us provide more relevant results, scale with…
This is a story of how we adapted our architecture over time to accommodate growth.
Scaling is a luxury problem and surprisingly has more to do with organization than implementation. With each change, we addressed the next order of magnitude of users we needed to support, starting in the thousands; now we’re designing for the hundreds of millions. We identified our bottlenecks and addressed them as simply as possible by introducing clear integration points in our infrastructure to divide and…