SoundCloud for Developers

Discover, connect and build

Backstage Blog RSS

You're browsing posts of the category Architecture

  • June 11th, 2014 Scala Finagle Ruby Architecture Building Products at SoundCloud —Part I: Dealing with the Monolith By Phil Calçado

    Most of SoundCloud's products are written in Scala, Clojure, or JRuby. This wasn't always the case. Like other start-ups, SoundCloud was created as a single, monolithic Ruby on Rails application running on the MRI, Ruby's official interpreter, and backed by memcached and MySQL.

    We affectionately call this system Mothership. Its architecture was a good solution for a new product used by several hundreds of thousands of artists to share their work, collaborate on tracks, and be discovered by the industry.

    The Rails codebase contained both our Public API, used by thousands of third-party applications, and the user-facing web application. With the launch of the Next SoundCloud in 2012, our interface to the world became mostly the Public API —we built all of our client applications on top of the same API partners and developers used.

    Diagram 1

    These days, we have about 12 hours of music and sound uploaded every minute, and hundreds of millions of people use the platform every day. SoundCloud combines the challenges of scaling both a very large social network with a media distribution powerhouse.

    To scale our Rails application to this level, we developed, contributed to, and published several components and tools to help run database migrations at scale, be smarter about how Rails accesses databases, process a huge number of messages, and more. In the end we have decided to fundamentally change the way we build products, as we felt we were always patching the system and not resolving the fundamental scalability problem.

    The first change was in our architecture. We decided to move towards what is now known as a microservices architecture. In this style, engineers separate domain logic into very small components. These components expose a well-defined API, and implement a Bounded Context —including its persistence layer and any other infrastructure needs.

    Big-bang refactoring has bitten us in the past, so the team decided that the best approach to deal with the architecture changes would not be to split the Mothership immediately, but rather to not add anything new to it. All of our new features were built as microservices, and whenever a larger refactoring of a feature in the Mothership was required, we extract the code as part of this effort.

    This started out very well, but soon enough we detected a problem. Because so much of our logic was still in the Rails monolith, pretty much all of our microservices had to talk to it somehow.

    One option around this problem was to have the microservices accessing directly the Mothership database. This is a very common approach in some corporate settings, but because this database is a Public, but not Published Interface, it usually leads to many problems when we need to change the structure of shared tables.

    Instead, we went for the only Published Interface we had, which was the Public API. Our internal microservices would behave exactly like the applications developed by third-party organizations integrate with the SoundCloud platform.

    Diagram 2

    Soon enough, we realized that there was a big problem with this model; as our microservices needed to react to user activity. The push-notifications system, for example, needed to know whenever a track had received a new comment so that it could inform the artist about it. At our scale, polling was not an option. We needed to create a better model.

    We were already using AMQP in general and RabbitMQ in specific — In a Rails application you often need a way to dispatch slow jobs to a worker process to avoid hogging the concurrency-weak Ruby interpreter. Sebastian Ohm and Tomás Senart presented the details of how we use AMQP, but over several iterations we developed a model called Semantic Events, where changes in the domain objects result in a message being dispatched to a broker and consumed by whichever microservice finds the message interesting.

    Diagram 3

    This architecture enabled Event Sourcing, which is how many of our microservices deal with shared data, but it did not remove the need to query the Public API —for example, you might need all fans of an artist and their email addresses to notify them about a new track.

    While most of the data was available through the Public API, we were constrained by the same rules we enforced on third-party applications. It was not possible, for example, for a microservice to notify users about activity on private tracks as users could only access public information.

    We explored several possible solutions to the problem. One of the most popular alternatives was to extract all of the ActiveRecord models from the Mothership into a Ruby gem, effectively making the Rails model classes a Published Interface and a shared component. There were several important issues with this approach, including the overhead of versioning the component across so many microservices, and that it became clear that microservices would be implemented in languages other than Ruby. Therefore, we had to think about a different solution.

    In the end, the team decided to use Rails' features of engines (or plugins, depending on the framework's version) to create an Internal API that is available only within our private network. To control what could be accessed internally, we used Oauth 2.0 when an application is acting on behalf of a user, with different authorisation scopes depending on which microservice needs the data.

    Diagram 4

    Although we are constantly removing features from the Mothership, having both a push and pull interface to the old system makes sure that we do not couple our new microservices to the old architecture. The microservice architecture has proven itself crucial to developing production-ready features with much shorter feedback cycles. External-facing examples are the visual sounds, and the new stats system.

  • September 20th, 2013 Architecture Clojure Building Clojure Services at Scale By Duana Stanley

    SoundCloud has a service-oriented architecture, which allows us to use different languages for different services. With concurrency and scaling in mind, we started to build some services in Clojure due to its interoperability with the JVM, the availability of good quality libraries, and we just plain like it as a language.

    How do you build distributed, robust, and scalable micro-services in Clojure? Read what Joseph Wilk, an engineer and Clojure enthusiast at SoundCloud, has to say.

  • December 4th, 2012 Search and Discovery Architecture Architecture behind our new Search and Explore experience By Petar Djekic

    Search is front-and-center in the new SoundCloud, key to the consumer experience. We’ve made the search box one of the first things you see, and beefed it up with suggestions that allow you to jump directly to people, sounds, groups, and sets of interest. We’ve also added a brand-new Explore section that guides you through the huge and dynamic landscape of sounds on SoundCloud. We’ve also completely overhauled our search infrastructure, which helps us provide more relevant results, scale with ease, and experiment quickly with new features and models.

    In the beginning

    In SoundCloud’s startup days, when we were growing our product at super speed, we spent just two days implementing a straightforward search feature, integrating it directly into our main app. We used Apache Solr, as was the fashion at the time, and built a master-slave cluster with the common semantics: writes to the master, reads from the slaves. Besides a separate component for processing deletes, our indexing logic was pull-based: the master Solr instance used a Solr DataImportHandler to poll our database for changes, and the slaves polled the master.

    At first, this worked well. But as time went on, our data grew, our entities became more complex, and our business rules multiplied. We began to see problems.

    The problems

    • The index was only getting updated on the read slaves about every fifteen minutes. Being real-time is crucial in the world of sound, as it’s important that our users’ sounds are searchable immediately after upload.
    • A complete re-index of the master node from the database took up to 24 hours, which made making a simple schema change for a new feature or bugfix a Sisyphean task.
    • Worse, during those complete-reindex periods, we couldn’t do incremental indexing because of limitations in the DataImportHandler. That meant the search index was essentially frozen in time: not a great user experience.
    • Because the search infrastructure was tightly coupled in the main application, low-level Solr or Lucene knowledge leaked everywhere. If we wanted to make a slight change, such as to a filter, we had to touch 25 parts of the codebase in the main application.
    • Because the application directly interacted with the master, if that single-point-of-failure failed, we lost information about updates, and our only reliable recovery was a complete reindex, which took 24 hours.
    • Since Solr uses segment or bulk replication, our network was flooded when the slaves pulled updates from the master, which caused additional operational issues.
    • Solr felt like it was built in a different age, for a different use-case. When we had performance problems, it was opaque and difficult to diagnose.

    In early 2012, we knew we couldn’t go forward with many new features and enhancements on the existing infrastructure. We formed a new team and decided we should rebuild the infrastructure from scratch, learning from past lessons. In that sense, the Solr setup was an asset: it could continue to serve site traffic, while we were free to build and experiment with a parallel universe of search. Green-field development, unburdened by legacy constraints: every engineer’s dream.

    The goals

    We had a few explicit goals with the new infrastructure.

    * Velocity: we want high velocity when working on schema-affecting bugs and features, so a complete reindex should take on the order of an hour. * Reliability: we want to grow to the next order of magnitude and beyond, so scaling the infrastructure—horizontally and vertically—should require minimum effort. * Maintainability: we don’t want to leak implementation details beyond our borders, so we should strictly control access through APIs that we design and control.

    After a survey of the state of the art in search, we decided to abandon Solr in favor of ElasticSearch, which would theoretically address our reliability concerns in their entirety. We then just had to address our velocity and maintenance concerns with our integration with the rest of the SoundCloud universe. To the whiteboard!

    The plan

    On the read side, everything was pretty straightforward. We’d already built a simple API service between Solr and the main application, but had hit some roadblocks when integrating it. We pivoted and rewrote the backend of that API to make ElasticSearch queries, instead: an advantage of a lightweight service-oriented architecture is that refactoring services in this way is fast and satisfying.

    On the write side—conceptually, at least—we had a straightforward task. Search is relatively decoupled from other pieces of business and application logic. To maintain a consistent index, we only need an authoritative source of events, ie. entity creates, updates, and deletes, and some method of enriching those events with their searchable metadata. We had that in the form of our message broker. Every event that search cared about was being broadcast on a well-defined set of exchanges: all we had to do was listen in.

    But depending on broker events to materialize a search index is dangerous. We would be susceptible to schema changes: the producers of those events could change the message content, and thereby break search without knowing, potentially in very subtle ways. The simplest, most robust solution was to ignore the content of the messages, and perform our own hydration. That carries a runtime cost, but it would let us better exercise control over our domain of responsibility. We believed the benefits would outweigh the costs.

    Using the broker as our event source, and the database as our ground truth, we sketched an ingestion pipeline for the common case. Soon, we realized we had to deal with the problem of synchronization: what should we do if an event hadn’t yet been propagated to the database slave we were querying? It turned out that if we used the timestamp of the event as the “external” version of the document in ElasticSearch, we could lean on ElasticSearch’s consistency guarantees to detect, and resolve, problems of sync.

    Once it hit the whiteboard, it became clear we could re-use the common-case pipeline for the bulk-index workflow; we just had to synthesize creation events for every entity in the database. But could we make it fast enough? It seemed at least possible. The final box-diagram looked like this:

    Implementation and optimization began. Over a couple of weeks, we iterated over several designs for the indexing service. We eked out a bit more performance in each round, and in the end, we got where we wanted: indexing our entire corpus against a live cluster took around 2 hours, we had no coupling in the application, and ElasticSearch itself gave us a path for scaling.

    The benefits

    We rolled out the new infrastructure in a dark launch on our beta site. Positive feedback on the time-to-searchability was immediate: newly-posted sounds were discoverable in about 3 seconds (post some sounds and try it out for yourself!). But the true tests came when we started implementing features that had been sitting in our backlog. We would start support for a new feature in the morning, make schema changes into a new index over lunch, and run A/B tests in the afternoon. If all lights were green, we could make an immediate live swap. And if we had any problems, we could rollback to the previous index just as fast.

    There’s no onerous, manual work involved in bootstrapping or maintaining nodes: everything is set up with ElasticSearch’s REST API. Our index definitions are specified in flat JSON files. And we have a single dashboard with all the metrics we need to know how search is performing, both in terms of load and search quality.

    In short, the new infrastructure has been an unqualified success. Our velocity is through the roof, we have a clear path for orders-of-magnitude growth, and we have a stable platform for a long time to come.

    Enter DiscoRank

    It’s not the number of degrees of separation between you and John Travolta: DiscoRank is our modification of the PageRank algorithm to provide more relevant search results: short for Discovery Rank. We match your search against our corpus and use DiscoRank to rank the results.

    The DiscoRank algorithm involves creating a graph of searchable entities, in which each activity on SoundCloud is an edge connecting two nodes. It then iteratively computes the ranking of each node using the weighted graph of entities and activities. Our graph has millions and millions of nodes and a lot more edges. So, we did a lot of optimizations to adapt PageRank to our use case, to be able to keep the graph in memory, and to recalculate the DiscoRank quickly, using results from previous runs of the algorithm as priors. We keep versioned copies of the DiscoRank so that we can swap between them when testing things out.

    How do we know we’re doing better?

    Evaluating the relevance of search results is a challenge. You need to know what people are looking for (which is not always apparent from their query) and you need to get a sample that’s representative enough to judge improvement Before we started work on the new infrastructure, we did research and user testing to better understand how and why people were using search on SoundCloud.

    Our evaluation baseline is a set of manually-tagged queries. To gather the first iteration of that data, we built an internal evaluation framework for SoundCloud employees: thumbs up, thumbs down on results. In addition, we have metrics like click positions, click engagement, precision, and recall, all constantly updating in our live dashboard. This allows us to compare ranking algorithms and other changes to the way we construct and execute queries. And we have mountains of click log data that we need to analyse.

    We have some good improvements so far, but there’s still a lot of tuning that can be done.

    Search as navigation

    The new Suggest feature lets you jump straight to the sound, profile, group, or set you’re looking for. We’ll trigger your memory if you don’t remember how something is spelled, or exactly what it’s called. Since we know the types of the results returned, we can send you straight to the source. This ability to make assumptions and customizations is a consequence of knowing the structure and semantics of our data, and gives a huge advantage in applications like this.

    Architecture-wise, we decided to separate the suggest infrastructure from the main search cluster, and build a custom engine based on Apache Lucene’s Finite State Transducers. As ever, delivering results fast is crucial. Suggestions need to show up right after the keystroke. It turned out we are competing with the speed of light for this particular use-case.

    The fact that ElasticSearch doesn’t come with a suggest engine turned out to be a non-issue and rather forced us to build this feature in isolation. This separation proved to be a wise decision, since update-rate, request patterns and customizations are totally different, and would have made a built-in solution hard to maintain.

    Search as a crystal ball

    Similar to suggest, Explore is based on our new search infrastructure, but with a twist: our main goal for Explore is to showcase the sounds in our community at this very moment. This led to a time-sensitive ranking, putting more emphasis on the newness of a sound.

    So far, so good

    We hope you enjoy using the new search features of SoundCloud as much as we enjoyed building them. Staying focused on a re-architecture is painful when you see your existing infrastructure failing more and more each day, but we haven’t looked back—especially since our new infrastructure allows us to deliver a better user experience much faster. We now can roll out new functionality with ease, and the new Suggest and Explore along with an improved Search itself are just the first results.

    Keep an eye out for more improvements to come!

    Apache Lucene and Solr are trademarks of the Apache Software Foundation

  • August 30th, 2012 API Architecture Open Source Evolution of SoundCloud’s Architecture By Sean Treadway

    This is a story of how we adapted our architecture over time to accomodate growth.

    Scaling is a luxury problem and surprisingly has more to do with organization than implementation. For each change we addressed the next order of magnitude of users we needed to support, starting in the thousands and now we’re designing for the hundreds of millions.  We identify our bottlenecks and addressed them as simply as possible by introducing clear integration points in our infrastructure to divide and conquer each problem individually.

    By identifying and extracting points of scale into smaller problems and having well defined integration points when the time arrived, we are able to grow organically.

    Product conception

    From day one, we had the simple need of getting each idea out of our heads and in front of eyeballs as quickly as possible. During this phase, we used a very simple setup:

    Apache was serving our image/style/behavior resources, and Rails backed by MySQL provided an environment where almost all of our product could be modeled, routed and rendered quickly. Most of our team understood this model and could work well together, delivering a product that is very similar to what we have today.

    We consciously chose not to implement high availability at this point, knowing what it would take when that time hopefully arrived. At this point we left our private beta, revealing SoundCloud to the public.

    Our primary cost optimization was for opportunity, and anything that got in the way of us developing the concepts behind SoundCloud were avoided. For example, when a new comment was posted, we blocked until all followers were notified knowing that we could make that asynchronous later.

    In the early stages we were conscious to ensure we were not only building a product, but also a platform. Our Public API was developed alongside our website from the very beginning. We’re now driving the website with the same API we were offering to 3rd party integrations.

    Growing out of Apache

    Apache served us well, but we were running Rails app servers on multiple hosts, and the routing and virtual host configuration in Apache was cumbersome to keep in sync between development and production.

    The Web tier’s primary responsibility is to manage and dispatch incoming web requests, as well as buffering outbound responses so to free up an application server for the next request as quickly as possible. This meant the better connection pooling and content based routing configuration we had, the stronger this tier would be.

    At this point we replaced Apache with Nginx and reduced our web tier’s configuration complexity, but our architecture didn’t change.

    Load distribution and a little queue theory

    Nginx worked great, but as we were growing, we found that some workloads took significantly more time compared to others (in the order of hundreds of milliseconds).

    When you’re working on a slow request when a fast request arrives, the fast request will have to wait until the slow request finishes, called “head of the line blocking problem”. When we had multiple applications servers each with its own listen socket backlog, analogous to a grocery store, where you inevitably stand at one register and watch all the other registers move faster than your own.

    Around 2008 when we first developed the architecture, concurrent request processing in Rails and ActiveRecord was fairly immature. Even though we felt confident that we could audit and prepare our code for concurrent request processing, we did not want to invest the time to audit our dependencies. So we stuck with the model of a single concurrency per application server process and ran multiple processes per host.

    In Kendall’s notation once we’ve sent a request from the web server to the application server, the request processing can be modeled by a M/M/1 queue. The response time of such a queue depends on all prior requests, so if we drastically increase the average work time of one request the average response time also drastically increases.

    Of course, the right thing to do is to make sure our work times are consistently low for any web request, but we were still in the period of optimizing for opportunity, so we decided to continue with product development and solve this problem with better request dispatching.

    We looked at the Phusion passenger approach of using multiple child processes per host but felt that we could easily fill each child with long-running requests. This is like having many queues with a few workers on each queue, simulating concurrent request processing on a single listen socket.

    This changed the queue model from M/M/1 to M/M/c where c is the number of child processes for every dispatched request. This is like the queue system found in a post office, or a “take a number, the next available worker will help you” kind of queue. This model reduces the response time by a factor of c for any job that was waiting in the queue which is better, but assuming we had 5 children, we would just be able to accept an average of 5 times as many slow requests. We were already seeing a factor of 10 growth in the upcoming months, and had limited capacity per host, so adding only 5 to 10 workers was not enough address the head of the line blocking problem.

    We wanted a system that never queued, but if it did queue, the wait time in the queue was minimal. Taking the M/M/c model to the extreme, we asked ourselves “how can we make c as large as possible?”

    To do this, we needed to make sure that a single Rails application server never received more than one request at a time. This ruled out TCP load balancing because TCP has no notion of an HTTP request/response. We also needed to make sure that if all application servers were busy, the request would be queued for the next available application server. This meant we must maintain complete statelessness between our servers. We had the latter, but didn’t have former.

    We added HAProxy into our infrastructure, configuring each backend with a maximum connection count of 1 and added our backend processes across all hosts, to get that wonderful M/M/c reduction in resident wait time by queuing the HTTP request until any backend process on any host becomes available. HAProxy entered as our queuing load balancer that would buffer any temporary back-pressure by queuing requests from the application or dependent backend services so we could defer designing sophisticated queuing in other components in our request pipeline.

    I heartily recommend Neil J. Gunther’s work Analyzing Computer System Performance with Perl::PDQ to brush up on queue theory and strengthen your intuition on how to model and measure queuing systems from HTTP requests all the way down to your disk controllers.

    Going asynchronous

    One class of request that took a long time was the fan-out of notifications from social activity. For example, when you upload a sound to SoundCloud, everyone that follows you will be notified. For people with many followers, if we were to do this synchronously, the request times would exceed the tens of seconds. We needed to queue a job that would be handled later.

    Around the same time we were considering how to manage our storage growth for sounds and images, and had chosen to offload storage to Amazon S3 keeping transcoding compute in Amazon EC2.

    Coordinating these subsystems, we needed some middleware that would reliably queue, acknowledge and re-deliver job tickets on failure. We went through a few systems, but in the end settled on AMQP because of having a programmable topology, implemented by RabbitMQ.

    To keep the same domain logic that we had in the website, we loaded up the Rails environment and built a lightweight dispatcher class with one queue per concern.  The queues had a namespace that describes estimated work times. This created a priority system in our asynchronous workers without requiring adding the complexity of message priorities to the broker by starting one dispatcher process for each class of work that bound to multiple queues in that work class. Most of our queues for asynchronous work performed by the application are namespaced with either “interactive” (under 250ms work time) or “batch” (any work time). Other namespaces were used specific to each application.


    When we approached the hundreds of thousands user mark, we saw we were burning too much CPU in the application tier, mostly spent in the rendering engine and Ruby runtime.

    Instead of introducing Memcached to alleviate IO contention in the database like most applications, we aggressively cached partial DOM fragments and full pages. This turned into an invalidation problem which we solved by maintaining the reverse index of cache keys that also needed invalidation on model changes in memcached.

    Our highest volume request was one specific endpoint that was delivering data for the widget. We created a special route for that endpoint in nginx and added proxy caching to that stack, but wanted to generalize caching to the point where any end point could produce proper HTTP/1.1 cache control headers and would be treated well by an intermediary we control. Now our widget content is served entirely from our public API.

    We added Memcached and much later Varnish to our stack to handle backend partially rendered template caching and mostly read-only API responses.


    Our worker pools grew, handling more asynchronous tasks. The programming model was similar for all of them: take a domain model and schedule a continuation with that model state to be processed at a later state.

    Generalizing this pattern, we leveraged the after-save hooks in ActiveRecord models in a way we call ModelBroadcast. The principle is that when the business domain changes, events are dropped on the AMQP bus with that change for any asynchronous client that is interested in that class of change. This technique of decoupling the write path from the readers enables the next evolution of growth by accommodating integrations we hadn’t foreseen.

    after_create do |r|
      broker.publish("models", "create.#{}", r.attributes.to_json)
    after_save do |r|
      broker.publish("models", "save.#{}", r.changes.to_json)
    after_destroy do |r|
      broker.publish("models", "destroy.#{}", r.attributes.to_json)

    This isn’t perfect, but it added a much needed non-disruptive, generalized, out-of-app integration point in the course of a day.


    Our most rapid data growth was the result of our Dashboard. The Dashboard is a personalized materialized index of activities inside of your social graph and the primary place to personalize your incoming sounds from the people you follow.

    We have always had a storage and access problem with this component. Looking at the read and write paths separately, the read path needs to be optimized for sequential access per user over a time range. The write path needs to be optimized for random access where one event may affect millions of users’ indexes.

    The solution required a system that could reorder writes from random to sequential and store in sequential format for read that could be grown to multiple hosts. Sorted string tables are a perfect fit for the persistence format, and add the promise of free partitioning and scaling in the mix, we chose Cassandra as the storage system for the Dashboard index.

    The intermediary steps started with the model broadcast and used RabbitMQ as a queue for staged processing, in three major steps: fan-out, personalization, and serialization of foreign key references to our domain models.

    • Fan-out finds the areas of the social graph where an activity should propagate.
    • Personalization looks at the relationship between the originator and destination users as well as other signals to annotate or filter the index entry.
    • Serialization persists the index entry in Cassandra for later lookup and joining against our domain models for display or API representations.


    Our search is conceptually a back-end service that exposes a subset of data store operations over an HTTP interface for queries. Updating of the index is handled similarly to the dashboard via ModelBroadcast with some enhancement from database replicas with index storage managed by Elastic Search.

    Notifications and Stats

    To make sure users are properly notified when their dashboard updates, whether this is over iOS/Android push notifications, email or other social networks we simply added another stage in the Dashboard workflow that receives messages when a dashboard index is updated. Agents can get that completion event routed to their own AMQP queues via the message bus to initiate their own logic. Reliable messages at the completion of persistence is part of the eventual consistency we work with throughout our system.

    Our statistics offered to logged in users at also integrates via the broker, but instead of using ModelBroadcast, we emit special domain events that are queued up in a log then rolled up into a separate database cluster for fast access across the various time ranges.

    What’s next

    We have established some clear integration points in the broker for asynchronous write paths and in the application for synchronous read and write paths to backend services.

    Over time, the application server’s codebase has collected both integration and functional responsibilities. As the product development settles, we have much more confidence now to decouple the function from the integration to be moved into backend services that can be consumed à la carte by not only the application but by other backend services, each with a private namespace in the persistence layer.

    The way we develop SoundCloud is to identify the points of scale then isolate and optimize the read and write paths individually, in anticipation of the next magnitude of growth.

    At the beginning of the product, our read and write scaling limitations were consumer eyeballs and developer hours. Today, we’re engineering for the realities of limited IO, network and CPU. We have the integration points set up in our architecture, all ready for the continued evolution of SoundCloud!