Building Products at SoundCloud—Part III: Microservices in Scala and Finagle

In the first two parts of this series, we talked about how SoundCloud started breaking away from a monolithic Ruby on Rails application into a microservices architecture. In this part we will talk a bit more about the platforms and languages in which we tend to write these microservices.

At the same time that we started the process of building systems outside the Mothership (our Rails monolith) we started breaking our large team of engineers into smaller teams that focused on one specific area of our platform.

It was a phase of high experimentation, and instead of defining which languages or runtimes these teams should use, we had the rule of thumb write it in whatever you feel confident enough putting in production and being on-call for.

This led to a Cambrian Explosion of languages, runtimes and skills. We had systems being developed in everything from Perl to Julia, including Haskell, Erlang, and node.js.

While this process proved quite productive in creating new systems, we started having problems when maintaining them. The bus factor for several of our systems was very low, and we eventually decided to consolidate our tools.

Based on the expertise and preferences across teams, and an assessment of the industry and our peers, we decided to stick to the JVM and select JRuby, Clojure, and Scala as our company-wide supported languages for product development. For infrastructure and tooling, we also support Go and Ruby.

Turns out that selecting the runtime and language is just one step in building products in a microservices architecture. Another important aspect an organization has to think about is what stack to use for things like RPC, resilience, and concurrency.

After some research and prototyping, we ended up with three alternatives: a pure Netty implementation, the Netflix stack, and the Finagle stack.

Using pure Netty was tempting at first. The framework is well documented and maintained, and the support for HTTP, our main protocol for RPC, is good. After a while, though, we found ourselves implementing abstractions on top of it to do basic things for the concurrency and resilience requirements of our systems. If such abstractions were to be required, we would rather use something that exists than re-invent the wheel.

We tried the Netflix stack, and a while back Joseph Wilk wrote about our experience with Hystrix and Clojure. Hystrix does very well in the resilience and concurrency requirements, but its API based on the Command pattern was a turnoff. In our experience, Hystrix commands do not compose very well unless you also use RxJava, and although we use this library for several back-end systems and our Android application, we decided that the reactive approach was not the best for all of our use cases.

We then started trying out Finagle, a protocol-agnostic RPC system developed by Twitter and used by many companies our size. Finagle does very well in our three requirements, and its design is based on a familiar and extensible Pipes-and-Filters meets Futures model.

The first issue we found with Finagle is that, as opposed to the other alternatives, it is written in Scala, therefore the language runtime jar file is required even for a Clojure or JRuby application. We decided that this wasn’t too important, though it adds about 5MB to the transitive dependencies’ footprint, the language runtime is very stable and does not change often.

The other big issue was to adapt the framework to our conventions. Twitter uses mostly Thrift for RPC; we use HTTP. They use ZooKeeper for Service Discovery; we use DNS. They use a Java properties-based configuration system; we use environment variables. They have their own telemetry system; we have our own telemetry system (we’re not ready to show it just yet, but stay tuned for some exciting news there). Fortunately, Finagle has some very nice abstractions for these areas, and most of the issues were solved with very minimal changes and there was no need to patch the framework itself.

We then had to deal with the very messy state of Futures in Scala. Heather Miller, from the Scala core team, explained the history and changes introduced by newer versions of the language in a great presentation. But in summary, what we have across the Scala ecosystem are several different implementations of Futures and Promises, with Finagle coupled to Twitter’s Futures. Although Scala allows for compatibility between these implementations, we decided to use Twitter’s everywhere, and invest time in helping the Finagle community move closer to the most recent versions of Scala rather than debug weird issues that this interoperability might spawn.

With these issues addressed, we focused on how best to develop applications using Finagle. Luckly, Finagle’s design philosophy is nicely described by Marius Eriksen, one of its core contributors, in his paper Your Server as a Function. You don’t need to follow these principles in your userland code, but in our experience everything integrates much better if you do. Using a Functional programming language like Scala makes following these principles quite easy, as they map very well to pure functions and combinators.

We have used Finagle for HTTP, Thrift, memcached, Redis, and MySQL. Every request to the SoundCloud platform is very likely hitting at least one of our Finagle-powered microservices, and the performance we have from these is quite amazing.

In the last part of this series of blog posts, we will be talking about how Finagle and Scala are being used to move away from a one-size-fits-all RESTful API to optmized back-ends for our applications.