Playback on Web at SoundCloud

Maestro is a library we have developed to handle all playback across SoundCloud web applications. It successfully handles tens of millions of plays per day across soundcloud.com, our mobile site, our widget, Chromecast, and our Xbox application. We are considering open sourcing it, and this blog post is a technical overview of what we’ve achieved thus far with Maestro.

What We Support

At SoundCloud, we aim to support all modern web browsers, mobile browsers, and IE 11. Our goal is to provide the best possible playback experience using the functionality provided by your browser.

What We Stream

We currently stream three codecs:

  • mp3
  • opus
  • aac

Our primary protocol is HLS (HTTP Live Streaming). This means the contents of a track are split up into short segments, and we have a separate file (playlist), which contains URLs to all of the segments, along with their corresponding times in the track. You can find more information about HLS here.
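
To make the playlist idea concrete, here is a minimal sketch of how a player might find the segment that covers a given playback position. The Segment shape, the example URLs, and the buildSegments and findSegment helpers are hypothetical names chosen for illustration; they are not Maestro's API, and real HLS playlists carry more information than this.

```typescript
// Hypothetical in-memory form of a parsed playlist: each entry has the
// segment URL plus where it starts in the track and how long it lasts.
type Segment = { url: string; start: number; duration: number };

// Build segments from (url, duration) pairs, accumulating start times the
// same way a player would when reading a playlist from top to bottom.
function buildSegments(entries: Array<[string, number]>): Segment[] {
  let start = 0;
  return entries.map(([url, duration]) => {
    const segment = { url, start, duration };
    start += duration;
    return segment;
  });
}

// Return the segment covering `position` (in seconds), or undefined if the
// position lies outside the track.
function findSegment(segments: Segment[], position: number): Segment | undefined {
  return segments.find(
    s => position >= s.start && position < s.start + s.duration
  );
}

const segments = buildSegments([
  ['http://example.invalid/segment0.mp3', 10],
  ['http://example.invalid/segment1.mp3', 10],
  ['http://example.invalid/segment2.mp3', 5]
]);
const hit = findSegment(segments, 12.5);
```

With a lookup like this, a seek only needs to download the segment that contains the target position rather than everything before it.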

What the Browser Provides

We make use of the <audio /> element, Media Source Extensions (MSE), and the Web Audio API.

The minimum support we require is the <audio /> element and the ability to play one of our codecs. MSE and the Web Audio API are required for the best experience.

We are able to gracefully degrade when either the Web Audio API or MSE is missing or there are errors during playback.

We’ll cover what we use MSE and the Web Audio API for in a bit, but first, let’s see what the <audio /> element alone does for us.

<audio />

The <audio /> element takes the URL of an audio file and plays it back if the browser supports the codec. The browser learns the codec from the Content-Type header on the response. The element also provides an API, which can be used to control playback and to check whether the browser supports a codec:

const audio = document.createElement('audio');
audio.src = 'http://example.invalid/something.mp3';

audio.play();
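
The codec check can be sketched with canPlayType(), which is a standard method on media elements and returns '', 'maybe', or 'probably'. The helper name pickPlayableType, the preference order, and the exact MIME strings for opus and aac are assumptions made for this example. The function accepts anything with a canPlayType method so that it can also be exercised with a stub outside the browser.

```typescript
// Anything with a canPlayType method: an HTMLAudioElement fits, and so
// does a stub outside the browser.
type CanPlay = { canPlayType(type: string): string };

// Candidate MIME types in order of preference. The order and the exact
// container/codec strings are assumptions for this sketch.
const CANDIDATES = [
  'audio/ogg; codecs="opus"',
  'audio/mp4; codecs="mp4a.40.2"', // AAC-LC
  'audio/mpeg' // MP3
];

// canPlayType returns '' (no), 'maybe', or 'probably'. Prefer a 'probably'
// answer, fall back to 'maybe', and return undefined if nothing is playable.
function pickPlayableType(
  el: CanPlay,
  candidates: string[] = CANDIDATES
): string | undefined {
  return (
    candidates.find(type => el.canPlayType(type) === 'probably') ||
    candidates.find(type => el.canPlayType(type) === 'maybe')
  );
}
```

In a browser, this would be called as pickPlayableType(document.createElement('audio')).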

Media Source Extensions

With the <audio /> element alone, the browser does all the work behind the scenes, and you don’t have access to the underlying buffer.

With MSE, we are able to create a buffer for a codec that is supported by the browser. We are then able to handle both downloading the media ourselves and appending it to the buffer. This means we can make optimizations such as preloading: we download the first few seconds of tracks we think you will play, before you click the play button, and store them in memory. When you click play, we can append this data straight from memory into the buffer instead of having to go to the network:

const audio = document.createElement('audio');
const mse = new MediaSource();
const url = URL.createObjectURL(mse);
audio.src = url;
audio.play();

mse.addEventListener('sourceopen', () => {
  // 'audio/mpeg' for mp3
  const buffer = mse.addSourceBuffer('audio/mpeg');
  buffer.mode = 'sequence';
  const request = new Request('http://example.invalid/segment0.mp3');
  fetch(request)
    .then(response => response.arrayBuffer())
    .then(data => {
      buffer.appendBuffer(data);
    });
});
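
The preloading idea described above could look roughly like the following. This is a sketch under stated assumptions, not Maestro's implementation: PreloadCache, its methods, and the oldest-first eviction are invented for the example, and the download function is injected so that the cache does not depend on fetch() being available. It assumes maxEntries is at least 1.

```typescript
// Downloads a URL and resolves with its bytes. In a browser this could be
// url => fetch(url).then(response => response.arrayBuffer()).
type Download = (url: string) => Promise<ArrayBuffer>;

class PreloadCache {
  // A Map keeps insertion order, so the first key is the oldest entry.
  private entries = new Map<string, Promise<ArrayBuffer>>();
  private download: Download;
  private maxEntries: number;

  constructor(download: Download, maxEntries: number) {
    this.download = download;
    this.maxEntries = maxEntries;
  }

  // Start downloading ahead of time. Does nothing if already cached.
  preload(url: string): void {
    if (this.entries.has(url)) return;
    if (this.entries.size >= this.maxEntries) {
      // Evict the oldest entry to keep memory use bounded.
      const oldest = this.entries.keys().next().value;
      if (oldest !== undefined) this.entries.delete(oldest);
    }
    const pending = this.download(url);
    this.entries.set(url, pending);
    // Forget failed downloads so that a later get() tries again.
    pending.catch(() => {
      if (this.entries.get(url) === pending) this.entries.delete(url);
    });
  }

  // Return the preloaded data if present, otherwise download it now.
  get(url: string): Promise<ArrayBuffer> {
    return this.entries.get(url) || this.download(url);
  }
}
```

When play is requested, the data from get() can be handed to appendBuffer() exactly as in the snippet above. Only the source of the bytes changes.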

Web Audio API

The Web Audio API is the newest of the APIs mentioned here. We use a small portion of this API to perform a quick fade in and out when you play, pause, or seek. This makes seeking feel snappier, and playing/pausing is less abrupt:

const audio = document.createElement('audio');
const context = new AudioContext();
const sourceNode = context.createMediaElementSource(audio);
const gainNode = context.createGain();
sourceNode.connect(gainNode);
gainNode.connect(context.destination);

audio.src = 'http://example.invalid/something.mp3';
audio.play();

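// Schedule a one-second fade out. Anchoring the current gain first makes
// the ramp start from "now" rather than from an earlier scheduled event.
gainNode.gain.setValueAtTime(gainNode.gain.value, context.currentTime);
gainNode.gain.linearRampToValueAtTime(0, context.currentTime + 1);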

Goals of Maestro

  • Simple API
  • Plugin architecture
  • Easy feature detection
  • Type safety
  • Support all major browsers
  • Handle bugs/differences in browser implementations
  • Great performance

    • Ability to preload
    • As responsive as possible
  • Configurable buffer length and cache size

    • Should be able to work on devices with constrained memory, such as Chromecast
  • Metrics

    • Provide error and performance data, which can be monitored to detect bugs and make more improvements

Tech Stack

The API

Maestro consists of many packages. The core package provides an abstract BasePlayer class, which provides the player API. It delegates tasks to specific implementations, but external communication happens through BasePlayer. Up-to-date state can be retrieved through player methods, and the user is also notified of any changes.

For example, the play() method returns a Promise which may resolve or reject. The BasePlayer will inform the implementation when it should play or pause, and the implementation will inform the BasePlayer when it is actually playing. Each player implementation is decoupled from the actual play() method. This also means the isPlaying() method and corresponding updates can be handled completely by BasePlayer. Another example is getPosition(), which will normally ask the implementation for the current time, except for when a seek is in progress, in which case BasePlayer will return the requested position. This means the time from getPosition() always makes sense and users don’t need to override it when seeking to ensure it doesn’t jump around.
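
The division of labor described above can be sketched as follows. This is an illustration under assumptions rather than Maestro's actual BasePlayer: the Implementation interface, the onPlaying and onSeekCompleted hooks, and the error message are hypothetical, and the sketch uses composition where Maestro uses inheritance.

```typescript
// What a concrete player provides in this sketch (hypothetical interface).
interface Implementation {
  setPlaying(playing: boolean): void;
  seek(position: number): void;
  getCurrentTime(): number;
}

class SketchBasePlayer {
  private impl: Implementation;
  private playing = false;
  private seekTarget: number | null = null;
  private playPromise: Promise<void> | null = null;
  private settlePlay: { resolve: () => void; reject: (error: Error) => void } | null = null;

  constructor(impl: Implementation) {
    this.impl = impl;
  }

  // Ask the implementation to play; resolve once it reports that it is.
  play(): Promise<void> {
    if (this.playing) return Promise.resolve();
    if (this.playPromise) return this.playPromise;
    const promise = new Promise<void>((resolve, reject) => {
      this.settlePlay = { resolve, reject };
    });
    this.playPromise = promise;
    this.impl.setPlaying(true);
    return promise;
  }

  // The base class owns this flag, so implementations never track it.
  isPlaying(): boolean {
    return this.playing;
  }

  seek(position: number): void {
    this.seekTarget = position;
    this.impl.seek(position);
  }

  // While a seek is in progress, report the requested position.
  getPosition(): number {
    return this.seekTarget !== null ? this.seekTarget : this.impl.getCurrentTime();
  }

  // Called by the implementation when playback has actually started.
  onPlaying(): void {
    this.playing = true;
    if (this.settlePlay) this.settlePlay.resolve();
    this.playPromise = null;
    this.settlePlay = null;
  }

  // Called by the implementation when the seek has completed.
  onSeekCompleted(): void {
    this.seekTarget = null;
  }

  // Stop everything; a play() that is still pending is rejected.
  kill(): void {
    this.playing = false;
    this.impl.setPlaying(false);
    if (this.settlePlay) this.settlePlay.reject(new Error('player killed'));
    this.playPromise = null;
    this.settlePlay = null;
  }
}
```

Because the implementation only reports facts (it is playing, the seek completed), the promise handling and the position logic live in one place and behave the same for every player.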

Player implementations are contained in separate packages, and they all extend BasePlayer. We currently have the following players:

  • HTML5Player — This is the simplest player. It takes a URL and a MIME type, which are passed directly to a media element.
  • HLSMSEPlayer — This extends HTML5Player, and it takes a Playlist object, which is responsible for providing segment data. This player uses MSE.
  • ChromecastPlayer — This player is a proxy to controlling a Chromecast.
  • ProxyPlayer — This player can control another player, which can be switched on the fly. It also has some configuration related to the direction to sync when a new player is provided. One of the benefits of this player is that it can be provided to apps synchronously, even when the real player isn’t available yet. Then, once the real player is available, its state will be synced to match the proxy. Some other use cases of this are switching between playback on a Chromecast and locally, or switching qualities. The app only has to interact with one player, and the switch can happen behind the scenes.
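
The proxy idea in the last item can be sketched in a few lines. Everything here is hypothetical: PlayerLike is a deliberately tiny interface, ProxySketch and setTarget are invented names, and the sketch only syncs in one direction (from the proxy to the new player), whereas the real ProxyPlayer has configuration for the sync direction.

```typescript
// A deliberately small player interface for this sketch.
interface PlayerLike {
  play(): void;
  pause(): void;
  seek(position: number): void;
  getPosition(): number;
}

class ProxySketch implements PlayerLike {
  private target: PlayerLike | null = null;
  private wantsToPlay = false;
  private position = 0;

  // Each call is remembered and forwarded if a real player is attached.
  play(): void {
    this.wantsToPlay = true;
    if (this.target) this.target.play();
  }

  pause(): void {
    this.wantsToPlay = false;
    if (this.target) this.target.pause();
  }

  seek(position: number): void {
    this.position = position;
    if (this.target) this.target.seek(position);
  }

  getPosition(): number {
    return this.target ? this.target.getPosition() : this.position;
  }

  // Swap in a real player and bring it in line with the proxy's state.
  setTarget(next: PlayerLike): void {
    if (this.target) {
      // Carry over the position of the player being replaced.
      this.position = this.target.getPosition();
      this.target.pause();
    }
    this.target = next;
    next.seek(this.position);
    if (this.wantsToPlay) {
      next.play();
    } else {
      next.pause();
    }
  }
}
```

An app can call play() and seek() on the proxy before any real player exists, and the first player passed to setTarget() picks up from that state.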

State Management and Events

There is a lot of playback state to manage, and in Maestro, most of this is contained inside BasePlayer. Users also want to know when parts of the state change and will sometimes react to changes by performing other player actions. This introduces some complexities when we are running on a single thread. Sometimes we also want to update several parts of the state atomically (across multiple functions). For example, if the user seeks to the end of the media, we also want to set the ended flag to true. The logic that updates the ended flag is not tied to the seeking logic in code, but the updates to the seeking state and the ended state should appear together in the API.

To achieve this, we built a component called StateManager, which enables us to:

  • Update multiple parts of the state across functions before calling out to inform the user of the changes.
  • Notify the user of state changes at the end of the player call stack, so that any further calls they make into the player in response are not interleaved with work that is still in progress. (For example, do the work and then fire the event, instead of firing the event and then doing the work.)

StateManager

The state manager maintains a state object. All changes to this object are made using an update() method, and a callback can be provided, which is then notified of any state changes that happened inside the last update(). These calls can be nested:

type ChangesCallback<State> = (
  changes: Readonly<Partial<State>>,
  state: Readonly<State>
) => void;
type Control = {
  remove: () => boolean;
};
type Subscriber<State> = {
  callback: ChangesCallback<State>;
  localState: State;
};

class StateManager<State extends { [key: string]: Object | null }> {
  private _state: State;
  private _subscribers: Array<Subscriber<State>> = [];
  private _updating = false;

  constructor(initialState: State) {
    this._state = clone(initialState);
    // ...
  }

  public update(callback: (state: State) => void): void {
    const wasUpdating = this._updating;
    this._updating = true;

    try {
      callback(this._state);
    } catch (e) {
      // error handling...
    }

    if (!wasUpdating) {
      this._updating = false;
      this._afterUpdate();
    }
  }

  public subscribe(callback: ChangesCallback<State>, skipPast = true): Control {
    // ...
  }

  private _afterUpdate(): void {
    this._subscribers.slice().forEach(subscriber => {
      const diff = this._calculateDiff(subscriber.localState);
      // We always recalculate the diff just before calling a subscriber,
      // which means that the state is always up to date at the point when
      // the subscriber is called.
      if (Object.keys(diff).length) {
        subscriber.localState = clone(this._state);
        deferException(() => subscriber.callback(diff, subscriber.localState));
      }
    });
  }

  private _calculateDiff(compare: State): Readonly<Partial<State>> {
    // ...
  }
}
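
The class above leaves clone and _calculateDiff elided. One possible shape for them is sketched below, purely as an assumption: a shallow copy and a shallow key-by-key comparison, which is sufficient when state values are primitives or are treated as immutable. The real helpers may well differ.

```typescript
type Dict = { [key: string]: unknown };

// Shallow copy. Sufficient when state values are primitives or are never
// mutated in place (an assumption of this sketch).
function clone<T extends object>(value: T): T {
  return Object.assign({}, value);
}

// Return only the keys whose values differ between a subscriber's last
// seen state and the current state.
function calculateDiff(previous: Dict, current: Dict): Dict {
  const diff: Dict = {};
  Object.keys(current).forEach(key => {
    if (previous[key] !== current[key]) {
      diff[key] = current[key];
    }
  });
  return diff;
}

// A subscriber last saw `seen`; two properties have changed since then.
const seen = { position: 0, playing: false, ended: false };
const now = clone(seen);
now.position = 12;
now.playing = true;
const changes = calculateDiff(seen, now);
```

Here changes contains only position and playing, which is why a subscriber receives a partial object rather than the full state.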

Example Usage

type OurState = { a: number; b: string; c: boolean; d: number };
const stateManager = new StateManager<OurState>({
  a: 1,
  b: 'something',
  c: true,
  d: 2
});

stateManager.subscribe(({ a, b, c, d }) => {
  // On first execution:
  // a === 2
  // b === 'something else'
  // c === false
  // d === undefined

  // On second execution:
  // a === undefined
  // b === undefined
  // c === undefined
  // d === 3
  updateD();
});

stateManager.subscribe(({ a, b, c, d }) => {
  // a === 2
  // b === 'something else'
  // c === false
  // d === 3
});

doSomething();

function doSomething() {
  stateManager.update(state => {
    state.a = 2;
    updateB();
    state.c = false;
  });
}

function updateB() {
  stateManager.update(state => {
    state.b = 'something else';
  });
}

function updateD() {
  stateManager.update(state => {
    state.d = 3;
  });
}

Note that the first subscribe callback will be executed twice, but the second will only be executed once, and only with the latest state (i.e. d === 3).

Also note that we don’t get nested call stacks, because the callbacks are only executed once the work is finished.

Browser Limitations

Unfortunately, different browsers have different codec support (which can also depend on the OS) and different container requirements.

Chrome, for example, supports a raw MP3 file in MSE, but Firefox requires the MP3 to be in an MP4 container. This means that in Firefox, we need to package the MP3 that we download into an MP4 in the browser. Other codecs have similar complexities.
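
A decision like this can be driven by feature detection. The sketch below is an assumption about how such a check could look, not Maestro's code: chooseMp3Delivery is a hypothetical helper, and the check itself is injected so that the function can run outside the browser. The MIME strings are the ones commonly used for raw MP3 and for MP3 inside an MP4 container.

```typescript
// Reports whether MSE can create a buffer for a MIME type. In a browser
// this would be type => MediaSource.isTypeSupported(type).
type TypeCheck = (mimeType: string) => boolean;

type Mp3Delivery = 'raw' | 'mp4' | 'unsupported';

// Prefer appending raw MP3. If that is not possible, fall back to
// repackaging the MP3 into an MP4 container before appending.
function chooseMp3Delivery(isTypeSupported: TypeCheck): Mp3Delivery {
  if (isTypeSupported('audio/mpeg')) return 'raw';
  if (isTypeSupported('audio/mp4; codecs="mp3"')) return 'mp4';
  return 'unsupported';
}
```

The 'unsupported' result is the point at which a player could degrade to plain <audio /> playback or to another codec.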

It’s also inevitable that there are bugs. Supporting a media processing pipeline that can handle a huge variety of media, without breaking backward compatibility, in a secure way, and in a web browser, is a huge task! Luckily, Maestro is able to handle workarounds for various bugs in different browsers, some of which differ under the hood between versions.

Autoplay policies also differ between browsers, and this means we currently have to share media elements between players. This adds complexity because when the source of an element is changed, there are still events emitted for the previous source for a short time after, meaning we have to wait for the “emptied” event before attempting to use it, and we have to keep track of everything that was requested in the meantime. Maestro’s HTML5Player makes this simple with provideMediaElement(mediaEl) and revokeMediaElement(). This allows you to move a media element between players at runtime. When a player doesn’t have a media element, the player is simply stalled.

Testing

The BasePlayer and player implementations are covered by unit tests and integration tests: We use Mocha, Sinon, and Karma, along with mocha-screencast-reporter. The latter is great for viewing the progress of tests running remotely.

The BasePlayer alone currently has more than 700 tests, which ensures that the API behaves correctly. One test, for example, checks that the play() promise is resolved when the implementation reports it is playing. Another checks that play() is rejected with the correct error if the player is killed before the play request completes. There are also tests that check that the player errors if an inconsistency is detected — such as a player implementation reporting that a seek request could not be completed when the BasePlayer never requested a seek operation.

We also run all the tests on a variety of browsers and browser versions (including Chrome and Firefox beta) using SauceLabs. This takes several hours to complete, and so we test the major browsers for pull requests, and then we test everything just before a release. We also run all the tests weekly to ensure that there are no issues arising with new browser versions. Doing so once highlighted a bug with Web Audio in Firefox beta which would cause playback to freeze after the first few seconds.

Progressive Streaming (with the fetch() API)

We recently added support for progressive streaming (in supported browsers). This means that instead of having to wait for an entire segment to be downloaded before we process it and append it to the buffer, we’re able to process the data as it arrives, meaning we’re able to start playback before the segment download has finished.

This was made possible with the fetch() API (and moz-chunked-arraybuffer in Firefox), which provides small parts of the data while it is still being downloaded:

fetch(new Request(url)).then(({ body }) => {
  return body.pipeTo(
    new WritableStream({
      write: chunk => {
        console.log('Got part', chunk);
      },
      abort: () => {
        console.log('Aborted');
      },
      close: () => {
        console.log('Got everything');
      }
    })
  );
});

Before we added progressive streaming, if a download failed, we would just retry it, and this logic was pretty self-contained. With progressive streaming, it is more complex because, if a download fails part-way through, the entire pipeline has already started working on the data. We decided to retry the request on an error and discard all bytes that we have already seen. If the retry fails, then we are able to signal the error down the pipeline.
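
The retry strategy can be sketched with an async generator. This is a minimal illustration under assumptions, not the production code: OpenStream, dropSeen, and streamWithRetry are hypothetical names, the retry starts the download from the first byte again, and the number of retries is a parameter that defaults to one.

```typescript
// Opens a fresh download and yields its chunks as they arrive. A stand-in
// for a fetch()-based reader.
type OpenStream = () => AsyncIterable<Uint8Array>;

// Drop the first `skip` bytes of a chunk because they were already
// delivered by an earlier attempt.
function dropSeen(chunk: Uint8Array, skip: number): Uint8Array {
  return skip >= chunk.length ? new Uint8Array(0) : chunk.subarray(skip);
}

async function* streamWithRetry(
  open: OpenStream,
  maxRetries = 1
): AsyncGenerator<Uint8Array> {
  let delivered = 0; // Bytes already passed down the pipeline.
  for (let attempt = 0; ; attempt++) {
    let position = 0; // Bytes read so far in this attempt.
    try {
      for await (const chunk of open()) {
        const fresh = dropSeen(chunk, Math.max(0, delivered - position));
        position += chunk.length;
        if (fresh.length > 0) {
          delivered += fresh.length;
          yield fresh;
        }
      }
      return;
    } catch (error) {
      // Out of retries: let the error travel down the pipeline.
      if (attempt >= maxRetries) throw error;
    }
  }
}
```

The rest of the pipeline never sees a byte twice, so it does not need to know that a retry happened at all.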

This also brings about more complexity. Before, we knew that each segment contained a whole number of valid units of audio, meaning the different parts of the pipeline could make certain assumptions. Now each chunk of data can contain fractions of units of audio, so we need to be able to detect when this happens and keep a buffer that waits for a complete unit to arrive.
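
Buffering until a complete unit arrives can be sketched as below. The UnitAssembler class and the UnitLength callback are hypothetical, and the length-prefixed framing is a toy format chosen to keep the example short. Real codecs derive the unit length from their own headers.

```typescript
// Given the bytes buffered so far, return the total length of the next
// unit, or null when more bytes are needed to tell.
type UnitLength = (buffered: Uint8Array) => number | null;

class UnitAssembler {
  private buffered = new Uint8Array(0);
  private unitLength: UnitLength;

  constructor(unitLength: UnitLength) {
    this.unitLength = unitLength;
  }

  // Add a chunk and return every unit that is now complete.
  push(chunk: Uint8Array): Uint8Array[] {
    // Join the leftover bytes from earlier chunks with the new chunk.
    const joined = new Uint8Array(this.buffered.length + chunk.length);
    joined.set(this.buffered, 0);
    joined.set(chunk, this.buffered.length);

    const units: Uint8Array[] = [];
    let offset = 0;
    for (;;) {
      const length = this.unitLength(joined.subarray(offset));
      if (length === null) break;
      if (length <= 0) throw new Error('invalid unit length');
      if (offset + length > joined.length) break;
      units.push(joined.slice(offset, offset + length));
      offset += length;
    }
    // Keep the incomplete tail for the next push.
    this.buffered = joined.slice(offset);
    return units;
  }
}

// Toy framing for the example: the first byte is the payload length.
const lengthPrefixed: UnitLength = bytes =>
  bytes.length === 0 ? null : 1 + bytes[0];

const assembler = new UnitAssembler(lengthPrefixed);
const afterFirst = assembler.push(new Uint8Array([2, 10])); // Incomplete.
const afterSecond = assembler.push(new Uint8Array([11, 1])); // Completes one.
const afterThird = assembler.push(new Uint8Array([20])); // Completes another.
```

Downstream stages can then keep assuming that they only ever receive whole units, as they did before progressive streaming.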

What’s Next?

We have run Maestro in production since June 2017, and we’ve had minimal reports of playback issues. We are able to monitor the performance and errors in real time, and in cases where errors occur, we are able to retrieve playback logs, which help with debugging.

We are looking for where to take Maestro next, and that’s where you come in: Let us know how you would use it and what you would like to see :D

If you have any questions regarding this post, or you notice any playback problems on soundcloud.com ;), please get in touch!