SoundCloud for Developers

Discover, connect and build

Backstage Blog

  • December 2nd, 2014 Scalding Hadoop SoundCloud in Scalding case study by Concurrent Inc. By Josh Devins

    Recently we teamed up with Concurrent Inc., the backers of the data-processing framework Cascading, to do a case study of how we use Scalding for some of our data-driven products such as Search. Scalding enables us to iterate quickly, test easily, and loosely couple some of our data-processing pipelines.

    Check back for future posts about our use of other data-processing tools, and frameworks such as Spark.

  • November 17th, 2014 Announcements API XML responses deprecated By Erik Michaels-Ober

    The SoundCloud API will be dropping support for Extensible Markup Language (XML) responses. XML will be phased out on the following schedule:

    1. XML is currently the default response format for requests without an explicit format specified in the path (e.g. /tracks) or Accept header. This default will be changed to JSON on December 1, 2014.
    2. Explicit requests for XML — specified either in the path (e.g. /tracks.xml) or an Accept: application/xml header — will continue to be supported until December 15, 2014. After that point, only JSON responses will be supported.

    SoundCloud has been using JSON exclusively for internal APIs for several years. Dropping support for XML in our public API will allow us to focus on providing consistent and reliable service.

    If your app still uses XML responses, please start working to upgrade it to JSON immediately. If your app does not currently use XML responses, it should be unaffected by this change.

    If you are unable to migrate your app from XML to JSON for some reason, we recommend accessing the SoundCloud API through a proxy server that converts JSON to XML.

    If you have any questions about this update, please let us know via email.

  • September 15th, 2014 iOS Mobile Building the new SoundCloud iOS application — Part II: Waveform rendering By Richard Howell

    When we rebuilt our iOS app, the player was the core focus, and the interactive waveform was at the center of the design. It was important that the player both be fast and look good.

    Initial implementation

    We iterated on the waveform view until it was as responsive as possible. The initial implementation focused on replicating the design and relied heavily on CoreGraphics. A single custom view calculates the current bar offset based on its time property and then draws each of the waveform samples in the currently visible section of the waveform. Unplayed samples are rendered as plain filled rectangles; played samples are rendered by adding clip paths to the context and then drawing a linear CGGradient.
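
    As a rough sketch of that approach (the helper methods and properties used here, such as barIndexForTime:, rectForBarAtIndex:, unplayedColor, and playedGradient, are illustrative rather than the actual implementation):

    - (void)drawRect:(CGRect)rect
    {
        CGContextRef context = UIGraphicsGetCurrentContext();
        NSUInteger playedBars = [self barIndexForTime:self.playbackTime]; // hypothetical helper

        for (NSUInteger i = self.firstVisibleBar; i <= self.lastVisibleBar; i++) {
            CGRect barRect = [self rectForBarAtIndex:i]; // hypothetical helper

            if (i >= playedBars) {
                // Unplayed samples: a plain filled rectangle.
                CGContextSetFillColorWithColor(context, self.unplayedColor.CGColor);
                CGContextFillRect(context, barRect);
            } else {
                // Played samples: clip to the bar, then draw a linear gradient.
                CGContextSaveGState(context);
                CGContextClipToRect(context, barRect);
                CGContextDrawLinearGradient(context, self.playedGradient,
                                            CGPointMake(CGRectGetMidX(barRect), CGRectGetMinY(barRect)),
                                            CGPointMake(CGRectGetMidX(barRect), CGRectGetMaxY(barRect)),
                                            0);
                CGContextRestoreGState(context);
            }
        }
    }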

    To handle progress updates, we used the callback block on AVPlayer:

    [player addPeriodicTimeObserverForInterval:CMTimeMake(1, 60)
                                         queue:dispatch_get_main_queue()
                                    usingBlock:^(CMTime time) {
        [waveformView setPlaybackTime:CMTimeGetSeconds(time)];
    }];
    

    This generates 60 callbacks per second to try to achieve the highest possible frame rate. This solution worked perfectly during testing in the simulator. Unfortunately, testing on an iPhone 4 did not produce similar results:

    [Figure: CPU rendering Instruments profile]

    The preceding trace shows an initial frame rate of 17 FPS that drops to 10 FPS over time. As playback progresses, there are more played waveform samples, which require expensive gradient fills and therefore take more CPU time than unplayed samples. The CPU profiler shows that 90% of CPU time is spent in the drawRect: method of our waveform view.

    Reducing waveform renders

    We needed a new approach that did not require redrawing the entire waveform 60 times a second. The waveform samples never need to change in size, but only in tint color. This makes it possible to render the waveform once, and apply it as an alpha mask to a layer filled with a gradient on the left half, and white on the right half.

    [Figure: waveform mask with background colors]

    This increases the FPS to the maximum of 60 with approximately 50% GPU usage. The CPU usage dropped to approximately 10%.
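
    Sketched out, the layer setup might look roughly like this (waveformImage, the colors, and the flat layer structure are placeholders; the real view hierarchy is more involved):

    CGFloat width = CGRectGetWidth(self.bounds);
    CGFloat height = CGRectGetHeight(self.bounds);

    // Fixed background: gradient on the played (left) half, white on the unplayed (right) half.
    CAGradientLayer *playedLayer = [CAGradientLayer layer];
    playedLayer.frame = CGRectMake(0, 0, width / 2, height);
    playedLayer.colors = @[(__bridge id)[UIColor orangeColor].CGColor,
                           (__bridge id)[UIColor yellowColor].CGColor];

    CALayer *unplayedLayer = [CALayer layer];
    unplayedLayer.frame = CGRectMake(width / 2, 0, width / 2, height);
    unplayedLayer.backgroundColor = [UIColor whiteColor].CGColor;

    [self.layer addSublayer:playedLayer];
    [self.layer addSublayer:unplayedLayer];

    // The waveform is rendered once into an image and applied as an alpha mask,
    // so only the waveform-shaped part of the colored background shows through.
    CALayer *maskLayer = [CALayer layer];
    maskLayer.frame = self.bounds;
    maskLayer.contents = (__bridge id)self.waveformImage.CGImage; // hypothetical pre-rendered image
    self.layer.mask = maskLayer;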

    Using system animations

    We can now stop using the AVPlayer timer callback for waveform progress and use a standard UIView animation instead:

    [UIView animateWithDuration:CMTimeGetSeconds(player.currentItem.duration)
                          delay:0
                        options:UIViewAnimationOptionCurveLinear
                     animations:^{
                        self.waveformView.progress = 1;
                    }
                    completion:nil];
    

    This drops CPU usage down to approximately 0%, without noticeably increasing the GPU usage. This would seem to be an obvious win; unfortunately, it introduces some visual glitches. Watching the animation on longer tracks shows an amplitude phasing effect on the waveform samples.

    [Figures: pixel-aligned waveform rendering (waveform on a pixel boundary) vs. interpolated waveform rendering (waveform between pixels)]

    While the mask is animated across the waveform view, it is not drawn only at integer pixel positions. When the mask lands at a fractional pixel position, its edges are interpolated across the neighboring pixels. This interpolation reduces the alpha of the mask at the edge of each sample, which in turn reduces the amplitude of the visible waveform samples. The overall effect is that, as the mask is translated across the view, the waveform samples animate between a sharp, dark state and a lighter, blurry state.

    We need a way to tell CoreAnimation to only move the mask to pixel-aligned positions. This is possible by using a keyframe animation that is set to discrete calculation mode, which does not interpolate between each keyframe.

    A high-level API for keyframe animations was added in iOS 7, but the CoreAnimation interface is a bit clearer in this case:

    - (CAAnimation *)keyframeAnimationFrom:(CGFloat)start to:(CGFloat)end
    {
        CAKeyframeAnimation *animation =
            [CAKeyframeAnimation animationWithKeyPath:@"position.x"];
    
        CGFloat scale = [[UIScreen mainScreen] scale];
        CGFloat increment = copysign(1, end - start) / scale;
        NSUInteger numberOfSteps = ABS((end - start) / increment);
    
        NSMutableArray *positions =
            [NSMutableArray arrayWithCapacity:numberOfSteps];
        for (NSUInteger i = 0; i < numberOfSteps; i++) {
            [positions addObject:@(start + i * increment)];
        }
    
        animation.values = positions;
        animation.calculationMode = kCAAnimationDiscrete;
        animation.removedOnCompletion = YES;
        return animation;
    }
    

    Adding this animation to the mask layer produces the same animation, but only at pixel-aligned positions. This also has a performance benefit: for longer tracks, the frame rate drops to the maximum needed for the number of animation keyframes, which reduces device utilization accordingly. In this case, the waveform animates over a distance of 680 pixels. If the played track is 10 minutes long, that is 680 animation pixels over 600 seconds, or roughly one pixel position per second. CoreAnimation schedules the animation frames intelligently, so we only get a 1 FPS animation with a correspondingly reduced device utilization of about 1%.
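
    Hooking this up is then a matter of adding the animation to the mask layer (maskLayer and finalPositionX are placeholders for however the view tracks its mask and target offset):

    CAAnimation *animation = [self keyframeAnimationFrom:maskLayer.position.x
                                                      to:finalPositionX];
    animation.duration = CMTimeGetSeconds(player.currentItem.duration);
    [maskLayer addAnimation:animation forKey:@"waveformProgress"];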

    Increasing up-front cost for cheaper animations

    The CPU performance is at an acceptable level, but 50% GPU usage is still high. Most of this is due to the cost of masking, which requires a texture blend on the GPU. Because we fetch a waveform's samples asynchronously for each track, the initial render time for the view matters less than performance during playback. Instead of masking the waveform samples, we can draw them twice: once in the unplayed state and once in the played state with the gradient applied. Each copy lives in a separate view that clips its subviews. To adjust the waveform position within these windows, the bounds origin of each clip view can be adjusted to slide over its content, similar to how a UIScrollView works.

    [Figure: clip view hierarchy]

    You can offset the bounds origin of the left clip view by its width. The result is a seamless view that looks exactly the same as the masking effect. Because we are only moving views around in the hierarchy, no redrawing needs to occur, and the same keyframe animations can be applied to each of the clip views. After applying these changes, the GPU utilization drops to less than 20% for a 60 FPS animation. This was fast enough, and it is our final iteration of the waveform renderer.
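
    For illustration, the bounds-origin trick boils down to something like the following (playedClipView and unplayedClipView are hypothetical names for the two clipping containers):

    // Each clip view holds a full copy of the waveform (played or unplayed style)
    // and clips it to its own bounds, much like a UIScrollView windowing its content.
    self.playedClipView.clipsToBounds = YES;
    self.unplayedClipView.clipsToBounds = YES;

    // Shift the played copy left by the clip view's width so its visible portion
    // lines up with the unplayed copy, producing one seamless waveform.
    CGRect playedBounds = self.playedClipView.bounds;
    playedBounds.origin.x = CGRectGetWidth(playedBounds);
    self.playedClipView.bounds = playedBounds;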

    Conclusion

    Understanding the performance cost of drawing techniques on iOS can be tricky. The best way to achieve acceptable performance is to start simple, profile often, and iterate until you hit your target. We significantly improved performance by profiling and identifying the bottleneck at each step, and most of the drawing code from our initial naïve implementation survived to the final performant iteration.

    We obtained the biggest win by shifting work from the CPU to the GPU. Any CoreGraphics drawing done in drawRect: uses the CPU to fill the layer's contents. This is often unavoidable, but if the content seldom changes, the layer contents can be cached and manipulated by CoreAnimation on the GPU. Once drawing is reduced to manipulating UIView properties, nearly all of the work can be performed using animations, thereby reducing the number of view-state updates done on the CPU.

    It is important to consider GPU usage, but this is harder to understand intuitively. Profiling with Instruments provides helpful insight, especially the OpenGL ES Driver template, which shows animation FPS and the percentage utilization of the GPU. The main tricks here are to reduce blending and masking, ideally using opaque layers wherever possible. The simulator option "Color Blended Layers" can be useful to identify where you have unnecessary overdraw. For more details see the Apple documentation and WWDC videos.

  • July 7th, 2014 iOS Mobile Building the new SoundCloud iOS application — Part I: The reactive paradigm By Mustafa Sezgin & Jan Berkel

    Recently, SoundCloud launched its new iOS application, which was a complete rewrite of the existing one. The Mobile engineering team saw this as an opportunity to build a solid foundation for the future of SoundCloud on iOS and to experiment with new technologies and processes at the same time.

    In the world of mobile, you deal with data, errors, threads, and concurrency a lot. The common scenario starts with a user tapping on the screen. The application jumps off the main UI thread, does some I/O work and some database-related operations along with some transformations of the data, and then jumps back to the UI thread to render the new information onto the screen.

    Both the Android and iOS platforms provide tools to deal with the different aspects of this scenario, yet they are far from ideal. Some of them do not provide much in terms of error handling, which forces you to write boilerplate code, while others force you to deal with low-level concurrency primitives. You might also have to add additional libraries to your project so that you do not have to write filtering and sorting predicate code yourself.

    We knew early on that we wanted to avoid these issues, and this is how the functional reactive paradigm and Reactive Cocoa came into the picture.

    In short, Reactive Cocoa allows you to create composable, event-driven (finite or infinite) streams of data while making use of functional composition to perform transformations on those streams. Erik Meijer is best known in this space for his Reactive Extensions on the .NET platform, which also spawned the JVM-based implementation RxJava. By adopting this paradigm, we now have a uniform way of dealing with data models and with operators that apply transformations to those data models, while the framework takes care of low-level concurrency primitives so that one does not have to worry about threads or the difficult task of concurrent programming.

    Let's take an example

    Like most mobile applications, the SoundCloud iOS application is a typical case of an API client with local storage. It fetches JSON data from the API via HTTP and parses it into API model objects. The persistent store technology we use is Core Data. We decided early on that we wanted to isolate the API from our storage representation so there is a final mapping step involved where we convert API models to Core Data models.

    We break this down into three smaller units of work:

    1. Execute the network request.
    2. Parse the JSON response into API model objects.
    3. Persist the API models as Core Data models.

    1. Executing the network request

    For simplicity, assume that the network-access layer implements the following method:

    - (RACSignal *)executeRequest:(NSURL *)url;
    

    We do not pass in any delegates or callback blocks; we just give it an NSURL and get back a RACSignal representing a possibly asynchronous operation, or future. To obtain the data from that operation, we can subscribe to the signal using subscribeNext:error:completed:

    - (RACDisposable *)subscribeNext:(void (^) (id result))nextBlock
                               error:(void (^) (NSError *error))errorBlock
                           completed:(void (^) (void))completedBlock
    

    You might recognize the familiar-looking error and success callback blocks from other asynchronous APIs. This is where some of Reactive Cocoa's and FRP's strengths lie, as we shall see later.
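
    For example, a caller could subscribe to the request signal like this (the URL and the logging are purely illustrative):

    NSURL *url = [NSURL URLWithString:@"https://api.soundcloud.com/tracks/123"];

    [[requestHandler executeRequest:url] subscribeNext:^(id json) {
        NSLog(@"received response: %@", json);   // whatever the signal sends next
    } error:^(NSError *error) {
        NSLog(@"request failed: %@", error);     // any network error
    } completed:^{
        NSLog(@"request completed");             // the signal finished successfully
    }];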

    2. Parsing the JSON response

    After the network request has been made, the JSON response needs to be parsed into an API model representation. For this we use a thin layer around GitHub's Mantle library, which wraps parsing in a RACSignal and pushes the result (or error) to the subscriber:

    - (RACSignal *)parseResponse:(id)data
    {
      return [RACSignal createSignal:^RACDisposable *(id<RACSubscriber> subscriber) {
        NSError *error = nil;
        id apiModel = [MTLJSONAdapter modelOfClass:ApiTrack.class
                                fromJSONDictionary:data
                                             error:&error];
        if (error) {
          [subscriber sendError:error];
        } else {
          [subscriber sendNext:apiModel];
          [subscriber sendCompleted];
        }
        return nil;
      }];
    }
    

    To achieve the composition of operations that we have mentioned earlier, we wrapped the functionality of existing libraries with signals, where appropriate.

    3. Persisting the API model with Core Data

    In our architecture, the database represents the single source of truth. Therefore to show tracks to the user we first need to store them as Core Data objects. We have a collection of adapter classes that are responsible for mapping API model objects to Core Data model objects. An ApiTrackAdapter might look as follows:

    - (RACSignal *)adaptObject:(ApiTrack *)apiTrack
    {
      return [[self findOrBuildTrackWithUrn:apiTrack.urn]
                               map:^(CoreDataTrack *coreDataTrack) {
          coreDataTrack.title = apiTrack.title;
          // set other properties
          return coreDataTrack;
      }];
    }
    

    Putting it all together

    We now have the building blocks to issue a network request, parse the JSON, and store it as a Core Data object. RAC makes it very easy to compose the individual methods functionally by feeding the output of each operation as an input to the next one. The following example uses flattenMap:

    -(RACSignal *)loadAndStoreTrack:(NSURL *)url
    {
      return [[requestHandler executeRequest:url] flattenMap:^(id json) {
        return [[parser parseResponse:json] flattenMap:^(ApiTrack *track) {
          return [adapter adaptObject:track];
        }];
      }];
    }
    

    The flattenMap: method maps or transforms the values emitted by a signal and produces a new signal as a result. In this example, the newly created signal returned by loadAndStoreTrack: either delivers the adapted Core Data track object or errors if any of the operations failed. In addition to flattenMap:, there is a whole range of predefined functional operators, like filter: or reduce:, that can be applied to signals.
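
    As a small illustration of such operators, filtering and mapping compose in the same way (tracksSignal and the ApiTrack properties used here are assumptions for the example):

    // Keep only public tracks and map them to their titles.
    RACSignal *publicTitles = [[tracksSignal
        filter:^BOOL(ApiTrack *track) {
            return track.isPublic;
        }]
        map:^id(ApiTrack *track) {
            return track.title;
        }];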

    RAC Schedulers

    We left out one powerful feature of RAC: the ability to parametrize concurrency. To ensure that the application stays responsive, we want to perform the network I/O and model parsing on a background queue.

    Core Data operations are different; we do not have a choice there. They have to be executed on a predefined private queue, otherwise we risk creating deadlocks in our application.
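
    As a reminder of that contract, a private-queue context only accepts work through performBlock: (a generic Core Data sketch, not our actual setup):

    NSManagedObjectContext *context =
        [[NSManagedObjectContext alloc] initWithConcurrencyType:NSPrivateQueueConcurrencyType];

    [context performBlock:^{
        // All Core Data work for this context happens on its private queue;
        // touching the context from any other queue risks the deadlocks mentioned above.
        NSError *error = nil;
        [context save:&error];
    }];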

    With the help of RACScheduler we can easily control where the side-effects of a signal are performed by simply calling subscribeOn: on it with a custom scheduler implementation:

    -(RACSignal *)loadAndStoreTrack:(NSURL *)url
    {
      return [[requestHandler executeRequest:url] flattenMap:^(id json) {
        return [[parser parseResponse:json] flattenMap:^(ApiTrack *track) {
          return [[adapter adaptObject:track]
                           subscribeOn:CoreDataScheduler.instance];
        }];
      }];
    }
    

    Here, we use a scheduler that is aware of the current Core Data context to ensure that adaptObject: is executed on the right queue by wrapping everything internally with performBlock:.

    If we want to update our UI with the title of the track we just fetched, we could do something like the following:

    [[trackService loadAndStoreTrack:trackUrl] subscribeNext:^(Track *track) {
       self.trackView.text = track.title;
    } error:^(NSError *error) {
      // handle errors
    }];
    

    To ensure that this final update happens on the UI thread we can tell RAC to deliver us the information back on the main thread by using the deliverOn: method:

    [[[trackService loadAndStoreTrack:trackUrl]
          deliverOn:RACScheduler.mainThreadScheduler]
      subscribeNext:^(Track *track) {
       self.trackView.text = track.title;
    } error:^(NSError *error) {
      // handle errors
    }];
    

    By breaking down each operation within this common scenario into isolated units of work, it becomes easier to perform the operations we need on the desired threads by taking advantage of Reactive Cocoa's scheduling abilities. The functional reactive paradigm has also helped us to compose these independent operations one after another by using operators such as flattenMap. Although adopting FRP and ReactiveCocoa has had its difficulties, we have learned many lessons along the way.

    Steep learning curve

    Adopting FRP requires a change of perspective, especially for developers who are not used to a functional programming style. Methods do not return values directly; they return intermediate objects (signals) that take callbacks. This can lead to more verbose code, especially when the code is heavily nested, which is common when doing more complex things with RAC.

    Therefore, it is important to have short and well-named methods; for example, a method signature like -(RACSignal *)signal does not communicate anything about the type of values the caller is going to receive.

    Another problem is the sheer number of methods and operators defined on base classes like RACStream and RACSignal. In practice, only a few (like flattenMap: or filter:) are used on a regular basis, but the remaining 80% tend to confuse developers who are new to the framework.

    Memory management

    Memory management can be problematic because of RAC's heavy use of blocks, which can easily lead to retain cycles. These cycles can be avoided by breaking them with weak references to self (the @weakify / @strongify macros).
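
    A typical use of those macros looks like this (reusing the signal and view names from the surrounding examples):

    @weakify(self);
    [[signal deliverOn:RACScheduler.mainThreadScheduler] subscribeNext:^(Track *track) {
        @strongify(self);                   // re-establish a strong reference inside the block
        self.trackView.text = track.title;  // no retain cycle: the block captures self weakly
    }];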

    One of RAC's promises is to reduce the amount of state you need to keep around in your code. This is true, but you still need to manage the state introduced by the framework itself, which comes in the form of RACDisposable, an object returned as the result of a signal subscription. A common pattern we introduced is to bind the lifetime of the subscription to the lifetime of the object with asScopedDisposable:

    self.disposable = [[signal subscribeNext:^(id value) { /* stuff */ }] asScopedDisposable];
    

    Overdoing it

    It is easy to fall into the trap of trying to apply FRP to every single problem one encounters (also known as the golden hammer syndrome), thereby unnecessarily complicating the code. Defining clear boundaries and rules between the reactive and non-reactive parts of the code base is important to minimize verbosity and to use the power of FRP and Reactive Cocoa where appropriate.

    Performance

    There are inherent performance problems within RAC. For example, a simple imperative for loop is guaranteed to execute much faster than flattenMap: which introduces a lot of internal method dispatching, object allocation, and state handling.

    In most cases this overhead is not noticeable, especially when I/O latency is involved, as in the preceding examples.

    However in situations where performance really matters, such as fast UI rendering, it makes sense to avoid RAC completely.

    Debugging

    We found this to be a non-issue if your application components are well designed and individually tested. Backtraces tend to get longer but this can be alleviated with some extra tooling like custom LLDB filters. A healthy amount of debug logging across critical components also does not hurt.

    Testing

    Testing a method that returns a RACSignal is more complicated than testing code that returns plain value objects, but it can be made less painful with a testing library that supports custom matchers. We have created a collection of matchers for expecta that lets us write concise tests. For example:

    RACSignal *signal = [subject executeRequest:url];
    expect(signal).to.sendSingle(@{ @"track": @{ @"title": @"foo" } });
    

    We found that adopting FRP tends to produce easily testable components because they are generally designed to perform one single task, which is to produce an output given a specific input.

    It took a while for the team to get up to speed with FRP and Reactive Cocoa and to learn for which parts of the application it can be used most effectively. Right now it has become an indispensable part of our mobile development efforts, both on Android and iOS. The functional reactive approach has made it easier to build complex functionality out of smaller pieces whilst simplifying concurrency and error handling.

  • July 3rd, 2014 Data Real-time counts with Stitch By Emily Green

    We made Stitch to provide counts and time-series of counts in real-time.

    Stitch was initially developed to provide the timelines and counts for our stats pages, where users can see which of their tracks were played and when.

    [Figure: SoundCloud stats screenshot]

    Stitch is a wrapper around a Cassandra database. It has a web application that provides read-access to the counts through an HTTP API. The counts are written to Cassandra in two distinct ways, and it's possible to use either or both of them:

    Real-time: For real-time updates, Stitch has a processor application that handles a stream of events coming from a broker and increments the appropriate counts in Cassandra.

    Batch: The batch part is a MapReduce job running on Hadoop that reads event logs, calculates the overall totals, and bulk loads the totals into Cassandra.

    The problem

    The difficulty with real-time counts is that incrementing is a non-idempotent operation: applying the same increment twice gives a different value than applying it once. If an incident affects our data pipeline and the counts are wrong, we cannot fix them by simply re-feeding the day's events through the processors; we would risk double counting.

    Our first solution

    Initially, Stitch only supported real-time updates and addressed this problem with a MapReduce job named The Restorator that performed the following actions:

    1. Calculated the expected totals
    2. Queried Cassandra to get the values it had for each counter
    3. Calculated the increments needed to apply to fix the counters
    4. Applied the increments

    Meanwhile, to stop the sand shifting under its feet, The Restorator needed to coordinate a locking system between itself and the real-time processors so that the processors did not try to apply increments to the same counter simultaneously, which would result in a race condition. It used ZooKeeper for this.

    As you can probably tell, this was quite complex, and it could take a long time to run. But despite this, it did indeed work.

    Our second solution

    We got a new use case: a team wanted to run Stitch purely in batch mode. This is when we added the batch layer and took the opportunity to revisit the way Stitch was dealing with the non-idempotent increments problem. We evolved towards a Lambda Architecture-style approach, in which a fast real-time layer giving a possibly inaccurate but immediate count is combined with a slow batch layer giving an accurate but delayed count. The two sets of counts are kept separately and updated independently, possibly even living on different database clusters. It is up to the reading web application to return the right version when queried. At its naïvest, it returns the batch counts instead of the real-time counts whenever they exist.

    [Figure: Stitch architecture diagram]

    Thanks go to Kim Altintop and Omid Aladini who created Stitch, and John Glover who continues to work on it with me.

    If this sounds like the sort of thing you'd like to work on too, check out our jobs page.