SoundCloud for Developers

Discover, connect and build


Backstage Blog

You're browsing posts in the Open Source category

  • July 19th, 2016 Announcements Open Source Monitoring Go Prometheus has come of age – a reflection on the development of an open-source project By Björn "Beorn" Rabenstein

    On Monday this week, the Prometheus authors released version 1.0.0 of the central component of the Prometheus monitoring and alerting system, the Prometheus server. (Other components will follow suit over the coming months.) This is a major milestone for the project. Read more about it on the Prometheus blog, and check out the announcement from the CNCF, which recently accepted Prometheus as a hosted project.

    Read more...

  • June 1st, 2016 Open Source Hadoop Big Data Crunch Data pipelines with Apache Crunch and Java 8 By David Whiting

    With Java 8 now in the mainstream, Scala and Clojure are no longer the only choices for developing readable, functional code for big-data technology on the JVM. In this post we see how SoundCloud leverages Apache Crunch and the new Crunch Lambda module to handle the high-volume data processing tasks at the early stages of our batch data pipeline efficiently, robustly, and simply in Java 8.

    Read more...

  • January 26th, 2015 Announcements Open Source Monitoring Go Prometheus: Monitoring at SoundCloud By Julius Volz, Björn Rabenstein

    In previous blog posts, we discussed how SoundCloud has been moving towards a microservice architecture. Soon we had hundreds of services, with many thousands of instances running and changing at the same time. With our existing monitoring setup, mostly based on StatsD and Graphite, we ran into a number of serious limitations. What we really needed was a system with the following features:

    • A multi-dimensional data model, so that data can be sliced and diced at will, along dimensions like instance, service, endpoint, and method (illustrated in the sketch further below).

    • Operational simplicity, so that you can spin up a monitoring server where and when you want, even on your local workstation, without setting up a distributed storage backend or reconfiguring the world.

    • Scalable data collection and decentralized architecture, so that you can reliably monitor the many instances of your services, and independent teams can set up independent monitoring servers.

    • Finally, a powerful query language that leverages the data model for meaningful alerting (including easy silencing) and graphing (for dashboards and for ad-hoc exploration).

    All of these features existed in various systems. However, we could not identify a system that combined them all until a colleague started an ambitious pet project in 2012 that aimed to do so. Shortly thereafter, we decided to develop it into SoundCloud's monitoring system: Prometheus was born.
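    As a hedged illustration of what that multi-dimensional data model looks like from the instrumenting side, here is a minimal sketch using the Go client library; the metric and label names are ours, purely for illustration:

    package main

    import (
        "net/http"

        "github.com/prometheus/client_golang/prometheus"
        "github.com/prometheus/client_golang/prometheus/promhttp"
    )

    // A counter partitioned by labels: every distinct label combination
    // becomes its own time series, which is what lets you slice and dice
    // along dimensions like service, endpoint, and method.
    var requests = prometheus.NewCounterVec(
        prometheus.CounterOpts{
            Name: "http_requests_total",
            Help: "HTTP requests processed.",
        },
        []string{"service", "endpoint", "method"},
    )

    func main() {
        prometheus.MustRegister(requests)

        // In a real handler, the label values come from the request.
        requests.WithLabelValues("stream", "/tracks", "GET").Inc()

        // Expose the metrics for a Prometheus server to scrape.
        http.Handle("/metrics", promhttp.Handler())
        http.ListenAndServe(":8080", nil)
    }

    A query such as sum by (endpoint) (rate(http_requests_total[5m])) then slices along exactly those dimensions.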

    Read more...

  • May 9th, 2014 Announcements Go Open Source Roshi: a CRDT system for timestamped events By Peter Bourgon

    Let's talk about the stream.

    The SoundCloud stream represents stuff that's relevant to you primarily via your social graph, arranged in time order, newest-first. The atom of that data model, an event, is a simple enough thing.

    • Timestamp
    • User who did the thing
    • Identifier of the thing that was done

    For example, consider A-Trak reposting a Skrillex track.
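    In Go, such an event might be sketched like this; the field names and IDs are illustrative only, not Roshi's actual wire format:

    package main

    import "time"

    // Event is the atom of the stream data model.
    type Event struct {
        Timestamp time.Time // when it happened
        UserID    uint64    // user who did the thing
        ThingID   uint64    // identifier of the thing that was done
    }

    func main() {
        // Hypothetical IDs for the repost below.
        const aTrak, skrillexTrack = 42, 4711
        repost := Event{Timestamp: time.Now(), UserID: aTrak, ThingID: skrillexTrack}
        _ = repost
    }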

    If you followed A-Trak, you'd want to see that repost event in your stream. Easy. The difficult thing about time-ordered events is scale, and there are basically two strategies for building a large-scale time-ordered event system.

    Data models

    Fan out on write means everybody gets an inbox.

    Fan out on write

    That's how it works today: we use Cassandra, and give each user a row in a column family. When A-Trak reposts Skrillex, we fan-out that event to all of A-Trak's followers, and make a bunch of inserts. Reads are fast, which is great. But writes carry perverse incentives: the more followers you have, the longer it takes to persist all of your updates. Storage requirements are also quadratic against user growth and follower count (i.e. affiliation density). And mutations, e.g. changes in the social graph, become costly or unfeasible to implement at the data layer. It works, but it's unwieldy in a lot of dimensions.

    At some point, those caveats and restrictions started affecting our ability to iterate on the stream. To keep up with product ideas, we needed to address the infrastructure. And rather than tackling each problem in isolation, we thought about changing the model.

    The alternative is fan in on read.

    Fan in on read

    When A-Trak reposts Skrillex, it's a single append to A-Trak's outbox. When users view their streams, the system will read the most recent events from the outboxes of everyone they follow, and perform a merge. Writes are fast, storage is minimal, and since streams are generated at read time, they naturally represent the present reality. (It also opens up a lot of possibilities for elegant implementations of product features and experiments.)
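    A sketch of that read path, again with hypothetical interfaces; a real implementation would read the outboxes concurrently and merge with a heap, but the sequential version shows the idea:

    import "sort"

    // OutboxReader is a hypothetical stand-in for per-user outboxes.
    type OutboxReader interface {
        ReadOutbox(userID uint64, limit int) []Event
    }

    // fanInOnRead: pull recent events from each followee's outbox,
    // merge newest-first, and cut to the page size.
    func fanInOnRead(outboxes OutboxReader, followees []uint64, limit int) []Event {
        var merged []Event
        for _, f := range followees {
            merged = append(merged, outboxes.ReadOutbox(f, limit)...)
        }
        sort.Slice(merged, func(i, j int) bool {
            return merged[i].Timestamp.After(merged[j].Timestamp) // newest first
        })
        if len(merged) > limit {
            merged = merged[:limit]
        }
        return merged
    }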

    Of course, reads are difficult. If you follow thousands of users, making thousands of simultaneous reads, time-sorting, merging, and cutting within a typical request-response deadline isn't trivial. As far as we know, nobody operating at our scale builds timelines via fan-in-on-read. And we presume that's due at least in part to the challenges of reads.

    Yet we saw potential here. Storage reduction was actually huge: we projected a complete fan-in-on-read data size for all users on the order of a hundred gigabytes. At that size, it's feasible to keep the data set in memory, distributed among commodity servers. The problem then becomes coördination: how do you reliably and correctly populate that data system (writes), and materialize views from up to thousands of sources by hard deadlines (reads)?

    Enter the CRDT

    If you're into so-called AP data systems, you've probably run into the term CRDT recently. CRDTs are conflict-free replicated data types: data structures for distributed systems. The tl;dr on CRDTs is that by constraining your operations to only those which are associative, commutative, and idempotent, you sidestep a lot of the complexity in distributed programming. (See: ACID 2.0 and/or CALM theorem.) That, in turn, makes it straightforward to guarantee eventual consistency in the face of failure.

    With a bit of thinking, we were able to map a fan-in-on-read stream product to a data model that could be implemented with a specific type of CRDT. We were then able to focus on performance, optimizing our reads without becoming overwhelmed by incidental complexity imposed by the consistency model.

    Roshi

    The result of our work is Roshi, a distributed storage system for time-series events. It implements what we believe is a novel CRDT set type, closely resembling a LWW-element-set with inline garbage collection. At its core, it uses the Redis ZSET sorted set to store state, and orchestrates self-repairing reads and writes on top, in a stateless operational layer. We spent a long while optimizing the read path to support our latency and QPS requirements, and we're confident that Roshi will accommodate our exponential growth for years. It took about six developer months to build, and we're in the process of rolling it out now.

    Roshi is fully open-source, and all the gory technical details are in the repository, so please do check it out. I hope it's easy to grok: at the time of writing, it's 5000 lines of Go, of which 2300 are tests. And we intend to keep the codebase lean, explicitly not adding features that are outside of the tightly defined problem domain.

    Open-sourcing our work naturally serves the immediate goal of providing usable software to the community. We hope that Roshi may be a good fit for problems in your organizations, and we look forward to collaborating with anyone who's interested in contributing. Open-sourcing also serves another, perhaps more interesting goal, which is advancing a broader discussion about software development. The obvious reaction to Roshi is to ask why we didn't implement it with an existing, proven data system like Cassandra. But we too often underestimate the costs of doing that: costs like mapping your domain to the generic language of the system, learning the subtleties of the implementation, operating it at scale, and dealing with bugs that your likely novel use cases may reveal. There are even second-degree costs: when software engineering is reduced to plumbing together generic systems, software engineers lose their sense of ownership, which is the foundation of craftsmanship and software quality.

    Given a well-defined problem, a specific solution may be far less costly than a generic version: there's a smaller domain translation, a much smaller surface area, and less operational friction. We hope that Roshi stands in evidence for the case that the practice of software engineering can be a more thoughtful and crafted process. Software that is "invented here" can, in the right circumstances, deliver outstanding business value.

    Roshi was a team effort. I'm deeply indebted to the amazing work of Tomás Senart, Björn Rabenstein, and Johan Uhle, without whom Roshi would have never been possible.

  • March 8th, 2014 iOS Mobile Open Source Ruby Sponsoring CocoaPods By Erik Michaels-Ober

    I’m excited to announce that SoundCloud is sponsoring the development of CocoaPods through a Travis Foundation grant. CocoaPods is an open-source dependency manager for Objective-C projects. Travis Foundation is a non-profit that pairs corporate sponsors with open-source projects to make open source even better.

    At SoundCloud, our iOS team uses CocoaPods every day to manage the dependencies of our mobile apps. We hope that this sponsorship will lead to improvements that benefit the entire Mac and iOS developer ecosystem.

    Watch for updates in the coming weeks on the Travis Foundation blog and CocoaPods blog.

  • October 7th, 2013 Open Source Say hello to Sketchy the spam fighter By Ursula Kallio

    Sketchy is a spam-fighting, open-source software framework developed by SoundCloud engineers Matt Weiden, Rany Keddo, and Michael Brückner. Sketchy reduces malicious user activity on web applications. You can use it to address several common issues:

    • Detect when a user submits text that contains spam content.
    • Detect when a user submits multiple texts that are nearly identical (one common approach is sketched after this list).
    • Rate-limit malicious actions that users perform repeatedly.
    • Check user signatures such as IP addresses against external blacklist APIs.
    • Collect and consolidate reports of spam from users.
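    Sketchy's actual implementation differs, but to illustrate the near-duplicate case mentioned above, one common approach is word shingling plus Jaccard similarity; this Go sketch is ours, not Sketchy's code:

    import "strings"

    // shingles returns the set of n-word windows in a text.
    func shingles(text string, n int) map[string]bool {
        words := strings.Fields(strings.ToLower(text))
        set := map[string]bool{}
        for i := 0; i+n <= len(words); i++ {
            set[strings.Join(words[i:i+n], " ")] = true
        }
        return set
    }

    // jaccard returns |A∩B| / |A∪B|; two submissions whose shingle
    // sets score above some threshold (say 0.8) are "nearly identical".
    func jaccard(a, b map[string]bool) float64 {
        inter := 0
        for s := range a {
            if b[s] {
                inter++
            }
        }
        union := len(a) + len(b) - inter
        if union == 0 {
            return 1 // both texts empty
        }
        return float64(inter) / float64(union)
    }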

    We created Sketchy so that we could fight spam around the clock. Any startup that reaches the scale of SoundCloud has to deal with spammers on a daily basis. We decided to release Sketchy as open-source software so that other companies don’t have to reinvent the wheel. Since we released it, Sketchy has been featured on GigaOM.

    Sketchy has been deprecated and is no longer in use. It worked well for many years but has since been replaced by other services.

  • August 9th, 2013 Contests Open Source Ruby Events Win a trip to the Barcelona Ruby Conference By Erik Michaels-Ober

    Barcelona Ruby Conference, September MMXIII

    The lineup for BaRuCo 2013 looks amazing, with speakers such as Aaron Patterson, Katrina Owen, Sandi Metz, and Ruby’s inventor Yukihiro Matsumoto. The conference is currently SOLD OUT, but we have one extra ticket… and it could be yours!

    If you win the ticket, SoundCloud will fly you from anywhere in the world to Barcelona, Spain and put you up in a nice Catalonian hotel.

    How do you enter to win?

    It’s simple. Just create a command-line interface in Ruby that uses the SoundCloud API. You can use the SoundCloud Ruby SDK, but this is not a requirement. The only constraints are that it is:

    1. a command-line interface,
    2. written in Ruby,
    3. integrated with SoundCloud,
    4. and totally awesome!

    The author of the most creative CLI that uses SoundCloud will win the BaRuCo ticket.

    To submit an entry, push your code to GitHub and send an email that includes a link to the repo to e@soundcloud.com by 12:59:59 UTC on August 26th, 2013. We will announce the winner on August 30th.

    You must be at least 18 years old to enter and allowed to travel to Spain (sorry Edward Snowden).

    Entries will be judged by a team of SoundCloud staff. If you have any questions, email me.

  • August 30th, 2012 API Architecture Open Source Evolution of SoundCloud’s Architecture By Sean Treadway

    This is a story of how we adapted our architecture over time to accommodate growth.

    Scaling is a luxury problem and, surprisingly, has more to do with organization than implementation. With each change we addressed the next order of magnitude of users we needed to support, starting in the thousands; now we’re designing for the hundreds of millions. We identified our bottlenecks and addressed them as simply as possible by introducing clear integration points in our infrastructure to divide and conquer each problem individually.

    By identifying and extracting points of scale into smaller problems, and by having well-defined integration points ready when the time arrived, we were able to grow organically.

    Product conception

    From day one, we had the simple need of getting each idea out of our heads and in front of eyeballs as quickly as possible. During this phase, we used a very simple setup:

    Apache was serving our image/style/behavior resources, and Rails backed by MySQL provided an environment where almost all of our product could be modeled, routed and rendered quickly. Most of our team understood this model and could work well together, delivering a product that is very similar to what we have today.

    We consciously chose not to implement high availability at this point, knowing what it would take when that time hopefully arrived. At this point we left our private beta, revealing SoundCloud to the public.

    Our primary cost optimization was for opportunity, and anything that got in the way of developing the concepts behind SoundCloud was avoided. For example, when a new comment was posted, we blocked until all followers were notified, knowing that we could make that asynchronous later.

    In the early stages we were conscious to ensure we were not only building a product, but also a platform. Our public API was developed alongside our website from the very beginning. We’re now driving the website with the same API we offer to third-party integrations.

    Growing out of Apache

    Apache served us well, but we were running Rails app servers on multiple hosts, and the routing and virtual host configuration in Apache was cumbersome to keep in sync between development and production.

    The web tier’s primary responsibility is to manage and dispatch incoming web requests, as well as to buffer outbound responses so as to free up an application server for the next request as quickly as possible. This meant that the better our connection pooling and content-based routing configuration, the stronger this tier would be.

    At this point we replaced Apache with Nginx and reduced our web tier’s configuration complexity, but our architecture didn’t change.

    Load distribution and a little queue theory

    Nginx worked great, but as we were growing, we found that some workloads took significantly more time than others (on the order of hundreds of milliseconds).

    If a server is busy with a slow request when a fast request arrives, the fast request has to wait until the slow one finishes: the “head-of-line blocking” problem. With multiple application servers, each with its own listen socket backlog, the situation is analogous to a grocery store, where you inevitably stand at one register and watch all the other registers move faster than your own.

    Around 2008, when we first developed the architecture, concurrent request processing in Rails and ActiveRecord was fairly immature. Even though we felt confident that we could audit and prepare our code for concurrent request processing, we did not want to invest the time to audit our dependencies. So we stuck with the model of one request at a time per application server process and ran multiple processes per host.

    In Kendall’s notation, once we’ve sent a request from the web server to the application server, the request processing can be modeled by an M/M/1 queue. The response time of such a queue depends on all prior requests, so if we drastically increase the average work time of one request, the average response time also drastically increases.

    Of course, the right thing to do is to make sure our work times are consistently low for any web request, but we were still in the period of optimizing for opportunity, so we decided to continue with product development and solve this problem with better request dispatching.

    We looked at the Phusion Passenger approach of using multiple child processes per host, but felt that we could easily fill each child with long-running requests. This is like having many queues with a few workers on each, simulating concurrent request processing on a single listen socket.

    This changed the queue model from M/M/1 to M/M/c, where c is the number of child processes available to serve dispatched requests. This is like the queue system found in a post office, or a “take a number, the next available worker will help you” kind of queue. This model reduces the response time by a factor of c for any job that was waiting in the queue, which is better; but assuming we had 5 children, we would only be able to accept an average of 5 times as many slow requests. We were already anticipating a factor of 10 growth over the upcoming months, and had limited capacity per host, so adding only 5 to 10 workers was not enough to address the head-of-line blocking problem.
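    For intuition about how much a larger c helps, here is a back-of-the-envelope sketch using the standard Erlang C formula for M/M/c queues; the traffic numbers are invented for illustration:

    package main

    import "fmt"

    // erlangC: probability that an arriving request must queue in an
    // M/M/c system, with arrival rate lambda and per-worker service
    // rate mu, both in requests per second.
    func erlangC(c int, lambda, mu float64) float64 {
        a := lambda / mu // offered load
        sum, term := 0.0, 1.0
        for k := 0; k < c; k++ {
            sum += term // accumulates a^k / k!
            term *= a / float64(k+1)
        }
        top := term * float64(c) / (float64(c) - a) // term is now a^c / c!
        return top / (sum + top)
    }

    // meanResponse: mean time in system, queue wait plus service time.
    func meanResponse(c int, lambda, mu float64) float64 {
        wait := erlangC(c, lambda, mu) / (float64(c)*mu - lambda)
        return wait + 1/mu
    }

    func main() {
        // Say 80 req/s arrive and each worker completes 10 req/s
        // (100ms mean work time); c*mu must exceed lambda for the
        // queue to be stable at all.
        lambda, mu := 80.0, 10.0
        for _, c := range []int{9, 10, 20, 50} {
            fmt.Printf("c=%2d: mean response %.0f ms\n", c, 1000*meanResponse(c, lambda, mu))
        }
    }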

    We wanted a system that never queued, but if it did queue, the wait time in the queue was minimal. Taking the M/M/c model to the extreme, we asked ourselves “how can we make c as large as possible?”

    To do this, we needed to make sure that a single Rails application server never received more than one request at a time. This ruled out TCP load balancing, because TCP has no notion of an HTTP request/response. We also needed to make sure that if all application servers were busy, the request would be queued for the next available application server. This meant we had to maintain complete statelessness between our servers. We had the latter, but didn’t have the former.

    We added HAProxy into our infrastructure, configuring each backend with a maximum connection count of 1 and adding our backend processes across all hosts. This gave us that wonderful M/M/c reduction in resident wait time, by queuing each HTTP request until any backend process on any host became available. HAProxy became our queuing load balancer, buffering any temporary back-pressure from the application or dependent backend services, which let us defer designing sophisticated queuing into the other components of our request pipeline.

    I heartily recommend Neil J. Gunther’s Analyzing Computer System Performance with Perl::PDQ to brush up on queue theory and strengthen your intuition on how to model and measure queuing systems, from HTTP requests all the way down to your disk controllers.

    Going asynchronous

    One class of request that took a long time was the fan-out of notifications from social activity. For example, when you upload a sound to SoundCloud, everyone who follows you is notified. For people with many followers, doing this synchronously would push request times past tens of seconds. We needed to queue a job to be handled later.

    Around the same time we were considering how to manage our storage growth for sounds and images, and had chosen to offload storage to Amazon S3, keeping transcoding compute in Amazon EC2.

    To coordinate these subsystems, we needed some middleware that would reliably queue, acknowledge, and re-deliver job tickets on failure. We went through a few systems, but in the end settled on AMQP, as implemented by RabbitMQ, because of its programmable topology.

    To reuse the domain logic we already had in the website, we loaded up the Rails environment and built a lightweight dispatcher class with one queue per concern. The queues were namespaced by estimated work time, which created a priority system among our asynchronous workers without adding the complexity of message priorities to the broker: we started one dispatcher process for each class of work, bound to multiple queues in that work class. Most of our queues for asynchronous work performed by the application are namespaced with either “interactive” (under 250ms work time) or “batch” (any work time). Other namespaces were specific to each application.
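    Our dispatcher was Ruby, loaded inside the Rails environment; purely to illustrate the one-dispatcher-per-work-class idea, here is a minimal Go sketch using the streadway/amqp client, with invented queue names:

    package main

    import (
        "log"

        "github.com/streadway/amqp"
    )

    // This process consumes only "interactive" queues (sub-250ms work),
    // so slow batch jobs can never block the fast ones; a sibling
    // process would consume the "batch" queues.
    func main() {
        conn, err := amqp.Dial("amqp://localhost")
        if err != nil {
            log.Fatal(err)
        }
        defer conn.Close()

        ch, err := conn.Channel()
        if err != nil {
            log.Fatal(err)
        }

        for _, queue := range []string{"interactive.comments", "interactive.follows"} {
            deliveries, err := ch.Consume(queue, "", false, false, false, false, nil)
            if err != nil {
                log.Fatal(err)
            }
            go func(deliveries <-chan amqp.Delivery) {
                for d := range deliveries {
                    // handle the job, then acknowledge so the broker
                    // knows not to re-deliver it
                    d.Ack(false)
                }
            }(deliveries)
        }
        select {} // run until killed
    }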

    Caching

    As we approached the hundreds-of-thousands-of-users mark, we saw we were burning too much CPU in the application tier, most of it spent in the rendering engine and the Ruby runtime.

    Instead of introducing Memcached to alleviate IO contention in the database, as most applications do, we aggressively cached partial DOM fragments and full pages. This created an invalidation problem, which we solved by maintaining, in memcached, a reverse index from each model to the cache keys that needed invalidation when that model changed.
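    A sketch of that reverse-index idea in Go, with an invented key layout (the production scheme lived in our Rails caching layer):

    import (
        "strings"

        "github.com/bradfitz/gomemcache/memcache"
    )

    // invalidate: under "deps:<model>:<id>" we keep a comma-separated
    // list of cache keys rendered from that model, and delete them all
    // when the model changes.
    func invalidate(mc *memcache.Client, model, id string) error {
        depsKey := "deps:" + model + ":" + id
        item, err := mc.Get(depsKey)
        if err == memcache.ErrCacheMiss {
            return nil // nothing rendered from this model yet
        }
        if err != nil {
            return err
        }
        for _, key := range strings.Split(string(item.Value), ",") {
            mc.Delete(key) // drop each fragment or page built from this model
        }
        return mc.Delete(depsKey)
    }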

    Our highest-volume request was one specific endpoint that delivered data for the widget. We created a special route for that endpoint in Nginx and added proxy caching to that stack, but wanted to generalize caching to the point where any endpoint could produce proper HTTP/1.1 cache-control headers and would be treated well by an intermediary we control. Now our widget content is served entirely from our public API.

    We added Memcached, and much later Varnish, to our stack to cache partially rendered templates in the backend and mostly read-only API responses.

    Generalization

    Our worker pools grew, handling more asynchronous tasks. The programming model was similar for all of them: take a domain model and schedule a continuation with that model state to be processed at a later stage.

    Generalizing this pattern, we leveraged the after-save hooks in ActiveRecord models in a way we call ModelBroadcast. The principle is that when the business domain changes, events are dropped on the AMQP bus with that change for any asynchronous client that is interested in that class of change. This technique of decoupling the write path from the readers enables the next evolution of growth by accommodating integrations we hadn’t foreseen.

    after_create do |r|
      # A new record: broadcast its full attributes to the "models" exchange.
      broker.publish("models", "create.#{r.class.name}", r.attributes.to_json)
    end

    after_save do |r|
      # A save: broadcast only the changed attributes.
      broker.publish("models", "save.#{r.class.name}", r.changes.to_json)
    end

    after_destroy do |r|
      # A destroy: broadcast the last known attributes.
      broker.publish("models", "destroy.#{r.class.name}", r.attributes.to_json)
    end
    

    This isn’t perfect, but it added a much needed non-disruptive, generalized, out-of-app integration point in the course of a day.

    Dashboard

    Our most rapid data growth was the result of our Dashboard. The Dashboard is a personalized, materialized index of activities inside your social graph and the primary place to personalize your incoming sounds from the people you follow.

    We have always had a storage and access problem with this component. Looking at the read and write paths separately, the read path needs to be optimized for sequential access per user over a time range. The write path needs to be optimized for random access where one event may affect millions of users’ indexes.

    The solution required a system that could reorder writes from random to sequential, store them in a sequential format for reads, and grow across multiple hosts. Sorted string tables are a perfect fit for the persistence format, and with the promise of free partitioning and scaling added to the mix, we chose Cassandra as the storage system for the Dashboard index.

    The intermediary steps started with the model broadcast and used RabbitMQ as a queue for staged processing, in three major steps: fan-out, personalization, and serialization of foreign-key references to our domain models. (A sketch of the stages follows the list.)

    • Fan-out finds the areas of the social graph where an activity should propagate.
    • Personalization looks at the relationship between the originator and destination users as well as other signals to annotate or filter the index entry.
    • Serialization persists the index entry in Cassandra for later lookup and joining against our domain models for display or API representations.
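    Here is that sketch: the three stages wired together with Go channels. The real pipeline staged these steps through RabbitMQ rather than in-process channels, and every type and signature below is illustrative:

    type Activity struct{ ActorID, SoundID uint64 }

    type IndexEntry struct {
        OwnerID    uint64
        Act        Activity
        Annotation string
    }

    // fanOut: find the parts of the social graph the activity propagates to.
    func fanOut(followersOf func(uint64) []uint64, in <-chan Activity) <-chan IndexEntry {
        out := make(chan IndexEntry)
        go func() {
            defer close(out)
            for a := range in {
                for _, f := range followersOf(a.ActorID) {
                    out <- IndexEntry{OwnerID: f, Act: a}
                }
            }
        }()
        return out
    }

    // personalize: annotate or filter using originator/destination signals.
    func personalize(in <-chan IndexEntry) <-chan IndexEntry {
        out := make(chan IndexEntry)
        go func() {
            defer close(out)
            for e := range in {
                e.Annotation = "..." // relationship-based signals would go here
                out <- e
            }
        }()
        return out
    }

    // serialize: persist each entry (Cassandra, in the system described above).
    func serialize(persist func(IndexEntry) error, in <-chan IndexEntry) {
        for e := range in {
            persist(e)
        }
    }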

    Search

    Our search is conceptually a back-end service that exposes a subset of data store operations over an HTTP interface for queries. Index updates are handled similarly to the Dashboard, via ModelBroadcast with some enhancement from database replicas, with index storage managed by Elasticsearch.

    Notifications and Stats

    To make sure users are properly notified when their dashboard updates, whether over iOS/Android push notifications, email, or other social networks, we simply added another stage in the Dashboard workflow that receives messages when a dashboard index is updated. Agents can have that completion event routed to their own AMQP queues via the message bus to initiate their own logic. Reliable messaging at the completion of persistence is part of the eventual consistency we work with throughout our system.

    The statistics offered to logged-in users at http://soundcloud.com/you/stats also integrate via the broker, but instead of using ModelBroadcast, we emit special domain events that are queued up in a log and then rolled up into a separate database cluster for fast access across the various time ranges.

    What’s next

    We have established some clear integration points in the broker for asynchronous write paths and in the application for synchronous read and write paths to backend services.

    Over time, the application server’s codebase has collected both integration and functional responsibilities. As product development settles, we now have much more confidence to decouple the function from the integration, moving it into backend services that can be consumed à la carte, not only by the application but by other backend services, each with a private namespace in the persistence layer.

    The way we develop SoundCloud is to identify the points of scale, then isolate and optimize the read and write paths individually, in anticipation of the next order of magnitude of growth.

    At the beginning of the product, our read and write scaling limitations were consumer eyeballs and developer hours. Today, we’re engineering for the realities of limited IO, network and CPU. We have the integration points set up in our architecture, all ready for the continued evolution of SoundCloud!

  • July 24th, 2012 Go Open Source Go at SoundCloud By Peter Bourgon

    SoundCloud is a polyglot company, and while we’ve always operated with Ruby on Rails at the top of our stack, we’ve got quite a wide variety of languages represented in our backend. I’d like to describe a bit about how—and why—we use Go, an open-source language that recently hit version 1.

    It’s in our company DNA that our engineers are generalists, rather than specialists. We hope that everyone will be at least conversant about every part of our infrastructure. Even more, we encourage engineers to change teams, and even form new ones, with as little friction as possible. An environment of shared code ownership is a perfect match for expressive, productive languages with low barriers to entry, and Go has proven to be exactly that.

    Go has been described by several engineers here as a WYSIWYG language. That is, the code does exactly what it says on the page. It’s difficult to overemphasize how helpful this property is toward the unambiguous understanding and maintenance of software. Go explicitly rejects “helper” idioms and features like the Uniform Access Principle, operator overloading, default parameters, and even exceptions, on the basis that they create more problems through ambiguity than they solve in expressivity. There’s no question that these decisions carry a cost of keystrokes—especially, as most new engineers on Go projects lament, during error handling—but the payoff is that those same new engineers can easily and immediately build a complete mental model of the application. I feel confident in saying that time from zero to productive commits is faster in Go than any other language we use; sometimes, dramatically so.

    Go’s strict formatting rules and its “only one way to do things” philosophy mean we don’t waste much time bikeshedding about style. Code reviews on a Go codebase tend to be more about the problem domain than the intricacies of the language, which everyone appreciates.

    Further, once an engineer has a working knowledge of Effective Go, there seems to be very little friction in moving from “how the application behaves today” to “how the application should behave in the ideal case.” Should a slow response from this backend abort the entire request? Should we retry exactly once, and then serve partial results? This agent has been acting strangely: can we install a 250ms timeout? Every high-level scenario in the behavior of a system can be expressed in a straightforward and idiomatic implementation, without the need for libraries or frameworks. Removing layers of abstraction reduces complexity; plainly stated, simpler code is better code.
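    As a minimal sketch of those scenarios in today's Go (the standard context package arrived after this post was written, and fetch and Result are hypothetical stand-ins): give the backend 250ms, retry exactly once, then serve partial results rather than aborting the request.

    import (
        "context"
        "time"
    )

    // Result is a placeholder for whatever the backend returns.
    type Result struct{ /* ... */ }

    func results(ctx context.Context, fetch func(context.Context) ([]Result, error), partial []Result) []Result {
        for attempt := 0; attempt < 2; attempt++ { // at most one retry
            attemptCtx, cancel := context.WithTimeout(ctx, 250*time.Millisecond)
            rs, err := fetch(attemptCtx)
            cancel()
            if err == nil {
                return rs
            }
        }
        return partial // partial results beat a failed request
    }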

    Go has some other nice properties that we’ve taken advantage of. Static typing and fast compilation enable us to do near-realtime static analysis and unit testing during development. It also means that building, testing and rolling out Go applications through our deployment system is as fast as it gets.

    In fact, fast builds, fast tests, fast peer-reviews and fast deployment means that some ideas can go from the whiteboard to running in production in less than an hour. For example, the search infrastructure on Next is driven by Elasticsearch, but managed and interfaced with the rest of SoundCloud almost exclusively through Go services. During validation testing, we realized that we needed the ability to mark indexes as read-only in certain circumstances, and needed the indexing applications to detect and respect this new dimension of index-state. Adding the abstraction in the code, polling a new endpoint to reliably detect the state, changing the relevant indexing behaviors, and writing tests for them, all took half an afternoon. By the evening, the changes had been deployed and running under load for hours. That kind of velocity, especially in a statically-typed, natively-compiled language, is exhilarating.

    I mentioned our build and deployment system. It’s called Bazooka, and it’s designed to be a platform for managing the deployment of internal services. (We’ll be open-sourcing it pretty soon; stay tuned!) Scaling 12-Factor apps over a heterogeneous network can be thought of as one large, complex state machine, full of opportunities for inconsistency and race conditions. Go was a natural choice for this kind of job. Idiomatic Go is safely concurrent by default; Bazooka developers can reason about the complexity of their problem without being distracted by the complexity of their tools. And Bazooka makes use of Doozer to coordinate its shared state, which—in addition to being the only open-source implementation of Paxos in the wild (that we’re aware of)—is also written in Go.

    All together, SoundCloud maintains about half a dozen services and over a dozen repositories written entirely in Go. And we’re increasingly turning to Go when spinning up new backend projects.

    Interested in writing Go to solve real problems and build real products? We’d love to hear from you!

  • December 9th, 2011 Hacks Open Source Stop! Hacker Time. By Alexander Grosse

    At SoundCloud we like to invent new ideas. But we’re not averse to implementing really great, tried-and-tested ideas, like the 20% time concept made famous by Google.

    We’re calling it Hacker Time. We’re still very much in start-up mode, so we’re keen to nurture the spirit of hacking. We’ve been testing out Hacker Time for a few weeks now and we’re excited about its potential, from industry-changing initiatives like “Are We Playing Yet?” to unusual passion projects like the “Owl Octave”.

    Why we’re doing it

    When I arrived at SoundCloud I gathered all the engineers and developers in a room and asked them what they wanted more of. Time for their own projects was one of the top requests.

    Like every other fast-paced tech company out there, the everyday demands of development leave little time for pet projects or invention outside of the roadmap. But we shouldn’t let great ideas slip away: it’s wasteful of talent and it’s frustrating for developers themselves. Hacker Time is an attempt to keep both the ideas and the employees focused on making the product even better.

    I don’t think there’s a single developer at SoundCloud who isn’t a sound creator. Or at least they quickly become one once they’ve joined. We’re a company of electronic music producers, acoustic musicians, field recordists and social sound fanatics. Of course we’re developing the product for 9 million people, but we’re also developing something we want to use ourselves.

    Ironically we’re not trying to invent anything new by initiating Hacker Time. Aside from Google, we’ve been watching the Atlassian experiment and are taking a similar approach; we’ll start with a simple set of rules and adapt to what works best for us. Like Atlassian, we’ll be blogging about our experiences too.

    Which projects are engineers allowed to work on?

    We’ve compiled a list, which is not meant to be restrictive but rather to serve as a guideline:

    • Pet features/improvements that never made it onto the roadmap
    • Apps based on the SoundCloud API
    • Hack Days
    • Anything for SoundCloudLabs.com
    • “this always annoyed me” bug-fixes or architectural improvements
    • Integration of some technology-du-jour with SoundCloud
    • Contributing to OSS used at SoundCloud
    • Other cool projects using SoundCloud
    • Conferences
    • Writing blog posts and technical articles

    How will time be allocated?

    The big decision is how to allocate time to these projects. We don’t want to cannibalize valuable product development time, but we do want to give people as much freedom as possible.

    There are lots of approaches out there, from reserving one day a week, to accumulating time to set aside whole weeks, to handling that time like vacation. We’ve decided to put the decision-making in the hands of each team, allowing them to allocate Hacker Time according to each team’s workstyle. It’s an experiment, and something we’ll be reviewing in the early stages.

    Demo it!

    We’re pretty disciplined about demoing work to the whole company. Hacker Time projects will be included in the demos (which happen every 2 weeks at SoundCloud), giving developers a chance to showcase their projects and sparking interest in their hacks from everyone in the organization.

    Watch this blog for further reports!

    Many thanks to Atlassian for blogging about their experiences, and to Simon Stewart from Google, Jim Webber from Neo4j, Jan Lehnardt from Couchbase, and Stefan Roock from it-agile for sharing their experiences with us.

  • November 9th, 2011 HTML5 Open Source SoundCloud launches the HTML5 Audio Improvement Initiative By matas

    We at SoundCloud want to build the best sound player for the web, and we want to do that using open web standards. While working on the native audio features of our mobile site and new widgets, and even in experiments on the main site, we have discovered that the HTML5 Audio standard is not equally well implemented across all modern browsers, and that decisions could be made that would benefit web audio users and web developers alike. SoundCloud is launching the “Are We Playing Yet?” project, which aims to raise awareness about the state of HTML5 Audio implementations in web browsers.

    We have decided to help the parties involved by collecting the issues in one place, documenting them, providing the code, and adding interactive tests that show the implementation progress. We understand how software development works, and that a few iterations are needed until something is fully done. We hope “Are We Playing Yet?” can function as a handy development and quality-monitoring tool.

    “Are We Playing Yet?” was started by SoundCloud, but it’s open to all companies and developers who care about the state of HTML5 audio and want to build applications based on this web standard. You can get the project source on GitHub, contribute tests and fixes via pull requests or the issue tracker, and connect with the people involved via @areweplayingyet on Twitter.