SoundCloud for Developers

Discover, connect and build

We use cookies for various purposes including analytics and personalized marketing. By continuing to use the service, you agree to our use of cookies as described in the Cookie Policy.

Backstage Blog RSS

You're browsing posts of the category Search

  • January 24th, 2018 Search Data Science Machine Learning Analytics Data PageRank in Spark By Josh Devins

    SoundCloud consists of hundreds of millions of tracks, people, albums, and playlists, and navigating this vast collection of music and personalities poses a large challenge, particularly with so many covers, remixes, and original works all in one place.

    For search on SoundCloud, one of the ways we approach this problem is by using our own version of the PageRank algorithm, which we affectionately refer to as DiscoRank (Get it? Disco as in discovery and Saturday Night Fever?!).

    The job of PageRank is to help rank search results from a query like finding all Go+ tracks called “royals.” At first glance, this task might seem trivial. The first result is, and should indeed be, Lorde’s original song, “Royals.” However, there are plenty of covers and remixes of this track, which leaves us with questions like: Which ones should we show at the top and in which order? What about other tracks in our catalog that have the word “royals” in them? Where should they be in our search results list?