With Java 8 now in the mainstream, Scala and Clojure are no longer the only choices to develop readable, functional code for big data technology on the JVM. In this post we see how SoundCloud is leveraging Apache Crunch and the new Crunch Lambda module to do the high-volume data processing tasks which are essential at early stages in our batch data pipeline efficiently, robustly and simply in Java 8.
Recently we teamed up with Concurrent Inc., the backers of the data-processing framework Cascading, to do a case study of how we use Scalding for some of our data-driven products such as Search. Scalding enables us to iterate quickly, test easily, and it allows for loose coupling of some of our data-processing pipelines.
Check back for future posts about our use of other data-processing tools, and frameworks such as Spark.