With Java 8 now in the mainstream, Scala and Clojure are no longer the only choices for developing readable, functional code for big data technology on the JVM. In this post we see how SoundCloud uses Apache Crunch and the new Crunch Lambda module to handle the high-volume data processing tasks that are essential at the early stages of our batch data pipeline, and to do so efficiently, robustly and simply in Java 8.
June 1st, 2016 Open Source Hadoop Big Data Crunch Data pipelines with Apache Crunch and Java 8 By David Whiting
December 2nd, 2014 Scalding Hadoop SoundCloud in Scalding case study by Concurrent Inc. By Josh Devins
Recently we teamed up with Concurrent Inc., the backers of the data-processing framework Cascading, on a case study of how we use Scalding for some of our data-driven products, such as Search. Scalding lets us iterate quickly and test easily, and it allows for loose coupling between some of our data-processing pipelines.
Check back for future posts about our use of other data-processing tools and frameworks, such as Spark.