This is part two in a series about solving Gradle remote build cache misses. Solving build cache misses matters both for avoiding work that has already been done and for improving build speeds. For more information, check out part one here.
To understand how these cache misses happen, it is important to understand how we seed our build cache. Gradle recommends that your CI system populates the build cache from clean builds and that developers only load from it. At SoundCloud, we also pull from the build cache on CI in order to make our CI builds faster. This is a diagram of what our remote build cache setup looks like.
For the same task, differences in the list of inputs when the task runs on the CI system vs. on a local machine cause these build cache misses.
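A seeding setup like the one described above, where CI both pushes to and pulls from the remote cache while developer machines only pull, can be sketched in settings.gradle roughly as follows. This is a minimal sketch under assumptions, not necessarily the exact configuration in use at SoundCloud: the cache URL is a placeholder, and detecting CI through a CI environment variable depends on your CI system.

```groovy
// settings.gradle

// Assumption: the CI system sets a CI environment variable.
def isCi = System.getenv('CI') != null

buildCache {
    local {
        enabled = true
    }
    remote(HttpBuildCache) {
        // Placeholder URL; point this at your own build cache node.
        url = 'https://build-cache.example.com/cache/'
        // Only CI seeds the cache; developer machines just load from it.
        push = isCi
    }
}
```

Keeping push disabled on developer machines means only builds from a known, clean environment populate the cache.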
So now on to the explanation of the title. Part of a build engineer’s role is to speed up builds. Improving build performance and avoiding work with caching is one way to achieve this, but another tool in the build engineer’s belt is that of disallowing slow builds. Sometimes, there are two paths that produce the same outcome. Let’s call them the slow path and the fast path. If we can always take the fast path, we can improve our build speed.
Often it isn’t obvious which path is the “slow path” and which path is the “fast path.” When confronted with two paths in real life, we usually ask Google Maps which path has the most traffic so that we can avoid it. With Android builds, there may be limited guidance from Google. So, it is the build engineer’s role to investigate and disallow the slow paths in order to make engineers happy and productive.
One of the slow paths we discovered at SoundCloud was Instant Run. At first, we didn’t realize that it was causing problems; we only saw that some of our build scans showed that some of our developers were building with the Gradle build cache disabled. Since we have org.gradle.caching=true checked in to our repository in the root gradle.properties, we were puzzled as to how it could happen. We paired with developers to troubleshoot these slow builds, and we quickly realized that they were being built with Instant Run.
Instant Run was a tool designed to allow developers to deploy changes to devices faster, but it didn’t always work in large projects. One big limitation is that it disabled the build cache. This sometimes led to painfully slow builds, as everything had to be recompiled. Additionally, this meant that when building with Instant Run, both the local and remote build cache were disabled. We did a performance analysis of Instant Run in different scenarios, and it showed that in our codebase, Instant Run was at best as fast as a build without it, and sometimes significantly slower.
We already had a page in our internal wiki recommending our team disable Instant Run for faster builds, but nobody reads documentation! So how could we disable Instant Run for our team and save everyone from slow builds?
Here is a Groovy snippet for Gradle which does exactly that:
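// Android Studio injects this property when Instant Run is enabled.
def instantRun = project.getProperties().get("android.optional.compilation")?.contains("INSTANT_DEV")
if (instantRun) {
    // Fail fast and tell the developer where to turn Instant Run off.
    throw new IllegalStateException("Disable Instant Run: Android Studio -> Preferences -> Build, Execution, Deployment -> Instant Run")
}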
The snippet checks the project for a specific property, which Android Studio injects into Gradle whenever Instant Run is enabled. When a build is run with Instant Run, the build fails and a helpful error message is displayed guiding the developer to disable Instant Run. This might be slightly annoying to colleagues, but it is all for the team’s benefit: A 10-second settings change is rewarded many times over with faster builds.
Learn more about how injected properties in Android Studio work here.
Instant Run was removed in Android Gradle plugin 3.5.0.
Another slow path we had at SoundCloud was that of empty directories. Empty directories in Gradle create a slow path compilation, which we should disallow. But how and why does this happen? Well, let’s suppose you have the following directory structure on your local machine:
src/main/java/com/soundcloud/Player.java
src/main/java/com/soundcloud/Artwork.java
src/main/java/com/soundcloud/audio/
And let’s assume you have the following on CI:
src/main/java/com/soundcloud/Player.java
src/main/java/com/soundcloud/Artwork.java
If audio is an empty folder, are the inputs to the task the same or not? The Java compiler itself ignores empty source directories. Gradle, however, does not: it treats the empty directory as part of the task’s inputs, so the two machines end up with different inputs. Well, that is, until this issue is resolved.
How can we push out a change to delete all empty directories on all our local project workspaces? If we take a cue from the previous section, we can fail the build if we find any empty source directories. This ensures that the directory structure, as Gradle sees it, is the same locally as it is on CI:
allprojects {
    tasks.withType(SourceTask).configureEach { t ->
        // Run the check right before each source task executes.
        t.doFirst {
            t.source.visit { FileVisitDetails d ->
                // Fail on any directory that has no children at all.
                if (d.file.directory && d.file.listFiles().size() == 0) {
                    throw new IllegalStateException("Found an empty source directory. Remove it: \nrmdir " + d.file.absolutePath)
                }
            }
        }
    }
}
The above snippet, which should be placed in the root build.gradle, looks at all SourceTasks in the project. Before any source task runs, it traverses that task’s source tree looking for empty directories. When it finds one, it fails the build with a helpful error message telling you how to delete that directory.
You can also remove all empty directories by running the following command:
find . -empty -type d -delete
One thing to note is that the Gradle check above does not fail the build if the root source directory itself is empty.
So, before adding this to our codebase, we needed to be sure this didn’t impact our build times. In this case, every single module in our project took less than 5 milliseconds to run this check, except one particularly large legacy module, which took 89 milliseconds to run.
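If you want to take a similar measurement in your own project, one option is to time the check inside the task action itself. The following is a sketch under assumptions, not necessarily how the numbers above were collected (a build scan or Gradle's --profile report would also work); the log message format is made up for the example.

```groovy
import java.util.concurrent.TimeUnit

allprojects {
    tasks.withType(SourceTask).configureEach { t ->
        t.doFirst {
            long start = System.nanoTime()
            // Same empty-directory check as in the snippet above.
            t.source.visit { FileVisitDetails d ->
                if (d.file.directory && d.file.listFiles().size() == 0) {
                    throw new IllegalStateException("Found an empty source directory. Remove it: \nrmdir " + d.file.absolutePath)
                }
            }
            long elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start)
            // Illustrative log line, printed at lifecycle level so it shows up by default.
            t.logger.lifecycle("Empty-directory check for ${t.path} took ${elapsedMs} ms")
        }
    }
}
```

Because the timing wraps only the directory traversal, it isolates the cost of the check from the work of the task itself.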
Is it worth it? We decided it was. Almost all of our developers were experiencing build cache misses while having a clean local Git state compared to master. After pushing the change, many developers started getting a little frustrated when they would have several build failures in a row because of nested empty directories. It is quite common to have these lying around after refactoring code or changing branches, since Git ignores empty directories. However, after the initial change, these failures became quite rare and we eventually increased our remote build cache hits.
Next, we saw a build cache miss for a generated class called InAppBillingService.java. This class is generated by the compileDebugAidl task that is part of the Android Gradle plugin. The task looks in your AIDL source directory for any source files ending in .aidl, and for each one, it generates a Java file before the JavaCompile task runs. It then adds these generated Java files to the sources of the JavaCompile task. We opened the generated Java source file to find this comment:
/**
 * This file is auto-generated. DO NOT MODIFY.
 * Original file: /Users/no/workspace/soundcloud/android/libs/src/main/java/com/soundcloud/android/InAppBillingService.aidl
 */
This file is always going to generate a remote cache miss, because the absolute path is different on every machine. We used the power of Gradle to solve this one as well:
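// This import goes at the top of the build script.
import static groovy.io.FileType.FILES

tasks.named("compileDebugAidl").configure {
    doLast {
        // Rewrite every generated file without the machine-specific path comment.
        outputs.files.forEach { directory ->
            directory.traverse(type: FILES) { file ->
                def lines = file.readLines('utf-8').findAll {
                    !it.contains('Original file:')
                }
                file.setText(lines.join('\n'), 'utf-8')
            }
        }
    }
}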
The above code snippet finds the compileDebugAidl task and adds a doLast action, which iterates through each generated file and rewrites it line by line, skipping the line that contains “Original file.” The file is now the same on every machine. That solved the build cache miss for us, but we want to make the world a better place, so we filed a bug against Google.
While examining this task, we also noticed that the CompileAidl task would take 1.3 seconds to perform an up-to-date check, something which should typically be done in less than 10 milliseconds. It took longer because the check filters the entire source tree. A Gradle engineer opened an issue about this for us, but in the meantime, as a workaround, we moved this AIDL file into its own module, reducing the run time to 0.003 seconds.
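Moving the AIDL file into its own module might look roughly like the following. This is only a sketch: the module name :billing-aidl and the shape of the consuming module are hypothetical, and the Android plugin configuration is omitted.

```groovy
// settings.gradle
include ':billing-aidl'   // hypothetical module name

// billing-aidl/build.gradle
apply plugin: 'com.android.library'
// android { ... } configuration omitted.
// The module contains only the .aidl file, so the AIDL task
// has a very small source tree to filter.

// build.gradle of the module that previously contained the .aidl file
dependencies {
    implementation project(':billing-aidl')
}
```

The idea is simply that the up-to-date check has far fewer files to look at when the AIDL file no longer shares a source tree with the rest of the code.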
There are, of course, a couple more build cache misses we fixed here and there that aren’t mentioned in this article. One fun thing to note: Make sure your annotation processors generate the same code locally as on CI. And when annotation processors generate lists, make sure they are ordered or sorted, as code may be processed in different orders on CI.
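To illustrate the point about ordering, here is a small standalone Groovy sketch of a generator that sorts its inputs before rendering. The renderRegistry function and the type names are hypothetical and stand in for whatever your annotation processor emits; the sort before rendering is what matters.

```groovy
// Hypothetical generator: renders a registry class from discovered type names.
String renderRegistry(Collection<String> typeNames) {
    // Sort a copy so the output does not depend on discovery order.
    def ordered = typeNames.toSorted()
    def entries = ordered.collect { "        ${it}.class" }.join(",\n")
    return "final class Registry {\n    static final Class<?>[] TYPES = {\n${entries}\n    };\n}\n"
}

// The same types, discovered in a different order locally and on CI.
def local = ['Player', 'Artwork', 'Audio']
def ci = ['Audio', 'Player', 'Artwork']

// The generated source is identical, so downstream tasks see the same inputs.
assert renderRegistry(local) == renderRegistry(ci)
```

Without the sort, the two machines would generate different source files from the same code, and every task that consumes them would miss the cache.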
The end result of all this work was quite noticeable when looking at build times because speed gains from modularization are multiplied by improving cache hits. In the week before this work started, our build times were 49.55 seconds on average.
Even accounting for four months of increased codebase size, after the majority of this work was done, our average build time dropped to 36.84 seconds. You can also note the increased remote and local build cache usage.
Going from 49.55 to 36.84 seconds per build on average is a huge change in the day-to-day lives of developers. Our team averages around 40 builds per developer per day, so a saving of about 12.7 seconds per build adds up to roughly eight and a half minutes per developer per day. It doesn’t just mean that developers are faster at writing code. Having faster build speeds means that developers have more time to explore the realm of possibilities when building new features. More explored possibilities lead to more maintainable code, increased app stability, and fewer bugs.
If you’d like to learn more, check out “Remote Build Cache Misses” from BerlinDroid Gradle Night. You can watch the talk or browse the slides.