Gradle Remote Build Cache Misses

August 30th, 2019 by Nelson Osacky

Until recently, one of the top technical risks facing SoundCloud’s Android team was increasing build times. Our engineering leadership was well aware of the problem, and addressing it through modularization was highlighted in our company’s quarterly goals and objectives. Faster build times mean more productive developers, and more productive developers are happier and can iterate on products more quickly.

Modularization is key to decreasing build times, but avoiding work is another important part of the puzzle, and build caching is one way to avoid that work. Gradle, our tool for building Android apps, has a local file system cache that reuses outputs of previously performed tasks. On top of that, we have been using the Gradle remote build cache to save our developers’ time. It helps us avoid redoing work, whether that work was already done by a teammate or by an earlier build of an old branch we switch back to. However, to get the full benefits of caching, you have to go beyond simply setting it up.

This is the story of how we solved some cache misses with our remote build cache and, in turn, saved our developers’ time. Maybe some of these tips can help you and your team save some time as well. This is part one of two. More cache misses and results will come in part two.

Tasks and Caching

A task is a unit of work with inputs and outputs, and the remote build cache works by storing the outputs of Gradle tasks. If all the inputs are the same, we can skip the work and just reuse the outputs.
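To make the idea of "same inputs, reuse the outputs" concrete, here is a minimal sketch in Java. It is only an illustration and not how Gradle is implemented: the class TaskOutputReuseSketch, the fakeCompile stand-in, and the in-memory map are all hypothetical names invented for this example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; this is not how Gradle is implemented.
class TaskOutputReuseSketch {
    // Remembers the outputs produced for each distinct set of inputs.
    private final Map<Map<String, String>, List<String>> outputsByInputs = new HashMap<>();
    int executions = 0; // counts how often the "real" work actually ran

    // inputs: source file name -> file contents
    List<String> run(Map<String, String> inputs) {
        List<String> cached = outputsByInputs.get(inputs);
        if (cached != null) {
            return cached; // identical inputs: skip the work and reuse the outputs
        }
        executions++;
        List<String> outputs = fakeCompile(inputs);
        outputsByInputs.put(new TreeMap<>(inputs), outputs); // defensive copy as key
        return outputs;
    }

    // Stand-in for compilation: Audio.java becomes Audio.class.
    private static List<String> fakeCompile(Map<String, String> sources) {
        List<String> classFiles = new ArrayList<>();
        for (String name : new TreeMap<>(sources).keySet()) {
            classFiles.add(name.replace(".java", ".class"));
        }
        return classFiles;
    }

    public static void main(String[] args) {
        TaskOutputReuseSketch tasks = new TaskOutputReuseSketch();
        Map<String, String> sources = new TreeMap<>();
        sources.put("Audio.java", "class Audio {}");
        sources.put("Waveform.java", "class Waveform {}");

        System.out.println(tasks.run(sources));                // work is executed
        System.out.println(tasks.run(new TreeMap<>(sources))); // outputs are reused
        System.out.println("executions: " + tasks.executions);
    }
}
```

Running the same inputs twice executes the stand-in work only once. Changing the contents of any file produces a different set of inputs, so the work runs again.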

For example, in the JavaCompile task, the inputs are Java files such as Audio.java and Waveform.java, and the outputs of the compile task are the compiled class files such as Audio.class and Waveform.class. If we haven’t added any new Java files or changed any of the contents of the Java files, we can reuse the output of the previous JavaCompile task.

To determine that none of the inputs have changed, Gradle computes a build cache key, which uniquely identifies a task execution with a particular set of inputs. It is computed by hashing the task’s inputs, such as the task type, the values of its input properties, the contents of its input files, and the names of its output properties. Gradle then packs and stores the outputs of cacheable tasks in the Gradle build cache. A cache is simply a store that maps build cache keys to outputs. The cache can be local (file system) or remote (HTTP).
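As a rough sketch of what a content-based key looks like, the Java snippet below hashes a task type together with the names and contents of its input files. This is a simplification under stated assumptions, not Gradle's actual algorithm: the class CacheKeySketch and the method cacheKey are hypothetical, and SHA-256 is an arbitrary choice for the example.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; Gradle's real key covers more inputs than this.
class CacheKeySketch {
    // inputFiles: file name -> file contents
    static String cacheKey(String taskType, Map<String, String> inputFiles) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            digest.update(taskType.getBytes(StandardCharsets.UTF_8));
            // Sort by file name so the key does not depend on map iteration order.
            for (Map.Entry<String, String> file : new TreeMap<>(inputFiles).entrySet()) {
                digest.update((byte) 0); // separator between fields
                digest.update(file.getKey().getBytes(StandardCharsets.UTF_8));
                digest.update((byte) 0);
                digest.update(file.getValue().getBytes(StandardCharsets.UTF_8));
            }
            StringBuilder hex = new StringBuilder();
            for (byte b : digest.digest()) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // SHA-256 is available on every Java platform
        }
    }

    public static void main(String[] args) {
        Map<String, String> sources = new TreeMap<>();
        sources.put("Audio.java", "class Audio {}");
        sources.put("Waveform.java", "class Waveform {}");
        String first = cacheKey("JavaCompile", sources);
        String again = cacheKey("JavaCompile", new TreeMap<>(sources));

        sources.put("Audio.java", "class Audio { }"); // only whitespace added
        String changed = cacheKey("JavaCompile", sources);

        System.out.println("same inputs, same key: " + first.equals(again));
        System.out.println("whitespace change, same key: " + first.equals(changed));
    }
}
```

Because the key is derived from the bytes of each input, any change to an input, even a single added space, yields a different key and therefore a cache miss.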

Comparing Builds

A cache miss occurs when Gradle can’t find a corresponding cache entry for a task that should have been cacheable. To identify build cache misses, we use Gradle Enterprise build scans. Gradle Enterprise records the inputs to each Gradle task, which makes it possible to compare two builds and see which inputs differ. As an example, we compared the exact same build on the same computer, but with whitespace added to a specific file.

From this, we can gather that whitespace in Java source files counts as part of the task’s inputs and therefore affects the cache key. Gradle must re-execute this task whenever whitespace changes in any of the inputs, which is unnecessary.

BuildConfig Misses

We can also use Gradle Enterprise to compare the same task running locally vs. on the CI nodes that seed our build cache. For the remote cache to work, the inputs to a task on the local machine must match the inputs to the same task on the remote machine.

The BuildConfig is a file in Android builds used to keep environment and configuration constants. Here is a sample BuildConfig class, and as you can see, all the values are constants:

public final class BuildConfig {
    public static final boolean DEBUG = Boolean.parseBoolean("true");
    public static final String APPLICATION_ID = "com.soundcloud.android";
    public static final String BUILD_TYPE = "debug";
    public static final int VERSION_CODE = -1;
    public static final int TEST_RETRY_COUNT = 0;
}

However, the constants can be set by functions, which may be different depending on the environment:

afterEvaluate {
  android.libraryVariants.all { variant ->
    variant.resValue "bool", "analytics_enabled", "true"
    variant.resValue "bool", "verbose_logging", "false"
    variant.buildConfigField "int", "TEST_RETRY_COUNT", "${getRetryCount()}"
  }
}

In our case, the TEST_RETRY_COUNT field was being set by a function, getRetryCount(). This function was very simple, but it led to different values locally vs. on the CI server:

private static int getRetryCount() {
  if (isCI) {
    return 1
  } else {
    return 0
  }
}

Since the value was always 0 locally and 1 on the CI server, local builds could not reuse the outputs that CI had produced for this task. Unfortunately, this was in one of the largest modules in our codebase, which meant a large portion of our code had to be recompiled, even though it didn’t need to be.
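The kind of comparison that surfaces such a mismatch can be sketched in a few lines of Java. This is not how Gradle Enterprise works internally; it only shows the idea of diffing recorded inputs. The class InputDiffSketch, the method differingInputs, and the input names such as buildConfigField.TEST_RETRY_COUNT are hypothetical and chosen for illustration.

```java
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical illustration only; input names are made up for this example.
class InputDiffSketch {
    // Returns every input whose recorded value differs between the two builds.
    static Map<String, String> differingInputs(Map<String, String> local, Map<String, String> ci) {
        Set<String> names = new TreeSet<>(local.keySet());
        names.addAll(ci.keySet()); // also catch inputs present on only one side
        Map<String, String> diff = new TreeMap<>();
        for (String name : names) {
            String localValue = local.get(name);
            String ciValue = ci.get(name);
            if (!Objects.equals(localValue, ciValue)) {
                diff.put(name, localValue + " vs. " + ciValue);
            }
        }
        return diff;
    }

    public static void main(String[] args) {
        Map<String, String> local = new TreeMap<>();
        local.put("buildConfigField.BUILD_TYPE", "debug");
        local.put("buildConfigField.TEST_RETRY_COUNT", "0");

        Map<String, String> ci = new TreeMap<>(local);
        ci.put("buildConfigField.TEST_RETRY_COUNT", "1"); // value set by the CI check

        // prints {buildConfigField.TEST_RETRY_COUNT=0 vs. 1}
        System.out.println(differingInputs(local, ci));
    }
}
```

A single differing input is enough to change the cache key, so one environment-dependent constant can prevent every local build from reusing what CI has already built.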

Our solution was to remove this check from our build configuration and instead check it at runtime. For this case, we could determine at runtime if we were running on the CI server or not:

class ActivityTest<T : Activity> constructor(activityClass: Class<T>) {

  // Firebase Test Lab sets this system setting to "true" on its devices.
  val isRunningOnTestLab = Settings.System.getString(contentResolver, "firebase.test.lab") == "true"

  // Retry once on Test Lab, never locally.
  @JvmField
  val retryRule = RetryRule(if (isRunningOnTestLab) 1 else 0)
}

Kotlin Compiler Flags

We also noticed that none of our Kotlin compile tasks were pulling anything from the cache. Since we use Kotlin in nearly every module, this meant we were pulling almost nothing from the Gradle cache.

We looked at the source of KotlinCompile to understand what its incremental input was and what it did. incremental is true by default, but it took us a while to realize that we had disabled incremental compilation for all our CI tasks. Because local builds kept the default, this input never matched between CI and developer machines.

We turned incremental compilation back on for our CI nodes that seed our build cache and immediately saw a big boost in cache hits. More cache hits meant less time spent building something that was already built.

Conclusion

In the end, we learned that simply turning on a remote build cache is not enough to meaningfully improve build speeds. In fact, blindly turning it on might not produce any cache hits at all. It is important to keep in mind how tasks work, how they behave on different systems, and how to use the right tools to analyze differences in task inputs across different systems. That’s how we solved some cache misses with our remote build cache and, in turn, saved our developers’ time. Maybe some of these tips can help you and your team save some time as well.

In part two, we’ll dive into more complicated cache misses and talk more about the impact of solving them.

In the meantime, if you’d like to learn more, check out “Remote Build Cache Misses” from BerlinDroid Gradle Night. You can watch the talk or browse the slides.