Mobile: Unit Testing

When we started the Mobile project in early 2011, unit testing JavaScript was one of the goals we wanted to tackle on the technical side. Up until then, the history of custom JavaScript code at SoundCloud rarely included unit tests, so providing references and the necessary groundwork research was important, both for the project at hand and for other projects at SoundCloud.

This article aims to provide an overview of the tools we use, what worked well, and what we still need to improve.

Tools

When we started the Mobile project, there were just two developers on the team, Matas and Jörn. With Jörn already having maintained and supported QUnit for three years, this particular choice was an easy one. If you haven't heard of it yet: QUnit is one of the most popular JavaScript unit testing frameworks available, and there's a comprehensive tutorial over at ScriptJunkie.

As we were building an API client in the browser, mocking API requests was really important for us. We didn’t want to depend on the API being available, both to be able to work offline and to not depend on data that changes all the time. At the start of the project, jQuery 1.5 and its ajax extension points like custom transports weren’t available yet, so we went with mockjax, a library adding mocking on top of jQuery’s ajax module.
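
In case you haven't used it: a mockjax handler pairs a URL with a canned response, so any matching $.ajax call is answered locally. A minimal sketch (the URL and fixture path here are just placeholders):

// intercept a specific API call and answer it from a local fixture file
$.mockjax({
  url: "/users/183/tracks",
  proxy: "/fixtures/forss-tracks.json",
  responseTime: 1
});

// any matching request now gets the fixture instead of hitting the real API
$.getJSON("/users/183/tracks", function(tracks) {
  console.log("loaded", tracks.length, "tracks");
});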

To run tests in continuous integration (at SoundCloud, on Jenkins), we looked at quite a lot of options; Jörn has some slides that give an overview of that research. Other teams at SoundCloud use Selenium, which wasn't an option for us due to the lack of support for Chrome or Safari (which is still a work in progress). In the end we went with PhantomJS. PhantomJS is built on top of QtWebKit and provides a reasonably browser-like environment with enough of an API to run our unit tests and report the results back.
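
To give an idea of how that works, here is a rough sketch of a PhantomJS runner for a QUnit page. This is not our actual script: the test page URL is made up, and the exact API differs between PhantomJS versions (1.1 used new WebPage(), later versions use require("webpage")).

// rough sketch of a QUnit runner for PhantomJS; URL and details are illustrative only
var page = require("webpage").create();

page.open("http://localhost:8000/test/index.html", function(status) {
  if (status !== "success") {
    console.log("could not load the test page");
    phantom.exit(1);
  }
  // poll until QUnit has written its summary into #qunit-testresult
  var interval = setInterval(function() {
    var failed = page.evaluate(function() {
      var result = document.getElementById("qunit-testresult");
      if (!result || !/completed/.test(result.innerText)) {
        return null;
      }
      return parseInt(result.getElementsByClassName("failed")[0].innerText, 10);
    });
    if (failed !== null) {
      clearInterval(interval);
      phantom.exit(failed > 0 ? 1 : 0);
    }
  }, 250);
});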

We considered using TestSwarm to distribute the running of our unit tests to regular desktop browsers as well as mobile devices. The lack of a Jenkins-TestSwarm plugin (one is actually available now), as well as of tooling for managing VMs, browsers, simulators and emulators (let alone physical mobile devices), was enough of a hurdle that we skipped this. Until we get that in place, we won't know how many bugs we could have caught earlier with this additional setup.

The Good

QUnit does a pretty good job. The few small issues we encountered were swiftly fixed upstream. We ended up customizing the module-method quite heavily, mostly to integrate Mockjax. Overall, Mockjax also did a pretty good job, once we figured out a pattern that worked for us. Here’s a typical module-call for testing Backbone Views and Models that fetch their data from the API:

module("user", {
  "/users/183/tracks": "/fixtures/forss-tracks.json",
  "/users/183/playlists": "/fixtures/forss-playlists.json",
  "/users/183/favorites": "/fixtures/forss-favorites.json",
  "/users/183/groups": "/fixtures/forss-groups.json",
});

We still call the module-method with the module name as the first argument. The second argument can contain setup- and teardown-properties, just as QUnit expects. In addition, we pass url-mock pairs, which are handed on to $.mockjax. On top of those, we define a catch-all to make sure that no test ever ends up calling the actual API, and a global timeout for each test to ensure that a broken async test never prevents the suite from finishing.

var testTimeout;
module = function(name, mocks) {
  QUnit.module(name, {
    setup: function() {
      if (mocks) {
        if (mocks.setup) {
          mocks.setup.apply(this, arguments);
        }
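        // register one mockjax handler per url, pointing at a fixture file or an options object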
        $.each(mocks, function(url, mock) {
          if (/setup|teardown/.test(url)) {
            return;
          }
          if ( $.type(mock) === "string" ){
            $.mockjax({
              url: "/_api" + url,
              proxy: mock,
              responseTime: 1
            });
          } else {
            $.mockjax($.extend(mock,{url: "/_api" + url}));
          }
        });
      }
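      // catch-all: fail the test if it hits any API url that wasn't explicitly mocked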
      $.mockjax({
        url: "/_api*",
        responseTime: 1,
        response: function(obj){
          var message = "Mockjax caught unmocked API call for url: " + obj.url;
          if (obj.modelType) {
            message += ", from component " + obj.modelType;
          }
          ok( false, message );
        }
      });

      testTimeout = setTimeout(function() {
        equal( true, false, "test timeout (5s)" );
        // could involve multiple stop calls, reset
        QUnit.config.semaphore = 1;
        start();
      }, 5000);
    },
    teardown: function() {
      clearTimeout(testTimeout);
      $.mockjaxClear();
      if (mocks && mocks.teardown) {
        mocks.teardown.apply(this, arguments);
      }
    }
  });
};
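
For illustration, a test using this wrapper might look roughly like this. The collection and assertions are hypothetical, but the URL matches the mocks registered above:

// hypothetical async test run under the "user" module declared above
asyncTest("loads the user's tracks from the mocked API", function() {
  var TrackList = Backbone.Collection.extend({ url: "/_api/users/183/tracks" });
  new TrackList().fetch({
    success: function(tracks) {
      ok(tracks.length > 0, "fixture tracks were loaded");
      start();
    }
  });
});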

The problem with this design was the lack of a $.mockjaxClear(url) method – you can't remove or replace an existing handler (mockjaxClear(index) is supported, but didn't help us). We needed that to test error conditions, for example when the API returns a 404 when asked whether a particular track is a favorite of a user. In some cases we could simply mix the error mock in with the other mocks.

In other cases, we grouped these tests into a separate module-call (with the same name):

module("user", {
  "/users/183/playlists": {
    responseStatus: 500,
    responseText: "servererror",
    responseTime: 1
  }
});

With that, we kept the regular tests in one place and the error-condition tests in another.
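
A test in that second module can then exercise the failure path. A hypothetical shape of such a test (real tests would assert on the view's or model's error handling rather than on a raw $.ajax call):

// hypothetical: the playlists mock above now answers with a server error
asyncTest("surfaces API errors to the caller", function() {
  $.ajax({
    url: "/_api/users/183/playlists",
    success: function() {
      ok(false, "request should not succeed");
      start();
    },
    error: function() {
      ok(true, "error callback fired for the mocked failure");
      start();
    }
  });
});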

The Bad

An interesting QUnit feature, inspired by Kent Beck's work on JUnit Max, is its built-in reordering. It records the results of one test run in sessionStorage, then looks at those results during the next run. If a test failed before, it's scheduled to run first. All of that happens without changing the order of the result output. When it works, you get the relevant test results much faster than with regular sequential runs, as it's likely that tests that failed before will fail again, while passing tests are far less likely to start failing.

The problem with that reordering for us was that, with all the asynchronous tests in our suite, some tests had side effects on other tests. As long as they ran in a fixed order, those effects weren't noticeable. Instead of addressing the actual side effects, we ended up disabling the reordering. It's still on the pile of chores to address.
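
For reference, turning the reordering off is a single config flag:

// opt out of QUnit's failed-tests-first reordering
QUnit.config.reorder = false;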

Overall, the unit tests did a good job, though it's not quite clear how much value they actually provided. Most bug reports are about visual issues: sometimes small glitches, often device-specific problems. As a mobile web developer, Android, or Andy as we started to call it, becomes a kind of IE6. The browser only gets updated with the OS, the OS isn't updated, so we're stuck with a browser that was okay a year ago but is a real pain today. On Android 2.1 you even have the same issue as on IE6: HTML5 elements like ‘header’ or ‘article’ aren't styled. At least on IE6, there's a workaround…
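
(For reference, the IE workaround alluded to is the well-known createElement trick: creating each unknown element once, before it appears in the markup, makes IE accept and style it.)

// classic IE shim for unknown HTML5 elements; run before the elements show up in the document
document.createElement("header");
document.createElement("article");
document.createElement("section");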

Anyway, the other category of bugs was reported much less frequently, and unit testing didn't help there either. What we learned is that client-side error logging is extremely valuable. Tools like Airbrake and Bugsense still have a long way to go, but writing a single-page web application without logging client-side errors means you never know about the thousands of errors your users get to see. Expect another post on that topic.
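
As a taste of what that means in practice, here is a minimal sketch of client-side error reporting; the /errors endpoint and payload are made up for illustration:

// report uncaught errors to the server; endpoint and fields are illustrative only
window.onerror = function(message, file, line) {
  // a plain image beacon works even when the rest of the app is broken
  new Image().src = "/errors?" + $.param({
    message: message,
    file: file,
    line: line,
    ua: navigator.userAgent
  });
  // let the browser still log the error to its own console
  return false;
};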

The Ugly

As long as mockjax did its job, we were happy with it. When it didn't, we had to look at the source, and we weren't happy anymore. The whole thing is quite a mess and in dire need of some serious refactoring. Still, in terms of features, alternatives like jQuery 1.5 custom transports or sinon.js just aren't on par, so we stuck with mockjax.

What we have now mostly given up on is PhantomJS. The Jenkins job that ran our QUnit tests via PhantomJS is currently disabled, as it kept failing for months. We spent several days overall trying to find the source of the one failing test before giving up. We still don't know why it was failing, and there were several hurdles that made it difficult to debug:

  • It failed only on our Jenkins server. Running the tests locally, using the same PhantomJS version, worked fine. The difference was the environment, with mostly OS X on developer machines but Debian Lenny on the Jenkins box. Sure, that's a problem, but the whole point of the tool is to provide a browser-like environment; it shouldn't matter what system it's running on.
  • We were stuck with PhantomJS 1.1, even after 1.2.x had been out for several months. While we could adapt to the completely backwards-incompatible API changes from 1.1 to 1.2, we didn't find any way around PhantomJS simply crashing on our test suite, with no useful output. If you're interested, you can find the debugging process somewhat documented in this Google Groups thread. Even debugging with gdb proved to be a waste of time. The unhelpfulness of PhantomJS when it fails to load a page is stunning.

So as nice as PhantomJS is, the combination of not being able to upgrade and not being able to fix the existing build forced us to abandon it. TestSwarm is a lot more interesting now that the Jenkins plugin exists. And with Chrome support coming to Selenium, that is an attractive short-term solution as well.

Epilogue

As you can see, this story isn't over yet. It seems to share a common theme with other developer tools, be it editors, bug trackers or testing tools: most of them do their job, but we aren't satisfied with any of them.

What are your experiences? What tools would you like to see improved, replaced or invented?