Isolation makes tests tautological

Simultaneous Circles, 1934 - Robert Delaunay

There are basically two ways of writing unit tests: the Classical approach and the Mockist approach. They can also be viewed as black-box and white-box approaches, or as behaviour tests and implementation tests.

The Classical approach tests the behaviour of a module (unit) by passing test data to its public API and verifying that the returned output or the generated side effects are as expected. It also replaces dependencies which cannot easily be used in tests (say, because they make network calls and would make tests too slow and unreliable) with simple fake implementations. It does strive, however, to use as much real code and as little fake code as feasible. The Mockist approach, on the other hand, tries to make the unit under test as small as possible, usually a single class or even a single method, and relies heavily on mocking frameworks to isolate it from its dependencies by mocking the interactions.

A lot has been written about why the Mockist approach turned out not to be a good one, and there now seems to be general agreement among the leading thinkers in software development that the Classical approach is superior. One of the drawbacks [1] of isolation by mocking is that it tends to make tests tautological. It is a particularly interesting drawback, and a lot has been written about it as well, but the thesis of those articles seems to be only that mocking tends to make tests tautological. I will argue two things here: that it is isolation, and not mocking per se, that makes tests tautological, and that it doesn’t merely tend to make them tautological, it necessarily makes them so.

Tautological tests

All tests have an expected part and an actual part [2]. Tautological tests are tests whose expected and actual parts are equivalent in some respect, which makes them useless or meaningless in that respect. They pass even if the implementation of the module under test is wrong. The simplest form of tautological test is one where the expected and actual parts are exactly the same:

assertEquals(
    new Sum(3, 4).toString(), // expected
    new Sum(3, 4).toString()  // actual
);

It is obvious why this test is completely useless. Whether the implementation of Sum is correct or not, the test necessarily passes (for simplicity, let’s assume there is no randomness and no exceptions). A non-tautological test would look like this:

assertEquals(
    "7",                     // expected
    new Sum(3, 4).toString() // actual
);

Here the expected part is completely different from the actual part. The expected part is a literal, while the actual part is the evaluation of an expression which contains the object under test.

A slightly more subtle form of tautological test is one where the expected part is just a copy-paste of the implementation under test. Since the two are equivalent (the developer just copy-pasted his implementation), these tests also pass even if the implementation is wrong. E.g.:

assertEquals(
    String.valueOf(3 + 4),
    new Sum(3, 4).toString()
);

Expected and actual parts are not exactly the same here, but they are equivalent in the sense that the essential part of the implementation (a + b) was just copy-pasted into the expected part of the test. The developer just assumed his algorithm must be correct, and so he used the algorithm to express the expected value. If the algorithm used for the implementation was wrong, e.g. a - b, and it was copy-pasted into the expected part, the test would still pass, because expected and actual would still evaluate to the same (wrong) value:

assertEquals(
    String.valueOf(3 - 4),
    new Sum(3, 4).toString()
);
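To make the copy-paste case concrete, here is a minimal self-contained sketch. BuggySum and the two check methods are hypothetical names invented for this illustration; they stand in for a Sum whose implementation subtracts instead of adding.

```java
// Hypothetical demo; BuggySum is an illustration, not code from a real project.
final class TautologyDemo {

    /** A deliberately wrong Sum: it subtracts instead of adding. */
    static final class BuggySum {
        private final int a;
        private final int b;

        BuggySum(int a, int b) {
            this.a = a;
            this.b = b;
        }

        @Override
        public String toString() {
            return String.valueOf(a - b); // bug: should be a + b
        }
    }

    /** The expected value copies the (wrong) algorithm, so the check cannot fail. */
    static boolean copyPastedCheck() {
        return String.valueOf(3 - 4).equals(new BuggySum(3, 4).toString());
    }

    /** The expected value is an independent literal, so the bug is exposed. */
    static boolean literalCheck() {
        return "7".equals(new BuggySum(3, 4).toString());
    }

    public static void main(String[] args) {
        System.out.println("copy-pasted expectation passes: " + copyPastedCheck());
        System.out.println("literal expectation passes: " + literalCheck());
    }
}
```

The copy-pasted check passes against the broken implementation, while the check against the literal "7" fails, which is exactly what a useful test should do.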

A third form of tautological test involves isolation and (usually) mocking. When an implementation depends on other objects, mockists tell us we can (should?) isolate the class under test by mocking out its interactions with its dependencies. If each class is completely isolated and thoroughly tested (so the argument goes), then we can just focus on the single class we are implementing by mocking out all of its dependencies, because we can trust they work correctly. This supposedly makes testing simpler. It doesn’t; it makes tests tautological. The expected part of such tests includes not only the expected output, but also the expected behaviour of all dependencies. If the dependencies are replaced with mocks which are set up in the expected part, then the expected part is equivalent to the actual part in the sense that the mocked dependencies used in the actual part will behave exactly the way they were prepared to behave in the expected part. The second form of tautological test described above involved the developer’s assumption that his algorithm is correct and can be used in both the expected and actual parts. The third form involves an assumption about the behaviour of the dependencies, and that assumed behaviour is used in both the expected and actual parts. If the real behaviour of these dependencies differs from the assumed one, making the implementation wrong, the tests still pass. I will demonstrate this with an example.

Example: biometric alarm system lock

Let’s say we need to implement an alarm system lock with face recognition. It has a camera which can capture a person’s face. If that face is authorised, it should unlock the door; otherwise it shouldn’t. We have the following three abstractions we can use for the implementation:

public interface Camera {
    Face capture();
}
public interface AuthorisedFaces {
    boolean matches(Face face);
}
public interface Emotion {
    boolean visibleIn(Face face);
}

Let’s assume for the purposes of this example that all implementations of these interfaces are thoroughly tested and work correctly. We could implement our alarm system this way:

public final class CorrectAlarmSystem {
    private final Camera camera;
    private final AuthorisedFaces authorised;

    // Constructor...

    public boolean unlock() {
        return authorised.matches(camera.capture());
    }
}

Here we use Camera and AuthorisedFaces. We don’t use Emotion. When a client calls unlock(), we capture a face with the Camera and match it against the list of authorised faces. If it matches, we unlock the door, otherwise we don’t. This is a correct way to implement it, and we will use tests to verify it. But first, let’s write a wrong implementation. This time we will not use AuthorisedFaces; we will use a specific implementation of Emotion instead.

public final class WrongAlarmSystem {
    private final Camera camera;
    private final MurderousRage emotion;

    // Constructor...

    public boolean unlock() {
        return emotion.visibleIn(camera.capture());
    }
}

This is a very bad way to implement an alarm system. It will unlock the door if it detects MurderousRage emotion in the captured face. You don’t want this in your home. Our testing approach must catch the error.

Isolationist - Mockist approach

Let’s see what an Isolationist-Mockist test would look like.

@Test
void unlocks() {
    Camera camera = Mockito.mock(Camera.class);
    Mockito.when(camera.capture()).thenReturn(testFace());
    MurderousRage emotion = Mockito.mock(MurderousRage.class);
    Mockito.when(emotion.visibleIn(any(Face.class))).thenReturn(true);
    assertTrue(
        new WrongAlarmSystem(camera, emotion).unlock()
    );
}

This is a positive test. The negative test looks the same, just inverted.

@Test
void doesNotUnlock() {
    Camera camera = Mockito.mock(Camera.class);
    Mockito.when(camera.capture()).thenReturn(testFace());
    MurderousRage emotion = Mockito.mock(MurderousRage.class);
    Mockito.when(emotion.visibleIn(any(Face.class))).thenReturn(false);
    assertFalse(
        new WrongAlarmSystem(camera, emotion).unlock()
    );
}

Both tests pass, even though the implementation of WrongAlarmSystem is wrong. The assumption of the developer was that when the door is to be unlocked, Emotion#visibleIn(Face) will return true. This is a stupid assumption, but that is not the point. When programming in the real world, we all make assumptions that might be less stupid, but are nonetheless wrong. Good tests should catch those wrong assumptions. These tests did not catch it because the wrong assumption is in both the expected part and the actual part. In the expected part the assumption is there when the mocks are being set up, and in the actual part the assumption is there when the mocks are being used. The tests are tautological.

Classical approach

Classical tests attempt to use as much of the real code as possible, and only fake the components which are too impractical to use in tests. In our case this would be Camera, because the real implementation uses actual hardware and we don’t want to depend on it when running tests. We will use the real MurderousRage object, which implements Emotion, with its real detection algorithm.

@Test
void unlocks() {
    assertTrue(
        new WrongAlarmSystem(
            new FakeCamera(authorisedFace()),
            new MurderousRage()
        ).unlock()
    );
}

@Test
void doesNotUnlock() {
    assertFalse(
        new WrongAlarmSystem(
            new FakeCamera(unauthorisedFace()),
            new MurderousRage()
        ).unlock()
    );
}

Notice that we have to use real faces for Classical tests, because these tests use real recognition algorithms. For Isolationist-Mockist tests any images (or even just byte buffers, depending on technical details) could have been used, because these tests don’t do anything with them.

Since we need the wrong assumption of the developer to be caught, at least one of the classical tests has to fail. And it is very likely that this is the case. Both the positive and the negative test can pass only if the authorised face used by the test happens to look very angry, and the unauthorised one does not. This is extremely unlikely, and the likelihood decreases even more as additional tests with more faces are written. The Isolationist-Mockist tests, on the other hand, necessarily pass because they are tautological.
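The following self-contained sketch illustrates why the classical tests fail against the wrong implementation. The Face interface with an anger() score, the 0.9 threshold and this simplified MurderousRage are assumptions made only so the example can run; the real detection algorithm is not shown in this post.

```java
// Hypothetical stand-ins: Face.anger() and the 0.9 threshold are assumptions.
final class ClassicalDemo {

    interface Face { double anger(); }
    interface Camera { Face capture(); }
    interface Emotion { boolean visibleIn(Face face); }

    /** Simplified "real" dependency: detects rage from an assumed anger score. */
    static final class MurderousRage implements Emotion {
        @Override
        public boolean visibleIn(Face face) {
            return face.anger() > 0.9;
        }
    }

    /** Fake for the only impractical dependency: returns a preset face. */
    static final class FakeCamera implements Camera {
        private final Face face;

        FakeCamera(Face face) {
            this.face = face;
        }

        @Override
        public Face capture() {
            return face;
        }
    }

    static final class WrongAlarmSystem {
        private final Camera camera;
        private final MurderousRage emotion;

        WrongAlarmSystem(Camera camera, MurderousRage emotion) {
            this.camera = camera;
            this.emotion = emotion;
        }

        boolean unlock() {
            return emotion.visibleIn(camera.capture());
        }
    }

    /** Mirrors the positive test: passes only if the door unlocks. */
    static boolean positiveTestPasses() {
        Face calmAuthorised = () -> 0.1;
        return new WrongAlarmSystem(new FakeCamera(calmAuthorised), new MurderousRage()).unlock();
    }

    /** Mirrors the negative test: passes only if the door stays locked. */
    static boolean negativeTestPasses() {
        Face calmUnauthorised = () -> 0.2;
        return !new WrongAlarmSystem(new FakeCamera(calmUnauthorised), new MurderousRage()).unlock();
    }

    public static void main(String[] args) {
        System.out.println("positive test passes: " + positiveTestPasses());
        System.out.println("negative test passes: " + negativeTestPasses());
    }
}
```

With two calm test faces the negative test passes but the positive one fails, so the wrong assumption is caught. Both would pass only if the authorised test face happened to score above the rage threshold.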

Classical tests for CorrectAlarmSystem

If you are curious, I will show what Classical tests for the right implementation should look like. We will fake the part of the AuthorisedFaces implementation which connects to a database, because a real database would make our tests too slow. We will use the rest of the actual AuthorisedFaces implementation. This is what it looks like:

public final class SimpleAuthorisedFaces implements AuthorisedFaces {
    private final AuthorisedFacesStore store;

    // Constructor...

    @Override
    public boolean matches(Face face) {
        return store.list().stream()
            .anyMatch(bytes ->
                new SimpleFace(bytes).difference(face) < 7
            );
    }
}

In order to write a unit test where the unit is as large as reasonably possible, we need to think about three things: what behaviour we actually want to test, what inputs into the unit under test we need to provide in order to observe this behaviour, and what outputs would verify that the behaviour is as expected. What we actually want to test is that if the camera captures a face which is in the database of authorised faces, the door must be unlocked. So the inputs need to be: some face which the camera would return and another, similar enough (or the same) face which would be among the authorised ones. The output must be true.

@Test
void unlocks() {
    AuthorisedFacesStore store = new FakeAuthorisedFacesStore();
    Face face = authorisedFace();
    store.add(face.bytes());
    assertTrue(
        new CorrectAlarmSystem(
            new FakeCamera(faceSimilarTo(face)),
            new SimpleAuthorisedFaces(store)
        ).unlock()
    );
}

We create a FakeAuthorisedFacesStore and add a test face. We set up the FakeCamera to return a face similar to the one in the store. We expect the result to be true.

The negative test sets the camera up to return a face completely different from the one in the store and expects the door not to unlock.

@Test
void doesNotUnlock() {
    AuthorisedFacesStore store = new FakeAuthorisedFacesStore();
    Face face = authorisedFace();
    store.add(face.bytes());
    assertFalse(
        new CorrectAlarmSystem(
            new FakeCamera(unauthorisedFace()),
            new SimpleAuthorisedFaces(store)
        ).unlock()
    );
}
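The FakeAuthorisedFacesStore used in these tests is not shown above. A minimal in-memory sketch could look like the following. The shape of the AuthorisedFacesStore interface (add(byte[]) and list() returning a List of byte arrays) is an assumption inferred from how the store is called in the snippets; the real interface may differ.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; the interface shape is inferred, not taken from real code.
final class FakeStoreDemo {

    /** Assumed contract of the store which the real implementation backs with a database. */
    interface AuthorisedFacesStore {
        void add(byte[] face);
        List<byte[]> list();
    }

    /** In-memory fake: keeps faces in a list instead of a database. */
    static final class FakeAuthorisedFacesStore implements AuthorisedFacesStore {
        private final List<byte[]> faces = new ArrayList<>();

        @Override
        public void add(byte[] face) {
            // Defensive copy, so later changes to the caller's array don't leak in.
            faces.add(face.clone());
        }

        @Override
        public List<byte[]> list() {
            // Return copies, so callers cannot modify the stored faces.
            List<byte[]> copy = new ArrayList<>();
            for (byte[] face : faces) {
                copy.add(face.clone());
            }
            return copy;
        }
    }

    public static void main(String[] args) {
        AuthorisedFacesStore store = new FakeAuthorisedFacesStore();
        store.add(new byte[] {1, 2, 3});
        System.out.println("stored faces: " + store.list().size());
    }
}
```

The fake contains no matching logic of its own. It only replaces storage, so the real comparison in SimpleAuthorisedFaces still runs in the test.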

Further considerations

Developers are smarter than that?

Do we really need classical tests to tell us that an alarm system which unlocks the door for a person who looks like he wants to murder you is wrong? No, but in real-world situations we usually don’t get to choose between clear abstractions such as AuthorisedFaces and Emotion. Real codebases are much larger, quite anemic (the Mockist approach very often goes hand in hand with anemic codebases), and the result of many different people trying to implement tasks as fast as possible, without enough design thinking, under the pressure of deadlines. It’s quite likely we will have to choose between components such as FaceService, FaceValidator, FacialAttributesBuilder, FaceTemplatesEngine and CombinedFaceValidatorProviderFactory, each with ten or twenty public methods, operating on data structures (usually incorrectly called “DTO”s) which look more or less the same except for slight differences in data fields. Can you be sure you picked the right service method / data structure combination, or would you like your tests to help you verify that? If you just mock everything to work as you expect, the test will be useless in this regard. It will tell you whether you implemented your class the way you intended to, but it won’t check the assumptions behind that intention. What you want is a test which runs your class together with as much of the real code as possible, because if your intention is wrong (you chose the wrong component), the more of the real code runs together with your wrong implementation, the higher the probability that something will go wrong and the test will fail.

Isolation or mocking?

Technically, mocking is not the essence of the problem (it is the essence of other problems I did not mention in this post) - isolation is. We can write equally tautological tests even by avoiding mocking frameworks and using fake objects - as long as we isolate the class under test from its real dependencies. If the assumptions behind our implementation are wrong, there will be no real code to check them - our fake objects will operate according to the same wrong assumptions.
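As a rough illustration, the sketch below reproduces the tautological tests from the example without any mocking framework. CannedEmotion is a hypothetical hand-written fake, and this WrongAlarmSystem variant accepts any Emotion (an assumption made so that the fake can be injected).

```java
// Hypothetical sketch: a hand-written fake isolates the class just like a mock does.
final class IsolationDemo {

    interface Face { }
    interface Camera { Face capture(); }
    interface Emotion { boolean visibleIn(Face face); }

    /** Same wrong logic as in the example, but depending on the Emotion interface. */
    static final class WrongAlarmSystem {
        private final Camera camera;
        private final Emotion emotion;

        WrongAlarmSystem(Camera camera, Emotion emotion) {
            this.camera = camera;
            this.emotion = emotion;
        }

        boolean unlock() {
            return emotion.visibleIn(camera.capture());
        }
    }

    /** Fake that answers whatever the test told it to, encoding the test's assumption. */
    static final class CannedEmotion implements Emotion {
        private final boolean answer;

        CannedEmotion(boolean answer) {
            this.answer = answer;
        }

        @Override
        public boolean visibleIn(Face face) {
            return answer;
        }
    }

    /** Positive test: passes if the door unlocks when the fake says "true". */
    static boolean positiveTestPasses() {
        Face anyFace = new Face() { };
        return new WrongAlarmSystem(() -> anyFace, new CannedEmotion(true)).unlock();
    }

    /** Negative test: passes if the door stays locked when the fake says "false". */
    static boolean negativeTestPasses() {
        Face anyFace = new Face() { };
        return !new WrongAlarmSystem(() -> anyFace, new CannedEmotion(false)).unlock();
    }

    public static void main(String[] args) {
        System.out.println("positive test passes: " + positiveTestPasses());
        System.out.println("negative test passes: " + negativeTestPasses());
    }
}
```

Both checks pass against the wrong implementation even though no mocking library is involved, because the fake simply echoes the assumption that the test set up.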

Test Driven Development

Test Driven Development can help, but not sufficiently. If we write tests first, we will be forced to think more. Most tautological tests are written when the developer who can’t do TDD is under time pressure. He is tempted to write the implementation the way he imagines it should work, and then just jot down some tests blindly mocking the interactions of his implementation with its dependencies. If his assumptions are wrong, these tests won’t catch it. TDD could help him clarify his assumptions and make it more likely his implementation will be correct, but it will not make his tests better, as long as isolation remains. Even if TDD corrects his assumptions, they will still be reflected in both expected and actual parts of the test, and the test will still be tautological.

Summary

Good tests should be able to catch two types of mistake by the developer: they should fail if he implemented the object differently than he intended to, and they should fail if he implemented the object exactly the way he intended to, but the assumptions behind his intention were wrong. Isolating the object under test from its dependencies makes the test unable to catch the second type of mistake. Even if the developer’s assumptions are wrong, they will be reflected in both the expected and the actual parts of the test, making the test tautological, and therefore useless.

  1. There are many more serious drawbacks, but I will not get into them here. I encourage the reader to see the other articles I have linked to. 

  2. Even though BDD tests have three parts (given, when, then), “given” translates to “expected”, and “when” plus “then” translates to “actual”. 

Written on February 2, 2020