Unit Tests give you Observability!
Recently I was giving a talk about Observability in the context of maintaining stable production environments for highly distributed, micro-service based systems, and I was struck by the claim in the title of this post: unit tests give you observability! System tracing, logging and metrics are not the only things that help you peek into what ultimately drives the behavior of the systems we operate; they are just one more set of tools towards that end.
Observability is a concept developed in Control Theory. Informally, it tells us how much we can learn about what is going on inside a system, and therefore how well we can predict the effect of a change, just by watching what comes out of it.
The definition on Wikipedia is more precise: Observability is a measure of how well the internal states of a system can be inferred from knowledge of its external outputs.
Unit tests are a dynamic analysis technique: the code is studied by actually running it under a controlled scenario. This is similar to other techniques such as performance testing or memory and CPU usage profiling; the biggest difference is in how the results are analyzed. Writing unit tests amounts to automating the check that the system's behavior, driven by its internal state, matches some expected output. Once we define them this way, it is easy to see how unit tests grant us a form of Observability over the software systems we write.
If you augment your automated test strategy with slightly more advanced techniques like Mutation Testing, you probe much deeper into the potential internal states of your application, both in terms of control flow changes (mutants) and in terms of how you would expect those changes to break your stated system invariants. What you gain is predictive power over how a change in the internal operations of the system will affect its external outputs. Textbook observability and controllability.
Concretely, mutation testing is a method of evaluating test quality by injecting bugs into the code and seeing whether the tests detect the fault. Seen from the perspective of observability, what mutation testing does is expose how well your unit tests can observe changes in control flow: the conditions tested in IF statements, the details of loops, or function return values. Every time you kill a mutant, you prove observability over the particular detail of the control structure in question.
A practical way to see how observability comes from testing is to look at the surviving mutants. For each surviving mutant there is one internal property we were unable to tease out from the output of the tests alone. You can think of surviving mutants as a measure of the unobservable subspace of the written program. For example, if I change a conditional in my code under test, I have altered the internal behavior of the program, and I would expect a unit test with actual observability to discern this change.
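To make the changed-conditional example concrete, here is a minimal hand-rolled sketch in Python. Everything in it is hypothetical and made up for illustration (the is_adult function, its mutant, and the two tests); a real mutation testing tool would generate and run the mutants for you. It only shows the idea: one test cannot see the changed conditional, the other can.

```python
def is_adult(age):
    """Original code under test (hypothetical example)."""
    return age >= 18


def is_adult_mutant(age):
    """Hand-written mutant: the conditional '>=' was changed to '>'."""
    return age > 18


def weak_test(fn):
    """Checks values far from the boundary only."""
    return fn(30) is True and fn(5) is False


def boundary_test(fn):
    """Checks the values on either side of the boundary."""
    return fn(18) is True and fn(17) is False


def mutant_killed(test, original, mutant):
    # A mutant counts as killed when the test passes on the original code
    # but fails on the mutated code.
    return test(original) and not test(mutant)


weak_kills = mutant_killed(weak_test, is_adult, is_adult_mutant)
boundary_kills = mutant_killed(boundary_test, is_adult, is_adult_mutant)

print("weak_test kills the mutant:", weak_kills)          # False: mutant survives
print("boundary_test kills the mutant:", boundary_kills)  # True: mutant is killed
```

With only weak_test in the suite the mutant survives, so the exact position of the boundary is part of the unobservable subspace. Adding boundary_test makes that detail observable.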
You could even argue that the value of any given unit test increases with the number of mutants it kills.
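As a rough sketch of that idea, the snippet below counts how many mutants each test kills. The clamp function, the three hand-written mutants and the three tests are all assumptions made up for this illustration, and the raw kill count is only one possible way to score a test.

```python
def clamp(x, lo, hi):
    """Original code under test (hypothetical): restrict x to [lo, hi]."""
    if x < lo:
        return lo
    if x > hi:
        return hi
    return x


def mutant_wrong_return(x, lo, hi):
    if x < lo:
        return hi  # mutation: return value changed from lo to hi
    if x > hi:
        return hi
    return x


def mutant_flipped_condition(x, lo, hi):
    if x < lo:
        return lo
    if x < hi:  # mutation: '>' became '<'
        return hi
    return x


def mutant_dropped_branch(x, lo, hi):
    # mutation: the lower-bound check was removed
    if x > hi:
        return hi
    return x


# Each test takes the function under test and returns True when it passes.
TESTS = {
    "test_inside": lambda fn: fn(5, 0, 10) == 5,
    "test_below": lambda fn: fn(-5, 0, 10) == 0,
    "test_above": lambda fn: fn(15, 0, 10) == 10,
}

MUTANTS = [mutant_wrong_return, mutant_flipped_condition, mutant_dropped_branch]

# Sanity check: every test must pass on the original code.
assert all(test(clamp) for test in TESTS.values())

# A test kills a mutant when it fails on the mutated code.
kill_counts = {
    name: sum(1 for mutant in MUTANTS if not test(mutant))
    for name, test in TESTS.items()
}

print(kill_counts)  # {'test_inside': 1, 'test_below': 2, 'test_above': 1}
```

In this toy suite test_below observes two distinct internal changes while the other two tests each observe one, so by this measure it is the most valuable of the three.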
In conclusion
Eventually I realized that Chaos testing is to deployed micro-services what mutation testing is to service library code. More importantly, a good way to evaluate how useful your unit tests are is to gauge the observability they grant you.
Even though Observability, and Control Theory in general, entered the conversation through complex, global, cloud or micro-service deployments, the reality is that the connection between software and control theory is much more profound and has existed in enterprise software development for much longer.