The Hidden Powers of Total Program Cardinality (2 of 2)
Recap of part 1
Total Program Cardinality (TPC) remains a largely under-explored facet of software engineering, but it holds transformative potential. It is a static analysis measure representing the complete range of ways a program can pass the type checker. In part 1 we saw how it is inspired by the principles behind algebraic data types and presents an innovative approach to understanding and enhancing software quality.
We already showed how to calculate the TPC metric for data types and how the S.O.L.I.D. principles can be interpreted in terms of TPC. In this second part we will apply the principle to our function definitions, show how TPC makes sense of multiple language features across many different programming languages, and explain why support for generics can be such a deal breaker for many engineers.
Dealing with functions: The Exponents
The fact that functions are exponents means every possible input can independently map to any possible output: a function a -> b has cardinality |b|^|a|. We’ve already seen products and sums in the data types. Let’s now consider how TPC relates to them as method parameters.
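To make the exponent concrete, here is a minimal Scala sketch of my own (not from part 1) enumerating the entire implementation space of Boolean => Boolean; with |Boolean| = 2, TPC counts 2² = 4 possible functions:

// all four inhabitants of Boolean => Boolean: 2^2 = 4
val same:        Boolean => Boolean = b => b
val negated:     Boolean => Boolean = b => !b
val alwaysTrue:  Boolean => Boolean = _ => true
val alwaysFalse: Boolean => Boolean = _ => false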
Leaving aside higher order functions (functions that take functions as parameters), we need to consider both products (classes, structs) and sums (enums, interfaces) in parameter position. The case for products is the simplest, since products distribute over exponents, and developers and compilers can factor these out at will. This exercise goes by the names currying and uncurrying, partial application, or the builder pattern, all of which are useful and can be reflected in a TPC score.
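As a small illustrative sketch (the names are my own), currying a product parameter leaves the cardinality untouched, since c^(a·b) = (c^b)^a:

// uncurried: one product parameter, cardinality c^(a*b)
def area(width: Int, height: Int): Int = width * height

// curried: the same cardinality, factored as (c^b)^a
def areaCurried(width: Int)(height: Int): Int = width * height

// partial application falls out for free
val withWidth10: Int => Int = areaCurried(10)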
Exponents of sums, however, do not decompose so favourably: a^(b+c) equals the product a^b × a^c, not the sum a^b + a^c, and sums in exponents happen to be a very common scenario in software systems. Having a sum in an exponent is the same thing as taking an interface or an enum as a parameter. Whenever you have an enum as a parameter, TPC says you are better off writing individual functions for each enum value.
This conclusion is not unique to TPC; you will also notice many programming languages have explicit features and design patterns dedicated to supporting it. Extension methods, type classes, and polymorphism over class hierarchies are all work-arounds attempting to distribute a power of a sum into a sum of powers. This is immensely powerful, as it translates to fewer if or switch statements:
- aᵇ + aᶜ < aᵇ⁺ᶜ
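A small Scala 3 sketch of this inequality in practice (the Shape example is hypothetical):

enum Shape:
  case Circle, Square

// power of a sum, a^(b+c): one function must cover every case
def area(shape: Shape, size: Double): Double = shape match
  case Shape.Circle => math.Pi * size * size
  case Shape.Square => size * size

// sum of powers, a^b + a^c: one dedicated function per case, no branching
def circleArea(radius: Double): Double = math.Pi * radius * radius
def squareArea(side: Double): Double = side * side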
This reduction can be very dramatic for complex return values (a), and by now you might be convinced we have been optimizing for this fact without calling out TPC explicitly. Indeed, OOP introduced dynamic dispatch, a language feature that almost exclusively addresses this point. You will see this feature baked into enums themselves in multiple languages, and whenever you have a class hierarchy, the different object values serve to avoid the sum in the parameter.
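Dynamic dispatch achieves the same split. A minimal sketch, again with hypothetical types:

sealed trait Shape:
  def area: Double // each subtype contributes one small exponent

final case class Circle(radius: Double) extends Shape:
  def area: Double = math.Pi * radius * radius

final case class Square(side: Double) extends Shape:
  def area: Double = side * side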
You will also see an objective measure for the principle of writing small, re-usable methods: it is best to avoid taking an Option as a parameter and instead leverage Option’s map function, handing it a method that works on the plain, unwrapped value.
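For example, a sketch of my own:

// the sum (Some + None) sits in the exponent
def describe(maybeAge: Option[Int]): String = maybeAge match
  case Some(age) => s"$age years old"
  case None      => "unknown"

// the simpler method never sees None at all...
def describeAge(age: Int): String = s"$age years old"

// ...because map keeps the empty case out of its scope
val label: Option[String] = Some(42).map(describeAge)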
Extension methods can be used for a similar purpose. The beauty here is transforming complex parameter relationships into simplified interactions: control flow statements mostly disappear when you implement the techniques of this stage, and your methods require less backtracking to read. This stage might not be the most impactful in absolute terms, but it is probably the most surprising.
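To illustrate the extension-method point above with a hypothetical Payment type:

sealed trait Payment
final case class Card(number: String) extends Payment
final case class Cash(amount: Double) extends Payment

// scoped to the Card summand alone: Cash can never reach this body,
// so no if or switch is needed inside it
extension (card: Card)
  def masked: String = "****" + card.number.takeRight(4)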
An important side note should be made about pattern matching: syntax makes a difference in how well pattern matching can isolate an enum parameter per value. Definitions like
sign x | x > 0  = 1
       | x == 0 = 0
       | x < 0  = -1
and
take 0 _ = []
take _ [] = []
take n (x:xs) = x : take (n-1) xs
start to isolate the parameters before the body is read and do a good job of limiting what could possibly be written there. This can be contrasted with a conditional block, where the body must be entered to understand what must be done.
sign x =
  if x > 0 then 1
  else if x == 0 then 0
  else -1 -- x < 0
In essence, pattern matching is in the spirit of splitting an enum’s values: within each branch it is very clear that the other potential values of the enum are simply unavailable and out of scope while we implement a given feature. TPC reflects why we often recommend ‘early returns’, or switchboard methods.
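A minimal switchboard sketch (the event names are hypothetical):

sealed trait Event
final case class Clicked(x: Int, y: Int) extends Event
final case class Scrolled(delta: Int) extends Event

def handleClick(click: Clicked): Unit = println(s"click at ${click.x},${click.y}")
def handleScroll(scroll: Scrolled): Unit = println(s"scroll by ${scroll.delta}")

// the switchboard splits the sum exactly once; each handler's
// exponent contains only its own case
def handle(event: Event): Unit = event match
  case click: Clicked   => handleClick(click)
  case scroll: Scrolled => handleScroll(scroll)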
At this point, the bulk of the returns on following TPC-related recommendations have been reaped. In a previous article titled “Type System Savings” we followed a particular example; there the biggest win of using algebraic data types lowered TPC from 18,014,398,509,481,984 to 8,192.
If developers spend most of their time reading, thinking about, and understanding what their code does, then TPC is a measure of how much effort it takes to decipher what methods do. Moreover, TPC gives a quantitative measure which aligns surprisingly well with best-practice suggestions. DevInside recorded a video highlighting this distinction, and you will find that developers already programming constructively at this level refuse to go back.
Generics: The Threshold of Theorem Discovery
Drawing inspiration from Philip Wadler’s enlightening paper “Theorems for Free!”, generics illuminate the path to theorem discovery. Making a parameter generic effectively narrows down the cardinality of that parameter’s type, morphing it into what’s known as an existential type from the implementation’s point of view. It offers a fresh perspective, ensuring that for every type A, the corresponding method or proof remains unaltered.
It means you can prove there is only one possible function with this signature:
id :: a -> a
It is unique.
Informally, I can convince you this is true due to the nature of generics. By making the input parameter generic, we know nothing about what can be done with it; the only possible move is to return the value we were given, unchanged.
The same is true for a method I leveraged in a previous article:
fold :: (a -> c) -> (b -> c) -> Either a b -> c
There is only one way to get a value of type c during the implementation of this function, and that is the “theorem for free” Wadler was showcasing. In terms of maintainability, it means all these generic functions which have only one implementation become shorthand and immediately get cataloged in developers’ toolboxes.
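In Scala, a sketch of the single implementation this signature admits:

// parametricity leaves exactly one way to produce a C:
// apply the matching function to the contents of the Either
def fold[A, B, C](onLeft: A => C, onRight: B => C)(either: Either[A, B]): C =
  either match
    case Left(a)  => onLeft(a)
    case Right(b) => onRight(b)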
While this fourth stage, rooted in generics, might not elicit monumental shifts in sheer absolute terms, its implications cannot be dismissed. In the case above, it effectively halved the implementation space. In other situations, we might see more powerful reductions.
Although most of the community will stay at the previous three stages, reaping that easy 80% of the wins, distinctly efficient professionals (craftsmen) operate at the fourth stage of reduction. At this stage, the amount of re-use given to core functions is astronomical, and the body of our own functions naturally shrinks to about 3–5 lines.
What results from this perspective
The first thing worth sharing is how I addressed a bug in production by just reducing the cardinality of the codebase. When I was working on a booking system, we ran into an issue which allowed empty parties to be booked, causing an exception. But an empty party should not be representable to begin with, so converting this snippet:
case class Party(
adults: Int,
children: Int,
babies: Int
)
Into this:
import eu.timepit.refined.api.Refined
import eu.timepit.refined.numeric.{NonNegative, Positive}

case class Party(
  adults: Int Refined Positive,
  children: Int Refined NonNegative,
  babies: Int Refined NonNegative
)
immediately revealed the hidden bug by preventing its compilation. The problem was easily fixed, proving that a lower cardinality really does leave fewer places for bugs to hide.
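For completeness, a hedged sketch of how the refined library surfaces such a bug at compile time (assuming its eu.timepit.refined.auto._ literal conversions):

import eu.timepit.refined.auto._

val ok = Party(2, 1, 0) // compiles: the literals satisfy the refinements
// val bad = Party(0, 0, 0) // rejected: 0 is not Positive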
In this spirit, we have started adding static analysis rules to Philips’ open-source Roslyn analyzers. Rules like “Avoid returning null”, “Introduce generics”, and “Replace product with sum” are all guided by the reduction of TPC as the driving principle.
The adoption of Total Program Cardinality as a software quality metric offers profound insights and tools for developers. It not only reshapes our understanding of software, but also fortifies our strategies to design and evaluate robust, efficient, and elegant software solutions.
Connecting these insights with existing best practices bridges software maintainability, developer intuition, and commonly desired language features with a quantifiable, objective, and tractable quality metric.
Ultimately, this supports the core tenet of Software Excellence (my team) at Philips: Better, Faster, Forever, and it is key to the wider Philips goal of improving the lives of billions of people through better patient health and outcomes.