A categorical view of covariance and contravariance in C++
First off, I will give an extremely brief introduction to the necessary category theoretical notions. Interested readers are advised to continue their research in other sources, like Mac Lane’s classic Categories for the Working Mathematician, or to read about the topic on Wikipedia. For the casual passer-by, this should not be a terribly difficult introduction.
Then, I’ll try to make clear the parallels between the C++ notion of return type covariance and the categorical reasons for that name. It’s not a formal relation; it is simply intuition borrowed from category theory. Indeed, the term actually had to pass through the machinery of type theory first.
Basic category theory
Categories
Categories are mathematical constructions, aimed at extracting general theorems from various, apparently unrelated structures.
Definition. A category \(C\) has a class of things called its objects, denoted \(Ob(C)\), and a class of arrows (think of functions) from one element of \(Ob(C)\) to another, not necessarily distinct, element of \(Ob(C)\), called \(Mor(C)\), or its morphisms. For any \(X, Y \in Ob(C)\), the family of morphisms from \(X\) to \(Y\) is called \(Hom_C(X, Y)\). Thus,
\[ Mor(C) = \bigcup_{X, Y \in Ob(C)} Hom_C(X, Y) \]
When the context is clear, we will omit the \({}_C\) and simply say \(Hom(X, Y)\). Additionally, a category must also have a way to compose its morphisms, which we call \(\circ\).
Any category \(C\) must also verify some additional axioms:
- For every \(X \in Ob(C)\), there must exist an identity morphism \(Id_X \in Hom(X, X)\), such that for every \(Y \in Ob(C)\) and every \(f \in Hom(X, Y)\), \(Id_Y \circ f = f = f \circ Id_X\).
- The binary composition operation, \(\circ\), must be such that \(\circ : Hom(Y, Z) \times Hom(X, Y) \to Hom(X, Z)\).
- Composition of morphisms must be associative. Thus, for any four \(W, X, Y, Z \in Ob(C)\), and any three morphisms \(f \in Hom(Y, Z), g \in Hom(X, Y), h \in Hom(W, X)\), the following holds:
\[ (f \circ g) \circ h = f \circ (g \circ h) \]
As a few examples, we can present these:
- The category of groups, Grp, where the objects are groups, and the morphisms are group homomorphisms.
- The category of vector spaces over a field \(k\), Vec\({}_k\), where the objects are vector spaces, and the morphisms are linear transformations.
- The category of topological spaces, Top, where the objects are topological spaces, and the morphisms are continuous maps.
- The category of sets, Set, where the objects are sets, and the morphisms are functions.
- The category of rings, Ring, where the objects are rings with identity, and the morphisms are ring homomorphisms which preserve the identity.
- The category of abelian groups, Ab, where the objects are abelian groups, and the morphisms are abelian group homomorphisms.
As an example, let’s take this last category, Ab. Two objects in Ab could be \(X = (\mathbb{Z}, +), Y = (\mathbb{Z}/20\mathbb{Z}, +)\). We could have an abelian group homomorphism \(f \in Hom(X, Y)\), such that \(f(a) = a \pmod{20}\). We also have \(Z = (\mathbb{Z}/5\mathbb{Z}, +)\). We can then consider a morphism \(g \in Hom(Y, Z)\), such that \(g(a+20\mathbb{Z}) = a \pmod{5}\). Then, by the composition law \(\circ\), this gives us a third abelian group homomorphism \(h \in Hom(X, Z)\), such that \(h = g \circ f\).
In the category of \(\mathbb{R}\)-vector spaces, for instance, we could have \(X = \mathbb{R}[X_1, \dots, X_n]\), the polynomial ring viewed as a real vector space. An element of \(Hom(X, X)\) (which we call an endomorphism) could be the partial derivative operator \(\partial/\partial X_1: X \to X\). It is easy to show that it is linear, and hence such a morphism.
If you want to get more comfortable, pick any of those categories, and try to find some objects and morphisms between them (if they exist). In general, try to grab your favorite structures (modules, rings, algebras, magmas, monoids, whatever), and see if you can make a category out of them. Then play around with their morphisms to see how some behave. This notion of category theory doesn’t introduce anything you didn’t already know about those structures, yet.
When you’ve got some, not necessarily much, intuition about categories, we can move on to an additional bit of abstraction: the functor.
Functors
Functors are, in some sense, a way to show that a specific construction in one category is an analogue of some other construction in some other category: a way to make parallels between categories.
In short, I’ll define them, and then continue with some examples to build intuition.
Definition. Given two categories \(C\) and \(D\), a functor \(F\) from \(C\) to \(D\) is a pair of functions \(F_{Ob}: Ob(C) \to Ob(D)\), \(F_{Mor} : Mor(C) \to Mor(D)\), satisfying the following laws:
For any \(X\) in \(Ob(C)\), \[ F_{Mor}(Id_X) = Id_{F_{Ob}(X)} \]
For any \(f \in Hom_C(X, Y), g \in Hom_C(Y, Z)\), \[ F_{Mor}(g \circ f) = F_{Mor}(g) \circ F_{Mor}(f) \]
Implicit in these laws is that for any \(f \in Hom_C(X, Y)\), \(F_{Mor}(f) \in Hom_D(F_{Ob}(X), F_{Ob}(Y))\); otherwise the compositions above would not even typecheck.
We will now see a few examples of these. As a sidenote, I find it useful to see examples as soon as I see a definition in category theory, as they tend to be quite abstract.
Let’s try a functor from Grp, the category of groups, to Ab, the category of abelian groups. There is a common construction in group theory, called the commutator. Quite simply, for a group \(G\) and \(g, h \in G\), we define \([g, h] = g^{-1} h^{-1} g h\), and we define \([G, G]\) as the subgroup generated by all the commutators. In some sense, for those unfamiliar, \([G, G]\) measures how far away \(G\) is from being abelian. It is a normal subgroup, it is easy to see that \(G/[G, G]\) is always abelian, and indeed \([G, G]\) is the smallest normal subgroup of \(G\) whose quotient is abelian. We also see that, just like with any quotient, this induces a quotient map \(\pi: G \to G/[G, G]\).
Let us then define \(F\), a functor from Grp to Ab, with \(F_{Ob}(X) = X/[X, X]\). For any \(f \in Hom_{Grp}(X, Y)\) (writing \(\pi_X: X \to X/[X, X]\) and \(\pi_Y: Y \to Y/[Y, Y]\) for the quotient maps), the composite \(\pi_Y \circ f\) lands in an abelian group, so it sends every commutator of \(X\) to the identity and therefore factors uniquely through \(\pi_X\): there is a uniquely determined \(f' \in Hom_{Ab}(F_{Ob}(X), F_{Ob}(Y))\) such that \(\pi_Y \circ f = f' \circ \pi_X\). We then let \(F_{Mor}(f) = f'\). Note that when \(Y\) is abelian, \([Y, Y]\) is the trivial subgroup, so \(F_{Ob}(Y) \cong Y\). I leave it to you as a simple exercise to check that, indeed, \(F\) is a functor from Grp to Ab.
There is a simpler functor, from Ab to Grp, that is simply the inclusion. It is trivial to check that it is a functor.
For categories like the ones above, whose objects are sets with extra structure, we have a functor \(F: C \to\) Set, to the category of sets. This functor simply forgets the added structure of \(C\)’s objects, and maps them to their underlying sets. This is called the forgetful functor.
Given two monoids \(m\) and \(n\), we can form one-object categories \(M\) and \(N\) (the single object’s morphisms are the monoid’s elements, and composition is the monoid operation), and then a functor \(F: M \to N\) is simply a monoid homomorphism. As you may guess, there is a sense in which functors are the morphisms in the “category of categories”, so to speak. This construction takes a bit more care to avoid paradoxes similar to Russell’s, but the idea that functors are structure-preserving category morphisms is well founded and can be formalized.
We also have endofunctors, which will be our next topic of study. These are functors from a category to itself. For instance:
- The identity functor, for any category, which does the obvious thing.
- The powerset functor, \(\mathcal{P} :\) Set \(\to\) Set.
- The double-dual functor in Vec\({}_k\), which takes a vector space \(V\) to its double-dual, \(V^{**}\).
The single dual, \(V \to V^*\), is a bit trickier, because it inverts the arrows it is given: for a vector space morphism \(f: V \to W\), it gives us \(f^* : W^* \to V^*\), not a function from \(V^*\) to \(W^*\). We call these contravariant functors, while the normal kind are called covariant functors. We will see more of this later, but the definition of functor we’ve given can accommodate contravariant functors just as it accommodates covariant ones: a contravariant functor from \(C\) to \(D\) is simply a covariant functor from \(C^{op}\), the category obtained by formally reversing all of \(C\)’s arrows, to \(D\). It is, however, useful to see this new definition:
Definition. A contravariant functor is one that inverts the arrows of its morphisms. That is, if \(F: C \to D\), and \(f \in Hom_C(X, Y)\), then \(F_{Mor}(f) \in Hom_D(F_{Ob}(Y), F_{Ob}(X))\), as opposed to \(Hom_D(F_{Ob}(X), F_{Ob}(Y))\). Accordingly, the composition law flips: \(F_{Mor}(g \circ f) = F_{Mor}(f) \circ F_{Mor}(g)\).
Again, play around with some categories, and try to find functors between them, or inside them (endofunctors).
A category of types
In a typed programming language, one could imagine the category of its types, where the objects are types, and the morphisms are functions. This is exemplified in Haskell, where the category Hask is such a category of the language’s types.
Haskell
Indeed, Haskell also has the notion of a Functor. However, the functors Haskell describes as Functors are only a subset of the endofunctors of Hask. A typical example is the List functor. It is a functor because for every type \(T\) in Hask, it associates the type \([T]\), of lists of \(T\), and for every morphism in Hask, it gives you a corresponding morphism that works on lists. Indeed, this morphism is the one returned by map f, where f was our morphism.
If we recall, the type signature for map was the following:

(a -> b) -> [a] -> [b]

Due to how -> associates (it is right-associative), this is the same as this type signature:

(a -> b) -> ([a] -> [b])
Here we see, perhaps more clearly, what is happening when map works its magic. It receives a function from a to b, and gives us a function from [a] to [b]. If we recall, a functor’s \(F_{Mor}\) did exactly this: given a morphism \(f: X \to Y\), it gave us a morphism \(F_{Mor}(f): F_{Ob}(X) \to F_{Ob}(Y)\). Here, \(F\) is the List functor, and we can see that:
map id == id
Since mapping the identity over each element in a list does nothing to it, and
map (f.g) == map f . map g
Since evaluating g on each element, and then evaluating f on each result, is the same as evaluating f . g on each element (recall that (f . g) x = f (g x)).
And thus, List is an endofunctor of Hask.
Now, we can consider a functor we’ve been using all along, perhaps without realizing it: the arrow. -> takes two types, say a and b, and produces the type of the functions from a to b, namely (a -> b). In short, and calling this functor \(F\):
\[ F: \textrm{Hask} \times \textrm{Hask} \to \textrm{Hask} \]
Remember this functor, since we will use it in a short while.
(For the category theory nerds, this is analogous to Hom).
C++
Now, C++ has no such notion of a category of its types, but that does not stop us from imagining it. We may call this category Cpp, where the objects are C++ types, and the morphisms are C++ functions. We won’t bother ourselves with whether these functions are total or partial - for simplicity, we will assume all functions are total. See the appendix Cpp category details for more information.
We can again see how we can have a function declaration b f(a);, where f is a function that takes a value of type a and produces a value of type b. Now, if we have a vector<a>, we can easily map this f over it, producing a vector<b>. In this way, we also have a way to convert b f(a) into vector<b> ff(vector<a>), and this conversion respects the same laws we expect from any functorial mapping.
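As a minimal sketch of this lifting (the template fmap_vector and the names show, xs, ys are mine, purely for illustration; this is not a standard library facility):

#include <string>
#include <vector>

// A hypothetical "F_Mor" for the vector functor: given a morphism f : a -> b,
// produce the lifted morphism vector<a> -> vector<b>.
template <typename A, typename B>
std::vector<B> fmap_vector(B (*f)(A), const std::vector<A>& xs) {
    std::vector<B> ys;
    ys.reserve(xs.size());
    for (const A& x : xs) {
        ys.push_back(f(x));  // apply f element by element
    }
    return ys;
}

// Our morphism b f(a): here, int -> std::string.
std::string show(int x) { return std::to_string(x); }

int main() {
    std::vector<int> xs = {1, 2, 3};
    // The lifted morphism, of type vector<int> -> vector<string>.
    std::vector<std::string> ys = fmap_vector(show, xs);
    return 0;
}

One can check, just as with map in Haskell, that lifting the identity gives the identity and that lifting a composite is the composite of the liftings.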
C++’s type system is not as outgoing with its categorical notions, but we will see that the terminology it uses does show some categorical insight.
Virtual functions
You may recall that C++ has virtual functions, which can be overridden by derived classes. An example would be:
class X {};
class Y : public X {};
class A {
public:
virtual X* f() {return new X;}
};
class B : public A {
public:
virtual X* f() {return new Y;}
};
Now, before 1998, C++ users needed to return an X* from the overriding B::f, even if they knew the object actually being returned was an instance of a subclass of X. With C++98, the following declaration for B::f became valid:
virtual Y* f() {return new Y;}
Now, even though the signatures of A::f and B::f don’t exactly match, the override is accepted: the two functions take the same parameters, and the derived class’s version returns a pointer to a subclass of what the base class’s version returns.
In other words, what became allowed is that if Y is a child class of X, then in some sense, “functions which return Y*” are children of “functions which return X*”. This is called return type covariance. Note that it’s not a literal, type-based fact: in our previous notation, a -> Y* isn’t an actual subtype of a -> X*, but the compiler allows the override to go through. In the following section, we will see why this is called covariance.
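To see what this buys the caller, here is a self-contained sketch reusing the classes above (the main function, the virtual destructors, and the deletes are my additions, just so the example is complete and well behaved):

class X {
public:
    virtual ~X() {}
};
class Y : public X {};

class A {
public:
    virtual ~A() {}
    virtual X* f() { return new X; }
};

class B : public A {
public:
    // Covariant return type: Y* instead of X*.
    virtual Y* f() { return new Y; }
};

int main() {
    A* a = new B;
    X* x = a->f();  // Through the base interface we only learn that we got an X*.

    B b;
    Y* y = b.f();   // Through the derived type, no cast is needed to get a Y*.

    delete x;
    delete y;
    delete a;
    return 0;
}

The override still slots into A’s interface, but code that knows it is dealing with a B gets the more precise return type for free.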
The arrow functor
Remember the functor \(F : \textrm{Hask} \times \textrm{Hask} \to \textrm{Hask}\)?
Well, it likely won’t surprise you that we can think of it in C++ as well. Simply put, we can consider, in the category Cpp, the functor \(Func : \textrm{Cpp} \times \textrm{Cpp} \to \textrm{Cpp}\), which takes two types a and b and returns the type of “functions which take an a and return a b”, which I’ll denote as a -> b.
Here’s the central point of this whole post: This functor is covariant in the second argument, and contravariant in the first argument.
What do I mean by this?
Suppose that you had a type A and a subtype B. We will denote this relationship as B <: A.
Then, if we fix the first parameter of the arrow functor, let’s say at some type t \(\in Ob(Cpp)\), we have that B <: A implies (t -> B) <: (t -> A). This isn’t strictly so since, as we said, C++ does not consider these function types to be literal subtypes. However, it works in the sense of the previous section: overriding virtual functions. Also note that this only works with pointer or reference return types; it does not work if you return class instances by value. I will omit this detail from here on.
Consider, then, the relation B <: A as “I can use a B anywhere I can use an A”. For instance, if Foo <: Bar, any method that requires a Bar can receive a Foo since, in particular, every Foo is a Bar. Thus, when a class declares a virtual function that returns an A*, the covariance of the arrow functor is what lets an override return a B*: whoever calls this overriding function f only asked for an A* as the result, so even if it is handed a B* instead, the call still works, because the caller can only rely on properties of the result it already knows it has. In other words, it can only use the result’s A-like aspects, and our result has those, on top of its B-specific aspects.
Try to let that sink in for a few minutes. B <: A implies (t -> B) <: (t -> A). This is covariance in the second parameter of ->.
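C++ does not literally treat (t -> B) as a subtype of (t -> A), but the same direction of substitutability shows up elsewhere in the language, for instance in what std::function is willing to wrap. The sketch below only illustrates that direction (the names A, B and make_b are mine); it is not the override mechanism itself:

#include <functional>

struct A { virtual ~A() {} };
struct B : A {};  // B <: A

B* make_b() { static B b; return &b; }  // a morphism of type () -> B*

int main() {
    // A callable of type () -> B* is accepted wherever a () -> A* is expected:
    // covariance in the result type.
    std::function<A*()> f = make_b;
    A* a = f();  // the caller only relies on the A-like aspects of the result
    (void)a;
    return 0;
}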
What we could also consider is that this functor is contravariant in its first parameter, although current C++ will disagree with us. Why? First, let us say it symbolically: fixing a type t \(\in Ob(Cpp)\), B <: A implies (A -> t) <: (B -> t). That means that if I need to provide a function that takes a B and returns a t, I might as well provide a function that takes an A and returns a t. Why? Because whoever uses this function will give me an instance of B, and I’m supposed to return an instance of t. But if my function can return instances of t already, even when only given an A (which is a more general type than B), then I’m perfectly fine. Whatever my function uses of its type-A parameter will also work when someone hands it a B, because Bs, in particular, are As.
Thus, if B <: A, then (A -> t) <: (B -> t). Note how the order is now reversed. This is why the first parameter of -> is called contravariant, while the second parameter of -> is called covariant.
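Current C++ will not accept a parameter-contravariant override (a derived-class function whose parameter type is widened does not override; it simply hides the base function), but std::function again illustrates the reversed relation: a callable taking the more general parameter is accepted where one taking the more specific parameter is expected. The names here are mine:

#include <functional>

struct A { virtual ~A() {} };
struct B : A {};  // B <: A

int use_a(A*) { return 0; }  // a morphism of type A* -> int

int main() {
    // A callable of type A* -> int is accepted wherever a B* -> int is expected:
    // contravariance in the parameter type.
    std::function<int(B*)> g = use_a;
    B b;
    g(&b);  // the wrapped function is handed a B*, which is, in particular, an A*
    return 0;
}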
In fact, there is a more formal sense to this. Every subtype relationship can be witnessed by some function. For instance, in Haskell, the function fst :: (a, b) -> a acts as a witness that (a, b) is a subtype of a: whenever I am asked for an a, I can take an (a, b), precompose fst with the desired function, and it will work. fst, in this case, is the arrow that is preserved or inverted when one has covariance or contravariance, respectively. In essence, <: can be expressed by morphisms, and the -> functor can either lift such a morphism and keep its direction, in which case we call this covariance, or lift it while reversing its direction, in which case we call this contravariance.
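In C++, the witness for B <: A is the implicit derived-to-base pointer conversion. Writing that conversion out as an ordinary function makes the two liftings visible as plain composition. This is a sketch with made-up names (witness, make_b, describe); nothing here is language machinery:

#include <string>

struct A { virtual ~A() {} };
struct B : A {};  // B <: A

// The witness of B <: A, made explicit: it is just the implicit conversion.
A* witness(B* b) { return b; }

B* make_b(int) { static B b; return &b; }    // a morphism t -> B*   (t = int)
std::string describe(A*) { return "an A"; }  // a morphism A* -> t   (t = string)

// Covariance: post-composing with the witness lifts (int -> B*) to (int -> A*).
A* make_a(int x) { return witness(make_b(x)); }

// Contravariance: pre-composing with the witness lifts (A* -> string) to (B* -> string).
std::string describe_b(B* b) { return describe(witness(b)); }

int main() {
    B b;
    A* a = make_a(0);
    std::string s = describe_b(&b);
    (void)a; (void)s;
    return 0;
}

Covariance is the case where the witness is applied after the lifted arrow, so its direction is kept; contravariance is the case where it is applied before, so the direction flips.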
Conclusion
If you haven’t read anything about category theory before, this might be a bit tough to swallow. It is definitely a subject that takes some getting used to. Here I have only explained the basics of this area of mathematics, but I encourage you to do research on your own about it. It really has some very beautiful and general results.
I was glad to find that C++ has not butchered another category theory term the way it did “functor”, but actually uses covariance and contravariance about as close to their technical definitions as possible. It also makes one see how the theory of programming languages can be viewed from a category theoretical point of view, and how Haskell has embraced quite a lot of this. While Haskell does not have subtyping, its category theoretical roots were invaluable in trying to explain how the -> functor works, and why it can be considered both covariant and contravariant.
I hope you’ve enjoyed this mess of an explanation; I certainly think I should have divided this post into several, and explained things piece by piece. However, I couldn’t wait to post about this when I learned about it, so you’ll have to forgive my poor information organization.
Happy coding!
Appendix: Cpp category details
During the post, I handwaved a bit when I talked about the Cpp category of C++ types. Indeed, I never proved that it is actually a category. I will now attempt to prove that, but I will also make a clarification: the functoriality of -> in C++ is simply a way to see where the notion of “covariance” comes from. I’m not trying to say that the implementation of covariance actually follows the strict mathematical notion of a functor. Indeed, C++ implementations are about as far away from category theory as one can imagine :)
First off, I will restrict myself to pointer types. Indeed, when talking about C++ return type covariance, the standard dictates that it only takes effect when the return type is either a pointer or a reference, so covariance does not kick in when returning class instances by value.
Secondly, I choose to ignore non-total functions. This can be solved (and is solved in actual type theory - for more information, check Haskell’s notion of “bottom”) by adding an additional value to each type, representing a computation that never finishes; domain theory is the area of mathematics that makes this precise.
Lastly, I want to clarify again that the subtyping relation I’m talking about is not the one the language semantics define. Indeed, even if B <: A, (t -> B) and (t -> A) don’t really have a subtype relationship in C++, at least not as per the language semantics. What this post talks about is (t -> B) being able to substitute for (t -> A) when overriding a virtual function’s signature, when inheriting from a class that has declared a function of type (t -> A).
Having said that, let’s formalize Cpp a bit. We need to say what the objects are, what the morphisms are, and prove the axioms of a category. \(Ob(\textrm{Cpp})\), as we said, consists of the pointer types of C++. Here we find int*, vector<char>*, char*(*)(int*) (pointers to functions taking an int* and returning a char*), and so on. Morphisms are the total functions which take and return pointers. I believe, though I’m not sure, that I need not ask them to be pure (i.e. not state-modifying). Since C++ doesn’t really like passing functions themselves around, we’ll make the morphisms the pointers to these functions.
For every object t*, there is the identity function t* id(t* x) { return x; }; the corresponding morphism would be &id. Composition is simply application in the given order - \((f \circ g)(x) = f(g(x))\) - and it is associative, since both \(((f \circ g) \circ h)(x)\) and \((f \circ (g \circ h))(x)\) equal f(g(h(x))).
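As a small sketch of these two ingredients (the templates id and compose are mine; the sketch uses std::function rather than raw function pointers, since C++ cannot manufacture a new function pointer for a composite at runtime):

#include <cassert>
#include <functional>

// The identity morphism on the object t*.
template <typename T>
T* id(T* x) { return x; }

// Composition of morphisms: (f . g)(x) = f(g(x)).
template <typename A, typename B, typename C>
std::function<C*(A*)> compose(std::function<C*(B*)> f, std::function<B*(A*)> g) {
    return [f, g](A* x) { return f(g(x)); };
}

int main() {
    std::function<int*(int*)> f = [](int* p) { return p; };
    std::function<int*(int*)> idInt = id<int>;
    int n = 42;

    // Identity laws, checked on one input: id . f == f == f . id.
    assert(compose(idInt, f)(&n) == f(&n));
    assert(compose(f, idInt)(&n) == f(&n));
    return 0;
}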
Thus Cpp, as defined, is a category.
I won’t prove ->‘s functoriality formally here. It is much more of a mess than I’m willing to get into at the moment (I started proving it and needed to use the type (s*(*(*)(t*))(q*))…), and expressing that a method can override another in a derived class is quite a mess. Furthermore, it wouldn’t clarify anything - the point is to understand that return type covariance is the notion of B <: A implying (t -> B) <: (t -> A), where ‘<:’ formally means subtyping, but in our comparison means “is a subclass of” on the first occurrence and “can be used to override a virtual function with this signature” on the second. Likewise, parameter type contravariance would be the notion that, if B <: A, then (A -> t) <: (B -> t). I think adding formalism at this point is useless; the idea is to get the intuition for why it is called “covariance”.