Properties of graph representations

With the incredible usefulness of graphs and the amount of algorithms to solve graph problems, it is an important issue how we represent these graphs. In this post I will introduce a couple of ways of representing graphs on a computer, together with their advantages and disadvantages. In following posts I will discuss the interesting algebraic properties of the adjacency matrix, as well as some more advanced ways of representing graphs.

We’ll turn to the two most used ones: adjacency matrices and adjacency lists. These are the “canonical” way of representing graphs. Both are quite simple data structures, yet their performance and behavior differs noticeably.

Adjacency Matrices and Adjacency Lists

For both of these structures, we will assume that we can number the nodes of our graphs as \(v_1, \cdots, v_n\). This is usually a reasonable assumption - if we have more data associated with a node, we can always store them in an array and use their index in this array as their “identity” for these two structures.

Adjacency Matrices

So let us say we have our nodes, \(V = \{v_1, \cdots, v_n\}\). We also have a certain set of edges, \(E = \{e_1, \cdots, e_m\}\), where each edge is of the form \(e_i = (u, v)\) for some nodes \(u, v\), The adjacency matrix for a graph \(G = (V, E)\) is a matrix \(A \in \{0, 1\}^{n \times n}\).

The entries of \(A\) are described simply:

\[ A_{i, j} = \begin{cases} 1, & \text{if } (v_i, v_j) \in E \\ 0, & \text{otherwise} \end{cases} \]

That is, the \(i, j\)th entry in A holds whether or not \(v_i\) and \(v_j\) are joined by an edge. Let’s look at a particular example:

Here, our nodes are labeled precisely \(1, \cdots, 5\). Algebraically, we have a graph \(G = (V, E)\) such that \(V = \{v_1, v_2, v_3, v_4, v_5\}\), and \(E = \{e_{1, 2}, e_{2, 1}, e_{2, 4}, e_{4, 2}, e_{3, 1}, e_{1, 3}, e_{3, 5}, e_{5, 3}, e_{2, 5}, e_{5, 2}, e_{1, 4}, e_{4, 1}\}\), with \(e_{i, j} = (v_i, v_j)\). Our adjacency matrix will be a \(5 \times 5\) matrix of \(0\)s and \(1\)s. Specifically, it will be this matrix:

\[ A = \begin{pmatrix} 0 & 1 & 1 & 1 & 0 \\ 1 & 0 & 0 & 1 & 1 \\ 1 & 0 & 0 & 0 & 1 \\ 1 & 1 & 0 & 0 & 0 \\ 0 & 1 & 1 & 0 & 0 \end{pmatrix} \]

Usually, one would store this matrix as an array of arrays, or, if one wishes a bit more abstraction, either as a matrix class, or using something like the C++ STL vector<vector<bool>>.

Some properties of this representation become apparent:

If the graph is very sparse (few edges), this representation wastes a lot of space: most of the matrix will be zeros because for any given two nodes, it’s unlikely that there’s an edge between them.
The matrix will always be symmetric for graphs. That is, \(A_{i, j} = A_{j, i}\ \forall\ 1 \le i, j \le n\), or, equivalently, \(A^t = A\). Likewise, since auto-edges are not allowed on graphs, it is always true that \(A_{i, i} = 0\ \forall\ 1 \le i \le n\).
It can also be seen that in order to know the degree of a node \(v_i\), we must compute \(\sum_{j = 1}^n A_{i, j}\). This operation is \(\Theta(n)\) in time.
The space one needs to store this matrix is \(\Theta(|V|^2)\). One could get clever and, since this is a binary matrix, take advantage of a the binary architecture of today’s computers, and store 8 elements on a single word (assuming 8 bit bytes). Due to a number of reasons, this ends up being suboptimal in terms of runtime. The most noticeable of these reasons are code complexity, cache performance, and inability to simply iterate through elements (since we address memory locations as words, not bits).

However, this structure does provide some interesting advantages. Namely:

Knowing if a particular edge exists between \(v_i\) and \(v_j\) is \(O(1)\), since we simply check \(A_{i, j}\) and see if there’s a \(1\) or a \(0\).
For dense graphs (lots of edges), we waste very little space. What’s more, the overhead imposed by the data structure itself is minimal.
If we are traversing the graph by rows, locality of reference will give us a big advantage in terms of caching, if our underlying data structure can take advantage of it (a simple array will do, or C++’s vector as well). The first time we ask for the \(i\)th row, a lot of entries will be stored in the computer’s fast cache memory. Next time we ask for an element of that same row (and, since we’re traversing the nodes row by row, this should be soon) we will get the information straight from the cache, and not have to refer to main memory.

Later on we will see a few more advanced uses for which adjacency matrices lend themselves well, and some cool properties of this structure using basic linear algebra.

For the time being, let’s inspect the other canonical representation of graphs.

Adjacency Lists

An adjacency list is an arguably more complicated structure than an adjacency matrix, but it’s still pretty simple. A good motivation for this data structure comes from the fact that, for sparse graphs, an adjacency matrix wastes a lot of space. Also, often we are not interested in all of the graph at the same time, but only on a specific part of it, namely the vicinity of a node (i.e. its neighbors).

If we take this into account, it’s natural to have, instead of a matrix (which can be seen as an array of arrays), an array of lists, such that the \(i\)th position in the array has a list of the \(v_i\)’s neighbors. In this way, if this list is a linked list, we only use space for the neighbors the node actually has, and not the ones it could have, as in the adjacency matrix.

To solidify this idea, let’s take another graph as an example.

As we said, we are going to have an array \(B\) of lists, where the list at \(B_i\) holds the neighbors of the node \(v_i\). For this example graph, the adjacency list would look like this:

\[B = \{[2, 3, 4, 7], [1, 6], [1, 5], [1, 6], [3], [4, 2], [1]\}\]

To use the information, one would go to the desired index in the array, and look at the list of adjacent nodes. So if \(4\) is in \(B_6\), that means \(v_4\) and \(v_6\) are neighbors.

First, a couple of benefits of this structure:

The space used is, without counting overhead, less than or equal to the space used by an adjacency matrix. In particular, it is \(O(|E|)\).and \(|E|\) itself is \(O(|V|^2)\). For sparse graphs, this can incur in huge space savings.
Knowing the degree of a node is usually an \(O(1)\) time operation, since most implementations of lists have their length available in constant time. In this case, to know the degree of a node \(v_i\), we return \(length(B_i)\).
It is quite frequent that we wish to conduct the operation “given a node, execute some action for each of its neighbors”. Traversing neighbors in an adjacency list is quite efficient, since you only traverse the neighbors you have, you don’t need to ask each time if the neighbor exists, as you do with an adjacency matrix.

In algorithms such as DFS, BFS, Dijkstra’s, or Bellman-Ford, an adjacency list structure performs better than its matrix counterpart, because the main operation we will be doing is, given a node, traverse all of its neighbors and execute some subroutine.

As you can expect, however, this data structure does have some drawbacks.

Asking whether or not two nodes \(v_i\) and \(v_j\) share an edge is expensive. Specifically, it can be done in \(O(\min(|B_i|, |B_j|))\), since you might as well check the smaller of the two lists for the presence of the other node.
For dense graphs, you gain little with respect to an adjacency matrix. Traversing with an adjacency matrix is only prohibitively expensive if most times you’re going to encounter a \(0\), meaning when you have a sparse graph. But for a dense graph, you lose the advantage of the adjacency list.
Removing edges is more expensive than in an adjacency matrix, because you don’t have direct access to the edges. You must first find it by traversing the adjacency list of one of its endpoints. If one also insists in keeping these neighbors sorted in the adjacency list (not needed in the general case), then adding edges is again more expensive than in adjacency matrices.

As a result of these drawbacks, there are some algorithms that become extremely inefficient using adjacency lists, such as Floyd-Warshall, or algorithms where the goal is to check for the presence or absence of a given edge.

Conclusion

Those are the two canonical data structures for graph representations. Each one has its advantages and drawbacks, and there’s no superior data structure in every sense. It is a good exercise to write a short program demonstrating the use of an adjacency matrix versus an adjacency list for some common algorithms, say BFS/DFS, Dijkstra’s algorithm, Floyd and Warshall’s all-pairs-shortest-path algorithm, or Kruskal’s minimum spanning tree algorithm.