Properties of graph representations

With the incredible usefulness of graphs and the amount of algorithms to solve graph problems, it is an important issue how we represent these graphs. In this post I will introduce a couple of ways of representing graphs on a computer, together with their advantages and disadvantages. In following posts I will discuss the interesting algebraic properties of the adjacency matrix, as well as some more advanced ways of representing graphs.

We’ll turn to the two most used ones: adjacency matrices and adjacency lists. These are the “canonical” way of representing graphs. Both are quite simple data structures, yet their performance and behavior differs noticeably.

Adjacency Matrices and Adjacency Lists

For both of these structures, we will assume that we can number the nodes of our graphs as \(v_1, \cdots, v_n\). This is usually a reasonable assumption - if we have more data associated with a node, we can always store them in an array and use their index in this array as their “identity” for these two structures.

Adjacency Matrices

So let us say we have our nodes, \(V = \{v_1, \cdots, v_n\}\). We also have a certain set of edges, \(E = \{e_1, \cdots, e_m\}\), where each edge is of the form \(e_i = (u, v)\) for some nodes \(u, v\), The adjacency matrix for a graph \(G = (V, E)\) is a matrix \(A \in \{0, 1\}^{n \times n}\).

The entries of \(A\) are described simply:

\[ A_{i, j} = \begin{cases} 1, & \text{if } (v_i, v_j) \in E \\ 0, & \text{otherwise} \end{cases} \]

That is, the \(i, j\)th entry in A holds whether or not \(v_i\) and \(v_j\) are joined by an edge. Let’s look at a particular example:

Here, our nodes are labeled precisely \(1, \cdots, 5\). Algebraically, we have a graph \(G = (V, E)\) such that \(V = \{v_1, v_2, v_3, v_4, v_5\}\), and \(E = \{e_{1, 2}, e_{2, 1}, e_{2, 4}, e_{4, 2}, e_{3, 1}, e_{1, 3}, e_{3, 5}, e_{5, 3}, e_{2, 5}, e_{5, 2}, e_{1, 4}, e_{4, 1}\}\), with \(e_{i, j} = (v_i, v_j)\). Our adjacency matrix will be a \(5 \times 5\) matrix of \(0\)s and \(1\)s. Specifically, it will be this matrix:

\[ A = \begin{pmatrix} 0 & 1 & 1 & 1 & 0 \\ 1 & 0 & 0 & 1 & 1 \\ 1 & 0 & 0 & 0 & 1 \\ 1 & 1 & 0 & 0 & 0 \\ 0 & 1 & 1 & 0 & 0 \end{pmatrix} \]

Usually, one would store this matrix as an array of arrays, or, if one wishes a bit more abstraction, either as a matrix class, or using something like the C++ STL vector<vector<bool>>.

Some properties of this representation become apparent:

However, this structure does provide some interesting advantages. Namely:

Later on we will see a few more advanced uses for which adjacency matrices lend themselves well, and some cool properties of this structure using basic linear algebra.

For the time being, let’s inspect the other canonical representation of graphs.

Adjacency Lists

An adjacency list is an arguably more complicated structure than an adjacency matrix, but it’s still pretty simple. A good motivation for this data structure comes from the fact that, for sparse graphs, an adjacency matrix wastes a lot of space. Also, often we are not interested in all of the graph at the same time, but only on a specific part of it, namely the vicinity of a node (i.e. its neighbors).

If we take this into account, it’s natural to have, instead of a matrix (which can be seen as an array of arrays), an array of lists, such that the \(i\)th position in the array has a list of the \(v_i\)’s neighbors. In this way, if this list is a linked list, we only use space for the neighbors the node actually has, and not the ones it could have, as in the adjacency matrix.

To solidify this idea, let’s take another graph as an example.

As we said, we are going to have an array \(B\) of lists, where the list at \(B_i\) holds the neighbors of the node \(v_i\). For this example graph, the adjacency list would look like this:

\[B = \{[2, 3, 4, 7], [1, 6], [1, 5], [1, 6], [3], [4, 2], [1]\}\]

To use the information, one would go to the desired index in the array, and look at the list of adjacent nodes. So if \(4\) is in \(B_6\), that means \(v_4\) and \(v_6\) are neighbors.

First, a couple of benefits of this structure:

In algorithms such as DFS, BFS, Dijkstra’s, or Bellman-Ford, an adjacency list structure performs better than its matrix counterpart, because the main operation we will be doing is, given a node, traverse all of its neighbors and execute some subroutine.

As you can expect, however, this data structure does have some drawbacks.

As a result of these drawbacks, there are some algorithms that become extremely inefficient using adjacency lists, such as Floyd-Warshall, or algorithms where the goal is to check for the presence or absence of a given edge.

Conclusion

Those are the two canonical data structures for graph representations. Each one has its advantages and drawbacks, and there’s no superior data structure in every sense. It is a good exercise to write a short program demonstrating the use of an adjacency matrix versus an adjacency list for some common algorithms, say BFS/DFS, Dijkstra’s algorithm, Floyd and Warshall’s all-pairs-shortest-path algorithm, or Kruskal’s minimum spanning tree algorithm.