An introduction to differential geometry, specifically the concepts needed to understand how automatic differentiation works in modern machine learning frameworks.

We’ll rigorously build from topological spaces up to the Jacobian-vector product and vector-Jacobian product, through smooth manifolds, tangent and cotangent spaces, and the notions of pullback and pushforward.

The first part of a three-part series about convolutional neural networks, implemented using JAX + Flax. This part tries to classify the `stanford_dogs`

dataset using the VGG19 net, achieving around 50% test accuracy. We build everything from the ground up, as an introduction to the JAX framework.

In this post I propose a dense version of Kruskal’s algorithm that I have not seen in the literature. It has a runtime of \(\Theta(n^2)\), which is better than \(O(m \log m)\), the runtime of the standard version of the algorithm, when \(m \in \Theta(n^2)\).

Experimentation shows the runtime of a simple implementation to be similar to Prim’s algorithm for the dense case, and a marked improvement over the usual, sparse version of Kruskal.

AVL trees are data structures for ordered element storage and retrieval. They are a common implementation of sets and maps. The advantage they provide over regular binary search trees is they bound the height of the tree, resulting in logarithmic insertion, removal, and retrieval, and linear traversal.

In this post we will see how one can implement such a structure in a strongly typed language - in particular, Haskell - with one important note: the height invariant of this tree will be known to hold at compile time.

It’s relatively frequent in programming competitions that one needs to compute large terms of a linear recurrence with constant coefficients, like the Fibonacci sequence, modulo some number.

In this post we’ll see a bit about how to compute these terms fast, exploring not only the programming side, but the mathematics behind it that makes it work.

In this post I analyze a generalized version of generalized version of rock-paper-scissors, to any \(2n+1\) choices, \(n \ge 1\). I also look at a specific playing rule, which is “best-of-three”, meaning the two players play until one has two victories. I then generalize this to \(k\) victories, \(k \ge 1\).

In this post I introduce the notion of spectral clustering, and link to a small C++ impementation. I discuss the formalism and mathematical motivation for it as a relaxation of an NP-hard problem, and finally show an example of its behavior on a typical dataset on which \(k\)-means has bad behavior.

The boolean satisfiability problem, or SAT in computational complexity jargon, acts as a cornerstone of the NP complexity class. For every \(k \in \mathbb{N}\), \(k\)-SAT is a restriction of SAT where each clause has exactly \(k\) terms.

In this post I analyze the boundary between P and NP that occurs as we increase \(k\) in \(k\)-SAT, for \(1 \le k \le 3\). I also prove that \(3\)-SAT is already NP-complete.

In this post we take a look at how the mathematical concepts of variance and contravariance appear in the guise of a programming language construct.

Braess’ paradox is the notion that adding free edges to a network where traffic flows through with a given cost per edge, may actually *increase* the average cost to flow through the network. This counterintuitive result is based on Game Theory, since it can be modelled as a suboptimal Nash equilibrium.

In this post I show a classical example of it, and I introduce the notion of the price of anarchy.

In the third and last graph representation post, I discuss another linear algebraic representation of a graph, the incidence matrix. I also show some theorems which are easily proved using this representation.

In the second graph representation post, I talk a bit about one of the simpler ways of representing a graph: an adjacency matrix This representation is not only useful computationally, it has some interesting algebraic properties as well.

In this post I’ll introduce this representation, and show some proofs which are naturally expressed using this way.

In the first of three posts on graph representations, I give a brief summary and contrast of adjacency lists and matrices, the two classical ways of representing graphs in programming.

In this post I’ll show a brief introduction to dynamic programming, using the classical knapsack problem. I’ll go over top-down and bottom-up dynamic programming, and a proof of optimality for this problem. For the example code, I’ve used a mishmash of Python, pseudocode, and C.