I've been on hiatus for a while due to work at and studies, but I thought I'd take a quick break and post about some interesting stuff I've been playing around with, and that's spectral clustering.
I'll try to assume no previous knowledge of clustering data, so I'll give a brief introduction to that.
The post goes into the k-means clustering algorithm, spectral clustering, and a C++ implementation of both.
Recently the students for the class I TA in had to solve a couple of exercises involving the boolean satisfiability problem, also known as SAT. I'd like to talk about some aspects of it which, while by no means new, should be interesting for someone being introduced to the problem.
I had often heard the expression "C++ now has covariant return types". To be honest, I vaguely knew covariance was a mathematical term, and I was also pretty sure it had nothing to do with statistical covariance.
Now that I understand a bit more about category theory, I feel I can clarify some things about this usage of the word "covariance" in programming languages. Surprisingly, it is not a terrible bastardization of the mathematical term, as C++ and Java have done to the word "functor".
With the incredible usefulness of graphs and the amount of algorithms to solve graph problems, it is an important issue how we represent these graphs. In this post I will introduce a couple of ways of representing graphs on a computer, together with their advantages and disadvantages. In following posts I will discuss the interesting algebraic properties of the adjacency matrix, as well as some more advanced ways of representing graphs.
We'll turn to the two most used ones: adjacency matrices and adjacency lists. These are the "canonical" way of representing graphs. Both are quite simple data structures, yet ...
So. 7:30 am on a cold, winter monday morning. What better time to talk about programming?