Complexity of Kruskal's algorithm
An answer to this question on Stack Overflow.
Question
I'm trying to find out the complexity of the disjoint-set cycle-detection part of Kruskal's algorithm.
As far as I understand, in Kruskal's we add vertices to disjoint sets and take the union of two sets whenever we add an edge whose endpoints lie in different sets. That's how we detect cycles: if both endpoints of an edge are already in the same set, we don't add that edge.
How do we check whether a vertex is in a set? It seems to me that this check takes O(n), so Kruskal's would be at least O(n^2), plus O(n log n) for sorting the edges.
For example, in this SO Answer, which deals with my problem too, they say that Kruskal's can run in O(V+E) when there are just two types of edges. I follow that part, but I don't get the thing with the disjoint sets; I think they make the complexity worse.
Answer
You can check whether a vertex is in a set by checking the disjoint-set container.
Disjoint sets support every operation in O(α(n)) amortized time. Here α is the inverse Ackermann function, which is less than 5 for any input you, or anyone else, will ever use.
A disjoint set is a forest of tree structures, where the root of each tree uniquely identifies a set. Disjoint sets achieve their black magic with two tricks. First, when two sets are joined (union by rank or by size), the shorter tree is always attached under the taller one, keeping paths to the root short. Second, whenever a membership check walks up to the root (a find), path compression re-points every node on that path directly at the root, so those nodes can have their membership checked in effectively O(1) time afterwards.
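To make this concrete, here is a minimal sketch of such a structure in Python (the class name `DisjointSet` and its method names are my own choices, not from any particular library), combining union by rank with path compression:

```python
class DisjointSet:
    """Union-find over elements 0..n-1, with union by rank and path compression."""

    def __init__(self, n):
        self.parent = list(range(n))  # each element starts as its own root
        self.rank = [0] * n           # upper bound on tree height

    def find(self, x):
        # Path compression: re-point every node on the walk directly at the root.
        if self.parent[x] != x:
            self.parent[x] = self.find(self.parent[x])
        return self.parent[x]

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False  # same set already: adding this edge would form a cycle
        # Union by rank: hang the shorter tree under the taller one.
        if self.rank[ra] < self.rank[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        if self.rank[ra] == self.rank[rb]:
            self.rank[ra] += 1
        return True
```

Checking "are u and v in the same set?" is then just `find(u) == find(v)`, which is exactly the cycle test Kruskal's needs.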
The upshot is that Kruskal's algorithm spends only O(α(n)) amortized time on each set operation, which is, practically speaking, the same as spending no time on it. The sorting step therefore dominates the running time.