Big oh notation for heaps

2013-01-06

An answer to this question on Stack Overflow.

Question

I am trying to understand big oh notations. Any help would be appreciated.

Say there is a program that creates a max heap and then pushes and removes the item.

Say there is n items.

To create a heap,it takes O(n) to heapify if you have read it into an array and then, heapifies it. To push an item, it takes O(1) and to remove it, it takes O(1) To heapify it after that, it takes log n for each remove and n log n for n items

So the big oh notation is O(n + n log n) OR, is it O(n log n) only because we choose the biggest one.

Answer

Formally speaking, O(N + N log N) is equivalent to O(N log N).

That said, it's assumed that there are coefficients buried in each of these, ala: O( aN + bN log(cN) ). If you have very large N values, these coefficients become unimportant and the algorithm is bounded only by its largest term, which, in this case, is log(N).

But it doesn't mean the coefficients are entirely unimportant. This is why in discussions of graph algorithms you'll often see authors say something like "the Floyd-Warshall algorithm runs in O(N^3) time, but has small coefficients".

If we could somehow write O(0.5N^3) in this case, we would. But it turns out that the coefficients vary depending on how you implement an algorithm and which computer you run it on. Thus, we settle for asymptotic comparisons, not necessarily because it is the best way, but because there isn't really a good alternative.

You'll also see things like "Worst-case: O(N^2), Average case: O(N)". This is an attempt to capture how the behavior of the algorithm varies with the input. Often times, presorted or random inputs can give you that average case, whereas an evil villain can construct inputs that produce the worst case.

Ultimately, what I am saying is this: O(N + N log N)=O(N log N). This is true, and it's the right answer for your homework. But we use this big-O notation to communicate and, in the fullness of time, you may find situations where you feel that O(N + N log N) is more expressive, perhaps if your algorithm is generally used for small N. In this case, do not worry so much about the formalism - just be clear about what it is you are trying to convey with it.