How to design inserting to an infinite array
An answer to this question on Stack Overflow.
Question
Problem statement
Imagine we have an infinite array in which we store integers. When n elements are in the array, only the first n cells are used; the rest are empty.
I'm trying to come up with a data structure / algorithm that is capable of:
- checking whether an element is stored
- inserting a new element if it is not already stored
- deleting an element if it is stored
Each operation has to be in O(sqrt(n)).
Approach 1
I've come across this site, where the following algorithm was presented:
- The array is (conceptually) divided into subarrays of lengths 1, 4, 9, 16, 25, 36, 49, etc. Only the last subarray may be partially filled, so its occupied portion need not be a perfect square.
- When the subarrays are considered as sets, they are assumed to be in increasing order: every element of a heap further to the right is greater than any element in the heaps to its left.
- Each such subarray represents a binary heap - a max-heap.
- Lookup: walk the first index of each heap (positions 1, 2, 6, 15, ... for heap sizes 1, 4, 9, 16, ...), where each heap's max is stored, until you reach the first heap whose max is greater than or equal to your number. Then search that subarray / heap.
- Insert: do the lookup, then insert the element into the heap where it should be. When that heap is full, take its greatest element and move it to the next heap, and so on.
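The scheme above can be sketched as follows. This is my own toy code (the names are mine, not from the linked site), and it substitutes sorted Python lists for the max-heaps to keep things short - but the square bucket sizes, the ordering invariant, and the spill-into-the-next-bucket cascade are the same:

```python
import bisect

# Toy sketch of the bucket scheme. Bucket i holds at most (i + 1)**2
# elements, and every element of bucket i is smaller than every
# element of bucket i + 1.
class SquareBuckets:
    def __init__(self):
        self.buckets = []

    def _cap(self, i):
        return (i + 1) ** 2

    def contains(self, x):
        for b in self.buckets:      # scan bucket maxima left to right
            if b[-1] >= x:          # first bucket whose max >= x
                j = bisect.bisect_left(b, x)
                return j < len(b) and b[j] == x
        return False

    def insert(self, x):
        if self.contains(x):
            return
        if not self.buckets:
            self.buckets.append([x])
            return
        for i, b in enumerate(self.buckets):
            if b[-1] >= x or i == len(self.buckets) - 1:
                bisect.insort(b, x)
                break
        # cascade: while a bucket overflows, spill its max rightwards
        while len(self.buckets[i]) > self._cap(i):
            spill = self.buckets[i].pop()
            i += 1
            if i == len(self.buckets):
                self.buckets.append([])
            bisect.insort(self.buckets[i], spill)
```

Each insert can cascade through every bucket to its right, and each step also pays a per-bucket cost - that per-bucket cost is where the extra log factor comes from in the heap version.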
Unfortunately, this solution is O(sqrt(n) * log(n)).
How to make it pure O(sqrt(n))?
Idea 2
Since every operation requires a lookup anyway, I imagine the insertion and deletion steps themselves should each be O(1) on top of it. Just a guess - and probably once insertion is worked out, deletion will be obvious.
Clarification
What does the infinite array mean?
Basically, you can store any number of elements in it - it is infinite. However, there are two restrictions: first, one cell can store only one element; second, when the array currently holds n elements, only the first n cells may be used.
What about the order?
It does not matter.
Answer
Create a linked list of k arrays, each representing a hash table.
Following the idea from the site linked in the question, size the hash tables to hold 1, 4, 9, 16, 25, 36, 49, ... elements.
With k hash tables, the data structure therefore holds N = k*(k+1)*(2*k+1)/6 = O(k^3) elements, by the well-known closed form for the sum of the first k squares.
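As a quick sanity check (my own snippet, not part of the original answer), the closed form matches the direct sum:

```python
# Sum of the first k squares equals k*(k+1)*(2*k+1)/6 -- the total
# capacity of k hash tables sized 1, 4, 9, ..., k**2.
for k in range(1, 50):
    assert sum(i * i for i in range(1, k + 1)) == k * (k + 1) * (2 * k + 1) // 6
```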
You can then successively search each hash table for an element. Hash lookup, insert, and delete each run in O(1) expected time (assuming separate chaining, so deletions are handled gracefully), and since k = O(N^(1/3)) < sqrt(N), this fulfills the time requirements of your algorithm.
If a hash table is full, append an additional one to the linked list; if a hash table becomes empty, remove it from the list (and add it back later if necessary). Insertion and deletion in a doubly linked list are O(1), so this does not affect the time complexity.
Note that this improves on other answers, which suggest a single flat hash table, because no rehashing is ever required as the data structure grows.
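A minimal sketch of the whole structure, using Python sets as stand-ins for the separately chained hash tables (all names are mine, and the repacking step in delete is one way - my assumption, not stated in the answer - to keep only the first n cells occupied, as the question requires):

```python
# Table i has capacity (i + 1)**2, so k tables hold k(k+1)(2k+1)/6
# elements and every operation touches at most k = O(N**(1/3)) tables.
class SquareHashTables:
    def __init__(self):
        self.tables = []               # conceptually the linked list

    def _cap(self, i):
        return (i + 1) ** 2

    def contains(self, x):
        return any(x in t for t in self.tables)    # O(k) expected

    def insert(self, x):
        if self.contains(x):
            return
        for i, t in enumerate(self.tables):
            if len(t) < self._cap(i):  # first table with free space
                t.add(x)
                return
        self.tables.append({x})        # all full: grow the list

    def delete(self, x):
        for i, t in enumerate(self.tables):
            if x in t:
                t.discard(x)
                # repack: pull an element from the last table so that
                # only the first n cells stay in use
                last = len(self.tables) - 1
                if last != i and self.tables[last]:
                    t.add(self.tables[last].pop())
                if not self.tables[-1]:
                    self.tables.pop()  # drop the now-empty table
                return
```

Only the last table is ever partially filled, so insert and delete each inspect at most k tables and do O(1) expected work per table.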