Python- How to find the height of standard binary search tree?
An answer to this question on Stack Overflow.
Question
I've a json file of type [{"score": 68},{"score": 78}]
I need to find the height of standard binary search tree that is made using the scores of all the objects. How can I do it?
This is what I'm doing. I'm first getting all the scores and storing inside the json file and then applying the formula.
import ijson
import math
f = open ('data_large')
content = ijson.items(f, 'item')
n = len(list(i['score'] for i in content))
height = math.ceil(math.log((n+1),2)-1)
print height
Well, this does gives me the correct answer, but wanted to know 2 things?
1) Whether this formula will also be valid in case when there are duplicates in the list, since I need to develop a BST which can have duplicates as well?
2) I think n = len(list(i['score'] for i in content)) is useless because since I dont need the node values to calculate the height of the BST, but only the length of the list. Is there any way I can calculate the number of entries so that I omit this line and calculate the total nuber of entries in the json file, which will serve the purpose of n?
The other thing is I also wanted to calculate is the unique scores as well from the file. So, this is how I'm doing print set(i['score'] for i in content) , but it takes 201secs to execute since the file is so large( 256MB, hence used ijson for fast processing ), hence there are multiple entries inside the content. Can I make this statement much more time-efficient. If yes, How?
Answer
- Yes/no. If you've added a property to each node which counts the number of times that node has been inserted into the tree then you still have a BST and the answer is yes.
If you actually want duplicate nodes, then you'd need modify the BST property. The unmodified property says that items smaller than X go left and items greater than X go right. If you instead say items greater than or equal to X to right, then it's easy to see that you can make the tree arbitrarily high by adding many duplicate items and the answer is no.
Have you tried
list(content)? Of course, you cannot build a BST without somehow removing duplicate nodes, this cannot be used to calculate the height of the BST. You need to remove duplicate items. This leads to your third question.As regards the
print set(i['score']...You shouldn't bundle separate questions together like this as it will lead you down the dark path to having different answers addressing different parts of your question. However, the code you've write certainly does have something Pythonic about it. So you've got to ask yourself if it's really worth your time (which is often the only time that really matters) to try to find a more convoluted, but quicker solution.