How to find the set of trees every one of which spans over another given tree?

2013-09-23

An answer to this question on Stack Overflow.

Question

Imagine it's given a set of trees ST and each vertex of every tree is labeled. Also another tree T is given (also with labels vertices). The question is how can I find which trees of the ST can span over the tree T starting from the root of T in such a way that the labels of the vertices of the spanning tree T' coincide with those labels of T 's vertices. Note that the children of every vertex of T should be either completely covered or not covered at all - partial covering of children is not allowed. Stated in other words: Given a tree and the following procedure: pick a vertex and remove all vertices and edges below this vertex (except the vertex itself). Find those trees of ST such that each tree is generated with a series of procedures applied to T. For example given the tree T

enter image description here

the trees

enter image description here

cover T and the tree

enter image description here

does not because this tree has children 3, 5 unlike T which has 2, 3 as children. The best thing I was able to think of was either to brute force it or to find the set of tree every one of which has the same root label as T and then to search for the answer among those trees but I guess neither of those two approaches is the optimal one. I was thinking of somehow hashing the trees but nothing came out. Any thoughts?

Notes:

The trees are not necessarily binary
A tree T can cover another tree T' if they share a root
The tree is ordered meaning that you cannot swap the position of any two children.

TL; DR Find a efficient algorithm which on query with given tree T the algorithm finds all trees from a given(fixed/static) set ST which are able to cover T.

Answer

I'll sketch an answer and then provide some working source code.

First off, you need an algorithm to hash a tree. We can assume, without loss of generality, that the children of each of your tree's nodes are ordered from least to greatest (or vice versa).

Run this algorithm on every member of ST and save the hashes.

Now, take your test tree T and generate all of its subtrees TP that retain the original root. You can do this (perhaps inefficiently) by:

Making a set S of its nodes
Generating the power set P of S
Generating the subtrees by removing the nodes present in each member of P from copies of T
Adding those subtrees which retain the original root to TP.

Now generate a set of all of the hashes of TP.

Now check each of your ST hashes for membership in TP.

ST hash storage requires O(n) space in ST, and possibly the space to hold the trees.

You can optimize the membership code so that it requires no storage space (I have not done this in my test code). The code will require approximately 2**N checks, where N is the number of nodes in T.

So the algorithm runs in O(H 2**N), where H is the size of ST and N is the number of nodes in T. The best way of speeding this up is to find an improved algorithm for generating the subtrees of T.

The following Python code accomplishes this:

#!/usr/bin/python
import itertools
import treelib
import Crypto.Hash.SHA
import copy
#Generate a hash of a tree by recursively hashing children
def HashTree(tree):
  digester=Crypto.Hash.SHA.new()
  
  digester.update(str(tree.get_node(tree.root).tag))
  children=tree.get_node(tree.root).fpointer
  children.sort(key=lambda x: tree.get_node(x).tag, cmp=lambda x,y:x-y)
  hash=False
  if children:
    for child in children:
      digester.update(HashTree(tree.subtree(child)))
    hash = "1"+digester.hexdigest()
  else:
    hash = "0"+digester.hexdigest()
  return hash
#Generate a power set of a set
def powerset(iterable):
  "powerset([1,2,3]) --> () (1,) (2,) (3,) (1,2) (1,3) (2,3) (1,2,3)"
  s = list(iterable)
  return itertools.chain.from_iterable(itertools.combinations(s, r) for r in range(len(s)+1))
#Generate all the subsets of a tree which still share the original root
#by using a power set of all the tree's nodes to remove nodes from the tree
def TreePowerSet(tree):
  nodes=[x.identifier for x in tree.nodes.values()]
  ret=[]
  for s in powerset(nodes):
    culled_tree=copy.deepcopy(tree)
    for n in s:
      try:
        culled_tree.remove_node(n)
      except:
        pass
    if len([x.identifier for x in culled_tree.nodes.values()])>0:
      ret.append(culled_tree)
  return ret
def main():
  ST=[]
  #Generate a member of ST
  treeA = treelib.Tree()
  treeA.create_node(1,1)
  treeA.create_node(2,2,parent=1)
  treeA.create_node(3,3,parent=1)
  ST.append(treeA)
  #Generate a member of ST
  treeB = treelib.Tree()
  treeB.create_node(1,1)
  treeB.create_node(2,2,parent=1)
  treeB.create_node(3,3,parent=1)
  treeB.create_node(4,4,parent=2)
  treeB.create_node(5,5,parent=2)
  ST.append(treeB)
  #Generate hashes for members of ST
  hashes=[(HashTree(tree), tree) for tree in ST]
  print hashes
  #Generate a test tree
  T=treelib.Tree()
  T.create_node(1,1)
  T.create_node(2,2,parent=1)
  T.create_node(3,3,parent=1)
  T.create_node(4,4,parent=2)
  T.create_node(5,5,parent=2)
  T.create_node(6,6,parent=3)
  T.create_node(7,7,parent=3)
  #Generate all the subtrees of this tree which still retain the original root
  Tsets=TreePowerSet(T)
  #Hash all of the subtrees
  Thashes=set([HashTree(x) for x in Tsets])
  #For each member of ST, check to see if that member is present in the test
  #tree
  for hash in hashes:
    if hash[0] in Thashes:
      print [x for x in hash[1].expand_tree()]
main()