Skip to content

How to find all descendant vertices of all vertices in a big DAG (Directed acyclic graph)?

An answer to this question on the Operations Research Stack Exchange.

Question

A simple algorithm may be traverse all vertices, and perform DFS for every vertex.

However, the computational complexity is $O(n(n+m))$, where $n$ and $m$ are the number of vertices and edges in the DAG, respectively. (Since $m \in O(n^2)$, the complexity is actually $O(n^3)$. Hope I'm right about the complexity).

For very big graphs, the complexity is unacceptable.

Is there any algorithm or idea that can solve the problem (reduce the algorithm complexity)?

Answer

The general strategy is as follows:

  1. For each node, determine its in-degree.
  2. Locate those nodes for which the in-degree is zero have no incoming edges (there will always be some since this is a DAG).
  3. Add those nodes to a queue.
  4. Pop nodes from the queue and process them.
  5. For each node popped from the queue, decrement the in-degree count of its "downstream" neighbours. If a neighbour's count becomes zero, add it to the queue.
  6. Repeat.

This algorithm can be parallelized across the "wavefront" of 0-in-degree cells either directly, or by using a parallelized matrix library.

Depending on the type of processing you want to do and whether or not your graph is already partitioned, it may be possible to achieve greater parallelism.

For instance, in this paper I wish to calculate the number of "upstream" nodes each node has in a 2 trillion node graph. Since my DAG is conveniently partitioned I can calculate this quantity with fixed per-cell and per-partition communication events achieving considerably more parallelism than the wavefront approach alone would allow.

For other resources you could consider the following papers:

Both of these papers follow the kind of techniques described in the book Graph Algorithms in the Language of Linear Algebra and proposed in the GraphBLAS standard.