Removing degree-2 vertices from a graph
An answer to this question on the Mathematics Stack Exchange.
Question
Consider a map of a river system. Each point on the river line is a graph vertex. Some points have degree 1, at the start and end of a river, some have degree > 2 where rivers merge (or more rarely, split) and a load of vertex points will have degree 2 as a river winds along. This graph is generated from the spatial coordinates of the river.
However, the degree 2 vertices are not relevant to the topology of the river system, so I want to remove them. I'll end up with a kind of 'skeleton' graph, with just vertices of degree 1 or >2 connected but without the degree-2 vertices.
Does this transformation have a name? I can't find anything from a bit of research, but it can sometimes be hard to get through all the graph theory jargon (cliques, induced subgraphs, etc etc).
I'm using R and the igraph package. I have coded up an iterative process that repeatedly removes a single degree-2 node and joins its neighbours until no degree-2 vertices are left, but I wonder if there's a trick I'm missing.
Since I want this to be more general than just river networks (which are likely to be but not necessarily be trees) I'll have to think about what to do with a circular graph where all vertices are degree-2. I think I want to end up with one arbitrary vertex with an edge to itself.
Anyway, maybe someone will know exactly what this is...
Answer
I don't believe there is a trick you are missing.
Assuming a directed graph, the algorithm for this is as follows:
Initialize a queue Q of all the vertices
While Q is not empty:
Pop the top of Q into N
If N does not have an one incoming and one outgoing edge:
Continue to the next item in Q
If N has been seen:
Continue to the next item in Q
Mark N as Seen
Let Up be the node which points into N
Let Down be the node which N points to
Create an edge Up-Down with Weight=Weight[Up]+Weight[Down]
Remove the edges Up-N and N-Down
You can see that each node is added to Q only once and all of the nodes are eventually removed from the Q.
Therefore, the algorithm runs in O(N) time. There is no way you can do better than this. The alternative algorithm you suggest in your answer may work faster, but this is only because of implementation details.