Ranking Sewer Lines worst to best condition using Genetic Algorithm?

An answer to this question on the Scientific Computing Stack Exchange.

Question

Problem

I work for a municipality and we are trying to figure out which sections of sewer lines to replace first or at least identify areas that should be looked at. It was suggested I use a Fast Messy Genetic Algorithm like this.

We have records on the sewer lines such as date installed, pipe type (clay, plastic, etc.), life expectancy of the pipe type, tree cover (roots penetrate the pipe, causing breaks and blockages), groundwater level (if there is a break in this area, water gets into the pipe and is then treated unnecessarily), and visual inspections of 45 out of 3,500 pipes (number of breaks in each pipe and how severe they are).

There is some interaction between these factors; for example, clay pipes are broken by roots more easily.

Where I am stuck

I am unsure how to make a fitness function. I am thinking I would sum values like the percent of pipe covered by trees, the percent of pipe below groundwater, and the age of the pipe divided by its life expectancy. However, I am not sure how to account for clay pipes interacting with tree coverage (possibly multiply tree coverage by some weight for each pipe type).
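To make the idea concrete, the sum described above could be sketched roughly as follows, with the clay/tree interaction handled by a per-material multiplier on tree cover. The `Pipe` fields, the `ROOT_WEIGHT` values, and the three example pipes are all hypothetical placeholders, not calibrated figures or real records.

```python
from dataclasses import dataclass

# Hypothetical multipliers: how strongly tree cover counts for each material.
# These numbers are placeholders, not values fitted to inspection data.
ROOT_WEIGHT = {"clay": 2.0, "concrete": 1.0, "plastic": 0.5}


@dataclass
class Pipe:
    pipe_id: str
    material: str
    age_years: float
    life_expectancy_years: float
    tree_cover: float         # fraction of pipe length under trees, 0..1
    below_groundwater: float  # fraction of pipe length below groundwater, 0..1


def risk_score(pipe: Pipe) -> float:
    """Sum of the three factors, with tree cover scaled by pipe material."""
    age_ratio = pipe.age_years / pipe.life_expectancy_years
    root_term = ROOT_WEIGHT.get(pipe.material, 1.0) * pipe.tree_cover
    return age_ratio + pipe.below_groundwater + root_term


# Made-up example pipes.
pipes = [
    Pipe("A", "clay", 60, 75, 0.8, 0.1),
    Pipe("B", "plastic", 20, 100, 0.8, 0.1),
    Pipe("C", "clay", 30, 75, 0.0, 0.5),
]

# Highest score first, i.e. worst expected condition first.
ranked = sorted(pipes, key=risk_score, reverse=True)
for p in ranked:
    print(p.pipe_id, round(risk_score(p), 2))
```

With these placeholder weights, pipes A and B have the same tree cover, yet the clay pipe A scores much higher than the plastic pipe B. The open question is still how to choose the weights.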

How do I set up a fitness function for the above parameters?

Answer

I'm not sure why you would use a genetic algorithm here. It sounds to me as though you want to predict which pipes will break by training a model based on the inspection of the 45 pipes.

If you're just getting started on this sort of thing, random forests might be a good way to go. The theory behind them is pretty easy to understand, and they are relatively easy to interpret compared with other models.

You can find an introduction here and a step-by-step application of random forests to some data from the Titanic here.
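As a rough sketch of what that could look like with scikit-learn, the snippet below fits a random forest on 45 "inspected" pipes and then ranks all 3,500 by predicted number of breaks. Everything here is an assumption for illustration: the data is synthetic, the feature encoding (`is_clay`, `age_ratio`, `tree_cover`, `below_gw`) is made up, and the hyperparameters are arbitrary starting points rather than recommendations.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Synthetic stand-in for the municipal records (not real data).
n_pipes, n_inspected = 3500, 45
is_clay = rng.integers(0, 2, n_pipes)        # 1 = clay, 0 = other material
age_ratio = rng.uniform(0.0, 1.5, n_pipes)   # age / life expectancy
tree_cover = rng.uniform(0.0, 1.0, n_pipes)  # fraction of pipe under trees
below_gw = rng.uniform(0.0, 1.0, n_pipes)    # fraction below groundwater
X = np.column_stack([is_clay, age_ratio, tree_cover, below_gw])

# Pretend 45 pipes were inspected; the break counts are invented, with a
# clay-by-tree-cover interaction built in so the model has something to find.
inspected = rng.choice(n_pipes, size=n_inspected, replace=False)
breaks = rng.poisson(
    2.0 * age_ratio[inspected] + 3.0 * is_clay[inspected] * tree_cover[inspected]
)

# Fit on the inspected pipes only.
model = RandomForestRegressor(
    n_estimators=200, min_samples_leaf=3, oob_score=True, random_state=0
)
model.fit(X[inspected], breaks)

# Predict for every pipe and rank from worst (most predicted breaks) to best.
predicted = model.predict(X)
ranking = np.argsort(-predicted)

print("Ten worst pipes (row indices):", ranking[:10])
print("Out-of-bag R^2:", round(model.oob_score_, 2))
names = ["is_clay", "age_ratio", "tree_cover", "below_gw"]
print(dict(zip(names, model.feature_importances_.round(2))))
```

Because each tree splits on one feature at a time, a forest can pick up interactions such as clay pipes under heavy tree cover without a hand-written weight for them. With only 45 labelled pipes, though, the out-of-bag score is a noisy estimate, so the ranking is probably best used to decide which pipes to inspect next.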

In many ways genetic algorithms are a last resort for solving optimization problems when there is no good theory you can leverage to get something with better/faster convergence. If you're skipping straight to a GA, you're probably not thinking hard enough about your problem.