best method for loop parallelisation in c using MPI or openMP

An answer to this question on Stack Overflow.

Question

I am learning about parallel programming these days using MPI or OpenMP. I would like to know what would be the best way to parallelise this kind of loop, and why?

sum = 0;
for (y = 1; y <= Ny; y++) {
    for (x = 1; x <= Nx; x++) {
        d = v1[y][x] - v2[y][x];
        sum += d * d;
    }
}
return sqrt(sum);

Answer

The "best" way is hard to identify without a lot of testing, and it depends on the specific use case you are interested in.

One straightforward approach is this:

sum = 0;
#pragma omp parallel for collapse(2) reduction(+:sum) private(d)
for (y = 1; y <= Ny; y++) {
    for (x = 1; x <= Nx; x++) {
        d = v1[y][x] - v2[y][x];
        sum += d * d;
    }
}

The collapse clause tells OpenMP's parallel for construct to parallelise across both loops rather than only the outer one. The reduction clause gives each thread its own private copy of sum and tells OpenMP to add those copies together once the threads complete. The private(d) clause is needed because d is declared outside the loop; without it, all threads would race on the same shared d.

MPI is considerably more complicated to use, but there are situations where it will be the best choice. If you are looking for a simple way to parallelise a relatively simple operation on a single machine, the OpenMP approach above is probably the best choice for you.

If you're looking for a comparison between OpenMP and MPI, don't: they are a little like apples and oranges. MPI distributes an operation across multiple compute nodes, while OpenMP parallelises it among the threads within a single node. In fact, you can even use them at the same time in a hybrid setup.