Concurrent erode or dilate on bitmap

2018-02-07

An answer to this question on Stack Overflow.

Question

I need to do a morphological operation (a specialized version of dilate/erode) on a bitmap. In order to accelerate, I am using openmp to parallelize the process: int* bitmap = ...; // pointer to bitmap data with width and height #pragma omp parallel for for(long iy=0; iy<height; iy++) for(long ix=0; ix<width; ix++) if(write_pattern(ix,iy)) apply_pattern(ix,iy, 0); // writes "0" to bitmap, write only, no read That means at certain locations a pattern of constant values is written to the output bitmap. Since the 'pattern' may span several lines it is clear that several threads write the same value to the same memory location concurrently. It appears to work, but it looks a bit shady.

Is this ok, or what would be the recommended way of doing this?

Answer

OpenMP is a shared-memory paradigm. If you can guarantee that all of your different processes are going to write the same values to the same locations, then it is fine to have them do so. There are race conditions, but they will not affect the result.

That gives you this for code:

int* bitmap = ...;              // pointer to bitmap data with width and height
#pragma omp parallel for collapse(2)
for(long iy=0; iy<height; iy++) 
for(long ix=0; ix<width; ix++) 
  if(write_pattern(ix,iy)) 
    apply_pattern(ix,iy, 0);   // writes "0" to bitmap, write only, no read

If you can't make such a guarantee, you could use a critical section to restrict write access:

int* bitmap = ...;              // pointer to bitmap data with width and height
#pragma omp parallel for collapse(2)
for(long iy=0; iy<height; iy++) 
for(long ix=0; ix<width; ix++) 
  if(write_pattern(ix,iy)) {
    #pragma omp critical
    apply_pattern(ix,iy, 0);   // writes "0" to bitmap, write only, no read      
  }

But, if you can't make such a guarantee, this is a bad candidate for parallelism, since different processes will reach different parts of the bitmap at different times each time you run your code. That is, the order of execution is non-deterministic, so you will be unable to predict the outcome. Assuming there's only one right outcome, your processes must produce the same outputs.

Similarly, be sure you are using apply_pattern() to alter something other than your original data, or your result will, again, be non-deterministic since one thread may modify the input before or after another thread sees the modified section.

Another strategy would be to collect all of the places that need modifying and then write them serially later. This might make sense if your check is expensive but your writing is cheap; say, if you're using a regex for the check and writing out zeros.

int* bitmap = ...;              // pointer to bitmap data with width and height
typedef std::pair<long,long> gridloc;
std::vector<gridloc> patterns;
//Use a custom reduction operator, available in newer versions of OpenMP
#pragma omp declare reduction (merge : std::vector<gridloc> : omp_out.insert(omp_out.end(), omp_in.begin(), omp_in.end()))
#pragma omp parallel for collapse(2) reduction(merge:patterns)
for(long iy=0; iy<height; iy++) 
for(long ix=0; ix<width; ix++) 
  if(write_pattern(ix,iy)) {
    patterns.emplace_back(ix,iy)
  }
for(const auto &c: patterns)
  apply_pattern(c.first,c.second, 0);   // writes "0" to bitmap, write only, no read

If writing is expensive, you could do some processing on the contents of patterns to combine writes, eliminate redundant writes, topologically sort writes, &c.