OpenMP parallelization stopped working

An answer to this question on Stack Overflow.

Question

On Linux, with an 8-core AMD processor, using g++ 4.7.1.

This one has me banging my head against the wall. The following code was working perfectly and, for some reason, stopped parallelizing. I added a call to omp_get_num_procs(), and it prints 8 processors. I checked the compilation, and -fopenmp is present as an option for both compiling and linking. There are no compilation or link error messages. I checked whether any OMP_xxx environment variables were defined; there were none.

Are there other, external factors that could influence this?

#pragma omp parallel
{
  lightray ray;
  rgba L;
  printf("Max nr processors: %d\n", omp_get_num_procs());
  
  #pragma omp for schedule(dynamic)
  for (int xy = 0; xy < xy_range; xy++) {
    int x = x_from + (xy % x_width);
    int y = y_from + (xy / x_width);
    ray = cam->get_ray_at(x, y);
    L = trace_ray(ray, 0, cam->inter);
    cam->set_pixel(x, y, L);
  }
}
dtime = omp_get_wtime() - dtime;
printf("time %f\n", dtime);
}

EDIT: I think I've found something here... The command line for g++ generated by Anjuta contains this:

-DPACKAGE_LOCALE_DIR=\""/usr/local/share/locale"\" -DPACKAGE_SRC_DIR=\"".. -fopenmp  . "\"

The PACKAGE_SRC_DIR definition seems to 'include' the -fopenmp flag, which would hide it from g++. Haven't found the cause yet...
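
One way to confirm whether -fopenmp actually reached the compiler is to test the _OPENMP macro, which an OpenMP-enabled compiler defines (as a yyyymm date) and which is otherwise absent. The sketch below is only a diagnostic aid; the function names openmp_version_date, openmp_enabled, and openmp_status are hypothetical and not part of the code above.

```cpp
#include <string>

// Returns the value of the _OPENMP macro (a yyyymm date) when this
// translation unit was compiled with OpenMP enabled, or 0 otherwise.
long openmp_version_date() {
#ifdef _OPENMP
    return _OPENMP;
#else
    return 0;
#endif
}

// True only if the OpenMP flag (e.g. -fopenmp for g++) reached the compiler.
bool openmp_enabled() {
    return openmp_version_date() != 0;
}

// Human-readable summary, suitable for printing at program start-up.
std::string openmp_status() {
    if (openmp_enabled()) {
        return "OpenMP enabled, _OPENMP=" + std::to_string(openmp_version_date());
    }
    return "OpenMP disabled: #pragma omp lines are being ignored";
}
```

Printing openmp_status() once at start-up would show whether the pragmas are being honored. If the flag is swallowed by a quoted -D definition, as suspected above, the macro would be undefined and the loop would silently run on a single thread.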

Answer

Try rewriting it this way:

lightray ray;
rgba L;
printf("Max nr processors: %d\n", omp_get_num_procs());
#pragma omp parallel for schedule(dynamic) private(ray,L)
for (int xy = 0; xy < xy_range; xy++) {
  int x = x_from + (xy % x_width);
  int y = y_from + (xy / x_width);
  ray = cam->get_ray_at(x, y);
  L = trace_ray(ray, 0, cam->inter);
  cam->set_pixel(x, y, L);
}
dtime = omp_get_wtime() - dtime;
printf("time %f\n", dtime);

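The private(ray, L) clause gives each thread working on the loop its own copy of ray and L. Variables declared outside a parallel region are shared between threads by default, so without the clause the threads would overwrite each other's values. Note that variables declared inside a parallel region, as in your original code, are already private to each thread, so this rewrite is mainly a simplification of that block.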
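
To make the scoping rules concrete, here is a minimal sketch of the two ways to give each thread its own scratch variable. Everything in it is illustrative: shade is a hypothetical stand-in for the per-pixel work (such as trace_ray), and the two render_* function names are made up for this example.

```cpp
#include <vector>

// Hypothetical stand-in for the per-pixel work done in the real loop.
static int shade(int i) { return i * i; }

// Variant A: the scratch variable is declared outside the parallel region,
// so it must be listed in private() to avoid being shared between threads.
std::vector<int> render_with_private_clause(int n) {
    std::vector<int> out(n);
    int tmp = 0;
    #pragma omp parallel for schedule(dynamic) private(tmp)
    for (int i = 0; i < n; i++) {
        tmp = shade(i);   // each thread writes its own copy of tmp
        out[i] = tmp;     // distinct iterations write distinct elements
    }
    return out;
}

// Variant B: the scratch variable is declared inside the parallel region,
// which makes it private to each thread without any clause.
std::vector<int> render_with_local_declaration(int n) {
    std::vector<int> out(n);
    #pragma omp parallel
    {
        int tmp = 0;
        #pragma omp for schedule(dynamic)
        for (int i = 0; i < n; i++) {
            tmp = shade(i);
            out[i] = tmp;
        }
    }
    return out;
}
```

Both variants should produce the same result whether or not OpenMP is enabled. Dropping private(tmp) from variant A would turn tmp into a shared variable and introduce a data race.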

Also, omp_get_num_procs() "Returns the number of processors available to the program," according to the OpenMP API 3.1 C/C++ Syntax Quick Reference Card. It therefore does not necessarily tell you how many threads are actually being used in a region. For that, call omp_get_num_threads() inside the parallel region (outside one it returns 1). omp_get_thread_num() returns the calling thread's index within the team, which is useful for seeing which thread ran which iteration.
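
As a rough sketch of the difference between these calls, the helper below records the processor count alongside the thread count seen outside and inside a parallel region. The OmpCounts struct and query_omp_counts function are hypothetical names for this example. The code falls back to 1 for every value when compiled without OpenMP, so it builds either way.

```cpp
#ifdef _OPENMP
#include <omp.h>
#endif

// Values gathered from the OpenMP runtime (all 1 if OpenMP is disabled).
struct OmpCounts {
    int procs;            // omp_get_num_procs(): processors available
    int threads_outside;  // omp_get_num_threads() called outside a region
    int threads_inside;   // omp_get_num_threads() called inside a region
};

OmpCounts query_omp_counts() {
    OmpCounts c = {1, 1, 1};
#ifdef _OPENMP
    c.procs = omp_get_num_procs();
    // Outside a parallel region the team consists of one thread only.
    c.threads_outside = omp_get_num_threads();
    #pragma omp parallel
    {
        // Let a single thread record the team size to avoid racing writes.
        #pragma omp single
        c.threads_inside = omp_get_num_threads();
    }
#endif
    return c;
}
```

If threads_inside comes back as 1 on an 8-core machine while procs is 8, the parallel region is not being split across threads, which points at the build flags or runtime settings rather than at the loop itself.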