OpenMP parallelization stopped working
An answer to this question on Stack Overflow.
Question
On Linux, with an AMD 8-core processor, using g++ 4.7.1.
This is, for me, a headbanger. The following code was working perfectly and, for some reason, stopped parallelizing. I added the omp_get_num_procs() call, and it reports 8 processors. I checked the compilation, and -fopenmp is present as an option for both linking and compiling. There are no compilation or link error messages. I checked whether any environment variables were defined (OMP_xxx); there were none.
Are there other, external factors that could influence this?
    #pragma omp parallel
    {
        lightray ray;
        rgba L;
        printf("Max nr processors: %d\n", omp_get_num_procs());
        #pragma omp for schedule(dynamic)
        for (int xy = 0; xy < xy_range; xy++) {
            int x = x_from + (xy % x_width);
            int y = y_from + (xy / x_width);
            ray = cam->get_ray_at(x, y);
            L = trace_ray(ray, 0, cam->inter);
            cam->set_pixel(x, y, L);
        }
    }
    dtime = omp_get_wtime() - dtime;
    printf("time %f\n", dtime);
}
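The loop flattens the pixel rectangle into a single index xy, so one omp for can hand out individual pixels under schedule(dynamic). A minimal sketch of that mapping, assuming row-major order and xy_range == x_width * y_height (y_height, pixel_at and coverage are hypothetical names, not from the question), checks that every pixel is visited exactly once:

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical helper: maps a flat index to (x, y) exactly as the loop does,
// assuming row-major order over a rectangle x_width pixels wide.
std::pair<int, int> pixel_at(int xy, int x_from, int y_from, int x_width) {
    int x = x_from + (xy % x_width);  // column within the current row
    int y = y_from + (xy / x_width);  // row index
    return std::make_pair(x, y);
}

// Walks xy = 0 .. x_width * y_height - 1 and counts hits per pixel,
// so a caller can check that every pixel is visited exactly once.
std::vector<int> coverage(int x_from, int y_from, int x_width, int y_height) {
    assert(x_width > 0 && y_height > 0);
    int xy_range = x_width * y_height;  // assumed definition of xy_range
    std::vector<int> hits(xy_range, 0);
    for (int xy = 0; xy < xy_range; xy++) {
        std::pair<int, int> p = pixel_at(xy, x_from, y_from, x_width);
        hits[(p.second - y_from) * x_width + (p.first - x_from)]++;
    }
    return hits;
}
```

OpenMP 3.0 and later also offer collapse(2) on a nested x/y loop, which gives a similar single iteration space without the manual index arithmetic.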
EDIT: I think I've found something here... The command line for g++ generated by Anjuta contains this:
-DPACKAGE_LOCALE_DIR=\""/usr/local/share/locale"\" -DPACKAGE_SRC_DIR=\"".. -fopenmp . "\"
The PACKAGE_SRC_DIR definition appears to swallow the -fopenmp flag into its quoted string value, so g++ never sees it as a compiler option. I haven't found the cause yet...
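If the flag really is swallowed into the string, the compile step runs without OpenMP: g++ then skips every #pragma omp line (it only warns with -Wunknown-pragmas, which -Wall enables). If the link step still has -fopenmp, libgomp is linked, so calls such as omp_get_num_procs() and omp_get_wtime() keep working, which matches the symptoms. The standard _OPENMP macro is defined only when OpenMP is enabled at compile time, so it exposes this. A small sketch; the function names are illustrative:

```cpp
#include <cassert>
#include <cstdio>

// True only if this file was compiled with OpenMP enabled:
// g++ defines _OPENMP when -fopenmp reaches the compile step.
bool openmp_enabled() {
#ifdef _OPENMP
    return true;
#else
    return false;
#endif
}

// Counts the threads that actually executed a parallel region.
// Without -fopenmp the pragmas are ignored and the block runs once.
int threads_in_region() {
    int n = 0;
    #pragma omp parallel
    {
        #pragma omp atomic
        n++;
    }
    assert(n >= 1);  // a region always runs on at least one thread
    return n;
}

// Prints a one-line diagnosis; intended to be called early in main().
void report_openmp_status() {
#ifdef _OPENMP
    std::printf("OpenMP enabled (_OPENMP = %d), %d threads in a region\n",
                _OPENMP, threads_in_region());
#else
    std::printf("OpenMP NOT enabled: #pragma omp lines are ignored\n");
#endif
}
```

For a build whose compile step lost -fopenmp, report_openmp_status() would print the "NOT enabled" line even though the program links and runs normally.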
Answer
Try rewriting it this way:
lightray ray;
rgba L;
printf("Max nr processors: %d\n", omp_get_num_procs());
#pragma omp parallel for schedule(dynamic) private(ray,L)
for (int xy = 0; xy < xy_range; xy++) {
    int x = x_from + (xy % x_width);
    int y = y_from + (xy / x_width);
    ray = cam->get_ray_at(x, y);
    L = trace_ray(ray, 0, cam->inter);
    cam->set_pixel(x, y, L);
}
dtime = omp_get_wtime() - dtime;
printf("time %f\n", dtime);
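That way ray and L become private: each thread working through the loop gets its own copy. Variables declared outside a parallel region are shared between threads by default, so without the clause all threads would be overwriting the same ray and L. (In the code as posted, ray and L are declared inside the parallel block, which already makes them private; the clause makes that explicit.)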
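A minimal sketch of the data-sharing rule, with a hypothetical scratch variable tmp standing in for ray and L, and a simple square standing in for trace_ray. The first version relies on private(); the second declares the variable inside the loop body, which makes it private automatically. Both write each result to its own output element, so they agree with each other and with a serial run:

```cpp
#include <cassert>
#include <vector>

// Version 1: scratch variable declared outside the loop, made private explicitly.
// tmp is a hypothetical stand-in for ray/L; i * i stands in for trace_ray().
std::vector<int> square_all(int n) {
    assert(n >= 0);
    std::vector<int> out(n);
    int tmp;  // outside the region: would be shared by default...
    #pragma omp parallel for schedule(dynamic) private(tmp)  // ...so privatize it
    for (int i = 0; i < n; i++) {
        tmp = i * i;   // each thread writes its own copy
        out[i] = tmp;  // each iteration owns a distinct element: no race
    }
    return out;
}

// Version 2: scratch variable declared inside the loop body, private automatically.
std::vector<int> square_all_scoped(int n) {
    assert(n >= 0);
    std::vector<int> out(n);
    #pragma omp parallel for schedule(dynamic)
    for (int i = 0; i < n; i++) {
        int tmp = i * i;
        out[i] = tmp;
    }
    return out;
}
```

Declaring scratch variables at the narrowest scope (version 2) avoids having to remember the clause at all.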
Also, according to the OpenMP API 3.1 C/C++ Syntax Quick Reference Card, omp_get_num_procs() "Returns the number of processors available to the program." It therefore does not tell you how many threads are actually being used in a region. For that, call omp_get_num_threads() inside the region to get the team size; omp_get_thread_num() returns the ID of the calling thread.
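To see the difference in practice, here is a sketch (TeamInfo and query_team are illustrative names) that records all three values. The #ifdef _OPENMP guard keeps it compilable without -fopenmp, in which case it reports a single thread:

```cpp
#include <cassert>
#ifdef _OPENMP
#include <omp.h>
#endif

// Illustrative record of the three OpenMP queries discussed above.
struct TeamInfo {
    int procs;            // omp_get_num_procs(): processors available
    int threads_outside;  // omp_get_num_threads() in serial code: 1
    int team_size;        // omp_get_num_threads() inside the region
    int max_thread_id;    // largest omp_get_thread_num() observed
};

TeamInfo query_team() {
    TeamInfo info = {1, 1, 1, 0};  // values for a build without OpenMP
#ifdef _OPENMP
    info.procs = omp_get_num_procs();
    info.threads_outside = omp_get_num_threads();
    #pragma omp parallel
    {
        #pragma omp single
        info.team_size = omp_get_num_threads();  // one thread records the size
        #pragma omp critical
        {
            int id = omp_get_thread_num();  // 0 .. team_size - 1
            if (id > info.max_thread_id) info.max_thread_id = id;
        }
    }
#endif
    assert(info.max_thread_id < info.team_size);  // IDs never reach the team size
    return info;
}
```

Called from serial code, procs reflects the machine, while team_size depends on OMP_NUM_THREADS or the runtime's default, so the two need not match.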