Can I use `omp_get_thread_num()` on the GPU?
An answer to this question on Stack Overflow.
Question
I have OpenMP code that works on the CPU by having each thread manage memory indexed by the thread's id number, which it obtains via `omp_get_thread_num()`. This works well on the CPU, but can it work on the GPU?
An MWE is:
```cpp
#include <iostream>
#include <omp.h>

int main(){
  const int SIZE = 400000;
  int *m = new int[SIZE];

  #pragma omp target
  {
    #pragma omp parallel for
    for(int i=0;i<SIZE;i++)
      m[i] = omp_get_thread_num();
  }

  for(int i=0;i<SIZE;i++)
    std::cout<<m[i]<<"\n";

  delete[] m;
}
```
Answer
The answer seems to be no: with both compilers tested below, the call to `omp_get_thread_num()` inside the loop body prevents the loop from being parallelized at all.
Compiling with PGI using:

```
pgc++ -fast -mp -ta=tesla,pinned,cc60 -Minfo=all test2.cpp
```

gives:

```
13, Parallel region activated
    Parallel loop activated with static block schedule
    Loop not vectorized/parallelized: contains call
14, Parallel region terminated
```
whereas compiling with GCC using:

```
g++ -O3 test2.cpp -fopenmp -fopt-info
```

gives:

```
test2.cpp:17: note: not vectorized: loop contains function calls or data references that cannot be analyzed
test2.cpp:17: note: bad data references.
```