Can I use `omp_get_thread_num()` on the GPU?
An answer to this question on Stack Overflow.
Question
I have OpenMP code that works on the CPU by having each thread manage memory indexed by the thread's id number, which it obtains via `omp_get_thread_num()`. This works well on the CPU, but can it work on the GPU?
An MWE is:
```cpp
#include <iostream>
#include <omp.h>

int main(){
  const int SIZE = 400000;
  int *m = new int[SIZE];

  #pragma omp target
  {
    #pragma omp parallel for
    for(int i=0;i<SIZE;i++)
      m[i] = omp_get_thread_num();
  }

  for(int i=0;i<SIZE;i++)
    std::cout<<m[i]<<"\n";

  delete[] m;
}
```
Answer
The answer seems to be no: with both compilers tested below, the call to `omp_get_thread_num()` inside the loop body prevents the loop from being parallelized at all.
Compiling with PGI using:

```
pgc++ -fast -mp -ta=tesla,pinned,cc60 -Minfo=all test2.cpp
```

gives:

```
13, Parallel region activated
    Parallel loop activated with static block schedule
    Loop not vectorized/parallelized: contains call
14, Parallel region terminated
```
whereas compiling with GCC using:

```
g++ -O3 test2.cpp -fopenmp -fopt-info
```

gives:

```
test2.cpp:17: note: not vectorized: loop contains function calls or data references that cannot be analyzed
test2.cpp:17: note: bad data references.
```