OpenMP offloading with GCC fails with "Ptx assembly aborted due to errors"

2020-07-11

An answer to this question on Stack Overflow.

Question

I am trying to compile the following simple OpenMP GPU offloading program with G++ 9.3.0:

#include <iostream>
int main(){
  const int N=1000;
  int d[N];
  for(auto i=0;i<N;i++)
    d[i] = 1;
  #pragma omp target teams distribute parallel for map(tofrom:d[0:N])
  for(size_t i=0;i<N;i++){
    d[i] *= 3*i+1;
  }
  for(int i=0;i<N;i++)
    std::cout<<d[i]<<" ";
  std::cout<<std::endl;
}

Using the command:

g++ -fopenmp -O3 gpu_test.cpp

But this fails with:

ptxas /tmp/ccq6t6e2.o, line 189; error   : Illegal operand type to instruction 'ld'
ptxas /tmp/ccq6t6e2.o, line 246; error   : Illegal operand type to instruction 'ld'
ptxas /tmp/ccq6t6e2.o, line 189; error   : Unknown symbol '__stack_chk_guard'
ptxas /tmp/ccq6t6e2.o, line 246; error   : Unknown symbol '__stack_chk_guard'
ptxas fatal   : Ptx assembly aborted due to errors
nvptx-as: ptxas returned 255 exit status
mkoffload: fatal error: x86_64-linux-gnu-accel-nvptx-none-gcc-9 returned 1 exit status
compilation terminated.
lto-wrapper: fatal error: /usr/lib/gcc/x86_64-linux-gnu/9//accel/nvptx-none/mkoffload returned 1 exit status
compilation terminated.
/usr/bin/ld: error: lto-wrapper failed
collect2: error: ld returned 1 exit status

How can I fix this?

Answer

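The "Unknown symbol '__stack_chk_guard'" errors indicate that stack-protector instrumentation is being emitted into the device code, where that symbol does not exist. Some distributions' GCC builds (Ubuntu's, for example) enable stack protection by default. Disabling it with -fno-stack-protector fixes the problem: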

g++ -fopenmp -fno-stack-protector -O3 gpu_test.cpp

If you're offloading to an Nvidia GPU, you can check whether offloading succeeded by running your program under nvprof:

nvprof ./a.out

This should print something like:

==61495== NVPROF is profiling process 61495, command: ./a.out
...
...
Program output
...
...
==61495== Profiling application: ./a.out
==61495== Profiling result:
            Type  Time(%)      Time     Calls       Avg       Min       Max  Name
 GPU activities:   98.52%  173.09us         1  173.09us  173.09us  173.09us  main$_omp_fn$0
                    0.84%  1.4720us         1  1.4720us  1.4720us  1.4720us  [CUDA memcpy DtoH]
                    0.64%  1.1200us         1  1.1200us  1.1200us  1.1200us  [CUDA memcpy HtoD]

The host-to-device and device-to-host copies, together with the kernel main$_omp_fn$0 (the target region outlined from main), indicate that the offload was successful.
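
Offloading can also be checked from inside the program, without a profiler. The following is a minimal sketch, not part of the original answer: OffloadReport and probe_offload are hypothetical names introduced here, while omp_get_num_devices and omp_is_initial_device are standard OpenMP runtime routines. It assumes that the runtime silently falls back to the host when no device is usable, in which case the target region reports that it ran on the initial (host) device.

```cpp
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical helper type for this sketch; not part of OpenMP.
struct OffloadReport {
  int num_devices;  // non-host devices the OpenMP runtime can see
  bool on_device;   // true if a target region ran off the host
};

// Runs a tiny target region and records where it executed.
// Without -fopenmp the probe is compiled out and reports no offloading.
OffloadReport probe_offload() {
  OffloadReport report{0, false};
#ifdef _OPENMP
  report.num_devices = omp_get_num_devices();
  int on_host = 1;
  // If the region is offloaded, omp_is_initial_device() returns 0 there;
  // on host fallback it returns nonzero.
  #pragma omp target map(tofrom: on_host)
  {
    on_host = omp_is_initial_device();
  }
  report.on_device = (on_host == 0);
#endif
  return report;
}
```

Calling probe_offload() at the start of main and printing the two fields gives a quick sanity check: a program built with the command above should report at least one device and on_device as true when offloading works.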