Please excuse me if this is in the wrong tracker; it might need to be in the CUDA or OpenMP section (in fact, I'll file an issue there as well). I have a file without any OpenMP in it. If I try

clang++ [cuda args] -x cuda -std=c++11 --cuda-gpu-arch=sm_30 -stdlib=libc++ b.cpp

it succeeds. If, on the other hand, I try

clang++ [cuda args] -x cuda -std=c++11 --cuda-gpu-arch=sm_30 -stdlib=libc++ -fopenmp b.cpp

I get:

error: The target 'nvptx64-nvidia-cuda' is not a supported OpenMP host target.

It appears that Clang only sees that I am using a CUDA backend together with the -fopenmp flag; it does not check whether any OpenMP code actually needs to go to the PTX backend. Much like the Sandia folks, we're writing portable parallel programming models and would really love to put Clang through its paces, but if we can't write code where CUDA sections follow OpenMP ones, we won't be able to.
> Please excuse if this is in the wrong tracker, it might need to be in the CUDA or OpenMP section (in fact, I'll file an issue there as well) This is the right one.
What is going on is that -fopenmp is passed to the frontend command that generates code for CUDA. There is work under way to get a generic offloading implementation into clang. That will enable understanding the offloading programming model and passing the -fopenmp option only for OpenMP offloading code generation. Given that OpenMP offloading support in the driver is not yet complete, a fix would be to not pass -fopenmp to any offloading toolchain. I'll post a patch fixing that. Thanks for posting the bug.
(In reply to comment #2) > What is going on is that -fopenmp is passed to the frontend command that > generates code for CUDA. There is work under way to get a generic offloading > implementation into clang. That will enable understanding the offloading > programming model and passing the -fopenmp option only for OpenMP offloading > code generation. > > Given that OpenMP offloading support in the driver is not yet complete, a > fix would be to not pass -fopenmp to any offloading toolchain. I'll post a > patch fixing that. > > Thanks for posting the bug. Just to clarify, we *do* have code that looks like ________________________________________ Initialize(); run_parallel(openmp,[=](int i){ //OpenMP parallel stuff }); run_parallel(cuda,[=](int i)__device__{ //CUDA parallel stuff }); ______________________________________ Will this intermediate solution of selectively passing -fopenmp work on a system like this? Thanks for responding so quickly!
Created attachment 16812 [details] A reduced version of the kind of parallel programming system we're writing
Sorry, I'm used to systems in which I can edit comments rather than spam them; last message for a while. The full version of the parallel system is at https://github.com/LLNL/RAJA if you want to check it out, but it doesn't currently build with this Clang (due to the "device attribute placement" bug Christian pointed out).
(In reply to comment #3) > (In reply to comment #2) > > What is going on is that -fopenmp is passed to the frontend command that > > generates code for CUDA. There is work under way to get a generic offloading > > implementation into clang. That will enable understanding the offloading > > programming model and passing the -fopenmp option only for OpenMP offloading > > code generation. > > > > Given that OpenMP offloading support in the driver is not yet complete, a > > fix would be to not pass -fopenmp to any offloading toolchain. I'll post a > > patch fixing that. > > > > Thanks for posting the bug. > > Just to clarify, we *do* have code that looks like > ________________________________________ > Initialize(); > > run_parallel(openmp,[=](int i){ > //OpenMP parallel stuff > }); > > run_parallel(cuda,[=](int i)__device__{ > //CUDA parallel stuff > }); > ______________________________________ > > Will this intermediate solution of selectively passing -fopenmp work on a > system like this? Thanks for responding so quickly! Based on your description, I think the fix would work just fine. All it would do is prevent the CUDA code generation from choking on OpenMP options and directives.
Fantastic, this kind of support is really appreciated. If you ping me when it's in (or comment on this bug), I'll try to run it through RAJA and either get you a reason why it's still not working or confirm that it is.
Fixed in r276979.
You folks are wonderful; that's an awesome turnaround time to a candidate solution. I'll take it out for a spin and get you feedback. Much appreciated!
Verified that this fixes this particular bug. We're going to scale up our use of LLVM by testing it on our mini-apps; I'll report any future bugs as they arise. Seriously impressive work, thanks again!
Closing comment: this particular bug is fixed. We're running into new ones, which I'll report when I'm sure they're not on my end and when I have a good explanation for what's going on. Thanks all for the help.