Trying to compile the following CUDA code:

    __device__ void foo() {}
    int main(){}

Works in C++17 mode, fails in C++20 mode with a series of errors about "unknown type name '__device__'" from the cuda_wrappers headers.
Ouch. In C++20 mode one of the standard C++ headers includes <new>, which ends up including cuda_wrappers/new before we've got the standard CUDA macros ready. Should be fixed by https://reviews.llvm.org/D91807
Fixed in https://reviews.llvm.org/rG9a465057a64dba8a8614424d26136f5c0452bcc3
This would be a good low-risk fix to cherry-pick into 11.0.1
Hi, I applied the patch you posted, but now I am getting the following errors:

    /usr/lib/llvm-11/lib/clang/11.0.1/include/cuda_wrappers/new:50:12: error: reference to __host__ function 'malloc' in __device__ function
      return ::malloc(size);
               ^
    /usr/include/stdlib.h:539:14: note: 'malloc' declared here
    extern void *malloc (size_t __size) __THROW __attribute_malloc__ __wur;
    /usr/lib/llvm-11/lib/clang/11.0.1/include/cuda_wrappers/new:67:7: error: reference to __host__ function 'free' in __device__ function
        ::free(ptr);
          ^
    /usr/include/stdlib.h:563:13: note: 'free' declared here
    extern void free (void *__ptr) __THROW;
I'm going to hold off on backporting this until the failure in comment #4 is addressed.
Interesting. Can you tell me which OS and libstdc++ version you're using? Did the patched cuda_wrappers/new from the source tree make it into /usr/lib/llvm-11/lib/clang/11.0.1/include/cuda_wrappers/new, where your clang is looking for it?
Ubuntu 18.04. libstdc++ should be from gcc 10.1:

    _GLIBCXX_RELEASE 10
    __GLIBCXX__ 20200515

I applied your phabricator patch to that file, yes.
I have just tested it with libc++ and it works fine (but of course it would be nice to have it working with libstdc++).
Interesting. My version of libstdc++ is 20200918 and I do not see this issue. Would you be able to capture the output of the following command and attach it to the bug? Adjust --cuda-path to point to the CUDA version you're using.

    $ bin/clang++ -v --cuda-path=$HOME/local/cuda-11.0 --cuda-gpu-arch=sm_70 --std=c++20 -x cuda --stdlib=libstdc++ --cuda-device-only /dev/null -o - -dD -E
Created attachment 24223: clang command log

Here it is (I used arch sm_61, not sure if it makes a difference).
Thank you for the log. Indeed, the device-side declarations for malloc/free are missing when cuda_wrappers/new has been included. Clang in HEAD does not complain unless we must emit `operator new`, which hides the issue. I can reproduce it by using new in device-side code. Looks like D91807 does not really fix the problem, just hides it sometimes. I'll need to find a better way to deal with this.
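For reference, a minimal sketch of the kind of device-side code that should force clang to emit `operator new` and so expose the missing malloc/free declarations. The file name repro.cu, the helper names, and the build command are assumptions for illustration (the flags mirror the ones used earlier in this thread); this is not the exact reproducer used above, and whether it fails depends on the clang and libstdc++ versions involved.

```cuda
// repro.cu (hypothetical file name).
// Assumed build command, adjust the GPU arch and add --cuda-path as needed:
//   clang++ -std=c++20 -x cuda --cuda-gpu-arch=sm_70 --cuda-device-only -c repro.cu -o repro.o
#include <new>  // pulls in cuda_wrappers/new when compiling as CUDA with clang

// Hypothetical helper: the new-expression in device code makes clang emit
// the device-side operator new, which calls ::malloc.
__device__ int *make_int(int v) {
  return new int(v);
}

// Hypothetical kernel: the delete-expression similarly requires the
// device-side operator delete, which calls ::free.
__global__ void kernel(int *out) {
  int *p = make_int(42);
  *out = *p;
  delete p;
}

int main() {}
```

With an affected setup, compiling this in C++20 mode would be expected to produce the "reference to __host__ function 'malloc' in __device__ function" errors quoted earlier, whereas the original toy example (which never uses new) compiles cleanly.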
Do I understand correctly that this patch does not cause a regression, it just fixes a bug in some cases but not others? Do we still want to try to backport this?
Alas, the patch only hides the issue, and only in rather useless toy cases. Unfortunately, I do not think it would do any good to backport it to 11.0.1, and I do not have a better fix for it yet. If I were to attempt fixing it for 11.0.1, how much time do I have? I should have a better idea of what can be done by the end of the day tomorrow.
https://reviews.llvm.org/D91807 has been updated with a better fix.
Landed in 43267929423bf768bbbcc65e47a07e37af7f4e22.

@tstellar: if/when you cherry-pick it, it will need to be applied on top of 9a465057a64dba8a8614424d26136f5c0452bcc3. I didn't think to revert the first patch first. :-(
I've enabled testing with C++17 and C++20 on the CUDA test bots, and the tests compile and pass: http://lab.llvm.org:8011/#/builders/55/builds/2569 I think the patch is ready for cherry-picking into 11.0.1, if the train has not left yet.
Merged: 59012b685fd