Bug 48228 - Clang fails to compile cuda code in C++20 mode, works in C++17
Summary: Clang fails to compile cuda code in C++20 mode, works in C++17
Status: RESOLVED FIXED
Alias: None
Product: clang
Classification: Unclassified
Component: CUDA
Version: 11.0
Hardware: PC Linux
Importance: P normal
Assignee: Artem Belevich
URL:
Keywords:
Depends on:
Blocks: release-11.0.1
Reported: 2020-11-19 02:24 PST by Bruno Manganelli
Modified: 2020-12-09 09:48 PST

See Also:
Fixed By Commit(s): 9a465057a64dba8a8614424d26136f5c0452bcc3 43267929423bf768bbbcc65e47a07e37af7f4e22 aa29049404e 59012b685fd


Attachments
clang command log (127.01 KB, application/octet-stream)
2020-12-01 13:29 PST, Bruno Manganelli

Description Bruno Manganelli 2020-11-19 02:24:34 PST
Trying to compile the following cuda code:


__device__ void foo() {}
int main(){}

Works in C++17 mode, fails in C++20 mode with a series of errors about "unknown type name '__device__'" from the cuda_wrappers headers.
Comment 1 Artem Belevich 2020-11-19 10:19:42 PST
Ouch. In C++20 mode one of the standard C++ headers includes <new>, which ends up including cuda_wrappers/new before we've got the standard CUDA macros ready.

Should be fixed by https://reviews.llvm.org/D91807
Comment 2 Artem Belevich 2020-11-19 10:41:28 PST
Fixed in https://reviews.llvm.org/rG9a465057a64dba8a8614424d26136f5c0452bcc3
Comment 3 Artem Belevich 2020-11-19 10:43:42 PST
This would be a good low-risk fix to cherry-pick into 11.0.1
Comment 4 Bruno Manganelli 2020-11-20 06:59:52 PST
Hi, I applied the patch you posted, but now I am getting the following errors:

/usr/lib/llvm-11/lib/clang/11.0.1/include/cuda_wrappers/new:50:12: error: reference to __host__ function 'malloc' in __device__ function
  return ::malloc(size);
           ^
/usr/include/stdlib.h:539:14: note: 'malloc' declared here
extern void *malloc (size_t __size) __THROW __attribute_malloc__ __wur;


/usr/lib/llvm-11/lib/clang/11.0.1/include/cuda_wrappers/new:67:7: error: reference to __host__ function 'free' in __device__ function
    ::free(ptr);
      ^
/usr/include/stdlib.h:563:13: note: 'free' declared here
extern void free (void *__ptr) __THROW;
Comment 5 Tom Stellard 2020-11-20 22:47:06 PST
I'm going to hold off on backporting this until the failure in comment #4 is addressed.
Comment 6 Artem Belevich 2020-11-30 09:51:21 PST
Interesting. 

Can you tell me which OS and libstdc++ version you're using?

Did the patched cuda_wrappers/new from the source tree make it into the /usr/lib/llvm-11/lib/clang/11.0.1/include/cuda_wrappers/new where your clang is looking for them?
Comment 7 Bruno Manganelli 2020-12-01 01:04:01 PST
Ubuntu 18.04
libstdc++ should be from gcc 10.1:
_GLIBCXX_RELEASE 10
__GLIBCXX__ 20200515

I applied your phabricator patch to that file, yes.
Comment 8 Bruno Manganelli 2020-12-01 06:06:58 PST
I have just tested it with libc++ and it works fine (but of course it would be nice to have it working with libstdc++).
Comment 9 Artem Belevich 2020-12-01 10:54:15 PST
Interesting. My version of libstdc++ is 20200918 and I do not see this issue.

Would you be able to capture the output of the following command and attach it to the bug? Adjust --cuda-path to point to the CUDA version you're using.

$ bin/clang++ -v --cuda-path=$HOME/local/cuda-11.0 --cuda-gpu-arch=sm_70 --std=c++20 -x cuda --stdlib=libstdc++ --cuda-device-only /dev/null -o - -dD -E
Comment 10 Bruno Manganelli 2020-12-01 13:29:28 PST
Created attachment 24223
clang command log

Here it is (I used arch sm_61, not sure if it makes a difference)
Comment 11 Artem Belevich 2020-12-01 13:55:48 PST
Thank you for the log. Indeed, the device-side declarations for malloc/free are missing when cuda_wrappers/new has been included.

Clang in HEAD does not complain unless we must emit `operator new`, which hides the issue. I can reproduce it by using new in device-side code.

Looks like D91807 does not really fix the problem, just hides it sometimes.

I'll need to find a better way to deal with this.
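
For illustration, here is a minimal sketch of the kind of device-side code that forces `operator new` to be emitted, as described above. The function name `make_int` and the file name are hypothetical and do not come from the report, and the compile line in the comment only reuses flags from the command in comment 9. Whether this exact snippet triggers the malloc/free errors depends on the clang, CUDA, and libstdc++ versions in use.

```cuda
// repro_new.cu (hypothetical file name, not from the report)
//
// Possible compile line; adjust --cuda-path and the GPU arch for your setup:
//   clang++ --std=c++20 -x cuda --stdlib=libstdc++ \
//       --cuda-path=$HOME/local/cuda-11.0 --cuda-gpu-arch=sm_70 \
//       --cuda-device-only -S -o /dev/null repro_new.cu

// Using `new` in a device function means the device-side operator new
// from cuda_wrappers/new has to be emitted. That operator calls ::malloc,
// so a missing device-side declaration of malloc becomes visible here.
__device__ int *make_int() { return new int(42); }

int main() {}
```

Unlike the empty `foo()` in the original description, this function actually needs the wrapper's `operator new`, so a build that only hides the problem should still fail on it.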
Comment 12 Tom Stellard 2020-12-03 19:10:41 PST
Do I understand correctly that this patch does not cause a regression, it just fixes a bug in some cases but not others?

Do we still want to try to backport this?
Comment 13 Artem Belevich 2020-12-03 20:10:50 PST
Alas, the patch only hides the issue, and only in rather useless toy cases.
Unfortunately, I do not think it would do any good to backport it to 11.0.1, and I do not have a better fix for it yet.

If I were to attempt fixing it for 11.0.1, how much time do I have? I should have a better idea what can be done by the end of the day tomorrow.
Comment 14 Artem Belevich 2020-12-04 11:41:31 PST
https://reviews.llvm.org/D91807 has been updated with a better fix.
Comment 15 Artem Belevich 2020-12-04 12:12:18 PST
Landed in 43267929423bf768bbbcc65e47a07e37af7f4e22

@tstellar if/when you cherry-pick it, it will need to apply on top of 9a465057a64dba8a8614424d26136f5c0452bcc3. I didn't think to revert the first patch first. :-(
Comment 16 Artem Belevich 2020-12-04 14:07:54 PST
I've enabled testing w/ c++17 and c++20 on CUDA test bots and the tests compile and pass. 
http://lab.llvm.org:8011/#/builders/55/builds/2569

I think the patch is ready for cherry-picking into 11.0.1, if the train has not left yet.
Comment 17 Tom Stellard 2020-12-09 09:48:03 PST
Merged: 59012b685fd