Created attachment 6638 [details] A simple test case which fails CGExprScalar.cpp: EmitShl(...) may produce an integer cast for a right hand side expression, assuming that it is unsigned. Please see the attached test case which produces a wrong IR and a patch to fix this problem.
Created attachment 6639 [details] A patch Should be a sufficient fix, but probably it worth doing the cast in SemaExpr.cpp instead.
I don't see how this is a bug.
This is a bug indeed, according to at least OpenCL spec 6.3.j: "The result of E1 << E2 is E1 left-shifted by log2(N) least significant bits in E2 viewed as an unsigned integer value, where N is the number of bits used to represent the data type of E1 after integer promotion, if E1 is a scalar, or the number of bits used to represent the type of E1 elements, if E1 is a vector." C99 specification says something very similar (sorry, can't give a precise reference now). Please note this integer promotion part. If E1 is 64 bit wide, and E2 is signed 16bit, Clang first promotes E2 properly to 32bit signed, but then zext-s it to 64bit, whereas it should be sext-ed to 64bits and only then treated as if it was unsigned.
Ok, I see, in C99 it is explicitly an undefined behaviour. So the bug is OpenCL-specific.
The bug seems to be still there at least in LLVM 3.6: define i64 @shl_long_short(i64 %arg1.coerce, i16 signext %arg2) #0 { %1 = zext i64 %arg1.coerce to i128 %2 = zext i16 %arg2 to i128 %3 = and i128 %2, 127 %4 = shl i128 %1, %3 %5 = trunc i128 %4 to i64 ret i64 %5 }
What IR are you claiming this should produce? The same thing, but with %2 being a sext instead of a zext? I am failing to see how the "and" doesn't erase any supposed distinction there.
Patch provided by Qwertyuiop
Created attachment 15693 [details] [FIX] Bug 10002 - [opencl] Wrongfully assuming RHS is always unsigned Patch/test added. Issue ready to review.
Created attachment 15712 [details] Checks that signed right side of LSH operator does cast to unsigned Please review and submit.
Created attachment 15764 [details] Patch - Wrongfully assuming RHS is always unsigned
Phabricator page: http://reviews.llvm.org/D16460 Please review and commit.
I am not very clear what IR are we expecting here for OpenCL and whether any change is really needed. Pekka, would you be able to elaborate for your example of x86 target?
Hmm, shouldn't long long be i128 in the OpenCL C mode? Of the original issue I'm not sure -- I forgot the original issue, and the below code seems to be per the specs quote (due to the and masking). clang -x cl 444.cl -O3 -emit-llvm -S -o - ; ModuleID = '444.cl' source_filename = "444.cl" target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128" target triple = "x86_64-unknown-linux-gnu" ; Function Attrs: norecurse nounwind readnone uwtable define i64 @shl_long_short(i64 %arg1.coerce, i16 signext %arg2) local_unnamed_addr #0 { entry: %coerce.val.ii = zext i64 %arg1.coerce to i128 %0 = and i16 %arg2, 127 %shl.mask = zext i16 %0 to i128 %shl = shl i128 %coerce.val.ii, %shl.mask %coerce.val.ii2 = trunc i128 %shl to i64 ret i64 %coerce.val.ii2 } attributes #0 = { norecurse nounwind readnone uwtable "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="false" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+fxsr,+mmx,+sse,+sse2,+x87" "unsafe-fp-math"="false" "use-soft-float"="false" } !llvm.ident = !{!0} !0 = !{!"clang version 4.0.0 (branches/release_40 294436)"}
Yes, the behavior is correct here; closing.
What about the long long retvalue and arg generated as i64 instead of i128? Expected too?
Looking at the Table 6.4 of the spec v2.0 with OpenCL types and long long is actually a reserved type which is not supposed to be used... I am not sure whether we could give an error or there are any cases (i.e. extensions) where it could be supported.
If you think something is wrong with the size of long long, please file a new bug.
Anastasia: Right, the specs (at least I could not find it) do not say what the kernel compiler should do when a reserved data type is used. The table lists long long as 128b, but as it's not part of the main standard I suppose the current behavior is OK. The improved behavior would be to error out due to use of a reserved data type. Thus, a missing feature, not a bug.