Consider the simple snippet of C++ code below. It is a somewhat contrived reduced example from a much larger codebase, where this issue was first detected. In a situation where a vector type and a normal scalar type alias (but where this is made very explicit to the compiler), Clang's aliasing optimization generates incorrect code. The class "A" is a double array with 8 entries, which is initialized to [0..8]. The entries can be accessed using Intel-specific vector types (__m256d) and double variables. The begin() and end() functions expose a C++ iterator interface to step through the values, and the main() function then prints them. When compiled on my machine (with "clang++ test.cpp -std=c++11 -O3 -mavx -o test"), I observe the following output 0.000000 1.000000 2.000000 3.000000 4.000000 0.000000 6.000000 7.000000 (^ note the zero value here, which should be 5) If compiled with -fno-strict-aliasing, the fifth entry is equal to 5.000000, confirming that this is indeed an aliasing optimization-related issue. In case this is an issue with the C++ code, I would be curious how I can signal to Clang that an array type may alias with its underlying scalar type (it appears to me that this is a fairly essential requirement in all sorts of situations) Thanks, Wenzel ======= C++ snippet ========= #include <immintrin.h> #include <stdio.h> struct A { A () { a = _mm256_setr_pd(0.0, 1.0, 2.0, 3.0); b = _mm256_setr_pd(4.0, 5.0, 6.0, 7.0); } const double *begin() { return c; } const double *end() { return c+8; } union { struct { __m256d a, b; }; double c[8]; }; }; int main(int argc, char *argv[]) { A a; for (double value : a) printf("%f ", value); return 0; } ==== LLVM IR Disassembly ===== (note the *undef* in the line %12 = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([4 x i8], [4 x i8]* @.str, i64 0, i64 0), double undef) ) define i32 @main(i32, i8** nocapture readnone) local_unnamed_addr #0 { %3 = alloca %struct.Test, align 32 %4 = bitcast %struct.Test* %3 to i8* call void @llvm.lifetime.start(i64 64, i8* nonnull %4) #3 %5 = getelementptr inbounds %struct.Test, %struct.Test* %3, i64 0, i32 0, i32 0, i32 0 store <4 x double> <double 0.000000e+00, double 1.000000e+00, double 2.000000e+00, double 3.000000e+00>, <4 x double>* %5, align 32, !tbaa !2 %6 = getelementptr inbounds %struct.Test, %struct.Test* %3, i64 0, i32 0, i32 0, i32 1 store <4 x double> <double 4.000000e+00, double 5.000000e+00, double 6.000000e+00, double 7.000000e+00>, <4 x double>* %6, align 32, !tbaa !6 %7 = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([4 x i8], [4 x i8]* @.str, i64 0, i64 0), double 0.000000e+00) %8 = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([4 x i8], [4 x i8]* @.str, i64 0, i64 0), double 1.000000e+00) %9 = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([4 x i8], [4 x i8]* @.str, i64 0, i64 0), double 2.000000e+00) %10 = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([4 x i8], [4 x i8]* @.str, i64 0, i64 0), double 3.000000e+00) %11 = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([4 x i8], [4 x i8]* @.str, i64 0, i64 0), double 4.000000e+00) %12 = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([4 x i8], [4 x i8]* @.str, i64 0, i64 0), double undef) %13 = getelementptr inbounds %struct.Test, %struct.Test* %3, i64 0, i32 0, i32 0, i32 0, i64 6 %14 = load double, double* %13, align 16, !tbaa !7 %15 = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([4 x i8], [4 x i8]* @.str, i64 0, i64 0), double %14) %16 = getelementptr inbounds %struct.Test, %struct.Test* %3, i64 0, i32 0, i32 0, i32 0, i64 7 %17 = load double, double* %16, align 8, !tbaa !7 %18 = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([4 x i8], [4 x i8]* @.str, i64 0, i64 0), double %17) call void @llvm.lifetime.end(i64 64, i8* nonnull %4) #3 ret i32 0 }
Does bug 31928 provide the answer?
I don't think it does -- this issue has something to do with the fact that 'a' and 'b' are vector types (__m256d). Two other things that are interesting: 1. Clang only miscompiles when 'a' and 'b' are initialized with constants (thus a part of the computation can be done at compile time, at which point something goes wrong). When the values are fetched from elsewhere (e.g. an external symbol), the issue disappears. 2. The issue seems to be specific to vector registers. For instance, if I replace the definition of the class 'A' by the following (technically equivalent) code that does not involve __m256d, everything works without problems. struct A { struct Foo { uint64_t a, b, c, d; }; A () { /* Binary encoding for double values 0..7 */ a = Foo{ 0x0000000000000000ull, 0x3FF0000000000000ull, 0x4000000000000000ull, 0x4008000000000000ull }; b = Foo{ 0x4010000000000000ull, 0x4014000000000000ull, 0x4018000000000000ull, 0x401C000000000000ull }; } const double *begin() { return c; } const double *end() { return c+8; } union { struct { Foo a, b; }; double c[8]; }; };
so.. any thoughts on this issue? PS: added Intel folks to CC in case it is related to X86 specifically.
(In reply to Wenzel Jakob from comment #3) > so.. any thoughts on this issue? > > PS: added Intel folks to CC in case it is related to X86 specifically. I don't think it's an x86 problem (unless there's some mis-specification of the x86 intrinsics in the front-end). There's nothing x86-specific in the IR when the undef appears. Here's the IR heading into -gvn with no undef on any printf, and "opt -tbaa -gvn" replaces it with undef. I don't know enough about aliasing or gvn to know what's happening here (cc'ing Davide and Daniel), but it does seem strange that we only make that one printf use undef. ; ModuleID = '32056.ll' source_filename = "32056.cpp" target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128" target triple = "x86_64-apple-macosx10.12.0" %struct.A = type { %union.anon } %union.anon = type { %struct.anon } %struct.anon = type { <4 x double>, <4 x double> } @.str = private unnamed_addr constant [4 x i8] c"%f \00", align 1 @.str.1 = private unnamed_addr constant [3 x i8] c"\0A\0A\00", align 1 @str = private unnamed_addr constant [2 x i8] c"\0A\00" ; Function Attrs: norecurse ssp uwtable define i32 @main(i32 %argc, i8** nocapture readnone %argv) local_unnamed_addr #0 { entry: %a = alloca %struct.A, align 32 %0 = bitcast %struct.A* %a to i8* call void @llvm.lifetime.start(i64 64, i8* nonnull %0) #3 %a.i.i = getelementptr inbounds %struct.A, %struct.A* %a, i64 0, i32 0, i32 0, i32 0 store <4 x double> <double 0.000000e+00, double 1.000000e+00, double 2.000000e+00, double 3.000000e+00>, <4 x double>* %a.i.i, align 32, !tbaa !2 %b.i.i = getelementptr inbounds %struct.A, %struct.A* %a, i64 0, i32 0, i32 0, i32 1 store <4 x double> <double 4.000000e+00, double 5.000000e+00, double 6.000000e+00, double 7.000000e+00>, <4 x double>* %b.i.i, align 32, !tbaa !6 %arraydecay.i = getelementptr inbounds %struct.A, %struct.A* %a, i64 0, i32 0, i32 0, i32 0, i64 0 %add.ptr.i = getelementptr inbounds %struct.A, %struct.A* %a, i64 0, i32 0, i32 0, i32 0, i64 8 br label %for.body for.body: ; preds = %entry %1 = load double, double* %arraydecay.i, align 8, !tbaa !7 %call2 = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([4 x i8], [4 x i8]* @.str, i64 0, i64 0), double %1) %incdec.ptr = getelementptr inbounds double, double* %arraydecay.i, i64 1 %2 = load double, double* %incdec.ptr, align 8, !tbaa !7 %call2.1 = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([4 x i8], [4 x i8]* @.str, i64 0, i64 0), double %2) %incdec.ptr.1 = getelementptr inbounds double, double* %incdec.ptr, i64 1 %3 = load double, double* %incdec.ptr.1, align 8, !tbaa !7 %call2.2 = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([4 x i8], [4 x i8]* @.str, i64 0, i64 0), double %3) %incdec.ptr.2 = getelementptr inbounds double, double* %incdec.ptr.1, i64 1 %4 = load double, double* %incdec.ptr.2, align 8, !tbaa !7 %call2.3 = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([4 x i8], [4 x i8]* @.str, i64 0, i64 0), double %4) %incdec.ptr.3 = getelementptr inbounds double, double* %incdec.ptr.2, i64 1 %5 = load double, double* %incdec.ptr.3, align 8, !tbaa !7 %call2.4 = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([4 x i8], [4 x i8]* @.str, i64 0, i64 0), double %5) %incdec.ptr.4 = getelementptr inbounds double, double* %incdec.ptr.3, i64 1 %6 = load double, double* %incdec.ptr.4, align 8, !tbaa !7 %call2.5 = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([4 x i8], [4 x i8]* @.str, i64 0, i64 0), double %6) %incdec.ptr.5 = getelementptr inbounds double, double* %incdec.ptr.4, i64 1 %7 = load double, double* %incdec.ptr.5, align 8, !tbaa !7 %call2.6 = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([4 x i8], [4 x i8]* @.str, i64 0, i64 0), double %7) %incdec.ptr.6 = getelementptr inbounds double, double* %incdec.ptr.5, i64 1 %8 = load double, double* %incdec.ptr.6, align 8, !tbaa !7 %call2.7 = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([4 x i8], [4 x i8]* @.str, i64 0, i64 0), double %8) %incdec.ptr.7 = getelementptr inbounds double, double* %incdec.ptr.6, i64 1 %puts = call i32 @puts(i8* getelementptr inbounds ([2 x i8], [2 x i8]* @str, i64 0, i64 0)) call void @llvm.lifetime.end(i64 64, i8* nonnull %0) #3 ret i32 0 } ; Function Attrs: argmemonly nounwind declare void @llvm.lifetime.start(i64, i8* nocapture) #1 ; Function Attrs: nounwind declare i32 @printf(i8* nocapture readonly, ...) local_unnamed_addr #2 ; Function Attrs: argmemonly nounwind declare void @llvm.lifetime.end(i64, i8* nocapture) #1 ; Function Attrs: nounwind declare i32 @puts(i8* nocapture readonly) #3 attributes #0 = { norecurse ssp uwtable "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="penryn" "target-features"="+avx,+cx16,+fxsr,+mmx,+popcnt,+sse,+sse2,+sse3,+sse4.1,+sse4.2,+ssse3,+x87,+xsave" "unsafe-fp-math"="false" "use-soft-float"="false" } attributes #1 = { argmemonly nounwind } attributes #2 = { nounwind "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="penryn" "target-features"="+avx,+cx16,+fxsr,+mmx,+popcnt,+sse,+sse2,+sse3,+sse4.1,+sse4.2,+ssse3,+x87,+xsave" "unsafe-fp-math"="false" "use-soft-float"="false" } attributes #3 = { nounwind } !llvm.module.flags = !{!0} !llvm.ident = !{!1} !0 = !{i32 1, !"PIC Level", i32 2} !1 = !{!"clang version 5.0.0 (trunk 296618)"} !2 = !{!3, !4, i64 0} !3 = !{!"_ZTSN1AUt_Ut_E", !4, i64 0, !4, i64 32} !4 = !{!"omnipotent char", !5, i64 0} !5 = !{!"Simple C++ TBAA"} !6 = !{!3, !4, i64 32} !7 = !{!8, !8, i64 0} !8 = !{!"double", !4, i64 0}
Sorry, I pasted a bigger version than what I was looking at last. This removes some of the unnecessary bits (but can almost certainly be reduced some more): %struct.A = type { %union.anon } %union.anon = type { %struct.anon } %struct.anon = type { <4 x double>, <4 x double> } @.str = private unnamed_addr constant [4 x i8] c"%f \00", align 1 @str = private unnamed_addr constant [2 x i8] c"\0A\00" define i32 @main(i32 %argc, i8** nocapture readnone %argv) { entry: %a = alloca %struct.A, align 32 %0 = bitcast %struct.A* %a to i8* %a.i = getelementptr inbounds %struct.A, %struct.A* %a, i64 0, i32 0, i32 0, i32 0 store <4 x double> <double 0.0, double 1.0, double 2.0, double 3.0>, <4 x double>* %a.i, align 32, !tbaa !2 %b.i = getelementptr inbounds %struct.A, %struct.A* %a, i64 0, i32 0, i32 0, i32 1 store <4 x double> <double 4.0, double 5.0, double 6.0, double 7.0>, <4 x double>* %b.i, align 32, !tbaa !6 %arraydecay = getelementptr inbounds %struct.A, %struct.A* %a, i64 0, i32 0, i32 0, i32 0, i64 0 %add.ptr.i = getelementptr inbounds %struct.A, %struct.A* %a, i64 0, i32 0, i32 0, i32 0, i64 8 br label %for.body for.body: %1 = load double, double* %arraydecay, align 8, !tbaa !7 %call2 = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([4 x i8], [4 x i8]* @.str, i64 0, i64 0), double %1) %incdec.ptr = getelementptr inbounds double, double* %arraydecay, i64 1 %2 = load double, double* %incdec.ptr, align 8, !tbaa !7 %call2.1 = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([4 x i8], [4 x i8]* @.str, i64 0, i64 0), double %2) %incdec.ptr.1 = getelementptr inbounds double, double* %incdec.ptr, i64 1 %3 = load double, double* %incdec.ptr.1, align 8, !tbaa !7 %call2.2 = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([4 x i8], [4 x i8]* @.str, i64 0, i64 0), double %3) %incdec.ptr.2 = getelementptr inbounds double, double* %incdec.ptr.1, i64 1 %4 = load double, double* %incdec.ptr.2, align 8, !tbaa !7 %call2.3 = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([4 x i8], [4 x i8]* @.str, i64 0, i64 0), double %4) %incdec.ptr.3 = getelementptr inbounds double, double* %incdec.ptr.2, i64 1 %5 = load double, double* %incdec.ptr.3, align 8, !tbaa !7 %call2.4 = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([4 x i8], [4 x i8]* @.str, i64 0, i64 0), double %5) %incdec.ptr.4 = getelementptr inbounds double, double* %incdec.ptr.3, i64 1 %6 = load double, double* %incdec.ptr.4, align 8, !tbaa !7 %call2.5 = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([4 x i8], [4 x i8]* @.str, i64 0, i64 0), double %6) %incdec.ptr.5 = getelementptr inbounds double, double* %incdec.ptr.4, i64 1 %7 = load double, double* %incdec.ptr.5, align 8, !tbaa !7 %call2.6 = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([4 x i8], [4 x i8]* @.str, i64 0, i64 0), double %7) %incdec.ptr.6 = getelementptr inbounds double, double* %incdec.ptr.5, i64 1 %8 = load double, double* %incdec.ptr.6, align 8, !tbaa !7 %call2.7 = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([4 x i8], [4 x i8]* @.str, i64 0, i64 0), double %8) %incdec.ptr.7 = getelementptr inbounds double, double* %incdec.ptr.6, i64 1 %puts = call i32 @puts(i8* getelementptr inbounds ([2 x i8], [2 x i8]* @str, i64 0, i64 0)) ret i32 0 } declare i32 @printf(i8* nocapture readonly, ...) declare i32 @puts(i8* nocapture readonly) !0 = !{i32 1, !"PIC Level", i32 2} !1 = !{!"clang version 5.0.0 (trunk 296618)"} !2 = !{!3, !4, i64 0} !3 = !{!"_ZTSN1AUt_Ut_E", !4, i64 0, !4, i64 32} !4 = !{!"omnipotent char", !5, i64 0} !5 = !{!"Simple C++ TBAA"} !6 = !{!3, !4, i64 32} !7 = !{!8, !8, i64 0} !8 = !{!"double", !4, i64 0}
This is definitely an aliasing issue. You don't need GVN to see it. A simple -print-memoryssa will show that aliasing is giving interesting answers. (note: Unions are wildly borked with TBAA right now, adn there is a discussion on the mailing list about it) Here's a minimal example: ; ModuleID = 'broken.ll' source_filename = "broken.ll" %struct.wibble = type { %struct.eggs } %struct.eggs = type { %struct.ham } %struct.ham = type { <4 x double>, <4 x double> } @global = private unnamed_addr constant [4 x i8] c"%f \00", align 1 @global.1 = private unnamed_addr constant [2 x i8] c"\0A\00" define i32 @quux(i32 %arg, i8** nocapture readnone %arg1) { bb: %tmp = alloca %struct.wibble, align 32 %tmp2 = bitcast %struct.wibble* %tmp to i8* %tmp3 = getelementptr inbounds %struct.wibble, %struct.wibble* %tmp, i64 0, i32 0, i32 0, i32 0 store <4 x double> <double 0.000000e+00, double 1.000000e+00, double 2.000000e+00, double 3.000000e+00>, <4 x double>* %tmp3, align 32, !tbaa !0 %tmp4 = getelementptr inbounds %struct.wibble, %struct.wibble* %tmp, i64 0, i32 0, i32 0, i32 1 store <4 x double> <double 4.000000e+00, double 5.000000e+00, double 6.000000e+00, double 7.000000e+00>, <4 x double>* %tmp4, align 32, !tbaa !4 %tmp5 = getelementptr inbounds %struct.wibble, %struct.wibble* %tmp, i64 0, i32 0, i32 0, i32 0, i64 0 %tmp6 = getelementptr inbounds %struct.wibble, %struct.wibble* %tmp, i64 0, i32 0, i32 0, i32 0, i64 8 br label %bb7 bb7: ; preds = %bb %tmp10 = getelementptr inbounds double, double* %tmp5, i64 1 %tmp13 = getelementptr inbounds double, double* %tmp10, i64 1 %tmp16 = getelementptr inbounds double, double* %tmp13, i64 1 %tmp19 = getelementptr inbounds double, double* %tmp16, i64 1 %tmp22 = getelementptr inbounds double, double* %tmp19, i64 1 %tmp23 = load double, double* %tmp22, align 8, !tbaa !5 %tmp24 = call i32 (i8*, ...) @hoge(i8* getelementptr inbounds ([4 x i8], [4 x i8]* @global, i64 0, i64 0), double %tmp23) %tmp25 = call i32 @quux.2(i8* getelementptr inbounds ([2 x i8], [2 x i8]* @global.1, i64 0, i64 0)) ret i32 0 } declare i32 @hoge(i8* nocapture readonly, ...) declare i32 @quux.2(i8* nocapture readonly) !0 = !{!1, !2, i64 0} !1 = !{!"_ZTSN1AUt_Ut_E", !2, i64 0, !2, i64 32} !2 = !{!"omnipotent char", !3, i64 0} !3 = !{!"Simple C++ TBAA"} !4 = !{!1, !2, i64 32} !5 = !{!6, !6, i64 0} !6 = !{!"double", !2, i64 0} print-memoryssa shows: define i32 @quux(i32 %arg, i8** nocapture readnone %arg1) { bb: %tmp = alloca %struct.wibble, align 32 %tmp2 = bitcast %struct.wibble* %tmp to i8* %tmp3 = getelementptr inbounds %struct.wibble, %struct.wibble* %tmp, i64 0, i32 0, i32 0, i32 0 ; 1 = MemoryDef(liveOnEntry) store <4 x double> <double 0.000000e+00, double 1.000000e+00, double 2.000000e+00, double 3.000000e+00>, <4 x double>* %tmp3, align 32, !tbaa !0 %tmp4 = getelementptr inbounds %struct.wibble, %struct.wibble* %tmp, i64 0, i32 0, i32 0, i32 1 ; 2 = MemoryDef(1) store <4 x double> <double 4.000000e+00, double 5.000000e+00, double 6.000000e+00, double 7.000000e+00>, <4 x double>* %tmp4, align 32, !tbaa !4 %tmp5 = getelementptr inbounds %struct.wibble, %struct.wibble* %tmp, i64 0, i32 0, i32 0, i32 0, i64 0 %tmp6 = getelementptr inbounds %struct.wibble, %struct.wibble* %tmp, i64 0, i32 0, i32 0, i32 0, i64 8 br label %bb7 bb7: ; preds = %bb %tmp10 = getelementptr inbounds double, double* %tmp5, i64 1 %tmp13 = getelementptr inbounds double, double* %tmp10, i64 1 %tmp16 = getelementptr inbounds double, double* %tmp13, i64 1 %tmp19 = getelementptr inbounds double, double* %tmp16, i64 1 %tmp22 = getelementptr inbounds double, double* %tmp19, i64 1 ; MemoryUse(liveOnEntry) %tmp23 = load double, double* %tmp22, align 8, !tbaa !5 ; 3 = MemoryDef(2) %tmp24 = call i32 (i8*, ...) @hoge(i8* getelementptr inbounds ([4 x i8], [4 x i8]* @global, i64 0, i64 0), double %tmp23) ; 4 = MemoryDef(3) %tmp25 = call i32 @quux.2(i8* getelementptr inbounds ([2 x i8], [2 x i8]* @global.1, i64 0, i64 0)) ret i32 0 } Note the load (which is the broken one), is a use of live on entry, instead of 1. This is wrong.
For simplicity, here's a version with one okay load, one broken one, and no other calls: ; ModuleID = 'broken.ll' source_filename = "broken.ll" %struct.wibble = type { %struct.eggs } %struct.eggs = type { %struct.ham } %struct.ham = type { <4 x double>, <4 x double> } @global = private unnamed_addr constant [4 x i8] c"%f \00", align 1 @global.1 = private unnamed_addr constant [2 x i8] c"\0A\00" define double @quux(i32 %arg, i8** nocapture readnone %arg1) { bb: %tmp = alloca %struct.wibble, align 32 %tmp2 = bitcast %struct.wibble* %tmp to i8* %tmp3 = getelementptr inbounds %struct.wibble, %struct.wibble* %tmp, i64 0, i32 0, i32 0, i32 0 store <4 x double> <double 0.000000e+00, double 1.000000e+00, double 2.000000e+00, double 3.000000e+00>, <4 x double>* %tmp3, align 32, !tbaa !0 %tmp4 = getelementptr inbounds %struct.wibble, %struct.wibble* %tmp, i64 0, i32 0, i32 0, i32 1 store <4 x double> <double 4.000000e+00, double 5.000000e+00, double 6.000000e+00, double 7.000000e+00>, <4 x double>* %tmp4, align 32, !tbaa !4 %tmp5 = getelementptr inbounds %struct.wibble, %struct.wibble* %tmp, i64 0, i32 0, i32 0, i32 0, i64 0 %tmp6 = getelementptr inbounds %struct.wibble, %struct.wibble* %tmp, i64 0, i32 0, i32 0, i32 0, i64 8 br label %bb7 bb7: ; preds = %bb %tmp10 = getelementptr inbounds double, double* %tmp5, i64 1 %tmp13 = getelementptr inbounds double, double* %tmp10, i64 1 %tmp16 = getelementptr inbounds double, double* %tmp13, i64 1 %tmp19 = getelementptr inbounds double, double* %tmp16, i64 1 %tmp20 = load double, double* %tmp19, align 8, !tbaa !5 %tmp22 = getelementptr inbounds double, double* %tmp19, i64 1 %tmp23 = load double, double* %tmp22, align 8, !tbaa !5 %tmp24 = fadd double %tmp20, %tmp23 ret double %tmp24 } declare i32 @hoge(i8* nocapture readonly, ...) declare i32 @quux.2(i8* nocapture readonly) !0 = !{!1, !2, i64 0} !1 = !{!"_ZTSN1AUt_Ut_E", !2, i64 0, !2, i64 32} !2 = !{!"omnipotent char", !3, i64 0} !3 = !{!"Simple C++ TBAA"} !4 = !{!1, !2, i64 32} !5 = !{!6, !6, i64 0} !6 = !{!"double", !2, i64 0} The ret will become fadd %tmp20, undef
Okay, here is what happens: In our AA pipeline, basicaa is currently above tbaa (i thought we had it the other way around, so maybe this is a bug). For the first load in my example, BasicAA returns "MustAlias" and so we return that. We never ask TBAA. If we did, it would say "NoAlias". For the second load, BasicAA returns "MayAlias", so we continue on. We ask TBAA, it says "NoAlias" We return that. The reason TBAA returns NoAlias is because this tree walk is simply wrong. It is attempting to find NCA(tbaa !6, tbaa !1) The clear answer is TBAA !2. However, it gets "no common ancestor". It does the following: Walk A to root, see if we hit B Walk B to root, see if we hit A. If not, assume no NCA. Here is a counterexample: root | common ancestor / \ A B Here is a trivially correct way to do this: While walking A to root, build the set of seen nodes while walking B to root, if the node appears in the set from A, that is the common ancestor. This is O(N) The constant time way to do this is the same thing we do to the dominator tree: DFS number the TBAA tree, use the dfs numbers to answer containment. I will implement #1 as a correctness fix, then #2 if we need it.
(In reply to Daniel Berlin from comment #8) > Okay, here is what happens: > > In our AA pipeline, basicaa is currently above tbaa (i thought we had it the > other way around, so maybe this is a bug). > > For the first load in my example, BasicAA returns "MustAlias" and so we > return that. We never ask TBAA. > If we did, it would say "NoAlias". > > For the second load, BasicAA returns "MayAlias", so we continue on. > We ask TBAA, it says "NoAlias" > We return that. > > > The reason TBAA returns NoAlias is because this tree walk is simply wrong. > > It is attempting to find NCA(tbaa !6, tbaa !1) > > The clear answer is TBAA !2. > > However, it gets "no common ancestor". > > It does the following: > > Walk A to root, see if we hit B > Walk B to root, see if we hit A. > > If not, assume no NCA. > > Here is a counterexample: > > root > | > common ancestor > / \ > A B > > > Here is a trivially correct way to do this: > > While walking A to root, build the set of seen nodes > while walking B to root, if the node appears in the set from A, that is the > common ancestor. > > This is O(N) > The constant time way to do this is the same thing we do to the dominator > tree: > > DFS number the TBAA tree, use the dfs numbers to answer containment. > Ugh, this won't work because we have offset parents. What will work, however, is the union-find approach detailed in https://en.wikipedia.org/wiki/Lowest_common_ancestor > I will implement #1 as a correctness fix, then #2 if we need it.
So, as sanjoy points out, the way this tree is structured, NCA won't work either. In any case, this tree is clearly wrong. The trees are inverted from the representation i'm used to, which is one where the types are children of the struct nodes, not parents. Even so, this tree says that double and the vector types don't alias, because they do not directly end up in the ancestor tree of each other. IE double either needs to be a parent of the struct, and of every other node that has a double in it, or it needs to be a child of every node that cont if we kept the trees in the same order as gcc, you would just make them both children of a union node, use the union tbaa node and declare victory. IE union / \ double struct if you invert that tree into our form: double struct | / | / | / | / | / union Then both have to be parents of the union node, and the tbaa access specificed to the be the union node, and we don't allow that. So i can't see a way to structure a tree that works here without allowing multiple parents (or multiple children).
See thread [llvm-dev] RFC: Representing unions in TBAA for more details
A patch to remove TBAA info from union members is posted here: https://reviews.llvm.org/D31885
I just applied D31885 to Clang trunk to re-test the snippets here. Unfortunately, they still fail.
(In reply to Wenzel Jakob from comment #13) > I just applied D31885 to Clang trunk to re-test the snippets here. > Unfortunately, they still fail. I updated the patch and the testcase now works on my machine.
I'll close the issue as D33328 was merged, which indeed fixes the testcase that prompted this issue. Thanks, Wenzel