LLVM Bugzilla is read-only and represents the historical archive of all LLVM issues filled before November 26, 2021. Use github to submit LLVM bugs

Bug 32056 - Invalid code generation (aliasing between scalar and vector type)
Summary: Invalid code generation (aliasing between scalar and vector type)
Status: RESOLVED FIXED
Alias: None
Product: clang
Classification: Unclassified
Component: LLVM Codegen (show other bugs)
Version: trunk
Hardware: PC All
: P enhancement
Assignee: Unassigned Clang Bugs
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2017-02-24 02:27 PST by Wenzel Jakob
Modified: 2017-07-27 05:58 PDT (History)
13 users (show)

See Also:
Fixed By Commit(s):


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Wenzel Jakob 2017-02-24 02:27:27 PST
Consider the simple snippet of C++ code below. It is a somewhat contrived reduced example from a much larger codebase, where this issue was first detected. In a situation where a vector type and a normal scalar type alias (but where this is made very explicit to the compiler), Clang's aliasing optimization generates incorrect code.

The class "A" is a double array with 8 entries, which is initialized to [0..8]. The entries can be accessed using Intel-specific vector types (__m256d) and double variables. The begin() and end() functions expose a C++ iterator interface to step through the values, and the main() function then prints them.

When compiled on my machine (with "clang++ test.cpp -std=c++11 -O3 -mavx -o test"), I observe the following output 

0.000000 1.000000 2.000000 3.000000 4.000000 0.000000 6.000000 7.000000
                                             (^ note the zero value here, which should be 5)

If compiled with -fno-strict-aliasing, the fifth entry is equal to 5.000000, confirming that this is indeed an aliasing optimization-related issue.

In case this is an issue with the C++ code, I would be curious how I can signal to Clang that an array type may alias with its underlying scalar type (it appears to me that this is a fairly essential requirement in all sorts of situations)

Thanks,
Wenzel

======= C++ snippet =========

#include <immintrin.h>
#include <stdio.h>

struct A {
    A () {
        a = _mm256_setr_pd(0.0, 1.0, 2.0, 3.0);
        b = _mm256_setr_pd(4.0, 5.0, 6.0, 7.0);
    }

    const double *begin() { return c; }
    const double *end() { return c+8; }

    union {
        struct { __m256d a, b; };
        double c[8];
    };
};

int main(int argc, char *argv[]) {
    A a;
    for (double value : a)
        printf("%f ", value);

    return 0;
}

==== LLVM IR Disassembly =====

(note the *undef* in the line
  %12 = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([4 x i8], [4 x i8]* @.str, i64 0, i64 0), double undef)
)


define i32 @main(i32, i8** nocapture readnone) local_unnamed_addr #0 {
  %3 = alloca %struct.Test, align 32
  %4 = bitcast %struct.Test* %3 to i8*
  call void @llvm.lifetime.start(i64 64, i8* nonnull %4) #3
  %5 = getelementptr inbounds %struct.Test, %struct.Test* %3, i64 0, i32 0, i32 0, i32 0
  store <4 x double> <double 0.000000e+00, double 1.000000e+00, double 2.000000e+00, double 3.000000e+00>, <4 x double>* %5, align 32, !tbaa !2
  %6 = getelementptr inbounds %struct.Test, %struct.Test* %3, i64 0, i32 0, i32 0, i32 1
  store <4 x double> <double 4.000000e+00, double 5.000000e+00, double 6.000000e+00, double 7.000000e+00>, <4 x double>* %6, align 32, !tbaa !6
  %7 = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([4 x i8], [4 x i8]* @.str, i64 0, i64 0), double 0.000000e+00)
  %8 = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([4 x i8], [4 x i8]* @.str, i64 0, i64 0), double 1.000000e+00)
  %9 = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([4 x i8], [4 x i8]* @.str, i64 0, i64 0), double 2.000000e+00)
  %10 = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([4 x i8], [4 x i8]* @.str, i64 0, i64 0), double 3.000000e+00)
  %11 = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([4 x i8], [4 x i8]* @.str, i64 0, i64 0), double 4.000000e+00)
  %12 = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([4 x i8], [4 x i8]* @.str, i64 0, i64 0), double undef)
  %13 = getelementptr inbounds %struct.Test, %struct.Test* %3, i64 0, i32 0, i32 0, i32 0, i64 6
  %14 = load double, double* %13, align 16, !tbaa !7
  %15 = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([4 x i8], [4 x i8]* @.str, i64 0, i64 0), double %14)
  %16 = getelementptr inbounds %struct.Test, %struct.Test* %3, i64 0, i32 0, i32 0, i32 0, i64 7
  %17 = load double, double* %16, align 8, !tbaa !7
  %18 = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([4 x i8], [4 x i8]* @.str, i64 0, i64 0), double %17)
  call void @llvm.lifetime.end(i64 64, i8* nonnull %4) #3
  ret i32 0
}
Comment 1 Sanjay Patel 2017-02-24 09:41:48 PST
Does bug 31928 provide the answer?
Comment 2 Wenzel Jakob 2017-02-24 10:22:45 PST
I don't think it does -- this issue has something to do with the fact that 'a' and 'b' are vector types (__m256d).

Two other things that are interesting:

1. Clang only miscompiles when 'a' and 'b' are initialized with constants (thus a part of the computation can be done at compile time, at which point something goes wrong).  When the values are fetched from elsewhere (e.g. an external symbol), the issue disappears.

2. The issue seems to be specific to vector registers. For instance, if I replace the definition of the class 'A' by the following (technically equivalent) code that does not involve __m256d, everything works without problems.

struct A {
    struct Foo { uint64_t a, b, c, d; };

    A () {
        /* Binary encoding for double values 0..7 */
        a = Foo{ 0x0000000000000000ull, 0x3FF0000000000000ull,
                 0x4000000000000000ull, 0x4008000000000000ull };
        b = Foo{ 0x4010000000000000ull, 0x4014000000000000ull,
                 0x4018000000000000ull, 0x401C000000000000ull };
    }

    const double *begin() { return c; }
    const double *end() { return c+8; }

    union {
        struct { Foo a, b; };
        double c[8];
    };
};
Comment 3 Wenzel Jakob 2017-03-01 02:09:19 PST
so.. any thoughts on this issue?

PS: added Intel folks to CC in case it is related to X86 specifically.
Comment 4 Sanjay Patel 2017-03-01 10:17:21 PST
(In reply to Wenzel Jakob from comment #3)
> so.. any thoughts on this issue?
> 
> PS: added Intel folks to CC in case it is related to X86 specifically.

I don't think it's an x86 problem (unless there's some mis-specification of the x86 intrinsics in the front-end). There's nothing x86-specific in the IR when the undef appears.

Here's the IR heading into -gvn with no undef on any printf, and "opt -tbaa -gvn" replaces it with undef. I don't know enough about aliasing or gvn to know what's happening here (cc'ing Davide and Daniel), but it does seem strange that we only make that one printf use undef.

; ModuleID = '32056.ll'
source_filename = "32056.cpp"
target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-apple-macosx10.12.0"

%struct.A = type { %union.anon }
%union.anon = type { %struct.anon }
%struct.anon = type { <4 x double>, <4 x double> }

@.str = private unnamed_addr constant [4 x i8] c"%f \00", align 1
@.str.1 = private unnamed_addr constant [3 x i8] c"\0A\0A\00", align 1
@str = private unnamed_addr constant [2 x i8] c"\0A\00"

; Function Attrs: norecurse ssp uwtable
define i32 @main(i32 %argc, i8** nocapture readnone %argv) local_unnamed_addr #0 {
entry:
  %a = alloca %struct.A, align 32
  %0 = bitcast %struct.A* %a to i8*
  call void @llvm.lifetime.start(i64 64, i8* nonnull %0) #3
  %a.i.i = getelementptr inbounds %struct.A, %struct.A* %a, i64 0, i32 0, i32 0, i32 0
  store <4 x double> <double 0.000000e+00, double 1.000000e+00, double 2.000000e+00, double 3.000000e+00>, <4 x double>* %a.i.i, align 32, !tbaa !2
  %b.i.i = getelementptr inbounds %struct.A, %struct.A* %a, i64 0, i32 0, i32 0, i32 1
  store <4 x double> <double 4.000000e+00, double 5.000000e+00, double 6.000000e+00, double 7.000000e+00>, <4 x double>* %b.i.i, align 32, !tbaa !6
  %arraydecay.i = getelementptr inbounds %struct.A, %struct.A* %a, i64 0, i32 0, i32 0, i32 0, i64 0
  %add.ptr.i = getelementptr inbounds %struct.A, %struct.A* %a, i64 0, i32 0, i32 0, i32 0, i64 8
  br label %for.body

for.body:                                         ; preds = %entry
  %1 = load double, double* %arraydecay.i, align 8, !tbaa !7
  %call2 = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([4 x i8], [4 x i8]* @.str, i64 0, i64 0), double %1)
  %incdec.ptr = getelementptr inbounds double, double* %arraydecay.i, i64 1
  %2 = load double, double* %incdec.ptr, align 8, !tbaa !7
  %call2.1 = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([4 x i8], [4 x i8]* @.str, i64 0, i64 0), double %2)
  %incdec.ptr.1 = getelementptr inbounds double, double* %incdec.ptr, i64 1
  %3 = load double, double* %incdec.ptr.1, align 8, !tbaa !7
  %call2.2 = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([4 x i8], [4 x i8]* @.str, i64 0, i64 0), double %3)
  %incdec.ptr.2 = getelementptr inbounds double, double* %incdec.ptr.1, i64 1
  %4 = load double, double* %incdec.ptr.2, align 8, !tbaa !7
  %call2.3 = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([4 x i8], [4 x i8]* @.str, i64 0, i64 0), double %4)
  %incdec.ptr.3 = getelementptr inbounds double, double* %incdec.ptr.2, i64 1
  %5 = load double, double* %incdec.ptr.3, align 8, !tbaa !7
  %call2.4 = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([4 x i8], [4 x i8]* @.str, i64 0, i64 0), double %5)
  %incdec.ptr.4 = getelementptr inbounds double, double* %incdec.ptr.3, i64 1
  %6 = load double, double* %incdec.ptr.4, align 8, !tbaa !7
  %call2.5 = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([4 x i8], [4 x i8]* @.str, i64 0, i64 0), double %6)
  %incdec.ptr.5 = getelementptr inbounds double, double* %incdec.ptr.4, i64 1
  %7 = load double, double* %incdec.ptr.5, align 8, !tbaa !7
  %call2.6 = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([4 x i8], [4 x i8]* @.str, i64 0, i64 0), double %7)
  %incdec.ptr.6 = getelementptr inbounds double, double* %incdec.ptr.5, i64 1
  %8 = load double, double* %incdec.ptr.6, align 8, !tbaa !7
  %call2.7 = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([4 x i8], [4 x i8]* @.str, i64 0, i64 0), double %8)
  %incdec.ptr.7 = getelementptr inbounds double, double* %incdec.ptr.6, i64 1
  %puts = call i32 @puts(i8* getelementptr inbounds ([2 x i8], [2 x i8]* @str, i64 0, i64 0))
  call void @llvm.lifetime.end(i64 64, i8* nonnull %0) #3
  ret i32 0
}

; Function Attrs: argmemonly nounwind
declare void @llvm.lifetime.start(i64, i8* nocapture) #1

; Function Attrs: nounwind
declare i32 @printf(i8* nocapture readonly, ...) local_unnamed_addr #2

; Function Attrs: argmemonly nounwind
declare void @llvm.lifetime.end(i64, i8* nocapture) #1

; Function Attrs: nounwind
declare i32 @puts(i8* nocapture readonly) #3

attributes #0 = { norecurse ssp uwtable "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="penryn" "target-features"="+avx,+cx16,+fxsr,+mmx,+popcnt,+sse,+sse2,+sse3,+sse4.1,+sse4.2,+ssse3,+x87,+xsave" "unsafe-fp-math"="false" "use-soft-float"="false" }
attributes #1 = { argmemonly nounwind }
attributes #2 = { nounwind "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="penryn" "target-features"="+avx,+cx16,+fxsr,+mmx,+popcnt,+sse,+sse2,+sse3,+sse4.1,+sse4.2,+ssse3,+x87,+xsave" "unsafe-fp-math"="false" "use-soft-float"="false" }
attributes #3 = { nounwind }

!llvm.module.flags = !{!0}
!llvm.ident = !{!1}

!0 = !{i32 1, !"PIC Level", i32 2}
!1 = !{!"clang version 5.0.0 (trunk 296618)"}
!2 = !{!3, !4, i64 0}
!3 = !{!"_ZTSN1AUt_Ut_E", !4, i64 0, !4, i64 32}
!4 = !{!"omnipotent char", !5, i64 0}
!5 = !{!"Simple C++ TBAA"}
!6 = !{!3, !4, i64 32}
!7 = !{!8, !8, i64 0}
!8 = !{!"double", !4, i64 0}
Comment 5 Sanjay Patel 2017-03-01 10:20:11 PST
Sorry, I pasted a bigger version than what I was looking at last. This removes some of the unnecessary bits (but can almost certainly be reduced some more):

%struct.A = type { %union.anon }
%union.anon = type { %struct.anon }
%struct.anon = type { <4 x double>, <4 x double> }

@.str = private unnamed_addr constant [4 x i8] c"%f \00", align 1
@str = private unnamed_addr constant [2 x i8] c"\0A\00"

define i32 @main(i32 %argc, i8** nocapture readnone %argv) {
entry:
  %a = alloca %struct.A, align 32
  %0 = bitcast %struct.A* %a to i8*
  %a.i = getelementptr inbounds %struct.A, %struct.A* %a, i64 0, i32 0, i32 0, i32 0
  store <4 x double> <double 0.0, double 1.0, double 2.0, double 3.0>, <4 x double>* %a.i, align 32, !tbaa !2
  %b.i = getelementptr inbounds %struct.A, %struct.A* %a, i64 0, i32 0, i32 0, i32 1
  store <4 x double> <double 4.0, double 5.0, double 6.0, double 7.0>, <4 x double>* %b.i, align 32, !tbaa !6
  %arraydecay = getelementptr inbounds %struct.A, %struct.A* %a, i64 0, i32 0, i32 0, i32 0, i64 0
  %add.ptr.i = getelementptr inbounds %struct.A, %struct.A* %a, i64 0, i32 0, i32 0, i32 0, i64 8
  br label %for.body

for.body:
  %1 = load double, double* %arraydecay, align 8, !tbaa !7
  %call2 = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([4 x i8], [4 x i8]* @.str, i64 0, i64 0), double %1)
  %incdec.ptr = getelementptr inbounds double, double* %arraydecay, i64 1
  %2 = load double, double* %incdec.ptr, align 8, !tbaa !7
  %call2.1 = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([4 x i8], [4 x i8]* @.str, i64 0, i64 0), double %2)
  %incdec.ptr.1 = getelementptr inbounds double, double* %incdec.ptr, i64 1
  %3 = load double, double* %incdec.ptr.1, align 8, !tbaa !7
  %call2.2 = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([4 x i8], [4 x i8]* @.str, i64 0, i64 0), double %3)
  %incdec.ptr.2 = getelementptr inbounds double, double* %incdec.ptr.1, i64 1
  %4 = load double, double* %incdec.ptr.2, align 8, !tbaa !7
  %call2.3 = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([4 x i8], [4 x i8]* @.str, i64 0, i64 0), double %4)
  %incdec.ptr.3 = getelementptr inbounds double, double* %incdec.ptr.2, i64 1
  %5 = load double, double* %incdec.ptr.3, align 8, !tbaa !7
  %call2.4 = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([4 x i8], [4 x i8]* @.str, i64 0, i64 0), double %5)
  %incdec.ptr.4 = getelementptr inbounds double, double* %incdec.ptr.3, i64 1
  %6 = load double, double* %incdec.ptr.4, align 8, !tbaa !7
  %call2.5 = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([4 x i8], [4 x i8]* @.str, i64 0, i64 0), double %6)
  %incdec.ptr.5 = getelementptr inbounds double, double* %incdec.ptr.4, i64 1
  %7 = load double, double* %incdec.ptr.5, align 8, !tbaa !7
  %call2.6 = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([4 x i8], [4 x i8]* @.str, i64 0, i64 0), double %7)
  %incdec.ptr.6 = getelementptr inbounds double, double* %incdec.ptr.5, i64 1
  %8 = load double, double* %incdec.ptr.6, align 8, !tbaa !7
  %call2.7 = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([4 x i8], [4 x i8]* @.str, i64 0, i64 0), double %8)
  %incdec.ptr.7 = getelementptr inbounds double, double* %incdec.ptr.6, i64 1
  %puts = call i32 @puts(i8* getelementptr inbounds ([2 x i8], [2 x i8]* @str, i64 0, i64 0))
  ret i32 0
}

declare i32 @printf(i8* nocapture readonly, ...)
declare i32 @puts(i8* nocapture readonly)

!0 = !{i32 1, !"PIC Level", i32 2}
!1 = !{!"clang version 5.0.0 (trunk 296618)"}
!2 = !{!3, !4, i64 0}
!3 = !{!"_ZTSN1AUt_Ut_E", !4, i64 0, !4, i64 32}
!4 = !{!"omnipotent char", !5, i64 0}
!5 = !{!"Simple C++ TBAA"}
!6 = !{!3, !4, i64 32}
!7 = !{!8, !8, i64 0}
!8 = !{!"double", !4, i64 0}
Comment 6 Daniel Berlin 2017-03-01 12:43:19 PST
This is definitely an aliasing issue.
You don't need GVN to see it.
A simple -print-memoryssa will show that aliasing is giving interesting answers.
(note: Unions are wildly borked with TBAA right now, adn there is a discussion on the mailing list about it)

Here's a minimal example:
; ModuleID = 'broken.ll'
source_filename = "broken.ll"

%struct.wibble = type { %struct.eggs }
%struct.eggs = type { %struct.ham }
%struct.ham = type { <4 x double>, <4 x double> }

@global = private unnamed_addr constant [4 x i8] c"%f \00", align 1
@global.1 = private unnamed_addr constant [2 x i8] c"\0A\00"

define i32 @quux(i32 %arg, i8** nocapture readnone %arg1) {
bb:
  %tmp = alloca %struct.wibble, align 32
  %tmp2 = bitcast %struct.wibble* %tmp to i8*
  %tmp3 = getelementptr inbounds %struct.wibble, %struct.wibble* %tmp, i64 0, i32 0, i32 0, i32 0
  store <4 x double> <double 0.000000e+00, double 1.000000e+00, double 2.000000e+00, double 3.000000e+00>, <4 x double>* %tmp3, align 32, !tbaa !0
  %tmp4 = getelementptr inbounds %struct.wibble, %struct.wibble* %tmp, i64 0, i32 0, i32 0, i32 1
  store <4 x double> <double 4.000000e+00, double 5.000000e+00, double 6.000000e+00, double 7.000000e+00>, <4 x double>* %tmp4, align 32, !tbaa !4
  %tmp5 = getelementptr inbounds %struct.wibble, %struct.wibble* %tmp, i64 0, i32 0, i32 0, i32 0, i64 0
  %tmp6 = getelementptr inbounds %struct.wibble, %struct.wibble* %tmp, i64 0, i32 0, i32 0, i32 0, i64 8
  br label %bb7

bb7:                                              ; preds = %bb
  %tmp10 = getelementptr inbounds double, double* %tmp5, i64 1
  %tmp13 = getelementptr inbounds double, double* %tmp10, i64 1
  %tmp16 = getelementptr inbounds double, double* %tmp13, i64 1
  %tmp19 = getelementptr inbounds double, double* %tmp16, i64 1
  %tmp22 = getelementptr inbounds double, double* %tmp19, i64 1
  %tmp23 = load double, double* %tmp22, align 8, !tbaa !5
  %tmp24 = call i32 (i8*, ...) @hoge(i8* getelementptr inbounds ([4 x i8], [4 x i8]* @global, i64 0, i64 0), double %tmp23)
  %tmp25 = call i32 @quux.2(i8* getelementptr inbounds ([2 x i8], [2 x i8]* @global.1, i64 0, i64 0))
  ret i32 0
}

declare i32 @hoge(i8* nocapture readonly, ...)

declare i32 @quux.2(i8* nocapture readonly)

!0 = !{!1, !2, i64 0}
!1 = !{!"_ZTSN1AUt_Ut_E", !2, i64 0, !2, i64 32}
!2 = !{!"omnipotent char", !3, i64 0}
!3 = !{!"Simple C++ TBAA"}
!4 = !{!1, !2, i64 32}
!5 = !{!6, !6, i64 0}
!6 = !{!"double", !2, i64 0}


print-memoryssa shows:
define i32 @quux(i32 %arg, i8** nocapture readnone %arg1) {
bb:
  %tmp = alloca %struct.wibble, align 32
  %tmp2 = bitcast %struct.wibble* %tmp to i8*
  %tmp3 = getelementptr inbounds %struct.wibble, %struct.wibble* %tmp, i64 0, i32 0, i32 0, i32 0
; 1 = MemoryDef(liveOnEntry)
  store <4 x double> <double 0.000000e+00, double 1.000000e+00, double 2.000000e+00, double 3.000000e+00>, <4 x double>* %tmp3, align 32, !tbaa !0
  %tmp4 = getelementptr inbounds %struct.wibble, %struct.wibble* %tmp, i64 0, i32 0, i32 0, i32 1
; 2 = MemoryDef(1)
  store <4 x double> <double 4.000000e+00, double 5.000000e+00, double 6.000000e+00, double 7.000000e+00>, <4 x double>* %tmp4, align 32, !tbaa !4
  %tmp5 = getelementptr inbounds %struct.wibble, %struct.wibble* %tmp, i64 0, i32 0, i32 0, i32 0, i64 0
  %tmp6 = getelementptr inbounds %struct.wibble, %struct.wibble* %tmp, i64 0, i32 0, i32 0, i32 0, i64 8
  br label %bb7

bb7:                                              ; preds = %bb
  %tmp10 = getelementptr inbounds double, double* %tmp5, i64 1
  %tmp13 = getelementptr inbounds double, double* %tmp10, i64 1
  %tmp16 = getelementptr inbounds double, double* %tmp13, i64 1
  %tmp19 = getelementptr inbounds double, double* %tmp16, i64 1
  %tmp22 = getelementptr inbounds double, double* %tmp19, i64 1
; MemoryUse(liveOnEntry)
  %tmp23 = load double, double* %tmp22, align 8, !tbaa !5
; 3 = MemoryDef(2)
  %tmp24 = call i32 (i8*, ...) @hoge(i8* getelementptr inbounds ([4 x i8], [4 x i8]* @global, i64 0, i64 0), double %tmp23)
; 4 = MemoryDef(3)
  %tmp25 = call i32 @quux.2(i8* getelementptr inbounds ([2 x i8], [2 x i8]* @global.1, i64 0, i64 0))
  ret i32 0
}


Note the load (which is the broken one), is a use of live on entry, instead of 1.

This is wrong.
Comment 7 Daniel Berlin 2017-03-01 12:47:30 PST
For simplicity, here's a version with one okay load, one broken one, and no other calls:
; ModuleID = 'broken.ll'
source_filename = "broken.ll"

%struct.wibble = type { %struct.eggs }
%struct.eggs = type { %struct.ham }
%struct.ham = type { <4 x double>, <4 x double> }

@global = private unnamed_addr constant [4 x i8] c"%f \00", align 1
@global.1 = private unnamed_addr constant [2 x i8] c"\0A\00"

define double @quux(i32 %arg, i8** nocapture readnone %arg1) {
bb:
  %tmp = alloca %struct.wibble, align 32
  %tmp2 = bitcast %struct.wibble* %tmp to i8*
  %tmp3 = getelementptr inbounds %struct.wibble, %struct.wibble* %tmp, i64 0, i32 0, i32 0, i32 0
  store <4 x double> <double 0.000000e+00, double 1.000000e+00, double 2.000000e+00, double 3.000000e+00>, <4 x double>* %tmp3, align 32, !tbaa !0
  %tmp4 = getelementptr inbounds %struct.wibble, %struct.wibble* %tmp, i64 0, i32 0, i32 0, i32 1
  store <4 x double> <double 4.000000e+00, double 5.000000e+00, double 6.000000e+00, double 7.000000e+00>, <4 x double>* %tmp4, align 32, !tbaa !4
  %tmp5 = getelementptr inbounds %struct.wibble, %struct.wibble* %tmp, i64 0, i32 0, i32 0, i32 0, i64 0
  %tmp6 = getelementptr inbounds %struct.wibble, %struct.wibble* %tmp, i64 0, i32 0, i32 0, i32 0, i64 8
  br label %bb7

bb7:                                              ; preds = %bb
  %tmp10 = getelementptr inbounds double, double* %tmp5, i64 1
  %tmp13 = getelementptr inbounds double, double* %tmp10, i64 1
  %tmp16 = getelementptr inbounds double, double* %tmp13, i64 1
  %tmp19 = getelementptr inbounds double, double* %tmp16, i64 1
  %tmp20 = load double, double* %tmp19, align 8, !tbaa !5
  %tmp22 = getelementptr inbounds double, double* %tmp19, i64 1
  %tmp23 = load double, double* %tmp22, align 8, !tbaa !5
  %tmp24 = fadd double %tmp20, %tmp23
  ret double %tmp24
}

declare i32 @hoge(i8* nocapture readonly, ...)

declare i32 @quux.2(i8* nocapture readonly)

!0 = !{!1, !2, i64 0}
!1 = !{!"_ZTSN1AUt_Ut_E", !2, i64 0, !2, i64 32}
!2 = !{!"omnipotent char", !3, i64 0}
!3 = !{!"Simple C++ TBAA"}
!4 = !{!1, !2, i64 32}
!5 = !{!6, !6, i64 0}
!6 = !{!"double", !2, i64 0}


The ret will become fadd %tmp20, undef
Comment 8 Daniel Berlin 2017-03-01 13:23:32 PST
Okay, here is what happens:

In our AA pipeline, basicaa is currently above tbaa (i thought we had it the other way around, so maybe this is a bug). 

For the first load in my example, BasicAA returns "MustAlias" and so we return that. We never ask TBAA.
If we did, it would say "NoAlias".

For the second load, BasicAA returns "MayAlias", so we continue on.
We ask TBAA, it says "NoAlias"
We return that.


The reason TBAA returns NoAlias is because this tree walk is simply wrong.

It is attempting to find NCA(tbaa !6, tbaa !1)

The clear answer is TBAA !2.

However, it gets "no common ancestor".

It does the following:

Walk A to root, see if we hit B
Walk B to root, see if we hit A.

If not, assume no NCA.

Here is a counterexample:

   root
    |
common ancestor
  /      \
 A        B


Here is a trivially correct way to do this:

While walking A to root, build the set of seen nodes
while walking B to root, if the node appears in the set from A, that is the common ancestor.

This is O(N)
The constant time way to do this is the same thing we do to the dominator tree:

DFS number the TBAA tree, use the dfs numbers to answer containment.

I will implement #1 as a correctness fix, then #2 if we need it.
Comment 9 Daniel Berlin 2017-03-01 13:27:49 PST
(In reply to Daniel Berlin from comment #8)
> Okay, here is what happens:
> 
> In our AA pipeline, basicaa is currently above tbaa (i thought we had it the
> other way around, so maybe this is a bug). 
> 
> For the first load in my example, BasicAA returns "MustAlias" and so we
> return that. We never ask TBAA.
> If we did, it would say "NoAlias".
> 
> For the second load, BasicAA returns "MayAlias", so we continue on.
> We ask TBAA, it says "NoAlias"
> We return that.
> 
> 
> The reason TBAA returns NoAlias is because this tree walk is simply wrong.
> 
> It is attempting to find NCA(tbaa !6, tbaa !1)
> 
> The clear answer is TBAA !2.
> 
> However, it gets "no common ancestor".
> 
> It does the following:
> 
> Walk A to root, see if we hit B
> Walk B to root, see if we hit A.
> 
> If not, assume no NCA.
> 
> Here is a counterexample:
> 
>    root
>     |
> common ancestor
>   /      \
>  A        B
> 
> 
> Here is a trivially correct way to do this:
> 
> While walking A to root, build the set of seen nodes
> while walking B to root, if the node appears in the set from A, that is the
> common ancestor.
> 
> This is O(N)
> The constant time way to do this is the same thing we do to the dominator
> tree:
> 
> DFS number the TBAA tree, use the dfs numbers to answer containment.
> 

Ugh, this won't work because we have offset parents.

What will work, however, is the union-find approach detailed in https://en.wikipedia.org/wiki/Lowest_common_ancestor

> I will implement #1 as a correctness fix, then #2 if we need it.
Comment 10 Daniel Berlin 2017-03-01 14:22:02 PST
So, as sanjoy points out, the way this tree is structured, NCA won't work either.
In any case, this tree is clearly wrong.


The trees are inverted from the representation i'm used to, which is one where the types are children of the struct nodes, not parents.

Even so, this tree says that double and the vector types don't alias, because they do not directly end up in the ancestor tree of each other.

IE double either needs to be a parent of the struct, and of every other node that has a double in it, or it needs to be a child of every node that cont
if we kept the trees in the same order as gcc, you would just make them both children of a union node, use the union tbaa node and declare victory.

IE
  union
  /    \ 
double  struct
 
if you invert that tree into our form:

 double struct
  |     /
  |    /
  |   /
  |  /
  | /
  union

Then both have to be parents of the union node, and the tbaa access specificed to the be the union node, and we don't allow that.

So i can't see a way to structure a tree that works here without allowing multiple parents (or multiple children).
Comment 11 Daniel Berlin 2017-03-01 15:31:20 PST
See thread [llvm-dev] RFC: Representing unions in TBAA
for more details
Comment 12 Sanjay Patel 2017-04-10 08:19:18 PDT
A patch to remove TBAA info from union members is posted here:
https://reviews.llvm.org/D31885
Comment 13 Wenzel Jakob 2017-04-10 08:55:09 PDT
I just applied D31885 to Clang trunk to re-test the snippets here. Unfortunately, they still fail.
Comment 14 Krzysztof Parzyszek 2017-04-10 10:43:00 PDT
(In reply to Wenzel Jakob from comment #13)
> I just applied D31885 to Clang trunk to re-test the snippets here.
> Unfortunately, they still fail.

I updated the patch and the testcase now works on my machine.
Comment 15 Wenzel Jakob 2017-07-27 05:58:42 PDT
I'll close the issue as D33328 was merged, which indeed fixes the testcase that prompted this issue.

Thanks,
Wenzel