New user self-registration is disabled due to spam. For an account please email bugs-admin@lists.llvm.org with your e-mail address and full name.

Bug 2547 - Implement support for Type Based Alias Analysis
Summary: Implement support for Type Based Alias Analysis
Status: RESOLVED FIXED
Alias: None
Product: libraries
Classification: Unclassified
Component: Core LLVM classes (show other bugs)
Version: 1.0
Hardware: PC All
: P normal
Assignee: Unassigned LLVM Bugs
URL:
Keywords: code-quality
: 5968 6010 (view as bug list)
Depends on:
Blocks:
 
Reported: 2008-07-13 23:15 PDT by Chris Lattner
Modified: 2011-04-08 13:17 PDT (History)
12 users (show)

See Also:
Fixed By Commit(s):


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Chris Lattner 2008-07-13 23:15:02 PDT
TBAA is a feature that llvm has never supported, and it is capable of providing significant speedups.  For example, here is the performance of SingleSource/Benchmarks/Himeno with various compilers:

gcc 4.0 -O3: 4.69
gcc 4.0 -O3 -fno-strict-aliasing: 5.93
gcc 4.2 -O3: 4.54
gcc 4.2 -O3 -fno-strict-aliasing: 6.14
llvm-gcc -O3: 5.43 [which is no strict aliasing]
llvm-gcc -O4: 3.46 [which is no strict aliasing]

As you can see, at -O3, we lose to GCC by ~20%.  Empirically, looking at the code, this is because loads aren't getting hoisted out of loops, which makes strides loop variant, which prevents LSR from eliminating multiplies.  -O4/LTO makes up for the lack of TBAA in this case with extra inlining etc.

An interesting detail is that TBAA is disabled by default in GCC on the mac, but enabled by default everywhere else.  This is probably why our performance is generally seen as worse (compared to GCC) on linux than it is on the mac.

-Chris
Comment 1 Duncan Sands 2008-07-14 02:45:41 PDT
Are you imagining having front-ends propagate extra type info down to
LLVM, or doing the analysis at the level of LLVM types?  Also, is there
really no other way of getting these benefits?  My feeling is that type
based alias analysis is a surprise to most users, causes subtle bugs,
and is most liked by compiler writers who are trying to squeeze ultimate
performance out :)  Then there are all the funky front-end rules, like
the special case of unions in C.
Comment 2 Dale Johannesen 2008-07-14 12:34:47 PDT
gcc's implementation of TBAA is maximally stupid, treating references to the same area of memory in the same function as noninterfering if they have different types, even though it ought to be able to tell.  Thus it has broken the common and useful (but nonstandard) construct
  *((type2*)&type1_object) = type2_value
for example.    I agree an equally stupid implementation would be a bad thing, but there is no need to do that.  The Rationale to the C90 standard makes it clear that the intent is that the compiler doesn't need to be overly paranoid when it cannot tell at compile time, as when a pointer is passed in and the original definition is not visible.  They give the example:
  int a;
  void f(double *b) {
    a = 1;
    *b = 2.0;
    g(a);
  }
It is safe to optimize the call to g(1) according to the standard and I doubt this sort of thing would cause a problem in practice.  If TBAA is implemented in a way that is is a fallback used when aliasing can't be determined from information otherwise available to the compiler, I don't think there is great harm to it.

That said, I also doubt this is going to be a major effect.  The Himeno numbers are compelling but that's only one benchmark.  I can make any compiler look bad if I get to pick the benchmark.

Comment 3 Chris Lattner 2008-07-14 12:59:09 PDT
We are going to need another way to propagate source level type info down into llvm, e.g. similar to the alias set tree used by the GCC RTL backend for TBAA.  I agree with Dale that implementing the maximally stupid approach is a bad idea :), we should follow ICC's model.  Basically, it only consults TBAA if other AA implementations return "unknown" (vs GCC, which consults TBAA first).  This means that if basicaa returns "must alias" TBAA is never consulted.

TBAA is a big deal in many FP benchmarks, not just Himeno.  Anyone on linux could do an llvm-test run with and without -fstrict-aliasing passed to GCC and compare the relative numbers to see for sure.

That said, I think work on this should wait until after the fast debug info support (which doesn't serialize and then deserialize MachineModuleInfo into llvm bc file unless -emit-llvm is used) happens, they will use similar mechanisms for encoding extra information in the IR.

-Chris
Comment 4 Duncan Sands 2008-07-17 02:17:27 PDT
>  int a;
>  void f(double *b) {
>    a = 1;
>    *b = 2.0;
>    g(a);
>  }

For this case it might be enough to have a parameter
attribute typealias which gets applied to b:

define void @f(double* typealias b) {...

The attribute would mean that b doesn't alias
anything of a different type.  That way front-ends
can not set it if strict-aliasing is turned off
for this pointer (eg: because in the original
b wasn't a double* but a union { int, double }*).
On the other hand I suppose there is also the
problem that the global "a" might itself be a
union in which case I suppose a double* might
actually point to it even if "a" turned up as
i32 in LLVM.

PS: I'm finding it hard to imagine how TBA could
make a big difference for real-world programs...
Comment 5 Chris Lattner 2008-07-17 12:49:55 PDT
TBAA is very useful in many cases, not the least of which is when you build stuff at -O2.  Not everyone uses LTO obviously.
Comment 6 Chris Lattner 2008-07-17 12:50:26 PDT
However, I'm *not* saying that it is personally a very high priority to me, I'm just saying we should do it someday. :)
Comment 7 Chris Lattner 2008-09-21 16:14:36 PDT
Related proposal:
http://nondot.org/sabre/LLVMNotes/EmbeddedMetadata.txt
Comment 8 Chris Lattner 2009-10-27 23:31:13 PDT
One random idea that occurred to me while working on something else: once we have TBAA it would be really cool to keep track of TBAA tags in the mod/ref sets for a function (this could be collected by -globalsmodref-aa).  This would allow us to return "none" or "readonly" by knowing that a function never writes to the TBAA tag being queried, without doing any pointer analysis or "deep" queries.
Comment 9 Chris Lattner 2010-01-07 14:57:21 PST
*** Bug 5968 has been marked as a duplicate of this bug. ***
Comment 10 Anton Korobeynikov 2010-01-12 08:58:06 PST
*** Bug 6010 has been marked as a duplicate of this bug. ***
Comment 11 Chris Lattner 2010-07-27 09:53:10 PDT
Dan is starting to poke at this.
Comment 12 Chris Lattner 2011-04-08 13:17:20 PDT
Dan implemented this.