Testcase from the g++ testsuite ----------------------------------------- struct S { S(); virtual void xyzzy(); }; inline void foo(S *s) { s->xyzzy(); } void bar() { S s; foo(&s); } ---------------------------------------- With gcc we got call _ZN1S5xyzzyEv with llvm-g++ we got call *%rax there rax contains the pointer fetched from the vtable. The problem is that llvm-g++ knows what the vtable looks like, but doesn't communicate that to llvm. A simpler testcase also fails with llvm: ---------------------------------------- struct S { S() {}; virtual void xyzzy(); }; inline void foo(S *s) { s->xyzzy(); } void bar() { S s; foo(&s); } ---------------------------------------- Now we have the constructor body and we do a bit better: ---------------------------- call *_ZTV1S+16 --------------------------- but llvm-g++ only says @_ZTV1S = external constant [3 x i32 (...)*] so there is no way for llvm to const propagate the fetch.
The problem with the first testcase is that LLVM doesn't see the ctor, so it doesn't know what the vtable pointer is. It also doesn't see the vtable. The problem with the second case is that there is an out-of-line vtable in another translation unit, so llvm optimizers don't see the vtable. The right way to fix this one is to introduce a new linkage type that basically corresponds to GCC's "extern inline" linkage: we know that there is a strong symbol backing the global, but we can use the definition in the translation unit to inline or analyze.
For the "extern inline" linkage we just have to make sure we have more reasonable semantics then C :-) We should be able to use the information in other modules if we link and the information must be consistent with the strong definition. Is known_extern a good name for the new linkage? Should avoid confusion with the C one. For the first one, can we extent the type system so that llvm-g++ can inform llvm that it knows the vtable without knowing the constructor? I haven't put a lot of thought to it, but I should be possible by having llvm-g++ create two types for every class C with a vtable. One would correspond to exactly_C and the other would be C_or_subclass. The exactly_C type has a know pointer to a vtable. The C_or_subclass does not. When compiling a constructor, llvm-g++ makes in return exactly_C. Every function that takes a pointer to class C, is generated as taking a pointer to C_or_subclass, and casts are added in the call. Nick has some ideas on how to propagate information so that more devirtualization can be done, but I think we should be able to handle these two cases in file at a time mode. I say we fix the second testcase first :-)
I think that known_extern is a good linkage name, what do you think Duncan? Implementing this will also let llvm optimize "extern inline" better, and will actually help clang out as well (which currently doesn't inline extern inline at all, because it has no "tree inliner"). Lets split this bug into two PR's. This bug will track the second issue, and we can discuss the first issue in a different PR.
The proposed know_extern would not be a good way of implementing "extern inline" functions in clang. This is because of the strange requirement of the C standard. http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1124.pdf says: An inline definition provides an alternative to an external definition, which a translator may use to implement any call to the function in the same translation unit. Note that it is not legal to use a "extern inline" in another translation unit. It should not be written to disk. Maybe clang can implement "extern inline" by 1) Generating a normal inline body for it. 2) Running optimization which might or might not inline the function 3) Replacing the body with an extern declaration 4) Writing IL to disk Looks strange, but matches the "extern inline" requirements.
"extern inline" is a GNU extension, C99 doesn't apply to it.
no. The link on the previous comment is part of the standard.
Ok, well even if C can't use it, it would be great for C++ :)
See also Bug 3517 for extern inline issues.
Here's another case (same extern inline issue), from http://www.agner.org/optimize/optimizing_cpp.pdf // Example 7.19. Devirtualization class C0 { public: virtual void f(); }; class C1 : public C0 { public: virtual void f(); }; void g() { C1 obj1; C0 * p = & obj1; p->f(); // Virtual call to C1::f }
I added a new linkage type for "extern inline" and "c99 inline" named "available_externally" here: http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20090413/076242.html Now we just need llvm-gcc and clang to generate it.
llvm-gcc is now generating available_externally for c99 inline functions (thanks duncan!), but not yet for C++ metadata like vtables etc.
Does this work with clang?
Not at the moment. I have a patch that implements it (with some other vtable cleanup) but I haven't committed that yet...
With r90099, clang will now devirtualize both Chris' example in comment 9 and Rafael's second example. We don't get Rafael's first example because clang doesn't emit any vtable in that case. Anders, is that what you're working on?
*** Bug 6054 has been marked as a duplicate of this bug. ***
What's the status of this?
Same state as before. Rafael's first example, to restate: struct S { S(); virtual void xyzzy(); }; inline void foo(S *s) { s->xyzzy(); } void bar() { S s; foo(&s); } emits "callq *(%rax)" when it could very well emit a call to _ZN1S5xyzzyEv. Clang isn't emitting the vtable at all in this TU, not even as available_externally.
FWIW, we still missing this. There remain two issues. In rafael's first example, we don't know what the vtable is because we don't see the ctor: struct S { S(); virtual void xyzzy(); }; inline void foo(S *s) { s->xyzzy(); } void bar() { S s; foo(&s); } In his second example, it does, but clang isn't emitting the vtable as available_externally: struct S { S() {}; virtual void xyzzy(); }; inline void foo(S *s) { s->xyzzy(); } void bar() { S s; foo(&s); } If it did, the optimizer would catch it. Agner Fog's example from Comment 9 is the same issue. We seem to have regressed, because Comment 14 indicated that we started getting these cases.
Reverified. I'm not sure what to do about the first "remaining testcase" from Comment 18, but the second one just needs the vtable to be emitted as available_externally. We have everything in place to do this now I think.
(In reply to comment #19) > Reverified. I'm not sure what to do about the first "remaining testcase" from > Comment 18, but the second one just needs the vtable to be emitted as > available_externally. We have everything in place to do this now I think. Someone suggested to me a while ago that we could add "assertions" to the IR to hint the optimizer about the vtable layout. I tried doing this, something like: define void @_Z1gv() nounwind { entry: %s = alloca %struct.S, align 8 call void @_ZN1SC1Ev(%struct.S* %s) %0 = bitcast %struct.S* %s to i8*** %vtable = load i8*** %0 %1 = icmp eq i8** %vtable, getelementptr inbounds ([3 x i8*]* @_ZTV1S, i64 0, i64 2) br i1 %1, label %vtable.eq, label %vtable.ne vtable.ne: ; preds = %entry unreachable vtable.eq: ; preds = %entry call void @_Z1fP1S(%struct.S* %s) ret void } But that didn't help. Is that the correct approach? Does this require changes to the optimizers?
With the changes I landed in 124565, we now devirtualize the second example: > struct S { S() {}; virtual void xyzzy(); }; > inline void foo(S *s) { s->xyzzy(); } > void bar() { S s; foo(&s); } > define void @_Z3barv() nounwind { entry: %s = alloca %struct.S, align 8 %0 = getelementptr inbounds %struct.S* %s, i64 0, i32 0 store i32 (...)** bitcast (i8** getelementptr inbounds ([3 x i8*]* @_ZTV1S, i64 0, i64 2) to i32 (...)**), i32 (...)*** %0, align 8 call void @_ZN1S5xyzzyEv(%struct.S* %s) nounwind ret void }
For want of any better assignment, & because I'm rather interested in this bug, I'm assigning it to myself. To fix the remaining issue (a non-inline ctor that's obscuring the constant-ness of the vtable pointer) it'll probably take some kind of ability to provide assumptions/known facts to the back end. S
Sorry for the halfwritten update. Anyway, I've made this bug dependent on PR810 which is about, one way or another, ensuring LLVM can optimize based on conditions branching to unreachable blocks, using the condition as an assumed fact. I'll add more details to that bug to discuss why this isn't happening already.