LLVM Bugzilla is read-only and represents the historical archive of all LLVM issues filled before November 26, 2021. Use github to submit LLVM bugs

Bug 3100 - clang is unable to devirtualize a function call
Summary: clang is unable to devirtualize a function call
Status: NEW
Alias: None
Product: clang
Classification: Unclassified
Component: LLVM Codegen (show other bugs)
Version: trunk
Hardware: PC Linux
: P normal
Assignee: Unassigned Clang Bugs
URL:
Keywords:
: 6054 (view as bug list)
Depends on: 810
Blocks:
  Show dependency tree
 
Reported: 2008-11-19 08:56 PST by Rafael Ávila de Espíndola
Modified: 2014-08-29 02:43 PDT (History)
18 users (show)

See Also:
Fixed By Commit(s):


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Rafael Ávila de Espíndola 2008-11-19 08:56:01 PST
Testcase from the g++ testsuite
-----------------------------------------
struct S { S(); virtual void xyzzy(); };
inline void foo(S *s) { s->xyzzy(); }
void bar() { S s; foo(&s); }
----------------------------------------

With gcc we got

call    _ZN1S5xyzzyEv

with llvm-g++ we got

call    *%rax

there rax contains the pointer fetched from the vtable. The problem is that llvm-g++ knows what the vtable looks like, but doesn't communicate that to llvm.

A simpler testcase also fails with llvm:

----------------------------------------
struct S { S() {}; virtual void xyzzy(); };
inline void foo(S *s) { s->xyzzy(); }
void bar() { S s; foo(&s); }
----------------------------------------

Now we have the constructor body and we do a bit better:
----------------------------
call    *_ZTV1S+16
---------------------------

but llvm-g++ only says

@_ZTV1S = external constant [3 x i32 (...)*]

so there is no way for llvm to const propagate the fetch.
Comment 1 Chris Lattner 2008-11-20 19:57:45 PST
The problem with the first testcase is that LLVM doesn't see the ctor, so it doesn't know what the vtable pointer is.  It also doesn't see the vtable.

The problem with the second case is that there is an out-of-line vtable in another translation unit, so llvm optimizers don't see the vtable.  The right way to fix this one is to introduce a new linkage type that basically corresponds to GCC's "extern inline" linkage: we know that there is a strong symbol backing the global, but we can use the definition in the translation unit to inline or analyze.
Comment 2 Rafael Ávila de Espíndola 2008-11-21 02:38:58 PST
For the "extern inline" linkage we just have to make sure we have more reasonable semantics then C :-)

We should be able to use the information in other modules if we link and the information must be consistent with the strong definition.

Is known_extern a good name for the new linkage? Should avoid confusion with the C one.

For the first one, can we extent the type system so that llvm-g++ can inform llvm that it knows the vtable without knowing the constructor?

I haven't put a lot of thought to it, but I should be possible by having llvm-g++ create two types for every class C with a vtable. One would correspond to exactly_C and the other would be C_or_subclass. The exactly_C type has a know pointer to a vtable. The C_or_subclass does not. When compiling a constructor, llvm-g++ makes in return exactly_C. Every function that takes a pointer to class C, is generated as taking a pointer to C_or_subclass, and casts are added in the call.

Nick has some ideas on how to propagate information so that more devirtualization can be done, but I think we should be able to handle these two cases in file at a time mode.

I say we fix the second testcase first :-)
Comment 3 Chris Lattner 2008-11-21 11:10:26 PST
I think that known_extern is a good linkage name, what do you think Duncan?  Implementing this will also let llvm optimize "extern inline" better, and will actually help clang out as well (which currently doesn't inline extern inline at all, because it has no "tree inliner").

Lets split this bug into two PR's.  This bug will track the second issue, and we can discuss the first issue in a different PR.
Comment 4 Rafael Ávila de Espíndola 2008-11-24 02:27:11 PST
The proposed know_extern would not be a good way of implementing "extern inline" functions in clang. This is because of the strange requirement of the C standard.

http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1124.pdf says:

An inline definition provides an alternative to an external definition, which a translator may use to implement any call to the function in the same translation unit.

Note that it is not legal to use a "extern inline" in another translation unit. It should not be written to disk. Maybe clang can implement "extern inline" by

1) Generating a normal inline body for it.
2) Running optimization which might or might not inline the function
3) Replacing the body with an extern declaration
4) Writing IL to disk

Looks strange, but matches the "extern inline" requirements.
Comment 5 Chris Lattner 2008-11-24 13:09:35 PST
"extern inline" is a GNU extension, C99 doesn't apply to it.
Comment 6 Rafael Ávila de Espíndola 2008-11-24 13:58:03 PST
no. The link on the previous comment is part of the standard.
Comment 7 Chris Lattner 2008-11-24 15:20:30 PST
Ok, well even if C can't use it, it would be great for C++ :)
Comment 8 Chris Lattner 2009-03-09 15:03:28 PDT
See also Bug 3517 for extern inline issues.
Comment 9 Chris Lattner 2009-03-13 00:10:11 PDT
Here's another case (same extern inline issue), from http://www.agner.org/optimize/optimizing_cpp.pdf

// Example 7.19. Devirtualization 
class C0 { 
   public: 
   virtual void f(); 
}; 
 
class C1 : public C0 { 
   public: 
   virtual void f(); 
}; 
 
void g() { 
   C1 obj1; 
   C0 * p = & obj1; 
   p->f();               // Virtual call to C1::f 
} 

Comment 10 Chris Lattner 2009-04-13 00:45:51 PDT
I added a new linkage type for "extern inline" and "c99 inline" named "available_externally" here:
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20090413/076242.html

Now we just need llvm-gcc and clang to generate it.
Comment 11 Chris Lattner 2009-06-01 22:51:20 PDT
llvm-gcc is now generating available_externally for c99 inline functions (thanks duncan!), but not yet for C++ metadata like vtables etc.
Comment 12 Chris Lattner 2009-11-07 02:26:20 PST
Does this work with clang?
Comment 13 Anders Carlsson 2009-11-07 16:07:28 PST
Not at the moment. I have a patch that implements it (with some other vtable cleanup) but I haven't committed that yet...
Comment 14 Nick Lewycky 2009-11-29 19:09:04 PST
With r90099, clang will now devirtualize both Chris' example in comment 9 and Rafael's second example. We don't get Rafael's first example because clang doesn't emit any vtable in that case. Anders, is that what you're working on?
Comment 15 Nick Lewycky 2010-01-16 11:29:24 PST
*** Bug 6054 has been marked as a duplicate of this bug. ***
Comment 16 Chris Lattner 2010-04-20 18:27:35 PDT
What's the status of this?
Comment 17 Nick Lewycky 2010-04-20 19:45:19 PDT
Same state as before. Rafael's first example, to restate:

  struct S { S(); virtual void xyzzy(); };
  inline void foo(S *s) { s->xyzzy(); }
  void bar() { S s; foo(&s); }

emits "callq *(%rax)" when it could very well emit a call to _ZN1S5xyzzyEv. Clang isn't emitting the vtable at all in this TU, not even as available_externally.
Comment 18 Chris Lattner 2010-05-01 12:46:09 PDT
FWIW, we still missing this.  There remain two issues.  In rafael's first example, we don't know what the vtable is because we don't see the ctor:

struct S { S(); virtual void xyzzy(); };
inline void foo(S *s) { s->xyzzy(); }
void bar() { S s; foo(&s); }

In his second example, it does, but clang isn't emitting the vtable as available_externally:

struct S { S() {}; virtual void xyzzy(); };
inline void foo(S *s) { s->xyzzy(); }
void bar() { S s; foo(&s); }

If it did, the optimizer would catch it.  Agner Fog's example from Comment 9 is the same issue.  We seem to have regressed, because Comment 14 indicated that we started getting these cases.
Comment 19 Chris Lattner 2010-09-05 19:26:41 PDT
Reverified.  I'm not sure what to do about the first "remaining testcase" from Comment 18, but the second one just needs the vtable to be emitted as available_externally.  We have everything in place to do this now I think.
Comment 20 Anders Carlsson 2010-10-30 10:09:23 PDT
(In reply to comment #19)
> Reverified.  I'm not sure what to do about the first "remaining testcase" from
> Comment 18, but the second one just needs the vtable to be emitted as
> available_externally.  We have everything in place to do this now I think.

Someone suggested to me a while ago that we could add "assertions" to the IR to hint the optimizer about the vtable layout. I tried doing this, something like:


define void @_Z1gv() nounwind {
entry:
  %s = alloca %struct.S, align 8
  call void @_ZN1SC1Ev(%struct.S* %s)
  %0 = bitcast %struct.S* %s to i8***
  %vtable = load i8*** %0
  %1 = icmp eq i8** %vtable, getelementptr inbounds ([3 x i8*]* @_ZTV1S, i64 0, i64 2)
  br i1 %1, label %vtable.eq, label %vtable.ne

vtable.ne:                                        ; preds = %entry
  unreachable

vtable.eq:                                        ; preds = %entry
  call void @_Z1fP1S(%struct.S* %s)
  ret void
}

But that didn't help. Is that the correct approach? Does this require changes to the optimizers?
Comment 21 Anders Carlsson 2011-01-30 14:47:40 PST
With the changes I landed in 124565, we now devirtualize the second example:

> struct S { S() {}; virtual void xyzzy(); };
> inline void foo(S *s) { s->xyzzy(); }
> void bar() { S s; foo(&s); }
> 

define void @_Z3barv() nounwind {
entry:
  %s = alloca %struct.S, align 8
  %0 = getelementptr inbounds %struct.S* %s, i64 0, i32 0
  store i32 (...)** bitcast (i8** getelementptr inbounds ([3 x i8*]* @_ZTV1S, i64 0, i64 2) to i32 (...)**), i32 (...)*** %0, align 8
  call void @_ZN1S5xyzzyEv(%struct.S* %s) nounwind
  ret void
}
Comment 22 David Blaikie 2011-10-30 11:27:57 PDT
For want of any better assignment, & because I'm rather interested in this bug, I'm assigning it to myself. 

To fix the remaining issue (a non-inline ctor that's obscuring the constant-ness of the vtable pointer) it'll probably take some kind of ability to provide assumptions/known facts to the back end. S
Comment 23 David Blaikie 2011-10-30 11:29:32 PDT
Sorry for the halfwritten update. Anyway, I've made this bug dependent on PR810 which is about, one way or another, ensuring LLVM can optimize based on conditions branching to unreachable blocks, using the condition as an assumed fact. I'll add more details to that bug to discuss why this isn't happening already.