Created attachment 19883 [details] slow bytecode We've recently upgraded to LLVM 6.0 (or at least what's close to the release_60 branch of LLVM) in Rust, but unfortunately we're seeing a regression in compile time now! First reported to us [1] it looks like the compile time to emit debug information has increased dramatically between LLVM 4.0 and LLVM 6.0. Doing some testing locally I've managed to extract a test case for LLVM I think. The attached IR completes codegen in ~2s on LLVM 4.0 but takes minutes on LLVM 6.0 I ran a bisection based on the results of the command: llc test.bc -filetype=obj -O2 -relocation-model=pic and this bisection pointed at https://reviews.llvm.org/D38229. I've cc'd those involved with that patch here, would y'all mind helping take a look into this regression? Also this may also just be rustc not producing debug information that LLVM is expecting, so if something's off about that please just let us know! [1]: https://github.com/rust-lang/rust/issues/48226
I'm not sure we'll actually end up blocking 6.0 on this, but let's keep an eye on it.
Do you have the original Rust code? I have tried to understands from the debug information what the Rust code looks like. A large amount of the debug values in the attached reproducer seems to be related to the handling of the array named 'x' in the original code. It seems to be an array of a struct (&str, usize) with 77 entries originating from line 1 of the file x11-dl-2.17.2/src/lib.rs
Created attachment 19925 [details] another slow file
Sure yeah! The original source for this is this crate * https://crates.io/crates/x11-dl/2.17.2 with the source code hosted roughly at * https://github.com/Daggerbot/x11-rs/tree/d66b29f2c898aa9d26c6bd13fa067b8656a8c6e5/x11-dl/src The Rust compiler, however, automatically splits up crates into object files (and also deals with monomorphization/inlining generics/etc), and I think the slowdown is actually a pattern not directly present in this crate but rather the standard library. I've uploaded another bytecode file that the Rust compiler is producing, and this is the unit of compilation, at least in rustc, which seems to be causing the major slowdown. (I basically told rustc to make as granular codegen units as possible and this is the one that took the longest). Most of the symbols here are related to std::sync::Once, defined here -- * https://github.com/rust-lang/rust/blob/27a046e9338fb0455c33b13e8fe28da78212dedc/src/libstd/sync/once.rs#L220 Unfortunately I wasn't able to create a standalone test case using just this which exhibited the same slowdown :( Does that help though? Or would it be more useful to have a minimal example from Rust source code as well?
The Rust source you pointed me at don't contain any local variable named 'x' in line 1 of the file x11-dl-2.17.2/src/lib.rs I guess that the debug info in test.bc point out the wrong location in the Rust source code. To me it feels like something is wrong with the debug values generated for the code when using -relocation-model=pic. At least I don't understands what it is doing. I need more time to figure out what's going on here. I can't use the second attachment ("another slow file") when trying to disassemble the bytecode (with llvm-dis) I get a lot of errors and it end with a "LLVM ERROR: Broken module found, compilation aborted!"
This is my view of the problem: The patch D38229 (a change in the pass Live Debug Variables) will on the input given in test.bc duplicate the debug values over and over again. The pass itself will not increase in compile time but a later pass Dwarf Debug will take a lot extra time when it try to handle all those extra debug values. However, the reason for Live Debug Variables to duplicate the debug values over and over again is because the debug information given as input to Live Debug Variables are broken. This have been broken even before the introduction of the patch D38229 (probably even in LLVM 4.0) but it have not shown in increased compile time (but only in broken debug information that lead to wrong value printed in a debugger (gdb) when debugging optimized code). The origin of the fault seems to be that the "X86 DAG->DAG Instruction Selection" pass don't always know where to insert debug values and then seems to insert them in the beginning of the basic block. Then we have a use of a virtual register that is not defed yet. That is bogus and have no meaning. I guess that the verifiers don't check liveness for debug values. Maybe it is possible to extend verifier (for debug info) to check for this. When the Live Debug Variable pass remove the debug values before register allocation it interpret this as the debug value is live from the basic block start (which is not true). I suggest that the Live Debug Variable should detect debug values with uses of virtual registers that are not defed and simply throw away those debug values (at least until a real fix in the DAG->DAG Instruction Selection pass in available). Then someone need to look at the DAG->DAG Instruction Selection pass and figure out why it insert DBG_VALUES at the start of the basic block.
Created attachment 19938 [details] llc bugpoint-reduced-simplified.ll -O2 -relocation-model=pic -o bugpoint-reduced-simplified.s -print-after-all -debug I have made a small reproducer from test.bc by running bugpoint. This reproducer will not show an increase in compile time as it is a minimal example only to view the broken behaviour in llvm. llc bugpoint-reduced-simplified.ll -O2 -relocation-model=pic -o bugpoint-reduced-simplified.s Add the option -print-after-all to view the output in between passes and -debug to get debug printouts: After the pass X86 DAG->DAG Instruction Selection bb.4 (%ir-block.4): ; predecessors: %bb.2 DBG_VALUE debug-use %4, debug-use $noreg, !"x", !DIExpression(DW_OP_LLVM_fragment, 10944, 64), debug-location !864; line no:1 %4:gr64 = LEA64r $rip, 1, $noreg, @str.GE, $noreg, debug-location !976 %5:gr64 = IMPLICIT_DEF MOV64mr killed %5, 1, $noreg, 0, $noreg, killed %4, debug-location !976; mem:ST8[undef] RET 0, debug-location !977 The DBG_VALUE is inserted first in the basic block and have a use of the virtual register %4 that is not yet defined. This is wrong. The DBG_VALUE should have been inserted directly after the line: %4:gr64 = LEA64r $rip, 1, $noreg, @str.GE, $noreg, debug-location !976" In the test.bc (the original reproducer) there is a lot more code in between the DBG_VALUE and the real def of %4. In the debug printout from Live Debug Variables you get: ********** COMPUTING LIVE DEBUG VARIABLES: _ZN3std4sync4once4Once9call_once28_$u7b$$u7b$closure$u7d$$u7d$17hab43846908a864c6E ********** ********** DEBUG VARIABLES ********** !"x,1 @[/home/alex/.cargo/registry/src/github.com-1ecc6299db9ec823/lazy_static-1.0.0/src/lazy.rs:24 @[ /checkout/src/libstd/sync/once.rs:227 ]]" [224B;272B):0 Loc0=debug-use %4 ... 224B bb.4 (%ir-block.4): ; predecessors: %bb.2 240B %4:gr64 = LEA64r $rip, 1, $noreg, @str.GE, $noreg, debug-location !976 256B MOV64mr undef %5:gr64, 1, $noreg, 0, $noreg, %4, debug-location !976; mem:ST8[undef] 272B RET 0, debug-location !977 The variable "x" that reside in the virtual register %4 is according to printout above valid from slot index 224B to 272B. Which mean the whole basic block, instead of from 240B the def of %4 to 272B. In this small example it really doesn't matter as no extra code exist in between 224B and 240B but in the real non reduced code (test.gb) a lot of extra code that happen to be allocated to the same physical register as %4 is now thought to affect the variable x. This is not correct.
Thanks for your analysis, Karl-Johan! We spent a lot of time optimizing LiveDebugValues, so a performance degradation in this pass usually points to some earlier pass inserting too many debug intrinsics, as you also found out.
Karl-Johan er sorry if the last bytecode I posted didn't work! Not entirely sure what was up with that... It sounds, though, like you've narrowed it down elsewhere? Also, when you say that the debuginfo is erroneous, is that the output of rustc which is erroneous or LLVM's internal representation?
(In reply to Alex Crichton from comment #9) > Karl-Johan er sorry if the last bytecode I posted didn't work! Not entirely > sure what was up with that... It sounds, though, like you've narrowed it > down elsewhere? > > Also, when you say that the debuginfo is erroneous, is that the output of > rustc which is erroneous or LLVM's internal representation? I think this comment from Karl-Johan answers that question: "The origin of the fault seems to be that the "X86 DAG->DAG Instruction Selection" pass don't always know where to insert debug values and then seems to insert them in the beginning of the basic block." Without much knowledge about rustc our conclusion was that it is the instruction selection that transforms IR like this: %a = ... dbg.value( "x[0] is in %a" ) %b = ... dbg.value( "x[1] is in %b" ) %c = ... dbg.value( "x[2] is in %c" ) into DBG_VALUE( "x[0] is in %vrega" ) DBG_VALUE( "x[1] is in %vregb" ) DBG_VALUE( "x[2] is in %vregc" ) %vrega = instr1 %vregb = instr2 %vregc = instr3 When the DBG_VALUE instructions are referring to a virtual register that isn't defined yet the backend it will be mapped to incorrect slot index ranges during register allocation (and live debug variable analysis). If for example the register allocator is reusing the same physical register (ecx) for all those virtual register we end up with something like this after register allocation etc: DBG_VALUE( "x[0] is in %rcx" ) DBG_VALUE( "x[1] is in %rcx" ) DBG_VALUE( "x[2] is in %rcx" ) %rcx = instr1 DBG_VALUE( "x[0] is in %rcx" ) DBG_VALUE( "x[1] is in %rcx" ) DBG_VALUE( "x[2] is in %rcx" ) %rcx = instr2 DBG_VALUE( "x[0] is in %rcx" ) DBG_VALUE( "x[1] is in %rcx" ) DBG_VALUE( "x[2] is in %rcx" ) %rcx = instr3 DBG_VALUE( "x[0] is in %rcx" ) DBG_VALUE( "x[1] is in %rcx" ) DBG_VALUE( "x[2] is in %rcx" ) If stepping in the debugger it will look like x[0]==x[1]==x[2]==%rcx, but that is not really true. The correct code after ISel for my example above should be %vrega = instr1 DBG_VALUE( "x[0] is in %vrega" ) %vregb = instr2 DBG_VALUE( "x[1] is in %vregb" ) %vregc = instr3 DBG_VALUE( "x[2] is in %vregc" ) and that would result in code like this after register allocation etc: %rcx = instr1 DBG_VALUE( "x[0] is in %rcx" ) %rcx = instr2 DBG_VALUE( "x[1] is in %rcx" ) %rcx = instr3 DBG_VALUE( "x[2] is in %rcx" ) As seen above the amount of DBG_VALUE instructions increase after register allocation when the DBG_VALUE instructions are scheduled before the live range of the virtual register that it is associated with. In your example with "slow bytecode" we ended up with lots of DBG_VALUE instructions which slowed down the last passes in the compiler. Most of those DBG_VALUE instructions were faulty, in the sense that they would give incorrect information to the debugger. One solution is to fix the fault in ISel. Another solution could be too try to discard this kind of faulty DBG_VALUE instructions (with use before def of a virtual register) before register allocation. The latter solution could avoid faulty values when debugging (instead you'll get <optimized out>), and it could help out to avoid the slow down when compiling. However, fixing the ISel would be needed to also get richer (and still correct) debug information. I actually think that we should try to implement both solutions. Both correcting ISel and also making the later passes more robust to bad input. PS. I hope the kind of pseudo-code I used above is possible to understand. Regards, Björn (co-worker to Karl-Johan who will be on vacation for a few days)
Ok sounds good to me, thanks for the info! If y'all need anything else from our end please just let me know!
I don't think we can block the 6.0 release on this.
Here is a proposal patch to let LiveDebugVariables discard DBG_VALUE instructions that otherwise could result in faulty debug information being generated: https://reviews.llvm.org/D43956
Ok I've now also confirmed that the https://reviews.llvm.org/D43956 fixes the issue on our end as well, thanks so much Bjorn!