Created attachment 11502 [details]
preprocessed source for src/GDLParser.cpp generated by Apple clang++ 5.0

The GDLParser.cpp and GDLTreeParser.cpp files from gdl-0.9.4/src compile extremely slowly at -O2 under clang++ from both Xcode 5.0.1 and llvm 3.4svn. Using the attached GDLParser.ii.bz2 with llvm 3.4svn's clang++ on x86_64-apple-darwin12, I see…

% time /sw/opt/llvm-3.4/bin/clang++ -O2 -c GDLParser.ii
101.615u 0.577s 1:42.29 99.8% 0+0k 2+12io 627pf+0w
% time /sw/opt/llvm-3.4/bin/clang++ -O1 -c GDLParser.ii
3.858u 0.095s 0:03.96 99.4% 0+0k 0+15io 7pf+0w

These tests were done on a 2009 Mac Pro with dual quad-core Xeon processors and 12 GB of memory.
Created attachment 11503 [details] preprocessed source for src/GDLTreeParser.cpp generated by Apple clang++ 5.0
We looked at this earlier and it seemed like most of the time was in SROA. CC'ing Chandler.
The attached GDLTreeParser.ii.bz2 test case compiles so slowly at -O2 with llvm 3.4svn that it is effectively stalled.

% time /sw/opt/llvm-3.4/bin/clang++ -O2 -c GDLTreeParser.ii
1311.647u 2.018s 21:53.75 99.9% 0+0k 0+11io 15pf+0w
% time /sw/opt/llvm-3.4/bin/clang++ -O1 -c GDLTreeParser.ii
4.492u 0.112s 0:04.61 99.7% 0+0k 0+9io 0pf+0w
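For anyone comparing these numbers across compiler versions, here is a minimal Python sketch that turns two csh-style `time` lines into a slowdown ratio. It assumes the `<user>u <system>s ...` layout seen in this report; the helper names `cpu_seconds` and `slowdown` are made up for illustration and are not part of any LLVM tooling.

```python
import re

# Matches the leading "<user>u <system>s" fields of csh/tcsh `time` output,
# e.g. "101.615u 0.577s 1:42.29 99.8% 0+0k 2+12io 627pf+0w".
# Assumption: the line follows the default csh format shown in this report.
_TIME_RE = re.compile(r"^\s*(\d+(?:\.\d+)?)u\s+(\d+(?:\.\d+)?)s\b")


def cpu_seconds(time_line):
    """Return user + system CPU seconds parsed from one `time` output line."""
    match = _TIME_RE.match(time_line)
    if match is None:
        raise ValueError("not a csh-style time line: %r" % time_line)
    return float(match.group(1)) + float(match.group(2))


def slowdown(slow_line, fast_line):
    """Return how many times more CPU time slow_line used than fast_line."""
    return cpu_seconds(slow_line) / cpu_seconds(fast_line)


if __name__ == "__main__":
    # Timings copied from the GDLTreeParser.ii runs at -O2 and -O1.
    o2 = "1311.647u 2.018s 21:53.75 99.9% 0+0k 0+11io 15pf+0w"
    o1 = "4.492u 0.112s 0:04.61 99.7% 0+0k 0+9io 0pf+0w"
    print("-O2 / -O1 CPU time ratio: %.1f" % slowdown(o2, o1))
```

The ratio uses user plus system CPU time rather than wall-clock time, since all of these runs were above 99% CPU utilization.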
Also filed as radr://15289369
I profiled this today. Looks like the slowdown is in SROA's invocation of SSAUpdater. r149654 fixed a miscompile, but it made FindAvailableVals() quadratic in the number of basic blocks. There's probably another way to fix the miscompile that avoids this slowdown.
This is related to bug 16756. It hits the same bottleneck (as well as an unrelated one).
Still present in 3.6-rc4...

% time /sw/opt/llvm-3.6.0/bin/clang++ -O2 -c GDLParser.ii
118.526u 0.561s 1:59.19 99.9% 0+0k 0+12io 748pf+0w
% time /sw/opt/llvm-3.6.0/bin/clang++ -O1 -c GDLParser.ii
4.858u 0.087s 0:04.95 99.5% 0+0k 0+10io 13pf+0w
r245820 fixes the SROA issue and improves -O3 compile-time from 88s to 23s on my machine. The top 5 is now:

Running Time      Self (ms)  Symbol Name
4393.0ms  17.9%   24.0       (anonymous namespace)::JumpThreading::runOnFunction(llvm::Function&)
2768.0ms  11.3%   45.0       (anonymous namespace)::PHIElimination::runOnMachineFunction(llvm::MachineFunction&)
2635.0ms  10.7%   0.0        (anonymous namespace)::X86DAGToDAGISel::runOnMachineFunction(llvm::MachineFunction&)
1356.0ms   5.5%   9.0        (anonymous namespace)::SLPVectorizer::runOnFunction(llvm::Function&)
1341.0ms   5.4%   7.0        (anonymous namespace)::GVN::runOnFunction(llvm::Function&)
PR16756 is tracking the JumpThreading regression, so let's focus here on PHIElimination (11.3%).
(In reply to comment #8)
> r245820 fixes the SROA issue and improves -O3 compile-time from 88s to 23s
> on my machine.

Any chance we can get the fix from http://llvm.org/viewvc/llvm-project?view=revision&revision=245820 backported into the llvm 3.7 branch (after the 3.7.0 release this week) for the upcoming 3.7.1 release?
Sure!