Bug 17855 - PHIElimination takes 11.3% of the wall O3 compile time (C++ to object) (was GDLParser.cpp compiles excessively slowly on clang++ from Xcode 5 and llvm 3.4svn when compiled at -O2 or higher)
Status: NEW
Alias: None
Product: new-bugs
Classification: Unclassified
Component: new bugs
Version: trunk
Hardware: Macintosh MacOS X
Importance: P normal
Assignee: Unassigned LLVM Bugs
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2013-11-08 17:36 PST by Jack Howarth
Modified: 2017-01-29 01:03 PST

Attachments
preprocessed source for src/GDLParser.cpp generated by Apple clang++ 5.0 (177.41 KB, application/x-bzip2)
2013-11-08 17:36 PST, Jack Howarth
preprocessed source for src/GDLTreeParser.cpp generated by Apple clang++ 5.0 (190.99 KB, application/x-bzip2)
2013-11-08 17:37 PST, Jack Howarth

Description Jack Howarth 2013-11-08 17:36:26 PST
Created attachment 11502
preprocessed source for src/GDLParser.cpp generated by Apple clang++ 5.0

The GDLParser.cpp and GDLTreeParser.cpp files from gdl-0.9.4/src compile extremely slowly at -O2 with clang++ from both Xcode 5.0.1 and llvm 3.4svn. Using the attached GDLParser.ii.bz2 with llvm 3.4svn's clang++ on x86_64-apple-darwin12, I see:

% time /sw/opt/llvm-3.4/bin/clang++ -O2 -c GDLParser.ii
101.615u 0.577s 1:42.29 99.8%	0+0k 2+12io 627pf+0w

% time /sw/opt/llvm-3.4/bin/clang++ -O1 -c GDLParser.ii
3.858u 0.095s 0:03.96 99.4%	0+0k 0+15io 7pf+0w

These tests were done on a 2009 Mac Pro with dual quad-core Xeon processors and 12 GB of memory.
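
To put a number on the -O2 versus -O1 gap, the two `time` lines above can be compared directly. The following is a minimal sketch, assuming the lines are in tcsh's default `time` format (user seconds, system seconds, elapsed time, CPU percentage); the helper name `parse_tcsh_time` is hypothetical and is not part of any tool mentioned in this report.

```python
import re

# Assumption: tcsh default `time` output, e.g.
#   "101.615u 0.577s 1:42.29 99.8% 0+0k 2+12io 627pf+0w"
_TIME_RE = re.compile(
    r"^\s*(?P<user>[\d.]+)u\s+(?P<sys>[\d.]+)s\s+(?P<elapsed>[\d:.]+)\s"
)


def parse_tcsh_time(line):
    """Return (user_seconds, system_seconds, elapsed_seconds) for one line."""
    match = _TIME_RE.match(line)
    if match is None:
        raise ValueError("not a tcsh-style time line: %r" % line)
    # Elapsed time is [[h:]m:]s; fold the fields left to right in base 60.
    elapsed = 0.0
    for part in match.group("elapsed").split(":"):
        elapsed = elapsed * 60 + float(part)
    return float(match.group("user")), float(match.group("sys")), elapsed


if __name__ == "__main__":
    # The two measurements reported above for GDLParser.ii.
    o2 = parse_tcsh_time("101.615u 0.577s 1:42.29 99.8%\t0+0k 2+12io 627pf+0w")
    o1 = parse_tcsh_time("3.858u 0.095s 0:03.96 99.4%\t0+0k 0+15io 7pf+0w")
    print("wall-clock slowdown at -O2: %.1fx" % (o2[2] / o1[2]))
```

For the GDLParser.ii figures above, this puts the -O2 build at roughly 26 times the wall-clock time of the -O1 build.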
Comment 1 Jack Howarth 2013-11-08 17:37:11 PST
Created attachment 11503
preprocessed source for src/GDLTreeParser.cpp generated by Apple clang++ 5.0
Comment 2 Bob Wilson 2013-11-08 17:45:46 PST
We looked at this earlier and it seemed like most of the time was in SROA.  CC'ing Chandler.
Comment 3 Jack Howarth 2013-11-08 17:47:16 PST
The attached GDLTreeParser.ii.bz2 test case compiles so slowly at -O2 with llvm 3.4svn that it is effectively stalled.

% time /sw/opt/llvm-3.4/bin/clang++ -O2 -c GDLTreeParser.ii
1311.647u 2.018s 21:53.75 99.9%	0+0k 0+11io 15pf+0w

% time /sw/opt/llvm-3.4/bin/clang++ -O1 -c GDLTreeParser.ii
4.492u 0.112s 0:04.61 99.7%	0+0k 0+9io 0pf+0w
Comment 4 Jack Howarth 2013-11-08 17:47:58 PST
Also filed as radr://15289369
Comment 5 Duncan 2014-04-17 16:31:45 PDT
I profiled this today.  Looks like the slowdown is in SROA's invocation of SSAUpdater.  r149654 fixed a miscompile, but it made FindAvailableVals() quadratic in the number of basic blocks.

There's probably another way to fix the miscompile that avoids this slowdown.
Comment 6 Duncan 2014-04-17 16:52:30 PDT
This is related to bug 16756.  It hits the same bottleneck (as well as an unrelated one).
Comment 7 Jack Howarth 2015-02-24 16:59:52 PST
Still present in 3.6-rc4...

% time /sw/opt/llvm-3.6.0/bin/clang++ -O2 -c GDLParser.ii
118.526u 0.561s 1:59.19 99.9%	0+0k 0+12io 748pf+0w

% time /sw/opt/llvm-3.6.0/bin/clang++ -O1 -c GDLParser.ii
4.858u 0.087s 0:04.95 99.5%	0+0k 0+10io 13pf+0w
Comment 8 Mehdi Amini 2015-08-23 17:27:16 PDT
r245820 fixes the SROA issue and improves -O3 compile-time from 88s to 23s on my machine.

The top 5 is now:

Running Time        Self (ms)   Symbol Name
4393.0ms   17.9%    24.0        (anonymous namespace)::JumpThreading::runOnFunction(llvm::Function&)
2768.0ms   11.3%    45.0        (anonymous namespace)::PHIElimination::runOnMachineFunction(llvm::MachineFunction&)
2635.0ms   10.7%    0.0         (anonymous namespace)::X86DAGToDAGISel::runOnMachineFunction(llvm::MachineFunction&)
1356.0ms    5.5%    9.0         (anonymous namespace)::SLPVectorizer::runOnFunction(llvm::Function&)
1341.0ms    5.4%    7.0         (anonymous namespace)::GVN::runOnFunction(llvm::Function&)
Comment 9 Mehdi Amini 2015-08-23 17:30:05 PDT
PR16756 is tracking the JumpThreading regression, so let's focus here on PHIElimination (11.3%).
Comment 10 Jack Howarth 2015-08-24 09:51:43 PDT
(In reply to comment #8)
> r245820 fixes the SROA issue and improves -O3 compile-time from 88s to 23s
> on my machine.

Any chance we can get the fix from http://llvm.org/viewvc/llvm-project?view=revision&revision=245820 backported to the llvm 3.7 branch (after the 3.7.0 release this week) for the upcoming 3.7.1 release?
Comment 11 Mehdi Amini 2015-08-24 10:14:42 PDT
Sure!