LargeBlockInfo, we can now dramatically simplify their implementation
and speed them up at the same time. Now the code runs in time proportional
to the number of uses of the alloca, not the size of the block.
This also eliminates code that tried to batch up different allocas which
are used in the same blocks, and eliminates the 'retry list' logic which
was baroque and now unnecessary. In addition to being a speedup for crazy
cases, this is also a nice cleanup:
PromoteMemoryToRegister.cpp | 270 +++++++++++++++-----------------------------
1 file changed, 96 insertions(+), 174 deletions(-)
llvm-svn: 58229
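Paying per use rather than per instruction can be illustrated with a small sketch. Once each load and store carries its position in the block, a load's reaching value is the nearest earlier store, which a binary search over the sorted stores finds. The `Store` struct and `forwardStoresToLoads` function are hypothetical and model instructions as integer indices. This is one way to get the stated bound, not the code from the commit.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <optional>
#include <vector>

// Hypothetical model: an instruction is identified by its position in the
// block (an integer index), and stored values are plain ints.
struct Store {
  unsigned Index;
  int Value;
};

// For each load index, return the value written by the closest store that
// precedes it in the same block, or std::nullopt if no store precedes it.
// Cost is O((S + L) log S) for S stores and L loads, independent of how many
// other instructions the block contains.
std::vector<std::optional<int>>
forwardStoresToLoads(std::vector<Store> Stores,
                     const std::vector<unsigned> &LoadIndices) {
  // Sort the stores once by their position in the block.
  std::sort(Stores.begin(), Stores.end(),
            [](const Store &A, const Store &B) { return A.Index < B.Index; });

  std::vector<std::optional<int>> Result;
  Result.reserve(LoadIndices.size());
  for (unsigned LoadIdx : LoadIndices) {
    // First store at or after the load; the one before it reaches the load.
    auto It = std::lower_bound(
        Stores.begin(), Stores.end(), LoadIdx,
        [](const Store &S, unsigned Idx) { return S.Index < Idx; });
    if (It == Stores.begin())
      Result.push_back(std::nullopt); // no earlier store in this block
    else
      Result.push_back(std::prev(It)->Value);
  }
  return Result;
}
```

For example, with stores at positions 1 and 5, a load at position 3 gets the first store's value and a load at position 0 gets nothing.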
a trivial dense map. Use this in RewriteSingleStoreAlloca to
avoid aggressively rescanning blocks over and over again. This
fixes PR2925, speeding up mem2reg on the testcase in that bug
from 4.56s to 0.02s in a debug build on my machine.
llvm-svn: 58227
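A minimal sketch of a lazily filled position cache in the spirit of the dense map described here: the first query against a block numbers every interesting instruction in it, and later queries are answered from the map without rescanning. The `Inst` struct, the `BlockIndexCache` class, and the use of `std::unordered_map` in place of a DenseMap are all assumptions for illustration.

```cpp
#include <cassert>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for an instruction: an id plus a flag saying whether
// it is a load or store of an alloca (the only instructions we number).
struct Inst {
  int Id;
  bool IsAllocaAccess;
};

// Caches the position of interesting instructions within their block so that
// repeated "which comes first?" queries don't rescan the block each time.
class BlockIndexCache {
  std::unordered_map<int, unsigned> IndexOf; // instruction id -> position
  unsigned Scans = 0;                        // number of full block scans

public:
  unsigned getIndex(const std::vector<Inst> &Block, int Id) {
    auto It = IndexOf.find(Id);
    if (It != IndexOf.end())
      return It->second; // fast path: already numbered

    // Slow path: record the position of every interesting instruction in the
    // block at once, so later queries against this block hit the cache.
    ++Scans;
    unsigned Pos = 0;
    for (const Inst &I : Block) {
      if (I.IsAllocaAccess)
        IndexOf[I.Id] = Pos;
      ++Pos;
    }
    return IndexOf.at(Id); // throws if Id is not an alloca access in Block
  }

  unsigned scanCount() const { return Scans; }
};
```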
In the old way, we computed and inserted phi nodes for the whole IDF of
the definitions of the alloca, then computed which ones were dead and
removed them.
In the new method, we first compute the region where the value is live,
and use that information to only insert phi nodes that are live. This
eliminates the need to compute liveness later, and stops the algorithm
from inserting a bunch of phis which it then later removes.
This speeds up the testcase in PR1432 from 2.00s to 0.15s (14x) in a
release build and 6.84s->0.50s (14x) in a debug build.
llvm-svn: 40825
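The live-region idea can be sketched as a backward walk over predecessors. This assumes a CFG of integer-numbered blocks with a predecessor list per block, seed blocks that read the value before writing it, and a set of blocks containing a store; all names are hypothetical. Candidate phi blocks (for instance the IDF of the definitions) are then filtered against the live-in set.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical CFG: blocks are numbered 0..N-1 and Preds[B] lists B's
// predecessors.
using Preds = std::vector<std::vector<int>>;

// Computes the blocks where the alloca's value is live on entry. Worklist
// starts with blocks that read the value before (or without) writing it;
// DefBlocks are blocks containing a store. Walking backwards stops at blocks
// that define the value, because the store kills liveness above it.
std::set<int> computeLiveInBlocks(const Preds &P, std::vector<int> Worklist,
                                  const std::set<int> &DefBlocks) {
  std::set<int> LiveIn;
  while (!Worklist.empty()) {
    int B = Worklist.back();
    Worklist.pop_back();
    if (!LiveIn.insert(B).second)
      continue; // already known to be live-in
    for (int Pred : P[B]) {
      // A store in Pred means the value is not live on entry to Pred
      // (unless Pred also reads first, in which case it is a seed itself).
      if (DefBlocks.count(Pred))
        continue;
      Worklist.push_back(Pred);
    }
  }
  return LiveIn;
}

// Keep only the phi candidates where the value is actually live; a phi
// anywhere else would be dead as soon as it was inserted.
std::vector<int> prunePhiBlocks(const std::vector<int> &Candidates,
                                const std::set<int> &LiveIn) {
  std::vector<int> Out;
  for (int B : Candidates)
    if (LiveIn.count(B))
      Out.push_back(B);
  return Out;
}
```

On a diamond CFG with stores in both arms, the join block is a phi candidate; it keeps its phi only if something at or below the join reads the value.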
stored value was a non-instruction value. Doh.
This increases the number of single-store allocas from 8982 to 9026, and
speeds up mem2reg on the testcase in PR1432 from 2.17s to 2.13s.
llvm-svn: 40813
1. Check for revisiting a block before checking domination, which is faster.
2. If the stored value isn't an instruction, we don't have to check for domination.
3. If we have a value used in the same block more than once, make sure to remove the
block from the UsingBlocks vector. Not doing so forces us to go through the slow
path for the alloca.
The combination of these improvements increases the number of allocas on the fastpath
from 8935 to 8982 on PR1432. This speeds it up from 2.90s to 2.20s (31%).
llvm-svn: 40811
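The core test of the single-store fast path can be sketched as follows: a load may use the stored value directly when the store dominates it, meaning an earlier position in the same block, or a dominating block otherwise. Loads that fail the test are handed back for the general algorithm. The `Pos` struct, the idom-array dominance walk, and the function names are assumptions; the visited-set ordering and UsingBlocks bookkeeping described above are not modeled.

```cpp
#include <cassert>
#include <vector>

// Hypothetical model: Idom[B] is the immediate dominator of block B (the
// entry block is its own idom), and an instruction's position is a
// (block, index-within-block) pair.
struct Pos {
  int Block;
  unsigned Index;
};

// True if block A dominates block B: walk up B's dominator-tree chain.
bool dominates(const std::vector<int> &Idom, int A, int B) {
  while (true) {
    if (B == A)
      return true;
    if (Idom[B] == B)
      return false; // reached the entry block without meeting A
    B = Idom[B];
  }
}

// Loads dominated by the single store are appended to Rewritten (they can
// read the stored value directly); the rest are returned for the slow path.
std::vector<Pos> rewriteDominatedLoads(const std::vector<int> &Idom, Pos Store,
                                       const std::vector<Pos> &Loads,
                                       std::vector<Pos> &Rewritten) {
  std::vector<Pos> Remaining;
  for (const Pos &L : Loads) {
    bool Dominated = L.Block == Store.Block
                         ? Store.Index < L.Index // same block: compare order
                         : dominates(Idom, Store.Block, L.Block);
    if (Dominated)
      Rewritten.push_back(L);
    else
      Remaining.push_back(L);
  }
  return Remaining;
}
```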
a using block from the list if we handle it. Not doing this prevented us
from promoting (with the fast path) allocas which have uses (whoops).
This increases the number of allocas hitting this fast path from 4042 to 8935 on the
testcase in PR1432, speeding up mem2reg by 2.6x.
llvm-svn: 40809
BBNumbers. Instead of using a bi-directional mapping, just use a single
densemap. This speeds up mem2reg on 176.gcc by 8%, from 1.3489s to
1.2485s.
llvm-svn: 33940
nondeterminism being bad) could cause some trivial missed optimizations (dead
phi nodes being left around for later passes to clean up).
With this, llvm-gcc4 now bootstraps and correctly compares. I don't know
why I never tried to do it before... :)
llvm-svn: 27984
has a single def. In this case, look for uses that are dominated by the def
and attempt to rewrite them to directly use the stored value.
This speeds up mem2reg on these values and reduces the number of phi nodes
inserted. This should address PR665.
llvm-svn: 24411
BasicBlock's removePredecessor routine. This requires shuffling around
the definition and implementation of hasConstantValue from Utils.h,cpp into
Instructions.h,cpp
llvm-svn: 22664
The optimization for locally used allocas was not safe for allocas that
were read before they were written. This change disables that optimization
in that case.
llvm-svn: 22318
unnecessary. This allows us to delete several hundred phi nodes of the
form PHI(x,x,x,undef) from 253.perlbmk and probably other programs as well.
This implements Mem2Reg/UndefValuesMerge.ll
llvm-svn: 17098
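A minimal sketch of the PHI(x,x,x,undef) simplification, assuming values are modeled as names and the undefined value is spelled `"undef"`; `simplifyPhi` is a hypothetical helper. Undef inputs may be treated as any value, and inputs that are the phi itself add nothing, so if the remaining inputs all agree the phi can be replaced.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Returns the single value the phi always produces if every incoming value is
// either that value, undef, or the phi itself; returns "undef" if all real
// inputs are undef; otherwise std::nullopt (the phi is really needed).
std::optional<std::string>
simplifyPhi(const std::string &PhiName,
            const std::vector<std::string> &Incoming) {
  std::optional<std::string> Common;
  bool SawUndef = false;
  for (const std::string &V : Incoming) {
    if (V == PhiName)
      continue; // a self-reference contributes no new value
    if (V == "undef") {
      SawUndef = true; // undef may be assumed equal to anything
      continue;
    }
    if (Common && *Common != V)
      return std::nullopt; // two distinct real values: keep the phi
    Common = V;
  }
  if (!Common && SawUndef)
    return std::string("undef");
  return Common;
}
```

When the surviving value is an instruction, substituting it for the phi is only valid where that instruction's definition dominates the phi; this sketch does not check that.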
whose addresses were used by trivial phi nodes and select instructions. This
is now performed by the instcombine pass, which is more powerful, is much
simpler, and is faster. This allows the deletion of a bunch of code, two
FIXMEs and two gotos.
llvm-svn: 16406
Move include/Config and include/Support into include/llvm/Config,
include/llvm/ADT and include/llvm/Support. From here on out, all LLVM
public header files must be under include/llvm/.
llvm-svn: 16137
non-deterministic things like the ordering of blocks in the dominance
frontier of a BB. Unfortunately, I don't know of a better way to solve
this problem than to explicitly sort the BBs in function order before
processing them. This is guaranteed to slow the pass down a bit, but
is absolutely necessary to get usable diffs between two different tools
executing the mem2reg or scalarrepl pass.
Before this, bazillions of spurious diff failures occurred all over the
place due to the different order of processing PHIs:
- %tmp.111 = getelementptr %struct.Connector_struct* %upcon.0.0, uint 0, uint 0
+ %tmp.111 = getelementptr %struct.Connector_struct* %upcon.0.1, uint 0, uint 0
Now, the diffs match.
llvm-svn: 14244
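Sorting blocks into function order can be sketched as numbering each block once by its layout position, kept in a single block-to-number map (called `BBNumbers` here after the map mentioned in the llvm-svn 33940 entry), and then sorting any block set by that number before processing it. The string-named blocks and both helper functions are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical model: a function is its list of block names in layout order.
// Assign each block its position once, in a single map from block to number.
std::unordered_map<std::string, unsigned>
numberBlocks(const std::vector<std::string> &Function) {
  std::unordered_map<std::string, unsigned> BBNumbers;
  for (unsigned I = 0; I != Function.size(); ++I)
    BBNumbers[Function[I]] = I;
  return BBNumbers;
}

// Sort a set of blocks (e.g. a dominance frontier whose iteration order is not
// deterministic) into function order, so PHIs are created and named in the
// same order on every run.
void sortInFunctionOrder(
    std::vector<std::string> &Blocks,
    const std::unordered_map<std::string, unsigned> &BBNumbers) {
  std::sort(Blocks.begin(), Blocks.end(),
            [&](const std::string &A, const std::string &B) {
              return BBNumbers.at(A) < BBNumbers.at(B);
            });
}
```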
"minimal" SSA form (in other words, it doesn't insert dead PHIs). This
speeds up the mem2reg pass very significantly because it doesn't have to
do a lot of frivolous work in many common cases.
In the 252.eon function I have been playing with, this no longer inserts
the 120 trivially dead PHI nodes that it used to (in the process
of promoting 356 alloca instructions overall). This speeds up the mem2reg
pass from 1.2459s to 0.1284s. More significantly, the DCE pass used to take
2.4138s to remove the 120 dead PHI nodes that mem2reg constructed; now it
takes 0.0134s (which is the time to scan the function and decide that there
is nothing dead). So overall, on this one function, we speed things up a
total of 3.5179s, which is a 24.8x speedup! :)
This change is tested by the Mem2Reg/2003-10-05-DeadPHIInsertion.ll test,
which now passes.
llvm-svn: 8884
basic block. This is amazingly common in code generated by the C/C++ front-ends.
This change makes it not have to insert ANY phi nodes, whereas before it would insert
a ton of dead ones which DCE would have to clean up.
Thus, this fix improves compile-time performance of these trivial allocas in two ways:
1. It doesn't have to do the walking and book-keeping for renaming
2. It does not insert dead phi nodes for them which would have to
subsequently be cleaned up.
On my favorite testcase from 252.eon, this special case handles 305 out of
356 promoted allocas in the function. It speeds up the mem2reg pass from 7.5256s
to 1.2505s. It inserts 677 fewer dead PHI nodes, which speeds up a subsequent
-dce pass from 18.7524s to 2.4806s.
There are still 120 trivially dead PHI nodes being inserted for variables used
in multiple basic blocks, but they are not handled by this patch.
llvm-svn: 8881
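For an alloca used only within one block, a single forward walk can stand in for renaming: track the most recently stored value and hand it to each load, with no PHI nodes needed. The sketch also folds in the restriction from the llvm-svn 22318 entry: if a load runs before any store, the shortcut bails out, since (for example, in a loop) that load may see a store from a previous iteration. The `Access` struct and `promoteSingleBlock` function are hypothetical.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Hypothetical model of one block's accesses to a single alloca, in order.
struct Access {
  bool IsStore;
  int Value; // value stored (ignored for loads)
};

// Returns the value each load reads, or std::nullopt if some load executes
// before any store in the block. In that case the shortcut is unsafe and the
// caller must fall back to the general algorithm.
std::optional<std::vector<int>>
promoteSingleBlock(const std::vector<Access> &Accesses) {
  std::optional<int> Current; // no store seen yet
  std::vector<int> LoadResults;
  for (const Access &A : Accesses) {
    if (A.IsStore) {
      Current = A.Value;
    } else {
      if (!Current)
        return std::nullopt; // read before written: bail out
      LoadResults.push_back(*Current);
    }
  }
  return LoadResults;
}
```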
*** Revamp the code which handled unreachable code in the function. Now the
code is much more efficient for high-degree basic blocks, such as those
that occur in the 252.eon SPEC benchmark.
For the interested, the time to promote a SINGLE alloca in _ZN7mrScene4ReadERSi
function used to be > 3.5s. Now it is < 0.075s. The function has a LOT of
allocas in it, so it appeared to be looping infinitely; this should make it much
nicer. :)
llvm-svn: 8863
work-list of value definitions. This allows elimination of the explicit
'iterative' step of the algorithm, and also reuses temporary memory better.
llvm-svn: 8861
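A work-list over definitions is the standard way to compute the iterated dominance frontier without an explicit "repeat until nothing changes" loop: every block that receives a PHI becomes a new definition and is pushed onto the work-list. A minimal sketch, assuming precomputed dominance frontiers indexed by block number; `placePhis` is a hypothetical name and this is the textbook technique, not necessarily the commit's exact code.

```cpp
#include <cassert>
#include <set>
#include <vector>

// DF[B] is the (precomputed) dominance frontier of block B. Returns the
// blocks that need a PHI for a variable defined in DefBlocks.
std::set<int> placePhis(const std::vector<std::vector<int>> &DF,
                        const std::vector<int> &DefBlocks) {
  std::set<int> PhiBlocks;
  std::vector<int> Worklist(DefBlocks.begin(), DefBlocks.end());
  while (!Worklist.empty()) {
    int B = Worklist.back();
    Worklist.pop_back();
    for (int F : DF[B])
      if (PhiBlocks.insert(F).second) // first PHI placed in F
        Worklist.push_back(F);        // that PHI is itself a new definition
  }
  return PhiBlocks;
}
```

In a simple loop where block 1 is the header and block 2 the latch, a definition in the latch places one PHI in the header, and the header's own frontier (itself) adds nothing new.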
* Do not insert a new entry into NewPhiNodes during the rename pass if there are no PHIs in a block.
* Do not compute WriteSets in parallel
llvm-svn: 8858
* Eliminate the KillList instance variable, instead, just delete loads and
stores as they are "renamed", and delete allocas when they are done
* Make the 'visited' set an instance variable to avoid passing it on the stack.
llvm-svn: 8857
* Make Mem2Reg assign version numbers now for renamed variables instead of
.mem2reg suffixes. This produces what people think of as SSA.
llvm-svn: 5771
* Renamed StatisticReporter.h/cpp to Statistic.h/cpp
* Changed the constructor to take two const char * arguments instead of one, so
  that indentation can be taken care of automatically.
* Sort the list by pass name when printing
* Make sure to print all statistics as a group, instead of randomly when
the statistics dtors are called.
* Updated ProgrammersManual with new semantics.
llvm-svn: 4002
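The "print as a group, sorted by pass name" behavior can be sketched with a plain registry that is printed once, instead of each counter printing itself from its destructor. The `StatEntry` struct and `printStatistics` function are hypothetical and are not the Statistic class from this commit; keeping the pass name and description as separate fields mirrors the two-argument constructor.

```cpp
#include <algorithm>
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical registry entry: the owning pass, a description, and a count.
struct StatEntry {
  std::string PassName;
  std::string Desc;
  unsigned Value = 0;
};

// Print all statistics together, ordered by pass name (stable, so entries of
// the same pass keep their registration order).
std::string printStatistics(std::vector<StatEntry> Stats) {
  std::stable_sort(Stats.begin(), Stats.end(),
                   [](const StatEntry &A, const StatEntry &B) {
                     return A.PassName < B.PassName;
                   });
  std::ostringstream OS;
  for (const StatEntry &S : Stats)
    OS << S.Value << ' ' << S.PassName << " - " << S.Desc << '\n';
  return OS.str();
}
```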
* Add new RegisterOpt/RegisterAnalysis templates for registering passes that
are to show up in opt or analyze
* Register Analyses now
* Change optimizations to use RegisterOpt instead of RegisterPass
* Add support for different "PassType's"
* Remove getPassName implementations from various subclasses
llvm-svn: 3113
PromoteInstance. Make them local variables that are passed around as
appropriate. Especially in the case of CurrentValue, this makes the
code simpler.
llvm-svn: 2374
- Rename runOnMethod to runOnFunction
* Transform getAnalysisUsageInfo into getAnalysisUsage
- Method is now const
- It now takes one AnalysisUsage object to fill in instead of 3 vectors
to fill in
- Passes now specify which other passes they _preserve_, not which ones
they modify (be conservative!)
- A pass can specify that it preserves all analyses (because it never
modifies the underlying program)
* s/Method/Function/g in other random places as well
llvm-svn: 2333
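The new interface can be sketched as a toy model: a pass fills in one usage object from a const method, listing what it requires and what it preserves, and anything not listed as preserved is treated as invalidated. The method names echo LLVM's AnalysisUsage (`addRequired`, `addPreserved`, `setPreservesAll`), but this string-keyed `UsageInfo` class and the `ToyPass`/`ToyMem2Reg` types are hypothetical simplifications, as are the analysis names used.

```cpp
#include <cassert>
#include <set>
#include <string>

// Toy usage object: required and preserved analyses, keyed by name.
class UsageInfo {
  std::set<std::string> Required, Preserved;
  bool PreservesAll = false;

public:
  void addRequired(const std::string &ID) { Required.insert(ID); }
  void addPreserved(const std::string &ID) { Preserved.insert(ID); }
  void setPreservesAll() { PreservesAll = true; } // e.g. a pure printer pass

  bool isRequired(const std::string &ID) const { return Required.count(ID) != 0; }
  // Conservative default: anything not explicitly preserved is invalidated.
  bool invalidates(const std::string &ID) const {
    return !PreservesAll && Preserved.count(ID) == 0;
  }
};

// Describing a pass's needs must not modify the pass, hence the const method.
struct ToyPass {
  virtual ~ToyPass() = default;
  virtual void getAnalysisUsage(UsageInfo &AU) const = 0;
};

struct ToyMem2Reg : ToyPass {
  void getAnalysisUsage(UsageInfo &AU) const override {
    AU.addRequired("domtree"); // hypothetical analysis names
    AU.addPreserved("cfg");
  }
};
```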