As noted in the comments, findBBsToSinkInto is
O(UseBBs.size() * ColdLoopBBs.size())
A very large function with a huge loop was incurring a high compile time
in this code. The size of the ColdLoopBBs set was over 14K. There is a
limit on the size of the UseBBs set, but not the ColdLoopBBs (and adding
a limit for the latter actually slowed down some later passes).
This change exits the loop early once we detect that there is no further
refinement possible for the BBsToSinkInto set. This is possible because
the ColdLoopBBs set is sorted in ascending magnitude of frequency.
This cut down the LoopSinkPass time by around 33% (78s to just over
50s).
The `BlockFrequency` class abstracts `uint64_t` frequency values. Use it
more consistently in various APIs and disable implicit conversion to
make usage more consistent and explicit.
- Use `BlockFrequency Freq` parameter for `setBlockFreq`,
`getProfileCountFromFreq` and `setBlockFreqAndScale` functions.
- Return `BlockFrequency` in `getEntryFreq()` functions.
- While on it change some `const BlockFrequency& Freq` parameters to
plain `BlockFreqency Freq`.
- Mark `BlockFrequency(uint64_t)` constructor as explicit.
- Add missing `BlockFrequency::operator!=`.
- Remove `uint64_t BlockFreqency::getMaxFrequency()`.
- Add `BlockFrequency BlockFrequency::max()` function.
This was trigerred by the debug check when comp(a, a) was called.
On line 216 it's checked that LoopBlockNumber should contain
all if there are more than 1. NFC for end users
Differential Revision: https://reviews.llvm.org/D157954
This change allows sinking defs from loop preheader with PHI-use into loop body. Loop sink can now see through PHI-use and select incoming blocks of value being used as candidate sink destination.
It makes loop sink more effective so more LICM can be undone if proven unprofitable with profile info. It addresses the motivating case in D87551, without resorting to profile guided LICM which breaks canonicalization.
This is the 2nd attempt after D152772.
This change allows sinking defs from loop preheader with PHI-use into loop body. Loop sink can now see through PHI-use and select incoming blocks of value being used as candidate sink destination.
It makes loop sink more effective so more LICM can be undone if proven unprofitable with profile info. It addresses the motivating case in D87551, without resorting to profile guided LICM which breaks canonicalization.
Differential Revision: https://reviews.llvm.org/D152772
The loop sink pass only does something if the function has
profile data. Move the check for that before analyses are fetched,
to avoid computing things like BFI or MSSA unnecessarily.
Use deduction guides instead of helper functions.
The only non-automatic changes have been:
1. ArrayRef(some_uint8_pointer, 0) needs to be changed into ArrayRef(some_uint8_pointer, (size_t)0) to avoid an ambiguous call with ArrayRef((uint8_t*), (uint8_t*))
2. CVSymbol sym(makeArrayRef(symStorage)); needed to be rewritten as CVSymbol sym{ArrayRef(symStorage)}; otherwise the compiler is confused and thinks we have a (bad) function prototype. There was a few similar situation across the codebase.
3. ADL doesn't seem to work the same for deduction-guides and functions, so at some point the llvm namespace must be explicitly stated.
4. The "reference mode" of makeArrayRef(ArrayRef<T> &) that acts as no-op is not supported (a constructor cannot achieve that).
Per reviewers' comment, some useless makeArrayRef have been removed in the process.
This is a follow-up to https://reviews.llvm.org/D140896 that introduced
the deduction guides.
Differential Revision: https://reviews.llvm.org/D140955
This reverts commit 92f698f01fa0bb8967233a6391ada2d51849fa68.
The updated version of the patch includes handling for non-SCEVable
types. A test case has been added in ec86e9a99bca802.
Extend forgetBlockAndLoopDisposition to allow clearing information for a
single value. This can be useful when only a single value is changed,
e.g. because the instruction is moved.
We also need to clear the cached values for all SCEV users, because they
may depend on the starting value's disposition.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D134614
This makes MemorySSA in LoopSink required, and removes the AST-based
implementation, as well as the related support code in LICM.
Differential Revision: https://reviews.llvm.org/D123288
LoopSink with the legacy pass manager still uses AST, because we
can't compute MemorySSA conditionally. I think now that the legacy
pass manager will be removed soon(TM) we don't need to care about
compile-time impact here anymore. Additionally, since MemorySSA is
no longer eagerly optimized, the impact is actually not that high
anymore (~0.2% geomean regression on CTMark).
This just makes legacy PM and new PM behavior line up -- as a
followup I'll drop these options entirely and make MemorySSA use
mandatory.
Differential Revision: https://reviews.llvm.org/D123216
Skip phi nodes in the preheader. They may not be considered loop
invariant by the assertion below.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D121010
Summary:
Set the default for the option enabling memory ssa use in the loop sink
pass to true for the new pass manager.
Author: Jamie Schmeiser <schmeise@ca.ibm.com>
Reviewed By: asbirlea (Alina Sbirlea)
Differential Revision: https://reviews.llvm.org/D92486
Summary:
Expand existing loopsink testing to also test loopsinking using new pass
manager. Enable memoryssa for loopsink with new pass manager. This
combination exposed a bug that was previously fixed for loopsink
without memoryssa. When sinking an instruction into a loop, the source
block may not be part of the loop but still needs to be checked for
pointer invalidation. This is the fix for bugzilla #39695 (PR 54659)
expanded to also work with memoryssa.
Respond to review comments. Enable Memory SSA in legacy Loop Sink pass
under EnableMSSALoopDependency option control. Update tests accordingly.
Respond to review comments. Add options controlling whether memoryssa is
used for loop sink, defaulting to off. Expand testing based on these
options.
Respond to review comments. Properly indicated preserved analyses.
This relanding addresses a compile-time performance problem by moving
test for profile data earlier to avoid unnecessary computations.
Author: Jamie Schmeiser <schmeise@ca.ibm.com>
Reviewed By: asbirlea (Alina Sbirlea)
Differential Revision: https://reviews.llvm.org/D90249
This reverts commit 562addba652e8bdabe49f9123fd92c21b7a0d640.
Reverted change too quickly, the failing test cases passed on the next build.
So reverting revert (to include the changes).
Summary:
Expand existing loopsink testing to also test loopsinking using new pass
manager. Enable memoryssa for loopsink with new pass manager. This
combination exposed a bug that was previously fixed for loopsink
without memoryssa. When sinking an instruction into a loop, the source
block may not be part of the loop but still needs to be checked for
pointer invalidation. This is the fix for bugzilla #39695 (PR 54659)
expanded to also work with memoryssa.
Respond to review comments. Enable Memory SSA in legacy Loop Sink pass
under EnableMSSALoopDependency option control. Update tests accordingly.
Respond to review comments. Add options controlling whether memoryssa is
used for loop sink, defaulting to off. Expand testing based on these
options.
Respond to review comments. Properly indicated preserved analyses.
Author: Jamie Schmeiser <schmeise@ca.ibm.com>
Reviewed By: asbirlea (Alina Sbirlea)
Differential Revision: https://reviews.llvm.org/D90249
This file lists every pass in LLVM, and is included by Pass.h, which is
very popular. Every time we add, remove, or rename a pass in LLVM, it
caused lots of recompilation.
I found this fact by looking at this table, which is sorted by the
number of times a file was changed over the last 100,000 git commits
multiplied by the number of object files that depend on it in the
current checkout:
recompiles touches affected_files header
342380 95 3604 llvm/include/llvm/ADT/STLExtras.h
314730 234 1345 llvm/include/llvm/InitializePasses.h
307036 118 2602 llvm/include/llvm/ADT/APInt.h
213049 59 3611 llvm/include/llvm/Support/MathExtras.h
170422 47 3626 llvm/include/llvm/Support/Compiler.h
162225 45 3605 llvm/include/llvm/ADT/Optional.h
158319 63 2513 llvm/include/llvm/ADT/Triple.h
140322 39 3598 llvm/include/llvm/ADT/StringRef.h
137647 59 2333 llvm/include/llvm/Support/Error.h
131619 73 1803 llvm/include/llvm/Support/FileSystem.h
Before this change, touching InitializePasses.h would cause 1345 files
to recompile. After this change, touching it only causes 550 compiles in
an incremental rebuild.
Reviewers: bkramer, asbirlea, bollu, jdoerfert
Differential Revision: https://reviews.llvm.org/D70211
Summary:
While there is always a `Value::replaceAllUsesWith()`,
sometimes the replacement needs to be conditional.
I have only cleaned a few cases where `replaceUsesWithIf()`
could be used, to both add test coverage,
and show that it is actually useful.
Reviewers: jdoerfert, spatel, RKSimon, craig.topper
Reviewed By: jdoerfert
Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, aheejin, george.burgess.iv, asbirlea, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65528
llvm-svn: 367548
Summary:
Make the flags in LICM + MemorySSA tuning options in the old and new
pass managers.
Subscribers: mehdi_amini, jlebar, Prazek, george.burgess.iv, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D60490
llvm-svn: 358772
Summary:
If there is no clobbering access for a store inside the loop, that store
can only be hoisted if there are no interfearing loads.
A more general verification introduced here: there are no loads that are
not optimized to an access outside the loop.
Addresses PR40586.
Reviewers: george.burgess.iv
Subscribers: sanjoy, jlebar, Prazek, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D57967
llvm-svn: 353734
to reflect the new license.
We understand that people may be surprised that we're moving the header
entirely to discuss the new license. We checked this carefully with the
Foundation's lawyer and we believe this is the correct approach.
Essentially, all code in the project is now made available by the LLVM
project under our new license, so you will see that the license headers
include that license only. Some of our contributors have contributed
code under our old license, and accordingly, we have retained a copy of
our old license notice in the top-level files in each project and
repository.
llvm-svn: 351636
Summary:
Step 2 in using MemorySSA in LICM:
Use MemorySSA in LICM to do sinking and hoisting, all under "EnableMSSALoopDependency" flag.
Promotion is disabled.
Enable flag in LICM sink/hoist tests to test correctness of this change. Moved one test which
relied on promotion, in order to test all sinking tests.
Reviewers: sanjoy, davide, gberry, george.burgess.iv
Subscribers: llvm-commits, Prazek
Differential Revision: https://reviews.llvm.org/D40375
llvm-svn: 350879
This patch fixes PR39695.
The original LoopSink only considers memory alias in loop body. But PR39695 shows that instructions following sink candidate in preheader should also be checked. This is a conservative patch, it simply adds whole preheader block to alias set. It may lose some optimization opportunity, but I think that is very rare because: 1 in the most common case st/ld to the same address, the load should already be optimized away. 2 usually preheader is not very large.
Differential Revision: https://reviews.llvm.org/D54659
llvm-svn: 347325
In the PR, LoopSink was trying to sink into a catchswitch block, which
doesn't have a valid insertion point.
Differential Revision: https://reviews.llvm.org/D51307
llvm-svn: 340900
This method has three callers, each of which wanted distinct handling:
1) Sinking into a loop is moving an instruction known to execute before a loop into the loop. We don't need to worry about introducing a fault at all in this case.
2) Hoisting from a loop into a preheader already duplicated the check in the caller.
3) Sinking from the loop into an exit block was the only true user of the code within the routine. For the moment, this has just been lifted into the caller, but up next is examining the logic more carefully. Whitelisting of loads and calls - while consistent with the previous code - is rather suspicious. Either way, a behavior change is worthy of it's own patch.
llvm-svn: 338671
LoopBlockNumber is a DenseMap<BasicBlock*, int>, comparing the result of
find() will compare a pair<BasicBlock*, int>. That's of course depending
on pointer ordering which varies from run to run. Reverse iteration
doesn't find this because we're copying to a vector first.
This bug has been there since 2016 but only recently showed up on clang
selfhost with FDO and ThinLTO, which is also why I didn't manage to get
a reasonable test case for this. Add an assert that would've caught
this.
llvm-svn: 336439
Review feedback from r328165. Split out just the one function from the
file that's used by Analysis. (As chandlerc pointed out, the original
change only moved the header and not the implementation anyway - which
was fine for the one function that was used (since it's a
template/inlined in the header) but not in general)
llvm-svn: 333954
The DEBUG() macro is very generic so it might clash with other projects.
The renaming was done as follows:
- git grep -l 'DEBUG' | xargs sed -i 's/\bDEBUG\s\?(/LLVM_DEBUG(/g'
- git diff -U0 master | ../clang/tools/clang-format/clang-format-diff.py -i -p1 -style LLVM
- Manual change to APInt
- Manually chage DOCS as regex doesn't match it.
In the transition period the DEBUG() macro is still present and aliased
to the LLVM_DEBUG() one.
Differential Revision: https://reviews.llvm.org/D43624
llvm-svn: 332240
Summary:
r327219 added wrappers to std::sort which randomly shuffle the container before sorting.
This will help in uncovering non-determinism caused due to undefined sorting
order of objects having the same key.
To make use of that infrastructure we need to invoke llvm::sort instead of std::sort.
Note: This patch is one of a series of patches to replace *all* std::sort to llvm::sort.
Refer the comments section in D44363 for a list of all the required patches.
Reviewers: kcc, pcc, danielcdh, jmolloy, sanjoy, dberlin, ruiu
Reviewed By: ruiu
Subscribers: ruiu, llvm-commits
Differential Revision: https://reviews.llvm.org/D45142
llvm-svn: 330059
Remove #include of Transforms/Scalar.h from Transform/Utils to fix layering.
Transforms depends on Transforms/Utils, not the other way around. So
remove the header and the "createStripGCRelocatesPass" function
declaration (& definition) that is unused and motivated this dependency.
Move Transforms/Utils/Local.h into Analysis because it's used by
Analysis/MemoryBuiltins.cpp.
llvm-svn: 328165
Summary:
This replaces calls to getEntryCount().hasValue() with hasProfileData
that does the same thing. This refactoring is useful to do before adding
synthetic function entry counts but also a useful cleanup IMO even
otherwise. I have used hasProfileData instead of hasRealProfileData as
David had earlier suggested since I think profile implies "real" and I
use the phrase "synthetic entry count" and not "synthetic profile count"
but I am fine calling it hasRealProfileData if you prefer.
Reviewers: davidxl, silvas
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D41461
llvm-svn: 321331