Branch weight from sample-based PGO may be not inaccurate due to
sampling. If the loop body must be executed, then origin loop back
edge weight must be not less than exit weight.
This is the major rename patch that prior patches have built towards.
The DPValue class is being renamed to DbgVariableRecord, which reflects
the updated terminology for the "final" implementation of the RemoveDI
feature. This is a pure string substitution + clang-format patch. The
only manual component of this patch was determining where to perform
these string substitutions: `DPValue` and `DPV` are almost exclusively
used for DbgRecords, *except* for:
- llvm/lib/target, where 'DP' is used to mean double-precision, and so
appears as part of .td files and in variable names. NB: There is a
single existing use of `DPValue` here that refers to debug info, which
I've manually updated.
- llvm/tools/gold, where 'LDPV' is used as a prefix for symbol
visibility enums.
Outside of these places, I've applied several basic string
substitutions, with the intent that they only affect DbgRecord-related
identifiers; I've checked them as I went through to verify this, with
reasonable confidence that there are no unintended changes that slipped
through the cracks. The substitutions applied are all case-sensitive,
and are applied in the order shown:
```
DPValue -> DbgVariableRecord
DPVal -> DbgVarRec
DPV -> DVR
```
Following the previous rename patches, it should be the case that there
are no instances of any of these strings that are meant to refer to the
general case of DbgRecords, or anything other than the DPValue class.
The idea behind this patch is therefore that pure string substitution is
correct in all cases as long as these assumptions hold.
Many profitable optimizations cannot be performed at -Oz, due to
unrotated loops. While this is worse for size (minimally), many of the
optimizations significantly reduce code size, such as memcpy
optimizations and other patterns found by loop idiom recognition.
Related discussion can be found in issue #50308.
This patch adds an experimental, backend-only flag to allow loop header
duplication, regardless of the optimization level. Downstream consumers
can experiment with this flag, and if it is profitable, we can adjust
the compiler's defaults accordingly, and expose any useful frontend
flags to opt into the new behavior.
Reverted due to failures on buildbots, where a new cl flag was placed
in the wrong file, resulting in link errors.
https://lab.llvm.org/buildbot/#/builders/198/builds/8548
This reverts commit 0b398256b3f72204ad1f7c625efe4990204e898a.
This patch adds support for printing the proposed non-instruction debug
info ("RemoveDIs") out to textual IR. This patch does not add any
bitcode support, parsing support, or documentation.
Printing of the new format is controlled by a flag added in this patch,
`--write-experimental-debuginfo`, which defaults to false. The new
format will be printed *iff* this flag is true, so whether we use the IR
format is completely independent of whether we use non-instruction debug
info during LLVM passes (which is controlled by the
`--try-experimental-debuginfo-iterators` flag).
Even with the flag disabled, some existing tests need to be updated, as this
patch causes debug intrinsic declarations to be changed in a round trip,
such that they always appear at the end of a module and have no attributes
(this has no functional change on the module).
The design of this new IR format was proposed previously on
Discourse, and any further discussion about the design can still be
contributed there:
https://discourse.llvm.org/t/rfc-debuginfo-proposed-changes-to-the-textual-ir-representation-for-debug-values/73491
Because loops in coroutines may have a co_await statement that
reschedules the coroutine to another thread, we cannot cache addresses
of thread-local variables obtained inside a loop by moving the
computation of thoes addresses outside a loop.
Since LLVM doesn't have a model for coroutine memory accesses, this
patch fixes this bug by disabling this optimization for coroutines in
the same way as https://reviews.llvm.org/D135550 and
https://reviews.llvm.org/D151774.
Avoid editing a range of DPValues and then remapping them. This occurs
when we try to de-duplicate dbg.values, but then re-use the same
iterator range. We can instead remap them, and then erase any
duplicates.
At the same time refactor the computation of seen-intrinsic hashes, and
account for a peculiarity of loop-rotates existing behaviour: it will
only deduplicate dbg.values that are immediately before the preheaders
branch instruction, not just any dbg.value in the preheader.
Loop-rotate manually maintains dbg.value intrinsics -- it also needs to
manually maintain the replacement for dbg.value intrinsics, DPValue
objects. For the most part this patch adds parallel implementations
using the new type Some extra juggling is needed when loop-rotate hoists
loop-invariant instructions out of the loop: the DPValues attached to
such an instruction need to get rotated but not hoisted. Exercised by
the new test function invariant_hoist in dbgvalue.ll.
There's also a "don't insert duplicate debug intrinsics" facility in
LoopRotate. The value and correctness of this isn't clear, but to
continue preserving behaviour that's now tested in the "tak_dup"
function in dbgvalue.ll.
Other things in this patch include a helper DebugVariable constructor
for DPValues, a insertDebugValuesForPHIs handler for RemoveDIs
(exercised by the new tests), and beefing up the dbg.value checking in
dbgvalue.ll to ensure that each record is tested (and that there's an
implicit check-not).
BlockFrequencyInfo calculates block frequencies as Scaled64 numbers but as a last step converts them to unsigned 64bit integers (`BlockFrequency`). This improves the factors picked for this conversion so that:
* Avoid big numbers close to UINT64_MAX to avoid users overflowing/saturating when adding multiply frequencies together or when multiplying with integers. This leaves the topmost 10 bits unused to allow for some room.
* Spread the difference between hottest/coldest block as much as possible to increase precision.
* If the hot/cold spread cannot be represented loose precision at the lower end, but keep the frequencies at the upper end for hot blocks differentiable.
The formula I added to LoopRotationUtils does not produce reasonable
results if some of the branch weights are zero. Add special case
handling for this.
This fixes https://github.com/llvm/llvm-project/issues/66675
This adds code to the loop rotation transformation to ensure that the
computed block execution counts for the loop bodies are the same before
and after the transformation. This isn't always true in practice, but I
believe this is because of numeric inaccuracies in the BlockFrequency
computation.
The invariants this is modeled on and heuristic choice of 0-trip loop
amount is explained in a lenghty comment in the new
`updateBranchWeights()` function.
Differential Revision: https://reviews.llvm.org/D157462
This reverts commit 0c03f48480f69b854f86d31235425b5cb71ac921.
Going to fix forward size regression instead due to more dependent patches needing to be reverted otherwise.
Unlike every other analysis and transform, simplifyInstruction
permitted operating on instructions which are not inserted
into a function. This created an edge case no other code needs
to really worry about, and limited transforms in cases that
can make use of the context function. Only the inliner and a handful
of other utilities were making use of this, so just fix up these
edge cases. Results in some IR ordering differences since
cloned blocks are inserted eagerly now. Plus some additional
simplifications trigger (e.g. some add 0s now folded out that
previously didn't).
This is a follow-up to b71edfaa4ec3c998aadb35255ce2f60bba2940b0
since I forgot the lit.local.cfg files in that one.
Reformatting is done with `black`.
If you end up having problems merging this commit because you
have made changes to a python file, the best way to handle that
is to run git checkout --ours <yourfile> and then reformat it
with black.
If you run into any problems, post to discourse about it and
we will try to help.
RFC Thread below:
https://discourse.llvm.org/t/rfc-document-and-standardize-python-code-style
Reviewed By: barannikov88, kwk
Differential Revision: https://reviews.llvm.org/D150762
Without this, pointer IVs in loops with small constant trip counts couldn't be proven no-self-wrap. This came up in a new LSR transform, but may also benefit other SCEV consumers as well.
Differential Revision: https://reviews.llvm.org/D146596
This is required because if there is a pure loop-invariant instruction, Loop Rotation
may decide to not clone it and just hoist it instead. If SCEV has previously cached
that it was loop-variant (not being smart enough to prove invariance), we may end
up with inconsistent cache state (which may later trigger false-negative assertion
failures checking that something was invariant).
This is a conservative fix that unconditionally drops the dispositions. We could
only drop it if the hoisting has actually happened, but it should take some time
understanding whether it's safe with all other things this function does.
Differential Revision: https://reviews.llvm.org/D134167
Reviewed By: fhahn
Summary:
The code for generating a name for loops for various reporting scenarios
created a name by serializing the loop into a string. This may result in
a very large name for a loop containing many blocks. Use the getName()
function on the loop instead.
Author: Jamie Schmeiser <schmeise@ca.ibm.com>
Reviewed By: Whitney (Whitney Tsang), aeubanks (Arthur Eubanks)
Differential Revision: https://reviews.llvm.org/D133587
Callbr is no longer an indirect terminator in the sense that is
relevant here (that it's successors cannot be updated). The primary
effect of this change is that callbr no longer prevents formation
of loop simplify form.
I decided to drop the isIndirectTerminator() method entirely and
replace it with isa<IndirectBrInst>() checks. I assume this method
was added to abstract over indirectbr and callbr, but it never
really caught on, and there is nothing left to abstract anymore
at this point.
Differential Revision: https://reviews.llvm.org/D129849
Following some recent discussions, this changes the representation
of callbrs in IR. The current blockaddress arguments are replaced
with `!` label constraints that refer directly to callbr indirect
destinations:
; Before:
%res = callbr i8* asm "", "=r,r,i"(i8* %x, i8* blockaddress(@test8, %foo))
to label %asm.fallthrough [label %foo]
; After:
%res = callbr i8* asm "", "=r,r,!i"(i8* %x)
to label %asm.fallthrough [label %foo]
The benefit of this is that we can easily update the successors of
a callbr, without having to worry about also updating blockaddress
references. This should allow us to remove some limitations:
* Allow unrolling/peeling/rotation of callbr, or any other
clone-based optimizations
(https://github.com/llvm/llvm-project/issues/41834)
* Allow duplicate successors
(https://github.com/llvm/llvm-project/issues/45248)
This is just the IR representation change though, I will follow up
with patches to remove limtations in various transformation passes
that are no longer needed.
Differential Revision: https://reviews.llvm.org/D129288
Per the documentation in Support/InstructionCost.h, the purpose of an invalid cost is so that clients can change behavior on impossible to cost inputs. CodeMetrics was instead asserting that invalid costs never occurred.
On a target with an incomplete cost model - e.g. RISCV - this means that transformations would crash on (falsely) invalid constructs - e.g. scalable vectors. While we certainly should improve the cost model - and I plan to do so in the near future - we also shouldn't be crashing. This violates the explicitly stated purpose of an invalid InstructionCost.
I updated all of the "easy" consumers where bailouts were locally obvious. I plan to follow up with loop unroll in a following change.
Differential Revision: https://reviews.llvm.org/D127131
In D115311, we're looking to modify clang to emit i constraints rather
than X constraints for callbr's indirect destinations. Prior to doing
so, update all of the existing tests in llvm/ to match.
Reviewed By: void, jyknight
Differential Revision: https://reviews.llvm.org/D115410
- CUDA cannot associate memory space with pointer types. Even though Clang could add extra attributes to specify the address space explicitly on a pointer type, it breaks the portability between Clang and NVCC.
- This change proposes to assume the address space from a pointer from the assumption built upon target-specific address space predicates, such as `__isGlobal` from CUDA. E.g.,
```
foo(float *p) {
__builtin_assume(__isGlobal(p));
// From there, we could assume p is a global pointer instead of a
// generic one.
}
```
This makes the code portable without introducing the implementation-specific features.
Note that NVCC starts to support __builtin_assume from version 11.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D112041
This patch fixes problems reported in PR51981.
When rotating a loop it isn't enough to just forget SCEV for that
loop nest. When rotating we might clone some instructions from the
old header into the preheader, and insert new PHI nodes to merge
values together. There could be users of the original value that are
updated to use the PHI result. And those users were not necessarily
depending on a PHI node earlier, so they weren't cleaned up when just
forgetting all SCEV:s for the loop nest. So we need to explicitly
forget those values to avoid invalid cached SCEV expressions.
Reviewed By: fhahn, mkazantsev
Differential Revision: https://reviews.llvm.org/D110813
This is enabled by default. Drop explicit uses in preparation for
removing the option.
Also drop RUN lines that are now the same (typically modulo a
-verify-memoryssa option).
Since d6de1e1a71406c75a4ea4d5a2fe84289f07ea3a1, no attributes is quivalent to
setting attribute to false.
This is a preliminary commit for https://reviews.llvm.org/D99080
Printing pass manager invocations is fairly verbose and not super
useful.
This allows us to remove DebugLogging from pass managers and PassBuilder
since all logging (aside from analysis managers) goes through
instrumentation now.
This has the downside of never being able to print the top level pass
manager via instrumentation, but that seems like a minor downside.
Reviewed By: ychen
Differential Revision: https://reviews.llvm.org/D101797
Change several pass sequence sensitive tests to be indifferent
to the PreserveCFGChecker by explicitly settting the option
-verify-cfg-preserved=0. It is a preparation step that allows
a redesign of PreserveCFGChecker.
Reviewed By: skatkov
Differential Revision: https://reviews.llvm.org/D99878
Similar to D92887, LoopRotation also needs duplicate the noalias scopes when rotating a `@llvm.experimental.noalias.scope.decl` across a block boundary.
This is based on the version from the Full Restrict paches (D68511).
The problem it fixes also showed up in Transforms/Coroutines/ex5.ll after D93040 (when enabling strict checking with -verify-noalias-scope-decl-dom).
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D94306
We tend to assume that the AA pipeline is by default the default AA
pipeline and it's confusing when it's empty instead.
PR48779
Initially reverted due to BasicAA running analyses in an unspecified
order (multiple function calls as parameters), fixed by fetching
analyses before the call to construct BasicAA.
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D95117
We tend to assume that the AA pipeline is by default the default AA
pipeline and it's confusing when it's empty instead.
PR48779
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D95117