The idea behind this canonicalization is that it allows us to handle less
patterns, because we know that some will be canonicalized away. This is
indeed very useful to e.g. know that constants are always on the right.
However, this is only useful if the canonicalization is actually
reliable. This is the case for constants, but not for arguments: Moving
these to the right makes it look like the "more complex" expression is
guaranteed to be on the left, but this is not actually the case in
practice. It fails as soon as you replace the argument with another
instruction.
The end result is that it looks like things correctly work in tests,
while they actually don't. We use the "thwart complexity-based
canonicalization" trick to handle this in tests, but it's often a
challenge for new contributors to get this right, and based on the
regressions this PR originally exposed, we clearly don't get this right
in many cases.
For this reason, I think that it's better to remove this complexity
canonicalization. It will make it much easier to write tests for
commuted cases and make sure that they are handled.
This patch makes the final major change of the RemoveDIs project, changing the
default IR output from debug intrinsics to debug records. This is expected to
break a large number of tests: every single one that tests for uses or
declarations of debug intrinsics and does not explicitly disable writing
records.
If this patch has broken your downstream tests (or upstream tests on a
configuration I wasn't able to run):
1. If you need to immediately unblock a build, pass
`--write-experimental-debuginfo=false` to LLVM's option processing for all
failing tests (remember to use `-mllvm` for clang/flang to forward arguments to
LLVM).
2. For most test failures, the changes are trivial and mechanical, enough that
they can be done by script; see the migration guide for a guide on how to do
this: https://llvm.org/docs/RemoveDIsDebugInfo.html#test-updates
3. If any tests fail for reasons other than FileCheck check lines that need
updating, such as assertion failures, that is most likely a real bug with this
patch and should be reported as such.
For more information, see the recent PSA:
https://discourse.llvm.org/t/psa-ir-output-changing-from-debug-intrinsics-to-debug-records/79578
See the following case: https://alive2.llvm.org/ce/z/A-fBki
```
define i3 @src(i3 %0) {
%2 = mul i3 %0, %0
%3 = mul i3 %2, %0
%4 = mul i3 %3, %0
%5 = mul nsw i3 %4, %0
ret i3 %5
}
define i3 @tgt(i3 %0) {
%2 = mul i3 %0, %0
%5 = mul nsw i3 %2, %0
ret i3 %5
}
```
d7aeefebd6
introduced weight reduction during weights combination of the same
operand. As the weight of `%0` changes from 5 to 3, the nsw flag in `%5`
should be dropped.
However, the nsw flag isn't cleared by `RewriteExprTree` since `%5 = mul
nsw i3 %0, %4` is not included in the range of `[ExpressionChangedStart,
ExpressionChangedEnd)`.
```
Calculated Rank[] = 3
Combine negations for: %2 = mul i3 %0, %0
Calculated Rank[] = 4
Combine negations for: %3 = mul i3 %0, %2
Calculated Rank[] = 5
Combine negations for: %4 = mul i3 %0, %3
Calculated Rank[] = 6
Combine negations for: %5 = mul nsw i3 %0, %4
LINEARIZE: %5 = mul nsw i3 %0, %4
OPERAND: i3 %0 (1)
ADD USES LEAF: i3 %0 (1)
OPERAND: %4 = mul i3 %0, %3 (1)
DIRECT ADD: %4 = mul i3 %0, %3 (1)
OPERAND: i3 %0 (1)
OPERAND: %3 = mul i3 %0, %2 (1)
DIRECT ADD: %3 = mul i3 %0, %2 (1)
OPERAND: i3 %0 (1)
OPERAND: %2 = mul i3 %0, %0 (1)
DIRECT ADD: %2 = mul i3 %0, %0 (1)
OPERAND: i3 %0 (1)
OPERAND: i3 %0 (1)
RAIn: mul i3 [ %0, #3] [ %0, #3] [ %0, #3]
RAOut: mul i3 [ %0, #3] [ %0, #3] [ %0, #3]
RAOut after CSE reorder: mul i3 [ %0, #3] [ %0, #3] [ %0, #3]
RA: %5 = mul nsw i3 %0, %4
TO: %5 = mul nsw i3 %4, %0
RA: %4 = mul i3 %0, %3
TO: %4 = mul i3 %0, %0
```
The best way to fix this is to inform `RewriteExprTree` to clear flags
of the whole expr tree when weight reduction happens.
But I find that weight reduction based on Carmichael number never
happens in practice.
See the coverage result
https://dtcxzyw.github.io/llvm-opt-benchmark/coverage/home/dtcxzyw/llvm-project/llvm/lib/Transforms/Scalar/Reassociate.cpp.html#L323
I think it would be better to drop `IncorporateWeight`.
Fixes#91417
We can guarantee NSW on all operands in a reassociated add expression
tree when:
- All adds in an add operator tree are NSW, AND either
- All add operands are guaranteed to be nonnegative, OR
- All adds are also NUW
- Alive2:
- Nonnegative Operands
- 3 operands: https://alive2.llvm.org/ce/z/G4XW6Q
- 4 operands: https://alive2.llvm.org/ce/z/FWcZ6D
- NUW NSW adds: https://alive2.llvm.org/ce/z/vRUxeC
---------
Co-authored-by: Nikita Popov <github@npopov.com>
Change all the cstval_pred_ty based PatternMatch helpers (things like
m_AllOnes and m_Zero) to only allow poison elements inside vector
splats, not undef elements.
Historically, we used to represent non-demanded elements in vectors
using undef. Nowadays, we use poison instead. As such, I believe that
support for undef in vector splats is no longer useful.
At the same time, while poison splat elements are pretty much always
safe to ignore, this is not generally the case for undef elements. We
have existing miscompiles in our tests due to this (see the
masked-merge-*.ll tests changed here) and it's easy to miss such cases
in the future, now that we write tests using poison instead of undef
elements.
I think overall, keeping support for undef elements no longer makes
sense, and we should drop it. Once this is done consistently, I think we
may also consider allowing poison in m_APInt by default, as doing that
change is much less risky than doing the same with undef.
This change involves a substantial amount of test changes. For most
tests, I've just replaced undef with poison, as I don't think there is
value in retaining both. For some tests (where the distinction between
undef and poison is important), I've duplicated tests.
Currently, the builtins used for implementing `va_list` handling
unconditionally take their arguments as unqualified `ptr`s i.e. pointers
to AS 0. This does not work for targets where the default AS is not 0 or
AS 0 is not a viable AS (for example, a target might choose 0 to
represent the constant address space). This patch changes the builtins'
signature to take generic `anyptr` args, which corrects this issue. It
is noisy due to the number of tests affected. A test for an upstream
target which does not use 0 as its default AS (SPIRV for HIP device
compilations) is added as well.
This patch does the following folds:
```
(select A && B, T, F) -> (select A, (select B, T, F), F)
(select A || B, T, F) -> (select A, T, (select B, T, F))
```
if `(select B, T, F)` can be folded into a value or a canonicalized SPF.
Alive2: https://alive2.llvm.org/ce/z/4Bdrbu
The original motivation of this patch is to simplify the following
pattern:
```
%.sroa.speculated.i = tail call i64 @llvm.umax.i64(i64 %sub.ptr.div.i.i, i64 1)
%add.i = add i64 %.sroa.speculated.i, %sub.ptr.div.i.i
%cmp7.i = icmp ult i64 %add.i, %sub.ptr.div.i.i
%cmp9.i = icmp ugt i64 %add.i, 1152921504606846975
%or.cond.i = or i1 %cmp7.i, %cmp9.i
%cond.i = select i1 %or.cond.i, i64 1152921504606846975, i64 %add.i
->
%.sroa.speculated.i = tail call i64 @llvm.umax.i64(i64 %sub.ptr.div.i.i, i64 1)
%add.i = add i64 %.sroa.speculated.i, %sub.ptr.div.i.i
%cmp7.i = icmp ult i64 %add.i, %sub.ptr.div.i.i
%max = call i64 @llvm.umax.i64(i64 %add.i, 1152921504606846975)
%cond.i = select i1 %cmp7.i, i64 1152921504606846975, i64 %max
```
The later form has a better codegen for some backends. It is also more
analysis-friendly than the original one.
Godbolt: https://godbolt.org/z/eK6eb5jf1
Alive2: https://alive2.llvm.org/ce/z/VHlxL2
Compile-time impact:
http://llvm-compile-time-tracker.com/compare.php?from=7c71d3996a72b9b024622f23bf556539b961c88c&to=638ce8666fadaca1ab2639a3c2bc52a4a8508f40&stat=instructions:u
|stage1-O3|stage1-ReleaseThinLTO|stage1-ReleaseLTO-g|stage1-O0-g|stage2-O3|stage2-O0-g|stage2-clang|
|--|--|--|--|--|--|--|
|+0.02%|-0.00%|+0.02%|-0.03%|-0.00%|-0.05%|-0.00%|
It is an alternative to #76203 and #76363 because we can simplify
`select (icmp eq/ne a, b), a, b` into `b` or `a`.
Fixes#75784.
Fixes#76043.
Thank @XChy for providing additional tests.
Co-authored-by: XChy <xxs_chy@outlook.com>
Original message:
We still have to keep the noCommonBitsSet call to handle multiple reassociations in one pass. We'll lose the flag on the first reassociation.
This patch re-implements a variety of debug-info maintenence functions
to use DPValues instead of DbgValueInst's: supporting the "new"
non-intrinsic representation of debug-info. As per [0], we need to have
parallel implementations of various utilities for a time, and these are
the most fundamental utilities used throughout the compiler.
I've added --try-experimental-debuginfo-iterators to a variety of RUN
lines: this is a flag that turns on "new debug-info" if it's built into
LLVM, and not otherwise. This should ensure that we have the same
behaviour for the same IR inputs, but using a different internal
representation. For the most part these changes affect SROA/Mem2Reg
promotion of dbg.declares into dbg.value intrinsics (now DPValues),
we're leaving dbg.declares as instructions until later in the day.
There's also some salvaging changes made.
I believe the tests that I've added cover almost all the code being
updated here. The only thing I'm not confident about is SimplifyCFG,
which calls rewriteDebugUsers down a variety of code paths. Those
changes can't immediately get full coverage as an additional patch is
needed that updates handling of Unreachable instructions, will upload
that shortly.
[0]
https://discourse.llvm.org/t/rfc-instruction-api-changes-needed-to-eliminate-debug-intrinsics-from-ir/68939/9
Reassociation destroys nsw/nuw flags from BinOps that are changed. But if the
expression at the end of a tree that was altered, but didn't change itself,
the flags do not need to be removed. For example, if %a, %b and %c are
reassociated in
%x = add nsw i32 %a, %c
%y = add nsw i32 %x, %b
%z = add nsw i32 %y, %d
The value of %y and so add %y %d remains the same, and %z needn't drop the nsw
flags.
https://alive2.llvm.org/ce/z/_juAiV
Differential Revision: https://reviews.llvm.org/D154289
# TL;DR #
This patch constrains how much freedom the heuristic that tries to from CSE
expressions has. The added constrain is that the CSE-able expressions must be
within the same basic block as the expressions they get moved before.
# Details #
The reassociation pass currently tweaks the rewrite of the final expression
towards surfacing pairs of operands that would be CSE-able.
This heuristic applies after the regular ordering of the expression.
The regular ordering uses the program structure to choose in which order each
subexpression is materialized. That order follows the topological order.
Now, to expose more CSE opportunities, this heurisitc effectively bypasses the
previous ordering normally defined by the program and pushes up sub-expressions
that are arbitrary deep in the CFG.
E.g., let's say the program order (top to bottom) gives `((a*b)*c)*d)*e` and
`b*e` appears the most in the program. The expression will be reordered in
`(((b*e)*a)*c)*d`
This reordering implies that all the sub expressions (in this example `xx*a`,
then `yy*c`, etc.) will need to appear after the CSE-able expression.
This may over-constrain where the (sub) expressions may live and in particular
it may create loop-dependent expressions.
This patch only allows to move expressions up the expression chain when the
related values are definied in the same basic block as the ones they
"push-down".
This constrain is far for being perfect but at least it avoids accidentally
creating loop dependent variables.
If we really want to expose CSE-able expressions in a proper way, we would need
a profitability metric and also make the decision globally as opposed to one
chain at a time.
I've put the new constrain behind an option to make comparing the old and new
versions easy. However, I believe that even if we find cases where the old
version performs better it is probably by accident. What I am aiming for with
this change is more predictability, then we can improve if need be.
This fixes www.llvm.org/PR61458
Differential Revision: https://reviews.llvm.org/D147457
Should cover most of the tests for GVN, GVNHoist, GVNSink, GlobalOpt,
GlobalSplit, InstCombine, Reassociate, SROA and TailCallElim that
had not been updated earlier.
canonicalize-neg-const.ll had some issues. The script somehow decided
to delete half the run line and merge it with the example expression
(which it also deleted most of).
Another step towards getting rid of dependencies to the legacy
pass manager.
Primary change here is to just do -passes=foo instead of -foo in
simple situations (when running a single transform pass). But also
updated a few test running multiple passes.
Also removed some "duplicated" RUN lines in a few tests that where
using both -foo and -passes=foo syntax. No need to do the same kind
of testing twice.
`(A * -2**C) + B --> B - (A << C)`
https://alive2.llvm.org/ce/z/A6BWkf
This inverts what Negator was doing before:
D134310 / 0f32a5dea0e9
Analysis and codegen are generally better without multiply,
so we should favor this form even if we trade add for sub
(because those are generally equivalent cost operations).
As shown in the examples in issue #57683, we allow matching
vectors with poison (undef) in this transform (and possibly more),
but we can't then use the partially defined value as a replacement
value in other expressions blindly.
This seems to be avoided in simpler examples of reassociation,
and other passes should be able to clean up the redundant op
seen in these tests.
This simplifies the code and fixes handling for the callbr case,
where the instruction needs to be inserted in the normal
destination, rather than after the terminator.
Originally part of D129660.
Compiling with '-ffast-math' tuns on all the FastMathFlags (FMF), as
expected, and that enables FP reassociation. Only the two FMF flags
'reassoc' and 'nsz' are technically required to perform reassociation,
but disabling other unrelated FMF bits is needlessly suppressing the
optimization.
This patch fixes that needless suppression, and makes appropriate
adjustments to test-cases, fixing some outstanding TODOs in the process.
Fixes: #56483
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D129523
In analyzing issue #56483, it was noticed that running `opt` with
`-reassociate` was missing some minor optimizations. For example,
there were cases where the running `opt` on IR with floating-point
instructions that have the `fast` flags applied, sometimes resulted in
less efficient code than the input IR (things like dead instructions
left behind, and missed reassociations). These were sometimes noted
in the test-files with TODOs, to investigate further. This commit
fixes some of these problems, removing some TODOs in the process.
FTR, I refer to these as "minor" missed optimizations, because when
running a full clang/llvm compilation, these inefficiencies are not
happening, as other passes clean that residue up. Regardless, having
cleaner IR produced by `opt`, makes assessing the quality of fixes done
in `opt` easier.
Previously some constants were not pushed to the top of the resulting
expression tree as intended by the algorithm. We can remove the logic
from simplifyFAdd and rely on SimplifyAssociativeOrCommutative to do
that.
Differential Revision: https://reviews.llvm.org/D117302
This is a partial reapply of the original commit and the followup commit
that were previously reverted; this reapply also includes a small fix
for a potential source of non-determinism, but also has a small change
to turn off variadic debug value salvaging, to ensure that any future
revert/reapply steps to disable and renable this feature do not risk
causing conflicts.
Differential Revision: https://reviews.llvm.org/D91722
This reverts commit 386b66b2fc297cda121a3cc8a36887a6ecbcfc68.
> This reapplies c0f3dfb9, which was reverted following the discovery of
> crashes on linux kernel and chromium builds - these issues have since
> been fixed, allowing this patch to re-land.
This reverts commit 36ec97f76ac0d8be76fb16ac521f55126766267d.
The change caused non-determinism in the compiler, see comments on the code
review at https://reviews.llvm.org/D91722.
Reverting to unbreak people's builds until that can be addressed.
This also reverts the follow-up "[DebugInfo] Limit the number of values
that may be referenced by a dbg.value" in
a0bd6105d80698c53ceaa64bbe6e3b7e7bbf99ee.
This reapplies c0f3dfb9, which was reverted following the discovery of
crashes on linux kernel and chromium builds - these issues have since
been fixed, allowing this patch to re-land.
This reverts commit 4397b7095d640f9b9426c4d0135e999c5a1de1c5.
Previous build failures were caused by an error in bitcode reading and
writing for DIArgList metadata, which has been fixed in e5d844b587.
There were also some unnecessary asserts that were being triggered on
certain builds, which have been removed.
This reverts commit dad5caa59e6b2bde8d6cf5b64a972c393c526c82.
The causes of the previous build errors have been fixed in revisions
aa3e78a59fdf3b211be72f1b3221af831665e67d, and
140757bfaaa00110a92d2247a910c847e6e3bcc8
This reverts commit f40976bd01032f4905dde361e709166704581077.