When ranking operands for an expression tree the reassociate pass also
perform canonicalization, putting constants on the right hand side. Such
transforms was however not registered as modifying the IR. So at the end
of the pass, if not having made any other changes, the pass returned
that all analyses should be kept.
With this patch we make sure to set MadeChange to true when modifying
the IR via canonicalizeOperands. This is to make sure analyses such as
DemandedBits are properly invalidated when instructions are modified.
As part of reassociating add instructions, we may factorize some of the
adds and produce a mul instruction; this patch propagates the source
location of the reassociated tree of instructions to the new mul.
Found using https://github.com/llvm/llvm-project/pull/107279.
As part of RemoveFactorFromExpression, we attempt to remove a factor
from a mul/fmul expression; this may involve generating new
instructions, e.g. to negate the result if the factor was negative in
the original expression. When this happens, the new instructions should
have a DebugLoc set from the instruction that the factored expression is
being used to compute.
Found using https://github.com/llvm/llvm-project/pull/107279.
Currently in Reassociate we may create a set of new instructions when
optimizing an `add`, but we do not set DebugLocs on the new
instructions; this patch propagates the add's DebugLoc to the new
instructions.
Found using #107279.
These date back to when the non-intrinsic format of variable locations
was still being tested and was behind a compile-time flag, so not all
builds / bots would correctly run them. The solution at the time, to get
at least some test coverage, was to have tests opt-in to non-intrinsic
debug-info if it was built into LLVM.
Nowadays, non-intrinsic format is the default and has been on for more
than a year, there's no need for this flag to exist.
(I've downgraded the flag from "try" to explicitly requesting
non-intrinsic format in some places, so that we can deal with tests that
are explicitly about non-intrinsic format in their own commit).
This PR removes tests with `br i1 undef` under
`llvm/tests/Transforms/ObjCARC, Reassociate, SCCP, SLPVectorizer...`.
After this PR, I'll continue to fix tests under `llvm/tests/CodeGen`,
which has more UB tests than `llvm/tests/Transforms`.
In NegateValue in Reassociate, we return the negation of an existing
value in order to break a subtract into an negate + add, potentially
creating a new instruction to perform the negation, but we neglect to
propagate the DebugLoc of the sub being replaced to the negate
instruction if one is created. This patch adds that propagation.
Found using https://github.com/llvm/llvm-project/pull/107279.
The idea behind this canonicalization is that it allows us to handle less
patterns, because we know that some will be canonicalized away. This is
indeed very useful to e.g. know that constants are always on the right.
However, this is only useful if the canonicalization is actually
reliable. This is the case for constants, but not for arguments: Moving
these to the right makes it look like the "more complex" expression is
guaranteed to be on the left, but this is not actually the case in
practice. It fails as soon as you replace the argument with another
instruction.
The end result is that it looks like things correctly work in tests,
while they actually don't. We use the "thwart complexity-based
canonicalization" trick to handle this in tests, but it's often a
challenge for new contributors to get this right, and based on the
regressions this PR originally exposed, we clearly don't get this right
in many cases.
For this reason, I think that it's better to remove this complexity
canonicalization. It will make it much easier to write tests for
commuted cases and make sure that they are handled.
This patch makes the final major change of the RemoveDIs project, changing the
default IR output from debug intrinsics to debug records. This is expected to
break a large number of tests: every single one that tests for uses or
declarations of debug intrinsics and does not explicitly disable writing
records.
If this patch has broken your downstream tests (or upstream tests on a
configuration I wasn't able to run):
1. If you need to immediately unblock a build, pass
`--write-experimental-debuginfo=false` to LLVM's option processing for all
failing tests (remember to use `-mllvm` for clang/flang to forward arguments to
LLVM).
2. For most test failures, the changes are trivial and mechanical, enough that
they can be done by script; see the migration guide for a guide on how to do
this: https://llvm.org/docs/RemoveDIsDebugInfo.html#test-updates
3. If any tests fail for reasons other than FileCheck check lines that need
updating, such as assertion failures, that is most likely a real bug with this
patch and should be reported as such.
For more information, see the recent PSA:
https://discourse.llvm.org/t/psa-ir-output-changing-from-debug-intrinsics-to-debug-records/79578
See the following case: https://alive2.llvm.org/ce/z/A-fBki
```
define i3 @src(i3 %0) {
%2 = mul i3 %0, %0
%3 = mul i3 %2, %0
%4 = mul i3 %3, %0
%5 = mul nsw i3 %4, %0
ret i3 %5
}
define i3 @tgt(i3 %0) {
%2 = mul i3 %0, %0
%5 = mul nsw i3 %2, %0
ret i3 %5
}
```
d7aeefebd6
introduced weight reduction during weights combination of the same
operand. As the weight of `%0` changes from 5 to 3, the nsw flag in `%5`
should be dropped.
However, the nsw flag isn't cleared by `RewriteExprTree` since `%5 = mul
nsw i3 %0, %4` is not included in the range of `[ExpressionChangedStart,
ExpressionChangedEnd)`.
```
Calculated Rank[] = 3
Combine negations for: %2 = mul i3 %0, %0
Calculated Rank[] = 4
Combine negations for: %3 = mul i3 %0, %2
Calculated Rank[] = 5
Combine negations for: %4 = mul i3 %0, %3
Calculated Rank[] = 6
Combine negations for: %5 = mul nsw i3 %0, %4
LINEARIZE: %5 = mul nsw i3 %0, %4
OPERAND: i3 %0 (1)
ADD USES LEAF: i3 %0 (1)
OPERAND: %4 = mul i3 %0, %3 (1)
DIRECT ADD: %4 = mul i3 %0, %3 (1)
OPERAND: i3 %0 (1)
OPERAND: %3 = mul i3 %0, %2 (1)
DIRECT ADD: %3 = mul i3 %0, %2 (1)
OPERAND: i3 %0 (1)
OPERAND: %2 = mul i3 %0, %0 (1)
DIRECT ADD: %2 = mul i3 %0, %0 (1)
OPERAND: i3 %0 (1)
OPERAND: i3 %0 (1)
RAIn: mul i3 [ %0, #3] [ %0, #3] [ %0, #3]
RAOut: mul i3 [ %0, #3] [ %0, #3] [ %0, #3]
RAOut after CSE reorder: mul i3 [ %0, #3] [ %0, #3] [ %0, #3]
RA: %5 = mul nsw i3 %0, %4
TO: %5 = mul nsw i3 %4, %0
RA: %4 = mul i3 %0, %3
TO: %4 = mul i3 %0, %0
```
The best way to fix this is to inform `RewriteExprTree` to clear flags
of the whole expr tree when weight reduction happens.
But I find that weight reduction based on Carmichael number never
happens in practice.
See the coverage result
https://dtcxzyw.github.io/llvm-opt-benchmark/coverage/home/dtcxzyw/llvm-project/llvm/lib/Transforms/Scalar/Reassociate.cpp.html#L323
I think it would be better to drop `IncorporateWeight`.
Fixes#91417
We can guarantee NSW on all operands in a reassociated add expression
tree when:
- All adds in an add operator tree are NSW, AND either
- All add operands are guaranteed to be nonnegative, OR
- All adds are also NUW
- Alive2:
- Nonnegative Operands
- 3 operands: https://alive2.llvm.org/ce/z/G4XW6Q
- 4 operands: https://alive2.llvm.org/ce/z/FWcZ6D
- NUW NSW adds: https://alive2.llvm.org/ce/z/vRUxeC
---------
Co-authored-by: Nikita Popov <github@npopov.com>
Change all the cstval_pred_ty based PatternMatch helpers (things like
m_AllOnes and m_Zero) to only allow poison elements inside vector
splats, not undef elements.
Historically, we used to represent non-demanded elements in vectors
using undef. Nowadays, we use poison instead. As such, I believe that
support for undef in vector splats is no longer useful.
At the same time, while poison splat elements are pretty much always
safe to ignore, this is not generally the case for undef elements. We
have existing miscompiles in our tests due to this (see the
masked-merge-*.ll tests changed here) and it's easy to miss such cases
in the future, now that we write tests using poison instead of undef
elements.
I think overall, keeping support for undef elements no longer makes
sense, and we should drop it. Once this is done consistently, I think we
may also consider allowing poison in m_APInt by default, as doing that
change is much less risky than doing the same with undef.
This change involves a substantial amount of test changes. For most
tests, I've just replaced undef with poison, as I don't think there is
value in retaining both. For some tests (where the distinction between
undef and poison is important), I've duplicated tests.
Currently, the builtins used for implementing `va_list` handling
unconditionally take their arguments as unqualified `ptr`s i.e. pointers
to AS 0. This does not work for targets where the default AS is not 0 or
AS 0 is not a viable AS (for example, a target might choose 0 to
represent the constant address space). This patch changes the builtins'
signature to take generic `anyptr` args, which corrects this issue. It
is noisy due to the number of tests affected. A test for an upstream
target which does not use 0 as its default AS (SPIRV for HIP device
compilations) is added as well.
This patch does the following folds:
```
(select A && B, T, F) -> (select A, (select B, T, F), F)
(select A || B, T, F) -> (select A, T, (select B, T, F))
```
if `(select B, T, F)` can be folded into a value or a canonicalized SPF.
Alive2: https://alive2.llvm.org/ce/z/4Bdrbu
The original motivation of this patch is to simplify the following
pattern:
```
%.sroa.speculated.i = tail call i64 @llvm.umax.i64(i64 %sub.ptr.div.i.i, i64 1)
%add.i = add i64 %.sroa.speculated.i, %sub.ptr.div.i.i
%cmp7.i = icmp ult i64 %add.i, %sub.ptr.div.i.i
%cmp9.i = icmp ugt i64 %add.i, 1152921504606846975
%or.cond.i = or i1 %cmp7.i, %cmp9.i
%cond.i = select i1 %or.cond.i, i64 1152921504606846975, i64 %add.i
->
%.sroa.speculated.i = tail call i64 @llvm.umax.i64(i64 %sub.ptr.div.i.i, i64 1)
%add.i = add i64 %.sroa.speculated.i, %sub.ptr.div.i.i
%cmp7.i = icmp ult i64 %add.i, %sub.ptr.div.i.i
%max = call i64 @llvm.umax.i64(i64 %add.i, 1152921504606846975)
%cond.i = select i1 %cmp7.i, i64 1152921504606846975, i64 %max
```
The later form has a better codegen for some backends. It is also more
analysis-friendly than the original one.
Godbolt: https://godbolt.org/z/eK6eb5jf1
Alive2: https://alive2.llvm.org/ce/z/VHlxL2
Compile-time impact:
http://llvm-compile-time-tracker.com/compare.php?from=7c71d3996a72b9b024622f23bf556539b961c88c&to=638ce8666fadaca1ab2639a3c2bc52a4a8508f40&stat=instructions:u
|stage1-O3|stage1-ReleaseThinLTO|stage1-ReleaseLTO-g|stage1-O0-g|stage2-O3|stage2-O0-g|stage2-clang|
|--|--|--|--|--|--|--|
|+0.02%|-0.00%|+0.02%|-0.03%|-0.00%|-0.05%|-0.00%|
It is an alternative to #76203 and #76363 because we can simplify
`select (icmp eq/ne a, b), a, b` into `b` or `a`.
Fixes#75784.
Fixes#76043.
Thank @XChy for providing additional tests.
Co-authored-by: XChy <xxs_chy@outlook.com>
Original message:
We still have to keep the noCommonBitsSet call to handle multiple reassociations in one pass. We'll lose the flag on the first reassociation.
This patch re-implements a variety of debug-info maintenence functions
to use DPValues instead of DbgValueInst's: supporting the "new"
non-intrinsic representation of debug-info. As per [0], we need to have
parallel implementations of various utilities for a time, and these are
the most fundamental utilities used throughout the compiler.
I've added --try-experimental-debuginfo-iterators to a variety of RUN
lines: this is a flag that turns on "new debug-info" if it's built into
LLVM, and not otherwise. This should ensure that we have the same
behaviour for the same IR inputs, but using a different internal
representation. For the most part these changes affect SROA/Mem2Reg
promotion of dbg.declares into dbg.value intrinsics (now DPValues),
we're leaving dbg.declares as instructions until later in the day.
There's also some salvaging changes made.
I believe the tests that I've added cover almost all the code being
updated here. The only thing I'm not confident about is SimplifyCFG,
which calls rewriteDebugUsers down a variety of code paths. Those
changes can't immediately get full coverage as an additional patch is
needed that updates handling of Unreachable instructions, will upload
that shortly.
[0]
https://discourse.llvm.org/t/rfc-instruction-api-changes-needed-to-eliminate-debug-intrinsics-from-ir/68939/9
Reassociation destroys nsw/nuw flags from BinOps that are changed. But if the
expression at the end of a tree that was altered, but didn't change itself,
the flags do not need to be removed. For example, if %a, %b and %c are
reassociated in
%x = add nsw i32 %a, %c
%y = add nsw i32 %x, %b
%z = add nsw i32 %y, %d
The value of %y and so add %y %d remains the same, and %z needn't drop the nsw
flags.
https://alive2.llvm.org/ce/z/_juAiV
Differential Revision: https://reviews.llvm.org/D154289
# TL;DR #
This patch constrains how much freedom the heuristic that tries to from CSE
expressions has. The added constrain is that the CSE-able expressions must be
within the same basic block as the expressions they get moved before.
# Details #
The reassociation pass currently tweaks the rewrite of the final expression
towards surfacing pairs of operands that would be CSE-able.
This heuristic applies after the regular ordering of the expression.
The regular ordering uses the program structure to choose in which order each
subexpression is materialized. That order follows the topological order.
Now, to expose more CSE opportunities, this heurisitc effectively bypasses the
previous ordering normally defined by the program and pushes up sub-expressions
that are arbitrary deep in the CFG.
E.g., let's say the program order (top to bottom) gives `((a*b)*c)*d)*e` and
`b*e` appears the most in the program. The expression will be reordered in
`(((b*e)*a)*c)*d`
This reordering implies that all the sub expressions (in this example `xx*a`,
then `yy*c`, etc.) will need to appear after the CSE-able expression.
This may over-constrain where the (sub) expressions may live and in particular
it may create loop-dependent expressions.
This patch only allows to move expressions up the expression chain when the
related values are definied in the same basic block as the ones they
"push-down".
This constrain is far for being perfect but at least it avoids accidentally
creating loop dependent variables.
If we really want to expose CSE-able expressions in a proper way, we would need
a profitability metric and also make the decision globally as opposed to one
chain at a time.
I've put the new constrain behind an option to make comparing the old and new
versions easy. However, I believe that even if we find cases where the old
version performs better it is probably by accident. What I am aiming for with
this change is more predictability, then we can improve if need be.
This fixes www.llvm.org/PR61458
Differential Revision: https://reviews.llvm.org/D147457
Should cover most of the tests for GVN, GVNHoist, GVNSink, GlobalOpt,
GlobalSplit, InstCombine, Reassociate, SROA and TailCallElim that
had not been updated earlier.
canonicalize-neg-const.ll had some issues. The script somehow decided
to delete half the run line and merge it with the example expression
(which it also deleted most of).
Another step towards getting rid of dependencies to the legacy
pass manager.
Primary change here is to just do -passes=foo instead of -foo in
simple situations (when running a single transform pass). But also
updated a few test running multiple passes.
Also removed some "duplicated" RUN lines in a few tests that where
using both -foo and -passes=foo syntax. No need to do the same kind
of testing twice.
`(A * -2**C) + B --> B - (A << C)`
https://alive2.llvm.org/ce/z/A6BWkf
This inverts what Negator was doing before:
D134310 / 0f32a5dea0e9
Analysis and codegen are generally better without multiply,
so we should favor this form even if we trade add for sub
(because those are generally equivalent cost operations).
As shown in the examples in issue #57683, we allow matching
vectors with poison (undef) in this transform (and possibly more),
but we can't then use the partially defined value as a replacement
value in other expressions blindly.
This seems to be avoided in simpler examples of reassociation,
and other passes should be able to clean up the redundant op
seen in these tests.
This simplifies the code and fixes handling for the callbr case,
where the instruction needs to be inserted in the normal
destination, rather than after the terminator.
Originally part of D129660.