Currently, the builtins used for implementing `va_list` handling
unconditionally take their arguments as unqualified `ptr`s i.e. pointers
to AS 0. This does not work for targets where the default AS is not 0 or
AS 0 is not a viable AS (for example, a target might choose 0 to
represent the constant address space). This patch changes the builtins'
signature to take generic `anyptr` args, which corrects this issue. It
is noisy due to the number of tests affected. A test for an upstream
target which does not use 0 as its default AS (SPIRV for HIP device
compilations) is added as well.
This patch does the following folds:
```
(select A && B, T, F) -> (select A, (select B, T, F), F)
(select A || B, T, F) -> (select A, T, (select B, T, F))
```
if `(select B, T, F)` can be folded into a value or a canonicalized SPF.
Alive2: https://alive2.llvm.org/ce/z/4Bdrbu
The original motivation of this patch is to simplify the following
pattern:
```
%.sroa.speculated.i = tail call i64 @llvm.umax.i64(i64 %sub.ptr.div.i.i, i64 1)
%add.i = add i64 %.sroa.speculated.i, %sub.ptr.div.i.i
%cmp7.i = icmp ult i64 %add.i, %sub.ptr.div.i.i
%cmp9.i = icmp ugt i64 %add.i, 1152921504606846975
%or.cond.i = or i1 %cmp7.i, %cmp9.i
%cond.i = select i1 %or.cond.i, i64 1152921504606846975, i64 %add.i
->
%.sroa.speculated.i = tail call i64 @llvm.umax.i64(i64 %sub.ptr.div.i.i, i64 1)
%add.i = add i64 %.sroa.speculated.i, %sub.ptr.div.i.i
%cmp7.i = icmp ult i64 %add.i, %sub.ptr.div.i.i
%max = call i64 @llvm.umax.i64(i64 %add.i, 1152921504606846975)
%cond.i = select i1 %cmp7.i, i64 1152921504606846975, i64 %max
```
The later form has a better codegen for some backends. It is also more
analysis-friendly than the original one.
Godbolt: https://godbolt.org/z/eK6eb5jf1
Alive2: https://alive2.llvm.org/ce/z/VHlxL2
Compile-time impact:
http://llvm-compile-time-tracker.com/compare.php?from=7c71d3996a72b9b024622f23bf556539b961c88c&to=638ce8666fadaca1ab2639a3c2bc52a4a8508f40&stat=instructions:u
|stage1-O3|stage1-ReleaseThinLTO|stage1-ReleaseLTO-g|stage1-O0-g|stage2-O3|stage2-O0-g|stage2-clang|
|--|--|--|--|--|--|--|
|+0.02%|-0.00%|+0.02%|-0.03%|-0.00%|-0.05%|-0.00%|
It is an alternative to #76203 and #76363 because we can simplify
`select (icmp eq/ne a, b), a, b` into `b` or `a`.
Fixes#75784.
Fixes#76043.
Thank @XChy for providing additional tests.
Co-authored-by: XChy <xxs_chy@outlook.com>
Original message:
We still have to keep the noCommonBitsSet call to handle multiple reassociations in one pass. We'll lose the flag on the first reassociation.
This patch re-implements a variety of debug-info maintenence functions
to use DPValues instead of DbgValueInst's: supporting the "new"
non-intrinsic representation of debug-info. As per [0], we need to have
parallel implementations of various utilities for a time, and these are
the most fundamental utilities used throughout the compiler.
I've added --try-experimental-debuginfo-iterators to a variety of RUN
lines: this is a flag that turns on "new debug-info" if it's built into
LLVM, and not otherwise. This should ensure that we have the same
behaviour for the same IR inputs, but using a different internal
representation. For the most part these changes affect SROA/Mem2Reg
promotion of dbg.declares into dbg.value intrinsics (now DPValues),
we're leaving dbg.declares as instructions until later in the day.
There's also some salvaging changes made.
I believe the tests that I've added cover almost all the code being
updated here. The only thing I'm not confident about is SimplifyCFG,
which calls rewriteDebugUsers down a variety of code paths. Those
changes can't immediately get full coverage as an additional patch is
needed that updates handling of Unreachable instructions, will upload
that shortly.
[0]
https://discourse.llvm.org/t/rfc-instruction-api-changes-needed-to-eliminate-debug-intrinsics-from-ir/68939/9
Reassociation destroys nsw/nuw flags from BinOps that are changed. But if the
expression at the end of a tree that was altered, but didn't change itself,
the flags do not need to be removed. For example, if %a, %b and %c are
reassociated in
%x = add nsw i32 %a, %c
%y = add nsw i32 %x, %b
%z = add nsw i32 %y, %d
The value of %y and so add %y %d remains the same, and %z needn't drop the nsw
flags.
https://alive2.llvm.org/ce/z/_juAiV
Differential Revision: https://reviews.llvm.org/D154289
# TL;DR #
This patch constrains how much freedom the heuristic that tries to from CSE
expressions has. The added constrain is that the CSE-able expressions must be
within the same basic block as the expressions they get moved before.
# Details #
The reassociation pass currently tweaks the rewrite of the final expression
towards surfacing pairs of operands that would be CSE-able.
This heuristic applies after the regular ordering of the expression.
The regular ordering uses the program structure to choose in which order each
subexpression is materialized. That order follows the topological order.
Now, to expose more CSE opportunities, this heurisitc effectively bypasses the
previous ordering normally defined by the program and pushes up sub-expressions
that are arbitrary deep in the CFG.
E.g., let's say the program order (top to bottom) gives `((a*b)*c)*d)*e` and
`b*e` appears the most in the program. The expression will be reordered in
`(((b*e)*a)*c)*d`
This reordering implies that all the sub expressions (in this example `xx*a`,
then `yy*c`, etc.) will need to appear after the CSE-able expression.
This may over-constrain where the (sub) expressions may live and in particular
it may create loop-dependent expressions.
This patch only allows to move expressions up the expression chain when the
related values are definied in the same basic block as the ones they
"push-down".
This constrain is far for being perfect but at least it avoids accidentally
creating loop dependent variables.
If we really want to expose CSE-able expressions in a proper way, we would need
a profitability metric and also make the decision globally as opposed to one
chain at a time.
I've put the new constrain behind an option to make comparing the old and new
versions easy. However, I believe that even if we find cases where the old
version performs better it is probably by accident. What I am aiming for with
this change is more predictability, then we can improve if need be.
This fixes www.llvm.org/PR61458
Differential Revision: https://reviews.llvm.org/D147457
Should cover most of the tests for GVN, GVNHoist, GVNSink, GlobalOpt,
GlobalSplit, InstCombine, Reassociate, SROA and TailCallElim that
had not been updated earlier.
canonicalize-neg-const.ll had some issues. The script somehow decided
to delete half the run line and merge it with the example expression
(which it also deleted most of).
Another step towards getting rid of dependencies to the legacy
pass manager.
Primary change here is to just do -passes=foo instead of -foo in
simple situations (when running a single transform pass). But also
updated a few test running multiple passes.
Also removed some "duplicated" RUN lines in a few tests that where
using both -foo and -passes=foo syntax. No need to do the same kind
of testing twice.
`(A * -2**C) + B --> B - (A << C)`
https://alive2.llvm.org/ce/z/A6BWkf
This inverts what Negator was doing before:
D134310 / 0f32a5dea0e9
Analysis and codegen are generally better without multiply,
so we should favor this form even if we trade add for sub
(because those are generally equivalent cost operations).
As shown in the examples in issue #57683, we allow matching
vectors with poison (undef) in this transform (and possibly more),
but we can't then use the partially defined value as a replacement
value in other expressions blindly.
This seems to be avoided in simpler examples of reassociation,
and other passes should be able to clean up the redundant op
seen in these tests.
This simplifies the code and fixes handling for the callbr case,
where the instruction needs to be inserted in the normal
destination, rather than after the terminator.
Originally part of D129660.
Compiling with '-ffast-math' tuns on all the FastMathFlags (FMF), as
expected, and that enables FP reassociation. Only the two FMF flags
'reassoc' and 'nsz' are technically required to perform reassociation,
but disabling other unrelated FMF bits is needlessly suppressing the
optimization.
This patch fixes that needless suppression, and makes appropriate
adjustments to test-cases, fixing some outstanding TODOs in the process.
Fixes: #56483
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D129523
In analyzing issue #56483, it was noticed that running `opt` with
`-reassociate` was missing some minor optimizations. For example,
there were cases where the running `opt` on IR with floating-point
instructions that have the `fast` flags applied, sometimes resulted in
less efficient code than the input IR (things like dead instructions
left behind, and missed reassociations). These were sometimes noted
in the test-files with TODOs, to investigate further. This commit
fixes some of these problems, removing some TODOs in the process.
FTR, I refer to these as "minor" missed optimizations, because when
running a full clang/llvm compilation, these inefficiencies are not
happening, as other passes clean that residue up. Regardless, having
cleaner IR produced by `opt`, makes assessing the quality of fixes done
in `opt` easier.
Previously some constants were not pushed to the top of the resulting
expression tree as intended by the algorithm. We can remove the logic
from simplifyFAdd and rely on SimplifyAssociativeOrCommutative to do
that.
Differential Revision: https://reviews.llvm.org/D117302
This is a partial reapply of the original commit and the followup commit
that were previously reverted; this reapply also includes a small fix
for a potential source of non-determinism, but also has a small change
to turn off variadic debug value salvaging, to ensure that any future
revert/reapply steps to disable and renable this feature do not risk
causing conflicts.
Differential Revision: https://reviews.llvm.org/D91722
This reverts commit 386b66b2fc297cda121a3cc8a36887a6ecbcfc68.
> This reapplies c0f3dfb9, which was reverted following the discovery of
> crashes on linux kernel and chromium builds - these issues have since
> been fixed, allowing this patch to re-land.
This reverts commit 36ec97f76ac0d8be76fb16ac521f55126766267d.
The change caused non-determinism in the compiler, see comments on the code
review at https://reviews.llvm.org/D91722.
Reverting to unbreak people's builds until that can be addressed.
This also reverts the follow-up "[DebugInfo] Limit the number of values
that may be referenced by a dbg.value" in
a0bd6105d80698c53ceaa64bbe6e3b7e7bbf99ee.
This reapplies c0f3dfb9, which was reverted following the discovery of
crashes on linux kernel and chromium builds - these issues have since
been fixed, allowing this patch to re-land.
This reverts commit 4397b7095d640f9b9426c4d0135e999c5a1de1c5.
Previous build failures were caused by an error in bitcode reading and
writing for DIArgList metadata, which has been fixed in e5d844b587.
There were also some unnecessary asserts that were being triggered on
certain builds, which have been removed.
This reverts commit dad5caa59e6b2bde8d6cf5b64a972c393c526c82.
The causes of the previous build errors have been fixed in revisions
aa3e78a59fdf3b211be72f1b3221af831665e67d, and
140757bfaaa00110a92d2247a910c847e6e3bcc8
This reverts commit f40976bd01032f4905dde361e709166704581077.
This patch improves salvageDebugInfoImpl by allowing it to salvage arithmetic
operations with two or more non-const operands; this includes the GetElementPtr
instruction, and most Binary Operator instructions. These salvages produce
DIArgList locations and are only valid for dbg.values, as currently variadic
DIExpressions must use DW_OP_stack_value. This functionality is also only added
for salvageDebugInfoForDbgValues; other functions that directly call
salvageDebugInfoImpl (such as in ISel or Coroutine frame building) can be
updated in a later patch.
Differential Revision: https://reviews.llvm.org/D91722
As discussed in:
https://llvm.org/PR49055
We invert instcombine's add->or transform here
because it makes it easier to identify factorization
transforms like the mul in the motivating test.
This extends the logic added with:
https://reviews.llvm.org/rG70472f3https://reviews.llvm.org/rG93f3d7f
(I intentionally kept the formatting fix in this patch
to provide more context about the calling logic.)
With the addition of the `willreturn` attribute, functions that may
not return (e.g. due to an infinite loop) are well defined, if they are
not marked as `willreturn`.
This patch updates `wouldInstructionBeTriviallyDead` to not consider
calls that may not return as dead.
This patch still provides an escape hatch for intrinsics, which are
still assumed as willreturn unconditionally. It will be removed once
all intrinsics definitions have been reviewed and updated.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D94106
As Wei Mi is reporting in post-commit review
https://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20201116/853479.html
teaching -reassociate about add-like-or's (70472f3) results in breaking apart
load widening patterns, and reassociating them.
For now, simply exclude any such `or` that appears to be a root of
load widening idiom from the or->add transformation.
Note that the heuristic is greedy, it doesn't ensure that loads
can *actually* be widened into a single load.
As Wei Mi is reporting in post-commit review:
https://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20201116/853479.html
teaching -reassociate about add-like-or's (70472f3) results in breaking apart
load widening patterns, and reassociating them.
While that's great, it prevents the actual load widening in backend,
and that is not good. We should have load widening in middle-end,
but for now we should at least not regress the naive patterns..
This is slightly better compile-time wise,
since we avoid potentially-costly knownbits analysis that will
ultimately not allow us to actually do anything with said `add`.
InstCombine is quite aggressive in doing the opposite transform,
folding `add` of operands with no common bits set into an `or`,
and that not many things support that new pattern..
In this case, teaching Reassociate about it is easy,
there's preexisting art for `sub`/`shl`:
just convert such an `or` into an `add`:
https://rise4fun.com/Alive/Xlyv
The two tests rely on LegacyInlinerBase::doFinalization to remove
Function::isDefTriviallyDead() functions. The new PM does not have the behavior.
The -NEXT patterns checking the emptiness are actually sufficient.
Note, reassociate-deadinst.ll has become stale - it no longer catches
the problem r285380 intended. Unfortunately it is difficult to craft a
new test because it is actually pretty difficult to break it with
`MadeChange = true;` all over the file.
This pass is like DeadCodeEliminationPass, but only does one pass
through a function instead of iterating on users of eliminated
instructions.
DeadCodeEliminationPass should be used in all cases.
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D87933
As discussed in D86843, -earlycse-debug-hash should be used in more regression
tests to catch inconsistency between the hashing and the equivalence check.
Differential Revision: https://reviews.llvm.org/D86863
To match NewPM pass name, and also for readability.
Also rename rpo-functionattrs -> rpo-function-attrs while we're here.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D84694