This is partial revert of cbca9ce91c6440f8815742b8a73a27aa81e806e6.
That commit removed the code guarding against min/max SPF patterns,
because those are now canonicalized to min/max intrinsics. However,
this is only true for integer min/max, while FP min/max can not
always be canonicalized to an intrinsic. As such, restore a
simplified version of the guard that handles only the FP case.
Fixes https://github.com/llvm/llvm-project/issues/64937.
When the 'int Four = Two;' is sunk into the 'case 0:' block,
the debug value for 'Three' is set incorrectly to 'poison'.
Reviewed By: aprantl
Differential Revision: https://reviews.llvm.org/D158171
This check is redundant as it is covered by the call to
isPotentiallyReachable.
Depends on D155726.
Differential Revision: https://reviews.llvm.org/D155718
The zext constant expression was detected by the fold, but then
handled as a sext. Use ZExtOperator instead of ZExtInst to handle
constant expressions.
Fixes https://github.com/llvm/llvm-project/issues/64669.
Set phi inputs to poison whenever we find a dead edge (either
during initial worklist population or the main InstCombine run),
instead of only doing this for successors of dead blocks.
This means that the phi operand is set to poison even if for
critical edges without an intermediate block.
There are quite a few test changes, because the pattern is fairly
common in vectorizer output, for cases where we know the vectorized
loop will be entered.
Improve foldOpIntoPhi() for icmp operator to check if incoming PHI value can be replaced with constant based on implied condition.
Depends on D156619.
Differential Revision: https://reviews.llvm.org/D156620
In degenerate cases, it is possible for unreachable code removal
to remove the current instruction. However, we still return the
instruction to report a change, resulting in a use after free.
Instead, perform the change reporting in the same way as
eraseInstFromFunction() does, by directly setting MadeIRChange
and returning nullptr.
Fixes https://github.com/llvm/llvm-project/issues/64235.
Fixes a regression introduced by D75362 for irreducible control
flow. In that case, we may visit the predecessor that renders
the current block live only later, and incorrectly determine
that a block is dead.
Instead, switch to using the same DeadEdges based implementation
we also use during the main InstCombine iteration.
This temporarily regresses some cases that need replacement of
dead phi operands with poison, which is currently only done during
the main run, but not worklist population. This will be addressed
in a followup, to keep it separate from the correctness fix here.
Fixes https://github.com/llvm/llvm-project/issues/64259.
InstCombine is a worklist-driven algorithm, which works roughly
as follows:
* All instructions are initially pushed to the worklist.
The initial order is in RPO program order.
* All newly inserted instructions get added to the worklist.
* When an instruction is folded, its users get added back to the
worklist.
* When the use-count of an instruction decreases, it gets added
back to the worklist.
* And a few of other heuristics on when we should revisit
instructions.
On top of the worklist algorithm, InstCombine layers an additional
fix-point iteration: If any fold was performed in the previous
iteration, then InstCombine will re-populate the worklist from
scratch and fold the entire function again. This continues until
a fix-point is reached.
In the vast majority of cases, InstCombine will reach a fix-point
within a single iteration: However, a second iteration is performed
to verify that this is indeed the fixpoint. We can see this in the
statistics for llvm-test-suite:
"instcombine.NumOneIteration": 411380,
"instcombine.NumTwoIterations": 117921,
"instcombine.NumThreeIterations": 236,
"instcombine.NumFourOrMoreIterations": 2,
The way to read these numbers is that in 411380 cases, InstCombine
performs no folds. In 117921 cases it performs a fold and reaches
the fix-point within one iteration (the second iteration verifies
the fixpoint). In the remaining 238 cases, more than one iteration
is needed to reach the fixpoint.
In other words, only in 0.04% of cases are additional iterations
needed to reach a fixpoint. Conversely, in 22.3% of cases InstCombine
performs a completely useless extra iteration to verify the fix point.
This patch removes the fixpoint iteration from InstCombine, and always
only perform a single iteration. This results in a major compile-time
improvement of around 4% at negligible codegen impact.
This explicitly does accept that we will not reach a fixpoint in all
cases. However, this is mitigated by two factors: First, the data
suggests that this happens very rarely in practice. Second,
InstCombine runs many times during the optimization pipeline
(8 times even without LTO), so there are many chances to recover
such cases.
In order to prevent accidental optimization regressions in the
future, this implements a verify-fixpoint option, which is enabled
by default when instcombine is specified in -passes and disabled
when InstCombinePass() is constructed from C++. This means that
test cases need to explicitly use the no-verify-fixpoint option
if they fail to reach a fixed point (for a well understand reason
we cannot / do not want to avoid).
Differential Revision: https://reviews.llvm.org/D154579
InstComine currently processes blocks in an arbitrary
depth-first order. This can break the usual invariant that the
operands of an instruction should be simplified before the
instruction itself, if uses across basic blocks (particularly
inside phi nodes) are involved.
This patch switches the initial worklist population to use RPO
instead, which will ensure that predecessors are visited before
successors (back-edges notwithstanding).
This allows us to fold more cases within a single InstCombine
iteration, in preparation for D154579. This change by itself
is a minor compile-time regression of about 0.1%, which will
be more than recovered by switching to single-iteration InstCombine.
Differential Revision: https://reviews.llvm.org/D75362
This allows us to handle dead blocks with multiple incoming edges,
where we can determine that all of those edges are dead (or cycles).
This allows InstCombine to handle certain dead code patterns that
can be produced by LoopVectorize in a single iteration.
This is in preparation for D154579.
Do not assume scalar types when folding binops of `select`
operations and `zext`/`sext` of their condition.
Reported-by: Benjins, dyung
Proofs: https://alive2.llvm.org/ce/z/GmDLns
Use ConstantFoldBinaryOpOperands() instead of ConstantExpr::get().
This will continue working with binary operands that are not
supported as constant expressions.
If the and/or operand is an immediate constant, it will get folded
away anyway. Don't try to freely invert those operands.
A particularly degenerate case of this arises when both operands
are constant and the result is a constant, in which case we try
to invert users of a constant, resulting in an assertion failure.
Fixes https://github.com/llvm/llvm-project/issues/63791.
The successor is unreachable if either this is the only edge, or
this is an edge into a loop, in which case other predecessors
don't matter. This is exactly what the edge dominance check does.
Use Value::getPointerDereferenceableBytes() instead of hardcoding dereferenceable only for allocas. Allows us to infer inbounds GEPs for other Values like CallInsts and Arguments.
Fixed clang test broken in initial land.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D153815
Use Value::getPointerDereferenceableBytes() instead of hardcoding dereferenceable only for allocas. Allows us to infer inbounds GEPs for other Values like CallInsts and Arguments.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D153815
The inserted instructions can usually be simplified. Make sure this
happens in the same InstCombine iteration by adding them to the
worklist.
We happen to get some better optimization in two cases, but this is
just a lucky accident. https://github.com/llvm/llvm-project/issues/63472
tracks implementing a fold for that case.
This doesn't track all inserted instructions yet, for that we would
also have to include those created by ObjectSizeOffsetEvaluator.
Instruction after a non-terminator unreachable are ... unreachable,
so remove them. Reuse the same logic we use for removing
instructions from dead blocks.
If `Mask` and `Amt` are not constants and `binop1` and `binop2` are
the same we can transform to:
`(binop (lshift (binop X, Y), Amt), Mask)`
If `binop` is `add`, `lshift` must be `shl`.
If `Mask` and `Amt` are constants `C` and `C1` respectively.
We can transform to:
`(lshift1 (binop1 (binop2 X, (inv_lshift1 C, C1), Y)), C1)`
Saving an instruction IFF:
`lshift1` is same opcode as `lshift2`
Either `bitwise1` and/or `bitwise2` is `and`.
Proofs(1/2): https://alive2.llvm.org/ce/z/BjN-m_
Proofs(2/2): https://alive2.llvm.org/ce/z/bZn5QB
This is to help fix the regression caused in D151807
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D152568
The last use was removed by:
commit 934c82d31801e65aa3bbe99a0e64f903621c2e04
Author: Florian Hahn <flo@fhahn.com>
Date: Fri Feb 24 13:39:32 2023 +0100
Once I remove createInstructionCombiningPass, then:
InstructionCombiningPass::InstructionCombiningPass(unsigned MaxIterations)
becomes unused. Once I remove that:
InstructionCombiningPass::MaxIterations is always initialized with
InstCombineDefaultMaxIterations, so this patch does the constant
propagation and removes InstructionCombiningPass::MaxIterations as
well.
Differential Revision: https://reviews.llvm.org/D152641
We try to fold constant computeKnownBits() with context for return
instructions only. Otherwise, we rely on SimplifyDemandedBits() to
fold instructions with constant known bits.
The presence of this special fold for returns is dangerous, because
it makes our tests lie about what works and what doesn't. Tests are
usually written by returning the result we're interested in, but
will go through this separate code path that is not used for anything
else. This patch removes the special fold.
This primarily regresses patterns of the style "assume(x); return x".
The responsibility of handling such patterns lies with passes like
EarlyCSE/GVN anyway, which will do this reliably, and not just for
returns.
Differential Revision: https://reviews.llvm.org/D151099
We already do this during initial worklist population. Doing this
as part of primary combining allows us to remove instructions in
blocks that were rendered dead by condition folding within the
same instcombine iteration.
If the branch condition is undef, then behavior is undefined and
neither of the successors are live.
This is to ensure that optimization quality does not decrease
when a constant gets replaced with undef/poison in this context.
When sinking and users are dropped, add the using instructions
to the worklist, as they can likely be removed as well.
This should be NFC apart from worklist order effects.
This patch-set aims to simplify the existing RVV segment load/store
intrinsics to use a type that represents a tuple of vectors instead.
To achieve this, first we need to relax the current limitation for an
aggregate type to be a target of load/store/alloca when the aggregate
type contains homogeneous scalable vector types. Then to adjust the
prolog of an LLVM function during lowering to clang. Finally we
re-define the RVV segment load/store intrinsics to use the tuple types.
The pull request under the RVV intrinsic specification is
riscv-non-isa/rvv-intrinsic-doc#198
---
This is the 1st patch of the patch-set. This patch is originated from
D98169.
This patch allows aggregate type (StructType) that contains homogeneous
scalable vector types to be a target of load/store/alloca. The RFC of
this patch was posted in LLVM Discourse.
https://discourse.llvm.org/t/rfc-ir-permit-load-store-alloca-for-struct-of-the-same-scalable-vector-type/69527
The main changes in this patch are:
Extend `StructLayout::StructSize` from `uint64_t` to `TypeSize` to
accommodate an expression of scalable size.
Allow `StructType:isSized` to also return true for homogeneous
scalable vector types.
Let `Type::isScalableTy` return true when `Type` is `StructType`
and contains scalable vectors
Extra description is added in the LLVM Language Reference Manual on the
relaxation of this patch.
Authored-by: Hsiangkai Wang <kai.wang@sifive.com>
Co-Authored-by: eop Chen <eop.chen@sifive.com>
Reviewed By: craig.topper, nikic
Differential Revision: https://reviews.llvm.org/D146872
The bitcast was not being added to the worklist. I could switch
this to use the IRBuilder instead, but this bitcast is not relevant
with opaque pointers anyway, so just drop it entirely.
The last use of Descale was removed on Apr 6, 2023 in commit
db6b30b1831095c216378a9df215b7c0ae6b959f.
Differential Revision: https://reviews.llvm.org/D150045
D146349 Introduced the ability to use the information from the
`select` condition to deduce constants as we folded a binop into
select. I.e if the `select` cond was `icmp eq %A, 10`, then in the
true-arm of `select`, we would be able to replace usage of `A` with
`10`.
This is broken for vectors that contain `undef` elements. I.e with
`icmp eq %A, <10, undef>`, subsituting `<10, undef>` for `A` can
result in creating a more undefined result than we otherwise would
have.
We fix the issue with simply checking if the candidate constant for
substituting may contain `undef` elements and don't do it in that
case.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D149592
This patch reverts rGae739aefd7473517d3f08b5c8d08a66c7f469198 to address performance regressions reported by our [CI](https://github.com/dtcxzyw/llvm-ci/issues/137) after rG2ec1d0f427c7822540352c0c14d057e7bfe4f77b.
For example:
```
define ptr @const_gep_chain(ptr %p, i64 %a) {
%p1 = getelementptr inbounds i8, ptr %p, i64 %a
%p2 = getelementptr inbounds i8, ptr %p1, i64 1
%p3 = getelementptr inbounds i8, ptr %p2, i64 2
%p4 = getelementptr inbounds i8, ptr %p3, i64 3
ret ptr %p4
}
```
The last three GEPs will not be folded since rG2ec1d0f427c7822540352c0c14d057e7bfe4f77b.
I think it is appropriate to remove this code because there is no compile-time regression reported in our benchmarks.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D149240
Following the change in shufflevector semantics,
poison will be used to represent undefined elements in shufflevector masks.
Differential Revision: https://reviews.llvm.org/D149256
Since D146813, LICM will reassociate GEPs to expose hoisting
opportunities itself. Don't perform this transform in InstCombine,
where it is fragile because it depends on an optional LoopInfo
analysis.
Make the fold use the information present in the condition for deducing constants i.e:
```
%c = icmp eq i8 %x, 10
%s = select i1 %c, i8 3, i8 2
%r = mul i8 %x, %s
```
If we fold the `mul` into the select, on the true side we insert `10` for `%x` in the `mul`.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D146349
Remove C APIs for interacting with PassRegistry and pass
initialization. These are legacy PM concepts, and are no longer
relevant for the new pass manager.
Calls to these initialization functions can simply be dropped.
Differential Revision: https://reviews.llvm.org/D145043
No need to include CallGraphSCCPass.h from the IPO/Inliner.
Also removed the include of LegacyPassManager.h in a couple of files
that do not really depend on that header file.
Differential Revision: https://reviews.llvm.org/D148083
- As the address space cast may not be valid on a specific target,
`addrspacecast` is not handled when an `alloca` is able to be replaced
with the source of memcpy/memmove. This patch addresses that by
querying a target hook on whether that address space cast is valid.
For example, on most GPU targets, the cast from a global pointer to a
generic pointer is valid.
- If that cast is allowedd (by querying `isValidAddrSpaceCast`), the
replacement is enhanced to handle that `addrspacecast` as well.
Reviewed By: yaxunl
Differential Revision: https://reviews.llvm.org/D147025