Following 2d87319f06ef936233ba6aaa612da9586c427d68, this PR rewrites the
`binop_left_to_zero` rule using MIR Patterns.
The new pattern uses `GIReplaceReg` in the apply clause. According to
[MIRPatterns.rst](5b4a5cf51f/llvm/docs/GlobalISel/MIRPatterns.rst (L222)),
`GIReplaceReg` checks `canReplaceReg`, so the new apply pattern is
equivalent to the old `matchOperandIsZero` implementation.
Added tests for all the opcodes covered by this rule (`G_SHL`, `G_LSHR`,
`G_ASHR`, `G_SDIV`, `G_UDIV`, `G_SREM`, `G_UREM`, `G_MUL`).
- Use `LLT::changeElementType()` instead of `LLT::changeElementSize()`
in `LegalizerHelper::lowerMinMax()` to avoid a crash in the case that
the destination type is a pointer vector;
- Reject `G_*MIN`/`G_*MAX` of pointers and pointer vectors in
`MachineVerifier`;
- Don't combine `G_SELECT`+`G_ICMP` pairs into `G_*MIN`/`G_*MAX` generic
instructions when the operands are pointers / pointer vectors.
Fixes #166556
This aims to fix the crash in #168495. My combine rule was
missing a check that the source vector was in fact a vector. This then
caused the legality check to fail in this example, as the concat was
trying to concatenate a non-vector.
I have also gated the bitcast of the concat to only work on non-scalable
vectors as the mutation calls `getNumElements` which crashes when called
on a scalable vector.
Fixes #168495
This PR adds a new combine to the `post-legalizer-combiner` pass. The
new combine checks for vectors being unmerged and subsequently padded
with `G_IMPLICIT_DEF` values by building a new vector. If such a case is
found, the vector being unmerged is instead just concatenated with a
`G_IMPLICIT_DEF` that is as wide as the vector being unmerged.
This removes unnecessary `mov` instructions in a few places.
We want to be able to produce extr instructions post-legalization. They
are legal for scalars, acting as a funnel shift with a constant shift
amount. Unfortunately I'm not sure if there is a way currently to
represent that in the legalization rules, but it might be useful for
several operations - to be able to treat and test operands with constant
operands as legal or not.
This adds a change to the existing matchOrShiftToFunnelShift so that
AArch64 can generate such instructions post-legalization providing that
the operation is scalar and the shift amount is constant.
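As a rough illustration of why an OR of two shifts with a constant amount can be treated as a funnel shift, the sketch below compares the two forms on plain 32-bit integers. The helper names `orOfShifts` and `funnelShiftRight` are hypothetical and are not part of the patch or of LLVM.

```cpp
#include <cassert>
#include <cstdint>

// OR-of-shifts form (hypothetical helper, for illustration only).
uint32_t orOfShifts(uint32_t Hi, uint32_t Lo, unsigned Amt) {
  assert(Amt > 0 && Amt < 32 && "constant shift amount must be in (0, 32)");
  return (Hi << (32 - Amt)) | (Lo >> Amt);
}

// Funnel shift right: shift the 64-bit concatenation Hi:Lo right by Amt
// and keep the low 32 bits.
uint32_t funnelShiftRight(uint32_t Hi, uint32_t Lo, unsigned Amt) {
  assert(Amt < 32 && "shift amount must be smaller than the bit width");
  uint64_t Concat = (static_cast<uint64_t>(Hi) << 32) | Lo;
  return static_cast<uint32_t>(Concat >> Amt);
}
```

For any constant amount strictly between 0 and 32, both helpers return the same value, which is the property the scalar, constant-amount case relies on.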
I'm not sure if this is the best way forward or not, but we keep running
into issues from forgetting that shuffle_vectors can be scalar. (There is
another example in the recently added known-bits code.) As a scalar-dst
shuffle vector is just an extract, and a scalar-source shuffle vector is
just a build vector, this patch makes scalar shuffle vector illegal and
adjusts the irbuilder to create the correct node as required.
Most targets do this already through lowering or combines. Making scalar
shuffles illegal simplifies gisel as a whole; it just requires that
transforms that create shuffles of new sizes account for the scalar
shuffle being illegal (mostly IRBuilder and LessElements).
This change adds a new folding pattern, folding a G_FPEXT(G_FCONSTANT)
to a G_FCONSTANT.
To make this work on AArch64, the `G_FCONSTANT` should not be widened
due to the `G_FCONSTANT` being converted to a `G_CONSTANT`. This should
fix some other floating point combines when the `G_FCONSTANT` is widened
due to being an fp16.
The `cmpxchg` instruction has two memory orders, one for success and one
for failure.
Prior to this patch `LegalityQuery` only exposed a single memory order,
that of the success case. This meant that it was not generally possible
to legalize `cmpxchg` instructions based on their memory orders.
Add a `FailureOrdering` field to `LegalityQuery::MemDesc`; it is only
set for `cmpxchg` instructions, otherwise it is `NotAtomic`. I didn't
rename `Ordering` to `SuccessOrdering` or similar to avoid breaking
changes for out of tree targets.
The new field does not increase `sizeof(MemDesc)`: it falls into
previous padding bits due to alignment, so I'd expect there to be no
performance impact for this change.
Verified no breakage via check-llvm in build with AMDGPU, AArch64, and X86 targets
enabled.
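The padding argument can be sketched with two stand-in structs. These are simplified, hypothetical mock-ups (the field types and the `AtomicOrd` enum are assumptions, not the real `LegalityQuery::MemDesc` definition); they only show how a one-byte field can land in existing tail padding on typical ABIs.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical one-byte ordering enum standing in for the real one.
enum class AtomicOrd : uint8_t { NotAtomic, Monotonic, Acquire, Release, SeqCst };

// Simplified stand-in for the descriptor before the change.
struct MemDescBefore {
  uint64_t MemoryBits;
  uint64_t AlignInBits;
  AtomicOrd Ordering;
};

// Same layout plus the new one-byte field.
struct MemDescAfter {
  uint64_t MemoryBits;
  uint64_t AlignInBits;
  AtomicOrd Ordering;
  AtomicOrd FailureOrdering; // occupies a byte that was tail padding before
};

// True when the extra field did not grow the struct (holds on typical
// ABIs where uint64_t is at least 2-byte aligned).
bool failureOrderingFitsInPadding() {
  return sizeof(MemDescBefore) == sizeof(MemDescAfter);
}

// Convenience check for use in a debug build.
void checkLayout() { assert(failureOrderingFitsInPadding()); }
```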
This is a port of the SDAG DAGCombiner::combineRepeatedFPDivisors
combine that looks for multiple fdiv operations with the same divisor
and converts them to a single reciprocal fdiv and multiple fmuls. It is
currently a fairly faithful port, with some additions to make sure that
the newly created fdiv dominates all new uses. Compared to the SDAG
version it also drops some logic about splat uses which assumes no
vector fdivs and some logic about x/sqrt(x) which does not yet apply to
GISel.
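The core rewrite can be illustrated on plain doubles. This is a minimal sketch with hypothetical helper names, not the combine itself. Note that the two forms can round differently in general, which is presumably why this kind of rewrite is normally restricted to cases where reciprocal math is permitted.

```cpp
#include <cassert>
#include <vector>

// Original form: one division per numerator.
std::vector<double> divideEach(const std::vector<double> &Nums, double D) {
  assert(D != 0.0 && "sketch assumes a non-zero divisor");
  std::vector<double> Out;
  for (double N : Nums)
    Out.push_back(N / D);
  return Out;
}

// Rewritten form: a single reciprocal division, then one multiply per use.
std::vector<double> divideViaReciprocal(const std::vector<double> &Nums,
                                        double D) {
  assert(D != 0.0 && "sketch assumes a non-zero divisor");
  const double Recip = 1.0 / D; // must be computed before every use
  std::vector<double> Out;
  for (double N : Nums)
    Out.push_back(N * Recip);
  return Out;
}
```

Computing `Recip` ahead of the loop mirrors the requirement that the new fdiv dominates all of its new uses.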
So far, GlobalISel's G_PTR_ADD combines have ignored MIFlags like nuw, nusw,
and inbounds. That was in many cases unnecessarily conservative and in others
unsound, since reassociations re-used the existing G_PTR_ADD instructions
without invalidating their flags. This patch aims to improve that.
I've checked the transforms in this PR with Alive2 on corresponding middle-end
IR constructs.
A longer-term goal would be to encapsulate the logic that determines which
GEP/ISD::PTRADD/G_PTR_ADD flags can be preserved in which case, since this
occurs in similar forms in the middle end, the SelectionDAG combines, and the
GlobalISel combines here.
For SWDEV-516125.
This patch allows srem by a constant to be expanded more efficiently to
avoid the need for expensive sdiv instructions. This is the last part of
the patches which fixes #118090
Allows the sdiv->mul by constant combine to expand the general case.
Previously this was only occurring in the exact case. This is part of
the resolution to issue #118090
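For readers unfamiliar with the general (non-exact) case, one well-known way to divide a signed 32-bit value by a constant is a multiply-high by a "magic" number followed by a sign correction. The sketch below hard-codes the divisor 3 and its commonly used magic constant `0x55555556`; it illustrates the technique and is not the code the combine emits.

```cpp
#include <cassert>
#include <cstdint>

// Signed division by 3 via multiply-high and a sign fix-up.
// Assumes >> on a negative int64_t is an arithmetic shift (true on
// mainstream compilers, guaranteed from C++20).
int32_t sdivBy3(int32_t N) {
  const int64_t Magic = 0x55555556; // ceil(2^32 / 3)
  int32_t Hi = static_cast<int32_t>((static_cast<int64_t>(N) * Magic) >> 32);
  // Add 1 for negative inputs so the result truncates toward zero.
  return Hi + static_cast<int32_t>(static_cast<uint32_t>(N) >> 31);
}

// Spot-check helper: compares the expansion with the built-in operator.
void checkSdivBy3(int32_t N) { assert(sdivBy3(N) == N / 3); }
```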
This patch allows urem by a constant to be expanded more efficiently to
avoid the need for expensive udiv instructions. This is part of the
resolution to issue #118090
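One common way to avoid a udiv for a remainder by a constant is to compute the quotient with a multiply and shift, then subtract `quotient * divisor`. The sketch below hard-codes the divisor 10 and the widely used 32-bit magic constant `0xCCCCCCCD`. It illustrates the idea only and is not necessarily the exact sequence the combine produces.

```cpp
#include <cassert>
#include <cstdint>

// Unsigned division by 10: multiply by ceil(2^35 / 10) and shift right by 35.
uint32_t udivBy10(uint32_t N) {
  return static_cast<uint32_t>((static_cast<uint64_t>(N) * 0xCCCCCCCDull) >> 35);
}

// Remainder without a divide instruction: N - (N / 10) * 10.
uint32_t uremBy10(uint32_t N) { return N - udivBy10(N) * 10u; }

// Spot-check helper: compares the expansion with the built-in operator.
void checkUremBy10(uint32_t N) { assert(uremBy10(N) == N % 10u); }
```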
In the pre-legalizer combiner, there exists a bug with UseVectorTruncate
match-apply optimization. When the destinations' types do not match the
vector element type of the G_UNMERGE_VALUES instruction, the resulting
collapsed truncate does not preserve original functional behavior. This
commit introduces a simple type check to ensure that the destination
types match the vector element type.
This patch dismantles G_SHUFFLE_VECTOR before lowering. The original
lowering would emit extract vector element ops. We found that by using
unmerged values the build vector op combine could find ways to fold.
Only enabled on AMDGPU.
This resolves #123631
SelectionDAG ISel supports both versions of the combine shown
below:
```
select Cond, Pow2, 0 --> (zext Cond) << log2(Pow2)
select Cond, 0, Pow2 --> (zext !Cond) << log2(Pow2)
```
GlobalISel currently supports only the first one, defined in its
generic CombinerHelper.cpp. This patch adds the missing second one.
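The second form can be checked on plain integers. The following is a small sketch with hypothetical helper names, unrelated to the actual CombinerHelper code, showing that `select Cond, 0, Pow2` and `(zext !Cond) << log2(Pow2)` agree.

```cpp
#include <cassert>
#include <cstdint>

// Reference semantics: select Cond, 0, Pow2.
uint32_t selectZeroOrPow2(bool Cond, uint32_t Pow2) {
  return Cond ? 0u : Pow2;
}

// log2 of a value that is known to be a power of two.
unsigned log2OfPow2(uint32_t Pow2) {
  assert(Pow2 != 0 && (Pow2 & (Pow2 - 1)) == 0 && "expected a power of two");
  unsigned Log = 0;
  while ((Pow2 >> Log) != 1u)
    ++Log;
  return Log;
}

// Folded form: (zext !Cond) << log2(Pow2).
uint32_t foldedSelect(bool Cond, uint32_t Pow2) {
  uint32_t NotCond = static_cast<uint32_t>(!Cond); // zext of the inverted i1
  return NotCond << log2OfPow2(Pow2);
}
```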
There are a number of backends (specifically AArch64, AMDGPU, Mips, and
RISCV) which contain a “TODO: make CombinerHelper methods const”
comment. This PR does just that and makes all of the CombinerHelper
methods const, removes the TODO comments and makes the associated
instances const. This change makes some sense because the CombinerHelper
class simply modifies the state of _other_ objects to which it holds
pointers or references.
Note that AMDGPU contains an identical comment for an instance of
AMDGPUCombinerHelper (a subclass of CombinerHelper). I deliberately
haven’t modified the methods of that class in order to limit the scope
of the change. I’m happy to do so either now or as a follow-up.
This patch protects the post-legalizer-combiner's
matchHoistLogicOpWithSameOpcodeHands against the case where both
LeftHandInst and RightHandInst are IMPLICIT_DEF with no input operands. The
prelegalizercombiner-hoist-same-hands.mir test was cleaned up a little in the
process, and has a post-legalizer run line added so that the implicit_defs do
not get folded away.
Add `ICmpInst::compare()` overload accepting `KnownBits`, similar to the
existing one accepting `APInt`. This is not directly part of KnownBits
(or APInt) for layering reasons.
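To give a feel for what a comparison over known bits can conclude, here is a self-contained sketch for an unsigned less-than on 8-bit values. `Known8` and `knownULT` are hypothetical stand-ins and do not reproduce the LLVM `KnownBits` or `ICmpInst::compare()` signatures. The result is "unknown" whenever the possible value ranges overlap.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical 8-bit model of known bits: a bit may be known 0, known 1,
// or unknown (set in neither mask).
struct Known8 {
  uint8_t Zero; // bits known to be 0
  uint8_t One;  // bits known to be 1
  uint8_t minValue() const { return One; }
  uint8_t maxValue() const { return static_cast<uint8_t>(~Zero); }
};

// Returns true/false when the comparison is decided for every possible
// value, and std::nullopt otherwise.
std::optional<bool> knownULT(const Known8 &L, const Known8 &R) {
  assert((L.Zero & L.One) == 0 && (R.Zero & R.One) == 0 &&
         "a bit cannot be known 0 and known 1 at once");
  if (L.maxValue() < R.minValue())
    return true;
  if (L.minValue() >= R.maxValue())
    return false;
  return std::nullopt;
}
```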
The increase in fallbacks that was previously reported was not caused
by this change.
Original description:
This matches InstCombine and DAGCombine.
RISC-V only has an ADDI instruction so without this we need additional
patterns to do the conversion.
Some of the AMDGPU tests look like possible regressions. Maybe some
patterns from isel aren't imported.
This fixes a bug that started triggering after #111730, where we could
remove a load with multiple uses. It looks like the match should be
checking the other register in a one-use check.
%SrcReg = load..
%DstReg = sign_extend_inreg %SrcReg
The combine is needed to clear redundant ANDs with 1 that will be
created by reg-bank-select to clean up high bits in the register.
Fix replaceRegWith from CombinerHelper:
If a copy has to be inserted, first create the copy, then delete MI.
If MI is deleted first, the insert point is not valid.
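The ordering issue is the same one that arises with any linked list of instructions: an erased element can no longer serve as an insertion point. The sketch below models a basic block as a `std::list` of opcode names. `replaceWithCopy` is a hypothetical helper used purely for illustration, not the CombinerHelper code.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <list>
#include <string>

// Replace the instruction at Index with a COPY: insert first, erase second.
std::list<std::string> replaceWithCopy(std::list<std::string> Block,
                                       std::size_t Index) {
  assert(Index < Block.size() && "index must name an existing instruction");
  auto MI = std::next(Block.begin(), static_cast<std::ptrdiff_t>(Index));
  Block.insert(MI, "COPY"); // MI is still valid, so it can anchor the insert
  Block.erase(MI);          // only now is the old instruction removed
  return Block;
}
```

Swapping the two calls would use `MI` after it has been erased, which is undefined behaviour for `std::list` iterators and mirrors the invalid insert point described above.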