This attempts to standardize and extend some of the insert vector
element lowering. Most notably:
- More types are handled by splitting illegal vectors.
- The index type for G_INSERT_VECTOR_ELT is canonicalized to
TLI.getVectorIdxTy(), similar to extact_vector_element.
- Some of the existing patterns now have the index type specified to
make sure they can apply to GISel too.
- The C++ selection code has been removed, relying on tablegen patterns.
- G_INSERT_VECTOR_ELT with small GPR input elements are pre-selected to
use a i32 type, allowing the existing patterns to apply.
- Variable index inserts are lowered in post-legalizer lowering,
expanding into a stack store and reload.
We can use a small amount of integer arithmetic to round FP32 to BF16
and extend BF16 to FP32.
While a number of operations still require promotion, this can be
reduced for some rather simple operations like abs, copysign, fneg but
these can be done in a follow-up.
A few neat optimizations are implemented:
- round-inexact-to-odd is used for F64 to BF16 rounding.
- quieting signaling NaNs for f32 -> bf16 tries to detect if a prior
operation makes it unnecessary.
This adds post-legalizing lowering of G_UNMERGE_VALUES which take a vector and
produce scalar values for each lane. They are converted to a G_EXTRACT_VECTOR_ELT
for each lane, allowing all the existing tablegen patterns to apply to them.
A couple of tablegen patterns need to be altered to make sure the type of the
constant operand is known, so that the patterns are recognized under global
isel.
Closes#75662
Global ISel now selects `UMULL` and `UMULL2` instructions.
G_MUL instruction with input operands coming from `SEXT` or `ZEXT`
operations are turned into UMULL
G_MUL instructions with v2s64 result type is always scalarised except:
`mul ( unmerge( ext ), unmerge( ext ))`
So the extend could be unmerged and fold away the unmerge in the middle:
`mul ( unmerge( ext ), unmerge( ext ))` =>
`mul ( unmerge( merge( ext( unmerge )), unmerge( merge( ext( unmerge
))))` =>
`mul ( ext(unmerge)), ( ext( unmerge ))) `
When having <N x t> d1, unused = unmerge(G_EXT <2*N x t> v1, undef, N),
it is possible to express it just as unused, d1 = unmerge v1.
It is useful for tackling regressions in arm64-vcvt_f.ll, introduced in
https://reviews.llvm.org/D144670.
When having <N x t> d1, unused = unmerge(G_EXT <2*N x t> v1, undef, N),
it is possible to express it just as unused, d1 = unmerge v1.
It is useful for tackling regressions in arm64-vcvt_f.ll, introduced in
https://reviews.llvm.org/D144670.
Remove CodeGen leftovers from the old combiner backend and adapt the API to fit the new backend better.
It's now quite a bit closer to how InstructionSelector works.
- `CombinerInfo` is now a simple "options" struct.
- `Combiner` is now the base class of all TableGen'd combiner implementation.
- Many fields have been moved from derived classes into that class.
- It has been refactored to create & own the Observer and Builder.
- `tryCombineAll` TableGen'd method can now be renamed, which allows targets to implement the actual `tryCombineAll` call manually and do whatever they want to do before/after it.
Note: `CombinerHelper` needs to be mutable because none of its methods are const. This can be revisited later.
Depends on D158710
Reviewed By: aemerson, dsanders
Differential Revision: https://reviews.llvm.org/D158713
This allows us to select G_SHUFFLE_VECTOR with identity masks (possibly
including undef elements), but avoid the actual EXT instruction if the shift
amount is 0.
It's the only combine (AFAIK) that didn't use an apply function.
There is no reason for it to mutate instructions in the matcher, so split it up.
Reviewed By: aemerson, arsenm
Differential Revision: https://reviews.llvm.org/D154947
Only a few minor test changes needed because I removed the "helper" suffix from the combiner name, as it's not really a helper anymore but more like the implementation itself.
Depends on D153757
NOTE: This would land iff D153757 (RFC) lands too.
Reviewed By: aemerson
Differential Revision: https://reviews.llvm.org/D153850
There is no case where those functions return false. It's always return true.
Even if they were to return false, it's not really something we should rely on I think.
With the current combiner implementation, it would just make `tryCombineAll` return false without retrying anymore rules.
I also believe that if an applyer were to return false, it would mean that the match function is not good enough. Asserting on failure in an apply function is a better idea, IMO.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D153619
This adds v4f16 and v8f16 lowering for fp16 vector compares. It splits the
getActionDefinitionsBuilder of G_FCMP from G_ICMP, as they are quite different
operations, and adds fp16 vector lowering.
Differential Revision: https://reviews.llvm.org/D147947
Most "ord" checks need two real-world compares to implement, but this is the
canonical form of a "!isnan" check, which is equivalent to comparing the input
for equality against itself.
This patch mechanically replaces None with std::nullopt where the
compiler would warn if None were deprecated. The intent is to reduce
the amount of manual work required in migrating from Optional to
std::optional.
This is part of an effort to migrate from llvm::Optional to
std::optional:
https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
As a result of making these legal, and tweaking the combine to allow vectors,
we generate vector G_SEXT_INREG during legalization.
The reason we want to make these legal in the first place is to allow for
more combine opportunities. Once those have been done, we can just lower them
back to shifts in the post-legalizer lowering.
This needs to be one commit otherwise we start causing tests to fail due to
incomplete support for selection etc.
Before, the isPreLegalize() query in CombinerHelper only checked for the
presence of a LegalizerInfo object. This is problematic when we want to have
a combine actually check for legality in a pre-legalizer combine pass, since
if we pass a LegalizerInfo object to the constructor it causes the combines to
think that we're running *post* legalizer, which isn't true.
This change fixes it to instead check an explicit bool that passes to signal
whether the pass will be run before or after legalization.
Doing so exposed a bug in the extending loads combine, which tried to check for
legality of candidate extending loads if LegalizerInfo was present. Since we
only ran it pre-legalizer and therefore with a null LegalizerInfo, it never
actually ran. Also fixes the legality checks to keep the tests passing.
Differential Revision: https://reviews.llvm.org/D135044
This is a port of an existing optimization in AArch64 ISelLowering, handling
a case when the same input vector can be used for both ext inputs.
Differential Revision: https://reviews.llvm.org/D134891
Rework getConstantstVRegValWithLookThrough in order to make it clear if we
are matching integer/float constant only or any constant(default).
Add helper functions that get DefVReg and APInt/APFloat from constant instr
getIConstantVRegValWithLookThrough: integer constant, only G_CONSTANT
getFConstantVRegValWithLookThrough: float constant, only G_FCONSTANT
getAnyConstantVRegValWithLookThrough: either G_CONSTANT or G_FCONSTANT
Rename getConstantVRegVal and getConstantVRegSExtVal to getIConstantVRegVal
and getIConstantVRegSExtVal. These now only match G_CONSTANT as described
in comment.
Relevant matchers now return both DefVReg and APInt/APFloat.
Replace existing uses of getConstantstVRegValWithLookThrough and
getConstantVRegVal with new helper functions. Any constant match is
only required in:
ConstantFoldBinOp: for constant argument that was bit-cast of float to int
getAArch64VectorSplat: AArch64::G_DUP operands can be any constant
amdgpu select for G_BUILD_VECTOR_TRUNC: operands can be any constant
In other places use integer only constant match.
Differential Revision: https://reviews.llvm.org/D104409
This also adds new interfaces for the fixed- and scalable case:
* LLT::fixed_vector
* LLT::scalable_vector
The strategy for migrating to the new interfaces was as follows:
* If the new LLT is a (modified) clone of another LLT, taking the
same number of elements, then use LLT::vector(OtherTy.getElementCount())
or if the number of elements is halfed/doubled, it uses .divideCoefficientBy(2)
or operator*. That is because there is no reason to specifically restrict
the types to 'fixed_vector'.
* If the algorithm works on the number of elements (as unsigned), then
just use fixed_vector. This will need to be fixed up in the future when
modifying the algorithm to also work for scalable vectors, and will need
then need additional tests to confirm the behaviour works the same for
scalable vectors.
* If the test used the '/*Scalable=*/true` flag of LLT::vector, then
this is replaced by LLT::scalable_vector.
Reviewed By: aemerson
Differential Revision: https://reviews.llvm.org/D104451
This is roughly equivalent to the floating point portion of
`AArch64TargetLowering::LowerVSETCC`. Main part that's missing is the v4s16 bit.
This also adds helpers equivalent to `EmitVectorComparison`, and
`changeVectorFPCCToAArch64CC`. This moves `changeFCMPPredToAArch64CC` out of
the selector into AArch64GlobalISelUtils for the sake of code reuse.
This is done in post-legalizer lowering with pseudos to simplify selection.
The imported patterns end up handling selection for us this way.
Differential Revision: https://reviews.llvm.org/D101782
This adds support for swapping comparison operands when it may introduce new
folding opportunities.
This is roughly the same as the code added to AArch64ISelLowering in
162435e7b5e026b9f988c730bb6527683f6aa853.
For an example of a testcase which exercises this, see
llvm/test/CodeGen/AArch64/swap-compare-operands.ll
(Godbolt for that testcase: https://godbolt.org/z/43WEMb)
The idea behind this is that sometimes, we may be able to fold away, say, a
shift or extend in a compare by swapping its operands.
e.g. in the case of this compare:
```
lsl x8, x0, #1
cmp x8, x1
cset w0, lt
```
The following is equivalent:
```
cmp x1, x0, lsl #1
cset w0, gt
```
Most of the code here is just a reimplementation of what already exists in
AArch64ISelLowering.
(See `getCmpOperandFoldingProfit` and `getAArch64Cmp` for the equivalent code.)
Note that most of the AND code in the testcase doesn't actually fold. It seems
like we're missing selection support for that sort of fold right now, since SDAG
happily folds these away (e.g testSwapCmpWithShiftedZeroExtend8_32 in the
original .ll testcase)
Differential Revision: https://reviews.llvm.org/D89422
For <2 x s32>, we can use G_DUPLANE32, but with a <4 x s32> source. To make it
work, we can just widen the original source with a concat_vectors.
Doing this allows <2 x float> indexed fmul instruction selection patterns to
fire, which gives a nice 0.3% code size saving on Bullet with -Os.
Differential Revision: https://reviews.llvm.org/D98059
If we have
```
%vec = G_BUILD_VECTOR %reg, %reg, ..., %reg
```
Then lower it to
```
%vec = G_DUP %reg
```
Also update the selector to handle constant splats on G_DUP.
This will not combine when the splat is all zeros or ones. Tablegen-imported
patterns rely on these being G_BUILD_VECTOR.
Minor code size improvements on CTMark at -Os.
Also adds some utility functions to make it a bit easier to recognize splats,
and an AArch64-specific splat helper.
Differential Revision: https://reviews.llvm.org/D97731
The order in which the nested calls to Builder.buildWhatever are
evaluated in differs between GCC and Clang.
This caused a bot failure because the MIR in the testcase was
coming out in a different order than expected.
Rather than using nested calls, pull them out in order to fix the
order of evaluation.
This reverts commit 867e379c0e14527eb7aa68485a10324693e35f5d.
For some reason this is upsetting Linux/Windows bots. Reverting while I try to
reproduce.
Match a G_SHUFFLE_VECTOR with a mask that allows it to be represented as a
G_INSERT_VECTOR_ELT and a G_EXTRACT_VECTOR_ELT.
This ports `isINSMask` from AArch64ISelLowering and the portion of
`AArch64TargetLowering::LowerVECTOR_SHUFFLE` which handles the equivalent
transformation.
This provides more opportunities for matching DUP. We don't have all of the
necessary combines to actually make DUP out of these yet, but this is better for
size than the full TBL expansion for G_SHUFFLE_VECTOR.
This is a -0.1% code size improvement on CTMark/Bullet at -Os.
IR example: https://godbolt.org/z/sdcevT
Differential Revision: https://reviews.llvm.org/D97214
Returning int64_t was arbitrarily limiting for wide integer types, and
the functions should handle the full generality of the IR.
Also changes the full form which returns the originally defined
vreg. Add another wrapper for the common case of just immediately
converting to int64_t (arguably this would be useful for the full
return value case as well).
One possible issue with this change is some of the existing uses did
break without conversion to getConstantVRegSExtVal, and it's possible
some without adequate test coverage are now broken.
It's fairly common to need matchers for a specific constant value, or for
common idioms like finding a negated register.
Add
- `m_SpecificICst`, which returns true when matching a specific value..
- `m_ZeroInt`, which returns true when an integer 0 is matched.
- `m_Neg`, which returns when a register is negated.
Also update a few places which use idioms related to the new matchers.
Differential Revision: https://reviews.llvm.org/D91397
These were previously handled by pattern matching shuffles in the selector, but
adding a new opcode and making it equivalent to the AArch64duplane SDAG node
allows us to select more patterns, like lane indexed FMLAs (patch adding a test
for that will be committed later).
The pattern matching code has been simply moved to postlegalize lowering.
Differential Revision: https://reviews.llvm.org/D90820
Move the code which adjusts the immediate/predicate on a G_ICMP to
AArch64PostLegalizerLowering.
This
- Reduces the number of places we need to test for optimized compares in the
selector. We know that the compare should have been simplified by the time it
hits the selector, so we can avoid testing this in selects, brconds, etc.
- Allows us to potentially fold more compares (previously, this optimization
was only done after calling `tryFoldCompare`, this may allow us to hit some more
TST cases)
- Simplifies the selection code in `emitIntegerCompare` significantly; we can
just use an emitSUBS function.
- Allows us to avoid checking that the predicate has been updated after
`emitIntegerCompare`.
Also add a utility header file for things that may be useful in the selector
and various combiners. No need for an implementation file at this point, since
it's just one constexpr function for now. I've run into a couple cases where
having one of these would be handy, so might as well add it here. There are
a couple functions in the selector that can probably be factored out into
here.
Differential Revision: https://reviews.llvm.org/D89823
There are a lot of combines in AArch64PostLegalizerCombiner which exist to
facilitate instruction matching in the selector. (E.g. matching for G_ZIP and
other shuffle vector pseudos)
It still makes sense to select these instructions at -O0.
Matching earlier in a combiner can reduce complexity in the selector
significantly. For example, a good portion of our selection code for compares
would be a lot easier to represent in a combine.
This patch moves matching combines into a "AArch64PostLegalizerLowering"
combiner which runs at all optimization levels.
Also, while we're here, improve the documentation for the
AArch64PostLegalizerCombiner, and fix up the filepath in its file comment.
And also add a 'r' which somehow got dropped from a bunch of function names.
https://reviews.llvm.org/D89820