45 Commits

Author SHA1 Message Date
David Green
ac321cbb03
[AArch64][GlobalISel] Legalize Insert vector element (#81453)
This attempts to standardize and extend some of the insert vector
element lowering. Most notably:
- More types are handled by splitting illegal vectors.
- The index type for G_INSERT_VECTOR_ELT is canonicalized to
  TLI.getVectorIdxTy(), similar to extact_vector_element.
- Some of the existing patterns now have the index type specified to
  make sure they can apply to GISel too.
- The C++ selection code has been removed, relying on tablegen patterns.
- G_INSERT_VECTOR_ELT with small GPR input elements are pre-selected to
  use a i32 type, allowing the existing patterns to apply.
- Variable index inserts are lowered in post-legalizer lowering,
  expanding into a stack store and reload.
2024-04-08 08:44:13 +01:00
Tuan Chuong Goh
13a78fd1ac [AArch64][GlobalISel] Re-commit Legalize G_SHUFFLE_VECTOR for Odd-Sized Vectors (#83038)
Legalize smaller/larger than legal vectors with i8 and i16 element sizes.
Vectors with elements smaller than i8 will get widened to i8 elements.
2024-03-04 15:03:55 +00:00
David Majnemer
3dd6750027 [AArch64] Add more complete support for BF16
We can use a small amount of integer arithmetic to round FP32 to BF16
and extend BF16 to FP32.

While a number of operations still require promotion, this can be
reduced for some rather simple operations like abs, copysign, fneg but
these can be done in a follow-up.

A few neat optimizations are implemented:
- round-inexact-to-odd is used for F64 to BF16 rounding.
- quieting signaling NaNs for f32 -> bf16 tries to detect if a prior
  operation makes it unnecessary.
2024-03-03 22:39:50 +00:00
chuongg3
a7cfff8dc6
[AArch64][GlobalISel] Lower Shuffle Vector to REV (#79591)
Add lowering for i16 and i32 vectors for Shuffle Vector instructions
with REV mask
2024-01-28 20:35:02 +00:00
David Green
c0931d4950 [AArch64][GlobalISel] Lower scalarizing G_UNMERGE_VALUES to G_EXTRACT_VECTOR_ELT
This adds post-legalizing lowering of G_UNMERGE_VALUES which take a vector and
produce scalar values for each lane. They are converted to a G_EXTRACT_VECTOR_ELT
for each lane, allowing all the existing tablegen patterns to apply to them.

A couple of tablegen patterns need to be altered to make sure the type of the
constant operand is known, so that the patterns are recognized under global
isel.

Closes #75662
2023-12-21 09:22:23 +00:00
chuongg3
45f51f9f7c
[AArch64][GlobalISel] Select UMULL instruction (#65469)
Global ISel now selects `UMULL` and `UMULL2` instructions.
G_MUL instruction with input operands coming from `SEXT` or `ZEXT`
operations are turned into UMULL

G_MUL instructions with v2s64 result type is always scalarised except: 
`mul ( unmerge( ext ), unmerge( ext ))` 

So the extend could be unmerged and fold away the unmerge in the middle:
`mul ( unmerge( ext ), unmerge( ext ))` =>
`mul ( unmerge( merge( ext( unmerge )), unmerge( merge( ext( unmerge
))))` =>
`mul ( ext(unmerge)), ( ext( unmerge ))) `
2023-09-25 09:34:51 +01:00
Vladislav Dzhidzhoev
13b7629a58 [GlobalISel][AArch64] Combine unmerge(G_EXT v, undef) to unmerge(v).
When having <N x t> d1, unused = unmerge(G_EXT <2*N x t> v1, undef, N),
it is possible to express it just as unused, d1 = unmerge v1.

It is useful for tackling regressions in arm64-vcvt_f.ll, introduced in
https://reviews.llvm.org/D144670.
2023-09-05 16:14:44 +02:00
Vladislav Dzhidzhoev
7eeeeb0cc9 Revert "[GlobalISel][AArch64] Combine unmerge(G_EXT v, undef) to unmerge(v)."
This reverts commit 6b37a65264bb4e7d400d5283a65f9e8e1575f2d7.
Accindentally pushed before squashing.
2023-09-05 16:13:27 +02:00
Vladislav Dzhidzhoev
bb1a03df47 Addressed @aemerson comments 2023-09-05 16:00:49 +02:00
Vladislav Dzhidzhoev
0e826f0e6d Refactored, added MIR test. 2023-09-05 16:00:48 +02:00
Vladislav Dzhidzhoev
6b37a65264 [GlobalISel][AArch64] Combine unmerge(G_EXT v, undef) to unmerge(v).
When having <N x t> d1, unused = unmerge(G_EXT <2*N x t> v1, undef, N),
it is possible to express it just as unused, d1 = unmerge v1.

It is useful for tackling regressions in arm64-vcvt_f.ll, introduced in
https://reviews.llvm.org/D144670.
2023-09-05 16:00:48 +02:00
pvanhout
aaf6755631 [GlobalISel] Refactor Combiner API
Remove CodeGen leftovers from the old combiner backend and adapt the API to fit the new backend better.
It's now quite a bit closer to how InstructionSelector works.

- `CombinerInfo` is now a simple "options" struct.
- `Combiner` is now the base class of all TableGen'd combiner implementation.
    - Many fields have been moved from derived classes into that class.
    - It has been refactored to create & own the Observer and Builder.
- `tryCombineAll` TableGen'd method can now be renamed, which allows targets to implement the actual `tryCombineAll` call manually and do whatever they want to do before/after it.

Note: `CombinerHelper` needs to be mutable because none of its methods are const. This can be revisited later.

Depends on D158710

Reviewed By: aemerson, dsanders

Differential Revision: https://reviews.llvm.org/D158713
2023-09-05 08:19:05 +02:00
David Green
a047dfe0d5 [AArch64][GISel] Lower EXT of 0 to a COPY
This allows us to select G_SHUFFLE_VECTOR with identity masks (possibly
including undef elements), but avoid the actual EXT instruction if the shift
amount is 0.
2023-08-16 17:12:15 +01:00
David Green
bbe945b8a1 [AArch64][GISel] Expand G_DUP and G_DUPLANE to v8s8 and v4s16
This fills in the gaps with v8s8 and v4s8 vectors for G_DUP and G_DUPLANE,
using the existing code that is generalized to more types.
2023-08-04 12:43:53 +01:00
pvanhout
af67b6760b [AArch64] Split lowerVectorFCMP combine
It's the only combine (AFAIK) that didn't use an apply function.
There is no reason for it to mutate instructions in the matcher, so split it up.

Reviewed By: aemerson, arsenm

Differential Revision: https://reviews.llvm.org/D154947
2023-07-12 13:13:37 +02:00
pvanhout
655714a300 [AArch64] Use GlobalISel MatchTable Combiner Backend
Only a few minor test changes needed because I removed the "helper" suffix from the combiner name, as it's not really a helper anymore but more like the implementation itself.

Depends on D153757

NOTE: This would land iff D153757 (RFC) lands too.

Reviewed By: aemerson

Differential Revision: https://reviews.llvm.org/D153850
2023-07-11 11:27:14 +02:00
pvanhout
5eb8cb0949 [NFC][GlobalISel] Don't return bool from apply functions
There is no case where those functions return false. It's always return true.
Even if they were to return false, it's not really something we should rely on I think.
With the current combiner implementation, it would just make `tryCombineAll` return false without retrying anymore rules.

I also believe that if an applyer were to return false, it would mean that the match function is not good enough. Asserting on failure in an apply function is a better idea, IMO.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D153619
2023-06-26 09:23:58 +02:00
David Green
c663b2576f [AArch64][GISel] Add FP16 fcmp lowering
This adds v4f16 and v8f16 lowering for fp16 vector compares. It splits the
getActionDefinitionsBuilder of G_FCMP from G_ICMP, as they are quite different
operations, and adds fp16 vector lowering.

Differential Revision: https://reviews.llvm.org/D147947
2023-04-17 17:22:46 +01:00
Tim Northover
6b98824a58 AArch64: emit fcmp ord %a, zeroinitializer as a single fcmeq.
Most "ord" checks need two real-world compares to implement, but this is the
canonical form of a "!isnan" check, which is equivalent to comparing the input
for equality against itself.
2022-12-07 19:17:30 +00:00
Kazu Hirata
20cde15415 [Target] Use std::nullopt instead of None (NFC)
This patch mechanically replaces None with std::nullopt where the
compiler would warn if None were deprecated.  The intent is to reduce
the amount of manual work required in migrating from Optional to
std::optional.

This is part of an effort to migrate from llvm::Optional to
std::optional:

https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
2022-12-02 20:36:06 -08:00
Kazu Hirata
f0105ee968 [GISel] Use std::optional in AArch64PostLegalizerLowering.cpp (NFC)
This is part of an effort to migrate from llvm::Optional to
std::optional:

https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
2022-11-25 22:20:07 -08:00
Amara Emerson
8055aa8e8a [AArch64][GlobalISel] Make vector G_SEXT_INREG legal and allow combining.
As a result of making these legal, and tweaking the combine to allow vectors,
we generate vector G_SEXT_INREG during legalization.

The reason we want to make these legal in the first place is to allow for
more combine opportunities. Once those have been done, we can just lower them
back to shifts in the post-legalizer lowering.

This needs to be one commit otherwise we start causing tests to fail due to
incomplete support for selection etc.
2022-10-05 00:28:08 +01:00
Amara Emerson
3daf7ddaef [GlobalISel] Allow prelegalizer combiners to have access to LegalizerInfo.
Before, the isPreLegalize() query in CombinerHelper only checked for the
presence of a LegalizerInfo object. This is problematic when we want to have
a combine actually check for legality in a pre-legalizer combine pass, since
if we pass a LegalizerInfo object to the constructor it causes the combines to
think that we're running *post* legalizer, which isn't true.

This change fixes it to instead check an explicit bool that passes to signal
whether the pass will be run before or after legalization.

Doing so exposed a bug in the extending loads combine, which tried to check for
legality of candidate extending loads if LegalizerInfo was present. Since we
only ran it pre-legalizer and therefore with a null LegalizerInfo, it never
actually ran. Also fixes the legality checks to keep the tests passing.

Differential Revision: https://reviews.llvm.org/D135044
2022-10-03 07:36:18 +01:00
Amara Emerson
7653586d88 [AArch64][GlobalISel] Implement another combine for shufflevector->AArch64 G_EXT.
This is a port of an existing optimization in AArch64 ISelLowering, handling
a case when the same input vector can be used for both ext inputs.

Differential Revision: https://reviews.llvm.org/D134891
2022-09-29 22:53:24 +01:00
Kazu Hirata
b5188591a0 [llvm] Remove redundaunt virtual specifiers (NFC)
Identified with modernize-use-override.
2022-07-24 21:50:35 -07:00
Kazu Hirata
3a3cb929ab [llvm] Use = default (NFC) 2022-02-06 22:18:35 -08:00
Petar Avramovic
d477a7c2e7 GlobalISel/Utils: Refactor integer/float constant match functions
Rework getConstantstVRegValWithLookThrough in order to make it clear if we
are matching integer/float constant only or any constant(default).
Add helper functions that get DefVReg and APInt/APFloat from constant instr
getIConstantVRegValWithLookThrough: integer constant, only G_CONSTANT
getFConstantVRegValWithLookThrough: float constant, only G_FCONSTANT
getAnyConstantVRegValWithLookThrough: either G_CONSTANT or G_FCONSTANT

Rename getConstantVRegVal and getConstantVRegSExtVal to getIConstantVRegVal
and getIConstantVRegSExtVal. These now only match G_CONSTANT as described
in comment.

Relevant matchers now return both DefVReg and APInt/APFloat.

Replace existing uses of getConstantstVRegValWithLookThrough and
getConstantVRegVal with new helper functions. Any constant match is
only required in:
ConstantFoldBinOp: for constant argument that was bit-cast of float to int
getAArch64VectorSplat: AArch64::G_DUP operands can be any constant
amdgpu select for G_BUILD_VECTOR_TRUNC: operands can be any constant

In other places use integer only constant match.

Differential Revision: https://reviews.llvm.org/D104409
2021-09-17 11:22:13 +02:00
Amara Emerson
56a6686e0c [AArch64][GlobalISel] Don't form truncstores in postlegalizer-lowering for s128.
We don't support truncating s128 stores, so don't form them.
2021-07-20 00:04:34 -07:00
Sander de Smalen
c9acd2f32e [GlobalISel] NFC: Change LLT::changeNumElements to LLT::changeElementCount.
Reviewed By: aemerson

Differential Revision: https://reviews.llvm.org/D104453
2021-06-25 15:54:00 +01:00
Sander de Smalen
d5e14ba88c [GlobalISel] NFC: Change LLT::vector to take ElementCount.
This also adds new interfaces for the fixed- and scalable case:
* LLT::fixed_vector
* LLT::scalable_vector

The strategy for migrating to the new interfaces was as follows:
* If the new LLT is a (modified) clone of another LLT, taking the
  same number of elements, then use LLT::vector(OtherTy.getElementCount())
  or if the number of elements is halfed/doubled, it uses .divideCoefficientBy(2)
  or operator*. That is because there is no reason to specifically restrict
  the types to 'fixed_vector'.
* If the algorithm works on the number of elements (as unsigned), then
  just use fixed_vector. This will need to be fixed up in the future when
  modifying the algorithm to also work for scalable vectors, and will need
  then need additional tests to confirm the behaviour works the same for
  scalable vectors.
* If the test used the '/*Scalable=*/true` flag of LLT::vector, then
  this is replaced by LLT::scalable_vector.

Reviewed By: aemerson

Differential Revision: https://reviews.llvm.org/D104451
2021-06-24 11:26:12 +01:00
Amara Emerson
ae2b36e8bd [AArch64][GlobalISel] Support truncstorei8/i16 w/ combine to form truncating G_STOREs.
This needs some tablegen changes so that we can actually import the patterns properly.

Differential Revision: https://reviews.llvm.org/D102204
2021-05-11 11:33:03 -07:00
Jessica Paquette
79be9c59c6 [AArch64][GlobalISel] Add post-legalizer lowering for NEON vector fcmps
This is roughly equivalent to the floating point portion of
`AArch64TargetLowering::LowerVSETCC`. Main part that's missing is the v4s16 bit.

This also adds helpers equivalent to `EmitVectorComparison`, and
`changeVectorFPCCToAArch64CC`. This moves `changeFCMPPredToAArch64CC` out of
the selector into AArch64GlobalISelUtils for the sake of code reuse.

This is done in post-legalizer lowering with pseudos to simplify selection.
The imported patterns end up handling selection for us this way.

Differential Revision: https://reviews.llvm.org/D101782
2021-05-10 15:40:06 -07:00
Jessica Paquette
49c3565b9b [AArch64][GlobalISel] Swap compare operands when it may be profitable
This adds support for swapping comparison operands when it may introduce new
folding opportunities.

This is roughly the same as the code added to AArch64ISelLowering in
162435e7b5e026b9f988c730bb6527683f6aa853.

For an example of a testcase which exercises this, see
llvm/test/CodeGen/AArch64/swap-compare-operands.ll

(Godbolt for that testcase: https://godbolt.org/z/43WEMb)

The idea behind this is that sometimes, we may be able to fold away, say, a
shift or extend in a compare by swapping its operands.

e.g. in the case of this compare:

```
lsl x8, x0, #1
cmp x8, x1
cset w0, lt
```

The following is equivalent:

```
cmp x1, x0, lsl #1
cset w0, gt
```

Most of the code here is just a reimplementation of what already exists in
AArch64ISelLowering.

(See `getCmpOperandFoldingProfit` and `getAArch64Cmp` for the equivalent code.)

Note that most of the AND code in the testcase doesn't actually fold. It seems
like we're missing selection support for that sort of fold right now, since SDAG
happily folds these away (e.g testSwapCmpWithShiftedZeroExtend8_32 in the
original .ll testcase)

Differential Revision: https://reviews.llvm.org/D89422
2021-04-09 15:46:48 -07:00
David Blaikie
b627802e81 Remove unused variable (rolling it into an assert) 2021-03-09 16:06:44 -08:00
Amara Emerson
45a9dca015 [AArch64][GlobalISel] Form G_DUPLANE32 for <2 x s32> shufflevectors in lowering.
For <2 x s32>, we can use G_DUPLANE32, but with a <4 x s32> source. To make it
work, we can just widen the original source with a concat_vectors.

Doing this allows <2 x float> indexed fmul instruction selection patterns to
fire, which gives a nice 0.3% code size saving on Bullet with -Os.

Differential Revision: https://reviews.llvm.org/D98059
2021-03-09 11:36:26 -08:00
Jessica Paquette
5c26be214d [AArch64][GlobalISel] Lower G_BUILD_VECTOR -> G_DUP
If we have

```
%vec = G_BUILD_VECTOR %reg, %reg, ..., %reg
```

Then lower it to

```
%vec = G_DUP %reg
```

Also update the selector to handle constant splats on G_DUP.

This will not combine when the splat is all zeros or ones. Tablegen-imported
patterns rely on these being G_BUILD_VECTOR.

Minor code size improvements on CTMark at -Os.

Also adds some utility functions to make it a bit easier to recognize splats,
and an AArch64-specific splat helper.

Differential Revision: https://reviews.llvm.org/D97731
2021-03-08 13:01:10 -08:00
Jessica Paquette
daf7d7f0dc [AArch64][GlobalISel] Correct function evaluation order in applyINS
The order in which the nested calls to Builder.buildWhatever are
evaluated in differs between GCC and Clang.

This caused a bot failure because the MIR in the testcase was
coming out in a different order than expected.

Rather than using nested calls, pull them out in order to fix the
order of evaluation.
2021-02-23 16:21:11 -08:00
Jessica Paquette
ef1f7f1d7d Recommit "[AArch64][GlobalISel] Match G_SHUFFLE_VECTOR -> insert elt + extract elt"
Attempted fix for the added test failing.

https://lab.llvm.org/buildbot/#/builders/104/builds/2355/steps/5/logs/stdio

I can't reproduce the failure anywhere, so I'm going to guess that passing a
std::function as MatchInfo is sketchy in this context.

Switch it to a std::tuple and hope for the best.
2021-02-23 11:55:16 -08:00
Jessica Paquette
662402a8b3 Revert "[AArch64][GlobalISel] Match G_SHUFFLE_VECTOR -> insert elt + extract elt"
This reverts commit 867e379c0e14527eb7aa68485a10324693e35f5d.

For some reason this is upsetting Linux/Windows bots. Reverting while I try to
reproduce.
2021-02-22 17:36:17 -08:00
Jessica Paquette
867e379c0e [AArch64][GlobalISel] Match G_SHUFFLE_VECTOR -> insert elt + extract elt
Match a G_SHUFFLE_VECTOR with a mask that allows it to be represented as a
G_INSERT_VECTOR_ELT and a G_EXTRACT_VECTOR_ELT.

This ports `isINSMask` from AArch64ISelLowering and the portion of
`AArch64TargetLowering::LowerVECTOR_SHUFFLE` which handles the equivalent
transformation.

This provides more opportunities for matching DUP. We don't have all of the
necessary combines to actually make DUP out of these yet, but this is better for
size than the full TBL expansion for G_SHUFFLE_VECTOR.

This is a -0.1% code size improvement on CTMark/Bullet at -Os.

IR example: https://godbolt.org/z/sdcevT

Differential Revision: https://reviews.llvm.org/D97214
2021-02-22 14:44:09 -08:00
Matt Arsenault
581d13f8ae GlobalISel: Return APInt from getConstantVRegVal
Returning int64_t was arbitrarily limiting for wide integer types, and
the functions should handle the full generality of the IR.

Also changes the full form which returns the originally defined
vreg. Add another wrapper for the common case of just immediately
converting to int64_t (arguably this would be useful for the full
return value case as well).

One possible issue with this change is some of the existing uses did
break without conversion to getConstantVRegSExtVal, and it's possible
some without adequate test coverage are now broken.
2020-12-22 22:23:58 -05:00
Jessica Paquette
b184a2eccf [GlobalISel] Add matchers for specific constants and a matcher for negations
It's fairly common to need matchers for a specific constant value, or for
common idioms like finding a negated register.

Add

- `m_SpecificICst`, which returns true when matching a specific value..
- `m_ZeroInt`, which returns true when an integer 0 is matched.
- `m_Neg`, which returns when a register is negated.

Also update a few places which use idioms related to the new matchers.

Differential Revision: https://reviews.llvm.org/D91397
2020-11-13 09:24:54 -08:00
Amara Emerson
f347d78cca [AArch64][GlobalISel] Add AArch64::G_DUPLANE[X] opcodes for lane duplicates.
These were previously handled by pattern matching shuffles in the selector, but
adding a new opcode and making it equivalent to the AArch64duplane SDAG node
allows us to select more patterns, like lane indexed FMLAs (patch adding a test
for that will be committed later).

The pattern matching code has been simply moved to postlegalize lowering.

Differential Revision: https://reviews.llvm.org/D90820
2020-11-05 11:18:11 -08:00
Jessica Paquette
19dc9c9780 [AArch64][GlobalISel] Move imm adjustment for G_ICMP to post-legalizer lowering
Move the code which adjusts the immediate/predicate on a G_ICMP to
AArch64PostLegalizerLowering.

This

- Reduces the number of places we need to test for optimized compares in the
selector. We know that the compare should have been simplified by the time it
hits the selector, so we can avoid testing this in selects, brconds, etc.

- Allows us to potentially fold more compares (previously, this optimization
was only done after calling `tryFoldCompare`, this may allow us to hit some more
TST cases)

- Simplifies the selection code in `emitIntegerCompare` significantly; we can
just use an emitSUBS function.

- Allows us to avoid checking that the predicate has been updated after
`emitIntegerCompare`.

Also add a utility header file for things that may be useful in the selector
and various combiners. No need for an implementation file at this point, since
it's just one constexpr function for now. I've run into a couple cases where
having one of these would be handy, so might as well add it here. There are
a couple functions in the selector that can probably be factored out into
here.

Differential Revision: https://reviews.llvm.org/D89823
2020-10-22 15:27:36 -07:00
Jessica Paquette
147b9497e7 [AArch64][GlobalISel] Split post-legalizer combiner to allow for lowering at -O0
There are a lot of combines in AArch64PostLegalizerCombiner which exist to
facilitate instruction matching in the selector. (E.g. matching for G_ZIP and
other shuffle vector pseudos)

It still makes sense to select these instructions at -O0.

Matching earlier in a combiner can reduce complexity in the selector
significantly. For example, a good portion of our selection code for compares
would be a lot easier to represent in a combine.

This patch moves matching combines into a "AArch64PostLegalizerLowering"
combiner which runs at all optimization levels.

Also, while we're here, improve the documentation for the
AArch64PostLegalizerCombiner, and fix up the filepath in its file comment.

And also add a 'r' which somehow got dropped from a bunch of function names.

https://reviews.llvm.org/D89820
2020-10-22 14:43:25 -07:00