252 Commits

Author SHA1 Message Date
David Green
ac321cbb03
[AArch64][GlobalISel] Legalize Insert vector element (#81453)
This attempts to standardize and extend some of the insert vector
element lowering. Most notably:
- More types are handled by splitting illegal vectors.
- The index type for G_INSERT_VECTOR_ELT is canonicalized to
  TLI.getVectorIdxTy(), similar to extact_vector_element.
- Some of the existing patterns now have the index type specified to
  make sure they can apply to GISel too.
- The C++ selection code has been removed, relying on tablegen patterns.
- G_INSERT_VECTOR_ELT with small GPR input elements are pre-selected to
  use a i32 type, allowing the existing patterns to apply.
- Variable index inserts are lowered in post-legalizer lowering,
  expanding into a stack store and reload.
2024-04-08 08:44:13 +01:00
Eli Friedman
c83f23d6ab
[AArch64] Fix heuristics for folding "lsl" into load/store ops. (#86894)
The existing heuristics were assuming that every core behaves like an
Apple A7, where any extend/shift costs an extra micro-op... but in
reality, nothing else behaves like that.

On some older Cortex designs, shifts by 1 or 4 cost extra, but all other
shifts/extensions are free. On all other cores, as far as I can tell,
all shifts/extensions for integer loads are free (i.e. the same cost as
an unshifted load).

To reflect this, this patch:

- Enables aggressive folding of shifts into loads by default.

- Removes the old AddrLSLFast feature, since it applies to everything
except A7 (and even if you are explicitly targeting A7, we want to
assume extensions are free because the code will almost always run on a
newer core).

- Adds a new feature AddrLSLSlow14 that applies specifically to the
Cortex cores where shifts by 1 or 4 cost extra.

I didn't add support for AddrLSLSlow14 on the GlobalISel side because it
would require a bunch of refactoring to work correctly. Someone can pick
this up as a followup.
2024-04-04 11:25:44 -07:00
Evgenii Kudriashov
d365a45cb3
[GlobalISel] Introduce G_TRAP, G_DEBUGTRAP, G_UBSANTRAP (#84941)
Here we introduce three new GMIR instructions to cover a set of trap
intrinsics. The idea behind it is that generic intrinsics shouldn't be
used with G_INTRINSIC opcode.

These new instructions can match perfectly with existing trap ISD nodes.
It allows X86, AArch64, RISCV and Mips to reuse SelectionDAG patterns for
selection and avoid manual selection. However AMDGPU is an exception. It
selects traps during legalization regardless SelectionDAG or GlobalISel.

Since there are not many places where traps are used, this change
attempts to clean up all the usages of G_INTRINSIC with trap intrinsics. So,
there is no stage when both G_TRAP and
G_INTRINSIC_W_SIDE_EFFECTS(@llvm.trap) are allowed.
2024-03-23 13:12:44 +01:00
David Green
601e102bdb
[CodeGen] Use LocationSize for MMO getSize (#84751)
This is part of #70452 that changes the type used for the external
interface of MMO to LocationSize as opposed to uint64_t. This means the
constructors take LocationSize, and convert ~UINT64_C(0) to
LocationSize::beforeOrAfter(). The getSize methods return a
LocationSize.

This allows us to be more precise with unknown sizes, not accidentally
treating them as unsigned values, and in the future should allow us to
add proper scalable vector support but none of that is included in this
patch. It should mostly be an NFC.

Global ISel is still expected to use the underlying LLT as it needs, and
are not expected to see unknown sizes for generic operations. Most of
the changes are hopefully fairly mechanical, adding a lot of getValue()
calls and protecting them with hasValue() where needed.
2024-03-17 18:15:56 +00:00
Dhruv Chawla (work)
1d900e2984
[AArch64][GlobalISel] Avoid generating inserts for undefs when selecting G_BUILD_VECTOR (#84452)
It is safe to ignore undef values when selecting G_BUILD_VECTOR as undef
values choose random registers for copying values from.
2024-03-12 11:57:07 +05:30
Amara Emerson
f6b825f51e Revert "Revert "[AArch64][GlobalISel] Fix incorrect selection of monotonic s32->s64 anyext load.""
Attempt 2. The first one was trying to call isa<> on an MI reference that was free'd.

This reverts commit ee24409c40ff35c3221892d9723331c233ca9f0e.
2024-03-07 23:28:33 -08:00
Florian Mayer
ee24409c40 Revert "[AArch64][GlobalISel] Fix incorrect selection of monotonic s32->s64 anyext load."
This reverts commit 7524ad9aa7b1b5003fe554a6ac8e434d50027dfb.

Broke sanitizer build bots, e.g. https://lab.llvm.org/buildbot/#/builders/5/builds/41588/steps/9/logs/stdio
2024-03-07 09:43:21 -08:00
Amara Emerson
7524ad9aa7 [AArch64][GlobalISel] Fix incorrect selection of monotonic s32->s64 anyext load.
This load isn't selected by tablegen due to the anyext, but wasn't generating
a subreg_to_reg. Maybe it shouldn't be formed at all during the combiner but to stop
crashes later in codegen select it manually for now.
2024-03-07 00:12:17 -08:00
Fangrui Song
201572e34b
[AArch64] Implement -fno-plt for SelectionDAG/GlobalISel
Clang sets the nonlazybind attribute for certain ObjC features. The
AArch64 SelectionDAG implementation for non-intrinsic calls
(46e36f0953aabb5e5cd00ed8d296d60f9f71b424) is behind a cl option.

GCC implements -fno-plt for a few ELF targets. In Clang, -fno-plt also
sets the nonlazybind attribute. For SelectionDAG, make the cl option not
affect ELF so that non-intrinsic calls to a dso_preemptable function use
GOT. Adjust AArch64TargetLowering::LowerCall to handle intrinsic calls.

For FastISel, change `fastLowerCall` to bail out when a call is due to
-fno-plt.

For GlobalISel, handle non-intrinsic calls in CallLowering::lowerCall
and intrinsic calls in AArch64CallLowering::lowerCall (where the
target-independent CallLowering::lowerCall is not called).
The GlobalISel test in `call-rv-marker.ll` is therefore updated.

Note: the current -fno-plt -fpic implementation does not use GOT for a
preemptable function.

Link: #78275

Pull Request: https://github.com/llvm/llvm-project/pull/78890
2024-03-05 13:55:29 -08:00
David Green
3564000490
[AArch64][GlobalISel] FNeg constant materialization (#80643)
This is a Global ISel equivalent of #80641, creating fneg(movi) instead
of the alternative constant pool load or gpr dup.
2024-02-15 16:22:12 +00:00
Jay Foad
d57515bd10
[LLT] Add and use isPointerVector and isPointerOrPointerVector. NFC. (#81283) 2024-02-13 08:21:35 +00:00
David Green
887ed6d287
[AArch64][GlobalISel] Remove mulh c++ lowering (#81105)
I believe these should be selectable via tablegen patterns nowadays.
2024-02-11 11:20:53 +00:00
chuongg3
10943695f7
[AArch64][GlobalISel] Legalize BSWAP for Vector Types (#80036)
Add support of i16 vector operation for BSWAP and change to TableGen to
select instructions

Handle vector types that are smaller/larger than legal for BSWAP
2024-02-02 10:45:46 +00:00
David Green
f297d0bc6d
[AArch64][GlobalISel] More FCmp legalization. (#78734)
This fills out the fcmp handling to be more like the other instructions,
adding better support for fp16 and some larger vectors.

Select of f16 values is still not handled optimally in places as the
select is only legal for s32 values, not s16. This would be correct for
integer but not necessarily for fp. It is as if we need to do
legalization -> regbankselect -> extra legaliation -> selection.
2024-01-28 15:42:36 +00:00
Amara Emerson
c32d02efd2 [AArch64][GlobalISel] Fix not extending GPR32->GPR64 result of anyext indexed load.
Was causing assertions to fail.
2024-01-15 08:22:39 -08:00
Anatoly Trosinenko
8c777415a6
[AArch64][GISel] Drop custom selectors for ptrauth_* intrinsics (#75328)
Drop custom selector code for ptrauth_(sign|strip|blend) intrinsics from
AArch64InstructionSelector::selectIntrinsic function.

The code for strip and blend intrinsics was needed because of a bug in
TableGen fixed in 78623b079b3be841e96ce968ae5156fe26f6c565. The
ptrauth_sign intrinsic was presumably fixed long ago.
2023-12-19 18:14:46 +03:00
chuongg3
a604c4b562
[AArch64][GlobalISel] TableGen Selection for G_VECREDUCE_ADD (#70785)
Instruction Selection for G_VECREDUCE_ADD now uses TableGen
2023-11-13 10:32:24 +00:00
Fangrui Song
a62b86a3e6
[AArch64,ELF] Restrict MOVZ/MOVK to non-PIC large code model (#70178)
There is no PIC support for -mcmodel=large

(https://github.com/ARM-software/abi-aa/blob/main/sysvabi64/sysvabi64.rst)
and Clang recently rejects -mcmodel= with PIC (#70262).

The current backend code assumes that the large code model is non-PIC.
This patch adds `!getTargetMachine().isPositionIndependent()` conditions
to clarify that the support is non-PIC only. In addition, add some tests
as change detectors in case PIC large code model is supported in the
future.

If other front-ends/JITs use the large code model with PIC, they will
get small code model code sequence, instead of potentially-incorrect
MOVZ/MOVK sequence, which is only suitable for non-PIC. The sequence
will cause text relocations using ELF linkers.

(The small code model code sequence is usually sufficient as ADRP+ADD or
ADRP+LDR targets [-2**32,2**32), which has a doubled range of x86-64
R_X86_64_REX_GOTPCRELX/R_X86_64_PC32 [-2**32,2**32).)
2023-11-01 12:10:44 -07:00
Antonio Frighetto
9fe5700611 [AArch64] Add support for v8.4a ldapur/stlur
AArch64 backend now features v8.4a atomic Load-Acquire
RCpc and Store-Release register unscaled support.
2023-10-30 19:27:48 +01:00
Amara Emerson
c9e8b73694
[AArch64][GlobalISel] Add support for extending indexed loads. (#70373) 2023-10-26 13:38:09 -07:00
Benjamin Kramer
f7de498403 [AArch64][GlobalISel] Fold variable into assert
Avoids unused variable warnings in release builds. NFCI.
2023-10-26 20:39:11 +02:00
Amara Emerson
93659947d2
[AArch64][GlobalISel] Add support for pre-indexed loads/stores. (#70185)
The pre-index matcher just needs some small heuristics to make sure it
doesn't cause regressions. Apart from that it's a simple change, since
the only difference is an immediate operand of '1' vs '0' in the
instruction.
2023-10-26 10:29:12 -07:00
Amara Emerson
1b11729dc0
[AArch64][GlobalISel] Add support for post-indexed loads/stores. (#69532)
Gives small code size improvements across the board at -Os CTMark.

Much of the work is porting the existing heuristics in the DAGCombiner.
2023-10-24 13:51:59 -07:00
Tobias Stadler
b1a6b2cc40
[AArch64][GlobalISel] Fix miscompile on carry-in selection (#68840)
Eliding the vReg to NZCV conversion instruction for G_UADDE/... is illegal if
it causes the carry generating instruction to become dead because ISel
will just remove the dead instruction.
I accidentally introduced this here: https://reviews.llvm.org/D153164.
As far as I can tell, this is not exposed on the default clang settings,
because on O0 there is always a G_AND between boolean defs and uses, so
the optimization doesn't apply. Thus, when I tried to commit
https://reviews.llvm.org/D159140, which removes these G_ANDs on O0, I
broke some UBSan tests.
We fix this by recursively selecting the previous (NZCV-setting) instruction before continuing selection for the current instruction.
2023-10-19 19:50:46 +02:00
David Green
20fc2ffb15 [AArch64][GlobalISel] Handle fp constant splats
This changes the DUP(constant) -> MOVI code to handle either integer or fp
types, allowing more constant to be selected, and fixes up some cases where fp
constants were being incorrectly selected.
2023-10-04 08:50:21 +01:00
chuongg3
140a094f5f
[AArch64][GlobalISel] More type support for G_VECREDUCE_ADD (#67433)
G_VECREDUCE_ADD is now able to have v4i16 and v8i8 vector types as
source registers
2023-09-28 11:47:26 +01:00
Mark Harley
eb96d6e2fb [AArch64][GlobalISel] Vector Constant Materialization
Vector constants are always lowered via constant pool loads. This patch selects
MOVI/MVNI in more cases where appropriate.
2023-09-25 13:40:33 +01:00
Kazu Hirata
ce8c22856e Use llvm::drop_begin and llvm::drop_end (NFC) 2023-09-22 17:29:10 -07:00
Vladislav Dzhidzhoev
4e970d7bd8
[AArch64][GlobalISel] Select llvm.aarch64.neon.st* intrinsics (#65491)
Similar to llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp
2023-09-15 16:35:21 +02:00
Vladislav Dzhidzhoev
c464896dbe
[AArch64][GlobalISel] Select llvm.aarch64.neon.ld* intrinsics (#65630)
Similar to llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp.
2023-09-15 14:03:48 +02:00
David Green
41fd143244 [AArch64][GlobalISel] Fix || / && precedence warning in assert. NFC 2023-09-10 12:48:44 +01:00
Vladislav Dzhidzhoev
0de6baab91 [AArch64][GlobalISel] Look through COPY and G_BITCAST while selecting fcvtl2 (fpext)
It tackles some regressions introduced in
https://reviews.llvm.org/D144670.
2023-09-07 14:08:20 +02:00
Amara Emerson
49d5bb4b34 [AArch64][GlobalISel] Materialize 64b FP immediates instead of loading if profitable.
This just mimics what the SDAG backend does.
2023-08-31 22:23:36 -07:00
Daniel Paoliello
0c5c7b52f0 Emit the CodeView S_ARMSWITCHTABLE debug symbol for jump tables
The CodeView `S_ARMSWITCHTABLE` debug symbol is used to describe the layout of a jump table, it contains the following information:

* The address of the branch instruction that uses the jump table.
* The address of the jump table.
* The "base" address that the values in the jump table are relative to.
* The type of each entry (absolute pointer, a relative integer, a relative integer that is shifted).

Together this information can be used by debuggers and binary analysis tools to understand what an jump table indirect branch is doing and where it might jump to.

Documentation for the symbol can be found in the Microsoft PDB library dumper: 0fe89a942f/cvdump/dumpsym7.cpp (L5518)

This change adds support to LLVM to emit the `S_ARMSWITCHTABLE` debug symbol as well as to dump it out (for testing purposes).

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D149367
2023-08-31 12:06:50 -07:00
Arthur Eubanks
0a4fc4ac1c Revert "Emit the CodeView S_ARMSWITCHTABLE debug symbol for jump tables"
This reverts commit 8d0c3db388143f4e058b5f513a70fd5d089d51c3.

Causes crashes, see comments in https://reviews.llvm.org/D149367.

Some follow-up fixes are also reverted:

This reverts commit 636269f4fca44693bfd787b0a37bb0328ffcc085.
This reverts commit 5966079cf4d4de0285004eef051784d0d9f7a3a6.
This reverts commit e7294dbc85d24a08c716d9babbe7f68390cf219b.
2023-08-25 18:34:15 -07:00
Daniel Paoliello
8d0c3db388 Emit the CodeView S_ARMSWITCHTABLE debug symbol for jump tables
The CodeView `S_ARMSWITCHTABLE` debug symbol is used to describe the layout of a jump table, it contains the following information:

* The address of the branch instruction that uses the jump table.
* The address of the jump table.
* The "base" address that the values in the jump table are relative to.
* The type of each entry (absolute pointer, a relative integer, a relative integer that is shifted).

Together this information can be used by debuggers and binary analysis tools to understand what an jump table indirect branch is doing and where it might jump to.

Documentation for the symbol can be found in the Microsoft PDB library dumper: 0fe89a942f/cvdump/dumpsym7.cpp (L5518)

This change adds support to LLVM to emit the `S_ARMSWITCHTABLE` debug symbol as well as to dump it out (for testing purposes).

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D149367
2023-08-25 10:19:17 -07:00
David Green
42b3419339 [AArch64] Split LSLFast into Addr and ALU parts
As far as I can tell FeatureLSLFast was originally added to specify that a lsl
of <= 3 was cheap when folded into an addressing operand, so should override
the one-use checks usually intended to make sure we don't perform redundant
work. At a later point it also came to also mean that add x0, x1, x2, lsl N
with N <= 4 was cheap, in that it took a single cycle not multiple cycles that
more complex adds usually take.

This patch splits those two concepts out into separate subtarget features. The
biggest change is the change to AArch64DAGToDAGISel::isWorthFoldingALU, making
ALU operations now produce a ADDWrs if the shift is <= 4.

Otherwise the patch is mostly an NFC as it tries to keep the subtarget features
the same for each cpu. I believe that the Arm OoO CPUs should eventually be
changed to a new subtarget feature that specifies that a shift of 2 or 3 with
any extend should be treated as cheap (just not shifts of 1 or 4).

Differential Revision: https://reviews.llvm.org/D157982
2023-08-18 08:59:24 +01:00
David Green
cf65afbf93 [AArch64][GISel] Extend lowering for fp round intrinsics.
This extends the lowering of ceil, floor, nearbyint, rint, round, roundeven and
trunc. They are all very similar, so can reuse the same legalization info.
selectIntrinsicTrunc and selectIntrinsicRound can be removed as they can be
selected via tablegen patterns, and G_INTRINSIC_ROUNDEVEN is marked as a gisel
equivalent of froundeven. Otherwise this reuses the existing code, filling it
out to handle more types.

Differential Revision: https://reviews.llvm.org/D157679
2023-08-17 16:25:32 +01:00
Sameer Sahasrabuddhe
7c760b224b Restore "[GlobalISel] GIntrinsic subclass to represent intrinsics in Generic Machine IR"
Some opcodes in generic MIR represent calls to intrinsics, where the intrinsic
ID is the first non-def operand to the instruction. These are now represented as
a subclass of GenericMachineInstr, and the method MachineInstr::getIntrinsicID()
is now moved to this subclass GIntrinsic.

Some target-defined instructions behave like GMIR intrinsics, and have an
Intrinsic::ID operand. But they should not be recognized as generic intrinsics,
and should not use GIntrinsic::getIntrinsicID(). Separated these out by
introducing a new AMDGPU::getIntrinsicID().

Reviewed By: arsenm, Pierre-vh

Differential Revision: https://reviews.llvm.org/D155556

This restores commit baa3386edb11a2f9bcadda8cf58d56f3707c39fa.
Originally reverted in d0f7850b01cf17e50a4f4b00e3b84dded94df6b8.
2023-07-27 14:49:17 +05:30
David Green
495bdfc7bb [AArch64] Lower fcvtl2 (fpext) via tablegen patterns.
This patch does two things. First it removes the tryHighFPExt DAG2DAG method
used to select fcvtl2 instructions, using tablegen patterns through
SelectExtractHigh instead. This essentially undoes D71515, in a way that should
hopefully avoid any regressions. The second is that a GI equivalent of
SelectExtractHigh is added in selectExtractHigh, from G_UNMERGE_VALUES. The
end result is that GlobalISel (and some constrained fpext) can now make use of
the fcvtl2 instructions, saving an extra dup/ext.

Differential Revision: https://reviews.llvm.org/D155871
2023-07-23 19:17:11 +01:00
hezuoqiang
f4ba1db5bf [GlobalISel] Fix the error transformation of BRCOND to BCC
Fix https://github.com/llvm/llvm-project/issues/62309

Reviewed By: dmgreen
Differential Revision: https://reviews.llvm.org/D150527
2023-07-13 21:20:01 +08:00
pvanhout
8444038d16 [AMDGPU] Use GlobalISel MatchTable Combiner Backend
Use the new matchtable-based combiner backend for all AMDGPU combiners.
This drop-in from the user's perspective; there are no test changes, the new combiner behaves exactly like the old one.

Depends on D153757

NOTE: This would land iff D153757 (RFC) lands too.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D153758
2023-07-11 11:27:13 +02:00
pvanhout
1fe7d9c799 [GlobalISel] Generalize InstructionSelector Match Tables
Makes `InstructionSelector.h`/`InstructionSelectorImpl.h` generic so the match tables can also be used for the combiner.

Some notes:
 - Coverage was made an optional parameter of `executeMatchTable`, combines won't use it for now.
 - `GIPFP_` -> `GICXXPred_` so it's more generic. Those are just C++ predicates and aren't PatFrag-specific.
 - Pass the MatcherState directly to testMIPredicate_MI, the combiner will need it.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D153755
2023-07-11 09:42:30 +02:00
Ahmed Bougacha
b3272f5ddb [AArch64][PAC] Select MOVK for ptrauth.blend intrinsic.
Blend combines two discriminator values used by other ptrauth ops.
On AArch64 here, it does that by replacing the high 16 bits of the
LHS with the low 16 bits of the RHS.

Usually the RHS is a constant, which lets us do this efficiently in
a single MOVK.  When the RHS isn't constant, we can do a BFI.

In a sense, this is implementing an ABI decision (how to lower the
software construct of "blend"), but if there are interesting variants to
consider, this could be made object-file-format-specific in some way.

Differential Revision: https://reviews.llvm.org/D132384
2023-06-26 09:43:37 -07:00
Tobias Stadler
84a6a057e6 [AArch64][GlobalISel] Select G_UADDE/G_SADDE/G_USUBE/G_SSUBE
This implements the remaining overflow generating instructions in the AArch64
GlobalISel selector. Now wide add/sub operations do not fallback to SelectionDAG
anymore. We make use of PostSelectOptimize to cleanup the hereby generated
flag-setting operations when the carry-out is unused. Since we do not fallback
anymore when selecting add/sub atomics on O0 some test changes were required
there.

Fixes: https://github.com/llvm/llvm-project/issues/59407

Differential Revision: https://reviews.llvm.org/D153164
2023-06-25 14:32:00 -07:00
David Green
68a09c9290 [AArch64] Remove G_VECREDUCE_FADD from selectReduction
I believe that for fp reductions we can use the imported tablegen patterns for
selection, as opposed to going via selectReduction. Integer reductions are more
difficult, as the return types in selection DAG will be promoted to i32.

Differential Revision: https://reviews.llvm.org/D153244
2023-06-22 12:46:54 +01:00
Vladislav Dzhidzhoev
e291179e2d [AArch64][GlobalISel] Selection support for v8s8, v4s16, v16s8 G_INSERT_VECTOR_ELT with GPR scalar
This is to support some NEON intrinsics on GlobalISel.

Differential Revision: https://reviews.llvm.org/D146780
2023-05-19 17:38:22 +02:00
Kazu Hirata
b9c4b95b11 [llvm] Use ConstantInt::{isZero,isOne} (NFC) 2023-03-21 17:40:35 -07:00
Kazu Hirata
a7baaab952 Use APInt::isZero instead of APInt::isNulLValue (NFC)
Note that APInt::isNullValue has been soft-deprecated in favor of
APInt::isZero.
2023-02-19 22:23:58 -08:00
Kazu Hirata
7e6e636fb6 Use llvm::has_single_bit<uint32_t> (NFC)
This patch replaces isPowerOf2_32 with llvm::has_single_bit<uint32_t>
where the argument is wider than uint32_t.
2023-02-15 22:17:27 -08:00