llvm-project

Author	SHA1	Message	Date
David Green	ac321cbb03	[AArch64][GlobalISel] Legalize Insert vector element (#81453) This attempts to standardize and extend some of the insert vector element lowering. Most notably: - More types are handled by splitting illegal vectors. - The index type for G_INSERT_VECTOR_ELT is canonicalized to TLI.getVectorIdxTy(), similar to extract_vector_element. - Some of the existing patterns now have the index type specified to make sure they can apply to GISel too. - The C++ selection code has been removed, relying on tablegen patterns. - G_INSERT_VECTOR_ELT with small GPR input elements is pre-selected to use an i32 type, allowing the existing patterns to apply. - Variable index inserts are lowered in post-legalizer lowering, expanding into a stack store and reload.	2024-04-08 08:44:13 +01:00
Eli Friedman	c83f23d6ab	[AArch64] Fix heuristics for folding "lsl" into load/store ops. (#86894 ) The existing heuristics were assuming that every core behaves like an Apple A7, where any extend/shift costs an extra micro-op... but in reality, nothing else behaves like that. On some older Cortex designs, shifts by 1 or 4 cost extra, but all other shifts/extensions are free. On all other cores, as far as I can tell, all shifts/extensions for integer loads are free (i.e. the same cost as an unshifted load). To reflect this, this patch: - Enables aggressive folding of shifts into loads by default. - Removes the old AddrLSLFast feature, since it applies to everything except A7 (and even if you are explicitly targeting A7, we want to assume extensions are free because the code will almost always run on a newer core). - Adds a new feature AddrLSLSlow14 that applies specifically to the Cortex cores where shifts by 1 or 4 cost extra. I didn't add support for AddrLSLSlow14 on the GlobalISel side because it would require a bunch of refactoring to work correctly. Someone can pick this up as a followup.	2024-04-04 11:25:44 -07:00
Evgenii Kudriashov	d365a45cb3	[GlobalISel] Introduce G_TRAP, G_DEBUGTRAP, G_UBSANTRAP (#84941) Here we introduce three new GMIR instructions to cover a set of trap intrinsics. The idea behind it is that generic intrinsics shouldn't be used with G_INTRINSIC opcode. These new instructions can match perfectly with existing trap ISD nodes. It allows X86, AArch64, RISCV and Mips to reuse SelectionDAG patterns for selection and avoid manual selection. However AMDGPU is an exception. It selects traps during legalization regardless of SelectionDAG or GlobalISel. Since there are not many places where traps are used, this change attempts to clean up all the usages of G_INTRINSIC with trap intrinsics. So, there is no stage when both G_TRAP and G_INTRINSIC_W_SIDE_EFFECTS(@llvm.trap) are allowed.	2024-03-23 13:12:44 +01:00
David Green	601e102bdb	[CodeGen] Use LocationSize for MMO getSize (#84751 ) This is part of #70452 that changes the type used for the external interface of MMO to LocationSize as opposed to uint64_t. This means the constructors take LocationSize, and convert ~UINT64_C(0) to LocationSize::beforeOrAfter(). The getSize methods return a LocationSize. This allows us to be more precise with unknown sizes, not accidentally treating them as unsigned values, and in the future should allow us to add proper scalable vector support but none of that is included in this patch. It should mostly be an NFC. Global ISel is still expected to use the underlying LLT as it needs, and are not expected to see unknown sizes for generic operations. Most of the changes are hopefully fairly mechanical, adding a lot of getValue() calls and protecting them with hasValue() where needed.	2024-03-17 18:15:56 +00:00
Dhruv Chawla (work)	1d900e2984	[AArch64][GlobalISel] Avoid generating inserts for undefs when selecting G_BUILD_VECTOR (#84452 ) It is safe to ignore undef values when selecting G_BUILD_VECTOR as undef values choose random registers for copying values from.	2024-03-12 11:57:07 +05:30
Amara Emerson	f6b825f51e	Revert "Revert "[AArch64][GlobalISel] Fix incorrect selection of monotonic s32->s64 anyext load."" Attempt 2. The first one was trying to call isa<> on an MI reference that was free'd. This reverts commit ee24409c40ff35c3221892d9723331c233ca9f0e.	2024-03-07 23:28:33 -08:00
Florian Mayer	ee24409c40	Revert "[AArch64][GlobalISel] Fix incorrect selection of monotonic s32->s64 anyext load." This reverts commit 7524ad9aa7b1b5003fe554a6ac8e434d50027dfb. Broke sanitizer build bots, e.g. https://lab.llvm.org/buildbot/#/builders/5/builds/41588/steps/9/logs/stdio	2024-03-07 09:43:21 -08:00
Amara Emerson	7524ad9aa7	[AArch64][GlobalISel] Fix incorrect selection of monotonic s32->s64 anyext load. This load isn't selected by tablegen due to the anyext, but wasn't generating a subreg_to_reg. Maybe it shouldn't be formed at all during the combiner but to stop crashes later in codegen select it manually for now.	2024-03-07 00:12:17 -08:00
Fangrui Song	201572e34b	[AArch64] Implement -fno-plt for SelectionDAG/GlobalISel Clang sets the nonlazybind attribute for certain ObjC features. The AArch64 SelectionDAG implementation for non-intrinsic calls (46e36f0953aabb5e5cd00ed8d296d60f9f71b424) is behind a cl option. GCC implements -fno-plt for a few ELF targets. In Clang, -fno-plt also sets the nonlazybind attribute. For SelectionDAG, make the cl option not affect ELF so that non-intrinsic calls to a dso_preemptable function use GOT. Adjust AArch64TargetLowering::LowerCall to handle intrinsic calls. For FastISel, change `fastLowerCall` to bail out when a call is due to -fno-plt. For GlobalISel, handle non-intrinsic calls in CallLowering::lowerCall and intrinsic calls in AArch64CallLowering::lowerCall (where the target-independent CallLowering::lowerCall is not called). The GlobalISel test in `call-rv-marker.ll` is therefore updated. Note: the current -fno-plt -fpic implementation does not use GOT for a preemptable function. Link: #78275 Pull Request: https://github.com/llvm/llvm-project/pull/78890	2024-03-05 13:55:29 -08:00
David Green	3564000490	[AArch64][GlobalISel] FNeg constant materialization (#80643 ) This is a Global ISel equivalent of #80641, creating fneg(movi) instead of the alternative constant pool load or gpr dup.	2024-02-15 16:22:12 +00:00
Jay Foad	d57515bd10	[LLT] Add and use isPointerVector and isPointerOrPointerVector. NFC. (#81283 )	2024-02-13 08:21:35 +00:00
David Green	887ed6d287	[AArch64][GlobalISel] Remove mulh c++ lowering (#81105 ) I believe these should be selectable via tablegen patterns nowadays.	2024-02-11 11:20:53 +00:00
chuongg3	10943695f7	[AArch64][GlobalISel] Legalize BSWAP for Vector Types (#80036) Add support of i16 vector operation for BSWAP and change to TableGen to select instructions. Handle vector types that are smaller/larger than legal for BSWAP.	2024-02-02 10:45:46 +00:00
David Green	f297d0bc6d	[AArch64][GlobalISel] More FCmp legalization. (#78734) This fills out the fcmp handling to be more like the other instructions, adding better support for fp16 and some larger vectors. Select of f16 values is still not handled optimally in places as the select is only legal for s32 values, not s16. This would be correct for integer but not necessarily for fp. It is as if we need to do legalization -> regbankselect -> extra legalization -> selection.	2024-01-28 15:42:36 +00:00
Amara Emerson	c32d02efd2	[AArch64][GlobalISel] Fix not extending GPR32->GPR64 result of anyext indexed load. Was causing assertions to fail.	2024-01-15 08:22:39 -08:00
Anatoly Trosinenko	8c777415a6	[AArch64][GISel] Drop custom selectors for ptrauth_* intrinsics (#75328 ) Drop custom selector code for ptrauth_(sign\|strip\|blend) intrinsics from AArch64InstructionSelector::selectIntrinsic function. The code for strip and blend intrinsics was needed because of a bug in TableGen fixed in 78623b079b3be841e96ce968ae5156fe26f6c565. The ptrauth_sign intrinsic was presumably fixed long ago.	2023-12-19 18:14:46 +03:00
chuongg3	a604c4b562	[AArch64][GlobalISel] TableGen Selection for G_VECREDUCE_ADD (#70785 ) Instruction Selection for G_VECREDUCE_ADD now uses TableGen	2023-11-13 10:32:24 +00:00
Fangrui Song	a62b86a3e6	[AArch64,ELF] Restrict MOVZ/MOVK to non-PIC large code model (#70178) There is no PIC support for -mcmodel=large (https://github.com/ARM-software/abi-aa/blob/main/sysvabi64/sysvabi64.rst) and Clang recently rejects -mcmodel= with PIC (#70262). The current backend code assumes that the large code model is non-PIC. This patch adds `!getTargetMachine().isPositionIndependent()` conditions to clarify that the support is non-PIC only. In addition, add some tests as change detectors in case PIC large code model is supported in the future. If other front-ends/JITs use the large code model with PIC, they will get small code model code sequence, instead of potentially-incorrect MOVZ/MOVK sequence, which is only suitable for non-PIC. The sequence will cause text relocations using ELF linkers. (The small code model code sequence is usually sufficient as ADRP+ADD or ADRP+LDR targets [-2^32,2^32), which has a doubled range of x86-64 R_X86_64_REX_GOTPCRELX/R_X86_64_PC32 [-2^31,2^31).)	2023-11-01 12:10:44 -07:00
Antonio Frighetto	9fe5700611	[AArch64] Add support for v8.4a `ldapur`/`stlur` AArch64 backend now features v8.4a atomic Load-Acquire RCpc and Store-Release register unscaled support.	2023-10-30 19:27:48 +01:00
Amara Emerson	c9e8b73694	[AArch64][GlobalISel] Add support for extending indexed loads. (#70373 )	2023-10-26 13:38:09 -07:00
Benjamin Kramer	f7de498403	[AArch64][GlobalISel] Fold variable into assert Avoids unused variable warnings in release builds. NFCI.	2023-10-26 20:39:11 +02:00
Amara Emerson	93659947d2	[AArch64][GlobalISel] Add support for pre-indexed loads/stores. (#70185 ) The pre-index matcher just needs some small heuristics to make sure it doesn't cause regressions. Apart from that it's a simple change, since the only difference is an immediate operand of '1' vs '0' in the instruction.	2023-10-26 10:29:12 -07:00
Amara Emerson	1b11729dc0	[AArch64][GlobalISel] Add support for post-indexed loads/stores. (#69532 ) Gives small code size improvements across the board at -Os CTMark. Much of the work is porting the existing heuristics in the DAGCombiner.	2023-10-24 13:51:59 -07:00
Tobias Stadler	b1a6b2cc40	[AArch64][GlobalISel] Fix miscompile on carry-in selection (#68840 ) Eliding the vReg to NZCV conversion instruction for G_UADDE/... is illegal if it causes the carry generating instruction to become dead because ISel will just remove the dead instruction. I accidentally introduced this here: https://reviews.llvm.org/D153164. As far as I can tell, this is not exposed on the default clang settings, because on O0 there is always a G_AND between boolean defs and uses, so the optimization doesn't apply. Thus, when I tried to commit https://reviews.llvm.org/D159140, which removes these G_ANDs on O0, I broke some UBSan tests. We fix this by recursively selecting the previous (NZCV-setting) instruction before continuing selection for the current instruction.	2023-10-19 19:50:46 +02:00
David Green	20fc2ffb15	[AArch64][GlobalISel] Handle fp constant splats This changes the DUP(constant) -> MOVI code to handle either integer or fp types, allowing more constant to be selected, and fixes up some cases where fp constants were being incorrectly selected.	2023-10-04 08:50:21 +01:00
chuongg3	140a094f5f	[AArch64][GlobalISel] More type support for G_VECREDUCE_ADD (#67433 ) G_VECREDUCE_ADD is now able to have v4i16 and v8i8 vector types as source registers	2023-09-28 11:47:26 +01:00
Mark Harley	eb96d6e2fb	[AArch64][GlobalISel] Vector Constant Materialization Vector constants are always lowered via constant pool loads. This patch selects MOVI/MVNI in more cases where appropriate.	2023-09-25 13:40:33 +01:00
Kazu Hirata	ce8c22856e	Use llvm::drop_begin and llvm::drop_end (NFC)	2023-09-22 17:29:10 -07:00
Vladislav Dzhidzhoev	4e970d7bd8	[AArch64][GlobalISel] Select llvm.aarch64.neon.st* intrinsics (#65491 ) Similar to llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp	2023-09-15 16:35:21 +02:00
Vladislav Dzhidzhoev	c464896dbe	[AArch64][GlobalISel] Select llvm.aarch64.neon.ld* intrinsics (#65630 ) Similar to llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp.	2023-09-15 14:03:48 +02:00
David Green	41fd143244	[AArch64][GlobalISel] Fix \|\| / && precedence warning in assert. NFC	2023-09-10 12:48:44 +01:00
Vladislav Dzhidzhoev	0de6baab91	[AArch64][GlobalISel] Look through COPY and G_BITCAST while selecting fcvtl2 (fpext) It tackles some regressions introduced in https://reviews.llvm.org/D144670.	2023-09-07 14:08:20 +02:00
Amara Emerson	49d5bb4b34	[AArch64][GlobalISel] Materialize 64b FP immediates instead of loading if profitable. This just mimics what the SDAG backend does.	2023-08-31 22:23:36 -07:00
Daniel Paoliello	0c5c7b52f0	Emit the CodeView `S_ARMSWITCHTABLE` debug symbol for jump tables The CodeView `S_ARMSWITCHTABLE` debug symbol is used to describe the layout of a jump table. It contains the following information: * The address of the branch instruction that uses the jump table. * The address of the jump table. * The "base" address that the values in the jump table are relative to. * The type of each entry (absolute pointer, a relative integer, a relative integer that is shifted). Together this information can be used by debuggers and binary analysis tools to understand what a jump table indirect branch is doing and where it might jump to. Documentation for the symbol can be found in the Microsoft PDB library dumper: `0fe89a942f/cvdump/dumpsym7.cpp (L5518)` This change adds support to LLVM to emit the `S_ARMSWITCHTABLE` debug symbol as well as to dump it out (for testing purposes). Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D149367	2023-08-31 12:06:50 -07:00
Arthur Eubanks	0a4fc4ac1c	Revert "Emit the CodeView `S_ARMSWITCHTABLE` debug symbol for jump tables" This reverts commit 8d0c3db388143f4e058b5f513a70fd5d089d51c3. Causes crashes, see comments in https://reviews.llvm.org/D149367. Some follow-up fixes are also reverted: This reverts commit 636269f4fca44693bfd787b0a37bb0328ffcc085. This reverts commit 5966079cf4d4de0285004eef051784d0d9f7a3a6. This reverts commit e7294dbc85d24a08c716d9babbe7f68390cf219b.	2023-08-25 18:34:15 -07:00
Daniel Paoliello	8d0c3db388	Emit the CodeView `S_ARMSWITCHTABLE` debug symbol for jump tables The CodeView `S_ARMSWITCHTABLE` debug symbol is used to describe the layout of a jump table. It contains the following information: * The address of the branch instruction that uses the jump table. * The address of the jump table. * The "base" address that the values in the jump table are relative to. * The type of each entry (absolute pointer, a relative integer, a relative integer that is shifted). Together this information can be used by debuggers and binary analysis tools to understand what a jump table indirect branch is doing and where it might jump to. Documentation for the symbol can be found in the Microsoft PDB library dumper: `0fe89a942f/cvdump/dumpsym7.cpp (L5518)` This change adds support to LLVM to emit the `S_ARMSWITCHTABLE` debug symbol as well as to dump it out (for testing purposes). Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D149367	2023-08-25 10:19:17 -07:00
David Green	42b3419339	[AArch64] Split LSLFast into Addr and ALU parts As far as I can tell FeatureLSLFast was originally added to specify that a lsl of <= 3 was cheap when folded into an addressing operand, so should override the one-use checks usually intended to make sure we don't perform redundant work. At a later point it also came to mean that add x0, x1, x2, lsl N with N <= 4 was cheap, in that it took a single cycle, not the multiple cycles that more complex adds usually take. This patch splits those two concepts out into separate subtarget features. The biggest change is the change to AArch64DAGToDAGISel::isWorthFoldingALU, making ALU operations now produce a ADDWrs if the shift is <= 4. Otherwise the patch is mostly an NFC as it tries to keep the subtarget features the same for each cpu. I believe that the Arm OoO CPUs should eventually be changed to a new subtarget feature that specifies that a shift of 2 or 3 with any extend should be treated as cheap (just not shifts of 1 or 4). Differential Revision: https://reviews.llvm.org/D157982	2023-08-18 08:59:24 +01:00
David Green	cf65afbf93	[AArch64][GISel] Extend lowering for fp round intrinsics. This extends the lowering of ceil, floor, nearbyint, rint, round, roundeven and trunc. They are all very similar, so can reuse the same legalization info. selectIntrinsicTrunc and selectIntrinsicRound can be removed as they can be selected via tablegen patterns, and G_INTRINSIC_ROUNDEVEN is marked as a gisel equivalent of froundeven. Otherwise this reuses the existing code, filling it out to handle more types. Differential Revision: https://reviews.llvm.org/D157679	2023-08-17 16:25:32 +01:00
Sameer Sahasrabuddhe	7c760b224b	Restore "[GlobalISel] GIntrinsic subclass to represent intrinsics in Generic Machine IR" Some opcodes in generic MIR represent calls to intrinsics, where the intrinsic ID is the first non-def operand to the instruction. These are now represented as a subclass of GenericMachineInstr, and the method MachineInstr::getIntrinsicID() is now moved to this subclass GIntrinsic. Some target-defined instructions behave like GMIR intrinsics, and have an Intrinsic::ID operand. But they should not be recognized as generic intrinsics, and should not use GIntrinsic::getIntrinsicID(). Separated these out by introducing a new AMDGPU::getIntrinsicID(). Reviewed By: arsenm, Pierre-vh Differential Revision: https://reviews.llvm.org/D155556 This restores commit baa3386edb11a2f9bcadda8cf58d56f3707c39fa. Originally reverted in d0f7850b01cf17e50a4f4b00e3b84dded94df6b8.	2023-07-27 14:49:17 +05:30
David Green	495bdfc7bb	[AArch64] Lower fcvtl2 (fpext) via tablegen patterns. This patch does two things. First it removes the tryHighFPExt DAG2DAG method used to select fcvtl2 instructions, using tablegen patterns through SelectExtractHigh instead. This essentially undoes D71515, in a way that should hopefully avoid any regressions. The second is that a GI equivalent of SelectExtractHigh is added in selectExtractHigh, from G_UNMERGE_VALUES. The end result is that GlobalISel (and some constrained fpext) can now make use of the fcvtl2 instructions, saving an extra dup/ext. Differential Revision: https://reviews.llvm.org/D155871	2023-07-23 19:17:11 +01:00
hezuoqiang	f4ba1db5bf	[GlobalISel] Fix the error transformation of BRCOND to BCC Fix https://github.com/llvm/llvm-project/issues/62309 Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D150527	2023-07-13 21:20:01 +08:00
pvanhout	8444038d16	[AMDGPU] Use GlobalISel MatchTable Combiner Backend Use the new matchtable-based combiner backend for all AMDGPU combiners. This is a drop-in replacement from the user's perspective; there are no test changes, the new combiner behaves exactly like the old one. Depends on D153757 NOTE: This would land iff D153757 (RFC) lands too. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D153758	2023-07-11 11:27:13 +02:00
pvanhout	1fe7d9c799	[GlobalISel] Generalize `InstructionSelector` Match Tables Makes `InstructionSelector.h`/`InstructionSelectorImpl.h` generic so the match tables can also be used for the combiner. Some notes: - Coverage was made an optional parameter of `executeMatchTable`, combines won't use it for now. - `GIPFP_` -> `GICXXPred_` so it's more generic. Those are just C++ predicates and aren't PatFrag-specific. - Pass the MatcherState directly to testMIPredicate_MI, the combiner will need it. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D153755	2023-07-11 09:42:30 +02:00
Ahmed Bougacha	b3272f5ddb	[AArch64][PAC] Select MOVK for ptrauth.blend intrinsic. Blend combines two discriminator values used by other ptrauth ops. On AArch64 here, it does that by replacing the high 16 bits of the LHS with the low 16 bits of the RHS. Usually the RHS is a constant, which lets us do this efficiently in a single MOVK. When the RHS isn't constant, we can do a BFI. In a sense, this is implementing an ABI decision (how to lower the software construct of "blend"), but if there are interesting variants to consider, this could be made object-file-format-specific in some way. Differential Revision: https://reviews.llvm.org/D132384	2023-06-26 09:43:37 -07:00
Tobias Stadler	84a6a057e6	[AArch64][GlobalISel] Select G_UADDE/G_SADDE/G_USUBE/G_SSUBE This implements the remaining overflow generating instructions in the AArch64 GlobalISel selector. Now wide add/sub operations do not fallback to SelectionDAG anymore. We make use of PostSelectOptimize to cleanup the hereby generated flag-setting operations when the carry-out is unused. Since we do not fallback anymore when selecting add/sub atomics on O0 some test changes were required there. Fixes: https://github.com/llvm/llvm-project/issues/59407 Differential Revision: https://reviews.llvm.org/D153164	2023-06-25 14:32:00 -07:00
David Green	68a09c9290	[AArch64] Remove G_VECREDUCE_FADD from selectReduction I believe that for fp reductions we can use the imported tablegen patterns for selection, as opposed to going via selectReduction. Integer reductions are more difficult, as the return types in selection DAG will be promoted to i32. Differential Revision: https://reviews.llvm.org/D153244	2023-06-22 12:46:54 +01:00
Vladislav Dzhidzhoev	e291179e2d	[AArch64][GlobalISel] Selection support for v8s8, v4s16, v16s8 G_INSERT_VECTOR_ELT with GPR scalar This is to support some NEON intrinsics on GlobalISel. Differential Revision: https://reviews.llvm.org/D146780	2023-05-19 17:38:22 +02:00
Kazu Hirata	b9c4b95b11	[llvm] Use ConstantInt::{isZero,isOne} (NFC)	2023-03-21 17:40:35 -07:00
Kazu Hirata	a7baaab952	Use APInt::isZero instead of APInt::isNullValue (NFC) Note that APInt::isNullValue has been soft-deprecated in favor of APInt::isZero.	2023-02-19 22:23:58 -08:00
Kazu Hirata	7e6e636fb6	Use llvm::has_single_bit<uint32_t> (NFC) This patch replaces isPowerOf2_32 with llvm::has_single_bit<uint32_t> where the argument is wider than uint32_t.	2023-02-15 22:17:27 -08:00
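
Commit 7e6e636fb6 spells out `<uint32_t>` even where the argument is wider than 32 bits. A minimal sketch of why the explicit type can matter, assuming the intent is to keep the 32-bit truncation that `isPowerOf2_32` applied to its argument. `hasSingleBit`, `Wide`, `FullWidth` and `Truncated` are hypothetical names written for this illustration and are not LLVM code:

```cpp
#include <cstdint>

// Hypothetical stand-in for a has_single_bit-style helper (not LLVM's code).
// True when exactly one bit of x is set, i.e. x is a power of two.
template <typename T>
constexpr bool hasSingleBit(T x) {
  return x != 0 && (x & (x - 1)) == 0;
}

// 2^32 + 1 has two bits set when viewed as a 64-bit value.
constexpr std::uint64_t Wide = 0x100000001ULL;

// T is deduced as uint64_t, so both set bits are seen.
constexpr bool FullWidth = hasSingleBit(Wide);

// T is forced to uint32_t, so the argument is first truncated to its
// low 32 bits (value 1) and only then tested.
constexpr bool Truncated = hasSingleBit<std::uint32_t>(Wide);
```

With these definitions `FullWidth` is false and `Truncated` is true, so dropping the explicit template argument would change the result for such inputs.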

252 Commits
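
The ptrauth.blend commit (b3272f5ddb) describes the blend as replacing the high 16 bits of the LHS with the low 16 bits of the RHS. A minimal sketch of that bit manipulation on plain 64-bit integers, assuming nothing beyond that one sentence. `blendDiscriminator` and its parameter names are hypothetical, and this models only the arithmetic, not the instruction selection:

```cpp
#include <cstdint>

// Hypothetical model of the blend described in b3272f5ddb (not LLVM code):
// keep bits [47:0] of lhs and place the low 16 bits of rhs in bits [63:48].
constexpr std::uint64_t blendDiscriminator(std::uint64_t lhs,
                                           std::uint64_t rhs) {
  return (lhs & 0x0000FFFFFFFFFFFFULL) | ((rhs & 0xFFFFULL) << 48);
}

// Example: the top 16 bits 0x1122 of lhs are replaced by 0xABCD.
constexpr std::uint64_t Blended =
    blendDiscriminator(0x1122334455667788ULL, 0xABCDULL);
```

This is the same effect the commit message attributes to a single MOVK when the RHS is a constant: only the top 16-bit lane of the register changes.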