llvm-project

Author	SHA1	Message	Date
Craig Topper	bc282605df	[SelectionDAG] Require last operand of (STRICT_)FP_ROUND to be a TargetConstant. (#117639 ) Fix all the places I could find that did't do this. We were already mostly correct for FP_ROUND after 9a976f36615dbe15e76c12b22f711b2e597a8e51, but not STRICT_FP_ROUND.	2024-11-25 21:36:33 -08:00
Benjamin Maxwell	cc721dba4e	[AArch64][Codegen] Improve small shufflevector/concat lowering for SME (#116662 ) This now tries to widen the shuffle before generating a possibly expensive SVE TBL, this may allow the shuffle to be matched as something cheaper like a ZIP1.	2024-11-22 10:15:23 +00:00
Paul Walker	71b87d1267	[LLVM][SVE] Ensure all fixed length mask bits are defined. (#116819 ) convertFixedMaskToScalableVector expects the mask input to honour the BoolContents scheme employed by the target. For AArch64 this means a mask should be zero or all ones, and thus when promoting a mask we must use a sign extend.	2024-11-20 13:54:50 +00:00
Sergei Barannikov	a160e51500	[AArch64] Fix SDNode type mismatches between .td files and ISel (#116523 ) `MRS`, `PTEST` and FP comparisons were missing "flags" result, and were sometimes created with invalid types (f32, Glue, Other). * `REV16`, `REV32`, `REV64`, and `CMGEz` were sometimes created with an extra operand. * `TLSDESC_CALLSEQ` had `SDNPInGlue` property, but the node was never created with a glue operand.	2024-11-20 15:55:28 +03:00
David Green	bca846d462	[AArch64] Improve mull generation (#114997 ) This attempts to clean up and improve where we generate smull/umull using known-bits. For v2i64 types (where no mul is present), we try to create mull more aggressively to avoid scalarization.	2024-11-20 09:12:22 +00:00
Craig Topper	ce0cc8e9eb	[AArch64][VE][X86] Use getSignedTargetConstant. NFC	2024-11-18 13:12:23 -08:00
Sergei Barannikov	baf59be89b	[SelectionDAG] Fix return types of TC_RETURN for several targets (#116504 ) TC_RETURN nodes do not have a glue result.	2024-11-17 02:14:05 +03:00
Sergei Barannikov	b69f646c46	[AArch64] Remove unused SDNodes (NFC) (#116236 ) The corresponding enum members were only used by `EmitMOPS`, which immediately translated them to machine opcodes. Just pass the machine opcodes instead.	2024-11-16 13:14:42 +03:00
James Chesterman	4e1db6a318	[AArch64][SVE] Add AArch64ISD nodes for wide add instructions (#115895 ) When lowering from a partial reduction to a pair of wide adds, previously the corresponding intrinsics were returned as nodes. Now there are AArch64ISD nodes that are returned.	2024-11-15 11:01:10 +00:00
Ricardo Jesus	e52238b59f	[AArch64] Add @llvm.experimental.vector.match (#101974 ) This patch introduces an experimental intrinsic for matching the elements of one vector against the elements of another. For AArch64 targets that support SVE2, the intrinsic lowers to a MATCH instruction for supported fixed and scalar vector types.	2024-11-14 09:00:19 +00:00
James Chesterman	c3c2e1e161	[AArch64][SVE] Add codegen support for partial reduction lowering to wide add instructions (#114406 ) For partial reductions in the situation of the number of elements being halved, a pair of wide add instructions can be used.	2024-11-12 13:53:35 +00:00
Kazu Hirata	a41922ad75	[AArch64] Remove unused includes (NFC) (#115685 ) Identified with misc-include-cleaner.	2024-11-11 07:35:08 -08:00
Daniil Kovalev	da083e358e	[PAC][CodeGen][ELF][AArch64] Support signed GOT (#113811 ) This re-applies #96164 after revert in #102434. Support the following relocations and assembly operators: - `R_AARCH64_AUTH_ADR_GOT_PAGE` (`:got_auth:` for `adrp`) - `R_AARCH64_AUTH_LD64_GOT_LO12_NC` (`:got_auth_lo12:` for `ldr`) - `R_AARCH64_AUTH_GOT_ADD_LO12_NC` (`:got_auth_lo12:` for `add`) `LOADgotAUTH` pseudo-instruction is introduced which is later expanded to actual instruction sequence like the following. ``` adrp x16, :got_auth:sym add x16, x16, :got_auth_lo12:sym ldr x0, [x16] autia x0, x16 ``` If a resign is requested, like below, `LOADgotPAC` pseudo is used, and GOT load is lowered similarly to `LOADgotAUTH`. ``` @var = global i32 0 define ptr @resign_globalvar() { ret ptr ptrauth (ptr @var, i32 3, i64 43) } ``` If FPAC bit is not set and auth instruction is emitted, a check+trap sequence similar to one used for `AUT` pseudo is emitted to ensure auth success. Both SelectionDAG and GlobalISel are suppported. For FastISel, we fall back to SelectionDAG. Tests starting with 'ptrauth-' have corresponding variants w/o this prefix. See also specification https://github.com/ARM-software/abi-aa/blob/main/pauthabielf64/pauthabielf64.rst#appendix-signed-got	2024-11-01 12:21:10 +03:00
Yingwei Zheng	cf9d1c1486	[SDAG] Simplify `SDNodeFlags` with bitwise logic (#114061 ) This patch allows using enumeration values directly and simplifies the implementation with bitwise logic. It addresses the comment in https://github.com/llvm/llvm-project/pull/113808#discussion_r1819923625.	2024-10-31 08:10:07 +08:00
David Green	83ae171722	[AArch64] Add ComputeNumSignBits for VASHR. (#113957 ) As with a normal ISD::SRA node, they take the number of sign bits of the incoming value and increase it by the shifted amount.	2024-10-29 21:02:32 +00:00
David Green	8274be509e	[AArch64] Remove header dependencies of AArch64ISelLowering.h. NFC This patch aims to reduce the include used by AArch64ISelLowering, allowing it to be included by unittests so that they can reference the AArch64ISD nodes. It: - Moves the inclusion of AArch64SMEAttributes.h to the uses. - Moves LowerPtrAuthGlobalAddressStatically to a static function, so that AArch64PACKey is not required in the header. - Moves the definitions of getExceptionPointerRegister to the cpp file, to remove the reference of AArch64::X0.	2024-10-28 18:53:37 +00:00
Tex Riddell	c03d09ce3e	[aarch64] atan2 intrinsic lowering (p5) (#112611 ) This change is part of this proposal: https://discourse.llvm.org/t/rfc-all-the-math-intrinsics/78294 - `VecFuncs.def`: define intrinsic to sleef/armpl mapping - `LegalizerHelper.cpp`: add missing fewerElementsVector handling for the new atan2 intrinsic - `AArch64ISelLowering.cpp`: Add arch64 specializations for lowering like neon instructions - `AArch64LegalizerInfo.cpp`: Legalize atan2. Part 5 for Implement the atan2 HLSL Function #70096.	2024-10-24 17:53:12 -07:00
Sander de Smalen	db0e376044	[AArch64] Fix failure with inline asm and svcount (#112537 ) This fixes an issue where the compiler runs into an assertion failure for the following example: register svcount_t pred asm("pn8") = svptrue_c8(); asm("ld1w { z0.s, z4.s, z8.s, z12.s }, %[pred]/z, [x0]\n" : : [pred] "Uph" (pred) : "memory", "cc"); Here the register constraint that ends up in the LLVM IR is "{pn8}", but the code in `TargetRegisterInfo::getRegForInlineAsmConstraint` that parses that string, follows a path where it queries a suitable register class for this register (<=> PPRorPNR regclass), for which it then chooses `nxv16i1` as a suitable type. These choices individually are correct, but the combined result isn't, because the type should be `aarch64svcount`. This then results in issues later on in SelectionDAGBuilder.cpp in CopyToReg because the type of the actual value and the computed type from the constraint don't match. This PR pre-empts this issue by parsing the predicate explicitly and returning the correct register class.	2024-10-24 17:41:07 +01:00
Ricardo Jesus	8a9921f569	[AArch64] Use INDEX for constant Neon step vectors (#113424 ) When compiling for an SVE target we can use INDEX to generate constant fixed-length step vectors, e.g.: ``` uint32x4_t foo() { return (uint32x4_t){0, 1, 2, 3}; } ``` Currently: ``` foo(): adrp x8, .LCPI1_0 ldr q0, [x8, :lo12:.LCPI1_0] ret ``` With INDEX: ``` foo(): index z0.s, #0, #1 ret ``` The logic for this was already in `LowerBUILD_VECTOR`, though it was hidden under a check for `!Subtarget->isNeonAvailable()`. This patch refactors this to enable the corresponding code path unconditionally for constant step vectors (as long as we can use SVE for them).	2024-10-23 15:20:33 +01:00
Paul Walker	f1ade1f874	[LLVM][CodeGen][AArch64] while_le(#,max_int) -> all_active (#111183 ) When the second operand of an incrementing while instruction is the maximum value, comparisons that include equality can never fail.	2024-10-22 12:28:39 +01:00
James Chesterman	11c818816d	[AArch64] Improve index selection for histograms (#111150 ) Removes unnecessary extends on the indices passed into histogram instructions. It also removes the instruction when the mask is zero.	2024-10-22 11:14:00 +01:00
David Green	009fb567ce	[AArch64] Add patterns for combining qxtn+rshr to qrshrn Similar to bd861d0e690cfd05184d86, this adds some patterns for converting signed and unsigned variants of rshr+qxtn to qrshrn.	2024-10-21 21:06:48 +01:00
David Green	bd861d0e69	[AArch64] Add some basic patterns for qshrn. With the truncssat nodes these are relatively simple tablegen patterns to add. The existing intrinsics are converted to shift+truncsat to they can lower using the new patterns. Fixes #112925.	2024-10-21 15:04:20 +01:00
David Green	ecfeacd152	[AArch64] Convert aarch64_neon_sqxtn to ISD::TRUNCATE_SSAT_S and replace tablegen patterns This lowers the aarch64_neon_sqxtn intrinsics to the new TRUNCATE_SSAT_S ISD nodes, performing the same for sqxtun and uqxtn. This allows us to clean up the tablegen patterns a little and in a future commit add combines for sqxtn.	2024-10-21 14:29:21 +01:00
Simon Pilgrim	f0b3b6d15b	[DAG] isConstantIntBuildVectorOrConstantInt - peek through bitcasts (#112710 ) (REAPPLIED) Alter both isConstantIntBuildVectorOrConstantInt + isConstantFPBuildVectorOrConstantFP to return a bool instead of the underlying SDNode, and adjust usage to account for this. Update isConstantIntBuildVectorOrConstantInt to peek though bitcasts when attempting to find a constant, in particular this improves canonicalization of constants to the RHS on commutable instructions. X86 is the beneficiary here as it often bitcasts rematerializable 0/-1 vector constants as vXi32 and bitcasts to the requested type Minor cleanup that helps with #107423 Reapplied after regression fix ba1255def64a9c3c68d97ace051eec76f546eeb0	2024-10-20 14:23:21 +01:00
Martin Storsjö	b26df3e463	Revert "[DAG] isConstantIntBuildVectorOrConstantInt - peek through bitcasts (#112710 )" This reverts commit a630771b28f4b252e2754776b8f3ab416133951a. This caused compilation to hang for Windows/ARM, see https://github.com/llvm/llvm-project/pull/112710 for details.	2024-10-20 00:49:16 +03:00
Simon Pilgrim	a630771b28	[DAG] isConstantIntBuildVectorOrConstantInt - peek through bitcasts (#112710 ) Alter both isConstantIntBuildVectorOrConstantInt + isConstantFPBuildVectorOrConstantFP to return a bool instead of the underlying SDNode, and adjust usage to account for this. Update isConstantIntBuildVectorOrConstantInt to peek though bitcasts when attempting to find a constant, in particular this improves canonicalization of constants to the RHS on commutable instructions. X86 is the beneficiary here as it often bitcasts rematerializable 0/-1 vector constants as vXi32 and bitcasts to the requested type Minor cleanup that helps with #107423	2024-10-18 10:52:55 +01:00
Benjamin Maxwell	5f7502bf1f	[AArch64][SVE] Support lowering fixed-length BUILD_VECTORS to ZIPs (#111698 ) This allows lowering fixed-length (non-constant) BUILD_VECTORS (<= 128-bit) to a chain of ZIP1 instructions when Neon is not available, rather than using the default lowering, which is to spill to the stack and reload. For example, ``` t5: v4f32 = BUILD_VECTOR(t0, t1, t2, t3) ``` Becomes: ``` zip1 z0.s, z0.s, z1.s // z0 = t0,t1,... zip1 z2.s, z2.s, z3.s // z2 = t2,t3,... zip1 z0.d, z0.d, z2.d // z0 = t0,t1,t2,t3,... ``` When values are already in FRPs, this generally seems to lead to a more compact output with less movement to/from the stack.	2024-10-18 10:19:22 +01:00
Keith Packard	44b020a381	[PowerPC][ISelLowering] Support -mstack-protector-guard=tls (#110928 ) Add support for using a thread-local variable with a specified offset for holding the stack guard canary value. This supports both 32- and 64- bit PowerPC targets. This mirrors changes from #108942 but targeting PowerPC instead of RISCV. Because both of these PRs modify the same driver functions, this series is stack on top of the RISC-V one. --------- Signed-off-by: Keith Packard <keithp@keithp.com>	2024-10-17 19:06:47 -07:00
Jay Foad	85c17e4092	[LLVM] Make more use of IRBuilder::CreateIntrinsic. NFC. (#112706 ) Convert many instances of: Fn = Intrinsic::getOrInsertDeclaration(...); CreateCall(Fn, ...) to the equivalent CreateIntrinsic call.	2024-10-17 16:20:43 +01:00
Nikita Popov	255a99c29f	[APInt] Fix APInt constructions where value does not fit bitwidth (NFCI) (#80309 ) This fixes all the places that hit the new assertion added in https://github.com/llvm/llvm-project/pull/106524 in tests. That is, cases where the value passed to the APInt constructor is not an N-bit signed/unsigned integer, where N is the bit width and signedness is determined by the isSigned flag. The fixes either set the correct value for isSigned, set the implicitTrunc flag, or perform more calculations inside APInt. Note that the assertion is currently still disabled by default, so this patch is mostly NFC.	2024-10-17 08:48:08 +02:00
Jay Foad	9255850e89	[LLVM] Remove unused variables after #112546	2024-10-16 16:15:34 +01:00
Jay Foad	d9c95efb6c	[LLVM] Make more use of IRBuilder::CreateIntrinsic. NFC. (#112546 ) Convert almost every instance of: CreateCall(Intrinsic::getOrInsertDeclaration(...), ...) to the equivalent CreateIntrinsic call.	2024-10-16 15:43:30 +01:00
David Green	a07639f4bb	[AArch64] Increase inline memmove limit to 16 stored registers (#111848 ) The memcpy inline limit has been 16 for a long time, this patch makes the memmove inline limit the same, allowing small-constant sized memmoves to be emitted inline. The 16 is the number of registers stored, which equates to a limit of 256 bytes.	2024-10-14 08:57:32 +01:00
Benjamin Maxwell	c3a10dc849	[AArch64] Disable consecutive store merging when Neon is unavailable (#111519 ) Lowering fixed-size BUILD_VECTORS without Neon may introduce stack spills, leading to more stores/reloads than if the stores were not merged. In some cases, it can also prevent using paired store instructions. In the future, we may want to relax when SVE is available, but currently, the SVE lowerings for BUILD_VECTOR are limited to a few specific cases.	2024-10-11 14:15:01 +01:00
Rahul Joshi	fa789dffb1	[NFC] Rename `Intrinsic::getDeclaration` to `getOrInsertDeclaration` (#111752 ) Rename the function to reflect its correct behavior and to be consistent with `Module::getOrInsertFunction`. This is also in preparation of adding a new `Intrinsic::getDeclaration` that will have behavior similar to `Module::getFunction` (i.e, just lookup, no creation).	2024-10-11 05:26:03 -07:00
YunQiang Su	72fb379225	AArch64: Select FCANONICALIZE (#104429 ) FMINNM/FMAXNM instructions of AArch64 follow IEEE754-2008. We can use them to canonicalize a floating point number. And FMINNUM_IEEE/FMAXNUM_IEEE is used by something like expanding FMINIMUMNUM/FMAXIMUMNUM, so let's define them. --------- Co-authored-by: Your Name <you@example.com>	2024-10-11 08:45:14 +08:00
YunQiang Su	8d35ab80fc	AArch64: Add FMINNUM_IEEE and FMAXNUM_IEEE support (#107855 ) FMINNM/FMAXNM instructions of AArch64 follow IEEE754-2008. We can use them to canonicalize a floating point number. And FMINNUM_IEEE/FMAXNUM_IEEE is used by something like expanding FMINIMUMNUM/FMAXIMUMNUM, so let's define them. Update combine_andor_with_cmps.ll. Add fp-maximumnum-minimumnum.ll, with nnan testcases only. V1F64 is not supported yet. If we set v1f64 as legal, FMINNUM/FMAXNUM will have some problem: both of them use `if (isOperationLegalOrCustom(FMAXNUM_IEEE, VT))`. AArch64 depends on `expandFMINNUM_FMAXNUM` returning `SDValue()` for FMAXNUM and FMINNUM. We should fix this problem, while it will be in future patch.	2024-10-10 15:09:47 +08:00
Jeffrey Byrnes	853c43d04a	[TTI] NFC: Port TLI.shouldSinkOperands to TTI (#110564 ) Porting to TTI provides direct access to the instruction cost model, which can enable instruction cost based sinking without introducing code duplication.	2024-10-09 14:30:09 -07:00
Paul Walker	bfe066676b	[LLVM][CodeGen][SVE2] Implement nxvf64 fpround to nxvbf16. (#111012 ) NOTE: SVE2 only because that is when FCVTX is available, which is required to perform the necessary two-step rounding.	2024-10-08 12:01:57 +01:00
Paul Walker	02dd6b1014	[LLVM][CodeGen] Add lowering for scalable vector bfloat operations. (#109803 ) Specifically: fabs, fadd, fceil, fdiv, ffloor, fma, fmax, fmaxnm, fmin, fminnm, fmul, fnearbyint, fneg, frint, fround, froundeven, fsub, fsqrt & ftrunc	2024-10-07 13:01:59 +01:00
Jorge Botto	a4516da49f	[AArch64] - Fold and and cmp into tst (#110347 ) Fixes https://github.com/llvm/llvm-project/issues/102703. https://godbolt.org/z/nfj8xsb1Y The following pattern: ``` %2 = and i32 %0, 254 %3 = icmp eq i32 %2, 0 ``` is optimised by instcombine into: ```%3 = icmp ult i32 %0, 2``` However, post instcombine leads to worse aarch64 than the unoptimised version. Pre instcombine: ``` tst w0, #0xfe cset w0, eq ret ``` Post instcombine: ``` and w8, w0, #0xff cmp w8, #2 cset w0, lo ret ``` In the unoptimised version, SelectionDAG converts `SETCC (AND X 254) 0 EQ` into `CSEL 0 1 1 (ANDS X 254)`, which gets emitted as a `tst`. In the optimised version, SelectionDAG converts `SETCC (AND X 255) 2 ULT` into `CSEL 0 1 2 (SUBS (AND X 255) 2)`, which gets emitted as an `and`/`cmp`. This PR adds an optimisation to `AArch64ISelLowering`, converting `SETCC (AND X Y) Z ULT` into `SETCC (AND X (Y & ~(Z - 1))) 0 EQ` when `Z` is a power of two. This makes SelectionDAG/Codegen produce the same optimised code for both examples.	2024-10-03 17:56:01 +01:00
Paul Walker	d283705829	[AArch64][SVE] Fix definition of bfloat fcvt intrinsics. (#110281 ) Affected intrinsics: llvm.aarch64.sve.fcvt.bf16f32 llvm.aarch64.sve.fcvtnt.bf16f32 The named intrinsics took a predicate based on the smallest element type when it should be based on the largest. The intrinsics have been replace by v2 equivalents and affected code ported to use them. Patch includes changes to getSVEPredicateBitCast() that ensure the generated code for the auto-upgraded old intrinsics is unchanged.	2024-10-03 12:36:01 +01:00
James Chesterman	b2a6814126	[AArch64][NEON][SVE] Lower i8 to i64 partial reduction to a dot product (#110220 ) An i8 to i64 partial reduction can instead be done with an i8 to i32 dot product followed by a sign extension.	2024-10-01 13:26:38 +01:00
CarolineConcatto	308c9a9451	[Clang][LLVM][AArch64] Add intrinsic for MOVT SME2 instruction (#97602 ) This patch adds these intrinsics: // Variants are also available for: // [_s8], [_u16], [_s16], [_u32], [_s32], [_u64], [_s64] // [_bf16], [_f16], [_f32], [_f64] void svwrite_lane_zt[_u8](uint64_t zt0, svuint8_t zt, uint64_t idx) __arm_streaming __arm_inout("zt0"); void svwrite_zt[_u8](uint64_t zt0, svuint8_t zt) __arm_streaming __arm_inout("zt0"); according to PR#324[1] [1]https://github.com/ARM-software/acle/pull/324	2024-10-01 10:11:32 +01:00
Paul Walker	3e3780ef6a	[LLVM][CodeGen][SVE] Implement nxvf32 fpround to nxvbf16. (#107420 )	2024-09-24 13:15:26 +01:00
Sam Tebbs	f7714342ae	[AArch64][NEON][SVE] Lower mixed sign/zero extended partial reductions to usdot (#107566 ) This PR adds lowering for partial reductions of a mix of sign/zero extended inputs to the usdot intrinsic.	2024-09-19 14:00:45 +01:00
David Green	960c975acd	[AArch64] Expand scmp/ucmp vector operations with sub (#108830 ) Unlike scalar, where AArch64 prefers expanding scmp/ucmp with select, under Neon we can use the arithmetic expansion to generate fewer instructions. Notably it also prevents the scalarization of vselect during vector-legalization.	2024-09-16 18:44:52 +01:00
David Green	5c7957dd4f	[AArch64] Allow i16->f64 uitofp tbl shuffles Just as we convert i8->f32 uitofp to tbl to perform the zext, we can do the same for i16->f64.	2024-09-11 22:21:52 +01:00
adprasad-nvidia	23595d1b96	[AArch64] Lower __builtin_bswap16 to rev16 if bswap followed by any_extend (#105375 ) GCC compiles the built-in function `__builtin_bswap16`, to the ARM instruction rev16, which reverses the byte order of 16-bit data. On the other Clang compiles the same built-in function to e.g. ``` rev w8, w0 lsr w0, w8, #16 ``` i.e. it performs a byte reversal of a 32-bit register, (which moves the lower half, which contains the 16-bit data, to the upper half) and then right shifts the reversed 16-bit data back to the lower half of the register. We can improve Clang codegen by generating `rev16` instead of `rev` and `lsr`, like GCC.	2024-09-10 10:57:07 +01:00

1 2 3 4 5 ...

2099 Commits