llvm-project

Author	SHA1	Message	Date
Momchil Velikov	4b6968952e	[AArch64] Implement spill/fill of predicate pair register classes (#76068 ) We are getting ICE with, e.g. ``` #include <arm_sve.h> void g(); svboolx2_t f0(int64_t i, int64_t n) { svboolx2_t r = svwhilelt_b16_x2(i, n); g(); return r; } ```	2023-12-22 15:54:12 +00:00
David Green	48b9106656	[AArch64] Add an strict fp reduction test. NFC	2023-12-22 13:25:00 +00:00
Matt Arsenault	0e46b49de4	Reapply "RegisterCoalescer: Add implicit-def of super register when coalescing SUBREG_TO_REG" This reverts commit c398fa009a47eb24f88383d5e911e59e70f8db86. PPC backend was fixed in 2f82662ce901c6666fceb9c6c5e0de216a1c9667	2023-12-22 16:46:22 +07:00
Vitaly Buka	0ccc1e7acd	Revert "[AArch64] Fold more load.x into load.i with large offset" Issue #76202 This reverts commit f5687636415969e6d945659a0b78734abdfb0f06.	2023-12-21 21:12:40 -08:00
Tomas Matheson	7bd17212ef	Re-land "[AArch64] Codegen support for FEAT_PAuthLR" (#75947 ) This reverts commit 9f0f5587426a4ff24b240018cf8bf3acc3c566ae. Fix expensive checks failure by properly marking register def for ADR.	2023-12-21 18:32:55 +00:00
Tomas Matheson	9f0f558742	Revert "[AArch64] Codegen support for FEAT_PAuthLR" This reverts commit 5992ce90b8c0fac06436c3c86621fbf6d5398ee5. Buildbot failures with expensive checks enabled.	2023-12-21 16:25:55 +00:00
Tomas Matheson	5992ce90b8	[AArch64] Codegen support for FEAT_PAuthLR - Adds a new +pc option to -mbranch-protection that will enable the use of PC as a diversifier in PAC branch protection code. - When +pauth-lr is enabled (-march=armv9.5a+pauth-lr) in combination with -mbranch-protection=pac-ret+pc, the new 9.5-a instructions (pacibsppc, retaasppc, etc) are used. Documentation for the relevant instructions can be found here: https://developer.arm.com/documentation/ddi0602/2023-09/Base-Instructions/ Co-authored-by: Lucas Prates <lucas.prates@arm.com>	2023-12-21 14:18:33 +00:00
Paschalis Mpeis	2e3d77d6ed	[TLI] Pass replace-with-veclib works with Scalable Vectors. (#73642 ) [TLI] Pass replace-with-veclib works with Scalable Vectors. The pass is heavily refactored. It uses the Masked variant of a TLI method when the Intrinsic operates on Scalable Vectors. Improve tests for ArmPL and SLEEF Intrinsics: - Auto-generate test `armpl-intrinsics.ll`, and use active lane mask to have shorter `shufflevector` check lines. - Update scripts now add `@llvm.compiler.used` instead of using the regex: `@[[LLVM_COMPILER_USED:[a-zA-Z0-9_$"\\.-]+]]` - Add simplifycfg pass and noalias to ensure tail folding. `noalias` attribute was added only to the `%in.ptr` parameter of the ArmPL Intrinsics.	2023-12-21 12:37:57 +00:00
zhongyunde 00443407	f568763641	[AArch64] Fold more load.x into load.i with large offset The list of load.x refers to canFoldIntoAddrMode in D152828. Also support LDRSroX, which was missed in canFoldIntoAddrMode.	2023-12-21 18:54:15 +08:00
zhongyunde 00443407	32878c2065	[AArch64] merge index address with large offset into base address A case for this transformation, https://gcc.godbolt.org/z/nhYcWq1WE Fold mov w8, #56952 movk w8, #15, lsl #16 ldrb w0, [x0, x8] into add x0, x0, 1036288 ldrb w0, [x0, 3704] Only LDRBBroX is supported for the first time. Fix https://github.com/llvm/llvm-project/issues/71917	2023-12-21 18:54:14 +08:00
zhongyunde 00443407	4bad0cb359	[AArch64] Precommit tests for PR75343, NFC	2023-12-21 18:54:14 +08:00
David Green	c0931d4950	[AArch64][GlobalISel] Lower scalarizing G_UNMERGE_VALUES to G_EXTRACT_VECTOR_ELT This adds post-legalizing lowering of G_UNMERGE_VALUES which take a vector and produce scalar values for each lane. They are converted to a G_EXTRACT_VECTOR_ELT for each lane, allowing all the existing tablegen patterns to apply to them. A couple of tablegen patterns need to be altered to make sure the type of the constant operand is known, so that the patterns are recognized under global isel. Closes #75662	2023-12-21 09:22:23 +00:00
Hassnaa Hamdi	f3dcc0cba9	[LLVM][AArch64][tblgen]: Match clamp pattern (#75529 ) Add isel pattern to replace min(max(v1,v2),v3) with clamp. Add tests for uclamp, sclamp, bfclamp, fclamp.	2023-12-20 14:36:58 +00:00
Rin	0894c2ee5f	[DAGCombiner] Avoid the pre-truncate of BUILD_VECTOR sources. (#75792 ) Avoid the pre-truncate of BUILD_VECTOR sources when there is more than one use. This can avoid using unnecessary movs later down the instruction selection pipeline.	2023-12-19 15:25:38 +00:00
Antonio Frighetto	9aeb3336fd	[AArch64] Ensure `SplatBitSize` conforms with the original lane width A miscompilation issue has been addressed with improved checking. Fixes: https://github.com/llvm/llvm-project/issues/75822.	2023-12-19 16:03:56 +01:00
Kerry McLaughlin	e9af57dfea	[Clang][SME2] Add builtins for moving multi-vectors to/from ZA (#71191 ) Adds the following SME2 builtins: - svread_hor/ver, - svwrite_hor/ver, - svread_za64, - svwrite_za64 See https://github.com/ARM-software/acle/pull/217	2023-12-19 13:51:10 +00:00
Nathan Sidwell	d0285a31c8	aarch64: fix testcase (#75723 ) Add missing < %s to RUN line.	2023-12-18 11:02:44 -05:00
Momchil Velikov	fd527def7e	[Clang][SVE2.1] Add floating-point variants of `svrevd_XX` (#75117 )	2023-12-18 15:52:28 +00:00
Stefan Pintilie	c398fa009a	Revert "Reapply "RegisterCoalescer: Add implicit-def of super register when coalescing SUBREG_TO_REG"" This reverts commit f4b5be1ecdc85ca4257b739afb8d57e23c7a8030. The above change was breaking the clang-ppc64le-linux-test-suite bot.	2023-12-16 07:30:53 -06:00
chuongg3	70579c95bd	[AArch64][GlobalISel] Look into array's element (#74109 ) In AArch64RegisterBankInfo, IsFPOrFPType() does not work correctly with ArrayTypes and StructTypes as it does not look at their elements. This caused some registers to be selected as gpr instead of fpr.	2023-12-15 10:46:57 +00:00
Matt Arsenault	f4b5be1ecd	Reapply "RegisterCoalescer: Add implicit-def of super register when coalescing SUBREG_TO_REG" This reverts commit 69c4930aad9659ec6ab846c8e7124d6afe044b1e. See if this sticks after a few more coalescer assertions are fixed.	2023-12-15 10:51:47 +07:00
Jon Roelofs	b071b70317	[GlobalISel] Always direct-call IFuncs and Aliases (#74902 ) This is safe because for both cases, the use must be in the same TU as the definition, and they cannot be forward declared.	2023-12-14 14:58:20 -07:00
Jon Roelofs	640c1d3dd1	[llvm] Support IFuncs on Darwin platforms (#73686 ) ... by lowering them as lazy resolve-on-first-use symbol resolvers. Note that this is subtly different timing than on ELF platforms, where ifunc resolution happens at load time. Since ld64 and ld-prime don't support all the cases we need for these, we lower them manually in the AsmPrinter.	2023-12-14 14:40:52 -07:00
DianQK	7649d22306	[AArch64] ORRWrs is copy instruction when there's no implicit def of the X register (#75184 ) Follows https://github.com/llvm/llvm-project/pull/74682#issuecomment-1850268782. Fixes #74680.	2023-12-14 19:19:55 +08:00
Simon Pilgrim	a0c7a29655	[GlobalISel] IRTranslator::translateGetElementPtr - don't assume a gep constant offset is representable as i64 Fixes https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=65052	2023-12-14 11:02:38 +00:00
CarolineConcatto	f2464ca317	[SVE2.1][Clang][LLVM]Int/FP reduce builtin in Clang and LLVM intrinsic (#69926 ) This patch implements the builtins in Clang and the LLVM-IR intrinsic for the following: // Variants are also available for: // _s8, _s16, _u16, _s32, _u32, _s64, _u64, // _f16, _f32, _f64 uint8x16_t svaddqv[_u8](svbool_t pg, svuint8_t zn); // Variants are also available for: // _s8, _u16, _s16, _u32, _s32, _u64, _s64 uint8x16_t svandqv[_u8](svbool_t pg, svuint8_t zn); uint8x16_t sveorqv[_u8](svbool_t pg, svuint8_t zn); uint8x16_t svorqv[_u8](svbool_t pg, svuint8_t zn); // Variants are also available for: // _s8, _u16, _s16, _u32, _s32, _u64, _s64; uint8x16_t svmaxqv[_u8](svbool_t pg, svuint8_t zn); uint8x16_t svminqv[_u8](svbool_t pg, svuint8_t zn); // Variants are also available for _f32, _f64 float16x8_t svmaxnmqv[_f16](svbool_t pg, svfloat16_t zn); float16x8_t svminnmqv[_f16](svbool_t pg, svfloat16_t zn); According to PR #257 [1], the reduction instructions use scalable vectors as input and fixed vectors as output, so we changed SVEEmitter to emit fixed vector types in case the NEON header (arm_neon.h) is not present. [1] https://github.com/ARM-software/acle/pull/257 Co-author: Dinar Temirbulatov <dinar.temirbulatov@arm.com>	2023-12-13 15:45:59 +00:00
Momchil Velikov	d7ee99a4fc	[MachineSink] Clear kill flags of sunk addressing mode registers (#75072 ) When doing sink-and-fold, the MachineSink clears the "killed" flags of the operands of the sunk (and deleted) instruction. However, this is not always sufficient. In some cases we can create the new load/store instruction with operands other than the ones present in the deleted instruction. One such example is folding a zero word extend into a memory load on AArch64. The zero-extend is represented by a pair of instructions - `MOV` (i.e. `ORRwrs`) followed by a `SUBREG_TO_REG`. The `SUBREG_TO_REG` is deleted (it is the sunk instruction), but the new load instruction mentions operands "killed" in the `MOV`, which is no longer correct. To fix this, clear the "killed" flags of the registers participating in the addressing mode.	2023-12-13 09:15:28 +00:00
paperchalice	a930fec033	[CodeGen] Port `InterleavedLoadCombine` to new pass manager (#75164 )	2023-12-13 12:46:22 +08:00
Tuan Chuong Goh	32532c2bbe	[AArch64][GlobalISel] Test Pre-Commit for Look into array's element	2023-12-12 15:38:21 +00:00
Shreyansh Chouhan	5d12274646	[AArch64]: Added code for generating XAR instruction (#75085 ) Fixes #61584	2023-12-12 05:48:45 +00:00
paperchalice	ce08c7ee1e	[CodeGen] Port `SelectOptimize` to new pass manager (#74920 ) - Use `BlockFrequencyInfoWrapperPass` in legacy pass so member `std::unique_ptr<BranchProbabilityInfo> BPI` could be removed. - Member `DominatorTree *DT = nullptr` is unused, remove it.	2023-12-12 12:09:30 +08:00
Fangrui Song	072cea668e	[test] Change llc -march to -mtriple Similar to d20190e68413634b87f0f9426312a0e9d8456d18	2023-12-11 15:42:12 -08:00
James Y Knight	876816ff18	[AArch64] Set MaxAtomicSizeInBitsSupported. (#74385 ) This will result in larger atomic operations getting expanded to `__atomic_*` libcalls via AtomicExpandPass, which matches what Clang already does in the frontend. Additionally, adjust some comments, and remove partial code dealing with larger-than-128bit atomics, as it's now unreachable. AArch64 always supports 128-bit atomics, so there's no conditionals needed here. (Though: we really ought to require that a 128-bit load is available, not just a cmpxchg, which would mean conditioning on LSE2. But that's future work.) The arm64-irtranslator.ll test was adjusted as it was using an i258 type as a hack to avoid IR atomic lowering to test GlobalISel behavior. Pass -mattr=+lse and use i32, instead, to accomplish that goal in a way that continues to work.	2023-12-11 17:55:07 -05:00
Jonathan Thackray	f576cbe44e	[AArch64] Correctly mark Neoverse N2 as an Armv9.0a core (#75055 ) Neoverse N2 was incorrectly marked as an Armv8.5a core. This has been changed to an Armv9.0a core. However, crypto options are not enabled by default for Armv9 cores, so -mcpu=neoverse-n2+crypto is required to enable crypto for this core. Neoverse N2 Technical Reference Manual: https://developer.arm.com/documentation/102099/0003/	2023-12-11 18:52:25 +00:00
Jay Foad	35ebd92d3d	[GlobalISel] Add G_PREFETCH (#74863 )	2023-12-11 11:06:50 +00:00
Serge Pavlov	18959c46e3	[NFC] Modify test to use autogenerated assertions	2023-12-11 14:38:01 +07:00
Oskar Wirga	9930f3e298	[AArch64] Fix case of 0 dynamic alloc when stack probing (#74877 ) I accidentally closed https://github.com/llvm/llvm-project/pull/74806 If the dynamic allocation size is 0, then we will still probe the current sp value despite not decrementing sp! This results in overwriting stack data, in my case the stack canary. The fix here is just to load the value of [sp] into xzr which is essentially a no-op but still performs a read/probe of the new page.	2023-12-10 08:01:29 -05:00
David Green	e3720bbc08	[AArch64] Extend and cleanup vector icmp test cases. NFC	2023-12-07 18:39:33 +00:00
Simon Pilgrim	f1200ca7ac	[DAG] visitEXTRACT_VECTOR_ELT - constant fold legal fp imm values (#74304 ) If we're extracting a constant floating point value, and the constant is a legal fp imm value, then replace the extraction with a fp constant.	2023-12-07 14:56:12 +00:00
Sjoerd Meijer	3acbd38492	[AArch64] Optimise MOVI + CMGT to CMGE (#74499 ) This fixes a regression that occurred for a pattern of MOVI + CMGT instructions, which can be optimised to CMGE. I.e., when the signed greater-than compare has -1 as an operand, we can rewrite that as a compare greater than or equal to 0, which is what CMGE does. Fixes #61836	2023-12-07 08:32:02 +00:00
Thurston Dang	69c4930aad	Revert "Reapply "RegisterCoalescer: Add implicit-def of super register when coalescing SUBREG_TO_REG"" This reverts commit 1f283a60a4bb896fa2d37ce00a3018924be82b9f. Reason: breaks MSan buildbot (https://lab.llvm.org/buildbot/#/builders/74/builds/24077)	2023-12-06 19:27:21 +00:00
Matthew Devereau	8186e1500b	[SME2] Add LUTI2 and LUTI4 single Builtins and Intrinsics (#73304 ) See https://github.com/ARM-software/acle/pull/217 Patch by: Hassnaa Hamdi <hassnaa.hamdi@arm.com>	2023-12-06 16:35:56 +00:00
Matt Arsenault	1f283a60a4	Reapply "RegisterCoalescer: Add implicit-def of super register when coalescing SUBREG_TO_REG" This reverts commit 9e50c6e6b5741895f58f3e530004052844b6af9f. A few assertion and verifier errors have been fixed in the coalescer and allocator, so hopefully this sticks this time.	2023-12-06 23:07:22 +07:00
Matt Arsenault	546a9ce80c	CodeGen: Fix bypassing legality checks for IMPLICIT_DEF rematerialization (#73934 ) It's permitted to have extra implicit-def operands of the same main register after the main register def. If there are implicit operands, use the standard legality checks which verify the operand contents. Depends #73933	2023-12-06 21:43:19 +07:00
Matthew Devereau	30faf19a88	[SME2] Add LUTI2 and LUTI4 double Builtins and Intrinsics (#73305 ) See https://github.com/ARM-software/acle/pull/217 Patch by: Hassnaa Hamdi <hassnaa.hamdi@arm.com>	2023-12-06 14:35:11 +00:00
Matthew Devereau	6704d6aadd	[SME2] Add LUTI2 and LUTI4 quad Builtins and Intrinsics (#73317 ) See https://github.com/ARM-software/acle/pull/217 Patch by: Hassnaa Hamdi <hassnaa.hamdi@arm.com>	2023-12-06 10:08:04 +00:00
Pranav Taneja	41507fe595	[GISel] Combine (Scalarize) vector load followed by an element extract.	2023-12-06 11:23:23 +05:30
Nikita Popov	eecb99c5f6	[Tests] Add disjoint flag to some tests (NFC) These tests rely on SCEV recognizing an "or" with no common bits as an "add". Add the disjoint flag to relevant or instructions in preparation for switching SCEV to use the flag instead of the ValueTracking query. The IR with disjoint flag matches what InstCombine would produce.	2023-12-05 14:09:36 +01:00
Florian Hahn	58dcac3948	[AArch64] Check X16&X17 in prologue if the fn has an SwiftAsyncContext. (#73945 ) StoreSwiftAsyncContext clobbers X16 & X17. Make sure they are available in canUseAsPrologue, to avoid shrink wrapping moving the pseudo to a place where X16 or X17 are live.	2023-12-05 11:41:40 +00:00
Matt Arsenault	74c00d4329	LiveRangeEdit: Clear all dead flags when rematerializing (#73933 ) It's allowed to rematerialize instructions with implicit-defs of the same register as the single explicit def. If this happened, it was only clearing the dead flags on the one main result.	2023-12-05 10:40:13 +07:00
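
The offset split described in commit 32878c2065 (a `mov`/`movk` pair feeding a register-offset `ldrb`, rewritten as `add` plus an immediate-offset `ldrb`) can be checked with a little arithmetic. The sketch below is illustrative only: `split_offset` and `fits_add_imm` are hypothetical helper names, not LLVM functions, and the actual legality checks in the backend are not modelled. It assumes a byte load, whose unsigned immediate offset is 12 bits and unscaled, and an `add` immediate of 12 bits shifted left by 12.

```python
def split_offset(offset):
    """Split a byte offset into (hi, lo) with lo in [0, 4095].

    hi is a multiple of 4096, so it is a candidate for an
    'add xN, xN, #imm, lsl #12' style immediate; lo is a candidate for
    the 12-bit unsigned immediate of a byte load. Hypothetical helper,
    not part of LLVM.
    """
    lo = offset & 0xFFF   # low 12 bits stay in the load's immediate
    hi = offset - lo      # remainder is folded into the base register
    return hi, lo


def fits_add_imm(hi):
    """True if hi is encodable as a 12-bit immediate shifted left by 12."""
    return hi % 4096 == 0 and 0 <= (hi >> 12) < 4096


# Offset materialised by: mov w8, #56952 ; movk w8, #15, lsl #16
offset = 56952 + (15 << 16)
hi, lo = split_offset(offset)
print(offset, hi, lo, fits_add_imm(hi))
```

For the values in the commit message, the split gives `hi = 1036288` and `lo = 3704`, matching the `add x0, x0, 1036288` and `ldrb w0, [x0, 3704]` pair quoted there.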

7335 Commits