llvm-project

Author	SHA1	Message	Date
Sander de Smalen	8aff167b34	[AArch64][SME] Improve streaming-compatible codegen for extending loads/truncating stores. This is another step in aligning addTypeForStreamingSVE with addTypeForFixedLengthSVE, which also improves code quality for extending loads and truncating stores. Reviewed By: hassnaa-arm Differential Revision: https://reviews.llvm.org/D141266	2023-01-09 15:08:04 +00:00
Sander de Smalen	17a1936122	[AArch64] NFC: Align addTypeForStreamingSVE and addTypeForFixedLengthSVE This patch is NFC and just moves things around so their implementation is very similar.	2023-01-09 09:47:33 +00:00
Paul Walker	c9602e02fc	[SVE] Fix incorrect VT usage when lowering fixed length vector divides. Ensure the negation required when lowering negative power-of-two divides uses the scalable vector container type with the fixed length result extracted from it. Fixes: #59647 Differential Revision: https://reviews.llvm.org/D140563	2023-01-08 12:22:05 +00:00
Hassnaa Hamdi	9eb698946d	[AArch64][SME]: Make 'Expand' the default action for all Ops. By default expand all operations, then change to Custom/Legal if needed. Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D141068	2023-01-06 15:32:07 +00:00
Guillaume Chatelet	87b6b347fc	Revert D141134 "[NFC] Only expose getXXXSize functions in TypeSize" The patch should be discussed further. This reverts commit dd56e1c92b0e6e6be249f2d2dd40894e0417223f.	2023-01-06 15:27:50 +00:00
Guillaume Chatelet	dd56e1c92b	[NFC] Only expose getXXXSize functions in TypeSize Currently 'TypeSize' exposes two functions that serve the same purpose: - getFixedSize / getFixedValue - getKnownMinSize / getKnownMinValue source : `bf82070ea4/llvm/include/llvm/Support/TypeSize.h (L337-L338)` This patch offers to remove one of the two and stick to a single function in the code base. Differential Revision: https://reviews.llvm.org/D141134	2023-01-06 15:24:52 +00:00
serge-sans-paille	38818b60c5	Move from llvm::makeArrayRef to ArrayRef deduction guides - llvm/ part Use deduction guides instead of helper functions. The only non-automatic changes have been: 1. ArrayRef(some_uint8_pointer, 0) needs to be changed into ArrayRef(some_uint8_pointer, (size_t)0) to avoid an ambiguous call with ArrayRef((uint8_t), (uint8_t)) 2. CVSymbol sym(makeArrayRef(symStorage)); needed to be rewritten as CVSymbol sym{ArrayRef(symStorage)}; otherwise the compiler is confused and thinks we have a (bad) function prototype. There were a few similar situations across the codebase. 3. ADL doesn't seem to work the same for deduction-guides and functions, so at some point the llvm namespace must be explicitly stated. 4. The "reference mode" of makeArrayRef(ArrayRef<T> &) that acts as no-op is not supported (a constructor cannot achieve that). Per reviewers' comment, some useless makeArrayRef have been removed in the process. This is a follow-up to https://reviews.llvm.org/D140896 that introduced the deduction guides. Differential Revision: https://reviews.llvm.org/D140955	2023-01-05 14:11:08 +01:00
Roman Lebedev	83288f8063	[AArch64] Custom lower `ISD::ZERO_EXTEND_VECTOR_INREG` The baseline legalization for `ISD::ZERO_EXTEND_VECTOR_INREG` (`VectorLegalizer::ExpandZERO_EXTEND_VECTOR_INREG`), blends-in the zeros, but as mentioned e.g. in b4bd0a404fe26071dab0854dfd9767974909c7c4, there is no such thing for AArch64. So some of the shuffles that would be nicely lowered by `LowerVECTOR_SHUFFLE()`, e.g. into `ZIP1`, would now be unrecognizable after round-tripping through `ISD::ZERO_EXTEND_VECTOR_INREG` recognition & legalization. The most obvious solution is to just custom-lower `ISD::ZERO_EXTEND_VECTOR_INREG` as the `ZIP1`-with-zeros, like it would have been originally in that test case.	2022-12-26 22:54:03 +03:00
David Green	61b72f6abe	[AArch64] Add RSHRN and RSHRN2 patterns This adds some tablegen patterns for RSHRN, which performs a rounding shift with narrow. This is similar to the existing SHRN patterns with an extra addition to perform the rounding, that adds 1<<(shift-1) before the right shift. Because the round immediate and the shift amount are tied, it goes via a ComplexPattern that uses a SelectRoundingVLShr method to perform the selection checks. aarch64_neon_rshrn are expanded into the sequence of equivalent instructions (trunc(shr(add(x, 1<<(sht-1)), sht))) so that they can be converted back into RSHRN, which also allows us to match raddhn through the adjusted patterns that previously used aarch64_neon_rshrn. Differential Revision: https://reviews.llvm.org/D140297	2022-12-22 16:49:19 +00:00
David Green	3e65ad7482	[AArch64] Combine Trunc(DUP) -> DUP This adds a simple fold of TRUNCATE(AArch64ISD::DUP) -> AArch64ISD::DUP, which can help generate more optimal UMULL sequences, and seems useful in general. Differential Revision: https://reviews.llvm.org/D140289	2022-12-21 14:59:59 +00:00
Peter Waller	6d877e6717	[AArch64][SVE][CodeGen] Prefer ld1r* over indexed-load when consumed by a splat If a load is consumed by a single splat, don't consider indexed loads. This is an alternative implementation to D138581. Depends on D139637. Differential Revision: https://reviews.llvm.org/D139850	2022-12-21 14:23:39 +00:00
David Green	3c0c24e0c1	[AArch64] Combine to UMULL if top bits are known zero Given mul(zext(a), b), we can convert to a umull so long as we know that the top bits of b are zero. This uses MaskedValueIsZero to detect that case for NEON UMULL patterns. Differential Revision: https://reviews.llvm.org/D140287	2022-12-20 13:50:34 +00:00
Qiu Chaofan	a40ef656d8	[Intrinsic] Rename flt.rounds intrinsic to get.rounding Address the inconsistency between FLT_ROUNDS_ and SET_ROUNDING SDAG node. Rename FLT_ROUNDS_ to GET_ROUNDING and add llvm.get.rounding intrinsic to replace flt.rounds. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D139507	2022-12-19 15:22:39 +08:00
Dinar Temirbulatov	7bce66edc6	[AArch64][SVE] Allow to lower WHILEop with constant operands to PTRUE This allows it to fold WHILEop with constant operand to PTRUE instruction in the case given range is fitted to predicate format. Also, this change fixes the unsigned overflow error introduced in D137547 for WHILELO lowering. Differential Revision: https://reviews.llvm.org/D139068	2022-12-18 01:27:03 +00:00
Archibald Elliott	82b51a1428	[AArch64] Support SLC in ACLE prefetch intrinsics This change: - Modifies the ACLE code to allow the new SLC value (3) for the prefetch target. - Introduces a new intrinsic, @llvm.aarch64.prefetch which matches the PRFM family instructions much more closely, and can represent all values for the PRFM immediate. The target-independent @llvm.prefetch intrinsic does not have enough information for us to be able to lower to it from the ACLE intrinsics correctly. - Lowers the acle calls to the new intrinsic on aarch64 (the ARM lowering is unchanged). - Implements code generation for the new intrinsic in both SelectionDAG and GlobalISel. We specifically choose to continue to support lowering the target-independent @llvm.prefetch intrinsic so that other frontends can continue to use it. Differential Revision: https://reviews.llvm.org/D139443	2022-12-16 14:42:27 +00:00
Nilanjana Basu	795868285d	[AArch64] Minor changes and sanity checks in relation to https://reviews.llvm.org/D135229	2022-12-16 01:39:29 +05:30
Nilanjana Basu	02d09ffc1b	[AArch64] Extending lowering of 'trunc <(8|16) x i64> %x to <(8|16) x i8>' to use tbl instructions [AArch64] Patch for lowering trunc instructions to 'tbl' for (8|16)xi32 -> (8|16)xi8 conversions in https://reviews.llvm.org/D133495 is extended to support trunc to tbl lowering for (8|16) x i64 to (8|16) x i8. A microbenchmark for runtime for these transformations is added in https://reviews.llvm.org/D136274 Reviewed by: fhahn, t.p.northover Differential Revision: https://reviews.llvm.org/D135229	2022-12-15 20:50:40 +05:30
Tim Northover	10d34f5538	AArch64: use CAS instead of LDXR/STXR if available This covers 128-bit loads, and atomicrmw operations without a single native instruction. Using CAS has a better chance of succeeding with high contention on some systems.	2022-12-14 12:16:40 +00:00
David Green	1da4d5aafa	[AArch64][SVE] Add hadd and rhadd support This adds basic HADD and RHADD support for SVE, by marking the AVGFLOOR and AVGCEIL as custom and converting those to HADD_PRED/RHADD_PRED AArch64 nodes. Both the existing intrinsics and the _PRED nodes are then lowered to the _ZPmZ instructions. Differential Revision: https://reviews.llvm.org/D131875	2022-12-14 09:24:54 +00:00
Martin Storsjö	899739cdbd	Revert "[AArch64][GlobalISel] Lower formal arguments of AAPCS & ms_abi variadic functions." This reverts commit 56fd846f370adf16bea333b12637038ea2f3c225. This commit regressed handling of functions with floats as arguments, reproducible e.g. like this: $ cat test.c double func(double f) { return f; } $ clang -target aarch64-windows -S -o - test.c -fno-asynchronous-unwind-tables func: sub sp, sp, #16 str x0, [sp, #8] ldr d0, [sp, #8] add sp, sp, #16 ret	2022-12-13 11:37:35 +02:00
Vladislav Dzhidzhoev	56fd846f37	[AArch64][GlobalISel] Lower formal arguments of AAPCS & ms_abi variadic functions. Reimplemented SelectionDAG code for GlobalISel. Fixes https://github.com/llvm/llvm-project/issues/54079 Differential Revision: https://reviews.llvm.org/D130903	2022-12-12 19:35:40 +03:00
Zain Jaffal	ebae917294	Recommit "[AArch64] Select SMULL for zero extended vectors when top bit is zero" This is a recommit of f9e0390751cb5eefbbbc191f851c52422acacab1 The previous commit failed to handle cases where the zero extended operand is an extended `BUILD_VECTOR`. We don't replace zext with a sext operand to select smull if any operand is `BUILD_VECTOR` Original commit message: we can safely replace a `zext` instruction with `sext` if the top bit is zero. This is useful because we can select `smull` when both operands are sign extended. Reviewed By: fhahn, dmgreen Differential Revision: https://reviews.llvm.org/D134711	2022-12-12 14:45:54 +00:00
Zain Jaffal	dfc8ab2e25	Revert "Revert "[AArch64] Select SMULL for zero extended vectors when top bit is zero"" This reverts commit c07a01c2bb0d1f95689f809fd5be23829e364393.	2022-12-12 14:45:27 +00:00
Peter Waller	8812b6eed7	[AArch64][SVE][Fixed length] Fix div miscompile The prior code worked before SVE DIV was enabled for 128 bit vectors. With 128 bit vectors, when run on a 256 bit machine, it would split and do a signed unpack, but this resulted in one full vector and one empty vector with a half-sized predicate. The effect was that only half the elements were treated correctly. The fix is to bisect the vector, sign extend, do the division, truncate and then concat. Fixes #59357. Differential Revision: https://reviews.llvm.org/D139618	2022-12-12 11:31:02 +00:00
Salvatore Dipietro	3a894fd90b	[AArch64] Lower READCYCLECOUNTER using MRS CNTVCT_EL0 As suggested in D12425 it would be better for the readcyclecounter function on ARM architecture to use the CNTVCT_EL0 register (Counter-timer Virtual Count register) instead of the PMCCNTR_EL0 (Performance Monitors Cycle Count Register) because PMCCNTR_EL0 is a PMU register which, depending on the configuration, might always return zeroes and is not guaranteed to always increase. Differential Revision: https://reviews.llvm.org/D136999	2022-12-09 10:36:16 +00:00
Nilanjana Basu	955c0f13cd	[AArch64] Extending lowering of 'zext <Y x i8> %x to <Y x i8X>' to use tbl instructions Adding support for ZExt lowering for destination types beyond the existing support for (8|16) x i32 Patch for lowering zext instructions to 'tbl' for (8|16)xi8 -> (8|16)xi32 conversions in https://reviews.llvm.org/D120571 is extended to support zext to 'tbl' lowering for Y x i8 to Y x i8X where X > 2 and X < 8, that is, any number of vector elements & any destination element type whose size is a multiple of 8 and lies between 16 & 64 is allowed for this transformation. Related microbenchmarks are in https://reviews.llvm.org/D136274 & https://reviews.llvm.org/D138059 Differential Revision: https://reviews.llvm.org/D136722	2022-12-09 13:55:25 +05:30
Kazu Hirata	8a7cbea525	[llvm] Use std::nullopt instead of None in comments (NFC) This is part of an effort to migrate from llvm::Optional to std::optional: https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716	2022-12-08 23:22:00 -08:00
Florian Hahn	c07a01c2bb	Revert "[AArch64] Select SMULL for zero extended vectors when top bit is zero" This reverts commit f9e0390751cb5eefbbbc191f851c52422acacab1. The patch causes a crash for the IR below: target datalayout = "e-m:o-i64:64-i128:128-n32:64-S128" target triple = "arm64-apple-macosx" define void @test(ptr %data, <8 x i16> %v) { entry: %0 = sext <8 x i16> %v to <8 x i32> %1 = mul <8 x i32> %0, <i32 35584, i32 35584, i32 35584, i32 35584, i32 35584, i32 35584, i32 35584, i32 35584> %2 = lshr <8 x i32> %1, <i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1> %3 = trunc <8 x i32> %2 to <8 x i16> store <8 x i16> %3, ptr %data, align 2 ret void }	2022-12-08 10:28:33 +00:00
Zain Jaffal	f9e0390751	[AArch64] Select SMULL for zero extended vectors when top bit is zero we can safely replace a `zext` instruction with `sext` if the top bit is zero. This is useful because we can select `smull` when both operands are sign extended. Reviewed By: fhahn, dmgreen Differential Revision: https://reviews.llvm.org/D134711	2022-12-08 09:06:18 +02:00
Sander de Smalen	5922a04dbd	[AArch64][SVE2p1] Make use of REVD instruction. Reversing double-words within a quad-word is possible using the REVD instruction when SVE2p1 is enabled. Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D139119	2022-12-06 15:42:32 +00:00
Haojian Wu	6f12281d8e	Fix a -Wunused-variable warning in release build, NFC	2022-12-06 13:56:16 +01:00
Archibald Elliott	83b3304dd2	[AArch64] Implement __arm_rsr128/__arm_wsr128 This only contains the SelectionDAG implementation. GlobalISel to follow. The broad approach is: - Introduce new builtins for 128-bit wide instructions. - Lower these to @llvm.read_register.i128/@llvm.write_register.i128 - Introduce target-specific ISD nodes which have legal operands (two i64s rather than an i128). These are named AArch64::{MRRS, MSRR} to match the instructions they are for. These are a little complex as they need to match the "shape" of what they're replacing or the legaliser complains. - Select these using the existing tryReadRegister/tryWriteRegister to share the MDString parsing code, and introduce additional code to ensure these are selected into the right MRRS/MSRR instructions. What makes this hard is ensuring that the two i64s end up in an XSeqPair register pair, because SelectionDAG doesn't care that much about register classes if it can avoid doing so. The main change to existing code is the reorganisation of tryReadRegister and tryWriteRegister to try to keep the string parsing code separate from the instruction creating code. This also includes the changes to clang to define and use the ACLE feature macro named `__ARM_FEATURE_SYSREG128`. Contributors: Sam Elliott Lucas Prates Differential Revision: https://reviews.llvm.org/D139086	2022-12-06 11:39:05 +00:00
Ties Stuij	94e7e58fa4	[AArch64] implement GPR (U/S)(MIN/MAX) instruction SDag support Using SelectionDag, lower umin, umax, smin, smax intrinsics to corresponding UMIN, UMAX, SMIN, SMAX instructions when feat CSSC is available. See specs for corresponding immediate and register versions in: https://developer.arm.com/documentation/ddi0602/2022-09/Base-Instructions/ Reviewed By: lenary Differential Revision: https://reviews.llvm.org/D138813	2022-12-06 10:57:49 +00:00
Ties Stuij	eaea4608e6	[AArch64] lower abs intrinsic to new ABS instruction in SelDag When feature CSSC is available, the SelectionDag abs intrinsic should map to the new scalar ABS instruction. Additionally, the SIMDTwoScalarD tablegen defm includes a pattern match for scalar i64, which we don't want to use when CSSC is enabled. spec: https://developer.arm.com/documentation/ddi0602/2022-09/Base-Instructions/ABS--Absolute-value- Reviewed By: lenary Differential Revision: https://reviews.llvm.org/D138812	2022-12-06 10:48:21 +00:00
Ties Stuij	2f778e60c9	[AArch64] SelectionDag codegen for gpr CTZ instruction When feature CSSC is available we should use instruction CTZ in SelectionDag where applicable: - CTTZ intrinsics are lowered to using the gpr CTZ instruction - BITREVERSE -> CTLZ instruction pattern gets replaced by CTZ spec: https://developer.arm.com/documentation/ddi0602/2022-09/Base-Instructions/CTZ--Count-Trailing-Zeros- Reviewed By: lenary Differential Revision: https://reviews.llvm.org/D138811	2022-12-06 10:42:07 +00:00
Sander de Smalen	4d2f0f723a	[AArch64][SME] Avoid going through memory for streaming-compatible splats Reviewed By: david-arm, paulwalker-arm Differential Revision: https://reviews.llvm.org/D139111	2022-12-05 13:04:30 +00:00
Fangrui Song	b0df70403d	[Target] llvm::Optional => std::optional The updated functions are mostly internal with a few exceptions (virtual functions in TargetInstrInfo.h, TargetRegisterInfo.h). To minimize changes to LLVMCodeGen, GlobalISel files are skipped. https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716	2022-12-04 22:43:14 +00:00
Kazu Hirata	20cde15415	[Target] Use std::nullopt instead of None (NFC) This patch mechanically replaces None with std::nullopt where the compiler would warn if None were deprecated. The intent is to reduce the amount of manual work required in migrating from Optional to std::optional. This is part of an effort to migrate from llvm::Optional to std::optional: https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716	2022-12-02 20:36:06 -08:00
Ties Stuij	82a5f1c62b	[AArch64] use CNT for ISD::popcnt and ISD::parity if available These are the two places where we explicitly want to use cnt in SelectionDAG when feature CSSC is available: ISD::popcnt and ISD::parity For both, we need to make sure we're emitting optimized code for i32 (and lower), i64 and i128. The most optimal way is of course using the GPR CNT instruction. If we don't have CSSC, but we do have neon, we'll use floating point CNT. If all fails, we'll fall back on the general GPR popcnt and parity implementations. spec: https://developer.arm.com/documentation/ddi0602/2022-09/Base-Instructions/CNT--Count-bits- Reviewed By: lenary Differential Revision: https://reviews.llvm.org/D138808	2022-12-02 11:27:14 +00:00
Sander de Smalen	d32c9e8384	Reland "[AArch64][SME]: Generate streaming-compatible code for ld2-alloca." Phabricator review for this patch was D138791	2022-12-01 14:48:30 +00:00
David Sherwood	4a5ccf4e93	Revert "[AArch64][SME]: Generate streaming-compatible code for ld2-alloca." This reverts commit 279c0a83aa22cd35d4b7c7c52b85d2a86f2528a7.	2022-12-01 10:22:21 +00:00
Hassnaa Hamdi	2bda5a6287	[AArch64][SME][NFC]: Enable lowering truncate for enhancement. Enable lowering truncate to enhance the generated code.	2022-12-01 03:54:28 +00:00
Hassnaa Hamdi	279c0a83aa	[AArch64][SME]: Generate streaming-compatible code for ld2-alloca. To generate code compatible to streaming mode: - disable lowering interleaved load to avoid generating invalid NEON intrinsics. Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D138791	2022-12-01 02:31:01 +00:00
Hassnaa Hamdi	02db5603ba	[AArch64][SME]: Generate streaming-compatible code for fp-extend-trunc To generate code compatible to streaming mode: - enable custom lowering for TruncStore to avoid crashing during legalizing TruncStore for non Integer vector. Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D138720	2022-11-29 12:45:53 +00:00
Hassnaa Hamdi	9c7286f938	[AArch64][SME]: Generate streaming-compatible code for bit counting/select To generate code compatible to streaming mode: - enable custom-lowering ISD::CTLZ and ISD::CTPOP. - disable combining OR into BSL. - Testing files: - bit-counting.ll - bitselect.ll Reviewed By: david-arm, sdesmalen Differential Revision: https://reviews.llvm.org/D138682	2022-11-29 12:24:21 +00:00
Nicola Lancellotti	49cd18c55e	Revert "[AArch64] Canonicalize ZERO_EXTEND to VSELECT" This reverts commit 43fe14c056458501990c3db2788f67268d1bdf38.	2022-11-28 16:37:30 +00:00
Hassnaa Hamdi	60ab791aa0	[AArch64][SME]: Generate streaming-compatible code for fp-compares. To generate code compatible to streaming mode: - enable expanding ISD::SETUEQ to avoid custom-lowering setcc to setcc_merge_zero, which causes a crash during instruction selection because there is no matching pattern for it. - Testing files: - fp-compares.ll Differential Revision: https://reviews.llvm.org/D138670	2022-11-28 11:21:40 +00:00
Roman Lebedev	453f27bc9e	[AArch64] `LowerBUILD_VECTOR()`: `NormalizeBuildVector()` might return non-BUILD_VECTOR As apparent in the newly-added test, provided in: `cf624b23bc (commitcomment-90836329)`, we should be more careful with handling wider vectors, or we will assert later on.	2022-11-26 18:46:36 +03:00
Kazu Hirata	23ca55231a	[AArch64] Use std::optional in AArch64ISelLowering.cpp (NFC) This is part of an effort to migrate from llvm::Optional to std::optional: https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716	2022-11-25 22:06:01 -08:00
Benjamin Maxwell	79b5829a15	[TargetLowering][AArch64] Teach DemandedBits about SVE count intrinsics This allows DemandedBits to see that the SVE count intrinsics (CNTB, CNTH, CNTW, CNTD) sans multiplier will only ever produce small positive integers. The maximum value you could get here is 256, which is CNTB on a machine with a 2048bit vector size (the maximum for SVE). Using this various redundant operations (zexts, sexts, ands, ors, etc) can be eliminated. Differential Revision: https://reviews.llvm.org/D138424	2022-11-25 10:15:14 +00:00
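
The 256 bound quoted in 79b5829a15 follows from simple arithmetic: each count intrinsic without a multiplier returns the vector length divided by the element width, and the commit message gives 2048 bits as the maximum SVE vector size. The C++ below is a minimal sketch of that arithmetic only. It is not taken from the LLVM sources, and the names `sveElementCount`, `bitsNeeded` and `kMaxSveVectorBits` are hypothetical helpers introduced for illustration.

```cpp
#include <cstdint>

// Maximum SVE vector length in bits, as stated in the commit message.
constexpr unsigned kMaxSveVectorBits = 2048;

// Hypothetical helper (not an LLVM API): number of elements of width
// elementBits in one vector of vectorBits bits. This models the value
// CNTB (8), CNTH (16), CNTW (32) or CNTD (64) yields without a multiplier.
inline std::uint64_t sveElementCount(unsigned vectorBits, unsigned elementBits) {
  return vectorBits / elementBits;
}

// Hypothetical helper: minimum number of bits needed to represent value.
// Every bit above this position is known to be zero.
inline unsigned bitsNeeded(std::uint64_t value) {
  unsigned bits = 0;
  while (value != 0) {
    ++bits;
    value >>= 1;
  }
  return bits;
}
```

Under these assumptions, `sveElementCount(kMaxSveVectorBits, 8)` is 256 and `bitsNeeded(256)` is 9, so in a 64-bit result all but the low 9 bits are zero. That is the kind of fact that lets redundant zexts, sexts and masks on the result be dropped.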

1593 Commits