llvm-project

Author	SHA1	Message	Date
Florian Hahn	596a8d07d4	[AArch64] Add additional reassociation test. Add a test where the reassociation candidates are split across 2 blocks.	2023-01-09 16:38:19 +00:00
Florian Hahn	70520e2f1c	[AArch64] Add test showing reassociation potential. Add a test case where some ops of a reassociate-able expression are in an earlier block. This can appear in practice, e.g. when computing the final reduction value after vectorization.	2023-01-09 15:20:55 +00:00
Sander de Smalen	8aff167b34	[AArch64][SME] Improve streaming-compatible codegen for extending loads/truncating stores. This is another step in aligning addTypeForStreamingSVE with addTypeForFixedLengthSVE, which also improves code quality for extending loads and truncating stores. Reviewed By: hassnaa-arm Differential Revision: https://reviews.llvm.org/D141266	2023-01-09 15:08:04 +00:00
David Green	07d6af6a71	[AArch64] Fold And/Or into CSel if possible. If we have `and x, (csel 0, 1, cc)` and we know that x is 0/1, then we can emit a `csel ZR, x, cc`. Similarly for `or x, (csel 0, 1, cc)` we can emit `csinc x, ZR, cc`. This can help where we cannot otherwise generate ccmp instructions. Differential Revision: https://reviews.llvm.org/D141119	2023-01-09 11:52:37 +00:00
Tim Northover	5b24d42106	TailDuplication: do not remove trivial PHIs from addr-taken blocks. Unlike an anonymous block, it will not be removed even though we've resolved all valid paths to get here. So removing a PHI can leave vregs with no definition, violating SSA. Instead, this converts it to an IMPLICIT_DEF.	2023-01-09 11:12:33 +00:00
zhongyunde	9e83333445	[AArch64][SelectionDAG] Eliminates redundant zero-extension for 32-bit popcount Fix https://github.com/llvm/llvm-project/issues/59597. mov w8, w0 + fmov d0, x8 ==> fmov s0, w0 Reviewed By: dmgreen, efriedma Differential Revision: https://reviews.llvm.org/D140649	2023-01-09 16:08:16 +08:00
Serguei Katkov	fd64bd94ed	[Inline Spiller] Extend the snippet by statepoint uses. A snippet is a tiny live interval which has a copy-like or fill-like def and a copy-like or spill-like use at the end (either of them might be absent). A snippet has only one use/def inside the interval, and the interval is located in one basic block. When the inline spiller spills some reg around uses, it also forces the spilling of connected snippets: those which were obtained by splitting the same original reg and whose def is a full copy of our reg or whose last use is a full copy to our reg. The definition of a snippet is extended to allow not only one use/def but more. However, all other uses are statepoint instructions which will fold the fill into their operand. That way we do not introduce new fills/spills. Reviewed By: qcolombet, dantrushin Differential Revision: https://reviews.llvm.org/D138093	2023-01-09 13:30:57 +07:00
Paul Walker	c9602e02fc	[SVE] Fix incorrect VT usage when lowering fixed length vector divides. Ensure the negation required when lowering negative power-of-two divides uses the scalable vector container type with the fixed length result extracted from it. Fixes: #59647 Differential Revision: https://reviews.llvm.org/D140563	2023-01-08 12:22:05 +00:00
David Green	0d4ab5de7f	[ARM][AArch64] Add tests for And/Or into CSel fold. NFC	2023-01-07 14:08:29 +00:00
James Y Knight	1ae36b1387	Remove special cases for invoke of non-throwing inline-asm. Non-throwing inline asm infers the nounwind attribute in instcombine. Thus, it can be handled in the same manner as non-throwing target functions are generally. Further special casing is unnecessary complexity.	2023-01-06 13:53:10 -05:00
Luke Lau	275658d1af	[SelectionDAG] Implicitly truncate known bits in SPLAT_VECTOR Now that D139525 fixes the Hexagon infinite loop, the stopgap can be removed to provide more information about known bits in SPLAT_VECTOR whose operands are smaller than the bit width (which is most of the time) Reviewed By: reames Differential Revision: https://reviews.llvm.org/D141075	2023-01-06 15:43:47 +00:00
Hassnaa Hamdi	9eb698946d	[AArch64][SME]: Make 'Expand' the default action for all Ops. By default expand all operations, then change to Custom/Legal if needed. Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D141068	2023-01-06 15:32:07 +00:00
Sanjay Patel	bf82070ea4	[SDAG] try to avoid multiply for X*Y==0. Forking this off from D140850 - https://alive2.llvm.org/ce/z/TgBeK_ https://alive2.llvm.org/ce/z/STVD7d We could almost justify doing this in IR, but consideration for "minsize" requires that we only try it in codegen -- the transform is not reversible. In all other cases, avoiding multiply should be a win because a mul is more expensive than simple/parallelizable compares. AArch64 even has a trick to keep instruction count even for some types. Differential Revision: https://reviews.llvm.org/D141086	2023-01-06 09:06:11 -05:00
Sanjay Patel	bd87b84a02	[AArch64] add tests for x*y == 0; NFC	2023-01-06 08:37:04 -05:00
Ties Stuij	0b066e02a6	[AArch64] add GlobalIsel support for scalar CNT instruction When feature CSSC is available we should use instruction CNT for s32, s64 and s128 types in GlobalIsel's G_CTPOP. spec: https://developer.arm.com/documentation/ddi0602/2022-09/Base-Instructions/CNT--Count-bits- Reviewed By: aemerson Differential Revision: https://reviews.llvm.org/D139417	2023-01-06 11:08:34 +00:00
David Green	8b5d0361c0	[AArch64] Regenerate fp16-vector-nvcast.ll check lines. NFC	2023-01-05 18:16:58 +00:00
Craig Topper	11e92bd61f	[SelectionDAG] Improve codegen for udiv by constant if any divisors are 1. If the divisor is 1, the magic algorithm does not return a correct result and we end up using a select to pick the numerator for those elements at the end. Therefore we can use undef for that element of the earlier operations when the divisor is 1. We sometimes get this through SimplifyDemandedVectorElts, but not always. Definitely seems like we don't if the NPQ fixup is used. Unfortunately, DAGCombiner is unable to fold srl X, <0, undef> to X so I had to add flags to avoid emitting the srl unless one of the shift amounts is non-zero. Reviewed By: lebedev.ri Differential Revision: https://reviews.llvm.org/D141022	2023-01-05 08:41:44 -08:00
Ties Stuij	8d5b759a6c	[AArch64][GlobalISel] implement GPR (U/S)(MIN/MAX) instr support Lower umin, umax, smin, smax intrinsics to corresponding UMIN, UMAX, SMIN, SMAX instructions when feat CSSC is available. Reviewed By: aemerson Differential Revision: https://reviews.llvm.org/D139420	2023-01-05 11:50:31 +00:00
Jay Foad	0d518ae50c	[GlobalISel] New combine to commute constant operands to the RHS Differential Revision: https://reviews.llvm.org/D140907	2023-01-05 11:12:40 +00:00
Diana Picus	6ee4f253b2	[GlobalISel] Add G_BUILD_VECTOR[_TRUNC] to CSE Add G_BUILD_VECTOR and G_BUILD_VECTOR_TRUNC to the list of opcodes in `shouldCSEOpc`. This simplifies the code generated for vector splats. Differential Revision: https://reviews.llvm.org/D140965	2023-01-05 10:15:31 +01:00
Roman Lebedev	41005b7ab2	[DAGCombiner] Do try to combine `ISD::ANY_EXTEND_VECTOR_INREG` nodes These weren't previously getting combined at all here, only in target-specific combines.	2023-01-05 01:12:31 +03:00
Roman Lebedev	317a1adfe4	[DAGCombiner] Fold _EXTEND_INREG of one of CONCAT_VECTORS operands into _EXTEND of operand This appears to be the root problematic pattern for AArch64 regression in D140677. We already do this, and many more, as target-specific X86 combines, so this isn't causing much of an impact.	2023-01-05 01:12:31 +03:00
Roman Lebedev	2b1d077592	[NFC][AArch64] Add some tests for upcoming patch	2023-01-05 01:12:31 +03:00
Matt Arsenault	bf4596bf58	CodeGen: Clean up some tests with broken "strictfp" attribute	2023-01-03 20:26:57 -05:00
Samuel Parker	615333bc09	[TypePromotion] NewPM support. Differential Revision: https://reviews.llvm.org/D140893	2023-01-03 15:09:29 +00:00
Roman Lebedev	4fc417ec37	[DAGCombiner] `convertBuildVecZextToBuildVecWithZeros()`: rework split factor calculation The original computation was both making assumptions that do not hold in practice, and being overly pessimistic. We should just check every possible split factor, and pick the best one. Fixes https://github.com/llvm/llvm-project/issues/59781	2023-01-02 18:34:35 +03:00
Roman Lebedev	16facf1ca6	[DAGCombiner][TLI] Do not fuse bitcast to <1 x ?> into a load/store of a vector Single-element vectors are legalized by splitting, so the memory operations would also get scalarized. While we do have some support to reconstruct scalarized loads, we clearly don't catch everything. The comment for the affected AArch64 store suggests that having two stores was the desired outcome in the first place. This was showing as a source of many regressions with more aggressive ZERO_EXTEND_VECTOR_INREG recognition.	2022-12-31 03:49:43 +03:00
Roman Lebedev	e4d25a9c23	[DAG] BUILD_VECTOR: absorb ZERO_EXTEND of a single first operand if all other ops are zeros This kind of pattern seems to come up as regressions with better ZERO_EXTEND_VECTOR_INREG recognition. For initial implementation, this is quite restricted to the minimal viable transform, otherwise there are too many regressions to be dealt with.	2022-12-31 00:58:11 +03:00
Craig Topper	8abd70081f	[TargetLowering] Teach BuildUDIV to take advantage of leading zeros in the dividend. If the dividend has leading zeros, we can use them to reduce the size of the multiplier and avoid the fixup cases. This patch is for scalars only, but we might be able to do this for vectors in a follow up. Differential Revision: https://reviews.llvm.org/D140750	2022-12-29 13:58:46 -08:00
zhongyunde	c69d83908a	[AArch64][MachineScheduler] Set no side effect for movprfx. The movprfx is a vector copy, so it doesn't access memory. Set hasSideEffects to 0 so that hasUnmodeledSideEffects() does not return true, which would block the machine scheduler from moving load/store instructions past it. Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D140680	2022-12-28 01:18:14 +08:00
Roman Lebedev	e26e7ed69a	[DAG] `combineShuffleToZeroExtendVectorInReg()`: try to match w/ commuted operands We don't have any reason to expect that the operand we will match is on any particular hand of the shuffle, so we should try both.	2022-12-26 22:54:03 +03:00
Roman Lebedev	83288f8063	[AArch64] Custom lower `ISD::ZERO_EXTEND_VECTOR_INREG` The baseline legalization for `ISD::ZERO_EXTEND_VECTOR_INREG` (`VectorLegalizer::ExpandZERO_EXTEND_VECTOR_INREG`), blends-in the zeros, but as mentioned e.g. in b4bd0a404fe26071dab0854dfd9767974909c7c4, there is no such thing for AArch64. So some of the shuffles that would be nicely lowered by `LowerVECTOR_SHUFFLE()`, e.g. into `ZIP1`, would now be unrecognizable after round-tripping through `ISD::ZERO_EXTEND_VECTOR_INREG` recognition & legalization. The most obvious solution is to just custom-lower `ISD::ZERO_EXTEND_VECTOR_INREG` as the `ZIP1`-with-zeros, like it would have been originally in that test case.	2022-12-26 22:54:03 +03:00
Roman Lebedev	62fc5f1640	[DAGCombiner] Add a most basic `combineShuffleToZeroExtendVectorInReg()`. Sometimes we end up with shuffles in the DAG that would be better represented as an `ISD::ZERO_EXTEND_VECTOR_INREG`, and a failure to do so causes suboptimal codegen in a number of cases, especially when we will then cast vector to scalar. I acknowledge, the test changes here are rather underwhelming, but as with all of codegen, it's always a yak shaving, and this is the most stripped down version of the patch that shows some effect without having an insurmountable amount of fallout to deal with. The next change resolves this regression. The transformation will be extended in follow-ups.	2022-12-26 22:54:03 +03:00
Roman Lebedev	46458aadd4	[NFC][AArch64] Add a few vector shuffle tests that should be `zip1` At least, they are equivalent to the `@vzipNoBlend`, which is lowered into zip1.	2022-12-26 22:54:03 +03:00
Danila Malyutin	821a59588b	[TwoAddressInstruction] Constrain RegClass when processing a statepoint This transformation could've triggered a verifier assert if RegA and RegB were of different reg classes. Fix this by constraining as the comment for replaceRegWith suggests. Differential Revision: https://reviews.llvm.org/D140672	2022-12-26 19:00:34 +03:00
Roman Lebedev	110c5442b8	[NFC][Codegen] Add tests with oversized shifts by non-byte-multiple	2022-12-24 19:26:41 +03:00
Roman Lebedev	a9fbf25a14	[NFC][Codegen] Rename tests for oversized shifts by byte multiple	2022-12-24 19:26:41 +03:00
Roman Lebedev	387c1573f8	[NFC][Codegen] Tests with wide scalar shifts, for new potential legalization strategy	2022-12-24 00:47:25 +03:00
Jessica Paquette	7ef8f9c972	[IR/MachineOutliner] Add a "nooutline" function attr and respect it Add `nooutline` + update LangRef to say it exists. This makes it possible to say "don't outline from this function ever." We want to be able to toggle whether or not a function should be in the search set regardless of default behaviour. Add testcases for the IR Outliner + Machine Outliner. Also remove an unnecessary check for an empty function in the Machine Outliner. Differential Revision: https://reviews.llvm.org/D140438	2022-12-22 10:22:08 -08:00
David Green	61b72f6abe	[AArch64] Add RSHRN and RSHRN2 patterns. This adds some tablegen patterns for RSHRN, which performs a rounding shift with narrow. This is similar to the existing SHRN patterns with an extra addition to perform the rounding, that adds 1<<(shift-1) before the right shift. Because the round immediate and the shift amount are tied, it goes via a ComplexPattern that uses a SelectRoundingVLShr method to perform the selection checks. aarch64_neon_rshrn are expanded into the sequence of equivalent instructions (trunc(shr(add(x, 1<<(sht-1)), sht))) so that they can be converted back into RSHRN. Which also allows us to match raddhn through the adjusted patterns that previously used aarch64_neon_rshrn. Differential Revision: https://reviews.llvm.org/D140297	2022-12-22 16:49:19 +00:00
David Green	440d71f7b7	[AArch64] Additional RSHRN pattern tests. NFC	2022-12-21 18:32:52 +00:00
David Green	3e65ad7482	[AArch64] Combine Trunc(DUP) -> DUP This adds a simple fold of TRUNCATE(AArch64ISD::DUP) -> AArch64ISD::DUP, which can help generate more optimal UMULL sequences, and seems useful in general. Differential Revision: https://reviews.llvm.org/D140289	2022-12-21 14:59:59 +00:00
Peter Waller	6d877e6717	[AArch64][SVE][CodeGen] Prefer ld1r* over indexed-load when consumed by a splat If a load is consumed by a single splat, don't consider indexed loads. This is an alternative implementation to D138581. Depends on D139637. Differential Revision: https://reviews.llvm.org/D139850	2022-12-21 14:23:39 +00:00
Ties Stuij	50ddc8cca6	[AArch64] GlobalIsel codegen for gpr CTZ If feature CSSC is available, CTTZ intrinsics are lowered using the CTZ instruction when using GlobalIsel. spec: https://developer.arm.com/documentation/ddi0602/2022-09/Base-Instructions/CTZ--Count-Trailing-Zeros- Reviewed By: paquette Differential Revision: https://reviews.llvm.org/D139418	2022-12-21 11:31:50 +00:00
bipmis	eb7b8e3e2a	[AAch64] Optimize muls with operands having enough zero bits. Fix the regression in the reported test case lagarith-preproc.c. Specific to the incorrect umsubl generation. Differential Revision: https://reviews.llvm.org/D139411	2022-12-21 11:14:45 +00:00
Roman Lebedev	3a8e009f97	Revert "Reland "[SimplifyCFG] `FoldBranchToCommonDest()`: deal with mismatched IV's in PHI's in common successor block"" One of these two changes is exposing (or causing) some more miscompiles. A reproducer is in progress, so reverting until resolved. This reverts commit 428f36401b1b695fd501ebfdc8773bed8ced8d4e.	2022-12-20 18:36:42 +03:00
David Green	752819e813	[AArch64][ARM] Remove load from dup and vmul tests. NFC These tests needn't use loads in their testing of dup and mul instructions, and as the load changes the test may no longer test what they are intending (as in D140069).	2022-12-20 15:23:38 +00:00
KAWASHIMA Takahiro	347d2be7be	[AArch64] Add Neon int instructions to isAssociativeAndCommutative Differential Revision: https://reviews.llvm.org/D139810	2022-12-20 23:47:51 +09:00
KAWASHIMA Takahiro	673b4ad645	[AArch64] Add FP16 instructions to isAssociativeAndCommutative `-mcpu=` in `llvm/test/CodeGen/AArch64/machine-combiner.ll` is changed to `neoverse-n2` to use FP16 and SVE/SVE2 instructions. By this, the register allocation and/or instruction scheduling are slightly changed and some existing `CHECK` lines need to be updated. Differential Revision: https://reviews.llvm.org/D139809	2022-12-20 23:47:51 +09:00
bipmis	454997d396	[AAch64] Optimize muls with operands having enough zero bits. For muls with 64-bit operands where each operand has its top 32 bits zero, we can generate a single umull instruction on 32-bit operands. Differential Revision: https://reviews.llvm.org/D139411	2022-12-20 14:34:17 +00:00
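
Several of the messages above describe small arithmetic identities that can be checked directly: the And/Or into CSel fold (07d6af6a71), avoiding the multiply in X*Y==0 (bf82070ea4), the RSHRN expansion trunc(shr(add(x, 1<<(sht-1)), sht)) (61b72f6abe), and the single umull for 64-bit muls whose operands have zero top halves (454997d396). The following is a minimal Python sketch of those identities, not LLVM code. All function names are hypothetical helpers made up for illustration, the instruction models are simplified, and the no-overflow precondition on the multiply is an assumption here rather than something the commit message states.

```python
MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1


def csel(a, b, cc):
    """Simplified model of CSEL: a if the condition holds, else b."""
    return a if cc else b


def csinc(a, b, cc):
    """Simplified model of CSINC: a if the condition holds, else b + 1."""
    return a if cc else b + 1


def check_csel_folds():
    """Exhaustively compare both sides of the folds for x in {0, 1}."""
    zr = 0  # the zero register reads as 0
    for x in (0, 1):
        for cc in (False, True):
            assert (x & csel(0, 1, cc)) == csel(zr, x, cc)
            assert (x | csel(0, 1, cc)) == csinc(x, zr, cc)
    return True


def product_is_zero(x, y):
    """x * y == 0 rewritten as two compares.

    Assumption: the multiply does not overflow. With a wrapping 32-bit
    multiply, 0x10000 * 0x10000 is 0 although neither operand is 0.
    """
    return x == 0 or y == 0


def rshrn_lane(x, shift, wide_bits=16):
    """One lane of trunc(shr(add(x, 1 << (shift - 1)), shift))."""
    narrow_bits = wide_bits // 2
    assert 1 <= shift <= narrow_bits
    wide_mask = (1 << wide_bits) - 1
    narrow_mask = (1 << narrow_bits) - 1
    rounded = (x + (1 << (shift - 1))) & wide_mask  # add wraps in the wide type
    return (rounded >> shift) & narrow_mask         # shift, then truncate


def mul64(x, y):
    """Wrapping 64-bit multiply."""
    return (x * y) & MASK64


def umull(wn, wm):
    """Simplified model of UMULL: 32 x 32 -> 64-bit unsigned multiply."""
    return (wn & MASK32) * (wm & MASK32)
```

For example, rshrn_lane(7, 3) rounds 7/8 up to 1 while rshrn_lane(3, 3) rounds 3/8 down to 0, and mul64(x, y) equals umull(x, y) whenever both x and y fit in 32 bits.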

6291 Commits