llvm-project

Author	SHA1	Message	Date
Krzysztof Parzyszek	86fe4dfdb6	TargetTransformInfo: convert Optional to std::optional Recommit: added missing "#include <cstdint>".	2022-12-02 11:42:15 -08:00
Krzysztof Parzyszek	4e12d1836a	Revert "TargetTransformInfo: convert Optional to std::optional" This reverts commit b83711248cb12639e7ef7303cfbb4452b4067e85. Some buildbots are failing.	2022-12-02 11:34:04 -08:00
Krzysztof Parzyszek	b83711248c	TargetTransformInfo: convert Optional to std::optional	2022-12-02 11:27:12 -08:00
David Green	f2a92db29e	[AArch64] Don't treat SVE scalable extends as free widening instructions The logic in isWideningInstruction handles instructions like uaddw and smull, where 'add(x, zext(y))' or 'mul(sext(x), sext(y))' can be converted to single instructions, making the extends free. This doesn't apply the same to SVE instructions though. https://godbolt.org/z/695d3nhGd (There are instructions like SMULLT/B, but they require top/bottom lane interleaving. That is similar to MVE instructions, which required a special pass to perform the lane interleaving). This patch just bails out of the call to isWideningInstruction if the vector is scalable, getting a more accurate cost. Differential Revision: https://reviews.llvm.org/D138591	2022-11-30 13:09:48 +00:00
Nicola Lancellotti	49cd18c55e	Revert "[AArch64] Canonicalize ZERO_EXTEND to VSELECT" This reverts commit 43fe14c056458501990c3db2788f67268d1bdf38.	2022-11-28 16:37:30 +00:00
Zain Jaffal	6e4cea55f0	[AArch64] Fix cost model for `udiv` instruction when one of the operands is a uniform constant Currently the model over estimates the cost of a udiv instruction with one constant. The correct cost for a udiv instruction is insert_cost * extract_cost * num_elements Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D135991	2022-11-28 10:38:17 +02:00
Manuel Brito	1e55d5b1f2	Use poison instead of undef as placeholder for vector construction [NFC] Differential Revision: https://reviews.llvm.org/D138450	2022-11-21 18:43:23 +00:00
Bradley Smith	daf1a1f690	[AArch64][SVE] Add instcombine to convert ptest.last/first to ptest.any This allows for better optimization later in the backend. This fixes the remaining missed optimizations in D137717. Depends on D137930 Differential Revision: https://reviews.llvm.org/D137947	2022-11-15 15:59:21 +00:00
Cullen Rhodes	50621169ae	[AArch64][SVE] Extend PTEST_ANY(X=OP(PG,...), X) -> PTEST_ANY(PG, X)) instcombine Extend above instcombine added in D134946 to cover more flag-setting instructions. Reviewed By: peterwaller-arm Differential Revision: https://reviews.llvm.org/D136438	2022-11-04 08:58:15 +00:00
Sander de Smalen	137459aff6	[AArch64][SME] Disable (SLP\|Loop)Vectorizer when function may be executed in streaming mode. When the SME attributes tell that a function is or may be executed in Streaming SVE mode, we currently need to be conservative and disable _any_ vectorization (fixed or scalable) because the code-generator does not yet support generating streaming-compatible code. Scalable auto-vec will be gradually enabled in the future when we have confidence that the loop-vectorizer won't use any SVE or NEON instructions that are illegal in Streaming SVE mode. Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D135950	2022-10-19 16:42:20 +00:00
Nicola Lancellotti	43fe14c056	[AArch64] Canonicalize ZERO_EXTEND to VSELECT Differential Revision: https://reviews.llvm.org/D135596	2022-10-17 15:42:46 +01:00
Cullen Rhodes	388cacb341	[AArch64][SVE] Add instcombine for PTEST_ANY(X=OP(PG,...), X) -> PTEST_ANY(PG, X)) Given this is an OR reduction the two are equivalent and later optimizations (AArch64InstrInfo::optimizePTestInstr) may rewrite the sequence to use the flag-setting variant of instruction X, to remove the PTEST altogether. Reviewed By: paulwalker-arm, bsmith Differential Revision: https://reviews.llvm.org/D134946	2022-10-12 09:14:08 +00:00
Florian Hahn	eba84971ae	Revert "[AARCH64][CostModel] Modified the cost of mask vector load/store" This reverts commit 1c62af3e23cab41074f7ce0ba86a93bea82b99b9. The commit causes the test below to fail. Revert for now to get the bots back to green. Failing test: llvm/test/Transforms/LoopVectorize/AArch64/masked-op-cost.ll	2022-09-28 15:35:13 +01:00
liqinweng	1c62af3e23	[AARCH64][CostModel] Modified the cost of mask vector load/store Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D134413	2022-09-28 19:40:29 +08:00
Hassnaa Hamdi	181f200a1c	[NFC]: AArch64-SVE modify some comments	2022-09-23 12:07:31 +00:00
Caroline Concatto	5431bf27bd	[AArch64]Remove svget/svset/svcreate from llvm This patch removes the aarch64 intrinsics svget/svset/svcreate from llvm. It also implements the InstCombine for vector.extract that used to be in svget. Depends on: D131547 Differential Revision: https://reviews.llvm.org/D131548	2022-09-23 10:48:43 +01:00
Hassnaa Hamdi	f2072e0ae0	[AArh64-SVE]: Improve cost model for div/udiv/mul 128-bit vector operations Differential Revision: https://reviews.llvm.org/D132477	2022-09-22 16:50:55 +00:00
David Sherwood	64bef3d568	[AArch64][SME] Disable inlining when SME attributes require smstart/smstop or lazy-save. Inlining must be disabled when the call-site needs to toggle PSTATE.SM or when the callee's function body is executed in a different streaming mode than its caller. This is needed because function calls are the boundaries for streaming mode changes. More details about the SME attributes and design can be found in D131562. Differential Revision: https://reviews.llvm.org/D131581	2022-09-21 09:35:47 +01:00
Mingming Liu	8aa800614b	[AArch64][CostModel] Detects that {extract,insert}-element at lane 0 has the same cost as the other lane for vector instructions in the IR. Currently, {extract,insert}-element has zero cost at lane 0 [1]. However, there is a cost (by fmov instruction [2], or ext/ins instruction) to move values from SIMD registers to GPR registers, when the element is used explicitly as integers. See https://godbolt.org/z/faPE1nTn8, when fmov is generated for d* register -> x* register conversion. Implementation-wise, add a private method `AArch64TTIImpl::getVectorInstrCostHelper` as a helper function. This way, instruction-based method could share the core logic (e.g., returning zero cost if type is legalized to scalar). [1] `2cf320d41e/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp (L1853)` [2] `2cf320d41e/llvm/lib/Target/AArch64/AArch64InstrInfo.td (L8150-L8157)` Differential Revision: https://reviews.llvm.org/D128302	2022-09-09 09:47:30 -07:00
David Green	3875c38adf	[AArch64] Fix formatting of the Shuffle Cost tables. NFC	2022-09-08 19:54:12 +01:00
liqinweng	723245bfac	[AARCH64][COST] Improve cost of reverse shuffles for AArch64 Update the comments for reverse shuffles and add tests Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D132730	2022-09-08 18:55:49 +08:00
Eli Friedman	b219a9c0a2	[CostModel][AArch64] Fix ctpop intrinsic cost when NEON is disabled. If we don't have NEON, we use the generic fallback, which takes 12 instructions. Make sure the costs reflect that. (On a related note, we could optimize the generic fallback a bit. It currently uses sequences like lsr+and+add; if we use and+lsr+add instead, we can fold the lsr into the add.) Differential Revision: https://reviews.llvm.org/D133154	2022-09-02 15:17:55 -07:00
Mingming Liu	242203d254	[AArch64][TTI] Add cost table entry for trunc over vector of integers. 1) Tablegen patterns exist to use 'xtn' and 'uzp1' for trunc [1]. Cost table entries are updated based on the actual number of {xtn, uzp1} instructions generated. 2) Without this, an IR instruction like trunc <8 x i16> %v to <8 x i8> is considered free and might be sinked to other basic blocks. As a result, the sinked 'trunc' is in a different basic block with its (usually not-free) vector operand and misses the chance to be combined during instruction selection. (examples in [2]) 3) It's a lot of effort to teach CodeGenPrepare.cpp to sink the operand of trunc without introducing regressions, since the instruction to compute the operand of trunc could be faster (e.g., throughput) than the instruction corresponding to "trunc (bin-vector-op". For instance in [3], sinking %1 (as trunc operand) into bb.1 and bb.2 means to replace 2 xtn with 2 shrn (shrn has a throughput of 1 and only utilize v1 pipeline), which is not necessarily good, especially since ushr result needs to be preserved for store operation in bb.0. Meanwhile, it's too optimistic (for CodeGenPrepare pass) to assume machine-cse will always be able to de-dup shrn from various basic blocks into one shrn. [1] For {v8i16->v8i8, v4i32->v4i16, v2i64->v2i32}, `813ae2871d/llvm/lib/Target/AArch64/AArch64InstrInfo.td (L4472)`. For concat (trunc, trunc) -> uzip1, `813ae2871d/llvm/lib/Target/AArch64/AArch64InstrInfo.td (L5428-L5437)` [2] examples - trunc(umin(X, 255)) -> UQXTRN v8i8 (and other {u,s}x{min,max} pattern for v8i16 operands) from `813ae2871d/llvm/lib/Target/AArch64/AArch64InstrInfo.td (L4515-L4528)` - trunc (AArch64vlshr v8i16, imm) -> SHRNv8i8 (same missed for SHRNv2i32) from `813ae2871d/llvm/lib/Target/AArch64/AArch64InstrInfo.td (L6743-L6748)` [3] --- ; instruction latency / throughput / pipeline on `neoverse-n1` bb.0: %1 = lshr <8 x i16> %10, <i16 4, i16 4, i16 4, i16 4, i16 4, i16 4, i16 4, i16 4> ; ushr, latency 2, throughput 1, pipeline V1 %2 = trunc <8 x i16> %1 to <8 x i8> ; xtn, latency 2, throughput 2, pipeline V %3 = store <8 x i8> %1, ptr %addr br cond i1 cond, label bb.1, label bb.2 bb.1: %4 = trunc <8 x i16> %1 to <8 x i8> ; xtn bb.2: %5 = trunc <8 x i16> %1 to <8 x i8> ; xtn --- Differential Revision: https://reviews.llvm.org/D132784	2022-09-02 10:06:55 -07:00
Paul Walker	3bb228729f	[CostModel][SVE] Correct cost model of SK_Splice shuffles for <vscale x 1 x Ty> vector types. AArch64TTIImpl::getSpliceCost() is now used more aggressively and LNT (MultiSource/Benchmarks/mafft) exposed a failure case for <vscale x 1 x i1>. I've tested other element types and whilst they can be costed they cannot be code generated, so this patch returns InstructionCost::getInvalid() for all cases.	2022-08-26 16:06:01 +01:00
Philip Reames	c9608d57b8	[TTI] Plumb through OperandValueInfo in getMemoryOpCost [NFC] This has the effect of exposing the power-of-two property for use in memory op costing, but no target actually uses it yet. The main point of this change is simple consistency with the recently changed getArithmeticInstrCost, and to remove the last (interface) use of OperandValueKind.	2022-08-23 07:55:42 -07:00
Philip Reames	104fa367ee	[TTI] Use OperandValueInfo in getArithmeticInstrCost implementation [NFC] This change completes the process of replacing OperandValueKind and OperandValueProperties which were previously passed independently in this API with a single container class which contains both. This is the change which motivated the whole sequence which preceded it. In an original spike version of this change, I'd noticed a nasty bug: I'd changed the signature without changing names, and as a result, we silently passed additional information through a callsite which previously dropped the power-of-two fact. This might be harmless in most cases, but at least a couple clearly depended for correctness on not passing that property through. I did my best to split off prior changes which reduced the scope of this one, and which made it possible to use compiler assistance. For instance, every parameter which changes type in this change also changes name. This was intentional to make sure that every call site possibly affected must show up in the diff. This let me audit each one closely.	2022-08-22 15:16:39 -07:00
Philip Reames	478cf94378	[X86][AArch64][WebAsm][RISCV] Query operand properties instead of using enums directly [nfc] This is part of an ongoing transition to use OperandValueInfo which combines OperandValueKind and OperandValueProperties. This change adds some accessor methods and uses them to simplify backend code. The primary motivation of doing so is removing uses of the parameters so that an upcoming api change is less error prone.	2022-08-22 13:37:59 -07:00
David Green	0cf9e47f27	[AArch64] Add SK_Splice fixed-width costs A fixed length SK_Splice shuffle vector is lowered to a Ext under AArch64, which should have a cost of 1. Differential Revision: https://reviews.llvm.org/D132299	2022-08-22 12:44:57 +01:00
Simon Pilgrim	5263155d5b	[CostModel] Add CostKind argument to getShuffleCost Defaults to TCK_RecipThroughput - as most explicit calls were assuming TCK_RecipThroughput (vectorizers) or was just doing a before-vs-after comparison (vectorcombiner). Calls via getInstructionCost were just dropping the CostKind, so again there should be no change at this time (as getShuffleCost and its expansions don't use CostKind yet) - but it will make it easier for us to better account for size/latency shuffle costs in inline/unroll passes in the future. Differential Revision: https://reviews.llvm.org/D132287	2022-08-21 10:54:51 +01:00
Alexey Bataev	d53e245951	[COST][NFC]Introduce OperandValueKind in getMemoryOpCost, NFC. Added OperandValueKind OpdInfo parameter to getMemoryOpCost functions to better estimate cost with immediate values. Part of D126885.	2022-08-19 07:33:00 -07:00
Florian Hahn	b8709a9d03	[LV] Support fixed order recurrences. If the incoming previous value of a fixed-order recurrence is a phi in the header, go through incoming values from the latch until we find a non-phi value. Use this as the new Previous, all uses in the header will be dominated by the original phi, but need to be moved after the non-phi previous value. At the moment, fixed-order recurrences are modeled as a chain of first-order recurrences. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D119661	2022-08-18 19:15:52 +01:00
Daniil Fukalov	7ed3d81333	[NFCI] Move cost estimation from TargetLowering to TargetTransformInfo. TargetLowering had two last InstructionCost related `getTypeLegalizationCost()` and `getScalingFactorCost()` members, but all other costs are processed in TTI. E.g. it is not comfortable to use other TTI members in these two functions overridden in a target. Minor refactoring: `getTypeLegalizationCost()` now doesn't need DataLayout parameter - it was always passed from TTI. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D117723	2022-08-18 00:38:55 +03:00
Fangrui Song	de9d80c1c5	[llvm] LLVM_FALLTHROUGH => [[fallthrough]]. NFC With C++17 there is no Clang pedantic warning or MSVC C5051.	2022-08-08 11:24:15 -07:00
David Sherwood	4ef9cb6c17	[AArch64][LoopVectorize] Disable tail-folding for SVE when loop has interleaved accesses If we have interleave groups in the loop we want to vectorise then we should fall back on normal vectorisation with a scalar epilogue. In such cases when tail-folding is enabled we'll almost certainly go on to create vplans with very high costs for all vector VFs and fall back on VF=1 anyway. This is likely to be worse than if we'd just used an unpredicated vector loop in the first place. Once the vectoriser has proper support for analysing all the costs for each combination of VF and vectorisation style, then we should be able to remove this. Added an extra test here: Transforms/LoopVectorize/AArch64/sve-tail-folding-option.ll Differential Revision: https://reviews.llvm.org/D128342	2022-08-02 09:52:33 +01:00
Vasileios Porpodas	f669030373	[TTI][AArch64][SLP] Sets the cost of an ADD reduction 2xi64 to 2. 2xi64 is the legalized type for wide reductions (like 16xi64) and setting the cost to 2 makes `load-reduce` and `load-zext-reduce` patterns profitable. The few performance measurements that I did on an aarch64 machine confirm that these patterns are actually faster when vectorized. Differential Revision: https://reviews.llvm.org/D130740	2022-08-01 13:03:14 -07:00
chendewen	7eeb468ae5	[Aarch64] Add cost for missing extensions. This patch adds a cost estimate for some missing sign extensions. ref: https://reviews.llvm.org/D14730 Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D130565	2022-07-28 17:34:00 +08:00
David Sherwood	f15b6b2907	[AArch64] Add target hook for preferPredicateOverEpilogue This patch adds the AArch64 hook for preferPredicateOverEpilogue, which currently returns true if SVE is enabled and one of the following conditions (non-exhaustive) is met: 1. The "sve-tail-folding" option is set to "all", or 2. The "sve-tail-folding" option is set to "all+noreductions" and the loop does not contain reductions, 3. The "sve-tail-folding" option is set to "all+norecurrences" and the loop has no first-order recurrences. Currently the default option is "disabled", but this will be changed in a later patch. I've added new tests to show the options behave as expected here: Transforms/LoopVectorize/AArch64/sve-tail-folding-option.ll Differential Revision: https://reviews.llvm.org/D129560	2022-07-21 17:20:06 +01:00
Cullen Rhodes	7c3cda551a	[AArch64][SVE] Prefer SIMD&FP variant of clast[ab] The scalar variant with GPR source/dest has considerably higher latency than the SIMD&FP scalar variant across a variety of micro-architectures: Core Scalar SIMD&FP -------------------------------- Neoverse V1 9 cyc 3 cyc Neoverse N2 8 cyc 3 cyc Cortex A510 8 cyc 4 cyc A64FX 29 cyc 6 cyc	2022-07-13 08:53:36 +00:00
Bradley Smith	a83aa33d1b	[IR] Move vector.insert/vector.extract out of experimental namespace These intrinsics are now fundamental for SVE code generation and have been present for a year and a half, hence move them out of the experimental namespace. Differential Revision: https://reviews.llvm.org/D127976	2022-06-27 10:48:45 +00:00
David Green	fb4d3d238f	[AArch64] Remove unnecessary funnel shift sve costs. D127680 added some unnecessary funnel shift costs for AArch64 to "match the legacy behaviour". The default costs are closer to the correct values and line up with the scalar/neon costs better. Remove the lines again to clean up the code, they can be added back at a later date with better values if needed.	2022-06-21 12:21:37 +01:00
Philip Reames	db85345f2d	[BasicTTI] Allow generic handling of scalable vector fshr/fshl This change removes an explicit scalable vector bailout for fshl and fshr. This bailout was added in 60e4698b9aba8, when sinking an unconditional bailout for all intrinsics into selected cases. It's not clear if the bailout was originally unneeded, or if our cost model infrastructure has simply matured in the meantime. Either way, the generic code appears to handle scalable vectors without issue. Note that the RISC-V cost model changes here aren't particularly interesting. They do probably better match the current lowering, but the main point is to have coverage of the BasicTTI path and simply show lack of crashing. AArch64 costing was changed to preserve legacy behavior. There will most likely be an upcoming change to use the generic costs there too, but I didn't want to make that change not being particularly familiar with the target. Differential Revision: https://reviews.llvm.org/D127680	2022-06-20 10:38:51 -07:00
Tiehu Zhang	b329156f4f	[AArch64][LV] AArch64 does not prefer vectorized addressing TTI::prefersVectorizedAddressing() tries to vectorize the addresses that lead to loads. For aarch64, only gather/scatter (supported by SVE) can deal with vectors of addresses. This patch specializes the hook for AArch64, to return true only when we enable SVE. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D124612	2022-06-17 18:32:50 +08:00
Jingu Kang	bb82f74612	Revert "Revert "[AArch64] Set maximum VF with shouldMaximizeVectorBandwidth"" This reverts commit 42ebfa8269470e6b1fe2de996d3f1db6d142e16a. The commit from https://reviews.llvm.org/D125918 has fixed the stage 2 build failure. Differential Revision: https://reviews.llvm.org/D118979	2022-05-23 16:15:45 +01:00
Bradley Smith	5f4541fefb	[AArch64][SVE] Convert SRSHL to LSL when the fed from an ABS intrinsic Differential Revision: https://reviews.llvm.org/D125233	2022-05-19 14:07:59 +00:00
Florian Hahn	17a73992dd	[AArch64] Remove redundant f{min,max}nm intrinsics. The patch extends AArch64TTIImpl::instCombineIntrinsic to simplify llvm.aarch64.neon.f{min,max}nm(a, a) -> a. This helps with simplifying code written using the ACLE, e.g. see https://godbolt.org/z/jYxsoc89c Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D125234	2022-05-10 19:57:43 +01:00
David Green	dccc69a38d	[AArch64] Add extra reverse costs. This adds some extra costs for reverse shuffles under AArch64, filling in the i16/f16/i8 gaps in the cost model. Differential Revision: https://reviews.llvm.org/D124786	2022-05-06 18:23:36 +01:00
David Green	2dcb2d8562	[AArch64] Cost modelling for fptoi_sat This builds on top of the target-independent cost model added in D124269 to add aarch64 specific costs for fptoui_sat and fptosi_sat intrinsics. For many common types they will be legal instructions as the AArch64 instructions will saturate naturally. For unsupported pairs of integer and floating point types, an additional min/max clamp is needed. Differential Revision: https://reviews.llvm.org/D124357	2022-05-02 11:36:05 +01:00
David Kreitzer	6918a15f43	Test commit. Fixed a typo in a comment.	2022-04-29 16:18:09 -07:00
David Green	46cef9a82d	[AArch64] Attempt to fix bots by ensuring legalized type is a vector	2022-04-27 15:36:15 +01:00
David Green	8e2a0e61f5	[AArch64] Break up larger shuffle-masks into legal sizes in getShuffleCost Given a larger-than-legal shuffle mask, the final codegen will split into multiple sub-vectors. This attempts to model that in AArch64TTIImpl::getShuffleCost, splitting masks up according to the size of the legalized vectors. If the sub-masks have at most 2 input sources we can call getShuffleCost on them and sum the costs, to get a more accurate final cost for the entire shuffle. The call to improveShuffleKindFromMask helps to improve the shuffle kind for the sub-mask cost call. Differential Revision: https://reviews.llvm.org/D123414	2022-04-27 13:51:50 +01:00


320 Commits