UniformityAnalysis offers all of the same features and much more, so there is no reason left to use the legacy DAs.
See RFC: https://discourse.llvm.org/t/rfc-deprecate-divergenceanalysis-legacydivergenceanalysis/69538
- Remove LegacyDivergenceAnalysis.h/.cpp
- Remove DivergenceAnalysis.h/.cpp + Unit tests
- Remove SyncDependenceAnalysis - it was not a real registered analysis and was only used by DAs
- Remove/adjust references to the passes in the docs where applicable
- Remove TTI hook associated with those passes.
- Move tests to UniformityAnalysis folder.
- Remove RUN lines for the DA, leave only the UA ones.
- Some tests had to be adjusted/removed depending on how they used the legacy DAs.
Reviewed By: foad, sameerds
Differential Revision: https://reviews.llvm.org/D148116
Similar to the getArithmeticReductionCost / getExtendedReductionCost calls (which really don't need to use std::optional<>).
This will be necessary to correctly recognize fast/nnan fmax/fmin reductions that can avoid NaN handling, which will allow us to remove the fmax/fmin special case in X86TTIImpl::getMinMaxCost and use getIntrinsicInstrCost like we do for integer reductions (63c3895327839ba5b57f5b99ec9e888abf976ac6).
Differential Revision: https://reviews.llvm.org/D148149
- As the address space cast may not be valid on a specific target,
`addrspacecast` is not handled when an `alloca` could be replaced
with the source of memcpy/memmove. This patch addresses that by
querying a target hook on whether that address space cast is valid.
For example, on most GPU targets, the cast from a global pointer to a
generic pointer is valid.
- If that cast is allowed (by querying `isValidAddrSpaceCast`), the
replacement is enhanced to handle that `addrspacecast` as well.
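A minimal sketch of the guarded replacement, assuming `TTI` and an IRBuilder are in scope (variable names are illustrative, not the exact ones from the patch):

```cpp
// The alloca and the memcpy/memmove source may live in different address
// spaces; only rewrite if the target says the cast is meaningful.
unsigned SrcAS = SrcPtr->getType()->getPointerAddressSpace();
unsigned DestAS = Alloca->getType()->getPointerAddressSpace();
if (SrcAS != DestAS && !TTI.isValidAddrSpaceCast(SrcAS, DestAS))
  return false; // bail out: the cast is not valid on this target
// Otherwise insert the addrspacecast and replace the alloca.
Value *Cast = Builder.CreateAddrSpaceCast(SrcPtr, Alloca->getType());
Alloca->replaceAllUsesWith(Cast);
```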
Reviewed By: yaxunl
Differential Revision: https://reviews.llvm.org/D147025
Given just how many arguments we pass to
preferPredicateOverEpilogue, and considering this list may
grow over time, I've decided to pass in a pointer to a new
TailFoldingInfo structure instead, similar to what we do
with IntrinsicCostAttributes, etc. In addition, many of the
arguments we pass in are actually available in the
LoopVectorizationLegality class so I've managed to
reduce the set of pointers that we need to pass in the
TailFoldingInfo struct.
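A sketch of the new struct and hook, close to but not guaranteed identical to the in-tree definitions:

```cpp
struct TailFoldingInfo {
  TargetLibraryInfo *TLI;
  LoopVectorizationLegality *LVL;
  InterleavedAccessInfo *IAI;
  TailFoldingInfo(TargetLibraryInfo *TLI, LoopVectorizationLegality *LVL,
                  InterleavedAccessInfo *IAI)
      : TLI(TLI), LVL(LVL), IAI(IAI) {}
};

// The hook now takes a single pointer instead of a long argument list.
bool preferPredicateOverEpilogue(TailFoldingInfo *TFI) const;
```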
Differential Revision: https://reviews.llvm.org/D146127
In the HardwareLoops pass, the function TryConvertLoop can crash on
unexpected input arguments. There are three cases. At line 342, the
compiler reads HWLoopInfo.CountType, which is only set when the
Bitwidth argument is given; if it is not, we crash. The function
isHardwareLoopCandidate dereferences CountType too, and InsertLoopDec
dereferences LoopDecrement. Any of these can crash. This patch adds a
condition to the pass so that we skip it when we meet unexpected
inputs.
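The added guard is conceptually of this shape (an illustrative sketch, not the exact diff):

```cpp
// Skip the transformation instead of dereferencing missing inputs.
if (!HWLoopInfo.CountType || !HWLoopInfo.LoopDecrement)
  return false; // unexpected inputs: leave the loop unmodified
```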
Reviewed By: samparker, fhahn
Differential Revision: https://reviews.llvm.org/D146277
Currently in LoopVectorize we avoid tail-folding if we can
prove the trip count is always a multiple of the maximum
fixed-width VF. This works because we know the vectoriser
only ever chooses a VF that is a power of 2. However, if
we are also considering scalable VFs then we conservatively
bail out of the optimisation because we don't know the value
of vscale, which could be an odd or prime number, etc.
This patch tries to enable the same optimisation for scalable
VFs by asking if vscale is known to be a power of 2. If so,
we can then query the maximum value of vscale and use the same
logic as we do for fixed-width VFs. I've also added a new TTI
hook called isVScaleKnownToBeAPowerOfTwo that does the same
thing as the existing TargetLowering hook.
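Conceptually the new scalable-VF check looks like this (a sketch; the variable names are illustrative):

```cpp
// If vscale is a power of 2, then MaxVScale * KnownMinVF is the widest
// possible VF and is itself a power of 2, so the fixed-width logic
// (trip count is a known multiple of the maximum VF) carries over.
if (TTI.isVScaleKnownToBeAPowerOfTwo()) {
  if (std::optional<unsigned> MaxVScale = TTI.getMaxVScale()) {
    uint64_t MaxVF = *MaxVScale * VF.getKnownMinValue();
    if (TripCount % MaxVF == 0)
      return false; // no remainder, so tail folding is unnecessary
  }
}
```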
Differential Revision: https://reviews.llvm.org/D146199
Cost modeling for GEPs should really be target dependent, but it is
currently done inside SLP in a target-independent way.
Sinking it into TTI enables target-dependent implementations.
This patch adds the new TTI interface and a basic implementation that
tries to retain the existing cost modeling.
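The new interface is roughly of the following shape (a paraphrase; see TargetTransformInfo.h for the exact signature):

```cpp
// Cost of a chain of GEPs (Ptrs) computed relative to a common Base.
// PointersChainInfo describes the chain, e.g. whether all pointers are
// relative to the same base and whether the offsets are constant.
InstructionCost getPointersChainCost(ArrayRef<const Value *> Ptrs,
                                     const Value *Base,
                                     const PointersChainInfo &Info,
                                     TargetCostKind CostKind) const;
```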
Differential Revision: https://reviews.llvm.org/D144770
This work follows on from D142109 and addresses a possible regression
when we know the loop iteration counter cannot overflow.
When we know the overflow-check always evaluates to false, it's better to
use the other style of tail folding where it assumes a runtime check was
added, because that avoids having to calculate a modified trip-count.
Reviewed By: paulwalker-arm
Differential Revision: https://reviews.llvm.org/D142894
In order to allow targets to disable interleaving for scalable vectors, pass the entire VF's ElementCount to getMaxInterleaveFactor.
This is based on the approach used here: 8d36708507
The plan would then be to disable interleaving on scalable VFs on RISC-V in a follow up patch.
See https://reviews.llvm.org/D143723#4132349
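The follow-up override on RISC-V might then look roughly like this (a hypothetical sketch):

```cpp
unsigned RISCVTTIImpl::getMaxInterleaveFactor(ElementCount VF) {
  // Hypothetical follow-up: never interleave scalable vector loops.
  if (VF.isScalable())
    return 1;
  // Fixed-width VFs keep whatever factor was reported before.
  return BaseT::getMaxInterleaveFactor(VF);
}
```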
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D144474
[Originally committed as f6ddf7781471b71243fa3c3ae7c93073f95c7dff;
reverted in bbef38352fbade9e014ec97d5991da5dee306da7 due to test
breakage; now relanded with the Arm tests conditioned on
`arm-registered-target`]
The LowerTypeTests pass emits a jump table in the form of an
`inlineasm` IR node containing a string representation of some
assembly. It tests the target triple to see what architecture it
should be generating assembly for. But that's not good enough for
`Triple::thumb`, because the 32-bit PC-relative `b.w` branch
instruction isn't available in all supported architecture versions. In
particular, Armv6-M doesn't support that instruction (although the
similar Armv8-M Baseline does).
Most of this patch is concerned with working out whether the
compilation target is Armv6-M or not, which I'm doing by going through
all the functions in the module, retrieving a TargetTransformInfo for
each one, and querying it via a new method I've added to check its
SubtargetInfo. If any function's TTI indicates that it's targeting an
architecture supporting B.W, then we assume we're also allowed to use
B.W in the jump table.
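The scan is conceptually as follows, with hasArmWideBranch being the new TTI query (local names are for illustration only):

```cpp
// Assume the restricted Armv6-M jump table format unless some function's
// subtarget proves the wide branch is available.
bool CanUseThumbBW = false;
for (Function &F : M) {
  if (F.isDeclaration())
    continue;
  const TargetTransformInfo &TTI = GetTTI(F); // pass-supplied callback
  if (TTI.hasArmWideBranch(/*Thumb=*/true)) {
    CanUseThumbBW = true; // B.W is available; use the compact table
    break;
  }
}
```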
The Armv6-M compatible jump table format requires a temporary
register, and therefore also has to use the stack in order to restore
that register.
Another consequence of this change is that jump tables on Arm/Thumb
are no longer always the same size. In particular, on an architecture
that supports Arm and Thumb-1 but not Thumb-2, the Arm and Thumb
tables are different sizes from //each other//. As a consequence,
``getJumpTableEntrySize`` can no longer base its answer on the target
triple's architecture: it has to take into account the decision that
``selectJumpTableArmEncoding`` made, which meant I had to move that
function to an earlier point in the code and store its answer in the
``LowerTypeTestsModule`` class.
Reviewed By: lenary
Differential Revision: https://reviews.llvm.org/D143576
This reverts commit f6ddf7781471b71243fa3c3ae7c93073f95c7dff.
Eight buildbots reported that the two test files changed by that
commit had started failing. The buildbots in question all had in
common that they build with a very restricted `LLVM_TARGETS_TO_BUILD`,
such as only X86 or AArch64 or Hexagon. I didn't notice this before
commit because my own build has the full default set of targets, and
in that circumstance, the tests pass.
I assume the problem has something to do with the attempt to query
TargetTransformInfo: if you can't make a valid TTI for the target
triple then you can't ask it what kind of inline assembler you should
be emitting, and so `opt` without the Arm backend can't get the Arm
cases of these tests right.
I don't have time to fix this until next week, so I'll revert the
change for now to keep the buildbots happy.
The LowerTypeTests pass emits a jump table in the form of an
`inlineasm` IR node containing a string representation of some
assembly. It tests the target triple to see what architecture it
should be generating assembly for. But that's not good enough for
`Triple::thumb`, because the 32-bit PC-relative `b.w` branch
instruction isn't available in all supported architecture versions. In
particular, Armv6-M doesn't support that instruction (although the
similar Armv8-M Baseline does).
Most of this patch is concerned with working out whether the
compilation target is Armv6-M or not, which I'm doing by going through
all the functions in the module, retrieving a TargetTransformInfo for
each one, and querying it via a new method I've added to check its
SubtargetInfo. If any function's TTI indicates that it's targeting an
architecture supporting B.W, then we assume we're also allowed to use
B.W in the jump table.
The Armv6-M compatible jump table format requires a temporary
register, and therefore also has to use the stack in order to restore
that register.
Another consequence of this change is that jump tables on Arm/Thumb
are no longer always the same size. In particular, on an architecture
that supports Arm and Thumb-1 but not Thumb-2, the Arm and Thumb
tables are different sizes from //each other//. As a consequence,
``getJumpTableEntrySize`` can no longer base its answer on the target
triple's architecture: it has to take into account the decision that
``selectJumpTableArmEncoding`` made, which meant I had to move that
function to an earlier point in the code and store its answer in the
``LowerTypeTestsModule`` class.
Reviewed By: lenary
Differential Revision: https://reviews.llvm.org/D143576
This NFC (intended) patch has several small changes:
* It renames PredicationStyle to TailFoldingStyle.
* It renames TTI.emitActiveLaneMask() to TTI.getPreferredTailFoldingStyle().
* It simplifies some of its uses in the LoopVectorizer.
Rationale: To my surprise PredicationStyle::None did not mean 'no
predication', but rather 'no active lane mask intrinsic', such that the
predicate is created using a splat + compare with stepvector. The enum is
also highly specific to tail folding, so it seems better to name this
around that feature, i.e. 'tail folding style'.
This also makes it more amenable to extend it to other tail folding styles,
such as the one added in D142109.
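After the rename, the enum reads roughly as follows (a sketch with abbreviated comments):

```cpp
enum class TailFoldingStyle {
  None,                // don't use tail folding
  Data,                // mask only the data (loads/stores) in the loop
  DataWithoutLaneMask, // as Data, but predicate via splat + compare with
                       // stepvector instead of the active.lane.mask intrinsic
  DataAndControlFlow,  // the mask also controls the loop exit
  // D142109 extends this list with a further style.
};

TailFoldingStyle getPreferredTailFoldingStyle() const; // was emitActiveLaneMask()
```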
Reviewed By: david-arm
Differential Revision: https://reviews.llvm.org/D142887
Currently the TargetTransformInfo::getPredictableBranchThreshold() method
returns the hardcoded value 99. This value affects the decision whether to
convert a select instruction to a branch or not in several passes:
SelectOptimize, CodeGenPrepare, SimplifyCFG.
It would be useful to make it possible to play with that threshold in
order to test select-optimize heuristics.
The option was originally introduced in TargetLoweringBase, but was
removed in revision 664d0c052c315 and not restored in the TTI.
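Restored as a cl::opt in TargetTransformInfo.cpp, the knob plausibly looks like this (the exact flag name and wiring are assumptions based on the description above):

```cpp
static cl::opt<unsigned> PredictableBranchThreshold(
    "predictable-branch-threshold", cl::init(99), cl::Hidden,
    cl::desc("Use this to override the target's predictable branch "
             "threshold (%)"));

BranchProbability TargetTransformInfo::getPredictableBranchThreshold() const {
  // Prefer the command-line override when one was given.
  return PredictableBranchThreshold.getNumOccurrences() > 0
             ? BranchProbability(PredictableBranchThreshold, 100)
             : TTIImpl->getPredictableBranchThreshold();
}
```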
Patch Author: aleksandr.popov
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D143060
LoopUnroll estimates the loop size via getInstructionCost(),
but getInstructionCost() cannot pass CostKind to getVectorInstrCost().
The same applies to getShuffleCost() with respect to
getBroadcastShuffleOverhead(), getPermuteShuffleOverhead(),
getExtractSubvectorOverhead(), and getInsertSubvectorOverhead().
To address this, this patch adds an argument CostKind to these
functions.
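The affected signatures gain a CostKind parameter, roughly (paraphrased, not the exact in-tree declarations):

```cpp
InstructionCost getVectorInstrCost(unsigned Opcode, Type *Val,
                                   TTI::TargetCostKind CostKind,
                                   unsigned Index) const;
InstructionCost getShuffleCost(ShuffleKind Kind, VectorType *Tp,
                               ArrayRef<int> Mask,
                               TTI::TargetCostKind CostKind, int Index,
                               VectorType *SubTp) const;
```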
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D142116
Need to include the cost of the initial insertelement in the cost of the
broadcasts. Also, need to adjust the cost of the gather/buildvector if
the element is inserted into poison/undef vector.
Differential Revision: https://reviews.llvm.org/D140498
This enables the select-optimize pass for Arm out-of-order AArch64
cores. It is trying to solve a problem that is difficult for the
compiler to fix. The criteria for when a csel is better or worse than a
branch depends heavily on whether the branch is well predicted and the
amount of ILP in the loop (as well as other criteria like the core in
question and the relative performance of the branch predictor). The
pass seems to do a decent job though, with the inner loop heuristics
being well implemented and doing a better job than I had expected in
general, even without PGO information.
I've been doing quite a bit of benchmarking. The headline numbers are
these for SPEC2017 on a Neoverse N1:
500.perlbench_r -0.12%
502.gcc_r 0.02%
505.mcf_r 6.02%
520.omnetpp_r 0.32%
523.xalancbmk_r 0.20%
525.x264_r 0.02%
531.deepsjeng_r 0.00%
541.leela_r -0.09%
548.exchange2_r 0.00%
557.xz_r -0.20%
Running benchmarks with a combination of the llvm-test-suite plus
several versions of SPEC gave between a 0.2% and 0.4% geomean
improvement depending on the core/run. The instruction count went down
by 0.1% too, which is a good sign, but the results can be a little
noisy. Some issues from other benchmarks I had run were improved in
rGca78b5601466f8515f5f958ef8e63d787d9d812e. In summary, well-predicted
branches will see an improvement, badly predicted branches may get
worse, and on average performance seems to be a little better overall.
This patch enables the pass for AArch64 under -O3 for cores that will
benefit from it, i.e. not in-order cores that do not fit into the "Assume
infinite resources that allow to fully exploit the available
instruction-level parallelism" cost model. It uses a subtarget feature
for specifying when the pass will be enabled, which I have enabled under
cpu=generic as the performance increases for out-of-order cores seem
larger than any decreases for in-order cores, which were minor.
Differential Revision: https://reviews.llvm.org/D138990
A target can report whether a misaligned access is 'fast' as defined
by the target or not. In reality there can be different levels
of 'fast' and 'slow'. This patch changes the boolean 'Fast'
argument of the allowsMisalignedMemoryAccesses family of functions
to an unsigned representing its speed.
A target can still define it as it wants and the direct translation
of the current code uses 0 and 1 for current false and true. This
makes the change an NFC.
A subsequent patch will start using an actual speed value in
the load/store vectorizer to check whether a vectorized access is going
to be not just fast, but also no slower than before.
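On the TTI side the change amounts to the following (paraphrased):

```cpp
// Before: the out-parameter was a boolean.
bool allowsMisalignedMemoryAccesses(LLVMContext &Context, unsigned BitWidth,
                                    unsigned AddressSpace, Align Alignment,
                                    bool *Fast = nullptr) const;
// After: 'Fast' is a speed, where 0 and 1 reproduce the old false/true
// and targets are free to report finer-grained values.
bool allowsMisalignedMemoryAccesses(LLVMContext &Context, unsigned BitWidth,
                                    unsigned AddressSpace, Align Alignment,
                                    unsigned *Fast = nullptr) const;
```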
Differential Revision: https://reviews.llvm.org/D124217
Epilogue loop vectorization is a feature in the vectorizer intended to avoid running fully scalar code when the vector length of the main loop turns out to be either longer than the trip count of the actual loop, or to leave a huge remainder.
In practice, this feature appears to not have been well tuned. I honestly don't think it should be on by default at all, but it definitely shouldn't be on for RISCV. Note that other targets have also disabled it, but they've done so via disabling interleaving - which is, well, completely unrelated - and we don't want to do that for RISCV.
In the near term, many examples I'm seeing have terrible codegen for epilogue vectorization. We are greatly increasing code size for little value at reasonable VLEN values for small types. In the long term, the cases that epilogue vectorization are intended to handle are likely better handled via tail folding on RISCV.
As an aside, I also don't really trust the correctness of epilogue vectorization. The code structure is such that otherwise straightforward changes sometimes break only epilogue vectorization. The reuse of an existing vplan without careful validation opens significant room for nasty bugs. Given how rarely the code is exercised, that is not a good combination.
As such, this patch introduces a TTI hook, and completely disables epilogue vectorization on RISCV.
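The hook and its override are small; a sketch:

```cpp
// Default (TargetTransformInfoImplBase): leave the feature enabled.
bool preferEpilogueVectorization() const { return true; }

// RISCVTTIImpl override: opt out of epilogue vectorization entirely.
bool preferEpilogueVectorization() const { return false; }
```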
Differential Revision: https://reviews.llvm.org/D136695
If the single-thread model is used, or the
-licm-force-thread-model-single flag is specified, skip checks
related to thread-safety. This means that store promotion for
conditionally executed stores only requires proof of
dereferenceability and writability, but not of thread-safety. For
example, this enables promotion of stores to (non-constant) globals,
as well as captured allocas.
Fixes https://github.com/llvm/llvm-project/issues/50537.
Differential Revision: https://reviews.llvm.org/D130466
The mul-by-constant cost models handle power-of-2 constants, but not negated-power-of-2 constants, despite the backends handling both.
This patch adds the OperandValueProperties::OP_NegatedPowerOf2 enum and wires it up for basic mul cost analysis and SLP handling.
Fixes #50778
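The extended enum, as a sketch:

```cpp
enum OperandValueProperties {
  OP_None = 0,
  OP_PowerOf2 = 1,
  OP_NegatedPowerOf2 = 2, // new: the constant operand is -(2^N)
};
```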
Differential Revision: https://reviews.llvm.org/D111968
Also remove new-pass-manager version of ExpandLargeDivRem because there is no way
yet to access TargetLowering in the new pass manager.
Differential Revision: https://reviews.llvm.org/D133691
This adds the ExpandLargeDivRem to the default pass pipeline.
The limit at which it expands div/rem instructions is configured
via a new TargetTransformInfo hook (default: no expansion).
X86, Arm and AArch64 backends implement this hook to expand div/rem
instructions with more than 128 bits.
Differential Revision: https://reviews.llvm.org/D130076
CGP uses a raw `getInstructionCost(I, TargetTransformInfo::TCK_SizeAndLatency) >= TCC_Expensive` check to see if it's better to move an expensive instruction used in a select behind a branch instead.
This is causing issues with upcoming improvements to TCK_SizeAndLatency costs on X86, as we need to use TCK_SizeAndLatency as a uop count (so it's compatible with various target-specific buffer sizes - see D132288), but we can have instructions that have a low TCK_SizeAndLatency value yet should still be treated as 'expensive' (FDIV for example) - by adding an isExpensiveToSpeculativelyExecute wrapper we can keep the current behaviour but still add an x86 override in a future patch when the cost tables are updated to compensate.
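A sketch of the wrapper's default implementation, which keeps the current behaviour:

```cpp
bool isExpensiveToSpeculativelyExecute(const Instruction *I) const {
  // The same check CGP performed inline; targets (e.g. X86 for FDIV)
  // can later override this without disturbing TCK_SizeAndLatency.
  return getInstructionCost(I, TargetTransformInfo::TCK_SizeAndLatency) >=
         TargetTransformInfo::TCC_Expensive;
}
```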
This has the effect of exposing the power-of-two property for use in memory op costing, but no target actually uses it yet. The main point of this change is simple consistency with the recently changed getArithmeticInstrCost, and to remove the last (interface) use of OperandValueKind.
This change completes the process of replacing OperandValueKind and OperandValueProperties which were previously passed independently in this API with a single container class which contains both.
This is the change which motivated the whole sequence which preceded it. In an original spike version of this change, I'd noticed a nasty bug: I'd changed the signature without changing names, and as a result, we silently passed additional information through a callsite which previously dropped the power-of-two fact. This might be harmless in most cases, but at least a couple clearly depended for correctness on not passing that property through.
I did my best to split off prior changes which reduced the scope of this one, and which made it possible to use compiler assistance. For instance, every parameter which changes type in this change also changes name. This was intentional to make sure that every call site possibly affected must show up in the diff. This let me audit each one closely.
This removes the last use of OperandValueKind from the client side API, and (once this is fully plumbed through the TTI implementation) allows use of the same properties in store costing as in arithmetic costing.
This completes the client side transition to the OperandValueInfo version of this routine. Backend TTI implementations still use the prior versions for now.
OperandValueKind and OperandValueProperties both provide facts about the operands of an instruction for purposes of cost modeling. We've discussed merging them several times; before I plumb through more flags, let's go ahead and do so.
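The merged container looks roughly like this (a sketch of the shape, with a couple of the helper queries):

```cpp
struct OperandValueInfo {
  OperandValueKind Kind = OK_AnyValue;
  OperandValueProperties Properties = OP_None;

  bool isConstant() const {
    return Kind == OK_UniformConstantValue || Kind == OK_NonUniformConstantValue;
  }
  bool isPowerOf2() const { return Properties == OP_PowerOf2; }
};
```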
This change only adds the client side interface for getArithmeticInstrCost and makes a couple of minor changes in client code to prove that it works. Target TTI implementations still use the split flags. I'm deliberately splitting what could be one big change into a series of smaller ones so that I can lean on the compiler to catch errors along the way.
Defaults to TCK_RecipThroughput - as most explicit calls were assuming TCK_RecipThroughput (vectorizers) or were just doing a before-vs-after comparison (vectorcombiner). Calls via getInstructionCost were just dropping the CostKind, so again there should be no change at this time (as getShuffleCost and its expansions don't use CostKind yet) - but it will make it easier for us to better account for size/latency shuffle costs in inline/unroll passes in the future.
Differential Revision: https://reviews.llvm.org/D132287
* Replace getUserCost with getInstructionCost, covering all cost kinds.
* Remove getInstructionLatency, it's not implemented by any backends, and we should fold the functionality into getUserCost (now getInstructionCost) to make it easier for targets to handle the cost kinds with their existing cost callbacks.
Original Patch by @samparker (Sam Parker)
Differential Revision: https://reviews.llvm.org/D79483
After D121595 was committed, I noticed regressions associated with
vectorisation of small trip count loops by tail folding with scalable
vectors. As a solution for those issues, I propose introducing a
minimal trip count threshold value.
Differential Revision: https://reviews.llvm.org/D130755
A const reference is preferred over a non-null const pointer.
`Type *` is kept as is to match the other overload.
Reviewed By: davidxl
Differential Revision: https://reviews.llvm.org/D131197
1) Overloaded (instruction-based) method is a wrapper around the current (opcode-based) method.
2) This patch also changes a few callsites (VectorCombine.cpp,
SLPVectorizer.cpp, CodeGenPrepare.cpp) to call the overloaded method.
3) This is a split of D128302.
Differential Revision: https://reviews.llvm.org/D131114
If we have interleave groups in the loop we want to vectorise then
we should fall back on normal vectorisation with a scalar epilogue. In
such cases when tail-folding is enabled we'll almost certainly go on to
create vplans with very high costs for all vector VFs and fall back on
VF=1 anyway. This is likely to be worse than if we'd just used an
unpredicated vector loop in the first place.
Once the vectoriser has proper support for analysing all the costs
for each combination of VF and vectorisation style, then we should
be able to remove this.
Added an extra test here:
Transforms/LoopVectorize/AArch64/sve-tail-folding-option.ll
Differential Revision: https://reviews.llvm.org/D128342
Currently the API getExtendedAddReductionCost is used to determine the cost of an extended Add reduction with an optional Mul. For Arm, it covers the relevant cases, but other targets, for example RISCV, support other kinds of extended reduction, such as FAdd.
This patch does the following changes:
1. Split getExtendedAddReductionCost into 2 new APIs: getExtendedReductionCost, which handles extended reductions with an additional Opcode input, and getMulAccReductionCost, which handles the MLA cases previously covered by getExtendedAddReductionCost.
2. Refactor getReductionPatternCost, adding constraints to make sure getMulAccReductionCost only handles the Add + Mul reduction.
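The two replacement APIs are roughly as follows (paraphrased from the description above; exact parameters may differ in-tree):

```cpp
// Extended reduction: reduce(ext(x)) for an arbitrary Opcode (Add, FAdd, ...).
InstructionCost getExtendedReductionCost(unsigned Opcode, bool IsUnsigned,
                                         Type *ResTy, VectorType *Ty,
                                         TTI::TargetCostKind CostKind) const;
// MLA-style reduction: reduce(mul(ext(x), ext(y))).
InstructionCost getMulAccReductionCost(bool IsUnsigned, Type *ResTy,
                                       VectorType *Ty,
                                       TTI::TargetCostKind CostKind) const;
```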
Differential Revision: https://reviews.llvm.org/D130868
I am playing with the LoopDataPrefetch pass and found out that it
bails out on a pointer in a non-zero address space. This
patch adds a target callback to check whether an address space is to
be considered for prefetching. The default implementation still only
allows address space 0, so this is NFCI.
This does not currently affect any known targets, but seems to be
generally useful for the future.
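The default keeps the old behaviour; a sketch:

```cpp
// New callback: targets override this to allow prefetching in other
// address spaces; the default only admits address space 0 (hence NFCI).
virtual bool shouldPrefetchAddressSpace(unsigned AS) const { return !AS; }
```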
Differential Revision: https://reviews.llvm.org/D129795
This patch adds a command line flag to be able to test
the type based cost-model analysis for Intrinsics.
Differential Revision: https://reviews.llvm.org/D129109