llvm-project

Author	SHA1	Message	Date
serge-sans-paille	16544cbe64	[iwyu] Move <cmath> out of llvm/Support/MathExtras.h Interestingly, MathExtras.h doesn't use <cmath> declaration, so move it out of that header and include it when needed. No functional change intended, but there's no longer a transitive include fromMathExtras.h to cmath.	2022-09-28 20:49:01 +02:00
Philip Reames	899ebd7e99	[LV] Remove two unused default arguments [nfc]	2022-09-27 14:33:53 -07:00
Philip Reames	dc7387b587	[LV] Adjust cost model to use uniform store lowering for unpredicated uniform stores Follow up to D133580; adjust the cost model to prefer uniform store lowering for scalable stores which are unpredicated. The impact here isn't in the uniform store lowering quality itself. InstCombine happily converts the scatter form into the single store form. The main impact is in letting the rest of the cost model make choices based on the knowledge that the vector will be scalarized on use. Differential Revision: https://reviews.llvm.org/D134460	2022-09-27 07:28:40 -07:00
Florian Hahn	2c692d891e	[LV] Update handling of scalable pointer inductions after b73d2c8. The dependent code has been changed quite a lot since 151c144 which b73d2c8 effectively reverts. Now we run into a case where lowering didn't expect/support the behavior pre 151c144 any longer. Update the code dealing with scalable pointer inductions to also check for uniformity in combination with isScalarAfterVectorization. This should ensure scalable pointer inductions are handled properly during epilogue vectorization. Fixes #57912.	2022-09-23 18:23:02 +01:00
Philip Reames	32dc1151e2	[VPlan] Only generate single instr for unpredicated stores of varying value to invariant address This extends the previously added uniform store case to handle stores of loop varying values to a loop invariant address. Note that the placement of this code only allows unpredicated stores; this is important for correctness. (That is "IsPredicated" is always false at this point in the function.) This patch does not include scalable types. The diff felt "large enough" as it were; I'll handle that in a separate patch. (It requires some changes to cost modeling.) Differential Revision: https://reviews.llvm.org/D133580	2022-09-22 08:53:46 -07:00
Florian Hahn	dcbc8a0daa	[LV] Remove unused widenCallInstruction declaration (NFC). The definition and uses have been removed a while ago. Clean up the unused declaration.	2022-09-20 15:20:28 +01:00
Florian Hahn	582f8ef19f	[LV] Keep track of cost-based ScalarAfterVec in VPWidenPointerInd. Epilogue vectorization uses isScalarAfterVectorization to check if widened versions for inductions need to be generated and bails out in those cases. At the moment, there are scenarios where isScalarAfterVectorization returns true but VPWidenPointerInduction::onlyScalarsGenerated would return false, causing widening. This can lead to widened phis with incorrect start values being created in the epilogue vector body. This patch addresses the issue by storing the cost-model decision in VPWidenPointerInductionRecipe and restoring the behavior before 151c144. This effectively reverts 151c144, but the long-term fix is to properly support widened inductions during epilogue vectorization Fixes #57712.	2022-09-19 18:14:35 +01:00
Craig Topper	90a004b4a1	[LV] Remove FIXME about NoImplicitFloat. NFC My understanding is that NoImplicitFloat, despite it's name, is supposed to disable all vectors not just float vectors. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D134084	2022-09-19 10:01:02 -07:00
Florian Hahn	ac80b0e84f	[LV] Mark Instr as const in scalarizeInstruction. (NFC). This is to reduce the diff in follow-up changes.	2022-09-13 09:10:02 +01:00
Florian Hahn	69d9bb2aad	[VPlan] Check recipe uses instead of type of underlying instr (NFC). Suggested by @Ayal post-commit, to reduce the dependence on the underlying instruction in favor of information available directly for the recipe.	2022-09-11 12:24:44 +01:00
Florian Hahn	da734473fa	[LV] Remove now dead variable after 2a78890b7b7f08 (NFC).	2022-09-09 20:25:55 +01:00
Florian Hahn	2a78890b7b	[VPlan] Move SCEV expansion for pointer induction to VPExpandSCEV (NFC). Use VPExpandSCEVRecipe to expand the step of pointer inductions. This cleanup addresses a corresponding FIXME. It should be NFC, as steps for pointer induction must be constants, which makes expansion trivial.	2022-09-09 19:20:13 +01:00
Philip Reames	a33d98e20a	[LV] Pull out common expression [nfc]	2022-09-09 07:31:46 -07:00
Philip Reames	edb26268ce	[VPlan] Only generate single instr for stores uniform across all parts. Extend the approach taken by D133019 to store instructions. Differential Revision: https://reviews.llvm.org/D133497	2022-09-09 07:15:12 -07:00
Philip Reames	4c4c0d2c06	[LV] Use safe-divisor lowering for fixed vectors if profitable This extends the safe-divisor widening scheme recently added for scalable vectors to handle fixed vectors as well. Differential Revision: https://reviews.llvm.org/D132591	2022-09-08 09:15:54 -07:00
Florian Hahn	422cf99161	[VPlan] Only generate single instr for loads uniform across all parts. VPReplicateRecipe::isUniform actually means uniform-per-parts, hence a scalar instruction is generated per-part. This is a potential alternative D132892. For now the current patch only catches cases where the address is trivially invariant (defined outside VPlan), while D132892 catches any address that is considered invariant by SCEV AFAICT. It should be possible to hoist fully invariant recipes feeding loads out of the vector loop region as well, but in practice LICM should do that already. This version of the patch artificially limits this to loads to make it easier to compare, but this restriction should be easily liftable. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D133019	2022-09-08 14:27:58 +01:00
Florian Hahn	408ebe5e3a	[VPlan] Move VPWidenCallRecipe to VPlanRecipes.cpp (NFC). Depends on D132585. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D132586	2022-09-05 10:48:29 +01:00
Florian Hahn	fc444ddc77	[VPlan] Add field to track if intrinsic should be used for call. (NFC) This patch moves the cost-based decision whether to use an intrinsic or library call to the point where the recipe is created. This untangles code-gen from the cost model and also avoids doing some extra work as the information is already computed at construction. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D132585	2022-09-01 13:14:40 +01:00
Philip Reames	8936d86469	[LV] Add debug output for force scalar tracing [nfc] I keep finding myself needing to rule this out as a possible source of scalarization, so add debug output like we have for other instructions we decide to scalarize.	2022-08-29 15:17:51 -07:00
Florian Hahn	c78696813f	[LV] Remove unneeded getVectorIntrinsicIDForCall call (NFC). Suggested as independent fix during the review of D132585.	2022-08-29 10:19:47 +01:00
Kazu Hirata	56ea4f9bd3	[Transforms] Qualify auto in range-based for loops (NFC) Identified with readability-qualified-auto.	2022-08-27 21:21:02 -07:00
Kazu Hirata	a33ef8f2b7	Use llvm::all_equal (NFC)	2022-08-27 09:53:10 -07:00
Philip Reames	3dcec5e29f	[LV] Consistently use vputils::isUniformAfterVectorization [mostly nfc] I'd extracted isUniform, and Florian moved isUniformAfterVectorization out of VPlan at basically the same time. Let's go ahead and merge them. For the VPTransformState::get path, a VPValue without a def (which corresponds to an external IR value outside of VPLan) is explicitly handled above the uniform check. On the scalarizeInstruction path, I'm less sure why the change isn't visible, but test cases which would seem likely to hit it were already being handled as uniform through some other mechanism. It would be correct to consider values defined outside of vplan uniform here.	2022-08-26 11:09:17 -07:00
Philip Reames	2d5f025779	[LV] Extract utility for checking if VPValue is uniform [nfc]	2022-08-26 09:56:13 -07:00
Daniil Fukalov	9c710ebbdb	[TTI] NFC: Reduce InstructionCost::getValue() usage... in order to propagate `InstructionCost` value upper. Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D103406	2022-08-26 16:37:32 +03:00
Philip Reames	23245a914b	[LV] Simplify code given isPredicatedInst doesn't dependent on VF any more [nfc]	2022-08-24 11:42:10 -07:00
Philip Reames	3ab00cfca9	[LV] Adjust code added in f79214d1 for 531dd3634 [nfc] When rebasing the review which became f79214d1, I forgot to adjust for the changed semantics introduced by 531dd3634. Functionally, this had no impact, but semantically it resulted in an incorrect result for isPredicatedInst. I noticed this while doing a follow up change.	2022-08-24 10:38:17 -07:00
Philip Reames	f79214d1e1	[LV] Support predicated div/rem operations via safe-divisor select idiom This patch adds support for vectorizing conditionally executed div/rem operations via a variant of widening. The existing support for predicated divrem in the vectorizer requires scalarization which we can't do for scalable vectors. The basic idea is that we can always divide (take remainder) by 1 without executing UB. As such, we can use the active lane mask to conditional select either the actual divisor for active lanes, or a constant one for inactive lanes. We already account for the cost of the active lane mask, so the only additional cost is a splat of one and the vector select. This is one of several possible approaches to this problem; see the review thread for discussion on some of the others. This one was chosen mostly because it was straight forward, and none of the others seemed oviously better. I enabled the new code only for scalable vectors. We could also legally enable it for fixed vectors as well, but I haven't thought through the cost tradeoffs between widening and scalarization enough to know if that's profitable. This will be explored in future patches. Differential Revision: https://reviews.llvm.org/D130164	2022-08-24 10:07:59 -07:00
David Green	8d830f8d68	[LV] Replace fixed-order cost model with a SK_Splice shuffle The existing cost model for fixed-order recurrences models the phi as an extract shuffle of a v1 vector. The shuffle produced should be a splice, as they take two vectors inputs are extracting from a subset of the lanes. On certain architectures the existing cost model can drastically under-estimate the correct cost for the shuffle, so this changes it to a SK_Splice and passes a correct Mask through to the getShuffleCost call. I believe this might be the first use of a SK_Splice shuffle cost model outside of scalable vectors, and some targets may require additions to the cost-model to correctly account for them. In tree targets appear to all have been updated where needed. Differential Revision: https://reviews.llvm.org/D132308	2022-08-24 13:00:32 +01:00
Florian Hahn	ff34432649	[LoopUtils] Remove unused Loop arg from addDiffRuntimeChecks (NFC). The argument is no longer used, remove it.	2022-08-23 10:15:28 +01:00
Philip Reames	27d3321c4f	[TTI] Use OperandValueInfo in getMemoryOpCost client api [nfc] This removes the last use of OperandValueKind from the client side API, and (once this is fully plumbed through TTI implementation) allow use of the same properties in store costing as arithmetic costing.	2022-08-22 11:26:31 -07:00
Philip Reames	c42a5f1cc2	[TTI] Migrate getOperandInfo to OperandVaueInfo [nfc] This is part of merging OperandValueKind and OperandValueProperties.	2022-08-22 10:19:02 -07:00
Philip Reames	5cd427106d	[TTI] Start process of merging OperandValueKind and OperandValueProperties [nfc] OperandValueKind and OperandValueProperties both provide facts about the operands of an instruction for purposes of cost modeling. We've discussed merging them several times; before I plumb through more flags, let's go ahead and do so. This change only adds the client side interface for getArithmeticInstrCost and makes a couple of minor changes in client code to prove that it works. Target TTI implementations still use the split flags. I'm deliberately splitting what could be one big change into a series of smaller ones so that I can lean on the compiler to catch errors along the way.	2022-08-22 09:48:15 -07:00
Simon Pilgrim	5263155d5b	[CostModel] Add CostKind argument to getShuffleCost Defaults to TCK_RecipThroughput - as most explicit calls were assuming TCK_RecipThroughput (vectorizers) or was just doing a before-vs-after comparison (vectorcombiner). Calls via getInstructionCost were just dropping the CostKind, so again there should be no change at this time (as getShuffleCost and its expansions don't use CostKind yet) - but it will make it easier for us to better account for size/latency shuffle costs in inline/unroll passes in the future. Differential Revision: https://reviews.llvm.org/D132287	2022-08-21 10:54:51 +01:00
Philip Reames	b0a2c48e9f	[tti] Consolidate getOperandInfo without OperandValueProperties copies [nfc]	2022-08-19 16:22:22 -07:00
Alexey Bataev	d53e245951	[COST][NFC]Introduce OperandValueKind in getMemoryOpCost, NFC. Added OperandValueKind OpdInfo parameter to getMemoryOpCost functions to better estimate cost with immediate values. Part of D126885.	2022-08-19 07:33:00 -07:00
Florian Hahn	b8709a9d03	[LV] Support fixed order recurrences. If the incoming previous value of a fixed-order recurrence is a phi in the header, go through incoming values from the latch until we find a non-phi value. Use this as the new Previous, all uses in the header will be dominated by the original phi, but need to be moved after the non-phi previous value. At the moment, fixed-order recurrences are modeled as a chain of first-order recurrences. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D119661	2022-08-18 19:15:52 +01:00
Philip Reames	531dd3634d	[LV] Restructure isPredicatedInst and isScalarWithPredication (w/a fix for uniform mem ops) This change reorganizes the code and comments to make the expected semantics of these routines more clear. However, this is not an NFC change. The functional change is having isScalarWithPredication return false if the instruction does not need predicated. Specifically, for the case of a uniform memory operation we were previously considering it not to be a predicated instruction, but were considering it to be scalable with predication. As can be seen with the test changes, this causes uniform memory ops which should have been lowered as uniform-per-parts values to instead be lowering via naive scalarization or if scalarization is infeasible (i.e. scalable vectors) aborted entirely. I also don't trust the code to bail out correctly 100% of the time, so it's possible we had a crash or miscompile from trying to scalarize something which isn't scalaralizable. I haven't found a concrete example here, but I am suspicious. Differential Revision: https://reviews.llvm.org/D131093	2022-08-18 07:14:04 -07:00
Kazu Hirata	50724716cd	[Transforms] Qualify auto in range-based for loops (NFC) Identified with readability-qualified-auto.	2022-08-14 12:51:58 -07:00
Kazu Hirata	109df7f9a4	[llvm] Qualify auto in range-based for loops (NFC) Identified with readability-qualified-auto.	2022-08-13 12:55:42 -07:00
Dinar Temirbulatov	cab6cd6834	[AArch64][LoopVectorize] Introduce trip count minimal value threshold to ignore tail-folding. After D121595 was commited, I noticed regressions assosicated with small trip count numbersvectorisation by tail folding with scalable vectors. As a solution for those issues I propose to introduce the minimal trip count threshold value. Differential Revision: https://reviews.llvm.org/D130755	2022-08-09 22:10:17 +01:00
Fangrui Song	de9d80c1c5	[llvm] LLVM_FALLTHROUGH => [[fallthrough]]. NFC With C++17 there is no Clang pedantic warning or MSVC C5051.	2022-08-08 11:24:15 -07:00
Kazu Hirata	a2d4501718	[llvm] Fix comment typos (NFC)	2022-08-07 00:16:14 -07:00
Philip Reames	569a7f6aa3	[LV] Move definition of isPredicatedInst out of line and make it const [nfc]	2022-08-03 08:53:11 -07:00
Philip Reames	a1cab0daae	[LV] Use cost base decision for uniform mem op strategy [nfc-ish] This is mostly a stylistic change to make the uniform memop widening cost code fit more naturally with the sourounding code. Its not strictly speaking NFC as I added in the store with invariant value case, and we could in theory have a target where a gather/scatter is cheaper than a single load/store... but it's probably NFC in practice. Note that the scatter/gather result can still be overriden later if the result is uniform-by-parts.	2022-08-03 07:47:24 -07:00
Philip Reames	0b47615fcf	[LV] Recognize store of invariant value to invariant address as uniform This extends the handling of uniform memory operations to handle the case where a store is storing a loop invariant value. Unlike the general case of a store to an invariant address where we must use the last active lane, in this case we can use any lane since all lanes must produce the same result. For context, the basic structure of the existing code and how the change fits in: * First, we select a widening strategy. (The result is irrelevant for this patch.) * Then we determine if a computation is uniform within all lanes of VF. (Note this is the uniform-per-part definition, not LAI's uniform across all unrolled iterations definition.) * If it is, we overrule the widening strategy, and unconditionally scalarize. * VPReplicationRecipe - which is what actually does the scalarization - knows how to handle unform-per-part values including for scalable vectors. However, we do need to know that the expression is safe to execute without predication - e.g. the uniform mem op was unconditional in the original loop. (This part was split off and already landed.) An obvious question is why not simply implement the generic case? The answer is that I'm going to, but doing so without a canonicalization towards uniform causes regressions due to bad interaction with scalarization/uniformity of values feeding the uniform mem-op. This patch is needed to avoid those regressions. Differential Revision: https://reviews.llvm.org/D130364	2022-08-02 08:09:49 -07:00
David Sherwood	4ef9cb6c17	[AArch64][LoopVectorize] Disable tail-folding for SVE when loop has interleaved accesses If we have interleave groups in the loop we want to vectorise then we should fall back on normal vectorisation with a scalar epilogue. In such cases when tail-folding is enabled we'll almost certainly go on to create vplans with very high costs for all vector VFs and fall back on VF=1 anyway. This is likely to be worse than if we'd just used an unpredicated vector loop in the first place. Once the vectoriser has proper support for analysing all the costs for each combination of VF and vectorisation style, then we should be able to remove this. Added an extra test here: Transforms/LoopVectorize/AArch64/sve-tail-folding-option.ll Differential Revision: https://reviews.llvm.org/D128342	2022-08-02 09:52:33 +01:00
jacquesguan	e38af7ba95	[LV] Refactor getExtendedAddReductionCost to support other extended reduction more than Add. Now the API getExtendedAddReductionCost is used to determine the cost of extended Add reduction with optional Mul. For Arm, it could cover the cases. But for other target, for example: RISCV, they support other kinds of extended recution, such as FAdd. This patch does the following changes: 1, Split getExtendedAddReductionCost into 2 new API: getExtendedReductionCost which handles the extended reduction with addtional input of Opcode; getMulAccReductionCost which handle the MLA cases the getExtendedAddReductionCost. 2, Refactor getReductionPatternCost, add some contraint condition to make sure the getMulAccReductionCost should only handle the reuction of Add + Mul. Differential Revision: https://reviews.llvm.org/D130868	2022-08-02 16:02:38 +08:00
Philip Reames	82c1b136db	[LV] Don't predicate uniform mem op stores unneccessarily We already had the reasoning about uniform mem op loads; if the address is accessed at least once, we know the instruction doesn't need predicated to ensure fault safety. For stores, we do need to ensure that the values visible in memory are the same with and without predication. The easiest sub-case to check for is that all the values being stored are the same. Since we know that at least one lane is active, this tells us that the value must be visible. Warning on confusing terminology: "uniform" vs "uniform mem op" mean two different things here, and this patch is specific to the later. It would not be legal to make this same change for merely "uniform" operations. Differential Revision: https://reviews.llvm.org/D130637	2022-07-28 08:55:52 -07:00
Kazu Hirata	95a932fb15	Remove redundaunt override specifiers (NFC) Identified with modernize-use-override.	2022-07-24 22:28:11 -07:00

1 2 3 4 5 ...

1733 Commits