llvm-project

Author	SHA1	Message	Date
Florian Hahn	5f096fd221	Revert "[LoopVectorizer] Add support for partial reductions (#92418)" This reverts commit 060d62b48aeb5080ffcae1dc56e41a06c6f56701. It looks like this is triggering an assertion when building llvm-test-suite on ARM64 macOS. Reproducer from MultiSource/Benchmarks/Ptrdist/bc/number.c: target datalayout = "e-m:o-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-n32:64-S128-Fn32" target triple = "arm64-apple-macosx15.0.0" define void @test(i64 %idx.neg, i8 %0) #0 { entry: br label %while.body while.body: ; preds = %while.body, %entry %n1ptr.0.idx131 = phi i64 [ %n1ptr.0.add, %while.body ], [ %idx.neg, %entry ] %n2ptr.0.idx130 = phi i64 [ %n2ptr.0.add, %while.body ], [ 0, %entry ] %sum.1129 = phi i64 [ %add99, %while.body ], [ 0, %entry ] %n1ptr.0.add = add i64 %n1ptr.0.idx131, 1 %conv = sext i8 %0 to i64 %n2ptr.0.add = add i64 %n2ptr.0.idx130, 1 %1 = load i8, ptr null, align 1 %conv97 = sext i8 %1 to i64 %mul = mul i64 %conv97, %conv %add99 = add i64 %mul, %sum.1129 %cmp94 = icmp ugt i64 %n1ptr.0.idx131, 0 %cmp95 = icmp ne i64 %n2ptr.0.idx130, -1 %2 = and i1 %cmp94, %cmp95 br i1 %2, label %while.body, label %while.end.loopexit while.end.loopexit: ; preds = %while.body %add99.lcssa = phi i64 [ %add99, %while.body ] ret void } attributes #0 = { "target-cpu"="apple-m1" } > opt -p loop-vectorize Assertion failed: ((VF.isScalar() || V->getType()->isVectorTy()) && "scalar values must be stored as (0, 0)"), function set, file VPlan.h, line 284.	2024-12-19 21:46:51 +00:00
Finn Plummer	45c01e8a33	[NFC][TargetTransformInfo][VectorUtils] Consolidate `isVectorIntrinsic...` api (#117635) - update `VectorUtils:isVectorIntrinsicWithScalarOpAtArg` to use TTI for all uses, to allow specification of target specific intrinsics - add TTI to the `isVectorIntrinsicWithStructReturnOverloadAtField` api - update TTI api to provide `isTargetIntrinsicWith...` functions and consistently name them - move `isTriviallyScalarizable` to VectorUtils - update all uses of the api and provide the TTI parameter Resolves #117030	2024-12-19 11:54:26 -08:00
Nicholas Guy	060d62b48a	[LoopVectorizer] Add support for partial reductions (#92418) Following on from https://github.com/llvm/llvm-project/pull/94499, this patch adds support to the Loop Vectorizer to emit the partial reduction intrinsics where they may be beneficial for the target. --------- Co-authored-by: Samuel Tebbs <samuel.tebbs@arm.com>	2024-12-19 11:42:40 +00:00
David Sherwood	c18fda02e1	[LoopVectorize] Use new single string variant of reportVectorizationFailure (#120414)	2024-12-19 10:07:13 +00:00
DianQK	e7a4d78ad3	[SLP] Check if instructions exist after vectorization (#120434) Fixes #120433.	2024-12-19 06:21:57 +08:00
Florian Hahn	5ca3794e82	[VPlan] Move initial VPlan block creation to constructor. (NFC) This sets up the initial blocks needed to initialize a VPlan directly in the constructor. This will allow tracking of all created blocks directly in VPlan, simplifying block deletion.	2024-12-18 22:00:30 +00:00
Florian Hahn	6910aec097	[VPlan] Don't use VPlan ctor taking trip count in most unit tests (NFC). Update tests to use constructor not passing a trip count VPValue. The tests don't need that and are simpler as a result.	2024-12-18 19:57:09 +00:00
Florian Hahn	0e8d022ffe	[VPlan] Handle exit phis with multiple operands in addUsersInExitBlocks. (#120260) Currently addUsersInExitBlocks incorrectly assumes exit phis only have a single operand, which may not be the case for loops with early exits when they share a common exit block. Also further relax the assertion in fixupIVUsers to allow exit values if they come from the loop latch/middle.block. PR: https://github.com/llvm/llvm-project/pull/120260	2024-12-18 14:47:16 +00:00
Simon Pilgrim	fbc18b85d6	Revert "[VectorCombine] Combine scalar fneg with insert/extract to vector fneg when length is different" (#120422 ) Reverts llvm/llvm-project#115209 - investigating a reported regression	2024-12-18 13:32:53 +00:00
David Sherwood	13107cb094	[LoopVectorize] Enable more early exit vectorisation tests (#117008 ) PR #112138 introduced initial support for dispatching to multiple exit blocks via split middle blocks. This patch fixes a few issues so that we can enable more tests to use the new enable-early-exit-vectorization flag. Fixes are: 1. The code to bail out for any loop live-out values happens too late. This is because collectUsersInExitBlocks ignores induction variables, which get dealt with in fixupIVUsers. I've moved the check much earlier in processLoop by looking for outside users of loop-defined values. 2. We shouldn't yet be interleaving when vectorising loops with uncountable early exits, since we've not added support for this yet. 3. Similarly, we also shouldn't be creating vector epilogues. 4. Similarly, we shouldn't enable tail-folding. 5. The existing implementation doesn't yet support loops that require scalar epilogues, although I plan to add that as part of PR #88385. 6. The new split middle blocks weren't being added to the parent loop.	2024-12-18 09:25:45 +00:00
hanbeom	b7a8d9584c	[VectorCombine] Combine scalar fneg with insert/extract to vector fneg when length is different (#115209 ) insertelt DestVec, (fneg (extractelt SrcVec, Index)), Index -> shuffle DestVec, (shuffle (fneg SrcVec), poison, SrcMask), Mask Original combining left the combine between vectors of different lengths as a TODO.	2024-12-18 07:47:42 +00:00
Luke Lau	c2a879ecaa	[VPlan] Fix VPTypeAnalysis cache clobbering in EVL transform (#120252 ) When building SPEC CPU 2017 with RISC-V and EVL tail folding, this assertion in VPTypeAnalysis would trigger during the transformation to EVL recipes: `d8a0709b10/llvm/lib/Transforms/Vectorize/VPlanAnalysis.cpp (L135-L142)` It was caused by this recipe: ``` WIDEN ir<%shr> = vp.or ir<%add33>, ir<0>, vp<%6> ``` Having its type inferred as i16, when ir<%add33> and ir<0> had inferred types of i32 somehow. The cause of this turned out to be because the VPTypeAnalysis cache was getting clobbered: In this transform we were erasing recipes but keeping around the same mapping from VPValue* to Type. In the meantime, new recipes would be created which would have the same address as the old value. They would then incorrectly get the old erased VPValue's cached type: ``` --- before --- 0x600001ec5030: WIDEN ir<%mul21.neg> = vp.mul vp<%11>, ir<0>, vp<%6> 0x600001ec5450: <badref> <- some value that was erased --- after --- 0x600001ec5030: WIDEN ir<%mul21.neg> = vp.mul vp<%11>, ir<0>, vp<%6> 0x600001ec5450: WIDEN ir<%shr> = vp.or ir<%add33>, ir<0>, vp<%6> <- a new value that happens to have the same address ``` This fixes this by deferring the erasing of recipes till after the transformation. The test case might be a bit flakey since it just happens to have the right conditions to recreate this. I tried to add an assert in inferScalarType that every VPValue in the cache was valid, but couldn't find a way of telling if a VPValue had been erased. --------- Co-authored-by: Florian Hahn <flo@fhahn.com>	2024-12-18 11:28:28 +08:00
Luke Lau	4a7f60d328	[VPlan] Handle VPWidenCastRecipe without underlying value in EVL transform (#120194 ) This fixes a crash that shows up when building SPEC CPU 2017 with EVL tail folding on RISC-V. A VPWidenCastRecipe doesn't always have an underlying value, and in the case of this crash this happens whenever a widened cast is created via truncateToMinimalBitwidths. Fix this by just using the opcode stored in the recipe itself. I think a similar issue exists with VPWidenIntrinsicRecipe and how it's widened, but I haven't run into any crashes with it just yet.	2024-12-18 11:28:07 +08:00
Florian Hahn	eb59fe8d04	[VPlan] Remove redundant assignment in VPReductionPHIRecipe (NFC) Suggested post-commit for 0e528ac404e13ed2d952a2d83aaf8383293c851e.	2024-12-17 21:32:40 +00:00
Florian Hahn	4ad0fdd163	[VPlan] Remove reverse() of predecessors from VPInstruction::generate. This was originally done to reduce the diff for the change. Remove it and update the remaining tests. NFC modulo reordering of incoming values. Clean up after https://github.com/llvm/llvm-project/pull/114292.	2024-12-17 20:44:32 +00:00
Simon Pilgrim	5287299f88	[VectorCombine] foldShuffleOfBinops - prefer same cost fold if it reduces instruction count (#120216 ) We don't fold "shuffle (binop), (binop)" -> "binop (shuffle), (shuffle)" if the old/new costs are equal, but we can relax this if either new shuffle will constant fold as it will reduce instruction count.	2024-12-17 18:10:20 +00:00
Nikita Popov	1157187496	[VPlan] Propagate all GEP flags (#119899 ) Store GEPNoWrapFlags instead of only InBounds and propagate them.	2024-12-17 13:48:50 +01:00
Florian Hahn	58cfa39861	[VPlan] Remove legacy VPlan() constructors (NFC). The constructors were retained to reduce the diff during transition. Remove them now.	2024-12-17 08:22:22 +00:00
Luke Lau	fba3e069b4	[VPlan] Remove overlapping VPInstruction::mayWriteToMemory. NFCI (#120039 ) VPInstruction has a definition of mayWriteToMemory, which seems to only be used by VPlanSLP. However VPInstructions are already handled in VPRecipeBase::mayWriteToMemory, and everywhere else seems to use this definition. I think these should be the same for all intents and purposes. The VPRecipeBase definition is more conservative but returns true for stores/calls/invokes/SLPStores.	2024-12-17 11:02:55 +08:00
Florian Hahn	0e528ac404	[VPlan] Use start value operand for FindLastIV reduction phis. Update VPReductionPHIRecipe::execute to use the start value from the start value operand of the recipe. This is needed to make sure we resume from the correct value during epilogue vectorization. At the moment, the start value is set to the sentinel value in adjustRecipesForReductions, as the original start value needs to be used when creating ResumePhi recipes. Fixes a mis-compile introduced by b3cba9be41bfa8 in SPEC2017 on AArch64.	2024-12-16 23:29:49 +00:00
Florian Hahn	f9120dc2a6	[VPlan] Make sure vector trip count is ready for prepareToExecute (NFC) Split off from https://github.com/llvm/llvm-project/pull/112145. This ensures that getOrCreateVectorTripCount creates the trip count as needed when induction resume value creation is moved to VPlan and no longer creates the vector trip count early.	2024-12-16 20:44:20 +00:00
Florian Hahn	89d5272841	[VPlan] Remove getPreheader(). (NFC) The preheader is now the entry block, connected to the vector.ph. Clean up after https://github.com/llvm/llvm-project/pull/114292.	2024-12-16 19:48:02 +00:00
Simon Pilgrim	8217c2eaef	[VectorCombine] foldShuffleOfBinops - extend to handle icmp/fcmp ops as well (#120075 ) Extend binary instructions matching to match compare instructions + predicate as well.	2024-12-16 17:23:04 +00:00
Alexey Bataev	d1a7225076	[SLP]Check if the node must keep its original bitwidth Need to check if during previous analysis the node has requested to keep its original bitwidth to avoid incorrect codegen. Fixes #120076	2024-12-16 08:01:22 -08:00
Florian Hahn	95e509a989	[VPlan] Add VPWidenInduction recipe as common base class (NFC). (#120008 ) This helps to simplify some existing code and new code (https://github.com/llvm/llvm-project/pull/112145) PR: https://github.com/llvm/llvm-project/pull/120008	2024-12-16 09:40:03 +00:00
Luke Lau	4746395bd7	[VPlan] Omit zero add in VPWidenIntOrFpInductionRecipe (#119668 ) I'm not sure if getStepVector was used for other things in the past where StartIdx was non-zero, but nowadays VPWidenIntOrFpInductionRecipe is the only user of it, and just passes zero to it. I presume InstCombine was already catching this so hopefully removing this won't affect codegen.	2024-12-16 11:55:48 +08:00
Florian Hahn	43045051d4	[VPlan] Modernize VPWidenIntOrFpInductionRecipe printing (NFC). Modernize VPWidenIntOrFpInductionRecipe printing by including the result VPValue and all operand VPValues, similar to VPScalarIVStepsRecipe and VPDerivedIVRecipe.	2024-12-15 20:46:52 +00:00
Florian Hahn	e64650d702	[VPlan] Get types and step from VPWidenPointerInductionRecipe (NFC). Use information directly from operands instead of going through IVDescriptor.	2024-12-15 18:52:10 +00:00
Florian Hahn	2067e604a4	[VPlan] Manage VPWidenPointerInduction debug location via recipe. Update VPWidenPointerInduction to manage its debug location via recipe. This makes sure we emit a proper debug location for VPWidenPointerInductionRecipes.	2024-12-15 14:41:07 +00:00
Florian Hahn	734a204fbd	[VPlan] Manage VPWidenIntOrFPInduction debug location via recipe (NFC). Properly set VPWidenIntOrFpInductionRecipe's debug location in the recipe and use it, instead of using the debug location of the underlying IR instruction.	2024-12-15 13:45:28 +00:00
Simon Pilgrim	916bae2d92	[VectorCombine] foldShuffleOfBinops - refactor to make it easier to match icmp/fcmp patterns NFC refactor to make it easier to also use the fold for icmp/fcmp patterns in a future patch - match the Shuffle with general Instruction operands and avoid explicit use of the BinaryOperator matches as much as possible for the general costing / fold.	2024-12-15 12:49:24 +00:00
Florian Hahn	2564f1e199	[VPlan] Simplify Not(Not(A)) -> A. Follow-up simplification to 5fae408d3a4c073ee4.	2024-12-14 20:08:26 +00:00
Simon Pilgrim	cc54a0ce56	[VectorCombine] vectorizeLoadInsert - only fold when inserting into a poison vector (#119906 ) We have corresponding poison tests in the "-inseltpoison.ll" sibling test files. Fixes #119900	2024-12-14 11:56:12 +00:00
Han-Kuan Chen	da439d3af4	[SLP] NFC. Refactor getEntryCost and isReverseOrder usage. (#119680 ) Users should check whether an input is empty before using isReverseOrder.	2024-12-14 02:01:25 +08:00
Ramkumar Ramachandra	4a0d53a0b0	PatternMatch: migrate to CmpPredicate (#118534) With the introduction of CmpPredicate in 51a895a (IR: introduce struct with CmpInst::Predicate and samesign), PatternMatch is one of the first key pieces of infrastructure that must be updated to match a CmpInst respecting samesign information. Implement this change to Cmp-matchers. This is a preparatory step in migrating the codebase over to CmpPredicate. Since no functional changes are desired at this stage, we have chosen not to migrate CmpPredicate::operator==(CmpPredicate) calls to use CmpPredicate::getMatching(), as that would have visible impact on tests that are not yet written: instead, we call CmpPredicate::operator==(Predicate), preserving the old behavior, while also inserting a few FIXME comments for follow-ups.	2024-12-13 14:18:33 +00:00
Han-Kuan Chen	3133acf1fb	Revert "[SLP] Make getSameOpcode support different instructions if they have same semantics. (#112181)" This reverts commit 82204154b7bd1f8c487c94c7ef00399d776b29f0.	2024-12-12 20:38:31 -08:00
Han-Kuan Chen	82204154b7	[SLP] Make getSameOpcode support different instructions if they have same semantics. (#112181)	2024-12-13 12:06:10 +08:00
Florian Hahn	4e828f8d74	[VPlan] Perform DT expensive input DT verification earlier (NFC). After 6c8f41d33674, DT adjustments for the skeleton are applied as VPBBs are executed. Move input DT verification up before starting to execute any VPBBs to avoid checking DT while the CFG and DT are in an incomplete state. This fixes a number of verification failures with expensive checks enabled, including https://lab.llvm.org/buildbot/#/builders/16/builds/10584	2024-12-12 20:06:20 +00:00
Han-Kuan Chen	2546ae4ed0	[SLP][REVEC] Fix the number of elements in the mask of a ShuffleVectorInst is not a power of 2. (#119689 ) The following shufflevector should not be vectorized when slp-vectorize-non-power-of-2 is enabled. shufflevector <8 x float> %1, <8 x float> poison, <3 x i32> <i32 0, i32 1, i32 2> shufflevector <8 x float> %1, <8 x float> poison, <3 x i32> <i32 4, i32 5, i32 6>	2024-12-13 02:22:41 +08:00
Florian Hahn	c95af0844d	[VPlan] Move ::getVectorLoopRegion out of ifdef (NFC). Fixes a build failure with assertions disabled after 6c8f41d336747.	2024-12-12 16:21:21 +00:00
Florian Hahn	6c8f41d336	[VPlan] Hook IR blocks into VPlan during skeleton creation (NFC) (#114292 ) As a first step to move towards modeling the full skeleton in VPlan, start by wrapping IR blocks created during legacy skeleton creation in VPIRBasicBlocks and hook them into the VPlan. This means the skeleton CFG is represented in VPlan, just before execute. This allows moving parts of skeleton creation into recipes in the VPBBs gradually. Note that this allows retiring some manual DT updates, as this will be handled automatically during VPlan execution. PR: https://github.com/llvm/llvm-project/pull/114292	2024-12-12 15:58:16 +00:00
Kazu Hirata	2f8238f849	[llvm] Migrate away from PointerUnion::{is,get} (NFC) (#119679 ) Note that PointerUnion::{is,get} have been soft deprecated in PointerUnion.h: // FIXME: Replace the uses of is(), get() and dyn_cast() with // isa<T>, cast<T> and the llvm::dyn_cast<T> I'm not touching PointerUnion::dyn_cast for now because it's a bit complicated; we could blindly migrate it to dyn_cast_if_present, but we should probably use dyn_cast when the operand is known to be non-null.	2024-12-12 07:54:48 -08:00
Simon Pilgrim	86779da52b	[VectorCombine] Fold "(or (zext (bitcast X)), (shl (zext (bitcast Y)), C))" -> "(bitcast (concat X, Y))" MOVMSK bool mask style patterns (#119695) Mask/Bool vectors are often bitcast to/from scalar integers, in particular when concatenating mask results; this is often due to the difficulty of working with vectors of bools in C/C++. On x86 this typically involves the MOVMSK/KMOV instructions. To concatenate bool masks, these are typically cast to scalars, which are then zero-extended, shifted and OR'd together. This patch attempts to match these scalar concatenation patterns and convert them to vector shuffles instead. This in turn often assists with further vector combines, depending on the cost model. Reapplied patch from #119559 - fixed use after free issue. Fixes #111431	2024-12-12 13:45:10 +00:00
Florian Hahn	a480d51722	[VPlan] Use existing vector trip count VPValue for resume phi (NFC) Instead of going through getOrAddLiveIn to get a VPValue for the vector trip count retrieve it directly from VPlan via getVectorTripCount. Small simplification following 0e70289f373.	2024-12-12 11:03:47 +00:00
Simon Pilgrim	b604d23feb	[VectorCombine] Pull out isa<VectorType> check. Noticed while investigating a crash in #119559 - we don't account for I being replaced and its Type being reallocated. So hoist the checks to the start of the loop.	2024-12-12 11:02:01 +00:00
Mel Chen	b3cba9be41	[LoopVectorize] Vectorize select-cmp reduction pattern for increasing integer induction variable (#67812) Consider the following loop: ``` int rdx = init; for (int i = 0; i < n; ++i) rdx = (a[i] > b[i]) ? i : rdx; ``` We can vectorize this loop if `i` is an increasing induction variable. The final reduced value will be the maximum `i` for which the condition `a[i] > b[i]` is satisfied, or the start value `init` if it is never satisfied. This patch added new RecurKind enums - IFindLastIV and FFindLastIV. --------- Co-authored-by: Alexey Bataev <5361294+alexey-bataev@users.noreply.github.com>	2024-12-12 16:48:31 +08:00
Luke Lau	b26fe5b7e9	[VPlan] Use variadic isa<> in a few more places. NFC (#119538 )	2024-12-12 13:26:39 +08:00
Michal Paszkowski	04313b86a5	Revert "[LoadStoreVectorizer] Postprocess and merge equivalence classes" (#119657 ) Reverts llvm/llvm-project#114501, due to the following failure: https://lab.llvm.org/buildbot/#/builders/55/builds/4171	2024-12-11 20:36:23 -08:00
Vyacheslav Klochkov	fd2f8d485d	[LoadStoreVectorizer] Postprocess and merge equivalence classes (#114501) This patch introduces a new method: void Vectorizer::mergeEquivalenceClasses(EquivalenceClassMap &EQClasses) const The method is called at the end of Vectorizer::collectEquivalenceClasses() and is needed to merge equivalence classes that differ only by their underlying objects (UO1 and UO2), where UO1 is 1-level-indirection underlying base for UO2. This situation arises due to the limited lookup depth used during the search of underlying bases with llvm::getUnderlyingObject(ptr). Using any fixed lookup depth can result in the creation of multiple equivalence classes that only differ by 1-level indirection bases. The new approach merges equivalence classes if they have adjacent bases (1-level indirection). If a series of equivalence classes form a ladder of 1-step/level indirections, they are all merged into a single equivalence class. This provides more opportunities for the load-store vectorizer to generate better vectors. --------- Signed-off-by: Klochkov, Vyacheslav N <vyacheslav.n.klochkov@intel.com>	2024-12-11 19:01:35 -08:00
Florian Hahn	5fae408d3a	[VPlan] Dispatch to multiple exit blocks via middle blocks. (#112138 ) A more lightweight variant of https://github.com/llvm/llvm-project/pull/109193, which dispatches to multiple exit blocks via the middle blocks. The patch also introduces a bit of required scaffolding to enable early-exit vectorization, including an option. At the moment, early-exit vectorization doesn't come with legality checks, and is only used if the option is provided and the loop has metadata forcing vectorization. This is only intended to be used for testing during bring-up, with @david-arm enabling auto early-exit vectorization plugging in the changes from https://github.com/llvm/llvm-project/pull/88385. PR: https://github.com/llvm/llvm-project/pull/112138	2024-12-11 21:11:05 +00:00
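
The select-cmp reduction described in b3cba9be41 (the `rdx = (a[i] > b[i]) ? i : rdx` loop) can be illustrated with a small lane-wise emulation. This is only a sketch of the idea stated in that commit message (take the maximum matching index, or fall back to `init`); it is not the code LLVM generates. The function names, the width `VF = 4`, and the choice of `INT_MIN` as the "no match" sentinel are assumptions made for illustration.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <limits>
#include <vector>

// Scalar reference: the loop quoted in the commit message.
int findLastScalar(const std::vector<int> &a, const std::vector<int> &b,
                   int init) {
  assert(a.size() == b.size());
  int rdx = init;
  for (int i = 0; i < static_cast<int>(a.size()); ++i)
    rdx = (a[i] > b[i]) ? i : rdx;
  return rdx;
}

// Lane-wise emulation of the same reduction.
// VF and Sentinel are illustrative assumptions, not LLVM's actual choices.
int findLastLanes(const std::vector<int> &a, const std::vector<int> &b,
                  int init) {
  assert(a.size() == b.size());
  constexpr int VF = 4;
  constexpr int Sentinel = std::numeric_limits<int>::min();

  // Each lane remembers the last index at which its condition held.
  std::array<int, VF> lanes;
  lanes.fill(Sentinel);

  const int n = static_cast<int>(a.size());
  int i = 0;
  for (; i + VF <= n; i += VF)
    for (int l = 0; l < VF; ++l)
      if (a[i + l] > b[i + l])
        lanes[l] = i + l; // i only increases, so the last write is the max

  // Horizontal reduction: the largest index recorded by any lane.
  int rdx = *std::max_element(lanes.begin(), lanes.end());

  // Scalar remainder; these indices are larger than any seen so far.
  for (; i < n; ++i)
    if (a[i] > b[i])
      rdx = i;

  // If the condition never held, fall back to the start value.
  return rdx == Sentinel ? init : rdx;
}
```

Because `i` is increasing, "last index where the condition held" and "maximum such index" coincide, which is what allows the per-lane results to be combined with a max reduction. Calling both functions on the same inputs should give the same result.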


5312 Commits
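
The fold in 86779da52b rewrites "(or (zext (bitcast X)), (shl (zext (bitcast Y)), C))" as "(bitcast (concat X, Y))". The sketch below checks that equivalence for two 8-lane bool masks in plain C++. It assumes lane k maps to bit k (little-endian lane order) and that the shift amount C equals the lane count of X (8 here); `packMask`, `concatScalar`, and `concatVector` are hypothetical helper names, not LLVM APIs.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Models "bitcast <N x i1> to iN" with lane k mapped to bit k
// (an assumption of this sketch).
template <std::size_t N>
std::uint32_t packMask(const std::array<bool, N> &mask) {
  static_assert(N <= 32, "sketch only handles masks of up to 32 lanes");
  std::uint32_t bits = 0;
  for (std::size_t k = 0; k < N; ++k)
    bits |= static_cast<std::uint32_t>(mask[k]) << k;
  return bits;
}

// Scalar pattern: or (zext (bitcast X)), (shl (zext (bitcast Y)), 8).
std::uint32_t concatScalar(const std::array<bool, 8> &x,
                           const std::array<bool, 8> &y) {
  return packMask(x) | (packMask(y) << 8);
}

// Vector form: concatenate the lanes first, then bitcast once.
std::uint32_t concatVector(const std::array<bool, 8> &x,
                           const std::array<bool, 8> &y) {
  std::array<bool, 16> xy{};
  for (std::size_t k = 0; k < 8; ++k) {
    xy[k] = x[k];     // X occupies the low lanes
    xy[k + 8] = y[k]; // Y occupies the high lanes
  }
  return packMask(xy);
}
```

For any pair of 8-lane masks the two functions should return the same 16-bit value, which is the property the fold relies on.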