llvm-project

Author	SHA1	Message	Date
Sam Tebbs	8be3fc0f5c	[AArch64] Disallow vscale x 1 partial reductions (#125252 ) We don't want to allow partial reductions resulting in a vscale x 1 type as we can't lower it in the backend. (cherry picked from commit c7995a6905f2320f280013454676f992a8c6f89f)	2025-02-05 14:37:39 +00:00
Stephen Tozer	822f74a911	[Clang] Cleanup docs and comments relating to -fextend-variable-liveness (#124767 ) This patch contains a number of changes relating to the above flag; primarily it updates comment references to the old flag names, "-fextend-lifetimes" and "-fextend-this-ptr" to refer to the new names, "-fextend-variable-liveness[={all,this}]". These changes are all NFC. This patch also removes the explicit -fextend-this-ptr-liveness flag alias, and shortens the help-text for the main flag; these are both changes that were meant to be applied in the initial PR (#110000), but due to some user-error on my part they were not included in the merged commit.	2025-01-28 18:25:32 +00:00
Alexey Bataev	947d8ebbf3	[SLP]Unify getNumberOfParts use Adds getNumberOfParts and uses it instead of similar code across code base, fixes analysis of non-vectorizable types in computeMinimumValueSizes. Reviewers: RKSimon Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/124774	2025-01-28 12:16:44 -05:00
Ramkumar Ramachandra	d76ea250c8	Reland [InstCombine] Teach foldSelectOpOp about samesign (#124320 ) Changes: There was a serious bug in the previous patch, leading to a miscompile. See #122723 for the miscompile report from Alexander, and the follow-up investigation by Nikita. The patch has since been reworked, and now includes the testcase from the miscompile. Follow up on 4a0d53a (PatternMatch: migrate to CmpPredicate) to get rid of one of the FIXMEs it introduced by replacing a predicate comparison with CmpPredicate::getMatching. Co-authored-by: Nikita Popov <npopov@redhat.com>	2025-01-28 16:53:01 +00:00
Florian Hahn	3007f31e74	[LoopUnroll] Add AArch64 tests for multi-exit loop unrolling. Test coverage to https://github.com/llvm/llvm-project/pull/124751.	2025-01-28 14:25:27 +00:00
Alexey Bataev	a1ab5b4c87	[SLP]Check the MainOp matches the requirements for the instructions Need to include MainOp into the analysis of the instructions in getSameOpcode to be sure that it is checked for the requirements to prevent crashes during further analysis.	2025-01-28 06:00:52 -08:00
Alexey Bataev	1d5fbe83c3	[SLP]Adjust NumberOfParts value for adjusted number of buildvector scalars Need to adjust NumParts value, when GatheredScalars scalars are adjusted after extractelements analysis, to fix compiler crash	2025-01-28 05:45:13 -08:00
Nicholas Guy	cdea38f91a	Reland "[LoopVectorizer] Add support for chaining partial reductions #120272 " (#124282 ) Change `getScaledReduction` to take an existing vector, rather than creating and returning a new one each call. Rename `getScaledReduction` to `getScaledReductions` to more accurately reflect what it's now doing. --------- Co-authored-by: Karlo Basioli <68535415+basioli-k@users.noreply.github.com>	2025-01-28 10:40:35 +00:00
Thurston Dang	fa9ac62d02	[ubsan] Parse and use <cutoffs[0,1,2]=70000;cutoffs[5,6,8]=90000> in LowerAllowCheckPass (#124211 ) This adds and utilizes a cutoffs parameter for LowerAllowCheckPass, via the Options parameter (introduced in https://github.com/llvm/llvm-project/pull/122994). Future work will connect -fsanitize-skip-hot-cutoff (introduced in https://github.com/llvm/llvm-project/pull/121619) in the clang frontend to the cutoffs parameter used here.	2025-01-27 20:08:53 -08:00
Han-Kuan Chen	08d14e10ca	[SLP] Fix CommonMask will be transformed into an incorrect mask if createShuffle is called multiple times. (#124244 ) We have two types of mask in SLP: a scalar mask and a vector mask. When vectorizing four i32 additions into <4 x i32>, SLP creates a mask of length 4. When vectorizing four <2 x i32> additions into <8 x i32>, SLP also creates a mask of length 4. We refer to the first case as a scalar mask (because the mask element represents a scalar, i32), and the second case as a vector mask (because the mask element represents a vector, <2 x i32>). At some point, we must convert the scalar mask into a vector mask (otherwise, calling TTI cost functions or IRBuilderBase functions may yield incorrect results). Since both ShuffleCostEstimator and ShuffleInstructionBuilder can modify the CommonMask, we have decided to perform the mask transformation only within createShuffle. However, we do not store the transformed result, as createShuffle may be called multiple times.	2025-01-28 12:02:37 +08:00
Florian Hahn	713482fccf	[VPlan] Use State.get to extract lane mask for BranchOnMask. Simplifies the code slightly and avoids redundant extracts/broadcasts if the operand is live-in or already scalar.	2025-01-27 21:35:36 +00:00
Florian Hahn	09a29fcc8d	[VPlan] Don't collect live-ins in collectUsersInExitBlocks. (NFC) (#123819 ) Live-ins don't need to be handled, other than adding to the exit phi recipe. Do that early and assert that otherwise the exit value is defined in the vector loop region. This should enable simply skipping other exit values that do not need further fixing, e.g. if handling the exit value from the early exit directly in handleUncountableEarlyExit. PR: https://github.com/llvm/llvm-project/pull/123819	2025-01-27 16:12:07 +00:00
Simon Pilgrim	1bb784a748	[LowerMatrixIntrinsics] multiply-minimal.ll - use -passes="..." to allow DOS to correctly evaluate the RUN command Necessary for running update_test_checks.py on windows	2025-01-27 16:05:29 +00:00
Simon Pilgrim	ad2b2aa50b	[PhaseOrdering] vector-trunc.ll - use -passes="default<O2>" to allow DOS to correctly evaluate the RUN command Necessary for running update_test_checks.py on windows	2025-01-27 16:05:29 +00:00
Simon Pilgrim	178f47143a	[CostModel][X86] getShuffleCost - shuffles with only one defined element are always cheap (#124412 ) If we're just moving a single element around inside a 128-bit lane (probably as an alternative to extracting it), we can assume this is cheap as a single PSRLDQ/PSHUFD/SHUFPS. I've got the horrid feeling we're moving towards matching all SSE shuffle patterns inside the cost model, but I'm going to do my best to avoid this for now :\|	2025-01-27 15:56:22 +00:00
Nikita Popov	212f344b84	[InstCombine] Handle constant expression result in tryFactorization() If IRBuilder folds the result to a constant expression, don't try to set nowrap flags on it. Fixes https://github.com/llvm/llvm-project/issues/124526.	2025-01-27 16:25:37 +01:00
Ramkumar Ramachandra	3a4376b8f9	LAA: handle 0 return from getPtrStride correctly (#124539 ) getPtrStride returns 0 when the PtrScev is loop-invariant, and this is not an erroneous value: it returns std::nullopt to communicate that it was not able to find a valid pointer stride. In analyzeLoop, we call getPtrStride with a value_or(0) conflating the zero return value with std::nullopt. Fix this, handling loop-invariant loads correctly.	2025-01-27 14:21:14 +00:00
David Sherwood	b7286dbef9	Reland "[LoopVectorize] Add support for reverse loops in isDereferenceableAndAlignedInLoop #96752 " (#123616 ) The last attempt failed a sanitiser build because we were creating a reference to a null Predicates pointer in isDereferenceableAndAlignedInLoop. This was exposed by the unit test IsDerefReadOnlyLoop in unittests/Analysis/LoadsTest.cpp. I fixed this by falling back on getConstantMaxBackedgeTakenCount if Predicates is null - see line 316 in llvm/lib/Analysis/Loads.cpp. There are no other changes.	2025-01-27 11:59:38 +00:00
Andreas Jonson	f8ab91f74f	[LVI][CVP] Add test for trunc bittest. (NFC)	2025-01-26 16:55:05 +01:00
Teresa Johnson	2af819fa3d	[MemProf] Add test for hot hints (#124394 ) The change in PR124219 required removing one of the tests added for -memprof-use-hot-hints, since we no longer label any contexts as hot in metadata, so add a new test that checks the hot attribute instead.	2025-01-26 07:53:53 -08:00
Simon Pilgrim	dec47b76f4	[CostModel][X86] Update baseline CTTZ/CTLZ costs for x86_64 (#124312 ) Followup to #123623 - now that the CMOV has been removed, the throughput has improved, reducing the benefit of vectorization on pre-x86-64-v3 CPUs	2025-01-26 14:43:51 +00:00
Florian Hahn	81d38da65e	[LV] Add more tests for narrowing interleave groups for AArch64. Add additional tests for https://github.com/llvm/llvm-project/pull/106441.	2025-01-26 13:52:18 +00:00
Fangrui Song	2131115be5	[InstCombine] Drop Range attribute when simplifying 'fshl' based on demanded bits (#124429 ) When simplifying operands based on demanded bits, the return value range of llvm.fshl might change. Keeping the Range attribute might cause llvm.fshl to generate poison and lead to a miscompile. Drop the Range attribute similar to `dropPoisonGeneratingFlags` elsewhere. Fix #124387	2025-01-25 13:35:11 -08:00
Fangrui Song	89f2fee9f8	[InstCombine] Add test for incorrect retention of Range attribute in fshl	2025-01-25 13:17:15 -08:00
Alexey Bataev	5e65f43041	[SLP][NFC]Add a test, producing a series of extractelements, building a non-extendable tree	2025-01-25 11:50:14 -08:00
Florian Hahn	6383a12e3b	[VPlan] Refactor HCFG builder to preserve original vector latch (NFC). Update HCFG builder to preserve the original latch block of the initial VPlan, ensuring there is always a latch. It also skips creating the BranchOnCond for the latch of the top-level loop, instead of removing it later. Exiting via the latch is controlled by later recipes. This further unifies HCFG construction and prepares for use to also build an initial VPlan (VPlan0) for inner loops.	2025-01-25 13:32:01 +00:00
David Green	52bffdf9f5	[IPSCCP][FuncSpec] Protect against metadata access from call args. (#124284 ) Fixes an issue reported from #114964, where metadata arguments were attempted to be accessed as constants.	2025-01-25 10:59:50 +00:00
Alex MacLean	07ed8187ac	[OpenMP] Replace nvvm.annotation usage with kernel calling conventions (#122320 ) Specifying a kernel with the `ptx_kernel` or `amdgpu_kernel` calling convention is more idiomatic and compile-time performant than using the `nvvm.annotation !"kernel"` metadata. Transition OMPIRBuilder to use calling conventions for PTX kernels and no longer emit `nvvm.annotation`. Update OpenMPOpt to work with kernels specified via calling convention as well as metadata. Update OpenMP tests to use the calling conventions.	2025-01-24 16:56:10 -08:00
Teresa Johnson	c725a95e08	[MemProf] Convert Hot contexts to NotCold early (#124219 ) While we convert hot contexts to notcold contexts during the cloning step, their existence was greatly limiting the context trimming performed when we add the MemProf profile to the IR. To address this, any hot contexts are converted to notcold contexts immediately after first checking for unambiguous allocation types, and before checking it again and before adding metadata while performing context trimming. Note that hot hints are now disabled by default, however, this avoids adding unnecessary overhead if they are re-enabled.	2025-01-24 15:58:13 -08:00
vporpo	6409799bdc	[SandboxVec][Legality] Pack from different BBs (#124363 ) When the inputs of the pack come from different BBs we need to make sure we emit the pack instructions at the correct place.	2025-01-24 15:39:37 -08:00
vporpo	ac75d32280	[SandboxVec][VecUtils] Filter out instructions not in BB in VecUtils:getLowest() (#124360 ) This patch changes the functionality of `VecUtils::getLowest(Vals, BB)` such that it filters out any instructions in `Vals` that are not in BB. This is useful when Vals contains instructions from different BBs, because in that case we are only interested in one BB.	2025-01-24 14:52:57 -08:00
Teresa Johnson	ae8b560899	[MemProf] Disable hot hints by default (#124338 ) By default we were marking some contexts as hot, and adding hot hints to unambiguously hot allocations. However, there is not yet support for cloning to expose hot allocation contexts, and none is planned for the foreseeable future. While we convert hot contexts to notcold contexts during the cloning step, their existence was greatly limiting the context trimming performed when we add the MemProf profile to the IR. This change simply disables the generation of hot contexts / hints by default, as few allocations were unambiguously hot. A subsequent change will address the issue when hot hints are optionally enabled. See PR124219 for details. This change resulted in significant overhead reductions for a large target: ~48% reduction in the per-module ThinLTO bitcode summary sizes; ~72% reduction in the distributed ThinLTO bitcode combined summary sizes; ~68% reduction in thin link time; ~34% reduction in thin link peak memory	2025-01-24 13:06:11 -08:00
Alexandros Lamprineas	1b1270f30b	[FMV][GlobalOpt] Enable static resolution of non-FMV callers. (#124314 ) The undetectable FMV features predres and ls64 have been removed, therefore the optimization is now re-enabled. The llvm testsuite Graviton4 bots are expected to remain green.	2025-01-24 19:48:40 +00:00
Stephen Long	ab976a1712	PreISelIntrinsicLowering: Lower llvm.exp/llvm.exp2 to a loop if scalable vec arg (#117568 )	2025-01-24 14:02:06 -05:00
Simon Pilgrim	a12d7e4b61	[SLP] getVectorCallCosts - don't provide scalar argument data for vector IntrinsicCostAttributes (#124254 ) getVectorCallCosts determines the cost of a vector intrinsic, based off an existing scalar intrinsic call - but we were including the scalar argument data to the IntrinsicCostAttributes, which meant that not only was the cost calculation not type-only based, it was making incorrect assumptions about constant values etc. This also exposed an issue that x86 relied on fallback calculations for funnel shift costs - this is great when we have the argument data as that improves the accuracy of uniform shift amounts etc., but meant that type-only costs would default to Cost=2 for all custom lowered funnel shifts, which was far too cheap. This is the reverse of #124129 where we weren't including argument data when we could. Fixes #63980	2025-01-24 15:13:13 +00:00
Simon Pilgrim	625e0a40f1	[SLP][X86] Add missing SSE2/SSE4 checks from vector rotate tests	2025-01-24 10:12:19 +00:00
Elvis Wang	aff1242b8e	[LV] Align debug location of the widen-phi to the original phi. (#120338 ) This patch aligns the debug location of the widen-phi with the debug location of the original phi. Split from: #120054	2025-01-24 17:49:54 +08:00
Simon Pilgrim	7746596713	[SLP][X86] Add VBMI2 coverage for funnel shift tests VBMI2 CPUs actually have vector funnel shift instruction support	2025-01-24 09:47:40 +00:00
vporpo	d2234ca163	[SandboxVec][BottomUpVec] Fix packing when PHIs are present (#124206 ) Before this patch we might have emitted pack instructions in between PHI nodes. This patch fixes it by fixing the insert point of the new packs.	2025-01-23 16:29:01 -08:00
Alexander Kornienko	788318484d	Revert "[InstCombine] Teach foldSelectOpOp about samesign" (#124123 ) Reverts llvm/llvm-project#122723 due to a miscompilation See https://github.com/llvm/llvm-project/pull/122723#issuecomment-2608777844 for details and the test case.	2025-01-24 01:08:10 +01:00
Min-Yih Hsu	bc74a1edbe	[IA] Generalize the support for power-of-two (de)interleave intrinsics (#123863 ) Previously, AArch64 used pattern matching to support llvm.vector.(de)interleave of 2 and 4; RISC-V only supported (de)interleave of 2. This patch consolidates the logic in these two targets by factoring out the common factor calculations into the InterleaveAccess Pass.	2025-01-23 15:27:51 -08:00
vporpo	c7053ac202	[SandboxVec][BottomUpVec] Disable crossing BBs (#124039 ) Crossing BBs is not currently supported by the structures of the vectorizer. This patch fixes instances where this was happening, including: - a walk of use-def operands that updates the UnscheduledSuccs counter, - the dead instruction removal is now done per BB, - the scheduler, which will reject bundles that cross BBs.	2025-01-23 15:08:13 -08:00
David Blaikie	42043c423f	Reapply "Verifier: Add check for DICompositeType elements being null" This removes some erroneous debug info from tests, which should address the test failures that showed up when this was previously committed. This reverts commit 6716ce8b641f0e42e2343e1694ee578b027be0c4.	2025-01-23 22:29:30 +00:00
Vitaly Buka	0e213834df	Revert "[LoopVectorizer] Add support for chaining partial reductions (#120272 )" (#124198 ) Introduced stack buffer overflow, see #120272. `getScaledReduction` can return an empty vector, and there is no check for that. This reverts commit c9b7303b9b18129c4ee6b56aaa2a0a9f59be2d09. This reverts commit caf0540b91b0fee31353dc7049ae836e0f814cff.	2025-01-23 14:00:33 -08:00
David Green	775d0f36f7	[GVN] Handle scalable vectors with the same size in VNCoercion (#123984 ) This allows us to forward to a load even if the types do not match (nxv4i32 vs nxv2i64 for example). Scalable types are allowed in canCoerceMustAliasedValueToLoad so long as the size (minelts * scalarsize) is the same, and some follow-on code is adjusted to make sure it handles scalable sizes correctly. Methods like analyzeLoadFromClobberingWrite and analyzeLoadFromClobberingStore still do nothing for scalable vectors, as Offsets and mismatching types are not supported.	2025-01-23 18:43:50 +00:00
David Green	bec4c7f5f7	[InstCombine] Unpack scalable struct loads/stores. (#123986 ) This teaches unpackLoadToAggregate and unpackStoreToAggregate to unpack scalable structs to individual loads/stores with insertvalues / extractvalues. The gep used for the offsets uses an i8 ptradd as opposed to a struct gep, as the geps for scalable structs are not supported and we canonicalize to i8.	2025-01-23 18:04:27 +00:00
David Green	3d72619d75	[InstCombine] Add a test for splitting scalable structs. NFC	2025-01-23 17:36:06 +00:00
David Green	6d4e72abb8	[GVN] Add extra vscale tests with different types. NFC	2025-01-23 17:36:06 +00:00
Nicholas Guy	caf0540b91	[LoopVectorizer] Add support for chaining partial reductions (#120272 ) Chaining partial reductions, where multiple partial reductions share an accumulator, allow for more values to be combined together as part of the reduction without discarding the semantics of the partial reduction itself.	2025-01-23 17:24:57 +00:00
Simon Pilgrim	d8cd8d56ea	[SLP] getSpillCost - fully populate IntrinsicCostAttributes to improve cost analysis. (#124129 ) We were only constructing the IntrinsicCostAttributes with the arg type info, and not the args themselves, preventing more detailed cost analysis (constant / uniform args etc.) Just pass the whole IntrinsicInst to the constructor and let it resolve everything it can. Noticed while having yet another attempt at #63980	2025-01-23 16:57:13 +00:00
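
Note on 3a4376b8f9 (LAA: handle 0 return from getPtrStride correctly): the message describes a stride query that returns 0 for a loop-invariant pointer and std::nullopt when no valid stride is found, and a caller that merged the two cases with value_or(0). The sketch below is not the LLVM code; `PtrKind`, `queryStride`, `looksInvariantConflated` and `isInvariant` are hypothetical names used only to illustrate why the two results have to be kept apart.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical classification of a pointer, for illustration only.
enum class PtrKind { LoopInvariant, UnitStride, Unknown };

// Hypothetical stand-in for a stride query: 0 means "loop-invariant",
// a non-zero value is a real stride, std::nullopt means "could not tell".
std::optional<int64_t> queryStride(PtrKind Kind) {
  switch (Kind) {
  case PtrKind::LoopInvariant:
    return 0;
  case PtrKind::UnitStride:
    return 1;
  case PtrKind::Unknown:
    return std::nullopt;
  }
  return std::nullopt;
}

// Conflating form: value_or(0) maps "could not tell" onto the same value
// as "loop-invariant", so an unknown pointer is reported as invariant.
bool looksInvariantConflated(PtrKind Kind) {
  return queryStride(Kind).value_or(0) == 0;
}

// Distinguishing form: only a present value equal to 0 counts as invariant.
bool isInvariant(PtrKind Kind) {
  std::optional<int64_t> Stride = queryStride(Kind);
  return Stride.has_value() && *Stride == 0;
}
```

With these definitions, `looksInvariantConflated(PtrKind::Unknown)` is true while `isInvariant(PtrKind::Unknown)` is false; both agree for the loop-invariant and unit-stride cases.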


30961 Commits