llvm-project

Author	SHA1	Message	Date
Alex Bradbury	8fcb1263f4	[PreISelIntrinsicLowering] Produce a memset_pattern16 libcall for llvm.experimental.memset.pattern when available (#120420 ) This is to enable a transition of LoopIdiomRecognize to selecting the llvm.experimental.memset.pattern intrinsic as requested in #118632 (as opposed to supporting selection of the libcall or the intrinsic). As such, although it _is_ a TODO to add costing considerations on whether to lower to the libcall (when available) or expand directly, lacking such logic is helpful at this stage in order to minimise any unexpected code gen changes in this transition.	2025-01-30 07:12:53 +00:00
vporpo	e094c0fa67	[SandboxVec][Legality] Don't vectorize when instructions repeat (#124479 ) This patch adds a legality check that checks for repeated instrs in a bundle and won't vectorize if such pattern is found.	2025-01-29 15:54:15 -08:00
Simon Pilgrim	5921295dca	Revert "[SLP] getSpillCost - fully populate IntrinsicCostAttributes to improve cost analysis." (#124962 ) Reverts llvm/llvm-project#124129 as it's currently causing a regression at #124499 - avoids the regression until a proper fix can be added to getSpillCost	2025-01-29 22:17:53 +00:00
Mikhail Gudim	1822462e2a	[InstCombine][VectorCombine][NFC] Move a test from InstCombine to VectorCombine (#124948 ) Since the transformation which is the subject of the `fold-binop-of-reductions.ll` test will be in VectorCombine, move the test there.	2025-01-29 13:54:38 -05:00
Teresa Johnson	ae6d5dd58b	[MemProf] Prune unneeded non-cold contexts (#124823 ) We can take advantage of the fact that we subsequently only clone cold allocation contexts, since not cold behavior is the default, and significantly reduce the amount of metadata (and later ThinLTO summary and MemProfContextDisambiguation graph nodes) by pruning unnecessary not cold contexts when building metadata from the trie. Specifically, we only need to keep notcold contexts that overlap the longest with cold allocations, to know how deeply to clone those contexts to expose the cold allocation behavior. For a large target this reduced ThinLTO bitcode object sizes by about 35%. It reduced the ThinLTO indexing time by about half and the peak ThinLTO indexing memory by about 20%.	2025-01-29 10:38:31 -08:00
Simon Pilgrim	88e00141f8	[PhaseOrdering][X86] Add additional hadd/hsub test coverage Add v16i16 coverage and "reverse order hadd/hsub" tests	2025-01-29 17:25:54 +00:00
Nikita Popov	8a43d0e873	[Attributor] Check correct IRPosition in AANoCapture::isImpliedByIR() This case is intended to check the callee argument, not the call-site. Fixes an issue introduced in #123181.	2025-01-29 17:34:10 +01:00
Nikita Popov	29441e4f5f	[IR] Convert from nocapture to captures(none) (#123181 ) This PR removes the old `nocapture` attribute, replacing it with the new `captures` attribute introduced in #116990. This change is intended to be essentially NFC, replacing existing uses of `nocapture` with `captures(none)` without adding any new analysis capabilities. Making use of non-`none` values is left for a followup. Some notes: * `nocapture` will be upgraded to `captures(none)` by the bitcode reader. * `nocapture` will also be upgraded by the textual IR reader. This is to make it easier to use old IR files and somewhat reduce the test churn in this PR. * Helper APIs like `doesNotCapture()` will check for `captures(none)`. * MLIR import will convert `captures(none)` into an `llvm.nocapture` attribute. The representation in the LLVM IR dialect should be updated separately.	2025-01-29 16:56:47 +01:00
Yingwei Zheng	f226cabbb1	[ValueTracking] Handle nonnull attributes at callsite (#124908 ) Alive2: https://alive2.llvm.org/ce/z/yJfskv Closes https://github.com/llvm/llvm-project/issues/124540.	2025-01-29 23:14:36 +08:00
Yingwei Zheng	cf37ae5cae	[InstCombine] Add one-use check when folding fabs over selects (#122270 ) Fixes multi-use issue introduced by https://github.com/llvm/llvm-project/pull/86390. It allows the folding of `fabs (select Cond, TrueC, FalseC)` to avoid performance regression in ocio	2025-01-29 21:44:59 +08:00
Ryotaro Kasuga	690f251063	[LoopInterchange] Handle LE and GE correctly (#124901 ) LoopInterchange has converted `DVEntry::LE` and `DVEntry::GE` in direction vectors to '<' and '>' respectively. This handling is incorrect because the information about the '=' is lost. This leads to miscompilation in some cases. To resolve this issue, convert them to '*' instead. Resolve #123920	2025-01-29 19:30:54 +09:00
Simon Pilgrim	89ca3e72ca	[CostModel][X86] Reduce worst case v8i16/v16i8 SSE2 shuffle costs (#124789 ) These were based off instruction count, not throughput - we can probably improve these further, but these throughput numbers match the worst expanded shuffles we see in the vector-shuffle-128-v* codegen tests.	2025-01-29 10:23:09 +00:00
David Sherwood	776ef9d1be	[LoopVectorize][NFC] Regenerate some early exit test CHECK lines (#124900 )	2025-01-29 09:48:55 +00:00
David Sherwood	c836b8956d	[LoopVectorize][NFC] Disable output for tests that don't need it (#124747 ) There are a lot of tests that do not depend upon the IR output for validation, relying instead on the debug output. For these tests we can add the -disable-output command line argument.	2025-01-29 08:09:50 +00:00
natedal	b0924ed64e	[AggressiveInstCombine] Add tests for memchr inline threshold (NFC) (#121711 ) This adds a test checking that, if length=2, memchr is called. This is undesirable as it would be faster to directly compare the two array elements with the target element, rather than calling the external memchr function.	2025-01-29 14:06:04 +08:00
Stephen Tozer	822f74a911	[Clang] Cleanup docs and comments relating to -fextend-variable-liveness (#124767 ) This patch contains a number of changes relating to the above flag; primarily it updates comment references to the old flag names, "-fextend-lifetimes" and "-fextend-this-ptr" to refer to the new names, "-fextend-variable-liveness[={all,this}]". These changes are all NFC. This patch also removes the explicit -fextend-this-ptr-liveness flag alias, and shortens the help-text for the main flag; these are both changes that were meant to be applied in the initial PR (#110000), but due to some user-error on my part they were not included in the merged commit.	2025-01-28 18:25:32 +00:00
Alexey Bataev	947d8ebbf3	[SLP]Unify getNumberOfParts use Adds getNumberOfParts and uses it instead of similar code across code base, fixes analysis of non-vectorizable types in computeMinimumValueSizes. Reviewers: RKSimon Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/124774	2025-01-28 12:16:44 -05:00
Ramkumar Ramachandra	d76ea250c8	Reland [InstCombine] Teach foldSelectOpOp about samesign (#124320 ) Changes: There was a serious bug in the previous patch, leading to a miscompile. See #122723 for the miscompile report from Alexander, and the follow-up investigation by Nikita. The patch has since been reworked, and now includes the testcase from the miscompile. Follow up on 4a0d53a (PatternMatch: migrate to CmpPredicate) to get rid of one of the FIXMEs it introduced by replacing a predicate comparison with CmpPredicate::getMatching. Co-authored-by: Nikita Popov <npopov@redhat.com>	2025-01-28 16:53:01 +00:00
Florian Hahn	3007f31e74	[LoopUnroll] Add AArch64 tests for multi-exit loop unrolling. Test coverage to https://github.com/llvm/llvm-project/pull/124751.	2025-01-28 14:25:27 +00:00
Alexey Bataev	a1ab5b4c87	[SLP]Check the MainOp matches the requirements for the instructions Need to include MainOp into the analysis of the instructions in getSameOpcode to be sure that it is checked for the requirements to prevent crashes during further analysis.	2025-01-28 06:00:52 -08:00
Alexey Bataev	1d5fbe83c3	[SLP]Adjust NumberOfParts value for adjusted number of buildvector scalars Need to adjust NumParts value, when GatheredScalars scalars are adjusted after extractelements analysis, to fix compiler crash	2025-01-28 05:45:13 -08:00
Nicholas Guy	cdea38f91a	Reland "[LoopVectorizer] Add support for chaining partial reductions #120272 " (#124282 ) Change `getScaledReduction` to take an existing vector, rather than creating and returning a new one each call. Rename `getScaledReduction` to `getScaledReductions` to more accurately reflect what it's now doing. --------- Co-authored-by: Karlo Basioli <68535415+basioli-k@users.noreply.github.com>	2025-01-28 10:40:35 +00:00
Thurston Dang	fa9ac62d02	[ubsan] Parse and use <cutoffs[0,1,2]=70000;cutoffs[5,6,8]=90000> in LowerAllowCheckPass (#124211 ) This adds and utilizes a cutoffs parameter for LowerAllowCheckPass, via the Options parameter (introduced in https://github.com/llvm/llvm-project/pull/122994). Future work will connect -fsanitize-skip-hot-cutoff (introduced in https://github.com/llvm/llvm-project/pull/121619) in the clang frontend to the cutoffs parameter used here.	2025-01-27 20:08:53 -08:00
Han-Kuan Chen	08d14e10ca	[SLP] Fix CommonMask will be transformed into an incorrect mask if createShuffle is called multiple times. (#124244 ) We have two types of mask in SLP: a scalar mask and a vector mask. When vectorizing four i32 additions into <4 x i32>, SLP creates a mask of length 4. When vectorizing four <2 x i32> additions into <8 x i32>, SLP also creates a mask of length 4. We refer to the first case as a scalar mask (because the mask element represents a scalar, i32), and the second case as a vector mask (because the mask element represents a vector, <2 x i32>). At some point, we must convert the scalar mask into a vector mask (otherwise, calling TTI cost functions or IRBuilderBase functions may yield incorrect results). Since both ShuffleCostEstimator and ShuffleInstructionBuilder can modify the CommonMask, we have decided to perform the mask transformation only within createShuffle. However, we do not store the transformed result, as createShuffle may be called multiple times.	2025-01-28 12:02:37 +08:00
Florian Hahn	713482fccf	[VPlan] Use State.get to extract lane mask for BranchOnMask. Simplifies the code slightly and avoids redundant extracts/broadcasts if the operand is live-in or already scalar.	2025-01-27 21:35:36 +00:00
Florian Hahn	09a29fcc8d	[VPlan] Don't collect live-ins in collectUsersInExitBlocks. (NFC) (#123819 ) Live-ins don't need to be handled, other than adding to the exit phi recipe. Do that early and assert that otherwise the exit value is defined in the vector loop region. This should enable simply skipping other exit values that do not need further fixing, e.g. if handling the exit value from the early exit directly in handleUncountableEarlyExit. PR: https://github.com/llvm/llvm-project/pull/123819	2025-01-27 16:12:07 +00:00
Simon Pilgrim	1bb784a748	[LowerMatrixIntrinsics] multiply-minimal.ll - use -passes="..." to allow DOS to correctly evaluate the RUN command Necessary for running update_test_checks.py on windows	2025-01-27 16:05:29 +00:00
Simon Pilgrim	ad2b2aa50b	[PhaseOrdering] vector-trunc.ll - use -passes="default<O2>" to allow DOS to correctly evaluate the RUN command Necessary for running update_test_checks.py on windows	2025-01-27 16:05:29 +00:00
Simon Pilgrim	178f47143a	[CostModel][X86] getShuffleCost - shuffles with only one defined element are always cheap (#124412 ) If we're just moving a single element around inside a 128-bit lane (probably as an alternative to extracting it), we can assume this is cheap as a single PSRLDQ/PSHUFD/SHUFPS. I've got the horrid feeling we're moving towards matching all SSE shuffle patterns inside the cost model, but I'm going to do my best to avoid this for now :|	2025-01-27 15:56:22 +00:00
Nikita Popov	212f344b84	[InstCombine] Handle constant expression result in tryFactorization() If IRBuilder folds the result to a constant expression, don't try to set nowrap flags on it. Fixes https://github.com/llvm/llvm-project/issues/124526.	2025-01-27 16:25:37 +01:00
Ramkumar Ramachandra	3a4376b8f9	LAA: handle 0 return from getPtrStride correctly (#124539 ) getPtrStride returns 0 when the PtrScev is loop-invariant, and this is not an erroneous value: it returns std::nullopt to communicate that it was not able to find a valid pointer stride. In analyzeLoop, we call getPtrStride with a value_or(0) conflating the zero return value with std::nullopt. Fix this, handling loop-invariant loads correctly.	2025-01-27 14:21:14 +00:00
David Sherwood	b7286dbef9	Reland "[LoopVectorize] Add support for reverse loops in isDereferenceableAndAlignedInLoop #96752 " (#123616 ) The last attempt failed a sanitiser build because we were creating a reference to a null Predicates pointer in isDereferenceableAndAlignedInLoop. This was exposed by the unit test IsDerefReadOnlyLoop in unittests/Analysis/LoadsTest.cpp. I fixed this by falling back on getConstantMaxBackedgeTakenCount if Predicates is null - see line 316 in llvm/lib/Analysis/Loads.cpp. There are no other changes.	2025-01-27 11:59:38 +00:00
Andreas Jonson	f8ab91f74f	[LVI][CVP] Add test for trunc bittest. (NFC)	2025-01-26 16:55:05 +01:00
Teresa Johnson	2af819fa3d	[MemProf] Add test for hot hints (#124394 ) The change in PR124219 required removing one of the tests added for -memprof-use-hot-hints, since we no longer label any contexts as hot in metadata, so add a new test that checks the hot attribute instead.	2025-01-26 07:53:53 -08:00
Simon Pilgrim	dec47b76f4	[CostModel][X86] Update baseline CTTZ/CTLZ costs for x86_64 (#124312 ) Followup to #123623 - now that the CMOV has been removed, the throughput has improved, reducing the benefit of vectorization on pre-x86-64-v3 CPUs	2025-01-26 14:43:51 +00:00
Florian Hahn	81d38da65e	[LV] Add more tests for narrowing interleave groups for AArch64. Add additional tests for https://github.com/llvm/llvm-project/pull/106441.	2025-01-26 13:52:18 +00:00
Fangrui Song	2131115be5	[InstCombine] Drop Range attribute when simplifying 'fshl' based on demanded bits (#124429 ) When simplifying operands based on demanded bits, the return value range of llvm.fshl might change. Keeping the Range attribute might cause llvm.fshl to generate a poison and lead to miscompile. Drop the Range attribute similar to `dropPoisonGeneratingFlags` elsewhere. Fix #124387	2025-01-25 13:35:11 -08:00
Fangrui Song	89f2fee9f8	[InstCombine] Add test for incorrect retention of Range attribute in fshl	2025-01-25 13:17:15 -08:00
Alexey Bataev	5e65f43041	[SLP][NFC]Add a test, producing a series of extractelements, building non-extendable tree	2025-01-25 11:50:14 -08:00
Florian Hahn	6383a12e3b	[VPlan] Refactor HCFG builder to preserve original vector latch (NFC). Update HCFG builder to preserve the original latch block of the initial VPlan, ensuring there is always a latch. It also skips creating the BranchOnCond for the latch of the top-level loop, instead of removing it later. Exiting via the latch is controlled by later recipes. This further unifies HCFG construction and prepares for use to also build an initial VPlan (VPlan0) for inner loops.	2025-01-25 13:32:01 +00:00
David Green	52bffdf9f5	[IPSCCP][FuncSpec] Protect against metadata access from call args. (#124284 ) Fixes an issue reported from #114964, where metadata arguments were attempted to be accessed as constants.	2025-01-25 10:59:50 +00:00
Alex MacLean	07ed8187ac	[OpenMP] Replace nvvm.annotation usage with kernel calling conventions (#122320 ) Specifying a kernel with the `ptx_kernel` or `amdgpu_kernel` calling convention is more idiomatic and compile-time performant than using the `nvvm.annotation !"kernel"` metadata. Transition OMPIRBuilder to use calling conventions for PTX kernels and no longer emit `nvvm.annotation`. Update OpenMPOpt to work with kernels specified via calling convention as well as metadata. Update OpenMP tests to use the calling conventions.	2025-01-24 16:56:10 -08:00
Teresa Johnson	c725a95e08	[MemProf] Convert Hot contexts to NotCold early (#124219 ) While we convert hot contexts to notcold contexts during the cloning step, their existence was greatly limiting the context trimming performed when we add the MemProf profile to the IR. To address this, any hot contexts are converted to notcold contexts immediately after first checking for unambiguous allocation types, and before checking it again and before adding metadata while performing context trimming. Note that hot hints are now disabled by default, however, this avoids adding unnecessary overhead if they are re-enabled.	2025-01-24 15:58:13 -08:00
vporpo	6409799bdc	[SandboxVec][Legality] Pack from different BBs (#124363 ) When the inputs of the pack come from different BBs we need to make sure we emit the pack instructions at the correct place.	2025-01-24 15:39:37 -08:00
vporpo	ac75d32280	[SandboxVec][VecUtils] Filter out instructions not in BB in VecUtils::getLowest() (#124360 ) This patch changes the functionality of `VecUtils::getLowest(Vals, BB)` such that it filters out any instructions in `Vals` that are not in BB. This is useful when Vals contains instructions from different BBs, because in that case we are only interested in one BB.	2025-01-24 14:52:57 -08:00
Teresa Johnson	ae8b560899	[MemProf] Disable hot hints by default (#124338 ) By default we were marking some contexts as hot, and adding hot hints to unambiguously hot allocations. However, there is not yet support for cloning to expose hot allocation contexts, and none is planned for the foreseeable future. While we convert hot contexts to notcold contexts during the cloning step, their existence was greatly limiting the context trimming performed when we add the MemProf profile to the IR. This change simply disables the generation of hot contexts / hints by default, as few allocations were unambiguously hot. A subsequent change will address the issue when hot hints are optionally enabled. See PR124219 for details. This change resulted in significant overhead reductions for a large target: ~48% reduction in the per-module ThinLTO bitcode summary sizes; ~72% reduction in the distributed ThinLTO bitcode combined summary sizes; ~68% reduction in thin link time; ~34% reduction in thin link peak memory.	2025-01-24 13:06:11 -08:00
Alexandros Lamprineas	1b1270f30b	[FMV][GlobalOpt] Enable static resolution of non-FMV callers. (#124314 ) The undetectable FMV features predres and ls64 have been removed, therefore the optimization is now re-enabled. The llvm testsuite Graviton4 bots are expected to remain green.	2025-01-24 19:48:40 +00:00
Stephen Long	ab976a1712	PreISelIntrinsicLowering: Lower llvm.exp/llvm.exp2 to a loop if scalable vec arg (#117568 )	2025-01-24 14:02:06 -05:00
Simon Pilgrim	a12d7e4b61	[SLP] getVectorCallCosts - don't provide scalar argument data for vector IntrinsicCostAttributes (#124254 ) getVectorCallCosts determines the cost of a vector intrinsic, based off an existing scalar intrinsic call - but we were including the scalar argument data to the IntrinsicCostAttributes, which meant that not only was the cost calculation not type-only based, it was making incorrect assumptions about constant values etc. This also exposed an issue that x86 relied on fallback calculations for funnel shift costs - this is great when we have the argument data as that improves the accuracy of uniform shift amounts etc., but meant that type-only costs would default to Cost=2 for all custom lowered funnel shifts, which was far too cheap. This is the reverse of #124129 where we weren't including argument data when we could. Fixes #63980	2025-01-24 15:13:13 +00:00
Simon Pilgrim	625e0a40f1	[SLP][X86] Add missing SSE2/SSE4 checks from vector rotate tests	2025-01-24 10:12:19 +00:00
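
The LAA fix in 3a4376b8f9 above turns on the difference between an empty optional ("no valid stride found") and an optional holding 0 ("loop-invariant pointer"). A minimal C++ sketch of that distinction, assuming a stride is modelled as `std::optional<int64_t>`; `strideOf`, `looksInvariantConflated`, and `isInvariant` are hypothetical names for illustration only and are not LLVM's API (the real `getPtrStride` takes different arguments):

```cpp
#include <cstdint>
#include <optional>

// Hypothetical stand-in for a stride query (NOT LLVM's getPtrStride):
// a value of 0 means "loop-invariant pointer",
// std::nullopt means "no valid stride could be found".
std::optional<int64_t> strideOf(bool Analyzable, int64_t Stride) {
  if (!Analyzable)
    return std::nullopt;
  return Stride;
}

// Conflating check: value_or(0) maps both "unknown" and "loop-invariant"
// to 0, so the two cases can no longer be told apart.
bool looksInvariantConflated(std::optional<int64_t> S) {
  return S.value_or(0) == 0;
}

// Distinguishing check: only a present value of 0 counts as loop-invariant.
bool isInvariant(std::optional<int64_t> S) {
  return S.has_value() && *S == 0;
}
```

With these helpers, `looksInvariantConflated(std::nullopt)` is true even though nothing is known about the stride, while `isInvariant(std::nullopt)` is false and `isInvariant(strideOf(true, 0))` is true.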

30975 Commits