llvm-project

Author	SHA1	Message	Date
Sam Tebbs	795e35a653	Reland "[LoopVectorizer] Add support for partial reductions" with non-phi operand fix. (#121744 ) This relands the reverted #120721 with a fix for cases where neither reduction operand is the reduction phi. Only 63114239cc8d26225a0ef9920baacfc7cc00fc58 and 63114239cc8d26225a0ef9920baacfc7cc00fc58 are new on top of the reverted PR. --------- Co-authored-by: Nicholas Guy <nicholas.guy@arm.com>	2025-01-13 11:20:35 +00:00
Florian Hahn	8df64ed777	[LV] Don't consider IV increments uniform if exit value is used outside. In some cases, there might be a chain of uniform instructions producing the exit value. To generate correct code in all cases, consider the IV increment not uniform, if there are users outside the loop. Instead, let VPlan narrow the IV, if possible using the logic from 3ff1d01985752. Test case from #122602 verified with Alive2: https://alive2.llvm.org/ce/z/bA4EGj Fixes https://github.com/llvm/llvm-project/issues/122496. Fixes https://github.com/llvm/llvm-project/issues/122602.	2025-01-12 22:03:21 +00:00
Florian Hahn	f5a35a31bf	[LV] Add test cases with incorrect IV live-outs. Add test cases for https://github.com/llvm/llvm-project/issues/122496 and https://github.com/llvm/llvm-project/issues/122602.	2025-01-12 20:55:20 +00:00
Florian Hahn	3ff1d01985	Recommit "[VPlan] Try to narrow wide and replicating recipes to uniform recipes." This reverts commit 0ebb3ac7c92c4c1c44e7f3d17832d75ec5a42a67. Re-applies commit with typos fixed.	2025-01-12 20:10:28 +00:00
Florian Hahn	0ebb3ac7c9	Revert "[VPlan] Try to narrow wide and replicating recipes to uniform recipes." This reverts commit 1afba19913253dda865a8e57b37b9f4dabead1ac. Typo breaking the build	2025-01-12 19:37:45 +00:00
Florian Hahn	1afba19913	[VPlan] Try to narrow wide and replicating recipes to uniform recipes. Use the existing VPlan-based analysis to identify recipes that only have their first lane demanded and transform them to uniform replicate recipes. This simplifies the generated code in some places and prepares for fixing https://github.com/llvm/llvm-project/issues/122496.	2025-01-12 19:32:01 +00:00
Ruhung	4f7dc1b55a	[InstCombine] Fold (add (add A, 1), (sext (icmp ne A, 0))) to call umax(A, 1) (#122491 ) Transform (add (add A, 1), (sext (icmp ne A, 0))) into call umax(A, 1). Fixes #121853. Alive2: https://alive2.llvm.org/ce/z/TweTan	2025-01-12 16:51:58 +01:00
goldsteinn	17ef436e3d	[ValueTracking] Take into account whether zero is poison when computing CR for `ct{t,l}z` (#122548 )	2025-01-11 15:11:11 -06:00
goldsteinn	cc995ad064	[InstSimplify] Simplifying `(xor (sub C_Mask, X), C_Mask)` -> `X` (#122552 ) - [InstSimplify] Add tests for simplifying `(xor (sub C_Mask, X), C_Mask)`; NFC - [InstSimplify] Simplifying `(xor (sub C_Mask, X), C_Mask)` -> `X` Helps address regressions with folding `clz(Pow2)`. Proof: https://alive2.llvm.org/ce/z/zGwUBp	2025-01-11 15:10:42 -06:00
Amr Hesham	642e493d4d	[InstCombine] Convert fshl(x, 0, y) to shl(x, and(y, BitWidth - 1)) when BitWidth is pow2 (#122362 ) Convert `fshl(x, 0, y)` to `shl(x, and(y, BitWidth - 1))` when BitWidth is pow2 Alive2 proof: https://alive2.llvm.org/ce/z/3oTEop Fixes: #122235	2025-01-11 11:48:05 +01:00
Veera	2d5f07c828	[InstCombine] Fold `X udiv Y` to `X lshr cttz(Y)` if Y is a power of 2 (#121386 ) Fixes #115767 This PR folds `X udiv Y` to `X lshr cttz(Y)` if Y is a power of two since bitwise operations are faster than division. Proof: https://alive2.llvm.org/ce/z/qHmLta	2025-01-11 13:56:13 +08:00
Mircea Trofin	6329355860	[ctxprof] Move test serialization to yaml (#122545 ) We have a textual representation of contextual profiles for test scenarios, mainly. This patch moves that to YAML instead of JSON. YAML is more succinct and readable (some of the .ll tests should be illustrative). In addition, JSON is parse-able by the YAML reader. A subsequent patch will address deserialization. (thanks, @kazutakahirata, for showing me how to use the llvm YAML reader/writer APIs, which I incorrectly thought to be more low-level than the JSON ones!)	2025-01-10 18:04:25 -08:00
Florian Hahn	44058e5b5f	[LV] Precommit tests for #106441 . Tests for https://github.com/llvm/llvm-project/pull/106441 from https://github.com/llvm/llvm-project/issues/82936.	2025-01-10 18:49:44 +00:00
Alexey Bataev	681c83a2f9	[SLP]Fix mask generation after cost estimation When estimating the cost of entries shuffles for buildvectors, need to rebuild original mask, not a generated submask, used for subregisters analysis. Fixes #122430	2025-01-10 09:32:35 -08:00
Alex MacLean	59ced72bc2	[ValueTracking] Add rotate idiom to haveNoCommonBitsSet special cases (#122165 ) An occasional idiom for rotation is "(A << B) + (A >> (BitWidth - B))". Currently this is not well handled on targets with native funnel-shift/rotate support. Add a special case to haveNoCommonBitsSet to ensure that the addition is converted to a disjoint or in InstCombine so during instruction selection the idiom can be converted to an efficient rotation implementation. Proof: https://alive2.llvm.org/ce/z/WdCZsN	2025-01-10 09:17:44 -08:00
Alexey Bataev	3c9c94a24f	Revert "[SLP]Fix mask generation after cost estimation" This reverts commit 547ba9730bf05df3383150f730a689f2c8336206 to fix buildbots reported in https://lab.llvm.org/buildbot/#/builders/123/builds/11370, https://lab.llvm.org/buildbot/#/builders/133/builds/9492	2025-01-10 08:46:42 -08:00
Alexey Bataev	547ba9730b	[SLP]Fix mask generation after cost estimation When estimating the cost of entries shuffles for buildvectors, need to rebuild original mask, not a generated submask, used for subregisters analysis. Fixes #122430	2025-01-10 08:17:56 -08:00
Alexey Bataev	920c58916a	[SLP][NFC]Add a test with the mask translate after buildvector shuffle cost estimation	2025-01-10 08:12:03 -08:00
Nikita Popov	eeac0ffaf4	Revert "[MachineLICM] Use `RegisterClassInfo::getRegPressureSetLimit` (#119826 )" This reverts commit b4e17d4a314ed87ff6b40b4b05397d4b25b6636a. This causes a large compile-time regression.	2025-01-10 09:05:06 +01:00
Teresa Johnson	3055e86c71	[MemProf] Disable cloning of callsites in recursive cycles by default (#122354 ) This disables the support added in PR121985 by default while we investigate a compile time crash.	2025-01-09 12:01:43 -08:00
vporpo	6312beef78	[SandboxVec][BottomUpVec] Use SeedCollector and slice seeds (#120826 ) With this patch we switch from the temporary dummy seeds to actual seeds provided by the seed collector. The seeds get sliced and each slice is used as the starting point for vectorization.	2025-01-09 11:53:48 -08:00
Alexey Bataev	5ff36748cf	[SLP]Fix mask processing for reused gathered scalars Need to sync the mask between cost and actual emission to avoid bugs in mask calculation Fixes #122324	2025-01-09 11:24:48 -08:00
Florian Hahn	b6cda338ab	[Loads] Also consider getPointerAlignment when checking assumptions. (#120916 ) Also use getPointerAlignment when trying to use alignment and dereferenceable assumptions. This catches cases where dereferenceable is known via the assumption but alignment is known via getPointerAlignment (e.g. via argument attribute or align of 1) PR: https://github.com/llvm/llvm-project/pull/120916	2025-01-09 18:19:39 +00:00
Mikhail Gudim	c87ef146e1	[InstCombine][NFC] Precommit a test for folding a binary op of reductions. (#121568 )	2025-01-09 12:15:20 -05:00
Pengcheng Wang	b4e17d4a31	[MachineLICM] Use `RegisterClassInfo::getRegPressureSetLimit` (#119826 ) `RegisterClassInfo::getRegPressureSetLimit` is a wrapper of `TargetRegisterInfo::getRegPressureSetLimit` with some logics to adjust the limit by removing reserved registers. It seems that we shouldn't use `TargetRegisterInfo::getRegPressureSetLimit` directly, just like the comment "This limit must be adjusted dynamically for reserved registers" said. Separate from https://github.com/llvm/llvm-project/pull/118787	2025-01-09 21:05:52 +08:00
Florian Hahn	b0697dc1de	[LV] Only check isVectorizableEarlyExitLoop with multiple exits. (#121994 ) Currently we emit early-exit related debug messages/remarks even when there is a single exit. Update to only check isVectorizableEarlyExitLoop if there isn't a single exit block. PR: https://github.com/llvm/llvm-project/pull/121994	2025-01-09 12:05:19 +00:00
Nikita Popov	dcdf44aca7	[InstCombine] Remove foldSelectICmpEq() fold (#122098 ) This fold matches complex patterns, for which we have no proof of real-world relevance, and which does not actually handle the originally motivating cases from https://github.com/llvm/llvm-project/issues/71792 either. In https://github.com/llvm/llvm-project/pull/121708 and https://github.com/llvm/llvm-project/pull/121753 we have handled some simpler variants by extending existing folds. I propose to remove this code until we have evidence that it is useful for something.	2025-01-09 12:33:01 +01:00
Benjamin Maxwell	f88ef1bd1b	[LV] Teach LoopVectorizationLegality about struct vector calls (#119221 ) This is a split-off from #109833 and only adds code relating to checking if a struct-returning call can be vectorized. This initial patch only allows the case where all users of the struct return are `extractvalue` operations that can be widened. ``` %call = tail call { float, float } @foo(float %in_val) %extract_a = extractvalue { float, float } %call, 0 %extract_b = extractvalue { float, float } %call, 1 ``` Note: The tests require the VFABI changes from #119000 to pass.	2025-01-09 09:27:29 +00:00
Yingwei Zheng	b8337dc4b2	[InstCombine] Handle commuted patterns in `foldBinOpShiftWithShift` (#122126 ) Closes https://github.com/llvm/llvm-project/issues/121775.	2025-01-09 14:36:17 +08:00
David Green	676c641718	[VectorCombine] Use getInstructionCost to cost Shuffle. (#122068 ) This allows it to produce a more accurate cost for the shuffle, using the more accurate calls to getShuffleCost in getInstructionCost. It helps fix some of the regressions from vector combine a little while ago, now that we have better subvector extract costs.	2025-01-08 20:48:40 +00:00
Andreas Jonson	d4182f1b56	[InstCombine] move foldAndOrOfICmpsOfAndWithPow2 into foldLogOpOfMaskedICmps (#121970 )	2025-01-08 18:04:38 +01:00
Simon Pilgrim	322ff42315	[PhaseOrdering][AArch64] block_scaling_decompr_8bit.ll - use -passes="default<O3>" to allow DOS to correctly evaluate the RUN command Necessary for running update_test_checks.py on windows	2025-01-08 15:07:21 +00:00
Alexey Bataev	1160994602	[SLP]Fix a crash for very long GEP chains Need to check if the GEP bases are equal and return false early. Also, need to return false if the lookup is too deep, considering bases equal too. Fixes a crash in the assertion.	2025-01-08 06:47:41 -08:00
David Green	a8dab1aa03	[AArch64] Add a subvector extract cost. (#121472 ) These can generally be emitted using an ext instruction or mov from the high half. The half extracts can be free depending on the users, but that is not handled here, just the basic costs. It originally included all subvector extracts, but that was toned down to just half-vector extracts to try and help the mid end not break up high/low extracts without having the SLP vectorizer create a mess using other shuffles.	2025-01-08 08:13:07 +00:00
Luke Lau	f0d5104c94	[VPlan] Handle some VPInstructions in may{Read,Write}FromMemory (#120058 ) This just copies the same conservative definition from mayWriteToMemory, and enables more VPInstructions to be hoisted out in LICM. I think this should give more accurate costs, and I was able to build llvm-test-suite without the legacy-vplan cost model assertion going off.	2025-01-08 15:17:26 +08:00
Alex MacLean	4583f6d344	[NVPTX] Switch front-ends and tests to ptx_kernel cc (#120806 ) the `ptx_kernel` calling convention is a more idiomatic and standard way of specifying a NVPTX kernel than using the metadata which is not supposed to change the meaning of the program. Further, checking the calling convention is significantly faster than traversing the metadata, improving compile time. This change updates the clang and mlir frontends as well as the NVPTXCtorDtorLowering pass to emit kernels using the calling convention. In addition, this updates all NVPTX unit tests to use the calling convention as well.	2025-01-07 18:24:50 -08:00
Vyacheslav Klochkov	9184c42869	[LoadStoreVectorizer] Postprocess and merge equivalence classes (#121861 ) This patch introduces a new method: void Vectorizer::mergeEquivalenceClasses(EquivalenceClassMap &EQClasses) const; The method is called at the end of Vectorizer::collectEquivalenceClasses() and is needed to merge equivalence classes that differ only by their underlying objects (UO1 and UO2), where UO1 is 1-level-indirection underlying base for UO2. This situation arises due to the limited lookup depth used during the search of underlying bases with llvm::getUnderlyingObject(ptr). Using any fixed lookup depth can result in the creation of multiple equivalence classes that only differ by 1-level indirection bases. The new approach merges equivalence classes if they have adjacent bases (1-level indirection). If a series of equivalence classes form a ladder of 1-step/level indirections, they are all merged into a single equivalence class. This provides more opportunities for the load-store vectorizer to generate better vectors. --------- Signed-off-by: Klochkov, Vyacheslav N <vyacheslav.n.klochkov@intel.com>	2025-01-07 17:17:26 -08:00
Teresa Johnson	b8ad6fb066	[MemProf] Allow cloning of callsites in recursive cycles (#121985 ) Optionally (by default) no longer mark callsite nodes as Recursive, which means they would be automatically skipped during cloning. This was too conservative as it prevents cloning of any callsite that showed up in any recursive cycle, even for non-recursive contexts. While this will enable partial cloning of recursive contexts, the recursive calls themselves will not be updated to call the correct clone, possibly leading to some unnecessary but benign cloning and affecting bytes hinted reporting. To prevent this, optional support looks for recursive cycles in contexts during cloning and removes those contexts from cloning. This requires some additional runtime overhead, so is disabled by default for now. Support for correct cloning of recursive cycles is WIP.	2025-01-07 17:00:46 -08:00
Florian Hahn	0eaa69eb23	[VPlan] Handle VPExpandSCEVRecipe in isUniformAfterVectorization. VPExpandSCEVRecipes must be placed in the entry and are always uniform. This fixes a crash by always identifying them as uniform, even if the main vector loop region has been removed. Fixes https://github.com/llvm/llvm-project/issues/121897.	2025-01-07 21:35:09 +00:00
Florian Hahn	ea14bdb035	[LV] Add test showing debug output for loops with uncountable BTCs. Currently we print an early-exit related debug message, even though there's no early exit.	2025-01-07 20:27:30 +00:00
goldsteinn	6192fafe9c	[InstSimplify] Use multi-op replacement when simplify `select` (#121708 ) - [InstSimplify] Refactor `simplifyWithOpsReplaced` to allow multiple replacements; NFC - [InstSimplify] Use multi-op replacement when simplify `select` In the case of `select X | Y == 0 :...` or `select X & Y == -1 : ...` we can do more simplifications by trying to replace both `X` and `Y` with the respective constant at once. Handles some cases for https://github.com/llvm/llvm-project/pull/121672 more generically.	2025-01-07 11:42:01 -06:00
Andreas Jonson	15d3e4afd6	[InstCombine] Test for two types of bittests (NFC)	2025-01-07 18:34:31 +01:00
Florian Mayer	ef391dbc29	[LV] Drop incorrect inbounds for reverse vector pointer when folding tail (#120730 ) When folding the tail, we may compute an address that we don't in the original scalar loop and it may not be inbounds. Drop Inbounds in that case.	2025-01-07 06:14:01 -08:00
Yingwei Zheng	4e066b6be4	[PatternMatch] Match commuted patterns in `Signum_match` (#121911 ) Closes https://github.com/llvm/llvm-project/issues/121776.	2025-01-07 21:31:48 +08:00
Lewis Crawford	a629d9e102	[NVPTX] Constant-folding for f2i, d2ui, f2ll etc. (#118965 ) Add constant-folding support for the NVVM intrinsics for converting float/double to signed/unsigned int32/int64 types, including all rounding-modes and ftz modifiers.	2025-01-07 13:17:36 +00:00
Yingwei Zheng	882df05435	[InstCombine] Fold `(A | B) ^ (A & C) --> A ? ~C : B` (#121906 ) Closes https://github.com/llvm/llvm-project/issues/121773.	2025-01-07 20:50:35 +08:00
Simon Pilgrim	5a7dfb4659	[CostModel][X86] Attempt to match v4f32 shuffles that map to MOVSS/INSERTPS instruction improveShuffleKindFromMask matches this as a SK_InsertSubvector of a v1f32 (which legalises to f32) into a v4f32 base vector, making it easy to recognise. MOVSS is limited to index0.	2025-01-07 11:31:44 +00:00
Nikita Popov	63d4e0fb66	[InstCombine] Compute result directly on APInts If the bitwidth is 2 and we add two 1s, the result may overflow. This is fine in terms of correctness, but triggers the APInt ctor assertion. Fix this by performing the calculation directly on APInts. Fixes the issue reported in: https://github.com/llvm/llvm-project/pull/114539#issuecomment-2574845003	2025-01-07 12:13:19 +01:00
Simon Pilgrim	4a42658c1b	[VectorCombine][X86] shuffle-of-cmps.ll - tweak shuf_fcmp_oeq_v4i32 shuffle to be not so cheap An upcoming patch will recognise this as a cheap INSERTPS shuffle - alter the shuffle to ensure the 2 x FCMP is still cheaper on SSE4 targets	2025-01-07 11:07:48 +00:00
Yingwei Zheng	231d113c7e	[InstCombine] Handle commuted patterns in `foldSelectWithSRem` (#121896 ) Closes https://github.com/llvm/llvm-project/issues/121771.	2025-01-07 17:09:58 +08:00
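
Several of the InstCombine and ValueTracking entries above state small integer identities and link Alive2 proofs for them. As an informal cross-check, the following is a minimal Python sketch that brute-forces five of those identities at a fixed width. The 8-bit width, the helper names, and the reading of the `(A | B) ^ (A & C)` fold as an `i1` identity are assumptions made for illustration; this is not LLVM code and it does not model poison or undef.

```python
BITS = 8                  # assumed width for the exhaustive check
MASK = (1 << BITS) - 1    # all-ones value at that width

def cttz(y):
    """Count trailing zeros of a non-zero integer."""
    return (y & -y).bit_length() - 1

def fshl(a, b, c):
    """Funnel shift left: high BITS of (a:b) shifted left by c mod BITS."""
    s = c % BITS
    return ((((a << BITS) | b) << s) >> BITS) & MASK

def umax_fold_holds():
    # (add (add A, 1), (sext (icmp ne A, 0)))  ==  umax(A, 1)
    for a in range(1 << BITS):
        sext = MASK if a != 0 else 0          # sext of i1 true is all-ones
        if ((a + 1) + sext) & MASK != max(a, 1):
            return False
    return True

def udiv_fold_holds():
    # X udiv Y  ==  X lshr cttz(Y)  when Y is a power of two
    for x in range(1 << BITS):
        for k in range(BITS):
            y = 1 << k
            if x // y != x >> cttz(y):
                return False
    return True

def fshl_fold_holds():
    # fshl(x, 0, y)  ==  shl(x, and(y, BitWidth - 1))  when BitWidth is pow2
    for x in range(1 << BITS):
        for y in range(1 << BITS):
            if fshl(x, 0, y) != (x << (y & (BITS - 1))) & MASK:
                return False
    return True

def xor_select_fold_holds():
    # (A | B) ^ (A & C)  ==  A ? ~C : B, checked on single-bit (i1) values
    for a in (0, 1):
        for b in (0, 1):
            for c in (0, 1):
                if (a | b) ^ (a & c) != ((c ^ 1) if a else b):
                    return False
    return True

def rotate_disjoint_holds():
    # (A << B) and (A >> (BitWidth - B)) share no set bits, so add == or
    for a in range(1 << BITS):
        for b in range(1, BITS):
            hi = (a << b) & MASK
            lo = a >> (BITS - b)
            if hi & lo != 0 or (hi + lo) & MASK != hi | lo:
                return False
    return True
```

Each function returns `True` only if the identity holds for every input at the assumed width; a passing check at 8 bits is evidence, not a proof for arbitrary widths, which is what the Alive2 links in the log provide.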


30816 Commits