llvm-project

Author	SHA1	Message	Date
AZero13	ffd2633061	[InstCombine] Fold mul (shr exact (X, N)), 2^N + 1 -> add (X , shr exact (X, N)) (#112407 ) Alive2 Proofs: https://alive2.llvm.org/ce/z/aJnxyp https://alive2.llvm.org/ce/z/dyeGEv	2025-02-13 14:25:09 +08:00
Thurston Dang	df07121d54	[hwasan][NFCI] Rename ClRandomSkipRate to ClRandomKeepRate (#126990 ) The meaning of ClRandomSkipRate was inverted in https://github.com/llvm/llvm-project/pull/88070 but the variable name was not changed. This patch fixes it to avoid confusion. Additionally, it elaborates the flag description to mention the interaction between the random keep rate and hotness cutoff.	2025-02-12 18:43:00 -08:00
Thurston Dang	51d8255203	[msan] Handle Arm NEON saturating extract and narrow (#125742 ) This handles NEON saturating extract and narrow (Intrinsic::aarch64_neon_{sqxtn, sqxtun, uqxtn}) by (ab)using handleShadowOr() to perform the shadow cast. Previously, these were unknown intrinsics handled suboptimally by visitInstruction. Updates the tests from https://github.com/llvm/llvm-project/pull/125288 and https://github.com/llvm/llvm-project/pull/125140	2025-02-12 16:22:49 -08:00
vporpo	1c207f1b6e	[SandboxVec][DAG] Fix DAG when old interval is mem free (#126983 ) This patch fixes a bug in `DependencyGraph::extend()` when the old interval contains no memory instructions. When this is the case we should do a full dependency scan of the new interval.	2025-02-12 15:06:30 -08:00
vporpo	31cb807537	[SanbdoxVec][BottomUpVec] Fix diamond shuffle with multiple vector inputs (#126965 ) When the operand comes from multiple inputs then we need additional packing code. When the operands are scalar then we can use a single InsertElementInst. But when the operands are vectors then we need a chain of ExtractElementInst and InsertElementInst instructions to insert the vector value into the destination vector. This is what this patch implements.	2025-02-12 14:33:05 -08:00
Thurston Dang	0d95631a3a	[msan] Handle llvm.[us]cmp (starship operator) (#125804 ) Apply handleShadowOr to llvm.[us]cmp. Previously, llvm.[su]cmp was correctly handled heuristically when each parameter type is the same as the return type (e.g., `call i8 @llvm.ucmp.i8.i8(i8 %x, i8 %y)`) but handled incorrectly by visitInstruction when the return type is different (e.g., `call i8 @llvm.ucmp.i8.i62(i62 %x, i62 %y)`, `call <4 x i8> @llvm.ucmp.v4i8.v4i32(<4 x i32> %x, <4 x i32> %y)`). Updates the tests from https://github.com/llvm/llvm-project/pull/125790	2025-02-12 13:38:45 -08:00
Thurston Dang	e9e6ba6a5e	[msan] Handle single-parameter Arm NEON vector convert intrinsics (#126136 ) This handles the following llvm.aarch64.neon intrinsics, which were suboptimally handled by visitInstruction: - fcvtas, fcvtau - fcvtms, fcvtmu - fcvtns, fcvtnu - fcvtps, fcvtpu - fcvtzs, fcvtzu The old instrumentation checked that the shadow of every element of the input vector was fully initialized, and aborted otherwise. The new instrumentation propagates the shadow: for each element of the output, the shadow is initialized iff the corresponding element of the input is fully initialized (since these are floating-point to integer conversions). Updates the tests from https://github.com/llvm/llvm-project/pull/126095	2025-02-12 13:20:22 -08:00
Vasileios Porpodas	e75e61728e	[SandboxVec] Fix warnings introduced by 7a7f9190d03e	2025-02-12 12:43:24 -08:00
vporpo	7a7f9190d0	[SandboxVec][Legality] Fix mask on diamond reuse with shuffle (#126963 ) This patch fixes a bug in the creation of shuffle masks when vectorizing vectors in case of a diamond reuse with shuffle. The mask needs to enumerate all elements of a vector, not treat the original vector value as a single element. That is: if vectorizing two <2 x float> vectors into a <4 x float> the mask needs to have 4 indices, not just 2.	2025-02-12 12:29:09 -08:00
vporpo	6d7a84d72b	[SandboxVec][Scheduler] Fix top of schedule (#126820 ) This patch fixes the way the top-of-schedule variable gets set and updated. Before this patch it used to get updated whenever we scheduled a bundle, which is wrong, as the top-of-schedule needs to be maintained across scheduling attempts. It should get reset only when we clear the schedule or when we destroy the current schedule and re-schedule.	2025-02-12 11:52:01 -08:00
Harald van Dijk	23209eb1d9	Revert "[DebugInfo] Update DIBuilder insertion to take InsertPosition (#126059 )" This reverts commit 3ec9f7494b31f2fe51d5ed0e07adcf4b7199def6.	2025-02-12 17:50:39 +00:00
Harald van Dijk	3ec9f7494b	[DebugInfo] Update DIBuilder insertion to take InsertPosition (#126059 ) After #124287 updated several functions to return iterators rather than Instruction *, it was no longer straightforward to pass their result to DIBuilder. This commit updates DIBuilder methods to accept an InsertPosition instead, so that they can be called with an iterator (preferred), or with a deprecation warning an Instruction *, or a BasicBlock *. This commit also updates the existing calls to the DIBuilder methods to pass in iterators.	2025-02-12 17:38:59 +00:00
Alexey Bataev	bb3d789dfe	[SLP][NFC]Improve dump of the ScheduleData, NFC	2025-02-12 06:51:30 -08:00
Alexey Bataev	e1935a2b15	Revert "[SLP][NFC]Improve dump of the ScheduleData, NFC" This reverts commit 108e6bca693e5f44d2d17da5a6e06203a0290de7 to fix error revealed by buildbots https://lab.llvm.org/buildbot/#/builders/159/builds/15888.	2025-02-12 06:34:27 -08:00
Alexey Bataev	108e6bca69	[SLP][NFC]Improve dump of the ScheduleData, NFC	2025-02-12 06:25:04 -08:00
David Sherwood	3e62321ed9	[LoopVectorize] Make collectInLoopReductions more efficient (#126769 ) We call collectInLoopReductions in multiple places asking the same question with exactly the same answer. For example, this was being called from a loop in calculateRegisterUsage and this patch hoists the call out to above the loop. In addition I've changed collectInLoopReductions so that it bails out if we've already built up a list.	2025-02-12 14:05:34 +00:00
Jie Fu	a0fbc19ad6	[MemorySanitizer] Silence an unused-variable warning (NFC) /llvm-project/llvm/lib/Transforms/Instrumentation/MemorySanitizer.cpp:2622:22: error: unused variable 'ReturnType' [-Werror,-Wunused-variable] FixedVectorType *ReturnType = cast<FixedVectorType>(I.getType()); ^ 1 error generated.	2025-02-12 11:32:51 +08:00
Thurston Dang	bfbe5319a8	[msan] Add handlePairwiseShadowOrIntrinsic and use it to handle Arm NEON pairwise add (#126008 ) This patch adds a function, handlePairwiseShadowOrIntrinsic that ORs pairs of adjacent shadow values; this is suitable for propagating shadow for 1- or 2-vector intrinsics that combine adjacent fields. It then applies handlePairwiseShadowOrIntrinsic to Arm NEON pairwise add: llvm.aarch64.neon.{addhn, raddhn} (currently incorrectly handled) and llvm.aarch64.neon.{saddlp, uaddlp} (currently suboptimally handled). Updates the tests from https://github.com/llvm/llvm-project/pull/125820.	2025-02-11 19:13:18 -08:00
Alexey Bataev	10844fb9b0	[SLP]Fix attempt to build the reorder mask for non-adjusted reuse mask When building the reorder for non-single use reuse mask, need to check if the size of the mask is multiple of the number of unique scalars. Otherwise, the compiler may crash when trying to reorder nodes. Fixes #126304	2025-02-11 13:41:25 -08:00
Alireza Torabian	3c74430320	[DependenceAnalysis][NFC] Removing PossiblyLoopIndependent parameter (#124615 ) Parameter PossiblyLoopIndependent has lost its intended purpose. This flag is always set to true in all cases when depends() is called, hence we want to reconsider the utility of this variable and remove it from the function signature entirely. This is an NFC patch.	2025-02-11 16:23:28 -05:00
Kazu Hirata	042e860a8a	[Vectorize] Avoid repeated hash lookups (NFC) (#126681 )	2025-02-11 09:09:43 -08:00
Florian Hahn	e258bca950	[VPlan] Only skip expansion for SCEVUnknown if it isn't an instruction. (#125235 ) Update getOrCreateVPValueForSCEVExpr to only skip expansion of SCEVUnknown if the underlying value isn't an instruction. Instructions may be defined in a loop and using them without expansion may break LCSSA form. SCEVExpander will take care of preserving LCSSA if needed. We could also try to pass LoopInfo, but there are some users of the function where it won't be available and main benefit from skipping expansion is slightly more concise VPlans. Note that SCEVExpander is now used to expand SCEVUnknown with floats. Adjust the check in expandCodeFor to only check the types and casts if the type of the value is different to the requested type. Otherwise we crash when trying to expand a float and requesting a float type. Fixes https://github.com/llvm/llvm-project/issues/121518. PR: https://github.com/llvm/llvm-project/pull/125235	2025-02-11 13:03:12 +01:00
Florian Hahn	3706dfef66	[LV] Forget LCSSA phi with new pred before other SCEV invalidation. (#119897 ) `forgetLcssaPhiWithNewPredecessor` performs additional invalidation if there is an existing SCEV for the phi, but earlier `forgetBlockAndLoopDispositions` or `forgetLoop` may already invalidate the SCEV for the phi. Change the order to first call `forgetLcssaPhiWithNewPredecessor` to ensure it runs before its SCEV gets invalidated too eagerly. Fixes https://github.com/llvm/llvm-project/issues/119665. PR: https://github.com/llvm/llvm-project/pull/119897	2025-02-10 16:29:42 +00:00
Kazu Hirata	2f88672414	[Coroutines] Avoid repeated hash lookups (NFC) (#126466 )	2025-02-10 07:50:32 -08:00
Nikita Popov	2d31a12dbe	[DSE] Don't use initializes on byval argument (#126259 ) There are two ways we can fix this problem, depending on how the semantics of byval and initializes should interact: * Don't infer initializes on byval arguments. initializes on byval refers to the original caller memory (or having both attributes is made a verifier error). * Infer initializes on byval, but don't use it in DSE. initializes on byval refers to the callee copy. This matches the semantics of readonly on byval. This is slightly more powerful, for example, we could do a backend optimization where byval + initializes will allocate the full size of byval on the stack but not copy over the parts covered by initializes. I went with the second variant here, skipping byval + initializes in DSE (FunctionAttrs already doesn't propagate initializes past byval). I'm open to going in the other direction though. Fixes https://github.com/llvm/llvm-project/issues/126181.	2025-02-10 10:34:03 +01:00
Ricardo Jesus	5f84b6edd9	[AArch64] Add MATCH loops to LoopIdiomVectorizePass (#101976 ) This patch adds a new loop to LoopIdiomVectorizePass, enabling it to recognise and vectorise loops such as: ```cpp template<class InputIt, class ForwardIt> InputIt find_first_of(InputIt first, InputIt last, ForwardIt s_first, ForwardIt s_last) { for (; first != last; ++first) for (ForwardIt it = s_first; it != s_last; ++it) if (*first == *it) return first; return last; } ``` These loops match the C++ standard library function `std::find_first_of`.	2025-02-10 08:23:34 +00:00
Elvis Wang	2e3729bf40	[LV] Prevent query the computeCost() when VF=1 in emitInvalidCostRemarks(). (#117288 ) We should only query the computeCost() when the VF is vector.	2025-02-10 08:40:28 +08:00
Kazu Hirata	df25511f0e	[Coroutines] Avoid repeated hash lookups (NFC) (#126432 )	2025-02-09 13:35:12 -08:00
Hassnaa Hamdi	e9a20f77ee	Reland "[LV]: Teach LV to recursively (de)interleave." (#125094 ) This patch relands the changes from "[LV]: Teach LV to recursively (de)interleave.#122989" Reason for revert: - The patch exposed an assert in the vectorizer related to a VF difference between the legacy cost model and the VPlan-based cost model, caused by an uncalculated cost for a VPInstruction that VPlanTransforms creates as a replacement for an 'or disjoint' instruction. VPlanTransforms makes that instruction change when there are memory interleaving and predicated blocks, but the change had not caused problems before because in most cases the cost difference between the legacy and new models is not noticeable. - Issue is fixed by #125434 Original patch: https://github.com/llvm/llvm-project/pull/89018 Reviewed-by: paulwalker-arm, Mel-Chen	2025-02-09 19:21:54 +00:00
Florian Hahn	32c4493d5f	[VPlan] Add incoming values for all predecessor to ResumePHI (NFCI). Follow-up as discussed when using VPInstruction::ResumePhi for all resume values (#112147). This patch explicitly adds incoming values for each predecessor in VPlan. This simplifies codegen and allows transformations adjusting the predecessors of blocks with NFC modulo incoming block order in phis.	2025-02-09 11:20:20 +00:00
vporpo	69b8cf4f06	[SandboxVec][BottomUpVec] Add cost estimation and tr-accept-or-revert pass (#126325 ) The TransactionAcceptOrRevert pass is the final pass in the Sandbox Vectorizer's default pass pipeline. Its job is to check the cost before/after vectorization and accept or revert the IR to its original state. Since we are now starting the transaction in BottomUpVec, tests that run a custom pipeline need to accept the transaction. This is done with the help of the TransactionAlwaysAccept pass (tr-accept).	2025-02-08 08:34:18 -08:00
Florian Hahn	6ff8a06de9	[VPlan] Run recipe removal and simplification after optimizeForVFAndUF. (#125926 ) Run recipe simplification and dead recipe removal after VPlan-based unrolling and optimizeForVFAndUF, to clean up any redundant or dead recipes introduced by them. Currently this is NFC, as it removes the corresponding removeDeadRecipes run in optimizeForVFAndUF and no additional simplifications kick in after unrolling yet. That is changing with https://github.com/llvm/llvm-project/pull/123655. Note that with this change, pattern-matching is now applied after EVL-based recipes have been introduced. Trying to match VPWidenEVLRecipe when not explicitly requested might apply a pattern with 2 operands to one with 3 due to the extra EVL operand and VPWidenEVLRecipe being a subclass of VPWidenRecipe. To prevent this, update Recipe_match::match to only match VPWidenEVLRecipe if it is in the requested recipe types (RecipeTy). PR: https://github.com/llvm/llvm-project/pull/125926	2025-02-08 13:33:46 +00:00
Florian Hahn	ee806646ad	[VPlan] Consistently use hasScalarVFOnly (NFC). Consistently use hasScalarVFOnly instead of using hasVF(ElementCount::getFixed(1)). Also add an assert to ensure all cases are covered by hasScalarVFOnly.	2025-02-08 12:19:25 +00:00
Florian Hahn	16df836a52	[VPlan] Mark hasVF & hasScalableVF as const (NFC).	2025-02-08 11:32:23 +00:00
Kazu Hirata	5901bda5a0	[Vectorize] Avoid repeated hash lookups (NFC) (#126345 )	2025-02-08 00:48:51 -08:00
Kazu Hirata	80a4718200	[GVNHoist] Avoid repeated hash lookups (NFC) (#126189 )	2025-02-07 07:59:53 -08:00
Florian Hahn	1611059f5d	[VPlan] Compute cost for binary op VPInstruction with underlying values. (#125434 ) As exposed by https://github.com/llvm/llvm-project/pull/125094, we are missing cost computation for some binary VPInstructions we created based on original IR instructions. Their cost should be considered. PR: https://github.com/llvm/llvm-project/pull/125434	2025-02-07 15:27:31 +00:00
David Sherwood	3872e55758	[LoopVectorize] Fix build error (#126218 ) Fixes issue caused by 1930524bbde3cd26ff527bbdb5e1f937f484edd6 Unused variable UsesMask in LoopVectorize.cpp	2025-02-07 10:16:32 +00:00
David Sherwood	1930524bbd	[LoopVectorize] Fix cost model assert when vectorising calls (#125716 ) The legacy and vplan cost models did not agree because VPWidenCallRecipe::computeCost only calculates the cost of the call instruction, whereas LoopVectorizationCostModel::setVectorizedCallDecision in some cases adds on the cost of a synthesised mask argument. However, this mask is always 'splat(i1 true)' which should be hoisted out of the loop during codegen. In order to synchronise the two cost models I have two options: 1) Also add the cost of the splat to the vplan model, or 2) Remove the cost of the splat from the legacy model. I chose 2) because I feel this more closely represents what the final code will look like. There is an argument that we should take account of such broadcast costs in the preheader when deciding if it's profitable to vectorise a loop, however there isn't currently a mechanism to do this. We currently only take account of the runtime checks when assessing profitability and what the minimum trip count should be. However, I don't believe this work needs doing as part of this PR.	2025-02-07 09:36:52 +00:00
James Chesterman	ac158aa13b	[LoopVectorizer] Allow partial reductions to be made in predicated loops (#124268 ) Does a select on the input rather than the output. This way the mask has the same number of lanes as the other operand in the select instruction.	2025-02-07 09:09:10 +00:00
Kazu Hirata	b7feccb31d	[memprof] Dump call site matching information (#125130 ) MemProfiler.cpp annotates the IR with the memory profile so that we can later duplicate context. This patch dumps the entire inline call stack for each call site match.	2025-02-06 23:37:10 -08:00
Mel Chen	4d3148d926	[LV][EVL] Fix the check for legality of folding with EVL. (#125678 ) The current legality check for folding with EVL has incomplete verification for VF. This patch fixes the VF check, ensuring that tail folding with EVL is enabled only when a scalable VF is available. This allows loops that prefer tail folding with EVL but cannot use scalable VF vectorization to still be vectorized using a fixed VF, rather than abandoning vectorization entirely.	2025-02-07 12:53:10 +08:00
Yingwei Zheng	9cd83d6ea2	[InstCombine] Drop samesign in `foldLogOpOfMaskedICmps` (#125829 ) Alive2: https://alive2.llvm.org/ce/z/6zLAYp Note: We can also apply this fix to the logic below (`if (Mask & AMask_NotAllOnes)`), but it seems unreachable.	2025-02-07 11:56:52 +08:00
Joseph Huber	d9500f5032	[OpenMP] Fix the OpenMPOpt pass incorrectly optimizing if definition was missing Summary: This code is intended to block transformations if the call isn't present, however the way it's coded it silently lets it pass if the definition doesn't exist at all. This previously was always valid since we included the runtime as one giant blob so everything was always there, but now that we want to move towards separate ones, it's not quite correct.	2025-02-06 21:38:36 -06:00
Luke Lau	d0f122b9c5	[LV] Update incoming blocks in VPWidenPHIRecipe in reassociateBlocks (#125481 ) This is extracted from #118638 After c7ebe4f we will crash in fixNonInductionPHIs if we use a VPWidenPHIRecipe with the vector preheader as an incoming block, because the phi will reference the old non-IRBB vector preheader. This fixes this by updating VPBlockUtils::reassociateBlocks to update any VPWidenPHIRecipes's incoming blocks. This assumes that if the VPWidenPHIRecipe is in a VPRegionBlock, it's in the entry block, and that we are replacing a VPBasicBlock with another VPBasicBlock.	2025-02-07 08:50:35 +08:00
vporpo	a0d86b23c0	[SandboxVec][Scheduler] Notify scheduler about instruction creation (#126141 ) This patch implements the vectorizer's callback for getting notified about new instructions being created. This updates the scheduler state, which may involve removing dependent instructions from the ready list and updating the "scheduled" flag. Since we need to remove elements from the ready list, this patch also implements the `remove()` operation.	2025-02-06 15:45:44 -08:00
vporpo	166b2e8837	[SandboxVec][DAG] Update DAG when a new instruction is created (#126124 ) The DAG will now receive a callback whenever a new instruction is created and will update itself accordingly.	2025-02-06 14:12:03 -08:00
Teresa Johnson	1dbfbb5ce6	[MemProf] Stop cloning traversal on single allocation type (#126131 ) We were previously checking this after recursing on all callers, but if we already have a single allocation type there is no need to even look at any callers. Didn't show a significant improvement overall, but it does reduce the count of times we enter the identifyClones and do other checks.	2025-02-06 13:21:02 -08:00
Florian Hahn	049aa179dc	[VPlan] Simplify operand tuple matching in VPlanPatternMatch (NFC). Remove some indirection when matching recipe and matcher operands by directly using fold over parameter pack.	2025-02-06 21:00:44 +00:00
David Pagan	a5fc7c3ac1	[clang][OpenMP] New OpenMP 6.0 assumption clause, 'no_openmp_constructs' (#125933 ) Add initial parsing/sema support for new assumption clause so clause can be specified. For now, it's ignored, just like the others. Added support for 'no_openmp_constructs' to release notes. Testing - Updated appropriate LIT tests. - Testing: check-all	2025-02-06 12:41:10 -08:00
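
As a rough illustration of the InstCombine fold in ffd2633061 (first entry above), the following C++ sketch models both sides of the rewrite on 8-bit values and compares them exhaustively. It is not LLVM code: the helper names (`lshrExact`, `beforeFold`, `afterFold`, `foldHoldsForAllI8`) are hypothetical, and the 8-bit width is an assumption chosen to keep the search small. The Alive2 proofs linked in the commit remain the authoritative check.

```cpp
#include <cassert>
#include <cstdint>

// Models `lshr exact X, N` on an 8-bit value. The `exact` flag promises that
// no set bits are shifted out, so that precondition is asserted here.
uint8_t lshrExact(uint8_t X, unsigned N) {
  assert(N < 8 && (X & ((1u << N) - 1u)) == 0 &&
         "exact shift must not drop set bits");
  return static_cast<uint8_t>(X >> N);
}

// Left-hand side of the fold: mul (lshr exact X, N), 2^N + 1, wrapping at 8 bits.
uint8_t beforeFold(uint8_t X, unsigned N) {
  uint8_t Y = lshrExact(X, N);
  return static_cast<uint8_t>(Y * ((1u << N) + 1u));
}

// Right-hand side of the fold: add X, (lshr exact X, N), wrapping at 8 bits.
uint8_t afterFold(uint8_t X, unsigned N) {
  return static_cast<uint8_t>(X + lshrExact(X, N));
}

// Compares both sides for every 8-bit X and every shift amount N for which
// the `exact` precondition holds (other inputs are skipped).
bool foldHoldsForAllI8() {
  for (unsigned N = 0; N < 8; ++N) {
    for (unsigned V = 0; V < 256; ++V) {
      uint8_t X = static_cast<uint8_t>(V);
      if (X & ((1u << N) - 1u))
        continue; // low bits set: `exact` would not hold for this input
      if (beforeFold(X, N) != afterFold(X, N))
        return false;
    }
  }
  return true;
}
```

The two sides agree because `exact` guarantees X == Y << N for Y = X >> N, so Y * (2^N + 1) = (Y << N) + Y = X + Y modulo 2^8.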

38904 Commits