llvm-project

Author	SHA1	Message	Date
Tyler Nowicki	9a9f155df1	[Coroutines] Split buildCoroutineFrame into normalization and frame building (#108076 ) * Split buildCoroutineFrame into code related to normalization and code related to actually building the coroutine frame. * This will enable future specialization of buildCoroutineFrame for different ABIs while the normalization can be done by splitCoroutine prior to calling buildCoroutineFrame. See RFC for more info: https://discourse.llvm.org/t/rfc-abi-objects-for-coroutines/81057	2024-09-11 10:29:06 -04:00
Nikita Popov	2afe678f0a	[MemCpyOpt] Allow memcpy elision for non-noalias arguments (#107860 ) We currently elide memcpys for readonly nocapture noalias arguments. noalias is checked to make sure that there are no other ways to write the memory, e.g. through a different argument or an escaped pointer. In addition to the current noalias check, also query alias analysis, in case it can prove that modification is not possible through other means. This fixes the problem reported in https://discourse.llvm.org/t/problem-about-memcpy-elimination/81121.	2024-09-11 10:04:37 +02:00
AdityaK	3c9022c965	Bail out jump threading on indirect branches (#103688 ) The bug was introduced by https://github.com/llvm/llvm-project/pull/68473 Fixes: #102351	2024-09-10 22:39:02 -07:00
Kazu Hirata	3dad29b677	[LTO] Remove unused includes (NFC) (#108110 ) clangd reports these as unused headers. My manual inspection agrees with the findings.	2024-09-10 19:36:04 -07:00
Teresa Johnson	ae5f1a78d3	[MemProf] Convert CallContextInfo to a struct (NFC) (#108086 ) As suggested in #107918, improve readability by converting this tuple to a struct.	2024-09-10 16:27:56 -07:00
Florian Hahn	e3c537ff90	[VPlan] Consider non-header phis in planContainsAdditionalSimp. Update planContainsAdditionalSimplifications to also check phis not in the loop header. This ensures we don't miss cases where VPBlendRecipes (which correspond to such phis) have been simplified. Fixes https://github.com/llvm/llvm-project/issues/107473.	2024-09-10 21:37:14 +01:00
Tyler Nowicki	f4e2d7bfc1	[Coroutines] Move spill related methods to a Spill utils (#107884 ) * Move code related to spilling into SpillUtils to help cleanup CoroFrame See RFC for more info: https://discourse.llvm.org/t/rfc-abi-objects-for-coroutines/81057	2024-09-10 15:43:57 -04:00
Shubham Sandeep Rastogi	7a91af4f87	Add DIExpression::foldConstantMath to CoroSplit (#107933 ) The CoroSplit pass has its own salvageDebugInfo implementation and its DIExpressions do not get folded. Add a call to DIExpression::foldConstantMath in the CoroSplit pass to reduce the size of those DIExpressions. [The compile time tracker shows no significant increase in compile time either.](https://llvm-compile-time-tracker.com/compare.php?from=bdf02249e7f8f95177ff58c881caf219699acb98&to=e1c1c1759c06bc4c42f79eebdb0e3cd45219cef4&stat=instructions:u) rdar://134675402	2024-09-10 11:27:01 -07:00
Teresa Johnson	524a028f69	[MemProf] Streamline and avoid unnecessary context id duplication (#107918 ) Sort the list of calls such that those with the same stack ids are also sorted by function. This allows processing of all matching calls (that can share a context node) in bulk as they are all adjacent. This has 2 benefits: 1. It reduces unnecessary work, specifically the handling to intersect the context ids with those along the graph edges for the stack ids, for calls that we know can share a node. 2. It simplifies detecting when we have matching stack ids but don't need to duplicate context ids. Specifically, we were previously still duplicating context ids whenever we saw another call with the same stack ids, but that isn't necessary if they will share a context node. With this change we now only duplicate context ids if we see some that not only have the same ids but also are in different functions. This change reduced the amount of context id duplication and provided reductions in both peak memory (~8%) and time (~5%) for a large target.	2024-09-10 10:11:33 -07:00
Johannes Doerfert	56a033462e	[Attributor] Keep track of reached returns in AAPointerInfo (#107479 ) Instead of visiting call sites in Attribute::checkForAllUses, we now keep track of returns in AAPointerInfo and use the call site return information as required. This way, the user of AAPointerInfo(CallSite)Argument can determine if the call return should be visited. We do not collect them as "may accesses" in the AAPointerInfo(CallSite)Argument itself in case a return user is found.	2024-09-10 08:13:21 -07:00
Han-Kuan Chen	0ccc6092d2	[VectorCombine] Add foldShuffleOfIntrinsics. (#106502 )	2024-09-10 21:10:09 +08:00
Florian Hahn	a794ee4559	[VPlan] Add VPValue for VF, use it for VPWidenIntOrFpInductionRecipe. (#95305 ) Similar to VFxUF, also add a VF VPValue to VPlan and use it to get the runtime VF in VPWidenIntOrFpInductionRecipe. Code for VF is only generated if there are users of VF, to avoid unnecessary test changes. PR: https://github.com/llvm/llvm-project/pull/95305	2024-09-10 10:41:35 +01:00
Igor Kirillov	bf694841f5	[VectorCombine] Add type shrinking and zext propagation for fixed-width vector types (#104606 ) Check that `binop(zext(value), other)` is possible and profitable to transform into: `zext(binop(value, trunc(other)))`. When a CPU architecture has an illegal scalar type iX, but vector type <N * iX> is legal, scalar expressions before vectorisation may be extended to a legal type iY. This extension could result in underutilization of vector lanes, as more lanes could be used in one instruction with the narrower type. Vectorisers may not always recognize opportunities for type shrinking, and this patch aims to address that limitation.	2024-09-10 10:09:03 +01:00
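A small scalar model of the rewrite described in the commit above, offered as a sketch rather than the pass's actual logic. It assumes an 8-bit narrow type widened to 32 bits and uses bitwise `and` as the binop, for which the identity holds for any `other`. The helper names `andWide`, `andNarrow`, `andShrinkHolds`, and `selfCheckAndShrink` are hypothetical and are not part of VectorCombine.

```cpp
#include <cassert>
#include <cstdint>

// Original form: binop(zext(value), other), with `and` as the binop.
constexpr uint32_t andWide(uint8_t Value, uint32_t Other) {
  return static_cast<uint32_t>(Value) & Other;
}

// Shrunk form: zext(binop(value, trunc(other))).
constexpr uint32_t andNarrow(uint8_t Value, uint32_t Other) {
  const uint8_t Narrow =
      static_cast<uint8_t>(Value & static_cast<uint8_t>(Other));
  return static_cast<uint32_t>(Narrow);
}

// Returns true if both forms agree for every 8-bit value and the given Other.
inline bool andShrinkHolds(uint32_t Other) {
  for (unsigned V = 0; V < 256; ++V) {
    const uint8_t Value = static_cast<uint8_t>(V);
    if (andWide(Value, Other) != andNarrow(Value, Other))
      return false;
  }
  return true;
}

// Hypothetical self-check: Other has bits set above the narrow width.
inline void selfCheckAndShrink() { assert(andShrinkHolds(0x12345678u)); }
```

For `and`, the zero-extended operand clears the high bits of the result, so truncating `other` loses nothing. Binops such as `or` or `xor` would additionally need the high bits of `other` to be known zero, which this sketch does not model.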
Yuxuan Chen	761bf333e3	[LLVM][Coroutines] Switch CoroAnnotationElidePass to a FunctionPass (#107897 ) After landing https://github.com/llvm/llvm-project/pull/99285 we found that the call graph update was causing the following crash when expensive checks are turned on ``` llvm-project/llvm/lib/Analysis/CGSCCPassManager.cpp:982: LazyCallGraph::SCC &updateCGAndAnalysisManagerForPass(LazyCallGraph &, LazyCallGraph::SCC &, LazyCallGraph::Node &, CGSCCAnalysisManager &, CGSCCUpdateResult &, FunctionAnalysisManager &, bool): Assertion `(RC == &TargetRC || RC->isAncestorOf(TargetRC)) && "New call edge is not trivial!"' failed. ``` I have to admit I believe that the call graph update process I did for that patch could be wrong. After reading the code in `CGSCCToFunctionPassAdaptor`, I am convinced that `CoroAnnotationElidePass` can be a FunctionPass and rely on the adaptor to update the call graph for us, so long as we properly invalidate the caller's analyses. After this patch, `llvm/test/Transforms/Coroutines/coro-transform-must-elide.ll` no longer fails under expensive checks.	2024-09-09 18:57:39 -07:00
Mircea Trofin	3b22618094	[ctx_prof] Insert the ctx prof flattener after the module inliner (#107499 ) This patch enables experimenting with the contextual profile. ICP is currently disabled in this case - will reenable it subsequently. Also subsequently the inline cost model / decision making would be updated to be context-aware. Right now, this just achieves "complete use" of the profile, in that it's ingested, maintained, and sunk to a flat profile when not needed anymore. Issue [#89287](https://github.com/llvm/llvm-project/issues/89287)	2024-09-09 18:16:24 -07:00
Alexey Bataev	b3d2d5039b	[SLP][NFC]Reorder code for better structural complexity, NFC	2024-09-09 12:33:18 -07:00
Tyler Nowicki	ea2da571c7	[Coroutines] Move the SuspendCrossingInfo analysis helper into its own header/source (#106306 ) * Move the SuspendCrossingInfo analysis helper into its own header/source See RFC for more info: https://discourse.llvm.org/t/rfc-abi-objects-for-coroutines/81057 Co-authored-by: tnowicki <tnowicki.nowicki@amd.com>	2024-09-09 11:50:27 -04:00
Teresa Johnson	e46f03bc31	[MemProf] Remove unnecessary data structure (NFC) (#107643 ) Recent change #106623 added the CallToFunc map, but I subsequently realized the same information is already available for the calls being examined in the StackIdToMatchingCalls map we're iterating through.	2024-09-09 08:17:41 -07:00
Kazu Hirata	a2f659c134	[StructurizeCFG] Avoid repeated hash lookups (NFC) (#107797 )	2024-09-09 07:15:12 -07:00
Kazu Hirata	3940a1ba14	[Float2Int] Avoid repeated hash lookups (NFC) (#107795 )	2024-09-09 07:13:52 -07:00
Florian Hahn	1a5a1e9781	[VPlan] Assert that VFxUF is always used. Add assertion to ensure invariant discussed in https://github.com/llvm/llvm-project/pull/95305.	2024-09-09 14:26:09 +01:00
Sergey Kachkov	1f2a634c44	Reland "[LSR] Do not create duplicated PHI nodes while preserving LCSSA form" (#107380 ) Motivating example: https://godbolt.org/z/eb97zrxhx Here we have 2 induction variables in the loop: one is corresponding to i variable (add rdx, 4), the other - to res (add rax, 2). The second induction variable can be removed by rewriteLoopExitValues() method (final value of res at loop exit is unroll_iter * -2); however, this doesn't happen because we have duplicated LCSSA phi nodes at loop exit: ``` ; Preheader: for.body.preheader.new: ; preds = %for.body.preheader %unroll_iter = and i64 %N, -4 br label %for.body ; Loop: for.body: ; preds = %for.body, %for.body.preheader.new %lsr.iv = phi i64 [ %lsr.iv.next, %for.body ], [ 0, %for.body.preheader.new ] %i.07 = phi i64 [ 0, %for.body.preheader.new ], [ %inc.3, %for.body ] %inc.3 = add nuw i64 %i.07, 4 %lsr.iv.next = add nsw i64 %lsr.iv, -2 %niter.ncmp.3.not = icmp eq i64 %unroll_iter, %inc.3 br i1 %niter.ncmp.3.not, label %for.end.loopexit.unr-lcssa.loopexit, label %for.body, !llvm.loop !7 ; Exit blocks for.end.loopexit.unr-lcssa.loopexit: ; preds = %for.body %inc.3.lcssa = phi i64 [ %inc.3, %for.body ] %lsr.iv.next.lcssa11 = phi i64 [ %lsr.iv.next, %for.body ] %lsr.iv.next.lcssa = phi i64 [ %lsr.iv.next, %for.body ] br label %for.end.loopexit.unr-lcssa ``` rewriteLoopExitValues requires %lsr.iv.next value to have only 2 uses: one in LCSSA phi node, the other - in induction phi node. Here we have 3 uses of this value because of duplicated lcssa nodes, so the transform doesn't apply and leads to an extra add operation inside the loop. The proposed solution is to accumulate inserted instructions that will require LCSSA form update into SetVector and then call formLCSSAForInstructions for this SetVector once, so the same instructions don't process twice. Reland fixes the issue with preserve-lcssa.ll test: it fails in the situation when x86_64-unknown-linux-gnu target is unavailable in opt. 
The changes are moved into separate duplicated-phis.ll test with explicit x86 target requirement to fix bots which are not building this target.	2024-09-09 16:14:51 +03:00
Yuxuan Chen	a416267a5f	[LLVM][Coroutines] Transform "coro_elide_safe" calls to switch ABI coroutines to the `noalloc` variant (#99285 ) This patch is episode three of the middle end implementation for the coroutine HALO improvement project published on discourse: https://discourse.llvm.org/t/language-extension-for-better-more-deterministic-halo-for-c-coroutines/80044 After we attribute the calls to some coroutines as "coro_elide_safe" in the C++ FE and create a `noalloc` ramp function, we use a new middle end pass to move the call to coroutines to the noalloc variant. This pass should be run after CoroSplit. For each node we process in CoroSplit, we look for its callers and replace the attributed ones in presplit coroutines with the noalloc one. The transformed `noalloc` ramp function will also require a frame pointer to a block of memory it can use as an activation frame. We allocate this on the caller's frame with an alloca. Please note that we cannot safely transform such attributed calls in post-split coroutines due to memory lifetime reasons. The CoroSplit pass is responsible for creating the coroutine frame spills for all the allocas in the coroutine. Therefore it will be unsafe to create new allocas like this one in post-split coroutines. This happens relatively rarely because CGSCC performs the passes on the callees before the caller. However, if multiple coroutines coexist in one SCC, this situation does happen (and prevents us from having potentially unbounded frame size due to recursion.) You can find episode 1: Clang FE of this patch series at https://github.com/llvm/llvm-project/pull/99282 Episode 2: CoroSplit at https://github.com/llvm/llvm-project/pull/99283	2024-09-08 23:09:40 -07:00
Yuxuan Chen	234cc81625	[LLVM][Coroutines] Create `.noalloc` variant of switch ABI coroutine ramp functions during CoroSplit (#99283 ) This patch is episode two of the coroutine HALO improvement project published on discourse: https://discourse.llvm.org/t/language-extension-for-better-more-deterministic-halo-for-c-coroutines/80044 Previously CoroElide depended on inlining, and its analysis does not work very well with code generated by the C++ frontend due to the existence of many customization points. Issues have been reported upstream about how ineffective the original CoroElide was in real world applications. For C++ users, this set of patches aims to fix this problem by providing library authors and users deterministic HALO behaviour for some well-behaved coroutine `Task` types. The stack begins with a library side attribute on the `Task` class that guarantees no unstructured concurrency when coroutines are awaited directly with `co_await` as a prvalue. This attribute on Task types gives us lifetime guarantees and makes the C++ FE capable of telling the ME which coroutine calls are elidable. We convey such information from the FE through the attribute `coro_elide_safe`. This patch modifies CoroSplit to create a variant of the coroutine ramp function that 1) does not use a heap-allocated frame and instead takes an additional parameter as the pointer to the frame. This parameter is attributed with `dereferenceable` and `align` to convey the size and alignment requirements for the frame. 2) always stores the cleanup instead of the destroy address for `coro.destroy()` actions. In a later patch, we will have a new pass that runs right after CoroSplit to find usages of the callee coroutine attributed `coro_elide_safe` in presplit coroutine callers, allocates the frame on its "stack", and transforms those usages to call the `noalloc` ramp function variant.
(note I put quotes on the word "stack" here, because for presplit coroutine, any alloca will be spilled into the frame when it's being split) The C++ Frontend attribute implementation that works with this change can be found at https://github.com/llvm/llvm-project/pull/99282 The pass that makes use of the new `noalloc` split can be found at https://github.com/llvm/llvm-project/pull/99285	2024-09-08 23:09:20 -07:00
Yuxuan Chen	e17a39bc31	[Clang] C++20 Coroutines: Introduce Frontend Attribute [[clang::coro_await_elidable]] (#99282 ) This patch is the frontend implementation of the coroutine elide improvement project detailed in this discourse post: https://discourse.llvm.org/t/language-extension-for-better-more-deterministic-halo-for-c-coroutines/80044 This patch proposes a C++ struct/class attribute `[[clang::coro_await_elidable]]`. This notion of await elidable task gives developers and library authors certainty that coroutine heap elision happens in a predictable way. Originally, after we lower a coroutine to LLVM IR, CoroElide is responsible for analysis of whether an elision can happen. Take this as an example: ``` Task foo(); Task bar() { co_await foo(); } ``` For CoroElide to happen, the ramp function of `foo` must be inlined into `bar`. This inlining happens after `foo` has been split but `bar` is usually still a presplit coroutine. If `foo` is indeed a coroutine, the inlined `coro.id` intrinsic of `foo` is visible within `bar`. CoroElide then runs an analysis to figure out whether the SSA value of `coro.begin()` of `foo` gets destroyed before `bar` terminates. `Task` types are rarely simple enough for the destroy logic of the task to reference the SSA value from `coro.begin()` directly. Hence, the pass is very ineffective for even the most trivial C++ Task types. Improving CoroElide by implementing more powerful analyses is possible, however it doesn't give us predictability about when we expect elision to happen. The approach we want to take with this language extension generally originates from the philosophy that library implementations of `Task` types have control over the structured concurrency guarantees we demand for elision to happen. That is, the lifetime of the callee's frame is shorter than that of the caller. The ``[[clang::coro_await_elidable]]`` is a class attribute which can be applied to a coroutine return type.
When a coroutine function that returns such a type calls another coroutine function, the compiler performs heap allocation elision when the following conditions are all met: - callee coroutine function returns a type that is annotated with ``[[clang::coro_await_elidable]]``. - In caller coroutine, the return value of the callee is a prvalue that is immediately `co_await`ed. From the C++ perspective, it makes sense because we can ensure the lifetime of the elided callee cannot exceed that of the caller if we can guarantee that the caller coroutine is never destroyed earlier than the callee coroutine. This is not generally true for arbitrary C++ programs. However, the library that implements `Task` types and executors may provide this guarantee to the compiler, providing the user with certainty that HALO will work on their programs. After this patch, when compiling coroutines that return a type with such an attribute, the frontend checks the type of the operand of each `co_await` expression (not `operator co_await`). If it's also attributed with `[[clang::coro_await_elidable]]`, the FE emits metadata on the call or invoke instruction as a hint for a later middle end pass to perform the elision. The original patch version is https://github.com/llvm/llvm-project/pull/94693 and, as suggested, the patch is split into frontend and middle end solutions in stacked PRs. The middle end CoroSplit patch can be found at https://github.com/llvm/llvm-project/pull/99283 The middle end transformation that performs the elide can be found at https://github.com/llvm/llvm-project/pull/99285	2024-09-08 23:08:58 -07:00
Simon Pilgrim	97e6f92d31	Fix GCC Wparentheses warning. NFC.	2024-09-08 13:34:34 +01:00
Kazu Hirata	bc59b638ae	[Vectorize] Avoid repeated hash lookups (NFC) (#107729 )	2024-09-08 00:08:32 -07:00
Kazu Hirata	f5aad24820	[IROutliner] Avoid repeated hash lookups (NFC) (#107726 )	2024-09-08 00:07:45 -07:00
Chaitanya	49e38606cd	[Sanitizer] Create DiagnosticInfoInstrumentation for IR Instrumentation reporting. (#106356 ) This PR adds DK_Instrumentation enum to DiagnosticKind and DiagnosticInfoInstrumentation is extended from DiagnosticsInfo for IR instrumentation reporting.	2024-09-08 10:10:16 +05:30
Kazu Hirata	caebb4562c	[Transforms] Avoid repeated hash lookups (NFC) (#107727 )	2024-09-07 18:16:06 -07:00
Kazu Hirata	23a26e7120	[DFAJumpThreading] Avoid repeated hash lookups (NFC) (#107670 )	2024-09-07 08:22:21 -07:00
Mircea Trofin	fe6c025037	[nfc][ctx_prof] Fix the second source of nondeterminism in `CtxProfAnalysisPrinterPass` Verified on a build with `LLVM_REVERSE_ITERATION=ON` Issue #106855	2024-09-06 21:54:23 -07:00
Mircea Trofin	d7fb5b9df0	[ctx_prof] PGOCtxProfFlattener must always return `PreservedAnalyses::none()` This is because it always removes instrumentation. This fixes failures detectable with extensive checks, e.g. https://lab.llvm.org/buildbot/#/builders/187/builds/987 (Related to PR #107329)	2024-09-06 20:02:18 -07:00
dyung	2bf551e600	Revert "[LSR] Do not create duplicated PHI nodes while preserving LCSSA form" (#107666 ) Reverts llvm/llvm-project#107380 Change is causing the test preserve-lcssa.ll to fail on at least 2 build bots: - https://lab.llvm.org/buildbot/#/builders/190/builds/5231 - https://lab.llvm.org/buildbot/#/builders/161/builds/1855	2024-09-06 19:54:26 -07:00
Mingming Liu	d4ddf06b0c	[NFCI]Remove EntryCount from FunctionSummary and clean up surrounding synthetic count passes. (#107471 ) The primary motivation is to remove `EntryCount` from `FunctionSummary`. This frees 8 bytes out of `sizeof(FunctionSummary)` (136 bytes as of `64498c5483`). While I'm at it, this PR cleans up {SummaryBasedOptimizations, SyntheticCountsPropagation} since they were not used and there are no plans to further invest in them. With this patch, bitcode writer writes a placeholder 0 at the byte offset of `EntryCount` and bitcode reader can parse the function entry count at the correct byte offset. Added a TODO to stop writing `EntryCount` and bump bitcode version	2024-09-06 16:38:17 -07:00
Mircea Trofin	dc62bc8909	[nfc][ctx_prof] Remove spurious `#include` in PGOCtxProfFlattening.cpp Re. PR #107329, 2 includes weren't necessary - the CodeGen one, in particular, seemed accidentally (IDE) introduced.	2024-09-06 15:42:46 -07:00
Kazu Hirata	f6df5cd24d	[CtxProf] Fix warnings This patch fixes: llvm/lib/Transforms/Instrumentation/PGOCtxProfFlattening.cpp:214:14: error: unused variable 'Index' [-Werror,-Wunused-variable] llvm/lib/Transforms/Instrumentation/PGOCtxProfFlattening.cpp:284:6: error: unused function 'areAllBBsReachable' [-Werror,-Wunused-function]	2024-09-06 14:57:43 -07:00
Mircea Trofin	775c50709c	[ctx_prof] Flattened profile lowering pass (#107329 ) Pass to flatten and lower the contextual profile to profile (i.e. `MD_prof`) metadata. This is expected to be used after all IPO transformations have happened. Prior to lowering, the instrumentation is maintained during IPO and the contextual profile is kept in sync (see PRs #105469, #106154). Flattening (#104539) sums up all the counters belonging to all of a function's context nodes. We first propagate counter values (from the flattened profile) using the same propagation algorithm as `PGOUseFunc::populateCounters`, then map the edge values to `branch_weights`. Functions in the module that don't have an entry in the flattened profile are deemed cold, and any `MD_prof` metadata they may have is reset. The profile summary is also reset at this point. Issue [#89287](https://github.com/llvm/llvm-project/issues/89287)	2024-09-06 13:47:08 -07:00
Ramkumar Ramachandra	a6577791d4	LV: fix style after cursory reading (NFC) (#105830 )	2024-09-06 18:41:56 +01:00
Shilei Tian	ce2e38653f	[Attributor] Add support for atomic operations in `AAAddressSpace` (#106927 )	2024-09-06 12:45:16 -04:00
Kazu Hirata	ce192b87b2	[Vectorize] Fix a warning This patch fixes: llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp:1278:12: error: unused variable 'Op0' [-Werror,-Wunused-variable]	2024-09-06 09:12:06 -07:00
Kolya Panchenko	00e40c9b5b	[LV] Support binary and unary operations with EVL-vectorization (#93854 ) The patch adds `VPWidenEVLRecipe` which represents `VPWidenRecipe` + EVL argument. The new recipe replaces `VPWidenRecipe` in `tryAddExplicitVectorLength` for each binary and unary operation. Follow up patches will extend support for remaining cases, like `FCmp` and `ICmp`	2024-09-06 11:41:36 -04:00
Sergey Kachkov	2cb4d1b1bd	[LSR] Do not create duplicated PHI nodes while preserving LCSSA form (#107380 ) Motivating example: https://godbolt.org/z/eb97zrxhx Here we have 2 induction variables in the loop: one is corresponding to i variable (add rdx, 4), the other - to res (add rax, 2). The second induction variable can be removed by rewriteLoopExitValues() method (final value of res at loop exit is unroll_iter * -2); however, this doesn't happen because we have duplicated LCSSA phi nodes at loop exit: ``` ; Preheader: for.body.preheader.new: ; preds = %for.body.preheader %unroll_iter = and i64 %N, -4 br label %for.body ; Loop: for.body: ; preds = %for.body, %for.body.preheader.new %lsr.iv = phi i64 [ %lsr.iv.next, %for.body ], [ 0, %for.body.preheader.new ] %i.07 = phi i64 [ 0, %for.body.preheader.new ], [ %inc.3, %for.body ] %inc.3 = add nuw i64 %i.07, 4 %lsr.iv.next = add nsw i64 %lsr.iv, -2 %niter.ncmp.3.not = icmp eq i64 %unroll_iter, %inc.3 br i1 %niter.ncmp.3.not, label %for.end.loopexit.unr-lcssa.loopexit, label %for.body, !llvm.loop !7 ; Exit blocks for.end.loopexit.unr-lcssa.loopexit: ; preds = %for.body %inc.3.lcssa = phi i64 [ %inc.3, %for.body ] %lsr.iv.next.lcssa11 = phi i64 [ %lsr.iv.next, %for.body ] %lsr.iv.next.lcssa = phi i64 [ %lsr.iv.next, %for.body ] br label %for.end.loopexit.unr-lcssa ``` rewriteLoopExitValues requires %lsr.iv.next value to have only 2 uses: one in LCSSA phi node, the other - in induction phi node. Here we have 3 uses of this value because of duplicated lcssa nodes, so the transform doesn't apply and leads to an extra add operation inside the loop. The proposed solution is to accumulate inserted instructions that will require LCSSA form update into SetVector and then call formLCSSAForInstructions for this SetVector once, so the same instructions don't process twice.	2024-09-06 18:39:47 +03:00
anjenner	4af249fe6e	Add usub_cond and usub_sat operations to atomicrmw (#105568 ) These both perform conditional subtraction, returning the minuend and zero respectively, if the difference is negative.	2024-09-06 16:19:20 +01:00
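A minimal, non-atomic model of the two operations named in the commit above, assuming a 32-bit unsigned operand and reading the commit message as describing the value written back to memory. The names `usubCond`, `usubSat`, and `selfCheckUSub` are hypothetical and exist only for this sketch; they are not LLVM APIs.

```cpp
#include <cassert>
#include <cstdint>

// usub_cond model: subtract only when the unsigned difference would not go
// below zero; otherwise keep the old value (the minuend).
constexpr uint32_t usubCond(uint32_t Old, uint32_t Val) {
  return Old >= Val ? Old - Val : Old;
}

// usub_sat model: subtract, clamping the result to zero instead of wrapping.
constexpr uint32_t usubSat(uint32_t Old, uint32_t Val) {
  return Old >= Val ? Old - Val : 0u;
}

// Hypothetical self-check covering the "difference would be negative" case.
inline void selfCheckUSub() {
  assert(usubCond(3u, 10u) == 3u);
  assert(usubSat(3u, 10u) == 0u);
}
```

Both helpers agree whenever `Old >= Val` and differ only in the underflow case. The sketch models the arithmetic only, not the atomicity or the value that the instruction itself yields.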
Shilei Tian	109cd11dc4	[Attributor] Skip AS specialization for volatile memory instructions (#107250 )	2024-09-06 11:00:30 -04:00
Kazu Hirata	bd1559533d	[IndVars] Avoid repeated hash lookups (NFC) (#107513 )	2024-09-06 07:40:27 -07:00
Yingwei Zheng	52fac608bd	[InstCombine] Fold `[l|a]shr iN (X-1)&~X, N-1 -> [z|s]ext(X==0)` (#107259 ) Alive2: https://alive2.llvm.org/ce/z/kwvTFn Closes #107228. `ashr iN (X-1)&~X, N-1` also exists. See https://github.com/dtcxzyw/llvm-opt-benchmark/issues/1274.	2024-09-06 21:37:50 +08:00
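The fold in the commit above can be sanity-checked by brute force for a small width. This is a sketch assuming N = 8, not a replacement for the Alive2 proof linked in the commit. The names `foldHoldsForAllI8` and `selfCheckFold` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Returns true if, for every 8-bit X, shifting (X-1)&~X right by 7 matches
// zext(X==0) for the logical shift and sext(X==0) for the arithmetic shift.
inline bool foldHoldsForAllI8() {
  for (unsigned V = 0; V < 256; ++V) {
    const uint8_t X = static_cast<uint8_t>(V);
    const uint8_t Masked = static_cast<uint8_t>((X - 1) & ~X);
    // lshr by N-1 leaves only the former sign bit, in bit 0.
    const uint8_t LShr = static_cast<uint8_t>(Masked >> 7);
    // ashr by N-1 replicates the sign bit into every bit; emulated here
    // without relying on signed right-shift behaviour.
    const uint8_t AShr = (Masked & 0x80) ? 0xFF : 0x00;
    const uint8_t ZExt = (X == 0) ? 1 : 0;
    const uint8_t SExt = (X == 0) ? 0xFF : 0x00;
    if (LShr != ZExt || AShr != SExt)
      return false;
  }
  return true;
}

// Hypothetical self-check entry point.
inline void selfCheckFold() { assert(foldHoldsForAllI8()); }
```

The top bit of `(X-1) & ~X` is set only when the borrow from `X-1` reaches bit 7 while bit 7 of `X` is clear, which requires every bit of `X` to be zero.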
ErikHogeman	78e1e6ace6	[LV] Check for vector-to-scalar casts in legalizer (#106244 ) The code makes assumptions later on the operations and their inputs being scalar in the loops that are processed, so we should make sure this is the case in the legalizer.	2024-09-06 11:20:14 +02:00
hanbeom	861caf9b31	[SCCP] Remove LoadInst if it loaded from Constant GlobalVariable (#107245 ) This patch removes a `LoadInst` when it loads from a constant GlobalVariable. This allows the `canRemoveInstruction` function to be removed.	2024-09-06 10:16:30 +02:00
Kazu Hirata	144314eaa5	[SLPVectorizer] Avoid repeated hash lookups (NFC) (#107491 )	2024-09-05 19:04:56 -07:00

37616 Commits