37616 Commits

Author SHA1 Message Date
Tyler Nowicki
9a9f155df1
[Coroutines] Split buildCoroutineFrame into normalization and frame building (#108076)
* Split buildCoroutineFrame into code related to normalization and code
related to actually building the coroutine frame.
* This will enable future specialization of buildCoroutineFrame for
different ABIs while the normalization can be done by splitCoroutine
prior to calling buildCoroutineFrame.

See RFC for more info:
https://discourse.llvm.org/t/rfc-abi-objects-for-coroutines/81057
2024-09-11 10:29:06 -04:00
Nikita Popov
2afe678f0a
[MemCpyOpt] Allow memcpy elision for non-noalias arguments (#107860)
We currently elide memcpys for readonly nocapture noalias arguments.
noalias is checked to make sure that there are no other ways to write
the memory, e.g. through a different argument or an escaped pointer.

In addition to the current noalias check, also query alias analysis, in
case it can prove that modification is not possible through other means.

This fixes the problem reported in
https://discourse.llvm.org/t/problem-about-memcpy-elimination/81121.
2024-09-11 10:04:37 +02:00
AdityaK
3c9022c965
Bail out jump threading on indirect branches (#103688)
The bug was introduced by
https://github.com/llvm/llvm-project/pull/68473

Fixes: #102351
2024-09-10 22:39:02 -07:00
Kazu Hirata
3dad29b677
[LTO] Remove unused includes (NFC) (#108110)
clangd reports these as unused headers.  My manual inspection agrees
with the findings.
2024-09-10 19:36:04 -07:00
Teresa Johnson
ae5f1a78d3
[MemProf] Convert CallContextInfo to a struct (NFC) (#108086)
As suggested in #107918, improve readability by converting this tuple to
a struct.
2024-09-10 16:27:56 -07:00
Florian Hahn
e3c537ff90
[VPlan] Consider non-header phis in planContainsAdditionalSimp.
Update planContainsAdditionalSimplifications to also check phis not in
the loop header. This ensures we don't miss cases where VPBlendRecipes
(which correspond to such phis) have been simplified.

Fixes https://github.com/llvm/llvm-project/issues/107473.
2024-09-10 21:37:14 +01:00
Tyler Nowicki
f4e2d7bfc1
[Coroutines] Move spill related methods to a Spill utils (#107884)
* Move code related to spilling into SpillUtils to help cleanup
CoroFrame

See RFC for more info:
https://discourse.llvm.org/t/rfc-abi-objects-for-coroutines/81057
2024-09-10 15:43:57 -04:00
Shubham Sandeep Rastogi
7a91af4f87
Add DIExpression::foldConstantMath to CoroSplit (#107933)
The CoroSplit pass has its own salvageDebugInfo implementation and its
DIExpressions do not get folded. Add a call to
DIExpression::foldConstantMath in the CoroSplit pass to reduce the size
of those DIExpressions.

[The compile time tracker shows no significant increase in compile time
either.](https://llvm-compile-time-tracker.com/compare.php?from=bdf02249e7f8f95177ff58c881caf219699acb98&to=e1c1c1759c06bc4c42f79eebdb0e3cd45219cef4&stat=instructions:u)

rdar://134675402
2024-09-10 11:27:01 -07:00
Teresa Johnson
524a028f69
[MemProf] Streamline and avoid unnecessary context id duplication (#107918)
Sort the list of calls such that those with the same stack ids are also
sorted by function. This allows processing of all matching calls (that
can share a context node) in bulk as they are all adjacent.

This has 2 benefits:
1. It reduces unnecessary work, specifically the handling to intersect
   the context ids with those along the graph edges for the stack ids,
   for calls that we know can share a node.
2. It simplifies detecting when we have matching stack ids but don't
   need to duplicate context ids. Specifically, we were previously
   still duplicating context ids whenever we saw another call with the
   same stack ids, but that isn't necessary if they will share a context
   node. With this change we now only duplicate context ids if we see
   some that not only have the same ids but also are in different
   functions.

This change reduced the amount of context id duplication and provided
reductions in both peak memory (~8%) and time (~5%) for a large
target.
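
To illustrate the sorting idea described above, here is a minimal standalone sketch. It does not use the real MemProf data structures: `CallInfo`, `sortCalls` and `countDuplications` are hypothetical names, and the rule "duplicate only when the stack ids match but the function differs" is a simplification of the description in this message.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical, simplified record for one call: its stack ids and the
// function that contains it.
struct CallInfo {
  std::vector<std::uint64_t> stackIds;
  std::string func;
};

// Sort so that calls with equal stack ids are adjacent, and within such a
// run are ordered by function.
inline void sortCalls(std::vector<CallInfo> &calls) {
  std::sort(calls.begin(), calls.end(),
            [](const CallInfo &a, const CallInfo &b) {
              return std::tie(a.stackIds, a.func) <
                     std::tie(b.stackIds, b.func);
            });
}

// Assumed model: a duplication is needed only when a call has the same
// stack ids as its predecessor but lives in a different function.
inline int countDuplications(std::vector<CallInfo> calls) {
  sortCalls(calls);
  int dups = 0;
  for (std::size_t i = 1; i < calls.size(); ++i) {
    bool sameIds = calls[i].stackIds == calls[i - 1].stackIds;
    bool sameFunc = calls[i].func == calls[i - 1].func;
    if (sameIds && !sameFunc)
      ++dups;
  }
  return dups;
}

// Example input: three calls share stack ids {1, 2}; two of them are in "f".
inline int memprofSortExample() {
  std::vector<CallInfo> calls;
  calls.push_back(CallInfo{{1, 2}, "f"});
  calls.push_back(CallInfo{{3}, "g"});
  calls.push_back(CallInfo{{1, 2}, "h"});
  calls.push_back(CallInfo{{1, 2}, "f"});
  return countDuplications(calls);
}
```

In this model the two calls in `f` share a node, so only the call in `h` counts as a duplication.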
2024-09-10 10:11:33 -07:00
Johannes Doerfert
56a033462e
[Attributor] Keep track of reached returns in AAPointerInfo (#107479)
Instead of visiting call sites in Attributor::checkForAllUses, we now
keep track of returns in AAPointerInfo and use the call site return
information as required. This way, the user of
AAPointerInfo(CallSite)Argument can determine if the call return should
be visited. We do not collect them as "may accesses" in the
AAPointerInfo(CallSite)Argument itself in case a return user is found.
2024-09-10 08:13:21 -07:00
Han-Kuan Chen
0ccc6092d2
[VectorCombine] Add foldShuffleOfIntrinsics. (#106502) 2024-09-10 21:10:09 +08:00
Florian Hahn
a794ee4559
[VPlan] Add VPValue for VF, use it for VPWidenIntOrFpInductionRecipe. (#95305)
Similar to VFxUF, also add a VF VPValue to VPlan and use it to get the
runtime VF in VPWidenIntOrFpInductionRecipe. Code for VF is only
generated if there are users of VF, to avoid unnecessary test changes.

PR: https://github.com/llvm/llvm-project/pull/95305
2024-09-10 10:41:35 +01:00
Igor Kirillov
bf694841f5
[VectorCombine] Add type shrinking and zext propagation for fixed-width vector types (#104606)
Check whether `binop(zext(value), other)` can be profitably transformed
into `zext(binop(value, trunc(other)))`.
When a CPU architecture has an illegal scalar type iX but a legal vector type
<N * iX>, scalar expressions may be extended to a legal type iY before
vectorisation. This extension can leave vector lanes underutilized, since
more lanes fit in one instruction with the narrower type.
Vectorisers may not always recognize opportunities for type shrinking, and
this patch aims to address that limitation.
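
As a scalar illustration of the rewrite, the sketch below checks the `and` case for i8 extended to i32. The function names are hypothetical, and the legality and cost checks the patch performs are not reproduced. For `and` the two forms always agree because the upper bits of `zext(value)` are zero; for other operators the upper bits of `other` would matter.

```cpp
#include <cassert>
#include <cstdint>

// and(zext(value), other), computed in the wide type.
inline std::uint32_t wideAnd(std::uint8_t value, std::uint32_t other) {
  return static_cast<std::uint32_t>(value) & other;
}

// zext(and(value, trunc(other))), computed in the narrow type.
inline std::uint32_t narrowAnd(std::uint8_t value, std::uint32_t other) {
  std::uint8_t truncated = static_cast<std::uint8_t>(other); // trunc(other)
  std::uint8_t narrow = static_cast<std::uint8_t>(value & truncated);
  return static_cast<std::uint32_t>(narrow); // zext back to i32
}

// Compare both forms for every i8 value against a few sample operands.
inline bool andFormsAgree() {
  const std::uint32_t others[] = {0u, 0xFFu, 0x1234u, 0xDEADBEEFu,
                                  0xFFFFFFFFu};
  for (int v = 0; v < 256; ++v) {
    std::uint8_t value = static_cast<std::uint8_t>(v);
    for (std::uint32_t other : others) {
      if (wideAnd(value, other) != narrowAnd(value, other))
        return false;
    }
  }
  return true;
}
```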
2024-09-10 10:09:03 +01:00
Yuxuan Chen
761bf333e3
[LLVM][Coroutines] Switch CoroAnnotationElidePass to a FunctionPass (#107897)
After landing https://github.com/llvm/llvm-project/pull/99285 we found
that the call graph update was causing the following crash when
expensive checks are turned on
```
llvm-project/llvm/lib/Analysis/CGSCCPassManager.cpp:982: LazyCallGraph::SCC &updateCGAndAnalysisManagerForPass(LazyCallGraph &, LazyCallGraph::SCC &, LazyCallGraph::Node &, CGSCCAnalysisManager &, CGSCCUpdateResult &, FunctionAnalysisManager &, bool): Assertion `(RC == &TargetRC || RC->isAncestorOf(TargetRC)) && "New call edge is not trivial!"' failed.
```
I have to admit I believe that the call graph update process I did for
that patch could be wrong.

After reading the code in `CGSCCToFunctionPassAdaptor`, I am convinced
that `CoroAnnotationElidePass` can be a FunctionPass and rely on the
adaptor to update the call graph for us, so long as we properly
invalidate the caller's analyses.

After this patch,
`llvm/test/Transforms/Coroutines/coro-transform-must-elide.ll` no longer
fails under expensive checks.
2024-09-09 18:57:39 -07:00
Mircea Trofin
3b22618094
[ctx_prof] Insert the ctx prof flattener after the module inliner (#107499)
This patch enables experimenting with the contextual profile. ICP is currently disabled in this case - will reenable it subsequently. Also subsequently the inline cost model / decision making would be updated to be context-aware. Right now, this just achieves "complete use" of the profile, in that it's ingested, maintained, and sunk to a flat profile when not needed anymore.

Issue [#89287](https://github.com/llvm/llvm-project/issues/89287)
2024-09-09 18:16:24 -07:00
Alexey Bataev
b3d2d5039b [SLP][NFC]Reorder code for better structural complexity, NFC 2024-09-09 12:33:18 -07:00
Tyler Nowicki
ea2da571c7
[Coroutines] Move the SuspendCrossingInfo analysis helper into its own header/source (#106306)
* Move the SuspendCrossingInfo analysis helper into its own
header/source

See RFC for more info:
https://discourse.llvm.org/t/rfc-abi-objects-for-coroutines/81057

Co-authored-by: tnowicki <tnowicki.nowicki@amd.com>
2024-09-09 11:50:27 -04:00
Teresa Johnson
e46f03bc31
[MemProf] Remove unnecessary data structure (NFC) (#107643)
Recent change #106623 added the CallToFunc map, but I subsequently
realized the same information is already available for the calls being
examined in the StackIdToMatchingCalls map we're iterating through.
2024-09-09 08:17:41 -07:00
Kazu Hirata
a2f659c134
[StructurizeCFG] Avoid repeated hash lookups (NFC) (#107797) 2024-09-09 07:15:12 -07:00
Kazu Hirata
3940a1ba14
[Float2Int] Avoid repeated hash lookups (NFC) (#107795) 2024-09-09 07:13:52 -07:00
Florian Hahn
1a5a1e9781
[VPlan] Assert that VFxUF is always used.
Add assertion to ensure invariant discussed in
https://github.com/llvm/llvm-project/pull/95305.
2024-09-09 14:26:09 +01:00
Sergey Kachkov
1f2a634c44 Reland "[LSR] Do not create duplicated PHI nodes while preserving LCSSA form" (#107380)
Motivating example: https://godbolt.org/z/eb97zrxhx
Here we have 2 induction variables in the loop: one corresponds to the
i variable (add rdx, 4), the other to res (add rax, 2). The second
induction variable can be removed by rewriteLoopExitValues() method
(final value of res at loop exit is unroll_iter * -2); however, this
doesn't happen because we have duplicated LCSSA phi nodes at loop exit:
```
; Preheader:
for.body.preheader.new:                           ; preds = %for.body.preheader
  %unroll_iter = and i64 %N, -4
  br label %for.body

; Loop:
for.body:                                         ; preds = %for.body, %for.body.preheader.new
  %lsr.iv = phi i64 [ %lsr.iv.next, %for.body ], [ 0, %for.body.preheader.new ]
  %i.07 = phi i64 [ 0, %for.body.preheader.new ], [ %inc.3, %for.body ]
  %inc.3 = add nuw i64 %i.07, 4
  %lsr.iv.next = add nsw i64 %lsr.iv, -2
  %niter.ncmp.3.not = icmp eq i64 %unroll_iter, %inc.3
  br i1 %niter.ncmp.3.not, label %for.end.loopexit.unr-lcssa.loopexit, label %for.body, !llvm.loop !7

; Exit blocks
for.end.loopexit.unr-lcssa.loopexit:              ; preds = %for.body
  %inc.3.lcssa = phi i64 [ %inc.3, %for.body ]
  %lsr.iv.next.lcssa11 = phi i64 [ %lsr.iv.next, %for.body ]
  %lsr.iv.next.lcssa = phi i64 [ %lsr.iv.next, %for.body ]
  br label %for.end.loopexit.unr-lcssa
```
rewriteLoopExitValues requires %lsr.iv.next value to have only 2 uses:
one in LCSSA phi node, the other - in induction phi node. Here we have 3
uses of this value because of duplicated lcssa nodes, so the transform
doesn't apply and leads to an extra add operation inside the loop. The
proposed solution is to accumulate the inserted instructions that will
require an LCSSA form update into a SetVector and then call
formLCSSAForInstructions on this SetVector once, so that the same
instructions are not processed twice.
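
The accumulate-then-process-once idea can be sketched outside LLVM as follows. `OrderedSet` is a hypothetical stand-in for an insertion-ordered set, and `formExitPhis` is a hypothetical stand-in for the LCSSA update; neither reflects the real LLVM interfaces.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical insertion-ordered set: remembers first-insertion order and
// ignores repeated insertions of the same element.
class OrderedSet {
public:
  // Returns true if the element was newly inserted.
  bool insert(const std::string &name) {
    if (!seen_.insert(name).second)
      return false;
    order_.push_back(name);
    return true;
  }
  const std::vector<std::string> &items() const { return order_; }

private:
  std::unordered_set<std::string> seen_;
  std::vector<std::string> order_;
};

// Hypothetical stand-in for the LCSSA update: creates one exit phi name per
// instruction in the worklist.
inline std::vector<std::string> formExitPhis(const OrderedSet &worklist) {
  std::vector<std::string> phis;
  for (const std::string &inst : worklist.items())
    phis.push_back(inst + ".lcssa");
  return phis;
}

// The same instruction is requested twice but yields a single exit phi.
inline std::vector<std::string> lcssaExample() {
  OrderedSet worklist;
  worklist.insert("%inc.3");
  worklist.insert("%lsr.iv.next");
  worklist.insert("%lsr.iv.next"); // duplicate request, ignored
  return formExitPhis(worklist);
}
```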

Reland fixes the issue with preserve-lcssa.ll test: it fails in the situation
when x86_64-unknown-linux-gnu target is unavailable in opt. The changes are
moved into separate duplicated-phis.ll test with explicit x86 target requirement
to fix bots which are not building this target.
2024-09-09 16:14:51 +03:00
Yuxuan Chen
a416267a5f
[LLVM][Coroutines] Transform "coro_elide_safe" calls to switch ABI coroutines to the noalloc variant (#99285)
This patch is episode three of the middle end implementation for the
coroutine HALO improvement project published on discourse:
https://discourse.llvm.org/t/language-extension-for-better-more-deterministic-halo-for-c-coroutines/80044

After we attribute the calls to some coroutines as "coro_elide_safe" in
the C++ FE and create a `noalloc` ramp function, we use a new middle
end pass to redirect those calls to the noalloc variant.

This pass should be run after CoroSplit. For each node we process in
CoroSplit, we look for its callers and replace the attributed ones in
presplit coroutines to the noalloc one. The transformed `noalloc` ramp
function will also require a frame pointer to a block of memory it can
use as an activation frame. We allocate this on the caller's frame with
an alloca.

Please note that we cannot safely transform such attributed calls in
post-split coroutines due to memory lifetime reasons. The CoroSplit pass
is responsible for creating the coroutine frame spills for all the
allocas in the coroutine. Therefore it will be unsafe to create new
allocas like this one in post-split coroutines. This happens relatively
rarely because CGSCC performs the passes on the callees before the
caller. However, if multiple coroutines coexist in one SCC, this
situation does happen (and prevents us from having a potentially unbounded
frame size due to recursion.)

You can find episode 1: Clang FE of this patch series at
https://github.com/llvm/llvm-project/pull/99282
Episode 2: CoroSplit at https://github.com/llvm/llvm-project/pull/99283
2024-09-08 23:09:40 -07:00
Yuxuan Chen
234cc81625
[LLVM][Coroutines] Create .noalloc variant of switch ABI coroutine ramp functions during CoroSplit (#99283)
This patch is episode two of the coroutine HALO improvement project
published on discourse:
https://discourse.llvm.org/t/language-extension-for-better-more-deterministic-halo-for-c-coroutines/80044

Previously CoroElide depended on inlining, and its analysis did not work
very well with code generated by the C++ frontend due to the existence of
many customization points. Issues have been reported upstream about how
ineffective the original CoroElide was in real-world applications.

For C++ users, this set of patches aims to fix this problem by providing
library authors and users deterministic HALO behaviour for some
well-behaved coroutine `Task` types. The stack begins with a library
side attribute on the `Task` class that guarantees no unstructured
concurrency when coroutines are directly `co_await`ed as a
prvalue. This attribute on Task types gives us lifetime guarantees and
makes the C++ FE capable of telling the ME which coroutine calls are
elidable. We convey such information from the FE through the attribute
`coro_elide_safe`.

This patch modifies CoroSplit to create a variant of the coroutine ramp
function that 1) does not use a heap-allocated frame and instead takes an
additional parameter as the pointer to the frame. This parameter is
attributed with `dereferenceable` and `align` to convey the size and
alignment requirements for the frame. 2) always stores cleanup instead of
destroy address for `coro.destroy()` actions.

In a later patch, we will have a new pass that runs right after
CoroSplit to find usages of the callee coroutine attributed
`coro_elide_safe` in presplit coroutine callers, allocate the frame on
the caller's "stack", and transform those usages to call the `noalloc`
ramp function variant.

(note I put quotes on the word "stack" here, because for presplit
coroutine, any alloca will be spilled into the frame when it's being
split)

The C++ Frontend attribute implementation that works with this change
can be found at https://github.com/llvm/llvm-project/pull/99282
The pass that makes use of the new `noalloc` split can be found at
https://github.com/llvm/llvm-project/pull/99285
2024-09-08 23:09:20 -07:00
Yuxuan Chen
e17a39bc31
[Clang] C++20 Coroutines: Introduce Frontend Attribute [[clang::coro_await_elidable]] (#99282)
This patch is the frontend implementation of the coroutine elide
improvement project detailed in this discourse post:
https://discourse.llvm.org/t/language-extension-for-better-more-deterministic-halo-for-c-coroutines/80044

This patch proposes a C++ struct/class attribute
`[[clang::coro_await_elidable]]`. This notion of await elidable task
gives developers and library authors a certainty that coroutine heap
elision happens in a predictable way.

Originally, after we lower a coroutine to LLVM IR, CoroElide is
responsible for analysis of whether an elision can happen. Take this as
an example:
```
Task foo();
Task bar() {
  co_await foo();
}
```
For CoroElide to happen, the ramp function of `foo` must be inlined into
`bar`. This inlining happens after `foo` has been split but `bar` is
usually still a presplit coroutine. If `foo` is indeed a coroutine, the
inlined `coro.id` intrinsic of `foo` is visible within `bar`. CoroElide
then runs an analysis to figure out whether the SSA value of
`coro.begin()` of `foo` gets destroyed before `bar` terminates.

`Task` types are rarely simple enough for the destroy logic of the task
to reference the SSA value from `coro.begin()` directly. Hence, the pass
is very ineffective for even the most trivial C++ Task types. Improving
CoroElide by implementing more powerful analyses is possible, however it
doesn't give us the predictability when we expect elision to happen.

The approach we want to take with this language extension generally
originates from the philosophy that library implementations of `Task`
types have control over the structured concurrency guarantees we
demand for elision to happen. That is, the lifetime of the callee's
frame is shorter than that of the caller.

The ``[[clang::coro_await_elidable]]`` is a class attribute which can be
applied to a coroutine return type.

When a coroutine function that returns such a type calls another
coroutine function, the compiler performs heap allocation elision when
the following conditions are all met:
- callee coroutine function returns a type that is annotated with
``[[clang::coro_await_elidable]]``.
- In caller coroutine, the return value of the callee is a prvalue that
is immediately `co_await`ed.

From the C++ perspective, it makes sense because we can ensure the
lifetime of elided callee cannot exceed that of the caller if we can
guarantee that the caller coroutine is never destroyed earlier than the
callee coroutine. This is not true of C++ programs in general.
However, the library that implements `Task` types and executors may
provide this guarantee to the compiler, providing the user with
certainty that HALO will work on their programs.

After this patch, when compiling coroutines that return a type with such
an attribute, the frontend checks the type of the operand of
`co_await` expressions (not `operator co_await`). If it's also
attributed with `[[clang::coro_await_elidable]]`, the FE emits metadata
on the call or invoke instruction as a hint for a later middle end pass
to perform the elision.

The original patch version is
https://github.com/llvm/llvm-project/pull/94693 and as suggested, the
patch is split into frontend and middle end solutions into stacked PRs.

The middle end CoroSplit patch can be found at
https://github.com/llvm/llvm-project/pull/99283
The middle end transformation that performs the elide can be found at
https://github.com/llvm/llvm-project/pull/99285
2024-09-08 23:08:58 -07:00
Simon Pilgrim
97e6f92d31 Fix GCC Wparentheses warning. NFC. 2024-09-08 13:34:34 +01:00
Kazu Hirata
bc59b638ae
[Vectorize] Avoid repeated hash lookups (NFC) (#107729) 2024-09-08 00:08:32 -07:00
Kazu Hirata
f5aad24820
[IROutliner] Avoid repeated hash lookups (NFC) (#107726) 2024-09-08 00:07:45 -07:00
Chaitanya
49e38606cd
[Sanitizer] Create DiagnosticInfoInstrumentation for IR Instrumentation reporting. (#106356)
This PR adds a DK_Instrumentation enumerator to DiagnosticKind and a
DiagnosticInfoInstrumentation class derived from DiagnosticInfo for IR
instrumentation reporting.
2024-09-08 10:10:16 +05:30
Kazu Hirata
caebb4562c
[Transforms] Avoid repeated hash lookups (NFC) (#107727) 2024-09-07 18:16:06 -07:00
Kazu Hirata
23a26e7120
[DFAJumpThreading] Avoid repeated hash lookups (NFC) (#107670) 2024-09-07 08:22:21 -07:00
Mircea Trofin
fe6c025037 [nfc][ctx_prof] Fix the second source of nondeterminism in CtxProfAnalysisPrinterPass
Verified on a build with `LLVM_REVERSE_ITERATION=ON`

Issue #106855
2024-09-06 21:54:23 -07:00
Mircea Trofin
d7fb5b9df0 [ctx_prof] PGOCtxProfFlattener must always return PreservedAnalyses::none()
This is because it always removes instrumentation. This fixes failures
detectable with extensive checks, e.g. https://lab.llvm.org/buildbot/#/builders/187/builds/987

(Related to PR #107329)
2024-09-06 20:02:18 -07:00
dyung
2bf551e600
Revert "[LSR] Do not create duplicated PHI nodes while preserving LCSSA form" (#107666)
Reverts llvm/llvm-project#107380

Change is causing the test preserve-lcssa.ll to fail on at least 2 build
bots:
- https://lab.llvm.org/buildbot/#/builders/190/builds/5231
- https://lab.llvm.org/buildbot/#/builders/161/builds/1855
2024-09-06 19:54:26 -07:00
Mingming Liu
d4ddf06b0c
[NFCI]Remove EntryCount from FunctionSummary and clean up surrounding synthetic count passes. (#107471)
The primary motivation is to remove `EntryCount` from `FunctionSummary`.
This frees 8 bytes out of `sizeof(FunctionSummary)` (136 bytes as of
64498c5483).

While I'm at it, this PR cleans up {SummaryBasedOptimizations,
SyntheticCountsPropagation} since they were not used and there are no
plans to further invest in them.

With this patch, the bitcode writer writes a placeholder 0 at the byte
offset of `EntryCount` and the bitcode reader can parse the function entry
count at the correct byte offset. Added a TODO to stop writing
`EntryCount` and bump the bitcode version.
2024-09-06 16:38:17 -07:00
Mircea Trofin
dc62bc8909 [nfc][ctx_prof] Remove spurious #include in PGOCtxProfFlattening.cpp
Re. PR #107329, 2 includes weren't necessary - the CodeGen one, in
particular, seemed accidentally (IDE) introduced.
2024-09-06 15:42:46 -07:00
Kazu Hirata
f6df5cd24d [CtxProf] Fix warnings
This patch fixes:

  llvm/lib/Transforms/Instrumentation/PGOCtxProfFlattening.cpp:214:14:
  error: unused variable 'Index' [-Werror,-Wunused-variable]

  llvm/lib/Transforms/Instrumentation/PGOCtxProfFlattening.cpp:284:6:
  error: unused function 'areAllBBsReachable'
  [-Werror,-Wunused-function]
2024-09-06 14:57:43 -07:00
Mircea Trofin
775c50709c
[ctx_prof] Flattened profile lowering pass (#107329)
Pass to flatten and lower the contextual profile to profile (i.e. `MD_prof`) metadata. This is expected to be used after all IPO transformations have happened.

Prior to lowering, the instrumentation is maintained during IPO and the contextual profile is kept in sync (see PRs #105469, #106154). Flattening (#104539) sums up all the counters belonging to all of a function's context nodes.

We first propagate counter values (from the flattened profile) using the same propagation algorithm as `PGOUseFunc::populateCounters`, then map the edge values to `branch_weights`. Functions in the module that don't have an entry in the flattened profile are deemed cold, and any `MD_prof` metadata they may have is reset. The profile summary is also reset at this point.
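
The flattening step can be pictured with a small standalone sketch. It assumes a simplified tree in which each context node carries a function GUID and a counter vector; `ContextNode`, `FlatProfile` and `flatten` are hypothetical names, not the types used by the pass.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical, simplified context node: one function instance in one
// calling context, with its own counters and its callee contexts.
struct ContextNode {
  std::uint64_t guid;
  std::vector<std::uint64_t> counters;
  std::vector<ContextNode> callees;
};

using FlatProfile = std::map<std::uint64_t, std::vector<std::uint64_t>>;

// Sum the counters of every context node into a per-function entry.
inline void flatten(const ContextNode &node, FlatProfile &flat) {
  std::vector<std::uint64_t> &sums = flat[node.guid];
  if (sums.size() < node.counters.size())
    sums.resize(node.counters.size(), 0);
  for (std::size_t i = 0; i < node.counters.size(); ++i)
    sums[i] += node.counters[i];
  for (const ContextNode &callee : node.callees)
    flatten(callee, flat);
}

// Function 2 appears in two different contexts; its counters are summed.
inline FlatProfile flattenExample() {
  ContextNode leafA{2, {5, 1}, {}};
  ContextNode leafB{2, {7, 3}, {}};
  ContextNode mid{3, {4}, {leafB}};
  ContextNode root{1, {10}, {leafA, mid}};
  FlatProfile flat;
  flatten(root, flat);
  return flat;
}
```

In this example function 2 ends up with counters 12 and 4, the element-wise sums over its two contexts.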

Issue [#89287](https://github.com/llvm/llvm-project/issues/89287)
2024-09-06 13:47:08 -07:00
Ramkumar Ramachandra
a6577791d4
LV: fix style after cursory reading (NFC) (#105830) 2024-09-06 18:41:56 +01:00
Shilei Tian
ce2e38653f
[Attributor] Add support for atomic operations in AAAddressSpace (#106927) 2024-09-06 12:45:16 -04:00
Kazu Hirata
ce192b87b2 [Vectorize] Fix a warning
This patch fixes:

  llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp:1278:12: error:
  unused variable 'Op0' [-Werror,-Wunused-variable]
2024-09-06 09:12:06 -07:00
Kolya Panchenko
00e40c9b5b
[LV] Support binary and unary operations with EVL-vectorization (#93854)
The patch adds `VPWidenEVLRecipe` which represents `VPWidenRecipe` + EVL
argument. The new recipe replaces `VPWidenRecipe` in
`tryAddExplicitVectorLength` for each binary and unary operation.
Follow-up patches will extend support for remaining cases, like `FCmp`
and `ICmp`.
2024-09-06 11:41:36 -04:00
Sergey Kachkov
2cb4d1b1bd
[LSR] Do not create duplicated PHI nodes while preserving LCSSA form (#107380)
Motivating example: https://godbolt.org/z/eb97zrxhx
Here we have 2 induction variables in the loop: one corresponds to the
i variable (add rdx, 4), the other to res (add rax, 2). The second
induction variable can be removed by rewriteLoopExitValues() method
(final value of res at loop exit is unroll_iter * -2); however, this
doesn't happen because we have duplicated LCSSA phi nodes at loop exit:
```
; Preheader:
for.body.preheader.new:                           ; preds = %for.body.preheader
  %unroll_iter = and i64 %N, -4
  br label %for.body

; Loop:
for.body:                                         ; preds = %for.body, %for.body.preheader.new
  %lsr.iv = phi i64 [ %lsr.iv.next, %for.body ], [ 0, %for.body.preheader.new ]
  %i.07 = phi i64 [ 0, %for.body.preheader.new ], [ %inc.3, %for.body ]
  %inc.3 = add nuw i64 %i.07, 4
  %lsr.iv.next = add nsw i64 %lsr.iv, -2
  %niter.ncmp.3.not = icmp eq i64 %unroll_iter, %inc.3
  br i1 %niter.ncmp.3.not, label %for.end.loopexit.unr-lcssa.loopexit, label %for.body, !llvm.loop !7

; Exit blocks
for.end.loopexit.unr-lcssa.loopexit:              ; preds = %for.body
  %inc.3.lcssa = phi i64 [ %inc.3, %for.body ]
  %lsr.iv.next.lcssa11 = phi i64 [ %lsr.iv.next, %for.body ]
  %lsr.iv.next.lcssa = phi i64 [ %lsr.iv.next, %for.body ]
  br label %for.end.loopexit.unr-lcssa
```
rewriteLoopExitValues requires %lsr.iv.next value to have only 2 uses:
one in LCSSA phi node, the other - in induction phi node. Here we have 3
uses of this value because of duplicated lcssa nodes, so the transform
doesn't apply and leads to an extra add operation inside the loop. The
proposed solution is to accumulate the inserted instructions that will
require an LCSSA form update into a SetVector and then call
formLCSSAForInstructions on this SetVector once, so that the same
instructions are not processed twice.
2024-09-06 18:39:47 +03:00
anjenner
4af249fe6e
Add usub_cond and usub_sat operations to atomicrmw (#105568)
These both perform conditional subtraction, returning the minuend and
zero respectively, if the difference is negative.
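
A rough model of the two operations, written with `std::atomic` rather than LLVM IR. The helper names are hypothetical. The sketch assumes an unsigned comparison and the usual `atomicrmw` convention that the instruction yields the old value, while the value stored is the difference, or the minuend (`usub_cond`) or zero (`usub_sat`) when the difference would be negative.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Models usub_cond: store old - val if old >= val, otherwise keep old.
inline std::uint32_t atomicUSubCond(std::atomic<std::uint32_t> &a,
                                    std::uint32_t val) {
  std::uint32_t old = a.load();
  // On failure compare_exchange_weak refreshes `old`, and the next loop
  // iteration recomputes the desired value from it.
  while (!a.compare_exchange_weak(old, old >= val ? old - val : old)) {
  }
  return old;
}

// Models usub_sat: store old - val if old >= val, otherwise store 0.
inline std::uint32_t atomicUSubSat(std::atomic<std::uint32_t> &a,
                                   std::uint32_t val) {
  std::uint32_t old = a.load();
  while (!a.compare_exchange_weak(old, old >= val ? old - val : 0u)) {
  }
  return old;
}

// Small walk-through of both operations on one variable.
inline void usubExample() {
  std::atomic<std::uint32_t> a{10};
  assert(atomicUSubCond(a, 3) == 10 && a.load() == 7); // 10 - 3 stored
  assert(atomicUSubCond(a, 9) == 7 && a.load() == 7);  // 7 < 9: unchanged
  assert(atomicUSubSat(a, 9) == 7 && a.load() == 0);   // 7 < 9: clamps to 0
}
```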
2024-09-06 16:19:20 +01:00
Shilei Tian
109cd11dc4
[Attributor] Skip AS specialization for volatile memory instructions (#107250) 2024-09-06 11:00:30 -04:00
Kazu Hirata
bd1559533d
[IndVars] Avoid repeated hash lookups (NFC) (#107513) 2024-09-06 07:40:27 -07:00
Yingwei Zheng
52fac608bd
[InstCombine] Fold [l|a]shr iN (X-1)&~X, N-1 -> [z|s]ext(X==0) (#107259)
Alive2: https://alive2.llvm.org/ce/z/kwvTFn
Closes #107228.

`ashr iN (X-1)&~X, N-1` also exists. See
https://github.com/dtcxzyw/llvm-opt-benchmark/issues/1274.
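
As a sanity check separate from the Alive2 proof, the fold can be verified exhaustively for N = 8 in plain C++. The function names are hypothetical. `(X-1)&~X` is the mask of bits below the lowest set bit of X, so its top bit is set only when X is zero.

```cpp
#include <cassert>
#include <cstdint>

// lshr i8 ((X-1) & ~X), 7
inline std::uint8_t lshrForm(std::uint8_t x) {
  std::uint8_t mask = static_cast<std::uint8_t>((x - 1) & ~x);
  return static_cast<std::uint8_t>(mask >> 7);
}

// ashr i8 ((X-1) & ~X), 7, modelled by replicating the sign bit explicitly.
inline std::int8_t ashrForm(std::uint8_t x) {
  std::uint8_t mask = static_cast<std::uint8_t>((x - 1) & ~x);
  return (mask & 0x80) ? std::int8_t{-1} : std::int8_t{0};
}

// Check both forms against zext(X == 0) and sext(X == 0) for every i8 value.
inline bool foldHoldsForAllI8() {
  for (int v = 0; v < 256; ++v) {
    std::uint8_t x = static_cast<std::uint8_t>(v);
    std::uint8_t zextEq = (x == 0) ? 1 : 0;
    std::int8_t sextEq = (x == 0) ? -1 : 0;
    if (lshrForm(x) != zextEq || ashrForm(x) != sextEq)
      return false;
  }
  return true;
}
```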
2024-09-06 21:37:50 +08:00
ErikHogeman
78e1e6ace6
[LV] Check for vector-to-scalar casts in legalizer (#106244)
The code later assumes that the operations and their inputs in the
processed loops are scalar, so we should make sure this is the case in
the legalizer.
2024-09-06 11:20:14 +02:00
hanbeom
861caf9b31
[SCCP] Remove LoadInst if it loaded from Constant GlobalVariable (#107245)
This patch removes a `LoadInst` when it loads from a constant
GlobalVariable. This allows the `canRemoveInstruction` function to be
removed.
2024-09-06 10:16:30 +02:00
Kazu Hirata
144314eaa5
[SLPVectorizer] Avoid repeated hash lookups (NFC) (#107491) 2024-09-05 19:04:56 -07:00