Stack-move optimization, the optimization that merges src and dest
alloca of the full-size copy, replaces all uses of the dest alloca with
src alloca. For safety, we needed to check all uses of the dest alloca
locations are dominated by src alloca, to be replaced. This PR adds the
check for that.
Fixes#65225
The transform 'scalarizeLoadExtract' can be applied to scalable
vector types if the index is less than the minimum number of elements.
The check whether the index is less than the minimum number of elements
locates at line 1175~1180. 'scalarizeLoadExtract' will call
'canScalarizeAccess' and check the returned result if this transform is safe.
At the beginning of the function 'canScalarizeAccess', the index will be
checked
1. If it is less than the number of elements of a fixed vector type.
2. If it is less than the minimum number of elements of a scalable vector type.
Otherwise 'canScalarizeAccess' will return unsafe and this transform
will be prevented.
Extend folding for `2^n` euclidean division remainder operations
on signed integers by handling the specific instance in which one
`select` arm has already been replaced by 1.
Reported-By: HypheX
Fixes: https://github.com/llvm/llvm-project/issues/66417.
Debug info correlation is an option in InstrProfiling pass, which is used by
both IR instrumentation and front-end instrumentation. So, Clang coverage can
also benefits the binary size saving from it.
Reviewed By: ellis
Differential Revision: https://reviews.llvm.org/D157913
One of the main user of these kind of coroutines is swift. There yield-once (`retcon.once`) coroutines are used to temporary "expose" pointers to internal fields of various objects creating borrow scopes.
However, in some cases it might be useful also to allow these coroutines to produce a normal result, but there is no convenient way to represent this (as compared to switched-resume kind of coroutines where C++ `co_return`
is transformed to a member / callback call on promise object).
The extension is simple: we allow continuation function to have a non-void result and accept optional extra arguments via a special `llvm.coro.end.result` intrinsic that would essentially forward them as normal results.
On the very first iteration for the reductions, when trying to build
reduction for boolean logic operations, no need to compare LHS/RHS with
the Reduction(VectorizedTree), need to compare with actual parameters of
the reduction operations.
This adds an additional transform to drop zero-size memcpys, also in
the case where the size is only zero after instruction simplification.
The motivation is the case from PR54983 where the size is non-trivially
zero, and processMemSetMemCpyDependence() keeps trying to reduce the
memset size by zero bytes.
This fix it's not really principled. It only works on the premise that
if InstSimplify doesn't realize the size is zero, then AA also won't.
The principled approach would be to instead add a isKnownNonZero()
guard to the processMemSetMemCpyDependence() transform, but I
suspect that would render that optimization mostly useless (at least
it breaks all the existing test coverage -- worth noting that the
constant size case is also handled by DSE, so I think this transform
is primarily about the dynamic size case).
Fixes https://github.com/llvm/llvm-project/issues/54983.
Fixes https://github.com/llvm/llvm-project/issues/64886.
Differential Revision: https://reviews.llvm.org/D124078
Duplicate phi nodes were being directly removed, without
invalidating MDA. This could result in a new phi node being
allocated at the same address, incorrectly reusing a cache entry.
Fix this by optionally allowing EliminateDuplicatePHINodes() to
collect phi nodes to remove into a vector, which allows GVN to
handle removal itself.
Fixes https://github.com/llvm/llvm-project/issues/64598.
Differential Revision: https://reviews.llvm.org/D158849
This pass will upgrade DXIL-style llvm constructs (which are mostly
metadata) into the representations we use in LLVM for the same concepts.
For now we just strip the valver metadata, which we don't need. Later
changes will make this pass more useful, and then we should be able to
wire it into clang and possibly the DirectX backend's AsmParser.
Since we no longer support typed pointers in LLVM IR, the PtrASXTy
in isLoadInvariantInLoop was set to be equal to Addr->getType() (an
opaque ptr in the same address space). That made the loop looking
through bitcasts redundant.
When pushing a sub nsw 0, %x negation into an expression, try to
preserve the nsw flag for the cases where this is possible. Do this
by passing the flag through recursive Negator::negate() calls.
Proofs: https://alive2.llvm.org/ce/z/oRPNcY
Differential Revision: https://reviews.llvm.org/D158510
The associated function can be a nullptr if it is an indirect call.
This causes a crash in `CheckCallee` which always assumes the callee
is a valid pointer.
Fix#66904.
findDominatingValue has a search limit, and when it is reached, optimization is
not applied. This patch fixes the issue that this limit also takes into account
debug intrinsics, so the result of optimization can depend from the presence of
debug info.
The builtin_expect(), and C++20's likely, unlikely attributes assign branch_weights to annotated branches.
This patch adds the the ability to query branch !prof metadata and improve static analysis based on that.
Fixes: https://github.com/llvm/llvm-project/issues/64998
Reviewers: tejohnson, efriedma
Differential Revision: https://reviews.llvm.org/D159336
In the 0e0ff8573de69286536e4f49098226eda0c4c7f5 was introduced
inconsistency between condition widening and checking if it's possible
to widen. We check the possibility to hoist checks parsed from the
condition, but hoist entire condition.
This patch returns testing that a condition can be hoisted rather than
the checks parsed from that condition.
Co-authored-by: Aleksander Popov <apopov@azul.com>
This patch adds support for variadic argument for loongarch64,
which is based on MIPS64. And `check-msan` all pass.
Reviewed By: vitalybuka
Differential Revision: https://reviews.llvm.org/D158587
Make sure every conditional branch constructed by `LoopUnrollRuntime`
code sets branch weights.
- Add new 1:127 weights for the conditional jumps checking whether the
whole (unrolled) loop should be skipped in the generated prolog or
epilog code.
- Remove `updateLatchBranchWeightsForRemainderLoop` function and just
add weights immediately when constructing the relevant branches. This
leads to simpler code and makes the code more obvious as every call
to `CreateCondBr` now has a `BranchWeights` parameter.
- Rework formula for epilogue latch weights, to assume equal
distribution of remainders and remove `assert` (as I was able to
reach this code when forcing small unroll factors on the commandline).
Differential Revision: https://reviews.llvm.org/D158642
This adds code to the loop rotation transformation to ensure that the
computed block execution counts for the loop bodies are the same before
and after the transformation. This isn't always true in practice, but I
believe this is because of numeric inaccuracies in the BlockFrequency
computation.
The invariants this is modeled on and heuristic choice of 0-trip loop
amount is explained in a lenghty comment in the new
`updateBranchWeights()` function.
Differential Revision: https://reviews.llvm.org/D157462
As per the stack of patches this is attached to, allow users of
BasicBlock::splitBasicBlock to provide an iterator for a position, instead
of just an instruction pointer. This is to fit with my proposal for how to
get rid of debug intrinsics [0]. There are other call-sites that would need
to change, but this is sufficient for a stage2clang self host and some
other C++ projects to build identical binaries, in the context of the whole
remove-DIs project.
[0] https://discourse.llvm.org/t/rfc-instruction-api-changes-needed-to-eliminate-debug-intrinsics-from-ir/68939
Differential Revision: https://reviews.llvm.org/D152545
As per my proposal for how to eliminate debug intrinsics [0], for various
places in InstCombine prefer to insert using an instruction iterator rather
than an instruction pointer. This is so that we can eventually pass more
information in the iterator class. These call-sites where I've changed the
spelling are those that necessary to build a stage2clang to produce an
identical binary in the coming no-debug-intrinsics mode.
[0] https://discourse.llvm.org/t/rfc-instruction-api-changes-needed-to-eliminate-debug-intrinsics-from-ir/68939
Differential Revision: https://reviews.llvm.org/D152543
Continuing the patch series to get rid of debug intrinsics [0], instruction
insertion needs to be done with iterators rather than instruction pointers,
so that we can communicate information in the iterator class. This patch
adds an iterator-taking insertBefore method and converts various call sites
to take iterators. These are all sites where such debug-info needs to be
preserved so that a stage2 clang can be built identically; it's likely that
many more will need to be changed in the future.
At this stage, this is just changing the spelling of a few operations,
which will eventually become signifiant once the debug-info bearing
iterator is used.
[0] https://discourse.llvm.org/t/rfc-instruction-api-changes-needed-to-eliminate-debug-intrinsics-from-ir/68939
Differential Revision: https://reviews.llvm.org/D152537
The Attributor has gained support for indirect calls but it is opt-in.
This patch makes AAKernelInfoCallSite able to handle multiple potential
callees.
This patch generalizes the fold of `icmp pred min/max(X, Y), Z` to address the issue https://github.com/llvm/llvm-project/issues/62898.
For example, we can fold `smin(X, Y) < Z` into `X < Z` when `Y > Z` is implied by constant folds/invariants/dom conditions.
Alive2 (with `--disable-undef-input` due to the limitation of --smt-to=10000): https://alive2.llvm.org/ce/z/rB7qLc
You can run the standalone translation validation tool `alive-tv` locally to verify these transformations.
```
alive-tv transforms.ll --smt-to=600000 --exit-on-error
```
Reviewed By: goldstein.w.n
Differential Revision: https://reviews.llvm.org/D156238
Fix crash on RAUW due to locals and globals having different address
spaces. This is the intent of the original code, but it assumes the
alloca address space is 0. This patch fixes the code to check that the
global's address space matches `DL.getAllocaAddrSpace()` instead.
Fixes#65155
This patch adds a hidden CLI option "--sroa-max-alloca-slices", which is
an integer that controls the maximum number of alloca slices SROA can
consider before bailing out. This is useful because it may not be
profitable to split memcpys into (possibly tens of) thousands of loads/stores.
This also prevents an issue with exponential compile time explosion in passes
like DSE and MemCpyOpt caused by excessive alloca splitting.
Fixes https://github.com/rust-lang/rust/issues/88580.
Differential Revision: https://reviews.llvm.org/D159354
If the scalar does not need to be scheduled and it was vectorized
already in one of the vector nodes, we still can try to vectorize it in
another node. Just does not need account its cost in the scalar total
cost, as it will be handled in the main vectorized node.
Differential Revision: https://reviews.llvm.org/D159205