This picks up from #166028, making the `Function` argument optional:
most cases don't need to provide it, but in e.g. InstCombine's case,
where the instruction (select, branch) is not attached to a function
yet, the function needs to be passed explicitly.
Co-authored-by: Florian Hahn <flo@fhahn.com>
This patch enables `FoldOpIntoSelect` and `foldOpIntoPhi` for the cases
when Op's second parameter is a non-constant.
It doesn't seem to bring significant improvements, but the compile
time impact is negligible.
This generalizes `handleVectorPmaddIntrinsic()`:
- potentially handle floating-point type intrinsics (e.g.,
`llvm.x86.avx512bf16.dpbf16ps.512`). This usage is not enabled yet.
- "multiplication with an initialized zero guarantees that the
corresponding output becomes initialized" is now gated by a parameter
This patch addresses the profile of 2 branches:
- one that compares the 2 limits, for which we have no information (the C1, C2, see https://reviews.llvm.org/D136233)
- one that is conditioned on a condition for which we have a profile, so we reuse it
Issue #147390
The `llvm.experimental.guard` intrinsic is a `call`, so its metadata - if present - would be one value (as per `Verifier::visitProfMetadata`). That wouldn't be a correct `branch_weights` metadata. Likely, `GI->getMetadata(LLVMContext::MD_prof)` was always `nullptr`.
We can bias away from deopt instead.
Issue #147390
Update the verifier to first check if the number of incoming values
matches the number of predecessors, before using
incoming_values_and_blocks. We unfortunately also need to check here, as
this may be called before verifyPhiRecipes runs.
Also update the verifier unit tests, to actually fail for the expected
recipes.
Eventually this should be generated by tablegen for all functions.
For now add a manual implementation for sincos_stret, which I
have an immediate use for. This will allow pulling repeated code
across targets into shared call sequence code.
Also add sqrt just to make sure we can handle adding return attributes
on the declaration.
PR #159163's probability computation for epilogue loops does not handle
the possibility of an original loop probability of one. Runtime loop
unrolling does not make sense for such an infinite loop, and a division
by zero results. This patch works around that case.
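To illustrate why a probability of one is a problem, here is a minimal sketch that assumes a simple geometric model in which the latch takes the backedge with a fixed probability on every iteration. It is not the computation from PR #159163; `expectedIterations` and its parameters are hypothetical names chosen for this example.

```cpp
#include <cassert>

// Expected number of iterations under a geometric model: 1 / (1 - P), where P
// is the probability of taking the backedge. Returns false when P == 1, since
// the loop is then infinite and the denominator would be zero.
// Hypothetical helper for illustration only.
bool expectedIterations(double BackedgeProb, double &Result) {
  assert(BackedgeProb >= 0.0 && BackedgeProb <= 1.0 &&
         "probability out of range");
  if (BackedgeProb == 1.0)
    return false; // Infinite loop: bail out instead of dividing by zero.
  Result = 1.0 / (1.0 - BackedgeProb);
  return true;
}
```

A caller would skip the trip-count-based transformation whenever the helper returns false.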
Issue #165998.
ConstantRange uses `[-1, -1)` as the canonical form of a full set.
Therefore, the `for (APInt I = Lower; I != Upper; ++I)` idiom doesn't
work for full ranges. This patch fixes the value enumeration in
`ConstantComparesGatherer` to prevent missing values for full sets.
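The pitfall can be reproduced without LLVM. The sketch below uses a toy 8-bit wrapping range rather than `ConstantRange` and `APInt`; `ToyRange`, `enumerateNaive`, and `enumerateAll` are hypothetical names, and the fix shown is just one possible way to handle the full set, not necessarily the one in this patch.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Toy half-open wrapping range [Lower, Upper) over 8-bit values.
// Lower == Upper == 0xFF (i.e. -1) denotes the full set, mirroring the
// canonical form described above.
struct ToyRange {
  uint8_t Lower;
  uint8_t Upper;
  bool isFullSet() const { return Lower == Upper && Lower == 0xFF; }
};

// The naive idiom: the loop condition is false immediately when
// Lower == Upper, so a full set yields no values.
std::vector<uint8_t> enumerateNaive(ToyRange R) {
  std::vector<uint8_t> Values;
  for (uint8_t I = R.Lower; I != R.Upper; ++I)
    Values.push_back(I);
  return Values;
}

// One possible fix: for a full set, emit the first value unconditionally and
// then walk until the iterator wraps back around to Upper.
std::vector<uint8_t> enumerateAll(ToyRange R) {
  std::vector<uint8_t> Values;
  uint8_t I = R.Lower;
  if (R.isFullSet()) {
    Values.push_back(I);
    ++I; // Wraps from 0xFF to 0x00.
  }
  for (; I != R.Upper; ++I)
    Values.push_back(I);
  assert(!R.isFullSet() || Values.size() == 256);
  return Values;
}
```

With the full set `{0xFF, 0xFF}`, `enumerateNaive` returns nothing while `enumerateAll` returns all 256 values; for ordinary and wrapped ranges the two agree.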
Closes https://github.com/llvm/llvm-project/issues/166369.
Currently, `LoopFullUnrollPass` incorrectly performs partial unrolling
when `#pragma unroll` is specified and both `TripCount` and
`MaxTripCount` are unknown. This patch adds a check to prevent partial
unrolling when `OnlyFullUnroll` parameter is true and both trip count
values are zero.
Update isNoWrap to only use the inbounds/nusw flags from GEPs that are
guaranteed to be dereferenced on every iteration. This fixes a case
where we incorrectly determine no dependence.
I think the issue is isolated to code that evaluates the resulting
AddRec at BTC, just using it to compute the distance between accesses
should still be fine; if the access does not execute in a given
iteration, there's no dependence in that iteration. But isolating the
code is not straightforward, so be conservative for now. The practical
impact should be very minor (only one loop changed across a corpus with
27k modules from large C/C++ workloads).
Fixes https://github.com/llvm/llvm-project/issues/160912.
PR: https://github.com/llvm/llvm-project/pull/161445
This change aims to avoid inserting a freeze instruction between the
load and bitcast when scalarizing extend-extract. This is particularly
useful in combination with
https://github.com/llvm/llvm-project/pull/164682, which can then
potentially further scalarize, provided there is no freeze.
alive2 proof: https://alive2.llvm.org/ce/z/W-GD88
Also add a corresponding intrinsic property that can be used to mark
intrinsics that do not introduce poison, for example simple arithmetic
intrinsics that propagate poison just like a simple arithmetic
instruction.
As a smoke test this patch adds the new property to
llvm.amdgcn.fmul.legacy.
Propagate alignment through ptrmask based on potential constant values
of mask and align of ptr.
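
As a rough sketch of the underlying arithmetic (not the code in this patch): `ptr & mask` has at least as many trailing zero bits as either operand, so for a single known mask the result is aligned to the larger of the pointer's alignment and the mask's lowest set bit. `alignFromMask` and `ptrmaskAlign` are hypothetical names; with several potential mask values, one would take the minimum of the mask-derived alignments first.

```cpp
#include <cassert>
#include <cstdint>

// Alignment implied by a constant mask: the value of its lowest set bit.
// A zero mask clears every bit; this sketch caps that case at 2^63.
uint64_t alignFromMask(uint64_t Mask) {
  if (Mask == 0)
    return uint64_t{1} << 63;
  return Mask & (~Mask + 1); // Isolate the lowest set bit.
}

// Alignment of (ptr & Mask), given the known alignment of ptr.
uint64_t ptrmaskAlign(uint64_t PtrAlign, uint64_t Mask) {
  assert(PtrAlign != 0 && (PtrAlign & (PtrAlign - 1)) == 0 &&
         "alignment must be a power of two");
  uint64_t MaskAlign = alignFromMask(Mask);
  return PtrAlign > MaskAlign ? PtrAlign : MaskAlign;
}
```

For example, a 4-byte-aligned pointer masked with `~15` yields a 16-byte-aligned result, while a 32-byte-aligned pointer masked with `~7` stays 32-byte aligned.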
---------
Co-authored-by: Shilei Tian <i@tianshilei.me>
Previously, sign-extending a 1-bit boolean operand in `#DBG_VALUE` would
convert `true` to -1 (i.e., 0xffffffffffffffff). However, DWARF treats
booleans as unsigned values, so this resulted in the attribute
`DW_AT_const_value(0xffffffffffffffff)` being emitted. As a result, the
debugger would display the value as `255` instead of `true`.
This change modifies the behavior to use zero-extension for 1-bit values
instead, ensuring that `true` is represented as 1. Consequently, the
DWARF attribute emitted is now `DW_AT_const_value(1)`, which allows the
debugger to correctly display the boolean as `true`.
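The difference between the two extensions can be shown in a few lines of standalone C++. This is only an illustration of the arithmetic; `signExtend` and `zeroExtend` are hypothetical helpers, not the functions touched by this change.

```cpp
#include <cassert>
#include <cstdint>

// Extend the low `Bits` bits of Value to 64 bits by replicating the top bit.
uint64_t signExtend(uint64_t Value, unsigned Bits) {
  assert(Bits >= 1 && Bits <= 64 && "bit width out of range");
  if (Bits == 64)
    return Value;
  uint64_t Mask = (uint64_t{1} << Bits) - 1;
  uint64_t SignBit = uint64_t{1} << (Bits - 1);
  Value &= Mask;
  return (Value ^ SignBit) - SignBit;
}

// Extend the low `Bits` bits of Value to 64 bits by filling with zeros.
uint64_t zeroExtend(uint64_t Value, unsigned Bits) {
  assert(Bits >= 1 && Bits <= 64 && "bit width out of range");
  if (Bits == 64)
    return Value;
  return Value & ((uint64_t{1} << Bits) - 1);
}
```

For a 1-bit `true`, `signExtend(1, 1)` produces `0xffffffffffffffff`, whereas `zeroExtend(1, 1)` produces `1`, which is the value an unsigned boolean encoding expects.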
For the simplification
```
(C && A) || (!C && B) --> sel C, A, B
```
(and related), if `C` (or `!C`) is the condition in the select
instruction representing the logical and, we can preserve that logical
and's branch weights when emitting the new instruction. Otherwise, the
profile data is unknown.
If `C` is the condition of both logical ands, then we just take the
branch weights of the first logical and (though in practice they should
be equal).
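
As a quick sanity check of the simplification itself, the following sketch compares both forms over all eight boolean inputs. It models plain `bool` values only and ignores poison, which the IR-level logical and/or selects do have to account for; the function names are made up for this example.

```cpp
#include <cassert>

// Left-hand side of the simplification.
bool originalForm(bool C, bool A, bool B) { return (C && A) || (!C && B); }

// Right-hand side: a select on C.
bool selectForm(bool C, bool A, bool B) { return C ? A : B; }

// Returns true if both forms agree on every combination of C, A, and B.
bool formsAgree() {
  for (int Bits = 0; Bits < 8; ++Bits) {
    bool C = (Bits & 4) != 0;
    bool A = (Bits & 2) != 0;
    bool B = (Bits & 1) != 0;
    if (originalForm(C, A, B) != selectForm(C, A, B))
      return false;
  }
  return true;
}
```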
Furthermore, `select-safe-transforms.ll` now passes under the profcheck
configuration, so we remove it from the failing tests.
Tracking issue: #147390
If the parent node is non-schedulable (only externally used instructions), and at least one instruction has multiple uses and is used in the binop, such a copyable node should be created. Otherwise, it may contain a wrong def-use chain model, which cannot be effectively detected.
Fixes #166035
Currently in optimizeMaskToEVL we convert every widened load, store or
reduction to a VP predicated recipe with EVL, regardless of whether or
not it uses the header mask.
So currently we have to be careful when working on other parts of VPlan to
make sure that the EVL transform doesn't break or transform something
incorrectly, because it's not a semantics preserving transform.
Forgetting to do so has caused miscompiles before, like the case that
was fixed in #113667
This PR rewrites it to work in terms of pattern matching, so it now only
converts a recipe to a VP predicated recipe if it is exactly masked with
the header mask.
After this the transform should be a true optimisation and not change
any semantics, so it shouldn't miscompile things if other parts of VPlan
change.
This fixes #152541, and allows us to move addExplicitVectorLength into
tryToBuildVPlanWithVPRecipes in #153144.
It also splits out the load/store transforms into separate patterns for
reversed and non-reversed, which should make #146525 easier to implement
and reason about.
The IR pattern is compiled from OpenCL code:

    __builtin_astype(x > (uchar2)(0) ? x : -x, uchar2);

where smax is created by foldSelectInstWithICmp + canonicalizeSPF.
smax could also come from a direct elementwise max call:

    int c = b > (int)(0) ? (int)(0) : -b;
    int d = __builtin_elementwise_max(b, (int)(0));
    *a = c | d;

https://alive2.llvm.org/ce/z/2-brvr
https://alive2.llvm.org/ce/z/Dowjzk
https://alive2.llvm.org/ce/z/kathwZ
---------
Co-authored-by: Yingwei Zheng <dtcxzyw@qq.com>
Co-authored-by: Matt Arsenault <arsenm2@gmail.com>
BranchOnCount and BranchOnCond do not read memory, but cannot be moved.
Mark them as having side-effects, but not reading/writing memory, which
more accurately models the above. This allows removing some special
checking for branches both in the current code and future patches.
Update VPInstruction constructor to accept VPIRMetadata between the
Flags and DebugLoc parameters. This allows metadata to be passed during
construction rather than assigned afterward.
PR #161000 introduced a bug whereby the IR would become invalid by having an unconditional branch with `!prof` attached to it. This only became evident in PR #165744, because the IR of `test/Transforms/SimplifyCFG/pr165301.ll` was simple enough to both (1) introduce the unconditional branch, and (2) survive in that fashion until the end of the pass (simplifycfg) and thus trip the verifier.
If the alternate operation is stricter than the main operation, we
cannot rely on the analysis of the main operation. In such a case, it is
better to avoid doing the analysis at all, since it may affect the overall
result and lead to incorrect optimization.
Fixes #165878
`simplifySwitchOfPowersOfTwo` converts (when applicable, see `00f5a1e30b`) a switch to a conditional branch. Its false case goes to the `default` target of the former switch, and the true case goes to a BB performing a `cttz`. We can calculate the branch weights from the branch weights of the old switch.
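One plausible way to derive the new weights, shown here as a sketch rather than as the code in this patch: since the first `branch_weights` entry of a switch belongs to the default destination, the false edge can reuse it and the true edge can take the sum of the case weights. `condBranchWeightsFromSwitch` is a hypothetical helper, and the sum is kept in 64 bits to sidestep overflow.

```cpp
#include <cassert>
#include <cstdint>
#include <numeric>
#include <utility>
#include <vector>

// SwitchWeights[0] is the default destination's weight; the remaining
// entries are the per-case weights. Returns {TrueWeight, FalseWeight} for a
// conditional branch whose false edge goes to the old default target.
std::pair<uint64_t, uint64_t>
condBranchWeightsFromSwitch(const std::vector<uint32_t> &SwitchWeights) {
  assert(!SwitchWeights.empty() && "expected at least the default weight");
  uint64_t FalseWeight = SwitchWeights.front();
  uint64_t TrueWeight = std::accumulate(SwitchWeights.begin() + 1,
                                        SwitchWeights.end(), uint64_t{0});
  return {TrueWeight, FalseWeight};
}
```

For switch weights `{10, 20, 30, 40}`, this yields a true weight of 90 and a false weight of 10.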
Issue #147390
As another step in issue #135812, this patch fixes block frequencies for
partial loop unrolling with an epilogue remainder loop. It does not
fully handle the case when the epilogue loop itself is unrolled. That
will be handled in the next patch.
For the guard and latch of each of the unrolled loop and epilogue loop,
this patch sets branch weights derived directly from the original loop
latch branch weights. The total frequency of the original loop body,
summed across all its occurrences in the unrolled loop and epilogue
loop, is the same as in the original loop. This patch also sets
`llvm.loop.estimated_trip_count` for the epilogue loop instead of
relying on the epilogue's latch branch weights to imply it.
This patch fixes branch weights in tests that PR #157754 adversely
affected.
This patch implements the LoopUnroll changes discussed in [[RFC] Fix
Loop Transformations to Preserve Block
Frequencies](https://discourse.llvm.org/t/rfc-fix-loop-transformations-to-preserve-block-frequencies/85785)
and is thus another step in addressing issue #135812.
In summary, for the case of partial loop unrolling without a remainder
loop, this patch changes LoopUnroll to:
- Maintain branch weights consistently with the original loop for the
sake of preserving the total frequency of the original loop body.
- Store the new estimated trip count in the
`llvm.loop.estimated_trip_count` metadata, introduced by PR #148758.
- Correct the new estimated trip count (e.g., 3 instead of 2) when the
original estimated trip count (e.g., 10) divided by the unroll count
(e.g., 4) leaves a remainder (e.g., 2).
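
The trip count correction in the last bullet amounts to rounding the division up. The sketch below reflects that reading of the example numbers only; `unrolledEstimatedTripCount` is a hypothetical name, and the actual LoopUnroll code may compute this differently.

```cpp
#include <cassert>
#include <cstdint>

// Estimated trip count of the unrolled loop: the original estimate divided by
// the unroll count, rounded up when the division leaves a remainder.
uint64_t unrolledEstimatedTripCount(uint64_t OrigTripCount,
                                    uint64_t UnrollCount) {
  assert(UnrollCount != 0 && "unroll count must be non-zero");
  uint64_t Quotient = OrigTripCount / UnrollCount;
  uint64_t Remainder = OrigTripCount % UnrollCount;
  return Remainder != 0 ? Quotient + 1 : Quotient;
}
```

With the numbers from the example, `unrolledEstimatedTripCount(10, 4)` gives 3 rather than the truncated 2.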
There are loop unrolling cases this patch does not fully fix, such as
partial unrolling with a remainder loop and complete unrolling, and
there are two associated tests whose branch weights this patch adversely
affects. They will be addressed in future patches that should land with
this patch.