At this stage I'm just opportunistically deleting any code using
debug-intrinsic types, largely adjacent to calls to findDbgUsers. I'll
get to deleting that in probably one or two more commits.
Try to optimize a call to the result of a ptrauth intrinsic, potentially
into the ptrauth call bundle:
call(ptrauth.resign(p)), ["ptrauth"()] -> call p, ["ptrauth"()]
call(ptrauth.sign(p)), ["ptrauth"()] -> call p
as long as the key/discriminator are the same in sign and auth-bundle,
and we don't change the key in the bundle (to a potentially-invalid
key).
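Purely as an illustration of that condition (not the actual InstCombine change, and glossing over the ptr/i64 casts around the intrinsic), the key/discriminator check for the sign case could look roughly like this:
```cpp
#include "llvm/IR/InstrTypes.h"
#include "llvm/IR/IntrinsicInst.h"
#include "llvm/IR/LLVMContext.h"
#include <cassert>
using namespace llvm;

// call(ptrauth.sign(p)), ["ptrauth"(key, disc)] -> call p
// is only sound when the sign used the same key and discriminator that the
// call's "ptrauth" bundle would authenticate with.
static bool signMatchesCallBundle(const IntrinsicInst &Sign,
                                  const CallBase &Call) {
  assert(Sign.getIntrinsicID() == Intrinsic::ptrauth_sign);
  auto Bundle = Call.getOperandBundle(LLVMContext::OB_ptrauth);
  if (!Bundle || Bundle->Inputs.size() != 2)
    return false;
  Value *SignKey = Sign.getArgOperand(1);  // llvm.ptrauth.sign(value, key, disc)
  Value *SignDisc = Sign.getArgOperand(2);
  return SignKey == Bundle->Inputs[0].get() &&
         SignDisc == Bundle->Inputs[1].get();
}
```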
Generating a plain call to a raw unauthenticated pointer is generally
undesirable, but if we ended up seeing a naked ptrauth.sign in the first
place, we already have suspicious code. Unauthenticated calls are also
easier to spot than naked signs, so let the indirect call shine.
Note that there is an arguably unsafe extension to this, where we don't
bother checking that the key in bundle and intrinsic are the same (and
also allow folding away an auth into a bundle).
This can end up generating calls with a bundle that has an invalid key
(which an informed frontend wouldn't have otherwise done), which can be
problematic. The C code that generates this is straightforward but arguably
unreasonable. That wouldn't be an issue if we were to bite the bullet
and make these fully AArch64-specific, allowing key knowledge to be
embedded here.
Try to optimize a call to a ptrauth constant, into its ptrauth bundle:
call(ptrauth(f)), ["ptrauth"()] -> call f
as long as the key/discriminator are the same in constant and bundle.
This is the intrinsic version of #146349, and handles fabs as well as
other intrinsics.
It's largely a copy of InstCombinerImpl::foldShuffledIntrinsicOperands
but a bit simpler since we don't need to find a common mask.
Creating a separate function seems to be cleaner than trying to shoehorn
it into the existing one.
When folding `X Pred C2 ? X BOp C1 : C2 BOp C1` to `min/max(X, C2) BOp
C1`, if NUW/NSW flags are present on `X BOp C1` and could be safely
applied to `C2 BOp C1`, then they may be added on the BOp after the fold
is complete. https://alive2.llvm.org/ce/z/n_3aNJ
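As a self-contained illustration of that condition (made-up constants, not the ones from the proof link): for `X <u 8 ? (X +nuw 2) : 10` the fold yields `umin(X, 8) + 2`, and because the constant arm `8 + 2` does not wrap either, the `nuw` flag can stay on the new add.
```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Source pattern: X <u 8 ? (X + 2) : 10   (10 == 8 + 2, i.e. C2 BOp C1)
static uint8_t src(uint8_t X) { return X < 8 ? uint8_t(X + 2) : uint8_t(10); }
// Folded form: umin(X, 8) + 2; since 8 + 2 fits in i8, nuw may be preserved.
static uint8_t tgt(uint8_t X) { return uint8_t(std::min<uint8_t>(X, 8) + 2); }

int main() {
  for (int X = 0; X < 256; ++X)
    assert(src(uint8_t(X)) == tgt(uint8_t(X))); // exhaustive check over i8
}
```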
Preserving these flags can allow subsequent transforms to reorder the
min/max and the BOp, which in the case of NVPTX would enable potential
future transformations that improve instruction selection.
Having a finite Depth (or recursion limit) for computeKnownBits is very
limiting, but is currently a load-bearing necessity, as all KnownBits
are recomputed on each call and there is no caching. As a prerequisite
for an effort to remove the recursion limit altogether, either by using a
clever caching technique or by writing an easily invalidated KnownBits
analysis, make the Depth argument in APIs in ValueTracking uniformly the
last argument with a default value. This would aid in removing the
argument when the time comes, as many callers that currently pass 0
explicitly are now updated to omit the argument altogether.
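For illustration only (stub types and simplified signatures, not the real ValueTracking declarations), the shape of the change is:
```cpp
struct Value;
struct SimplifyQuery;
struct KnownBits {};

// Before: Depth sits in the middle, so every caller has to spell it out,
// usually as an explicit 0.
KnownBits computeKnownBitsOld(const Value *V, unsigned Depth,
                              const SimplifyQuery &Q);

// After: Depth is uniformly the last parameter and defaults to 0, so the
// common callers can simply omit it, which will ease deleting it later.
KnownBits computeKnownBitsNew(const Value *V, const SimplifyQuery &Q,
                              unsigned Depth = 0);
```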
We currently pull shuffles through binops and intrinsics, which is an
important canonical form for VectorCombine to be able to scalarize
vector sequences. But while binops can be folded with a constant
operand, intrinsics currently require all operands to be shufflevectors.
This extends intrinsic folding to be in line with regular binops by
reusing the constant "unshuffling" logic.
As far as I can tell, the currently folded intrinsics don't require any
special UB handling.
This change in combination with #138095 and #137823 fixes the following
C:
```c
void max(int *x, int *y, int n) {
  for (int i = 0; i < n; i++)
    x[i] += *y > 42 ? *y : 42;
}
```
Previously, it used the splatted vector form on RISC-V with `-O3 -march=rva23u64`:
```asm
vmv.s.x v8, a4
li a4, 42
vmax.vx v10, v8, a4
vrgather.vi v8, v10, 0
.LBB0_9: # %vector.body
# =>This Inner Loop Header: Depth=1
vl2re32.v v10, (a5)
vadd.vv v10, v10, v8
vs2r.v v10, (a5)
```
With this change it instead generates:
```asm
li a6, 42
max a6, a4, a6
.LBB0_9: # %vector.body
# =>This Inner Loop Header: Depth=1
vl2re32.v v8, (a5)
vadd.vx v8, v8, a6
vs2r.v v8, (a5)
```
Match icmps of binops where both operands are selects with constant arms,
i.e., `icmp pred (select A ? C1 : C2) binop (select B ? C3 : C4), C5`.
Fold such patterns by creating a truth table of the four possible
constant variants and materializing the optimal logic back from it via
the `createLogicFromTable` helper. This also generalizes an existing fold,
which has therefore been dropped.
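As a standalone illustration of the truth-table idea (hypothetical constants, not taken from the patch): for `icmp eq ((A ? 0 : 1) + (B ? 0 : 2)), 0`, enumerating the four (A, B) combinations shows the whole expression collapses to `and A, B`.
```cpp
#include <cstdio>

int main() {
  for (int A = 1; A >= 0; --A)
    for (int B = 1; B >= 0; --B) {
      int Sum = (A ? 0 : 1) + (B ? 0 : 2); // the two selects plus the binop
      bool Result = (Sum == 0);            // the icmp against C5 == 0
      std::printf("A=%d B=%d -> %d\n", A, B, Result);
    }
  // Only A=1, B=1 produces 1, so the optimal logic for this table is A & B.
}
```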
Proofs: https://alive2.llvm.org/ce/z/NS7Vzu.
Fixes: https://github.com/llvm/llvm-project/issues/138212.
Support the ptrtoint(gep null, x) -> x and ptrtoint(gep inttoptr(x), y)
-> x+y folds for the case where there is a chain of geps that ends in
null or inttoptr. This avoids some regressions from #137297.
While here, also be a bit more careful about edge cases like
pointer-to-vector splats and mismatched pointer and index sizes.
Also add a `tryGetLog2` helper that encapsulates the common pattern:
```
if (takeLog2(..., /*DoFold=*/false)) {
  Value *Log2 = takeLog2(..., /*DoFold=*/true);
  ...
}
```
Closes #122498
We would like to optimize situations of the following form, which arise
after loop vectorization + SROA:
```
loop:
  %phi = phi zeroinitializer, %interleaved
  %deinterleave_a = shufflevector %phi, poison ; pick half of the lanes
  %deinterleave_b = shufflevector %phi, poison ; pick remaining lanes
  ... %a = ... %b = ...
  %interleaved = shufflevector %a, %b ; interleave lanes of a+b
```
where the interleave and de-interleave shuffle operations cancel each
other out.
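A standalone sketch of why they cancel (plain C++ arrays standing in for the vector lanes; illustration only):
```cpp
#include <array>
#include <cassert>

int main() {
  std::array<int, 4> Phi = {10, 20, 30, 40};          // lanes a0, b0, a1, b1
  std::array<int, 2> A = {Phi[0], Phi[2]};            // deinterleave_a: even lanes
  std::array<int, 2> B = {Phi[1], Phi[3]};            // deinterleave_b: odd lanes
  std::array<int, 4> Interleaved = {A[0], B[0], A[1], B[1]}; // re-interleave
  assert(Interleaved == Phi); // the shuffle pair is a no-op on the phi value
}
```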
This could be handled by `foldOpPhi` but does not currently work because it
does not proceed when there are multiple uses of the `Phi` operation.
This extends `foldOpPhi` to allow multiple `shufflevector` uses when they
are shown to simplify for all `Phi` input values.
Introduce llvm::CmpPredicate, an abstraction over either a floating-point
predicate or an integer predicate packed with samesign information,
in order to ease extending large portions of the codebase that take a
CmpInst::Predicate to respect the samesign flag.
We have chosen to demonstrate the utility of this new abstraction by
migrating parts of ValueTracking, InstructionSimplify, and InstCombine
from CmpInst::Predicate to llvm::CmpPredicate. There should be no
functional changes, as we don't perform any extra optimizations with
samesign in this patch, or use CmpPredicate::getMatching.
The design approach taken by this patch allows unaudited callers of
APIs that take an llvm::CmpPredicate to silently drop the samesign
information; this does not pose a correctness issue, and allows us to
migrate the codebase piecewise.
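A minimal usage sketch (header path and member set as I understand them; simplified, not taken from this patch):
```cpp
#include "llvm/IR/CmpPredicate.h"
#include "llvm/IR/InstrTypes.h"
using namespace llvm;

static bool behavesAsUnsignedLess(CmpPredicate Pred) {
  // CmpPredicate converts implicitly to CmpInst::Predicate, so existing
  // switches and comparisons keep working unchanged.
  CmpInst::Predicate P = Pred;
  if (P == CmpInst::ICMP_ULT)
    return true;
  // With samesign, both operands share a sign bit, so a signed less-than
  // orders them exactly like an unsigned less-than.
  return P == CmpInst::ICMP_SLT && Pred.hasSameSign();
}
```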
Since cd16b07 (IR: introduce CmpInst::isEquivalence), there is now an
isEquivalence routine in CmpInst that we can use to determine
equivalence in foldSelectValueEquivalence. Implement this, extending it
to include floating-point equivalences as well.
Add a helper for shared folds between logical and bitwise and/or
and move the and/or of icmp and fcmp folds in there. This makes
it easier to extend to more folds.
A possible extension would be to base the current and/or of icmp
reassociation logic on this helper, so that it, for example, also
applies to fcmp.
This corresponds to the canonicalized form of some logic that was
seen in Swift-generated code for comparing optional pointers:
`(X==Z || Y==Z) ? (X==Z && Y==Z) : X==Y --> X==Y`
where `Z` was the constant `0`.
https://alive2.llvm.org/ce/z/J_3aa9
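A brute-force sanity check of that fold over a small integer domain (illustration only; the alive2 link above is the actual proof):
```cpp
#include <cassert>

int main() {
  for (int X = 0; X < 4; ++X)
    for (int Y = 0; Y < 4; ++Y)
      for (int Z = 0; Z < 4; ++Z) {
        bool Src = (X == Z || Y == Z) ? (X == Z && Y == Z) : (X == Y);
        assert(Src == (X == Y)); // the whole expression reduces to X == Y
      }
}
```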
The current visitPHINode function in InstCombine can remove dead phi
cycles (all phis have one use, which is another phi). However, it cannot
handle the case where the phis form a web (all phis have one or more
uses, and all of those uses are phis). This change extends the algorithm
so that it can also remove such dead phi webs.
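A standalone sketch of the extended check (illustration only, not the InstCombine implementation): starting from one phi, follow its users; the web is dead only if every user reached is itself a phi.
```cpp
#include <cstdio>
#include <set>
#include <vector>

struct Phi {
  std::vector<Phi *> PhiUsers; // users that are themselves phis
  bool HasNonPhiUse = false;   // any other use keeps the web alive
};

static bool isDeadPhiWeb(Phi *Root) {
  std::set<Phi *> Visited;
  std::vector<Phi *> Worklist{Root};
  while (!Worklist.empty()) {
    Phi *P = Worklist.back();
    Worklist.pop_back();
    if (!Visited.insert(P).second)
      continue; // already seen; webs may contain cycles
    if (P->HasNonPhiUse)
      return false; // a value escapes the web, so it is live
    for (Phi *U : P->PhiUsers)
      Worklist.push_back(U);
  }
  return true; // every phi only feeds other phis: the whole web is dead
}

int main() {
  Phi A, B, C;                // three phis that only feed one another
  A.PhiUsers = {&B, &C};
  B.PhiUsers = {&A};
  C.PhiUsers = {&A};
  std::printf("%d\n", isDeadPhiWeb(&A)); // 1: the web can be removed
}
```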
Added folds:
- `(add (sub X, Y), (sub Z, X))` -> `(sub Z, Y)`
- `(sub (add X, Y), (add X, Z))` -> `(sub Y, Z)`
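A hedged sketch of how the first of these folds could be matched with LLVM's PatternMatch machinery (not the actual patch; flag propagation is omitted):
```cpp
#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/InstrTypes.h"
#include "llvm/IR/PatternMatch.h"
using namespace llvm;
using namespace llvm::PatternMatch;

// (add (sub X, Y), (sub Z, X)) -> (sub Z, Y)
static Value *foldAddOfRelatedSubs(BinaryOperator &Add,
                                   IRBuilderBase &Builder) {
  Value *X, *Y, *Z;
  if (match(&Add, m_c_Add(m_Sub(m_Value(X), m_Value(Y)),
                          m_Sub(m_Value(Z), m_Deferred(X)))))
    return Builder.CreateSub(Z, Y); // the X terms cancel
  return nullptr;
}
```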
The fold is typically handled in the `Reassociate` pass, but it fails
if the inner `sub`/`add` are multi-use. Less importantly, Reassociate
doesn't propagate flags correctly.
This patch adds the fold explicitly to InstCombine.
Proofs: https://alive2.llvm.org/ce/z/p6JyRP
Closes #105866
This patch adds the aforementioned fold to InstCombine. This pattern is
produced after naive implementations of 3-way comparison in high-level
languages are transformed into LLVM IR and then optimized.
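For context, this is the kind of source-level idiom meant by a "naive" three-way comparison (an illustration, not the exact test case):
```cpp
// Front-ends lower this into a chain of compares and selects, which is the
// shape of IR such a fold targets.
int cmp3(int a, int b) {
  if (a < b)
    return -1;
  if (a > b)
    return 1;
  return 0;
}
```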
Proofs: https://alive2.llvm.org/ce/z/w4QLq_
Reapplied with a clang test fix.
-----
When simplifying a multi-use root value, the demanded bits were
reset to full, but we also need to reset the context instruction.
To make this convenient (without requiring by-value passing of
SimplifyQuery), move the logic that handles constants and
dispatches to SimplifyDemandedUseBits/SimplifyMultipleUseDemandedBits
into SimplifyDemandedBits. The SimplifyDemandedInstructionBits
caller starts with full demanded bits and an appropriate context
anyway.
The different context instruction does mean that the ephemeral
value protection no longer triggers in some cases, as the changes
to assume tests show.
An alternative, which I will explore in a followup, is to always
use SimplifyMultipleUseDemandedBits() -- the previous root special
case is only really intended for SimplifyDemandedInstructionBits(),
which now no longer shares this code path.
Fixes https://github.com/llvm/llvm-project/issues/97330.
This will enable calling SimplifyDemandedBits() with a SimplifyQuery
that has CondContext set in the future.
Additionally, this marginally strengthens the analysis by
retaining the original context instruction for one-use chains.
In #88217 a large set of matchers was changed to only accept poison
values in splats, but not undef values. This is because we now use
poison for non-demanded vector elements, and allowing undef can cause
correctness issues.
This patch covers the remaining matchers by changing the AllowUndef
parameter of getSplatValue() to AllowPoison instead. We also carry out
corresponding renames in matchers.
As a followup, we may want to change the default for things like m_APInt
to m_APIntAllowPoison (as this is much less risky when only allowing
poison), but this change doesn't do that.
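A small usage sketch of the distinction (matcher names as used above; simplified):
```cpp
#include "llvm/ADT/APInt.h"
#include "llvm/IR/PatternMatch.h"
using namespace llvm;
using namespace llvm::PatternMatch;

static bool matchSplatAllowingPoisonLanes(Value *V, const APInt *&C) {
  // m_APInt (the default) accepts no poison/undef lanes in a splat;
  // m_APIntAllowPoison tolerates poison lanes but, after this change,
  // no longer undef lanes.
  return match(V, m_APIntAllowPoison(C));
}
```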
There is one caveat here: We have a single place
(X86FixupVectorConstants) which does require handling of vector splats
with undefs. This is because this works on backend constant pool
entries, which currently still use undef instead of poison for
non-demanded elements (because SDAG as a whole does not have an explicit
poison representation). As it's just the single use, I've open-coded a
getSplatValueAllowUndef() helper there, to discourage use in any other
places.
This is the major rename patch that prior patches have built towards.
The DPValue class is being renamed to DbgVariableRecord, which reflects
the updated terminology for the "final" implementation of the RemoveDI
feature. This is a pure string substitution + clang-format patch. The
only manual component of this patch was determining where to perform
these string substitutions: `DPValue` and `DPV` are almost exclusively
used for DbgRecords, *except* for:
- llvm/lib/target, where 'DP' is used to mean double-precision, and so
appears as part of .td files and in variable names. NB: There is a
single existing use of `DPValue` here that refers to debug info, which
I've manually updated.
- llvm/tools/gold, where 'LDPV' is used as a prefix for symbol
visibility enums.
Outside of these places, I've applied several basic string
substitutions, with the intent that they only affect DbgRecord-related
identifiers; I've checked them as I went through to verify this, with
reasonable confidence that there are no unintended changes that slipped
through the cracks. The substitutions applied are all case-sensitive,
and are applied in the order shown:
```
DPValue -> DbgVariableRecord
DPVal -> DbgVarRec
DPV -> DVR
```
Following the previous rename patches, it should be the case that there
are no instances of any of these strings that are meant to refer to the
general case of DbgRecords, or anything other than the DPValue class.
The idea behind this patch is therefore that pure string substitution is
correct in all cases as long as these assumptions hold.
Instead of taking the sign of the cast operation as the required sign
for the transform, only force a sign if an operation may be negative.
This gives us more flexibility when checking whether the floats are
safely convertible to integers.
Closes #84389
The matchFunnelShift function did both the pattern matching and the
creation of the fshl/fshr instruction when needed. Move the pattern
matching code into a new function, convertOrOfShiftsToFunnelShift, so
that it can be reused by other optimizations.
This reverts commit ef388334ee5a3584255b9ef5b3fefdb244fa3fd7.
The referenced issue violates the spec for finite-only math only by
using a return value for a constant infinity. If the interpretation
is that results and arguments cannot violate nofpclass, then any
std::numeric_limits<T>::infinity() result is invalid under
-ffinite-math-only. Without this interpretation the utility of
nofpclass is slashed.