Add two IR pattern matchers, `m_IToFP` and `m_FPToI`, which are
shorthand for `m_CombinedOr(..., ...)`.
> if there isn't already one, PatternMatch should have an m_ItoFP which
covers both
_Originally posted by @arsenm in
https://github.com/llvm/llvm-project/pull/185826#discussion_r2967473936_
/cc @arsenm
This reverts commit ea3fdc5972db7f2d459e543307af05c357f2be26.
Re-enable const-folding for maxnum/minnum in the middle-end, GlobalISel,
and SelectionDAG.
Re-enable optimizations that depend on maxnum/minnum sNaN semantics in
InstCombine and DAGCombiner.
Now that maxnum(x, sNaN) is specified to non-deterministically produce
either NaN or x, these constant foldings and optimizations are valid
again under the semantics clarified in #172012.
Add a new helper function `matchEquivZeroRHS()` that recognizes
comparisons with constants that are equivalent to comparisons with zero,
and transforms the predicate accordingly.
This handles the following transformations:
- icmp sgt X, -1 --> icmp sge X, 0
- icmp sle X, -1 --> icmp slt X, 0
- icmp [us]ge X, 1 --> icmp [us]gt X, 0
- icmp [us]lt X, 1 --> icmp [us]le X, 0
This enables more optimization opportunities in `simplifyICmpWithZero`,
such as folding icmp sgt X, -1 when X is known to be non-negative.
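The four rewrites listed above can be checked exhaustively on a small bit width. The following is a minimal sketch in Python, not the LLVM implementation: the table `REWRITES` and the helper `rewrites_hold` are hypothetical names, and an 8-bit integer type is assumed for the brute-force loop.

```python
import operator

BITS = 8  # assumed width for the exhaustive check

OPS = {"gt": operator.gt, "ge": operator.ge,
       "lt": operator.lt, "le": operator.le}

def to_signed(v):
    """Reinterpret an unsigned BITS-wide value as two's-complement signed."""
    return v - (1 << BITS) if v >= 1 << (BITS - 1) else v

# (predicate, is_signed, constant) -> equivalent predicate against zero
REWRITES = {
    ("gt", True, -1): "ge",   # icmp sgt X, -1 --> icmp sge X, 0
    ("le", True, -1): "lt",   # icmp sle X, -1 --> icmp slt X, 0
    ("ge", True, 1): "gt",    # icmp sge X, 1  --> icmp sgt X, 0
    ("ge", False, 1): "gt",   # icmp uge X, 1  --> icmp ugt X, 0
    ("lt", True, 1): "le",    # icmp slt X, 1  --> icmp sle X, 0
    ("lt", False, 1): "le",   # icmp ult X, 1  --> icmp ule X, 0
}

def rewrites_hold():
    """Return True if every rewrite agrees with the original on all inputs."""
    for (pred, signed, c), zero_pred in REWRITES.items():
        for raw in range(1 << BITS):
            x = to_signed(raw) if signed else raw
            if OPS[pred](x, c) != OPS[zero_pred](x, 0):
                return False
    return True
```

Calling `rewrites_hold()` walks all 256 values for each of the six signed/unsigned variants.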
---
- IR Impact: https://github.com/dtcxzyw/llvm-opt-benchmark/pull/3414
Following on from #170796, this PR implements the second part of
https://discourse.llvm.org/t/rfc-allow-non-constant-offsets-in-llvm-vector-splice/88974
by allowing non-constant offsets in the vector splice intrinsics.
Previously @llvm.vector.splice had a restriction enforced by the
verifier that the offset had to be known to be within the range of the
vector at compile time. Because we can't enforce this with non-constant
offsets, it's been relaxed so that offsets that would slide the vector
out of bounds return a poison value, similar to
insertelement/extractelement.
@llvm.vector.splice.left also previously only allowed offsets within the
range 0 <= Offset < N, but this has been relaxed to 0 <= Offset <= N so
that it's consistent with @llvm.vector.splice.right.
In lieu of the verifier checks that were removed, InstSimplify has been
taught to fold splices to poison when the offset is out of bounds.
The cost model isn't implemented in this PR, and just returns invalid
for any non-constant offsets for now. I think the correct way to cost
these non-constant offsets isn't through getShuffleCost, because it
can't handle variable masks, but instead just through
getIntrinsicInstCost.
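As a rough illustration of the relaxed semantics, here is a Python model of the two splice intrinsics on fixed-length lists. It rests on assumptions that should be checked against the LangRef: splice.left is taken to select N elements of the concatenation starting at `offset`, splice.right is taken to select N elements starting at `N - offset`, and the `POISON` sentinel and function names are hypothetical.

```python
POISON = "poison"  # sentinel standing in for an LLVM poison value

def splice_left(a, b, offset):
    """Model of splice.left: window of concat(a, b) starting at offset."""
    n = len(a)
    assert len(b) == n, "both operands must have the same length"
    if not 0 <= offset <= n:
        return POISON  # out-of-bounds offset, as described in the text
    return (a + b)[offset:offset + n]

def splice_right(a, b, offset):
    """Model of splice.right: window of concat(a, b) starting at n - offset."""
    n = len(a)
    assert len(b) == n, "both operands must have the same length"
    if not 0 <= offset <= n:
        return POISON
    return (a + b)[n - offset:2 * n - offset]
```

Under this model an offset of N is well defined for both forms: `splice_left` returns the second operand and `splice_right` returns the first, which is the consistency the relaxed range 0 <= Offset <= N provides.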
The current tests for the fold use fma with a squared input.
This isn't entirely correct because fma can return -0 in this case.
Extend the fold to perform it with nsz. Also extend the tests to
test with an unknown value for the addend. The known normal constant is
almost a special case that disproves a -0 result.
Split out from https://github.com/llvm/llvm-project/pull/175614
If we drop flags, we'll set the zero_is_poison/int_min_is_poison flag to
false as part of the transform. However, the constant folding was still
performed with the value true, which made constant folding return
poison. This resulted in the pattern failing to match, as the poison
value is not equal to the other select arm.
To avoid this, add some special handling to set the argument to false
during constant folding as well.
Fixes https://github.com/llvm/llvm-project/issues/175282.
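The failure mode can be sketched with a toy constant folder. This is an illustration only: it assumes a cttz-style intrinsic and a select of the form `select (x == 0), BITS, cttz(x, true)`, which is one plausible instance of the pattern rather than the exact code path; `fold_cttz` and `select_arm_matches` are hypothetical helpers.

```python
POISON = object()  # sentinel standing in for an LLVM poison value
BITS = 32          # assumed integer width

def fold_cttz(x, zero_is_poison):
    """Toy constant folder for count-trailing-zeros."""
    if x == 0:
        return POISON if zero_is_poison else BITS
    return (x & -x).bit_length() - 1

def select_arm_matches(zero_is_poison_for_fold):
    """Does folding the intrinsic at x == 0 reproduce the other select arm?"""
    other_arm = BITS
    return fold_cttz(0, zero_is_poison_for_fold) == other_arm
```

Folding with the flag still set to true yields poison, so the comparison against the other arm fails; folding with the flag set to false yields `BITS` and the pattern matches.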
We were missing a check that the inner intrinsic is in fact a min/max
op. We'd crash if it was any other intrinsic!
This was found by a fuzzer I'm working on. The high-level design is to
randomly generate LLVM IR, run a pass on it, and then run the original
and new IR through the interpreter. They should produce the same
results. Right now I'm only fuzzing instcombine.
This is basically the same change as #162653, but for InstSimplify
instead of ConstantFolding.
It folds `icmp (ptrtoaddr x, ptrtoaddr y)` to `icmp (x, y)` and `icmp
(ptrtoaddr x, C)` to `icmp (x, inttoptr C)`.
The fold is restricted to the case where the result type is the address
type, as icmp only compares the address bits. As in the other PR, I think
in practice all the folds are also going to work if the ptrtoint result
type is larger than the address size, but it's unclear how to justify
this in general.
There is a generic fold for recursively calling simplifyICmpInst with
the ptrtoint cast stripped:
9b6b52b534/llvm/lib/Analysis/InstructionSimplify.cpp (L3850-L3867)
As such, we shouldn't have to explicitly do this for the
computePointerICmp() fold.
This is not strictly NFC because the recursion limit applies to the
generic fold, though I wouldn't expect this to matter in practice.
The mask doesn't really affect the reverse. It only poisons the masked
off elements in the results. It should be ok to ignore the mask if we
can eliminate the pair.
I don't have a specific use case for this, but it matches what I had
implemented in our downstream before the current upstream
implementation. Submitting upstream so I can remove the delta
in my downstream.
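The argument about the mask can be made concrete with a small model. This sketch assumes a masked reverse produces poison in masked-off result lanes and ignores any explicit vector length operand; `masked_reverse`, `refines` and the `POISON` sentinel are hypothetical names.

```python
from itertools import product

POISON = "poison"  # sentinel standing in for an LLVM poison value

def masked_reverse(vec, mask):
    """Reverse vec; masked-off result lanes become poison."""
    return [v if m else POISON for v, m in zip(vec[::-1], mask)]

def refines(replacement, original):
    """A replacement is valid if it agrees on every non-poison lane."""
    return all(o == POISON or o == r for r, o in zip(replacement, original))

def double_reverse_is_refined_by_input(vec):
    """Check reverse(reverse(vec, m1), m2) -> vec for every pair of masks."""
    n = len(vec)
    for m1 in product([False, True], repeat=n):
        for m2 in product([False, True], repeat=n):
            out = masked_reverse(masked_reverse(vec, m1), m2)
            if not refines(vec, out):
                return False
    return True
```

Every lane of the double reverse is either the original element or poison, so replacing the pair with the input is a refinement regardless of the masks.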
When comparing additions with the same base where one has `nsw`, the
following simplification can be performed:
```llvm
icmp slt/sgt/sle/sge (x + C1), (x +nsw C2)
=>
icmp slt/sgt/sle/sge C1, C2
```
Previously this was only done for `slt`. This patch extends it to the
`sgt`, `sle`, and `sge` predicates when either of the following
conditions holds:
- `C1 <= C2 && C1 >= 0`, or
- `C2 <= C1 && C1 <= 0`
This patch also handles the `C1 == C2` case, which was previously
excluded.
Proof: https://alive2.llvm.org/ce/z/LtmY4f
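Alongside the Alive2 proof, the two conditions can be checked by brute force on a narrow type. The sketch below assumes a 5-bit signed integer to keep the loop small, treats an overflowing `x +nsw C2` as poison and skips it, and uses hypothetical helper names.

```python
import operator

BITS = 5  # assumed narrow width so the exhaustive loop stays cheap
LO, HI = -(1 << (BITS - 1)), (1 << (BITS - 1)) - 1
PREDS = {"slt": operator.lt, "sgt": operator.gt,
         "sle": operator.le, "sge": operator.ge}

def wrap(v):
    """Wrap an integer to BITS-bit two's complement."""
    v &= (1 << BITS) - 1
    return v - (1 << BITS) if v > HI else v

def condition(c1, c2):
    """The two cases under which the fold is applied."""
    return (c1 <= c2 and c1 >= 0) or (c2 <= c1 and c1 <= 0)

def find_counterexample(require_condition=True):
    """Return (pred, x, c1, c2) where the fold is wrong, or None."""
    for c1 in range(LO, HI + 1):
        for c2 in range(LO, HI + 1):
            if require_condition and not condition(c1, c2):
                continue
            for x in range(LO, HI + 1):
                if not LO <= x + c2 <= HI:
                    continue  # nsw violated: result is poison, any fold is fine
                lhs, rhs = wrap(x + c1), x + c2
                for name, op in PREDS.items():
                    if op(lhs, rhs) != op(c1, c2):
                        return (name, x, c1, c2)
    return None
```

With the conditions enforced no counterexample exists; dropping them lets the plain `x + C1` wrap and the search returns a failing input.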
When simplifying min/max intrinsics with fixed-size vector constants,
InstructionSimplify attempts to optimize element-wise. However,
getAggregateElement() can return null for certain constant expressions
like bitcasts, leading to a null pointer dereference.
This patch adds a check to bail out of the optimization when
getAggregateElement() returns null, preventing the crash while
maintaining correct behavior for normal constant vectors.
Fixes crash with patterns like:
call <2 x half> @llvm.minnum.v2f16(<2 x half> %x,
<2 x half> bitcast (<1 x i32> <i32 N> to <2 x half>))
This reverts commit f1c1063.
PR #163453 was merged and then reverted because it exposed a crash.
After investigation, the crash turned out to be unrelated and has since
been fixed in #164628.
This is an attempt to reland #163453.
Fold select instructions with true and false values that act as the same
phi, which cleans up the IR and opens up opportunities for other passes
such as loop vectorization.
This adds support for ptrtoaddr in the `ptradd p, ptrtoaddr(p2) -
ptrtoaddr(p) -> p2` fold.
This fold requires that p and p2 have the same underlying object
(otherwise the provenance may not be the same).
The argument I would like to make here is that because the underlying
objects are the same (and the pointers in the same address space), the
non-address bits of the pointer must be the same. Looking at some
specific cases of underlying object relationship:
* phi/select: Trivially true.
* getelementptr: Only modifies address bits, non-address bits must
remain the same.
* addrspacecast round-trip cast: Must preserve all bits because we
optimize such round-trip casts away.
* non-interposable global alias: I'm a bit unsure about this one, but I
guess the alias and the aliasee must have the same non-address bits?
* various intrinsics like launder.invariant.group, ptrmask. I think
these all either preserve all pointer bits (like the invariant.group
ones) or at least the non-address bits (like ptrmask). There are some
interesting cases like amdgcn.make.buffer.rsrc, but those are cross
address-space.
-----
There is a second `gep (gep p, C), (sub 0, ptrtoint(p)) -> C` transform
in this function, which I am not extending to handle ptrtoaddr, adding
negative tests instead. This transform is overall dubious for provenance
reasons, but especially dubious with ptrtoaddr, as then we don't have
the guarantee that provenance of `p` has been exposed.
The `masked.load`, `masked.store`, `masked.gather` and `masked.scatter`
intrinsics currently accept a separate alignment immarg. Replace this
with an `align` attribute on the pointer / vector of pointers argument.
This is the standard representation for alignment information on
intrinsics, and is already used by all other memory intrinsics. This
means the signatures now match llvm.expandload, llvm.vp.load, etc.
(Things like llvm.memcpy used to have a separate alignment argument as
well, but were already migrated a long time ago.)
It's worth noting that the masked.gather and masked.scatter intrinsics
previously accepted a zero alignment to indicate the ABI type alignment
of the element type. This special case is gone now: If the align
attribute is omitted, the implied alignment is 1, as usual. If ABI
alignment is desired, it needs to be explicitly emitted (which the
IRBuilder API already requires anyway).
Add a new m_PtrToIntOrAddr() matcher which matches both ptrtoint and
ptrtoaddr. Pointer arithmetic only works on the address bits, so
supporting ptrtoaddr is always fine here.
isEliminableCastPair() currently tries to support elimination of
ptrtoint/inttoptr cast pairs by assuming that the maximum possible
pointer size is 64 bits. Of course, this is no longer the case nowadays.
This PR changes isEliminableCastPair() to accept an optional DataLayout
argument, which is required to eliminate pointer casts.
This means that we no longer eliminate these cast pairs during ConstExpr
construction, and instead only do it during DL-aware constant folding.
This had a lot of annoying fallout on tests, most of which I've
addressed in advance of this change.
Add support for the new maximumnum and minimumnum intrinsics in various
optimizations in InstSimplify.
Also, change the behavior of optimizing maxnum(sNaN, x) to simplify to
qNaN instead of x to better match the LLVM IR spec, and add more tests
for sNaN behavior for all 3 max/min intrinsic types.
Scalable get_active_lane_mask intrinsic calls can be simplified to an
i1 splat (ptrue) when their constant range is larger than or equal to
the maximum possible number of elements, which can be inferred from
vscale_range(x, y).
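A small model makes the condition explicit. It assumes lane `i` of the mask is active when `base + i < n` (computed without overflow) and that the vector has `min_lanes * vscale` lanes; the function names are hypothetical.

```python
def active_lane_mask(base, n, num_lanes):
    """Lane i is active iff base + i < n."""
    return [base + i < n for i in range(num_lanes)]

def always_all_true(base, n, min_lanes, vscale_max):
    """True if the mask is all-true for every vscale up to vscale_max."""
    return n - base >= min_lanes * vscale_max
```

For example, with `base = 0`, `n = 16`, four lanes per vscale unit and a maximum vscale of 4, the widest possible vector has 16 lanes, so the mask is all-true for every permitted vscale. With `n = 15` the last lane of the widest vector is inactive and the fold does not apply.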
A common idiom is the usage of the PatternMatch match function within a
functional algorithm like all_of. Introduce a match functor to shorten
this idiom.
Co-authored-by: Luke Lau <luke@igalia.com>
When the second argument passed to the get.active.lane.mask intrinsic is
zero we can simplify the instruction to return an all-false mask
regardless of the first operand.
Look through extractvalue to simplify umul_with_overflow where one of
the operands is 1.
This removes some redundant instructions when expanding SCEVs, which in
turn makes the runtime check cost estimate more accurate, reducing the
minimum iterations for which vectorization is profitable.
PR: https://github.com/llvm/llvm-project/pull/157307
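The underlying identity is simple to verify exhaustively. This sketch assumes an 8-bit unsigned type and models the intrinsic as returning a (value, overflow) pair; `umul_with_overflow` here is a hypothetical Python stand-in, not the LLVM API.

```python
BITS = 8  # assumed width for the exhaustive check

def umul_with_overflow(a, b):
    """Return (truncated product, overflow flag) for BITS-wide unsigned inputs."""
    full = a * b
    return full & ((1 << BITS) - 1), full >> BITS != 0

def multiply_by_one_is_identity():
    """extractvalue 0 folds to x and extractvalue 1 folds to false."""
    return all(
        umul_with_overflow(x, 1) == (x, False)
        and umul_with_overflow(1, x) == (x, False)
        for x in range(1 << BITS)
    )
```

Since multiplying by 1 can never overflow, both extracted fields have a known value and the call itself becomes dead.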
Having a finite Depth (or recursion limit) for computeKnownBits is very
limiting, but is currently a load-bearing necessity, as all KnownBits
are recomputed on each call and there is no caching. As a prerequisite
for an effort to remove the recursion limit altogether, either using a
clever caching technique, or writing an easily-invalidable KnownBits
analysis, make the Depth argument in APIs in ValueTracking uniformly the
last argument with a default value. This would aid in removing the
argument when the time comes, as many callers that currently pass 0
explicitly are now updated to omit the argument altogether.
If ValueTracking can guarantee non-NaN and non-INF and the `nsz`
fast-math flag is set, we can simplify X * 0.0 ==> 0.0.
https://alive2.llvm.org/ce/z/XacRQZ
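Each precondition rules out one way the product can differ from +0.0. The following Python sketch uses IEEE double arithmetic to show the three cases; `mul_by_zero` is a hypothetical helper that only classifies the result.

```python
import math

def mul_by_zero(x):
    """Classify x * 0.0 as 'nan', '-0.0' or '+0.0'."""
    r = x * 0.0
    if math.isnan(r):
        return "nan"
    return "-0.0" if math.copysign(1.0, r) < 0 else "+0.0"
```

NaN and infinite inputs produce NaN, which is why they must be excluded, and a negative finite input produces -0.0, which is why the fold needs `nsz` to ignore the sign of zero.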
Add `GenericFloatingPointPredicateUtils` in order to generalize
effects of floating point comparisons on `KnownFPClass` for both IR and
MIR.
---------
Co-authored-by: Matt Arsenault <arsenm2@gmail.com>