Relands #114356. Compared to the last version, this patch only merges
poison-generating/nsz flags from the select to fix LV regression in
`llvm/test/Transforms/PhaseOrdering/AArch64/predicated-reduction.ll`.
The change improves the code in general and, as a side effect, avoids
crashing on an impossible address space casts guarded
by `__isGlobal/__isShared`, which partially fixes
https://github.com/llvm/llvm-project/issues/112760
It's still possible to trigger the issue by using explicit AS casts w/o
AS checks, but LLVM should no longer crash on valid code.
This is #112964 + a small fix for the crash on unintended argument
access which was the root cause to revers the earlier version of the patch.
The change improves the code in general and, as a side effect, avoids crashing
on an impossible address space casts guarded by `__isGlobal/__isShared`, which
partially fixes https://github.com/llvm/llvm-project/issues/112760
It's still possible to trigger the issue by using explicit AS casts w/o
AS checks, but LLVM should no longer crash on valid code.
We can turn:
```
%add = add i8 %arg, C1
%and = and i8 %add, C2
%cmp = icmp eq i1 %and, C3
```
into:
```
%and = and i8 %arg, C2
%cmp = icmp eq i1 %and, (C3 - C1) & C2
```
This is only worth doing if the sequence is the sole user of the addition
operation.
We left some easy opportunities for further simplifications.
log2(trunc(x)) is simply trunc(log2(x)). This is safe if we know that
trunc is NUW because it means that the truncation didn't drop any bits.
It is also safe if the caller is OK with zero as a possible answer.
log2(x >>u y) is simply `log2(x) - y`.
log2(x & y) is a funny one. It comes up when doing something like:
```
unsigned int f(unsigned int x, unsigned int y) {
unsigned char a = 1u << x;
return y / a;
}
```
LLVM would canonicalize this to:
```
%shl = shl nuw i32 1, %x
%conv1 = and i32 %shl, 255
%div = udiv i32 %y, %conv1
```
In cases like these, we can ignore the mask entirely.
This is equivalent to `y >> x`.
The two folding operations are causing a cycle for the following case
with
scalable vector types:
define <vscale x 2 x double> @test_fneg_select_abs(<vscale x 2 x i1>
%cond, <vscale x 2 x double> %b) {
%1 = select <vscale x 2 x i1> %cond, <vscale x 2 x double>
zeroinitializer, <vscale x 2 x double> %b
%2 = fneg fast <vscale x 2 x double> %1
ret <vscale x 2 x double> %2
}
1) fold fneg: -(Cond ? C : Y) -> Cond ? -C : -Y
2) fold select: (Cond ? -X : -Y) -> -(Cond ? X : Y)
1) results in the following since '<vscale x 2 x double>
zeroinitializer' passes
the check for the immediate constant:
%.neg = fneg fast <vscale x 2 x double> zeroinitializer
%b.neg = fneg fast <vscale x 2 x double> %b
%1 = select fast <vscale x 2 x i1> %cond, <vscale x 2 x double> %.neg,
<vscale x 2 x double> %b.neg
and so we end up going back and forth between 1) and 2).
Attempt to fold scalable vector constants, so that we end up with a
splat instead:
define <vscale x 2 x double> @test_fneg_select_abs(<vscale x 2 x i1>
%cond, <vscale x 2 x double> %b) {
%b.neg = fneg fast <vscale x 2 x double> %b
%1 = select fast <vscale x 2 x i1> %cond, <vscale x 2 x double>
shufflevector (<vscale x 2 x double> insertelement (<vscale x 2 x
double> poison, double -0.000000e+00, i64 0), <vscale x 2 x double>
poison, <vscale x 2 x i32> zeroinitializer), <vscale x 2 x double>
%b.neg
ret <vscale x 2 x double> %1
}
Add support for ``llvm.nvvm.fshl.clamp`` and ``llvm.nvvm.fshr.clamp``
intrinsics. These intrinsics are similar to the generic llvm funnel
shift, except that the shift value is clamped to the integer width.
Currently only ``i32`` is supported and is implemented with the
`shf.[rl].clamp.b32` PTX instruction.
In a variety of places we change the bitwidth of a parameter but don't
update the attributes.
The issue in this case is from the `range` attribute when inlining
`__memset_chk`. `optimizeMemSetChk` will replace an `i32` with an
`i8`, and if the `i32` had a `range` attr assosiated it will cause an
error.
Fixes#112633
InstCombinerImpl::foldSelectInstWithICmp has some inlined code for
select-icmp-xor simplification, but this simplification is already done
by other code, via another path:
(X & Y) == 0 ? X : X ^ Y ->
((X & Y) == 0 ? 0 : Y) ^ X ->
(X & Y) ^ X ->
X & ~Y
Cover the cases that it claims to simplify, and demonstrate that
stripping it doesn't cause test changes.
In
5dbfca30c1
we assume that RHS is poison implies LHS is also poison. It doesn't hold
after introducing samesign flag.
This patch drops the `samesign` flag on RHS if the original expression
is a logical and/or.
Closes#112467.
Today, InstCombine can fold fcmp+select patterns to minnum/maxnum
intrinsics when the nnan and nsz flags are set. The ordering of the
operands in both the fcmp and select instructions is important for the
folding to occur.
maxnum patterns:
1. (a op b) ? a : b -> maxnum(a, b), where op is one of {ogt, oge}
2. (a op b) ? b : a -> maxnum(a, b), where op is one of {ule, ult}
The second pattern is supposed to make the order of the operands in the
select instruction irrelevant. However, the pattern matching code uses
the CmpInst::getInversePredicate method to invert the comparison
predicate. This method doesn't take into account the fast-math flags,
which can lead missing the folding opportunity.
The patch extends the pattern matching code to handle unordered fcmp
instructions. This allows the folding to occur even when the select
instruction has the operands in the inverse order.
New maxnum patterns:
1. (a op b) ? a : b -> maxnum(a, b), where op is one of {ugt, uge}
2. (a op b) ? b : a -> maxnum(a, b), where op is one of {ole, olt}
The same changes are applied to the minnum intrinsic.
foldSelectEquivalence currently doesn't support GVN-like replacements on
vector types. Put in the checks for potentially lane-crossing
operations, and lift the limitation.
Write dedicated tests for foldSelectValueEquivalence, demonstrating that
it does not perform many GVN-like replacements when:
- the comparison is a vector-type
- the comparison is a floating-point type
as a prelude to fixing these deficiencies.
Reuse llvm::isTriviallyVectorizable in llvm::isNotCrossLaneOperation, in
order to get it to handle more intrinsics.
Alive2 proofs for changed tests: https://alive2.llvm.org/ce/z/XSV_GT
Factor out and unify common code from InstSimplify and InstCombine that
partially guard against cross-lane vector operations into
llvm::isNotCrossLaneOperation in ValueTracking.
Alive2 proofs for changed tests: https://alive2.llvm.org/ce/z/68H4ka
Similar to 112aac4e8961b9626bb84f36deeaa5a674f03f5a, this converts log
libcalls to llvm.log.f64 intrinsics if we know they do not set errno, as
the input is not zero and not negative. As log will produce errno if the
input is 0 (returning -inf) or if the input is negative (returning nan),
we also perform the conversion when we have noinf and nonan.
This is intended to solve a problem with lowering atomics in
OpenMP and C++ common to AMDGPU and NVPTX.
In OpenCL and CUDA, it is undefined behavior for an atomic instruction
to modify an object in thread private memory. In OpenMP, it is defined.
Correspondingly, the hardware does not handle this correctly. For
AMDGPU,
32-bit atomics work and 64-bit atomics are silently dropped. We
therefore
need to codegen this by inserting a runtime address space check,
performing
the private case without atomics, and fallback to issuing the real
atomic
otherwise. This metadata allows us to avoid this extra check and branch.
Handle this by introducing metadata intended to be applied to atomicrmw,
indicating they cannot access the forbidden address space.