This reverts commit 5dde755188e34c0ba5304365612904476c8adfda,
cbfcf90152de5392a36d0a0241eef25f5e159eef and
8981520b19f2d2fe3d2bc80cf26318ee6b5b7473 due to a miscompile introduced in
8981520b19f2d2fe3d2bc80cf26318ee6b5b7473 (see
https://reviews.llvm.org/D154725#4568845 for details)
Differential Revision: https://reviews.llvm.org/D157430
If the shl has either nuw or nsw flags, then we know that bits
cannot be shifted out, so a power of two cannot become zero.
Proofs: https://alive2.llvm.org/ce/z/4QfebE
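A minimal IR sketch of the property (the function and values are illustrative):
```
define i8 @shl_pow2(i8 %k, i8 %n) {
  %p = shl nuw i8 1, %k  ; %p is a power of two
  %r = shl nuw i8 %p, %n ; nuw: the set bit cannot be shifted out, so %r
                         ; remains a power of two and cannot become zero
  ret i8 %r
}
```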
6640df94f9abd4f9fef0263afbf7978ac55832b8 did not actually remove it,
just its final user. cannotBeOrderedLessThanZeroImpl still has a user
which needs to be updated before it can be removed.
The users of SignBitMustBeZero currently have broken expectations for
NaN handling, so it will require more work to replace.
This PR tries to match the following pattern, separate from D156881:
```
%vscale = call i64 @llvm.vscale.i64()
%shift = shl nuw nsw i64 %vscale, 11
```
Now, we only check the shl recursively when OrZero is true.
Reviewed By: goldstein.w.n
Differential Revision: https://reviews.llvm.org/D157062
There is some pointer simplification code originally from isKnownNonNull
that is now better suited to be in isKnownNonZeroFromOperator.
Differential Revision: https://reviews.llvm.org/D156141
Prefer checking for non-zero via the operator before checking for
non-zero via dominating conditions. This is to make sure we don't have
compile-time regressions when special cases that are currently
part of isKnownNonZero() get moved into isKnownNonZeroFromOperator().
Split off the primary part of the isKnownNonZero() implementation,
in the same way it is done for computeKnownBits(). This makes it
easier to reorder different parts of isKnownNonZero().
This mostly manifested as broken constant folding. This was
mishandling the dynamic denormal mode. It was also mishandling literal
signaling NaNs, such that they would also be treated as poison.
https://reviews.llvm.org/D155437
This patch is separated from D154953 to see which tests are affected by this
change alone, as suggested in the review comments.
Depends on the related LangRef update in D155193.
Reviewed By: paulwalker-arm, nikic, david-arm
Differential Revision: https://reviews.llvm.org/D155350
Improve computeKnownFPClass select handling to cover the case where
the condition performs a class test. This allows us to recognize
no-NaNs in cases like:
```
%not.nan = fcmp ord float %x, 0.0
%select = select i1 %not.nan, float %x, float 0.0
```
Math library code has similar edge case filtering on the inputs and
final results.
https://reviews.llvm.org/D153089
The implementations of a number of math functions on AMDGPU involve
pre- and post-scaling the inputs out of the denormal range. If these
are chained together, we can possibly fold them out.
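As a sketch of the shape involved (the scale factors here are illustrative, not taken from the actual library code):
```
define float @chained_scale(float %x) {
  %up = fmul float %x, 0x4170000000000000    ; scale by 2^24 out of the denormal range
  %down = fmul float %up, 0x3E70000000000000 ; scale back down by 2^-24
  ret float %down
}
```
If the intermediate value can be shown to never be denormal, the two multiplies can fold away.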
computeConstantRange seems weaker than computeKnownBits, so this
regresses some of the older vector tests.
Support the canonical range check pattern for KnownBits assumptions.
This is the same as the generic ConstantRange handling, just shifted
by an offset.
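For illustration, the canonical range check pattern looks like this (a hypothetical assume that %x is in [8, 24)):
```
declare void @llvm.assume(i1)

define void @range_check(i32 %x) {
  %off = add i32 %x, -8
  %cmp = icmp ult i32 %off, 16
  call void @llvm.assume(i1 %cmp)
  ; %x is known to be in [8, 24), so its top 27 bits are known zero
  ret void
}
```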
For non-equality icmps, we don't do any KnownBits-specific
reasoning, and just use the known bits as a constraint on the range.
We can generalize this for all predicates by round-tripping through
ConstantRange and using makeAllowedICmpRegion().
The minor improvement in zext-or-icmp is because we assume that
a value is in the range [0, 1), which means it must be zero.
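Sketched as IR (the function is hypothetical):
```
declare void @llvm.assume(i1)

define i32 @must_be_zero(i32 %x) {
  %cmp = icmp ult i32 %x, 1
  call void @llvm.assume(i1 %cmp)
  ret i32 %x ; the only value ult 1 is 0, so this is known to return 0
}
```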
When processing assumes, we also handle assumes on ptrtoint of the
value. In canonical IR, these will have the same size as the value.
However, in non-canonical IR there may be an implicit zext or
trunc, which results in a bit width mismatch. We currently handle
this by adjusting bitwidth everywhere, but this is fragile and I'm
pretty sure that the way we do this is incorrect for some predicates,
because we effectively end up commuting an ext/trunc and an icmp.
Instead, add an m_PtrToIntSameSize() matcher that will only handle
bitwidth preserving cases. For the bitwidth-changing cases, wait
until they have been canonicalized.
The original handling for this was added purely to prevent crashes
in an earlier implementation which failed to account for this
entirely.
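A sketch of the two shapes, assuming 64-bit pointers (the functions are illustrative):
```
target datalayout = "p:64:64"

declare void @llvm.assume(i1)

define void @canonical(ptr %p) {
  ; bitwidth-preserving: matched by m_PtrToIntSameSize()
  %i = ptrtoint ptr %p to i64
  %c = icmp ne i64 %i, 0
  call void @llvm.assume(i1 %c)
  ret void
}

define void @non_canonical(ptr %p) {
  ; implicit truncation to i32: left alone until canonicalization
  %i = ptrtoint ptr %p to i32
  %c = icmp ne i32 %i, 0
  call void @llvm.assume(i1 %c)
  ret void
}
```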
Use a unit test since I don't see any existing uses that try to make use of
the high bits of a pointer.
This will also assert if the metadata type doesn't match the pointer
width, but I consider that a defect the verifier should catch rather
than something to be handled here.
AMDGPU allocates LDS globals by assigning !absolute_symbol with the
final fixed address. Tracking that the high bits are 0 may help with
addressing mode matching.
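For reference, the attachment looks roughly like this (the address range is hypothetical):
```
@g = global i32 0, !absolute_symbol !0

; the symbol resolves to an address in [0, 65536), so the high bits of
; its address are known zero
!0 = !{i64 0, i64 65536}
```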
The DataLayout alloca address space is the address space that should
be used when creating new allocas. However, not all allocas are
required to be in this address space. The isKnownNonZero() check
should work on the actual address space of the alloca, not the
default alloca address space.
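A sketch, with an illustrative datalayout: the alloca below is in address space 3 even though the datalayout's alloca address space is 5, so the non-null reasoning must use the properties of address space 3.
```
target datalayout = "A5"

define i1 @f() {
  %a = alloca i32, addrspace(3)
  %c = icmp ne ptr addrspace(3) %a, null
  ret i1 %c
}
```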
This reverts commit 464dcab8a6c823c9cb462bf4107797b8173de088.
Going to fix the size regression forward instead, since otherwise more
dependent patches would need to be reverted.
Define the function @llvm.amdgcn.make.buffer.rsrc, which takes a 64-bit
pointer, the 16-bit stride/swizzling constant that replaces the high 16
bits of an address in a buffer resource, the 32-bit extent/number of
elements, and the 32-bit flags (the latter two being the 3rd and 4th
words of the resource), and combines them into a ptr addrspace(8).
This intrinsic is lowered during the early phases of the backend.
This intrinsic is needed so that alias analysis can correctly infer
that a certain buffer resource points to the same memory as some
global pointer. Previous methods of constructing buffer resources,
which relied on ptrtoint, would not allow for such an inference.
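Sketched usage (the .p0 mangling assumes a flat input pointer; argument values are illustrative):
```
declare ptr addrspace(8) @llvm.amdgcn.make.buffer.rsrc.p0(ptr, i16, i32, i32)

define ptr addrspace(8) @make_rsrc(ptr %base, i32 %numRecords, i32 %flags) {
  ; stride/swizzle constant of 0, i.e. no swizzling of the high address bits
  %rsrc = call ptr addrspace(8) @llvm.amdgcn.make.buffer.rsrc.p0(ptr %base, i16 0, i32 %numRecords, i32 %flags)
  ret ptr addrspace(8) %rsrc
}
```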
Depends on D148184
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D148957
This fixes the largest remaining discrepancy between results of
computeKnownBits() and SimplifyDemandedBits(). We only care about
the multi-use case here, because the assume necessarily introduces
an extra use.
These implement essentially the same thing, so normalize
ValueTracking to use SimplifyQuery. In the future we can directly
expose the SimplifyQuery-based APIs.
The ORE argument threaded through ValueTracking is used only in a
single, untested place. It is also essentially never passed: the
only places that do so have been added very recently as part of the
KnownFPClass migration, which is vanishingly unlikely to hit this
code path. Remove this effectively dead argument.
Differential Revision: https://reviews.llvm.org/D151562
Make ValueTracking directly call the KnownBits shift helpers, which
provides more precise results.
Unfortunately, ValueTracking has a special case where sometimes we
determine non-zero shift amounts using isKnownNonZero(). I have my
doubts about the usefulness of that special case (it is only tested
in a single unit test), but I've reproduced it via an extra parameter
to the KnownBits methods.
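The special case, sketched as IR (the function is illustrative):
```
declare void @llvm.assume(i1)

define i8 @shift_nonzero(i8 %amt) {
  %nz = icmp ne i8 %amt, 0
  call void @llvm.assume(i1 %nz)
  ; with the shift amount known non-zero, bit 0 of %r is known zero
  ; even though nothing else about %amt is known
  %r = shl i8 1, %amt
  ret i8 %r
}
```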
Differential Revision: https://reviews.llvm.org/D151816
This reverts commit 754f3ae65518331b7175d7a9b4a124523ebe6eac.
Unfortunately the change can cause regressions due to dropping flags
from instructions (like nuw, nsw, inbounds), preventing further
optimizations that depend on those flags.
A simple example is the IR below, where `inbounds` is dropped with the
patch; see the phase-ordering test added in 7c91d82ab912fae8b.
```
define i1 @test(ptr %base, i64 noundef %len, ptr %p2) {
bb:
  %gep = getelementptr inbounds i32, ptr %base, i64 %len
  %c.1 = icmp uge ptr %p2, %base
  %c.2 = icmp ult ptr %p2, %gep
  %select = select i1 %c.1, i1 %c.2, i1 false
  ret i1 %select
}
```
For more discussion, see D149404.
Implement precise nuw/nsw support in the KnownBits implementation,
replacing the rather crude handling in ValueTracking.
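One example of the added precision, as a sketch: with nsw the sign bit cannot change.
```
define i8 @shl_nsw_sign(i8 %x) {
  %pos = and i8 %x, 127   ; clear the sign bit, so %pos is non-negative
  %r = shl nsw i8 %pos, 3 ; nsw: the sign cannot flip, so the sign bit
                          ; of %r is also known zero
  ret i8 %r
}
```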
Differential Revision: https://reviews.llvm.org/D151208
I removed the conflict check from computeKnownBitsFromShiftOperator()
in D150648 assuming that this is now handled on the KnownBits side.
However, the nsw handling is still inside ValueTracking, so we
still need to handle conflicts there. Restore the check closer to
where it is relevant.
Fixes https://github.com/llvm/llvm-project/issues/62908.