This patch refactors the interface of the `computeKnownFPClass` family
to pass `SimplifyQuery` directly.
The motivation of this patch is to compute known fpclass with
`DomConditionCache`, which was introduced by
https://github.com/llvm/llvm-project/pull/73662. With
`DomConditionCache`, we can do more optimization with context-sensitive
information.
Example (extracted from
[fmt/format.h](e17bc67547/include/fmt/format.h (L3555-L3566))):
```
define float @test(float %x, i1 %cond) {
%i32 = bitcast float %x to i32
%cmp = icmp slt i32 %i32, 0
br i1 %cmp, label %if.then1, label %if.else
if.then1:
%fneg = fneg float %x
br label %if.end
if.else:
br i1 %cond, label %if.then2, label %if.end
if.then2:
br label %if.end
if.end:
%value = phi float [ %fneg, %if.then1 ], [ %x, %if.then2 ], [ %x, %if.else ]
%ret = call float @llvm.fabs.f32(float %value)
ret float %ret
}
```
We can prove the signbit of `%value` is always zero. Then the fabs can
be eliminated.
For image and buffer stores the default behaviour on GFX12 is to set all
unset components to the value of the first component. So if we pass only
X component, it will be the same as XXXX, or XY same as XYXX.
This patch simplifies the passed vector of components in InstCombine by
removing components from the end that are equal to the first component.
For image stores it also trims DMask if necessary.
---------
Co-authored-by: Mateja Marjanovic <mmarjano@amd.com>
Return poison instead of undef for non-demanded lanes in the AMDGPU
demanded element simplification hook.
Also bail out of dmask is 0, as this case has special semantics:
> If DMASK==0, the TA overrides DMASK=1 and puts zeros in VGPR followed by
> LWE status if exists. TFE status is not generated since the fetch is dropped.
Rationale:
- It does not enable any further IR simplifications.
- It does not improve the generated code since the isel lowering of
ballot also has special cases for 0 and 1.
- getreg is "too powerful" since it can read from many different
registers, so its intrinsic properties have to be set very
conservatively.
There is also a correctness problem that getreg can read from exec but
it is currently not marked as convergent.
Differential Revision: https://reviews.llvm.org/D153047
Remove undef values from the end of the vector operand in image and
buffer store instructions.
Also instead of call to computeKnownFPClass, use only findScalarElement.
Continuation of: 88421ea973916e Trim zero components from buffer and image stores
Differential Revision: https://reviews.llvm.org/D152440
In order to enable the LLVM frontend to better analyze buffer
operations (and to potentially enable more precise analyses on the
backend), define versions of the raw and structured buffer intrinsics
that use `ptr addrspace(8)` instead of `<4 x i32>` to represent their
rsrc arguments.
The new intrinsics are named by replacing `buffer.` with `buffer.ptr`.
One advantage to these intrinsic definitions is that, instead of
specifying that a buffer load/store will read/write some memory, we
can indicate that the memory read or written will be based on the
pointer argument. This means that, for example, a read from a
`noalias` buffer can be pulled out of a loop that is modifying a
distinct buffer.
In the future, we will define custom PseudoSourceValues that will
allow us to package up the (buffer, index, offset) triples that buffer
intrinsics contain and allow for more precise backend analysis.
This work also enables creating address space 7, which represents
manipulation of raw buffers using native LLVM load and store
instructions.
Where tests simply used a buffer intrinsic while testing some other
code path (such as the tests for VGPR spills), they have been updated
to use the new intrinsic form. Tests that are "about" buffer
intrinsics (for instance, those that ensure that they codegen as
expected) have been duplicated, either within existing files or into
new ones.
Depends on D145441
Reviewed By: arsenm, #amdgpu
Differential Revision: https://reviews.llvm.org/D147547
For image and buffer stores the default behaviour on GFX11 and
older is to set all unset components to zero. So if we pass
only X component it will be the same as X000, or XY same as XY00.
This patch simplifies the passed vector of components in InstCombine
by removing zero components from the end.
For image stores it also trims DMask if necessary.
Reviewed by: arsenm, foad, nhaehnle, piotr
This barely matters since 99% are converted to the generic intrinsic now,
and the only real difference is the target intrinsic supports a variable
test mask. Start propagating poison. Prefer folding to a defined result (false)
for an undef test mask. Propagate undef for the first operand.
InstCombine already does this for minnum/maxnum. If we
also apply this to fmed3, we don't need to explicitly
use 16-bit fmed3 if we're not sure the target
supports 16-bit instructions yet.
For image and buffer stores the default behaviour on GFX11 and
older is to set all unset components to zero. So if we pass
only X component it will be the same as X000, or XY same as XY00.
This patch simplifies the passed vector of components in InstCombine
by removing zero components from the end.
For image stores it also trims DMask if necessary.
Reviewed By: foad, arsenm
Differential Revision: https://reviews.llvm.org/D146737
There is no getNullValue in ConstantFP. Due to inheritance, we're calling
Constant::getNullValue which handles any type including FP.
Since we already know we want an FP constant we can use ConstantFP::getZero
which might be faster and is a more readable name for an FP zero.
There are 2 issues here:
1. In the default LLVM FP environment (regular FP math instructions),
SNaN is some flavor of "don't care" which we will nail down in
D143074, so this is just a quality-of-implementation improvement
for default FP.
2. In the constrained FP environment (constrained intrinsics), SNaN
must not propagate through a math operation; it has to be quieted
according to IEEE-754 spec. That is independent of exception
handling mode, so the current behavior is a miscompile.
Differential Revision: https://reviews.llvm.org/D143505
This patch mechanically replaces None with std::nullopt where the
compiler would warn if None were deprecated. The intent is to reduce
the amount of manual work required in migrating from Optional to
std::optional.
This is part of an effort to migrate from llvm::Optional to
std::optional:
https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
Compared to permlane16, permlane64 has no BC input because it has no
boundary conditions, no fi input because the instruction acts as if FI
were always enabled, and no OLD input because it always writes to every
active lane.
Also use the new intrinsic in the atomic optimizer pass.
Differential Revision: https://reviews.llvm.org/D127662
Image.sample instruction can be forced to return half type instead of
float when d16 flag is enabled.
This patch adds new pattern in InstCombine to detect if output of
image.sample is used later only by fptrunc which converts the type
from float to half. If pattern is detected then fptrunc and image.sample
are combined to single image.sample which is returning half type.
Later in Lowering part d16 flag is added to image sample intrinsic.
Differential Revision: https://reviews.llvm.org/D124232
Image.sample instruction can be forced to return half type instead of
float when d16 flag is enabled.
This patch adds new pattern in InstCombine to detect if output of
image.sample is used later only by fptrunc which converts the type
from float to half. If pattern is detected then fptrunc and image.sample
are combined to single image.sample which is returning half type.
Later in Lowering part d16 flag is added to image sample intrinsic.
Differential Revision: https://reviews.llvm.org/D124232
Addresses are floats when a sampler is present and unsigned integers
when no sampler is present.
Therefore, only zext instructions, not sext instructions should match.
Also match integer constants that can be truncated.
Differential Revision: https://reviews.llvm.org/D118043