Common up handling of intrinsics that are a no-op on uniform arguments.
This catches a couple of new cases:
readlane (readlane x, y), z -> readlane x, y
(for any z, does not have to equal y).
permlane64 (readfirstlane x) -> readfirstlane x
(and likewise for any other uniform argument to permlane64).
If the parameters(the input and segment select) coming in to
amdgcn.trig.preop intrinsic are compile time constants, we pre-compute
the output of amdgcn.trig.preop on the CPU and replaces the uses with
the computed constant.
This work extends the patch https://reviews.llvm.org/D120150 to make it
a complete coverage.
For the segment select, only src1[4:0] are used. A segment select is
invalid if we are selecting the 53-bit segment beyond the [1200:0] range
of the 2/PI table. 0 is returned when a segment select is not valid.
A crash could happen in `PointerReplacer::replace` when constructing a
new
select instruction and there is no replacement for one of its operand.
This can
happen when the operand is a load instruction that has been replaced
earlier
such that the operand itself is already the new value. In this case, it
is not
in the replacement map and `getReplacement` simply returns nullptr.
Fix SWDEV-472192.
These are incremental changes over #89217 , with core logic being the
same. This patch along with #89217 and #91190 should get us ready to enable 64
bit optimizations in atomic optimizer.
This patch is intended to be the first of a series with end goal to
adapt atomic optimizer pass to support i64 and f64 operations (along
with removing all unnecessary bitcasts). This legalizes 64 bit readlane,
writelane and readfirstlane ops pre-ISel
---------
Co-authored-by: vikramRH <vikhegde@amd.com>
This was looking through an addrspacecast, and not finding a later
unfoldable cast to another address space. Fixes improperly deleting
a required alloca + memcpy and introducing an illegal addrspacecast.
This also required fixing some worklist management issues with
addrspacecast, and assuming that only memcpy sources could need
replacement.
Regresses one test function, but this looks like it optimized
before by accident. It never saw the pointer use by the call
to readonly_callee, which should require insertion of a new cast.
Fixes#68120
For image and buffer stores the default behaviour on GFX12 is to set all
unset components to the value of the first component. So if we pass only
X component, it will be the same as XXXX, or XY same as XYXX.
This patch simplifies the passed vector of components in InstCombine by
removing components from the end that are equal to the first component.
For image stores it also trims DMask if necessary.
---------
Co-authored-by: Mateja Marjanovic <mmarjano@amd.com>
Return poison instead of undef for non-demanded lanes in the AMDGPU
demanded element simplification hook.
Also bail out of dmask is 0, as this case has special semantics:
> If DMASK==0, the TA overrides DMASK=1 and puts zeros in VGPR followed by
> LWE status if exists. TFE status is not generated since the fetch is dropped.
Otherwise we may replace undef with poison.
Note that a lot of tests regressing here already have variants
that use poison instead of undef (often in a separate
inseltpoison file), which is why I'm not adjusting them to the
new pattern.
Correct InstCombine strictfp tests to follow the rules documented
in the LangRef:
https://llvm.org/docs/LangRef.html#constrained-floating-point-intrinsics
Mostly these tests just needed the strictfp attribute on function
definitions. After D154991 the constrained intrinsics have the
strictfp attribute by default so they don't need it here, but other
functions do.
Test changes verified with D146845.
Rationale:
- It does not enable any further IR simplifications.
- It does not improve the generated code since the isel lowering of
ballot also has special cases for 0 and 1.
- getreg is "too powerful" since it can read from many different
registers, so its intrinsic properties have to be set very
conservatively.
There is also a correctness problem that getreg can read from exec but
it is currently not marked as convergent.
Differential Revision: https://reviews.llvm.org/D153047
Remove undef values from the end of the vector operand in image and
buffer store instructions.
Also instead of call to computeKnownFPClass, use only findScalarElement.
Continuation of: 88421ea973916e Trim zero components from buffer and image stores
Differential Revision: https://reviews.llvm.org/D152440
In order to enable the LLVM frontend to better analyze buffer
operations (and to potentially enable more precise analyses on the
backend), define versions of the raw and structured buffer intrinsics
that use `ptr addrspace(8)` instead of `<4 x i32>` to represent their
rsrc arguments.
The new intrinsics are named by replacing `buffer.` with `buffer.ptr`.
One advantage to these intrinsic definitions is that, instead of
specifying that a buffer load/store will read/write some memory, we
can indicate that the memory read or written will be based on the
pointer argument. This means that, for example, a read from a
`noalias` buffer can be pulled out of a loop that is modifying a
distinct buffer.
In the future, we will define custom PseudoSourceValues that will
allow us to package up the (buffer, index, offset) triples that buffer
intrinsics contain and allow for more precise backend analysis.
This work also enables creating address space 7, which represents
manipulation of raw buffers using native LLVM load and store
instructions.
Where tests simply used a buffer intrinsic while testing some other
code path (such as the tests for VGPR spills), they have been updated
to use the new intrinsic form. Tests that are "about" buffer
intrinsics (for instance, those that ensure that they codegen as
expected) have been duplicated, either within existing files or into
new ones.
Depends on D145441
Reviewed By: arsenm, #amdgpu
Differential Revision: https://reviews.llvm.org/D147547
For image and buffer stores the default behaviour on GFX11 and
older is to set all unset components to zero. So if we pass
only X component it will be the same as X000, or XY same as XY00.
This patch simplifies the passed vector of components in InstCombine
by removing zero components from the end.
For image stores it also trims DMask if necessary.
Reviewed by: arsenm, foad, nhaehnle, piotr
This barely matters since 99% are converted to the generic intrinsic now,
and the only real difference is the target intrinsic supports a variable
test mask. Start propagating poison. Prefer folding to a defined result (false)
for an undef test mask. Propagate undef for the first operand.
InstCombine already does this for minnum/maxnum. If we
also apply this to fmed3, we don't need to explicitly
use 16-bit fmed3 if we're not sure the target
supports 16-bit instructions yet.
This is a follow-up to b71edfaa4ec3c998aadb35255ce2f60bba2940b0
since I forgot the lit.local.cfg files in that one.
Reformatting is done with `black`.
If you end up having problems merging this commit because you
have made changes to a python file, the best way to handle that
is to run git checkout --ours <yourfile> and then reformat it
with black.
If you run into any problems, post to discourse about it and
we will try to help.
RFC Thread below:
https://discourse.llvm.org/t/rfc-document-and-standardize-python-code-style
Reviewed By: barannikov88, kwk
Differential Revision: https://reviews.llvm.org/D150762
This ensures that the new instructions get reprocessed in the same
iteration.
This should be largely NFC, apart from worklist order effects and
naming changes, as seen in the test diff.
For image and buffer stores the default behaviour on GFX11 and
older is to set all unset components to zero. So if we pass
only X component it will be the same as X000, or XY same as XY00.
This patch simplifies the passed vector of components in InstCombine
by removing zero components from the end.
For image stores it also trims DMask if necessary.
Reviewed By: foad, arsenm
Differential Revision: https://reviews.llvm.org/D146737
Re-land D145441 with data layout upgrade code fixed to not break OpenMP.
This reverts commit 3f2fbe92d0f40bcb46db7636db9ec3f7e7899b27.
Differential Revision: https://reviews.llvm.org/D149776