126 Commits

Author SHA1 Message Date
Valery Pykhtin
b8025d1482
Reapply "[AMDGPU] Add InstCombine rule for ballot.i64 intrinsic in wave32 mode." (#80303)
Reapply #71556 with added lit test constraint: `REQUIRES: amdgpu-registered-target`.

This reverts commit 9791e5414960f92396582b9e9ee503ac15799312.
2024-02-02 13:09:25 +01:00
Valery Pykhtin
9791e54149
Revert "[AMDGPU] Add InstCombine rule for ballot.i64 intrinsic in wave32 mode." (#78429)
Reverts llvm/llvm-project#71556

Fixes failures:
https://lab.llvm.org/buildbot/#/builders/188/builds/40541
https://lab.llvm.org/buildbot/#/builders/91/builds/21847
https://lab.llvm.org/buildbot/#/builders/98/builds/31671
https://lab.llvm.org/buildbot/#/builders/139/builds/57289
2024-01-17 14:12:07 +01:00
Valery Pykhtin
57b50ef017
[AMDGPU] Add InstCombine rule for ballot.i64 intrinsic in wave32 mode. (#71556)
Substitute it with a ballot.i32 intrinsic whose result is zero-extended to i64.
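
A rough illustration of why the substitution is sound, as a toy Python model rather than LLVM code. `ballot` and `zext_i32_to_i64` are hypothetical helpers, and the 32-lane wave is the stated assumption. A 64-bit ballot over 32 lanes can never set its upper half:

```python
WAVE_SIZE = 32  # assumption: wave32 mode, so only lanes 0..31 exist


def ballot(predicates, result_bits):
    """Toy ballot: bit i of the result is set when lane i's predicate is true."""
    mask = 0
    for lane, pred in enumerate(predicates[:WAVE_SIZE]):
        if pred:
            mask |= 1 << lane
    return mask & ((1 << result_bits) - 1)


def zext_i32_to_i64(value):
    """Zero-extension keeps the low 32 bits and leaves the upper 32 bits clear."""
    return value & 0xFFFFFFFF


predicates = [lane % 3 == 0 for lane in range(WAVE_SIZE)]
ballot_i64 = ballot(predicates, 64)
ballot_i32_zext = zext_i32_to_i64(ballot(predicates, 32))
```

Both values agree for any predicate pattern in this model, because no lane above 31 can contribute a bit.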
2024-01-17 17:02:05 +07:00
Mariusz Sikora
2b83ceee3d
[AMDGPU][GFX12] Default component broadcast store (#76212)
For image and buffer stores the default behaviour on GFX12 is to set all
unset components to the value of the first component. So if we pass only
X component, it will be the same as XXXX, or XY same as XYXX.

This patch simplifies the passed vector of components in InstCombine by
removing components from the end that are equal to the first component.

For image stores it also trims DMask if necessary.
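
The trimming rule can be sketched as a small Python model. This is an illustration only: `trim_broadcast_components` and `expand_gfx12` are hypothetical names, components are plain list items, and DMask handling is not modeled.

```python
def trim_broadcast_components(components):
    """Drop trailing components equal to the first one, keeping at least one."""
    end = len(components)
    while end > 1 and components[end - 1] == components[0]:
        end -= 1
    return components[:end]


def expand_gfx12(components, width=4):
    """Model the default described above: unset components take the first value."""
    return components + [components[0]] * (width - len(components))


stored = ["x", "y", "x", "x"]
trimmed = trim_broadcast_components(stored)
```

In this model, expanding the trimmed vector reproduces the original store, which is the property the simplification relies on.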

---------

Co-authored-by: Mateja Marjanovic <mmarjano@amd.com>
2024-01-12 08:26:08 +01:00
Nikita Popov
9d60e95bcd
[AMDGPU] Use poison instead of undef for non-demanded elements (#75914)
Return poison instead of undef for non-demanded lanes in the AMDGPU
demanded element simplification hook.

Also bail out if dmask is 0, as this case has special semantics:

> If DMASK==0, the TA overrides DMASK=1 and puts zeros in VGPR followed by
> LWE status if exists. TFE status is not generated since the fetch is dropped.
2023-12-20 11:01:59 +01:00
Nikita Popov
9d4557920f [InstCombine] Don't treat undef as poison in demanded element simplification
We can only set PoisonElts if the element is poison, not if it is
undef.
2023-12-19 12:26:48 +01:00
Nikita Popov
a5f3415533 [InstCombine] Replace non-demanded undef vector with poison
If an operand (esp to shufflevector or insertelement) is not
demanded, canonicalize it from undef to poison.
2023-12-18 16:12:37 +01:00
Nikita Popov
e93d324adb [InstCombine] Preserve poison in evaluateInDifferentElementOrder()
Don't unnecessarily replace poison with undef.
2023-12-18 15:36:22 +01:00
Nikita Popov
6c9813aa02 [InstCombine] Check for poison instead of undef in shuffle combine
Otherwise we may replace undef with poison.

Note that a lot of tests regressing here already have variants
that use poison instead of undef (often in a separate
inseltpoison file), which is why I'm not adjusting them to the
new pattern.
2023-12-18 15:19:16 +01:00
Mirko Brkušanin
26b14aedb7
[AMDGPU] CodeGen for GFX12 VIMAGE and VSAMPLE instructions (#75488) 2023-12-15 12:40:23 +01:00
Nikita Popov
c00f49cf12 [InstCombine] Remove instcombine-infinite-loop-threshold option
This option has been superseded by the fixpoint verification
functionality.
2023-09-21 15:30:05 +02:00
Matt Arsenault
edecb60481 Reapply "AMDGPU: Drop and auto-upgrade llvm.amdgcn.ldexp to llvm.ldexp"
This reverts commit d9333e360a7c52587ab6e4328e7493b357fb2cf3.
2023-09-13 08:38:48 +03:00
Matt Arsenault
61c8af6792 AMDGPU: InstCombine amdgcn.sqrt.f16 to sqrt.f16
There's nothing special about f16 sqrt handling.

https://reviews.llvm.org/D158090
2023-08-23 20:30:40 -04:00
Matt Arsenault
7c4aa3b37e AMDGPU: InstCombine amdgcn.rcp(amdgcn.sqrt) -> amdgcn.rsq
We currently have some wrong combines in the backend that
approximately do this.

https://reviews.llvm.org/D158002
2023-08-16 10:04:13 -04:00
Matt Arsenault
f19ee76f35 AMDGPU: Add baseline tests for rcp to rsq fold 2023-08-16 10:03:49 -04:00
Kevin P. Neal
1e7c79d362 [FPEnv][InstCombine] Correct strictfp tests.
Correct InstCombine strictfp tests to follow the rules documented
in the LangRef:
https://llvm.org/docs/LangRef.html#constrained-floating-point-intrinsics

Mostly these tests just needed the strictfp attribute on function
definitions. After D154991 the constrained intrinsics have the
strictfp attribute by default so they don't need it here, but other
functions do.

Test changes verified with D146845.
2023-08-02 13:03:10 -04:00
Jay Foad
70eafa391b [InstCombine] Regenerate AMDGPU test checks 2023-07-14 15:28:55 +01:00
Matt Arsenault
5ccfc4543d AMDGPU: Fold away mbcnt.hi in wave32 mode
This will allow libraries to drop some of the special casing based on
wave size.
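
A toy model suggests why the call folds to its second operand in wave32. It assumes that mbcnt.hi(mask, base) returns base plus the number of set mask bits that belong to lanes 32..63 and lie below the current lane; this reading of the semantics is an assumption, not something the commit states, and `mbcnt_hi` is a hypothetical helper.

```python
def mbcnt_hi(mask_hi, base, lane_id):
    """Toy mbcnt.hi: count set bits of mask_hi for lanes 32..lane_id-1, plus base."""
    lanes_below = (1 << lane_id) - 1  # one bit per lane below this one
    lanes_below_hi = (lanes_below >> 32) & 0xFFFFFFFF
    return bin(mask_hi & lanes_below_hi).count("1") + base


# In wave32 every lane id is below 32, so the high half of the lane mask is empty.
wave32_results = {mbcnt_hi(0xFFFFFFFF, 5, lane) for lane in range(32)}
```

Under that assumption every wave32 lane gets back `base` unchanged, regardless of the mask.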
2023-06-30 15:04:03 -04:00
Matt Arsenault
3680b57a88 AMDGPU: Add baseline tests for mbcnt.hi combine 2023-06-30 15:04:03 -04:00
Jay Foad
84313162bf [AMDGPU] Stop replacing amdgcn.ballot(1) with amdgcn.s.getreg(exec)
Rationale:
- It does not enable any further IR simplifications.
- It does not improve the generated code since the isel lowering of
  ballot also has special cases for 0 and 1.
- getreg is "too powerful" since it can read from many different
  registers, so its intrinsic properties have to be set very
  conservatively.

There is also a correctness problem that getreg can read from exec but
it is currently not marked as convergent.

Differential Revision: https://reviews.llvm.org/D153047
2023-06-16 17:15:52 +01:00
Mateja Marjanovic
7047cb5203 [AMDGPU] Trim trailing undefs from the end of image and buffer store
Remove undef values from the end of the vector operand in image and
buffer store instructions.
Also, instead of calling computeKnownFPClass, use only findScalarElement.

Continuation of: 88421ea973916e Trim zero components from buffer and image stores

Differential Revision: https://reviews.llvm.org/D152440
2023-06-15 15:19:36 +02:00
Matt Arsenault
c6aaa0b14f AMDGPU: Perform basic folds on llvm.amdgcn.exp2 2023-06-15 07:01:06 -04:00
Matt Arsenault
6e934f2292 AMDGPU: Add baseline tests for llvm.amdgcn.exp2 folds 2023-06-15 07:01:01 -04:00
Matt Arsenault
10717f9294 AMDGPU: Add basic folds for llvm.amdgcn.log 2023-06-12 21:10:30 -04:00
Matt Arsenault
1269e45b09 AMDGPU: Add baseline instcombine test for llvm.amdgcn.log 2023-06-12 21:10:30 -04:00
Krzysztof Drewniak
faa2c678aa [AMDGPU] Add buffer intrinsics that take resources as pointers
In order to enable the LLVM frontend to better analyze buffer
operations (and to potentially enable more precise analyses on the
backend), define versions of the raw and structured buffer intrinsics
that use `ptr addrspace(8)` instead of `<4 x i32>` to represent their
rsrc arguments.

The new intrinsics are named by replacing `buffer.` with `ptr.buffer.`.

One advantage to these intrinsic definitions is that, instead of
specifying that a buffer load/store will read/write some memory, we
can indicate that the memory read or written will be based on the
pointer argument. This means that, for example, a read from a
`noalias` buffer can be pulled out of a loop that is modifying a
distinct buffer.

In the future, we will define custom PseudoSourceValues that will
allow us to package up the (buffer, index, offset) triples that buffer
intrinsics contain and allow for more precise backend analysis.

This work also enables creating address space 7, which represents
manipulation of raw buffers using native LLVM load and store
instructions.

Where tests simply used a buffer intrinsic while testing some other
code path (such as the tests for VGPR spills), they have been updated
to use the new intrinsic form. Tests that are "about" buffer
intrinsics (for instance, those that ensure that they codegen as
expected) have been duplicated, either within existing files or into
new ones.

Depends on D145441

Reviewed By: arsenm, #amdgpu

Differential Revision: https://reviews.llvm.org/D147547
2023-06-05 16:59:07 +00:00
Mateja Marjanovic
88421ea973 [AMDGPU] Trim zero components from buffer and image stores
For image and buffer stores the default behaviour on GFX11 and
older is to set all unset components to zero. So if we pass
only X component it will be the same as X000, or XY same as XY00.

This patch simplifies the passed vector of components in InstCombine
by removing zero components from the end.

For image stores it also trims DMask if necessary.
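
A minimal Python sketch of the zero-trimming rule, assuming integer components for simplicity. `trim_zero_components` and `expand_zero_fill` are hypothetical names, and DMask handling is not modeled.

```python
def trim_zero_components(components):
    """Drop trailing zero components, keeping at least one component."""
    end = len(components)
    while end > 1 and components[end - 1] == 0:
        end -= 1
    return components[:end]


def expand_zero_fill(components, width=4):
    """Model the default described above: unset components are written as zero."""
    return components + [0] * (width - len(components))


stored = [7, 3, 0, 0]
trimmed = trim_zero_components(stored)
```

Zero-filling the trimmed vector gives back the original components in this model, so the shorter store is equivalent.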

Reviewed by: arsenm, foad, nhaehnle, piotr
2023-06-05 12:30:21 +02:00
Matt Arsenault
8609df7c6e AMDGPU: Refine undef handling for llvm.amdgcn.class intrinsic
This barely matters since 99% are converted to the generic intrinsic now,
and the only real difference is the target intrinsic supports a variable
test mask. Start propagating poison. Prefer folding to a defined result (false)
for an undef test mask. Propagate undef for the first operand.
2023-06-01 18:35:55 -04:00
Matt Arsenault
9ef1333bf4 AMDGPU: Replace certain llvm.amdgcn.class uses with llvm.is.fpclass
Most transforms should now be performed on llvm.is.fpclass. Unlike the
generic intrinsic, this supports variable test masks.
2023-05-24 21:49:52 +01:00
Matt Arsenault
f74bb32694 AMDGPU: Add some new tests for class undef/poison handling 2023-05-24 16:54:39 +01:00
Mateja Marjanovic
9c8c31eea4 Revert "[AMDGPU] Trim zero components from buffer and image stores"
This reverts commit 3181a6e3e7dae9292782216a55c5e1f0583c1668.
2023-05-18 17:02:01 +02:00
Matt Arsenault
8f3e64624c AMDGPU: Fold fmed3 of fpext sources to f16 fmed3
InstCombine already does this for minnum/maxnum. If we
also apply this to fmed3, we don't need to explicitly
use 16-bit fmed3 if we're not sure the target
supports 16-bit instructions yet.
2023-05-18 08:34:46 +01:00
Tobias Hieta
f84bac329b
[NFC][Py Reformat] Reformat lit.local.cfg python files in llvm
This is a follow-up to b71edfaa4ec3c998aadb35255ce2f60bba2940b0
since I forgot the lit.local.cfg files in that one.

Reformatting is done with `black`.

If you end up having problems merging this commit because you
have made changes to a python file, the best way to handle that
is to run git checkout --ours <yourfile> and then reformat it
with black.

If you run into any problems, post to discourse about it and
we will try to help.

RFC Thread below:

https://discourse.llvm.org/t/rfc-document-and-standardize-python-code-style

Reviewed By: barannikov88, kwk

Differential Revision: https://reviews.llvm.org/D150762
2023-05-17 17:03:15 +02:00
Nikita Popov
605f0a46dc [InstCombine] Use IRBuilder in evaluateInDifferentElementOrder()
This ensures that the new instructions get reprocessed in the same
iteration.

This should be largely NFC, apart from worklist order effects and
naming changes, as seen in the test diff.
2023-05-17 15:07:36 +02:00
Mateja Marjanovic
3181a6e3e7 [AMDGPU] Trim zero components from buffer and image stores
For image and buffer stores the default behaviour on GFX11 and
older is to set all unset components to zero. So if we pass
only X component it will be the same as X000, or XY same as XY00.

This patch simplifies the passed vector of components in InstCombine
by removing zero components from the end.

For image stores it also trims DMask if necessary.

Reviewed By: foad, arsenm
Differential Revision: https://reviews.llvm.org/D146737
2023-05-15 18:23:27 +02:00
Matt Arsenault
cc54f8eec7 AMDGPU: Add baseline tests for fmed3 shrinking combine 2023-05-10 08:02:11 +01:00
Krzysztof Drewniak
f0415f2a45 Re-land "[AMDGPU] Define data layout entries for buffers"
Re-land D145441 with data layout upgrade code fixed to not break OpenMP.

This reverts commit 3f2fbe92d0f40bcb46db7636db9ec3f7e7899b27.

Differential Revision: https://reviews.llvm.org/D149776
2023-05-03 19:43:56 +00:00
Krzysztof Drewniak
3f2fbe92d0 Revert "[AMDGPU] Define data layout entries for buffers"
This reverts commit f9c1ede2543b37fabe9f2d8f8fed5073c475d850.

Differential Revision: https://reviews.llvm.org/D149758
2023-05-03 16:11:00 +00:00
Krzysztof Drewniak
f9c1ede254 [AMDGPU] Define data layout entries for buffers
Per discussion at
https://discourse.llvm.org/t/representing-buffer-descriptors-in-the-amdgpu-target-call-for-suggestions/68798,
we define two new address spaces for AMDGCN targets.

The first is address space 7, a non-integral address space (which was
already in the data layout) that has 160-bit pointers (which are
256-bit aligned) and uses a 32-bit offset. These pointers combine a
128-bit buffer descriptor and a 32-bit offset, and will be usable with
normal LLVM operations (load, store, GEP). However, they will be
rewritten out of existence before code generation.

The second of these is address space 8, the address space for "buffer
resources". These will be used to represent the resource arguments to
buffer instructions, and new buffer intrinsics will be defined that
take them, typed as ptr addrspace(8), instead of <4 x i32> as resource
arguments. These pointers are 128 bits long (with the same
alignment). They must not be used as the arguments to getelementptr or
otherwise used in address computations, since they can have
arbitrarily complex inherent addressing semantics that can't be
represented in LLVM. Even though, like their address space 7 cousins,
these pointers have deterministic ptrtoint/inttoptr semantics, they
are defined to be non-integral in order to prevent optimizations that
rely on pointers being a [0, [addr_max]] value from applying to them.
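
For orientation, the two address spaces can be written as data layout pointer entries. The strings used below (`p7:160:256:256:32` and `p8:128:128`) are recalled from the AMDGPU data layout rather than quoted from this commit, so treat them as an assumption; `parse_pointer_spec` is a hypothetical helper, not an LLVM API.

```python
def parse_pointer_spec(spec):
    """Parse a 'p<n>:<size>:<abi>[:<pref>[:<idx>]]' data layout pointer entry."""
    head, *fields = spec.split(":")
    if not head.startswith("p") or len(fields) < 2:
        raise ValueError(f"not a pointer entry: {spec!r}")
    size, abi = int(fields[0]), int(fields[1])
    return {
        "address_space": int(head[1:] or 0),
        "size": size,
        "abi": abi,
        # Preferred alignment defaults to the ABI alignment when omitted.
        "pref": int(fields[2]) if len(fields) > 2 else abi,
        # Index width defaults to the pointer size when omitted.
        "index": int(fields[3]) if len(fields) > 3 else size,
    }


as7 = parse_pointer_spec("p7:160:256:256:32")  # assumed entry for address space 7
as8 = parse_pointer_spec("p8:128:128")         # assumed entry for address space 8
```

The address space 7 size is the 128-bit descriptor plus the 32-bit offset, and its index width matches that offset.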

Future work includes:
- Defining new buffer intrinsics that take ptr addrspace(8) resources.
- A late rewrite to turn address space 7 operations into buffer
intrinsics and offset computations.

This commit also updates the "fallback address space" for buffer
intrinsics to the buffer resource, and updates the alias analysis
table.

Depends on D143437

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D145441
2023-05-03 15:25:58 +00:00
ManuelJBrito
8b56da5e9f [IR] Change shufflevector undef mask to poison
With this patch an undefined mask in a shufflevector will be printed as poison.
This change is done to support the new shufflevector semantics
for undefined mask elements.

Differential Revision: https://reviews.llvm.org/D149210
2023-04-27 14:41:10 +01:00
Matt Arsenault
1bce1beac4 AMDGPU: Reduce number of calls to computeKnownFPClass and pass all arguments
Makes assumes work for this case.
2023-04-26 13:02:17 -04:00
Michael Liao
72fc08a541 [InstCombine] Teach alloca replacement to handle addrspacecast
- As the address space cast may not be valid on a specific target,
  `addrspacecast` is not handled when an `alloca` is able to be replaced
  with the source of memcpy/memmove. This patch addresses that by
  querying a target hook on whether that address space cast is valid.
  For example, on most GPU targets, the cast from a global pointer to a
  generic pointer is valid.
- If that cast is allowed (by querying `isValidAddrSpaceCast`), the
  replacement is enhanced to handle that `addrspacecast` as well.

Reviewed By: yaxunl

Differential Revision: https://reviews.llvm.org/D147025
2023-04-11 11:47:37 -04:00
Roman Lebedev
5fb9e84047
[NFC] Port all InstCombine tests to -passes= syntax 2022-12-08 02:38:44 +03:00
Matt Arsenault
8fcf387202 InstCombine: Convert target tests to opaque pointers
The opaquify script deleted a few declarations for some reason which
were manually deleted.
2022-12-01 21:56:14 -05:00
Matt Arsenault
e04d2e20c3 AMDGPU: Add some baseline tests for llvm.amdgcn.trig.preop folding 2022-11-18 09:14:19 -08:00
Matt Arsenault
a58541f14d AMDGPU: Fold llvm.amdgcn.sqrt(undef) 2022-11-11 17:02:19 -08:00
Matt Arsenault
3e4280c04d AMDGPU: Disable some class simplifications for strictfp 2022-11-11 09:22:37 -08:00
Nikita Popov
fcfc31fffb [InstCombine] Convert some tests to opaque pointers (NFC)
Conversion was performed (without manual fixup) using:
https://gist.github.com/nikic/98357b71fd67756b0f064c9517b62a34
2022-10-28 13:07:30 +02:00
Jay Foad
bfcfd53b92 [AMDGPU] Add GFX11 llvm.amdgcn.permlane64 intrinsic
Compared to permlane16, permlane64 has no BC input because it has no
boundary conditions, no fi input because the instruction acts as if FI
were always enabled, and no OLD input because it always writes to every
active lane.

Also use the new intrinsic in the atomic optimizer pass.

Differential Revision: https://reviews.llvm.org/D127662
2022-06-13 21:12:11 +01:00
Mariusz Sikora
2417de2758 [AMDGPU] Use d16 flag for image.sample instructions
The image.sample instruction can be forced to return a half type instead of
float when the d16 flag is enabled.

This patch adds a new pattern in InstCombine to detect whether the output of
image.sample is used later only by an fptrunc that converts the type
from float to half. If the pattern is detected, the fptrunc and image.sample
are combined into a single image.sample that returns a half type.
Later, during lowering, the d16 flag is added to the image sample intrinsic.
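
The pattern check can be sketched with a toy instruction model. Everything here is hypothetical: `Inst`, `can_fold_to_half_sample` and `fold_to_half_sample` are illustrative stand-ins, not LLVM classes or the patch's actual code.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Inst:
    """Hypothetical stand-in for an IR instruction; not an LLVM class."""
    opcode: str
    result_type: str
    users: list = field(default_factory=list)


def can_fold_to_half_sample(inst):
    """True when a float image.sample is consumed only by fptrunc-to-half users."""
    return (
        inst.opcode == "image.sample"
        and inst.result_type == "float"
        and len(inst.users) > 0
        and all(u.opcode == "fptrunc" and u.result_type == "half" for u in inst.users)
    )


def fold_to_half_sample(inst):
    """Return the combined half-typed sample, or the original if the pattern fails."""
    if not can_fold_to_half_sample(inst):
        return inst
    return Inst("image.sample", "half")


sample = Inst("image.sample", "float", users=[Inst("fptrunc", "half")])
mixed = Inst(
    "image.sample", "float", users=[Inst("fptrunc", "half"), Inst("fadd", "float")]
)
```

A sample with any non-fptrunc user is left alone in this sketch, since the float result is still needed.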

Differential Revision: https://reviews.llvm.org/D124232
2022-05-05 06:29:19 +02:00