570 Commits

Author SHA1 Message Date
Matt Arsenault
9e574a3936 DAG: Fix expansion of bf16 sourced extloads
Also fix assorted vector extload failures for AMDGPU.
2023-12-20 19:24:27 +07:00
James Y Knight
137f785fa6
[AMDGPU] Set MaxAtomicSizeInBitsSupported. (#75185)
This will result in larger atomic operations getting expanded to
`__atomic_*` libcalls via AtomicExpandPass, which matches what Clang
already does in the frontend.

While AMDGPU currently disables the use of all libcalls, I've changed it
to instead disable all of them _except_ the atomic ones. Those are
already emitted by the Clang frontend, and enabling them in the
backend allows the same behavior there.
2023-12-18 16:51:06 -05:00
Piotr Sobczak
6eec80133b
[AMDGPU] Min/max changes for GFX12 (#75214)
Co-authored-by: Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>
2023-12-13 14:18:10 +01:00
Matt Arsenault
db8b85ac58
AMDGPU: Support llvm.exp10 (#65860) 2023-12-02 21:56:35 +07:00
Sander de Smalen
81b7f115fb
[llvm][TypeSize] Fix addition/subtraction in TypeSize. (#72979)
It seems TypeSize is currently broken in the sense that:

  TypeSize::Fixed(4) + TypeSize::Scalable(4) => TypeSize::Fixed(8)

without failing its assert that explicitly tests for this case:

  assert(LHS.Scalable == RHS.Scalable && ...);

The reason this fails is that `Scalable` is a static method of class
TypeSize, and LHS and RHS are both objects of class TypeSize. So this is
evaluating whether the pointer to the function Scalable == the pointer to
the function Scalable, which is always true because LHS and RHS have the
same class.

This patch fixes the issue by renaming `TypeSize::Scalable` ->
`TypeSize::getScalable`, as well as `TypeSize::Fixed` to
`TypeSize::getFixed`, so that it no longer clashes with the member
variable in FixedOrScalableQuantity.

The new methods now also better match the coding standard, which
specifies that:
* Variable names should be nouns (as they represent state)
* Function names should be verb phrases (as they represent actions)
2023-11-22 08:52:53 +00:00
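To make the TypeSize bug in commit 81b7f115fb concrete, here is a minimal, self-contained C++17 sketch. `Quantity`, `OldSize`, `NewSize`, `oldSameKind` and `addSizes` are hypothetical simplifications of FixedOrScalableQuantity and TypeSize, not the real LLVM classes. The point it illustrates is that a static member function in the derived class hides the base-class data member of the same name, so `L.Scalable == R.Scalable` silently compares two function addresses.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, heavily simplified stand-ins for FixedOrScalableQuantity and
// TypeSize; the real LLVM classes are templates with many more members.
struct Quantity {
  uint64_t Value;
  bool Scalable; // data member: true if Value is a multiple of vscale
};

// Before the fix: the static factory `Scalable` hides Quantity::Scalable.
struct OldSize : Quantity {
  static OldSize Fixed(uint64_t V) { return OldSize{{V, false}}; }
  static OldSize Scalable(uint64_t V) { return OldSize{{V, true}}; }
};

// Here `L.Scalable` names the static member function, so this compares the
// address of OldSize::Scalable with itself and is always true.
bool oldSameKind(const OldSize &L, const OldSize &R) {
  return L.Scalable == R.Scalable;
}

// After the fix: the factories are verb phrases and no longer hide the member.
struct NewSize : Quantity {
  static NewSize getFixed(uint64_t V) { return NewSize{{V, false}}; }
  static NewSize getScalable(uint64_t V) { return NewSize{{V, true}}; }
};

// `L.Scalable` is the bool data member again, so mixing kinds is caught.
NewSize addSizes(const NewSize &L, const NewSize &R) {
  assert(L.Scalable == R.Scalable && "cannot add fixed and scalable sizes");
  return NewSize{{L.Value + R.Value, L.Scalable}};
}
```

With the old naming, `oldSameKind(OldSize::Fixed(4), OldSize::Scalable(4))` returns true, which is exactly the assert that never fired.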
Acim-Maravic
f3138524db
[AMDGPU] Generic lowering for rint and nearbyint (#69596)
There are three different rounding intrinsics that are lowered to the
same instruction.

Co-authored-by: Acim Maravic <acim.maravic@amd.com>
2023-11-14 18:49:21 +01:00
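A minimal sketch of why rint and nearbyint (commit f3138524db) can share one lowering, assuming the default round-to-nearest-even mode: they return the same values and differ only in whether the inexact exception may be raised, which LLVM's default floating-point environment does not model. The commit does not name the third intrinsic, and `roundHalfEven` and `roundingVariantsAgree` are hypothetical helpers, not AMDGPU code.

```cpp
#include <cassert>
#include <cfenv>
#include <cmath>

// Reference round-half-to-even, written without consulting the FP environment.
double roundHalfEven(double X) {
  if (std::fabs(X - std::trunc(X)) == 0.5)  // exact tie, e.g. 2.5 or -3.5
    return 2.0 * std::round(X / 2.0);       // pick the even neighbour
  return std::round(X);                     // not a tie: plain nearest
}

// Under the default rounding mode, rint and nearbyint agree with each other
// and with round-half-to-even.
bool roundingVariantsAgree(double X) {
  assert(std::fegetround() == FE_TONEAREST && "assumes the default rounding mode");
  double R = std::rint(X);
  return R == std::nearbyint(X) && R == roundHalfEven(X);
}
```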
Diana
7f5d59b38d
[AMDGPU] ISel for @llvm.amdgcn.cs.chain intrinsic (#68186)
The @llvm.amdgcn.cs.chain intrinsic is essentially a call. The call
parameters are bundled up into 2 intrinsic arguments, one for those that
should go in the SGPRs (the 3rd intrinsic argument), and one for those
that should go in the VGPRs (the 4th intrinsic argument). Both will
often be some kind of aggregate.

Both instruction selection frameworks have some internal representation
for intrinsics (G_INTRINSIC[_WITH_SIDE_EFFECTS] for GlobalISel,
ISD::INTRINSIC_[VOID|WITH_CHAIN] for DAGISel), but we can't use those
because aggregates are dissolved very early on during ISel and we'd lose
the inreg information. Therefore, this patch short-circuits both the
IRTranslator and SelectionDAGBuilder to lower this intrinsic as a call
from the very start. It tries to use the existing infrastructure as much
as possible, by calling into the code for lowering tail calls.

This has already gone through a few rounds of review in Phab:

Differential Revision: https://reviews.llvm.org/D153761
2023-11-06 12:30:07 +01:00
Changpeng Fang
8ceb72ffe5
[AMDGPU] make v32i16/v32f16 legal (#70484)
Some upcoming intrinsics will use these new types.
2023-10-27 15:28:31 -07:00
Pierre van Houtryve
40a426fac6
[AMDGPU] Constant fold FMAD_FTZ (#69443)
Solves #68315
2023-10-19 16:05:51 +02:00
Jay Foad
21c2ba4bdb [GlobalISel] Remove TargetLowering::isConstantUnsignedBitfieldExtractLegal
Use LegalizerInfo::isLegalOrCustom instead.

Differential Revision: https://reviews.llvm.org/D116807
2023-09-27 15:58:01 +01:00
Matt Arsenault
1328a8534b
AMDGPU: Fix handling of -0 in round lowering (#65761) 2023-09-19 09:14:17 +03:00
Matt Arsenault
edecb60481 Reapply "AMDGPU: Drop and auto-upgrade llvm.amdgcn.ldexp to llvm.ldexp"
This reverts commit d9333e360a7c52587ab6e4328e7493b357fb2cf3.
2023-09-13 08:38:48 +03:00
Matt Arsenault
c48248d7f9 AMDGPU: Teach valueIsKnownNeverF32Denorm about frexp
https://reviews.llvm.org/D158130
2023-09-12 23:23:10 +03:00
Matt Arsenault
72a7024add AMDGPU: Correctly lower llvm.sqrt.f32
Make codegen emit correctly rounded sqrt by default.

Emit the fast (but only somewhat fast) expansion in AMDGPUCodeGenPrepare
based on !fpmath, like the fdiv case. Hack around visitation-ordering
problems caused by AMDGPUCodeGenPrepare using forward iteration instead of
a well-behaved combiner.

https://reviews.llvm.org/D158129
2023-09-12 23:22:54 +03:00
Kazu Hirata
57390c914b [AMDGPU] Use isNullConstant and isOneConstant (NFC) 2023-08-27 08:26:52 -07:00
Bjorn Pettersson
a23e01ada7 [AMDGPU] Fix -Wenum-compare warnings
Avoiding warnings like this when building with GCC:
  warning: enumeral mismatch in conditional expression:
  'llvm::AMDGPUISD::NodeType' vs 'llvm::ISD::NodeType'
  [-Wenum-compare]
2023-08-23 14:24:30 +02:00
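The GCC warning quoted in commit a23e01ada7 comes from a conditional expression whose two arms have different enumeration types. Below is a hypothetical reduction with stand-in enums (not the real `llvm::ISD` and `llvm::AMDGPUISD` definitions); casting both arms to a common integer type is one common way to silence it, and is not necessarily the exact change this commit made.

```cpp
#include <cassert>

// Hypothetical stand-ins for llvm::ISD::NodeType and llvm::AMDGPUISD::NodeType.
namespace ISD { enum NodeType { ADD = 1, SUB = 2, BUILTIN_OP_END = 100 }; }
namespace AMDGPUISD { enum NodeType { FIRST_NUMBER = ISD::BUILTIN_OP_END, MAD_U24, MAD_I24 }; }

// Mixing the two enums directly in ?: is what GCC reports as
// "enumeral mismatch in conditional expression":
//   unsigned Opc = UseMad ? AMDGPUISD::MAD_U24 : ISD::ADD;
// Converting both arms to a common type keeps the intent and avoids the warning.
unsigned selectOpcode(bool UseMad) {
  return UseMad ? static_cast<unsigned>(AMDGPUISD::MAD_U24)
                : static_cast<unsigned>(ISD::ADD);
}
```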
Diana Picus
26dc284498 [AMDGPU] ISel for amdgpu_cs_chain[_preserve] functions
Lower formal arguments and returns for functions with the
`amdgpu_cs_chain` and `amdgpu_cs_chain_preserve` calling conventions:

* Put `inreg` arguments into SGPRs, starting at s0, and other arguments
into VGPRs, starting at v8. No arguments should end up on the stack, if
we don't have enough registers we should error out.

* Lower the return (which is always void) as an S_ENDPGM.

* Set the ScratchRSrc register to s48:51, as described in the docs.

* Set the SP to s32, matching amdgpu_gfx. This might be revisited in a
future patch.

Differential Revision: https://reviews.llvm.org/D153517
2023-08-21 11:16:17 +02:00
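The argument placement described in commit 26dc284498 for amdgpu_cs_chain functions can be sketched as a simple allocator. This is an illustrative model only: it assumes every argument fits in one 32-bit register, and `assignChainArgs`, `ArgLoc` and the register limits passed in are hypothetical, not the real calling-convention tables or AMDGPU register budgets.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Hypothetical location record: which register file, and which register.
struct ArgLoc {
  bool IsSGPR;  // true: sN, false: vN
  unsigned Reg; // register number N
};

// Assigns `inreg` arguments to s0, s1, ... and the rest to v8, v9, ...
// SGPRLimit and VGPRLimit are the first register numbers that may NOT be used
// (hypothetical budgets). Nothing goes on the stack: running out of registers
// yields std::nullopt, standing in for "error out".
std::optional<std::vector<ArgLoc>>
assignChainArgs(const std::vector<bool> &IsInReg, unsigned SGPRLimit,
                unsigned VGPRLimit) {
  assert(VGPRLimit >= 8 && "VGPR arguments start at v8");
  unsigned NextSGPR = 0; // inreg arguments start at s0
  unsigned NextVGPR = 8; // other arguments start at v8
  std::vector<ArgLoc> Locs;
  for (bool InReg : IsInReg) {
    if (InReg) {
      if (NextSGPR >= SGPRLimit)
        return std::nullopt;
      Locs.push_back({true, NextSGPR++});
    } else {
      if (NextVGPR >= VGPRLimit)
        return std::nullopt;
      Locs.push_back({false, NextVGPR++});
    }
  }
  return Locs;
}
```

SGPR and VGPR arguments are numbered independently, so interleaving `inreg` and non-`inreg` arguments does not leave gaps in either register file.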
Matt Arsenault
81b278e613 AMDGPU: Fix fast f32 exp2
Mirror of the previous log changes: OpenCL conformance doesn't accept
interpreting afn as permission to ignore denormal handling, but this was
previously hidden by flag dropping.
2023-08-15 10:48:46 -04:00
Matt Arsenault
4b7b4b9458 AMDGPU: Fix fast f32 log/log10
OpenCL conformance didn't accept interpreting afn as permission to ignore
denormal handling.

https://reviews.llvm.org/D157940
2023-08-15 10:48:46 -04:00
Matt Arsenault
e09b3593ba AMDGPU: Fix fast math log2 f32
Apparently afn doesn't allow you to drop the denormal handling
according to OpenCL conformance. This was hidden by losing the flags
during the library linking process. Fast log is still broken and needs
more work.

https://reviews.llvm.org/D157936
2023-08-15 10:48:46 -04:00
Matt Arsenault
1faa4797ca AMDGPU: Handle unsafe exp.f32 with denormal handling
I somehow missed this path when adding the new expansions. Saves a lot
of instructions for afn + IEEE.

https://reviews.llvm.org/D157867
2023-08-14 18:36:01 -04:00
Matt Arsenault
9a53f5f5c4 AMDGPU: Handle llvm.stacksave and llvm.stackrestore
Not sure if the only valid use is to have stackrestore directly
consume stacksave outputs or not. Handled exactly like a regular stack
pointer so all the edge cases theoretically should work.

https://reviews.llvm.org/D156669
2023-08-11 10:25:01 -04:00
Matt Arsenault
055a7f2512 AMDGPU: Adjust outdated comment 2023-07-31 08:05:13 -04:00
Matt Arsenault
0295513238 AMDGPU: Filter out contract flags when lowering exp
It is unsafe to contract the fsub into the fmul. It also increases
code size by duplicating a constant.
2023-07-20 18:14:24 -04:00
Matt Arsenault
fbe4ff8149 AMDGPU: Partially fix not respecting dynamic denormal mode
The most notable issue was producing v_mad_f32 in functions with the
dynamic mode, since it just ignores the mode. fdiv lowering is still
somewhat broken because it involves a mode switch and we need to query
the original mode.
2023-07-11 15:14:52 -04:00
Amara Emerson
3a80bdb316 [GlobalISel] Remove an erroneous oneuse check in the G_ADD reassociation combine.
This check was unnecessary/incorrect; it was already being done by the target
hook default implementation, and the one in the matcher was checking for a
completely different thing. This change:
 1) Removes the check and updates affected tests which now do some more reassociations.
 2) Modifies the AMDGPU hooks which were stubbed with "return true" to also do the oneuse
    check. Not sure why I didn't do this the first time.
2023-07-10 01:03:12 -07:00
Matt Arsenault
8ee1cc82c9 AMDGPU: Fold out sign bit ops on frexp_exp
The sign bit has no impact on the exponent, so strip these away. Saves
on the source modifier encoding cost. I left the GlobalISel handling
until there's a resolution to issue #62628.

We should do this in instcombine too, but legalization should be
introducing more frexps than it currently is where this would occur.
2023-07-06 10:26:21 -04:00
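The fold in commit 8ee1cc82c9 relies on the fact that the exponent produced by frexp does not depend on the sign of the input. A small C++ check of that property using `std::frexp`; `frexpExp` and `signOpsAreRedundant` are hypothetical helper names, not backend code.

```cpp
#include <cassert>
#include <cmath>

// Exponent part of frexp, i.e. what the frexp_exp operation returns.
int frexpExp(float X) {
  int Exp = 0;
  std::frexp(X, &Exp); // mantissa discarded; only the exponent is kept
  return Exp;
}

// The fold: frexp_exp(fneg X) and frexp_exp(fabs X) both equal frexp_exp(X),
// so the sign-bit operation feeding frexp_exp can be dropped.
bool signOpsAreRedundant(float X) {
  return frexpExp(-X) == frexpExp(X) && frexpExp(std::fabs(X)) == frexpExp(X);
}
```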
Matt Arsenault
5491666248 AMDGPU: Correctly lower llvm.exp.f32
The library expansion has too many paths for all the permutations of
DAZ, unsafe and the 3 exp functions. It's easier to expand it in the
backend when we know all of these things. The library currently misses
the no-infinity check on the overflow, which this handles optimizing
out.

Some of the <3 x half> fast tests regress due to vector widening
dropping flags, which will be fixed separately.

Apparently there is no exp10 intrinsic, but there should be. Adds some
deadish code in preparation for adding one while I'm following along
with the current library expansion.
2023-07-05 17:23:49 -04:00
Matt Arsenault
ed556a1ad5 AMDGPU: Correctly lower llvm.exp2.f32
Previously this did a fast math expansion only.
2023-07-05 17:23:48 -04:00
Matt Arsenault
4e15f378ee AMDGPU: Correctly lower llvm.log.f32 and llvm.log10.f32
Previously we expanded these in a fast-math way and the device
libraries were relying on this behavior. The libraries have a pending
change to switch to the new target intrinsic.

Unlike the library version, this takes advantage of no-infinities on
the result overflow check.
2023-07-05 15:30:35 -04:00
Matt Arsenault
89ccfa1b39 AMDGPU: Use correct lowering for llvm.log2.f32
We previously directly codegened to v_log_f32, which is broken for
denormals. The lowering isn't complicated; you simply need to scale
denormal inputs and adjust the result. Note log and log10 are still
not accurate enough, and will be fixed separately.
2023-06-23 08:37:37 -04:00
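A sketch of the denormal-scaling idea from commit 89ccfa1b39. It assumes a hardware log2 that flushes f32 denormal inputs to zero (modelled by the hypothetical `hwLog2FlushDenorm`) and uses a scale factor of 2^32, chosen here for illustration: inputs below the smallest normal float are multiplied by 2^32, and 32 is subtracted from the result, since log2(x * 2^32) = log2(x) + 32. `log2WithDenormScaling` is likewise a hypothetical name.

```cpp
#include <cassert>
#include <cfloat>
#include <cmath>

// Hypothetical model of a hardware log2 that flushes f32 denormal inputs to
// zero before computing the result.
float hwLog2FlushDenorm(float X) {
  if (std::fpclassify(X) == FP_SUBNORMAL)
    X = std::copysign(0.0f, X);
  return std::log2(X);
}

// Scale inputs below the smallest normal float (FLT_MIN == 2^-126) up by 2^32
// so the "hardware" sees a normal value, then undo the scaling in the result.
// Zero and negative inputs also take the scaled path; their results (-inf and
// NaN) are unchanged by it.
float log2WithDenormScaling(float X) {
  const bool NeedsScale = X < FLT_MIN;
  const float In = NeedsScale ? X * 0x1p32f : X; // exact: multiply by 2^32
  const float R = hwLog2FlushDenorm(In);
  return NeedsScale ? R - 32.0f : R;
}
```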
Matt Arsenault
d9333e360a Revert "AMDGPU: Drop and auto-upgrade llvm.amdgcn.ldexp to llvm.ldexp"
This reverts commit 1159c670d40e3ef302264c681fe7e0268a550874.

Accidentally pushed wrong patch
2023-06-16 18:13:07 -04:00
Matt Arsenault
1159c670d4 AMDGPU: Drop and auto-upgrade llvm.amdgcn.ldexp to llvm.ldexp 2023-06-16 18:06:27 -04:00
Matt Arsenault
28f3edd2be AMDGPU: Add llvm.amdgcn.exp2 intrinsic
Provide direct access to v_exp_f32 and v_exp_f16, so we can start
correctly lowering the generic exp intrinsics.

Unfortunately, we have to break from the usual naming convention of
matching the instruction name and stripping the v_ prefix. exp is
already taken by the export intrinsic. On the clang builtin side, we
have a choice of maintaining the convention to the instruction name,
or following the intrinsic name.
2023-06-15 07:00:07 -04:00
Matt Arsenault
d0923a7739 AMDGPU: Correct constants used in fast math log expansion
The division between float constants was done with less
precision. Performing the divide in double and truncating to float
provides the same value as used in the library fast math expansion.
2023-06-12 21:11:41 -04:00
Matt Arsenault
eccc89b26c AMDGPU: Add llvm.amdgcn.log intrinsic
This will map directly to the hardware instruction which does not
handle denormals for f32. This will allow moving the generic intrinsic
to be lowered correctly. Also handles selecting the f16 version, but
there's no reason to use it over the generic intrinsic.
2023-06-12 21:10:30 -04:00
Matt Arsenault
abff7668ab AMDGPU: Implement known bits functions for min3/max3/med3 2023-06-10 10:58:44 -04:00
Matt Arsenault
4e4c351ae5 AMDGPU: Avoid endpgm in middle of block for fallback trap lowering.
This was inserting an s_endpgm in the middle of the block when it has
to be a terminator. Split the block and insert a branch to a new block
with the trap if it's not in a terminator position.

Fixes verifier error on LDS in function with no trap support (and
other trap sources).
2023-06-09 21:04:38 -04:00
Amara Emerson
086601eac2 [GlobalISel] Implement some binary reassociations, G_ADD for now
- (op (op X, C1), C2) -> (op X, (op C1, C2))
- (op (op X, C1), Y) -> (op (op X, Y), C1)

Some code duplication with the G_PTR_ADD reassociations unfortunately but no
easy way to avoid it that I can see.

Differential Revision: https://reviews.llvm.org/D150230
2023-06-08 21:14:58 -07:00
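The two G_ADD reassociations listed in commit 086601eac2, sketched on a hypothetical miniature expression tree rather than GlobalISel's MachineInstr API. Constants are `uint64_t` so that C1 + C2 wraps like G_ADD. The first rule produces (op C1, C2), which this sketch folds immediately; target hooks and one-use checks are omitted. `Expr`, `cst`, `var`, `add` and `reassociateAdd` are all illustrative names.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>

// Hypothetical mini IR for integer adds.
struct Expr {
  enum Kind { Const, Var, Add } K = Const;
  uint64_t C = 0;                   // value when K == Const
  char Name = 0;                    // name when K == Var
  std::shared_ptr<const Expr> L, R; // operands when K == Add
};
using ExprP = std::shared_ptr<const Expr>;

ExprP cst(uint64_t V) { auto E = std::make_shared<Expr>(); E->K = Expr::Const; E->C = V; return E; }
ExprP var(char N) { auto E = std::make_shared<Expr>(); E->K = Expr::Var; E->Name = N; return E; }
ExprP add(ExprP A, ExprP B) {
  auto E = std::make_shared<Expr>();
  E->K = Expr::Add;
  E->L = std::move(A);
  E->R = std::move(B);
  return E;
}

// One reassociation step at the root of E.
ExprP reassociateAdd(const ExprP &E) {
  if (E->K != Expr::Add || E->L->K != Expr::Add || E->L->R->K != Expr::Const)
    return E; // root is not (add (add X, C1), _)
  const ExprP &X = E->L->L, &C1 = E->L->R, &Rhs = E->R;
  if (Rhs->K == Expr::Const)            // (add (add X, C1), C2)
    return add(X, cst(C1->C + Rhs->C)); //   -> (add X, C1 + C2), folded
  return add(add(X, Rhs), C1);          // (add (add X, C1), Y) -> (add (add X, Y), C1)
}
```

Moving the constant outward in the second rule is what lets a later application of the first rule merge it with another constant.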
Matt Arsenault
c01f284fbb AMDGPU: Fix regressions in integer mad matching
Undo the canonicalize done in
0cfc6510323fbb5a56a5de23cbc65f7cc30fd34c. Restores some regressed
matching of integer mad. The selection patterns for the actual mads
don't seem to be properly commuting, so some of the commuted cases are
still missed.

Fixes: SWDEV-363009
2023-06-08 16:48:47 -04:00
Matt Arsenault
3d0350b762 AMDGPU: Add MF independent version of getImplicitParameterOffset 2023-06-07 08:26:31 -04:00
Matt Arsenault
bc61bc8d6a AMDGPU: Use available subtarget member 2023-06-07 08:26:31 -04:00
Matt Arsenault
eece6ba283 IR: Add llvm.ldexp and llvm.experimental.constrained.ldexp intrinsics
AMDGPU has native instructions and target intrinsics for this, but
these really should be subject to legalization and generic
optimizations. This will enable legalization of f16->f32 on targets
without f16 support.

Implement a somewhat horrible inline expansion for targets without
libcall support. This could be better if we could introduce control
flow (GlobalISel version not yet implemented). Support for strictfp
legalization is less complete but works for the simple cases.
2023-06-06 17:07:18 -04:00
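Commit eece6ba283 mentions an inline ldexp expansion for targets without libcall support but does not show it. As an illustration of the general technique (multiplying by powers of two built directly from exponent bits, in several steps so each factor stays representable), here is a sketch modelled on musl's `scalbnf`. It is not LLVM's expansion, and `exp2Int` and `ldexpByMultiply` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Builds the float 2^N from its bit pattern; requires -126 <= N <= 127.
float exp2Int(int N) {
  assert(N >= -126 && N <= 127 && "must be a normal power of two");
  uint32_t Bits = static_cast<uint32_t>(N + 127) << 23;
  float F;
  std::memcpy(&F, &Bits, sizeof F);
  return F;
}

// ldexp(X, N) == X * 2^N using only multiplications. Large |N| is applied in
// up to three steps so no power-of-two factor is out of range; the extra 2^24
// in the downward step is intended to avoid double rounding when the result
// is subnormal.
float ldexpByMultiply(float X, int N) {
  float Y = X;
  if (N > 127) {
    Y *= 0x1p127f; N -= 127;
    if (N > 127) {
      Y *= 0x1p127f; N -= 127;
      if (N > 127) N = 127; // result overflows to +/-inf anyway
    }
  } else if (N < -126) {
    Y *= 0x1p-102f; N += 102; // 0x1p-102f == 2^-126 * 2^24
    if (N < -126) {
      Y *= 0x1p-102f; N += 102;
      if (N < -126) N = -126; // result underflows to +/-0 anyway
    }
  }
  return Y * exp2Int(N);
}
```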
Jay Foad
a4a3ac10cb [AMDGPU] Remove extract_subvector patterns
Removing them seems to slightly increase code quality as well as
simplifying both the tablegen and C++ parts of the code.

Differential Revision: https://reviews.llvm.org/D149853
2023-06-06 14:04:50 +01:00
Krzysztof Drewniak
faa2c678aa [AMDGPU] Add buffer intrinsics that take resources as pointers
In order to enable the LLVM frontend to better analyze buffer
operations (and to potentially enable more precise analyses on the
backend), define versions of the raw and structured buffer intrinsics
that use `ptr addrspace(8)` instead of `<4 x i32>` to represent their
rsrc arguments.

The new intrinsics are named by replacing `buffer.` with `ptr.buffer.`.

One advantage to these intrinsic definitions is that, instead of
specifying that a buffer load/store will read/write some memory, we
can indicate that the memory read or written will be based on the
pointer argument. This means that, for example, a read from a
`noalias` buffer can be pulled out of a loop that is modifying a
distinct buffer.

In the future, we will define custom PseudoSourceValues that will
allow us to package up the (buffer, index, offset) triples that buffer
intrinsics contain and allow for more precise backend analysis.

This work also enables creating address space 7, which represents
manipulation of raw buffers using native LLVM load and store
instructions.

Where tests simply used a buffer intrinsic while testing some other
code path (such as the tests for VGPR spills), they have been updated
to use the new intrinsic form. Tests that are "about" buffer
intrinsics (for instance, those that ensure that they codegen as
expected) have been duplicated, either within existing files or into
new ones.

Depends on D145441

Reviewed By: arsenm, #amdgpu

Differential Revision: https://reviews.llvm.org/D147547
2023-06-05 16:59:07 +00:00
Elliot Goodrich
ac73c48e09 [llvm] Reduce ComplexDeinterleavingPass.h includes
Remove the unnecessary `"llvm/IR/PatternMatch.h"` include directive from
`ComplexDeinterleavingPass.h` and move it to the corresponding source
file.

Add missing includes that were transitively included by this header to 3
other source files.

This reduces the total number of preprocessing tokens across the LLVM
source files in `lib` from (roughly) 1,964,876,961 to 1,935,091,611 - a
reduction of ~1.52%. This should result in a small improvement in
compilation time.
2023-05-20 17:49:18 +01:00
Thomas Symalla
91a7aa4c9b [AMDGPU] Improve abs modifier usage
If a call to the llvm.fabs intrinsic has users in another reachable
BB, SelectionDAG will not apply the abs modifier to these users and
instead generate a v_and ..., 0x7fffffff instruction.
For fneg instructions, the issue is similar.
This patch implements `AMDGPUIselLowering::shouldSinkOperands`,
which allows CodegenPrepare to call `tryToSinkFreeOperands`.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D150347
2023-05-19 12:02:21 +02:00
Philip Reames
0dc0c27989 [TLI] Add IsZero parameter to storeOfVectorConstantIsCheap [nfc]
Make the decision to consider zero constant stores cheap target specific.  Will be used in an upcoming change for RISCV.
2023-05-17 09:19:01 -07:00
Nicolai Hähnle
ef13308b26 AMDGPU/SDAG: Improve {extract,insert}_subvector lowering for 16-bit vectors
v2:
- simplify the escape to TableGen patterns

Differential Revision: https://reviews.llvm.org/D149841
2023-05-05 10:55:18 +02:00
Sergei Barannikov
e744e51b12 [SelectionDAG] Rename ADDCARRY/SUBCARRY to UADDO_CARRY/USUBO_CARRY (NFC)
This will make them consistent with other overflow-aware nodes.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D148196
2023-04-29 21:59:58 +03:00