1227 Commits

Author SHA1 Message Date
Pierre van Houtryve
e9e3868707
[AMDGPU] Correctly restore FP mode in FDIV32 lowering (#66346)
Addresses the FIXME for both DAGISel and GISel.
2023-09-15 08:11:01 +02:00
Arthur Eubanks
0a1aa6cda2
[NFC][CodeGen] Change CodeGenOpt::Level/CodeGenFileType into enum classes (#66295)
This will make it easy for callers to see issues with and fix up calls
to createTargetMachine after a future change to the params of
TargetMachine.

This matches other nearby enums.

For downstream users, this should be a fairly straightforward
replacement,
e.g. s/CodeGenOpt::Aggressive/CodeGenOptLevel::Aggressive
or s/CGFT_/CodeGenFileType::
2023-09-14 14:10:14 -07:00
Pierre van Houtryve
3d0353793b
[AMDGPU] Fix HasFP32Denormals check in FDIV32 lowering (#66212)
Fixes SWDEV-403219
2023-09-14 08:47:10 +02:00
Matt Arsenault
edecb60481 Reapply "AMDGPU: Drop and auto-upgrade llvm.amdgcn.ldexp to llvm.ldexp"
This reverts commit d9333e360a7c52587ab6e4328e7493b357fb2cf3.
2023-09-13 08:38:48 +03:00
Jeffrey Byrnes
db47264ab3
Revert "[AMDGPU]: Allow combining into v_dot4" (#66158)
This reverts commit 7fda1b74be4a173031192d8516869e87e6b7582d.
2023-09-12 16:57:17 -07:00
Matt Arsenault
72a7024add AMDGPU: Correctly lower llvm.sqrt.f32
Make codegen emit correctly rounded sqrt by default.

Emit the fast but only kind of fast expansion in AMDGPUCodeGenPrepare
based on !fpmath, like the fdiv case. Hack around visitation ordering
problems from AMDGPUCodeGenPrepare using forward iteration instead of
a well behaved combiner.

https://reviews.llvm.org/D158129
2023-09-12 23:22:54 +03:00
Kazu Hirata
0bb49afeaf [AMDGPU] Fix an unused variable warning
This patch fixes:

  llvm/lib/Target/AMDGPU/SIISelLowering.cpp:2493:33: error: unused
  variable 'UserSGPRInfo' [-Werror,-Wunused-variable]
2023-09-12 12:06:36 -07:00
Austin Kerbow
343be5132e [AMDGPU] Add utilities to track number of user SGPRs. NFC.
Factor out and unify some common code that calculates and tracks the
number of user SGRPs.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D159439
2023-09-12 08:52:30 -07:00
Matt Arsenault
17bd80601e AMDGPU: Implement llvm.get.fpmode
Currently s_getreg_b32 is missing the possible mode use. Really we
need separate pseudos for mode-only accesses, but leave this as a
pre-existing issue.

https://reviews.llvm.org/D152710
2023-09-10 10:19:19 +03:00
Jay Foad
8669a9f93a
[AMDGPU] Cope with SelectionDAG::UpdateNodeOperands returning a different SDNode (#65765)
SITargetLowering::adjustWritemask calls SelectionDAG::UpdateNodeOperands
to update an EXTRACT_SUBREG node in-place to refer to a new IMAGE_LOAD
instruction, before we delete the old IMAGE_LOAD instruction. But in
UpdateNodeOperands can do CSE on the fly and return a different
EXTRACT_SUBREG node, so the original EXTRACT_SUBREG node would still
exist and would refer to the old deleted IMAGE_LOAD instruction. This
caused errors like:

t31: v3i32,ch = <<Deleted Node!>> # D:1
This target-independent node should have been selected!
UNREACHABLE executed at lib/CodeGen/SelectionDAG/InstrEmitter.cpp:1209!

Fix it by detecting the CSE case and replacing all uses of the original
EXTRACT_SUBREG node with the CSE'd one.

Recommit with a fix for a use-after-free bug in the first version of
this patch (#65340) which was caught by asan.
2023-09-08 16:16:02 +01:00
Jeffrey Byrnes
5044531afd
[AMDGPU] Teach CalculateByteProvider about AMDGPUISD::PERM (#65547)
As a standalone patch, it has limited effect. However, it is necessary
as it supports upcoming commits.
2023-09-07 15:13:42 -07:00
Jeffrey Byrnes
7fda1b74be [AMDGPU]: Allow combining into v_dot4
Differential Revision: https://reviews.llvm.org/D155995

Change-Id: I794f540217f0f84141338757b41b1be0493c7207
2023-09-07 12:58:48 -07:00
Florian Mayer
42a1d16179 Revert "[AMDGPU] Cope with SelectionDAG::UpdateNodeOperands returning a different SDNode (#65340)"
This reverts commit 11171d81aeafb0c2818f288900423e366a2787fc.

Broke ASAN bot.
2023-09-06 13:16:55 -07:00
Jay Foad
11171d81ae
[AMDGPU] Cope with SelectionDAG::UpdateNodeOperands returning a different SDNode (#65340)
SITargetLowering::adjustWritemask calls SelectionDAG::UpdateNodeOperands
to update an EXTRACT_SUBREG node in-place to refer to a new IMAGE_LOAD
instruction, before we delete the old IMAGE_LOAD instruction. But in
UpdateNodeOperands can do CSE on the fly and return a different
EXTRACT_SUBREG node, so the original EXTRACT_SUBREG node would still
exist and would refer to the old deleted IMAGE_LOAD instruction. This
caused errors like:

t31: v3i32,ch = <<Deleted Node!>> # D:1
This target-independent node should have been selected!
UNREACHABLE executed at lib/CodeGen/SelectionDAG/InstrEmitter.cpp:1209!

Fix it by detecting the CSE case and replacing all uses of the original
EXTRACT_SUBREG node with the CSE'd one.
2023-09-06 12:51:44 +01:00
Matt Arsenault
5f8ee45d5a AMDGPU: Implement llvm.get.rounding
There are really two rounding modes, so only return the standard
values if both modes are the same. Otherwise, return a bitmask
representing the two modes.

Annoyingly the register doesn't use the same values as FLT_ROUNDS. Use
a simple integer table we can shift into to convert.

https://reviews.llvm.org/D153158
2023-08-30 14:06:13 -04:00
Kazu Hirata
57390c914b [AMDGPU] Use isNullConstant and isOneConstant (NFC) 2023-08-27 08:26:52 -07:00
Matt Arsenault
e954085f80 AMDGPU: Fix more unsafe rsq formation
Introducing rsq contract flags is wrong, and also requires some level
of approximate functions. AMDGPUCodeGenPrepare already should handle
the f32 cases with appropriate flags, and I don't see how new
situations to handle would arise during legalization (other than cases
involving the rcp intrinsic, which instcombine tries to
handle). AMDGPUCodeGenPrepare does need to learn better handling of
rcp/rsq for f64 though, which we never bothered to handle well.

Removes another obstacle to correctly lowering sqrt.

https://reviews.llvm.org/D158099
2023-08-23 19:28:49 -04:00
Jeffrey Byrnes
d26a06728d [DAG] NFC: Add getBitcastedExtOrTrunc
Simple function which scalarizes Ops then ExtOrTruncs them according to function parameters

Differential Revision: https://reviews.llvm.org/D157733

Change-Id: Ie5215069228f7bf530cd2dbb4bd17cbf409e046a
2023-08-17 14:29:17 -07:00
Matt Arsenault
c8eeee2be2 AMDGPU: Drop unsafe 1/sqrt -> rsq combine
AMDGPUCodeGenPrepare implements a safer version of this that handles
denormals correctly.

https://reviews.llvm.org/D158032
2023-08-16 08:52:17 -04:00
Jeffrey Byrnes
d0e54e377b [AMDGPU] Extend CalculateByteProvider to capture vectors and signed
Differential Revision: https://reviews.llvm.org/D157133

Change-Id: I9ba8727b4ac5a627de2f7d87d2169eb79e01f0ee
2023-08-11 08:47:17 -07:00
Matt Arsenault
9a53f5f5c4 AMDGPU: Handle llvm.stacksave and llvm.stackrestore
Not sure if the only valid use is to have stackrestore directly
consume stacksave outputs or not. Handled exactly like a regular stack
pointer so all the edge cases theoretically should work.

https://reviews.llvm.org/D156669
2023-08-11 10:25:01 -04:00
Jay Foad
c2093b8504 [AMDGPU] Add target features for GDS and GWS
GFX9 subtargets from GFX90A onwards lack GDS but still have GWS.

Differential Revision: https://reviews.llvm.org/D156713
2023-08-02 09:02:07 +01:00
Matt Arsenault
51ec5a2733 AMDGPU: Use available subtarget member 2023-07-31 08:05:12 -04:00
Sameer Sahasrabuddhe
d9847cde48 [GlobalISel] convergent intrinsics
Introduced the convergent equivalent of the existing G_INTRINSIC opcodes:

- G_INTRINSIC_CONVERGENT
- G_INTRINSIC_CONVERGENT_W_SIDE_EFFECTS

Out of the targets that currently have some support for GlobalISel, the patch
assumes that the convergent intrinsics only relevant to SPIRV and AMDGPU.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D154766
2023-07-31 12:15:39 +05:30
Jeffrey Byrnes
391249d1af [AMDGPU] Allow 8,16 bit sources in calculateSrcByte
This is required for many trees produced in practice for i8 CodeGen.

Differential Revision: https://reviews.llvm.org/D155864

Change-Id: Iac01d183d9998b15138bdc7a5051e3bed338e7d9
2023-07-28 09:50:21 -07:00
Matt Arsenault
95e5a461f5 AMDGPU: Always custom lower extract_subvector
The patterns were ripped out in
a4a3ac10cb1a40ccebed4e81cd7e94f1eb71602d so this always needs to be
custom lowered. I absolutely hate how difficult it is to write tests
for these, I have no doubt there are more of these hidden.

Fixes #64142
2023-07-27 08:46:44 -04:00
Sameer Sahasrabuddhe
7c760b224b Restore "[GlobalISel] GIntrinsic subclass to represent intrinsics in Generic Machine IR"
Some opcodes in generic MIR represent calls to intrinsics, where the intrinsic
ID is the first non-def operand to the instruction. These are now represented as
a subclass of GenericMachineInstr, and the method MachineInstr::getIntrinsicID()
is now moved to this subclass GIntrinsic.

Some target-defined instructions behave like GMIR intrinsics, and have an
Intrinsic::ID operand. But they should not be recognized as generic intrinsics,
and should not use GIntrinsic::getIntrinsicID(). Separated these out by
introducing a new AMDGPU::getIntrinsicID().

Reviewed By: arsenm, Pierre-vh

Differential Revision: https://reviews.llvm.org/D155556

This restores commit baa3386edb11a2f9bcadda8cf58d56f3707c39fa.
Originally reverted in d0f7850b01cf17e50a4f4b00e3b84dded94df6b8.
2023-07-27 14:49:17 +05:30
Sameer Sahasrabuddhe
d0f7850b01 Revert "[GlobalISel] GIntrinsic subclass to represent intrinsics in Generic Machine IR"
This reverts commit baa3386edb11a2f9bcadda8cf58d56f3707c39fa.

The changes did not cover all occurrences of the deteleted function
MachineInstr::getIntrinsicID().
2023-07-27 10:14:24 +05:30
Sameer Sahasrabuddhe
baa3386edb [GlobalISel] GIntrinsic subclass to represent intrinsics in Generic Machine IR
Some opcodes in generic MIR represent calls to intrinsics, where the intrinsic
ID is the first non-def operand to the instruction. These are now represented as
a subclass of GenericMachineInstr, and the method MachineInstr::getIntrinsicID()
is now moved to this subclass GIntrinsic.

Some target-defined instructions behave like GMIR intrinsics, and have an
Intrinsic::ID operand. But they should not be recognized as generic intrinsics,
and should not use GIntrinsic::getIntrinsicID(). Separated these out by
introducing a new AMDGPU::getIntrinsicID().

Reviewed By: arsenm, Pierre-vh

Differential Revision: https://reviews.llvm.org/D155556
2023-07-27 10:00:45 +05:30
Matt Arsenault
e3fd8f83a8 AMDGPU: Correctly expand f64 sqrt intrinsic
rocm-device-libs and llpc were avoiding using f64 sqrt
intrinsics in favor of their own expansions. Port the
expansion into the backend. Both of these users should be
updated to call the intrinsic instead.

The library and llpc expansions are slightly different.
llpc uses an ldexp to do the scale; the library uses a multiply.

Use ldexp to do the scale instead of the multiply.
I believe v_ldexp_f64 and v_mul_f64 are always the same number of
cycles, but it's cheaper to materialize the 32-bit integer constant
than the 64-bit double constant.

The libraries have another fast version of sqrt which will
be handled separately.

I am tempted to do this in an IR expansion instead. In the IR
we could take advantage of computeKnownFPClass to avoid
the 0-or-inf argument check.
2023-07-25 07:54:11 -04:00
Pravin Jagtap
c48ed93cf8 [AMDGPU] Add llvm.amdgcn.wave.reduce.umin/umax Intrinsic.
When input to intrinsic is uniform value, reduced value is
same as input whereas if input value is divergent we need
to iterate over all active lanes of WaveFront to perform
the reduction.

The control flow for a `loop` has been set up, which
iterates over `only` active lanes to perform reduction.

Introduced WAVE_REDUCE_UMIN_PSEUDO_U32 and
WAVE_REDUCE_UMAX_PSEUDO_U32 Pseudos which
are lowered Post-ISel (in `EmitInstrWithCustomInserter `).

Reviewed By: arsenm, #amdgpu

Differential Revision: https://reviews.llvm.org/D154858
2023-07-24 00:06:00 -04:00
Jay Foad
e45a0c2994 [AMDGPU][RFC] Update isLegalAddressingMode for GFX9 SMEM signed offsets
Differential Revision: https://reviews.llvm.org/D155587
2023-07-21 10:56:43 +01:00
Matt Arsenault
c28e09c8d1 AMDGPU: Preserve flags in fdiv_fast lowering
We were dropping the flags and thus blocking contract into potential
fadd users. GlobalISel was already preserving the flags here.

https://reviews.llvm.org/D155443
2023-07-18 06:57:07 -04:00
Nikita Popov
7be7f23269 [llvm] Remove uses of getWithSamePointeeType() (NFC) 2023-07-18 12:07:09 +02:00
Matt Arsenault
04185f0b0b AMDGPU: Fix broken denormal constant folding of canonicalize
This needs to consider the dynamic denormal mode. It should be
possible to implement a runtime DAZ check with a canonicalize.
2023-07-17 19:54:20 -04:00
Matt Arsenault
715b12746f AMDGPU: Use available subtarget member 2023-07-17 19:54:09 -04:00
Matt Arsenault
bd2dca0f74 AMDGPU: Use hex floats instead of ugly bitcasting 2023-07-17 19:54:04 -04:00
Jon Chesterfield
74e928a081 [amdgpu][lds] Remove recalculation of LDS frame from backend
Do the LDS frame calculation once, in the IR pass, instead of repeating the work in the backend.

Prior to this patch:
The IR lowering pass sets up a per-kernel LDS frame and annotates the variables with absolute_symbol
metadata so that the assembler can build lookup tables out of it. There is a fragile association between
kernel functions and named structs which is used to recompute the frame layout in the backend, with
fatal_errors catching inconsistencies in the second calculation.

After this patch:
The IR lowering pass additionally sets a frame size attribute on kernels. The backend uses the same
absolute_symbol metadata that the assembler uses to place objects within that frame size.

Deleted the now dead allocation code from the backend. Left for a later cleanup:
- enabling lowering for anonymous functions
- removing the elide-module-lds attribute (test churn, it's not used by llc any more)
- adjusting the dynamic alignment check to not use symbol names

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D155190
2023-07-13 23:54:38 +01:00
Ivan Kosarev
289ae6525d [AMDGPU][MC] Fix handling of A16 operands in intersect_ray instructions.
The patch adds the support for 'noa16' operands in non-A16 variants of
the instructions, fixes validation of A16 operands and eliminates the
custom conversion to MCInst.

Part of <https://github.com/llvm/llvm-project/issues/62629>.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D155057
2023-07-13 19:46:03 +01:00
Jay Foad
f7684d8510 [DAG] Use legal shift amount type in DAGTypeLegalizer::JoinIntegers
Documentation for TargetLowering::getShiftAmountTy says that LegalTypes
should generally be true during type legalization, so this patch does
that.

On AMDGPU the effect is that we use i32 (a sane type) instead of i64
(pointer sized type) for more shift amounts, which in turn allows more
formation of rotates and funnel shifts pre-legalization.

Differential Revision: https://reviews.llvm.org/D154960
2023-07-12 08:12:09 +01:00
Matt Arsenault
fbe4ff8149 AMDGPU: Partially fix not respecting dynamic denormal mode
The most notable issue was producing v_mad_f32 in functions with the
dynamic mode, since it just ignores the mode. fdiv lowering is still
somewhat broken because it involves a mode switch and we need to query
the original mode.
2023-07-11 15:14:52 -04:00
Amara Emerson
3a80bdb316 [GlobalISel] Remove an erroneous oneuse check in the G_ADD reassociation combine.
This check was unnecessary/incorrect, it was already being done by the target
hook default implementation, and the one in the matcher was checking for a
completely different thing. This change:
 1) Removes the check and updates affected tests which now do some more reassociations.
 2) Modifies the AMDGPU hooks which were stubbed with "return true" to also do the oneuse
    check. Not sure why I didn't do this the first time.
2023-07-10 01:03:12 -07:00
Christudasan Devadasan
b78b36e1a2 [AMDGPU] Implement whole wave register spill
To reduce the register pressure during allocation,
when the allocator spills a virtual register that
corresponds to a whole wave mode operation, the
spill loads and restores should be activated for
all lanes by temporarily flipping all bits in exec
register to one just before the spills. It is not
implemented in the compiler as of today and this
patch enables the necessary support.

This is a pre-patch before the SGPR spill to virtual
VGPR lanes that would eventually causes the whole
wave register spills during allocation.

Reviewed By: arsenm, cdevadas

Differential Revision: https://reviews.llvm.org/D143759
2023-07-07 22:51:45 +05:30
Matt Arsenault
9c82dc6a6b AMDGPU: Always use v_rcp_f16 and v_rsq_f16
These inherited the fast math checks from f32, but the manual suggests
these should be accurate enough for unconditional use. The definition
of correctly rounded is 0.5ulp, but the manual says "0.51ulp". I've
been a bit nervous about changing this as the OpenCL conformance test
does not cover half. Brute force produces identical values compared to
a reference host implementation for all values.
2023-07-05 16:53:01 -04:00
Matt Arsenault
003b58f65b IR: Add llvm.frexp intrinsic
Add an intrinsic which returns the two pieces as multiple return
values. Alternatively could introduce a pair of intrinsics to
separately return the fractional and exponent parts.

AMDGPU has native instructions to return the two halves, but could use
some generic legalization and optimization handling. For example, we
should be able to handle legalization of f16 on older targets, and for
bf16. Additionally antique targets need a hardware workaround which
would be better handled in the backend rather than in library code
where it is now.
2023-06-28 14:50:16 -04:00
Diana Picus
d98e44b343 [AMDGPU][DAGISel] Be more flexible about what calls are allowed
Remove DAGISel checks on calling conventions. GlobalISel doesn't have
these checks either and we prefer it that way (see D152794).

Add a simple test like the one introduced in D117479 for GlobalISel.

Differential Revision: https://reviews.llvm.org/D153535
2023-06-27 09:49:38 +02:00
Matt Arsenault
9f274939db AMDGPU: Handle the easy parts of strict fptrunc
f64->f16 is hard. The expansion is all integer but we need
to raise exceptions. Also doesn't handle the illegal f16 targets.
2023-06-25 19:26:25 -04:00
Matt Arsenault
89ccfa1b39 AMDGPU: Use correct lowering for llvm.log2.f32
We previously directly codegened to v_log_f32, which is broken for
denormals. The lowering isn't complicated, you simply need to scale
denormal inputs and adjust the result. Note log and log10 are still
not accurate enough, and will be fixed separately.
2023-06-23 08:37:37 -04:00
Matt Arsenault
92ee60b66f AMDGPU: Drop and upgrade llvm.amdgcn.atomic.inc/dec to atomicrmw 2023-06-21 21:20:26 -04:00
Jeffrey Byrnes
ac2d6df2d6 [AMDGPU] Add basic support for extended i8 perm matching
Differential Revision: https://reviews.llvm.org/D142782

Change-Id: Ibb95224f7885839e8b77a705f487f10b47a258a6
2023-06-19 09:53:25 -07:00