342 Commits

Author SHA1 Message Date
David Green
2802739dfd [NFC] Replace ;; with ; 2023-06-11 10:25:24 +01:00
Matt Arsenault
eece6ba283 IR: Add llvm.ldexp and llvm.experimental.constrained.ldexp intrinsics
AMDGPU has native instructions and target intrinsics for this, but
these really should be subject to legalization and generic
optimizations. This will enable legalization of f16->f32 on targets
without f16 support.

Implement a somewhat horrible inline expansion for targets without
libcall support. This could be better if we could introduce control
flow (GlobalISel version not yet implemented). Support for strictfp
legalization is less complete but works for the simple cases.
2023-06-06 17:07:18 -04:00
Sergei Barannikov
e744e51b12 [SelectionDAG] Rename ADDCARRY/SUBCARRY to UADDO_CARRY/USUBO_CARRY (NFC)
This will make them consistent with other overflow-aware nodes.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D148196
2023-04-29 21:59:58 +03:00
Jessica Del
04317d4da7 [AMDGPU][GISel] Add inverse ballot intrinsic
The inverse ballot intrinsic takes in a boolean mask for all lanes and
returns the boolean for the current lane. See SPIR-V's
`subgroupInverseBallot()` in the [[ https://github.com/KhronosGroup/GLSL/blob/master/extensions/khr/GL_KHR_shader_subgroup.txt | GL_KHR_shader_subgroup extension ]].
This allows decision making via branch and select instructions with a manually
manipulated mask.

Implemented in GlobalISel and SelectionDAG, since currently both are supported.
The SelectionDAG required pseudo instructions to use the custom inserter.

The boolean mask needs to be uniform for all lanes.
Therefore we expect SGPR input. In case the source is in a
VGPR, we insert one or more `v_readfirstlane` instructions.

Reviewed By: nhaehnle

Differential Revision: https://reviews.llvm.org/D146287
2023-04-06 07:46:50 +02:00
Kazu Hirata
7bb6d1b32e [llvm] Skip getAPIntValue (NFC)
ConstantSDNode provides some convenience functions like isZero,
getZExtValue, and isMinSignedValue that are named identically to those
provided by APInt, so we can "skip" getAPIntValue.
2023-03-22 22:10:25 -07:00
pvanhout
1f1fea6c38 Reland: [DAG/AMDGPU] Use UniformityAnalysis in DAGISel
Switch DAGISel over to UniformityAnalysis, which was one of the last remaining users of the DivergenceAnalysis.
No explosions seen during internal testing so this looks like a smooth transition.

Reviewed By: sameerds

Differential Revision: https://reviews.llvm.org/D145918
2023-03-14 14:38:45 +01:00
pvanhout
0e79106fc9 Revert "[DAG/AMDGPU] Use UniformityAnalysis in DAGISel"
This reverts commit 0022b5803fd4f5a4e9fcf233267c0ffa1b88f763.
2023-03-14 11:48:58 +01:00
pvanhout
0022b5803f [DAG/AMDGPU] Use UniformityAnalysis in DAGISel
Switch DAGISel over to UniformityAnalysis, which was one of the last remaining users of the DivergenceAnalysis.
No explosions seen during internal testing so this looks like a smooth transition.

Reviewed By: sameerds

Differential Revision: https://reviews.llvm.org/D145918
2023-03-14 11:18:28 +01:00
Petar Avramovic
ded69779be Fix SGPR + VGPR + offset Scratch offset folding
Values in SGPR and VGPR register are treated as unsigned by hardware.

When value in 32-bit SGPR or VGPR base can be negative calculate offset
using 32-bit add instructions, otherwise use
sgpr(unsigned) + vgpr(unsigned) + offset.

LoopStrengthReduce.cpp changes offsets to negative and in some
iterations value in SGPR or VGPR register could be negative.

Differential Revision: https://reviews.llvm.org/D144957
2023-03-09 10:53:41 +01:00
Petar Avramovic
3ae310d0ae Fix VGPR + offset Scratch offset folding
Values in VGPR register are treated as unsigned by hardware.

When value in 32-bit VGPR base can be negative calculate offset using
32-bit add instruction, otherwise use vgpr base(unsigned) + offset.
Does not affect case where whole offset comes from VGPR register
(immediate offset is 0).

LoopStrengthReduce.cpp changes offsets to negative and in some
iterations value in VGPR register could be negative.

Differential Revision: https://reviews.llvm.org/D144956
2023-03-09 10:52:44 +01:00
Petar Avramovic
5e56d59999 Fix SGPR + offset Scratch offset folding
Values in SGPR register are treated as unsigned by hardware.

When value in 32-bit SGPR base can be negative calculate offset using
32-bit add instruction, otherwise use sgpr base(unsigned) + offset.
Does not affect case where whole offset comes from SGPR register
(immediate offset is 0).

LoopStrengthReduce.cpp changes offsets to negative and in some
iterations value in SGPR register could be negative.

Differential Revision: https://reviews.llvm.org/D144955
2023-03-09 10:52:44 +01:00
Justin Bogner
c083c89744 [AMDGPU] Move V_FMA_MIX pattern matching into tablegen. NFC
The matching for V_FMA_MIX was partially implemented with a C++
matcher (for fmas with 32 bit results and 16 bit inputs) and partially
in tablegen (for fmas with 16 bit results). Move the C++ matcher logic
into tablegen to make this more consistent and so we can remove the
duplication between SDAG and GISel.

Differential Revision: https://reviews.llvm.org/D144612
2023-02-23 10:23:34 -08:00
Jay Foad
dcb834843e [AMDGPU] Split SIModeRegisterDefaults out of AMDGPUBaseInfo. NFC.
This is only used by CodeGen. Moving it out of AMDGPUBaseInfo simplifies
future changes to make some of it depend on the subtarget.

Differential Revision: https://reviews.llvm.org/D144650
2023-02-23 16:38:15 +00:00
Piotr Sobczak
51a49ec52a [AMDGPU] Clean up MUBUF immediate offset
D143174 lifted the artificial type restriction by promoting
offset to i32. This patch handles more cases: those involving
immediate offset in MUBUF.

Differential Revision: https://reviews.llvm.org/D144628
2023-02-23 13:29:53 +01:00
Piotr Sobczak
a3d7b3121c [AMDGPU][NFC] Add getMaxMUBUFImmOffset
Replace magic constant 4095 with the function getMaxMUBUFImmOffset().

Differential Revision: https://reviews.llvm.org/D144623
2023-02-23 11:29:59 +01:00
Kazu Hirata
f8f3db2756 Use APInt::count{l,r}_{zero,one} (NFC) 2023-02-19 22:04:47 -08:00
Matt Arsenault
93ec3fa402 AMDGPU: Support atomicrmw uinc_wrap/udec_wrap
For now keep the exising intrinsics working.
2023-01-27 22:17:16 -04:00
Jay Foad
245e3dd948 [MC] Do not copy MCInstrDescs. NFC.
Avoid copying MCInstrDesc instances because a future patch will change
them to find their implicit operands and operand info array based on
their own "this" pointer, so it will only work for MCInstrDescs in the
TargetInsts table, not for a copy of an MCInstrDesc at a different
address.

Differential Revision: https://reviews.llvm.org/D142214
2023-01-23 11:55:49 +00:00
Jay Foad
768aed1378 [MC] Make more use of MCInstrDesc::operands. NFC.
Change MCInstrDesc::operands to return an ArrayRef so we can easily use
it everywhere instead of the (IMHO ugly) opInfo_begin and opInfo_end.
A future patch will remove opInfo_begin and opInfo_end.

Also use it instead of raw access to the OpInfo pointer. A future patch
will remove this pointer.

Differential Revision: https://reviews.llvm.org/D142213
2023-01-23 11:31:41 +00:00
Kazu Hirata
caa99a01f5 Use llvm::popcount instead of llvm::countPopulation(NFC) 2023-01-22 12:48:51 -08:00
Nick Desaulniers
ad99774a5f [llvm][PassSupport] don't require passes to be default constructible
Quite a few passes are not default constructible. In order to properly
support -{start|stop}-{before|after}= for these passes, we would like to
continue to use INITIALIZE_PASS, but not necessarily provide a default
constructor.

Delete the default constructors of classes derived from
SelectionDAGISel.

Link: https://github.com/llvm/llvm-project/issues/59538

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D140349
2022-12-20 14:07:29 -08:00
Carl Ritson
5bc703f755 [AMDGPU] Replace getPhysRegClass with getPhysRegBaseClass
Accelerate finding the base class for a physical register by
building a statically mapping table from physical registers
to base classes using TableGen.

Replace uses of SIRegisterInfo::getPhysRegClass with
TargetRegisterInfo::getPhysRegBaseClass in order to use
the computed table.

Reviewed By: arsenm, foad

Differential Revision: https://reviews.llvm.org/D139422
2022-12-20 16:22:14 +09:00
Craig Topper
c09edce1b3 [SelectionDAG] Give all the target specific subclasses of SelectionDAGISel their own pass ID.
Previously we had a shared ID in SelectionDAGISel. AMDGPU has an
initializePass function for its subclass of SelectionDAGISel. No
other target does.

This causes all target specific SelectionDAGISel passes to be known
as "amdgpu-isel".

I'm not sure what would happen if another target tried to implement
an initializePass function too since the ID is already claimed.

This patch gives all targets their own ID and passes it down to
SelectionDAGISel constructor to MachineFunctionPass's constructor.

Unfortunately, I think this causes most targets to lose
print-before/after-all support for their SelectionDAGISel pass.
And they probably no longer support start/stop-before/after. We
can add initializePass functions to fix this as a follow up. NOTE:
This was probably also broken if the AMDGPU target isn't compiled in.

Step 1 to fixing PR59538.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D140161
2022-12-15 15:48:55 -08:00
Jay Foad
6443c0ee02 [AMDGPU] Stop using make_pair and make_tuple. NFC.
C++17 allows us to call constructors pair and tuple instead of helper
functions make_pair and make_tuple.

Differential Revision: https://reviews.llvm.org/D139828
2022-12-14 13:22:26 +00:00
Fangrui Song
67819a72c6 [CodeGen] llvm::Optional => std::optional 2022-12-13 09:06:36 +00:00
Justin Bogner
916ae0a060 [AMDGPU] Handle nnan and fast on the call in fpmed3 patterns
We were only allowing these med3 patterns if the operands were known
to not be NaN, but we should also allow it if the calls to max/min
have the `nnan` or `fast` flags.

Differential Revision: https://reviews.llvm.org/D139506
2022-12-06 22:57:52 -08:00
Thomas Symalla
851176c7f7 [AMDGPU] Remove AMDGPUISelDAGToDAG::isKnownNeverNaN
This function removes the mentioned function, as it only does two
checks which are already implemented as part of
SelectionDAG::isKnownNeverNaN - which is called there.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D138938
2022-11-30 07:58:32 +01:00
Mirko Brkusanin
e58b116843 [AMDGPU] Add subtarget feature for MAD_U64/I64 bug on GFX11
Differential Revision: https://reviews.llvm.org/D133012
2022-11-18 18:19:27 +01:00
Jay Foad
3822a01e0b [AMDGPU] Add GFX11 ds_bvh_stack_rtn_b32 instruction
Differential Revision: https://reviews.llvm.org/D133928
2022-09-15 16:46:14 +01:00
Piotr Sobczak
abd927e5a8 [AMDGPU] Check for num elts in SelectVOP3PMods
The rest of the code section assumes there are exactly two elements
in the vector (Lo, Hi), so add the check before entering the section.

Differential Revision: https://reviews.llvm.org/D133852
2022-09-14 20:00:19 +02:00
Ivan Kosarev
5db8d6fd2b [AMDGPU][CodeGen] Support (base | offset) SMEM loads.
Prevents generation of unnecessary s_or_b32 instructions.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D132552
2022-09-05 14:22:06 +01:00
Ivan Kosarev
f33645301e [AMDGPU][CodeGen] Support (soffset + offset) s_buffer_load's.
Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D130263
2022-09-05 12:53:05 +01:00
Stanislav Mekhanoshin
813ae2871d [AMDGPU] Detect uniformness of TID / wavefrontsize
A value of 'workitemid / wavefrontize' or 'workitemid & (wavefrontize - 1)'
is wave uniform.

Differential Revision: https://reviews.llvm.org/D132511
2022-08-26 23:26:08 -07:00
Fangrui Song
c17450a094 [AMDGPU] Change DEBUG_TYPE from isel to amdgpu-isel
to match all other *ISelDAGToDAG.cpp
2022-07-23 11:32:02 -07:00
Ivan Kosarev
432cbd7827 [AMDGPU][CodeGen] Support (register + immediate) SMRD offsets.
Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D129381
2022-07-18 11:29:31 +01:00
Ivan Kosarev
9c66c02e2e [AMDGPU][CodeGen] Match SMRDs with constant bases and register offsets.
Saves some add instructions on a couple Rage 2 shaders and is also a
prerequisite for a coming-soon change matching (register + immediate)
offsets.

Reviewed By: foad, arsenm

Differential Revision: https://reviews.llvm.org/D129095
2022-07-18 11:18:23 +01:00
Ivan Kosarev
4696a33dfa [AMDGPU][NFC] Refine matching SMRD offsets.
Tell the matcher what we are looking for instead of matching everything
and then discarding the result if doesn't fit.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D128171
2022-07-05 14:07:22 +01:00
Piotr Sobczak
4874838a63 [AMDGPU] gfx11 WMMA instruction support
gfx11 introduces new WMMA (Wave Matrix Multiply-accumulate)
instructions.

Reviewed By: arsenm, #amdgpu

Differential Revision: https://reviews.llvm.org/D128756
2022-06-30 11:13:45 -04:00
Joe Nash
20d20156f4 [AMDGPU] gfx11 VINTERP intrinsics and ISel support
Depends on D127664

Reviewed By: rampitec, #amdgpu

Differential Revision: https://reviews.llvm.org/D127756
2022-06-17 09:16:59 -04:00
Joe Nash
2d43de13df [AMDGPU] gfx11 new dot instruction codegen support
Reviewed By: rampitec, #amdgpu

Differential Revision: https://reviews.llvm.org/D127904
2022-06-16 14:19:34 -04:00
Jay Foad
7b9f620e78 [AMDGPU] Work around GFX11 flat scratch SVS swizzling bug
Differential Revision: https://reviews.llvm.org/D127635
2022-06-13 21:00:42 +01:00
Jay Foad
d943c51465 [AMDGPU] Fix GFX11 codegen for V_MAD_U64_U32 and V_MAD_I64_I32
GFX11 uses different pseudos for these because of a new constraint
on which operands' registers can overlap.

Differential Revision: https://reviews.llvm.org/D127659
2022-06-13 20:59:18 +01:00
Guillaume Chatelet
0788186182 [Alignment][NFC] Remove usage of MemSDNode::getAlignment
I can't remove the function just yet as it is used in the generated .inc files.
I would also like to provide a way to compare alignment with TypeSize since it came up a few times.

Differential Revision: https://reviews.llvm.org/D126910
2022-06-07 13:52:20 +00:00
Abinav Puthan Purayil
f59cb41ba1 [AMDGPU] Select buffer_atomic_cmpswap* in tblgen
This change replaces the manual selection of buffer_atomic_cmpswap*
instructions in SelectionDAG and GlobalISel with a tblgen based
selection in BUFInstructions.td. This allows us to select the return and
no-return variants in tblgen.

Differential Revision: https://reviews.llvm.org/D121770
2022-03-17 10:12:32 +05:30
Stanislav Mekhanoshin
c4500de255 [AMDGPU] gfx940: disable OP_SEL on V_DOT instructions
Differential Revision: https://reviews.llvm.org/D121634
2022-03-14 17:02:00 -07:00
Stanislav Mekhanoshin
36fe3f13a9 [AMDGPU] flat scratch SVS addressing mode for gfx940
Both VADDR and SADDR are used in SVS mode.

Differential Revision: https://reviews.llvm.org/D121254
2022-03-14 15:23:36 -07:00
Sebastian Neubauer
6527b2a4d5 [AMDGPU][NFC] Fix typos
Fix some typos in the amdgpu backend.

Differential Revision: https://reviews.llvm.org/D119235
2022-02-18 15:05:21 +01:00
Jay Foad
d7e03df719 [AMDGPU] Implement widening multiplies with v_mad_i64_i32/v_mad_u64_u32
Select SelectionDAG ops smul_lohi/umul_lohi to
v_mad_i64_i32/v_mad_u64_u32 respectively, with an addend of 0.
v_mul_lo, v_mul_hi and v_mad_i64/u64 are all quarter-rate instructions
so it is better to use one instruction than two.

Further improvements are possible to make better use of the addend
operand, but this is already a strict improvement over what we have
now.

Differential Revision: https://reviews.llvm.org/D113986
2021-11-24 11:25:02 +00:00
Abinav Puthan Purayil
078da26b1c [AMDGPU] Check for unneeded shift mask in shift PatFrags.
The existing constrained shift PatFrags only dealt with masked shift
from OpenCL front-ends. This change copies the
X86DAGToDAGISel::isUnneededShiftMask() function to AMDGPU and uses it in
the shift PatFrag predicates.

Differential Revision: https://reviews.llvm.org/D113448
2021-11-24 10:53:12 +05:30
alex-t
0a3d755ee9 [AMDGPU] Enable divergence-driven BFE selection
Detailed description: This change enables the bit field extract patterns
selection to s_bfe_u32 or v_bfe_u32 dependent on the pattern root node
divergence.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D110950
2021-11-03 23:26:59 +03:00