357 Commits

Author SHA1 Message Date
Mirko Brkušanin
07a6d73664
[AMDGPU] CodeGen for GFX12 VFLAT, VSCRATCH and VGLOBAL instructions (#75493) 2023-12-15 15:01:40 +01:00
Mirko Brkušanin
5879162f7f
[AMDGPU] CodeGen for GFX12 VBUFFER instructions (#75492) 2023-12-15 13:45:03 +01:00
Piotr Sobczak
fac093dd08
[AMDGPU] Update IEEE and DX10_CLAMP for GFX12 (#75030)
Co-authored-by: Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>
2023-12-13 13:52:40 +01:00
Kazu Hirata
28a78e2a4a [AMDGPU] Use isNullConstant (NFC) 2023-12-07 22:33:46 -08:00
Ruiling, Song
c1511a65d5
[AMDGPU] Folding imm offset in more cases for scratch access (#70634)
For scratch load/store, our hardware only accepts non-negative values in
SGPR/VGPR. Besides the case that we can prove from known bits, we can
also prove that the value in `base` will be non-negative: 1.) When the
ADD for the address calculation has the NoUnsignedWrap flag. 2.) When the
immediate offset is already negative.
2023-11-29 12:46:45 +08:00
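
The conditions listed in this commit message can be read as a simple predicate. The following is a minimal sketch only: `ScratchAddr` and `isBaseProvablyNonNegative` are hypothetical names and do not come from LLVM, and the actual patch may apply further range checks that the message does not mention.

```cpp
#include <cstdint>

// Hypothetical, simplified model of an address of the form `base + imm`.
// These names are illustrative only; they are not LLVM's SelectionDAG API.
struct ScratchAddr {
  bool addHasNoUnsignedWrap; // the ADD carries the no-unsigned-wrap flag
  int64_t immOffset;         // immediate offset folded into the access
  bool baseSignBitKnownZero; // known-bits analysis proved base >= 0
};

// Returns true when `base` is treated as provably non-negative under one of
// the three conditions named in the commit message.
constexpr bool isBaseProvablyNonNegative(const ScratchAddr &A) {
  if (A.baseSignBitKnownZero)
    return true; // the pre-existing known-bits case
  if (A.addHasNoUnsignedWrap)
    return true; // condition 1.) from the message
  if (A.immOffset < 0)
    return true; // condition 2.) from the message
  return false;
}
```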
Acim-Maravic
f3138524db
[AMDGPU] Generic lowering for rint and nearbyint (#69596)
There are three different rounding intrinsics that are lowered to the
same instruction.

Co-authored-by: Acim Maravic <acim.maravic@amd.com>
2023-11-14 18:49:21 +01:00
Valery Pykhtin
fe6893b1d8
Improve selection of conditional branch on amdgcn.ballot!=0 condition in SelectionDAG. (#68714)
Improve selection of the following pattern:

bool cnd = ...
if (amdgcn.ballot(cnd) != 0) {
  ...
}

which means "execute _then_ if any lane has satisfied the _cnd_
condition".
2023-11-06 15:16:49 +01:00
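
To make the "any lane" reading concrete, here is a host-side emulation of a ballot. It is a sketch under assumptions: the wave size of 64 and the helper names `ballot` and `anyLane` are chosen for illustration and are not the intrinsic's real interface.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kWaveSize = 64; // assumed wave size for this sketch

// Emulates a ballot: bit i of the result is set when lane i's condition holds.
inline uint64_t ballot(const std::array<bool, kWaveSize> &cnd) {
  uint64_t mask = 0;
  for (std::size_t lane = 0; lane < kWaveSize; ++lane)
    if (cnd[lane])
      mask |= uint64_t{1} << lane;
  return mask;
}

// "ballot(cnd) != 0" is true exactly when at least one lane satisfied cnd.
inline bool anyLane(const std::array<bool, kWaveSize> &cnd) {
  return ballot(cnd) != 0;
}
```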
Stanislav Mekhanoshin
fe8335babb
[AMDGPU] Select 64-bit imm moves if can be encoded as 32 bit operand (#70395)
This allows folding of 64-bit operands if they fit into 32 bits. Fixes
https://github.com/llvm/llvm-project/issues/67781
2023-10-30 08:12:28 -07:00
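
The message does not spell out the exact encoding rule, so the following is only a plausible sketch of what "fits into 32 bits" can mean for a 64-bit immediate: the value survives a round trip through a sign-extended or a zero-extended 32-bit field. The helper names are hypothetical.

```cpp
#include <cstdint>

// True when v can be represented as a sign-extended 32-bit value.
constexpr bool fitsSigned32(int64_t v) {
  return v >= INT32_MIN && v <= INT32_MAX;
}

// True when v can be represented as a zero-extended 32-bit value.
constexpr bool fitsUnsigned32(uint64_t v) {
  return v <= UINT32_MAX;
}
```

Which of the two checks applies to a given operand depends on how the instruction extends its literal, which this sketch does not model.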
Mirko Brkušanin
ecfdc23dd2
[AMDGPU] Select gfx1150 SALU Float instructions (#66885) 2023-09-21 12:22:55 +02:00
Arthur Eubanks
0a1aa6cda2
[NFC][CodeGen] Change CodeGenOpt::Level/CodeGenFileType into enum classes (#66295)
This will make it easy for callers to see issues with and fix up calls
to createTargetMachine after a future change to the params of
TargetMachine.

This matches other nearby enums.

For downstream users, this should be a fairly straightforward
replacement,
e.g. s/CodeGenOpt::Aggressive/CodeGenOptLevel::Aggressive
or s/CGFT_/CodeGenFileType::
2023-09-14 14:10:14 -07:00
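
A small stand-alone illustration of why the enum-class form surfaces issues at call sites. The declarations below are stand-ins written for this sketch, not the real LLVM headers; only the names `CodeGenOpt`, `CodeGenOptLevel` and `Aggressive` are taken from the message.

```cpp
// Stand-in declarations for illustration; these are NOT the real LLVM headers.
namespace before {
namespace CodeGenOpt {
enum Level { None, Less, Default, Aggressive };
}
// An unscoped enum converts to int implicitly, so mismatched arguments can
// slip through silently.
inline int asInt(CodeGenOpt::Level L) { return L; }
} // namespace before

namespace after {
enum class CodeGenOptLevel { None, Less, Default, Aggressive };
// A scoped enum needs an explicit cast, so accidental conversions become
// compile errors instead.
inline int asInt(CodeGenOptLevel L) { return static_cast<int>(L); }
} // namespace after
```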
Kazu Hirata
57390c914b [AMDGPU] Use isNullConstant and isOneConstant (NFC) 2023-08-27 08:26:52 -07:00
Matt Arsenault
9a53f5f5c4 AMDGPU: Handle llvm.stacksave and llvm.stackrestore
Not sure if the only valid use is to have stackrestore directly
consume stacksave outputs or not. Handled exactly like a regular stack
pointer so all the edge cases theoretically should work.

https://reviews.llvm.org/D156669
2023-08-11 10:25:01 -04:00
Jay Foad
c2093b8504 [AMDGPU] Add target features for GDS and GWS
GFX9 subtargets from GFX90A onwards lack GDS but still have GWS.

Differential Revision: https://reviews.llvm.org/D156713
2023-08-02 09:02:07 +01:00
Matt Arsenault
bd203072e6 AMDGPU: Silence a gcc warning 2023-07-22 08:07:49 -04:00
Matt Arsenault
fb54afd1b7 AMDGPU: Fold fsub [+-0] into fneg when folding source modifiers
This isn't always folded to fneg for a freestanding fsub depending on
the denormal mode. When matching source modifiers, we're implicitly
canonicalizing the input so we can fold it here.

Doesn't bother handling the VOP3P case since it's only relevant with
DAZ, which nobody really uses with f16.

For f64, tests show an existing bug where DAGCombiner tries to respect
the denormal mode for fsub -0, x, but not after it's lowered to fadd
-0, (fneg x). Either the fold is wrong or we shouldn't restrict the
fsub case based on the denormal mode.

https://reviews.llvm.org/D155652
2023-07-20 19:29:40 -04:00
David Green
2802739dfd [NFC] Replace ;; with ; 2023-06-11 10:25:24 +01:00
Matt Arsenault
eece6ba283 IR: Add llvm.ldexp and llvm.experimental.constrained.ldexp intrinsics
AMDGPU has native instructions and target intrinsics for this, but
these really should be subject to legalization and generic
optimizations. This will enable legalization of f16->f32 on targets
without f16 support.

Implement a somewhat horrible inline expansion for targets without
libcall support. This could be better if we could introduce control
flow (GlobalISel version not yet implemented). Support for strictfp
legalization is less complete but works for the simple cases.
2023-06-06 17:07:18 -04:00
Sergei Barannikov
e744e51b12 [SelectionDAG] Rename ADDCARRY/SUBCARRY to UADDO_CARRY/USUBO_CARRY (NFC)
This will make them consistent with other overflow-aware nodes.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D148196
2023-04-29 21:59:58 +03:00
Jessica Del
04317d4da7 [AMDGPU][GISel] Add inverse ballot intrinsic
The inverse ballot intrinsic takes in a boolean mask for all lanes and
returns the boolean for the current lane. See SPIR-V's
`subgroupInverseBallot()` in the [[ https://github.com/KhronosGroup/GLSL/blob/master/extensions/khr/GL_KHR_shader_subgroup.txt | GL_KHR_shader_subgroup extension ]].
This allows decision making via branch and select instructions with a manually
manipulated mask.

Implemented in GlobalISel and SelectionDAG, since currently both are supported.
The SelectionDAG required pseudo instructions to use the custom inserter.

The boolean mask needs to be uniform for all lanes.
Therefore we expect SGPR input. In case the source is in a
VGPR, we insert one or more `v_readfirstlane` instructions.

Reviewed By: nhaehnle

Differential Revision: https://reviews.llvm.org/D146287
2023-04-06 07:46:50 +02:00
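
As a rough host-side model of the semantics described here, each lane reads its own bit out of a mask that is the same for the whole wave. The function name and the 64-bit mask width are assumptions made for this sketch.

```cpp
#include <cstdint>

// Emulates inverse ballot for one lane: returns that lane's bit of a
// wave-uniform mask. `lane` must be less than 64 in this sketch.
constexpr bool inverseBallot(uint64_t mask, unsigned lane) {
  return ((mask >> lane) & 1u) != 0;
}
```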
Kazu Hirata
7bb6d1b32e [llvm] Skip getAPIntValue (NFC)
ConstantSDNode provides some convenience functions like isZero,
getZExtValue, and isMinSignedValue that are named identically to those
provided by APInt, so we can "skip" getAPIntValue.
2023-03-22 22:10:25 -07:00
pvanhout
1f1fea6c38 Reland: [DAG/AMDGPU] Use UniformityAnalysis in DAGISel
Switch DAGISel over to UniformityAnalysis, which was one of the last remaining users of the DivergenceAnalysis.
No explosions seen during internal testing so this looks like a smooth transition.

Reviewed By: sameerds

Differential Revision: https://reviews.llvm.org/D145918
2023-03-14 14:38:45 +01:00
pvanhout
0e79106fc9 Revert "[DAG/AMDGPU] Use UniformityAnalysis in DAGISel"
This reverts commit 0022b5803fd4f5a4e9fcf233267c0ffa1b88f763.
2023-03-14 11:48:58 +01:00
pvanhout
0022b5803f [DAG/AMDGPU] Use UniformityAnalysis in DAGISel
Switch DAGISel over to UniformityAnalysis, which was one of the last remaining users of the DivergenceAnalysis.
No explosions seen during internal testing so this looks like a smooth transition.

Reviewed By: sameerds

Differential Revision: https://reviews.llvm.org/D145918
2023-03-14 11:18:28 +01:00
Petar Avramovic
ded69779be Fix SGPR + VGPR + offset Scratch offset folding
Values in SGPR and VGPR registers are treated as unsigned by hardware.

When the value in a 32-bit SGPR or VGPR base can be negative, calculate the
offset using 32-bit add instructions; otherwise use
sgpr(unsigned) + vgpr(unsigned) + offset.

LoopStrengthReduce.cpp changes offsets to negative, and in some
iterations the value in the SGPR or VGPR register could be negative.

Differential Revision: https://reviews.llvm.org/D144957
2023-03-09 10:53:41 +01:00
Petar Avramovic
3ae310d0ae Fix VGPR + offset Scratch offset folding
Values in VGPR registers are treated as unsigned by hardware.

When the value in a 32-bit VGPR base can be negative, calculate the offset
using a 32-bit add instruction; otherwise use vgpr base(unsigned) + offset.
Does not affect the case where the whole offset comes from a VGPR register
(immediate offset is 0).

LoopStrengthReduce.cpp changes offsets to negative, and in some
iterations the value in the VGPR register could be negative.

Differential Revision: https://reviews.llvm.org/D144956
2023-03-09 10:52:44 +01:00
Petar Avramovic
5e56d59999 Fix SGPR + offset Scratch offset folding
Values in SGPR registers are treated as unsigned by hardware.

When the value in a 32-bit SGPR base can be negative, calculate the offset
using a 32-bit add instruction; otherwise use sgpr base(unsigned) + offset.
Does not affect the case where the whole offset comes from an SGPR register
(immediate offset is 0).

LoopStrengthReduce.cpp changes offsets to negative, and in some
iterations the value in the SGPR register could be negative.

Differential Revision: https://reviews.llvm.org/D144955
2023-03-09 10:52:44 +01:00
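
The three scratch-offset fixes above share one arithmetic point, which can be shown on the host. This sketch assumes a simplified model in which the hardware zero-extends the register and adds the offset without 32-bit wraparound; both function names are hypothetical.

```cpp
#include <cstdint>

// Address as formed when the register value is treated as unsigned and the
// offset is added without 32-bit wraparound (assumed model).
constexpr uint64_t unsignedBasePlusOffset(int32_t base, uint32_t offset) {
  return uint64_t{static_cast<uint32_t>(base)} + offset;
}

// Address computed up front with a 32-bit add, which wraps modulo 2^32.
constexpr uint32_t add32(int32_t base, uint32_t offset) {
  return static_cast<uint32_t>(base) + offset;
}

// With base = -8 and offset = 16 the two disagree: only the 32-bit add
// yields the intended address 8.
static_assert(add32(-8, 16) == 8u, "32-bit add wraps to the intended value");
static_assert(unsignedBasePlusOffset(-8, 16) != 8u,
              "unsigned interpretation of a negative base goes out of range");
```

When the base is known to be non-negative, both computations agree, which is the case where folding the offset is safe under this model.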
Justin Bogner
c083c89744 [AMDGPU] Move V_FMA_MIX pattern matching into tablegen. NFC
The matching for V_FMA_MIX was partially implemented with a C++
matcher (for fmas with 32 bit results and 16 bit inputs) and partially
in tablegen (for fmas with 16 bit results). Move the C++ matcher logic
into tablegen to make this more consistent and so we can remove the
duplication between SDAG and GISel.

Differential Revision: https://reviews.llvm.org/D144612
2023-02-23 10:23:34 -08:00
Jay Foad
dcb834843e [AMDGPU] Split SIModeRegisterDefaults out of AMDGPUBaseInfo. NFC.
This is only used by CodeGen. Moving it out of AMDGPUBaseInfo simplifies
future changes to make some of it depend on the subtarget.

Differential Revision: https://reviews.llvm.org/D144650
2023-02-23 16:38:15 +00:00
Piotr Sobczak
51a49ec52a [AMDGPU] Clean up MUBUF immediate offset
D143174 lifted the artificial type restriction by promoting
offset to i32. This patch handles more cases: those involving
immediate offset in MUBUF.

Differential Revision: https://reviews.llvm.org/D144628
2023-02-23 13:29:53 +01:00
Piotr Sobczak
a3d7b3121c [AMDGPU][NFC] Add getMaxMUBUFImmOffset
Replace magic constant 4095 with the function getMaxMUBUFImmOffset().

Differential Revision: https://reviews.llvm.org/D144623
2023-02-23 11:29:59 +01:00
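
The refactoring pattern is the usual one of giving a magic number a name. A minimal sketch, assuming a parameterless helper; the real function's signature is not shown in the message, and `fitsMUBUFImmOffset` is a hypothetical caller.

```cpp
// Hypothetical stand-in; the real helper's signature is not shown in the
// commit message.
constexpr unsigned getMaxMUBUFImmOffset() { return 4095; }

// Hypothetical caller: the limit is now named instead of hard-coded.
constexpr bool fitsMUBUFImmOffset(unsigned Imm) {
  return Imm <= getMaxMUBUFImmOffset();
}
```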
Kazu Hirata
f8f3db2756 Use APInt::count{l,r}_{zero,one} (NFC) 2023-02-19 22:04:47 -08:00
Matt Arsenault
93ec3fa402 AMDGPU: Support atomicrmw uinc_wrap/udec_wrap
For now keep the existing intrinsics working.
2023-01-27 22:17:16 -04:00
Jay Foad
245e3dd948 [MC] Do not copy MCInstrDescs. NFC.
Avoid copying MCInstrDesc instances because a future patch will change
them to find their implicit operands and operand info array based on
their own "this" pointer, so it will only work for MCInstrDescs in the
TargetInsts table, not for a copy of an MCInstrDesc at a different
address.

Differential Revision: https://reviews.llvm.org/D142214
2023-01-23 11:55:49 +00:00
Jay Foad
768aed1378 [MC] Make more use of MCInstrDesc::operands. NFC.
Change MCInstrDesc::operands to return an ArrayRef so we can easily use
it everywhere instead of the (IMHO ugly) opInfo_begin and opInfo_end.
A future patch will remove opInfo_begin and opInfo_end.

Also use it instead of raw access to the OpInfo pointer. A future patch
will remove this pointer.

Differential Revision: https://reviews.llvm.org/D142213
2023-01-23 11:31:41 +00:00
Kazu Hirata
caa99a01f5 Use llvm::popcount instead of llvm::countPopulation(NFC) 2023-01-22 12:48:51 -08:00
Nick Desaulniers
ad99774a5f [llvm][PassSupport] don't require passes to be default constructible
Quite a few passes are not default constructible. In order to properly
support -{start|stop}-{before|after}= for these passes, we would like to
continue to use INITIALIZE_PASS, but not necessarily provide a default
constructor.

Delete the default constructors of classes derived from
SelectionDAGISel.

Link: https://github.com/llvm/llvm-project/issues/59538

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D140349
2022-12-20 14:07:29 -08:00
Carl Ritson
5bc703f755 [AMDGPU] Replace getPhysRegClass with getPhysRegBaseClass
Accelerate finding the base class for a physical register by
building a static mapping table from physical registers
to base classes using TableGen.

Replace uses of SIRegisterInfo::getPhysRegClass with
TargetRegisterInfo::getPhysRegBaseClass in order to use
the computed table.

Reviewed By: arsenm, foad

Differential Revision: https://reviews.llvm.org/D139422
2022-12-20 16:22:14 +09:00
Craig Topper
c09edce1b3 [SelectionDAG] Give all the target specific subclasses of SelectionDAGISel their own pass ID.
Previously we had a shared ID in SelectionDAGISel. AMDGPU has an
initializePass function for its subclass of SelectionDAGISel. No
other target does.

This causes all target specific SelectionDAGISel passes to be known
as "amdgpu-isel".

I'm not sure what would happen if another target tried to implement
an initializePass function too since the ID is already claimed.

This patch gives all targets their own ID and passes it down to
SelectionDAGISel constructor to MachineFunctionPass's constructor.

Unfortunately, I think this causes most targets to lose
print-before/after-all support for their SelectionDAGISel pass.
And they probably no longer support start/stop-before/after. We
can add initializePass functions to fix this as a follow up. NOTE:
This was probably also broken if the AMDGPU target isn't compiled in.

Step 1 to fixing PR59538.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D140161
2022-12-15 15:48:55 -08:00
Jay Foad
6443c0ee02 [AMDGPU] Stop using make_pair and make_tuple. NFC.
C++17 allows us to call constructors pair and tuple instead of helper
functions make_pair and make_tuple.

Differential Revision: https://reviews.llvm.org/D139828
2022-12-14 13:22:26 +00:00
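
For readers unfamiliar with the C++17 feature involved (class template argument deduction), a short sketch of the before and after forms. The function names are made up for this example, and it needs a C++17 compiler.

```cpp
#include <string>
#include <tuple>
#include <type_traits>
#include <utility>

// Pre-C++17 style: helper functions deduce the element types.
inline std::pair<int, std::string> oldStyle() {
  return std::make_pair(1, std::string("a"));
}

// C++17 style: the constructors deduce the element types directly.
inline auto newStyle() { return std::pair(1, std::string("a")); }
inline auto makeTriple() { return std::tuple(1, 2.5, 'c'); }

// Both spellings produce the same types.
static_assert(
    std::is_same_v<decltype(newStyle()), std::pair<int, std::string>>);
static_assert(
    std::is_same_v<decltype(makeTriple()), std::tuple<int, double, char>>);
```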
Fangrui Song
67819a72c6 [CodeGen] llvm::Optional => std::optional 2022-12-13 09:06:36 +00:00
Justin Bogner
916ae0a060 [AMDGPU] Handle nnan and fast on the call in fpmed3 patterns
We were only allowing these med3 patterns if the operands were known
to not be NaN, but we should also allow it if the calls to max/min
have the `nnan` or `fast` flags.

Differential Revision: https://reviews.llvm.org/D139506
2022-12-06 22:57:52 -08:00
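
As background on why NaN matters for these patterns, here is a sketch of one common shape: a clamp written as min(max(x, lo), hi) equals the median of the three values when lo <= hi and no input is NaN. The helper names are hypothetical, and the message does not say which exact patterns the patch covers.

```cpp
#include <algorithm>

// Median of three values; assumes none of the inputs is NaN.
inline float med3(float a, float b, float c) {
  return std::max(std::min(a, b), std::min(std::max(a, b), c));
}

// Clamp written as min(max(x, lo), hi); equals med3(x, lo, hi) when lo <= hi
// and no input is NaN.
inline float clampMinMax(float x, float lo, float hi) {
  return std::min(std::max(x, lo), hi);
}
```

With a NaN input the min/max chain and the median need not agree, which is why the rewrite is gated on the operands being NaN-free or on flags such as `nnan`.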
Thomas Symalla
851176c7f7 [AMDGPU] Remove AMDGPUISelDAGToDAG::isKnownNeverNaN
This patch removes the mentioned function, as it only does two
checks which are already implemented as part of
SelectionDAG::isKnownNeverNaN - which is called there.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D138938
2022-11-30 07:58:32 +01:00
Mirko Brkusanin
e58b116843 [AMDGPU] Add subtarget feature for MAD_U64/I64 bug on GFX11
Differential Revision: https://reviews.llvm.org/D133012
2022-11-18 18:19:27 +01:00
Jay Foad
3822a01e0b [AMDGPU] Add GFX11 ds_bvh_stack_rtn_b32 instruction
Differential Revision: https://reviews.llvm.org/D133928
2022-09-15 16:46:14 +01:00
Piotr Sobczak
abd927e5a8 [AMDGPU] Check for num elts in SelectVOP3PMods
The rest of the code section assumes there are exactly two elements
in the vector (Lo, Hi), so add the check before entering the section.

Differential Revision: https://reviews.llvm.org/D133852
2022-09-14 20:00:19 +02:00
Ivan Kosarev
5db8d6fd2b [AMDGPU][CodeGen] Support (base | offset) SMEM loads.
Prevents generation of unnecessary s_or_b32 instructions.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D132552
2022-09-05 14:22:06 +01:00
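
The message does not state when an OR can stand in for an addition, so the sketch below relies on a standard arithmetic fact rather than on the patch itself: if two values share no set bits, OR-ing them produces no carries and equals their sum. The helper name is hypothetical.

```cpp
#include <cstdint>

// True when base and offset share no set bits, in which case
// (base | offset) == (base + offset) because no carries can occur.
constexpr bool orActsAsAdd(uint64_t base, uint64_t offset) {
  return (base & offset) == 0;
}

static_assert(orActsAsAdd(0x1000, 0x10), "aligned base, small offset");
static_assert((0x1000 | 0x10) == (0x1000 + 0x10), "OR equals ADD here");
```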
Ivan Kosarev
f33645301e [AMDGPU][CodeGen] Support (soffset + offset) s_buffer_load's.
Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D130263
2022-09-05 12:53:05 +01:00
Stanislav Mekhanoshin
813ae2871d [AMDGPU] Detect uniformness of TID / wavefrontsize
A value of 'workitemid / wavefrontsize' or 'workitemid & ~(wavefrontsize - 1)'
is wave uniform.

Differential Revision: https://reviews.llvm.org/D132511
2022-08-26 23:26:08 -07:00
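
A host-side check of the uniformity claim. It is a sketch assuming a wavefront size of 64 and that the lanes of one wave hold consecutive workitem ids starting at a multiple of the wavefront size; all names are hypothetical.

```cpp
#include <cstdint>

constexpr uint32_t kWavefrontSize = 64; // assumed; must be a power of two

// Dividing by the wavefront size drops the lane index.
constexpr uint32_t waveIndex(uint32_t workitemId) {
  return workitemId / kWavefrontSize;
}

// Clearing the lane bits leaves the id of the first lane in the wave.
constexpr uint32_t waveBase(uint32_t workitemId) {
  return workitemId & ~(kWavefrontSize - 1);
}

// Checks that every lane of the wave containing `anyId` agrees on both values.
constexpr bool uniformAcrossWave(uint32_t anyId) {
  const uint32_t first = waveBase(anyId);
  for (uint32_t lane = 0; lane < kWavefrontSize; ++lane) {
    const uint32_t tid = first + lane;
    if (waveIndex(tid) != waveIndex(first) || waveBase(tid) != first)
      return false;
  }
  return true;
}
```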
Fangrui Song
c17450a094 [AMDGPU] Change DEBUG_TYPE from isel to amdgpu-isel
to match all other *ISelDAGToDAG.cpp
2022-07-23 11:32:02 -07:00
Ivan Kosarev
432cbd7827 [AMDGPU][CodeGen] Support (register + immediate) SMRD offsets.
Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D129381
2022-07-18 11:29:31 +01:00