7724 Commits

Author SHA1 Message Date
Kazu Hirata
7ada7bbee1 [Target] Use *{Set,Map}::contains (NFC) 2023-03-14 18:06:55 -07:00
pvanhout
1f1fea6c38 Reland: [DAG/AMDGPU] Use UniformityAnalysis in DAGISel
Switch DAGISel over to UniformityAnalysis, which was one of the last remaining users of the DivergenceAnalysis.
No explosions seen during internal testing so this looks like a smooth transition.

Reviewed By: sameerds

Differential Revision: https://reviews.llvm.org/D145918
2023-03-14 14:38:45 +01:00
pvanhout
0e79106fc9 Revert "[DAG/AMDGPU] Use UniformityAnalysis in DAGISel"
This reverts commit 0022b5803fd4f5a4e9fcf233267c0ffa1b88f763.
2023-03-14 11:48:58 +01:00
pvanhout
0022b5803f [DAG/AMDGPU] Use UniformityAnalysis in DAGISel
Switch DAGISel over to UniformityAnalysis, which was one of the last remaining users of the DivergenceAnalysis.
No explosions seen during internal testing so this looks like a smooth transition.

Reviewed By: sameerds

Differential Revision: https://reviews.llvm.org/D145918
2023-03-14 11:18:28 +01:00
Jay Foad
dc3882eace [AMDGPU] Fix .amdhsa_shared_vgpr_count error checking for GFX11
Differential Revision: https://reviews.llvm.org/D145936
2023-03-14 09:05:32 +00:00
Jay Foad
23b0df72d2 [AMDGPU] Remove BoolToList class
Replace all:
  foreach _ = BoolToList<cond>.ret in
with:
  if cond then

Thanks to Philip Reames for D145711 which enabled this.
2023-03-13 09:22:52 +00:00
Simon Pilgrim
9041682d2c [DAG] Remove redundant isZExtFree(SDValue,VT) overrides. NFC.
These implementations both match the TargetLoweringBase.isZExtFree implementation
2023-03-12 15:56:04 +00:00
Jon Chesterfield
d3dda422bf [amdgpu][nfc] Replace ad hoc LDS frame recalculation with absolute_symbol MD
Post ISel, LDS variables are absolute values. Representing them as
such is simpler than the frame recalculation currently used to build assembler
tables from their addresses.

This is a precursor to lowering dynamic/external LDS accesses from non-kernel
functions.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D144221
2023-03-12 13:47:48 +00:00
Jay Foad
5fc5c7ebe2 [AMDGPU] Make use of defvar in defining SMEM Real instructions 2023-03-10 14:31:24 +00:00
Mirko Brkusanin
2eada459c7 [AMDGPU][MachineVerifier] Fix vdata reg count for MIMG d16
Differential Revision: https://reviews.llvm.org/D145785
2023-03-10 14:47:49 +01:00
Valery Pykhtin
8f6c47b7a4 [AMDGPU] Speedup GCNDownwardRPTracker::advanceBeforeNext
The function performs liveness tests on the entire live register set for every instruction it passes.
This becomes very slow in high-RP regions such as ASAN-enabled code.

Instead, only the uses of the last tracked instruction need to be tested, which greatly improves compilation time.

This patch revealed a few bugs in SIFormMemoryClauses and PreRARematStage::sinkTriviallyRematInsts, which should
be fixed first.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D136267
2023-03-09 15:18:02 +01:00
Petar Avramovic
ded69779be Fix SGPR + VGPR + offset Scratch offset folding
Values in SGPR and VGPR registers are treated as unsigned by hardware.

When the value in a 32-bit SGPR or VGPR base can be negative, calculate the offset
using 32-bit add instructions; otherwise use
sgpr(unsigned) + vgpr(unsigned) + offset.

LoopStrengthReduce.cpp can make offsets negative, and in some
iterations the value in the SGPR or VGPR register could be negative.

Differential Revision: https://reviews.llvm.org/D144957
2023-03-09 10:53:41 +01:00
Petar Avramovic
3ae310d0ae Fix VGPR + offset Scratch offset folding
Values in VGPR registers are treated as unsigned by hardware.

When the value in a 32-bit VGPR base can be negative, calculate the offset using
a 32-bit add instruction; otherwise use vgpr base(unsigned) + offset.
This does not affect the case where the whole offset comes from the VGPR register
(immediate offset is 0).

LoopStrengthReduce.cpp can make offsets negative, and in some
iterations the value in the VGPR register could be negative.

Differential Revision: https://reviews.llvm.org/D144956
2023-03-09 10:52:44 +01:00
Petar Avramovic
5e56d59999 Fix SGPR + offset Scratch offset folding
Values in SGPR registers are treated as unsigned by hardware.

When the value in a 32-bit SGPR base can be negative, calculate the offset using
a 32-bit add instruction; otherwise use sgpr base(unsigned) + offset.
This does not affect the case where the whole offset comes from the SGPR register
(immediate offset is 0).

LoopStrengthReduce.cpp can make offsets negative, and in some
iterations the value in the SGPR register could be negative.

Differential Revision: https://reviews.llvm.org/D144955
2023-03-09 10:52:44 +01:00
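The three scratch offset folding fixes above all rest on the same arithmetic point: a base register that software thinks of as negative looks like a very large unsigned value to the hardware. The following is a minimal sketch of that mismatch, not code from the patches. The function names are hypothetical, and modelling the hardware as a zero-extending add in a wider type is an assumption made only for illustration.

```cpp
#include <cstdint>

// Address as 32-bit wraparound arithmetic computes it. A base holding the
// bit pattern of -16 plus an offset of 32 wraps around to 16.
uint32_t addWrapping32(uint32_t Base, uint32_t ImmOffset) {
  return Base + ImmOffset; // wraps modulo 2^32
}

// Address when the base is zero-extended before the add (assumed model of
// "treated as unsigned by hardware"). 0xFFFFFFF0 + 32 becomes 0x100000010.
uint64_t addZeroExtended(uint32_t Base, uint32_t ImmOffset) {
  return static_cast<uint64_t>(Base) + ImmOffset;
}

// Folding the immediate into the instruction is only safe in this model
// when both computations agree, i.e. there is no carry out of bit 31.
bool canFoldImmOffset(uint32_t Base, uint32_t ImmOffset) {
  return addZeroExtended(Base, ImmOffset) == addWrapping32(Base, ImmOffset);
}
```

With a base of 16 the two models agree and the fold is fine; with a base holding -16 they disagree, which is the situation where the commits fall back to explicit 32-bit add instructions.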
Stanislav Mekhanoshin
e7ec123c6a [AMDGPU] Implement idempotent atomic lowering
This turns an idempotent atomic operation into an atomic load.

Fixes: SWDEV-385135

Differential Revision: https://reviews.llvm.org/D144759
2023-03-08 14:09:59 -08:00
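As a rough illustration of what "idempotent atomic operation" means in the commit above, the sketch below uses standard C++ atomics rather than LLVM IR. An atomic OR with 0 never changes the stored value, so the value it returns is the same one an atomic load would observe. The function names are hypothetical, and the memory-ordering details that the real lowering has to respect are not modelled here.

```cpp
#include <atomic>
#include <cstdint>

// An idempotent read-modify-write: OR with 0 leaves memory unchanged and
// returns the current value.
uint32_t readViaIdempotentRMW(std::atomic<uint32_t> &Value) {
  return Value.fetch_or(0u, std::memory_order_seq_cst);
}

// The plain atomic load that observes the same value.
uint32_t readViaLoad(const std::atomic<uint32_t> &Value) {
  return Value.load(std::memory_order_seq_cst);
}
```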
Valery Pykhtin
a999669982 [AMDGPU] Scheduler: fix RP calculation for a MBB with one successor
We reuse the live registers computed after tracking one MBB as the live-ins of its successor MBB
when there is only one successor, but we don't check whether that successor has other predecessors.

`A   B`
` \ /`
`  C`

A and B each have one successor, but C has live-ins defined by both A and B and therefore should be
initialized using LIS.

This fixes 83 lit tests out of 420 with EXPENSIVE_CHECK enabled.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D136918
2023-03-08 12:20:03 +01:00
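The condition described in the scheduler fix above can be sketched on a toy CFG. This is not the scheduler's code: `Block` and `canReuseLiveOuts` are hypothetical names, and blocks are identified by plain indices. The point is only that "one successor" is not enough; the successor must also have exactly one predecessor.

```cpp
#include <vector>

// Toy basic block: predecessor and successor lists hold block indices.
struct Block {
  std::vector<int> Preds;
  std::vector<int> Succs;
};

// Returns true if the live registers tracked at the end of block From may be
// reused as the live-ins of its successor. That requires a single successor
// AND that the successor has no other predecessors.
bool canReuseLiveOuts(const std::vector<Block> &CFG, int From) {
  const Block &B = CFG[From];
  if (B.Succs.size() != 1)
    return false;
  const Block &Succ = CFG[B.Succs.front()];
  return Succ.Preds.size() == 1;
}
```

For the A/B/C shape from the commit message, the check returns false for both A and B, so C's live-ins would have to come from a fresh query instead.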
Stanislav Mekhanoshin
59162e3859 [AMDGPU] Skip buffer_wbl2 before atomic fence acquire
Memory models for gfx90a and gfx940 do not require buffer_wbl2
before the fence for acquire ordering, but we do insert the full
release.

Fixes: SWDEV-386785

Differential Revision: https://reviews.llvm.org/D145524
2023-03-08 01:24:20 -08:00
Christudasan Devadasan
2171f04c12 [AMDGPU] Extend WorkGroupID* codegen for compute shaders
Currently, codegen support for the llvm.amdgcn.workgroup.id*
intrinsics is enabled only for compute kernels. This patch
additionally enables their selection for compute shaders on
subtargets that have architected SGPRs.

Differential Revision: https://reviews.llvm.org/D145045
2023-03-08 07:36:19 +05:30
Simon Pilgrim
e6287d57a3 [DAG] isNarrowingProfitable - consistently use SrcVT/DestVT argument names. NFC.
Make it more obvious what order the narrowing types are in.
2023-03-07 14:00:06 +00:00
pvanhout
edca49cfb7 [AMDGPU] Match med3 for (max (min ..))
We previously only matched (min (max ...))

Depends on D144728

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D145159
2023-03-07 11:14:31 +01:00
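The med3 commit above relies on the fact that, for bounds Lo <= Hi, clamping can be written either as min(max(x, Lo), Hi) or as max(min(x, Hi), Lo), and both equal the median of the three values. The sketch below checks that identity on signed integers. Using integers is an assumption made for illustration (the commit does not say which types are matched), and all function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Median of three values.
int32_t med3(int32_t A, int32_t B, int32_t C) {
  return std::max(std::min(A, B), std::min(std::max(A, B), C));
}

// The form that was already matched: (min (max x, Lo), Hi).
int32_t clampMinMax(int32_t X, int32_t Lo, int32_t Hi) {
  assert(Lo <= Hi && "the identity only holds for ordered bounds");
  return std::min(std::max(X, Lo), Hi);
}

// The newly matched form: (max (min x, Hi), Lo).
int32_t clampMaxMin(int32_t X, int32_t Lo, int32_t Hi) {
  assert(Lo <= Hi && "the identity only holds for ordered bounds");
  return std::max(std::min(X, Hi), Lo);
}
```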
pvanhout
036431e31e [AMDGPU] Use UniformityAnalysis in LateCodeGenPrepare
Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D145366
2023-03-06 13:35:57 +01:00
pvanhout
dbebebf6f6 [AMDGPU] Use UniformityAnalysis in CodeGenPrepare
A little extra change was needed in UA because it didn't consider
InvokeInst and it made call-constexpr.ll assert.

Reviewed By: sameerds, arsenm

Differential Revision: https://reviews.llvm.org/D145358
2023-03-06 13:26:51 +01:00
pvanhout
7a5d850da2 [AMDGPU] Use UniformityAnalysis in RewriteUndefsForPHI
Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D145359
2023-03-06 12:15:33 +01:00
Matt Arsenault
9f4746b65f AMDGPU: Combine down fcopysign f64 magnitude
Copy through the low bits and only apply an f32
copysign to the high half. This is effectively
what we do for codegen anyway, but this provides
some combine benefits. The cases involving constants
show some small improvements.

https://reviews.llvm.org/D142682
2023-03-06 05:54:25 -04:00
Matt Arsenault
606a62ce27 AMDGPU: Force sign operand of f64 fcopysign to f32
The fcopysign DAG operation, unlike the IR one, allows
different types for the sign and magnitude. We can reduce
the bitwidth of the high operand since only the sign bit matters.

The default combine only introduces mixed fcopysign
operand types from fpext/fptrunc. We effectively do this
already during selection, but doing it earlier in the combiner
should expose new combine opportunities (e.g. the existing tests
now eliminate the load of the low half of the double). Unfortunately
this isn't enough to handle the case I'm interested in just yet.
2023-03-05 19:54:13 -04:00
Matt Arsenault
bd1f7c417f AMDGPU: Try to push fneg as integer into select
I initially attempted to select the source modifier from xor of
a sign mask. This proved to be more difficult since
foldBinOpIntoSelect does not consider free fneg of integers
and undoes the combine.
2023-03-05 18:53:16 -04:00
Jay Foad
7ba61eaf34 [AMDGPU] More precise limit on SALU cycles in s_delay_alu instructions
This just tweaks the fix for D145232 to make the limit more precise, so
that we could actually emit a delay of 3 SALU cycles (the maximum) if we
had any SALU instructions that required it.
2023-03-05 08:14:15 +00:00
Matt Arsenault
ce3d93e4be AMDGPU: Use static constexpr instead of static const
Not sure why this was broken, but I was seeing this linker error:

ld64.lld: error: undefined symbol: (anonymous namespace)::AMDGPUInsertDelayAlu::DelayInfo::SALU_CYCLES_MAX
>>> referenced by AMDGPUInsertDelayAlu.cpp:129 (/Users/matt/src/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUInsertDelayAlu.cpp:129)
2023-03-03 18:50:34 -04:00
Jeffrey Byrnes
b89236a96f [AMDGPU] Vectorize misaligned global loads & stores
Based on experimentation on gfx906,908,90a and 1030, wider global loads / stores are more performant than multiple narrower ones independent of alignment -- this is especially true when combining 8 bit loads / stores, in which case speedup was usually 2x across all alignments.

Differential Revision: https://reviews.llvm.org/D145170

Change-Id: I6ee6c76e6ace7fc373cc1b2aac3818fc1425a0c1
2023-03-03 13:18:25 -08:00
Jay Foad
7442f8635b [AMDGPU] Fix invalid instid value in s_delay_alu instruction
Differential Revision: https://reviews.llvm.org/D145232
2023-03-03 21:08:26 +00:00
Nikita Popov
576060fb41 [ReplaceConstant] Extract code for expanding users of constant (NFC)
AMDGPU implements some handy code for expanding all constexpr
users of LDS globals. Extract the core logic into ReplaceConstant,
so that it can be reused elsewhere.
2023-03-03 16:09:06 +01:00
Jay Foad
08bdff862c [AMDGPU] Fix error message for illegal copy 2023-03-03 11:46:01 +00:00
ZHU Zijia
8fccdfa436 [AMDGPU] Remove outdated FIXME in comments [NFC]
This case has already been handled by D106449.
2023-03-03 01:34:19 +08:00
Anshil Gandhi
7474cd3e2e [SIAnnotateControlFlow] Use Uniformity analysis
Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D145013
2023-03-01 10:19:45 -07:00
Anshil Gandhi
1b52c7be91 [AMDGPUUnifyDivergentExitNodes] Use Uniformity Analysis
Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D145018
2023-03-01 10:17:11 -07:00
Ivan Kosarev
b06e5ad8a6 [AMDGPU][AsmParser][NFC] Simplify parsing cache policies.
Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D144954
2023-03-01 12:34:21 +00:00
Anshil Gandhi
a78301560d [AMDGPU] Replace LegacyDA with Uniformity Analysis in AnnotateUniformValues
Reviewed By: sameerds

Differential Revision: https://reviews.llvm.org/D144162
2023-02-28 13:05:38 -07:00
Ivan Kosarev
905fa15d84 [AMDGPU][AsmParser] Distinguish literal and modifier SMEM offsets.
Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D144902
2023-02-28 12:58:54 +00:00
Ivan Kosarev
dbbab71b76 [AMDGPU][NFC] Eliminate the u32imm operand definition.
It is only used to infer the types of offset parameters in isel patterns,
which we can specify directly.

Reviewed By: piotr

Differential Revision: https://reviews.llvm.org/D144890
2023-02-28 12:23:47 +00:00
Jon Chesterfield
bf579a7049 [amdgpu] Change LDS lowering default to hybrid
Postponed from D139433 until the bug fixed by D139874 could be resolved.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D141852
2023-02-24 15:20:12 +00:00
Justin Bogner
c083c89744 [AMDGPU] Move V_FMA_MIX pattern matching into tablegen. NFC
The matching for V_FMA_MIX was partially implemented with a C++
matcher (for fmas with 32 bit results and 16 bit inputs) and partially
in tablegen (for fmas with 16 bit results). Move the C++ matcher logic
into tablegen to make this more consistent and so we can remove the
duplication between SDAG and GISel.

Differential Revision: https://reviews.llvm.org/D144612
2023-02-23 10:23:34 -08:00
Jay Foad
dcb834843e [AMDGPU] Split SIModeRegisterDefaults out of AMDGPUBaseInfo. NFC.
This is only used by CodeGen. Moving it out of AMDGPUBaseInfo simplifies
future changes to make some of it depend on the subtarget.

Differential Revision: https://reviews.llvm.org/D144650
2023-02-23 16:38:15 +00:00
Mirko Brkusanin
926746d22a [AMDGPU][GFX11] Legalize and select partial NSA MIMG instructions
If more registers are needed for VAddr than the NSA format allows, then the
final register can act as a contiguous set of the remaining addresses. Update the
legalizer to pack registers for this new format and allow instruction
selection to use NSA encoding when the number of addresses exceeds the max size.
Also update SIShrinkInstructions to handle partial NSA.

Differential Revision: https://reviews.llvm.org/D144034
2023-02-23 13:33:34 +01:00
Mirko Brkusanin
b3dc0e69cf [AMDGPU][MC][GFX11] Add Partial NSA format for image sample instructions
Image sample instructions that need more than 5 VGPRs for VAddr can use the
partial NSA encoding format. VGPRs that cannot fit into the
encoding are sequential after the last one.
This patch adds the assembly and disassembly parts.

Differential Revision: https://reviews.llvm.org/D144033
2023-02-23 13:33:34 +01:00
Piotr Sobczak
51a49ec52a [AMDGPU] Clean up MUBUF immediate offset
D143174 lifted the artificial type restriction by promoting
offset to i32. This patch handles more cases: those involving
immediate offset in MUBUF.

Differential Revision: https://reviews.llvm.org/D144628
2023-02-23 13:29:53 +01:00
Piotr Sobczak
a3d7b3121c [AMDGPU][NFC] Add getMaxMUBUFImmOffset
Replace magic constant 4095 with the function getMaxMUBUFImmOffset().

Differential Revision: https://reviews.llvm.org/D144623
2023-02-23 11:29:59 +01:00
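The refactoring pattern in the commit above, replacing a repeated magic constant with a single named query, looks roughly like the sketch below. The names `maxImmOffset` and `isLegalImmOffset` are hypothetical stand-ins, not the signatures used in LLVM; only the value 4095 comes from the commit message.

```cpp
#include <cstdint>

// Hypothetical stand-in for a single point of truth for the limit.
constexpr uint32_t maxImmOffset() { return 4095; }

// Callers ask the helper instead of comparing against a literal 4095, so a
// later change to the limit touches one place only.
constexpr bool isLegalImmOffset(uint32_t Offset) {
  return Offset <= maxImmOffset();
}
```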
Konstantina Mitropoulou
944f429b21 [AMDGPU] Improve the lowering of raw_buffer_load_{i8,i16} and struct_buffer_load_{i8,i16} intrinsics
Currently, raw_buffer_load_{i8,i16} and struct_buffer_load_{i8,i16}
intrinsics are lowered as buffer_load_{u8,u16}. This patch combines
buffer_load_{u8,u16} and sign extension instructions in order to
generate buffer_load_{i8,i16} instructions.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D144313
2023-02-22 09:01:33 -08:00
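The combine described in the commit above works because a zero-extending narrow load followed by a sign extension of the low bits yields the same value as a sign-extending load. The sketch below shows this for bytes in ordinary C++. The function names are hypothetical and the code only mirrors the arithmetic, not the intrinsics or the selection patterns.

```cpp
#include <cstdint>
#include <cstring>

// Two-step form: zero-extending byte load, then sign-extend the low 8 bits.
int32_t loadU8ThenSignExtend(const uint8_t *Ptr) {
  uint32_t Zext = *Ptr; // value in [0, 255]
  // Flip bit 7, then subtract 128: a well-defined 8-bit sign extension.
  return static_cast<int32_t>(Zext ^ 0x80u) - 0x80;
}

// One-step form: read the byte as a signed value directly.
int32_t loadI8(const uint8_t *Ptr) {
  int8_t Signed;
  std::memcpy(&Signed, Ptr, sizeof(Signed));
  return Signed;
}
```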
Joe Nash
80a8e6805a [AMDGPU] Don't set src mods on permlane16
v_permlane16_b32 and v_permlanex16_b32 should not set abs and neg src
modifiers on any input, but they can set op_sel on src0 or src1 to
represent fi or bc when desired. The ISel patterns were setting
the src_modifier bits to -1, effectively setting abs and neg as well,
whenever it was intended to set op_sel, due to an error in ISel. ISel
should now correctly only set the op_sel bits.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D144519
2023-02-22 11:41:52 -05:00
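The bug described in the commit above is a classic bitmask mistake: writing -1 into a modifier field sets every bit, not just the one that was intended. The sketch below reproduces that effect with illustrative bit positions. The constant values and function names are assumptions chosen for the example and are not taken from the LLVM sources.

```cpp
#include <cstdint>

// Illustrative modifier bits (assumed layout, for demonstration only).
constexpr uint32_t NEG = 1u << 0;
constexpr uint32_t ABS = 1u << 1;
constexpr uint32_t OP_SEL_0 = 1u << 2;

// Buggy form: -1 turns on all bits, so NEG and ABS are set as a side effect.
constexpr uint32_t buggyMods(bool SetOpSel) {
  return SetOpSel ? static_cast<uint32_t>(-1) : 0u;
}

// Fixed form: only the op_sel bit is set.
constexpr uint32_t fixedMods(bool SetOpSel) {
  return SetOpSel ? OP_SEL_0 : 0u;
}
```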
Jay Foad
c9f4df57ca [AMDGPU] Move splitMUBUFOffset from AMDGPUBaseInfo to SIInstrInfo
Moving this out of AMDGPUBaseInfo enforces that AMDGPUBaseInfo should
not be calling into GCNSubtarget.

Differential Revision: https://reviews.llvm.org/D144564
2023-02-22 16:19:05 +00:00
Jessica Del
fc672b6a8b [AMDGPU] Improved wide multiplies
These checks show optimized instructions if an operand is known to be
(partially) zero.

Change-Id: Ie2f6d0d3ee9d5b279d1f4c1dd0787492e39cc77a

Differential Revision: https://reviews.llvm.org/D140208
2023-02-22 16:39:06 +01:00
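Why a known-zero part of an operand helps a wide multiply can be seen from the schoolbook decomposition into 32-bit halves. The sketch below is an assumption about the general technique, not a description of what the patch emits. It computes the low 64 bits of a 64x64 product and shows which partial products drop out when high halves are known to be zero. All function names are hypothetical.

```cpp
#include <cstdint>

// Low 64 bits of A * B built from 32-bit pieces: three partial products.
uint64_t mul64Generic(uint64_t A, uint64_t B) {
  uint64_t ALo = A & 0xFFFFFFFFu, AHi = A >> 32;
  uint64_t BLo = B & 0xFFFFFFFFu, BHi = B >> 32;
  uint64_t Result = ALo * BLo;             // 32x32->64
  Result += (ALo * BHi + AHi * BLo) << 32; // cross terms, low 32 bits matter
  return Result;
}

// B's high half known to be zero: the ALo * BHi term disappears.
uint64_t mul64BHighZero(uint64_t A, uint32_t BLo) {
  uint64_t ALo = A & 0xFFFFFFFFu, AHi = A >> 32;
  return ALo * BLo + ((AHi * BLo) << 32);
}

// Both high halves known to be zero: a single 32x32->64 multiply remains.
uint64_t mul64BothHighZero(uint32_t ALo, uint32_t BLo) {
  return static_cast<uint64_t>(ALo) * BLo;
}
```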