6233 Commits

Author SHA1 Message Date
Matt Arsenault
cd60bff329 CodeGen: Add some additional is_fpclass lowering tests
Cover more cases in preparation for making greater use
of fcmp based lowerings. Also add more tests for the inverted
cases. Test iszero | isnan test masks. We should probably just
generate every combination of test masks.
2023-03-15 01:13:08 -04:00
Simon Pilgrim
4bf004e07e [DAG] Fold (bitcast (logicop (bitcast x), (c))) -> (logicop x, (bitcast c)) iff the current logicop type is illegal
Try to remove extra bitcasts around logicops if we're dealing with illegal types

Fixes the regressions in D145939

Differential Revision: https://reviews.llvm.org/D146032
2023-03-14 14:41:11 +00:00
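A minimal sketch of why the fold in 4bf004e07e is value-preserving: a bitwise logic op gives the same bits whether it is applied before or after a bitcast, so the casts around `x` can be dropped and only the constant needs casting. The v2i32/i64 type pair, the little-endian lane order, and the helper names are assumptions for illustration, not taken from the patch.

```python
import struct

def bitcast_v2i32_to_i64(lanes):
    # Reinterpret two 32-bit lanes as one 64-bit integer (little-endian assumed).
    lo, hi = lanes
    return struct.unpack("<Q", struct.pack("<II", lo, hi))[0]

def bitcast_i64_to_v2i32(value):
    # Reinterpret one 64-bit integer as two 32-bit lanes.
    return struct.unpack("<II", struct.pack("<Q", value))

def and_via_bitcasts(x_lanes, c64):
    # (bitcast (and (bitcast x), c)): do the AND in the i64 type.
    return bitcast_i64_to_v2i32(bitcast_v2i32_to_i64(x_lanes) & c64)

def and_folded(x_lanes, c64):
    # (and x, (bitcast c)): cast only the constant, then AND lane by lane.
    c_lanes = bitcast_i64_to_v2i32(c64)
    return tuple(x & c for x, c in zip(x_lanes, c_lanes))
```

Both functions return the same pair of lanes for any input; the folded form simply avoids touching the type of `x`.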
pvanhout
1f1fea6c38 Reland: [DAG/AMDGPU] Use UniformityAnalysis in DAGISel
Switch DAGISel over to UniformityAnalysis, which was one of the last remaining users of the DivergenceAnalysis.
No explosions seen during internal testing so this looks like a smooth transition.

Reviewed By: sameerds

Differential Revision: https://reviews.llvm.org/D145918
2023-03-14 14:38:45 +01:00
pvanhout
0ea6f0e158 [AMDGPU] Don't run llc-pipeline.ll when expensive_checks are enabled
AMDGPU ISel can add extra passes when expensive checks are enabled. This means the pipeline can be reordered and the checks may fail.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D146038
2023-03-14 14:12:36 +01:00
pvanhout
0e79106fc9 Revert "[DAG/AMDGPU] Use UniformityAnalysis in DAGISel"
This reverts commit 0022b5803fd4f5a4e9fcf233267c0ffa1b88f763.
2023-03-14 11:48:58 +01:00
pvanhout
0022b5803f [DAG/AMDGPU] Use UniformityAnalysis in DAGISel
Switch DAGISel over to UniformityAnalysis, which was one of the last remaining users of the DivergenceAnalysis.
No explosions seen during internal testing so this looks like a smooth transition.

Reviewed By: sameerds

Differential Revision: https://reviews.llvm.org/D145918
2023-03-14 11:18:28 +01:00
Chen Zheng
4f0ed16a46 Reland rGf35a09daebd0a90daa536432e62a2476f708150d and rG63854f91d3ee1056796a5ef27753648396cac6ec
[DAGCombiner] handle more store value forwarding

When lowering calls on targets like PPC, some stack loads
are generated for by-value parameters. The CALLSEQ_START node
prevents such loads from being combined.

Suggested by @RolandF, this patch removes the unnecessary
loads for the byval parameter by extending ForwardStoreValueToDirectLoad

Reviewed By: nemanjai, RolandF

Differential Revision: https://reviews.llvm.org/D138899
2023-03-12 21:59:18 -04:00
Simon Pilgrim
f759275c1c [AMDGPU] Regenerate sdwa-peephole.ll 2023-03-12 13:50:25 +00:00
Jon Chesterfield
d3dda422bf [amdgpu][nfc] Replace ad hoc LDS frame recalculation with absolute_symbol MD
Post ISel, LDS variables are absolute values. Representing them as
such is simpler than the frame recalculation currently used to build assembler
tables from their addresses.

This is a precursor to lowering dynamic/external LDS accesses from non-kernel
functions.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D144221
2023-03-12 13:47:48 +00:00
Simon Pilgrim
b53ea2b9c5 [DAG] visitAND - fold (and (any_ext V), c) -> (zero_ext (and (trunc V), c)) if profitable.
Try to more aggressively narrow masks of extended values.

This is mainly for cases where the mask is trying to zero out any_extended upper bits, assuming we can zext/trunc the values for free.

This catches a few actual missed folds, as well as helps canonicalize a number of other cases which were being caught in isel etc.

Differential Revision: https://reviews.llvm.org/D145866
2023-03-12 13:25:23 +00:00
Mirko Brkusanin
2eada459c7 [AMDGPU][MachineVerifier] Fix vdata reg count for MIMG d16
Differential Revision: https://reviews.llvm.org/D145785
2023-03-10 14:47:49 +01:00
Max Kazantsev
6b03ce374e [LICM] Simplify (X < A && X < B) into (X < MIN(A, B)) if MIN(A, B) is loop-invariant
We don't do this transform in InstCombine in the general case for arbitrary values, because the cost of
an AND and two ICMPs isn't higher than that of a MIN and an ICMP. However, LICM also knows
about the loop structure. This transform becomes profitable if `A` and `B` are loop-invariant and
`X` is not: by doing this, we can compute the min outside the loop.

Differential Revision: https://reviews.llvm.org/D143726
Reviewed By: nikic
2023-03-10 17:36:52 +07:00
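The rewrite described in 6b03ce374e can be sketched as follows. This is an illustration of the arithmetic identity only, not of the LICM implementation; the function names are hypothetical, and Python's unbounded integers stand in for a signed comparison.

```python
def before_hoisting(xs, a, b):
    # Two compares and an AND are evaluated on every iteration.
    return [x < a and x < b for x in xs]

def after_hoisting(xs, a, b):
    # a and b are loop-invariant, so min(a, b) is computed once, outside the loop.
    bound = min(a, b)
    # Only a single compare remains inside the loop.
    return [x < bound for x in xs]
```

The two versions agree because `x < a and x < b` holds exactly when `x` is below the smaller of the two bounds.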
Max Kazantsev
279f0c02ad [Test] Regenerate tests using update_llc_test_checks.py 2023-03-10 11:34:16 +07:00
Valery Pykhtin
8f6c47b7a4 [AMDGPU] Speedup GCNDownwardRPTracker::advanceBeforeNext
The function tests liveness of the entire live register set for every instruction it passes.
This becomes very slow in high-RP regions such as ASAN-enabled code.

Instead, only the uses of the last tracked instruction need to be tested, which greatly improves compilation time.

This patch revealed a few bugs in SIFormMemoryClauses and PreRARematStage::sinkTriviallyRematInsts, which should
be fixed first.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D136267
2023-03-09 15:18:02 +01:00
Diana Picus
99d053a97d AMDGPU: Update checks for a couple of tests. NFC 2023-03-09 15:09:19 +01:00
Petar Avramovic
ded69779be Fix SGPR + VGPR + offset Scratch offset folding
Values in SGPR and VGPR registers are treated as unsigned by hardware.

When the value in a 32-bit SGPR or VGPR base can be negative, calculate the offset
using 32-bit add instructions; otherwise use
sgpr(unsigned) + vgpr(unsigned) + offset.

LoopStrengthReduce.cpp changes offsets to negative, and in some
iterations the value in an SGPR or VGPR register could be negative.

Differential Revision: https://reviews.llvm.org/D144957
2023-03-09 10:53:41 +01:00
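A rough sketch of the problem that the three scratch offset folding fixes (ded69779be, 3ae310d0ae, 5e56d59999) describe. It assumes, purely for illustration, that the hardware zero-extends each 32-bit register and sums the parts without wrapping to 32 bits; the commit messages only state that register values are treated as unsigned. All names here are hypothetical.

```python
MASK32 = 0xFFFFFFFF

def to_u32(value):
    # Bit pattern of a 32-bit two's-complement value, read as unsigned.
    return value & MASK32

def folded_unsigned_sum(sgpr, vgpr, offset):
    # Assumed hardware model: each register is read as unsigned and the
    # sum is formed without 32-bit wraparound.
    return to_u32(sgpr) + to_u32(vgpr) + offset

def explicit_add32(sgpr, vgpr, offset):
    # Explicit 32-bit adds wrap around, so a negative base still works.
    return (sgpr + vgpr + offset) & MASK32
```

Under this model the two computations agree when both bases are non-negative, but a base that went negative (for example after LoopStrengthReduce rewrote the offsets) yields a far larger folded sum than the intended address, which is why the explicit 32-bit add is used in that case.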
Petar Avramovic
3ae310d0ae Fix VGPR + offset Scratch offset folding
Values in VGPR registers are treated as unsigned by hardware.

When the value in a 32-bit VGPR base can be negative, calculate the offset using a
32-bit add instruction; otherwise use vgpr base(unsigned) + offset.
Does not affect the case where the whole offset comes from a VGPR register
(immediate offset is 0).

LoopStrengthReduce.cpp changes offsets to negative, and in some
iterations the value in a VGPR register could be negative.

Differential Revision: https://reviews.llvm.org/D144956
2023-03-09 10:52:44 +01:00
Petar Avramovic
5e56d59999 Fix SGPR + offset Scratch offset folding
Values in SGPR registers are treated as unsigned by hardware.

When the value in a 32-bit SGPR base can be negative, calculate the offset using a
32-bit add instruction; otherwise use sgpr base(unsigned) + offset.
Does not affect the case where the whole offset comes from an SGPR register
(immediate offset is 0).

LoopStrengthReduce.cpp changes offsets to negative, and in some
iterations the value in an SGPR register could be negative.

Differential Revision: https://reviews.llvm.org/D144955
2023-03-09 10:52:44 +01:00
Stanislav Mekhanoshin
e7ec123c6a [AMDGPU] Implement idempotent atomic lowering
This turns an idempotent atomic operation into an atomic load.

Fixes: SWDEV-385135

Differential Revision: https://reviews.llvm.org/D144759
2023-03-08 14:09:59 -08:00
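To illustrate what "idempotent" means in e7ec123c6a: an atomic read-modify-write whose operand is the identity for its operation leaves memory unchanged and only returns the old value, which is what an atomic load does. The operation table below is an assumption for illustration; the commit message does not list which operations the patch handles.

```python
MASK32 = 0xFFFFFFFF

# 32-bit unsigned read-modify-write operations (illustrative subset).
ATOMIC_OPS = {
    "add": lambda old, v: (old + v) & MASK32,
    "sub": lambda old, v: (old - v) & MASK32,
    "or": lambda old, v: old | v,
    "xor": lambda old, v: old ^ v,
    "and": lambda old, v: old & v,
    "umax": max,
    "umin": min,
}

# Operand that makes each operation a no-op on memory.
IDENTITY = {"add": 0, "sub": 0, "or": 0, "xor": 0,
            "and": MASK32, "umax": 0, "umin": MASK32}

def is_idempotent(op, operand):
    # True when the RMW cannot change the stored value.
    return IDENTITY.get(op) == operand

def atomicrmw(memory, op, operand):
    # Model of an RMW on a one-element list: store the new value, return the old.
    old = memory[0]
    memory[0] = ATOMIC_OPS[op](old, operand)
    return old
```

For every pair accepted by `is_idempotent`, `atomicrmw` returns the stored value and leaves `memory` untouched, so replacing it with a load is observationally equivalent in this model.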
Stanislav Mekhanoshin
59162e3859 [AMDGPU] Skip buffer_wbl2 before atomic fence acquire
Memory models for gfx90a and gfx940 do not require buffer_wbl2
before the fence for acquire ordering, but we do insert the full
release.

Fixes: SWDEV-386785

Differential Revision: https://reviews.llvm.org/D145524
2023-03-08 01:24:20 -08:00
Christudasan Devadasan
2171f04c12 [AMDGPU] Extend WorkGroupID* codegen for compute shaders
Currently, codegen support for the llvm.amdgcn.workgroup.id*
intrinsics is enabled only for compute kernels. In addition,
this patch enables their selection for compute shaders on
subtargets that have architected SGPRs.

Differential Revision: https://reviews.llvm.org/D145045
2023-03-08 07:36:19 +05:30
Florian Hahn
7019624ee1 [SCEV] Strengthen nowrap flags via ranges for ARs on construction.
At the moment, proveNoWrapViaConstantRanges is only used when creating
SCEV[Zero,Sign]ExtendExprs. We can get significant improvements by
strengthening flags after creating the AddRec.

I'll also share a follow-up patch that removes the code to strengthen
flags when creating SCEV[Zero,Sign]ExtendExprs. Modifying AddRecs while
creating those can lead to surprising changes.

Compile-time looks neutral:
https://llvm-compile-time-tracker.com/compare.php?from=94676cf8a13c511a9acfc24ed53c98964a87bde3&to=aced434e8b103109104882776824c4136c90030d&stat=instructions:u

Reviewed By: mkazantsev, nikic

Differential Revision: https://reviews.llvm.org/D144050
2023-03-07 17:10:34 +01:00
pvanhout
edca49cfb7 [AMDGPU] Match med3 for (max (min ..))
We previously only matched (min (max ...))

Depends on D144728

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D145159
2023-03-07 11:14:31 +01:00
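A small check of the identity behind edca49cfb7: when the lower bound does not exceed the upper bound, both clamp orderings compute the median of the three values. The `lo <= hi` precondition is stated here as a mathematical fact; the exact conditions the patch checks are not given in the commit message, and the function names are hypothetical.

```python
def med3(a, b, c):
    # Median of three values.
    return sorted((a, b, c))[1]

def min_of_max(x, lo, hi):
    # The (min (max ...)) form that was already matched.
    return min(max(x, lo), hi)

def max_of_min(x, lo, hi):
    # The (max (min ...)) form that this commit adds.
    return max(min(x, hi), lo)
```

With `lo <= hi`, both functions return `med3(x, lo, hi)`; if `lo > hi` they disagree (`hi` versus `lo`), so the ordering of the bounds matters.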
pvanhout
764e39048c [AMDGPU] Precommit test: v_sat_pk_u8_i16.ll
Differential Revision: https://reviews.llvm.org/D144728
2023-03-07 11:07:13 +01:00
Jay Foad
5281f5c1e6 [AMDGPU] Add GFX9,GFX10,GFX11 checks for llvm.amdgcn.s.buffer.load 2023-03-06 18:19:50 +00:00
Jay Foad
e73d3150b1 [AMDGPU] Generate checks for llvm.amdgcn.s.buffer.load 2023-03-06 18:19:50 +00:00
pvanhout
036431e31e [AMDGPU] Use UniformityAnalysis in LateCodeGenPrepare
Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D145366
2023-03-06 13:35:57 +01:00
pvanhout
dbebebf6f6 [AMDGPU] Use UniformityAnalysis in CodeGenPrepare
A little extra change was needed in UA because it didn't consider
InvokeInst and it made call-constexpr.ll assert.

Reviewed By: sameerds, arsenm

Differential Revision: https://reviews.llvm.org/D145358
2023-03-06 13:26:51 +01:00
Jay Foad
271010bf50 [AMDGPU] Restore temporal divergence in test
The loop in this test was supposed to have temporal divergence but this
was broken by r367221. Fix it.
2023-03-06 12:09:52 +00:00
pvanhout
7a5d850da2 [AMDGPU] Use UniformityAnalysis in RewriteUndefsForPHI
Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D145359
2023-03-06 12:15:33 +01:00
Matt Arsenault
9f4746b65f AMDGPU: Combine down fcopysign f64 magnitude
Copy through the low bits and only apply an f32
copysign to the high half. This is effectively
what we do for codegen anyway, but this provides
some combine benefits. The cases involving constants
show some small improvements.

https://reviews.llvm.org/D142682
2023-03-06 05:54:25 -04:00
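The idea in 9f4746b65f can be sketched at the bit level: an f64 is two 32-bit halves, the sign lives in the top bit of the high half, so the low half is copied through and only the high half needs a 32-bit sign replacement. This is an illustration using integer bit operations in place of an actual f32 copysign; the helper names and the little-endian layout are assumptions.

```python
import math
import struct

SIGN32 = 0x80000000

def split_f64(value):
    # Return the (low, high) 32-bit halves of a double.
    return struct.unpack("<II", struct.pack("<d", value))

def copysign_high_half(magnitude, sign):
    lo, hi = split_f64(magnitude)
    _, sign_hi = split_f64(sign)
    # Replace only the top bit of the high half; this is the bit
    # operation a 32-bit copysign performs on its operand.
    new_hi = (hi & 0x7FFFFFFF) | (sign_hi & SIGN32)
    # The low half is copied through untouched.
    return struct.unpack("<d", struct.pack("<II", lo, new_hi))[0]
```

The result matches `math.copysign` for ordinary values and for signed zeros, since only the sign bit differs between the input and output bit patterns.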
Matt Arsenault
606a62ce27 AMDGPU: Force sign operand of f64 fcopysign to f32
The fcopysign DAG operation, unlike the IR one, allows
different types for the sign and magnitude. We can reduce
the bitwidth of the high operand since only the sign bit matters.

The default combine only introduces mixed fcopysign
operand types from fpext/fptrunc. We effectively do this
already during selection, but doing it earlier in the combiner
should expose new combine opportunities (e.g. the existing tests
now eliminate the load of the low half of the double). Unfortunately
this isn't enough to handle the case I'm interested in just yet.
2023-03-05 19:54:13 -04:00
Matt Arsenault
bd1f7c417f AMDGPU: Try to push fneg as integer into select
I initially attempted to select the source modifier from xor of
a sign mask. This proved to be more difficult since
foldBinOpIntoSelect does not consider free fneg of integers
and undoes the combine.
2023-03-05 18:53:16 -04:00
Jeffrey Byrnes
b89236a96f [AMDGPU] Vectorize misaligned global loads & stores
Based on experimentation on gfx906,908,90a and 1030, wider global loads / stores are more performant than multiple narrower ones independent of alignment -- this is especially true when combining 8 bit loads / stores, in which case speedup was usually 2x across all alignments.

Differential Revision: https://reviews.llvm.org/D145170

Change-Id: I6ee6c76e6ace7fc373cc1b2aac3818fc1425a0c1
2023-03-03 13:18:25 -08:00
Jay Foad
7442f8635b [AMDGPU] Fix invalid instid value in s_delay_alu instruction
Differential Revision: https://reviews.llvm.org/D145232
2023-03-03 21:08:26 +00:00
Jay Foad
08bdff862c [AMDGPU] Fix error message for illegal copy 2023-03-03 11:46:01 +00:00
Jay Foad
f5ab447cf6 [AMDGPU] Add test case for AMDGPUInsertDelayAlu bug 2023-03-03 11:08:39 +00:00
Petar Avramovic
c77bd1fe15 AMDGPU: Add more flat scratch load and store tests for 8 and 16-bit types
Add tests for more complicated scratch load and store patterns.
Includes:
- sign and zero extending loads of i8 and i16 to i32 into 32-bit register
- D16 instructions that affect only high or low 16 bits of 32-bit register
 - D16 sign and zero extending loads of i8 to i16 into high or low 16 bits
   of 32-bit register
 - D16 loads of i16 to high or low 16 bits of 32-bit register
 - D16 stores of i8 and i16 from high 16 bits of 32-bit register

Differential Revision: https://reviews.llvm.org/D145081
2023-03-02 13:20:14 +01:00
Anshil Gandhi
7474cd3e2e [SIAnnotateControlFlow] Use Uniformity analysis
Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D145013
2023-03-01 10:19:45 -07:00
Anshil Gandhi
1b52c7be91 [AMDGPUUnifyDivergentExitNodes] Use Uniformity Analysis
Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D145018
2023-03-01 10:17:11 -07:00
Zhongyunde
15d5c59280 [InstCombine] Improvement the analytics through the dominating condition
Take the dominating condition into account; the urem fold benefits from the improved analysis.
Fix https://github.com/llvm/llvm-project/issues/60546

NOTE: the calls in simplifyBinaryIntrinsic and foldICmpWithDominatingICmp
are deleted to reduce compile time.

Reviewed By: nikic, arsenm, erikdesjardins
Differential Revision: https://reviews.llvm.org/D144248
2023-03-01 17:03:34 +08:00
Anshil Gandhi
a78301560d [AMDGPU] Replace LegacyDA with Uniformity Analysis in AnnotateUniformValues
Reviewed By: sameerds

Differential Revision: https://reviews.llvm.org/D144162
2023-02-28 13:05:38 -07:00
zhongyunde
d514726d31 [AMDGPU] Update the autogenerated CHECK lines as they are out of date
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D144771
2023-02-28 02:01:25 +08:00
Diana Picus
44f1cb04e5 [AMDGPU] Run update scripts on existing tests. NFC
Update a few tests where the checks aren't exactly kosher.

Differential Revision: https://reviews.llvm.org/D144639
2023-02-27 09:48:57 +01:00
Jon Chesterfield
bf579a7049 [amdgpu] Change LDS lowering default to hybrid
Postponed from D139433 until the bug fixed by D139874 could be resolved.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D141852
2023-02-24 15:20:12 +00:00
Jon Chesterfield
8e2f8387cc [amdgpu] Add test case showing bug prior to D141852 2023-02-24 15:15:07 +00:00
Vikram
64fc892cda [AMDGPU] Autogenerate carryout-selection.ll, uaddo.ll, usubo.ll (NFC)
Differential Revision: https://reviews.llvm.org/D143987
2023-02-24 02:03:56 -05:00
Piotr Sobczak
ab174c57f4 [AMDGPU] Add more tests for buffer intrinsics
Add more tests for buffer intrinsics with large voffsets.
2023-02-23 14:39:12 +01:00
Mirko Brkusanin
926746d22a [AMDGPU][GFX11] Legalize and select partial NSA MIMG instructions
If more registers are needed for VAddr than the NSA format allows, then the
final register can act as a contiguous set of remaining addresses. Update the
legalizer to pack registers for this new format and allow instruction
selection to use NSA encoding when the number of addresses exceeds the max size.
Also update SIShrinkInstructions to handle partial NSA.

Differential Revision: https://reviews.llvm.org/D144034
2023-02-23 13:33:34 +01:00
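The packing rule described in 926746d22a might look roughly like this. It is a sketch of the grouping only: `pack_partial_nsa` and `max_nsa_operands` are hypothetical names, the limit passed in is an illustrative parameter, and the sketch does not model the requirement that the final group occupy contiguous registers.

```python
def pack_partial_nsa(addr_regs, max_nsa_operands):
    """Group address registers into NSA operands.

    Each inner list is one operand. When there are more addresses than
    operands, the last operand carries all remaining addresses as one set.
    """
    if max_nsa_operands < 1:
        raise ValueError("max_nsa_operands must be at least 1")
    if len(addr_regs) <= max_nsa_operands:
        # Plain NSA: every address gets its own operand.
        return [[reg] for reg in addr_regs]
    head = addr_regs[:max_nsa_operands - 1]
    tail = addr_regs[max_nsa_operands - 1:]
    # Partial NSA: the final operand holds the remaining addresses.
    return [[reg] for reg in head] + [list(tail)]
```

For example, seven addresses with a limit of five operands give four single-register operands followed by one operand holding the last three addresses.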
Diana Picus
da629d3381 [AMDGPU] Add GISel RUN lines to 2 existing tests. NFC
This adds a bit of coverage for GlobalISel.

Differential Revision: https://reviews.llvm.org/D144555
2023-02-23 09:46:54 +01:00