6187 Commits

Author SHA1 Message Date
Vikram
64fc892cda [AMDGPU] Autogenerate carryout-selection.ll, uaddo.ll, usubo.ll (NFC)
Differential Revision: https://reviews.llvm.org/D143987
2023-02-24 02:03:56 -05:00
Piotr Sobczak
ab174c57f4 [AMDGPU] Add more tests for buffer intrinsics
Add more tests for buffer intrinsics with large voffsets.
2023-02-23 14:39:12 +01:00
Mirko Brkusanin
926746d22a [AMDGPU][GFX11] Legalize and select partial NSA MIMG instructions
If more registers are needed for VAddr then the NSA format allows then the
final register can act as a contigous set of remaining addresses. Update
legalizer to pack register for this new format and allow instruction
selection to use NSA encoding when number of addresses exceeds max size.
Also update SIShrinkInstructions to handle partial NSA.

Differential Revision: https://reviews.llvm.org/D144034
2023-02-23 13:33:34 +01:00
Diana Picus
da629d3381 [AMDGPU] Add GISel RUN lines to 2 existing tests. NFC
This adds a bit of coverage for GlobalISel.

Differential Revision: https://reviews.llvm.org/D144555
2023-02-23 09:46:54 +01:00
Piotr Sobczak
1b9b4f3bfa [AMDGPU][NFC] Convert llvm.amdgcn tests to autogen 2023-02-23 08:21:12 +01:00
Konstantina Mitropoulou
944f429b21 [AMDGPU] Improve the lowering of raw_buffer_load_{i8,i16} and struct_buffer_load_{i8,i16} intrinsics
Currently, raw_buffer_load_{i8,i16} and struct_buffer_load_{i8,i16}
intrinsics are lowered as buffer_load_{u8,u16}. This patch combines
buffer_load_{u8,u16} and sign extension instructions in order to
generate buffer_load_{i8,i16} instructions.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D144313
2023-02-22 09:01:33 -08:00
Joe Nash
80a8e6805a [AMDGPU] Don't set src mods on permlane16
v_permlane16_b32 and v_permlanex16_b32 should not set abs and neg src
modifiers on any input, but they can set op_sel on src0 or src1 to
represent fi or bc when desired. The ISel patterns were setting
the src_modifier bits to -1, effectively setting abs and neg as well,
whenever it was intended to set op_sel, due to an error in ISel. ISel
should now correctly only set the op_sel bits.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D144519
2023-02-22 11:41:52 -05:00
Jessica Del
fc672b6a8b [AMDGPU] Improved wide multiplies
These checks show optimized instructions if an operand is known to be
(partially) zero.

Change-Id: Ie2f6d0d3ee9d5b279d1f4c1dd0787492e39cc77a

Differential Revision: https://reviews.llvm.org/D140208
2023-02-22 16:39:06 +01:00
Jay Foad
e2eee902a4 [AMDGPU] Fix an assertion failure when folding into src2 of V_FMAC_F16
D139469 "[AMDGPU] Enable OMod on more VOP3 instructions" caused an
assertion failure when trying to fold into src2 of V_FMAC_F16. It would
temporarily convert the instruction to V_FMA_F16_gfx9 and add an opsel
operand, but if the fold still failed then it would forget to remove the
opsel operand.

Differential Revision: https://reviews.llvm.org/D144558
2023-02-22 14:26:03 +00:00
Jessica Del
c9fd858172 [AMDGPU] MIR-Tests for Multiplication using KBA
These tests show inefficient behavior that will be optimized by a
later change.

By using Known Bits Analysis, we can avoid unnecessary multiplications
or additions with 0.
2023-02-21 14:47:56 +01:00
pvanhout
8e68c12045 [AMDGPU] Remove function with incompatible features
Adds a new pass that removes functions
if they use features that are not supported on the current GPU.

This change is aimed at preventing crashes when building code at O0 that
uses idioms such as `if (ISA_VERSION >= N) intrinsic_a(); else intrinsic_b();`
where ISA_VERSION is not constexpr, and intrinsic_a is not selectable
on older targets.
This is a pattern that's used all over the ROCm device libs. The main
motive behind this change is to allow code using ROCm device libs
to be built at O0.

Note: the feature checking logic is done ad-hoc in the pass. There is no other
pass that needs (or will need in the foreseeable future) to do similar
feature-checking logic so I did not see a need to generalize the feature
checking logic yet. It can (and should probably) be generalized later and
moved to a TargetInfo-like class or helper file.

Reviewed By: arsenm, Joe_Nash

Differential Revision: https://reviews.llvm.org/D139000
2023-02-21 10:42:39 +01:00
Jessica Del
959216f9b1 [AMDGPU] MIR-Tests for Multiplication using KBA
These tests show inefficient behavior that will be optimized by a
later change.

By using Known Bits Analysis, we can avoid unnecessary multiplications
or additions with 0.
2023-02-21 08:41:56 +01:00
Konstantina Mitropoulou
a0e258da19 [AMDGPU] Add tests for future commit
Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D144312
2023-02-20 21:36:25 -08:00
Tiwari Abhinav Ashok Kumar
bfb1559fbe [NFC] Fix missing colon in CHECK directives
Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D144412
2023-02-21 00:13:04 +05:30
Matt Arsenault
c177051f60 AMDGPU: Restrict foldFreeOpFromSelect combine based on legal source mods
Provides a small code size savings for some f32 cases.
2023-02-19 22:05:54 -04:00
Matt Arsenault
28d8889d27 AMDGPU: Teach fneg combines that select has source modifiers
We do match source modifiers for f32 typed selects already, but the
combiner code was never informed of this.

A long time ago the documentation lied and stated that source
modifiers don't work for v_cndmask_b32 when they in fact do. We had a
bunch fo code operating under the assumption that they don't support
source modifiers, so we tried to move fnegs around to work around
this.

Gets a few small improvements here and there. The main hazard to watch
out for is infinite loops in the combiner since we try to move fnegs
up and down the DAG. For now, don't fold fneg directly into select.
The generic combiner does this for a restricted set of cases
when getNegatedExpression obviously shows an improvement for both
operands. It turns out to be trickier to avoid infinite looping the
combiner in conjunction with pulling out source modifiers, so
leave this for a later commit.
2023-02-19 20:13:38 -04:00
Amara Emerson
b309bc04ee [GlobalISel] Combine out-of-range shifts to undef.
Differential Revision: https://reviews.llvm.org/D144303
2023-02-17 15:05:00 -08:00
Nick Desaulniers
a3a84c9e25 [llvm] add CallBrPrepare pass to pipelines
Capstone of
https://discourse.llvm.org/t/rfc-syncing-asm-goto-with-outputs-with-gcc/65453/8

Clang changes are still necessary to enable the use of outputs along
indirect edges of asm goto statements.

Link: https://github.com/llvm/llvm-project/issues/53562

Reviewed By: void

Differential Revision: https://reviews.llvm.org/D140180
2023-02-16 17:58:34 -08:00
Jay Foad
8a17cd9905 AMDGPU: Add a regression test case for D143963 2023-02-16 17:11:32 +00:00
Jay Foad
8e5a41e827 Revert "AMDGPU: Override getNegatedExpression constant handling"
This reverts commit 11c3cead23783e65fb30e673d62771352078ff05.

It was causing infinite loops in the DAG combiner.
2023-02-16 17:11:32 +00:00
Jay Foad
9305b63d69 [AMDGPU] Add another G_UNMERGE_VALUES legalization test case 2023-02-16 16:45:35 +00:00
Florian Hahn
2ac85cd563
[AMDGPU] Regenerate check lines to enable updating for D144050. 2023-02-16 16:38:15 +00:00
Diana Picus
819dfc338b [AMDGPU] Autogenerate checks for several tests. NFCI 2023-02-16 10:54:34 +01:00
Matt Arsenault
11c3cead23 AMDGPU: Override getNegatedExpression constant handling
Ignore the multiple use heuristics of the default
implementation, and report cost based on inline immediates. This
is mostly interesting for -0 vs. 0. Gets a few small improvements.
fneg_fadd_0_f16 is a small regression. We could probably avoid this
if we handled folding fneg into div_fixup.
2023-02-15 05:21:00 -04:00
Stanislav Mekhanoshin
12b4f9e2af [AMDGPU] Do not apply schedule metric for regions with spilling
D139710 has added a metric to increase schedule's ILP while
staying within the same occupancy. Do not bother to apply this
metric to a region which is known to have spilling, it may result
in spilling to reappear after the previous stage and will do no
good if we already spilling anyway. It may also reduce compile
time a bit for such regions.

Fixes: SWDEV-377300

Differential Revision: https://reviews.llvm.org/D143934
2023-02-14 12:16:46 -08:00
Matt Arsenault
09dd4d870e DAG: Remove hasBitPreservingFPLogic
This doesn't make sense as an option. fneg and fabs are bit
preserving by definition. If a target has some fneg or fabs
instruction that are not bitpreserving it's incorrect to lower
fneg/fabs to use it.
2023-02-14 10:25:24 -04:00
Matt Arsenault
f3c008ca77 DAG: Relax foldBitcastedFPLogic conditions
Requiring a bitcast to exist was unhelpful. The most basic cases
are always going to be a CopyFromReg or load, so they would need
a new cast inserted. Don't require a bitcast if it's a free
operation. I don't think this logic makes particularly much sense
(it seems to be imparting special interpretation of bitcast), but
this needs to be in sync with foldSignChangeInBitcast.

We should also get rid of this hasBitPreservingFPLogic hook. fabs/fneg
are bitpreserving or incorrectly implemented, so this should just be a
regular legality check.
2023-02-14 07:59:10 -04:00
Matt Arsenault
4f0eb57222 AMDGPU: Teach getNegatedExpression about rcp 2023-02-14 04:02:39 -04:00
Matt Arsenault
ce4b719f33 AMDGPU: Add test for getNegatedExpression with rcp 2023-02-14 04:02:39 -04:00
Matt Arsenault
0a669bd894 AMDGPU: Add additional tests for combiner infinite loop 2023-02-14 04:02:38 -04:00
pvanhout
04f6934589 [DAG] Handle build_vector with all undefs in reduceBuildVecTruncToBitCast
While working on D143731 I hit a case where a build_vector with 2 undef operands could be generated (with one undef hidden behind a bitcast).
That made `reduceBuildVecTruncToBitCast` crash because it seems to assume there is at least one good operand.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D143886
2023-02-14 08:52:28 +01:00
Arthur Eubanks
7c6b46e87e Revert "[DAGCombiner] handle more store value forwarding"
This reverts commit f35a09daebd0a90daa536432e62a2476f708150d.

Causes miscompiles, see D138899
2023-02-13 19:07:28 -08:00
Christudasan Devadasan
1c9e6238fe [AMDGPU] Allow architected SGPRs for workgroup IDs
Some subtargets use architected SGPRs for workgroup
IDs instead of the regular SGPRs. This patch enables
the support for the same and is guarded under the
subtarget feature FeatureArchitectedSGPRs.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D143707
2023-02-13 22:11:35 +05:30
Changpeng Fang
7ca3444fba AMDGPU: Use module flag to get code object version at IR level folow-up
Summary:
  This is part of the leftover work for https://reviews.llvm.org/D143138.
In this work, we pass code object version as an argument to initialize target ID
and use it for targetID dump.

Reviewers: arsenm

Differential Revision
  https://reviews.llvm.org/D143293
2023-02-10 11:16:38 -08:00
Pierre van Houtryve
70924673af [RFC][GISel] Add a way to ignore COPY instructions in InstructionSelector
RFC to add a way to ignore COPY instructions when pattern-matching MIR in GISel.
    - Add a new "GISelFlags" class to TableGen. Both `Pattern`  and `PatFrags` defs can use it to alter matching behaviour.
    - Flags start at zero and are scoped: the setter returns a `SaveAndRestore` object so that when the current scope ends, the flags are restored to their previous values. This allows child patterns to modify the flags without affecting the parent pattern.
    - Child patterns always reuse the parent's pattern, but they can override its values. For more examples, see `GlobalISelEmitterFlags.td` tests.
    - [AMDGPU] Use the IgnoreCopies flag in BFI patterns, which are known to be bothered by cross-regbank copies.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D136234
2023-02-10 08:37:42 +01:00
Pierre van Houtryve
d9a6fc82f5 [AMDGPU] Run unmerge combines post regbankselect
RegBankSelect can insert G_UNMERGE_VALUES in a lot of places which
left us with a lot of unmerge/merge pairs that could be simplified.
These often got in the way of pattern matching and made codegen
worse.

This patch:
  - Makes the necessary changes to the merge/unmerge combines so they can run post RegBankSelect
  - Adds relevant unmerge combines to the list of RegBankSelect combines for AMDGPU
  - Updates some tablegen patterns that were missing explicit cross-regbank copies (V_BFI patterns were causing constant bus violations with this change).

This seems to be mostly beneficial for code quality.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D142192
2023-02-10 08:34:23 +01:00
Andrew Savonichev
c65b4d64d4 [SelectionDAG] Do not second-guess alignment for alloca
Alignment of an alloca in IR can be lower than the preferred alignment
on purpose, but this override essentially treats the preferred
alignment as the minimum alignment.

The patch changes this behavior to always use the specified
alignment. If alignment is not set explicitly in LLVM IR, it is set to
DL.getPrefTypeAlign(Ty) in computeAllocaDefaultAlign.

Tests are changed as well: explicit alignment is increased to match
the preferred alignment if it changes output, or omitted when it is
hard to determine the right value (e.g. for pointers, some structs, or
weird types).

Differential Revision: https://reviews.llvm.org/D135462
2023-02-09 18:45:20 +03:00
Mirko Brkusanin
43924cbd29 [AMDGPU][GlobalISel] Fix selection of image sample g16 instructions
Pre-GFX10 A16 modifier would imply G16. From GFX10 and onwards there are
separate instructions for 16bit gradients. This fixes the condition for
selecting G16 opcodes. Also stop adding G16 flag to instructions that do not
use gradients for GFX10 onwards.
2023-02-09 16:26:55 +01:00
Stanislav Mekhanoshin
94def1b44e [AMDGPU] Do not exapnd fp atomics on gfx940
FP atomics are safe on gfx940. This fixes regression after D131560.

Fixes: SWDEV-380468

Differential Revision: https://reviews.llvm.org/D143603
2023-02-08 13:22:04 -08:00
Stanislav Mekhanoshin
3e9f2af27a [AMDGPU] Update atomic tests. NFC.
This is to precommit tests before future patch.
2023-02-08 12:55:34 -08:00
Jay Foad
805da0e298 [AMDGPU] Fix some LABEL check lines 2023-02-06 15:53:13 +00:00
Jay Foad
785cc4d71a [AMDGPU] Fix DOS line endings in some tests 2023-02-06 15:53:13 +00:00
Ruiling Song
be3f4591af AMDGPU: Mark control flow intrinsics non-duplicable
This is used to help get simplified CFG for divergent regions as well as
get better code generation in some cases.

For example, with below IR:
```
define amdgpu_kernel void @test() {
bb:
  br label %bb1

bb1:
  %tmp = phi i32 [ 0, %bb ], [ %tmp5, %bb4 ]
  %tid = call i32 @llvm.amdgcn.workitem.id.x()
  %cnd = icmp eq i32 %tid, 0
  br i1 %cnd, label %bb4, label %bb2

bb2:
  %tmp3 = add nsw i32 %tmp, 1
  br label %bb4

bb4:
  %tmp5 = phi i32 [ %tmp3, %bb2 ], [ %tmp, %bb1 ]
  store volatile i32 %tmp5, ptr addrspace(1) undef
  br label %bb1
}
```

We got below assembly before the change:
```
  v_mov_b32_e32 v1, 0
  v_cmp_eq_u32_e32 vcc, 0, v0
  s_branch .LBB0_2
.LBB0_1:                                ; %bb4
                                        ;   in Loop: Header=BB0_2 Depth=1
  s_mov_b32 s2, -1
  s_mov_b32 s3, 0xf000
  buffer_store_dword v1, off, s[0:3], 0
  s_waitcnt vmcnt(0)
.LBB0_2:                                ; %bb
                                        ; =>This Inner Loop Header: Depth=1
  s_and_saveexec_b64 s[0:1], vcc
  s_xor_b64 s[0:1], exec, s[0:1]
                                        ; kill: def $sgpr0_sgpr1 killed $sgpr0_sgpr1 killed $exec
  s_cbranch_execnz .LBB0_1
; %bb.3:                                ; %bb2
                                        ;   in Loop: Header=BB0_2 Depth=1
  s_or_b64 exec, exec, s[0:1]
  s_waitcnt expcnt(0)
  v_add_i32_e64 v1, s[0:1], 1, v1
  s_branch .LBB0_1
```

After the change:
```
  s_mov_b32 s0, 0
  v_cmp_eq_u32_e32 vcc, 0, v0
  s_mov_b32 s2, -1
  s_mov_b32 s3, 0xf000
  v_mov_b32_e32 v0, s0
  s_branch .LBB0_2
.LBB0_1:                                ; %bb4
                                        ;   in Loop: Header=BB0_2 Depth=1
  buffer_store_dword v0, off, s[0:3], 0
  s_waitcnt vmcnt(0)
.LBB0_2:                                ; %bb1
                                        ; =>This Inner Loop Header: Depth=1
  s_and_saveexec_b64 s[0:1], vcc
  s_cbranch_execnz .LBB0_1
; %bb.3:                                ; %bb2
                                        ;   in Loop: Header=BB0_2 Depth=1
  s_or_b64 exec, exec, s[0:1]
  s_waitcnt expcnt(0)
  v_add_i32_e64 v0, s[0:1], 1, v0
  s_branch .LBB0_1
```

We are using one less VGPR, one less s_xor_, and better LICM with one
additional branch after the change. Please note the experiment
was done with reverting the workaround D139780, as it will stop the
tail-duplication completely for this case.

Reviewed by: arsenm

Differential Revision: https://reviews.llvm.org/D118250
2023-02-06 15:32:44 +08:00
Stanislav Mekhanoshin
dd0caa82de [AMDGPU] Fix liveness in the SIOptimizeExecMaskingPreRA.cpp
If a condition register def happens past the newly created use
we do not properly update LIS. It has two problems:

1) We do not extend defining segment to the end of its block
   marking it a live-out (this is regression after
   https://reviews.llvm.org/rG09d38dd7704a52e8ad2d5f8f39aaeccf107f4c56)

2) We do not extend use segment to the beginning of the use block
   marking it a live-in.

Fixes: SWDEV-379563

Differential Revision: https://reviews.llvm.org/D143302
2023-02-05 12:21:28 -08:00
Matt Arsenault
6ce86a7eff AMDGPU: Ensure flat loads are broken into dword in functions
We were assuming we could rely on the flat scratch init detection
to imply if there are possible flat addressed stack objects, which
doesn't work outside of a kernel. We should have a way to prove
if a given flat access can't access the stack.

We could use a not-stack parameter attribute to avoid
these splits.

Make the minimally correct change for GlobalISel; I'll address
this better in my larger patch to rewrite load and store legalization.

Fixes: SWDEV-218237
2023-02-05 05:25:15 -04:00
Matt Arsenault
d53699cc45 AMDGPU: Add some regression tests that infinite looped combiner
Prevent a future patch from introducing an infinite combine loop.
2023-02-04 08:21:18 -04:00
Matt Arsenault
a7fad92ba8 AMDGPU: Add more tests to fneg modifier with casting tests 2023-02-03 07:28:47 -04:00
Changpeng Fang
54cf69c9d5 AMDGPU: Use module flag to get code object version at IR level
Summary:
  This patch introduces a mechanism to check the code object version from the module flag, This avoids checking from command line.
In case the module flag is missing, we use the current default code object version supported in the compiler.

For tools whose inputs are not IR, we may need other approach (directive, for example) to check the code
object version, That will be in a separate patch later.

For LIT tests update, we directly add module flag if there is only a single code object version associated with all checks in one file.
In cause of multiple code object version in one file, we use the "sed" method to "clone" the checks to achieve the goal.

Reviewer: arsenm

Differential Revision:
  https://reviews.llvm.org/D14313
2023-02-02 18:57:26 -08:00
Matt Arsenault
0f8b3b97fd AMDGPU: Add additional tests for is.fpclass legalization 2023-02-02 22:50:23 -04:00
Matt Arsenault
3a01b4a93d AMDGPU: Regenerate test checks
Use right prefix order to get merging.

Also drop -verify-machineinstrs and add -amdgpu-enable-delay-alu=0
2023-02-02 22:50:23 -04:00