407 Commits

Author SHA1 Message Date
Jay Foad
8d13e7b8c3
[AMDGPU] Qualify auto. NFC. (#110878)
Generated automatically with:
$ clang-tidy -fix -checks=-*,llvm-qualified-auto $(find
lib/Target/AMDGPU/ -type f)
2024-10-03 13:07:54 +01:00
Petar Avramovic
83fe85115d
AMDGPU: Fix inst-selection of large scratch offsets with sgpr base (#110256)
Use i32 for offset instead of i16, this way it does not get interpreted
as negative 16 bit offset.
2024-09-30 10:44:59 +02:00
Jay Foad
73b8074e68
[AMDGPU] Do not use APInt for simple 64-bit arithmetic. NFC. (#109414) 2024-09-20 13:45:04 +01:00
Nikita Popov
cee0bf9626
[AMDGPU] Use Lo_32 and Hi_32 helpers (NFC) (#109413) 2024-09-20 14:35:38 +02:00
Diana Picus
3356208531
Reland "[amdgpu] Add llvm.amdgcn.init.whole.wave intrinsic" (#108512)
This reverts commit
7792b4ae79.

The problem was a conflict with
e55d6f5ea2
"[AMDGPU] Simplify and improve codegen for llvm.amdgcn.set.inactive
(https://github.com/llvm/llvm-project/pull/107889)"
which changed the syntax of V_SET_INACTIVE (and thus made my MIR test
crash).

...if only we had a merge queue.
2024-09-13 11:54:30 +02:00
Diana Picus
7792b4ae79
Revert "Reland "[amdgpu] Add llvm.amdgcn.init.whole.wave intrinsic" (#108054)"" (#108341)
Reverts llvm/llvm-project#108173

si-init-whole-wave.mir crashes on some buildbots (although it passed
both locally with sanitizers enabled and in pre-merge tests).
Investigating.
2024-09-12 10:12:09 +02:00
Diana Picus
703ebca869
Reland "[amdgpu] Add llvm.amdgcn.init.whole.wave intrinsic" (#108054)" (#108173)
This reverts commit
c7a7767fca.

The buildbots failed because I removed a MI from its parent before
updating LIS. This PR should fix that.
2024-09-12 09:11:41 +02:00
Vitaly Buka
c7a7767fca
Revert "[amdgpu] Add llvm.amdgcn.init.whole.wave intrinsic" (#108054)
Breaks bots, see #105822.

Reverts llvm/llvm-project#105822
2024-09-10 09:51:43 -07:00
Diana Picus
44556e64f2
[amdgpu] Add llvm.amdgcn.init.whole.wave intrinsic (#105822)
This intrinsic is meant to be used in functions that have a "tail" that
needs to be run with all the lanes enabled. The "tail" may contain
complex control flow that makes it unsuitable for the use of the
existing WWM intrinsics. Instead, we will pretend that the function
starts with all the lanes enabled, then branches into the actual body of
the function for the lanes that were meant to run it, and then finally
all the lanes will rejoin and run the tail.

As such, the intrinsic will return the EXEC mask for the body of the
function, and is meant to be used only as part of a very limited pattern
(for now only in amdgpu_cs_chain functions):

```
entry:
  %func_exec = call i1 @llvm.amdgcn.init.whole.wave()
  br i1 %func_exec, label %func, label %tail

func:
  ; ... stuff that should run with the actual EXEC mask
  br label %tail

tail:
  ; ... stuff that runs with all the lanes enabled;
  ; can contain more than one basic block
```

It's an error to use the result of this intrinsic for anything
other than a branch (but unfortunately checking that in the verifier is
non-trivial because SIAnnotateControlFlow will introduce an amdgcn.if
between the intrinsic and the branch).

The intrinsic is lowered to a SI_INIT_WHOLE_WAVE pseudo, which for now
is expanded in si-wqm (which is where SI_INIT_EXEC is handled too);
however the information that the function was conceptually started in
whole wave mode is stored in the machine function info
(hasInitWholeWave). This will be useful in prolog epilog insertion,
where we can skip saving the inactive lanes for CSRs (since if the
function started with all the lanes active, then there are no inactive
lanes to preserve).
2024-09-10 13:24:53 +02:00
Juan Manuel Martinez Caamaño
cbf34a5f77
[AMDGPU] Remove dead pass: AMDGPUMachineCFGStructurizer (#105645) 2024-08-23 14:06:17 +02:00
Simon Pilgrim
11ba72e651
[KnownBits] Add KnownBits::add and KnownBits::sub helper wrappers. (#99468) 2024-08-12 10:21:28 +01:00
Matt Arsenault
dd094b2647
NewPM/AMDGPU: Port AMDGPUPerfHintAnalysis to new pass manager (#102645)
This was much more difficult than I anticipated. The pass is
not in a good state, with poor test coverage. The legacy PM
does seem to be relying on maintaining the map state between
different SCCs, which seems bad. The pass is going out of its
way to avoid putting the attributes it introduces onto non-callee
functions. If it just added them, we could use them directly
instead of relying on the map, I would think.

The NewPM path uses a ModulePass; I'm not sure if we should be
using CGSCC here but there seems to be some missing infrastructure
to support backend defined ones.
2024-08-11 15:11:10 +04:00
Kazu Hirata
f4fb735840
[llvm] Construct SmallVector<SDValue> with ArrayRef (NFC) (#102578) 2024-08-09 09:15:42 -07:00
Ivan Kosarev
430cf6537b
[AMDGPU][NFCI] Declare offset0/1 operands to be i32. (#100560)
Being of type i8 makes them signed, which they aren't, and requires
extra work masking them on verbalisation.

Part of <https://github.com/llvm/llvm-project/issues/62629>.
2024-07-25 14:32:19 +01:00
Jay Foad
0ce3ea1bff
[AMDGPU] Simplify selection of llvm.amdgcn.inverse.ballot. NFCI. (#99345) 2024-07-18 07:45:13 +01:00
Jay Foad
bf536cc7db
[AMDGPU] Fix unwanted LICM/CSE of llvm.amdgcn.pops.exiting.wave.id (#96190)
Mark both the intrinsic and the selected MachineInstr as having side
effects to prevent MachineLICM and MachineCSE from moving/removing them.
2024-06-27 09:27:52 +01:00
vangthao95
3aef525aa4
[AMDGPU] Fix negative immediate offset for unbuffered smem loads (#89165)
For unbuffered smem loads, it is illegal for the immediate offset to be
negative if the resulting IOFFSET + (SGPR[Offset] or M0 or zero) is
negative.

New PR of https://github.com/llvm/llvm-project/pull/79553.
2024-06-24 14:18:23 -07:00
Jay Foad
90779fdc19
[AMDGPU] Preserve chain when selecting llvm.amdgcn.pops.exiting.wave.id (#96167)
Without this SelectionDAG could fail assertions when using the intrinsic
in a non-entry BB.
2024-06-20 12:30:34 +01:00
Matt Arsenault
8520061281
AMDGPU: Support local atomicrmw fmin/fmax for float/double (#95590)
This has always been supported. Somehow, we ended up with 2
copies of clang builtins for this case, and the newer one
erroneously requires gfx8-insts.
2024-06-18 18:34:34 +02:00
paperchalice
7652a59407
Reland "[NewPM][CodeGen] Port selection dag isel to new pass manager" (#94149)
- Fix build with `EXPENSIVE_CHECKS`
- Remove unused `PassName::ID` to resolve warning
- Mark `~SelectionDAGISel` virtual so AArch64 backend can work properly
2024-06-04 08:10:58 +08:00
paperchalice
8917afaf0e
Revert "[NewPM][CodeGen] Port selection dag isel to new pass manager" (#94146)
This reverts commit de37c06f01772e02465ccc9f538894c76d89a7a1 to
de37c06f01772e02465ccc9f538894c76d89a7a1

It still breaks EXPENSIVE_CHECKS build. Sorry.
2024-06-02 14:31:52 +08:00
paperchalice
d2cdc8ab45
[NewPM][CodeGen] Port selection dag isel to new pass manager (#83567)
Port selection dag isel to new pass manager.
Only `AMDGPU` and `X86` support new pass version. `-verify-machineinstrs` in new pass manager belongs to verify instrumentation, it is enabled by default.
2024-06-02 09:12:33 +08:00
Jay Foad
990bed64fb
[AMDGPU] New intrinsic llvm.amdgcn.pops.exiting.wave.id (#89612)
This provides access to the special scalar source value
SRC_POPS_EXITING_WAVE_ID on GFX9 and GFX10.
2024-05-22 19:47:59 +01:00
Jay Foad
6eb9e214b3
RFC: [AMDGPU] Check subtarget features for consistency (#86957)
Implement GCNSubtarget::checkSubtargetFeatures as a canonical place to
check subtarget features for consistency and diagnose any
inconsistencies. To start with, the implementation just checks that
either wavefrontsize32 or wavefrontsize64 is selected.

checkSubtargetFeatures is called at the start of instruction selection.
This is pretty arbitrary. It is just a convenient point at which we have
access to the subtarget that we're going to use for codegenning a
particular function.
2024-05-09 11:37:28 +01:00
David Green
601e102bdb
[CodeGen] Use LocationSize for MMO getSize (#84751)
This is part of #70452 that changes the type used for the external
interface of MMO to LocationSize as opposed to uint64_t. This means the
constructors take LocationSize, and convert ~UINT64_C(0) to
LocationSize::beforeOrAfter(). The getSize methods return a
LocationSize.

This allows us to be more precise with unknown sizes, not accidentally
treating them as unsigned values, and in the future should allow us to
add proper scalable vector support but none of that is included in this
patch. It should mostly be an NFC.

Global ISel is still expected to use the underlying LLT as it needs, and
are not expected to see unknown sizes for generic operations. Most of
the changes are hopefully fairly mechanical, adding a lot of getValue()
calls and protecting them with hasValue() where needed.
2024-03-17 18:15:56 +00:00
Shilei Tian
e963d0740e
[AMDGPU] Replace isInlinableLiteral16 with specific version (#84402)
The current implementation of `isInlinableLiteral16` assumes, a 16-bit
inlinable
literal is either an `i16` or a `fp16`. This is not always true because
of
`bf16`. However, we can't tell `fp16` and `bf16` apart by just looking
at the
value. This patch splits `isInlinableLiteral16` into three versions,
`i16`,
`fp16`, `bf16` respectively, and call the corresponding version.
2024-03-08 14:49:52 -05:00
Sameer Sahasrabuddhe
60822637bf Restore "Implement convergence control in MIR using SelectionDAG (#71785)"
This restores commit c7fdd8c11e54585dc9d15d63de9742067e0506b9.
Previously reverted in f010b1bef4dda2c7082cbb41dbabf1f149cce306.

LLVM function calls carry convergence control tokens as operand bundles, where
the tokens themselves are produced by convergence control intrinsics. This patch
implements convergence control tokens in MIR as follows:

1. Introduce target-independent ISD opcodes and MIR opcodes for convergence
   control intrinsics.
2. Model token values as untyped virtual registers in MIR.

The change also introduces an additional ISD opcode CONVERGENCECTRL_GLUE and a
corresponding machine opcode with the same spelling. This glues the convergence
control token to SDNodes that represent calls to intrinsics. The glued token is
later translated to an implicit argument in the MIR.

The lowering of calls to user-defined functions is target-specific. On AMDGPU,
the convergence control operand bundle at a non-intrinsic call is translated to
an explicit argument to the SI_CALL_ISEL instruction. Post-selection adjustment
converts this explicit argument to an implicit argument on the SI_CALL
instruction.
2024-03-06 12:19:32 +05:30
Noah Goldstein
61c06775c9 [KnownBits] Add API for nuw flag in computeForAddSub; NFC 2024-03-05 12:59:58 -06:00
Mitch Phillips
f010b1bef4 Revert "Restore "Implement convergence control in MIR using SelectionDAG (#71785)""
This reverts commit c7fdd8c11e54585dc9d15d63de9742067e0506b9.

Reason: Broke the sanitizer buildbots. See the comments at
https://github.com/llvm/llvm-project/pull/71785
for more information.
2024-03-04 17:05:34 +01:00
Sameer Sahasrabuddhe
c7fdd8c11e Restore "Implement convergence control in MIR using SelectionDAG (#71785)"
Original commit 79889734b940356ab3381423c93ae06f22e772c9.
Perviously reverted in commit a2afcd5721869d1d03c8146bae3885b3385ba15e.

LLVM function calls carry convergence control tokens as operand bundles, where
the tokens themselves are produced by convergence control intrinsics. This patch
implements convergence control tokens in MIR as follows:

1. Introduce target-independent ISD opcodes and MIR opcodes for convergence
   control intrinsics.
2. Model token values as untyped virtual registers in MIR.

The change also introduces an additional ISD opcode CONVERGENCECTRL_GLUE and a
corresponding machine opcode with the same spelling. This glues the convergence
control token to SDNodes that represent calls to intrinsics. The glued token is
later translated to an implicit argument in the MIR.

The lowering of calls to user-defined functions is target-specific. On AMDGPU,
the convergence control operand bundle at a non-intrinsic call is translated to
an explicit argument to the SI_CALL_ISEL instruction. Post-selection adjustment
converts this explicit argument to an implicit argument on the SI_CALL
instruction.
2024-03-04 13:28:04 +05:30
Martin Wehking
4bf06c16fc
Initialize unsigned integer when declared (#81894)
Initialize ModOpcode directly before the loop execution to silence
static analyzer warnings about the usage of an uninitialized variable.

This leads to a redundant assignment of ElV2F16 inside the first loop
execution, but also avoids superfluous emptiness checks of EltsV2F16
after the first execution of the loop.
2024-02-25 18:26:12 +05:30
Sameer Sahasrabuddhe
a2afcd5721 Revert "Implement convergence control in MIR using SelectionDAG (#71785)"
This reverts commit 79889734b940356ab3381423c93ae06f22e772c9.

Encountered multiple buildbot failures.
2024-02-21 11:07:02 +05:30
Sameer Sahasrabuddhe
79889734b9
Implement convergence control in MIR using SelectionDAG (#71785)
LLVM function calls carry convergence control tokens as operand bundles, where
the tokens themselves are produced by convergence control intrinsics. This patch
implements convergence control tokens in MIR as follows:

1. Introduce target-independent ISD opcodes and MIR opcodes for convergence
   control intrinsics.
2. Model token values as untyped virtual registers in MIR.

The change also introduces an additional ISD opcode CONVERGENCECTRL_GLUE and a
corresponding machine opcode with the same spelling. This glues the convergence
control token to SDNodes that represent calls to intrinsics. The glued token is
later translated to an implicit argument in the MIR.

The lowering of calls to user-defined functions is target-specific. On AMDGPU,
the convergence control operand bundle at a non-intrinsic call is translated to
an explicit argument to the SI_CALL_ISEL instruction. Post-selection adjustment
converts this explicit argument to an implicit argument on the SI_CALL
instruction.
2024-02-21 10:06:37 +05:30
Shilei Tian
9c6a2de24b
[AMDGPU] Clean up functions for checking inline literals (#81282)
This patch cleans up functions for checking inline literals.
2024-02-15 12:11:51 -05:00
Valery Pykhtin
b8025d1482
Reapply "[AMDGPU] Add InstCombine rule for ballot.i64 intrinsic in wave32 mode." (#80303)
Reapply #71556 with added lit test constraint: `REQUIRES: amdgpu-registered-target`.

This reverts commit 9791e5414960f92396582b9e9ee503ac15799312.
2024-02-02 13:09:25 +01:00
Kazu Hirata
8582d41789 [Target] Use SDValue::getConstantOperandVal (NFC) 2024-01-29 18:46:16 -08:00
Mirko Brkušanin
7fdf608cef
[AMDGPU] Add GFX12 WMMA and SWMMAC instructions (#77795)
Co-authored-by: Petar Avramovic <Petar.Avramovic@amd.com>
Co-authored-by: Piotr Sobczak <piotr.sobczak@amd.com>
2024-01-24 13:43:07 +01:00
Jay Foad
ea9d75aa2a [AMDGPU] Misc formatting fixes. NFC. 2024-01-19 13:50:26 +00:00
Jay Foad
c111dc72e9
[AMDGPU] Allow potentially negative flat scratch offsets on GFX12 (#78193)
https://github.com/llvm/llvm-project/pull/70634 has disabled use
of potentially negative scratch offsets, but we can use it on GFX12.

---------

Co-authored-by: Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>
2024-01-18 10:02:40 +00:00
Valery Pykhtin
9791e54149
Revert "[AMDGPU] Add InstCombine rule for ballot.i64 intrinsic in wave32 mode." (#78429)
Reverts llvm/llvm-project#71556

Fixes failures:
https://lab.llvm.org/buildbot/#/builders/188/builds/40541
https://lab.llvm.org/buildbot/#/builders/91/builds/21847
https://lab.llvm.org/buildbot/#/builders/98/builds/31671
https://lab.llvm.org/buildbot/#/builders/139/builds/57289
2024-01-17 14:12:07 +01:00
Jay Foad
4a77414660
[AMDGPU] CodeGen for GFX12 8/16-bit SMEM loads (#77633) 2024-01-17 10:28:03 +00:00
Valery Pykhtin
57b50ef017
[AMDGPU] Add InstCombine rule for ballot.i64 intrinsic in wave32 mode. (#71556)
Substitute with zero-extended to i64 ballot.i32 intrinsic.
2024-01-17 17:02:05 +07:00
Mirko Brkušanin
2adbf254a1
[AMDGPU][NFC] Rename DotIUVOP3PMods to VOP3PModsNeg (#77785)
This is used to select the source modifier (neg) from the immediate
operand. After a follow up commit this will no longer be DOTIU specific.

Co-authored-by: Changpeng Fang <changpeng.fang@amd.com>
2024-01-12 10:57:24 +01:00
Kazu Hirata
1e05236dbd [Target] Use isNullConstant (NFC) 2024-01-10 20:25:24 -08:00
Alex Bradbury
2d54ec36f7
[SelectionDAG] Add and use SDNode::getAsAPIntVal() helper (#77455)
This is the logical equivalent for #76710 for APInt and uses the same
naming scheme.

Converted existing users through:
`git grep -l "cast<ConstantSDNode>\(.*\).*getAPIntValueValue" | xargs
sed -E -i
's/cast<ConstantSDNode>\((.*)\)->getAPIntValue/\1->getAsAPIntVal/'`
2024-01-09 14:27:07 +00:00
Alex Bradbury
197214e39b
[RFC][SelectionDAG] Add and use SDNode::getAsZExtVal() helper (#76710)
This follows on from #76708, allowing
`cast<ConstantSDNode>(N)->getZExtValue()` to be replaced with just
`N->getAsZextVal();`
    
Introduced via `git grep -l "cast<ConstantSDNode>\(.*\).*getZExtValue" |
xargs sed -E -i
's/cast<ConstantSDNode>\((.*)\)->getZExtValue/\1->getAsZExtVal/'` and
then using `git clang-format` on the result.
2024-01-09 12:25:17 +00:00
Matt Arsenault
460ffcddd9
AMDGPU: Make bf16/v2bf16 legal types (#76215)
There are some intrinsics are using i16 vectors in place of bfloat
vectors.
Move towards making bf16 vectors legal so these can migrate. Leave the
larger vectors for a later change.

Depends #76213 #76214
2024-01-04 22:31:18 +07:00
Nicolai Hähnle
49b492048a
AMDGPU: Fix packed 16-bit inline constants (#76522)
Consistently treat packed 16-bit operands as 32-bit values, because
that's really what they are. The attempt to treat them differently was
ultimately incorrect and lead to miscompiles, e.g. when using non-splat
constants such as (1, 0) as operands.

Recognize 32-bit float constants for i/u16 instructions. This is a bit
odd conceptually, but it matches HW behavior and SP3.

Remove isFoldableLiteralV216; there was too much magic in the dependency
between it and its use in SIFoldOperands. Instead, we now simply rely on
checking whether a constant is an inline constant, and trying a bunch of
permutations of the low and high halves. This is more obviously correct
and leads to some new cases where inline constants are used as shown by
tests.

Move the logic for switching packed add vs. sub into SIFoldOperands.
This has two benefits: all logic that optimizes for inline constants in
packed math is now in one place; and it applies to both SelectionDAG and
GISel paths.

Disable the use of opsel with v_dot* instructions on gfx11. They are
documented to ignore opsel on src0 and src1. It may be interesting to
re-enable to use of opsel on src2 as a future optimization.

A similar "proper" fix of what inline constants mean could potentially
be applied to unpacked 16-bit ops. However, it's less clear what the
benefit would be, and there are surely places where we'd have to
carefully audit whether values are properly sign- or zero-extended. It
is best to keep such a change separate.

Fixes: Corruption in FSR 2.0 (latent bug exposed by an LLPC change)
2024-01-04 00:10:15 +01:00
Alex Bradbury
a181b42565 [llvm][NFC] Use SDValue::getConstantOperandAPInt(i) where possible
The helper function allows examples like
`cast<ConstantSDNode>(Op.getOperand(0))->getAPIntValue();` to be changed
to `Op.getConstantOperandAPInt(0);`.

See #76708 for further context. Although there are far fewer
opportunities for replacement, I used a similar git grep and sed combo
as before, given I already had it to hand:

`git grep -l "cast<ConstantSDNode>\(.*->getOperand\(.*\)\)->getAPIntValue\(\)" | xargs sed -E -i 's/cast<ConstantSDNode>\((.*)->getOperand\((.*)\)\)->getAPIntValue\(\)/\1->getConstantOperandAPInt(\2)/'`
and
`git grep -l
"cast<ConstantSDNode>\(.*\.getOperand\(.*\)\)->getAPIntValue\(\)" |
xargs sed -E -i
's/cast<ConstantSDNode>\((.*)\.getOperand\((.*)\)\)->getAPIntValue\(\)/\1.getConstantOperandAPInt(\2)/'`
2024-01-02 14:43:55 +00:00
Alex Bradbury
80aeb62211
[llvm][NFC] Use SDValue::getConstantOperandVal(i) where possible (#76708)
This helper function shortens examples like
`cast<ConstantSDNode>(Node->getOperand(1))->getZExtValue();` to
`Node->getConstantOperandVal(1);`.

Implemented with:
`git grep -l
"cast<ConstantSDNode>\(.*->getOperand\(.*\)\)->getZExtValue\(\)" | xargs
sed -E -i

's/cast<ConstantSDNode>\((.*)->getOperand\((.*)\)\)->getZExtValue\(\)/\1->getConstantOperandVal(\2)/`
and `git grep -l
"cast<ConstantSDNode>\(.*\.getOperand\(.*\)\)->getZExtValue\(\)" | xargs
sed -E -i

's/cast<ConstantSDNode>\((.*)\.getOperand\((.*)\)\)->getZExtValue\(\)/\1.getConstantOperandVal(\2)/'`.
With a couple of simple manual fixes needed. Result then processed by
`git clang-format`.
2024-01-02 13:14:28 +00:00