832 Commits

Author SHA1 Message Date
David Green
601e102bdb
[CodeGen] Use LocationSize for MMO getSize (#84751)
This is part of #70452 that changes the type used for the external
interface of MMO to LocationSize as opposed to uint64_t. This means the
constructors take LocationSize, and convert ~UINT64_C(0) to
LocationSize::beforeOrAfter(). The getSize methods return a
LocationSize.

This allows us to be more precise with unknown sizes, not accidentally
treating them as unsigned values, and in the future should allow us to
add proper scalable vector support but none of that is included in this
patch. It should mostly be an NFC.

Global ISel is still expected to use the underlying LLT as it needs, and
are not expected to see unknown sizes for generic operations. Most of
the changes are hopefully fairly mechanical, adding a lot of getValue()
calls and protecting them with hasValue() where needed.
2024-03-17 18:15:56 +00:00
Carl Ritson
d9e6aa7048
[AMDGPU] Update LiveInterval def index for early-clobber (#79285)
On converting an instruction to an early-clobber definition in
convertToThreeAddress, we must also update live intervals for the
register to start at the early-clobber index.
2024-03-11 14:54:11 +09:00
Shilei Tian
e963d0740e
[AMDGPU] Replace isInlinableLiteral16 with specific version (#84402)
The current implementation of `isInlinableLiteral16` assumes, a 16-bit
inlinable
literal is either an `i16` or a `fp16`. This is not always true because
of
`bf16`. However, we can't tell `fp16` and `bf16` apart by just looking
at the
value. This patch splits `isInlinableLiteral16` into three versions,
`i16`,
`fp16`, `bf16` respectively, and call the corresponding version.
2024-03-08 14:49:52 -05:00
David Green
44be5a7fdc
[Codegen] Make Width in getMemOperandsWithOffsetWidth a LocationSize. (#83875)
This is another part of #70452 which makes getMemOperandsWithOffsetWidth
use a LocationSize for Width, as opposed to the unsigned it currently
uses. The advantages on it's own are not super high if
getMemOperandsWithOffsetWidth usually uses known sizes, but if the
values can come from an MMO it can help be more accurate in case they
are Unknown (and in the future, scalable).
2024-03-06 17:40:13 +00:00
Shilei Tian
e9c1dbb408 Revert "[AMDGPU] Replace isInlinableLiteral16 with specific version (#81345)"
This reverts commit 530f0e64ec11327879c44f2fd55c7c28efdbaa2d because it breaks
downstream.
2024-03-06 08:42:54 -05:00
Pierre van Houtryve
52d5b8e02d
[AMDGPU] Don't form sext/abs/neg fp8 cvt (#83843)
gfx940 does not allow abs/sext/neg on v_cvt_fp8/bf8 & pk variants.

Fixes SWDEV-447468
2024-03-06 10:38:20 +01:00
Shilei Tian
530f0e64ec
[AMDGPU] Replace isInlinableLiteral16 with specific version (#81345) 2024-03-04 08:40:42 -05:00
Shilei Tian
46734aa1e5
[AMDGPU] Use bf16 instead of i16 for bfloat (#80908)
Currently we generally use `i16` to represent `bf16` in those tablegen
files. This patch is trying to use `bf16` directly.

Fix #79369.
2024-02-16 15:58:30 -05:00
Corbin Robeck
2d9f350449
[AMDGPU] Consolidate SGPRSpill and VGPRSpill into single Spill bit (#81901)
Follow on to #81525 in the series of consolidating bits in TSFlags.

Merge SGPRSpill and VGPRSpill into single Spill bit

Modify isSGPRSpill and isVGPRSpill helper functions to differentiate
VGPR and SGPR spills:

Spill+SALU=SGPR Spill
Spill+VALU=VGPR Spill

The only exception here is SGPR spills to VGPRs which require an
explicit instruction check.
2024-02-16 13:32:59 -05:00
Konstantin Zhuravlyov
fcef407aa2
AMDGPU/NFC: Remove some bits from TSFlags (#81525)
- AMDGPU/NFC: Purge SOPK_ZEXT from TSFlags
  - Moved to helper function in SIInstInfo
- AMDGPU/NFC: Purge VOPAsmPrefer32Bit from TSFlags
  - This flag did not make sense / remnants of something else I think
2024-02-12 16:43:48 -05:00
Philip Reames
3ff7caea33
[TTI] Use Register in isLoadFromStackSlot and isStoreToStackSlot [nfc] (#80339) 2024-02-01 17:52:35 -08:00
Shengchen Kan
550f0eb2ce [NFC] Rename TargetInstrInfo::FoldImmediate to TargetInstrInfo::foldImmediate and simplify implementation for X86 2024-01-26 20:50:58 +08:00
Jay Foad
e89a7c41ba [AMDGPU] Update comment on SIInstrInfo::isLegalFLATOffset for GFX12 2024-01-19 15:53:06 +00:00
Jay Foad
80ccc72ec7
[AMDGPU] Remove GFX12 encoding hack (#78702)
This is no longer needed now that we have implemented GFX12 encoding for
all instructions.
2024-01-19 12:19:29 +00:00
Jay Foad
9ca36932b5
[AMDGPU] Work around s_getpc_b64 zero extending on GFX12 (#78186) 2024-01-18 10:23:27 +00:00
Ivan Kosarev
2a869ced61
[AMDGPU][True16] Support V_FLOOR_F16. (#78446) 2024-01-18 08:43:47 +00:00
Mirko Brkušanin
1d286ad59b
[AMDGPU] Add mark last scratch load pass (#75512) 2024-01-18 09:36:44 +01:00
Jay Foad
9d8e53818d
[AMDGPU] Refactor getNonSoftWaitcntOpcode and its callers (#77933)
This avoids listing all soft waitcnt opcodes in two places
(getNonSoftWaitcntOpcode and isSoftWaitcnt) and avoids the need for
helpers isWaitcnt and isWaitcntVsCnt.
2024-01-12 17:12:09 +00:00
Ivan Kosarev
084f1c2ee0
[AMDGPU][True16] Support V_CEIL_F16. (#73108)
As not all fake instructions have their real counterparts implemented
yet, we specify no AssemblerPredicate for UseFakeTrue16Insts to allow
both fake and real True16 instructions in assembler and disassembler
tests in the -mattr=+real-true16 mode during the transition period.

Source DPP and desitnation VOPDstOperand_t16 operands are still not
supported and will be addressed separately.
2024-01-10 08:46:19 +00:00
Alex Bradbury
197214e39b
[RFC][SelectionDAG] Add and use SDNode::getAsZExtVal() helper (#76710)
This follows on from #76708, allowing
`cast<ConstantSDNode>(N)->getZExtValue()` to be replaced with just
`N->getAsZextVal();`
    
Introduced via `git grep -l "cast<ConstantSDNode>\(.*\).*getZExtValue" |
xargs sed -E -i
's/cast<ConstantSDNode>\((.*)\)->getZExtValue/\1->getAsZExtVal/'` and
then using `git clang-format` on the result.
2024-01-09 12:25:17 +00:00
Jay Foad
daa4728dee
[AMDGPU] Add CodeGen support for GFX12 s_mul_u64 (#75825) 2024-01-08 19:13:38 +00:00
Nicolai Hähnle
49b492048a
AMDGPU: Fix packed 16-bit inline constants (#76522)
Consistently treat packed 16-bit operands as 32-bit values, because
that's really what they are. The attempt to treat them differently was
ultimately incorrect and lead to miscompiles, e.g. when using non-splat
constants such as (1, 0) as operands.

Recognize 32-bit float constants for i/u16 instructions. This is a bit
odd conceptually, but it matches HW behavior and SP3.

Remove isFoldableLiteralV216; there was too much magic in the dependency
between it and its use in SIFoldOperands. Instead, we now simply rely on
checking whether a constant is an inline constant, and trying a bunch of
permutations of the low and high halves. This is more obviously correct
and leads to some new cases where inline constants are used as shown by
tests.

Move the logic for switching packed add vs. sub into SIFoldOperands.
This has two benefits: all logic that optimizes for inline constants in
packed math is now in one place; and it applies to both SelectionDAG and
GISel paths.

Disable the use of opsel with v_dot* instructions on gfx11. They are
documented to ignore opsel on src0 and src1. It may be interesting to
re-enable to use of opsel on src2 as a future optimization.

A similar "proper" fix of what inline constants mean could potentially
be applied to unpacked 16-bit ops. However, it's less clear what the
benefit would be, and there are surely places where we'd have to
carefully audit whether values are properly sign- or zero-extended. It
is best to keep such a change separate.

Fixes: Corruption in FSR 2.0 (latent bug exposed by an LLPC change)
2024-01-04 00:10:15 +01:00
Pierre van Houtryve
33565750e4
[AMDGPU] Fix moveToValu for copy to phys SGPRs (#76715)
Fixes #76031
2024-01-02 14:45:33 +01:00
Alex Bradbury
80aeb62211
[llvm][NFC] Use SDValue::getConstantOperandVal(i) where possible (#76708)
This helper function shortens examples like
`cast<ConstantSDNode>(Node->getOperand(1))->getZExtValue();` to
`Node->getConstantOperandVal(1);`.

Implemented with:
`git grep -l
"cast<ConstantSDNode>\(.*->getOperand\(.*\)\)->getZExtValue\(\)" | xargs
sed -E -i

's/cast<ConstantSDNode>\((.*)->getOperand\((.*)\)\)->getZExtValue\(\)/\1->getConstantOperandVal(\2)/`
and `git grep -l
"cast<ConstantSDNode>\(.*\.getOperand\(.*\)\)->getZExtValue\(\)" | xargs
sed -E -i

's/cast<ConstantSDNode>\((.*)\.getOperand\((.*)\)\)->getZExtValue\(\)/\1.getConstantOperandVal(\2)/'`.
With a couple of simple manual fixes needed. Result then processed by
`git clang-format`.
2024-01-02 13:14:28 +00:00
Ivan Kosarev
8c6172b0ac
[AMDGPU][True16] Don't use the VGPR_LO/HI16 register classes. (#76440)
Removing the classes requires updating tests and so is planned to be
done with a separate change.
2023-12-28 11:48:25 +00:00
Acim Maravic
48f36c6e74
[LLVM] Make use of s_flbit_i32_b64 and s_ff1_i32_b64 (#75158)
Update DAG ISel to support 64bit versions S_FF1_I32_B64 and
S_FLBIT_I32_B664

---------

Co-authored-by: Acim Maravic <Acim.Maravic@amd.com>
2023-12-25 11:55:20 +01:00
Jay Foad
8fdfd34cd2
[AMDGPU] Remove GDS and GWS for GFX12 (#76148) 2023-12-21 15:27:08 +00:00
Mariusz Sikora
a018c8cdbb
GFX12: Add LoopDataPrefetchPass (#75625)
It is currently disabled by default. It will need experiments on a real
HW to tune and decide on the profitability.

---------

Co-authored-by: Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>
2023-12-19 08:32:16 +01:00
Stanislav Mekhanoshin
94230ce548
[AMDGPU] Fix lack of LDS DMA check in the AA handling (#75249)
SIInstrInfo::areMemAccessesTriviallyDisjoint does a DS offset checks,
but does not account for LDS DMA instructions. Added these checks.
Without it code falls through and returns true which is wrong. As a
result mayAlias would always return false for LDS DMA and a regular LDS
instruction or 2 LDS DMA instructions.

At the moment this is NFCI because we do not use this AA in a context
which may touch LDS DMA instructions. This is also unreacheable now
because of the ordered memory ref checks just above in the function and
LDS DMA is marked as volatile. This volatile marking is removed in PR
#75247, therefore I'd submit this check before #75247.
2023-12-18 10:58:50 -08:00
Mariusz Sikora
414d27419f
[AMDGPU] GFX12: select @llvm.prefetch intrinsic (#74576)
Co-authored-by: Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>
2023-12-15 17:15:55 +01:00
Mirko Brkušanin
07a6d73664
[AMDGPU] CodeGen for GFX12 VFLAT, VSCRATCH and VGLOBAL instructions (#75493) 2023-12-15 15:01:40 +01:00
Mirko Brkušanin
5879162f7f
[AMDGPU] CodeGen for GFX12 VBUFFER instructions (#75492) 2023-12-15 13:45:03 +01:00
Mirko Brkušanin
26b14aedb7
[AMDGPU] CodeGen for GFX12 VIMAGE and VSAMPLE instructions (#75488) 2023-12-15 12:40:23 +01:00
Pierre van Houtryve
ef067f5204
[AMDGPU][SIInsertWaitcnts] Do not add s_waitcnt when the counters are known to be 0 already (#72830)
Co-authored-by: Juan Manuel MARTINEZ CAAMAÑO <juamarti@amd.com>
2023-12-15 12:33:32 +01:00
Mirko Brkušanin
569ef8ddd9
[AMDGPU] Add pseudo scalar trans instructions for GFX12 (#75204) 2023-12-15 10:41:05 +01:00
Jay Foad
3e6da3252f
[AMDGPU] Add GFX12 s_sleep_var instruction and intrinsic (#75499) 2023-12-14 21:11:39 +00:00
Mariusz Sikora
7f55d7de1a
[AMDGPU] GFX12: Add Split Workgroup Barrier (#74836)
Co-authored-by: Vang Thao <Vang.Thao@amd.com>
2023-12-13 15:01:13 +01:00
Piotr Sobczak
6eec80133b
[AMDGPU] Min/max changes for GFX12 (#75214)
Co-authored-by: Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>
2023-12-13 14:18:10 +01:00
Jay Foad
220e095a2c [AMDGPU] Remove unused function splitScalar64BitAddSub 2023-12-12 15:01:57 +00:00
Piotr Sobczak
cd138fddf1
[AMDGPU] Turn off clang-format in moveToVALU (#75188) 2023-12-12 15:38:06 +01:00
Mariusz Sikora
a97028ac51
[AMDGPU] Update VOP instructions for GFX12 (#74853)
Co-authored-by: Mirko Brkusanin <Mirko.Brkusanin@amd.com>
2023-12-12 11:38:24 +01:00
Alex Bradbury
b717365216
[MachineScheduler][NFCI] Add Offset and OffsetIsScalable args to shouldClusterMemOps (#73778)
These are picked up from getMemOperandsWithOffsetWidth but weren't then
being passed through to shouldClusterMemOps, which forces backends to
collect the information again if they want to use the kind of heuristics
typically used for the similar shouldScheduleLoadsNear function (e.g.
checking the offset is within 1 cache line).

This patch just adds the parameters, but doesn't attempt to use them.
There is potential to use them in the current PPC and AArch64
shouldClusterMemOps implementation, and I intend to use the offset in
the heuristic for RISC-V. I've left these for future patches in the
interest of being as incremental as possible.

As noted in the review and in an inline FIXME, an ElementCount-style abstraction may later be used to condense these two parameters to one argument. ElementCount isn't quite suitable as it doesn't support negative offsets.
2023-12-06 15:30:48 +00:00
Alex Bradbury
6cf3566850
[NFC][MachineScheduler] Rename NumLoads parameter of shouldClusterMemOps to ClusterSize (#73757)
As the same hook is called for both load and store clustering, NumLoads
is a misleading name. Use ClusterSize instead.
2023-11-29 09:47:03 +00:00
Stanislav Mekhanoshin
87d884b5c8
[AMDGPU] Fix folding of v2i16/v2f16 splat imms (#72709)
We can use inline constants with packed 16-bit operands, but these
should use op_sel. Currently splat of inlinable constants is considered
legal, which is not really true if we fail to fold it with op_sel and
drop the high half. It may be legal as a literal but not as inline
constant, but then usual literal checks must be performed.

This patch makes these splat literals illegal but adds additional logic
to the operand folding to keep current folds. This logic is somewhat
heavy though.

This has fixed constant bus violation in the fdot2 test.
2023-11-28 09:07:26 -08:00
Jay Foad
0d40831765
[AMDGPU] Allow folding to FMAAK with SGPR and immediate operand on GFX10+ (#72266)
Allow foldImmediate to create instructions like:

  v_fmaak_f32 v0, s0, v0, 0x42000000

This instruction has two "scalar values": s0 and 0x42000000. On GFX10+
this is allowed. This fold was originally implemented before the
compiler supported GFX10, when all ASICs were limited to one scalar
value.
2023-11-28 14:36:37 +00:00
Jay Foad
cf1e0c0b07
[AMDGPU] Define new targets gfx1200 and gfx1201 (#73133)
Define target names and ELF numbers for new GFX12 targets gfx1200 and
gfx1201. For now they behave identically to GFX11.
2023-11-23 16:44:05 +00:00
Jay Foad
be2388c0d9
[AMDGPU] Prefer v_madak_f32 over v_madmk_f32 to reduce vgpr pressure (#72506)
As explained in the comment in SIInstrInfo::FoldImmediate, if we have a
choice between v_madak_f32 and v_madmk_f32 we should choose the former
so that the literal that is not folded into the instruction can be
materialized in an sgpr instead of a vgpr.
2023-11-16 12:50:26 +00:00
Christudasan Devadasan
ce7fd498ed
[AMDGPU] RA inserted scalar instructions can be at the BB top (#72140)
We adjust the insertion point at the BB top for spills/copies during RA
to ensure they are placed after the exec restore instructions required
for the divergent control flow execution. This is, however, required
only for the vector operations. The insertions for scalar registers can
still go to the BB top.
2023-11-16 10:30:03 +05:30
Jay Foad
1e8c17e9c7
[AMDGPU] Allow folding to FMAMK with SGPR and immediate operand on GFX10+ (#72258)
Allow foldImmediate to create instructions like:

  v_fmamk_f32 v0, s0, 0x42000000, v0

This instruction has two "scalar values": s0 and 0x42000000. On GFX10+
this is allowed. This fold was originally implemented before the
compiler supported GFX10, when all ASICs were limited to one scalar
value.
2023-11-15 10:58:00 +00:00
Jay Foad
0448a1c0dc
[AMDGPU] Simplify commuted operand handling. NFCI. (#71965)
SIInstrInfo::commuteInstructionImpl should accept indices to commute in
either order. This simplifies SIFoldOperands::tryAddToFoldList where
OtherIdx, CommuteIdx0 and CommuteIdx1 are no longer needed.
2023-11-10 21:51:52 +00:00