863 Commits

Author SHA1 Message Date
Aditi Medhane
5a8d2dd1f9
[AMDGPU] Handle subregisters properly in generic operand legalizer (#108496)
Fix for the issue found during COPY introduction during legalization of
PHI operands for sgpr to vgpr copy when subreg is involved.
2024-09-18 13:14:49 +05:30
Jay Foad
e55d6f5ea2
[AMDGPU] Simplify and improve codegen for llvm.amdgcn.set.inactive (#107889)
Always generate v_cndmask_b32 instead of modifying exec around
v_mov_b32. This is expected to be faster because
modifying exec generally causes pipeline stalls.
2024-09-11 17:16:06 +01:00
Brox Chen
35e27c0ee5
[AMDGPU][True16][MC] 16bit vsrc and vdst support in MC (#104510)
This is a large patch includes the MC level support for V_CVT_F16_F32,
V_CVT_F32_F16 and V_LDEXP_F16 in true16 format.

This patch includes the asm/disasm changes to encode/decode the 16bit
vsrc, vdst and src modifieres for vop and dpp format. This patch is a
dependency for many 16 bit instructions while only three instructions
are updated to make it easier to review.

There will be another patch to support these three instructions in the
codeGen level, this patch just replaces these two instructions with its
fake16 format.
2024-09-11 10:48:11 -04:00
Jay Foad
7a30b9c0f0
[AMDGPU] Make more use of getWaveMaskRegClass. NFC. (#108186) 2024-09-11 14:55:53 +01:00
Matt Arsenault
a9daad8280
AMDGPU: Update live intervals in convertToThreeAddress (#104610)
Fixes #98741
2024-09-06 18:18:27 +04:00
Stanislav Mekhanoshin
bd840a4004
[AMDGPU] Add target intrinsic for s_prefetch_data (#107133) 2024-09-05 15:14:31 -07:00
Carl Ritson
16cda01d22
[AMDGPU] V_SET_INACTIVE optimizations (#98864)
Optimize V_SET_INACTIVE by allow it to run in WWM.
Hence WWM sections are not broken up for inactive lane setting.
WWM V_SET_INACTIVE can typically be lower to V_CNDMASK.
Some cases require use of exec manipulation V_MOV as previous code.
GFX9 sees slight instruction count increase in edge cases due to
smaller constant bus.

Additionally avoid introducing exec manipulation and V_MOVs where
a source of V_SET_INACTIVE is the destination.
This is a common pattern as WWM register pre-allocation often
assigns the same register.
2024-09-05 14:39:28 +09:00
Piyou Chen
b01c006f73
[TII][RISCV] Add renamable bit to copyPhysReg (#91179)
The renamable flag is useful during MachineCopyPropagation but renamable
flag will be dropped after lowerCopy in some case.

This patch introduces extra arguments to pass the renamable flag to
copyPhysReg.
2024-08-27 10:08:43 +08:00
Juan Manuel Martinez Caamaño
cbf34a5f77
[AMDGPU] Remove dead pass: AMDGPUMachineCFGStructurizer (#105645) 2024-08-23 14:06:17 +02:00
Carl Ritson
fc6300a5f7
[AMDGPU] Disable inline constants for pseudo scalar transcendentals (#104395)
Prevent operand folding from inlining constants into pseudo scalar
transcendental f16 instructions.
However still allow literal constants.
2024-08-17 16:52:38 +09:00
Ivan Kosarev
f0fe6c66cb
[AMDGPU][NFC] Rename isHi() to isHi16Reg() for clarity. (#103888)
And declare it to take an MCRegister.

Also rename related entities and remove a comment for the function that
depending on its purpose is either irrelevant or misleading.
2024-08-14 17:04:15 +01:00
Brox Chen
ae059a1f9f
[AMDGPU][True16][CodeGen] support v_mov_b16 and v_swap_b16 in true16 format (#102198)
support v_swap_b16 in true16 format.
update tableGen pattern and folding for v_mov_b16.

---------

Co-authored-by: guochen2 <guochen2@amd.com>
2024-08-08 16:52:59 -04:00
Matt Arsenault
ca409892c5
AMDGPU: Permit more frame index operands in verifier (#101691)
Treat FI operands more like a register. When it gets materialized,
we will typically need to introduce a scavenged register anyway.
Add baseline tests for folding frame indexes into add/or.
2024-08-05 21:34:58 +04:00
Carl Ritson
62aa596ba1
[AMDGPU] Add no return image_sample intrinsics and instructions (#97542)
An appropriately configured image resource descriptor can trigger
image_sample instructions to store outputs directly to a linked memory
location instead of returning to VGPRs.

This is opaque to the backend as instruction encoding is unchanged;
however, a mechanism is require to allow frontends to communicate that
these instructions do not require destination VGPRs and store to memory.
Flagging these as stores means they will not be optimized away.
2024-07-20 17:26:58 +09:00
Jay Foad
0ce3ea1bff
[AMDGPU] Simplify selection of llvm.amdgcn.inverse.ballot. NFCI. (#99345) 2024-07-18 07:45:13 +01:00
Jay Foad
63fae3ed65
[AMDGPU] clang-tidy: no else after return etc. NFC. (#99298) 2024-07-17 21:11:00 +01:00
Jay Foad
f10a78b7e4 [AMDGPU] clang-tidy: use std::make_unique. NFC. 2024-07-17 07:58:09 +01:00
Carl Ritson
befd44bcdc
[AMDGPU] Update hasUnwantedEffectsWhenEXECEmpty (#97982)
Add barriers and s_wait_event to hasUnwantedEffectsWhenEXECEmpty.
Add a comment documenting the current expected use of the function.
2024-07-17 11:30:44 +09:00
Jay Foad
74b87b02d2 [AMDGPU] Fix and add namespace closing comments. NFC. 2024-07-16 16:56:31 +01:00
Jay Foad
ff81bbede4 [AMDGPU] Concatenate nested namespaces. NFC. 2024-07-16 15:36:49 +01:00
Matt Arsenault
d8b63b680d
AMDGPU: Don't fold clamp/omod modifiers without nofpexcept (#95950) 2024-06-18 23:55:49 +02:00
Jay Foad
6b4760acc7
[AMDGPU] Make use of composeSubRegIndices. NFCI. (#95548)
Simplify SIInstrInfo::buildExtractSubReg by building one COPY with a
composed subreg index instead of two COPYs.
2024-06-14 15:07:04 +01:00
Carl Ritson
db096adba0
[AMDGPU] Remove SIWholeQuadMode pseudo wavemode optimization (#94133)
This does not work correctly in divergent control flow. Can be replaced
with a later exec mask manipulation optimizer.

This reverts commit a3646ec1bc662e221c2a1d182987257c50958789.
2024-06-12 15:51:40 +09:00
Matt Arsenault
b263033c2b
AMDGPU: Remove arbitrary SCC liveness scan threshold (#94097) 2024-06-01 14:19:38 +02:00
Matt Arsenault
5f243b3fff
AMDGPU: Generalize instruction shrinking code (#93810)
Try to avoid referring to specific operand names, except in the special
case. The special case for hasNamedOperand(Op32, sdst) seems to have
been dead code.
2024-05-30 19:25:04 +02:00
Emma Pilkington
9e0be65f24
[AMDGPU] Fix broken MIR generated by gfx11 simulated trap lowering (#91652)
This was breaking the CFG connection between uses of virtual registers
after the trap and their definitions before it. Fixes SWDEV-460384.

Fixes a bug in #85854.
2024-05-22 10:55:19 -04:00
Petar Avramovic
f5e49279c0
AMDGPU: fix isSafeToSink expecting exactly one predecessor (#89224)
isSafeToSink needs to check if machine cycle has divergent exit branch
but first it needs the MBB that contains cycle exit branch.
Early-tailduplication can delete exit block created by structurize-cfg
so there is still exactly one cycle exit block but the new cycle exit
block can have multiple predecessors.
Simplify search for MBBs that contain cycle exit branch by introducing
helper method getExitingBlocks in GenericCycle.

Fixes #89200
2024-05-10 13:02:05 +02:00
Stanislav Mekhanoshin
57216f7bd6
[AMDGPU] Support byte_sel modifier for v_cvt_f32_fp8 and v_cvt_f32_bf8 (#90887) 2024-05-02 12:03:51 -07:00
David Stuttard
2914a11e3f
[AMDGPU] Fix hard clausing for image instructions on gfx12 (#90221)
Also updated hard-clauses.mir to have separate versions for gfx11 and
gfx12 since
the MIR instructions are different for each of them.
2024-04-29 11:42:36 +01:00
Emma Pilkington
a04714701f
[AMDGPU] Add a trap lowering workaround for gfx11 (#85854)
On gfx11 shaders run with PRIV=1, which causes `s_trap 2` to be treated
as a nop, which means it isn't a correct lowering for the trap
intrinsic. As a workaround, this commit instead lowers the trap
intrinsic to instructions that simulate the behavior of s_trap 2.

Fixes: SWDEV-438421
2024-04-24 09:43:54 -04:00
Xu Zhang
f6d431f208
[CodeGen] Make the parameter TRI required in some functions. (#85968)
Fixes #82659

There are some functions, such as `findRegisterDefOperandIdx` and  `findRegisterDefOperand`, that have too many default parameters. As a result, we have encountered some issues due to the lack of TRI  parameters, as shown in issue #82411.

Following @RKSimon 's suggestion, this patch refactors 9 functions, including `{reads, kills, defines, modifies}Register`,  `registerDefIsDead`, and `findRegister{UseOperandIdx, UseOperand, DefOperandIdx, DefOperand}`, adjusting the order of the TRI parameter and making it required. In addition, all the places that call these functions have also been updated correctly to ensure no additional impact.

After this, the caller of these functions should explicitly know whether to pass the `TargetRegisterInfo` or just a `nullptr`.
2024-04-24 14:24:14 +01:00
David Green
601e102bdb
[CodeGen] Use LocationSize for MMO getSize (#84751)
This is part of #70452 that changes the type used for the external
interface of MMO to LocationSize as opposed to uint64_t. This means the
constructors take LocationSize, and convert ~UINT64_C(0) to
LocationSize::beforeOrAfter(). The getSize methods return a
LocationSize.

This allows us to be more precise with unknown sizes, not accidentally
treating them as unsigned values, and in the future should allow us to
add proper scalable vector support but none of that is included in this
patch. It should mostly be an NFC.

Global ISel is still expected to use the underlying LLT as it needs, and
are not expected to see unknown sizes for generic operations. Most of
the changes are hopefully fairly mechanical, adding a lot of getValue()
calls and protecting them with hasValue() where needed.
2024-03-17 18:15:56 +00:00
Carl Ritson
d9e6aa7048
[AMDGPU] Update LiveInterval def index for early-clobber (#79285)
On converting an instruction to an early-clobber definition in
convertToThreeAddress, we must also update live intervals for the
register to start at the early-clobber index.
2024-03-11 14:54:11 +09:00
Shilei Tian
e963d0740e
[AMDGPU] Replace isInlinableLiteral16 with specific version (#84402)
The current implementation of `isInlinableLiteral16` assumes, a 16-bit
inlinable
literal is either an `i16` or a `fp16`. This is not always true because
of
`bf16`. However, we can't tell `fp16` and `bf16` apart by just looking
at the
value. This patch splits `isInlinableLiteral16` into three versions,
`i16`,
`fp16`, `bf16` respectively, and call the corresponding version.
2024-03-08 14:49:52 -05:00
David Green
44be5a7fdc
[Codegen] Make Width in getMemOperandsWithOffsetWidth a LocationSize. (#83875)
This is another part of #70452 which makes getMemOperandsWithOffsetWidth
use a LocationSize for Width, as opposed to the unsigned it currently
uses. The advantages on it's own are not super high if
getMemOperandsWithOffsetWidth usually uses known sizes, but if the
values can come from an MMO it can help be more accurate in case they
are Unknown (and in the future, scalable).
2024-03-06 17:40:13 +00:00
Shilei Tian
e9c1dbb408 Revert "[AMDGPU] Replace isInlinableLiteral16 with specific version (#81345)"
This reverts commit 530f0e64ec11327879c44f2fd55c7c28efdbaa2d because it breaks
downstream.
2024-03-06 08:42:54 -05:00
Pierre van Houtryve
52d5b8e02d
[AMDGPU] Don't form sext/abs/neg fp8 cvt (#83843)
gfx940 does not allow abs/sext/neg on v_cvt_fp8/bf8 & pk variants.

Fixes SWDEV-447468
2024-03-06 10:38:20 +01:00
Shilei Tian
530f0e64ec
[AMDGPU] Replace isInlinableLiteral16 with specific version (#81345) 2024-03-04 08:40:42 -05:00
Shilei Tian
46734aa1e5
[AMDGPU] Use bf16 instead of i16 for bfloat (#80908)
Currently we generally use `i16` to represent `bf16` in those tablegen
files. This patch is trying to use `bf16` directly.

Fix #79369.
2024-02-16 15:58:30 -05:00
Corbin Robeck
2d9f350449
[AMDGPU] Consolidate SGPRSpill and VGPRSpill into single Spill bit (#81901)
Follow on to #81525 in the series of consolidating bits in TSFlags.

Merge SGPRSpill and VGPRSpill into single Spill bit

Modify isSGPRSpill and isVGPRSpill helper functions to differentiate
VGPR and SGPR spills:

Spill+SALU=SGPR Spill
Spill+VALU=VGPR Spill

The only exception here is SGPR spills to VGPRs which require an
explicit instruction check.
2024-02-16 13:32:59 -05:00
Konstantin Zhuravlyov
fcef407aa2
AMDGPU/NFC: Remove some bits from TSFlags (#81525)
- AMDGPU/NFC: Purge SOPK_ZEXT from TSFlags
  - Moved to helper function in SIInstInfo
- AMDGPU/NFC: Purge VOPAsmPrefer32Bit from TSFlags
  - This flag did not make sense / remnants of something else I think
2024-02-12 16:43:48 -05:00
Philip Reames
3ff7caea33
[TTI] Use Register in isLoadFromStackSlot and isStoreToStackSlot [nfc] (#80339) 2024-02-01 17:52:35 -08:00
Shengchen Kan
550f0eb2ce [NFC] Rename TargetInstrInfo::FoldImmediate to TargetInstrInfo::foldImmediate and simplify implementation for X86 2024-01-26 20:50:58 +08:00
Jay Foad
e89a7c41ba [AMDGPU] Update comment on SIInstrInfo::isLegalFLATOffset for GFX12 2024-01-19 15:53:06 +00:00
Jay Foad
80ccc72ec7
[AMDGPU] Remove GFX12 encoding hack (#78702)
This is no longer needed now that we have implemented GFX12 encoding for
all instructions.
2024-01-19 12:19:29 +00:00
Jay Foad
9ca36932b5
[AMDGPU] Work around s_getpc_b64 zero extending on GFX12 (#78186) 2024-01-18 10:23:27 +00:00
Ivan Kosarev
2a869ced61
[AMDGPU][True16] Support V_FLOOR_F16. (#78446) 2024-01-18 08:43:47 +00:00
Mirko Brkušanin
1d286ad59b
[AMDGPU] Add mark last scratch load pass (#75512) 2024-01-18 09:36:44 +01:00
Jay Foad
9d8e53818d
[AMDGPU] Refactor getNonSoftWaitcntOpcode and its callers (#77933)
This avoids listing all soft waitcnt opcodes in two places
(getNonSoftWaitcntOpcode and isSoftWaitcnt) and avoids the need for
helpers isWaitcnt and isWaitcntVsCnt.
2024-01-12 17:12:09 +00:00
Ivan Kosarev
084f1c2ee0
[AMDGPU][True16] Support V_CEIL_F16. (#73108)
As not all fake instructions have their real counterparts implemented
yet, we specify no AssemblerPredicate for UseFakeTrue16Insts to allow
both fake and real True16 instructions in assembler and disassembler
tests in the -mattr=+real-true16 mode during the transition period.

Source DPP and desitnation VOPDstOperand_t16 operands are still not
supported and will be addressed separately.
2024-01-10 08:46:19 +00:00