Always generate v_cndmask_b32 instead of modifying exec around
v_mov_b32. This is expected to be faster because
modifying exec generally causes pipeline stalls.
This is a large patch includes the MC level support for V_CVT_F16_F32,
V_CVT_F32_F16 and V_LDEXP_F16 in true16 format.
This patch includes the asm/disasm changes to encode/decode the 16bit
vsrc, vdst and src modifieres for vop and dpp format. This patch is a
dependency for many 16 bit instructions while only three instructions
are updated to make it easier to review.
There will be another patch to support these three instructions in the
codeGen level, this patch just replaces these two instructions with its
fake16 format.
Optimize V_SET_INACTIVE by allow it to run in WWM.
Hence WWM sections are not broken up for inactive lane setting.
WWM V_SET_INACTIVE can typically be lower to V_CNDMASK.
Some cases require use of exec manipulation V_MOV as previous code.
GFX9 sees slight instruction count increase in edge cases due to
smaller constant bus.
Additionally avoid introducing exec manipulation and V_MOVs where
a source of V_SET_INACTIVE is the destination.
This is a common pattern as WWM register pre-allocation often
assigns the same register.
The renamable flag is useful during MachineCopyPropagation but renamable
flag will be dropped after lowerCopy in some case.
This patch introduces extra arguments to pass the renamable flag to
copyPhysReg.
And declare it to take an MCRegister.
Also rename related entities and remove a comment for the function that
depending on its purpose is either irrelevant or misleading.
Treat FI operands more like a register. When it gets materialized,
we will typically need to introduce a scavenged register anyway.
Add baseline tests for folding frame indexes into add/or.
An appropriately configured image resource descriptor can trigger
image_sample instructions to store outputs directly to a linked memory
location instead of returning to VGPRs.
This is opaque to the backend as instruction encoding is unchanged;
however, a mechanism is require to allow frontends to communicate that
these instructions do not require destination VGPRs and store to memory.
Flagging these as stores means they will not be optimized away.
This does not work correctly in divergent control flow. Can be replaced
with a later exec mask manipulation optimizer.
This reverts commit a3646ec1bc662e221c2a1d182987257c50958789.
Try to avoid referring to specific operand names, except in the special
case. The special case for hasNamedOperand(Op32, sdst) seems to have
been dead code.
This was breaking the CFG connection between uses of virtual registers
after the trap and their definitions before it. Fixes SWDEV-460384.
Fixes a bug in #85854.
isSafeToSink needs to check if machine cycle has divergent exit branch
but first it needs the MBB that contains cycle exit branch.
Early-tailduplication can delete exit block created by structurize-cfg
so there is still exactly one cycle exit block but the new cycle exit
block can have multiple predecessors.
Simplify search for MBBs that contain cycle exit branch by introducing
helper method getExitingBlocks in GenericCycle.
Fixes#89200
On gfx11 shaders run with PRIV=1, which causes `s_trap 2` to be treated
as a nop, which means it isn't a correct lowering for the trap
intrinsic. As a workaround, this commit instead lowers the trap
intrinsic to instructions that simulate the behavior of s_trap 2.
Fixes: SWDEV-438421
Fixes#82659
There are some functions, such as `findRegisterDefOperandIdx` and `findRegisterDefOperand`, that have too many default parameters. As a result, we have encountered some issues due to the lack of TRI parameters, as shown in issue #82411.
Following @RKSimon 's suggestion, this patch refactors 9 functions, including `{reads, kills, defines, modifies}Register`, `registerDefIsDead`, and `findRegister{UseOperandIdx, UseOperand, DefOperandIdx, DefOperand}`, adjusting the order of the TRI parameter and making it required. In addition, all the places that call these functions have also been updated correctly to ensure no additional impact.
After this, the caller of these functions should explicitly know whether to pass the `TargetRegisterInfo` or just a `nullptr`.
This is part of #70452 that changes the type used for the external
interface of MMO to LocationSize as opposed to uint64_t. This means the
constructors take LocationSize, and convert ~UINT64_C(0) to
LocationSize::beforeOrAfter(). The getSize methods return a
LocationSize.
This allows us to be more precise with unknown sizes, not accidentally
treating them as unsigned values, and in the future should allow us to
add proper scalable vector support but none of that is included in this
patch. It should mostly be an NFC.
Global ISel is still expected to use the underlying LLT as it needs, and
are not expected to see unknown sizes for generic operations. Most of
the changes are hopefully fairly mechanical, adding a lot of getValue()
calls and protecting them with hasValue() where needed.
On converting an instruction to an early-clobber definition in
convertToThreeAddress, we must also update live intervals for the
register to start at the early-clobber index.
The current implementation of `isInlinableLiteral16` assumes, a 16-bit
inlinable
literal is either an `i16` or a `fp16`. This is not always true because
of
`bf16`. However, we can't tell `fp16` and `bf16` apart by just looking
at the
value. This patch splits `isInlinableLiteral16` into three versions,
`i16`,
`fp16`, `bf16` respectively, and call the corresponding version.
This is another part of #70452 which makes getMemOperandsWithOffsetWidth
use a LocationSize for Width, as opposed to the unsigned it currently
uses. The advantages on it's own are not super high if
getMemOperandsWithOffsetWidth usually uses known sizes, but if the
values can come from an MMO it can help be more accurate in case they
are Unknown (and in the future, scalable).
Follow on to #81525 in the series of consolidating bits in TSFlags.
Merge SGPRSpill and VGPRSpill into single Spill bit
Modify isSGPRSpill and isVGPRSpill helper functions to differentiate
VGPR and SGPR spills:
Spill+SALU=SGPR Spill
Spill+VALU=VGPR Spill
The only exception here is SGPR spills to VGPRs which require an
explicit instruction check.
- AMDGPU/NFC: Purge SOPK_ZEXT from TSFlags
- Moved to helper function in SIInstInfo
- AMDGPU/NFC: Purge VOPAsmPrefer32Bit from TSFlags
- This flag did not make sense / remnants of something else I think
This avoids listing all soft waitcnt opcodes in two places
(getNonSoftWaitcntOpcode and isSoftWaitcnt) and avoids the need for
helpers isWaitcnt and isWaitcntVsCnt.
As not all fake instructions have their real counterparts implemented
yet, we specify no AssemblerPredicate for UseFakeTrue16Insts to allow
both fake and real True16 instructions in assembler and disassembler
tests in the -mattr=+real-true16 mode during the transition period.
Source DPP and desitnation VOPDstOperand_t16 operands are still not
supported and will be addressed separately.