Reland PR https://github.com/llvm/llvm-project/pull/162352. Fix by
excluding SI_PC_ADD_REL_OFFSET from instructions that set SCC = DST!=0.
Passes check-libc-amdgcn-amd-amdhsa now.
Distribution of instructions that allowed a redundant S_CMP to be
deleted in check-libc-amdgcn-amd-amdhsa test:
```
S_AND_B32 485
S_AND_B64 47
S_ANDN2_B32 42
S_ANDN2_B64 277492
S_CSELECT_B64 17631
S_LSHL_B32 6
S_OR_B64 11
```
---------
Signed-off-by: John Lu <John.Lu@amd.com>
Co-authored-by: Matt Arsenault <arsenm2@gmail.com>
Extract the core of the instruction rewriting into an implementation
method, and unify the update of live variables / intervals updates in
its caller.
This is intended to help make future changes to three-address conversion
more robust.
S_PACK_XX_B32_B16 requires special lowering for true16 mode when it's
being lowered to VALU in fix-sgpr-copy pass.
Added test cases in fix-sgpr-copies-f16-true16.mir
This removes special case processing in TargetInstrInfo::getRegClass to
fixup register operands which depending on the subtarget support AGPRs,
or require even aligned registers.
This regresses assembler diagnostics, which currently work by hackily
accepting invalid cases and then post-rejecting a validly parsed
instruction.
On the plus side this now emits a comment when disassembling unaligned
registers for targets with the alignment requirement.
With true16 mode v_mov_b16_t16 is added as new foldable copy inst, but
the src operand is in different index.
Use the correct src index for v_mov_b16_t16.
- Remove one-line wrappers around a simple function call when they're
only used once or twice.
- Move very generic helpers into SIInstrInfo
- Delete unused functions
The goal is simply to reduce the noise in SIInsertWaitCnts without
hiding functionality. I focused on moving trivial helpers, or helpers
with very descriptive/verbose names (so it doesn't hide too much logic
away from the pass), and that have some reusability potential.
I'm also trying to make the code style more consistent. It doesn't make
sense to see a function call `TII->isXXX` then suddenly call a random
`isY` method that just wraps around `TII->isY`.
The context of this work is that I'm trying to learn how this pass
works, and while going through the code I noticed some little things
here and there that I thought would be good to fix.
This patch includes:
1. fma_mix inst takes fp16 type as input, but place the operand in
vgpr32. Update selector to insert vgpr32 for true16 mode if necessary.
2. fma_mix inst returns fp16 type as output, but place the vdst in
vgpr32. Create a fma_mix_t16 pesudo inst for isel pattern, and lower it
to mix_lo/hi in the mc lowering pass.
These stop isel from emitting illegal `vgpr32 = COPY vgpr16` and improve
code quality
.. to isReMaterializableImpl. The "Really" naming has always been
awkward, and we're working towards removing the "Trivial" part now,
so go ehead and remove both pieces in a single rename.
Note that this doesn't change any aspect of the current
implementation; we still "mostly" only return instructions which
are trivial (meaning no virtual register uses), but some targets
do lie about that today.
The manual legalizeOperands code only need to consider cases that
require full instruction context to know if the operand is legal.
This does not need to handle basic operand register class constraints.
The S_NOP instruction has an immediate operand which is one less than
the number of cycles to delay for. The maximum value that may be encoded
in this field was increased in GFX8 and again in GFX12.
I have no idea why this was here. MIMG atomics use tied operands
for the input and output, so AV classes should have always worked.
We have poor test coverage for AGPRs with atomics, so add a partial
set. Everything seems to work OK, although it seems image cmpswap
always uses VGPRs unnecessarily.
Find a common subclass instead of directly checking for a subclass
relationship. This fixes folding logic for unaligned register defs
into aligned use contexts. e.g., a vreg_64 def into an av_64_align2
use should be able to find the common subclass vreg_align2. This
avoids regressions in future patches.
Checking the subclass was also redundant on the subregister path;
getMatchingSuperRegClass is sufficient.
Use same rules for G_ZEXTLOAD and G_SEXTLOAD as for G_LOAD.
Flat addrspace(0) and private addrspace(5) G_ZEXTLOAD and G_SEXTLOAD
should be always divergent.
This will make it possible for tablegen to make subtarget
dependent decisions without adding new arguments to every
target.
---------
Co-authored-by: Sergei Barannikov <barannikov88@gmail.com>
In some cases this will require an avoidable re-defining of
a register, but it works out better most of the time. Also allow
folding 64-bit immediates into subregister extracts, unless it would
break an inline constant.
We could be more aggressive here, but this set of conditions seems
to do a reasonable job without introducing too many regressions.
- Add clang built-ins + sema/codegen
- Add IR Intrinsic + verifier
- Add DAG/GlobalISel codegen for the intrinsics
- Add lowering in SIMemoryLegalizer using a MMO flag.
Support tail calls to whole wave functions (trivial) and from whole wave
functions (slightly more involved because we need a new pseudo for the
tail call return, that patches up the EXEC mask).
Move the expansion of whole wave function return pseudos (regular and
tail call returns) to prolog epilog insertion, since that's where we
patch up the EXEC mask.
This should not require aligned registers. Fixes expensive_checks
test failure. I don't see a better way until the new system
to specify the alignment per register is done.
This loop over all the operands in the MachineInstr will eventually
go past the end of the MCInstrDesc's explicit operands. We don't
need the instr desc to compute the constant bus usage, just the
register and whether it's implicit or not. The check here is slightly
conservative. e.g. a random vcc implicit use appended to an instruction
will falsely report a constant bus use.
Dead VGPR->SGPR copies were converted to IMPLICIT_DEF assignments that
were unused. Prevent these from being created and update the numerous
affected tests.
This was not checking the alignment requirement for 64-bit
operands which accept inline immediates. Not all custom operand
types were handled in the switch, so round out with explicit
handling of all enum values, and change the default to use
the default checks for unhandled cases.
Fixes#155095
Generalize the code over the properties of the mov instruction,
rather than maintaining parallel logic to figure out the type
of mov to use. I've maintained the behavior with 16-bit physical
SGPRs, though I think the behavior here is broken and corrupting
any value that happens to be live in the high bits. It just happens
there's no way to separately write to those with a real instruction
but I don't think we should be trying to make assumptions around
that property.
This is NFC-ish. It now does a better job with imm pseudos which
practically won't reach here. This also will make it easier
to support more folds in a future patch.
I added a couple of new tests with 16-bit extract of 64-bit sources.