This makes them more consistent with the checks performed by regular loads. We can't simply add IsNonExtLoad to the existing atomic_load_8/16/32/64 as that would affect out of tree targets.
isAnyExtLoad/isZExtLoad/isSignExtLoad are able to emit predicate checks
from tablegen now so we should use them.
The next step would be to add isNonExtLoad versions and migrate all
remaining uses of atomic_load_8/16/32/64 to that.
According to the shader manual there are not buffer load lds
instructions of gfx11.
The tests for the regular `buffer_load ... lds` instructions for gfx11
are already present in AMDGPU/gfx11_asm_mubuf.s, where the compiler
fails to encode the instructions for this target.
For GFX10+, currently null cannot be used as dst reg in instructions
that expect the dst reg to be 128b or larger (e.g., s_load_dwordx4).
This patch fixes this problem while ensuring null cannot be used as S#,
T#, or V#.
These properties are only valid on ComplexPatterns. Having them as flags
is more convenient because one can now use "let = ... in" syntax to set
these flags on several patterns at a time. This is also less error-prone
as it makes it impossible to specify these properties on records derived
from SDPatternOperator.
Pull Request: https://github.com/llvm/llvm-project/pull/119599
Atomic loads are handled differently from the DAG, and have separate opcodes
and explicit control over the extensions, like ordinary loads. Add
new patterns for these.
There's room for cleanup and improvement. d16 cases aren't handled.
Fixes#111645
This patch copies the flag isConvergent from pseudo instructions to the
corresponding real instructions, so that isConvergent flag is also
defined for real instructions.
Flags are not required by the compiler, but for consistency it would be
nice to have them.
Co-authored-by: Acim Maravic <Acim.Maravic@amd.com>
Define subtarget features for atomic fmin/fmax support.
The flat/global support is a real messe. We had float/double support at
the beginning in gfx6 and gfx7. gfx8 removed these. gfx10 reintroduced them.
gfx11 removed the f64 versions again.
gfx9 partially reintroduced them, in gfx90a and gfx940 but only for f64.
SubtargetPredicate should be the primary "does this instruction exist"
predicate, with OtherPredicates used for other side pieces of information.
Changes like 856d1c4410 were backwards. The problematic usage is how
GFX12 is using HasRestrictedOffset. The multiclasses for buffers
should probably be split up instead of hiding OtherPredicates inside
the buffer atomic multiclasses. The two cases are mutually exclusive
and really need a negated predicate for the not-gfx12 case.
It's pretty terrible we have to manage this in the first place.
TableGen should be able to figure out the required predicates
from any instructions that appear in the pattern output.
The global/flat/buffer atomic fmin/fmax situation is a mess. These
instructions have been renamed 3 times. We currently have
separate pseudos defined for the same opcodes with the different names
(e.g. GLOBAL_ATOMIC_MIN_F64 from gfx90a and GLOBAL_ATOMIC_FMIN_X2 from gfx10).
Use the _FMIN versions as the canonical name for the f32 versions. Use the
_MIN_F64 style as the canonical name for the f64 case. This is because
gfx90a has the most sensible names, but does not have the f32 versions.t sho
Wire through the pseudo to use for the instruction properties vs. the assembly
name like in other cases. This will simplify handling of direct atomicrmw selection.
This will simplify directly selecting these from atomicrmw.
These are redundant with the unsuffixed versions, and have a name
collision with surprising behavior when the base intrinsic is used with
v2bf16.
The global and flat variants should be removed too, but those are complicated
due to using v2i16 in place of the natural v2bf16. Those cases can soon be
completely deleted in favor of atomicrmw.
The GlobalISel codegen change is broken and substitutes handling as bf16
for handling as f16, but it's a bug that this passed the IRTranslator in the first
place.
If you would hit the unexpected case in these !if trees, you'd get an
error message like "error: Not a known RegisterClass! def VReg_1..."
This can happen when changing code quite indirectly related to these
class definitions. We can use !cond here, which has a builtin facility
to throw an error if no case in the !cond statement is hit.
NFC.
An 8 x i16 raw load was incorrectly using a 64-bit memory type, which
would assert in the MachineMemOperand constructor.
This is preparation for a cleanup which will make the buffer intrinsics
work for all legal types.
Some tools generate such instructions with the FORMAT field set to 0,
which corresponds to buf_fmt_invalid, but that should not prevent them
from being recognised on decoding.
Currently, the tablegen files that generate the instruction definitions
in lib/Target/AMDGPU/AMDGPUGenInstrInfo.inc often only include implicit
operands for the architecture-independent pseudo instructions, but not
for the corresponding real instructions. The missing implicit operands
(most prominently: the EXEC mask) do not affect code generation, since
that operates on pseudo instructions, but they are problematic when
working with real instructions, e.g., as a decoding result from the MC
layer.
This patch copies the implicit Defs and Uses from pseudo instructions to
the corresponding real instructions, so that implicit operands are also
defined for real instructions.
Addresses issue #89830.
AMDGPUMnemonicAlias is a MnemonicAlias that inherits from
GCNPredicateControl, so that we can set predicates on the alias the same
way as Instructions.
Use AssemblerPredicate instead of Requires on aliases
NFC.
buffer_load instructions that use TFE also need to zero initialize
return values similar to how the image instructions currently work. Add
support for this with standard zero init of all results + zero init of
just TFE flag when enable-prt-strict-null subtarget feature is disabled.
Abstracting out a new base class VBUFFER_Real_gfx12 just highlights that
the only difference between the MUBUF and MTBUF forms is in the handling
of the "format" field.
Define BUF Real instructions with this general pattern for all
architectures (not just GFX11):
multiclass Something_Real_gfx11<...> {
defvar ps = !cast<Pseudo>(NAME);
def _gfx11 : ...;
}
This allows removing a huge amount of repetition in the definitions of
individual Real instructions, where they would have to !cast their own
name to a Pseudo and pass that in as a class argument.