Duplicate the MC tests and set the proper flags in the true16 and fake16
test files for VOP3 instructions. This prepares for the upcoming VOP3
true16 changes.
Some flat instructions have an saddr operand. When 'null' is provided as
saddr, it may have the same encoding as another instruction. For
example, the instructions 'global_atomic_add v1, v2, null' and
'global_atomic_add v[1:2], v2, off' have the same encoding. This patch
disallows having null as saddr.
Support true16 and fake16 formats for more VOP1 instructions in MC
This patch updates the true16 and fake16 vop_profile for the following
instructions and updates the asm/disasm tests:
V_CVT_F16_U16
V_CVT_F16_I16
V_CVT_U16_F16
V_CVT_I16_F16
V_CVT_NORM_U16_F16
V_CVT_NORM_I16_F16
V_FREXP_EXP_I16_F16
Since this patch introduces fake16 instructions for V_CVT_F16_U16, it
also addresses an issue in the fix-sgprs-copy-f16 test that was raised here:
https://github.com/llvm/llvm-project/pull/104510#discussion_r1742499668
This adds disassembler support for the new `try_table` instruction.
This adds tests for `throw` and `throw_ref` as well.
Currently, tag expressions are not supported for the `throw` or
`try_table` instructions when they are parsed from the disassembler. Not
sure whether there is a way to support them. (This is not new to the new
EH proposal; tag expressions have not been supported for the legacy EH
either.)
This is a large patch that includes the MC-level support for
V_CVT_F16_F32, V_CVT_F32_F16 and V_LDEXP_F16 in true16 format.
This patch includes the asm/disasm changes to encode/decode the 16-bit
vsrc, vdst and src modifiers for the vop and dpp formats. Many 16-bit
instructions depend on this patch, but only three instructions are
updated here to make it easier to review.
There will be another patch to support these three instructions at the
codegen level; this patch just replaces these two instructions with
their fake16 format.
In the legacy space, if both the 66 prefix and REX.W=1 are present, the
REX.W=1 takes precedence and makes OSIZE=64b. EVEX map 4 inherits this
convention, with EVEX.pp=01 and EVEX.W playing the roles of the 66
prefix and REX.W. So if EVEX.pp=00, the OSIZE can only be 64b or 32b,
depending on whether EVEX.W=1 or not. But if EVEX.pp=01, then OSIZE is
either 64b or 16b depending on whether EVEX.W=1 or not.
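The rule above can be written as a small lookup. This is only an illustrative sketch: the function name `evex_map4_osize` is hypothetical and not part of LLVM, and it covers only the two EVEX.pp values discussed here (00 and 01).

```python
def evex_map4_osize(pp: int, w: int) -> int:
    """Return the operand size in bits for an EVEX map 4 instruction.

    Hypothetical helper for illustration only. `pp` is the 2-bit EVEX.pp
    field and `w` is the EVEX.W bit. Only pp=0b00 (no 66 prefix) and
    pp=0b01 (66 prefix) are handled, matching the cases described above.
    """
    if pp not in (0b00, 0b01):
        raise ValueError("only EVEX.pp=00 and EVEX.pp=01 are modelled here")
    if w not in (0, 1):
        raise ValueError("EVEX.W must be 0 or 1")
    if w == 1:
        # EVEX.W=1 takes precedence, just as REX.W=1 does over the 66 prefix.
        return 64
    # With EVEX.W=0, EVEX.pp=01 plays the role of the 66 prefix.
    return 16 if pp == 0b01 else 32
```

For example, `evex_map4_osize(0b01, 1)` gives 64 because EVEX.W wins, while `evex_map4_osize(0b01, 0)` gives 16.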
This adds a check that all ExtensionWithMArch which are marked as
implied features for an architecture are also present in the list of
default features. It doesn't make sense to have something mandatory but
not on by default.
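As a rough sketch of what such a check amounts to (this is not the actual LLVM implementation, and the function name and the feature names in the usage note are made up for illustration), the rule is a subset test between the implied features and the default extensions:

```python
def implied_but_not_default(implied, default_exts):
    """Return implied features that are missing from the default list.

    Hypothetical helper: `implied` and `default_exts` are plain iterables
    of feature-name strings. An empty result means the rule holds.
    """
    return sorted(set(implied) - set(default_exts))
```

With made-up feature names, `implied_but_not_default(["a", "b"], ["a"])` reports `["b"]`, which is the kind of case the new check is meant to reject.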
A number of existing cases violated this rule, so this patch also makes
some changes to which features are mandatory (indicated by the Implies
field).
Violating the rule resulted in a bug: if a feature was marked as
`Implies` but was not added to `DefaultExt`, then for
`-march=base_arch+nofeat` the Driver would consider `feat` to have never
been added and therefore would do nothing to disable it (no
`-target-feature -feat` would be added, but the backend would enable the
feature by default because of `Implies`). See
clang/test/Driver/aarch64-negative-modifiers-for-default-features.c.
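The interaction can be illustrated with a deliberately simplified toy model. Everything in it is an assumption made for illustration: the function names are hypothetical, the "driver" only emits a flag when a modifier changes its own view of the enabled set, and the "backend" starts from the implied features. The real Driver and backend are considerably more involved.

```python
def toy_driver_flags(default_exts, modifiers):
    """Toy driver: emit +feat/-feat only when a modifier changes its view.

    Modifiers are strings such as "feat" or "nofeat". This naive parsing
    would misread a feature whose name itself starts with "no".
    """
    enabled = set(default_exts)
    flags = []
    for mod in modifiers:
        if mod.startswith("no"):
            feat = mod[2:]
            if feat in enabled:
                enabled.discard(feat)
                flags.append("-" + feat)
            # Otherwise the driver believes feat was never added: no flag.
        elif mod not in enabled:
            enabled.add(mod)
            flags.append("+" + mod)
    return flags


def toy_backend_features(implies, flags):
    """Toy backend: start from the implied features, then apply the flags."""
    enabled = set(implies)
    for flag in flags:
        if flag.startswith("+"):
            enabled.add(flag[1:])
        else:
            enabled.discard(flag[1:])
    return enabled
```

In this model, with `feat` implied but not a default, `+nofeat` produces no flags and the backend still enables `feat`. Once `feat` is also a default, the driver emits `-feat` and the backend disables it.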
Note that the processor definitions do not respect the architecture
DefaultExts. These apply only when specifying `-march=<some architecture
version>`. So when a feature is moved from `Implies` to `DefaultExts` on
the Architecture definition, the feature needs to be added to all
processor definitions (that are based on that architecture) in order to
preserve the existing behaviour. I have checked the TRMs for many cases
(see specific commit messages) but in other cases I have just kept the
current behaviour and not tried to fix it.
Support AND/OR/XOR true16 and LDEXP true/fake16 format.
These instructions were previously implemented with the fake16 profile;
this fixes the implementation.
Added an RA hint so that a 16-bit register used in a 32-bit instruction
is, where possible, used directly without an extra 16-bit move.
---------
Co-authored-by: guochen2 <guochen2@amd.com>
Eliminates the need to replicate the same instructions in MC and
MC/Disassembler tests and synchronize changes in them. Also highlights
differences between disassembled, reassembled and original instructions.
A processor definition shall not include a default feature that may be
switched off by a different wave size. This makes it unnecessary to
write -mattr=-wavefrontsize32,+wavefrontsize64 in tests.
This PR adds a number of thus-far missing extended mnemonics to the
assembler and disassembler for SystemZ.
The following mnemonics have been added and are supported for the
assembler and disassembler:
- `NOP(R)?`
- `LFI`
- `RISBG(N)?Z`
The following mnemonics have been added and are supported for the
assembler only:
- `JC(TH)?`
- `LLG(F|H)I`
- `NOT(G)?R`
But since SVE and friends have been added to the default extensions
list, and every CPU was opted into those extensions by default, we
couldn't correctly announce its architectural version to the backend.
Additionally, we move FEAT_MEC from LLVM's "required" list for v9.0 to
the optional list for v9.2, as the spec considers it optional, and M4
does not implement it. Similarly, this fixes several bugs w.r.t.
FEAT_RME.
As a drive-by, I noticed that saphira did not have an
AArch64CPUTestParams entry, and thus added one.
Add MC layer support for new instructions:
GLOBAL_LOAD_BLOCK
GLOBAL_STORE_BLOCK
SCRATCH_LOAD_BLOCK
SCRATCH_STORE_BLOCK
Co-authored-by: Piotr Sobczak <piotr.sobczak@amd.com>
Some tools generate such instructions with the FORMAT field set to 0,
which corresponds to buf_fmt_invalid, but that should not prevent them
from being recognised on decoding.
This adds support for the `prefetcha` instruction, which prefetches
from alternate address spaces.
Reviewers: jrtc27, brad0, rorth, s-barannikov
Reviewed By: s-barannikov
Pull Request: https://github.com/llvm/llvm-project/pull/94250
This adds named tag constants (such as `#one_write` and `#one_read`)
for the prefetch instruction.
Reviewers: jrtc27, rorth, brad0, s-barannikov
Reviewed By: s-barannikov
Pull Request: https://github.com/llvm/llvm-project/pull/94249
Currently, if an image_atomic instruction has the 'tfe' operand, the
llvm-mc assembler generally rejects it. The only exception is when
dmask is 0x1 and the instruction is not image_atomic_cmpswap (e.g.,
image_atomic_add v[5:6], v252, s[8:15] dmask:0x1 tfe). This patch fixes
this problem and allows tfe to be specified in image_atomic
instructions.
---------
Co-authored-by: Jun Wang <jun.wang7@amd.com>