This is another part of #70452 which makes getMemOperandsWithOffsetWidth
use a LocationSize for Width, as opposed to the unsigned it currently
uses. The advantages on it's own are not super high if
getMemOperandsWithOffsetWidth usually uses known sizes, but if the
values can come from an MMO it can help be more accurate in case they
are Unknown (and in the future, scalable).
To prevent tests from breaking, another fix had to be made: Now, we
check if the instruction after a waiting instruction is a call, and if
so, we insert the wait.
Broadcast of a single float should not be any slower than
loading 32B using vmovaps. So remat it can help reduce
register spill when there is big register pressure.
Make Candidate's front() and back() functions return references to
MachineInstr and introduce begin() and end() returning iterators, the
same way it is usually done in other container-like classes.
This makes possible to iterate over the instructions contained in
Candidate the same way one can iterate over MachineBasicBlock (note that
begin() and end() return bundled iterators, just like MachineBasicBlock
does, but no instr_begin() and instr_end() are defined yet).
This allows us to check the entire constant address calculation, and ensure we're not performing any runtime address math into the constant pool (noticed in an upcoming patch).
The enc/dec of promoted AMX-TILE instructions have been supported in
https://github.com/llvm/llvm-project/pull/76210.
This patch support lowering for promoted AMX-TILE instructions and
integrate test to existing tests.
Add the MI-layer routine X86::getFirstAddrOperandIdx(), which returns
the index of the first address operand of a MachineInstr (or -1 if there
is none).
X86II::getMemoryOperandNo(), the existing MC-layer routine used to
obtain the index of the first address operand in a 5-operand X86 memory
reference, is incomplete: it does not handle pseudo-instructions like
TCRETURNmi, resulting in security holes in the mitigation passes that
use it (e.g., x86-slh and x86-lvi-load).
X86::getFirstAddrOperandIdx() handles both pseudo and real instructions
and is thus more suitable for most use cases than
X86II::getMemoryOperandNo(), especially in mitigation passes like
x86-slh and x86-lvi-load. For this reason, this patch replaces all uses
of X86II::getMemoryOperandNo() with X86::getFirstAddrOperandIdx() in the
aforementioned mitigation passes.
This removes some assumptions that the small code model will only
reference "near" globals.
There are still some missing optimizations and wrong code sequences, but
I'd like to address those separately. This will require auditing any
checks of the code model in the X86 backend.
It's permitted to have extra implicit-def operands of the same main
register
after the main register def. If there are implicit operands, use the
standard
legality checks which verify the operand contents.
Depends #73933
No need to generate/spill/restore to cpu stack
Cleanup work to allow us to properly use isFPImmLegal and fix some regressions encountered while looking at #74304
In memory fold table, we have
```
{X86::KMOVDkk, X86::KMOVDkm, 0},
{X86::KMOVDkk_EVEX, X86::KMOVDkm_EVEX, 0}
```
where `KMOVDkm_EVEX` can use EGPR as base and index registers, while
`KMOVDkm` can't. Hence, though `KMOVkk` does not have any GPR operands,
we prefer to use `KMOVDkk_EVEX` to help register allocation.
It will be compressed to `KMOVDkk` in EVEX2VEX pass if memory folding
does not happen.
KMOV is essential for copy between k-registers and GPRs.
R16-R31 was added into GPRs in #70958, so we extend KMOV for these new
registers first.
This patch
1. Promotes KMOV instructions from VEX space to EVEX space
2. Emits prefix {evex} for the EVEX variants
3. Prefers EVEX variant than VEX variant in ISEL and optimizations for
better RA
EVEX variants will be compressed to VEX variants by existing EVEX2VEX
pass if no EGPR is used.
RFC:
https://discourse.llvm.org/t/rfc-design-for-apx-feature-egpr-and-ndd-support/73031/4
TAG: llvm-test-suite && CPU2017 can be built with feature egpr
successfully.
This enables -regalloc=greedy to memfold spillable inline asm
MachineOperands.
Because no instruction selection framework marks MachineOperands as
spillable, no language frontend can observe functional changes from this
patch. That will change once instruction selection frameworks are
updated.
Link: https://github.com/llvm/llvm-project/issues/20571
Don't blindly copy the original flags from the pre-reassociated
instrutions.
This copied the integer poison flags which are not safe to preserve
after reassociation.
For the FP flags, I think we should only keep the intersection of
the flags. Override setSpecialOperandAttr to do this.
Fixes#72777.
Both the Arm and X86 implementations of areLoadsFromSameBasePtr use a
switch over the machine opcode, and repeat the same logic for both
SDNode operands. We can avoid the duplicated logic (especially lengthy
in the X86 case) by just using a lambda. This could obviously be a
candidate for moving out to a separate helper function if there were
other users, but I've made the minimal change in this patch.
1. Map R16-R31 to DWARF registers 130-145.
2. Make R16-R31 caller-saved registers.
3. Make R16-31 allocatable only when feature EGPR is supported
4. Make R16-31 availabe for instructions in legacy maps 0/1 and EVEX
space, except XSAVE*/XRSTOR
RFC:
https://discourse.llvm.org/t/rfc-design-for-apx-feature-egpr-and-ndd-support/73031/4
Explanations for some seemingly unrelated changes:
inline-asm-registers.mir, statepoint-invoke-ra-enter-at-end.mir:
The immediate (TargetInstrInfo.cpp:1612) used for the regdef/reguse is
the encoding for the register
class in the enum generated by tablegen. This encoding will change
any time a new register class is added. Since the number is part
of the input, this means it can become stale.
seh-directive-errors.s:
R16-R31 makes ".seh_pushreg 17" legal
musttail-varargs.ll:
It seems some LLVM passes use the number of registers rather the number
of allocatable registers as heuristic.
This PR is to reland #67702 after #70222 in order to reduce some
compile-time regression when EGPR is not used.