1667 Commits

Author SHA1 Message Date
Simon Pilgrim
ecb34599bd
[X86] Add missing immediate qualifier to the (V)ROUND instructions (#87636)
Makes it easier to algorithmically recreate the instruction name in various analysis scripts I'm working on
2024-04-04 15:20:16 +01:00
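The naming scheme these renames rely on can be sketched roughly as follows. This is a simplified, hypothetical reconstruction, not the author's analysis scripts. It assumes one suffix letter per operand kind (r, m, i), in the style of names such as CTEST32rr/CTEST32ri elsewhere in this log. Real names also use other letters, for example `k` for mask registers in KMOVDkk, which this sketch does not model. `OpKind` and `buildInstrName` are illustrative names only.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical operand kinds for this illustration only.
enum class OpKind { Reg, Mem, Imm };

// Appends one suffix letter per operand kind to the mnemonic:
// 'r' = register, 'm' = memory, 'i' = immediate.
std::string buildInstrName(const std::string &mnemonic,
                           const std::vector<OpKind> &operands) {
  assert(!mnemonic.empty() && "mnemonic must not be empty");
  std::string name = mnemonic;
  for (OpKind kind : operands) {
    switch (kind) {
    case OpKind::Reg: name += 'r'; break;
    case OpKind::Mem: name += 'm'; break;
    case OpKind::Imm: name += 'i'; break;
    }
  }
  return name;
}
```

Without an immediate qualifier in the stored name, a script built this way would produce a name that does not exist, which is the kind of mismatch the rename avoids.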
Freddy Ye
36b4b9d988
[X86] Support immediate folding for CCMP/CTEST (#86616)
E.g.
%0:gr32 = MOV32ri 81
CTEST32rr %0, %1, 2, 10, implicit-def $eflags, implicit $eflags
=>
CTEST32ri %1, 81, 2, 10, implicit-def $eflags, implicit $eflags
2024-03-28 18:54:32 +08:00
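The MIR example above can be modeled as a toy peephole. This is a minimal sketch with hypothetical types (`Opc`, `Instr`, `foldImmIntoCTest`), not LLVM's `MachineInstr` API and not the patch's implementation. It assumes only CTEST is handled. CTEST tests flags via an AND, which is commutative, so either register operand can be replaced. A compare is not commutative, so a CCMP version would need more care. The two trailing immediates (`2, 10` in the example) are copied through unchanged.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <initializer_list>
#include <map>
#include <optional>

// Toy opcodes taken from the MIR example.
enum class Opc { MOV32ri, CTEST32rr, CTEST32ri };

struct Instr {
  Opc opc;
  int def = -1;                  // register defined (MOV32ri only)
  int lhs = -1, rhs = -1;        // register operands (-1 = unused)
  int64_t imm = 0;               // immediate (MOV32ri / CTEST32ri)
  std::array<int, 2> trailing{}; // trailing immediates, copied unchanged
};

// Folds a register known to hold a MOV32ri immediate into CTEST32rr,
// producing CTEST32ri. `defs` maps register -> defining instruction.
// Omits the single-use and liveness checks a real peephole would need.
std::optional<Instr> foldImmIntoCTest(const Instr &mi,
                                      const std::map<int, Instr> &defs) {
  if (mi.opc != Opc::CTEST32rr)
    return std::nullopt;
  for (int immReg : {mi.lhs, mi.rhs}) {
    auto it = defs.find(immReg);
    if (it == defs.end() || it->second.opc != Opc::MOV32ri)
      continue;
    Instr folded = mi;
    folded.opc = Opc::CTEST32ri;
    folded.lhs = (immReg == mi.lhs) ? mi.rhs : mi.lhs; // surviving register
    folded.rhs = -1;
    folded.imm = it->second.imm;
    return folded;
  }
  return std::nullopt;
}
```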
XinWang10
7b766a6f50
[X86] Support APX CMOV/CFCMOV instructions (#82592)
This patch supports ND CMOV instructions and CFCMOV instructions.

RFC:
https://discourse.llvm.org/t/rfc-design-for-apx-feature-egpr-and-ndd-support/73031/4
2024-03-17 20:18:56 +08:00
Ganesh
61fadd0b09
[X86] Fast AVX-512-VNNI vpdpwssd tuning (#85375)
Adding a tuning feature to fix
https://github.com/llvm/llvm-project/issues/84182
Generates vpdpwssd instead of a vpmaddwd + vpaddd sequence.
2024-03-15 16:45:41 +05:30
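The two instruction sequences compute the same value in each 32-bit lane, so choosing between them is purely a performance tuning decision. This scalar model shows one lane. It assumes the non-saturating forms, where all three operations wrap modulo 2^32. The function names are illustrative only.

```cpp
#include <cassert>
#include <cstdint>

// Truncate to 32 bits with wraparound (no saturation).
int32_t wrap32(int64_t v) {
  return static_cast<int32_t>(static_cast<uint32_t>(v));
}

// vpmaddwd lane: multiply two pairs of signed 16-bit values, add the products.
int32_t pmaddwdLane(int16_t a0, int16_t a1, int16_t b0, int16_t b1) {
  return wrap32(int64_t{a0} * b0 + int64_t{a1} * b1);
}

// vpaddd lane: 32-bit add.
int32_t padddLane(int32_t x, int32_t y) { return wrap32(int64_t{x} + y); }

// vpdpwssd lane: the same multiply-add, accumulated into acc in one step.
int32_t dpwssdLane(int32_t acc, int16_t a0, int16_t a1, int16_t b0,
                   int16_t b1) {
  return wrap32(int64_t{acc} + int64_t{a0} * b0 + int64_t{a1} * b1);
}
```

For example, with acc = 10, a = (3, -4), b = (5, 6), both forms give 10 + 15 - 24 = 1.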
Simon Pilgrim
1ec5b1f483 [X86] Add missing immediate qualifier to the (V)PCLMULQDQ instruction names 2024-03-11 13:39:25 +00:00
Simon Pilgrim
92d7aca441
[X86] Add missing immediate qualifier to the (V)CMPSS/D instructions (#84496)
Matches (V)CMPPS/D and makes it easier to algorithmically recreate the instruction name in various analysis scripts I'm working on
2024-03-09 16:21:25 +00:00
David Green
44be5a7fdc
[Codegen] Make Width in getMemOperandsWithOffsetWidth a LocationSize. (#83875)
This is another part of #70452 which makes getMemOperandsWithOffsetWidth
use a LocationSize for Width, as opposed to the unsigned it currently
uses. The advantages on its own are not super high if
getMemOperandsWithOffsetWidth usually uses known sizes, but if the
values can come from an MMO it can help be more accurate in case they
are Unknown (and in the future, scalable).
2024-03-06 17:40:13 +00:00
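A plain `unsigned` width has no way to say "unknown". A type that can be unknown forces callers to answer conservatively instead. This is a minimal sketch of that idea using a hypothetical `ToyLocationSize`. It is not `llvm::LocationSize` and does not reproduce its interface.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical width that is either a precise byte count or unknown.
class ToyLocationSize {
  std::optional<uint64_t> bytes;
  explicit ToyLocationSize(std::optional<uint64_t> b) : bytes(b) {}

public:
  static ToyLocationSize precise(uint64_t n) { return ToyLocationSize(n); }
  static ToyLocationSize unknown() { return ToyLocationSize(std::nullopt); }
  bool hasValue() const { return bytes.has_value(); }
  uint64_t getValue() const {
    assert(hasValue() && "width is unknown");
    return *bytes;
  }
};

// Two accesses off the same base are provably disjoint only when both
// widths are known; an unknown width yields the conservative answer.
bool provablyDisjoint(int64_t offA, ToyLocationSize widthA, int64_t offB,
                      ToyLocationSize widthB) {
  if (!widthA.hasValue() || !widthB.hasValue())
    return false;
  int64_t endA = offA + static_cast<int64_t>(widthA.getValue());
  int64_t endB = offB + static_cast<int64_t>(widthB.getValue());
  return endA <= offB || endB <= offA;
}
```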
AtariDreams
3e40c96d89
[X86] Resolve FIXME: Add FPCW as a rounding control register (#82452)
To prevent tests from breaking, another fix had to be made: we now
check whether the instruction after a waiting instruction is a call and,
if so, insert the wait.
2024-03-05 08:47:05 +08:00
Simon Pilgrim
448fe73428 [X86] Add X86::getVectorRegisterWidth helper. NFC.
Replaces internal helper used by addConstantComments to allow reuse in a future patch.
2024-02-08 12:42:33 +00:00
Shengchen Kan
e270ec47cd [X86] X86InstrInfo.cpp - Remove dead code for memory folding, NFCI
`commuteInstruction(MI, false, OpNum, CommuteOpIdx2)` should never create
any new instruction, so we don't need to check and erase it.
2024-02-02 11:14:07 +08:00
Philip Reames
3ff7caea33
[TTI] Use Register in isLoadFromStackSlot and isStoreToStackSlot [nfc] (#80339) 2024-02-01 17:52:35 -08:00
Shengchen Kan
c82a645ef2 [X86][NFC] Simplify the code for memory fold 2024-02-01 13:43:25 +08:00
Shengchen Kan
e3c9327bc4 [X86][CodeGen] Set isReMaterializable = 1 for AVX broadcast load
Broadcasting a single float should not be any slower than
loading 32B using vmovaps, so rematerializing it can help reduce
register spills under high register pressure.
2024-01-31 20:55:56 +08:00
Kazu Hirata
5d7a0a734a [X86] Use a range-based for loop (NFC) 2024-01-30 22:12:05 -08:00
Shengchen Kan
8e77390c06
[X86][CodeGen] Support folding memory broadcast in X86InstrInfo::foldMemoryOperandImpl (#79761) 2024-01-31 12:51:03 +08:00
Shengchen Kan
2960656eb9 [X86][NFC] Extract code for commute in foldMemoryOperandImpl into functions
To share code for folding broadcast in #79761
2024-01-31 00:09:08 +08:00
Shengchen Kan
02a275cca1 [X86][CodeGen] Add entries for TB_BCAST_SH in getBroadcastOpcode 2024-01-30 21:01:31 +08:00
Shengchen Kan
f28430d577 [X86][CodeGen] Add entries for TB_BCAST_W in getBroadcastOpcode and fix typo 2024-01-30 01:03:32 +08:00
Shengchen Kan
169553688c [X86][NFC] Remove TB_FOLDED_BCAST and format code in X86InstrFoldTables.cpp 2024-01-30 00:27:16 +08:00
Shengchen Kan
7089c012ec [X86][NFC] Replace if-else with switch-case in X86InstrInfo::foldMemoryOperandImpl 2024-01-28 10:30:26 +08:00
Shengchen Kan
6754b5428e [X86][NFC] AnalyzeBranchImpl -> analyzeBranchImpl and remove duplicated comments in X86InstrInfo.h 2024-01-28 09:54:31 +08:00
Shengchen Kan
035f33bf41 [X86][CodeGen] Add NDD entries for X86InstrInfo::foldImmediate 2024-01-26 22:11:57 +08:00
Shengchen Kan
550f0eb2ce [NFC] Rename TargetInstrInfo::FoldImmediate to TargetInstrInfo::foldImmediate and simplify implementation for X86 2024-01-26 20:50:58 +08:00
Shengchen Kan
821dee9852 [X86][CodeGen] Add NDD entries for isAssociativeAndCommutative 2024-01-26 18:39:52 +08:00
Shengchen Kan
33ecef9812
[X86][CodeGen] Fix crash when commute operands of Instruction for code size (#79245)
Reported in 134fcc62786d31ab73439201dce2d73808d1785a

An incorrect opcode was used because there is a `[[fallthrough]]` at line 2386.
2024-01-24 17:10:28 +08:00
Shengchen Kan
71d64ed80f [X86][Peephole] Add NDD entries for EFLAGS optimization 2024-01-24 15:47:58 +08:00
Shengchen Kan
f7b61f81b5
[X86][CodeGen] Transform NDD SUB to CMP if dest reg is dead (#79135) 2024-01-24 13:58:48 +08:00
Anatoly Trosinenko
10bd69a4f7
[MachineOutliner] Refactor iterating over Candidate's instructions (#78972)
Make Candidate's front() and back() functions return references to
MachineInstr and introduce begin() and end() returning iterators, the
same way it is usually done in other container-like classes.

This makes it possible to iterate over the instructions contained in
Candidate the same way one can iterate over MachineBasicBlock (note that
begin() and end() return bundled iterators, just like MachineBasicBlock
does, but no instr_begin() and instr_end() are defined yet).
2024-01-23 17:21:40 +03:00
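This sketch shows the container-like shape described above, in which `front()`/`back()` return references and `begin()`/`end()` enable range-based for loops. `ToyInstr` and `ToyCandidate` are hypothetical stand-ins for `MachineInstr` and the outliner's `Candidate`. The toy uses plain list iterators and does not model bundled iterators.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <list>
#include <string>

struct ToyInstr {
  std::string name;
};

// A candidate is a contiguous run of `length` instructions inside a block.
class ToyCandidate {
public:
  using iterator = std::list<ToyInstr>::iterator;

  ToyCandidate(iterator start, std::size_t length)
      : first(start), len(length) {
    assert(length > 0 && "a candidate covers at least one instruction");
    last = std::next(start, static_cast<std::ptrdiff_t>(length - 1));
  }

  // References, as in other container-like classes.
  ToyInstr &front() { return *first; }
  ToyInstr &back() { return *last; }

  // end() is one past back(), so range-based for loops work.
  iterator begin() { return first; }
  iterator end() { return std::next(last); }

  std::size_t size() const { return len; }

private:
  iterator first;
  iterator last;
  std::size_t len;
};
```

With these members, `for (ToyInstr &MI : Cand)` visits exactly the candidate's instructions.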
Shengchen Kan
66237d647e [X86][CodeGen] Add entries for NDD SHLD/SHRD to the commuteInstructionImpl 2024-01-23 17:05:09 +08:00
Shengchen Kan
134fcc6278 [X86][NFC] Simplify function X86InstrInfo::commuteInstructionImpl 2024-01-23 16:32:32 +08:00
Simon Pilgrim
4e64ed9780 [X86] Update X86::getConstantFromPool to take base OperandNo instead of Displacement MachineOperand
This allows us to check the entire constant address calculation, and ensure we're not performing any runtime address math into the constant pool (noticed in an upcoming patch).
2024-01-22 15:40:45 +00:00
XinWang10
dd6fec5d4f
[X86][APX]Support lowering for APX promoted AMX-TILE instructions (#78689)
The enc/dec of promoted AMX-TILE instructions have been supported in
https://github.com/llvm/llvm-project/pull/76210.
This patch supports lowering for promoted AMX-TILE instructions and
integrates the tests into the existing tests.
2024-01-22 11:33:23 +08:00
Simon Pilgrim
d12dffacaa [X86] Add X86::getConstantFromPool helper function to replace duplicate implementations.
We had the same helper function in shuffle decode / vector constant code - move this to X86InstrInfo to avoid duplication.
2024-01-18 11:59:46 +00:00
Shengchen Kan
199117ae09 [X86] Fix error: unused variable 'isMemOp' after #78019, NFCI
BTW, I also adjusted the code to follow the LLVM coding standards.
2024-01-16 13:14:55 +08:00
Jie Fu
d338d15243 [X86] Fix -Wunused-variable in X86InstrInfo.cpp (NFC)
llvm-project/llvm/lib/Target/X86/X86InstrInfo.cpp:3467:14:
error: unused variable 'isMemOp' [-Werror,-Wunused-variable]
 3467 |   const auto isMemOp = [](const MCOperandInfo &OpInfo) -> bool {
      |              ^~~~~~~
1 error generated.
2024-01-16 11:57:13 +08:00
Nicholas Mosier
855e863004
[X86] Add MI-layer routine for getting the index of the first address operand, NFC (#78019)
Add the MI-layer routine X86::getFirstAddrOperandIdx(), which returns
the index of the first address operand of a MachineInstr (or -1 if there
is none).

X86II::getMemoryOperandNo(), the existing MC-layer routine used to
obtain the index of the first address operand in a 5-operand X86 memory
reference, is incomplete: it does not handle pseudo-instructions like
TCRETURNmi, resulting in security holes in the mitigation passes that
use it (e.g., x86-slh and x86-lvi-load).

X86::getFirstAddrOperandIdx() handles both pseudo and real instructions
and is thus more suitable for most use cases than
X86II::getMemoryOperandNo(), especially in mitigation passes like
x86-slh and x86-lvi-load. For this reason, this patch replaces all uses
of X86II::getMemoryOperandNo() with X86::getFirstAddrOperandIdx() in the
aforementioned mitigation passes.
2024-01-16 10:55:00 +08:00
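This is a rough, hypothetical sketch of an operand-type-driven lookup that does not depend on encoding information, which pseudo-instructions lack. It is not the actual `X86::getFirstAddrOperandIdx()` implementation. It assumes that each operand carries a declared type and that all five components of an X86 memory reference (base, scale, index, displacement, segment) are tagged as memory. `ToyOperandType` and `getFirstAddrOperandIdxSketch` are illustrative names.

```cpp
#include <cassert>
#include <vector>

enum class ToyOperandType { Register, Immediate, Memory };

// An X86 memory reference spans five consecutive operands.
constexpr int kNumMemOperands = 5;

// Returns the index of the first address operand, or -1 if there is none.
int getFirstAddrOperandIdxSketch(const std::vector<ToyOperandType> &opTypes) {
  for (int i = 0, e = static_cast<int>(opTypes.size()); i < e; ++i) {
    if (opTypes[i] != ToyOperandType::Memory)
      continue;
    assert(i + kNumMemOperands <= e && "truncated memory reference");
    return i;
  }
  return -1;
}
```

Because the lookup depends only on operand types, a pseudo whose operand list starts with a memory reference is handled the same way as a real instruction.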
Kazu Hirata
a041da3109 [X86] Use range-based for loops (NFC) 2023-12-24 15:56:36 -08:00
Simon Pilgrim
bcee4a9363
[X86] Rename VPERMI2/VPERMT2 to VPERMI2*Z/VPERMT2*Z (#75192)
Add missing AVX512 Z prefix to conform to the standard naming convention and simplify matching in X86FoldTablesEmitter::addBroadcastEntry etc.
2023-12-14 09:55:18 +00:00
Arthur Eubanks
843ea98437
[X86] Allow constant pool references under medium code model in X86InstrInfo::foldMemoryOperandImpl() (#75011)
The medium code model assumes that the constant pool is referenceable
with 32-bit relocations.
2023-12-11 19:00:56 -08:00
Arthur Eubanks
687e63a2bd
[X86] Allow accessing large globals in small code model (#74785)
This removes some assumptions that the small code model will only
reference "near" globals.

There are still some missing optimizations and wrong code sequences, but
I'd like to address those separately. This will require auditing any
checks of the code model in the X86 backend.
2023-12-08 11:09:54 -08:00
Matt Arsenault
546a9ce80c
CodeGen: Fix bypassing legality checks for IMPLICIT_DEF rematerialization (#73934)
It's permitted to have extra implicit-def operands of the same main
register after the main register def. If there are implicit operands,
use the standard legality checks, which verify the operand contents.

Depends on #73933
2023-12-06 21:43:19 +07:00
Simon Pilgrim
56eb3e738a
[X86] Set x87 fld1/fldz pseudo instructions as rematerializable (#74592)
No need to generate/spill/restore them via the CPU stack.

Cleanup work to allow us to properly use isFPImmLegal and fix some regressions encountered while looking at #74304
2023-12-06 14:36:42 +00:00
Shengchen Kan
68d6fe508c
[X86][CodeGen] Prefer KMOVkk_EVEX than KMOVkk when EGPR is supported (#74048)
In memory fold table, we have

```
 {X86::KMOVDkk, X86::KMOVDkm, 0},
 {X86::KMOVDkk_EVEX, X86::KMOVDkm_EVEX, 0}
```

where `KMOVDkm_EVEX` can use EGPR as base and index registers, while
`KMOVDkm` can't. Hence, though `KMOVkk` does not have any GPR operands,
we prefer to use `KMOVDkk_EVEX` to help register allocation.

It will be compressed to `KMOVDkk` in EVEX2VEX pass if memory folding
does not happen.
2023-12-02 22:43:02 +08:00
Shengchen Kan
e017169dbd [X86][NFC] Extract ReplaceableInstrs to a separate file and clang-format X86InstrInfo.cpp 2023-12-01 15:21:38 +08:00
Shengchen Kan
511ba45a47
[X86][MC][CodeGen] Support EGPR for KMOV (#73781)
KMOV is essential for copy between k-registers and GPRs.
R16-R31 were added to the GPRs in #70958, so we extend KMOV for these new
registers first.

This patch
1. Promotes KMOV instructions from VEX space to EVEX space
2. Emits prefix {evex} for the EVEX variants
3. Prefers the EVEX variant over the VEX variant in ISEL and optimizations
for better RA

EVEX variants will be compressed to VEX variants by existing EVEX2VEX
pass if no EGPR is used.

RFC:
https://discourse.llvm.org/t/rfc-design-for-apx-feature-egpr-and-ndd-support/73031/4
TAG: llvm-test-suite && CPU2017 can be built with feature egpr
successfully.
2023-11-30 16:13:51 +08:00
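The compression rule described above is that EVEX variants fall back to VEX when no EGPR is used. It can be sketched as a simple decision. This is a hypothetical model, not the EVEX2VEX pass, which works from tables rather than string manipulation. It assumes GPR numbers 16-31 denote R16-R31 and that opcode names follow the `KMOVDkm_EVEX` / `KMOVDkm` pattern shown in the fold table above.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical numbering: 0-15 legacy GPRs, 16-31 the APX EGPRs R16-R31.
bool isEGPR(unsigned gpr) { return gpr >= 16 && gpr <= 31; }

// Returns the VEX name when no GPR operand is an EGPR; otherwise keeps the
// EVEX name, since an EGPR can only be encoded with EVEX here.
std::string compressKMOV(const std::string &evexName,
                         const std::vector<unsigned> &gprOperands) {
  const std::string suffix = "_EVEX";
  bool hasSuffix = evexName.size() > suffix.size() &&
                   evexName.compare(evexName.size() - suffix.size(),
                                    suffix.size(), suffix) == 0;
  assert(hasSuffix && "expected an EVEX variant name");
  (void)hasSuffix; // avoid an unused-variable warning under NDEBUG
  for (unsigned gpr : gprOperands)
    if (isEGPR(gpr))
      return evexName;
  return evexName.substr(0, evexName.size() - suffix.size());
}
```

A register-to-register form such as `KMOVDkk_EVEX` has no GPR operands, so it always compresses. This is why preferring it during selection costs nothing when no memory folding happens.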
Nick Desaulniers
b053359892
[X86InstrInfo] support memfold on spillable inline asm (#70832)
This enables -regalloc=greedy to memfold spillable inline asm
MachineOperands.

Because no instruction selection framework marks MachineOperands as
spillable, no language frontend can observe functional changes from this
patch. That will change once instruction selection frameworks are
updated.

Link: https://github.com/llvm/llvm-project/issues/20571
2023-11-29 08:18:51 -08:00
Shengchen Kan
bafa51c8a5 [X86] Rename X86MemoryFoldTableEntry to X86FoldTableEntry, NFCI
because it's used for entries that fold a load, store or broadcast.
2023-11-28 19:49:14 +08:00
Craig Topper
a845061935
[AArch64] Use the same fast math preservation for MachineCombiner reassociation as X86/PowerPC/RISCV. (#72820)
Don't blindly copy the original flags from the pre-reassociated
instructions.
This copied the integer poison flags which are not safe to preserve
after reassociation.

For the FP flags, I think we should only keep the intersection of
the flags. Override setSpecialOperandAttr to do this.

Fixes #72777.
2023-11-22 14:17:45 -08:00
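The flag policy described above can be expressed as a bitmask operation. The integer poison flags are dropped, and an FP flag survives only if both original instructions carried it. The flag names and values below are hypothetical and are not LLVM's `MachineInstr::MIFlag` enumerators.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical flag bits.
enum ToyFlags : uint32_t {
  NoNaNs = 1u << 0,
  NoInfs = 1u << 1,
  AllowReassoc = 1u << 2,
  NoSignedWrap = 1u << 3,   // integer poison flag
  NoUnsignedWrap = 1u << 4, // integer poison flag
};

constexpr uint32_t kFPFlagMask = NoNaNs | NoInfs | AllowReassoc;

// Flags for an instruction created by reassociating two originals: the
// intersection of their FP flags, with integer poison flags removed.
uint32_t reassociatedFlags(uint32_t a, uint32_t b) {
  return (a & b) & kFPFlagMask;
}
```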
Alex Bradbury
5b3eb1bc22
[ARM][X86][NFC] Use lambda to avoid duplicate switches in areLoadsFromSameBasePtr (#72376)
Both the Arm and X86 implementations of areLoadsFromSameBasePtr use a
switch over the machine opcode, and repeat the same logic for both
SDNode operands. We can avoid the duplicated logic (especially lengthy
in the X86 case) by just using a lambda. This could obviously be a
candidate for moving out to a separate helper function if there were
other users, but I've made the minimal change in this patch.
2023-11-15 12:35:35 +00:00
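The shape of this refactoring can be sketched as follows. The opcode switch is written once inside a lambda and applied to both nodes. The real functions operate on `SDNode`s. The opcodes, `ToyNode` and `areLoadsFromSameBasePtrSketch` here are hypothetical and only mirror the out-parameter style of returning the two offsets.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <utility>

enum class ToyOpc { MOV32rm, MOV64rm, ADD32rr };

struct ToyNode {
  ToyOpc opc;
  int base;       // id of the base-pointer operand
  int64_t offset; // constant displacement
};

// True if both nodes are loads from the same base; offsets via offA/offB.
bool areLoadsFromSameBasePtrSketch(const ToyNode &a, const ToyNode &b,
                                   int64_t &offA, int64_t &offB) {
  // Single copy of the opcode switch, used for both operands.
  auto decodeLoad =
      [](const ToyNode &n) -> std::optional<std::pair<int, int64_t>> {
    switch (n.opc) {
    case ToyOpc::MOV32rm:
    case ToyOpc::MOV64rm:
      return std::make_pair(n.base, n.offset);
    default:
      return std::nullopt;
    }
  };
  std::optional<std::pair<int, int64_t>> la = decodeLoad(a);
  std::optional<std::pair<int, int64_t>> lb = decodeLoad(b);
  if (!la || !lb || la->first != lb->first)
    return false;
  offA = la->second;
  offB = lb->second;
  return true;
}
```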
Shengchen Kan
c9017bc793
[X86] Support EGPR (R16-R31) for APX (#70958)
1. Map R16-R31 to DWARF registers 130-145.
2. Make R16-R31 caller-saved registers.
3. Make R16-R31 allocatable only when feature EGPR is supported
4. Make R16-R31 available for instructions in legacy maps 0/1 and EVEX
space, except XSAVE*/XRSTOR

RFC:

https://discourse.llvm.org/t/rfc-design-for-apx-feature-egpr-and-ndd-support/73031/4

Explanations for some seemingly unrelated changes:

inline-asm-registers.mir, statepoint-invoke-ra-enter-at-end.mir:
The immediate (TargetInstrInfo.cpp:1612) used for the regdef/reguse is
the encoding for the register class in the enum generated by tablegen.
This encoding will change any time a new register class is added. Since
the number is part of the input, this means it can become stale.

seh-directive-errors.s:
R16-R31 make ".seh_pushreg 17" legal

musttail-varargs.ll:
It seems some LLVM passes use the number of registers rather than the number
of allocatable registers as heuristic.

This PR is to reland #67702 after #70222 in order to reduce some
compile-time regression when EGPR is not used.
2023-11-09 23:39:40 +08:00
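Items 1 and 3 above amount to a small mapping and a feature check. The following is a minimal sketch under a hypothetical numbering in which GPRs 16-31 denote R16-R31. Legacy-GPR DWARF numbers and reserved registers such as RSP are deliberately not modeled.

```cpp
#include <cassert>
#include <optional>

// DWARF number for an extended GPR: R16-R31 -> 130-145.
std::optional<unsigned> egprDwarfNumber(unsigned gpr) {
  if (gpr < 16 || gpr > 31)
    return std::nullopt; // legacy GPRs use other numbers, not modeled here
  return 130 + (gpr - 16);
}

// R16-R31 are allocatable only when the EGPR feature is enabled.
bool isAllocatableGPR(unsigned gpr, bool hasEGPR) {
  assert(gpr <= 31 && "not a GPR in this model");
  return gpr < 16 || hasEGPR;
}
```

For example, R20 maps to 130 + (20 - 16) = 134, and it is allocatable only with EGPR enabled.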