1728 Commits

Author SHA1 Message Date
Craig Topper
f58bc72759 Revert "[X86][ARM][RISCV][XCore][M68K] Invert the low bit to get the inverse predicate (NFC) (#151748)"
This reverts commit 518703806286c98bac7b84156738839f8bd55bef.

Failing M68k build bot.
2025-08-04 15:24:52 -07:00
AZero13
5187038062
[X86][ARM][RISCV][XCore][M68K] Invert the low bit to get the inverse predicate (NFC) (#151748)
All these platforms define their predicates in a way that allows bit
twiddling to get the inverse predicate.
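
As a rough illustration of the bit trick on x86 only (a standalone sketch, not the patch itself): the x86 condition-code encodings place each predicate next to its inverse, so flipping bit 0 maps one to the other. The enum below lists the standard 4-bit x86 encodings; `invert` is a hypothetical helper name, not an LLVM function.

```cpp
#include <cassert>

// x86 condition-code encodings (the 4-bit "cc" field of Jcc/SETcc/CMOVcc).
enum CondCode : unsigned {
  O = 0,  NO = 1,  B = 2,  AE = 3,  E = 4,  NE = 5,  BE = 6,  A = 7,
  S = 8,  NS = 9,  P = 10, NP = 11, L = 12, GE = 13, LE = 14, G = 15
};

// Hypothetical helper: the inverse predicate differs only in the low bit.
constexpr CondCode invert(CondCode CC) {
  return static_cast<CondCode>(CC ^ 1u);
}

// Compile-time spot checks of a few pairs.
static_assert(invert(E) == NE, "E <-> NE");
static_assert(invert(B) == AE, "B <-> AE");
static_assert(invert(G) == LE, "G <-> LE");
```

The trick only works when a target's enum is laid out in inverse pairs like this, so each target has to be checked separately.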
2025-08-04 14:45:04 -07:00
Simon Pilgrim
f1036d844e
[X86] X86InstrInfo::commuteInstructionImpl - remove (V)BLENDPD/S commutation to (V)MOVSD/S optsize handling (#144051)
Just commute with (V)BLENDPD/S like all other BLEND instructions

This is now handled more generally by the X86FixupInstTuningPass (OptSize fold occurs even without a scheduler model).

First step towards #142972
2025-06-13 12:49:22 +01:00
Simon Pilgrim
054646f335 [X86] commuteInstructionImpl - assert that only MOVSDrr is being commuted to SHUFPDrri
Noticed while preparing for #142972
2025-06-10 08:49:18 +01:00
Daniel Paoliello
a414877a7a
[x64][win] Add compiler support for x64 import call optimization (equivalent to MSVC /d2guardretpoline) (#126631)
This is the x64 equivalent of #121516

Since import call optimization was originally [added to x64 Windows to
implement a more efficient retpoline
mitigation](https://techcommunity.microsoft.com/blog/windowsosplatform/mitigating-spectre-variant-2-with-retpoline-on-windows/295618)
the section and constant names relating to this all mention "retpoline"
and we need to mark indirect calls, control-flow guard calls and jumps
for jump tables in the section alongside calls to imported functions.

As with the AArch64 feature, this emits a new section into the obj which
is used by the MSVC linker to generate the Dynamic Value Relocation
Table and the section itself does not appear in the final binary.

The Windows Loader requires a specific sequence of instructions be
emitted when this feature is enabled:
* Indirect calls/jumps must have the function pointer to jump to in
`rax`.
* Calls to imported functions must use the `rex` prefix and be followed
by a 5-byte nop.
* Indirect calls must be followed by a 3-byte nop.
2025-05-20 14:48:41 -07:00
Feng Zou
80547cd705
[X86][APX] Fix issues of suppressing APX for relocation (#139285)
1. An ADD64rm_ND instruction was emitted with a GOTPCREL relocation. It
is now handled in the "Suppress APX for relocation" pass and transformed
to ADD64rm with the register operand in a non-rex2 register class. The
relocation type R_X86_64_CODE_6_GOTPCRELX will be added later, when APX
is enabled with relocation.
2. The register class for operands in an instruction with a relocation
is updated to a non-rex2 one in the "Suppress APX for relocation" pass,
but it may later be recomputed to a larger register class (e.g. from
GR64_NOREX2RegClass to GR64RegClass). Fixed by not updating the register
class if it is a non-rex2 register class and APX support for relocation
is disabled.
3. After the "Suppress APX for relocation" pass, an instruction with a
relocation may be folded with an add NDD instruction into an add NDD
instruction with a relocation. The latter would be emitted with an APX
relocation type, which breaks backward compatibility. Fixed by not
folding an instruction with a GOTPCREL relocation into an NDD instruction.
4. If the register in operand 0 of an instruction with a relocation is
used in a PHI instruction, it may be replaced with operand 0 of the PHI
instruction (possibly an EGPR) after PHI elimination and the Machine Copy
Propagation pass. Fixed by suppressing EGPR in operand 0 of the PHI
instruction so that no APX relocation types are emitted.
2025-05-12 20:56:07 +08:00
Feng Zou
bd6addc032
[X86][APX] Suppress EGPR/NDD instructions for relocations (#136660)
Suppress EGPR/NDD instructions for relocations to avoid emitting APX
relocation types. This keeps backward compatibility with older linkers
that lack APX support. The use case is trying APX features with
LLVM + the old built-in linker on RHEL9, which is expected to be EOL in
2032.
If there are APX relocation types, older linkers raise an
"unsupported relocation type" error. Example:
```
$ llvm-mc -filetype=obj -o got.o -triple=x86_64-unknown-linux got.s
$ ld got.o -o got.exe
ld: got.o: unsupported relocation type 0x2b
...

$ cat got.s
...
movq foo@GOTPCREL(%rip), %r16

$ llvm-objdump -dr got.o
...
1: d5 48 8b 05 00 00 00 00       movq    (%rip), %r16
0000000000000005:  R_X86_64_CODE_4_GOTPCRELX    foo-0x4
```
2025-04-29 19:12:59 +08:00
Feng Zou
7a424276de
Revert "[X86][APX] Support peephole optimization with CCMP instruction (#129994)" (#136796)
This reverts commit 7ae75851b2e1570662261c97c13cfc65357c283d.

There is a problem with peephole optimization for CCMP instruction. See
the example below:
C source code:
```
  if (a > 2 || (b && (a == 2))) { … }
```
MIR before peephole optimization:
```
  TEST8rr %21:gr8, %21:gr8, implicit-def $eflags // b
  CCMP32ri %30:gr32, 2, 0, 5, implicit-def $eflags, implicit $eflags // a == 2
  CCMP32ri %30:gr32, 3, 0, 5, implicit-def $eflags, implicit $eflags // a > 2 (transformed to a < 3)
  JCC_1 %bb.6, 2, implicit $eflags
  JMP_1 %bb.3
```
Inputs:
```
  a = 1, b = 0.
```
With the inputs above, the expected behavior is to jump to %bb.6.
After the TEST8rr instruction executes with b(%21) == 0, ZF is set to 1
in eflags, so eflags does not satisfy the SCC condition of the following
CCMP32ri instruction (for the a == 2 condition). That instruction
therefore skips comparing a(%30) with 2 and sets the flags from its
payload to 0x202 (ZF = 0). The eflags then satisfy the SCC condition of
the 2nd CCMP32ri instruction, which compares a(%30) with 3. It sets CF
to 1 in eflags and the JCC instruction jumps to %bb.6.

But after adding CCMP support, peephole optimization eliminates the 2nd
CCMP32ri instruction and updates the condition of the JCC instruction
from "B" to "BE". With the same inputs, the JCC instruction falls
through to the next instruction. This is not the expected behavior, so
the 2nd CCMP32ri should not be eliminated.
```
  TEST8rr %21:gr8, %21:gr8, implicit-def $eflags // b
  CCMP32ri %30:gr32, 2, 0, 5, implicit-def $eflags, implicit $eflags  // a == 2
  JCC_1 %bb.6, 6, implicit $eflags
  JMP_1 %bb.3
```
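
The walkthrough above can be replayed with a small flag simulator. This is a simplified model written for illustration, not LLVM code: it tracks only ZF and CF, treats the `0` payload operand as "all default flags clear", and every function name in it is hypothetical. The condition-code values 2, 5 and 6 are the x86 encodings for B, NE and BE that appear in the MIR.

```cpp
#include <cassert>
#include <cstdint>

// Only the two flags that matter for this example.
struct Flags {
  bool ZF;
  bool CF;
};

// x86 condition codes that appear in the MIR above.
enum Cond { COND_B = 2, COND_NE = 5, COND_BE = 6 };

bool holds(Cond C, Flags F) {
  switch (C) {
  case COND_B:  return F.CF;          // below: CF == 1
  case COND_NE: return !F.ZF;         // not equal: ZF == 0
  case COND_BE: return F.CF || F.ZF;  // below or equal
  }
  return false;
}

// TEST r, r: ZF is set when r is zero, CF is always cleared.
Flags testInsn(uint32_t R) { return Flags{R == 0, false}; }

// CMP l, r: ZF on equality, CF on unsigned borrow.
Flags cmpInsn(uint32_t L, uint32_t R) { return Flags{L == R, L < R}; }

// CCMP: compare only if SCC holds on the incoming flags; otherwise load
// the default flag values (all clear here, modelling the 0 payload).
Flags ccmpInsn(Flags In, uint32_t L, uint32_t R, Cond SCC) {
  return holds(SCC, In) ? cmpInsn(L, R) : Flags{false, false};
}

// Sequence before the peephole: TEST, CCMP a,2, CCMP a,3, JB.
bool jumpsBeforePeephole(uint32_t A, uint32_t B) {
  Flags F = testInsn(B);
  F = ccmpInsn(F, A, 2, COND_NE);
  F = ccmpInsn(F, A, 3, COND_NE);
  return holds(COND_B, F);
}

// Sequence after the peephole: TEST, CCMP a,2, JBE.
bool jumpsAfterPeephole(uint32_t A, uint32_t B) {
  Flags F = testInsn(B);
  F = ccmpInsn(F, A, 2, COND_NE);
  return holds(COND_BE, F);
}
```

In this model, `jumpsBeforePeephole(1, 0)` takes the branch while `jumpsAfterPeephole(1, 0)` falls through, which is the mismatch described above. The second CCMP cannot be dropped because the first one may have loaded its default flags instead of performing a comparison.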
2025-04-25 10:55:31 +08:00
Evgenii Kudriashov
db97d56c97
[X86][APX] Handle AND_NF instruction for compare peephole (#136233) 2025-04-19 01:48:40 +02:00
Philip Reames
f2ecd86e34
[Analysis] Remove implicit LocationSize conversion from uint64_t (#133342)
This change removes the uint64_t constructor on LocationSize
preventing implicit conversion, and fixes up the using APIs to adapt to
the change. Note that I'm adding a couple of explicit conversion points
on routines where passing in a fixed offset as an integer seems likely
to have well understood semantics.

We had an unfortunate case which arose if you tried to pass a TypeSize
value to a parameter of LocationSize type. We'd find the implicit
conversion path through TypeSize -> uint64_t -> LocationSize which works
just fine for fixed values, but loses information and fails assertions
if the TypeSize was scalable. This change breaks the first link in that
implicit conversion chain since that seemed to be the easier one.
2025-04-18 07:46:31 -07:00
Connector Switch
cc354d6a6d
[NFC] Fix destroy typo. (#135640) 2025-04-15 08:20:44 +08:00
Craig Topper
2ec88374e0 [X86] Use MCRegister. NFC 2025-03-29 11:14:06 -07:00
Philip Reames
236f938ef6 [CodeGen] Provide a target independent default for optimizeLoadInst [NFC]
This just moves the x86 implementation into generic code since it appears
to be suitable for any target.  The heart of this transform is inside
foldMemoryOperand so other targets won't actually kick in until they
implement said API.  This just removes one piece to implement in the
process of enabling foldMemoryOperand.
2025-03-26 08:52:40 -07:00
Daniel Zabawa
5afa0fa9a6
[X86] Prevent APX NDD compression when it creates a partial write (#132051)
APX NDD instructions may be compressed when the result is also a source.
For 8/16b instructions, this may create partial register write hazards
if a previous super-register def is within the partial reg update
clearance, or incorrect code if the super-register is not dead.

This change prevents compression when the super-register is marked as an
implicit define, which the virtual rewriter already adds in the case
where a subregister is defined but the super-register is not dead.

The BreakFalseDeps interface is also updated to add implicit
super-register defs for NDD instructions that would incur partial-write
stalls if compressed to legacy ops.
2025-03-22 00:50:12 +08:00
Philip Reames
4d4d9d5d33
[TTI] Use TypeSize in isLoadFromStackSlot and isStoreToStackSlot [nfc] (#132244)
Motivation is supporting scalable spills and reloads, e.g. in
https://github.com/llvm/llvm-project/pull/120524.

Looking at this API, I'm suspicious that the access size should just be
coming from the memory operand on the load or store, but we don't appear
to be consistently setting that up. That's a larger change so I may or
may not bother pursuing that.
2025-03-20 10:17:36 -07:00
Craig Topper
3fe914c9fa [X86] Use Register and MCRegister. NFC 2025-03-15 23:15:28 -07:00
Craig Topper
86ae25d2be [CodeGen][X86] Use Register in TTI unfoldMemoryOperand interface. NFC 2025-03-15 10:04:54 -07:00
Phoebe Wang
254951749f
[X86][APX] Remove the EFLAGS def operand rather than the last one (#131430)
The last one may be an implicit use, e.g.,
`IDIV32r %4:gr32, implicit-def dead $eax, implicit-def $edx,
implicit-def dead $eflags, implicit $eax, implicit $edx`

https://godbolt.org/z/KPKzj5c8K
2025-03-15 16:37:38 +08:00
Craig Topper
6b7daf2249
[MachineCombiner][Targets] Use Register in TII genAlternativeCodeSequence interface. NFC (#131272) 2025-03-13 23:27:56 -07:00
Phoebe Wang
bc4b2c74fe
[X86][APX] Add NF instructions to convertToThreeAddress functions (#130969)
Since #130488, we have NF instructions when converting to three address
instructions.
2025-03-13 13:23:50 +08:00
Phoebe Wang
ad704ff62b
[X86][NF] Switch the order of Inst and &Target.getInstruction(NewRec) (#130739)
Because Inst is ordered by Instruction ID.
2025-03-12 17:35:54 +08:00
Feng Zou
7ae75851b2
[X86][APX] Support peephole optimization with CCMP instruction (#129994)
This extends `optimizeCompareInstr` to re-use previous CCMP results if
the previous comparison was with an immediate that was 1 bigger or
smaller.
Example:
```
CCMP x, 13, 2, 5
...
CCMP x, 12, 2, 5 ; can be removed if we change the SETg
SETg ...         ; x > 12 changed to SETge (x >= 13) & remove the 2nd CCMP
```
2025-03-12 09:24:10 +08:00
Phoebe Wang
507e0c3b67
[X86][APX] Try to replace non-NF with NF instructions when optimizeCompareInstr (#130488)
https://godbolt.org/z/rWYdqnjjx
2025-03-10 21:08:01 +08:00
Craig Topper
571b787b83
[CodeGen] Change copyPhysReg interface to use Register instead of MCRegister. (#128473)
NVPTX, SPIRV, and WebAssembly pass virtual registers to this function
since they don't perform register allocation. We need to use Register to
avoid a virtual register being converted to MCRegister by the caller.
2025-02-24 09:55:34 -08:00
Matt Arsenault
1f6165e184
X86: Fix convertToThreeAddress losing subregister indexes (#124098)
This avoids dozens of regressions in a future patch. These
primarily manifested as assertions where we had copies of 64-bit
registers to 32-bit registers.

This is testable in principle with hand written MIR, but that's
a bit too much x86 for me.
2025-02-19 01:27:19 +07:00
Craig Topper
27e01d1d74
[X86] Use new Flags argument to storeRegToStackSlot to simplify code. NFC (#124658)
Use the Flags argument to add FrameSetup directly instead of walking
backwards to add the flag after the call.
2025-01-29 09:45:29 -08:00
Simon Pilgrim
90e9895a93
[X86] Handle BSF/BSR "zero-input pass through" behaviour (#123623)
Intel docs have been updated to be similar to AMD and now describe
BSF/BSR as not changing the destination register if the input value was
zero, which allows us to support CTTZ/CTLZ zero-input cases by setting
the destination to support a NumBits result (BSR is a bit messy as it
has to be XOR'd to create a CTLZ result). VIA/Zhaoxin x86_64 CPUs have also
been confirmed to match this behaviour.

This patch adjusts the X86ISD::BSF/BSR nodes to take a "pass through"
argument for zero-input cases, by default this is set to UNDEF to match
existing behaviour, but it can be set to a suitable value if supported.

There are still some limits to this - it's only supported for x86_64
capable processors (and I've only enabled it for x86_64 codegen), and
Intel CPUs sometimes zero the upper 32-bits of a pass through register
when used for BSR32/BSF32 with a zero source value (i.e. the whole
64bits may not get passed through).

Fixes #122004
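
A minimal software model of the idea, assuming the pass-through behaviour described above (this is an illustrative sketch, not the code from the patch; the function names and the specific pass-through constants are derived here, not quoted from LLVM). For 32-bit CTTZ the pass-through value can simply be 32. For CTLZ the BSR result is XOR'd with 31, so the pass-through value has to be 63, because 63 ^ 31 == 32.

```cpp
#include <cassert>
#include <cstdint>

// Model of BSF with "zero-input pass through": the destination keeps its
// previous value (PassThru) when Src is zero.
uint32_t bsf32(uint32_t Src, uint32_t PassThru) {
  if (Src == 0)
    return PassThru;
  uint32_t Idx = 0;
  while ((Src & 1u) == 0) {  // scan upward for the lowest set bit
    Src >>= 1;
    ++Idx;
  }
  return Idx;
}

// Model of BSR with the same pass-through rule.
uint32_t bsr32(uint32_t Src, uint32_t PassThru) {
  if (Src == 0)
    return PassThru;
  uint32_t Idx = 31;
  while ((Src & 0x80000000u) == 0) {  // scan downward for the highest set bit
    Src <<= 1;
    --Idx;
  }
  return Idx;
}

// CTTZ: a zero input should yield the bit width, so pass 32 through.
uint32_t cttz32(uint32_t X) { return bsf32(X, 32); }

// CTLZ: for non-zero X the result is 31 ^ BSR(X). To get 32 for a zero
// input after the XOR, the pass-through value must be 63 (63 ^ 31 == 32).
uint32_t ctlz32(uint32_t X) { return bsr32(X, 63) ^ 31u; }
```

The model ignores the caveat mentioned above about some Intel CPUs zeroing the upper 32 bits of the pass-through register; it only shows why the BSR case needs a pre-XOR'd constant.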
2025-01-23 12:59:59 +00:00
Venkata Ramanaiah Nalamothu
f7d8336a2f
[llvm] Pass MachineInstr flags to storeRegToStackSlot/loadRegFromStackSlot (NFC) (#120622)
This patch is in preparation to enable setting the MachineInstr::MIFlag
flags, i.e. FrameSetup/FrameDestroy, on callee saved register
spill/reload instructions in prologue/epilogue. This eventually helps in
setting the prologue_end and epilogue_begin markers more accurately.

The DWARF Spec in "6.4 Call Frame Information" says:

The code that allocates space on the call frame stack and performs the
save operation is called the subroutine’s prologue, and the code that
performs the restore operation and deallocates the frame is called its
epilogue.

which means the callee saved register spills and reloads are part of
prologue (a.k.a frame setup) and epilogue (a.k.a frame destruction),
respectively. And, IIUC, LLVM backend uses FrameSetup/FrameDestroy flags
to identify instructions that are part of call frame setup and
destruction.

In the trunk, while most targets consistently set
FrameSetup/FrameDestroy on save/restore call frame information (CFI)
instructions of callee saved registers, they do not consistently set
those flags on the actual callee saved register spill/reload
instructions.

I believe this patch provides a clean mechanism to set
FrameSetup/FrameDestroy flags on the actual callee saved register
spill/reload instructions as needed. And, by giving Flags a default
argument of MachineInstr::NoFlags, this patch is NFC.

With this patch, the targets have to just pass FrameSetup/FrameDestroy
flag to the storeRegToStackSlot/loadRegFromStackSlot calls from the
target derived spillCalleeSavedRegisters and restoreCalleeSavedRegisters
to set those flags on callee saved register spill/reload instructions.
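
A rough, self-contained sketch of that calling pattern, using mock types rather than the real LLVM classes. `MockInstr` and the simplified signatures are hypothetical; only the flag names follow `MachineInstr::MIFlag`, and the bit values here are illustrative.

```cpp
#include <cassert>
#include <vector>

// Mock flag bits; names follow MachineInstr::MIFlag, values are illustrative.
enum MIFlag : unsigned {
  NoFlags = 0,
  FrameSetup = 1u << 0,
  FrameDestroy = 1u << 1
};

// Hypothetical stand-in for a spill instruction.
struct MockInstr {
  int Reg;
  int FrameIndex;
  unsigned Flags;
};

// Hypothetical stand-in for a target's storeRegToStackSlot. The defaulted
// Flags parameter keeps every existing caller behaving as before.
void storeRegToStackSlot(std::vector<MockInstr> &MBB, int Reg, int FrameIndex,
                         unsigned Flags = NoFlags) {
  MBB.push_back(MockInstr{Reg, FrameIndex, Flags});
}

// Hypothetical stand-in for spillCalleeSavedRegisters: it passes FrameSetup
// directly instead of patching the flag onto the instruction afterwards.
void spillCalleeSavedRegisters(std::vector<MockInstr> &MBB,
                               const std::vector<int> &CSRs) {
  int FI = 0;
  for (int Reg : CSRs)
    storeRegToStackSlot(MBB, Reg, FI++, FrameSetup);
}
```

An ordinary spill that omits the argument gets `NoFlags`, while prologue spills carry `FrameSetup` from the moment they are created.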

Also, this patch makes it very easy to set the source line information
on callee saved register spill/reload instructions which is needed by
the DwarfDebug.cpp implementation to set prologue_end and epilogue_begin
markers more accurately.

As per DwarfDebug.cpp implementation:

prologue_end is the first known non-DBG_VALUE and non-FrameSetup location
    that marks the beginning of the function body

epilogue_begin is the first FrameDestroy location that has been seen in the
    epilogue basic block

With this patch, the targets have to just do the following to set the
source line information on callee saved register spill/reload
instructions, without hampering the LLVM's efforts to avoid adding
source line information on the artificial code generated by the
compiler.

    <Foo>InstrInfo::storeRegToStackSlot() {
    ...
      DebugLoc DL =
          Flags & MachineInstr::FrameSetup ? DebugLoc() : MBB.findDebugLoc(I);
    ...
    }

    <Foo>InstrInfo::loadRegFromStackSlot() {
    ...
      DebugLoc DL =
          Flags & MachineInstr::FrameDestroy ? MBB.findDebugLoc(I) : DebugLoc();
    ...
    }

While I understand this patch would break out-of-tree backend builds, I
think it is in the right direction.

One immediate use case that benefits from this patch is #120553, which
becomes simpler to fix.
2025-01-22 13:36:39 +05:30
Phoebe Wang
9cd774d1e4
[X86][NFC] Move "_Int" after "k"/"kz" (#121450)
Address comment at
https://github.com/llvm/llvm-project/pull/121373#discussion_r1900402932
2025-01-02 21:02:19 +08:00
Simon Pilgrim
29f11f0a32
[X86] Add missing reg/imm attributes to VRNDSCALES instruction names (#117203)
More canonicalization of the instruction names to make them predictable - more closely matches VRNDSCALEP / VROUND equivalent instructions
2024-11-22 17:45:30 +00:00
Phoebe Wang
7b5b01980c
Revert "[X86] Recognize POP/ADD/SUB modifying rsp in getSPAdjust. (#114265)" (#117089)
This reverts commit 6fb7cdff3d90c565b87a253ff7dbd36319879111.
2024-11-21 09:16:22 +08:00
Simon Pilgrim
3a5cf6d99b
[X86] Rename AVX512 VEXTRACT/INSERT??x? to VEXTRACT/INSERT??X? (#116826)
Use uppercase in the subvector description ("32x2" -> "32X4" etc.) - matches what we already do in VBROADCAST??X?, and we try to use uppercase for all x86 instruction mnemonics anyway (and lowercase just for the arg description suffix).
2024-11-20 08:25:01 +00:00
Simon Pilgrim
7dcefb37a4
[X86] Tidy up AVX512 FPCLASS instruction naming (#116661)
FPCLASS is a unary instruction with an immediate operand - update the naming to match similar instructions (e.g. VPSHUFD) by only using the source reg/mem and immediate in the instruction name
2024-11-19 11:26:46 +00:00
Daniel Zabawa
6fb7cdff3d
[X86] Recognize POP/ADD/SUB modifying rsp in getSPAdjust. (#114265)
This code assumed only PUSHes would appear in call sequences. However,
if calls require frame-pointer/base-pointer spills, only the PUSH
operations inserted by spillFPBP will be recognized, and the adjustments
to frame object offsets in prologepilog will be incorrect.

This change correctly reports the SP adjustment for POP and ADD/SUB to
rsp, and adds an assertion for unrecognized instructions that modify rsp.
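
The shape of such a check might look like the sketch below. It is a mock written for illustration, not the reverted patch: the opcode enum, `MockInstr`, and `spAdjustSketch` are hypothetical, 64-bit push/pop sizes are assumed, and the sign convention (positive when the stack grows) is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, heavily simplified opcode set.
enum class Op { Push64, Pop64, AddRsp, SubRsp, Other };

struct MockInstr {
  Op Opcode;
  int64_t Imm;       // immediate for AddRsp/SubRsp, ignored otherwise
  bool ModifiesRsp;  // whether the instruction writes rsp
};

// Assumed sign convention: positive means the stack grew (rsp decreased).
int64_t spAdjustSketch(const MockInstr &MI) {
  switch (MI.Opcode) {
  case Op::Push64:
    return 8;
  case Op::Pop64:
    return -8;
  case Op::SubRsp:
    return MI.Imm;
  case Op::AddRsp:
    return -MI.Imm;
  case Op::Other:
    // Anything else must leave rsp alone, otherwise the tracking is wrong.
    assert(!MI.ModifiesRsp && "unrecognized instruction modifies rsp");
    return 0;
  }
  return 0;
}
```

Summing the result over a call sequence gives the net stack adjustment, and a balanced push/pop pair nets to zero.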
2024-11-14 17:20:16 +01:00
Phoebe Wang
08af115d97
Fix mistakes in #113532 (#115631)
Found during review #115151
2024-11-10 12:46:21 +08:00
Kazu Hirata
dfe43bd1ca
[X86] Remove unused includes (NFC) (#115593)
Identified with misc-include-cleaner.
2024-11-09 08:23:46 -08:00
Phoebe Wang
c72a751dab
[X86][AMX] Support AMX-TRANSPOSE (#113532)
Ref.: https://cdrdv2.intel.com/v1/dl/getContent/671368
2024-11-01 16:45:03 +08:00
Simon Pilgrim
c59ac1a2f6
[X86] Cleanup AVX512 VBROADCAST subvector instruction names. (#108888)
This patch makes the `VBROADCAST***X**` subvector broadcast instructions consistent - the `***X**` section represents the original subvector type/size, but we were not correctly using the AVX512 Z/Z256/Z128 suffix to consistently represent the destination width (or we missed it entirely).
2024-09-18 10:34:35 +01:00
Simon Pilgrim
c91f2a259f
[X86] Consistently use 'k' for predicate mask registers in instruction names (#108780)
We use 'k' for move instructions and to indicate masked variants of evex instructions, but otherwise we're very inconsistent when we use 'k' vs 'r'.
2024-09-17 08:57:57 +01:00
Simon Pilgrim
614a064cac
[X86] Add missing immediate qualifier to the (V)INSERT/EXTRACT/PERM2 instruction names (#108593)
Makes it easier to algorithmically recreate the instruction name in various analysis scripts I'm working on
2024-09-15 11:42:13 +01:00
Simon Pilgrim
ba8e4246e2
[X86] Add missing immediate qualifier to the (V)INSERTPS instruction names (#108568)
Matches (V)BLENDPS etc and makes it easier to algorithmically recreate the instruction name in various analysis scripts I'm working on
2024-09-15 11:27:36 +01:00
Kyungwoo Lee
93b8d07a75
[MachineOutliner][NFC] Refactor (#105398)
This patch prepares the NFC groundwork for global outlining using
CGData, which will follow
https://github.com/llvm/llvm-project/pull/90074.

- The `MinRepeats` parameter is now explicitly passed to the
`getOutliningCandidateInfo` function, rather than relying on a default
value of 2. For local outlining, the minimum number of repetitions is
typically 2, but for the global outlining (mentioned above), we will
optimistically create a single `Candidate` for each `OutlinedFunction`
if stable hashes match a specific code sequence. This parameter is
adjusted accordingly in global outlining scenarios.
- I have also implemented `unique_ptr` for `OutlinedFunction` to ensure
safe and efficient memory management within `FunctionList`, avoiding
unnecessary implicit copies.

This depends on https://github.com/llvm/llvm-project/pull/101461.
This is a patch for
https://discourse.llvm.org/t/rfc-enhanced-machine-outliner-part-2-thinlto-nolto/78753.
2024-08-27 14:38:36 -07:00
Piyou Chen
b01c006f73
[TII][RISCV] Add renamable bit to copyPhysReg (#91179)
The renamable flag is useful during MachineCopyPropagation, but it is
dropped after lowerCopy in some cases.

This patch introduces extra arguments to pass the renamable flag to
copyPhysReg.
2024-08-27 10:08:43 +08:00
Temperatureblock
db3c3fc90a
Simple check to ignore Inline asm fwait insertion (#101686)
Just a simple check to ignore Inline asm fwait insertion

Fixes #101613
2024-08-12 22:36:58 +08:00
Phoebe Wang
b0329206db
[X86][AVX10.2] Support AVX10.2 VNNI FP16/INT8/INT16 new instructions (#101783)
Ref.: https://cdrdv2.intel.com/v1/dl/getContent/828965
2024-08-05 18:57:42 +08:00
Shengchen Kan
50cf413426 [X86,CodeGen] Return the correct condition code for SETZUCC
llvm-issue: https://github.com/llvm/llvm-project/issues/101288
2024-07-31 14:09:08 +08:00
Pengcheng Wang
ed4e75d5e5
[CodeGen] Remove AA parameter of isSafeToMove (#100691)
The `AA` parameter is unused, and most callers just pass a nullptr.

The use of `AA` was removed since 8d0383e.
2024-07-26 15:47:47 +08:00
Matt Arsenault
3cb5604d2c
MachineOutliner: Use PM to query MachineModuleInfo (#99688)
Avoid getting this from the MachineFunction
2024-07-24 13:22:56 +04:00
Nikita Popov
4169338e75
[IR] Don't include Module.h in Analysis.h (NFC) (#97023)
Replace it with a forward declaration instead. Analysis.h is pulled in
by all passes, but not all passes need to access the module.
2024-06-28 14:30:47 +02:00
Haohai Wen
be00190ce3
[TII][X86] Do not schedule frame-setup/frame-destroy instructions (#96611)
Frame-setup/frame-destroy instructions cannot be scheduled around by the
PostRAScheduler. Their order is critical for SEH.
2024-06-26 17:08:59 +08:00