1697 Commits

Author SHA1 Message Date
Simon Pilgrim
3a5cf6d99b
[X86] Rename AVX512 VEXTRACT/INSERT??x? to VEXTRACT/INSERT??X? (#116826)
Use uppercase in the subvector description ("32x2" -> "32X2" etc.). This matches what we already do in VBROADCAST??X?, and we try to use uppercase for all x86 instruction mnemonics anyway (with lowercase reserved for the arg description suffix).
2024-11-20 08:25:01 +00:00
Simon Pilgrim
7dcefb37a4
[X86] Tidy up AVX512 FPCLASS instruction naming (#116661)
FPCLASS is a unary instruction with an immediate operand. Update the naming to match similar instructions (e.g. VPSHUFD) by using only the source reg/mem and immediate in the instruction name.
2024-11-19 11:26:46 +00:00
Daniel Zabawa
6fb7cdff3d
[X86] Recognize POP/ADD/SUB modifying rsp in getSPAdjust. (#114265)
This code assumed only PUSHes would appear in call sequences. However,
if calls require frame-pointer/base-pointer spills, only the PUSH
operations inserted by spillFPBP will be recognized, and the adjustments
to frame object offsets in prologepilog will be incorrect.

This change correctly reports the SP adjustment for POP and ADD/SUB to
rsp, and adds an assertion for unrecognized instructions that modify rsp.
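
A minimal sketch of the idea, not the actual `getSPAdjust` code: the `Inst` model, the 8-byte slot size, and the sign convention (positive means the stack grows) are all assumptions made for illustration, and an exception stands in for the assertion.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>

// Hypothetical, simplified instruction model; not LLVM's MachineInstr.
enum class Op { Push, Pop, AddRsp, SubRsp, Other };

struct Inst {
  Op op;
  int64_t imm;      // immediate for AddRsp/SubRsp, ignored otherwise
  bool modifiesRsp; // only meaningful for Op::Other
};

// Returns by how many bytes the instruction grows the stack
// (assumed convention: positive = rsp decreases), with 8-byte slots.
int64_t spAdjust(const Inst &I) {
  switch (I.op) {
  case Op::Push:
    return 8;
  case Op::Pop:
    return -8;
  case Op::SubRsp:
    return I.imm;
  case Op::AddRsp:
    return -I.imm;
  case Op::Other:
    // Stand-in for the assertion on unrecognized rsp modifications.
    if (I.modifiesRsp)
      throw std::logic_error("unrecognized instruction modifies rsp");
    return 0;
  }
  return 0;
}
```

Summing `spAdjust` over a call sequence then gives the net adjustment, which is what goes wrong when only PUSH is recognized.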
2024-11-14 17:20:16 +01:00
Phoebe Wang
08af115d97
Fix mistakes in #113532 (#115631)
Found during review #115151
2024-11-10 12:46:21 +08:00
Kazu Hirata
dfe43bd1ca
[X86] Remove unused includes (NFC) (#115593)
Identified with misc-include-cleaner.
2024-11-09 08:23:46 -08:00
Phoebe Wang
c72a751dab
[X86][AMX] Support AMX-TRANSPOSE (#113532)
Ref.: https://cdrdv2.intel.com/v1/dl/getContent/671368
2024-11-01 16:45:03 +08:00
Simon Pilgrim
c59ac1a2f6
[X86] Cleanup AVX512 VBROADCAST subvector instruction names. (#108888)
This patch makes the `VBROADCAST***X**` subvector broadcast instructions consistent - the `***X**` section represents the original subvector type/size, but we were not correctly using the AVX512 Z/Z256/Z128 suffix to consistently represent the destination width (or we missed it entirely).
2024-09-18 10:34:35 +01:00
Simon Pilgrim
c91f2a259f
[X86] Consistently use 'k' for predicate mask registers in instruction names (#108780)
We use 'k' for move instructions and to indicate masked variants of evex instructions, but otherwise we're very inconsistent when we use 'k' vs 'r'.
2024-09-17 08:57:57 +01:00
Simon Pilgrim
614a064cac
[X86] Add missing immediate qualifier to the (V)INSERT/EXTRACT/PERM2 instruction names (#108593)
Makes it easier to algorithmically recreate the instruction name in various analysis scripts I'm working on
2024-09-15 11:42:13 +01:00
Simon Pilgrim
ba8e4246e2
[X86] Add missing immediate qualifier to the (V)INSERTPS instruction names (#108568)
Matches (V)BLENDPS etc and makes it easier to algorithmically recreate the instruction name in various analysis scripts I'm working on
2024-09-15 11:27:36 +01:00
Kyungwoo Lee
93b8d07a75
[MachineOutliner][NFC] Refactor (#105398)
This patch prepares the NFC groundwork for global outlining using
CGData, which will follow
https://github.com/llvm/llvm-project/pull/90074.

- The `MinRepeats` parameter is now explicitly passed to the
`getOutliningCandidateInfo` function, rather than relying on a default
value of 2. For local outlining, the minimum number of repetitions is
typically 2, but for the global outlining (mentioned above), we will
optimistically create a single `Candidate` for each `OutlinedFunction`
if stable hashes match a specific code sequence. This parameter is
adjusted accordingly in global outlining scenarios.
- I have also implemented `unique_ptr` for `OutlinedFunction` to ensure
safe and efficient memory management within `FunctionList`, avoiding
unnecessary implicit copies.
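
The two points above can be sketched as follows. This is an illustration under assumptions, not the MachineOutliner code: `Candidate`, `OutlinedFunction`, and `addIfProfitable` are hypothetical stand-ins with made-up fields.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-ins; the real types live in LLVM's MachineOutliner.
struct Candidate {
  unsigned startIdx;
  unsigned len;
};

struct OutlinedFunction {
  std::vector<Candidate> candidates;
};

// Keeps a repeated sequence only if it occurs at least minRepeats times.
// minRepeats has no default: local outlining would pass 2, while a
// global-outlining caller could pass 1.
void addIfProfitable(
    std::vector<std::unique_ptr<OutlinedFunction>> &functionList,
    std::vector<Candidate> occurrences, unsigned minRepeats) {
  if (occurrences.size() < minRepeats)
    return;
  auto fn = std::make_unique<OutlinedFunction>();
  fn->candidates = std::move(occurrences);
  // The list owns the function; it is moved in, never copied.
  functionList.push_back(std::move(fn));
}
```

Because `std::unique_ptr` is move-only, an accidental copy of an `OutlinedFunction` held in the list fails to compile instead of silently duplicating its candidates.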

This depends on https://github.com/llvm/llvm-project/pull/101461.
This is a patch for
https://discourse.llvm.org/t/rfc-enhanced-machine-outliner-part-2-thinlto-nolto/78753.
2024-08-27 14:38:36 -07:00
Piyou Chen
b01c006f73
[TII][RISCV] Add renamable bit to copyPhysReg (#91179)
The renamable flag is useful during MachineCopyPropagation, but it is
dropped after lowerCopy in some cases.

This patch introduces extra arguments to pass the renamable flag to
copyPhysReg.
2024-08-27 10:08:43 +08:00
Temperatureblock
db3c3fc90a
Simple check to ignore Inline asm fwait insertion (#101686)
Just a simple check to ignore Inline asm fwait insertion

Fixes #101613
2024-08-12 22:36:58 +08:00
Phoebe Wang
b0329206db
[X86][AVX10.2] Support AVX10.2 VNNI FP16/INT8/INT16 new instructions (#101783)
Ref.: https://cdrdv2.intel.com/v1/dl/getContent/828965
2024-08-05 18:57:42 +08:00
Shengchen Kan
50cf413426
[X86,CodeGen] Return the correct condition code for SETZUCC
llvm-issue: https://github.com/llvm/llvm-project/issues/101288
2024-07-31 14:09:08 +08:00
Pengcheng Wang
ed4e75d5e5
[CodeGen] Remove AA parameter of isSafeToMove (#100691)
The `AA` parameter is unused, and most callers just pass
a nullptr.

The use of `AA` was removed in 8d0383e.
2024-07-26 15:47:47 +08:00
Matt Arsenault
3cb5604d2c
MachineOutliner: Use PM to query MachineModuleInfo (#99688)
Avoid getting this from the MachineFunction
2024-07-24 13:22:56 +04:00
Nikita Popov
4169338e75
[IR] Don't include Module.h in Analysis.h (NFC) (#97023)
Replace it with a forward declaration instead. Analysis.h is pulled in
by all passes, but not all passes need to access the module.
2024-06-28 14:30:47 +02:00
Haohai Wen
be00190ce3
[TII][X86] Do not schedule frame-setup/frame-destroy instructions (#96611)
frame-setup/frame-destroy instructions must not be reordered by the
PostRAScheduler. Their order is critical for SEH.
2024-06-26 17:08:59 +08:00
Shengchen Kan
bdc7840c57
[X86][CodeGen] Share code between CompressEVEX pass and ND2NonND transform, NFCI
2024-06-19 16:03:57 +08:00
Shengchen Kan
1216cde81a
[X86][mem-fold] Support memory folding from MOV32r0 to MOV64mi32
2024-06-12 22:06:10 +08:00
paperchalice
837dc542b1
[CodeGen][NewPM] Split MachineDominatorTree into a concrete analysis result (#94571)
Prepare for new pass manager version of `MachineDominatorTreeAnalysis`.
We may need a machine dominator tree version of `DomTreeUpdater` to
handle `SplitCriticalEdge` in some CodeGen passes.
2024-06-11 21:27:14 +08:00
Shengchen Kan
22c572eae0
[X86][CodeGen] Support memory folding for NDD -> RMW
2024-05-30 19:06:22 +08:00
Shengchen Kan
7f524f7ef2
[X86][CodeGen] Simplify the code in foldMemoryOperandImpl, NFCI
In preparation for the coming NDD -> RMW fold.
2024-05-30 14:57:38 +08:00
Shengchen Kan
a9e8a3a18e
[X86][CodeGen] Extend X86CompressEVEX for NF transform
2024-05-29 15:41:31 +08:00
Shengchen Kan
331eb8a004
[X86][CodeGen] Support lowering for CCMP/CTEST (#91747)
DAG combine for `CCMP` and `CTESTrr`:

```
and/or(setcc(cc0, flag0), setcc(cc1, sub (X, Y)))
->
setcc(cc1, ccmp(X, Y, ~cflags/cflags, cc0/~cc0, flag0))

and/or(setcc(cc0, flag0), setcc(cc1, cmp (X, 0)))
 ->
setcc(cc1, ctest(X, X, ~cflags/cflags, cc0/~cc0, flag0))
```
 where `cflags` is determined by `cc1`.
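
The rewrite above can be checked on a toy model. The sketch below is an assumption-laden illustration, not the DAG combine itself: it models only ZF and the conditions E/NE, and treats `ccmp` as "compare if the source condition holds on the incoming flags, otherwise produce the given default flags". All names are hypothetical.

```cpp
#include <cassert>

enum class CC { E, NE }; // only ZF-based conditions are modeled
struct Flags {
  bool zf;
};

bool holds(CC cc, Flags f) { return cc == CC::E ? f.zf : !f.zf; }
CC invert(CC cc) { return cc == CC::E ? CC::NE : CC::E; }
Flags cmp(int x, int y) { return Flags{x == y}; }
// A flags value on which cc holds ("cflags" in the text above).
Flags satisfying(CC cc) { return Flags{cc == CC::E}; }

// Conditional compare: compare when scc holds on the incoming flags,
// otherwise produce the supplied default flags value.
Flags ccmp(int x, int y, Flags dfv, CC scc, Flags in) {
  return holds(scc, in) ? cmp(x, y) : dfv;
}

// and(setcc(cc0, flag0), setcc(cc1, sub(X, Y))) via ccmp with ~cflags, cc0.
bool andViaCcmp(CC cc0, Flags f0, CC cc1, int x, int y) {
  return holds(cc1, ccmp(x, y, satisfying(invert(cc1)), cc0, f0));
}

// or(setcc(cc0, flag0), setcc(cc1, sub(X, Y))) via ccmp with cflags, ~cc0.
bool orViaCcmp(CC cc0, Flags f0, CC cc1, int x, int y) {
  return holds(cc1, ccmp(x, y, satisfying(cc1), invert(cc0), f0));
}

// Exhaustively compares both rewrites against the plain and/or forms.
bool rewritesAgree() {
  const CC ccs[] = {CC::E, CC::NE};
  for (CC cc0 : ccs)
    for (CC cc1 : ccs)
      for (int zf = 0; zf < 2; ++zf)
        for (int x = 0; x < 2; ++x)
          for (int y = 0; y < 2; ++y) {
            Flags f0{zf != 0};
            bool a = holds(cc0, f0), b = holds(cc1, cmp(x, y));
            if (andViaCcmp(cc0, f0, cc1, x, y) != (a && b))
              return false;
            if (orViaCcmp(cc0, f0, cc1, x, y) != (a || b))
              return false;
          }
  return true;
}
```

In the `and` case the default flags must make `cc1` fail, so the result is false whenever `cc0` fails; in the `or` case they must make `cc1` hold, which is why the condition and the default flags are both flipped.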

Generic DAG combine:
```
cmp(setcc(cc, X), 0)
brcond ne
->
X
brcond cc

sub(setcc(cc, X), 1)
brcond ne
->
X
brcond ~cc
```
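
Why the generic combine is sound can be seen on a small model. This sketch assumes `setcc` produces 0 or 1 and models only ZF-based conditions; the names are hypothetical and it is not the DAG combine code.

```cpp
#include <cassert>

enum class CC { E, NE }; // only ZF-based conditions are modeled

bool eval(CC cc, bool zf) { return cc == CC::E ? zf : !zf; }
CC invert(CC cc) { return cc == CC::E ? CC::NE : CC::E; }

// Before the first combine: materialize setcc as 0/1, compare with 0,
// and branch on ne.
bool branchViaCmpZero(CC cc, bool zf) {
  int setcc = eval(cc, zf) ? 1 : 0;
  return (setcc - 0) != 0;
}

// Before the second combine: subtract 1 from the setcc result and
// branch on ne.
bool branchViaSubOne(CC cc, bool zf) {
  int setcc = eval(cc, zf) ? 1 : 0;
  return (setcc - 1) != 0;
}

// After the combines the branch tests cc (or ~cc) on X directly;
// this checks both forms take the branch in exactly the same cases.
bool combinesAgree() {
  const CC ccs[] = {CC::E, CC::NE};
  for (CC cc : ccs)
    for (int zf = 0; zf < 2; ++zf) {
      if (branchViaCmpZero(cc, zf != 0) != eval(cc, zf != 0))
        return false;
      if (branchViaSubOne(cc, zf != 0) != eval(invert(cc), zf != 0))
        return false;
    }
  return true;
}
```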

Post DAG transform:  `ANDrr/rm + CTESTrr -> CTESTrr/CTESTmr`


Pattern match for `CTESTri`:
```
X= and A, B
ctest(X, X, cflags, cc0/, flag0)
->
ctest(A, B, cflags, cc0/, flag0)
```

`CTESTmi` is already handled by the memory folding mechanism in MIR.
2024-05-26 18:32:23 +08:00
Shengchen Kan
4b62afca64
[X86][CodeGen] Support flags copy lowering for CCMP/CTEST (#91849)
```
%1:gr64 = COPY $eflags
OP1 may update eflags
$eflags = COPY %1
OP2 may use eflags
```

To use eflags as input at the 4th instruction, we need to use SETcc to
preserve the eflags before the 2nd instruction, and update the source
condition of OP2 according to the value in GPR %1.

In this patch, we support CCMP/CTEST as OP2.
2024-05-18 19:50:16 +08:00
Kazu Hirata
c18bcd0a57
[Target] Use StringRef::operator== instead of StringRef::equals (NFC) (#91072) (#91138)
I'm planning to remove StringRef::equals in favor of
StringRef::operator==.

- StringRef::operator==/!= outnumber StringRef::equals by a factor of
  38 under llvm/ in terms of their usage.

- The elimination of StringRef::equals brings StringRef closer to
  std::string_view, which has operator== but not equals.

- S == "foo" is more readable than S.equals("foo"), especially for
  !Long.Expression.equals("str") vs Long.Expression != "str".
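
A small illustration of the readability point, using `std::string_view` as a stand-in because `StringRef` needs the LLVM headers; the function names are hypothetical.

```cpp
#include <cassert>
#include <string_view>

// std::string_view offers operator== and operator!= but no equals(),
// which is the shape StringRef moves toward.
bool isFoo(std::string_view s) { return s == "foo"; }

// Reads more directly than a negated equals() call.
bool isNotStr(std::string_view s) { return s != "str"; }
```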
2024-05-05 13:43:10 -07:00
Xu Zhang
f6d431f208
[CodeGen] Make the parameter TRI required in some functions. (#85968)
Fixes #82659

There are some functions, such as `findRegisterDefOperandIdx` and `findRegisterDefOperand`, that have too many default parameters. As a result, we have encountered some issues due to missing TRI parameters, as shown in issue #82411.

Following @RKSimon's suggestion, this patch refactors 9 functions, including `{reads, kills, defines, modifies}Register`, `registerDefIsDead`, and `findRegister{UseOperandIdx, UseOperand, DefOperandIdx, DefOperand}`, adjusting the order of the TRI parameter and making it required. All call sites of these functions have been updated accordingly so that behavior does not change.

After this, the caller of these functions should explicitly know whether to pass the `TargetRegisterInfo` or just a `nullptr`.
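
A minimal sketch of why a required parameter helps, assuming hypothetical stand-ins (`RegInfo`, `Operand`, `findDefIdx`) rather than the real `MachineInstr` API: without register info the query can only match identical register numbers, so silently defaulting to `nullptr` can miss overlapping registers.

```cpp
#include <cassert>

// Hypothetical stand-in for llvm::TargetRegisterInfo.
struct RegInfo {
  // Treats registers 1 and 2 as overlapping, e.g. a sub-/super-register pair.
  bool overlaps(unsigned a, unsigned b) const {
    return a == b || (a == 1 && b == 2) || (a == 2 && b == 1);
  }
};

struct Operand {
  unsigned reg;
  bool isDef;
};

// tri has no default value: callers must pass a RegInfo or spell out
// nullptr, so exact-match-only behavior is always an explicit choice.
int findDefIdx(const Operand *ops, int n, unsigned reg, const RegInfo *tri) {
  for (int i = 0; i < n; ++i) {
    if (!ops[i].isDef)
      continue;
    bool match = tri ? tri->overlaps(ops[i].reg, reg) : ops[i].reg == reg;
    if (match)
      return i;
  }
  return -1;
}
```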
2024-04-24 14:24:14 +01:00
Pengcheng Wang
b564036933
[MachineCombiner][NFC] Split target-dependent patterns
We split target-dependent MachineCombiner patterns into their target
folder.

This makes MachineCombiner much more target-independent.

Reviewers:
davemgreen, asavonic, rotateright, RKSimon, lukel97, LuoYuanke, topperc, mshockwave, asi-sc

Reviewed By: topperc, mshockwave

Pull Request: https://github.com/llvm/llvm-project/pull/87991
2024-04-11 12:20:27 +08:00
Simon Pilgrim
ecb34599bd
[X86] Add missing immediate qualifier to the (V)ROUND instructions (#87636)
Makes it easier to algorithmically recreate the instruction name in various analysis scripts I'm working on
2024-04-04 15:20:16 +01:00
Freddy Ye
36b4b9d988
[X86] Support immediate folding for CCMP/CTEST (#86616)
E.g.
%0:gr32 = MOV32ri 81
CTEST32rr %0, %1, 2, 10, implicit-def $eflags, implicit $eflags
=>
CTEST32ri %1, 81, 2, 10, implicit-def $eflags, implicit $eflags
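
The fold in the example can be sketched on a toy representation. Everything here is an assumption for illustration: the `CTest` struct, the `consts` map of registers defined by a move-immediate, and the choice to try either operand (test is commutative); the condition and flag operands are omitted.

```cpp
#include <cassert>
#include <map>
#include <optional>

// Hypothetical, stripped-down CTEST: either reg/reg or reg/imm.
struct CTest {
  bool hasImm;
  int reg0;
  int reg1; // unused (-1) when hasImm is true
  long imm; // unused when hasImm is false
};

// consts maps a virtual register to the constant a move-immediate gave it.
// Returns the reg/imm form if one operand is a known constant.
std::optional<CTest> foldImm(const CTest &t, const std::map<int, long> &consts) {
  if (t.hasImm)
    return std::nullopt; // already in immediate form
  auto it = consts.find(t.reg0);
  if (it != consts.end())
    return CTest{true, t.reg1, -1, it->second};
  it = consts.find(t.reg1);
  if (it != consts.end())
    return CTest{true, t.reg0, -1, it->second};
  return std::nullopt;
}
```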
2024-03-28 18:54:32 +08:00
XinWang10
7b766a6f50
[X86] Support APX CMOV/CFCMOV instructions (#82592)
This patch supports ND CMOV instructions and CFCMOV instructions.

RFC:
https://discourse.llvm.org/t/rfc-design-for-apx-feature-egpr-and-ndd-support/73031/4
2024-03-17 20:18:56 +08:00
Ganesh
61fadd0b09
[X86] Fast AVX-512-VNNI vpdpwssd tuning (#85375)
Adding a tuning feature to fix
https://github.com/llvm/llvm-project/issues/84182
Generates vpdpwssd (instead of vpmaddwd + vpaddd sequence)
2024-03-15 16:45:41 +05:30
Simon Pilgrim
1ec5b1f483
[X86] Add missing immediate qualifier to the (V)PCLMULQDQ instruction names
2024-03-11 13:39:25 +00:00
Simon Pilgrim
92d7aca441
[X86] Add missing immediate qualifier to the (V)CMPSS/D instructions (#84496)
Matches (V)CMPPS/D and makes it easier to algorithmically recreate the instruction name in various analysis scripts I'm working on
2024-03-09 16:21:25 +00:00
David Green
44be5a7fdc
[Codegen] Make Width in getMemOperandsWithOffsetWidth a LocationSize. (#83875)
This is another part of #70452 which makes getMemOperandsWithOffsetWidth
use a LocationSize for Width, as opposed to the unsigned it currently
uses. The advantage on its own is not large if
getMemOperandsWithOffsetWidth usually uses known sizes, but if the
values can come from an MMO it can help be more accurate in case they
are Unknown (and in the future, scalable).
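
A rough sketch of why an explicit "unknown" state is more accurate than a plain unsigned. The `Width` class and `provablyDisjoint` helper are hypothetical and much smaller than `llvm::LocationSize`; they only illustrate the idea for accesses off the same base.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical miniature of a "known or unknown" size.
class Width {
  std::optional<uint64_t> bytes;

public:
  static Width known(uint64_t n) {
    Width w;
    w.bytes = n;
    return w;
  }
  static Width unknown() { return Width(); }
  bool hasValue() const { return bytes.has_value(); }
  uint64_t value() const { return *bytes; }
};

// Two accesses at offsets from the same base are provably disjoint only
// when the lower one has a known width that ends before the higher begins.
bool provablyDisjoint(int64_t offA, Width wA, int64_t offB, Width wB) {
  if (offA > offB)
    return provablyDisjoint(offB, wB, offA, wA);
  return wA.hasValue() && offA + static_cast<int64_t>(wA.value()) <= offB;
}
```

With a bare unsigned there is no way to say "unknown" without picking a sentinel value that other code might treat as a real size.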
2024-03-06 17:40:13 +00:00
AtariDreams
3e40c96d89
[X86] Resolve FIXME: Add FPCW as a rounding control register (#82452)
To prevent tests from breaking, another fix had to be made: Now, we
check if the instruction after a waiting instruction is a call, and if
so, we insert the wait.
2024-03-05 08:47:05 +08:00
Simon Pilgrim
448fe73428
[X86] Add X86::getVectorRegisterWidth helper. NFC.
Replaces internal helper used by addConstantComments to allow reuse in a future patch.
2024-02-08 12:42:33 +00:00
Shengchen Kan
e270ec47cd
[X86] X86InstrInfo.cpp - Remove dead code for memory folding, NFCI
`commuteInstruction(MI, false, OpNum, CommuteOpIdx2)` should never create
any new instruction, so we don't need to check and erase it.
2024-02-02 11:14:07 +08:00
Philip Reames
3ff7caea33
[TTI] Use Register in isLoadFromStackSlot and isStoreToStackSlot [nfc] (#80339)
2024-02-01 17:52:35 -08:00
Shengchen Kan
c82a645ef2
[X86][NFC] Simplify the code for memory fold
2024-02-01 13:43:25 +08:00
Shengchen Kan
e3c9327bc4
[X86][CodeGen] Set isReMaterializable = 1 for AVX broadcast load
Broadcast of a single float should not be any slower than
loading 32B using vmovaps, so rematerializing it can help reduce
register spills under high register pressure.
2024-01-31 20:55:56 +08:00
Kazu Hirata
5d7a0a734a
[X86] Use a range-based for loop (NFC)
2024-01-30 22:12:05 -08:00
Shengchen Kan
8e77390c06
[X86][CodeGen] Support folding memory broadcast in X86InstrInfo::foldMemoryOperandImpl (#79761)
2024-01-31 12:51:03 +08:00
Shengchen Kan
2960656eb9
[X86][NFC] Extract code for commute in foldMemoryOperandImpl into functions
To share code for folding broadcast in #79761
2024-01-31 00:09:08 +08:00
Shengchen Kan
02a275cca1
[X86][CodeGen] Add entries for TB_BCAST_SH in getBroadcastOpcode
2024-01-30 21:01:31 +08:00
Shengchen Kan
f28430d577
[X86][CodeGen] Add entries for TB_BCAST_W in getBroadcastOpcode and fix typo
2024-01-30 01:03:32 +08:00
Shengchen Kan
169553688c
[X86][NFC] Remove TB_FOLDED_BCAST and format code in X86InstrFoldTables.cpp
2024-01-30 00:27:16 +08:00
Shengchen Kan
7089c012ec
[X86][NFC] Replace if-else with switch-case in X86InstrInfo::foldMemoryOperandImpl
2024-01-28 10:30:26 +08:00