52796 Commits

Author SHA1 Message Date
weiguozhi
c166a43c6e
New calling convention preserve_none (#76868)
The new experimental calling convention preserve_none is the opposite
of the existing preserve_all. It tries to preserve as few general
registers as possible, so general registers become caller-saved
registers. It can also use more general registers to pass arguments.
This attribute doesn't impact floating-point registers. Floating-point
registers still follow the C calling convention.

Currently preserve_none is supported on X86-64 only. It changes the C
calling convention in the following ways:

* RSP and RBP are the only preserved general registers; all other
general registers are caller-saved registers.
* We can use [RDI, RSI, RDX, RCX, R8, R9, R11, R12, R13, R14, R15, RAX]
to pass arguments.

It can improve the performance of hot tail-call chains, because many
callee-saved registers' save/restore instructions can be removed if the
tail functions use preserve_none. In my experiment with protocol
buffers, the parsing functions improved by 3% to 10%.
2024-02-05 13:28:43 -08:00
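A minimal sketch of the kind of tail-call chain the preserve_none commit above describes. The parser stages and their names are hypothetical, not taken from the patch or from protocol buffers. The attribute is applied only when the compiler reports support for it via `__has_attribute`; on other compilers the macro expands to nothing and the default convention is used.

```cpp
#include <cassert>
#include <cstdint>

// Use preserve_none only where the compiler says it is available.
#if defined(__has_attribute)
#  if __has_attribute(preserve_none)
#    define PRESERVE_NONE __attribute__((preserve_none))
#  endif
#endif
#ifndef PRESERVE_NONE
#  define PRESERVE_NONE
#endif

// Hypothetical parser stages: each consumes one byte and calls the next
// stage in tail position.
PRESERVE_NONE uint64_t parseDone(const uint8_t *p, uint64_t acc);
PRESERVE_NONE uint64_t parseLow(const uint8_t *p, uint64_t acc);
PRESERVE_NONE uint64_t parseHigh(const uint8_t *p, uint64_t acc);

PRESERVE_NONE uint64_t parseDone(const uint8_t *, uint64_t acc) {
  return acc;
}

PRESERVE_NONE uint64_t parseLow(const uint8_t *p, uint64_t acc) {
  acc = (acc << 8) | p[0];
  return parseDone(p + 1, acc);  // call in tail position
}

PRESERVE_NONE uint64_t parseHigh(const uint8_t *p, uint64_t acc) {
  acc = (acc << 8) | p[0];
  return parseLow(p + 1, acc);   // call in tail position
}

// Entry point with the default calling convention.
uint64_t parseTwoBytes(const uint8_t *p) { return parseHigh(p, 0); }

// Small self-check for the sketch.
void checkParseTwoBytes() {
  const uint8_t buf[2] = {0x12, 0x34};
  assert(parseTwoBytes(buf) == 0x1234);
}
```

Whether the calls actually become tail calls depends on the optimization level; the sketch only shows where the attribute goes.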
Michael Maitland
0bf165e383
[RISCV] Add support for RISC-V Pointer Masking (#79929)
This patch implements the v0.8.1 specification. This patch reports
version 0.8 in LLVM since `RISCVISAInfo::ExtensionVersion` only has a
`Major` and `Minor` version number. This patch includes support for
the `Ssnpm`, `Smnpm`, `Smmpm`, `Sspm` and `Supm` extensions that make
up RISC-V pointer masking.

All of these extensions require emitting an attribute containing the
correct `march` string.

The `Ssnpm`, `Smnpm` and `Smmpm` extensions introduce a 2-bit WARL field
(PMM). The extensions do not specify how PMM is set, and therefore this
patch does not need to address this. One example of how it *could* be
set is using the Zicsr instructions to update the PMM bits of the
described registers.

The full specification can be found at
https://github.com/riscv/riscv-j-extension/blob/master/zjpm-spec.pdf
2024-02-05 13:56:25 -05:00
Alex Lorenz
dd70aef05a
[x86_64][windows][swift] do not use Swift async extended frame for windows x86_64 targets that use windows 64 prologue (#80468)

The Windows x86_64 stack frame layout is currently not compatible with
Swift's async extended frame, which reserves the slot right below RBP
(RBP - 8) for the async context pointer, because it doesn't account for
the fact that a stack object in a win64 frame can be allocated at the
same location. This can cause issues at runtime. For instance, Swift's
TCA test code has functions that fail because of this issue: they spill
a value to that stack slot, which then gets overwritten by a store to
the address returned by the @llvm.swift.async.context.addr() intrinsic
(which ends up being RBP - 8), leading to an incorrect value being used
later when that stack slot is read again. This change drops the use of
the async extended frame for windows x86_64 subtargets and instead uses
the x32-based approach of allocating a separate stack slot for the
stored async context pointer.

Additionally, LLDB, which is the primary consumer of the extended
frame, makes assumptions such as checking for a saved previous frame
pointer at the current frame pointer address. This is also incompatible
with the windows x86_64 frame layout, as the previous frame pointer is
not guaranteed to be stored at the current frame pointer address.
Therefore the extended frame layout can be turned off to fix the current
miscompile without introducing a regression into LLDB for windows
x86_64, as it already doesn't work correctly. I am still investigating
what should be done for LLDB to support using an allocated stack slot to
store the async frame context instead of it being located at RBP - 8 on
windows.
2024-02-05 10:19:26 -08:00
Simon Pilgrim
2096e57905 [X86] addConstantComments - add FP16 MOVSH asm comments support 2024-02-05 18:02:03 +00:00
Simon Pilgrim
8fa1e5771b [X86] Regenerate some vector constant comments missed in recent patches to improve mask predicate handling in addConstantComments
These were missed because FileCheck ignores whatever follows the end of the check pattern on each line
2024-02-05 18:02:03 +00:00
Stanislav Mekhanoshin
ea9276d47e
[AMDGPU] GlobalISel for f8 conversions (#80503) 2024-02-05 09:41:37 -08:00
Stanislav Mekhanoshin
d0b5d32ce6
[AMDGPU] Fixed byte_sel of v_cvt_f32_bf8/v_cvt_f32_fp8 (#80502)
Opsel bits are swapped. Actual byte select table:

Byte  OPSEL
0     0
1     2
2     1
3     3
2024-02-05 09:35:01 -08:00
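The byte select table in the byte_sel commit above amounts to swapping the two low bits of the byte index. The sketch below reproduces only that table; the function name is hypothetical and is not the code used in the AMDGPU backend.

```cpp
#include <cassert>
#include <cstdint>

// Map a byte index (0..3) to the OPSEL value from the table by swapping
// bit 0 and bit 1. Hypothetical helper, for illustration only.
constexpr uint32_t opselForByte(uint32_t byte) {
  return ((byte & 1u) << 1) | ((byte >> 1) & 1u);
}

// Check the helper against the table from the commit message.
void checkOpselTable() {
  const uint32_t expected[4] = {0, 2, 1, 3};
  for (uint32_t byte = 0; byte < 4; ++byte)
    assert(opselForByte(byte) == expected[byte]);
}
```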
Billy Laws
8f070144e3
[AArch64] Fix generated types for ARM64EC variadic entry thunk targets (#80595)
ISel handles filling in x4/x5 when calling variadic functions as they
don't correspond to the 5th/6th X64 arguments but rather to the end of
the shadow space on the stack and the size in bytes of all stack
parameters (ignored and written as 0 for calls from entry thunks).

Will PR a follow up with ISel handling after this is merged.
2024-02-05 09:26:16 -08:00
Simon Pilgrim
f958ad3b89 [X86] printZeroUpperMove - add support for mask predicated instructions
Handle masked predicated movss/movsd in addConstantComments now that we can generically handle the destination + mask register

This will more significantly help improve 'fixup constant' comments from #73509
2024-02-05 16:23:16 +00:00
Simon Pilgrim
47dcf5d5dc [X86] printBroadcast - add support for mask predicated instructions
Handle masked predicated load/broadcasts in addConstantComments now that we can generically handle the destination + mask register

This will more significantly help improve 'fixup constant' comments from #73509
2024-02-05 16:23:15 +00:00
Kevin P. Neal
d15c454bed [FPEnv][AMDGPU] Correct strictfp tests.
Correct AMDGPU strictfp tests to follow the rules documented in the
LangRef:
https://llvm.org/docs/LangRef.html#constrained-floating-point-intrinsics

These tests needed the strictfp attribute added to function calls and
some declarations.

Some of the tests now pass with D146845; others get farther along and
still fail with D146845. The tests revealed that further work is
required, mostly in AMDGPU atomics, to get the tests passing.

Since I was here anyway I removed the strictfp attribute from some
constrained intrinsic declarations. They have this attribute by default.

Test changes verified with D146845.
2024-02-05 09:29:31 -05:00
Shih-Po Hung
a826a0c234
[RISCV] Add tests for reduce.fmaximum/fminimum. NFC (#80553)
This adds test coverage for the crash report in #80340
2024-02-05 21:41:24 +08:00
Matt Arsenault
a5d206df79
AMDGPU: Set max supported div/rem size to 64 (#80669)
This enables IR expansion for i128 divisions. The vector case is still
broken because ExpandLargeDivRem doesn't try to handle them.

Fixes: SWDEV-426193
2024-02-05 19:09:38 +05:30
Pierre van Houtryve
4e958abf2f
[AMDGPU][PromoteAlloca] Support memsets to ptr allocas (#80678)
Fixes #80366
2024-02-05 14:36:15 +01:00
Nikita Popov
ff9af4c43a [CodeGen] Convert tests to opaque pointers (NFC) 2024-02-05 14:07:09 +01:00
Petar Avramovic
06f711a906
AMDGPU/GlobalISelDivergenceLowering: select divergent i1 phis (#80003)
Implement PhiLoweringHelper for GlobalISel in DivergenceLoweringHelper.
Use machine uniformity analysis to find divergent i1 phis and select
them as lane mask phis in the same way SILowerI1Copies selects VReg_1
phis. Note that divergent i1 phis include phis created by LCSSA, and all
cases of uses outside of a cycle are actually covered by "lowering LCSSA
phis". GlobalISel lane masks are registers with the sgpr register class
and S1 LLT.

TODO: The general goal is that instructions created in this pass are fully
instruction-selected so that selection of lane mask phis is not split
across multiple passes.

patch 3 from: https://github.com/llvm/llvm-project/pull/73337
2024-02-05 14:07:01 +01:00
Christudasan Devadasan
89ec940b4a
[AMDGPU] Insert spill codes for the SGPRs used for EXEC copy (#79428)
The SGPRs used for preserving the EXEC mask while lowering the
whole-wave register spills and copies should be preserved in the prolog
and epilog if they are in the CSR range. This isn't happening when only
a wwm-copy is lowered and there are no wwm-spills. This patch addresses
that problem.
2024-02-05 18:32:23 +05:30
Nikita Popov
b31fffbc7f [ARM] Convert tests to opaque pointers (NFC) 2024-02-05 13:56:59 +01:00
Nikita Popov
7bdc80f35c [AVR] Convert tests to opaque pointers (NFC) 2024-02-05 13:55:50 +01:00
Simon Pilgrim
69ffa7be3b
[X86] X86FixupVectorConstants - load+zero vector constants that can be stored in a truncated form (#80428)
Further develops the vsextload support added in #79815 / b5d35feacb7246573c6a4ab2bddc4919a4228ed5 - reduces the size of the vector constant by storing it in the constant pool in a truncated form, and zero-extending it as part of the load.
2024-02-05 12:17:58 +00:00
Nikita Popov
6e83c0a1cb [X86] Convert tests to opaque pointers (NFC) 2024-02-05 12:43:44 +01:00
Nikita Popov
00a4e248dc [AMDGPU] Convert tests to opaque pointers (NFC) 2024-02-05 12:42:23 +01:00
Nikita Popov
1ee315ae79 [AArch64] Convert tests to opaque pointers (NFC) 2024-02-05 12:39:51 +01:00
Anatoly Trosinenko
7d879bc851
[AArch64][PAC] Refine authenticated pointer check methods (#74074)
Align the values of the immediate operand of BRK instruction with those
used by the existing arm64e implementation.

Make AuthCheckMethod::DummyLoad use the requested register
instead of LR.
2024-02-05 13:53:26 +03:00
David Green
d11c912f42 [AArch64][GlobalISel] Additional GISel testing for u/s add_sat and sub_sat. NFC 2024-02-05 08:47:12 +00:00
Pierre van Houtryve
500846d2f5
[AMDGPU] Introduce Code Object V6 (#76954)
Introduce Code Object V6 in Clang, LLD, Flang and LLVM. This is the same
as V5 except a new "generic version" flag can be present in EFLAGS. This
is related to new generic targets that'll be added in a follow-up patch.
It's also likely V6 will have new changes (possibly new metadata
entries) added later.

Docs changes are part of the follow-up patch #76955
2024-02-05 08:19:53 +01:00
Craig Topper
8ed046fc15 [RISCV] Custom type legalize i32 SADDSAT/SSUBSAT without Zbb.
While working on -riscv-experimental-rv64-legal-i32, I noticed this
missed optimization in our current codegen.

This expands to SADDO/SSUBO+select while still in i32. These will
be type legalized individually.
2024-02-04 23:15:58 -08:00
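The "SADDO/SSUBO+select" expansion named in the commit above can be modelled in plain C++ for the add case. This is a sketch of the arithmetic only, with hypothetical helper names; it is not the legalizer code, and it assumes the usual two's-complement behaviour when converting from `uint32_t` back to `int32_t`.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical model of SADDO: wrapping i32 add plus an overflow flag.
inline bool saddoModel(int32_t a, int32_t b, int32_t &sum) {
  // Add as unsigned so that wraparound is well defined.
  sum = static_cast<int32_t>(static_cast<uint32_t>(a) +
                             static_cast<uint32_t>(b));
  // Overflow happened iff both operands differ in sign from the sum.
  return ((a ^ sum) & (b ^ sum)) < 0;
}

// Hypothetical model of SADDSAT built from SADDO + select.
inline int32_t saddSatModel(int32_t a, int32_t b) {
  int32_t sum;
  bool overflow = saddoModel(a, b, sum);
  // A wrapped sum that looks negative means the true sum was too large.
  int32_t sat = sum < 0 ? std::numeric_limits<int32_t>::max()
                        : std::numeric_limits<int32_t>::min();
  return overflow ? sat : sum;  // the select
}

// Small self-check for the sketch.
void checkSaddSatModel() {
  assert(saddSatModel(1, 2) == 3);
  assert(saddSatModel(std::numeric_limits<int32_t>::max(), 1) ==
         std::numeric_limits<int32_t>::max());
}
```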
Craig Topper
5afeba051e [RISCV] Custom legalize i32 UADDSAT/USUBSAT for -riscv-experimental-rv64-legal-i32 with Zbb.
This matches the codegen we get from type legalization without
-riscv-experimental-rv64-legal-i32.
2024-02-04 21:37:38 -08:00
Chia
db060ab053
[RISCV][ISel] Remove redundant vmerge for vwsub(u).wv. (#80523) 2024-02-05 13:59:11 +09:00
Shengchen Kan
115c0c6513
[X86][test] Remove useless pattern for VDPBF16PSZmb and add a test for broadcast folding (#80629)
llvm-issue: https://github.com/llvm/llvm-project/issues/68810
2024-02-05 12:15:18 +08:00
Craig Topper
6590d0fed5
[DAGCombiner][ARM] Teach reduceLoadWidth to handle (and (srl (load), C), ShiftedMask) (#80342)
If we have a shifted mask, we may be able to reduce the load width
to the width of the non-zero part of the mask and use an offset
to the base address to remove the srl. The offset is given by
C+trailingzeros(ShiftedMask).
    
Then we add a final shl to restore the trailing zero bits.
    
I've used the ARM test because that's where the existing (and (srl
(load))) tests were.
    
The X86 test was modified to keep the H register.
2024-02-04 16:05:51 -08:00
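A worked instance of the reduceLoadWidth transform described above, with C = 4 and ShiftedMask = 0xFF0 chosen for illustration. The mask has 4 trailing zeros and 8 set bits, so the bit offset is C + trailingzeros(ShiftedMask) = 8, which is one byte past the base address, and the final shl is by 4. The loads are written out byte by byte in little-endian order so the sketch does not depend on the host; all names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Little-endian 32-bit load from a byte buffer.
inline uint32_t load32(const uint8_t *p) {
  return uint32_t(p[0]) | (uint32_t(p[1]) << 8) | (uint32_t(p[2]) << 16) |
         (uint32_t(p[3]) << 24);
}

// 8-bit load, zero-extended.
inline uint32_t load8(const uint8_t *p) { return p[0]; }

// Original form: (and (srl (load32 p), 4), 0xFF0).
inline uint32_t wideForm(const uint8_t *p) {
  return (load32(p) >> 4) & 0xFF0u;
}

// Narrowed form: (shl (load8 p+1), 4).
inline uint32_t narrowForm(const uint8_t *p) { return load8(p + 1) << 4; }

// Check that both forms agree for every value of the byte that matters.
void checkNarrowing() {
  uint8_t mem[4] = {0x78, 0x00, 0x34, 0x12};
  for (unsigned b = 0; b < 256; ++b) {
    mem[1] = static_cast<uint8_t>(b);
    assert(wideForm(mem) == narrowForm(mem));
  }
}
```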
Craig Topper
146e5ce481 [RISCV] Add i32 zext.h pattern for -riscv-experimental-rv64-legal-i32. 2024-02-04 12:39:13 -08:00
Craig Topper
32b99617ac [RISCV] Custom promote i32 UADDSAT/USUBSAT for -riscv-experimental-rv64-legal-i32 with Zbb. 2024-02-04 12:39:13 -08:00
Craig Topper
859b09da08 [RISCV] Promote i32 ISD::VAARG to i64 for -riscv-experimental-rv64-legal-i32. 2024-02-04 11:03:12 -08:00
Craig Topper
3bcb1f2bdd [RISCV] Rework isSignExtendingOpW to store Register in the worklist.
Previously we stored MachineInstr which restricted the implementation
to only handle operand 0.

The TH_LWD instruction has two sign extended destinations.
2024-02-03 23:40:09 -08:00
Serge Pavlov
b4eb7a10c0
[GlobalISel][ARM] Legalize set_fpenv and get_fpenv (#79852)
Implement handling of get/set floating point environment for ARM in
Global Instruction Selector. Lowering of these intrinsics to operations
on FPSCR was previously implemented in the DAG selector; in GlobalISel
it is reused.
2024-02-04 12:30:33 +07:00
Craig Topper
9dfdea6fbd [RISCV] Add XTheadMac patterns for -riscv-experimental-rv64-legal-i32. 2024-02-03 19:37:46 -08:00
Craig Topper
f2cf8da636 [RISCV] Add more XTheadMemIdx patterns for -riscv-experimental-rv64-legal-i32. 2024-02-03 19:06:25 -08:00
Craig Topper
1da2921bbd [RISCV] Add missing extload test cases to xtheadmemidx.ll. NFC
We had the isel patterns, but no tests that used them. We only had
sextload and zextload tests.

Also reduce the alignment on some of the test cases that were
unnecessarily over aligned.
2024-02-03 17:55:29 -08:00
Harald van Dijk
61ff9f8db8
[X86] Add strictfp version of PR43024 test. (#80573)
For the current version of the PR43024 test, we should be able to
optimize away the operations but fail to do so. This commit adds a
strictfp version of the test where we should not be able to optimize
away the operations, as a verification that changes to improve the
former have no adverse effect.
2024-02-04 01:36:00 +00:00
Craig Topper
08e942aca6 [RISCV] Combine (xor (trunc (X cc Y)) 1) -> (trunc (X !cc Y)) for RV64LegalI32.
This is needed with RV64LegalI32 when the setcc is created after type
legalization. An i1 xor would have been promoted to i32, but the setcc
would have i64 result.
2024-02-03 13:57:47 -08:00
David Green
9d00c34132 [AArch64] Extend and cleanup movi tests. NFC 2024-02-03 21:23:01 +00:00
Craig Topper
ea59b15cf7 [RISCV] Add more RUN lines to rv64-legal-i32/xaluo.ll. NFC
This matches the non-rv64-legal-i32 version.
2024-02-03 13:11:59 -08:00
Craig Topper
f090924344 [RISCV] Custom legalize i32 SADDO/SSUBO with RV64LegalI32.
The default legalization uses 2 compares and an xor. We can instead
use add+addw+xor+snez like we do without RV64LegalI32.
2024-02-03 13:07:08 -08:00
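The add+addw+xor+snez sequence mentioned in the SADDO commit above can be modelled in C++ as follows. This is a sketch of the idea, not the generated code: it assumes the operands are already sign-extended to 64 bits, as i32 values are in RV64 registers, and the function name is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of i32 SADDO on RV64.
inline bool saddoViaWiden(int32_t a, int32_t b, int32_t &result) {
  // "add": full 64-bit sum of the sign-extended operands (cannot overflow).
  int64_t wide = static_cast<int64_t>(a) + static_cast<int64_t>(b);
  // "addw": low 32 bits of the sum, sign-extended back to 64 bits.
  int32_t narrow = static_cast<int32_t>(static_cast<uint32_t>(wide));
  result = narrow;
  // "xor" + "snez": the two sums differ exactly when the i32 add overflowed.
  return (wide ^ static_cast<int64_t>(narrow)) != 0;
}

// Small self-check for the sketch.
void checkSaddoViaWiden() {
  int32_t r;
  assert(!saddoViaWiden(1, 2, r) && r == 3);
  assert(saddoViaWiden(INT32_MAX, 1, r) && r == INT32_MIN);
}
```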
Craig Topper
d62c5706a8 [RISCV] Custom legalize i32 SMULO with RV64LegalI32.
The default lowering will use shifts to make use of an i32 setcc.
We don't support i32 setcc, so it's better to sign extend the low
32 bits and compare the full 64 bit result. This produces
mul+mulw+xor+snez like we do without RV64LegalI32.
2024-02-03 13:07:08 -08:00
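The SMULO commit above describes sign extending the low 32 bits and comparing against the full 64-bit product. A C++ model of that mul+mulw+xor+snez idea is sketched below; the function name is hypothetical and the code only illustrates the arithmetic, not the instruction selection.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of i32 SMULO on RV64.
inline bool smuloViaWiden(int32_t a, int32_t b, int32_t &result) {
  // "mul": 64-bit product of the sign-extended operands. Its magnitude is
  // at most 2^62, so it cannot overflow int64_t.
  int64_t wide = static_cast<int64_t>(a) * static_cast<int64_t>(b);
  // "mulw": low 32 bits of the product, sign-extended back to 64 bits.
  int32_t narrow = static_cast<int32_t>(static_cast<uint32_t>(wide));
  result = narrow;
  // "xor" + "snez": nonzero difference means the i32 multiply overflowed.
  return (wide ^ static_cast<int64_t>(narrow)) != 0;
}

// Small self-check for the sketch.
void checkSmuloViaWiden() {
  int32_t r;
  assert(!smuloViaWiden(3, 4, r) && r == 12);
  assert(smuloViaWiden(65536, 65536, r) && r == 0);
}
```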
yubingex007-a11y
b49fa21289
[X86] Stop custom-widening v2f32 = fpext v2bf16 (#80106) 2024-02-03 11:27:50 +08:00
Harald van Dijk
52864d9c7b
[ARM] Switch to soft promoting half types. (#80440)
The traditional promotion is known to generate wrong code.

Fixes #73805.
2024-02-02 21:40:40 +00:00
Craig Topper
e12be9cde4 [RISCV] Don't promote ISD::SELECT with rv64-legal-i32 when XTHeadCondMov is enabled.
Fixes an infinite loop.

Test copied from the non-rv64-legal-i32 test.
2024-02-02 11:53:47 -08:00
Fangrui Song
d4de4c3eaf
[AArch64] Support optional constant offset for constraint "S" (#80255)
Modify the initial implementation (https://reviews.llvm.org/D46745) to
support a constant offset so that the following code will compile:
```
int a[2][2];
void foo() { asm("// %0" :: "S"(&a[1][1])); }
```

We use the generic code path for "s". In GCC's aarch64 port, "S" is
supported for PIC while "s" isn't, making "s" less useful. We implement
"S" but not "s".

Similar to #80201 for RISC-V.
2024-02-02 10:33:09 -08:00
Simon Pilgrim
faeb3d1f10 [AMDGPU] Regenerate ctpop64.ll test checks 2024-02-02 18:08:15 +00:00