llvm-project

Author	SHA1	Message	Date
weiguozhi	c166a43c6e	New calling convention preserve_none (#76868 ) The new experimental calling convention preserve_none is the opposite of the existing preserve_all. It tries to preserve as few general registers as possible, so all general registers are caller-saved registers. It can also use more general registers to pass arguments. This attribute doesn't impact floating-point registers, which still follow the C calling convention. Currently preserve_none is supported on X86-64 only. It changes the C calling convention in the following ways: * RSP and RBP are the only preserved general registers; all other general registers are caller-saved registers. * We can use [RDI, RSI, RDX, RCX, R8, R9, R11, R12, R13, R14, R15, RAX] to pass arguments. It can improve the performance of hot tail-call chains, because many callee-saved registers' save/restore instructions can be removed if the tail functions use preserve_none. In my experiment with protocol buffers, the parsing functions improved by 3% to 10%.	2024-02-05 13:28:43 -08:00
Michael Maitland	0bf165e383	[RISCV] Add support for RISC-V Pointer Masking (#79929 ) This patch implements the v0.8.1 specification. This patch reports version 0.8 in llvm since `RISCVISAInfo::ExtensionVersion` only has a `Major` and `Minor` version number. This patch includes support for the `Ssnpm`, `Smnpm`, `Smmpm`, `Sspm` and `Supm` extensions that make up RISC-V pointer masking. All of these extensions require emitting an attribute containing the correct `march` string. The `Ssnpm`, `Smnpm`, `Smmpm` extensions introduce a 2-bit WARL field (PMM). The extension does not specify how PMM is set, and therefore this patch does not need to address this. One example of how it could be set is using the Zicsr instructions to update the PMM bits of the described registers. The full specification can be found at https://github.com/riscv/riscv-j-extension/blob/master/zjpm-spec.pdf	2024-02-05 13:56:25 -05:00
Alex Lorenz	dd70aef05a	[x86_64][windows][swift] do not use Swift async extended frame for windows x86_64 targets that use windows 64 prologue (#80468 ) Windows x86_64 stack frame layout is currently not compatible with Swift's async extended frame, which reserves the slot right below RBP (RBP-8) for the async context pointer, as it doesn't account for the fact that a stack object in a win64 frame can be allocated at the same location. This can cause issues at runtime, for instance, Swift's TCA test code has functions that fail because of this issue, as they spill a value to that stack slot, which then gets overwritten by a store into the address returned by the @llvm.swift.async.context.addr() intrinsic (that ends up being RBP - 8), leading to an incorrect value being used at a later point when that stack slot is being read from again. This change drops the use of async extended frame for windows x86_64 subtargets and instead uses the x32 based approach of allocating a separate stack slot for the stored async context pointer. Additionally, LLDB, which is the primary consumer of the extended frame, makes assumptions like checking for a saved previous frame pointer at the current frame pointer address, which is also incompatible with the windows x86_64 frame layout, as the previous frame pointer is not guaranteed to be stored at the current frame pointer address. Therefore the extended frame layout can be turned off to fix the current miscompile without introducing a regression into LLDB for windows x86_64, as it already doesn't work correctly. I am still investigating what should be done for LLDB to support using an allocated stack slot to store the async frame context instead of being located at RBP - 8 for windows.	2024-02-05 10:19:26 -08:00
Simon Pilgrim	2096e57905	[X86] addConstantComments - add FP16 MOVSH asm comments support	2024-02-05 18:02:03 +00:00
Simon Pilgrim	8fa1e5771b	[X86] Regenerate some vector constant comments missed in recent patches to improve mask predicate handling in addConstantComments These were missed as filecheck just ignores what's after the end of the check pattern for each line	2024-02-05 18:02:03 +00:00
Stanislav Mekhanoshin	ea9276d47e	[AMDGPU] GlobalISel for f8 conversions (#80503 )	2024-02-05 09:41:37 -08:00
Stanislav Mekhanoshin	d0b5d32ce6	[AMDGPU] Fixed byte_sel of v_cvt_f32_bf8/v_cvt_f32_fp8 (#80502 ) Opsel bits are swapped. Actual byte select table (Byte: OPSEL): 0: 0, 1: 2, 2: 1, 3: 3	2024-02-05 09:35:01 -08:00
Billy Laws	8f070144e3	[AArch64] Fix generated types for ARM64EC variadic entry thunk targets (#80595 ) ISel handles filling in x4/x5 when calling variadic functions as they don't correspond to the 5th/6th X64 arguments but rather to the end of the shadow space on the stack and the size in bytes of all stack parameters (ignored and written as 0 for calls from entry thunks). Will PR a follow up with ISel handling after this is merged.	2024-02-05 09:26:16 -08:00
Simon Pilgrim	f958ad3b89	[X86] printZeroUpperMove - add support for mask predicated instructions Handle masked predicated movss/movsd in addConstantComments now that we can generically handle the destination + mask register This will more significantly help improve 'fixup constant' comments from #73509	2024-02-05 16:23:16 +00:00
Simon Pilgrim	47dcf5d5dc	[X86] printBroadcast - add support for mask predicated instructions Handle masked predicated load/broadcasts in addConstantComments now that we can generically handle the destination + mask register This will more significantly help improve 'fixup constant' comments from #73509	2024-02-05 16:23:15 +00:00
Kevin P. Neal	d15c454bed	[FPEnv][AMDGPU] Correct strictfp tests. Correct AMDGPU strictfp tests to follow the rules documented in the LangRef: https://llvm.org/docs/LangRef.html#constrained-floating-point-intrinsics These tests needed the strictfp attribute added to function calls and some declarations. Some of the tests now pass with D146845, others get farther along and fail with D146845. The tests revealed that further work is required in mostly AMDGPU atomics to get the tests passing. Since I was here anyway I removed the strictfp attribute from some constrained intrinsic declarations. They have this attribute by default. Test changes verified with D146845.	2024-02-05 09:29:31 -05:00
Shih-Po Hung	a826a0c234	[RISCV] Add tests for reduce.fmaximum/fminimum. NFC (#80553 ) This is to add test coverage for crash report in #80340	2024-02-05 21:41:24 +08:00
Matt Arsenault	a5d206df79	AMDGPU: Set max supported div/rem size to 64 (#80669 ) This enables IR expansion for i128 divisions. The vector case is still broken because ExpandLargeDivRem doesn't try to handle them. Fixes: SWDEV-426193	2024-02-05 19:09:38 +05:30
Pierre van Houtryve	4e958abf2f	[AMDGPU][PromoteAlloca] Support memsets to ptr allocas (#80678 ) Fixes #80366	2024-02-05 14:36:15 +01:00
Nikita Popov	ff9af4c43a	[CodeGen] Convert tests to opaque pointers (NFC)	2024-02-05 14:07:09 +01:00
Petar Avramovic	06f711a906	AMDGPU/GlobalISelDivergenceLowering: select divergent i1 phis (#80003 ) Implement PhiLoweringHelper for GlobalISel in DivergenceLoweringHelper. Use machine uniformity analysis to find divergent i1 phis and select them as lane mask phis in the same way SILowerI1Copies selects VReg_1 phis. Note that divergent i1 phis include phis created by LCSSA, and all cases of uses outside of the cycle are actually covered by "lowering LCSSA phis". GlobalISel lane masks are registers with sgpr register class and S1 LLT. TODO: The general goal is that instructions created in this pass are fully instruction-selected so that selection of lane mask phis is not split across multiple passes. patch 3 from: https://github.com/llvm/llvm-project/pull/73337	2024-02-05 14:07:01 +01:00
Christudasan Devadasan	89ec940b4a	[AMDGPU] Insert spill codes for the SGPRs used for EXEC copy (#79428 ) The SGPR registers used for preserving EXEC mask while lowering the whole-wave register spills and copies should be preserved at the prolog and epilog if they are in the CSR range. It isn't happening when there is only wwm-copy lowered and there are no wwm-spills. This patch addresses that problem.	2024-02-05 18:32:23 +05:30
Nikita Popov	b31fffbc7f	[ARM] Convert tests to opaque pointers (NFC)	2024-02-05 13:56:59 +01:00
Nikita Popov	7bdc80f35c	[AVR] Convert tests to opaque pointers (NFC)	2024-02-05 13:55:50 +01:00
Simon Pilgrim	69ffa7be3b	[X86] X86FixupVectorConstants - load+zero vector constants that can be stored in a truncated form (#80428 ) Further develops the vsextload support added in #79815 / b5d35feacb7246573c6a4ab2bddc4919a4228ed5 - reduces the size of the vector constant by storing it in the constant pool in a truncated form, and zero-extend it as part of the load.	2024-02-05 12:17:58 +00:00
Nikita Popov	6e83c0a1cb	[X86] Convert tests to opaque pointers (NFC)	2024-02-05 12:43:44 +01:00
Nikita Popov	00a4e248dc	[AMDGPU] Convert tests to opaque pointers (NFC)	2024-02-05 12:42:23 +01:00
Nikita Popov	1ee315ae79	[AArch64] Convert tests to opaque pointers (NFC)	2024-02-05 12:39:51 +01:00
Anatoly Trosinenko	7d879bc851	[AArch64][PAC] Refine authenticated pointer check methods (#74074 ) Align the values of the immediate operand of BRK instruction with those used by the existing arm64e implementation. Make AuthCheckMethod::DummyLoad use the requested register instead of LR.	2024-02-05 13:53:26 +03:00
David Green	d11c912f42	[AArch64][GlobalISel] Additional GISel testing for u/s add_sat and sub_sat. NFC	2024-02-05 08:47:12 +00:00
Pierre van Houtryve	500846d2f5	[AMDGPU] Introduce Code Object V6 (#76954 ) Introduce Code Object V6 in Clang, LLD, Flang and LLVM. This is the same as V5 except a new "generic version" flag can be present in EFLAGS. This is related to new generic targets that'll be added in a follow-up patch. It's also likely V6 will have new changes (possibly new metadata entries) added later. Docs changes are part of the follow-up patch #76955	2024-02-05 08:19:53 +01:00
Craig Topper	8ed046fc15	[RISCV] Custom type legalize i32 SADDSAT/SSUBSAT without Zbb. While working on -riscv-experimental-rv64-legal-i32, I noticed this missed optimization in our current codegen. This expands to SADDO/SSUBO+select while still in i32. These will be type legalized individually.	2024-02-04 23:15:58 -08:00
Craig Topper	5afeba051e	[RISCV] Custom legalize i32 UADDSAT/USUBSAT for -riscv-experimental-rv64-legal-i32 with Zbb. This matches the codegen we get from type legalization without -riscv-experimental-rv64-legal-i32.	2024-02-04 21:37:38 -08:00
Chia	db060ab053	[RISCV][ISel] Remove redundant vmerge for vwsub(u).wv. (#80523 )	2024-02-05 13:59:11 +09:00
Shengchen Kan	115c0c6513	[X86][test] Remove useless pattern for VDPBF16PSZmb and add a test for broadcast folding (#80629 ) llvm-issue: https://github.com/llvm/llvm-project/issues/68810	2024-02-05 12:15:18 +08:00
Craig Topper	6590d0fed5	[DAGCombiner][ARM] Teach reduceLoadWidth to handle (and (srl (load), C), ShiftedMask) (#80342 ) If we have a shifted mask, we may be able to reduce the load width to the width of the non-zero part of the mask and use an offset to the base address to remove the srl. The offset is given by C+trailingzeros(ShiftedMask). Then we add a final shl to restore the trailing zero bits. I've used the ARM test because that's where the existing (and (srl (load))) tests were. The X86 test was modified to keep the H register.	2024-02-04 16:05:51 -08:00
Craig Topper	146e5ce481	[RISCV] Add i32 zext.h pattern for -riscv-experimental-rv64-legal-i32.	2024-02-04 12:39:13 -08:00
Craig Topper	32b99617ac	[RISCV] Custom promote i32 UADDSAT/USUBSAT for -riscv-experimental-rv64-legal-i32 with Zbb.	2024-02-04 12:39:13 -08:00
Craig Topper	859b09da08	[RISCV] Promote i32 ISD::VAARG to i64 for -riscv-experimental-rv64-legal-i32.	2024-02-04 11:03:12 -08:00
Craig Topper	3bcb1f2bdd	[RISCV] Rework isSignExtendingOpW to store Register in the worklist. Previously we stored MachineInstr which restricted the implementation to only handle operand 0. The TH_LWD instruction has two sign extended destinations.	2024-02-03 23:40:09 -08:00
Serge Pavlov	b4eb7a10c0	[GlobalISel][ARM] Legalize set_fpenv and get_fpenv (#79852 ) Implement handling of get/set floating point environment for ARM in Global Instruction Selector. Lowering of these intrinsics to operations on FPSCR was previously implemented in DAG selector; in GlobalISel it is reused.	2024-02-04 12:30:33 +07:00
Craig Topper	9dfdea6fbd	[RISCV] Add XTheadMac patterns for -riscv-experimental-rv64-legal-i32.	2024-02-03 19:37:46 -08:00
Craig Topper	f2cf8da636	[RISCV] Add more XTheadMemIdx patterns for -riscv-experimental-rv64-legal-i32.	2024-02-03 19:06:25 -08:00
Craig Topper	1da2921bbd	[RISCV] Add missing extload test cases to xtheadmemidx.ll. NFC We had the isel patterns, but no tests that used them. We only had sextload and zextload tests. Also reduce the alignment on some of the test cases that were unnecessarily over aligned.	2024-02-03 17:55:29 -08:00
Harald van Dijk	61ff9f8db8	[X86] Add strictfp version of PR43024 test. (#80573 ) For the current version of the PR43024 test, we should be able to optimize away the operations but fail to do so. This commit adds a strictfp version of the test where we should not be able to optimize away the operations, as a verification that changes to improve the other effect have no adverse effect.	2024-02-04 01:36:00 +00:00
Craig Topper	08e942aca6	[RISCV] Combine (xor (trunc (X cc Y)) 1) -> (trunc (X !cc Y)) for RV64LegalI32. This is needed with RV64LegalI32 when the setcc is created after type legalization. An i1 xor would have been promoted to i32, but the setcc would have i64 result.	2024-02-03 13:57:47 -08:00
David Green	9d00c34132	[AArch64] Extend and cleanup movi tests. NFC	2024-02-03 21:23:01 +00:00
Craig Topper	ea59b15cf7	[RISCV] Add more RUN lines to rv64-legal-i32/xaluo.ll. NFC This matches the non-rv64-legal-i32 version.	2024-02-03 13:11:59 -08:00
Craig Topper	f090924344	[RISCV] Custom legalize i32 SADDO/SSUBO with RV64LegalI32. The default legalization uses 2 compares and an xor. We can instead use add+addw+xor+snez like we do without RV64LegalI32.	2024-02-03 13:07:08 -08:00
Craig Topper	d62c5706a8	[RISCV] Custom legalize i32 SMULO with RV64LegalI32. The default lowering will use shifts to make use of an i32 setcc. We don't support i32 setcc, so it's better to sign extend the low 32 bits and compare the full 64 bit result. This produces mul+mulw+xor+snez like we do without RV64LegalI32.	2024-02-03 13:07:08 -08:00
yubingex007-a11y	b49fa21289	[X86] Stop custom-widening v2f32 = fpext v2bf16 (#80106 )	2024-02-03 11:27:50 +08:00
Harald van Dijk	52864d9c7b	[ARM] Switch to soft promoting half types. (#80440 ) The traditional promotion is known to generate wrong code. Fixes #73805.	2024-02-02 21:40:40 +00:00
Craig Topper	e12be9cde4	[RISCV] Don't promote ISD::SELECT with rv64-legal-i32 when XTHeadCondMov is enabled. Fixes an infinite loop. Test copied from the non-rv64-legal-i32 test.	2024-02-02 11:53:47 -08:00
Fangrui Song	d4de4c3eaf	[AArch64] Support optional constant offset for constraint "S" (#80255 ) Modify the initial implementation (https://reviews.llvm.org/D46745) to support a constant offset so that the following code will compile: ``` int a[2][2]; void foo() { asm("// %0" :: "S"(&a[1][1])); } ``` We use the generic code path for "s". In GCC's aarch64 port, "S" is supported for PIC while "s" isn't, making "s" less useful. We implement "S" but not "s". Similar to #80201 for RISC-V.	2024-02-02 10:33:09 -08:00
Simon Pilgrim	faeb3d1f10	[AMDGPU] Regenerate ctpop64.ll test checks	2024-02-02 18:08:15 +00:00
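
Commits f090924344 and d62c5706a8 above describe detecting i32 signed overflow on RV64 with add+addw+xor+snez (and mul+mulw+xor+snez for multiplication). The following is only a minimal C++ model of that idea, not the LLVM lowering itself: the helper names `sext32`, `saddo32`, `smulo32` and the `OverflowResult` struct are hypothetical and chosen for illustration. It assumes the usual two's-complement narrowing when converting to `int32_t`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: sign-extend the low 32 bits of a 64-bit value,
// which is what the RV64 "w" instructions (addw, mulw) do to their result.
constexpr int64_t sext32(int64_t v) {
    return static_cast<int64_t>(static_cast<int32_t>(static_cast<uint32_t>(v)));
}

// Hypothetical result type: the wrapped 32-bit value plus an overflow flag.
struct OverflowResult {
    int32_t value;
    bool overflow;
};

// Signed 32-bit add with overflow, modelled as add + addw + xor + snez.
constexpr OverflowResult saddo32(int32_t x, int32_t y) {
    int64_t wide = static_cast<int64_t>(x) + static_cast<int64_t>(y); // add
    int64_t narrow = sext32(wide);                                    // addw
    int64_t diff = wide ^ narrow;                                     // xor
    return {static_cast<int32_t>(narrow), diff != 0};                 // snez
}

// Signed 32-bit multiply with overflow, modelled as mul + mulw + xor + snez.
constexpr OverflowResult smulo32(int32_t x, int32_t y) {
    int64_t wide = static_cast<int64_t>(x) * static_cast<int64_t>(y); // mul
    int64_t narrow = sext32(wide);                                    // mulw
    int64_t diff = wide ^ narrow;                                     // xor
    return {static_cast<int32_t>(narrow), diff != 0};                 // snez
}
```

The 64-bit result is exact for 32-bit inputs, so it differs from the sign-extended 32-bit result exactly when the i32 operation overflowed.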
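
Commit 6590d0fed5 above describes narrowing (and (srl (load), C), ShiftedMask) into a smaller load at an offset of C+trailingzeros(ShiftedMask) bits, followed by a shl. The sketch below only checks that arithmetic on a little-endian byte buffer. It is not the DAGCombiner code and omits the legality checks the real combine performs. `loadLE`, `wideForm` and `narrowForm` are hypothetical names, and the sketch assumes the mask is a non-zero contiguous run of 8 or 16 bits and that the offset is a whole number of bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: zero-extending little-endian load of `bytes` bytes.
inline uint32_t loadLE(const uint8_t* p, std::size_t bytes) {
    uint32_t v = 0;
    for (std::size_t i = 0; i < bytes; ++i)
        v |= static_cast<uint32_t>(p[i]) << (8 * i);
    return v;
}

// Original form: (and (srl (load i32 p), c), shiftedMask).
inline uint32_t wideForm(const uint8_t* p, unsigned c, uint32_t shiftedMask) {
    return (loadLE(p, 4) >> c) & shiftedMask;
}

// Narrowed form: (shl (zextload narrow, p + offset), trailingzeros(mask)).
inline uint32_t narrowForm(const uint8_t* p, unsigned c, uint32_t shiftedMask) {
    assert(shiftedMask != 0);
    unsigned tz = 0;                       // trailing zero bits of the mask
    while (((shiftedMask >> tz) & 1u) == 0) ++tz;
    unsigned width = 0;                    // length of the run of set bits
    while (tz + width < 32 && ((shiftedMask >> (tz + width)) & 1u) != 0) ++width;

    unsigned offsetBits = c + tz;          // offset stated in the commit message
    // Assumptions of this sketch: byte-aligned offset, 8- or 16-bit field,
    // and the field lies entirely inside the original 32-bit load.
    assert(offsetBits % 8 == 0);
    assert(width == 8 || width == 16);
    assert(offsetBits + width <= 32);

    uint32_t narrow = loadLE(p + offsetBits / 8, width / 8);
    return narrow << tz;                   // final shl restores the zero bits
}
```

For the bytes {0x11, 0x22, 0x33, 0x44}, C = 8 and mask 0xFF00, both forms yield 0x3300: the narrowed form reads the single byte at offset 2 and shifts it left by 8.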
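
Commit d0b5d32ce6 above gives the byte select table for v_cvt_f32_bf8/v_cvt_f32_fp8 (byte 0, 1, 2, 3 maps to OPSEL 0, 2, 1, 3), which amounts to swapping the two low bits. A minimal sketch of that mapping follows. The function name `byteSelToOpSel` is hypothetical and this is not the AMDGPU backend's code.

```cpp
#include <cassert>

// Hypothetical helper: swap the two low bits of a 2-bit byte index to get
// the OPSEL encoding listed in the commit message's table.
constexpr unsigned byteSelToOpSel(unsigned byteSel) {
    return ((byteSel & 1u) << 1) | ((byteSel >> 1) & 1u);
}

// Compile-time check against the table from the commit message.
static_assert(byteSelToOpSel(0) == 0, "byte 0 -> OPSEL 0");
static_assert(byteSelToOpSel(1) == 2, "byte 1 -> OPSEL 2");
static_assert(byteSelToOpSel(2) == 1, "byte 2 -> OPSEL 1");
static_assert(byteSelToOpSel(3) == 3, "byte 3 -> OPSEL 3");
```

The swap is its own inverse, so applying the same function to an OPSEL value recovers the byte index.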


52796 Commits