llvm-project

Author	SHA1	Message	Date
Oliver Stannard	f2e7285b03	[AArch64][PtrAuth] Fix unwind state for tail calls When generating unwind tables for code which uses return-address signing, we need to toggle the RA_SIGN_STATE DWARF register around any tail-calls, because these require the return address to be authenticated before the call, and could throw an exception. This is done using the .cfi_negate_ra_state directive before the call, and .cfi_restore_state at the start of the next basic block. However, since D153098, the .cfi_restore_state isn't being inserted, because the CFIFixup pass isn't being run. This re-enables that pass when return-address signing is enabled. Reviewed By: ikudrin, MaskRay Differential Revision: https://reviews.llvm.org/D156428	2023-08-03 11:45:51 +01:00
Jay Foad	0da19a2be5	[PEI][WebAssembly] Switch to backwards frame index elimination Backwards frame index elimination uses backwards register scavenging, which is preferred because it does not rely on accurate kill flags. Differential Revision: https://reviews.llvm.org/D156691	2023-08-03 10:21:43 +01:00
Simon Pilgrim	7f9b94c044	[X86] LowerBuildVectorv16i8 - attempt to merge lowest 2 x i16 insertions into a i32 MOVD scalar_to_vector Similar to D156350, if we were going to create 2 x i16 insertions (MOVD+PINSRW), try to merge them into a single MOVD to reduce the amount of GPR<->VEC traffic	2023-08-03 10:20:20 +01:00
Jim Lin	a2938ba707	[RISCV] Add tests that m extension enabled in extractelt-int-rv64.ll. NFC. It has been added in extractelt-int-rv32.ll.	2023-08-03 15:34:44 +08:00
Yeting Kuo	f68c6879ad	[RISCV] Use max pushed register to get pushed register number. Previously we used the number of registers that need saving and are pushable as the number of pushed registers. We also use the pushed register count to calculate the stack size. That is not correct because Zcmp pushes registers from $ra up to the highest register that needs saving, and there is no guarantee that the registers needing a save form a contiguous list starting at $ra. In the example below, PushPopRegs should be 6 (ra, s0-s4) instead of 1. ``` ; llc -mtriple=riscv32 -mattr=+zcmp define void @foo() { entry: ; Old: .cfi_def_cfa_offset 16 ; New: .cfi_def_cfa_offset 32 tail call void asm sideeffect "li s4, 0", "~{s4}"() ret void } ``` Reviewed By: Jim, kito-cheng Differential Revision: https://reviews.llvm.org/D156407	2023-08-03 14:49:15 +08:00
Alex Bradbury	8a71f44e00	[RISCV] Expand test coverage of bf16 operations with Zfbfmin and fix gaps This doesn't bring us to parity with the test/CodeGen/RISCV/half-* test cases, it simply picks off an initial set that can be supported especially easy. In order to make the review more manageable, I'll follow up with other cases. There is zero innovation in the test cases - they simply take the existing half/float cases and replace f16->bf16 and half->bfloat. Differential Revision: https://reviews.llvm.org/D156895	2023-08-03 07:06:57 +01:00
Bing1 Yu	6ee497aa0b	[X86][Regcall] Add an option to respect regcall ABI v.4 in win64&win32 Reviewed By: pengfei Differential Revision: https://reviews.llvm.org/D155863	2023-08-03 13:58:33 +08:00
Jim Lin	40cc106fa0	[RISCV] Scalarize binop followed by extractelement to custom lowered instruction isOperationLegalOrCustomOrPromote returns true only if VT is other or legal and operation action is Legal, Custom or Promote. Permit a vector binary operation to be converted to a scalar binary operation which is custom lowered with an illegal type. One such case is that i32 isn't a legal type on RV64 and its ALU operations are set to custom lowering, so vadd for element type i32 can be converted to addw. Reviewed By: jacquesguan, craig.topper Differential Revision: https://reviews.llvm.org/D156692	2023-08-03 13:02:49 +08:00
Craig Topper	c1c5da8f1f	[RISCV] Merge fp-imm.ll and zfh-imm.ll into float/double/half-imm.ll. NFC fp-imm.ll and zfh-imm.ll test 0.0 and -0.0 while float/double/half-imm.ll tested other non-zero constants. It seems like they should all be tested together. There are slight coverage changes due to different command lines, but I'm not sure it's meaningful. For example, we now don't test double 0.0 and -0.0 with only the F extension. Reviewed By: asb Differential Revision: https://reviews.llvm.org/D156929	2023-08-02 20:16:50 -07:00
Yeting Kuo	cd79599304	[RISCV] Teach lowerScalarInsert to handle scalar value is the first element of a fixed vector. D155929 taught lowerScalarInsert to handle the start value (extractelement scalable_vector, 0) and specifically converts fixed extracted vectors to scalable vectors when lowering vector reduction. It's not enough because there is another way to create (extractelement fixed_vector, 0) as a start value of lowerScalarInsert like #64327. #64327: https://github.com/llvm/llvm-project/issues/64327. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D156863	2023-08-03 10:53:14 +08:00
Phoebe Wang	4d6f4c9c93	[X86] Special handle for v1i1 during ExtractBitFromMaskVector Fixes #64322 Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D156855	2023-08-03 09:50:31 +08:00
Luke Lau	0834355227	[RISCV] Add VP patterns for vwsll.[vv,vx,vi] This patch adds patterns for the existing riscv_shl_vl VL node. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D156915	2023-08-03 00:43:13 +01:00
Matt Arsenault	54bda79335	AMDGPU: Simplify and improve sincos matching The first trivial example I tried failed to merge due to the user scan logic. Remove the complicated scan of users handling with distance thresholds, with a same block restriction. The actual expansion of sincos is basically the same size as sin or cos individually. Copy the technique the generic optimization uses, which is to just use the input instruction as the insert point or just insert at the start of the entry block. https://reviews.llvm.org/D156706	2023-08-02 17:48:35 -04:00
Philip Reames	660b740e4b	[DAG] Support store merging of vector constant stores Ran across this when making a change to RISCV memset lowering. Seems very odd that manually merging a store into a vector prevents it from being further merged. Differential Revision: https://reviews.llvm.org/D156349	2023-08-02 14:41:46 -07:00
Alex Bradbury	667602793b	[RISCV] Implement support for bf16 select when zfbfmin is enabled These test cases previously caused an error. RISCVInstrInfo::copyPhysReg also needed a tweak in order to account for copying bf16 values in FPR16 registers. Differential Revision: https://reviews.llvm.org/D156883	2023-08-02 20:04:30 +01:00
4vtomat	346c1f2641	[RISCV] Support vector crypto extension LLVM IR Depends on D141672 Differential Revision: https://reviews.llvm.org/D138809	2023-08-02 10:25:36 -07:00
Philip Reames	fe4c99d1d6	[RISCV] Add test case showing CSE regression from issue 64282	2023-08-02 09:12:46 -07:00
Matt Arsenault	b953155b49	AMDGPU: Fix counting debug instructions in execz skip threshold	2023-08-02 08:09:41 -04:00
Mirko Brkusanin	acdc503d6c	[AMDGPU][GlobalISel] Update applyMappingImpl for G_ABS and type v2s16 For G_ABS with type v2s16 and sgpr inputs break down into two s32 G_ABS instructions. Patch by: Acim Maravic Differential Revision: https://reviews.llvm.org/D155867	2023-08-02 12:27:06 +02:00
Mirko Brkusanin	fadf3e7f2b	[AMDGPU][GlobalISel] Update legalizer for G_ABS, G_SMIN, G_SMAX, G_UMIN, G_UMAX There is no need to increase the size of odd sized vectors if they are going to be scalarized by a different rule. Patch by: Acim Maravic Differential Revision: https://reviews.llvm.org/D155865	2023-08-02 12:18:18 +02:00
Alex Bradbury	8acb8a143f	[RISCV] Make Zcf and Zcd imply the F and D extensions respectively This was an omission in the spec that has now been addressed https://github.com/riscv/riscv-code-size-reduction/pull/224. Differential Revision: https://reviews.llvm.org/D156314	2023-08-02 10:40:38 +01:00
Alex Bradbury	be0dac268d	[RISCV] Improve codegen for i8/i16 'atomicrmw xchg a, {0,-1}' As noted in <https://github.com/llvm/llvm-project/issues/64090>, it's more efficient to lower a partword `atomicrmw xchg a, 0` to an amoand with appropriate mask. There are a range of possible ways to go about this - e.g. writing a combine based on the `llvm.riscv.masked.atomicrmw.xchg` intrinsic, or introducing a new interface to AtomicExpandPass to allow target-specific atomics conversions, or trying to lift the conversion into AtomicExpandPass itself based on querying some target hook. Ultimately I've gone with what appears to be the simplest approach - just covering this case in emitMaskedAtomicRMWIntrinsic. I perhaps should have given that hook a different name way back when it was introduced. This also handles the `atomicrmw xchg a, -1` case suggested by Craig during review. Fixes https://github.com/llvm/llvm-project/issues/64090 Differential Revision: https://reviews.llvm.org/D156801	2023-08-02 09:48:50 +01:00
Jay Foad	c2093b8504	[AMDGPU] Add target features for GDS and GWS GFX9 subtargets from GFX90A onwards lack GDS but still have GWS. Differential Revision: https://reviews.llvm.org/D156713	2023-08-02 09:02:07 +01:00
Jay Foad	8f973d5c45	[DebugInfo] Fix crash when printing malformed DBG machine instructions MachineVerifier does not check that DBG_VALUE, DBG_VALUE_LIST and DBG_INSTR_REF have the expected number of operands, so printing them (e.g. with -print-after-all) should not crash. Differential Revision: https://reviews.llvm.org/D156226	2023-08-02 08:28:20 +01:00
Jim Lin	d6a48a348a	[RISCV] Fix the CFI offset for callee-saved registers stored by Zcmp push. Issue mentioned: https://github.com/riscv/riscv-code-size-reduction/issues/182 The order of callee-saved registers stored by Zcmp push in memory is reversed. Pseudo code for cm.push in https://github.com/riscv/riscv-code-size-reduction/releases/download/v1.0.4-1/Zc.1.0.4-1.pdf ``` if (XLEN==32) bytes=4; else bytes=8; addr=sp-bytes; for(i in 27,26,25,24,23,22,21,20,19,18,9,8,1) { //if register i is in xreg_list if (xreg_list[i]) { switch(bytes) { 4: asm("sw x[i], 0(addr)"); 8: asm("sd x[i], 0(addr)"); } addr-=bytes; } } ``` The placement order for push is s11, s10, ..., ra. CFI offset should be calculated in reversed order for correct stack unwinding. Reviewed By: fakepaper56, kito-cheng Differential Revision: https://reviews.llvm.org/D156437	2023-08-02 13:03:21 +08:00
Matt Arsenault	5dfdd3494b	AMDGPU: Don't try to fold wavefrontsize intrinsic in libcall simplify It's not a libcall so doesn't really belong here to begin with. Relying on checking the target name and explicit features isn't particularly sound either. The library doesn't use the intrinsic anymore, so it doesn't matter anyway.	2023-08-01 18:20:50 -04:00
Matt Arsenault	eb00555c16	AMDGPU: Add more tests for sincos recognition These show both broken cases and cases which are handled too conservatively.	2023-08-01 18:20:50 -04:00
Philip Reames	1e86abc914	[RISCVRVVInitUndef] Ignore tied use for partial undef register The purpose of this code is to restrict overlap between source and destination registers. The tied input register is conceptually part of the destination. I can't see any reason why we need to prevent a partial undef tied source here, and skipping it reduces register pressure slightly. Differential Revision: https://reviews.llvm.org/D156709	2023-08-01 12:16:26 -07:00
Philip Reames	e938217f81	[RISCV] Implement getOptimalMemOpType for memcpy/memset lowering This patch implements the getOptimalMemOpType callback which is used by the generic mem* lowering in SelectionDAG to pick the widest type used. This patch only changes the behavior when vector instructions are available, as the default is reasonable for scalar. Without this change, we were emitting either XLEN sized stores (for aligned operations) or byte sized stores (for unaligned operations.) Interestingly, the final codegen was nowhere near as bad as that would seem to imply. Generic load combining and store merging kicked in, and frequently (but not always) produced pretty reasonable vector code. The primary effects of this change are: * Enable the use of vector operations for memset of non-constant. Our generic store merging logic doesn't know how to merge a broadcast store, and thus we were seeing the generic (and awful) byte expansion lowering for unaligned memset. * Enable the generic misaligned overlap trick where we write to some of the same bytes twice. The alternative is to either a) use an increasingly small sequence of stores for the tail or b) use VL to restrict the vector store. The latter is not implemented at this time, so the former is what previously happened. Interestingly, I'm not sure that changing VL (as opposed to the overlap trick) is even obviously profitable here. Differential Revision: https://reviews.llvm.org/D156249	2023-08-01 12:14:50 -07:00
Craig Topper	5a519961c8	[RISCV] Call combineSelectToBinOp before generic select expansion for Zicond. This handles logical ops of setccs and optimizes when the true or false value is -1. Reviewed By: asb, wangpc Differential Revision: https://reviews.llvm.org/D156810	2023-08-01 12:09:35 -07:00
Philip Reames	e93a8137d3	[RISCVRVVInitUndef] Remove implicit single use assumption for IMPLICIT_DEF (try 2) Reapplying after revert due to sanitizer failure. Includes fix to avoid querying dead lanes for vreg introduced by previous transform. The code was written with the implicit assumption that each IMPLICIT_DEF is either a) the tied operand, or b) an untied source, but not both. This is true right now, but an upcoming change may allow CSE of IMPLICIT_DEFs in some cases, so let's rewrite the code to handle that possibility. I added an MIR case which demonstrates the multiple use IMPLICIT_DEF. To my knowledge, this is not a reachable configuration from IR right now. As an aside, this makes the structure a much closer match with the sub-reg liveness case, and we can probably just merge these routines. (Future work.) Differential Revision: https://reviews.llvm.org/D156477	2023-08-01 10:50:03 -07:00
Alex Bradbury	bc2ea021ec	[RISCV][test] Add 'atomicrmw xchg a, -1' tests in preparation for D156801 As noted by Craig, we can improve codegen for the -1 case as well.	2023-08-01 18:39:00 +01:00
Craig Topper	048458f94c	[RISCV] Add no NaN support to lowerFMAXIMUM_FMINIMUM. Using the nonans FMF and the DAG.isKnownNeverNaN on the inputs. Reviewed By: fakepaper56 Differential Revision: https://reviews.llvm.org/D156748	2023-08-01 09:51:24 -07:00
Mikhail Gudim	0fb3ebb2fc	[RISCV] Generalize `tryFoldSelectIntOp` to other operations. Currently, only `SUB`, `ADD`, `OR` and `XOR` are covered. This patch adds `AND`, `SHL`, `SRA`, `SRL`. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D155344	2023-08-01 11:27:10 -04:00
Alex Bradbury	fdac86cce8	[RISCV][test] Add atomicrmw test cases for suboptimal codegen report in #64090 <https://github.com/llvm/llvm-project/issues/64090> A forthcoming patch addresses these cases.	2023-08-01 15:30:09 +01:00
Francesco Petrogalli	cd921e0fd7	[MISched] Do not erase resource booking history for subunits. When dealing with the subunits of a resource group, we should reset the subunits' availability at the first available cycle of the resource that contains the subunits. Previously, the reset operation was returning cycle 0, effectively erasing the booking history of the subunits. Without this change, when using intervals for models that make use of subunits, the erasing of resource booking for subunits can raise the assertion "A resource is being overwritten" in `ResourceSegments::add`. The test added in the patch is one such case. Reviewed By: andreadb Differential Revision: https://reviews.llvm.org/D156530	2023-08-01 14:00:37 +02:00
Paulo Matos	8f3b87fc14	[SPIRV] Add support for SPV_INTEL_optnone Adds support for SPV_INTEL_optnone. Currently still in draft form but I wanted to open this revision to ask some questions. Differential Revision: https://reviews.llvm.org/D156297	2023-08-01 12:53:54 +02:00
Ben Shi	75c3c6ac15	[CSKY] Optimize 'llvm.cttz.i32' and 'llvm.ctlz.i32' Reviewed By: zixuan-wu Differential Revision: https://reviews.llvm.org/D156780	2023-08-01 18:15:20 +08:00
Ben Shi	f94e9bdc57	[CSKY][NFC][test] Add more tests of CodeGen for intrinsics Reviewed By: zixuan-wu Differential Revision: https://reviews.llvm.org/D156543	2023-08-01 17:12:32 +08:00
Ben Shi	80cd505914	[CSKY] Optimize implementation of intrinsic 'llvm.cttz.i32' Reviewed By: zixuan-wu Differential Revision: https://reviews.llvm.org/D154588	2023-08-01 17:12:32 +08:00
Yeting Kuo	4c8cf92067	[RISCV] Use the first element of source as the start value of reduction. Previously, when llvm.reduce.* was lowered, the RISC-V backend created a scalar vector with the neutral element as the start value. For llvm.reduce.and/or/min/max/fmax/fmin, we could use the first element of source as the start value. This benefits RVV since we could just use the source vector as the start vector. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D155929	2023-08-01 13:15:40 +08:00
Jun Sha (Joshua)	934b490530	[RISCV] Expand load extension / truncate store for bf16 Currently, bf16 operations are automatically supported by promoting to float. This patch adds bf16 support by ensuring that load extension / truncate store operations are properly expanded. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D156646	2023-08-01 11:10:41 +08:00
Tamir Duberstein	59afd29899	[BPF] Match CHECK w/ LLVM_ENABLE_ASSERTIONS=OFF (D156136)	2023-08-01 11:12:43 +09:00
Matt Arsenault	4d42e8b5d1	Reapply "[CodeGen]Allow targets to use target specific COPY instructions for live range splitting" This reverts commit a496c8be6e638ae58bb45f13113dbe3a4b7b23fd. The workaround in c26dfc81e254c78dc23579cf3d1336f77249e1f6 should work around the underlying problem with SUBREG_TO_REG.	2023-07-31 20:15:45 -04:00
Steven Wu	42c9354a92	Revert "Reland "[LoongArch] Support -march=native and -mtune="" This reverts commit c56514f21b2cf08eaa7ac3a57ba4ce403a9c8956. This commit adds global state that is shared between clang driver and clang cc1, which is not correct when clang is used with `-fno-integrated-cc1` option (no integrated cc1). The -march and -mtune option needs to be properly passed through cc1 command-line and stored in TargetInfo.	2023-07-31 16:57:06 -07:00
Matt Arsenault	5b5bd81b71	AMDGPU: Move placement of RemoveIncompatibleFunctions This should be approximately first and run with other module passes. https://reviews.llvm.org/D155987	2023-07-31 19:22:04 -04:00
Matt Arsenault	db4d6ef9ef	AMDGPU: Directly emit fabs intrinsic instead of new libcall	2023-07-31 19:19:56 -04:00
Matt Arsenault	02a0b11331	AMDGPU: Remove weird usage of implicit operand on COPY For the purpose of the test it works as well to have a use after the copy itself.	2023-07-31 19:16:11 -04:00
Matt Arsenault	0aa439d502	AMDGPU/GlobalISel: Use SGPR results for G_AMDGPU_WAVE_ADDRESS	2023-07-31 19:16:11 -04:00
Tamir Duberstein	d542a56c1c	[BPF] Clean up SelLowering This patch contains a number of uncontroversial changes: - Replace all uses of `errs`, `assert`, `llvm_unreachable` with `report_fatal_error` with informative error strings. - Replace calls to `fail` in loops with at most one call per error instance. Previously a function with 19 arguments would log "too many args" 14 times. This was not helpful. - Change one `if (..) switch ...` to `if (..) { switch ...`. The added brace is consistent with a near-identical switch immediately above. - Elide one `SDValue` copy by using a reference rather than value. This is consistent with a variable declared immediately before it. Reviewed By: yonghong-song Differential Revision: https://reviews.llvm.org/D156136	2023-08-01 00:31:12 +03:00
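
The cm.push pseudo-code quoted in d6a48a348a, together with the "ra up to the highest saved register" rule described in f68c6879ad, can be sketched in Python as follows. This is only an illustrative model, not code from either patch: the names `PUSH_ORDER`, `ABI_NAMES`, `push_slot_offsets` and `pushed_registers` are hypothetical, offsets are relative to the stack pointer before the push, and the sketch ignores the stack-adjustment rounding and any encoding restrictions on the register list.

```python
# Register visit order taken from the cm.push pseudo-code: x27 (s11) down to x1 (ra).
PUSH_ORDER = [27, 26, 25, 24, 23, 22, 21, 20, 19, 18, 9, 8, 1]

# ABI names for the x-register numbers above (x18..x27 are s2..s11).
ABI_NAMES = {1: "ra", 8: "s0", 9: "s1", **{n: f"s{n - 16}" for n in range(18, 28)}}


def push_slot_offsets(xreg_list, xlen=32):
    """Map each pushed register to its slot offset from the pre-push sp."""
    nbytes = 4 if xlen == 32 else 8
    offset = -nbytes  # the first store goes to sp - bytes
    slots = {}
    for reg in PUSH_ORDER:
        if reg in xreg_list:
            slots[ABI_NAMES[reg]] = offset
            offset -= nbytes  # each later store lands one slot lower
    return slots


def pushed_registers(max_saved):
    """Model the contiguous range ra..max_saved (x-register numbers)."""
    ascending = PUSH_ORDER[::-1]  # ra, s0, s1, s2, ...
    return set(ascending[: ascending.index(max_saved) + 1])


if __name__ == "__main__":
    # Only s4 (x20) is clobbered, yet six registers (ra, s0-s4) are pushed.
    regs = pushed_registers(20)
    print(len(regs))
    print(push_slot_offsets(regs, xlen=32))
```

Under these assumptions the highest-numbered register sits closest to the old stack pointer and `ra` ends up in the lowest slot, which is why per-register CFI offsets have to be assigned in the reverse of the ra, s0, s1, ... order.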


49321 Commits