llvm-project

Author	SHA1	Message	Date
Igor Kirillov	c15557d64e	[CodeGen] Extend ComplexDeinterleaving pass to recognise patterns using integer types AArch64 introduced CMLA and CADD instructions as part of SVE2. This change allows such instructions to be generated when this architecture feature is available. Differential Revision: https://reviews.llvm.org/D153808	2023-07-19 11:01:19 +00:00
Simon Pilgrim	98b0f1360d	[DAG] hoistLogicOpWithSameOpcodeHands - add support for SIGN_EXTEND_INREG nodes. This can reuse the existing *_EXTEND node handling (with special handling for the valuetype arg)	2023-07-19 11:56:32 +01:00
Ivan Kosarev	1b32427213	[AMDGPU] Combine the SDAG and GISel versions of the fmed3.ll test. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D155590	2023-07-19 11:42:36 +01:00
Simon Pilgrim	2167ae93c9	[DAG] hoistLogicOpWithSameOpcodeHands - add support for _EXTEND_VECTOR_INREG nodes. This can reuse the existing _EXTEND node handling.	2023-07-19 10:50:23 +01:00
Jay Foad	7fa7a08f21	[AMDGPU] Insert s_nop before s_sendmsg sendmsg(MSG_DEALLOC_VGPRS) Differential Revision: https://reviews.llvm.org/D155681	2023-07-19 10:33:11 +01:00
Jay Foad	496766840f	[ARM] Add a regression test for D154281 This is a reduced version of one of the tests that was broken by the original commit of D154281 "[CodeGen] Store SP adjustment in MachineBasicBlock. NFCI.". Differential Revision: https://reviews.llvm.org/D155471	2023-07-19 10:32:21 +01:00
Jingu Kang	62ed3ff4bb	Revert "[MachineLICM] Handle Subloops" This reverts commit 33e60484d750291e99301e29e60fe72c8fa48ccd.	2023-07-19 10:30:50 +01:00
Simon Pilgrim	5bc836422e	[X86] LowerEXTEND_VECTOR_INREG - add sign_extend_vector_inreg fast path for all-signbits source values If the source operand is already all-signbits we don't need to create the sign extended elements - just splat the source element to the destination element width	2023-07-19 10:13:08 +01:00
Jay Foad	4389f9b2ad	[AMDGPU] Regenerate is.fpclass checks	2023-07-19 09:32:22 +01:00
Martin Storsjö	20b7584455	Reland [AArch64] Fix an immediate out of range for large realignments on Windows Also add a missing FrameSetup flag on the existing add instruction. This fixes https://github.com/llvm/llvm-project/issues/63701. Since the previous iteration, change ADDXrr to ADDXrx64, which works with this use of SP. Differential Revision: https://reviews.llvm.org/D155447	2023-07-19 11:19:04 +03:00
Patryk Wychowaniec	4e831753b9	[AVR] Expand shifts of all types except int8 and int16 Currently our AVRShiftExpand pass expands only 32-bit shifts, with the assumption that other kinds of shifts (e.g. 64-bit ones) are automatically reduced to 8-bit ones by LLVM during ISel. However this is not always true and causes problems in the rust-lang runtime. This commit changes the logic a bit, so that instead of expanding only 32-bit shifts, we expand shifts of all types except 8-bit and 16-bit. This is not the most optimal solution, because 64-bit shifts can be expanded to 32-bit shifts which has been deeply optimized. I've checked the generated code using rustc + simavr, and all shifts seem to behave correctly. Spotted in the wild in rustc: https://github.com/rust-lang/compiler-builtins/issues/523 https://github.com/rust-lang/rust/issues/112140 Reviewed By: benshi001 Differential Revision: https://reviews.llvm.org/D154785	2023-07-19 11:57:00 +08:00
Jianjian GUAN	eb33db4f91	[AVR] Enable verifyInstructionPredicates for AVR This patch fixes the failed test of verifyInstructionPredicates which is caused by verifyInstructionPredicates. verifyInstructionPredicates will add JMPk without checking the target predicate. Reviewed By: benshi001 Differential Revision: https://reviews.llvm.org/D155570	2023-07-19 11:43:39 +08:00
WANG Xuerui	f27017a063	[LoongArch] Align functions and loops better according to uarch The LA464 micro-architecture is very sensitive to alignment of hot code, with performance variation of up to ~12% in the go1 benchmark suite of the Go language (as observed by me during my work on the Go loong64 port). Manual alignment of certain loops (https://go.dev/cl/479816) and automatic alignment of loop heads (https://go.dev/cl/479817) helps a lot there, by reducing much of the random variation and generally increasing performance, so we naturally want to do the same here. Practically, LA464 is the only LoongArch micro-architecture in wide use, and we are currently supporting just that. The first "4" in "LA464" stands for "4-issue", in particular its instruction fetch and decode stages are 4-wide; so functions and branch targets should be preferably aligned to at least 16 bytes for best throughput. The Loongson team has benchmarked various combinations of function, loop, and branch target alignments with GCC. The results (https://gcc.gnu.org/pipermail/gcc-patches/2023-May/619980.html) show that "16-byte label alignment together with 32-byte function alignment gives best results in terms of SPEC score". A "label" in GCC means a branch target; while we don't currently align branch targets, we do align loops, so in this patch we default to 32-byte function alignment and 16-byte loop alignment. Reviewed By: SixWeining Differential Revision: https://reviews.llvm.org/D148622	2023-07-19 11:25:57 +08:00
eopXD	32c257d384	[RISCV] Use the stack for MVT::f16 for fastcc when there are no other registers available In D155502, we added code for the compiler to check GPR-s for f16 under zhinx. This commit adds code to hit the stack when we run out of GPR-s. With this patch and D155502, resolves #63922 Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D155507	2023-07-18 19:49:17 -07:00
Daniil Suchkov	6fa66acfad	[AArch64] NFC. Add a test exposing a bug in FixupStatepointCallerSaved pass This pass doesn't take register classes into account, so it ends up trying to spill a non-GP register onto stack which is not correct.	2023-07-18 13:53:28 -07:00
David Green	74c0bdff7d	[AArch64][GISel] Additional FPTrunc vector lowering I was attempting to add llvm.reduce.fminimum/fmaximum support for GlobalISel. In the process I noticed that llvm.reduce.fmin/fmax was missing, and could do with being added first. That led on to adding additional vector support for minnum/maxnum, which in turn led to needing to handle fptrunc and fpext for some of the fp16 types. So this patch extends the vector handling for fptrunc, adding support for f16 types which are clamped to 4 elements, and scalarizing the rest. I went round in circles a little with how smaller than legal vectors should be handled, but this seems simple and seems to work, if not always optimally yet. Differential Revision: https://reviews.llvm.org/D155311	2023-07-18 18:52:19 +01:00
Craig Topper	ea3683e98f	[RISCV] Improve type promotion for i32 clmulr/clmulh on RV64. Instead of zero extending the inputs by masking, we can shift them left. This is cheaper when we don't have a zext.w instruction. This does make the case where the inputs are already zero extended or freely zero extendable worse though. Reviewed By: wangpc Differential Revision: https://reviews.llvm.org/D155530	2023-07-18 10:39:25 -07:00
Simon Pilgrim	d7eb9240c0	[DAG] SimplifyDemandedBits - attempt to use SimplifyMultipleUseDemandedBits for bitcasts from larger element types Attempt to avoid multi-use ops if the bitcast doesn't need anything from them.	2023-07-18 18:38:03 +01:00
Craig Topper	0c055286b2	[RISCV] Use RISCVISD::CZERO_EQZ/CZERO_NEZ for XVentanaCondOps. This makes Zicond and XVentanaCondOps use the same code path. The instructions have identical semantics. Reviewed By: wangpc Differential Revision: https://reviews.llvm.org/D155391	2023-07-18 10:18:02 -07:00
eopXD	ca72457346	[RISCV] Add test coverage for peephole vmerge optimization of unmasked rvv instruction with a rounding mode (NFC) No functional change intended. Reviewed By: fakepaper56 Differential Revision: https://reviews.llvm.org/D155550	2023-07-18 10:03:58 -07:00
eopXD	eb89bf8d0d	[RISCV] Do not use FPR registers for fastcc if zfh/f/d is not specified in the architecture Resolves #63917. Also lets the compiler check for available GPR before hitting the stack. Reviewed By: asb Differential Revision: https://reviews.llvm.org/D155502	2023-07-18 10:03:04 -07:00
Craig Topper	cdee88a2e0	[RISCV] Add isMoveReg to vmv1r/vmv2r/vmv4r/vmv8r.v. This allows TII isCopyInstrImpl to consider them copies. Reviewed By: wangpc Differential Revision: https://reviews.llvm.org/D155140	2023-07-18 09:49:23 -07:00
Craig Topper	7767297b58	[RISCV] Test for D155140. NFC The vmv1r.v v8, v9 in the last block can be removed by late copy propagation. Reviewed By: wangpc Differential Revision: https://reviews.llvm.org/D155527	2023-07-18 09:49:23 -07:00
Zhuojia Shen	b0093e13fc	[AArch64] Merge LDRSWpre-LD[U]RSW pair into LDPSWpre This patch optimizes a pair of LDRSWpre and LDRSWui (or LDURSWi) instructions into a single LDPSWpre instruction. This is a missing case in D99272. MIR test cases in D152564 are updated to verify the optimization. Differential Revision: https://reviews.llvm.org/D152407	2023-07-18 09:46:47 -07:00
Zhuojia Shen	94f76004d5	[AArch64] Add tests for merging LDRSWpre-LDR pairs This patch adds MIR test cases that test merging an LDRSWpre-LDR instruction pair into an LDPSWpre instruction. This optimization is currently missing and will be added in a subsequent patch (D152407), so all test cases are no merge for now. Differential Revision: https://reviews.llvm.org/D152564	2023-07-18 09:46:46 -07:00
Simon Pilgrim	3ad4f92f83	[DAG] More aggressively (extract_vector_elt (build_vector x, y), c) iff element is zero constant We currently don't extract vector elements from multi-use build vectors unless TLI.aggressivelyPreferBuildVectorSources accepts them, which seems a little extreme for constant build vectors (especially as under some cases ComputeKnownBits will indirectly extract the data for us). This is causing a few regressions in some upcoming SimplifyDemandedBits work I'm looking at, all of which just need to know that the element is zero, so I've tweaked the fold to accept zero elements as well, which will typically fold very easily. Differential Revision: https://reviews.llvm.org/D155582	2023-07-18 17:31:34 +01:00
Simon Pilgrim	b8bda50932	[Sparc] Regenerate float-constants.ll test checks	2023-07-18 17:31:34 +01:00
Dinar Temirbulatov	fe22b9050c	[AArch64] Regenerate a couple of vector-shuffle tests. NFC As a request in https://reviews.llvm.org/D152205	2023-07-18 16:04:15 +00:00
Martin Storsjö	793a349e6f	Revert "[AArch64] Fix an immediate out of range for large realignments on Windows" This reverts commit b1d0bc0f4395c69097bc11b6ba8f821f621272a9. Builds with expensive checks show that 'sp' isn't a valid register in ADDXrr - an object file built without expensive checks enabled disassembles as "add x15, xzr, x16", instead of the intended "add x15, sp, x16".	2023-07-18 18:21:23 +03:00
John Brawn	343e204a52	[ARM] Replace TransferImpOps with copyImplicitOps In most places where TransferImpOps is currently used we just have one machine instruction, so it's doing the same thing as copyImplicitOps anyway. In those cases where we have more than one machine instruction the destination is written to in each instruction so any implicit defs should appear on all of them (and we shouldn't see any implicit refs as these pseudo-instruction don't have any register inputs), meaning the current use of TransferImpOps is incorrect and we should be using copyImplicitOps on all of the generated instructions. Differential Revision: https://reviews.llvm.org/D155301	2023-07-18 14:01:04 +01:00
Martin Storsjö	b1d0bc0f43	[AArch64] Fix an immediate out of range for large realignments on Windows Also add a missing FrameSetup flag on the existing add instruction. This fixes https://github.com/llvm/llvm-project/issues/63701. Differential Revision: https://reviews.llvm.org/D155447	2023-07-18 15:56:36 +03:00
Luke Lau	5eb7191421	[RISCV] Add VP patterns for vandn.[vv,vx] This builds upon D155433 Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D155434	2023-07-18 13:53:17 +01:00
Luke Lau	26ff4c6745	[RISCV] Add SDNode patterns for vandn.[vv,vx] Unfortunately we can't use the standard splat_vector and vnot PatFrags because they are preprocessed to vmv.v.x's, so we need to define helpers to catch those. We can't use SplatPat either because we need to nest another fragment inside of it. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D155433	2023-07-18 13:53:14 +01:00
Matt Arsenault	c28e09c8d1	AMDGPU: Preserve flags in fdiv_fast lowering We were dropping the flags and thus blocking contract into potential fadd users. GlobalISel was already preserving the flags here. https://reviews.llvm.org/D155443	2023-07-18 06:57:07 -04:00
Matt Arsenault	4a81283b94	AMDGPU: Generate and add fdiv tests Prepare for new lowering strategies because we somehow didn't have enough of them already.	2023-07-18 06:38:05 -04:00
Matt Arsenault	cdfdfe7ccc	AMDGPU: Add some additional rcp/rsq tests	2023-07-18 06:37:15 -04:00
Sander de Smalen	08fd44b300	[AArch64] Force streaming-compatible codegen when attributes are set. Before this patch, the only way to generate streaming-compatible code was to use the `-force-streaming-compatible-sve` flag, but the compiler should also avoid the use of instructions invalid in streaming mode when a function has the aarch64_pstate_sm_enabled/compatible attribute. Reviewed By: paulwalker-arm, david-arm Differential Revision: https://reviews.llvm.org/D155428	2023-07-18 10:26:00 +00:00
Matt Arsenault	3f8ef57bed	MachineSink: Fix sinking VGPR def out of a divergent loop This fixes sinking a VGPR def out of a loop past the reconvergence point at the SI_END_CF. There was a prior fix which introduced blockPrologueInterferes (D121277) to fix the same basic problem for the post RA sink. This also had the special case isIgnorableUse case which was incorrect, because in some contexts the exec use is not ignorable. I'm thinking about a new way to represent this which will avoid needing hasIgnorableUse and isBasicBlockPrologue, which would function more like the exception handling. Fixes: SWDEV-407790 https://reviews.llvm.org/D155343	2023-07-18 06:15:50 -04:00
Matt Arsenault	d5ab379506	AMDGPU: Add baseline test for broken machine sinking	2023-07-18 06:15:50 -04:00
David Green	4214f15660	[AArch64] Regenerate a couple of mir GlobalISel tests. NFC See D155311	2023-07-18 08:28:27 +01:00
LiaoChunyu	65ffcc099c	[RISCV] Lower VP_CTLZ_ZERO_UNDEF/VP_CTTZ_ZERO_UNDEF/VP_CTLZ by converting to FP and extracting the exponent. D111904, D141585 made RISC-V customized lower vector ISD::CTLZ_ZERO_UNDEF/CTTZ_ZERO_UNDEF/CTLZ by converting to float and using the float result. Perhaps VP_CTLZ_ZERO_UNDEF/VP_CTTZ_ZERO_UNDEF/VP_CTLZ could use the similar feature. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D155150	2023-07-18 15:25:59 +08:00
Konstantina Mitropoulou	4c42ab1199	[DAGCombiner] Change foldAndOrOfSETCC() to optimize and/or patterns CMP(A,C)||CMP(B,C) => CMP(MIN/MAX(A,B), C) CMP(A,C)&&CMP(B,C) => CMP(MIN/MAX(A,B), C) This first patch handles integer types. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D153502	2023-07-17 17:13:47 -07:00
Konstantina Mitropoulou	11cd92a70f	[NFC] Tests for future commit in DAGCombiner Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D153479	2023-07-17 17:08:32 -07:00
Matt Arsenault	04185f0b0b	AMDGPU: Fix broken denormal constant folding of canonicalize This needs to consider the dynamic denormal mode. It should be possible to implement a runtime DAZ check with a canonicalize.	2023-07-17 19:54:20 -04:00
Matt Arsenault	bd2dca0f74	AMDGPU: Use hex floats instead of ugly bitcasting	2023-07-17 19:54:04 -04:00
Matt Arsenault	467df9c591	AMDGPU: Split and convert some rcp and rsq tests to generated checks	2023-07-17 17:34:29 -04:00
Matt Arsenault	296e24cd2e	DAG: Constant fold frexp nodes Special casing the nonfinite exponent value everywhere is kind of annoying.	2023-07-17 17:34:29 -04:00
Amy Kwan	8e0e442c1d	[AIX][TLS] Account for local-exec accesses in XCOFFObjectWriter This is a follow up to D149722 and aims to address https://github.com/llvm/llvm-project/issues/63885. Local-exec accesses were not previously accounted for in XCOFFObjectWriter. Specifically, the R_TLS_LE relocation was not previously handled, which led to the incorrect value being written for the relocation target. Within this patch, the value being written is set to the symbol's virtual address and extra relocation tests are added. Differential Revision: https://reviews.llvm.org/D155415	2023-07-17 12:15:44 -05:00
Simon Pilgrim	78be5aebaa	[X86] Regenerate tail-call-casts.ll test coverage	2023-07-17 17:58:37 +01:00
Craig Topper	a64b3e92c7	[RISCV] Re-define sha256, Zksed, and Zksh intrinsics to use i32 types. Previously we returned i32 on RV32 and i64 on RV64. The instructions only consume 32 bits and only produce 32 bits. For RV64, the result is sign extended to 64 bits like *W instructions. This patch removes this detail from the interface to improve portability and consistency. This matches the proposal for scalar intrinsics here https://github.com/riscv-non-isa/riscv-c-api-doc/pull/44 I've included IR autoupgrade support as well. I'll be doing this for other builtins/intrinsics that currently use 'long' in other patches. Reviewed By: VincentWu Differential Revision: https://reviews.llvm.org/D154647	2023-07-17 08:58:29 -07:00
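
The foldAndOrOfSETCC entry (4c42ab1199) rewrites two integer comparisons against the same operand C, joined by and/or, into a single comparison of MIN/MAX(A,B) with C. The sketch below is not the DAGCombiner code. It only brute-force checks the integer identity behind that rewrite for strict less-than and greater-than. The table of which predicate and logic operator selects min or max, and the names FOLDS, PREDICATES, unfolded, folded and identity_holds, are assumptions made for illustration.

```python
from itertools import product
import operator

# Assumed mapping (illustration only): for each (predicate, logic op),
# which of min/max makes the single folded comparison equivalent.
FOLDS = {
    ("lt", "or"): min,   # (a < c) or  (b < c)  <=>  min(a, b) < c
    ("lt", "and"): max,  # (a < c) and (b < c)  <=>  max(a, b) < c
    ("gt", "or"): max,   # (a > c) or  (b > c)  <=>  max(a, b) > c
    ("gt", "and"): min,  # (a > c) and (b > c)  <=>  min(a, b) > c
}
PREDICATES = {"lt": operator.lt, "gt": operator.gt}


def unfolded(pred, logic, a, b, c):
    """Evaluate the original form: two comparisons joined by and/or."""
    cmp = PREDICATES[pred]
    if logic == "or":
        return cmp(a, c) or cmp(b, c)
    return cmp(a, c) and cmp(b, c)


def folded(pred, logic, a, b, c):
    """Evaluate the folded form: one comparison of min/max(a, b) with c."""
    cmp = PREDICATES[pred]
    pick = FOLDS[(pred, logic)]
    return cmp(pick(a, b), c)


def identity_holds(values=range(-4, 5)):
    """Exhaustively compare both forms over a small signed integer range."""
    return all(
        unfolded(pred, logic, a, b, c) == folded(pred, logic, a, b, c)
        for (pred, logic) in FOLDS
        for a, b, c in product(values, repeat=3)
    )
```

Calling `identity_holds()` compares the two forms for every triple in the chosen range. The check covers signed Python integers only and says nothing about which predicates or types the actual patch handles.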

52796 Commits