llvm-project

Author	SHA1	Message	Date
Craig Topper	fdb87640ee	[LSR][TTI][RISCV] Disable terminator folding for RISC-V. This is a partial revert of e947f953370abe8ffc8713b8f3250a3ec39599fe. It caused a miscompile in downstream testing. Spoke with Philip offline. We believe the issue is that LSR needs to make sure the Step of the other AddRec is non-zero. Reverting until Philip is back from vacation.	2023-12-27 15:13:32 -08:00
David Green	38c9390b59	[AArch64] Add an extra test for #75822 . NFC	2023-12-27 10:40:46 +00:00
Shao-Ce SUN	9f6bf00b25	[DAGCombine] Add DAG optimisation for BF16_TO_FP (#69426 ) fold bf16_to_fp(op & 0xffff) -> bf16_to_fp(op)	2023-12-27 17:20:54 +08:00
Min-Yih Hsu	2476e2a911	[M68k] Optimize for overflow arithmetics that will never overflow We lower overflow arithmetics to their M68kISD counterparts that produce results of {i16/i32, i8} in which the second result represents CCR. In the event where we're certain there won't be an overflow, for instance 8 & 16-bit multiplications, we simply use zero in replacement of the second result. This patch replaces M68kISD::CMOV that takes this kind of zero or all-ones CCR as condition value with its corresponding operand value.	2023-12-26 20:55:23 -08:00
Min-Yih Hsu	6f85075ff7	[M68k] U/SMULd32d16 are not supposed to be commutative M68k only has 16-bit x 16-bit -> 32-bit variant for multiplications taking 16-bit operands. We still define two input operands for this class of instructions, and tie the first operand to the result value. The problem is that the two operands have different register classes (DR32 and DR16) hence making these instructions commutative produces invalid MachineInstr (though the final assembly will still be correct).	2023-12-26 20:55:22 -08:00
Freddy Ye	8ddb0fcff9	[X86] Correct operand order of UWRMSR. (#76389 )	2023-12-27 09:01:55 +08:00
Min-Yih Hsu	b80e1acc8c	[M68k] Improve codegen of overflow arithmetics The codegen logic for overflow arithmetics (e.g. llvm.uadd.overflow) was a mess; overflow multiplications were not even supported. This patch cleans up the legalization of overflow arithmetics and adds support for common variants of overflow multiplications.	2023-12-26 11:08:11 -08:00
Ivan Kosarev	d51e06c73c	[AMDGPU][True16] Fix the VGPR register class for 16-bit values. (#76170 )	2023-12-26 11:34:16 +00:00
Jivan Hakobyan	1d76692cf8	[RISCV][MC] Add support for experimental Zimop extension (#75182 ) This implements experimental support for the Zimop extension as specified here: https://github.com/riscv/riscv-isa-manual/blob/main/src/zimop.adoc. This change adds only assembly support. --------- Co-authored-by: ln8-8 <lyut.nersisyan@gmail.com> Co-authored-by: ln8-8 <73429801+ln8-8@users.noreply.github.com>	2023-12-26 17:21:38 +08:00
Vettel	dc1fadef23	[MCP] Enhance MCP copy Instruction removal for special case(reapply) (#74239 ) Machine Copy Propagation Pass may lose some opportunities to further remove the redundant copy instructions during the ForwardCopyPropagateBlock procedure. When we Clobber a "Def" register, we also need to remove the record from the copy maps that indicates "Src" defined "Def" to ensure the correct semantics of the ClobberRegister function. This patch reapplies #70778 and addresses the corner case bug #73512 specific to the AMDGPU backend. Additionally, it refines the criteria for removing empty records from the copy maps, thereby enhancing overall safety. For more information, please see the C++ test case generated code in "vector.body" after the MCP Pass: https://gcc.godbolt.org/z/nK4oMaWv5.	2023-12-26 16:22:42 +08:00
Brandon Wu	64e63888dd	Recommit [RISCV] Update the interface of sifive vqmaccqoq (#74284 ) (#75768 ) The spec(https://sifive.cdn.prismic.io/sifive/60d5a660-3af0-49a3-a904-d2bbb1a21517_int8-matmul-spec.pdf) is updated.	2023-12-26 12:59:00 +08:00
Kai Luo	5cfc7b3342	[PowerPC] Add test after #75271 on PPC. NFC. (#75616 ) Demonstrate `IMPLICIT_DEF implicit-def ...` can be generated after coalescing on PPC. The case is reduced from failure in #75570. The failure is triggered after #75271 .	2023-12-26 00:21:56 +08:00
Acim Maravic	48f36c6e74	[LLVM] Make use of s_flbit_i32_b64 and s_ff1_i32_b64 (#75158 ) Update DAG ISel to support the 64-bit versions S_FF1_I32_B64 and S_FLBIT_I32_B64 --------- Co-authored-by: Acim Maravic <Acim.Maravic@amd.com>	2023-12-25 11:55:20 +01:00
Yeting Kuo	af837d44c7	[RISCV][DAG] Teach computeKnownBits consider SEW/LMUL/AVL for vsetvli. (#76158 ) This patch also adds tests whose masks are too narrow to combine. I think it can help us find bugs caused by overly large known bits.	2023-12-25 11:18:22 +08:00
Jim Lin	34727b01eb	[RISCV] Remove +experimental-zfbfmin from the testcases for Zvfbfmin intrinsics. NFC. (#76317 ) Zvfbfmin doesn't need Zfbfmin also enabled.	2023-12-25 10:04:55 +08:00
Momchil Velikov	4b6968952e	[AArch64] Implement spill/fill of predicate pair register classes (#76068 ) We are getting ICE with, e.g. ``` #include <arm_sve.h> void g(); svboolx2_t f0(int64_t i, int64_t n) { svboolx2_t r = svwhilelt_b16_x2(i, n); g(); return r; } ```	2023-12-22 15:54:12 +00:00
David Green	48b9106656	[AArch64] Add a strict fp reduction test. NFC	2023-12-22 13:25:00 +00:00
Matt Arsenault	f7c3627338	DAG: Implement promotion for strict_fpextend (#74310 ) Test is a placeholder, will be merged into the existing test after additional bugs for illegal f16 targets are fixed.	2023-12-22 17:15:52 +07:00
Matt Arsenault	0e46b49de4	Reapply "RegisterCoalescer: Add implicit-def of super register when coalescing SUBREG_TO_REG" This reverts commit c398fa009a47eb24f88383d5e911e59e70f8db86. PPC backend was fixed in 2f82662ce901c6666fceb9c6c5e0de216a1c9667	2023-12-22 16:46:22 +07:00
wangpc	59eebb40fb	[RISCV] Fix macro-fusions.mir	2023-12-22 14:49:59 +08:00
Wang Pengcheng	f9c908862a	[RISCV] Split TuneShiftedZExtFusion (#76032 ) We split `TuneShiftedZExtFusion` into three fusions to make them reusable and match the GCC implementation[1]. The zexth/zextw fusions can be reused by XiangShan[2] and other commercial processors, but shifted zero extension is not so common. `macro-fusions-veyron-v1.mir` is renamed so it is not tied to a specific processor. References: [1] https://gcc.gnu.org/pipermail/gcc-patches/2023-November/637303.html [2] https://xiangshan-doc.readthedocs.io/zh_CN/latest/frontend/decode	2023-12-22 14:37:26 +08:00
Matt Arsenault	c7952d8860	AMDGPU: Add a few more bfloat codegen tests	2023-12-22 12:31:42 +07:00
Matt Arsenault	50ed3b1ecc	AMDGPU: Workaround a divergent return value bug in test	2023-12-22 12:31:42 +07:00
Vitaly Buka	0ccc1e7acd	Revert "[AArch64] Fold more load.x into load.i with large offset" Issue #76202 This reverts commit f5687636415969e6d945659a0b78734abdfb0f06.	2023-12-21 21:12:40 -08:00
Jonas Paulsson	74a09bd1ec	[SystemZ] Test improvements for atomic load/store instructions (NFC). (#75630 ) Improve tests for atomic loads and stores, mainly by testing 128-bit atomic load and store instructions both with and w/out natural alignment.	2023-12-21 20:48:00 +01:00
Arthur Eubanks	7433b1ca3e	Reapply "[X86] Set SHF_X86_64_LARGE for globals with explicit well-known large section name (#74381 )" This reverts commit 19fff858931bf575b63a0078cc553f8f93cced20. Now that explicit large globals are handled properly in the small code model.	2023-12-21 10:51:30 -08:00
Arthur Eubanks	2366d53d8d	[X86] Fix more medium code model addressing modes (#75641 ) By looking at whether a global is large instead of looking at the code model. This also fixes references to large data in the small code model. We now always fold any 32-bit offset into the addressing mode with the large code model since it uses 64-bit relocations.	2023-12-21 10:40:56 -08:00
Tomas Matheson	7bd17212ef	Re-land "[AArch64] Codegen support for FEAT_PAuthLR" (#75947 ) This reverts commit 9f0f5587426a4ff24b240018cf8bf3acc3c566ae. Fix expensive checks failure by properly marking register def for ADR.	2023-12-21 18:32:55 +00:00
David Li	f44079db22	[ISel] Add pattern matching for depositing subreg value (#75978 ) Depositing value into the lowest byte/word is a common code pattern. This patch improves the code generation for it to avoid redundant AND and OR operations.	2023-12-21 10:18:57 -08:00
Craig Topper	0dcff0db3a	[RISCV] Add codegen support for experimental.vp.splice (#74688 ) IR intrinsics were already defined, but no codegen support had been added. I extracted this code from our downstream. Some of it may have come from https://repo.hca.bsc.es/gitlab/rferrer/llvm-epi/ originally.	2023-12-21 08:38:32 -08:00
Tomas Matheson	9f0f558742	Revert "[AArch64] Codegen support for FEAT_PAuthLR" This reverts commit 5992ce90b8c0fac06436c3c86621fbf6d5398ee5. Buildbot failures with expensive checks enabled.	2023-12-21 16:25:55 +00:00
Jay Foad	8fdfd34cd2	[AMDGPU] Remove GDS and GWS for GFX12 (#76148 )	2023-12-21 15:27:08 +00:00
Tomas Matheson	5992ce90b8	[AArch64] Codegen support for FEAT_PAuthLR - Adds a new +pc option to -mbranch-protection that will enable the use of PC as a diversifier in PAC branch protection code. - When +pauth-lr is enabled (-march=armv9.5a+pauth-lr) in combination with -mbranch-protection=pac-ret+pc, the new 9.5-a instructions (pacibsppc, retaasppc, etc) are used. Documentation for the relevant instructions can be found here: https://developer.arm.com/documentation/ddi0602/2023-09/Base-Instructions/ Co-authored-by: Lucas Prates <lucas.prates@arm.com>	2023-12-21 14:18:33 +00:00
stephenpeckham	7026086073	[XCOFF] Use RLDs to print branches even without -r (#74342 ) This presents misleading and confusing output. If you have a function defined at the beginning of an XCOFF object file, and you have a function call to an external function, the function call disassembles as a branch to the local function. That is, `void f() { f(); g();}` disassembles as >00000000 <.f>: 0: 7c 08 02 a6 mflr 0 4: 94 21 ff c0 stwu 1, -64(1) 8: 90 01 00 48 stw 0, 72(1) c: 4b ff ff f5 bl 0x0 <.f> 10: 4b ff ff f1 bl 0x0 <.f> With this PR, the second call will display: `10: 4b ff ff f1 bl 0x0 <.g> ` Using -r can help, but you still get the confusing output: >10: 4b ff ff f1 bl 0x0 <.f> 00000010: R_RBR .g	2023-12-21 08:17:32 -06:00
Paschalis Mpeis	2e3d77d6ed	[TLI] Pass replace-with-veclib works with Scalable Vectors. (#73642 ) The pass is heavily refactored. It uses the Masked variant of a TLI method when the Intrinsic operates on Scalable Vectors. Improve tests for ArmPL and SLEEF Intrinsics: - Auto-generate test `armpl-intrinsics.ll`, and use active lane mask to have shorter `shufflevector` check lines. - Update scripts now add `@llvm.compiler.used` instead of using the regex: `@[[LLVM_COMPILER_USED:[a-zA-Z0-9_$"\\.-]+]]` - Add simplifycfg pass and noalias to ensure tail folding. `noalias` attribute was added only to the `%in.ptr` parameter of the ArmPL Intrinsics.	2023-12-21 12:37:57 +00:00
zhongyunde 00443407	f568763641	[AArch64] Fold more load.x into load.i with large offset The list of load.x instructions refers to canFoldIntoAddrMode in D152828. Also support LDRSroX, which canFoldIntoAddrMode missed.	2023-12-21 18:54:15 +08:00
zhongyunde 00443407	32878c2065	[AArch64] merge index address with large offset into base address A case for this transformation: https://gcc.godbolt.org/z/nhYcWq1WE Fold mov w8, #56952 movk w8, #15, lsl #16 ldrb w0, [x0, x8] into add x0, x0, 1036288 ldrb w0, [x0, 3704] Only LDRBBroX is supported initially. Fixes https://github.com/llvm/llvm-project/issues/71917	2023-12-21 18:54:14 +08:00
zhongyunde 00443407	4bad0cb359	[AArch64] Precommit tests for PR75343, NFC	2023-12-21 18:54:14 +08:00
David Green	c0931d4950	[AArch64][GlobalISel] Lower scalarizing G_UNMERGE_VALUES to G_EXTRACT_VECTOR_ELT This adds post-legalizing lowering of G_UNMERGE_VALUES which take a vector and produce scalar values for each lane. They are converted to a G_EXTRACT_VECTOR_ELT for each lane, allowing all the existing tablegen patterns to apply to them. A couple of tablegen patterns need to be altered to make sure the type of the constant operand is known, so that the patterns are recognized under global isel. Closes #75662	2023-12-21 09:22:23 +00:00
Yeting Kuo	9b561ca044	[RISCV] Make performFP_TO_INTCombine fold with ISD::FRINT. (#76020 ) Fold (fp_to_int (frint X)) to (fcvt X) without rounding mode.	2023-12-21 15:03:36 +08:00
Brandon Wu	b3769adbc5	[RISCV] Fix wrong lmul for sf_vfnrclip (#76016 )	2023-12-21 13:24:26 +08:00
Florian Hahn	b1a5ee1feb	[ARM] Check all terms in emitPopInst when clearing Restored for LR. (#75527 ) emitPopInst checks a single function exit MBB. If other paths also exit the function and any of their terminators uses LR implicitly, it is not safe to clear the Restored bit. Check all terminators for the function before clearing Restored. This fixes a mis-compile in outlined-fn-may-clobber-lr-in-caller.ll where the machine-outliner previously introduced BLs that clobbered LR which in turn is used by the tail call return. Alternative to #73553	2023-12-20 16:56:15 +01:00
Simon Pilgrim	6ec350b483	[X86] SimplifyDemandedVectorEltsForTargetShuffle - don't simplify constant mask if it has multiple uses Avoid generating extra constant vectors	2023-12-20 15:22:48 +00:00
Hassnaa Hamdi	f3dcc0cba9	[LLVM][AArch64][tblgen]: Match clamp pattern (#75529 ) Add isel pattern to replace min(max(v1,v2),v3) with clamp. Add tests for uclamp, sclamp, bfclamp, fclamp.	2023-12-20 14:36:58 +00:00
Matt Arsenault	b01adc6bed	AMDGPU: Strengthen some bfloat tests Fix bitcast test, which was splitting apart phis intended to force bitcasts that survive all the way to selection. Disable the amdgpu-codegenprepare phi splitting, which defeats the technique of using a phi to ensure a bitcast reaches all the way to selection. Also add a variety of bfloat tests. These probably need revisiting to avoid the cast folding into argument loads. Also round out set of bfloat bitcast and ABI tests. Add codegen tests for more bf16 operations The promotion of these works contrary to the comment.	2023-12-20 19:33:45 +07:00
Matt Arsenault	9e574a3936	DAG: Fix expansion of bf16 sourced extloads Also fix assorted vector extload failures for AMDGPU.	2023-12-20 19:24:27 +07:00
Nikita Popov	bbe6c81f80	[RISCV] Add missing REQUIRES asserts to test (NFC)	2023-12-20 09:42:14 +01:00
Yeting Kuo	b7376c3196	[RISCV][NFC] Add comments and tests for frint case of performFP_TO_INT_SATCombine. (#76014 ) performFP_TO_INT_SATCombine could also serve pattern (fp_to_int_sat (frint X)).	2023-12-20 14:56:28 +08:00
Mariusz Sikora	9a41a80e76	[AMDGPU] Handle object size and bail if assume-like intrinsic is used in PromoteAllocaToVector (#68744 ) The attached test crashes without this change. We should not remove an isAssumeLikeIntrinsic instruction if it is used by another instruction.	2023-12-20 07:47:49 +01:00
Brandon Wu	fb51aae702	[RISCV] Add missing lmul info for SiFive extensions (#76006 )	2023-12-20 14:42:47 +08:00

52796 Commits