llvm-project

Author	SHA1	Message	Date
Yeaseen	96c723374a	[llvm] Remove `br i1 undef` from some `llvm/test/CodeGen` tests (#128272 )	2025-02-23 09:23:33 +00:00
Yingwei Zheng	dbd219aef4	[DAGCombiner][X86] Correctly clean up high bits in `combinei64TruncSrlAdd` (#128353 ) A counterexample for the original implementation: https://alive2.llvm.org/ce/z/7ieYLg This patch uses zext instead of anyext to fix the original issue. BTW, we should keep the low `64 - shamt` bits instead of `shamt - 32`: https://alive2.llvm.org/ce/z/ruQP_Z Some code is simplified to avoid confusion. Proof: https://alive2.llvm.org/ce/z/z_jdHD Closes https://github.com/llvm/llvm-project/issues/128309.	2025-02-23 12:57:45 +08:00
Matt Arsenault	ccad5e7744	AMDGPU: Respect amdgpu-no-agpr in functions and with calls (#128147 ) Remove the MIR scan to detect whether AGPRs are used or not, and the special case for callable functions. This behavior was confusing, and not overridable. The amdgpu-no-agpr attribute was intended to avoid this imprecise heuristic for how many AGPRs to allocate. It was also too confusing to make this interact with the pending amdgpu-num-agpr replacement for amdgpu-no-agpr. Also adds an xfail-ish test where the register allocator asserts after allocation fails which I ran into. Future work should reintroduce a more refined MIR scan to estimate AGPR pressure for how to split AGPRs and VGPRs.	2025-02-23 09:00:37 +07:00
Vitaly Buka	50b0669e84	Revert "[X86] combineBROADCAST_LOAD - merge across chains" (#128380 ) Reverts llvm/llvm-project#128209. It introduces "AddressSanitizer: use-after-poison".	2025-02-22 16:15:41 -08:00
Craig Topper	bac6e7b651	[RISCV][VLOpt] Put vmclr/vmset back in the RISCVVPseudo table. (#128293 ) This allows them to be supported by the VLOptimizer.	2025-02-22 15:30:35 -08:00
Craig Topper	9e8d11d2df	[X86] Check that the type is integer before calling isUnsignedIntSetCC in combineExtSetcc. (#128263 ) SETULT can be an unsigned less than integer compare or an unordered less than FP compare. We need to check the VT to distinguish them. Fixes one of the issues from #128237.	2025-02-22 10:10:51 -08:00
Simon Pilgrim	e21a1737f3	[X86] combineBROADCAST_LOAD - merge across chains (#128209 ) Remove the restriction when reusing wider BROADCAST_LOAD nodes that both nodes couldn't have uses of their load chains - use makeEquivalentMemoryOrdering to merge the chains instead.	2025-02-22 15:59:25 +00:00
Phoebe Wang	fa64a210b8	[X86][FP16] Adding lowerings for FP16 ISD::LRINT and ISD::LLRINT (#127382 ) Address comment in #126477	2025-02-22 21:17:26 +08:00
Luke Lau	e23ab73335	[VPlan] Don't convert widen recipes to VP intrinsics in EVL transform (#127180 ) This is a copy of #126177, since it was automatically and permanently closed because I messed up the source branch on my remote. This patch proposes to avoid converting widening recipes to VP intrinsics during the EVL transform. IIUC we initially did this to avoid `vl` toggles on RISC-V. However, we now have the RISCVVLOptimizer pass, which mostly makes this redundant. Emitting regular IR instead of VP intrinsics allows more generic optimisations, both in the middle end and DAGCombiner, and we generally have better patterns in the RISC-V backend for non-VP nodes. Sticking to regular IR instructions is likely a lot less work than reimplementing all of these optimisations for VP intrinsics, and on SPEC CPU 2017 we get noticeably better code generation.	2025-02-22 19:38:11 +08:00
Yingwei Zheng	646e4f2eed	[DAGCombiner] visitFREEZE: Early exit when N is deleted (#128161 ) `N` may get merged with existing nodes inside the loop. Early exit when it is deleted to avoid the crash. Alternative solution: use `DAGNodeDeletedListener` to refresh the value of N. Closes https://github.com/llvm/llvm-project/issues/128143.	2025-02-22 12:06:34 +08:00
Yingwei Zheng	3ec83f5774	[X86][DAGCombiner] Fix assertion failure in `combinei64TruncSrlAdd` (#128194 ) Closes https://github.com/llvm/llvm-project/issues/128158.	2025-02-22 12:05:59 +08:00
Matt Arsenault	1bb43068f1	PeepholeOpt: Allow introducing subregister uses on reg_sequence (#127052 ) This reverts d246cc618adc52fdbd69d44a2a375c8af97b6106. We now handle composing subregister extracts through reg_sequence.	2025-02-22 09:16:14 +07:00
Alex MacLean	79261d4aab	[NVPTX][InferAS] assume alloca instructions are in local AS (#121710 )	2025-02-21 14:32:54 -08:00
Brox Chen	61c6e0061c	[AMDGPU][True16][CodeGen] flat/global/scratch load/store pseudo for true16 (#127945 ) The T16D16 table is implemented in https://github.com/llvm/llvm-project/pull/127673. This is a follow-up patch that adds load/store pseudos for flat_store, global_load/global_store, and scratch_load/scratch_store in true16 mode, and updates the codegen test file.	2025-02-21 17:06:48 -05:00
Brox Chen	c896f7bdaa	[AMDGPU][True16][CodeGen] build_vector pattern in true16 (#118904 ) build_vector pattern in true16 SDAG	2025-02-21 14:02:12 -05:00
Matt Arsenault	0c50054820	Revert "RegAlloc: Fix verifier error after failed allocation (#119690 )" This reverts commit 34167f99668ce4d4d6a1fb88453a8d5b56d16ed5. Different set of verifier errors appears after other regalloc failure tests with EXPENSIVE_CHECKS.	2025-02-22 00:23:21 +07:00
Simon Pilgrim	bd034ab111	[X86] combineX86ShuffleChain - always combine to a new VPERMV node if the root shuffle was a VPERMV node (#128183 ) Similar to what we already do for VPERMV3 nodes - if we're trying to create a new unary variable shuffle and we started with a VPERMV node then always create a new one if it reduces the shuffle chain depth	2025-02-21 16:10:46 +00:00
zhijian lin	481e1eba3a	[NFC] add a pre-commit test case for patch #127121 that hoists xxsplitib out of loop (#127701 ) This is a pre-commit test case for patch https://github.com/llvm/llvm-project/pull/127121 that hoists xxsplitib out of loop	2025-02-21 10:29:52 -05:00
Matt Arsenault	34167f9966	RegAlloc: Fix verifier error after failed allocation (#119690 ) In some cases after reporting an allocation failure, this would fail the verifier. It picks the first allocatable register and assigns it, but didn't update the liveness appropriately. When VirtRegRewriter relied on the liveness to set kill flags, it would incorrectly add kill flags if there was another overlapping kill of the virtual register. We can't properly assign the register to an overlapping range, so break the liveness of the failing register (and any other interfering registers) instead. Give the virtual register dummy liveness by effectively deleting all the uses by setting them to undef. The edge case not tested here which I'm worried about is if the read of the register is a def of a subregister. I've been unable to come up with a test where this occurs. https://reviews.llvm.org/D122616	2025-02-21 22:11:51 +07:00
Simon Pilgrim	884b79a478	[X86] Relax vbroadcast(vector load X) -> vbroadcast_load(X) to all types (#128039 ) There's no need for an AVX1-only 32/64-bit scalar size limit - if the X86ISD::VBROADCAST node type is supported, X86ISD::VBROADCAST_LOAD will be as well.	2025-02-21 12:49:34 +00:00
Akshat Oke	bd16a87d05	[AMDGPU][NewPM] Port SIPostRABundler to NPM (#123717 )	2025-02-21 16:05:58 +05:30
João Gouveia	0a913b5e3a	[X86] Fold some (truncate (srl (add X, C1), C2)) patterns to (add (truncate (srl X, C2)), C1') (#126448 ) Addresses the poor codegen identified in #123239 and a few extra cases. This transformation is correct for `eq` (https://alive2.llvm.org/ce/z/qZhwtT), `ne` (https://alive2.llvm.org/ce/z/6gsmNz), `ult` (https://alive2.llvm.org/ce/z/xip_td) and `ugt` (https://alive2.llvm.org/ce/z/39XQkX). Fixes #123239	2025-02-21 17:17:09 +08:00
David Green	db9876760f	[AArch64][GlobalISel] Add some gisel test coverage for existing select tests. NFC	2025-02-21 09:15:41 +00:00
Sudharsan Veeravalli	6757cf4e6f	[RISCV] [MachineOutliner] Analyze all candidates (#127659 ) #117700 made a change from analyzing all the candidates to analyzing just the first candidate before deciding to either delete or keep all of them. Even though the candidates all have the same instructions, the basic blocks in which they are present are different and we will need to check each of them before deciding whether to keep or erase them. Particularly, `isAvailableAcrossAndOutOfSeq` checks to see if the register (x5 in this case) is available from the end of the MBB to the beginning of the candidate and not checking this for each candidate led to incorrect candidates being outlined resulting in correctness issues in a few downstream benchmarks. Similarly, deleting all the candidates if the first one is not viable will result in missed outlining opportunities.	2025-02-21 12:53:13 +05:30
Phoebe Wang	3302bef5b4	[X86] Combine FRINT + FP_TO_SINT to LRINT (#126477 ) Based on Craig's suggestion on #126217 Alive2: https://alive2.llvm.org/ce/z/9XNpWt	2025-02-21 14:44:08 +08:00
Matt Arsenault	cc46d00a86	AMDGPU: Form v2f16 minimum3/maximum3 on gfx950 (#128123 )	2025-02-21 12:11:51 +07:00
Matt Arsenault	e729dc759d	AMDGPU: Widen f16 minimum/maximum to v2f16 on gfx950 (#128121 ) Unfortunately we only have the vector versions of v2f16 minimum3 and maximum. Widen to v2f16 so we can lower as minimum3(x, y, y).	2025-02-21 12:08:49 +07:00
Pravin Jagtap	7c2ebe5dbb	AMDGPU: Restrict src0 to VGPRs only for certain cvt scale opcodes. (#127464 ) Src0 operands wider than 32 bits on cvt_scale opcodes operating on FP6/BF6/FP4 need to be restricted to take only VGPRs.	2025-02-21 07:27:25 +05:30
Alex MacLean	f83ef281b5	[NVPTX] Remove redundant addressing mode instrs (#128044 ) Remove load and store instructions which do not include an immediate, and just use the immediate variants in all cases. These variants will be emitted exactly the same when the immediate offset is 0. Removing the non-immediate versions allows for the removal of a lot of code and would make any MachineIR passes simpler.	2025-02-20 14:51:06 -08:00
Philip Reames	43f2968a02	[RISCV] Recognize VLA shift pairs from shuffle masks (#127710 ) If we have a shuffle mask which can be represented as two slides + some conditional masking, we can emit a VLA sequence which is at most O(2*LMUL). This is essentially a generalization of the existing isElementRotate, but is staged to only introduce the new match for the moment. A follow up change will start consolidating code - see the notes below. A couple of notes: 1) I'm excluding bit rotates mostly to keep the diffs manageable. 2) The existing isElementRotate logic is nearly redundant after this change. However, we have some intersection between the bit rotate and element rotate matching. To keep things simple, I left that in place for now, and will merge/cleanup in a separate change. 3) The individual asVSlideup and asVSlidedown are closely related, but the former looks through extracts and the latter changes VL. I'm leaving these in place for now, but hope to common them up a bit as well.	2025-02-20 07:50:49 -08:00
Viktoria Maximova	9ffab5637c	[SPIR-V] Initial implementation of SPV_INTEL_long_composites (#126545 ) This change introduces support of `OpTypeStructContinuedINTEL` instruction. Specification: https://github.khronos.org/SPIRV-Registry/extensions/INTEL/SPV_INTEL_long_composites.html	2025-02-20 16:09:06 +01:00
Simon Pilgrim	a03f064b60	[X86] combineX86ShufflesRecursively - peek through one use bitcasts to find additional (free) extract_subvector nodes	2025-02-20 13:49:49 +00:00
yingopq	0c809ea336	[Mips] Reserve hardware register HWR2 (#127775 ) Fixes PR https://github.com/llvm/llvm-project/pull/127553. On x86_64, readcyclecounter.ll failed to run when expensive checks were enabled, with the error "Using an undefined physical register".	2025-02-20 20:53:30 +08:00
Piotr Fusik	0a8341fdb2	[RISCV] Avoid VMNOT by swapping VMERGE operands for mask extensions (#126751 ) Fold: (select (not m), 1, 0) -> (select m, 0, 1) (select (not m), -1, 0) -> (select m, 0, -1)	2025-02-20 13:53:21 +01:00
David Green	70ed381b16	[GlobalISel][AArch64] Fix fptoi.sat lowering. (#127901 ) The SDAG version uses fminnum/fmaxnum; in converting it to fcmp+select, it appears the order of the operands was chosen badly. This switches the conditions used to keep the constant on the RHS.	2025-02-20 12:22:11 +00:00
Akshat Oke	9855d761f3	[AMDGPU][NewPM] Port SIOptimizeExecMaskingPreRA to NPM (#125351 )	2025-02-20 17:35:56 +05:30
Simon Pilgrim	505d35aad3	[X86] getFauxShuffleMask - relax one use limit for insert_subvector concat splat pattern (#127981 ) If we're splatting a subvector using an insert_subvector(insert_subvector(undef,sub,0),sub,c) pattern then permit multiuse of the sub as long as the insert_subvector nodes are the only users.	2025-02-20 12:04:41 +00:00
Simon Pilgrim	92a3192a96	[X86] vector-shuffle-v192.ll - regenerate VPTERNLOG comments	2025-02-20 11:58:45 +00:00
Simon Pilgrim	66cf2a88a4	[X86] sext-vsetcc.ll - regenerate VPTERNLOG comments	2025-02-20 11:58:45 +00:00
Piotr Fusik	9787240912	[RISCV][test] Add tests for extending negated mask	2025-02-20 11:37:03 +01:00
Dmitry Sidorov	55fa2fa348	[SPIR-V] Add SPV_INTEL_bindless_images extension (#127737 ) Adds instructions to convert unsigned integer handles to images, samplers and sampled images. Spec: https://github.com/intel/llvm/blob/sycl/sycl/doc/design/spirv-extensions/SPV_INTEL_bindless_images.asciidoc --------- Signed-off-by: Sidorov, Dmitry <dmitry.sidorov@intel.com>	2025-02-20 10:27:15 +01:00
Simon Pilgrim	62d77fcb3c	[X86] combineX86ShuffleChain - don't combine to VPERM2W/VPERM2B from just any single variable mask (#127914 ) Despite them being more expensive than other variable mask shuffles, we were combining shuffle chains to VPERM2W/VPERM2B if any shuffle in the chain was a variable shuffle - including very cheap shuffles like PSHUFB or AND mask patterns. This patch adjusts the BWI VPERMV3 threshold - it still always permits the merge if the chain (of 2 or more shuffles) contains any X86ISD::VPERMV/VPERMV3 shuffles (including DQ variants), but otherwise only reduces the depth threshold based off the number of other variable shuffles we'd fold away.	2025-02-20 09:11:29 +00:00
Diana Picus	611a648327	[AMDGPU] Add llvm.amdgcn.dead intrinsic (#123190 ) Shaders that use the llvm.amdgcn.init.whole.wave intrinsic need to explicitly preserve the inactive lanes of VGPRs of interest by adding them as dummy arguments. The code usually looks something like this: ``` define amdgcn_cs_chain void f(active vgpr args..., i32 %inactive.vgpr1, ..., i32 %inactive.vgprN) { entry: %c = call i1 @llvm.amdgcn.init.whole.wave() br i1 %c, label %shader, label %tail shader: [...] tail: %inactive.vgpr.arg1 = phi i32 [ %inactive.vgpr1, %entry], [poison, %shader] [...] ; %inactive.vgpr* then get passed into a llvm.amdgcn.cs.chain call ``` Unfortunately, this kind of phi node will get optimized away and the backend won't be able to figure out that it's ok to use the active lanes of `%inactive.vgpr*` inside `shader`. This patch fixes the issue by introducing a llvm.amdgcn.dead intrinsic, whose result can be used as a PHI operand instead of the poison. This will be selected to an IMPLICIT_DEF, which the backend can work with. At the moment, the llvm.amdgcn.dead intrinsic works only on i32 values. Support for other types can be added later if needed.	2025-02-20 09:25:48 +01:00
Luke Lau	df96b56b9f	[RISCV] Move VMV0 elimination past machine SSA opts (#126850 ) This is the follow up to #125026 that keeps mask operands in virtual register form for as long as possible throughout the backend. The diffs in this patch are from MachineCSE/MachineSink/RISCVVLOptimizer kicking in. The invariant that the mask COPY never has a subreg no longer holds after MachineCSE (it coalesces some copies), so it needed to be relaxed.	2025-02-20 12:41:05 +08:00
Luke Lau	c58011dc65	[RISCV][VLOPT] Peek through copies in checkUsers (#127656 ) Currently if a user of an instruction isn't a vector pseudo we bail. For simple non-subreg virtual COPYs, we can peek through their uses by using a worklist. This is extracted from a loop in TSVC2 (s273) that contains a fcmp + select, which produces a copy that doesn't seem to be coalesced away.	2025-02-20 12:01:06 +08:00
Matt Arsenault	37c341df28	Revert "AMDGPU: Don't canonicalize fminnum/fmaxnum if targets support IEEE fminimum(maximum)_num (#127711 )" This reverts commit 36eaf0daf5d6dd665d7c7a9ec38ea22f27709fed. This is not a sound approach to dealing with this instruction change. The new behavior is a different opcode pair, not a modifier on the existing opcode.	2025-02-20 10:19:14 +07:00
Benjamin Maxwell	f178e51747	[SDAG] Add missing ppc_fp128 ExpandFloatRes legalization for modf (#127895 ) Should fix: https://lab.llvm.org/buildbot/#/builders/72/builds/8380 (`test_modf_ppcf128` is the test case that needed the additional legalization)	2025-02-20 09:50:16 +07:00
Craig Topper	b0e24d17f2	[RISCV] Use opaque pointers in some tests. NFC (#127906 )	2025-02-19 15:16:09 -08:00
David Tellenbach	0fe0968c93	[AArch64][FEAT_CMPBR] Codegen for Armv9.6-a compare-and-branch (#116465 ) This patch adds codegen for all Armv9.6-a compare-and-branch instructions that operate on full w or x registers. The instruction variants operating on half-words (cbh) and bytes (cbb) are added in a subsequent patch. Since CB doesn't use standard 4-bit Arm condition codes but a reduced set of conditions, encoded in 3 bits, some conditions are expressed by modifying operands, namely incrementing or decrementing immediate operands and swapping register operands. To invert a CB instruction, it's therefore not enough to just modify the condition code, which doesn't play particularly well with how the backend is currently organized. We therefore introduce a number of pseudos which operate on the standard 4-bit condition codes and lower them late during codegen.	2025-02-19 13:58:20 -08:00
Craig Topper	26e375046d	Recommit "[RISCV] Add a pass to remove ADDI by reassociating to fold into load/store address. (#127151 )" Tests have been re-generated with recent scheduler changes. Original message: SelectionDAG will not reassociate adds to the end of a chain if there are multiple users of later additions. This prevents isel from folding the immediate into a load/store address. One easy way to see this is accessing an array in a struct with two different indices. An ADDI will be used to get to the start of the array then 2 different SHXADD instructions will be used to add the scaled indices. Finally the SHXADD will be used by different load instructions. We can remove the ADDI by folding the offset into each load. This patch adds a new pass that analyzes how an ADDI constant propagates through address arithmetic. If the arithmetic is only used by a load/store and the offset is small enough, we can adjust the load/store offset and remove the ADDI. This pass is placed before MachineCSE to allow cleanups if some instructions become common after removing offsets from their inputs. This pass gives ~3% improvement on dynamic instruction count on 541.leela_r and 544.nab_r from SPEC2017 for the train data set. There's a ~1% improvement on 557.xz_r.	2025-02-19 12:11:00 -08:00

57510 Commits