llvm-project

Author	SHA1	Message	Date
Daniel Hoekwater	866ae69cfa	[AArch64] [BranchRelaxation] Optimize for hot code size in AArch64 branch relaxation On AArch64, it is safe to let the linker handle relaxation of unconditional branches; in most cases, the destination is within range, and the linker doesn't need to do anything. If the linker does insert fixup code, it clobbers the x16 inter-procedural register, so x16 must be available across the branch before linking. If x16 isn't available, but some other register is, we can relax the branch either by spilling x16 OR using the free register for a manually-inserted indirect branch. This patch builds on D145211. While that patch is for correctness, this one is for performance of the common case. As noted in https://reviews.llvm.org/D145211#4537173, we can trust the linker to relax cross-section unconditional branches across which x16 is available. Programs that use machine function splitting care most about the performance of hot code at the expense of the performance of cold code, so we prioritize minimizing hot code size. Here's a breakdown of the cases: Hot -> Cold [x16 is free across the branch] Do nothing; let the linker relax the branch. Cold -> Hot [x16 is free across the branch] Do nothing; let the linker relax the branch. Hot -> Cold [x16 used across the branch, but there is a free register] Spill x16; let the linker relax the branch. Spilling requires fewer instructions than manually inserting an indirect branch. Cold -> Hot [x16 used across the branch, but there is a free register] Manually insert an indirect branch. Spilling would require adding a restore block in the hot section. Hot -> Cold [No free regs] Spill x16; let the linker relax the branch. Cold -> Hot [No free regs] Spill x16 and put the restore block at the end of the hot function; let the linker relax the branch. Ex: [Hot section] func.hot: ... hot code... func.restore: ... restore x16 ... B func.hot [Cold section] func.cold: ... spill x16 ... B func.restore Putting the restore block at the end of the function instead of just before the destination increases the cost of executing the store, but it avoids putting cold code in the middle of hot code. Since the restore is very rarely taken, this is a worthwhile tradeoff. Differential Revision: https://reviews.llvm.org/D156767	2023-09-06 20:44:40 +00:00
Florian Mayer	42a1d16179	Revert "[AMDGPU] Cope with SelectionDAG::UpdateNodeOperands returning a different SDNode (#65340)" This reverts commit 11171d81aeafb0c2818f288900423e366a2787fc. Broke ASAN bot.	2023-09-06 13:16:55 -07:00
Vladislav Dzhidzhoev	c39edd7b53	[AArch64][GlobalISel] Regenerate prelegalizercombiner-shuffle-vector.mir	2023-09-06 18:38:13 +02:00
Craig Topper	bb810d8fa0	[RISCV] Disable machine verifier in gisel-commandline-option.ll. NFC Hopefully this fixes the expensive checks build.	2023-09-06 09:32:32 -07:00
Simon Pilgrim	e4d0e12099	[DAG] Fold (shl (sext (add_nsw x, c1)), c2) -> (add (shl (sext x), c2), c1 << c2) (REAPPLIED) Assuming the ADD is nsw then it may be sign-extended to merge with a SHL op in a similar fold to the existing (shl (add x, c1), c2) -> (add (shl x, c2), c1 << c2) fold. This is most useful for helping to expose address math for X86, but has also touched several aarch64 test cases as well. Alive2: https://alive2.llvm.org/ce/z/2UpSbJ Differential Revision: https://reviews.llvm.org/D159198	2023-09-06 13:19:42 +01:00
Jay Foad	11171d81ae	[AMDGPU] Cope with SelectionDAG::UpdateNodeOperands returning a different SDNode (#65340) SITargetLowering::adjustWritemask calls SelectionDAG::UpdateNodeOperands to update an EXTRACT_SUBREG node in-place to refer to a new IMAGE_LOAD instruction, before we delete the old IMAGE_LOAD instruction. But UpdateNodeOperands can do CSE on the fly and return a different EXTRACT_SUBREG node, so the original EXTRACT_SUBREG node would still exist and would refer to the old deleted IMAGE_LOAD instruction. This caused errors like: t31: v3i32,ch = <<Deleted Node!>> # D:1 This target-independent node should have been selected! UNREACHABLE executed at lib/CodeGen/SelectionDAG/InstrEmitter.cpp:1209! Fix it by detecting the CSE case and replacing all uses of the original EXTRACT_SUBREG node with the CSE'd one.	2023-09-06 12:51:44 +01:00
Luke Lau	74f985b793	[RISCV] Remove -riscv-v-vector-bits-min in tests. NFC (#65404) V implies Zvl128b, but a lot of the fixed vector tests also redundantly specify -riscv-v-vector-bits-min=128. This patch removes them where there isn't another minimum vlen being tested for, and for cases where Zve* is being used Zvl128b was added to maintain the old test diff (and because an awkward vlen probably isn't interesting to test for). Other places where -riscv-v-vector-bits-min was being used were replaced with Zvl.	2023-09-06 10:43:41 +01:00
Dmitri Gribenko	97bf104d97	Revert "[DAG] Fold (shl (sext (add_nsw x, c1)), c2) -> (add (shl (sext x), c2), c1 << c2)" This reverts commit b027ce0ab93060bc6cb79d5402d21520e8b93fb7. This commit breaks Transforms/InferAddressSpaces/AMDGPU/flat_atomic.ll.	2023-09-06 11:28:55 +02:00
Simon Pilgrim	b027ce0ab9	[DAG] Fold (shl (sext (add_nsw x, c1)), c2) -> (add (shl (sext x), c2), c1 << c2) Assuming the ADD is nsw then it may be sign-extended to merge with a SHL op in a similar fold to the existing (shl (add x, c1), c2) -> (add (shl x, c2), c1 << c2) fold. This is most useful for helping to expose address math for X86, but has also touched several aarch64 test cases as well. Alive2: https://alive2.llvm.org/ce/z/2UpSbJ Differential Revision: https://reviews.llvm.org/D159198	2023-09-06 10:06:21 +01:00
Kito Cheng	af9b25f9db	[RISCV] Optimize floating point scalar move and splat In D158086, we prevented all floating-point scalar moves and splats from fusing vsetvli with a different SEW. This patch relaxes that constraint as far as possible by introducing a new SEW demand type, SEWGreaterThanOrEqualAndLessThan64, which allows fusing with a larger SEW but not with SEW=64. Reviewed By: rogfer01 Differential Revision: https://reviews.llvm.org/D158177	2023-09-06 16:39:30 +08:00
laichunfeng	71b5f57f0d	[RISCV] Adjust first sp size to use c.addi16sp. addi sp, sp, 512 may be used to recover the sp in the epilogue when the stack size is larger than 2047 (2^11 - 1). However, it cannot be compressed using the C extension, whereas addi sp, sp, 496 can, so try to use 496 as the first sp adjustment amount if the function doesn't need extra instructions after the adjustment. Reviewed By: wangpc Differential Revision: https://reviews.llvm.org/D159431	2023-09-06 14:26:52 +08:00
Ting Wang	71be020dda	[SelectionDAG][PowerPC] Memset reuse vector element for tail store On PPC there are instructions to store an element from a vector (e.g. stxsdx/stxsiwx), and these instructions can be leveraged to avoid a tail constant in memset and constant splat array initialization. This patch tries to explore these opportunities. Reviewed By: shchenz Differential Revision: https://reviews.llvm.org/D138883	2023-09-06 01:52:38 -04:00
Pravin Jagtap	b230472f22	[AMDGPU] Extend v2i16 & v2f16 support for llvm.amdgcn.update.dpp intr (#65318) Authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com>	2023-09-06 10:20:34 +05:30
Craig Topper	2a7b8ab07c	[RISCV] Use add.uw for (or (and X, 0xFFFFFFFF), Y) if Y has zeroes in the lower 32 bits. (#65402)	2023-09-05 21:05:53 -07:00
Amara Emerson	6c31f20fee	[GlobalISel] Fold fmul x, 1.0 -> x (#65379)	2023-09-06 03:14:16 +08:00
Amy Kwan	f0b2f69541	[AIX][TLS] Generate .extern and .ref references to __tls_get_addr for local-exec accesses. Compiling with TLS variables requires -pthread, but if the user omits this option, the compiler will not show any obvious indication during compilation that -pthread is needed for programs using TLS variables. Instead, the user will experience a segmentation fault when running programs with TLS variables in them and without specifying -pthread. This patch aims to generate .extern/.ref references to __tls_get_addr[DS] for local-exec accesses, in order to trigger an error from the linker to indicate that there is an undefined symbol to __tls_get_addr. Doing so will remind the user to compile/link with -pthread. Differential Revision: https://reviews.llvm.org/D151335	2023-09-05 12:15:14 -05:00
Simon Pilgrim	e086e0aeef	[X86] Add test coverage for new smulo folds added in D159406 Pulled from the InstCombine with_overflow.ll tests	2023-09-05 17:43:42 +01:00
Philip Reames	de34d39b66	[RISCV] Cap build vector cost to avoid quadratic cost at high LMULs Each vslide1down operation is linear in LMUL on common hardware. (For instance, the sifive-x280 cost model models slides this way.) If we do a VL unique inserts, each with a cost linear in LMUL, the overall cost is O(VL*LMUL). Since VL is a linear function of LMUL, this means the current lowering is quadratic in both LMUL and VL. To avoid the degenerate case, fall back to the stack if the cost is more than a fixed (linear) threshold. For context, here's the sifive-x280 llvm-mca results for the current lowering and stack based lowering for each LMUL (using e64). Assumes code was compiled for V (i.e. zvl128b). buildvector_m1_via_stack.mca:Total Cycles: 1904 buildvector_m2_via_stack.mca:Total Cycles: 2104 buildvector_m4_via_stack.mca:Total Cycles: 2504 buildvector_m8_via_stack.mca:Total Cycles: 3304 buildvector_m1_via_vslide1down.mca:Total Cycles: 804 buildvector_m2_via_vslide1down.mca:Total Cycles: 1604 buildvector_m4_via_vslide1down.mca:Total Cycles: 6400 buildvector_m8_via_vslide1down.mca:Total Cycles: 25599 There are other schemes we could use to cap the cost. The next best is recursive decomposition of the vector into smaller LMULs. That's still quadratic, but with a better constant. However, stack based seems to cost better on all LMULs, so we can just go with the simpler scheme. Arguably, this patch is fixing a regression introduced with my D149667 as before that change, we'd always fall back to the stack, and thus didn't have the non-linearity. Differential Revision: https://reviews.llvm.org/D159332	2023-09-05 09:03:26 -07:00
Craig Topper	fa31ce5320	[RISCV][GISel] Add gisel-commandline-option.ll similar to AArch64. (#65299) This allows us to see the pass pipeline for GlobalISel.	2023-09-05 09:01:50 -07:00
Amara Emerson	08e04209d8	[GlobalISel] Commute G_FMUL and G_FADD constant LHS to RHS. (#65298)	2023-09-05 23:48:34 +08:00
Luke Lau	2fc6fadeaf	[RISCV] Fix typo in test title. NFC	2023-09-05 15:57:18 +01:00
Vladislav Dzhidzhoev	13b7629a58	[GlobalISel][AArch64] Combine unmerge(G_EXT v, undef) to unmerge(v). When having <N x t> d1, unused = unmerge(G_EXT <2*N x t> v1, undef, N), it is possible to express it just as unused, d1 = unmerge v1. It is useful for tackling regressions in arm64-vcvt_f.ll, introduced in https://reviews.llvm.org/D144670.	2023-09-05 16:14:44 +02:00
Vladislav Dzhidzhoev	7eeeeb0cc9	Revert "[GlobalISel][AArch64] Combine unmerge(G_EXT v, undef) to unmerge(v)." This reverts commit 6b37a65264bb4e7d400d5283a65f9e8e1575f2d7. Accidentally pushed before squashing.	2023-09-05 16:13:27 +02:00
Vladislav Dzhidzhoev	0e826f0e6d	Refactored, added MIR test.	2023-09-05 16:00:48 +02:00
Vladislav Dzhidzhoev	6b37a65264	[GlobalISel][AArch64] Combine unmerge(G_EXT v, undef) to unmerge(v). When having <N x t> d1, unused = unmerge(G_EXT <2*N x t> v1, undef, N), it is possible to express it just as unused, d1 = unmerge v1. It is useful for tackling regressions in arm64-vcvt_f.ll, introduced in https://reviews.llvm.org/D144670.	2023-09-05 16:00:48 +02:00
Jingu Kang	67fc0d3d39	[AArch64] Remove copy instruction between uaddlv and dup If there are copy instructions between uaddlv and dup for transfer from gpr to fpr, try to remove them with duplane. Differential Revision: https://reviews.llvm.org/D159267	2023-09-05 14:41:28 +01:00
David Sherwood	50598f0ff4	[DAGCombiner][SVE] Add support for illegal extending masked loads In some cases where the same mask is used for multiple extending masked loads it can be more efficient to combine the zero- or sign-extend into the load even if it's not a legal or custom operation. This leads to splitting up the extending load into smaller parts, which also requires splitting the mask. For SVE at least this improves the performance of the SPEC benchmark x264 slightly on neoverse-v1 (~0.3%), and at least one other benchmark improves by around 30%. The uplift for SVE seems due to removing the dependencies (vector unpacks) introduced between the loads and the vector operations, since this should increase the level of parallelism. See tests: CodeGen/AArch64/sve-masked-ldst-sext.ll CodeGen/AArch64/sve-masked-ldst-zext.ll https://reviews.llvm.org/D159191	2023-09-05 10:41:21 +00:00
David Sherwood	64094e3e6d	[DAGCombiner] Pre-commit tests for D159191 I've added some missing tests for the following cases: 1. Zero- and sign-extends from unpacked vector types to wide, illegal types. For example, %aext = zext <vscale x 4 x i8> %a to <vscale x 4 x i64> 2. Normal loads combined with 1 3. Masked loads combined with 1 Differential Revision: https://reviews.llvm.org/D159192	2023-09-05 10:41:21 +00:00
Amara Emerson	12e4921709	[GlobalISel] Constant fold sitofp/uitofp of 0. (#65307)	2023-09-05 17:33:57 +08:00
pvanhout	844c0da777	[TableGen][GlobalISel] Add MIR Pattern Builtins Adds a new feature to MIR patterns: builtin instructions. They offer some additional capabilities that currently cannot be expressed without falling back to C++ code. There are two builtins added with this patch, but more can be added later as new needs arise: - GIReplaceReg - GIEraseRoot Depends on D158714, D158713 Reviewed By: arsenm, aemerson Differential Revision: https://reviews.llvm.org/D158975	2023-09-05 08:19:07 +02:00
Qiu Chaofan	082c5d7f63	[PowerPC] Implement builtin for mffsl mffsl is available since ISA 3.0. The builtin is named with ppc prefix to follow our convention. For targets earlier than power9, GCC generates extra code to support the functionality, while this patch does not implement such behavior. Reviewed By: nemanjai, tuliom Differential Revision: https://reviews.llvm.org/D158065	2023-09-05 11:22:09 +08:00
Nicolai Hähnle	62790a8d4a	AMDGPU: Fix test from previous commit	2023-09-05 00:31:49 +02:00
Nicolai Hähnle	f5fb6ad2e5	AMDGPU: Precommit a test file Demonstrates bad scheduling for private load/store vs. buffer intrinsics.	2023-09-05 00:17:46 +02:00
Amara Emerson	91746d15d2	[GlobalISel] Fix G_PTR_ADD immediate chain combine using the wrong im… (#65271)	2023-09-05 08:06:40 +08:00
Jay Foad	71ca53b6cf	[GlobalISel] Lower G_SHUFFLE_VECTOR with scalar result (#65275)	2023-09-04 13:32:43 -04:00
Simon Pilgrim	e6971cbc06	[X86] combine-mulo.ll - add common CHECK prefix for SSE/AVX test runs	2023-09-04 16:42:48 +01:00
Amara Emerson	f51b7992c9	[GlobalISel] Precommit a ptradd combine test.	2023-09-04 08:27:20 -07:00
Vladislav Dzhidzhoev	a15144f2ba	[AArch64][GlobalISel] Lower G_EXTRACT_VECTOR_ELT with variable indices G_EXTRACT_VECTOR_ELT instructions with non-constant indices are not selected, so they need to be lowered. Fixes https://github.com/llvm/llvm-project/issues/65049. Reviewed By: Peter Differential Revision: https://reviews.llvm.org/D159096	2023-09-04 16:19:16 +02:00
sdesmalen-arm	dbf9b93f25	[AArch64][SME] Disable tail-call optimization for __arm_locally_streaming functions. (#65258) When calling a function which requires no streaming-mode change from an __arm_locally_streaming function, LLVM would otherwise emit: // function prologue smstart b streaming_compatible_function // tail call // never an smstop	2023-09-04 15:11:22 +01:00
John Brawn	fae3f9ec4f	[ARM] Fix prologue/epilogue for pacbti-m leaf functions R12 is callee-saved in functions with pacbti-m enabled, but this is done in assignCalleeSavedSpillSlots, meaning that in determineCalleeSaves we have to manually set CanEliminateFrame. This fixes a bug where in leaf functions with no other callee-saved registers the aut instruction wouldn't be emitted and stack offsets of arguments passed on the stack would be incorrect. Differential Revision: https://reviews.llvm.org/D157865	2023-09-04 13:46:01 +01:00
Sander de Smalen	702c3f56d3	[SME] Don't scavenge a spillslot in callee-save area in presence of streaming-mode changes. If no frame-pointer is available and the compiler has scavenged a spill-slot in the callee-save area, the compiler may be forced to emit an 'addvl' inside the streaming-mode-changing call sequence when it needs to fill (reload) an FP register being passed to the call. We can avoid this entirely by disabling stack-slot scavenging when there are streaming-mode-changing call-sequences in the function. Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D159196	2023-09-04 10:14:44 +00:00
Luke Lau	6098d7d5f6	[RISCV] Lower shuffles as rotates without zvbb Now that the codegen for the expanded ISD::ROTL sequence has been improved, it's probably profitable to lower a shuffle that's a rotate to the vsll+vsrl+vor sequence to avoid a vrgather where possible, even if we don't have the vror instruction. This patch relaxes the restriction on ISD::ROTL being legal in lowerVECTOR_SHUFFLEAsRotate. It also attempts to do the lowering twice: Once if zvbb is enabled before any of the interleave/deinterleave/vmerge lowerings, and a second time unconditionally just before it falls back to the vrgather. This way it doesn't interfere with any of the above patterns that may be more profitable than the expanded ISD::ROTL sequence. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D159353	2023-09-04 09:35:12 +01:00
Amara Emerson	0065640f40	[GlobalISel] Look through a G_PTR_ADD's def register instead of its source operand's uses when looking for load/store users. This was a simple logic bug during translation of the equivalent function in SelectionDAG: ``` for (SDNode Node : N->uses()) { if (auto LoadStore = dyn_cast<MemSDNode>(Node)) { ```	2023-09-04 00:28:57 -07:00
Amara Emerson	59cbee4599	[GlobalISel] Fix an incorrect ptradd reassoc test. NFC. The lookthrough int<->ptr cast tests and code were both checking the wrong register uses. This change fixes and precommits the test to prepare for the code fix.	2023-09-04 00:28:56 -07:00
Amara Emerson	69d8ca21af	[GlobalISel] Regenerate ptradd reassociation tests checks.	2023-09-04 00:03:38 -07:00
Matt Arsenault	65b40f273f	RegAlloc: Rename MLRegalloc* files to use consistent capitalization The other regalloc related files use RegAlloc, not Regalloc.	2023-09-03 09:00:27 -04:00
Simon Pilgrim	d9ffd3219e	[X86] combineCMP - attempt to simplify KSHIFTR mask element extractions when just comparing against zero (REAPPLIED) We can just bitcast the pre-shifted mask as an integer and use TEST/BT directly. Reapplied with fix for 239ab16ec121 which didn't set the comparison type correctly	2023-09-02 17:45:17 +01:00
Simon Pilgrim	600b4634ac	[X86] Add test to check that an extracted bool element comparison is correctly extended when the bool vector is bitcast instead Thanks to @zequanwu for the reduced test case where 239ab16ec121 failed to correctly cast a compare-with-zero to the correct integer type	2023-09-02 17:34:12 +01:00
Matt Arsenault	1f52060000	AMDGPU: Use poison instead of undef in module lds pass	2023-09-02 11:33:26 -04:00
Xiang Li	c21cd168bb	[DirectX backend] Avoid generating redundant bitcast in DXILPrepareModule (#65163) When emitting a NoOp bitcast for a GEP pointer operand, use SourceElementType instead of ResultElementType. Behavior Before Change: a redundant bitcast like `bitcast ptr addrspace(3) @gs to ptr addrspace(3)` is generated for llvm/test/CodeGen/DirectX/typed_ptr.ll. Behavior After Change: no bitcast is generated. Fixes https://github.com/llvm/llvm-project/issues/65183	2023-09-01 20:08:39 -04:00
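
Note on 866ae69cfa: the case breakdown in that commit message reads naturally as a small decision table. The sketch below is only an illustration of that breakdown in Python, not the code in the AArch64 backend; the `Action` enum and `choose_relaxation` function are hypothetical names introduced here.

```python
from enum import Enum


class Action(Enum):
    # Hypothetical labels for the four outcomes listed in the commit message.
    LINKER_RELAXES = "do nothing; let the linker relax the branch"
    SPILL_X16 = "spill x16; let the linker relax the branch"
    INDIRECT_BRANCH = "manually insert an indirect branch via the free register"
    SPILL_X16_RESTORE_AT_END = "spill x16; put the restore block at the end of the hot function"


def choose_relaxation(source_is_hot: bool, x16_free: bool, other_reg_free: bool) -> Action:
    """Pick a strategy for a cross-section unconditional branch.

    source_is_hot=True models Hot -> Cold, False models Cold -> Hot.
    """
    if x16_free:
        # The linker may clobber x16 safely, in either direction.
        return Action.LINKER_RELAXES
    if source_is_hot:
        # Hot -> Cold: spilling adds fewer hot instructions than an
        # indirect branch, whether or not another register is free.
        return Action.SPILL_X16
    if other_reg_free:
        # Cold -> Hot with a free register: avoid a restore block in hot code.
        return Action.INDIRECT_BRANCH
    # Cold -> Hot with no free register: the restore block is unavoidable,
    # so it goes at the end of the hot function.
    return Action.SPILL_X16_RESTORE_AT_END


if __name__ == "__main__":
    for hot in (True, False):
        for x16 in (True, False):
            for other in (True, False):
                direction = "Hot -> Cold" if hot else "Cold -> Hot"
                print(direction, x16, other, choose_relaxation(hot, x16, other).value)
```

The ordering of the checks mirrors the commit's stated priority: hot code size is minimized first, and cold code absorbs the extra instructions where possible.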

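Note on e4d0e12099 / b027ce0ab9: the fold (shl (sext (add_nsw x, c1)), c2) -> (add (shl (sext x), c2), c1 << c2) can be sanity-checked with plain integer arithmetic. The sketch below models it in Python with 8-bit and 16-bit widths standing in for the real types; it assumes c1 is sign-extended before the shift on the folded side, and all helper names are hypothetical. The Alive2 link in the commit message remains the authoritative proof.

```python
NARROW, WIDE = 8, 16  # small stand-ins for e.g. i32 -> i64 (assumption for illustration)


def to_signed(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def wrap(value: int, bits: int) -> int:
    """Truncate value to an unsigned `bits`-bit pattern."""
    return value & ((1 << bits) - 1)


def before_fold(x: int, c1: int, c2: int):
    """shl (sext (add_nsw x, c1)), c2. Returns None if the add would overflow."""
    total = to_signed(x, NARROW) + to_signed(c1, NARROW)
    lo, hi = -(1 << (NARROW - 1)), (1 << (NARROW - 1)) - 1
    if not lo <= total <= hi:
        return None  # nsw precondition violated; the fold makes no promise here
    return wrap(total << c2, WIDE)


def after_fold(x: int, c1: int, c2: int) -> int:
    """add (shl (sext x), c2), (sext c1) << c2."""
    return wrap((to_signed(x, NARROW) << c2) + (to_signed(c1, NARROW) << c2), WIDE)


if __name__ == "__main__":
    # One concrete case: x = 100, c1 = 20, c2 = 3.
    assert before_fold(100, 20, 3) == after_fold(100, 20, 3)
    # 100 + 100 overflows a signed 8-bit add, so the precondition fails.
    assert before_fold(100, 100, 1) is None
```

The nsw flag matters because without it the narrow add could wrap before the sign extension, and the two sides would then disagree.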

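Note on 71b5f57f0d: the reason 496 compresses and 512 does not is the immediate range of c.addi16sp, which encodes a nonzero multiple of 16 in the range [-512, 496]. The predicate below is a minimal sketch of that range check; `fits_c_addi16sp` is a hypothetical helper name and not a function in the RISC-V backend.

```python
def fits_c_addi16sp(imm: int) -> bool:
    """Return True if `addi sp, sp, imm` can be encoded as c.addi16sp.

    The compressed form takes a 6-bit signed immediate scaled by 16,
    and the all-zero immediate is reserved.
    """
    return imm != 0 and imm % 16 == 0 and -512 <= imm <= 496


if __name__ == "__main__":
    for imm in (496, 512, -512, 0, 8):
        print(imm, fits_c_addi16sp(imm))
```

The range is asymmetric, so a prologue adjustment of -512 fits while the matching epilogue adjustment of +512 does not, which is why the commit prefers 496 for the first sp adjustment.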
49852 Commits