llvm-project

Author	SHA1	Message	Date
Simon Pilgrim	170c525d79	[X86] combineExtractVectorElt - fold extract(trunc(x),c) -> trunc(extract(x,c))	2024-04-08 11:01:19 +01:00
Pengcheng Wang	364028a1a5	[RISCV] Zimop/Zcmop are ratified Remove them from experimental. See also: https://github.com/riscv/riscv-isa-manual/blob/main/src/zimop.adoc Reviewers: kito-cheng Reviewed By: kito-cheng Pull Request: https://github.com/llvm/llvm-project/pull/87966	2024-04-08 16:40:02 +08:00
David Green	9fd2e2c2fd	[DAG][AArch64] Support masked loads/stores with nontemporal flags (#87608) SVE has some non-temporal masked loads and stores. The metadata coming from the nodes is not copied to the MMO at the moment though, meaning it will generate a normal instruction. This patch ensures that the right flags are set if the instruction has non-temporal metadata.	2024-04-08 08:53:27 +01:00
David Green	ac321cbb03	[AArch64][GlobalISel] Legalize Insert vector element (#81453) This attempts to standardize and extend some of the insert vector element lowering. Most notably: - More types are handled by splitting illegal vectors. - The index type for G_INSERT_VECTOR_ELT is canonicalized to TLI.getVectorIdxTy(), similar to extract_vector_element. - Some of the existing patterns now have the index type specified to make sure they can apply to GISel too. - The C++ selection code has been removed, relying on tablegen patterns. - G_INSERT_VECTOR_ELT with small GPR input elements are pre-selected to use an i32 type, allowing the existing patterns to apply. - Variable index inserts are lowered in post-legalizer lowering, expanding into a stack store and reload.	2024-04-08 08:44:13 +01:00
Bevin Hansson	110c22fe12	[ExpandLargeFpConvert] Support bfloat. (#87619) The conversion expansions did not properly handle bfloat types. I'm not certain that these expansions are completely correct; I don't have any experience with AMDGPU or the ability to run anything to test it. Note that it doesn't seem like AMDGPU with GlobalISel can handle fptrunc of float to bfloat, which is needed for itofp. I've omitted the GISEL run for the bfloat case. This fixes #85379.	2024-04-08 09:07:55 +02:00
Pengcheng Wang	f3b5597364	[RISCV] Use larger copies when register tuples are aligned When the encoding of register tuples is aligned, we can use a copy with larger LMUL to reduce copies. Reviewers: preames, topperc, lukel97 Reviewed By: topperc, lukel97 Pull Request: https://github.com/llvm/llvm-project/pull/84455	2024-04-08 13:24:57 +08:00
Haohai Wen	cebf77fb93	[CodeGen][DebugInfo] Add missing DebugLoc for SplitCriticalEdge (#72192) In SplitCriticalEdge, the DebugLoc of the branch instruction in the newly created MBB was set to empty. It should be set, and in most cases a proper DebugLoc can be found for it. This patch sets it to the non-empty merged DebugLoc of the current MBB's branches.	2024-04-08 09:44:34 +08:00
Philip Reames	da675b922c	[RISCV] Expand test coverage of stack offsets between 2^11 and 2^15 Adds two sets of tests. First, one for prolog/epilogue insertions where the second stack adjustment can be done with shNadd for zba. Second, a set of tests with offsets off SP in the same ranges, but also adding varying alignments.	2024-04-07 15:22:25 -07:00
Jianjian Guan	bc8726b16b	[RISCV] Support codegen of vfmv.v.f for bfloat vector with both Zvfbfmin and Zfbfmin (#87318) vfmv, vfmerge should support bfloat vector when we have both Zvfbfmin and Zfbfmin, this patch tries to support vfmv first.	2024-04-07 10:41:47 +08:00
AtariDreams	8389b3bf60	[X86] Fix typo: QWORD alignment is greater than or equal to 8, not greater than 8 (#87819) Align(8) is QWORD aligned, but this was checking to see if alignment was greater than that, when it should have been checking for being greater than OR EQUAL to Align(8). This bug was introduced in https://github.com/llvm/llvm-project/commit/6a6af30d433d7 during the transition to the Align type.	2024-04-07 08:43:13 +08:00
darkbuck	8e98435ae9	[GISel][Combine] Enhance combining on G_BUILD_VECTOR Reviewers: aemerson, arsenm Reviewed By: arsenm Pull Request: https://github.com/llvm/llvm-project/pull/87831	2024-04-06 18:33:01 -04:00
Sizov Nikita	d38bff460a	[AArch64] SimplifyDemandedBitsForTargetNode - add AArch64ISD::BICi handling (#76644) Fold BICi if all destination bits are already known to be zeroes ```llvm define <8 x i16> @haddu_known(<8 x i8> %a0, <8 x i8> %a1) { %x0 = zext <8 x i8> %a0 to <8 x i16> %x1 = zext <8 x i8> %a1 to <8 x i16> %hadd = call <8 x i16> @llvm.aarch64.neon.uhadd.v8i16(<8 x i16> %x0, <8 x i16> %x1) %res = and <8 x i16> %hadd, <i16 511, i16 511, i16 511, i16 511, i16 511, i16 511, i16 511, i16 511> ret <8 x i16> %res } declare <8 x i16> @llvm.aarch64.neon.uhadd.v8i16(<8 x i16>, <8 x i16>) ``` ``` haddu_known: // @haddu_known ushll v0.8h, v0.8b, #0 ushll v1.8h, v1.8b, #0 uhadd v0.8h, v0.8h, v1.8h bic v0.8h, #254, lsl #8 <-- this one will be removed as we know high bits are zero extended ret ``` Fixes #53881 Fixes #53622	2024-04-06 21:41:24 +01:00
Matt Arsenault	4cb110a84f	[RFC] IR: Support atomicrmw FP ops with vector types (#86796) Allow using atomicrmw fadd, fsub, fmin, and fmax with vectors of floating-point type. AMDGPU supports atomic fadd for <2 x half> and <2 x bfloat> on some targets and address spaces. Note this only supports the proper floating-point operations; float vector typed xchg is still not supported. cmpxchg still only supports integers, so this inserts bitcasts for the loop expansion. I have support for fp vector typed xchg, and vector of int/ptr separately implemented but I don't have an immediate need for those beyond feature consistency.	2024-04-06 15:27:45 -04:00
Amara Emerson	60fc4ac67a	[GlobalISel] Don't form anyextending atomic loads. Until we can reliably check the legality and improve our selection of these, don't form them at all.	2024-04-05 13:34:59 -07:00
Craig Topper	4abb722ffa	[RISCV] Add tests for opportunities to reassociate to form more shXadd instructions. NFC These tests consist of patterns like (sh3add Z, (add X, (slli Y, 6))) that can be reassociated to form (sh3add (sh3add Y, Z), X).	2024-04-05 12:50:48 -07:00
Craig Topper	0a6a40d62e	[RISCV] Add Zca predicate to BrccCompressOpt patterns used for MinSize. Previously we only checked for C.	2024-04-05 12:39:39 -07:00
Craig Topper	e7e78274a6	[RISCV] Remove uses of sed from compress-opt-branch.ll. NFC sed was being used to use the same test functions with eq/ne branch condition. This commit duplicates the test functions so that we have a version with each condition. This allows us to remove 2 RUN lines. I plan to add Zca testing to this file which now requires 1 new RUN line instead of 2.	2024-04-05 12:35:46 -07:00
Craig Topper	3c37f926a1	[RISCV] Fix comment in compress-opt-branch.ll to match description. NFC Test description says constant does not fit in 12 bits, but the constant used was -2048 which does fit in 12 bits. Update to -2049. Also remove uses of -NOT in favor of positive checks. One of the -NOT should have been using RESBROPT instead of "c.beqz" so that it would check for the absence of the correct instruction based on the sed replacement on the RUN line.	2024-04-05 11:52:46 -07:00
Simon Pilgrim	b861e2736a	[X86] pr45995.ll - add nounwind to silence cfi noise	2024-04-05 16:36:35 +01:00
Simon Pilgrim	6a6335fa39	[X86] bool-vector.ll - add nounwind to silence cfi noise	2024-04-05 16:36:34 +01:00
Michael Liao	a1b2f0cc44	Reland "[GlobalISel] Fix the infinite loop issue in `commute_int_constant_to_rhs`" - That test needs to disable combine rules by name and hence requires `asserts`.	2024-04-05 10:34:12 -04:00
AtariDreams	c5d000b1a8	[Thumb] Resolve FIXME: Use 'mov hi, $src; mov $dst, hi' (#81908) Consider the following: ldr r0, [r4] ldr r7, [r0, #4] cmp r7, r3 bhi .LBB0_6 cmp r0, r2 push {r0} pop {r4} bne .LBB0_3 movs r0, r6 pop {r4, r5, r6, r7} pop {r1} bx r1 This is a snippet of the THUMB1 code that clang currently generates for the K&R malloc function. push {r0} is immediately popped into r4 with pop {r4}, because movs r4, r0 would destroy the flags set by the cmp right above it. The only alternative in this case is to transfer through a high register; however, LLVM does not seem to consider this a valid approach, even though clobbering a high register is free. This patch addresses the FIXME so the compiler can do that when it can use r10, r11, or r12.	2024-04-05 10:18:22 +01:00
Koakuma	697dd93ae3	[SPARC] Implement L and H inline asm argument modifiers (#87259) This adds support for using the L and H argument modifiers for twinword operands in inline asm code, such as in: ``` %1 = tail call i64 asm sideeffect "rd %pc, ${0:L} ; srlx ${0:L}, 32, ${0:H}", "={o4}"() ``` This is needed by the Linux kernel.	2024-04-05 04:34:07 +07:00
Victor Campos	74373c1bef	Revert "[ARM][Thumb2] Mark BTI-clearing instructions as scheduling region boundaries" (#87699) Reverts llvm/llvm-project#79173 The testcase fails in non-asserts builds.	2024-04-04 21:29:21 +01:00
Eli Friedman	c83f23d6ab	[AArch64] Fix heuristics for folding "lsl" into load/store ops. (#86894) The existing heuristics were assuming that every core behaves like an Apple A7, where any extend/shift costs an extra micro-op... but in reality, nothing else behaves like that. On some older Cortex designs, shifts by 1 or 4 cost extra, but all other shifts/extensions are free. On all other cores, as far as I can tell, all shifts/extensions for integer loads are free (i.e. the same cost as an unshifted load). To reflect this, this patch: - Enables aggressive folding of shifts into loads by default. - Removes the old AddrLSLFast feature, since it applies to everything except A7 (and even if you are explicitly targeting A7, we want to assume extensions are free because the code will almost always run on a newer core). - Adds a new feature AddrLSLSlow14 that applies specifically to the Cortex cores where shifts by 1 or 4 cost extra. I didn't add support for AddrLSLSlow14 on the GlobalISel side because it would require a bunch of refactoring to work correctly. Someone can pick this up as a followup.	2024-04-04 11:25:44 -07:00
Daniil Kovalev	d97d560fbf	[AArch64][PAC][MC][ELF] Support PAuth ABI compatibility tag (#85236) Depends on #87545 Emit `GNU_PROPERTY_AARCH64_FEATURE_PAUTH` property in `.note.gnu.property` section depending on `aarch64-elf-pauthabi-platform` and `aarch64-elf-pauthabi-version` llvm module flags.	2024-04-04 21:05:03 +03:00
Gulfem Savrun Yeniceri	be8fd86f6a	Revert "[GlobalISel] Fix the infinite loop issue in `commute_int_constant_to_rhs`" This reverts commit 1f01c580444ea2daef67f95ffc5fde2de5a37cec because combine-commute-int-const-lhs.mir test failed in multiple builders. https://lab.llvm.org/buildbot/#/builders/124/builds/10375 https://luci-milo.appspot.com/ui/p/fuchsia/builders/prod/clang-linux-x64/b8751607530180046481/overview	2024-04-04 16:39:31 +00:00
Craig Topper	51f1cb5355	[X86] Add or_is_add patterns for INC. (#87584) Should fix the cases noted in #86857	2024-04-04 08:04:21 -07:00
Piotr Sobczak	5b59ae423a	[DAG] Preserve NUW when reassociating (#87621) Similarly to the generic case below, preserve the NUW flag when reassociating adds with constants.	2024-04-04 16:47:25 +02:00
Simon Pilgrim	c1742525d0	[X86] evex-to-vex-compress.mir - update test checks missed in #87636	2024-04-04 15:42:29 +01:00
Victor Campos	5ad320abe3	[ARM][Thumb2] Mark BTI-clearing instructions as scheduling region boundaries (#79173) Following https://github.com/llvm/llvm-project/pull/68313 this patch extends the idea to M-profile PACBTI. The Machine Scheduler can reorder instructions within a scheduling region depending on the scheduling policy set. If a BTI-clearing instruction happens to partake in one such region, it might be moved around, therefore ending up where it shouldn't. The solution is to mark all BTI-clearing instructions as scheduling region boundaries. This essentially means that they must not be part of any scheduling region, and as a consequence never get moved: - PAC - PACBTI - BTI - SG Note that PAC isn't BTI-clearing, but it's replaced by PACBTI late in the compilation pipeline. As far as I know, currently it isn't possible to organically obtain code that's susceptible to the bug: - Instructions that write to SP are region boundaries. PAC seems to always be followed by the pushing of r12 to the stack, so essentially PAC is always by itself in a scheduling region. - CALL_BTI is expanded into a machine instruction bundle. Bundles are unpacked only after the last machine scheduler run. Thus setjmp and BTI can be separated only if someone deliberately runs the scheduler once more. - The BTI insertion pass is run late in the pipeline, only after the last machine scheduling has run. So once again it can be reordered only if someone deliberately runs the scheduler again. Nevertheless, one can reasonably argue that we should prevent the bug in spite of the compiler not being able to produce the required conditions for it. If things change, the compiler will be robust against this issue. The tests written for this are contrived: bogus MIR instructions have been added adjacent to the BTI-clearing instructions in order to have them inside non-trivial scheduling regions.	2024-04-04 12:44:32 +01:00
Luke Lau	4e0b8eae4c	[RISCV] Add tests for vwsll for extends > .vf2. NFC These cannot be picked up by TableGen patterns alone and need to be handled by combineBinOp_VLToVWBinOp_VL	2024-04-04 18:43:15 +08:00
Simon Pilgrim	2d0087424f	[DAG] Remove extract_vector_elt(freeze(x)), idx -> freeze(extract_vector_elt(x), idx) fold (#87480) Reverse the fold with handling inside canCreateUndefOrPoison for cases where we know that the extract index is in bounds. This exposed a number of regressions, and required some initial freeze handling of SCALAR_TO_VECTOR, which will require us to properly improve demandedelts support to handle its undef upper elements. There is still one outstanding regression to be addressed in the future - how do we want to handle folds involving frozen loads? Fixes #86968	2024-04-04 11:10:55 +01:00
Jay Foad	3cf539fb04	[AMDGPU] Combine or remove redundant waitcnts at the end of each MBB (#87539) Call generateWaitcnt unconditionally at the end of SIInsertWaitcnts::insertWaitcntInBlock. Even if we don't need to generate a new waitcnt instruction it has the effect of combining or removing redundant waitcnts that were already present. Tests show various small improvements in waitcnt placement.	2024-04-04 10:14:16 +01:00
Vyacheslav Levytskyy	47e996d89d	[SPIR-V] Fix OpVariable instructions place in a function (#87554) This PR: * fixes the placement of OpVariable instructions in a function (see https://github.com/llvm/llvm-project/issues/66261), * improves type inference, * helps avoid unneeded bitcasts when validating function calls. This makes it possible to improve existing test cases and add new ones with stricter checks. The OpVariable fix addresses the "All OpVariable instructions in a function must be the first instructions in the first block" requirement from the SPIR-V spec.	2024-04-04 10:50:35 +02:00
Luke Lau	3a7b5223a6	[DAGCombiner][RISCV] Handle truncating splats in isNeutralConstant (#87338) On RV64, we legalize zexts of i1s to (vselect m, (splat_vector i64 1), (splat_vector i64 0)), where the splat_vectors are implicitly truncating. When the vselect is used by a binop we want to pull the vselect out via foldSelectWithIdentityConstant. But because vectors with an element size < i64 will truncate, isNeutralConstant will return false. This patch handles truncating splats by getting the APInt value and truncating it. We almost don't need to do this since most of the neutral elements are either one/zero/all ones, but it will make a difference for smax and smin. I wasn't able to figure out a way to write the tests in terms of select, since we need the i1 zext legalization to create a truncating splat_vector. This supersedes #87236. Fixed vectors are unfortunately not handled by this patch (since they get legalized to _VL nodes), but they don't seem to appear in the wild.	2024-04-04 12:36:15 +08:00
Luke Lau	07d5f49186	[RISCV] Add patterns for fixed vector vwsll (#87316) Fixed vectors have their sext/zext operands legalized to _VL nodes, so we need to handle them in the patterns. This adds a riscv_ext_vl_oneuse pattern since we don't care about the type of extension used for the shift amount, and extends Low8BitsSplatPat to handle other _VL nodes. We don't actually need to check the mask or VL there since none of the _VL nodes have passthru operands. The remaining test cases that are widening from i8->i64 need to be handled by extending combineBinOp_VLToVWBinOp_VL. This also fixes Low8BitsSplatPat incorrectly checking the vector size instead of the element size to determine if the splat value might have been truncated below 8 bits.	2024-04-04 11:30:23 +08:00
darkbuck	1f01c58044	[GlobalISel] Fix the infinite loop issue in `commute_int_constant_to_rhs` - When both operands are constant, the matcher runs into an infinite loop as the commutation should be applied only when LHS is a constant and RHS is not. Reviewers: arsenm Reviewed By: arsenm Pull Request: https://github.com/llvm/llvm-project/pull/87426	2024-04-03 20:52:21 -04:00
Michael Maitland	63c925ca80	[RISCV][GISEL] Instruction selection for G_ZEXT, G_SEXT, and G_ANYEXT with scalable vector type	2024-04-03 15:56:08 -07:00
Michael Maitland	188ca374ee	[RISCV][GISEL] Regbankselect for G_ZEXT, G_SEXT, and G_ANYEXT with scalable vector type	2024-04-03 15:56:04 -07:00
Michael Maitland	35a9393a3f	[RISCV][GISEL] Instruction selection for G_ICMP	2024-04-03 15:47:34 -07:00
Michael Maitland	05f673bcef	[RISCV][GISEL] Regbank select for scalable vector G_ICMP	2024-04-03 15:47:34 -07:00
Michael Maitland	8aa3a77eaf	[RISCV][GISEL] Legalize G_ZEXT, G_SEXT, and G_ANYEXT, G_SPLAT_VECTOR, and G_ICMP for scalable vector types This patch legalizes G_ZEXT, G_SEXT, and G_ANYEXT. If the type is a legal mask type, then the instruction is legalized as the element-wise select, where the condition on the select is the mask typed source operand, and the true and false values are 1 or -1 (for zero/any-extension and sign extension) and zero. If the type is a legal integer or vector integer type, then the instruction is marked as legal. The legalization of the extends may introduce a G_SPLAT_VECTOR, which needs to be legalized in this patch for the extend test cases to pass. A G_SPLAT_VECTOR is legal if the vector type is a legal integer or floating point vector type and the source operand is sXLen type. This is because the SelectionDAG patterns only support sXLen typed ISD::SPLAT_VECTORS, and we'd like to reuse those patterns. A G_SPLAT_VECTOR is custom legalized if it has a legal s1 element vector type and s1 scalar operand. It is legalized to G_VMSET_VL or G_VMCLR_VL if the splat is all ones or all zeros respectively. In the case of a non-constant mask splat, we legalize by promoting the scalar value to s8. In order to get the s8 element vector back into s1 vector, we use a G_ICMP. In order for the splat vector and extend tests to pass, we also need to legalize G_ICMP in this patch. A G_ICMP is legal if the destination type is a legal bool vector and the LHS and RHS are legal integer vector types.	2024-04-03 15:27:15 -07:00
David Green	52ae02db40	[AArch64] Add a test for non-temporal masked loads / stores. NFC	2024-04-03 19:31:25 +01:00
Michael Maitland	07d3f2a8de	[RISCV][GISEL] Run update_mir_test_checks on llvm/test/CodeGen/RISCV/GlobalISel/legalizer/rvv/legalize-xor.mir	2024-04-03 10:37:44 -07:00
Amaury Séchet	1aedf949e0	[NFC] Automatically generate indirect-branch-tracking-eh2.ll	2024-04-03 15:22:23 +00:00
Weining Lu	0f5f931a9b	[CodeGen] Fix test after #86049	2024-04-03 22:28:02 +08:00
aniplcc	d650fcd6bf	[DAG] SimplifyDemandedVectorElts - add ISD::AVGCEILS/AVGCEILU/AVGFLOORS/AVGFLOORU nodes (#86284) Fixes #84768	2024-04-03 15:00:50 +01:00
Simon Pilgrim	2bf7ddf06f	[X86] Add vector truncation tests for nsw/nuw flags Based off #85592 - our truncation -> PACKSS/PACKUS folds should be able to use the nsw/nuw flags to recognise when we don't need to mask/sext_inreg prior to the PACKSS/PACKUS nodes.	2024-04-03 13:35:55 +01:00
AinsleySnow	52b18430ae	[VP][DAGCombine] Use `simplifySelect` when combining vp.select. (#87342) Hi all, This patch is a follow-up of #79101. It migrates logic from `visitVSELECT` to `visitVP_SELECT` to simplify `vp.select`. With this patch we can do the following combinations: ``` vp.select undef, T, F --> T (if T is a constant), F otherwise vp.select <condition>, undef, F --> F vp.select <condition>, T, undef --> T vp.select false, T, F --> F vp.select <condition>, T, T --> T ``` I'm a total newbie to llvm and I'm sure there's room for improvements in this patch. Please let me know if you have any advice. Thank you in advance!	2024-04-03 07:45:50 -04:00
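
Commit 4abb722ffa only adds tests, but it names a concrete rewrite: (sh3add Z, (add X, (slli Y, 6))) can be reassociated to (sh3add (sh3add Y, Z), X). The C++ sketch below checks why the two forms agree. It assumes the Zba semantics shNadd rd, rs1, rs2 = (rs1 << N) + rs2 on 64-bit registers with wraparound. The names `shNadd`, `original`, and `reassociated` are hypothetical helpers, not LLVM APIs. Both forms expand to (Y << 6) + (Z << 3) + X, because a left shift distributes over addition modulo 2^64.

```cpp
#include <cstdint>

// Hypothetical model of Zba shNadd: rd = (rs1 << N) + rs2, wrapping modulo 2^64.
template <unsigned N>
constexpr uint64_t shNadd(uint64_t rs1, uint64_t rs2) {
  return (rs1 << N) + rs2;
}

// Pattern from the tests: (sh3add Z, (add X, (slli Y, 6)))
constexpr uint64_t original(uint64_t x, uint64_t y, uint64_t z) {
  return shNadd<3>(z, x + (y << 6));
}

// Reassociated form: (sh3add (sh3add Y, Z), X)
// ((Y << 3) + Z) << 3 == (Y << 6) + (Z << 3), so the sum is unchanged.
constexpr uint64_t reassociated(uint64_t x, uint64_t y, uint64_t z) {
  return shNadd<3>(shNadd<3>(y, z), x);
}

// Small worked case: (3 << 3) + 1 + (2 << 6) = 24 + 1 + 128 = 153.
static_assert(original(1, 2, 3) == 153, "original form");
static_assert(reassociated(1, 2, 3) == 153, "reassociated form");
```

The second form is attractive because no standalone slli remains: two shNadd instructions replace the add and the slli.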

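
The test fix in 3c37f926a1 turns on the range of a signed 12-bit immediate, which is [-2048, 2047] in two's complement. That is why -2048 still fits and -2049 is needed to exercise the "does not fit" path. The minimal C++ check below makes the boundary explicit. `fitsSignedBits` is a hypothetical helper; LLVM's own `llvm::isInt<N>` in `llvm/Support/MathExtras.h` serves the same purpose.

```cpp
#include <cstdint>

// Hypothetical helper: true if v is representable as an N-bit two's-complement
// signed integer, i.e. -(2^(N-1)) <= v <= 2^(N-1) - 1.
template <unsigned N>
constexpr bool fitsSignedBits(int64_t v) {
  static_assert(N > 0 && N < 64, "N must be in [1, 63]");
  return v >= -(INT64_C(1) << (N - 1)) && v <= (INT64_C(1) << (N - 1)) - 1;
}

// Boundary cases for a 12-bit signed immediate.
static_assert(fitsSignedBits<12>(-2048), "-2048 is the smallest 12-bit value");
static_assert(!fitsSignedBits<12>(-2049), "-2049 needs 13 bits");
static_assert(fitsSignedBits<12>(2047), "2047 is the largest 12-bit value");
static_assert(!fitsSignedBits<12>(2048), "2048 needs 13 bits");
```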

52796 Commits