llvm-project

Author	SHA1	Message	Date
Craig Topper	8677aaa1a3	[RISCV][GISel] Add initial pre-legalizer combiners copying from AArch64.	2023-09-18 10:59:00 -07:00
Jon Roelofs	83e6d2edfc	Revert "[ARM] Always lower direct calls as direct when the outliner is enabled (#66434 )" This reverts commit 003bcad9a8b21e15e3786a52b1dafa844075ab84. ARM folks say it regresses some of their benchmarks: https://github.com/llvm/llvm-project/pull/66434#issuecomment-1722424162	2023-09-18 09:45:46 -07:00
Philip Reames	0722800289	[RISCV] Match constant indices of non-index type when forming strided ops (#65777 ) When checking to see if our index expressions can be converted into strided operations, we previously gave up if the index type wasn't an exact match for the intptrty for the address. Per gep semantics, this mismatch implies a sext or trunc cast to the respective index type. For constants, go ahead and evaluate that cast instead of giving up. Note that the motivation of this is mostly test cleanup. We canonicalize at IR such that the gep index will match the intptrty. This is mostly useful so that we can write both RV32 and RV64 tests from the same source. It's also helpful in preventing confusion - I've stumbled across this at least four times now and wasted time each time. Note: The test change for scatters unit stride cases contains a minor regression for rv32 and 64 bit indices. This is an artifact of the order in which changes are landing. This will be addressed in a near future change for all configurations.	2023-09-18 09:41:34 -07:00
Philip Reames	bb7b8726a4	[RISCV] Merge some test checks rvv/fixed-vectors-masked-gather.ll [nfc]	2023-09-18 09:20:12 -07:00
pawosm-arm	be16b03e20	[AArch64] Remove the Z#_HI register definitions (#66353 ) The Z#_HI register definitions were created during the very early SVE enablement work and before the SVE calling convention was locked in. As they look entirely unused, they need to go.	2023-09-18 17:18:28 +01:00
Craig Topper	8f04d81ede	[SelectionDAG][RISCV] Mask constants to narrow size in TargetLowering::expandUnalignedStore. If the SRL for Hi constant folds, but we don't remove those bits from the Lo, we can end up with strange constant folding through DAGCombine later. I've only seen this with constants being lowered to constant pools during lowering on RISC-V.	2023-09-18 09:10:19 -07:00
Craig Topper	17a12a27ec	[RISCV] Add test case to show bad codegen for unaligned i64 store of a large constant. On the first split we create two i32 trunc stores and a srl to shift the high part down. The srl gets constant folded, but to produce a new i32 constant. But the truncstore for the low store still uses the original constant. This original constant then gets converted to a constant pool before we revisit the stores to further split them. The constant pool prevents further constant folding of the additional srls. After legalization is done, we run DAGCombiner and get some constant folding of srl via computeKnownBits which can peek through the constant pool load. This can create new constants that also need a constant pool.	2023-09-18 09:10:19 -07:00
Craig Topper	f71a9e8bb7	[SelectionDAG][RISCV][PowerPC][X86] Use TargetConstant for immediates for ISD::PREFETCH. (#66601 ) The intrinsic uses ImmArg so TargetConstant would be consistent with how other intrinsics are handled. This hides the constants from type legalization so we can remove the promotion support. isel patterns are updated accordingly.	2023-09-18 08:58:50 -07:00
Nikita Popov	38c59b9f53	Revert "Reapply [Verifier] Sanity check alloca size against DILocalVariable fragment size" This reverts commit 47324cfd7d8ca1a2a5cbb9f948ecff66a28ee6bc. This exposed incorrect debuginfo in rustc. Revert the verification until this has been fixed.	2023-09-18 17:24:53 +02:00
Jay Foad	d8d0588f66	[TwoAddressInstruction] Update LiveIntervals after INSERT_SUBREG with undef read (#66211 ) Update LiveIntervals after rewriting: %reg = INSERT_SUBREG undef %reg, %subreg, subidx to: undef %reg:subidx = COPY %subreg D113044 implemented this for the non-undef case.	2023-09-18 14:51:58 +01:00
Nikita Popov	4491f0b969	[IR] Remove unnecessary bitcast from CreateMalloc() This bitcast is no longer necessary with opaque pointers. This results in some annoying variable name changes in tests.	2023-09-18 14:58:16 +02:00
Sergei Barannikov	caaf61eb6e	[SDag] Fold saddo[_carry] with bitwise-not argument to ssubo[_carry] (#66571 ) Fold `(saddo (not a), 1)` to `(ssubo 0, a)` and `(saddo_carry (not a), b, c)` to `(ssubo_carry b, a, !c)`. Proof: https://alive2.llvm.org/ce/z/Lj49YM This is the same as https://reviews.llvm.org/D46505 and https://reviews.llvm.org/D59208, but for signed opcodes.	2023-09-18 14:45:41 +03:00
Jay Foad	102838d3f6	update_mir_test_checks.py: match undef vreg subreg definitions (#66627 ) Following on from D139466 which added support for dead vreg defs, this patch adds support for "undef" defs of subregs. Use this to regenerate checks for amx-greedy-ra-spill-shape.ll which previously required manual tweaks to the autogenerated checks to fix an EXPENSIVE_CHECKS failure; see commit 8b7c1fbd9647a5a6ef246a6b5b2543ea0f5a2337	2023-09-18 12:14:46 +01:00
Piyou Chen	b83a1ed594	[RISCV] Only emit .option when extension is supported Previously, the .option directive could be emitted without any follow-up. Only emit the .option push/pop when the supported extensions differ between the function and the module. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D159399	2023-09-18 00:30:13 -07:00
Piyou Chen	d861b3183c	[RISCV][NFC] precommit for D159399 Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D159400	2023-09-18 00:18:08 -07:00
wangpc	cedf2ea7b5	[RISCV] Teach RISCVMergeBaseOffset to handle BlockAddress We can get `BlockAddress` in user code via `Labels as Values` so we should be able to merge the access to `BlockAddress`. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D159429	2023-09-18 11:47:14 +08:00
wangpc	28efe4d38e	[RISCV] Add tests for merging base offset of BlockAddress We can get `BlockAddress` in user code via `Labels as Values`. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D159428	2023-09-18 11:47:13 +08:00
Vettel	ddae50d1e6	[RISCV] Combine trunc (sra sext (x), zext (y)) to sra (x, smin (y, scalarsizeinbits(y) - 1)) (#65728 ) For RVV, if we want to perform an i8 or i16 element-wise vector arithmetic right shift in the source C/C++ program, the value to be shifted is first sign-extended to i32, and the shift amount is zero-extended to i32 to perform the vsra.vv instruction, followed by a truncate to get the final result. This pattern is later expanded to a series of "vsetvli" and "vnsrl" instructions, because the RVV spec only supports 2 * SEW -> SEW truncates. But for vectors, the shift amount can also be determined by smin (Y, ScalarSizeInBits(Y) - 1). Also, for the vsra instruction, we only care about the low lg2(SEW) bits as the shift amount. - Alive2: https://alive2.llvm.org/ce/z/u3-Zdr - C++ Test cases : https://gcc.godbolt.org/z/q1qE7fbha	2023-09-17 17:11:28 +08:00
David Green	2861ec84fc	[AArch64][GlobalISel] Add lowering for constant BIT/BIF/BSP (#65897 ) The non-constant bit/bif/bsp already work through tablegen patterns, this patch handles the constant case, mirroring the basic support for `or(and(X, C), and(Y, ~C))` from ISel tryCombineToBSL. BSP gets expanded to either BIT, BIF or BSL depending on the best register allocation. G_BIT can be replaced with G_BSP as a more general alternative.	2023-09-17 09:50:12 +01:00
Yingwei Zheng	e042ff7eef	[SDAG][RISCV] Avoid expanding is-power-of-2 pattern on riscv32/64 with zbb This patch adjusts the legality check for riscv to use `cpop/cpopw` since `isOperationLegal(ISD::CTPOP, MVT::i32)` returns false on rv64gc_zbb. Clang vs gcc: https://godbolt.org/z/rc3s4hjPh Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D156390	2023-09-17 02:56:09 +08:00
Yingwei Zheng	b423e1f05d	[SDAG][RISCV] Avoid neg instructions when lowering atomic_load_sub with a constant rhs This patch avoids creating (sub x0, rhs) when lowering atomic_load_sub with a constant rhs. Comparison with GCC: https://godbolt.org/z/c5zPdP7j4 Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D158673	2023-09-16 17:09:41 +08:00
Philip Reames	c663401f69	[RISCV] Prefer vrgatherei16 for shuffles (#66291 ) If the data type is larger than e16, and the type requires more than an LMUL1 register class, prefer the use of vrgatherei16. This has three major benefits: 1) Less work needed to evaluate the constant for e.g. vid sequences. Remember that arithmetic generally scales linearly with LMUL. 2) Less register pressure. In particular, the source and indices registers can overlap so using a smaller index can significantly help at m8. 3) Smaller constants. We've got a bunch of tricks for materializing small constants, and if needed, can use a EEW=16 load.	2023-09-15 15:57:23 -07:00
Philip Reames	ff2622b5ac	[RISCV] Optimize gather/scatter to unit-stride memop + shuffle (#66279 ) If we have a gather or a scatter whose index describes a permutation of the lanes, we can lower this as a shuffle + a unit strided memory operation. For RISCV, this replaces a indexed load/store with a unit strided memory operation and a vrgather (at worst). I did not bother to implement the vp.scatter and vp.gather variants of these transforms because they'd only be legal when EVL was VLMAX. Given that, they should have been transformed to the non-vp variants anyways. I haven't checked to see if they actually are.	2023-09-15 15:54:32 -07:00
Craig Topper	ac182deee8	[RISCV][GlobalISel] Select ALU GPR instructions Some instruction selection patterns required for ALU GPR instructions have already been automatically imported from existing TableGen descriptions - this patch simply adds testing for them. The first of the GIComplexPatternEquiv definitions required to select the shiftMaskXLen ComplexPattern has been added. Some instructions require special handling due to i32 not being a legal type on RV64 in SelectionDAG so we can't reuse SelectionDAG patterns. Co-authored-by: Lewis Revill <lewis.revill@embecosm.com> Reviewed By: nitinjohnraj Differential Revision: https://reviews.llvm.org/D76445	2023-09-15 15:49:38 -07:00
Philip Reames	37aa07ad31	[RISCV] Move narrowIndex to be a DAG combine over target independent nodes In D154687, we added a transform to narrow indexed load/store indices of the form (shl (zext), C). We can move this into a generic transform over the target independent nodes instead, and pick up the fixed vector cases with no additional work required. This is an alternative to D158163. Performing this transform points out that we weren't eliminating zero_extends via the generic DAG combine. Adjust the (existing) callbacks so that we do. This change removes the existing transform on the target specific intrinsic nodes. If anyone has a use case this impacts, please speak up. Note: Reviewed as part of a stack of changes in PR# 66405.	2023-09-15 15:02:14 -07:00
Mircea Trofin	0af95c3262	[mlgo] Fix regalloc tests Post - D156491 or cbdccb3. Just re-based reference outputs.	2023-09-15 17:27:34 -04:00
Guozhi Wei	cbdccb30c2	[RA] Split a virtual register in cold blocks if it is not assigned preferred physical register If a virtual register is not assigned preferred physical register, it means some COPY instructions will be changed to real register move instructions. In this case we can try to split the virtual register in colder blocks, if success, the original COPY instructions can be deleted, and the new COPY instructions in colder blocks will be generated as register move instructions. It results in fewer dynamic register move instructions executed. The new test case split-reg-with-hint.ll gives an example, the hot path contains 24 instructions without this patch, now it is only 4 instructions with this patch. Differential Revision: https://reviews.llvm.org/D156491	2023-09-15 19:52:50 +00:00
Philip Reames	52b33ff760	[RISCV] Avoid toggling VL for hidden splat case in constant buildvector lowering We have the analogous case in the single insert path. The reasoning here is that if the original VL fits in LMUL1, we'd prefer to clobber a few extra dead lanes than to force two VL toggles. VTYPE toggles are generally cheaper than VL toggles.	2023-09-15 12:33:21 -07:00
Jon Roelofs	003bcad9a8	[ARM] Always lower direct calls as direct when the outliner is enabled (#66434 ) The indirect lowering hinders the outliner's ability to see that sequences are in fact common, since the sequence similarity is rendered opaque by the register callee. The size savings from making them indirect seems to be dwarfed by the outliner's savings from de-duplication. rdar://115178034 rdar://115459865	2023-09-15 10:04:56 -07:00
Vladislav Dzhidzhoev	4e970d7bd8	[AArch64][GlobalISel] Select llvm.aarch64.neon.st* intrinsics (#65491 ) Similar to llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp	2023-09-15 16:35:21 +02:00
David Green	b0f0aa852d	[AArch64] Guard against an invalid size request in performVecReduceAddCombine With both +sve and +dotprod, and a scalable vecreduce(sext) we could attempt to access the number of elements of a scalable vector. Guard against this for now, until scalable dotprod are properly supported.	2023-09-15 14:04:21 +01:00
Nikita Popov	47324cfd7d	Reapply [Verifier] Sanity check alloca size against DILocalVariable fragment size Reapply after fixing a clang bug this exposed in D158972 and adjusting a number of tests that failed for 32-bit targets. ----- Add a check that the DILocalVariable fragment size in dbg.declare does not exceed the size of the alloca. This would have caught the invalid debuginfo regenerated by rustc in https://github.com/llvm/llvm-project/issues/64149. Differential Revision: https://reviews.llvm.org/D158743	2023-09-15 14:51:50 +02:00
Vladislav Dzhidzhoev	c464896dbe	[AArch64][GlobalISel] Select llvm.aarch64.neon.ld* intrinsics (#65630 ) Similar to llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp.	2023-09-15 14:03:48 +02:00
Benjamin Kramer	3454cf67bd	Revert "[MachineLICM] Handle Subloops" This reverts commit 5ec9699c4d1f165364586d825baef434e2c110b4. It accesses MI after it has been hoisted.	2023-09-15 13:20:31 +02:00
Matthew Devereau	9bbbfbc7fd	[AArch64][SME] Emit Zero instruction for NewZA functions [The ACLE](https://github.com/ARM-software/acle/pull/268) Demands that functions with the aarch64_pstate_za_new attribute set all bits of the ZA register to zero upon entry.	2023-09-15 11:40:30 +01:00
Jay Foad	ceb68eea8c	[AMDGPU] Remove repeated -mtriple options from RUN lines (#66486 )	2023-09-15 11:29:24 +01:00
Weining Lu	0a692b6b96	[LoongArch] Fix incorrect instruction 'and' in pattern It should be `andi`, but not `and`. Address buildbot failure: https://lab.llvm.org/buildbot/#/builders/42/builds/11634	2023-09-15 16:16:06 +08:00
Martin Storsjö	7a91bbbb00	[GlobalISel] Check for unsupported Windows features on invoke (#65864 ) This matches what is done on calls, since cc981d285d1aa33df201605b9a3e22dd2311ead2 (extended for another case in 5a751e747dbf2c267e944aa961e21de7a815e7eb). Apply both those cases on invoke just like is done for call. Also update the preexisting comment which was left without update in 5a751e747dbf2c267e944aa961e21de7a815e7eb. This fixes github issue #61941.	2023-09-15 11:14:40 +03:00
Pierre van Houtryve	e9e3868707	[AMDGPU] Correctly restore FP mode in FDIV32 lowering (#66346 ) Addresses the FIXME for both DAGISel and GISel.	2023-09-15 08:11:01 +02:00
Rainer Orth	715fc4fc60	[Sparc] Don't emit __multi3 on 32-bit SPARC (#66362 ) LLVM fails to build on 32-bit Solaris/SPARC: several programs fail to link due to undefined references to `__multi3`. This reference is from `lib/libLLVMScalarOpts.a(LoopStrengthReduce.cpp.o)`. However, this function exists neither in the 32-bit `libgcc.a` nor in `libclang_rt.builtins-sparc.a`. It's only defined in their 64-bit counterparts. The same issue affects several 32-bit targets, e.g. 32-bit PowerPC as described in Issue #54460. The fix is the same: inhibit the libcall for 32-bit compilations. This patch does just that, regenerating the affected testcases. It allows the build to complete. Tested on `sparc-sun-solaris2.11`.	2023-09-15 07:31:59 +02:00
Arthur Eubanks	1feb00a28c	[X86] Introduce a large data threshold for the medium code model Currently clang's medium code model treats all data as large, putting them in a large data section and using more expensive instruction sequences to access them. This follows gcc's -mlarge-data-threshold, which allows putting data under a certain size in a normal data section as opposed to a large data section. This allows using cheaper code sequences to access some portion of data in the binary (which will be implemented in LLVM in a future patch). And under the medium code model, only put data above the large data threshold into large data sections, not all data. Reviewed By: MaskRay, rnk Differential Revision: https://reviews.llvm.org/D149288	2023-09-14 15:09:25 -07:00
Kuba (Brecka) Mracek	454cc36630	[AArch64] Relax binary format switch in AArch64MCInstLower::LowerSymbolOperand to allow non-Darwin Mach-O files (#66011 ) Trying to use a arm64-apple-none-macho target triple today crashes with an assertion, this patch fixes that.	2023-09-14 11:12:30 -07:00
Jingu Kang	5ec9699c4d	[MachineLICM] Handle Subloops Following discussion on https://reviews.llvm.org/D154205, make MachineLICM pass handle subloops with only visiting outermost loop's blocks once. Differential Revision: https://reviews.llvm.org/D154205	2023-09-14 18:07:31 +01:00
David Green	74724902ba	[AArch64] Split Ampere1Write_Arith into rr/ri and rs/rx InstRWs. (#66384 ) The ampere1 scheduling model uses IsCheapLSL predicates for ADDXri and ADDWrr instructions, which only have 3 operands. In attempting to check that the third is a shift, the predicate can attempt to access an out of bounds operand, hitting an assert. This splits the rr/ri instructions (which can never have shifts) from the rs/rx instructions to ensure they both work correctly. Ampere1Write_1cyc_1AB was chosen for the rr/ri instructions to match the cheap case. This also sets CompleteModel = 0 for the ampere1 scheduling model, as at runtime under debug it will attempt to check that as well as all instructions having scheduling info, there is information for each output operand. DefIdx 1 exceeds machine model writes for renamable $w9, renamable $w8 = LDPWi renamable $x8, 0 (Try with MCSchedModel.CompleteModel set to false) incomplete machine model	2023-09-14 16:29:30 +01:00
Manos Anagnostakis	008f26b12e	[AArch64] New subtarget features to control ldp and stp formation (#66098 ) On some AArch64 cores, including Ampere's ampere1 and ampere1a architectures, load and store pair instructions are faster compared to simple loads/stores only when the alignment of the pair is at least twice that of the individual element being loaded. Based on that, this patch introduces four new subtarget features, two for controlling ldp and two for controlling stp, to cover the ampere1 and ampere1a alignment needs and to enable optional fine-grained control over ldp and stp generation in general. The latter can be utilized by another cpu, if there are possible benefits with a different policy than the default provided by the compiler. More specifically, for each of the ldp and stp respectively we have: - disable-ldp/disable-stp: Do not emit ldp/stp. - ldp-aligned-only/stp-aligned-only: Emit ldp/stp only if the source pointer is aligned to at least double the alignment of the type. Therefore, for -mcpu=ampere1 and -mcpu=ampere1a ldp-aligned-only/stp-aligned-only become the defaults, because of the benefit from the alignment, whereas for the rest of the cpus the default behaviour of the compiler is maintained.	2023-09-14 16:58:39 +02:00
Paul Walker	c7d65e4466	[IR] Enable load/store/alloca for arrays of scalable vectors. Differential Revision: https://reviews.llvm.org/D158517	2023-09-14 13:49:01 +00:00
paulwalker-arm	8ba5820e7a	[SVE] Ensure SVE call operands passed via memory are correctly initialised. (#66070 ) The stores created when passing operands via memory don't typically maintain the chain, because they can be done in any order. Instead, a new chain is created based on all collated stores. SVE parameters passed via memory don't follow this idiom and try to maintain the chain, which unfortunately can result in them being incorrectly deadcoded when the chain is recreated. This patch brings the SVE side in line with the non-SVE side to ensure no stores become lost whilst also allowing greater flexibility when ordering the stores.	2023-09-14 12:58:58 +01:00
David Green	adc5509186	[AArch64] Add LRINT/LLRINT/LROUND/LLROUND FP16 lowering without fullfp16 (#66174 ) We apparently somehow had lowering for the STRICT nodes without any handling for the normal operations. This makes sure we support the LRINT and LROUND intrinsics for fp16 when +fullfp16 is not present.	2023-09-14 09:36:03 +01:00
Weining Lu	419f90e93a	[LoongArch] Support llvm.is.fpclass for f32 and f64 is_fpclass (fj, mask) -> sltu (r0, and (movfr2gr.[sd] (fclass.[sd] fj), (to_fclass_mask mask))) [1]: https://loongson.github.io/LoongArch-Documentation/LoongArch-Vol1-EN.html#_fclass_sd Reviewed By: wangleiat Differential Revision: https://reviews.llvm.org/D159183	2023-09-14 15:43:58 +08:00
Jianjian Guan	c31dda4e6e	[RISCV] Update Zicntr and Zihpm to version 2p0 (#66323 )	2023-09-14 15:43:50 +08:00
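
Note on caaf61eb6e above: the two folds it describes, `(saddo (not a), 1)` to `(ssubo 0, a)` and `(saddo_carry (not a), b, c)` to `(ssubo_carry b, a, !c)`, can be checked by brute force at a small bit width. The following is a minimal sketch, not LLVM code. It assumes 8-bit two's complement values and models each node as a (wrapped result, signed-overflow flag) pair; the helper names are hypothetical and only mirror the opcode names from the commit message.

```python
# Sketch only: models the SelectionDAG overflow opcodes on 8-bit integers.
# Assumption: "overflow" means the exact mathematical result does not fit
# in the signed BITS-wide range.
BITS = 8
MIN, MAX = -(1 << (BITS - 1)), (1 << (BITS - 1)) - 1


def wrap(v):
    """Wrap an integer to signed BITS-wide two's complement."""
    v &= (1 << BITS) - 1
    return v - (1 << BITS) if v > MAX else v


def saddo_carry(a, b, c):
    """Signed a + b + c with an overflow flag."""
    exact = a + b + c
    return wrap(exact), not (MIN <= exact <= MAX)


def ssubo_carry(a, b, c):
    """Signed a - b - c with an overflow flag."""
    exact = a - b - c
    return wrap(exact), not (MIN <= exact <= MAX)


def saddo(a, b):
    return saddo_carry(a, b, 0)


def ssubo(a, b):
    return ssubo_carry(a, b, 0)


def bit_not(a):
    """Bitwise not of a signed BITS-wide value."""
    return wrap(~a)


def folds_hold():
    """Exhaustively compare both sides of each fold."""
    values = range(MIN, MAX + 1)
    for a in values:
        if saddo(bit_not(a), 1) != ssubo(0, a):
            return False
        for b in values:
            for c in (0, 1):
                # !c is modelled as 1 - c for a carry bit in {0, 1}.
                if saddo_carry(bit_not(a), b, c) != ssubo_carry(b, a, 1 - c):
                    return False
    return True


print(folds_hold())
```

Under this model both sides agree for every input because `not a` equals `-a - 1`, so the exact sums on each side are the same number. The Alive2 link in the commit message remains the authoritative proof.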


50014 Commits