llvm-project

Author	SHA1	Message	Date
Yeting Kuo	ab94fbba57	[RISCV] Prefer Zcmp push/pop instead of save-restore calls. (#66046) Zcmp push/pop can reduce code size more than save-restore calls. There are two reasons: 1. A call for save-restore needs 4-8 bytes, but Zcmp push/pop only needs 2 bytes. 2. Zcmp push/pop can also handle small adjustments of sp.	2023-09-20 09:16:29 +08:00
Sergei Barannikov	dd477ebd23	[Sparc] Remove LEA instructions (NFCI) (#65850) LEA_ADDri and LEAX_ADDri are printed / encoded the same way as ADDri. I had to change the type of simm13Op so that it can be used in both 32- and 64-bit modes. This required changes to the operands of some InstAliases.	2023-09-20 03:34:39 +03:00
DianQK	96ea48ff5d	[SimplifyCFG] Hoist common instructions on Switch. Sinking common instructions is not always performance-friendly. We need to implement hoisting of common instructions on switch instructions to solve the following problem: ``` define i1 @foo(i64 %a, i64 %b, i64 %c, i64 %d) { start: %test = icmp eq i64 %a, %d br i1 %test, label %switch_bb, label %exit switch_bb: ; preds = %start switch i64 %a, label %bb0 [ i64 1, label %bb1 i64 2, label %bb2 ] bb0: ; preds = %switch_bb %0 = icmp eq i64 %b, %c br label %exit bb1: ; preds = %switch_bb %1 = icmp eq i64 %b, %c br label %exit bb2: ; preds = %switch_bb %2 = icmp eq i64 %b, %c br label %exit exit: ; preds = %bb2, %bb1, %bb0, %start %result = phi i1 [ false, %start ], [ %0, %bb0 ], [ %1, %bb1 ], [ %2, %bb2 ] ret i1 %result } ``` The pre-commit test is D156617. Reviewed By: XChy, nikic Differential Revision: https://reviews.llvm.org/D155711	2023-09-20 07:21:49 +08:00
Austin Kerbow	60a227c464	[AMDGPU] Use inreg for hint to preload kernel arguments This patch is the first in a series that adds support for pre-loading kernel arguments into SGPRs. The command-line argument 'amdgpu-kernarg-preload-count' is used to specify the number of arguments sequentially from the first that we should attempt to preload; the default is 0. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D156852	2023-09-19 15:13:38 -07:00
Mircea Trofin	d8873df4dc	[AsmPrint] Dump raw frequencies in `-mbb-profile-dump` (#66818) We were losing the function entry count, which is useful to check profile quality. For the original cases where we want entrypoint-relative MBB frequencies, the user would just need to divide these values by the entrypoint (first MBB, with ID=0) value.	2023-09-19 14:37:06 -07:00
Philip Reames	86b32c4b55	[RISCV] Match strided load via DAG combine (#66800) This change matches a masked.stride.load from a mgather node whose index operand is a strided sequence. We can reuse the VID matching from build_vector lowering for this purpose. Note that this duplicates the matching done at IR by RISCVGatherScatterLowering.cpp. Now that we can widen gathers to a wider SEW, I don't see a good way to remove this duplication. The only obvious alternative is to move the widening transform to IR, but that's a no-go as I want other DAGs to run first. I think we should just live with the duplication - particularly since the reuse of isSimpleVIDSequence means the duplication is somewhat minimal.	2023-09-19 14:10:52 -07:00
Arthur Eubanks	1a8c69176e	[X86] Use RIP-relative addressing for data under large data threshold for medium code model Since those data are assumed to be within the relocation offset limit. Reviewed By: rnk Differential Revision: https://reviews.llvm.org/D150297	2023-09-19 11:14:45 -07:00
Philip Reames	fc95de38d9	[RISCV] Require alignment when forming gather with larger element type This fixes a bug in my 928564caa5de8b07cede51e45499934777b9938c that didn't get noticed in review. I found it when looking at the strided load case (upcoming patch), and realized the previous commit was buggy too. p.s. Sorry for the slightly confusing test diff. I'd apparently used the wrong mask for the aligned positive test; it was actually unaligned. Didn't seem worthy of a separate precommit.	2023-09-19 11:00:42 -07:00
Philip Reames	de37d965da	[RISCV] Expand test coverage for widening gather and strided load idioms While I'm here, clean up a few implemented todos.	2023-09-19 10:43:40 -07:00
Craig Topper	bbe3ee061f	[RISCV] Add more instructions for the short forward branch optimization. (#66789) This adds the shifts and the immediate forms of the instructions that were already supported. There are still more instructions that can be predicated, but this is the rest of what we had in our downstream.	2023-09-19 10:21:39 -07:00
Yingwei Zheng	93fde2ea1b	[RISCV] Add a pass to rewrite rd to x0 for non-computational instrs whose return values are unused When AMOs are used to implement parallel reduction operations, typically the return value would be discarded. This patch adds a peephole pass `RISCVDeadRegisterDefinitions`. It rewrites `rd` to `x0` when `rd` is marked as dead. It may improve the register allocation and reduce pipeline hazards on CPUs without register renaming and OOO. Comparison with GCC: https://godbolt.org/z/bKaxnEcec Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D158759	2023-09-20 01:02:19 +08:00
Craig Topper	82676d49d3	[RISCV] Fix bad isel predicate handling for Ztso. (#66739) The predicates inside the AMOPat class were being overridden by the Predicates = [HasStdExtA] at the instantiation.	2023-09-19 08:57:49 -07:00
Jay Foad	44e997a158	[TwoAddressInstruction] Use isPlainlyKilled in processTiedPairs (#65976) Calling isPlainlyKilled instead of directly checking for a kill flag should make processTiedPairs behave the same with LiveIntervals (i.e. when compiling with -early-live-intervals) as it does with LiveVariables.	2023-09-19 16:44:20 +01:00
Luke Lau	22d0bd8632	[DAGCombiner] Combine vp.strided.store with unit stride to vp.store (#66774) This is the VP equivalent of #66677. If we have a strided store where the stride is equal to the element width, we can just use a regular VP store.	2023-09-19 16:43:50 +01:00
Luke Lau	469f6b9b4c	[DAGCombiner] Combine vp.strided.load with unit stride to vp.load (#66766) This is the VP equivalent of #65674. We already combine MGATHER loads with unit stride to MLOAD, so this extends it for EXPERIMENTAL_VP_STRIDED_LOAD.	2023-09-19 16:39:28 +01:00
Philip Reames	188d5c7442	[RISCV] Add a combine to form masked.store from unit strided store Add a DAG combine to form a masked.store from a masked_strided_store intrinsic with stride equal to element size. This is the store analogue to PR #65674. As seen in the tests, this does pick up a few cases that we'd previously missed due to selection ordering. We match strided stores early without going through the recently added generic mscatter combines, and thus weren't recognizing the unit strided store.	2023-09-19 07:45:35 -07:00
Mircea Trofin	a21d4abc89	[mlgo] Fix tests post PR #66334	2023-09-19 07:34:20 -07:00
Natalie Chouinard	116f7a2dcb	[SPIRV] Test basic float and int types (#66282) Add Int16, Int64 and Float64 capabilities as always available for Vulkan (since 1.0), and add tests covering most of the basic types from clang/test/CodeGenHLSL/basic_types.hlsl except for half floats. Depends on D156049	2023-09-19 10:24:53 -04:00
Natalie Chouinard	4abe3f18e2	[SPIRV] Fix bug in emitting GLSL ext inst names Look up extended instruction numbers in the given instruction set so that correct names are now emitted for GLSL.std.450 instructions as well as OpenCL.std. Add a single test to verify correct abs intrinsic names are emitted when targeting logical SPIR-V. Depends on D156424 Differential Revision: https://reviews.llvm.org/D159227	2023-09-19 13:44:13 +00:00
wangpc	61d819dd52	[RISCV] Add tests for memory constraint A We should not optimize it in D158062. This adds the test coverage. And unneeded attributes `nonnull` and `inbounds` are removed. Reviewed By: asb Differential Revision: https://reviews.llvm.org/D159530	2023-09-19 19:51:04 +08:00
Luke Lau	73c2cb5999	[RISCV] Merge RV32/RV64 CHECK lines in strided vp load/store tests. NFC	2023-09-19 12:24:32 +01:00
Jay Foad	e0919b189b	[CodeGen] Renumber slot indexes before register allocation (#66334) RegAllocGreedy uses SlotIndexes::getApproxInstrDistance to approximate the length of a live range for its heuristics. Renumbering all slot indexes with the default instruction distance ensures that this estimate will be as accurate as possible, and will not depend on the history of how instructions have been added to and removed from SlotIndexes's maps. This also means that enabling -early-live-intervals, which runs the SlotIndexes analysis earlier, will not cause large amounts of churn due to different register allocator decisions.	2023-09-19 11:18:12 +01:00
Jay Foad	1d305f95d6	[AMDGPU] Fix line endings in a test	2023-09-19 11:09:03 +01:00
Michal Paszkowski	2616c279d5	[SPIR-V] Preserve pointer address space for load/gep instructions Differential Revision: https://reviews.llvm.org/D158761	2023-09-19 01:42:42 -07:00
JinGu Kang	59c3dcafd8	[AArch64] Remove copy instruction between uaddlv with v4i16/v8i16 and dup (#66508) If there are copy instructions between uaddlv with v4i16/v8i16 and dup for transfer from gpr to fpr, try to remove them with duplane. It is a follow-up patch of https://reviews.llvm.org/D159267	2023-09-19 09:05:12 +01:00
Michal Paszkowski	ec7baca17e	[SPIR-V] Remove -opaque-pointers=0 from LITs, fixes for opaque pointers support Differential Revision: https://reviews.llvm.org/D156049	2023-09-19 00:50:42 -07:00
Matt Arsenault	1328a8534b	AMDGPU: Fix handling of -0 in round lowering (#65761)	2023-09-19 09:14:17 +03:00
Fangrui Song	af935cf0ee	[CodeLayout] Fix X1_Y_X2 and Y_X2_X1 testing for jumps from Y (#66592) The CHECK2 test in code_placement_ext_tsp_large.ll now has the same result as the CHECK test: when chain(0,2,3,4,1) is merged with chain(8), the result is now chain(0,2,3,4,8,1). Ideally we should have test coverage for -ext-tsp-chain-split-threshold=1, but it seems challenging to craft one. Perhaps the default value of -ext-tsp-chain-split-threshold can be decreased as the -ext-tsp-enable-chain-split-along-jumps heuristic is now more powerful.	2023-09-18 22:50:17 -07:00
Wang Pengcheng	3017545e63	[RISCV] Fix inline asm error for block address (#66640) After commit cedf2ea, `RISCVMergeBaseOffset` can handle `BlockAddress`. But we didn't handle it in `PrintAsmMemoryOperand`, so we get an `invalid operand in inline asm` error. This patch fixes the error.	2023-09-19 11:46:43 +08:00
Philip Reames	928564caa5	[RISCV] Combine a gather to a larger element type (#66694) If we have a gather load whose indices correspond to valid offsets for a gather with an element type twice that of our source, we can reduce the number of indices and perform the operation at the larger element type. This is generally profitable since we halve VL - and these operations are linear in VL. This may require some additional VL/VTYPE toggles, but this appears to be worthwhile on the whole.	2023-09-18 16:55:38 -07:00
weiguozhi	9a04bc4c43	[AArch64] Move LDR_PXI from isStoreToStackSlot to isLoadFromStackSlot (#65658) LDR_PXI is a load instruction, so it should be in isLoadFromStackSlot.	2023-09-18 15:52:41 -07:00
Philip Reames	e52c558813	[RISCV] Narrow indices of fixed vector gather/scatter nodes Doing so allows the use of smaller constants overall, and may allow (for some small vector constants) avoiding the constant pool entirely. This can result in extra VTYPE toggles if we get unlucky. This was reviewed under PR #66405.	2023-09-18 11:49:14 -07:00
Craig Topper	8677aaa1a3	[RISCV][GISel] Add initial pre-legalizer combiners copying from AArch64.	2023-09-18 10:59:00 -07:00
Jon Roelofs	83e6d2edfc	Revert "[ARM] Always lower direct calls as direct when the outliner is enabled (#66434)" This reverts commit 003bcad9a8b21e15e3786a52b1dafa844075ab84. ARM folks say it regresses some of their benchmarks: https://github.com/llvm/llvm-project/pull/66434#issuecomment-1722424162	2023-09-18 09:45:46 -07:00
Philip Reames	0722800289	[RISCV] Match constant indices of non-index type when forming strided ops (#65777) When checking to see if our index expressions can be converted into strided operations, we previously gave up if the index type wasn't an exact match for the intptrty for the address. Per gep semantics, this mismatch implies a sext or trunc cast to the respective index type. For constants, go ahead and evaluate that cast instead of giving up. Note that the motivation of this is mostly test cleanup. We canonicalize at IR such that the gep index will match the intptrty. This is mostly useful so that we can write both RV32 and RV64 tests from the same source. It's also helpful in preventing confusion - I've stumbled across this at least four times now and wasted time on each one. Note: The test change for scatter unit stride cases contains a minor regression for rv32 and 64-bit indices. This is an artifact of the order in which changes are landing. This will be addressed in a near future change for all configurations.	2023-09-18 09:41:34 -07:00
Philip Reames	bb7b8726a4	[RISCV] Merge some test checks rvv/fixed-vectors-masked-gather.ll [nfc]	2023-09-18 09:20:12 -07:00
pawosm-arm	be16b03e20	[AArch64] Remove the Z#_HI register definitions (#66353) The Z#_HI register definitions were created during the very early SVE enablement work and before the SVE calling convention was locked in. As they look entirely unused, they need to go.	2023-09-18 17:18:28 +01:00
Craig Topper	8f04d81ede	[SelectionDAG][RISCV] Mask constants to narrow size in TargetLowering::expandUnalignedStore. If the SRL for Hi constant folds, but we don't remove those bits from the Lo, we can end up with strange constant folding through DAGCombine later. I've only seen this with constants being lowered to constant pools during lowering on RISC-V.	2023-09-18 09:10:19 -07:00
Craig Topper	17a12a27ec	[RISCV] Add test case to show bad codegen for unaligned i64 store of a large constant. On the first split we create two i32 trunc stores and a srl to shift the high part down. The srl gets constant folded to produce a new i32 constant, but the truncstore for the low store still uses the original constant. This original constant then gets converted to a constant pool before we revisit the stores to further split them. The constant pool prevents further constant folding of the additional srls. After legalization is done, we run DAGCombiner and get some constant folding of srl via computeKnownBits which can peek through the constant pool load. This can create new constants that also need a constant pool.	2023-09-18 09:10:19 -07:00
Craig Topper	f71a9e8bb7	[SelectionDAG][RISCV][PowerPC][X86] Use TargetConstant for immediates for ISD::PREFETCH. (#66601) The intrinsic uses ImmArg so TargetConstant would be consistent with how other intrinsics are handled. This hides the constants from type legalization so we can remove the promotion support. isel patterns are updated accordingly.	2023-09-18 08:58:50 -07:00
Nikita Popov	38c59b9f53	Revert "Reapply [Verifier] Sanity check alloca size against DILocalVariable fragment size" This reverts commit 47324cfd7d8ca1a2a5cbb9f948ecff66a28ee6bc. This exposed incorrect debuginfo in rustc. Revert the verification until this has been fixed.	2023-09-18 17:24:53 +02:00
Jay Foad	d8d0588f66	[TwoAddressInstruction] Update LiveIntervals after INSERT_SUBREG with undef read (#66211) Update LiveIntervals after rewriting: %reg = INSERT_SUBREG undef %reg, %subreg, subidx to: undef %reg:subidx = COPY %subreg D113044 implemented this for the non-undef case.	2023-09-18 14:51:58 +01:00
Nikita Popov	4491f0b969	[IR] Remove unnecessary bitcast from CreateMalloc() This bitcast is no longer necessary with opaque pointers. This results in some annoying variable name changes in tests.	2023-09-18 14:58:16 +02:00
Sergei Barannikov	caaf61eb6e	[SDag] Fold saddo[_carry] with bitwise-not argument to ssubo[_carry] (#66571) Fold `(saddo (not a), 1)` to `(ssubo 0, a)` and `(saddo_carry (not a), b, c)` to `(ssubo_carry b, a, !c)`. Proof: https://alive2.llvm.org/ce/z/Lj49YM This is the same as https://reviews.llvm.org/D46505 and https://reviews.llvm.org/D59208, but for signed opcodes.	2023-09-18 14:45:41 +03:00
Jay Foad	102838d3f6	update_mir_test_checks.py: match undef vreg subreg definitions (#66627) Following on from D139466 which added support for dead vreg defs, this patch adds support for "undef" defs of subregs. Use this to regenerate checks for amx-greedy-ra-spill-shape.ll which previously required manual tweaks to the autogenerated checks to fix an EXPENSIVE_CHECKS failure; see commit 8b7c1fbd9647a5a6ef246a6b5b2543ea0f5a2337	2023-09-18 12:14:46 +01:00
Piyou Chen	b83a1ed594	[RISCV] Only emit .option when extension is supported It may emit the .option directive without any follow-up. Only emit the .option push/pop when there are supported extension differences between function and module. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D159399	2023-09-18 00:30:13 -07:00
Piyou Chen	d861b3183c	[RISCV][NFC] precommit for D159399 Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D159400	2023-09-18 00:18:08 -07:00
wangpc	cedf2ea7b5	[RISCV] Teach RISCVMergeBaseOffset to handle BlockAddress We can get `BlockAddress` in user code via `Labels as Values` so we should be able to merge the access to `BlockAddress`. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D159429	2023-09-18 11:47:14 +08:00
wangpc	28efe4d38e	[RISCV] Add tests for merging base offset of BlockAddress We can get `BlockAddress` in user code via `Labels as Values`. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D159428	2023-09-18 11:47:13 +08:00
Vettel	ddae50d1e6	[RISCV] Combine trunc (sra sext (x), zext (y)) to sra (x, smin (y, scalarsizeinbits(y) - 1)) (#65728) For RVV, if we want to perform an i8 or i16 element-wise vector arithmetic right shift in the upper C/C++ program, the value to be shifted is first sign-extended to i32 and the shift amount is zero-extended to i32 to perform the vsra.vv instruction, followed by a truncate to get the final result. Such a pattern is later expanded to a series of "vsetvli" and "vnsrl" instructions, because the RVV spec only supports 2 * SEW -> SEW truncates. But for vectors, the shift amount can instead be determined by smin (Y, ScalarSizeInBits(Y) - 1). Also, for the vsra instruction, we only care about the low lg2(SEW) bits of the shift amount. - Alive2: https://alive2.llvm.org/ce/z/u3-Zdr - C++ Test cases : https://gcc.godbolt.org/z/q1qE7fbha	2023-09-17 17:11:28 +08:00

52796 Commits