llvm-project

Author	SHA1	Message	Date
Craig Topper	20f544d047	[RISCV][GISel] Instruction selection for G_JUMP_TABLE and G_BRJT. (#71987 )	2023-11-18 12:33:25 -08:00
Craig Topper	8ad4df8327	[RISCV][GISel] Add s32 G_SELECT instruction select test for RV64. NFC	2023-11-18 12:29:04 -08:00
David Green	396e650ef3	[AArch64] Add some testing for BE shuffles. NFC	2023-11-18 20:09:58 +00:00
Craig Topper	0154e53bf3	[RISCV][GISel] Remove the rv32/rv64 subdirectories for legalizer tests. NFC Add -rv32 -rv64 as suffix to test name. First step towards trying to merge the content of these tests.	2023-11-18 11:25:09 -08:00
David Green	303a7835ff	[GreedyRA] Improve RA for nested loop induction variables (#72093 ) Imagine a loop of the form: ``` preheader: %r = def header: bcc latch, inner inner1: .. inner2: b latch latch: %r = subs %r bcc header ``` It can be possible for code to spend a decent amount of time in the header<->latch loop, not going into the inner part of the loop as much. The greedy register allocator can prefer to spill _around_ %r though, adding spills around the subs in the loop, which can be very detrimental for performance. (The case I am looking at is actually a very deeply nested set of loops that repeat the header<->latch pattern at multiple different levels). The greedy RA will apply a preference to spill to the IV, as it is live through the header block. This patch attempts to add a heuristic to prevent that in this case for variables that look like IVs, in a similar regard to the extra spill weight that gets added to variables that look like IVs, that are expensive to spill. That will mean spills are more likely to be pushed into the inner blocks, where they are less likely to be executed and not as expensive as spills around the IV. This gives an 8% speedup in the exchange benchmark from spec2017 when compiled with flang-new, whilst importantly stabilising the scores to be less chaotic to other changes. Running ctmark showed no difference in the compile time. I've tried to run a range of benchmarking for performance, most of which were relatively flat not showing many large differences. One matrix multiply case improved 21.3% due to removing a cascading chain of spills, and some other knock-on effects happen which usually cause small differences in the scores.	2023-11-18 09:55:19 +00:00
Daniil	424c4249cc	[SimplifyCFG] Add optimization for switches of powers of two (#70977 ) Optimization reduces the range for switches whose cases are positive powers of two by replacing each case with count_trailing_zero(case). Resolves #70756	2023-11-18 15:14:14 +08:00
Craig Topper	35ad44ebe4	[RISCV][GISel] Allow G_SELECT to have s32 type on RV64.	2023-11-17 17:12:27 -08:00
Craig Topper	d5ab48e583	[AArch64] Simplify legalizer info for G_JUMP_TABLE and G_BRJT. (#71962 ) Remove s64 as a valid type for G_JUMP_TABLE since I think it is always a pointer? Replace custom predicate for G_BRJT with a legalFor that checks 2 types.	2023-11-17 16:44:24 -08:00
Arthur Eubanks	635756e4f3	[X86] Place data in large sections for large code model (#70265 ) This allows better interoperability mixing small/medium/large code model code since large code model data can be put into separate large sections. And respect large data threshold under large code model. gcc also does this: https://gcc.gnu.org/onlinedocs/gcc/x86-Options.html. See https://groups.google.com/g/x86-64-abi/c/jnQdJeabxiU.	2023-11-17 15:47:28 -08:00
Philip Reames	144b2f579e	[RISCV] Start vslide1down sequence with a dependency breaking splat (#72691 ) If we are using entirely vslide1downs to initialize an otherwise undef vector, we end up with an implicit_def as the source of the first vslide1down. This register has to be allocated, and creates false dependencies with surrounding code. Instead, start our sequence with a vmv.v.x in the hopes of creating a dependency breaking idiom. Unfortunately, it's not clear this will actually work, as due to the VL=0 special case for T.A. the hardware has to work pretty hard to recognize that the vmv.v.x actually has no source dependence. I don't think we can reasonably expect all hardware to have optimized this case, but I also don't see any downside in preferring it.	2023-11-17 12:02:58 -08:00
peterbell10	4263b2ecf8	[NVPTX] Expand EXTLOAD for v8f16 and v8bf16 (#72672 ) In openai/triton#2483 I've encountered a bug in the NVPTX codegen. Given `load<8 x half>` followed by `fpext to <8 x float>` we get ``` ld.shared.v4.b16 {%f1, %f2, %f3, %f4}, [%r15+8]; ld.shared.v4.b16 {%f5, %f6, %f7, %f8}, [%r15]; ``` Which loads float16 values into float registers without any conversion and the result is simply garbage. This PR brings `v8f16` and `v8bf16` into line with the other vector types by expanding it to load + cvt. cc @manman-ren @Artem-B @jlebar	2023-11-17 09:51:50 -08:00
Simon Pilgrim	bfbfd1caa4	[X86] combineLoad - try to reuse existing constant pool entries for smaller vector constant data If we already have a YMM/ZMM constant that a smaller XMM/YMM has matching lower bits, then ensure we reuse the same constant pool entry. Extends the similar combines we already have to reuse VBROADCAST_LOAD/SUBV_BROADCAST_LOAD constant loads. This is mainly a canonicalization, but should make it easier for us to merge constant loads in a future commit (related to both #70947 and better X86FixupVectorConstantsPass usage for #71078).	2023-11-17 17:48:37 +00:00
Stanislav Mekhanoshin	7057f8f676	[AMDGPU] Pre-commit fdot2 test. NFC. (#72622 ) This test exposes a bug where we violate constant bus restriction.	2023-11-17 09:32:38 -08:00
Antonio Frighetto	88d0ceb689	[AArch64] Additional test coverage for PR67879 (NFC) Introduce further test exercising `isAArch64FrameOffsetLegal`.	2023-11-17 17:32:50 +01:00
Sacha Coppey	aeedc07637	[IR] Add GraalVM calling conventions Adds GraalVM calling conventions. The only difference with the default calling conventions is that GraalVM reserves two registers for the heap base and the thread. Since the registers are then accessed by name, getRegisterByName has to be updated accordingly. This patch implements the calling conventions only for X86, AArch64 and RISC-V. For X86, the reserved registers are X14 and X15. For AArch64, they are X27 and X28. For RISC-V, they are X23 and X27. This patch has been used by the LLVM backend of GraalVM's Native Image project in production for around 4 months with no major issues. Differential Revision: https://reviews.llvm.org/D151107	2023-11-17 16:30:09 +00:00
Nemanja Ivanovic	227607190e	[RISCV] Fix crash in PEI with empty entry block with Zcmp (#72117 ) We check the opcode of the first instruction in the block where the prologue is inserted without checking if the iterator points to any instructions. When the basic block is empty, that causes a crash. One way the prologue block can be empty is when it starts with a call to __builtin_readcyclecounter on RV32 since that produces a loop. Co-authored-by: Nemanja Ivanovic <nemanja@synopsys.com>	2023-11-17 16:18:44 +01:00
Nemanja Ivanovic	0765f6451f	[RISCV] Use correct register class for Z[df]inx inline asm (#71872 ) Allocate a register of the correct register class for inline asm constraint "r" when used for FP values with -Zfinx/-Zdinx. --------- Co-authored-by: Nemanja Ivanovic <nemanja@synopsys.com>	2023-11-17 16:17:48 +01:00
Philip Reames	8f81c605f5	[RISCV] Remove custom instruction selection for VFCVT_RM and friends (#72540 ) We already have the pseudos for lowering these as MI nodes with rounding mode operands, and the generic FRM insertion pass. Doing the insertion later in the backend allows SSA level passes to avoid reasoning about physical register copies, and happens to produce better code in practice. The latter is mostly an accident of our insertion order; we happen to place the frm write after the vsetvli, and it's very common for a register to be killed at the vsetvli. End result is that we get slightly better scalar register allocation. I'm a bit unclear on the history here. I was surprised to find this code in ISEL lowering at all, but am also surprised once I found it that all the patterns and pseudos seem to already exist. My best guess is that maybe we didn't do all the possible cleanup after introducing the HasRoundMode mechanism?	2023-11-17 07:07:37 -08:00
Simon Pilgrim	2ed15877e7	[X86] Ensure asm comments only print the constant values for the vector load's register width We were printing the entire Constant, which if we were loading from a wider constant pool entry meant that we were confusing the asm comment with upper bits that aren't actually part of the load result	2023-11-17 14:30:30 +00:00
Jessica Del	b1e039f3b7	[AMDGPU] - Add constant folding for s_quadmask (#72381 ) If the input is a constant we can constant fold the `s_quadmask` intrinsic.	2023-11-17 15:24:23 +01:00
Simon Pilgrim	0b0440613f	[X86] vec_fabs.ll - regenerate checks and add common AVX512 prefixes	2023-11-17 10:31:19 +00:00
Simon Pilgrim	a66085c84c	[X86] vec_fabs.ll - sort tests into 128/256/512-bit vector types	2023-11-17 10:31:19 +00:00
Simon Pilgrim	58253dcbcd	[X86] getTargetConstantBitsFromNode - bail if we're loading from a constant vector element type larger than the target value size This can be improved upon by just truncating the constant value, but the crash needs to be addressed first. Fixes #72539	2023-11-17 10:01:31 +00:00
Philip Reames	233971b475	[RISCV] Fix typo in a test and regen another to reduce test diff	2023-11-16 14:28:16 -08:00
Philip Reames	1aa493f064	[RISCV] Further expand coverage for insert_vector_elt patterns	2023-11-16 14:14:31 -08:00
David Li	ac3779e92e	Enable Custom Lowering for fabs.v8f16 on AVX (#71730 ) [X86]: Enable custom lowering for fabs.v8f16 on AVX Currently, custom lowering of fabs.v8f16 requires AVX512FP16, which is too restrictive. For v8f16 fabs lowering, no instructions in AVX512FP16 are needed. Without the fix, horribly inefficient code is generated without AVX512FP16. Note instcombiner generates calls to intrinsics @llvm.fabs.v8f16 when simplifying AND <8 x half> operations.	2023-11-16 13:47:31 -08:00
Philip Reames	73e963379e	[RISCV] Add test coverage for partial buildvecs idioms Test coverage for an upcoming set of changes	2023-11-16 13:33:12 -08:00
Craig Topper	927f6f1858	[RISCV] Use bset+addi for (not (sll -1, X)). This is an alternative to #71420 that handles i32 on RV64 safely by pre-promoting the pattern in DAG combine.	2023-11-16 11:14:53 -08:00
Craig Topper	4eaf986be4	[RISCV] Add test cases for (not (sll -1, X)) for Zbs. NFC We can use (ADDI (BSET X0, X), -1).	2023-11-16 11:14:53 -08:00
Momchil Velikov	4ac5b0da8d	Revert "[MachineSink][AArch64] Enable sink-and-fold by default (#72132 )" This reverts commit 13fe0386454d2f4c9bad4e20fc59699d1a49b8cf. May have broken an LLDB test https://lab.llvm.org/buildbot/#/builders/96/builds/48609	2023-11-16 17:07:39 +00:00
Valery Pykhtin	667ba7f8f3	[AMDGPU] Fix GCNRewritePartialRegUses pass: vector regclass is selected instead of scalar. (#69957 ) For the following testcase: undef %1.sub1:sgpr_96 = COPY undef %0:sgpr_32 %3:vgpr_32 = V_LSHL_ADD_U32_e64 %1.sub1:sgpr_96, ... GCNRewritePartialRegUses produced: %4:vgpr_32 = COPY undef %1:sgpr_32 dead %2:vgpr_32 = V_LSHL_ADD_U32_e64 %4, ... Register class for %4 is incorrect: there should be sgpr_32 instead of vgpr_32 because the original %1 had scalar regclass. This patch fixes that. Note that GCNRewritePartialRegUses pass isn't enabled by default yet.	2023-11-16 16:56:46 +01:00
Jay Foad	be2388c0d9	[AMDGPU] Prefer v_madak_f32 over v_madmk_f32 to reduce vgpr pressure (#72506 ) As explained in the comment in SIInstrInfo::FoldImmediate, if we have a choice between v_madak_f32 and v_madmk_f32 we should choose the former so that the literal that is not folded into the instruction can be materialized in an sgpr instead of a vgpr.	2023-11-16 12:50:26 +00:00
Momchil Velikov	13fe038645	[MachineSink][AArch64] Enable sink-and-fold by default (#72132 ) Enable the optimisation by default for AArch64 after a compile time regression fix in e8209b2486d8	2023-11-16 12:12:56 +00:00
Igor Kirillov	63917e1975	[MachineLICM] Allow hoisting loads from invariant address (#70796 ) Sometimes, loads can appear in a loop after the LICM pass is executed the final time. For example, ExpandMemCmp pass creates loads in a loop, and one of the operands may be an invariant address. This patch extends the pre-regalloc stage MachineLICM by allowing to hoist invariant loads from loops that don't have any stores or calls and allows load reorderings.	2023-11-16 11:12:10 +00:00
Matt Devereau	e8dd7ecbc4	Revert "[AArch64][SME2] Add ldr_zt, str_zt builtins and intrinsics (#71795 )" This reverts commit cc1244980b74f45a06e2002a33444ce757b577aa.	2023-11-16 11:01:27 +00:00
Valery Pykhtin	24c3cd1a51	[AMDGPU] Update rewrite-partial-reg-uses tests. NFC. (#72499 )	2023-11-16 11:48:39 +01:00
Jessica Del	af05f9ff06	[AMDGPU] - Add constant folding for s_bitreplicate (#72366 ) If the input is a constant, we can constant fold the s_bitreplicate operation.	2023-11-16 09:08:00 +01:00
Christudasan Devadasan	ce7fd498ed	[AMDGPU] RA inserted scalar instructions can be at the BB top (#72140 ) We adjust the insertion point at the BB top for spills/copies during RA to ensure they are placed after the exec restore instructions required for the divergent control flow execution. This is, however, required only for the vector operations. The insertions for scalar registers can still go to the BB top.	2023-11-16 10:30:03 +05:30
LiaoChunyu	71a7108ee9	[RISCV][MC] MC layer support for xcvmem and xcvelw extensions This commit is part of a patch-set to upstream the 7 vendor specific extensions of CV32E40P. Several other extensions have been merged. Spec: https://github.com/openhwgroup/cv32e40p/blob/master/docs/source/instruction_set_extensions.rst Contributors: @CharKeaney, @jeremybennett, @lewis-revill, Nandni Jamnadas, @PaoloS, @simoncook, @xmj, @realqhc, @melonedo, @adeelahmad81299 Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D158824	2023-11-16 09:46:11 +08:00
Fangrui Song	103811a27a	[RISCV,GISel] Unconditionally use MO_PLT for calls (#72355 ) All known linkers handle R_RISCV_CALL and R_RISCV_CALL_PLT in the same way (GNU ld since https://sourceware.org/pipermail/binutils/2020-August/112750.html). MO_CALL is for R_RISCV_CALL, a deprecated relocation type. We don't migrate away from MO_CALL yet. For GISel we don't have the output difference concern and should weigh more on simplicity.	2023-11-15 15:18:47 -08:00
Michael Maitland	dbd884cd3d	[RISCV][GISEL] Add vector RegisterBanks and vector support in getRegBankFromRegClass Vector Register banks are created for the various register vector register groupings. getRegBankFromRegClass is implemented to go from vector TargetRegisterClass to the corresponding vector RegisterBank.	2023-11-15 15:08:29 -08:00
Michael Maitland	725e599637	[RISCV][GISEL] Add support for scalable vector types in lowerReturnVal (#71587 ) Scalable vector types from LLVM IR are lowered into physical vector registers in MIR based on calling convention for return instructions.	2023-11-15 17:30:53 -05:00
Craig Topper	c281a6add5	[RISCV] Add isel pattern for int_riscv_vfmv_s_f with scalar FP constant operand. Use vmv_s_x instead when the constant will be materialized in a GPR. This avoids going from GPR to FPR to vector. We already did this for RISCVISD::VFMV_S_F_VL and probably we should just turn int_riscv_vfmv_s_f into RISCVISD::VFMV_S_F_VL, but I'd like to see some improvements to RISCVInsertVSETVLI first.	2023-11-15 10:51:43 -08:00
Craig Topper	084f5c26a4	[RISCV] Add test cases to show missed opportunity to turn vfmv.s.f into vmv.s.x when source is FP constant materialized in GPR. We end up creating the constant in GPR, move to FPR, then move to vector. We should go directly from GPR to vector.	2023-11-15 10:51:43 -08:00
Artem Belevich	4f33331317	[NVPTX] split long-running wmma.py test into smaller chunks. (#72331 )	2023-11-15 09:55:47 -08:00
Craig Topper	1c033aaac9	[RISCV] Add IsSignExtendingOpW to AMO*_W instructions. (#72349 )	2023-11-15 09:39:31 -08:00
Craig Topper	e12677db8b	[RISCV] Add test cases showing missed opportunity to remove sext.w after amo*.w. NFC We should tell RISCVOptWInstrs that these instructions sign extend their results.	2023-11-15 09:37:09 -08:00
Simon Pilgrim	de41396895	[DAG] foldABSToABD - add support for abs(sub(sign_extend_inreg(),sign_extend_inreg())) patterns Partial fix for ABDS regressions on D152928	2023-11-15 15:49:30 +00:00
petar-avramovic	95dd0b04d2	AMDGPU/SILowerI1Copies process phi incomings in specific order (#72375 ) When merging lane masks, value from block that is always visited first (PrevReg in buildMergeLaneMasks) needs to exist because we do on-the-fly constant folding. For PrevReg to exist, basic block that should contain PrevReg definition must be processed first. Sort the incomings such that incoming values that dominate other incoming values are processed first. Sorting of phi incomings makes no changes for phis created by SDAG because SDAG adds phi incomings as it selects basic blocks in reversed post order traversal. This change is required by upcoming lane mask merging implementation for GlobalISel that leaves phi incomings as they are in IR.	2023-11-15 16:27:51 +01:00
Tavian Barnes	75cf672b12	[SDAG] Simplify is-power-of-2 codegen (#72275 ) When x is not known to be nonzero, ctpop(x) == 1 is expanded to x != 0 && (x & (x - 1)) == 0 resulting in codegen like leal -1(%rdi), %eax testl %eax, %edi sete %cl testl %edi, %edi setne %al andb %cl, %al But another expression that works is (x ^ (x - 1)) > x - 1 which has nicer codegen: leal -1(%rdi), %eax xorl %eax, %edi cmpl %eax, %edi seta %al	2023-11-15 22:26:34 +09:00
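
The two is-power-of-2 expressions quoted in 75cf672b12 can be checked against each other with a short sketch. This is not code from the commit: it is a Python model that assumes 32-bit unsigned arithmetic (emulated with an explicit mask), and the function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF  # assumption: model 32-bit unsigned wraparound


def is_pow2_two_tests(x):
    """Original expansion: x != 0 && (x & (x - 1)) == 0."""
    x &= MASK32
    x_minus_1 = (x - 1) & MASK32
    return x != 0 and (x & x_minus_1) == 0


def is_pow2_xor(x):
    """Alternative from the commit message: (x ^ (x - 1)) > x - 1 (unsigned)."""
    x &= MASK32
    x_minus_1 = (x - 1) & MASK32  # wraps to 0xFFFFFFFF when x == 0
    return (x ^ x_minus_1) > x_minus_1


def popcount_is_one(x):
    """Reference definition: ctpop(x) == 1."""
    return bin(x & MASK32).count("1") == 1
```

For x == 0 the subtraction wraps to all ones, so both sides of the comparison are equal and the result is false, which is why the single unsigned compare needs no separate zero test.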


52796 Commits