llvm-project

Author	SHA1	Message	Date
Sudharsan Veeravalli	c4645ffeda	[RISCV] Add Qualcomm uC Xqcicsr (CSR) extension (#117169 ) The Qualcomm uC Xqcicsr extension adds 2 instructions that can read and write CSRs. The current spec can be found at: https://github.com/quic/riscv-unified-db/releases/latest This patch adds assembler only support.	2024-11-28 12:46:15 +05:30
Pengcheng Wang	d36a4c0715	[RISCV] Rename some Feature* to Tune* (#117966 ) These features should be tune features.	2024-11-28 15:01:49 +08:00
Craig Topper	80afdbe6a5	[RISCV] Use RISCVSubtarget::is64Bit() instead of hasFeature(RISCV::Feature64Bit). NFC	2024-11-27 14:02:15 -08:00
Philip Reames	c6f2d35c4d	Fix a build warning introduce by my febbf910	2024-11-27 13:41:29 -08:00
Felipe Magno de Almeida	e3fdc3aa81	[RISCV] Allow hoisting VXRM writes out of loops speculatively (#110044 ) Change the intersect for the anticipated algorithm to ignore unknown when anticipating. This effectively allows VXRM writes speculatively because it could do a VXRM write even when there's branches where VXRM is unneeded. The importance of this change is because VXRM writes causes pipeline flushes in some micro-architectures and so it makes sense to allow more aggressive hoisting even if it causes some degradation for the slow path. An example is this code: ``` typedef unsigned char uint8_t; __attribute__ ((noipa)) void foo (uint8_t dst, int i_dst_stride, uint8_t src1, int i_src1_stride, uint8_t *src2, int i_src2_stride, int i_width, int i_height ) { for( int y = 0; y < i_height; y++ ) { for( int x = 0; x < i_width; x++ ) dst[x] = ( src1[x] + src2[x] + 1 ) >> 1; dst += i_dst_stride; src1 += i_src1_stride; src2 += i_src2_stride; } } ``` With this patch, the code above generates a hoisting VXRM writes out of the outer loop.	2024-11-27 13:31:39 -08:00
Philip Reames	febbf9105f	[RISCV] Match vcompress during shuffle lowering (#117748 ) This change matches a subset of vcompress patterns during shuffle lowering. The subset implemented requires a contiguous prefix of demanded elements followed by undefs. This subset was chosen for two reasons: 1) which elements to spurious demand is a non-obvious problem, and 2) my first several attempts at implementing the general case were buggy. I decided to go with the simple case to start with. vcompress scales better with LMUL than a general vrgather, and at least the SpaceMit X60, has higher throughput even at m1. It also has the advantage of requiring smaller vector constants at one bit per element as opposed to vrgather which is a minimum of 8 bits per element. The downside to using vcompress is that we can't fold a vselect into it, as there is no masked vcompress variant. For reference, here are the relevant throughputs from camel-cdr's data table on BP3 (X60): vrgather.vv v8,v16,v24 4.0 16.0 64.0 256.0 vcompress.vm v8,v16,v24 3.0 10.0 36.0 136. vmerge.vvm v8,v16,v24,v0 2.0 4.0 8.0 16.0 The largest concern with the extra vmerge is that we locally increase register pressure. If we do have masking, we also have a passthru, without the ability to fold that into the vcompress, we need to keep it alive a bit longer. This can hurt at e.g. m8 where we have very few architectural registers. As compared with the vrgather.vv sequence, this is only one additional m1 VREG - since we no longer need the index vector. It compares slightly worse against vrgatherie16.vv which can use index vectors smaller than other operands. Note that we could potentially fold the vmerge if only tail elements are being preserved; I haven't investigated this. It is unfortunately hard given our current lowering structure to know if we're emitting a shuffle where masking will follow. Thankfully, it doesn't seem to show up much in practice, so I think we can probably ignore it. This patch only handles single source compress idioms at the moment. This is an effort to avoid interacting with other patches on review for changing how we canonicalize length changing shuffles.	2024-11-27 13:23:18 -08:00
Craig Topper	175051b05e	[RISCV][GISel] Support libcalls for f32/f64 acos/asin/atan/atan2/cosh/sinh/tanh.	2024-11-27 12:23:12 -08:00
Craig Topper	d7643e8610	[RISCV][GISel] Support f32/f64 llvm.exp10 intrinsics.	2024-11-27 10:24:33 -08:00
Craig Topper	50dfb0772b	[RISCV] Support f32/f64 libcalls for sin/cos/pow/log/log2/log10/exp/exp2 Test cases copied from SelectionDAG.	2024-11-26 23:35:52 -08:00
Brandon Wu	4a7dbede6b	[RISCV] Support `svukte` extension (#115657 ) This is the extension for "Address-Independent Latency of User-Mode Faults to Supervisor Addresses". Spec: https://github.com/riscv/riscv-isa-manual/pull/1564, https://lf-riscv.atlassian.net/browse/RVS-2977 The spec states that the `svukte` depends on `sv39`, but we don't have `sv39` yet, so I didn't add it to the implied list.	2024-11-27 10:54:57 +08:00
Craig Topper	43b6b78771	[RISCV][GISel] Use libcalls for f32/f64 G_FCMP without F/D extensions. (#117660 ) LegalizerHelp only supported f128 libcalls and incorrectly assumed that the destination register for the G_FCMP was s32.	2024-11-26 15:48:49 -08:00
Mark Goncharov	80df56e03b	Reapply "[RISCV] Implement tail call optimization in machine outliner" (#117700 ) This MR fixes failed test `CodeGen/RISCV/compress-opt-select.ll`. It was failed due to previously merged commit `[TTI][RISCV] Unconditionally break critical edges to sink ADDI (PR #108889)`. So, regenerated `compress-opt-select` test.	2024-11-26 23:39:45 +08:00
Mehdi Amini	f94bd3c933	Revert "[RISCV] Implement tail call optimization in machine outliner" (#117710 ) Reverts llvm/llvm-project#115297 Bots are broken	2024-11-26 13:45:47 +01:00
Mark Goncharov	29062329f3	[RISCV] Implement tail call optimization in machine outliner (#115297 ) Following up issue #89822, this patch adds opportunity to use tail call in machine outliner pass. Also it enables outline patterns with X5(T0) register.	2024-11-26 12:30:37 +03:00
LiqinWeng	c3377af4c3	[RISCV][CostModel] add cost for cttz/ctlz under the non-zvbb (#117515 )	2024-11-26 11:40:52 +08:00
Philip Reames	6657d4bd70	[TTI][RISCV] Unconditionally break critical edges to sink ADDI (#108889 ) This looks like a rather weird change, so let me explain why this isn't as unreasonable as it looks. Let's start with the problem it's solving. ``` define signext i32 @overlap_live_ranges(ptr %arg, i32 signext %arg1) { bb: %i = icmp eq i32 %arg1, 1 br i1 %i, label %bb2, label %bb5 bb2: ; preds = %bb %i3 = getelementptr inbounds nuw i8, ptr %arg, i64 4 %i4 = load i32, ptr %i3, align 4 br label %bb5 bb5: ; preds = %bb2, %bb %i6 = phi i32 [ %i4, %bb2 ], [ 13, %bb ] ret i32 %i6 } ``` Right now, we codegen this as: ``` li a3, 1 li a2, 13 bne a1, a3, .LBB0_2 lw a2, 4(a0) .LBB0_2: mv a0, a2 ret ``` In this example, we have two values which must be assigned to a0 per the ABI (%arg, and the return value). SelectionDAG ensures that all values used in a successor phi are defined before exit the predecessor block. This creates an ADDI to materialize the immediate in the entry block. Currently, this ADDI is not sunk into the tail block because we'd have to split a critical edges to do so. Note that if our immediate was anything large enough to require two instructions we would split this critical edge. Looking at other targets, we notice that they don't seem to have this problem. They perform the sinking, and tail duplication that we don't. Why? Well, it turns out for AArch64 that this is entirely an accident of the existance of the gpr32all register class. The immediate is materialized into the gpr32 class, and then copied into the gpr32all register class. The existance of that copy puts us right back into the two instruction case noted above. This change essentially just bypasses this emergent behavior aspect of the aarch64 behavior, and implements the same "always sink immediates" behavior for RISCV as well.	2024-11-25 18:59:31 -08:00
Pengcheng Wang	6633916ef5	[RISCV] Remove getPostRAMutations (#117527 ) We are using `PostMachineScheduler` instead of `PostRAScheduler` since #68696. The hook `getPostRAMutations` is only used in `PostRAScheduler` so it is actually dead code for RISC-V now.	2024-11-26 10:55:43 +08:00
LiqinWeng	dd7aabf7c0	[TTI][RISCV] Deduplicate type-based VP costing of vpcmp/vpcast (#117520 ) Refered to: https://github.com/llvm/llvm-project/pull/115983	2024-11-26 10:49:24 +08:00
Craig Topper	c2bb056482	[SelectionDAG][RISCV][AArch64] Allow f16 STRICT_FLDEXP to be promoted. Fix integer promotion of STRICT_FLDEXP in type legalizer. (#117633 ) A special case in type legalization wasn't accounting for different operand numbering between FLDEXP and STRICT_FLDEXP. AArch64 already asked STRICT_FLDEXP to be promoted, but had no test for it.	2024-11-25 16:12:45 -08:00
Kazu Hirata	8e510b8472	[RISCV] Fix a warning This patch fixes: llvm/lib/Target/RISCV/RISCVRegisterInfo.cpp:476:25: error: unused variable 'ST' [-Werror,-Wunused-variable]	2024-11-25 12:57:05 -08:00
Philip Reames	d733fa1c90	[RISCV] Consolidate VLS codepaths in stack frame manipulation [nfc] (#117605 ) We can move the logic from adjustStackForRVV into adjustReg, which results in the remaining logic being trivially inlined to the two callers and allows a duplicate copy of the same logic in eliminateFrameIndex to be pruned.	2024-11-25 12:40:37 -08:00
Craig Topper	ed6749a405	[RISCV] Promote frexp with Zfh. The default expansion tries to create an illegal integer type after legalization.	2024-11-25 10:27:37 -08:00
Craig Topper	29828b26fa	[RISCV] Fix double counting scalar CSRs with Zcmp when emitting cfi_offset for RVV CSRs. (#117408 ) getCalleeSavedStackSize() already contains RVPushStackSize. Don't subtract it again.	2024-11-25 10:03:48 -08:00
Raphael Moreira Zinsly	d88ed9357a	[NFC][RISCV] Refactor allocation of the stack space (#116625 ) Separates the stack allocations from prologue in preparation for the stack clash protection support.	2024-11-25 09:36:15 -08:00
Craig Topper	20bd029a40	[RISCV] Promote fldexp with Zfh. (#117396 ) The default expansion tries to create i16 operations after type legalization. Fixes #117349	2024-11-25 09:08:56 -08:00
David Sherwood	9b76e7fc60	Revert "[DAGCombiner] Add support for scalarising extracts of a vector setcc (#116031 )" (#117556 ) This reverts commit 22ec44f509ff266b581dbb490d7b040473b7c31a.	2024-11-25 13:49:21 +00:00
Luke Lau	15fadeb2aa	[RISCV] Add cost for @llvm.experimental.vp.splat (#117313 ) This is split off from #115274. There doesn't seem to be an easy way to share this with getShuffleCost since that requires passing in a real insert_element operand to get it to recognise it's a scalar splat. For i1 vectors we can't currently lower them so it returns an invalid cost. --------- Co-authored-by: Shih-Po Hung <shihpo.hung@sifive.com>	2024-11-25 11:28:46 +01:00
LiqinWeng	db14010405	[RISCV][TTI] Implement cost of intrinsic abs with LMUL (#115813 )	2024-11-25 17:35:58 +08:00
David Sherwood	22ec44f509	[DAGCombiner] Add support for scalarising extracts of a vector setcc (#116031 ) For IR like this: %icmp = icmp ult <4 x i32> %a, splat (i32 5) %res = extractelement <4 x i1> %icmp, i32 1 where there is only one use of %icmp we can take a similar approach to what we already do for binary ops such add, sub, etc. and convert this into %ext = extractelement <4 x i32> %a, i32 1 %res = icmp ult i32 %ext, 5 For AArch64 targets at least the scalar boolean result will almost certainly need to be in a GPR anyway, since it will probably be used by branches for control flow. I've tried to reuse existing code in scalarizeExtractedBinop to also work for setcc. NOTE: The optimisations don't apply for tests such as extract_icmp_v4i32_splat_rhs in the file CodeGen/AArch64/extract-vector-cmp.ll because scalarizeExtractedBinOp only works if one of the input operands is a constant.	2024-11-25 09:25:01 +00:00
Craig Topper	3fb0bea859	[RISCV][GISel] Add register class to some isel output patterns so they can be imported. This makes (fcopysign X, (fneg Y)) patterns work.	2024-11-24 19:29:52 -08:00
hev	e26af0938c	[llvm] Add `BasicTTIImpl::areInlineCompatible` for target feature subset checks (#117493 ) This patch moves the `areInlineCompatible` implementation from multiple subclasses (`AArch64TTIImpl`, `RISCVTTIImpl`, `WebAssemblyTTIImpl`) to the base class `BasicTTIImpl`. The new implementation checks whether the callee's target features are a subset of the caller's, enabling consistent behavior across targets. Subclasses now simply delegate to the base implementation, reducing code duplication and improving maintainability.	2024-11-25 11:22:49 +08:00
Craig Topper	bb5bbe523d	[RISCV][GISel] Support s32/s64 G_FSUB/FDIV/FNEG without F/D extensions. Use libcalls for G_FSUB/FDIV. Use integer operations for G_FNEG. Copy most of the IR tests for arithmetic from SelectionDAG.	2024-11-24 18:22:12 -08:00
LiqinWeng	48b13ca48b	[RISCV][CostModel] cost of vector cttz/ctlz under ZVBB (#115800 )	2024-11-24 09:18:18 +08:00
Craig Topper	213b849c5e	[RISCV][GISel] Use libcalls for some FP instructions when F/D aren't present. This is based on what fails when adding integer only RUN lines to float-intrinsics.ll and double-intrinsics.ll. We're still missing a lot of test cases that SelectionDAG has. These will be added in future patches.	2024-11-23 11:43:14 -08:00
Alexey Bataev	7523086a05	[SLP]Use getExtendedReduction cost and fix reduction cost calculations Patch uses getExtendedReduction for reductions of ext-based nodes + adds cost estimation for ctpop-kind reductions into basic implementation and RISCV-V specific vcpop cost estimation. Reviewers: RKSimon, preames Reviewed By: preames Pull Request: https://github.com/llvm/llvm-project/pull/117350	2024-11-22 16:12:53 -05:00
Pengcheng Wang	4da960b898	[RISCV] Add mvendorid/marchid/mimpid to CPU definitions (#116202 ) We can get these information via `sys_riscv_hwprobe`. This can be used to implement `__builtin_cpu_is`.	2024-11-22 22:58:54 +08:00
Mikhail Goncharov	d1dae1e861	Revert "[RISCV] Add mvendorid/marchid/mimpid to CPU definitions (#116202 )" chain This reverts commit b36fcf4f493ad9d30455e178076d91be99f3a7d8. This reverts commit c11b6b1b8af7454b35eef342162dc2cddf54b4de. This reverts commit 775148f2367600f90d28684549865ee9ea2f11be. multiple bot build breakages, e.g. https://lab.llvm.org/buildbot/#/builders/3/builds/8076	2024-11-22 14:09:13 +01:00
Pengcheng Wang	775148f236	[RISCV] Add mvendorid/marchid/mimpid to CPU definitions (#116202 ) We can get these information via `sys_riscv_hwprobe`. This can be used to implement `__builtin_cpu_is`.	2024-11-22 19:54:45 +08:00
Craig Topper	f84fc44f1a	[RISCV][GISel] Make s16->s32 G_ANYEXT/SEXT/ZEXT legal.	2024-11-21 22:45:25 -08:00
Jim Lin	bd15c7c1ca	[RISCV] Make A implies Zaamo and Zalrsc (#116907 ) Ref: https://github.com/riscv/riscv-isa-manual/blob/main/src/a-st-ext.adoc.	2024-11-22 10:35:38 +08:00
Craig Topper	8e65b72691	[RISCV] Fix double counting CSRs with Zcmp in RISCVFrameLowering::getFrameIndexReference. (#117207 ) The Zcmp callee saved registers are already accounted for in getCalleeSavedStackSize(). Subtracting RVPushStackSize subtracts them a second time leading to incorrect stack offsets during frame index elimination. This should have been removed in 0de2b26942f890a6ec84cd75ac7abe3f6f2b2e37 when Zcmp handling was changed. Prior to that, RVPushStackSize was not included in getCalleeSavedStackSize(). The commit message at the time noted that Zcmp+RVV was likely broken.	2024-11-21 13:53:15 -08:00
Craig Topper	29afbd5893	[RISCV] Add DAG combine to convert (iX ctpop (bitcast (vXi1 A))) into vcpop.m. (#117062 ) This only handles the simplest case where vXi1 is a legal vector type. If the vector type isn't legal we need to go through type legalization, but the pattern gets much harder to recognize after that. Either because ctpop gets expanded due to Zbb not being enabled, or the bitcast becoming a bitcast+extractelt, or the ctpop being split into multiple ctpops and adds, etc.	2024-11-21 11:12:07 -08:00
Min-Yih Hsu	0165f8817c	[RISCV] Fix the worst case for VSHA2MS in SiFive P400/P600 scheduling models (#116893 ) For each RVV instruction we should have a single WriteRes assignment to the worst case scheduling class. This assignment is usually equal to that of the largest LMUL + smallest SEW. My #114317 accidentally made two of these assignments on `WriteVSHA2MSV_WorstCase`. This won't affect our MachineScheduler nor most of our llvm-mca use cases (assuming you populate the correct LMUL and SEW), yet it's not ideal either. This patch fixes this issue by assigning the correct numbers and resource mapping to `WriteVSHA2MSV_WorstCase`, which is equal to that of largest LMUL + _largest_ SEW (Zvknh's scheduling properties are special). I also added a MCA test to make sure we always pick up the correct worst case numbers for P600's scheduling model. Original issue was reported by @reidtatge	2024-11-21 10:59:46 -08:00
Craig Topper	cdd1e27124	[X86][RISCV] Don't emit JumpTableDebugInfo unless triple is OSBinFormatCOFF. (#117083 ) This makes the override in RISCV and X86 consistent with the base class implementation of expandIndirectJTBranch.	2024-11-21 09:38:16 -08:00
Craig Topper	e9c561e934	[RISCV][GISel] Add atomic load/store test. Add additional atomic load/store isel patterns."	2024-11-20 22:23:18 -08:00
Craig Topper	4087b871c5	[RISCV][GISel] Move G_BRJT expansion to legalization (#73711 ) Instead of custom selecting a bunch of instructions, we can expand to generic MIR during legalization.	2024-11-20 13:43:36 -08:00
Sam Elliott	408659c5b5	[RISCV] Merge GPRPair and GPRF64Pair (#116094 ) As suggested by Craig, this tries to merge the two sets of register classes created in #112983, GPRPair* and GPRF64Pair*. - I added some explicit annotations to `RISCVInstrInfoD.td` which fixed the type inference issues I was seeing from tablegen for select patterns. - I've had to make the behaviour of `splitValueIntoRegisterParts` and `joinRegisterPartsIntoValue` cover more cases, because you cannot bitcast to/from untyped (the bitcast would otherwise have been inserted automatically by TargetLowering code). - I apparently didn't need to change `getNumRegisters` again, which continues to tell me there's a bug in the code for tied inputs. I added some more test coverage of this case but it didn't seem to help find the asserts I was finding before - I think the difference is between the default behaviour for integers which doesn't apply to floats. - There's still a difference between BuildGPRPair and BuildPairF64 (and the same for SplitGPRPair and SplitF64). I'm not happy with this, I think it's quite confusing, as they're very similar, just differing in whether they give a `untyped` or a `f64`. I haven't really worked out how the DAGCombiner copes if one meets the other, I know we have some of this for the f64 variants already, but they're a lot more complex than the GPRPair variants anyway.	2024-11-20 10:08:55 +00:00
Craig Topper	2bf6751522	[RISCV] Add IsRV32 some patterns in RISCVInstrInfoXTHead.td. This restores the code to its original state before I experimented with making i32 a legal type.	2024-11-19 21:41:14 -08:00
Petr Penzin	41c86ca714	[RISCV] Add TT-Ascalon-d8 processor (#115100 ) Ascalon is an out-of-order CPU core from Tenstorrent. Overview: https://tenstorrent.com/ip/tt-ascalon Adding 8-wide version, -mcpu=tt-ascalon-d8. Scheduling model will be added in a separate PR. --------- Co-authored-by: Anton Blanchard <antonb@tenstorrent.com>	2024-11-19 14:20:55 -08:00
Craig Topper	eff60d83b0	[RISCV][GISel] Make extended loads and truncating stores with s16 register type and s8 memory type legal. This addresses some failures I've seen in testing on real code.	2024-11-19 11:57:35 -08:00

1 2 3 4 5 ...

5717 Commits