Godbolt example: https://godbolt.org/z/ThdfP475a
In the example, a single-element vse is used to store the reduction result
instead of a scalar store ([this optimization was introduced by this
patch](https://reviews.llvm.org/D109482)). However, vmv.x.s can't be
eliminated here because it has other uses (e.g. CopyToReg), so it seems
more profitable to use a scalar store: we already have the store value in a
scalar register, and we save one vsetvli which is likely to be required
for the single-element vse. The proposed solution is to only do this
transform if vmv.x.s has a single use (in the store instruction).
The check should be about unsigned 16-bit immediates, not signed ones.
This is not a bug per se, as the old codegen was correct for the
uint16_max case; it just didn't end up using `qc.e.bgeui`, which we
would prefer it did.
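A minimal, self-contained sketch of the intended range check, written without LLVM's isUInt/isInt helpers (the helper names and values below are purely illustrative):

```cpp
#include <cstdint>

// qc.e.bgeui takes an unsigned immediate, so the check should accept the
// full 0..65535 range rather than the signed -32768..32767 range.
constexpr bool fitsUnsigned16(int64_t Imm) { return Imm >= 0 && Imm <= 0xFFFF; }
constexpr bool fitsSigned16(int64_t Imm) { return Imm >= -32768 && Imm <= 32767; }

static_assert(fitsUnsigned16(65535));  // uint16_max: should select qc.e.bgeui
static_assert(!fitsSigned16(65535));   // the old signed check rejected it, so
                                       // qc.e.bgeui wasn't used
```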
The information about whether a specific argument is vararg or fixed is
currently stored separately from all the other argument information,
which lives in ArgFlags. This means that it is not accessible from
CCAssign, and backends have developed all kinds of workarounds to access
it anyway.
Move this information to ArgFlags to make it directly available in all
relevant places.
I've opted to invert this and store it as IsVarArg, as I think this both
makes the meaning more obvious and provides a better default
(IsVarArg=false).
A lot of fixed-length custom lowerings just involve inserting the
operands into a scalable container and extracting the result out, and
lowerToScalableOp already does this.
We just need to teach it to handle operands with different element types
(but same vector element count), and we can reuse it for
vselect/zext/sext/setcc/fcopysign.
This reverts commit fe4f6c1a58ab4f00a88a97af01000b6783b573ee, but leaves
the tests that were added.
The original commit mistakenly assumed that if regular bf16/f16 loads
and stores could be lowered without zvfbfmin/zvfhmin, then so too could
masked loads/stores and gathers/scatters.
However, SelectionDAG can't actually type-legalize masked loads/stores;
that needs to be done in ScalarizeMaskedMemIntrinPass.
This was causing crashes on IREE because we now returned true for
isLegalMaskedLoadStore.
The original intent of this was to remove a discrepancy in the loop
vectorizer tests whenever predication was enabled, but this has gone
away after 92d09245d61dce80d3e68a27cc34d5fc6f062c93. So I don't think we
need to reapply this patch.
This is a rewrite of the current strided store optimization to be a DAG
combine. This allows it to kick in slightly more broadly, in particular
for the scalable lowering paths.
This reworks an existing optimization on the fixed vector (shuffle
based) deinterleave lowering into a DAG combine. This has the effect of
making it kick in much more widely - in particular on the deinterleave
intrinsic (i.e. scalable) path, deinterleaveN (without load) lowering,
but also the intrinsic lowering paths.
When vectorizing with predication, some loops that were previously
vectorized without zvfhmin/zvfbfmin will no longer be vectorized because
the masked load/store or gather/scatter cost is returned as invalid.
This is due to a discrepancy where for these costs we check
isLegalElementTypeForRVV but for regular memory accesses we don't.
But for bf16 and f16 vectors we don't actually need the extension
support for loads and stores, so this adds a new function which takes
this into account.
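A self-contained sketch of the distinction being drawn here (the enum and helper names are illustrative, not the actual TLI/TTI API):

```cpp
enum class EltTy { I8, I16, I32, F16, BF16, F32 };

// Roughly what isLegalElementTypeForRVV checks: f16/bf16 need the
// zvfhmin/zvfbfmin conversion extensions (other details elided).
bool legalElementType(EltTy T, bool HasZvfhmin, bool HasZvfbfmin) {
  if (T == EltTy::F16)
    return HasZvfhmin;
  if (T == EltTy::BF16)
    return HasZvfbfmin;
  return true;
}

// The load/store-only variant this patch adds: plain and masked loads/stores
// of f16/bf16 just move bits, so they remain legal without those extensions.
bool legalLoadStoreElementType(EltTy T) {
  (void)T;
  return true; // f16/bf16 accepted regardless (sketch; ELEN checks omitted)
}
```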
For regular memory accesses we should probably also e.g. return an
invalid cost for i64 elements on zve32x, but it doesn't look like we
have tests for this yet.
We also should probably not be vectorizing these bf16/f16 loops to begin
with if we don't have zvfhmin/zvfbfmin and zfhmin/zfbfmin. I think this
is due to the scalar costs being too cheap. I've added tests for this in
a100f6367205c6a909d68027af6a8675a8091bd9 to fix in another patch.
We use `lh` to load 2 bytes from memory into a GPR, OR this GPR with
-65536 (0xffff0000) to emulate the nan-boxing behavior, and then move the
value from the GPR to an FPR using `fmv.w.x`.
To move the value back from the FPR to a GPR we use `fmv.x.w`, and
finally `sh` is used to store the lower 2 bytes back to memory.
If zfh is enabled at the same time, we can just use flh/fsh to load/store
bf16 directly.
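A self-contained illustration of the bit pattern being emulated (this mirrors what the GPR sequence achieves; it is not the lowering code itself):

```cpp
#include <cstdint>

// NaN-boxing: a 16-bit value held in a 32-bit FP register must have its
// upper 16 bits set to all ones, which is what combining the `lh` result
// with 0xffff0000 (-65536) achieves before the `fmv.w.x`.
uint32_t nanBoxBF16(uint16_t Bits) { return 0xFFFF0000u | Bits; }

// Storing back only needs the low 16 bits, which is exactly what `sh` keeps.
uint16_t unboxBF16(uint32_t Boxed) { return static_cast<uint16_t>(Boxed); }
```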
Spotted while reviewing #150211. If we're multiplying by -3 in i32,
MulAmt contains 4,294,967,293 since we zero extend to uint64_t. Adding 3
to this gives 0x100000000, which is a power of 2, and the log2 of that is
32, but we can't shift left by 32 in an i32.
Detect this case and skip the transform. We could use 0, but we don't
handle the case for i64 so this seemed more consistent.
Normally we don't hit this case because decomposeMulByConstant handles
it, but that's disabled by Xqciac. And after #150211 the code in
expandMul is now unreachable for this case.
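The arithmetic spelled out as compile-time checks (values illustrative):

```cpp
#include <cstdint>

// -3 as an i32 immediate, zero-extended into the uint64_t MulAmt:
constexpr uint64_t MulAmt = static_cast<uint32_t>(-3); // 0xFFFFFFFD = 4294967293
static_assert(MulAmt + 3 == 0x100000000ULL);           // a power of 2 with Log2 == 32,
                                                       // not a valid i32 shift amount
```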
There was a mismatch between isFPImmLegal and the cases that are handled
by lowerConstantFP. isFPImmLegal didn't check for the case where we
support `fli` of a negated constant (and so can lower to fli+fneg). This
has very minimal impact (42 insertions, 47 deletions across an
rva22u64_zfa llvm-test-suite build including SPEC CPU 2017) but is added
here for completeness.
See the PR thread https://github.com/llvm/llvm-project/pull/149075 for further discussion about the degree to which isFPImmLegal and lowerConstantFP are consistent. We ultimately agreed it makes sense to add fli+fneg, but there may be other future cases where it doesn't make sense to match.
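A rough, self-contained sketch of the relaxed legality rule (the constant table below is only an illustrative subset of the Zfa `fli` immediates, and the helper names are made up):

```cpp
#include <initializer_list>

// With Zfa, `fli` materializes constants from a fixed table. lowerConstantFP
// also handles the negation of any table entry via fli+fneg, so isFPImmLegal
// should accept both forms.
bool isFLITableConstant(double D) {
  for (double C : {0.25, 0.5, 1.0, 2.0, 4.0, 8.0}) // illustrative subset
    if (D == C)
      return true;
  return false;
}

bool isFPImmLegalSketch(double D) {
  return isFLITableConstant(D) || isFLITableConstant(-D); // fli, or fli+fneg
}
```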
The change implements custom lowering of `get_fpmode`, `set_fpmode` and
`reset_fpmode` for the RISC-V target. The implementation is aligned with
the `fegetmode` and `fesetmode` functions in glibc.
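For reference, a minimal usage sketch of the glibc counterparts (`fegetmode`/`fesetmode` are GNU extensions declared in `<fenv.h>`; the mapping to the intrinsics is semantic, not the generated code):

```cpp
#include <fenv.h>

void fpModeRoundTrip() {
  femode_t Saved;
  fegetmode(&Saved);      // what get_fpmode corresponds to
  fesetmode(FE_DFL_MODE); // reset_fpmode: restore the default dynamic FP mode
  // ... code that assumes the default rounding/exception-mode settings ...
  fesetmode(&Saved);      // set_fpmode
}
```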
XAndesBFHCvt provides two builtin functions for converting between
float and bf16. Users can use them to convert bf16 values loaded from
memory to float, perform arithmetic operations, and then convert the
results back to bf16 and store them to memory.
The load/store and move operations for bf16 will be handled in a later
patch.
Follow up on the work from e5bc7e7d, and extend it to the lowering used
for interleave and deinterleave when we can't combine with a nearby
memory operation.
These are identified by misc-include-cleaner. I've filtered out those
that break builds. Also, I'm staying away from llvm-config.h,
config.h, and Compiler.h, which likely cause platform- or
compiler-specific build failures.
There have been discussions on splitting RISCVISelLowering.cpp. I think
the InterleavedAccess-related TLI hooks would be some of the low-hanging
fruit, as they're relatively isolated and also because X86 is already
doing this.
NFC.
Add basic isel patterns for the multiply-accumulate QC.MULIADD
instruction.
While most cases work with just the TD file pattern, there are a few
cases which need to be handled in ISelLowering, depending on the
immediate we are multiplying with:
- imm + 1, imm - 1, 1 - imm, or -1 - imm is a power of 2 --> these become
slli and add/sub
- the immediate is 2^n - 2^m --> this becomes (add/sub (shl X, C1), (shl X,
C2))
- imm - 2, imm - 4, or imm - 6 is a power of 2 --> these use shxadd when
zba is enabled
The patch does not decompose mul for the above conditions if Xqciac is
present. There could be cases where this is not beneficial, which I plan
to address in follow-up patches.
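The identities behind the listed cases, as compile-time checks with example immediates (values illustrative; the actual selection logic lives in ISelLowering):

```cpp
static_assert(5 * 7  == (5 << 3) - 5);        // imm + 1 is a power of 2: slli + sub
static_assert(5 * 9  == (5 << 3) + 5);        // imm - 1 is a power of 2: slli + add
static_assert(5 * -7 == 5 - (5 << 3));        // 1 - imm is a power of 2
static_assert(5 * -9 == -(5 << 3) - 5);       // -1 - imm is a power of 2
static_assert(5 * 24 == (5 << 5) - (5 << 3)); // imm = 2^5 - 2^3: two shifts + sub
static_assert(5 * 10 == (5 << 1) + (5 << 3)); // imm - 2 is a power of 2: sh1add of a
                                              // shifted value when zba is enabled
```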
Lower it just like the vector [l]lrint, using vfcvt with the right
rounding mode. Updating costs to account for this custom lowering is
left to a companion patch.
Add an LLVMContext parameter to getOptimalMemOpType and
findOptimalMemOpLowering so that we can use EVT::getVectorVT to
construct vector EVTs in getOptimalMemOpType.
Related to [#146673](https://github.com/llvm/llvm-project/pull/146673).
As noted in post commit review, the API change here was not required.
I'd apparently confused myself when teasing apart patches from my
development branch.
For the fixed vector cases, we already support this, but the
deinterleave intrinsic cases (primarily used by scalable vectors) didn't.
Supporting it requires plumbing through the Factor separately from the
extracts, as there can now be fewer extracts than the Factor. Note that
the fixed vector path handles this slightly differently - it uses the
shuffle and indices scheme to achieve the same thing.
Extend lowerVectorXRINT to also do an FP_EXTEND_VL when the source
element type is [b]f16, and wire up this custom promotion. Updating the
cost model to not give these an invalid cost is left to a companion
patch.
When vectorizing a loop with a fixed-order recurrence we use a splice,
which gets lowered to a vslidedown and vslideup pair.
However, with the way we lower it today we end up with extra vl toggles
in the loop, especially with EVL tail folding, e.g.:
```asm
.LBB0_5:                                # %vector.body
                                        # =>This Inner Loop Header: Depth=1
        sub     a5, a2, a3
        sh2add  a6, a3, a1
        zext.w  a7, a4
        vsetvli a4, a5, e8, mf2, ta, ma
        vle32.v v10, (a6)
        addi    a7, a7, -1
        vsetivli        zero, 1, e32, m2, ta, ma
        vslidedown.vx   v8, v8, a7
        sh2add  a6, a3, a0
        vsetvli zero, a5, e32, m2, ta, ma
        vslideup.vi     v8, v10, 1
        vadd.vv v8, v10, v8
        add     a3, a3, a4
        vse32.v v8, (a6)
        vmv2r.v v8, v10
        bne     a3, a2, .LBB0_5
```
Because the vslideup overwrites all but UpOffset elements from the
vslidedown, we currently set the vslidedown's AVL to said offset.
But in the vslideup we use either VLMAX or the EVL, which causes a
toggle.
This patch increases the AVL of the vslidedown so it matches the
vslideup, even if the extra elements are overwritten, to avoid the toggle.
A new tuning feature +vl-dependent-latency has been added which keeps
the old behaviour for microarchitectures that dynamically dispatch uops
based on vl, e.g. sifive-x280.
+vl-dependent-latency can be reused for the recently proposed Ovlt
optimization directive if/when it's ratified:
https://lists.riscv.org/g/tech-privileged/message/2487
If we wanted to aggressively optimize for vl at the expense of
introducing more toggles, we could probably look at doing this in
RISCVVLOptimizer.
XAndesVPackFPH can actually be used independently without requiring
Zvfhmin. Therefore, remove the implicit Zvfhmin requirement from
XAndesVPackFPH and have it imply just the F extension, which is
sufficient.