Now that we have matching for vqdot in its basic variants, we can
extend the matcher to handle reduction trees instead of individual
reductions. This is important as we canonicalize reductions by
performing a tree in the vector domain before the root reduction
instruction.
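To illustrate, the canonical form being matched looks roughly like the
following at the IR level (a sketch with arbitrary types, not a test
taken from the patch):
```llvm
; The two partial products are combined with a vector add, and only the
; root of the tree is reduced to a scalar.
declare i32 @llvm.vector.reduce.add.v16i32(<16 x i32>)

define i32 @dot_tree(<16 x i8> %a, <16 x i8> %b, <16 x i8> %c, <16 x i8> %d) {
  %a.ext = sext <16 x i8> %a to <16 x i32>
  %b.ext = sext <16 x i8> %b to <16 x i32>
  %mul0 = mul <16 x i32> %a.ext, %b.ext
  %c.ext = sext <16 x i8> %c to <16 x i32>
  %d.ext = sext <16 x i8> %d to <16 x i32>
  %mul1 = mul <16 x i32> %c.ext, %d.ext
  %tree = add <16 x i32> %mul0, %mul1
  %sum = call i32 @llvm.vector.reduce.add.v16i32(<16 x i32> %tree)
  ret i32 %sum
}
```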
The particular approach taken here has the unfortunate implication that
non-matches visit the entire reduction tree once for each time the
reduction root is visited in the DAG. While conceptually problematic for
compile time, this is probably fine in practice as we should only visit
the root once per pass of DAGCombine. I don't really see a better
solution - suggestions welcome.
---------
Co-authored-by: Luke Lau <luke_lau@icloud.com>
As with the recently added subvector variants, provide the unsigned
index operand to simplify a bunch of code.
---------
Co-authored-by: Luke Lau <luke_lau@icloud.com>
Note that this change is possibly not NFC. The prior routines used
getConstant with XLenVT. The new wrappers will use getVectorIdxConstant
instead. Digging through the code, the type used for the index will be
the pointer-width integer type from the DataLayout. For typical RV32 and
RV64 configurations the pointer will be of equal width to XLEN, but you
could have a 32-bit pointer on an RV64 machine.
Follow up to 6e654caab, use the new routines in more places. Note that
I've excluded from this patch any case which uses a getConstant index
instead of a getVectorIdxConstant index just to minimize room for
error. I'll get those in a separate follow up.
RISCVVectorPeepholePass would replace instructions that have an all-ones
mask with their unmasked variants, so there isn't really a point in
keeping separate versions of the intrinsics.
Note that `riscv.segN.load/store.mask` does not take pointer type (i.e.
address space) as part of its overloading type signature, because RISC-V
doesn't really use address spaces other than the default one.
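As a sketch of the resulting signature (the exact mangling below is
assumed for illustration, not copied from the patch):
```llvm
; hypothetical mangling: the overload suffix covers the vector and VL
; types but, per the note above, not the pointer's address space
declare { <8 x i32>, <8 x i32> }
  @llvm.riscv.seg2.load.mask.v8i32.i64(ptr, <8 x i1>, i64)
```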
Mechanical change to introduce the new wrappers, and add enough users to
make the usage pattern clear. Once this lands, I'm going to do a further
pass to adjust more callsites as separate changes.
---------
Co-authored-by: Luke Lau <luke_lau@icloud.com>
Teach InterleavedAccessPass to recognize vp.load + shufflevector and
shufflevector + vp.store, though this patch only adds the RISC-V support
to actually lower this pattern. The vp.load/vp.store in this pattern
require a constant mask.
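For reference, a rough sketch of the load-side pattern at the IR level
(names and types here are illustrative, not from the patch):
```llvm
declare <8 x i32> @llvm.vp.load.v8i32.p0(ptr, <8 x i1>, i32)

define <4 x i32> @load_evens(ptr %p, i32 %evl) {
  ; the mask is a constant (all-true here), which is what makes this
  ; vp.load + shufflevector pair eligible for the pass
  %wide = call <8 x i32> @llvm.vp.load.v8i32.p0(ptr %p, <8 x i1> splat (i1 true), i32 %evl)
  %evens = shufflevector <8 x i32> %wide, <8 x i32> poison, <4 x i32> <i32 0, i32 2, i32 4, i32 6>
  ret <4 x i32> %evens
}
```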
This patch adds pattern matching for the basic usages of the dot product
instructions introduced by the experimental zvqdotq extension. It
specifically only handles the case where the pattern is feeding an i32
sum reduction, as we need to reassociate the reduction tree to use these
instructions.
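At the IR level, the basic shape being matched looks roughly like this
(a sketch; the types are illustrative):
```llvm
declare i32 @llvm.vector.reduce.add.v16i32(<16 x i32>)

; a sext/sext multiply feeding an i32 sum reduction
define i32 @dot(<16 x i8> %a, <16 x i8> %b) {
  %a.ext = sext <16 x i8> %a to <16 x i32>
  %b.ext = sext <16 x i8> %b to <16 x i32>
  %mul = mul <16 x i32> %a.ext, %b.ext
  %sum = call i32 @llvm.vector.reduce.add.v16i32(<16 x i32> %mul)
  ret i32 %sum
}
```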
The vecreduce_add (sext) and vecreduce_add (zext) cases are included
mostly to exercise the VX matchers. For the generic matching, we fail
to match due to a combine ordering issue which results in the bitcast
being separated from the splat.
I chose to do this lowering as an early combine so as to avoid having to
integrate the entire logic into the reduction lowering flow. In
particular, that would get a lot more complicated as we extend this to
handle add-trees feeding the reductions.
This implements the result of the discussion at:
https://discourse.llvm.org/t/rfc-report-fatal-error-and-the-default-value-of-gencrashdialog/73587
There are two different use cases for report_fatal_error, so replace it
with two functions reportFatalInternalError() and
reportFatalUsageError(). The former indicates a bug in LLVM and
generates a crash dialog. The latter does not. The names were suggested
by rnk and people seemed to like them.
This replaces a lot of the usages that passed an explicit value for
GenCrashDiag. I did not bulk replace the remaining report_fatal_error
usages -- they probably require case-by-case review for which function
to use.
Extends changes from
[ff687af](ff687af04f).
Fixes https://github.com/llvm/llvm-project/issues/131476.
This patch adds a DAG combine to replace an `AND` of an `ATOMIC_LOAD`
with a full-bit mask (e.g. `0xFF`, `0xFFFF`, etc.) which is generated as
a result of `(zext (atomic_load))`, by a zero-extended load, provided
the atomic operation is monotonic or weaker.
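For illustration, IR along these lines produces the
`(zext (atomic_load))` in question (a sketch):
```llvm
define i32 @load_byte(ptr %p) {
  ; monotonic (or weaker) ordering is required for the fold
  %v = load atomic i8, ptr %p monotonic, align 1
  %z = zext i8 %v to i32
  ret i32 %z
}
```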
This change adds support for two SiFive vendor attributes in clang:
- "SiFive-CLIC-preemptible"
- "SiFive-CLIC-stack-swap"
These can be given together, and can be combined with "machine", but
cannot be combined with any other interrupt attribute values.
These are handled primarily in RISCVFrameLowering:
- "SiFive-CLIC-stack-swap" entails swapping `sp` with `sf.mscratchcsw`
at function entry and exit, which holds the trap stack pointer.
- "SiFive-CLIC-preemptible" entails saving `mcause` and `mepc` before
re-enabling interrupts using `mstatus`. To save these, `s0` and `s1`
are first spilled to the stack, and then the values are read into
these registers. If these registers are used in the function, their
values will be spilled a second time onto the stack with the generic
callee-saved-register handling. At the end of the function interrupts
are disabled again before `mepc` and `mcause` are restored.
This change also adds support for the following two experimental
extensions, which only contain CSRs:
- XSfsclic - for SiFive's CLIC Supervisor-Mode CSRs
- XSfmclic - for SiFive's CLIC Machine-Mode CSRs
The latter is needed for interrupt support.
The CFI information for this implementation is not correct, but I'd
prefer to correct this in a follow-up. While it's unlikely anyone wants
to unwind through a handler, the CFI information is also used by
debuggers so it would be good to get it right.
Co-authored-by: Ana Pazos <apazos@quicinc.com>
These instructions are included in XRivosVisni. They perform a scalar
insert into a vector (with a potentially non-zero index) and a scalar
extract from a vector (with a potentially non-zero index) respectively.
They're very analogous to vmv.s.x and vmv.x.s respectively.
The instructions do have a couple restrictions:
1) Only constant indices are supported, with a uimm5 format.
2) There are no FP variants.
One important property of these instructions is that their throughput
and latency are expected to be LMUL independent.
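For illustration, IR like the following sketch is a natural candidate
for the scalar-extract instruction, given the constant-index
restriction:
```llvm
define i32 @extract(<vscale x 4 x i32> %v) {
  ; constant index, within the uimm5 range
  %e = extractelement <vscale x 4 x i32> %v, i64 3
  ret i32 %e
}
```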
This handles combining fixed-length disjoint ors to vwadd[u].wv, as was
done for scalable vectors in #86929.
vwadd[u].vv patterns need to be handled with a separate pattern in a
follow-up patch due to the extends being sunk; see #136716.
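A sketch of the fixed-length input at the IR level (illustrative types;
the `disjoint` flag is what lets the or be treated as an add):
```llvm
define <4 x i64> @f(<4 x i64> %x, <4 x i32> %y) {
  %y.ext = zext <4 x i32> %y to <4 x i64>
  ; disjoint: no bit is set in both operands, so this or is equivalent
  ; to an add and can be selected as vwaddu.wv
  %or = or disjoint <4 x i64> %x, %y.ext
  ret <4 x i64> %or
}
```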
This is a reland of #99752 with the bug fixed (see test diff in the
third commit in this PR).
All `popcount` libcalls return `int`, but `ISD::CTPOP` returns the type
of the argument, which can be wider than `int`. The fix is to make the
DAG legalizer pass the correct return type to `makeLibCall` and
sign-extend the result afterwards.
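For example (a sketch, assuming a target that uses the LibCall action
for CTPOP), the i64 result below comes back from a `popcount` libcall
returning `int`, so it must be sign-extended to i64 rather than used
directly:
```llvm
declare i64 @llvm.ctpop.i64(i64)

define i64 @pop(i64 %x) {
  %c = call i64 @llvm.ctpop.i64(i64 %x)
  ret i64 %c
}
```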
Original commit message:
The main change is adding CTPOP to `RuntimeLibcalls.def` to allow
targets to use LibCall action for CTPOP. DAG legalizers are changed
accordingly.
Pull Request: https://github.com/llvm/llvm-project/pull/101786
InstructionCost is already an optional value, containing an Invalid
state that can be checked with isValid(). There is little point in
returning another optional from getValue(). Most uses do not make use of
it being a std::optional, dereferencing the value directly (either
isValid has been checked previously or the Cost is assumed to be valid).
The one case that does, in AMDGPU, used value_or, which has been
replaced by an isValid() check.
This is a continuation from 22d5890c and adds the necessary logic to
handle SEW!=64 profitably. The interesting case is needing to handle
e.g. a single m1 which is split via extract_subvector into two operands,
and forming that back into a single m1 operation - instead of letting the
vslidedown by vlenb/Constant sequence be generated. This is analogous to
the getSingleShuffleSrc for vnsrl, and we can share a bunch of code.
If XRivosVizip is available, the ri.vzip2a and ri.vzip2b instructions
can be used to perform an interleave shuffle. This patch only affects
the intrinsic lowering (and thus scalable vectors). Fixed vectors go
through shuffle lowering, and the zip2a (but not zip2b) case is already
handled there.
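For example, intrinsic IR along these lines (a sketch) can now use the
zip instructions:
```llvm
declare <vscale x 4 x i32> @llvm.vector.interleave2.nxv4i32(<vscale x 2 x i32>, <vscale x 2 x i32>)

define <vscale x 4 x i32> @interleave(<vscale x 2 x i32> %a, <vscale x 2 x i32> %b) {
  %r = call <vscale x 4 x i32> @llvm.vector.interleave2.nxv4i32(<vscale x 2 x i32> %a, <vscale x 2 x i32> %b)
  ret <vscale x 4 x i32> %r
}
```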
If XRivosVizip is available, the ri.vunzip2a and ri.vunzip2b
instructions can be used to perform the concatenation and register
deinterleave shuffle. This patch only affects the intrinsic lowering
(and thus scalable vectors, because fixed vectors go through shuffle
lowering).
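For example, intrinsic IR along these lines (a sketch, using e64 to
match the staging restriction noted below) is what this lowering now
handles:
```llvm
declare { <vscale x 2 x i64>, <vscale x 2 x i64> } @llvm.vector.deinterleave2.nxv4i64(<vscale x 4 x i64>)

define { <vscale x 2 x i64>, <vscale x 2 x i64> } @deinterleave(<vscale x 4 x i64> %v) {
  %r = call { <vscale x 2 x i64>, <vscale x 2 x i64> } @llvm.vector.deinterleave2.nxv4i64(<vscale x 4 x i64> %v)
  ret { <vscale x 2 x i64>, <vscale x 2 x i64> } %r
}
```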
Note that this patch is restricted to e64 for staging purposes only. e64
is obviously profitable (i.e. we remove a vcompress). At e32 and below,
our alternative is a vnsrl instead, and we need a bit more complexity
around lowering with fractional LMUL before the ri.vunzip2a/b versions
become always profitable. I'll post the follow-up change once this
lands.
This change removes the uint64_t constructor on LocationSize
preventing implicit conversion, and fixes up the APIs using it to adapt to
the change. Note that I'm adding a couple of explicit conversion points
on routines where passing in a fixed offset as an integer seems likely
to have well understood semantics.
We had an unfortunate case which arose if you tried to pass a TypeSize
value to a parameter of LocationSize type. We'd find the implicit
conversion path through TypeSize -> uint64_t -> LocationSize which works
just fine for fixed values, but loses information and fails assertions
if the TypeSize was scalable. This change breaks the first link in that
implicit conversion chain since that seemed to be the easier one.
If we have a build vector which could be either a splat or a scalar
insert, prefer the scalar insert. At high LMUL, this reduces both the
vector register pressure (locally, the use will likely still be aligned)
and the amount of work performed for the splat.
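A sketch of the ambiguous case: in the IR below only lane 0 is defined,
so the build_vector can be lowered either as a splat (vmv.v.x) or as a
scalar insert (vmv.s.x); we now prefer the latter:
```llvm
define <8 x i64> @only_lane0(i64 %x) {
  ; lanes 1..7 are poison, so either lowering is valid
  %v = insertelement <8 x i64> poison, i64 %x, i64 0
  ret <8 x i64> %v
}
```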
This extends the DAG combine introduced in 336b2909 to handle the case
where the prior value is defined by a vmv.s.x instead of a vmv.v.x. If
the vrgather splats the single source element and has no passthru, we
can replace it with a vmv.v.x - which will in turn usually get folded
into a vmerge if a select follows.
There is another branch instruction that also takes an immediate
operand, but there the immediate specifies which bit to test for being
set or clear. We only check whether operand 2 is an immediate here, so
there is no way to distinguish between the two.
So add new CondCodes COND_CV_BEQIMM/COND_CV_BNEIMM so that we know which
kind of immediate branch instruction was matched in the Select_* pseudos.
Extend the transform introduced in 336b290 to vfmv.v.f. This is fairly
trivial and would have been in the original commit except I hadn't
written the FP tests yet.
If the vrgather.vi is preceded by a vfmv.v.f which writes a superset of
the lanes written by the vrgather, and the vrgather has no passthru, then
the vrgather has no semantic effect.
If the vrgather.vi is preceded by a vmv.v.x which writes a superset of
the lanes written by the vrgather, and the vrgather has no passthru, then
the vrgather has no semantic effect.
This is the start of a mini-series of patches around rewriting
vrgather.vi/vx preceded by vmv.v.x, vfmv.v.f, vmv.s.x, etc., starting
with the simplest, but also lowest-impact, case.
One point I'd like a second opinion on is the out-of-bounds semantic
change. As far as I can tell, all the indices are in bounds by
construction. The doc change is as much because I couldn't figure out
how to test the alternative as anything else.
This can be done with a vrgather.vi/vx, and (possibly) a register move.
The alternative is to do a vrgather.vv with a full width index vector.
We'd already caught the two-operand form of this shuffle; this patch
specifically handles the single-operand form.
Unfortunately that is only true in the abstract; it would be nice if we
canonicalized shuffles in some way, wouldn't it?
This is a follow up to f8ee58a3c, and improves code generation for the
XRivosVizip extension.
If we have a slide pair which could be a zipeven or zipodd if the
shuffle was widened, widen the shuffle and then mask the zipeven or
zipodd.
This is basically working around an order-of-matching issue; we match
the slide pair variants before trying widening. I considered whether we
should just widen slide pairs without any consideration of the zip
idioms, but the resulting codegen changes look mostly like churn, and
have no clear evidence of profitability.
The element type i64 of the BUILD_VECTOR is not legal on RV32, so we
don't catch the VID pattern once it has been legalized for i64. Instead,
try to custom lower it to VID during type legalization.
Fixes https://github.com/llvm/llvm-project/issues/134126.
The matching code was previously written as if we were mutating the
indices to replace undef elements with preferred values, but the actual
lowering code just took a prefix of the index vector. This resulted in
us using undef indices for lanes which should have been defined,
resulting in incorrect codegen.
Longer term, we probably should rewrite the mask, but this seemed like
an easier tactical fix.
InstCombine will combine this zext of an icmp where the source has a
single bit set into an lshr plus trunc
(`InstCombinerImpl::transformZExtICmp`):
```llvm
define <vscale x 1 x i8> @f(<vscale x 1 x i64> %x) {
%1 = and <vscale x 1 x i64> %x, splat (i64 8)
%2 = icmp ne <vscale x 1 x i64> %1, splat (i64 0)
%3 = zext <vscale x 1 x i1> %2 to <vscale x 1 x i8>
ret <vscale x 1 x i8> %3
}
```

into:
```llvm
define <vscale x 1 x i8> @reverse_zexticmp_i64(<vscale x 1 x i64> %x) {
%1 = trunc <vscale x 1 x i64> %x to <vscale x 1 x i8>
%2 = lshr <vscale x 1 x i8> %1, splat (i8 2)
%3 = and <vscale x 1 x i8> %2, splat (i8 1)
ret <vscale x 1 x i8> %3
}
```
In a loop, this ends up being unprofitable for RISC-V because the
codegen now goes from:
```asm
f: # @f
.cfi_startproc
# %bb.0:
vsetvli a0, zero, e64, m1, ta, ma
vand.vi v8, v8, 8
vmsne.vi v0, v8, 0
vsetvli zero, zero, e8, mf8, ta, ma
vmv.v.i v8, 0
vmerge.vim v8, v8, 1, v0
ret
```
To a series of narrowing vnsrl.wis:
```asm
f: # @f
.cfi_startproc
# %bb.0:
vsetvli a0, zero, e64, m1, ta, ma
vand.vi v8, v8, 8
vsetvli zero, zero, e32, mf2, ta, ma
vnsrl.wi v8, v8, 3
vsetvli zero, zero, e16, mf4, ta, ma
vnsrl.wi v8, v8, 0
vsetvli zero, zero, e8, mf8, ta, ma
vnsrl.wi v8, v8, 0
ret
```
In the original form, the vmv.v.i is loop invariant and is hoisted out,
and the vmerge.vim usually gets folded away into a masked instruction,
so you usually just end up with a vsetvli + vmsne.vi.
The truncate requires multiple instructions and introduces a vtype
toggle for each one, and is measurably slower on the BPI-F3.
This reverses the transform in RISCVISelLowering for truncations
greater than twice the bitwidth, i.e. it keeps single vnsrl.wis.
Fixes #132245
Previously we only marked fixed-length vector extracts as cheap, so this
extends it to any extract at index 0, which should just be a subreg
extract.
This allows extracts of i1 vectors, and scalable vectors generally, to
be considered for DAG combines.
This causes some slight improvements with large legalized fixed-length
vectors, but the underlying motivation for this is to actually prevent
an unprofitable DAG combine on a scalable vector in an upcoming patch.
Fixes #130510.
In RISCV, modify the folding of (X ^ Y == 0) -> (X == Y) to account for
cases where the (X ^ Y) will be re-used.
If a constant is being used for the XOR before a branch, ensure that it
is small enough to fit within a 12-bit immediate field. Otherwise, the
equality check is more efficient than the check against 0; see the
following:
```
# %bb.0:
lui a1, 5
addiw a1, a1, 1365
xor a0, a0, a1
beqz a0, .LBB0_2
# %bb.1:
ret
.LBB0_2:
```
```
# %bb.0:
lui a1, 5
addiw a1, a1, 1365
beq a0, a1, .LBB0_2
# %bb.1:
xor a0, a0, a1
ret
.LBB0_2:
```
Similarly, if the XOR is between 1 and a size-one integer, we should
still fold away the XOR since that comparison can be optimized as a
comparison against 0.
```
# %bb.0:
slt a0, a0, a1
xor a0, a0, 1
beqz a0, .LBB0_2
# %bb.1:
ret
.LBB0_2:
```
```
# %bb.0:
slt a0, a0, a1
bnez a0, .LBB0_2
# %bb.1:
xor a0, a0, 1
ret
.LBB0_2:
```
One question about my code: I used a hard-coded value for the width of
a RISCV ALU immediate. Do you know of a way I can obtain this from the
`context`? I was unable to devise one.
For example, for the following situation:
```
%6:gpr = SLLI %2:gpr, 2
%7:gpr = ADDI killed %6:gpr, 24
%8:gpr = ADD %0:gpr, %7:gpr
```
If we swap the two add instructions we can merge the shift and add. The
final code will look something like this:
```
%7 = SH2ADD %0, %2
%8 = ADDI %7, 24
```