We were previously checking a combination of the vector policy op and
the opcode to determine if we needed to skip copying the passthru from
a masked pseudo to an unmasked pseudo.
However, we can just do this by checking
RISCVII::isFirstDefTiedToFirstUse, which is a proxy for whether or not
a pseudo has a passthru operand.
This should hopefully remove the need for the changes in #123106.
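A minimal sketch of the simplified check, assuming it sits inside the RISC-V backend where `RISCVII::isFirstDefTiedToFirstUse` is declared (the helper name here is illustrative, not from the patch):

```cpp
#include "MCTargetDesc/RISCVBaseInfo.h"
#include "llvm/MC/MCInstrDesc.h"

static bool hasPassthruOperand(const llvm::MCInstrDesc &Desc) {
  // A pseudo has a passthru operand exactly when its first def is tied to
  // its first use, so this one query replaces the policy-op/opcode check.
  return llvm::RISCVII::isFirstDefTiedToFirstUse(Desc);
}
```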
SDNode::use_iterator now returns an SDUse& when dereferenced, and
SDNode::user_iterator returns an SDNode*. SDNode::use_begin/use_end/uses
work on use_iterator; SDNode::user_begin/user_end/users work on
user_iterator.
We can now write range based for loops using SDUse& and SDNode::uses().
I've converted many of these in this patch. I didn't update loops that
have additional variables updated in their for statement.
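A minimal sketch of the two loop styles this enables (assuming an `SDNode *N` is in scope; the function name is illustrative):

```cpp
#include "llvm/CodeGen/SelectionDAGNodes.h"
using namespace llvm;

void visitUses(SDNode *N) {
  // Iterate over the use edges themselves; each SDUse knows its user node.
  for (SDUse &U : N->uses()) {
    SDNode *User = U.getUser();
    (void)User; // e.g. inspect how this particular edge is used
  }
  // Iterate over the user nodes directly when the edge itself isn't needed.
  for (SDNode *User : N->users()) {
    (void)User;
  }
}
```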
Some loops use SDNode::use_iterator::getOperandNo() which also prevents
using range based for loops. I plan to move this into SDUse in a follow
up patch.
This function is most often used in range based loops or algorithms
where the iterator is implicitly dereferenced. The dereference returns
an SDNode* for the user rather than an SDUse*, so users() is a better name.
I've long been annoyed that we can't write a range based loop over
SDUse when we need getOperandNo. I plan to rename use_iterator to
user_iterator and add a use_iterator that returns SDUse& on dereference.
This will make it more like IR.
These instructions should always be created with Log2SEW=0 and an LMUL
based on SEW=8. This is used by the vsetvli pass to know that these
instructions only care about the ratio and not the specific value.
Looks like I fixed riscv_vmsge(u)_mask intrinsics years ago, but forgot
the unmasked intrinsics.
I'm working on an enhancement to our MachineVerifier checks that will
require VMNAND and VMSET to have Log2SEW=0.
Replace LMUL suffixes with _B1, _B2, etc. This matches what we do
for other mask-only instructions like VCPOP_M, VFIRST_M, VMSBF_M,
VLM, VSM, etc.
Now all pseudoinstructions that use Log2SEW=0 will be consistently
named.
These are suffixed with B1, B2, B4, B8, B16, B32, or B64, which I think
was supposed to match the naming of the vbool types from C, where the
number is the SEW/LMUL ratio (e.g. SEW=8 with LMUL=8 gives B1, and
SEW=64 with LMUL=1 gives B64). So the smallest mask is B64 and the
largest is B1. This provides a compact syntax for describing the 7
possible ratios between SEW and LMUL.
We had the instruction names in the opposite order.
Masking out the most significant bits can be done with a shl followed
by a srl with the same shift amount. If this is followed by another
shl, we can instead srl by a smaller amount.
This transform is already implemented in tablegen for masking out the
32 most significant bits.
Emits better code for e.g.:

```c
float *index(float *p, int i)
{
  return p + (i & (1 << 30) - 1);
}
```
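A worked illustration of the rewrite, with shift amounts chosen for this example and assuming 64-bit arithmetic (not taken from the generated code):

```cpp
#include <stdint.h>

// Keep the low 30 bits of x, then scale by 4.
uint64_t before(uint64_t x) {
  return ((x << 34) >> 34) << 2;  // shl 34, srl 34, shl 2
}
uint64_t after(uint64_t x) {
  return (x << 34) >> 32;         // shl 34, then srl by the smaller amount 32
}
// before(x) == after(x) for all x.
```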
As suggested by Craig, this tries to merge the two sets of register
classes created in #112983, GPRPair* and GPRF64Pair*.
- I added some explicit annotations to `RISCVInstrInfoD.td` which fixed
the type inference issues I was seeing from tablegen for select
patterns.
- I've had to make the behaviour of `splitValueIntoRegisterParts` and
`joinRegisterPartsIntoValue` cover more cases, because you cannot
bitcast to/from untyped (the bitcast would otherwise have been inserted
automatically by TargetLowering code).
- I apparently didn't need to change `getNumRegisters` again, which
continues to tell me there's a bug in the code for tied inputs. I added
some more test coverage of this case, but it didn't seem to help find
the asserts I was hitting before; I think the difference is the default
behaviour for integers, which doesn't apply to floats.
- There's still a difference between BuildGPRPair and BuildPairF64 (and
the same for SplitGPRPair and SplitF64). I'm not happy with this; I
think it's quite confusing, as they're very similar, differing only in
whether they give an `untyped` or an `f64`. I haven't really worked out
how the DAGCombiner copes if one meets the other. I know we have some of
this for the f64 variants already, but they're a lot more complex than
the GPRPair variants anyway.
This patch adds support for getting even-odd general purpose register
pairs into and out of inline assembly using the `R` constraint, as
proposed in riscv-non-isa/riscv-c-api-doc#92.
There are a few different pieces to this patch, each of which needs its
own explanation.
- Renames the Register Class used for f64 values on rv32i_zdinx from
`GPRPair*` to `GPRF64Pair*`. These register classes are kept broadly
unmodified, as their primary value type is used for type inference
over selection patterns. This rename affects quite a lot of files.
- Adds new `GPRPair*` register classes which will be used for `R`
constraints and for instructions that need an even-odd GPR pair. This
new type is used for `amocas.d.*` (rv32) and `amocas.q.*` (rv64) in
Zacas, instead of the `GPRF64Pair` class that was used before.
- Marks the new `GPRPair` class as legal for holding an `MVT::Untyped`.
Two new RISCVISD node types, `BuildGPRPair` and `SplitGPRPair`, are
added for creating and destructing a pair; they are introduced when
bitcasting to/from the pair type and `untyped`.
- Adds functionality to `splitValueIntoRegisterParts` and
`joinRegisterPartsIntoValue` to handle changing `i<2*xlen>` MVTs into
`untyped` pairs.
- Adds an override for `getNumRegisters` to ensure that `i<2*xlen>`
values, when going to/from inline assembly, only allocate one (pair)
register (they would otherwise allocate two). This is due to a bug in
SelectionDAGBuilder.cpp which other backends also work around.
- Ensures that Clang understands that `R` is a valid inline assembly
constraint.
- This also allows `R` to be used for `f64` types on `rv32_zdinx`
architectures, where doubles are stored in a GPR pair (a usage sketch
follows below).
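A hypothetical usage sketch, not taken from the patch (the empty asm body is only there to show the constraint being accepted):

```cpp
// With Zdinx on rv32, a double lives in an even-odd GPR pair, so the "R"
// constraint ties the f64 operand to such a pair.
double touch_pair(double x) {
  __asm__ volatile("" : "+R"(x));
  return x;
}
```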
This patch adds a 32 bit register class for use with Zfinx instructions. This makes them more similar to F instructions and allows us to only spill 32 bits.
I've added CodeGenOnly instructions for load/store using GPRF32 as that gave better results than insert_subreg/extract_subreg.
Function arguments use this new GPRF32 register class for f32 arguments with Zfinx, eliminating the need to use RISCVISD::FMV* nodes.
This is similar to #107446 which adds a 16 bit register class.
This patch adds a 16 bit register class for use with Zhinx
instructions. This makes them more similar to Zfh instructions and
allows us to only spill 16 bits.
I've added CodeGenOnly instructions for load/store using GPRF16 as that
gave better results than insert_subreg/extract_subreg. I'm using FSGNJ
for GPRF16 copy with Zhinx as that gave better results. Zhinxmin will
use ADDI+subreg operations.
Function arguments use this new GPRF16 register class for f16 arguments
with Zhinxmin, eliminating the need to use RISCVISD::FMV* nodes.
I plan to extend this idea to Zfinx next.
Most of the constants fli can generate are positive numbers. We can use
fli+fneg to generate their negative versions.
Previously, we considered such negative constants as "legal" and let
isel generate the fli+fneg. However, it is useful to expose the fneg to
DAG combines to fold with fadd to produce fsub or with fma to produce
fnmadd, fnmsub, or fmsub.
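An illustrative example of the kind of fold this enables (assuming Zfa, where fli can materialize 2.0 but not -2.0):

```cpp
// With the fneg visible to DAG combines, x*y + (-2.0) can fold to an fmsub of
// x, y and an fli-materializable 2.0, instead of fli + fneg + fmadd.
double f(double x, double y) {
  return __builtin_fma(x, y, -2.0);
}
```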
This patch moves the fneg creation to lowering so that the fneg will be
visible to the last DAG combine.
I might move the rest of Zfa handling from isel to lowering as a follow
up.
Fixes #107772.
selectFPImm previously handled cases where an FPImm could be
materialized in an integer register.
We can generalize this to cases where a value was in an integer register
and then copied to a scalar FP register to be used by a vector
instruction.
In the affected test, the call lowering code used up all of the FP
argument registers and started using GPRs. Now we use integer vector
instructions to consume those GPRs instead of moving them to scalar FP
first.
We currently fold a vmerge.vvm into its true operand if the true operand
is a masked pseudo with the same mask.
We can move this over to RISCVVectorPeephole by instead splitting it up
into a smaller peephole which converts it to a vmv.v.v first. The
existing foldVMV_V_V peephole will then take care of folding it if
needed.
This is very similar to the existing all-ones mask peephole and we could
potentially do it inside of it. I opted to put it in a separate peephole
to make it easier to reason about, given that the duplication is small,
but I could be persuaded either way.
This patch handles target lowering and calling convention.
For target lowering, the vector tuple type, previously represented as
multiple scalable vectors, is now changed to a single `MVT`; each `MVT`
has a corresponding register class.
The load/store of vector tuples is handled the same way, but needs
additional vector insert/extract instructions to get at a sub-register
group.
The inline assembly constraint for vector tuple types can be modeled
directly as "vr", which is identical to normal vector registers.
For the calling convention, we no longer need an alternative algorithm
to handle register allocation; this makes the code easier to maintain
and read.
Stacked on https://github.com/llvm/llvm-project/pull/97994
In #106110 we had to mark v[f]slide1down.vx as
ActiveElementsAffectResult since the elements in the body depend on VL.
However, the result doesn't depend on the mask, so this was overly
conservative and broke the vmerge peephole.
We can recover this by splitting up ActiveElementsAffectResult into VL
and Mask bits, so we can more accurately model v[f]slide1down.vx and
re-enable the peephole.
In order to support -unaligned-scalar-mem properly, we need to be more
careful with immediates of global variables. We need to guarantee that
adding 4 in RISCVExpandingPseudos won't overflow simm12. Since we don't
know what the simm12 is until link time, the only way to guarantee this
is to make sure the base address is at least 8 byte aligned.
There were also several corner-case bugs in immediate folding where we
would fold an immediate in the range [2044,2047], where adding 4 would
overflow simm12 (whose maximum is 2047). These are not related to
unaligned-scalar-mem.
Fixed an incorrect cast.
Original message:
If c1 is a shifted mask with c3 leading zeros and c4 trailing zeros,
and c2 is greater than c3, we can use (srli (srai y, c2 - c3), c3 + c4)
followed by a SHXADD with c4 as the X amount.
Without Zba we can use (slli (srli (srai y, c2 - c3), c3 + c4), c4).
Alive2: https://alive2.llvm.org/ce/z/AwhheR
This caused an assert to fire:

```
llvm/include/llvm/Support/Casting.h:566:
decltype(auto) llvm::cast(const From &) [To = llvm::ConstantSDNode, From = llvm::SDValue]:
Assertion `isa<To>(Val) && "cast<Ty>() argument of incompatible type!"' failed.
```

See the comment on the PR.
> If c1 is a shifted mask with c3 leading zeros and c4 trailing zeros. If
> c2 is greater than c3, we can use (srli (srai y, c2 - c3), c3 + c4)
> followed by a SHXADD with c4 as the X amount.
>
> Without Zba we can use (slli (srli (srai y, c2 - c3), c3 + c4), c4).
> Alive2: https://alive2.llvm.org/ce/z/AwhheR
This reverts commit 514481736cf943464125ef34570a7df0a19290de.
If c1 is a shifted mask with c3 leading zeros and c4 trailing zeros,
and c2 is greater than c3, we can use (srli (srai y, c2 - c3), c3 + c4)
followed by a SHXADD with c4 as the X amount.
Without Zba we can use (slli (srli (srai y, c2 - c3), c3 + c4), c4).
Alive2: https://alive2.llvm.org/ce/z/AwhheR
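A worked instance of the transform above, with constants chosen for illustration (c2 = 4, c3 = 2 leading zeros, c4 = 3 trailing zeros, so c1 = 0x3FFFFFFFFFFFFFF8; assumes the usual arithmetic right shift on int64_t):

```cpp
#include <stdint.h>

uint64_t before(int64_t y) {
  return (uint64_t)(y >> 4) & 0x3FFFFFFFFFFFFFF8ULL;  // (and (sra y, c2), c1)
}
uint64_t after(int64_t y) {
  uint64_t t = (uint64_t)(y >> (4 - 2)) >> (2 + 3);   // (srli (srai y, c2-c3), c3+c4)
  return t << 3;                                      // the shl by c4 and the add
}                                                     // are what sh3add then provides
// before(y) == after(y) for all y.
```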
This is split off from #71764, and moves only the vmv.v.v part of
performCombineVMergeAndVOps to work on MachineInstrs.
In retrospect, trying to handle PseudoVMV_V_V and PseudoVMERGE_VVM in
the same function makes the code quite hard to read, so this just does
it in a separate peephole.
This turns out to be simpler, since for PseudoVMV_V_V we don't need to
convert the Src instruction to a masked variant, and we don't need to
create a fake all-ones mask.
If x is a shl by 32 and c1 is a simm12, we would prefer to use
SRAIW+ANDI. This prevents the shl from being selected as a separate
slli instruction.
Fixes a regression from #101868.
Give the (and (srl/shl X, C2), C1) handling its own private `C1` variable
it can modify using known zeros. This may be out of sync with
N1C->getZExtValue().
Add a separate const C1 for (and (sra X, C2), C1) and (and X, C).
This copy will always be in sync with N1C->getZExtValue().
Remove the IsC1Mask and IsC1ANDI variables and compute them at their
usage.
Use N1C->getSExtValue() when calling isInt. This shouldn't be a
functional change since we already checked that it was a mask. In
order for it to be a mask and a negative number, it would need to be -1
which should have been removed by DAG combine.
As noted in
https://github.com/llvm/llvm-project/pull/100367/files#r1695442138,
RISCVMaskedPseudoInfo currently stores two things: whether or not a
masked pseudo has an unmasked variant, and whether or not it's
element-wise.
These are separate things, so this patch splits the latter out into the
underlying instruction's TSFlags to help make the semantics of #100367
more clear.
To the best of my knowledge the only non-element-wise instructions in V
are:
- vredsum.vs and other reductions
- vcompress.vm
- vms*f.m
- vcpop.m and vfirst.m
- viota.m
In vector crypto, the instructions that operate on element groups are
conservatively marked (this might be fine to relax later, given that
non-EGS-multiple VLs are reserved), as are the SiFive extensions and
XTHeadVdot.
Make "break" consistently the "if" body and the "return false" the
last thing in each case.
This makes it easier to add different conditions for different operands
of some instructions and makes everything more consistent.
As noted in
https://github.com/llvm/llvm-project/pull/100367/files#r1695448771, we
currently fold vmerge.vvms and vmv.v.vs into their ops even if the
EEW is different, which leads to an incorrect transform.
This checks the op's EEW via its simple value type for now since there
doesn't seem to be any existing information about the EEW size of
instructions. We'll probably need to encode this at some point if we
want to be able to access it at the MachineInstr level in #100367
The tablegen patterns all have isRV32. I did not check if any of them
could naively support RV64.
Fixes #101067 and probably other bugs like it that we haven't found yet.
When folding, we currently check if the pseudo's result is not lanewise
(e.g. vredsum.vs or viota.m) and bail if we're changing the mask.
However, we also need to check the AVL.
This patch bails if the AVL changed for these pseudos, and also renames
the pseudo table property to be more explicit.
We currently only fold a vmerge into a masked true operand if the vmerge
has an all-ones mask, since we end up keeping the mask from the true
operand.
But if the masks are the same then we can still fold, because vmerge and
true have the same passthru. If an element was masked off in the
original vmerge, it will also be masked off in the resulting true, and
will have the same passthru value.
The motivation for this is to lower masked VP loads and stores with
passthrus to masked RVV instructions. Normally you can express a masked
RVV instruction with a mask-undisturbed passthru via a combination of a
VP op with an all-ones mask and a vp.merge. But for loads and stores you
need the same mask on the VP op as well as on the vp.merge.