In the vmerge peephole, we currently allow different AVLs for the vmerge
and its true operand.
If vmerge's VL > true's VL, vmerge can "preserve" elements from false
that would otherwise be clobbered with a tail agnostic policy on true.
mask 1 1 1 1 0 0 0 0
true x x x x|. . . . AVL=4
vmerge x x x x f f|. . AVL=6
If we convert this to vmv.v.v we will lose those false elements:
mask 1 1 1 1 0 0 0 0
true x x x x|. . . . AVL=4
vmv.v.v x x x x . .|. . AVL=6
Fix this by checking that vmerge's AVL is <= true's AVL.
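To make the hazard concrete, here's a rough standalone model of the two
instructions (plain C++ standing in for the vector lanes; this is not
the peephole code itself). The lane values, mask and AVLs mirror the
diagram above:
```
#include <array>
#include <cstdio>

// Illustrative 8-lane model: 'x' are lanes written by `true`, 'f' lanes
// come from the `false` operand, '.' marks a lane whose value we cannot
// rely on (e.g. tail-agnostic garbage).
using Vec = std::array<char, 8>;

// vmerge.vvm vd, false, true, mask with the given AVL: body lanes take
// true[i] where the mask is set and false[i] otherwise; tail lanes keep vd.
static Vec vmerge(Vec Vd, const Vec &FalseOp, const Vec &TrueOp,
                  const std::array<bool, 8> &Mask, unsigned AVL) {
  for (unsigned i = 0; i < AVL; ++i)
    Vd[i] = Mask[i] ? TrueOp[i] : FalseOp[i];
  return Vd;
}

// vmv.v.v vd, src with the given AVL: body lanes copy src unconditionally.
static Vec vmv_v_v(Vec Vd, const Vec &Src, unsigned AVL) {
  for (unsigned i = 0; i < AVL; ++i)
    Vd[i] = Src[i];
  return Vd;
}

int main() {
  std::array<bool, 8> Mask = {true, true, true, true,
                              false, false, false, false};
  // `true` was computed with AVL=4 under a tail-agnostic policy, so its
  // lanes 4..7 are unspecified.
  Vec TrueOp = {'x', 'x', 'x', 'x', '.', '.', '.', '.'};
  Vec FalseOp = {'f', 'f', 'f', 'f', 'f', 'f', 'f', 'f'};
  Vec Vd = {'.', '.', '.', '.', '.', '.', '.', '.'};

  Vec Merged = vmerge(Vd, FalseOp, TrueOp, Mask, /*AVL=*/6);
  Vec Moved = vmv_v_v(Vd, TrueOp, /*AVL=*/6);

  // Merged[4..5] are 'f', Moved[4..5] are garbage: converting the vmerge
  // to vmv.v.v loses those false elements, so the fold is only safe when
  // vmerge's AVL <= true's AVL.
  printf("vmerge:  %.8s\nvmv.v.v: %.8s\n", Merged.data(), Moved.data());
  return 0;
}
```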
Should fix #149335
These instructions don't have an rs1 field, unlike other instructions
that use RVInstIBase.
Rename the classes to not use Unary since we have historically used that
for a single register operand.
In TargetFrameLowering::determineCalleeSaves, any vector register is
marked as saved if any of its subregisters is clobbered. This is not
correct for vector registers: we only want a vector register to be
marked as saved if all of its subregisters are clobbered.
This patch handles vector callee-saved registers in the target hook.
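A minimal sketch of the intended rule, written against a plain bitmask
rather than the real register-info interfaces (the name below is
illustrative only):
```
#include <bitset>
#include <cstddef>
#include <cstdio>

// Clobbered[i] says whether subregister i of a vector register (e.g. one
// member of a register group) is clobbered. The generic logic marks the
// super-register as saved as soon as any() subregister is clobbered; for
// vector registers we only want it marked when all() of them are.
template <size_t N>
static bool shouldSaveFullVectorReg(const std::bitset<N> &Clobbered) {
  return Clobbered.all();
}

int main() {
  std::bitset<4> OnlyOneClobbered("0001");
  std::bitset<4> AllClobbered("1111");
  printf("one subreg clobbered -> save full reg? %d\n",
         shouldSaveFullVectorReg(OnlyOneClobbered)); // 0
  printf("all subregs clobbered -> save full reg? %d\n",
         shouldSaveFullVectorReg(AllClobbered)); // 1
  return 0;
}
```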
This is the masked.store side to the masked.load support added in
881b3fd.
With this change, we support masked.load and masked.store via the
intrinsic lowering path used primarily with scalable vectors. An
upcoming change will extend the fixed vector (i.e. shuffle vector) paths
in the same manner.
This builds on the whole series of recent API reworks to implement
support for deinterleaveN of masked.load. The goal is to be able to
enable masked interleave groups in the vectorizer once all the codegen
and costing pieces are in place.
I considered including the shuffle path support in this review as well
(since the RISCV target specific stuff should be common), but decided to
separate it into its own review just to focus attention on one thing
at a time.
After the refactoring in #149710, the logic change is trivial.
Motivation for preferring sign-extended 32-bit loads (LW) vs
zero-extended (LWU):
* LW is compressible while LWU is not.
* Helps to minimise the diff vs RV32 (e.g. LWU vs LW)
* Helps to minimise distracting diffs vs GCC. I see this come up
frequently when comparing against GCC-generated code, and in those
cases it's a red herring.
Similar normalisation could be done for LHU and LH, but this is less
well motivated as there is a compressed LHU (and if performing the
change in RISCVOptWInstrs it wouldn't be done for RV32). There is a
compressed LBU but not LB, meaning doing a similar normalisation for
byte-sized loads would actually be a regression in terms of code size.
Load narrowing when allowed by hasAllNBitUsers isn't explored in this
patch.
This changes ~20500 instructions in an RVA22 build of the
llvm-test-suite including SPEC 2017. As part of the review, the option
of doing the change at ISel time was explored but was found to be less
effective.
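For reference, a small standalone program (plain integer types standing
in for the 64-bit GPR; this is not the RISCVOptWInstrs logic) showing
when the two loads agree: the sign- and zero-extended results differ
only when bit 31 of the loaded word is set, and a user that only reads
the low 32 bits sees the same value either way, which is presumably the
situation the W-instruction analysis establishes before rewriting LWU
to LW:
```
#include <cstdint>
#include <cstdio>
#include <initializer_list>

// Model LW (sign-extend the loaded 32-bit word into a 64-bit GPR) and
// LWU (zero-extend it).
static int64_t lw(uint32_t Word) { return (int64_t)(int32_t)Word; }
static int64_t lwu(uint32_t Word) { return (int64_t)(uint64_t)Word; }

int main() {
  for (uint32_t Word : {0x00000000u, 0x7fffffffu, 0x80000000u, 0xdeadbeefu}) {
    int64_t S = lw(Word), Z = lwu(Word);
    // The full 64-bit results differ once bit 31 is set, but the low 32
    // bits always agree, so a use that only reads the lower 32 bits
    // cannot tell the two loads apart.
    printf("word=%08x lw=%016llx lwu=%016llx low32-equal=%d\n", Word,
           (unsigned long long)S, (unsigned long long)Z,
           (uint32_t)S == (uint32_t)Z);
  }
  return 0;
}
```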
This refactor was suggested in
<https://github.com/llvm/llvm-project/pull/144703>.
I have checked for unexpected changes by comparing builds of
llvm-test-suite with/without this refactor, including with preferWInst
force enabled.
The previous assert wasn't passing the TSFlags but the opcode, so it
wasn't working.
Fixing it reveals that it was actually triggering, because we're too
strict with viota and vmsxf.m. We already reduce the VL on these
instructions because the result in each element doesn't depend on VL.
However, the result does change if the instruction is masked, so
account for that.
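A rough scalar model of viota.m (my reading of its usual semantics;
vmsxf.m behaves analogously, and the real optimizer code looks nothing
like this) shows both halves of that statement: element i depends only
on source elements below i, never on VL, but a mask changes both which
elements contribute and which elements are written:
```
#include <cstdio>
#include <vector>

// Rough model of viota.m vd, vs2 (optionally masked by v0). Element i of
// the result is the number of set bits of vs2 at positions below i; when
// masked, only active positions contribute and only active positions are
// written (inactive destination elements are left untouched here).
static std::vector<unsigned> viota(std::vector<unsigned> Vd,
                                   const std::vector<bool> &Vs2,
                                   const std::vector<bool> *V0, unsigned VL) {
  unsigned Sum = 0;
  for (unsigned i = 0; i < VL; ++i) {
    bool Active = !V0 || (*V0)[i];
    if (Active)
      Vd[i] = Sum;
    if (Active && Vs2[i])
      ++Sum;
  }
  return Vd;
}

int main() {
  std::vector<bool> Vs2 = {true, false, true, true, false, true};
  std::vector<bool> V0 = {true, true, false, true, true, true};
  std::vector<unsigned> Vd(6, 99);

  // Shrinking VL only drops trailing elements; it never changes earlier
  // ones, so reducing VL on an unmasked viota is fine.
  auto VL6 = viota(Vd, Vs2, nullptr, 6);
  auto VL3 = viota(Vd, Vs2, nullptr, 3);
  printf("unmasked element 2: vl=6 -> %u, vl=3 -> %u\n", VL6[2], VL3[2]);

  // With a mask, element 3 changes because the inactive element 2 no
  // longer contributes to the running count.
  auto Masked = viota(Vd, Vs2, &V0, 6);
  printf("element 3: unmasked -> %u, masked -> %u\n", VL6[3], Masked[3]);
  return 0;
}
```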
This is purely a benefit for reducing unnecessary diffs between RV32 and
RV64, as RVC does have a compressed form of SUBW (so SUB isn't more
compressible). This affects ~57.2k instructions in an rva22u64 build of
llvm-test-suite with SPEC CPU 2017 included.
RISCVOptWInstrs has a NumTransformedToWInstrs statistic, but didn't have
one for the W=>Non-W transform done by stripWSuffixes. It also didn't do
debug printing of the transformation. This patch addresses both issues.
Reviewed as part of <https://github.com/llvm/llvm-project/pull/149071>,
but landing separately.
Previously, two MCAsmBackend hooks were used, with
shouldInsertFixupForCodeAlign calling getWriter().recordRelocation
directly, bypassing generic code.
This patch:
* Introduces MCAsmBackend::relaxAlign to replace the two hooks.
* Tracks padding size using VarContentEnd (content is ignored).
* Moves setLinkerRelaxable from MCObjectStreamer::emitCodeAlignment to the backends.
Pull Request: https://github.com/llvm/llvm-project/pull/149465
We're going to end up repeating the operand extraction four times once
all of the routines have been updated to support both plain load/store
and vp.load/vp.store. I plan to add masked.load/masked.store in the near
future, and we'd need to add that to each of the four cases. Instead,
factor out a single copy of the operand normalization.
Currently, AsmPrinter skips CFI instructions created by a backend if
they are not needed. I'd like to change that so that it always
prints/encodes CFI instructions if a backend created them.
This change should slightly (perhaps negligibly) improve compile time as
post-PEI passes no longer need to skip over these instructions in
no-exceptions no-debug builds, and will make it possible to simplify convoluted
logic in AsmPrinter once other targets stop emitting CFI instructions
when they are not needed (that's my final goal).
The changes in a test seem to be caused by slightly different post-RA
scheduling in the absence of CFI instructions.
This continues in the direction started by commit 4b81dc7. We
essentially merge the handling for VPLoad - currently in
lowerInterleavedVPLoad - into the existing dedicated routine. This
removes the last use of the dedicated lowerInterleavedVPLoad, so we can
delete it.
This isn't quite NFC as the main callback has support for the strided
load optimization whereas the VPLoad specific version didn't. So this
adds the ability to form a strided load for a vp.load deinterleave with
one shuffle used.
If we're checking whether a number is a multiple of a small constant,
we need to be sure the multiply doesn't overflow for the mul logic to
hold. The VL is an unsigned number, so we care about unsigned overflow.
Once we've proven a number is a multiple, we can also use an exact
udiv, as we know we're not discarding any bits.
This fixes what is technically a miscompile with EVL vectorization, but
I doubt we'd ever have seen it in practice since most EVLs are going to
be much less than UINT_MAX.
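A standalone illustration of both points, with a plain uint32_t
standing in for the EVL (this is not the optimizer code): without the
no-overflow fact, "EVL was computed as k * C" does not imply
EVL % C == 0, and once divisibility is established the division back is
exact, i.e. no bits are discarded:
```
#include <cstdint>
#include <cstdio>

int main() {
  const uint32_t C = 6; // the small constant EVL should be a multiple of

  // If k * C wraps (unsigned overflow), the wrapped product need not be a
  // multiple of C, so the "mul logic" is unsound without the overflow check.
  uint32_t K = 0x2AAAAAABu;  // 6 * K == 0x1'0000'0002, which wraps in 32 bits
  uint32_t EVL = K * C;      // wraps to 2
  printf("wrapped product = %u, %% C = %u\n", EVL, EVL % C);

  // Once we know there is no overflow, divisibility holds and the udiv is
  // exact: (EVL / C) * C == EVL, i.e. no bits are discarded.
  uint32_t K2 = 12345;
  uint32_t EVL2 = K2 * C;
  printf("EVL2 = %u, EVL2 / C = %u, back = %u\n", EVL2, EVL2 / C,
         (EVL2 / C) * C);
  return 0;
}
```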
This continues in the direction started by commit 4b81dc7. We
essentially merge the handling for VPStore - currently in
lowerInterleavedVPStore, which is shared between shuffle and intrinsic
based interleaves - into the existing dedicated routine.
There are cases where InstCombine / InstSimplify might sink
extractvalue instructions that use a deinterleave intrinsic into
successor blocks, which prevents InterleavedAccess from kicking in
because the current pattern requires the deinterleave intrinsic to be
used by extractvalue.
However, this requirement is a bit too strict, since we could have just
replaced the users of the deinterleave intrinsic with whatever is
generated by the target TLI hooks.
This PR adds support for the vrgather.vi, vrgather.vx, vrgather.vv,
vrgatherei16.vv instructions in the RISC-V VLOptimizer.
To support vrgatherei16.vv I also needed to add support for it in
getOperandLog2EEW.
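A tiny standalone sketch of the rule getOperandLog2EEW needs for
vrgatherei16.vv (the operand numbering and function shape below are
illustrative, not the actual VLOptimizer interface): the index operand
always has EEW=16 regardless of SEW, while the destination and data
source use SEW:
```
#include <cassert>

// Illustrative only: operand 0 = vd, 1 = vs2 (data source), 2 = vs1
// (indices), mirroring vrgatherei16.vv vd, vs2, vs1.
static unsigned operandLog2EEWForVrgatherEI16(unsigned OperandNo,
                                              unsigned Log2SEW) {
  // The indices are always 16-bit elements, whatever SEW is; the data
  // operands use the element width of the instruction itself.
  return OperandNo == 2 ? 4u /* log2(16) */ : Log2SEW;
}

int main() {
  // e32 gather: data operands are 32-bit, the index operand is still 16-bit.
  assert(operandLog2EEWForVrgatherEI16(0, /*Log2SEW=*/5) == 5);
  assert(operandLog2EEWForVrgatherEI16(1, /*Log2SEW=*/5) == 5);
  assert(operandLog2EEWForVrgatherEI16(2, /*Log2SEW=*/5) == 4);
  return 0;
}
```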
The change implements custom lowering of `get_fpmode`, `set_fpmode` and
`reset_fpmode` for the RISCV target. The implementation is aligned with the
functions `fegetmode` and `fesetmode` in GLIBC.
XAndesBFHCvt provides two builtin functions for converting between
float and bf16. Users can use them to convert bf16 values loaded from
memory to float, perform arithmetic operations, then convert them back
to bf16 and store them to memory.
The load/store and move operations for bf16 will be handled in a later
patch.
Rename UnwrapShl->SelectShl. Make it only responsible for matching
a SHL by constant.
Handle the fallback case of reg+reg with no scale outside of SelectShl.
Reorder the check so RHS is checked for shift first. The base pointer
is most likely on the LHS. It's very unlikely both operands are shifts.
This is preparation for adding better costing decisions to this code.
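A toy sketch of the intended structure (a made-up expression node, not
the real SelectionDAG matching code): SelectShl is only responsible for
recognising a shift by constant, the reg+reg fallback lives in the
caller, and the RHS is tried first since the base pointer usually sits
on the LHS:
```
#include <cstdio>
#include <optional>
#include <utility>

// Toy address operand: either a leaf register or (ShlSrc << ShAmt).
struct Node {
  int Reg = -1;                 // leaf register id, -1 if not a leaf
  const Node *ShlSrc = nullptr; // non-null if this node is a shift
  unsigned ShAmt = 0;
};

struct AddrMode {
  int Base;
  int Index;
  unsigned Scale; // log2 scale; 0 means plain reg+reg
};

// SelectShl analogue: only matches a SHL of a register by a constant.
static std::optional<std::pair<int, unsigned>> selectShl(const Node &N) {
  if (N.ShlSrc && N.ShlSrc->Reg >= 0)
    return std::make_pair(N.ShlSrc->Reg, N.ShAmt);
  return std::nullopt;
}

static AddrMode matchAdd(const Node &LHS, const Node &RHS) {
  // Check the RHS for a shift first; the base pointer is most likely on
  // the LHS and it is very unlikely both operands are shifts.
  if (auto S = selectShl(RHS))
    return {LHS.Reg, S->first, S->second};
  if (auto S = selectShl(LHS))
    return {RHS.Reg, S->first, S->second};
  // Fallback, handled outside selectShl: reg+reg with no scale.
  return {LHS.Reg, RHS.Reg, 0};
}

int main() {
  Node Base, Idx, Shifted;
  Base.Reg = 10;
  Idx.Reg = 11;
  Shifted.ShlSrc = &Idx;
  Shifted.ShAmt = 2;
  AddrMode AM = matchAdd(Base, Shifted);
  printf("base=%d index=%d scale=%u\n", AM.Base, AM.Index, AM.Scale);
  return 0;
}
```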
Refactor the fragment representation of `push rax; jmp foo; nop; jmp foo`,
previously encoded as
`MCDataFragment(push rax); MCRelaxableFragment(jmp foo); MCDataFragment(nop); MCRelaxableFragment(jmp foo)`,
to
```
MCFragment(fixed: push rax, variable: jmp foo)
MCFragment(fixed: nop, variable: jmp foo)
```
Changes:
* Eliminate MCEncodedFragment, moving content and fixup storage to MCFragment.
* The new MCFragment contains a fixed-size content (similar to previous
MCDataFragment) and an optional variable-size tail.
* The variable-size tail supports FT_Relaxable, FT_LEB, FT_Dwarf, and
FT_DwarfFrame, with plans to extend to other fragment types.
dyn_cast/isa should be avoided for the converted fragment subclasses.
* In `setVarFixups`, source fixup offsets are relative to the variable part's start.
Stored fixup (in `FixupStorage`) offsets are relative to the fixed part's start.
A lot of code does `getFragmentOffset(Frag) + Fixup.getOffset()`,
expecting the fixup offset to be relative to the fixed part's start
(see the sketch after this list).
* HexagonAsmBackend::fixupNeedsRelaxationAdvanced needs to know the
associated instruction for a fixup. We have to add a `const MCFragment &` parameter.
* In MCObjectStreamer, extend `absoluteSymbolDiff` to apply to
FT_Relaxable as otherwise there would be many more FT_DwarfFrame
fragments in -g compilations.
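A rough standalone model of the new shape (names borrowed, everything
else simplified; this is not the actual MC classes), including the
fixup-offset convention from the `setVarFixups` bullet above:
```
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// One fragment owns a fixed-size content block plus an optional
// variable-size tail, instead of a chain of
// MCDataFragment/MCRelaxableFragment objects.
struct Fixup {
  uint64_t Offset; // stored relative to the start of the *fixed* part
  std::string Target;
};

struct Fragment {
  std::vector<uint8_t> Fixed; // e.g. the encoded "push rax" or "nop"
  std::vector<uint8_t> Var;   // e.g. the relaxable "jmp foo" tail
  std::vector<Fixup> Fixups;

  // Mirrors the setVarFixups convention: callers pass offsets relative to
  // the variable part's start, storage rebases them onto the fixed part's
  // start, so getFragmentOffset(Frag) + Fixup.Offset addresses the right
  // byte.
  void setVarFixups(const std::vector<Fixup> &VarFixups) {
    for (Fixup F : VarFixups) {
      F.Offset += Fixed.size();
      Fixups.push_back(F);
    }
  }
};

int main() {
  Fragment F;
  F.Fixed = {0x50};           // push rax (1 byte)
  F.Var = {0xe9, 0, 0, 0, 0}; // jmp foo (rel32, to be fixed up)
  F.setVarFixups({{1, "foo"}}); // rel32 starts 1 byte into the tail
  printf("stored fixup offset = %llu\n",
         (unsigned long long)F.Fixups[0].Offset); // 2 = fixed(1) + var(1)
  return 0;
}
```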
https://llvm-compile-time-tracker.com/compare.php?from=28e1473e8e523150914e8c7ea50b44fb0d2a8d65&to=778d68ad1d48e7f111ea853dd249912c601bee89&stat=instructions:u
```
stage2-O0-g instructions:u geomean (-0.07%)
stage1-ReleaseLTO-g (link only) max-rss geomean (-0.39%)
```
```
% /t/clang-old -g -c sqlite3.i -w -mllvm -debug-only=mc-dump &| awk '/^[0-9]+/{s[$2]++;tot++} END{print "Total",tot; n=asorti(s, si); for(i=1;i<=n;i++) print si[i],s[si[i]]}'
Total 59675
Align 2215
Data 29700
Dwarf 12044
DwarfCallFrame 4216
Fill 92
LEB 12
Relaxable 11396
% /t/clang-new -g -c sqlite3.i -w -mllvm -debug-only=mc-dump &| awk '/^[0-9]+/{s[$2]++;tot++} END{print "Total",tot; n=asorti(s, si); for(i=1;i<=n;i++) print si[i],s[si[i]]}'
Total 32287
Align 2215
Data 2312
Dwarf 12044
DwarfCallFrame 4216
Fill 92
LEB 12
Relaxable 11396
```
Pull Request: https://github.com/llvm/llvm-project/pull/148544
This patch adds dummy symbols for PLT entries for RISC-V 32-bit and
64-bit targets so llvm-objdump can show the function symbol that
corresponds to each PLT entry.
This essentially merges the handling for VPLoad - currently in
lowerInterleavedVPLoad which is shared between shuffle and intrinsic
based interleaves - into the existing dedicated routine.
My plan is that if we like this factoring, I'll do the same for the
intrinsic store paths, and then remove the excess generality from the
shuffle paths since we don't need to support both modes in the shared
VPLoad/Store callbacks. We can probably even fold the VP versions into
the non-VP shuffle variants in an analogous way.
Follow up on the work from e5bc7e7d, and extend it to the lowering used
for interleave and deinterleave when we can't combine with a nearby
memory operation.
The lower 5 bits of the immediate are not part of the address, unlike
other InstFormatS instructions.
We use InstFormatS in RISCVRegisterInfo::needsFrameBaseReg and
RISCVRegisterInfo::getFrameIndexInstrOffset, which are not aware of
this special encoding. Force the format to InstFormatOther so those
functions will ignore it.
InstFormatS is also used by relocation emission, but I don't believe we
ever emit these instructions with a relocation because of the encoding.
The transformation done in #147349 was incorrect since we were not
passing the input node of the `OR` instruction to the `QC.INSBI`
instruction, leading to the generated instruction doing the wrong
thing. To fix this we first needed to add the output register to
`QC.INSBI` as being both an input and an output.
The code produced after the above fix will need a copy (mv) to preserve
the register input to the OR instruction if it has more than one use,
making the transformation net neutral (`6-byte QC.E.ORI/ORAI` vs
`2-byte C.MV + 4-byte QC.INSBI`). Avoid doing the transformation if
there is more than one use of the input register to the OR instruction.
If a VL is zero then it's known to be less than or equal to every other
VL.
This looks weird on its own since a VL of zero isn't that common. The
test diffs come from a type being split resulting in a VP intrinsic's
EVL being zero.
The motivation for this is to split off part of an upcoming patch I plan
on submitting for RISCVVLOptimizer, which generalizes it to handle
recurrences, and needs to reason about an initial state of demanded VLs
set to zero.
When not in the prologue we do not want to set the FrameSetup flag. By
passing the flag as an argument we can use allocateStack correctly in
those cases.
This fixes the allocation and probe in eliminateCallFramePseudoInstr.
Instead of using dyn_cast, just use isa combined with accessors on the base
VectorType class. Working towards being able to merge code from some of
these routines.