llvm-project

Author	SHA1	Message	Date
Jesse Huang	e5ad7f4556	[RISCV] Move RISCVIndirectBranchTracking before Branch Relaxation (#139993 ) The `RISCVIndirectBranchTracking` pass inserts `lpad` instruction and could change the basic block alignment, so this should not happen after the branch relaxation as the adjusted offset is possible to exceed the branch range.	2025-06-17 17:21:24 +08:00
Mikhail R. Gadelha	d8e44a9ab2	[RISCV] Add late optimization pass for riscv (#133256 ) This patch is an alternative to PRs #117060, #131684, #131728. The patch adds a late optimization pass that replaces conditional branches that can be statically evaluated with an unconditinal branch. Adding Michael as a co-author as most of the code that evaluates the condition comes from #131684. Co-authored-by: Michael Maitland michaeltmaitland@gmail.com	2025-03-27 19:31:38 -03:00
Frederik Harwath	6962cf1700	Rename ExpandLargeFpConvertPass to ExpandFpPass (#131128 ) This is meant as a preparation for PR #130988 "[AMDGPU] Implement IR expansion for frem instruction" which implements the expansion of another instruction in this pass. The more general name seems more appropriate given this change and quite reasonable even without it.	2025-03-14 13:11:45 +01:00
Djordje Todorovic	5048a0858b	[RISCV] Generate MIPS load/store pair instructions (#124717 ) Introduce RISCVLoadStoreOptimizer MIR Pass that will do the optimization. The load/store pairing pass identifies adjacent load/store instructions operating on consecutive memory locations and merges them into a single paired instruction. This is part of MIPS extensions for the p8700 CPU. Production of ldp/sdp instructions is OFF by default, since it is beneficial for -Os only in the case of p8700 CPU.	2025-03-07 09:21:36 +01:00
Luke Lau	df96b56b9f	[RISCV] Move VMV0 elimination past machine SSA opts (#126850 ) This is the follow up to #125026 that keeps mask operands in virtual register form for as long as possible throughout the backend. The diffs in this patch are from MachineCSE/MachineSink/RISCVVLOptimizer kicking in. The invariant that the mask COPY never has a subreg no longer holds after MachineCSE (it coalesces some copies), so it needed to be relaxed.	2025-02-20 12:41:05 +08:00
Craig Topper	26e375046d	Recommit "[RISCV] Add a pass to remove ADDI by reassociating to fold into load/store address. (#127151 )" Tests have been re-generated with recent scheduler changes. Original message: SelectionDAG will not reassociate adds to the end of a chain if there are multiple users of later additions. This prevents isel from folding the immediate into a load/store address. One easy way to see this is accessing an array in a struct with two different indices. An ADDI will be used to get to the start of the array then 2 different SHXADD instructions will be used to add the scaled indices. Finally the SHXADD will be used by different load instructions. We can remove the ADDI by folding the offset into each load. This patch adds a new pass that analyzes how an ADDI constant propagates through address arithmetic. If the arithmetic is only used by a load/store and the offset is small enough, we can adjust the load/store offset and remove the ADDI. This pass is placed before MachineCSE to allow cleanups if some instructions become common after removing offsets from their inputs. This pass gives ~3% improvement on dynamic instruction count on 541.leela_r and 544.nab_r from SPEC2017 for the train data set. There's a ~1% improvement on 557.xz_r.	2025-02-19 12:11:00 -08:00
Craig Topper	37d0f20593	Revert "[RISCV] Add a pass to remove ADDI by reassociating to fold into load/store address. (#127151 )" This reverts commit c3ebbfd7368ec3e4737427eef602296a868a4ecd. Seeing some test failures on the build bot.	2025-02-19 11:57:53 -08:00
Craig Topper	c3ebbfd736	[RISCV] Add a pass to remove ADDI by reassociating to fold into load/store address. (#127151 ) SelectionDAG will not reassociate adds to the end of a chain if there are multiple users of later additions. This prevents isel from folding the immediate into a load/store address. One easy way to see this is accessing an array in a struct with two different indices. An ADDI will be used to get to the start of the array then 2 different SHXADD instructions will be used to add the scaled indices. Finally the SHXADD will be used by different load instructions. We can remove the ADDI by folding the offset into each load. This patch adds a new pass that analyzes how an ADDI constant propagates through address arithmetic. If the arithmetic is only used by a load/store and the offset is small enough, we can adjust the load/store offset and remove the ADDI. This pass is placed before MachineCSE to allow cleanups if some instructions become common after removing offsets from their inputs. This pass gives ~3% improvement on dynamic instruction count on 541.leela_r and 544.nab_r from SPEC2017 for the train data set. There's a ~1% improvement on 557.xz_r.	2025-02-19 11:30:55 -08:00
Luke Lau	cc7e83601d	[RISCV] Select mask operands as virtual registers and eliminate uses of vmv0 (#125026 ) This is another attempt at #88496 to keep mask operands in SSA after instruction selection. Previously we selected the mask operands into vmv0, a singleton register class with exactly one register, V0. But the register allocator doesn't really support singleton register classes and we ran into errors like "ran out of registers during register allocation in function". This avoids this by introducing a pass just before register allocation that converts any use of vmv0 to a copy to $v0, i.e. what isel currently does today. That way the register allocator doesn't need to deal with the singleton register class, but we get the benefits of having the mask registers in SSA throughout the backend: - This allows RISCVVLOptimizer to reduce the VLs of instructions that define mask registers - It enables CSE and code sinking in more places - It removes the need to peek through mask copies in RISCVISelDAGToDAG and keep track of V0 defs in RISCVVectorPeephole This patch initially eliminates uses of vmv0s after RISCVVectorPeephole to keep the diff to a minimum, and a follow up patch will move it past the other MachineInstr SSA passes. Note that it doesn't try to remove any defs of vmv0 as we shouldn't have any instructions that have any vmv0 outputs. As a further follow up, we can move the elimination pass to after phi elimination and outside of SSA, which would unblock the pre-RA scheduler around masked pseudos. This might also help the issue that RISCVVectorMaskDAGMutation tries to solve.	2025-02-12 12:06:55 +08:00
TiborGY	3630d9ef65	[PartiallyInlineLibCalls] Add infrastructure for emitting optimization remarks from PartiallyInlineLibCalls (#122654 ) I am planning to add some optimization remarks to the `PartiallyInlineLibCalls` pass. However, since this pass does not emit any optimization remarks yet, I have to add the "infrastructure" for that first, which is what this PR is about.	2025-01-22 13:15:40 +07:00
Michael Maitland	169c32eb49	[RISCV][VLOPT] Enable the RISCVVLOptimizer by default (#119461 ) Now that we have testing of all instructions in the isSupportedInstr switch, and better coverage of getOperandInfo, I think it is a good time to enable this by default.	2024-12-17 16:19:35 -05:00
Alex Bradbury	9d02264b03	[RISCV] Enable global merging by default (#115495 ) From the discussion at the round-table at the RISC-V Summit it was clear people see cases where global merging would help. So the direction of enabling it by default and iteratively working to enable it in more cases or to improve the heuristics seems sensible. This patch tries to make a minimal step in that direction.	2024-11-15 13:22:34 +00:00
abhishek-kaushik22	d2aff182d3	Revert "TLS loads opimization (hoist)" (#114740 ) This reverts commit c31014322c0b5ae596da129cbb844fb2198b4ef4. Based on the discussions in #112772, this pass is not needed after the introduction of `llvm.threadlocal.address` intrinsic. Fixes https://github.com/llvm/llvm-project/issues/112771.	2024-11-07 10:10:28 +01:00
dlav-sc	97982a8c60	[RISCV][CFI] add function epilogue cfi information (#110810 ) This patch adds CFI instructions in the function epilogue. Before patch: addi sp, s0, -32 ld ra, 24(sp) # 8-byte Folded Reload ld s0, 16(sp) # 8-byte Folded Reload ld s1, 8(sp) # 8-byte Folded Reload addi sp, sp, 32 ret After patch: addi sp, s0, -32 .cfi_def_cfa sp, 32 ld ra, 24(sp) # 8-byte Folded Reload ld s0, 16(sp) # 8-byte Folded Reload ld s1, 8(sp) # 8-byte Folded Reload .cfi_restore ra .cfi_restore s0 .cfi_restore s1 addi sp, sp, 32 .cfi_def_cfa_offset 0 ret This functionality is already present in `riscv-gcc`, but it’s not in `clang` and this slightly impairs the `lldb` debugging experience, e.g. backtrace.	2024-11-06 00:20:21 +03:00
Alex Bradbury	0ee10e9466	[RISCV] Add additional fence for amocas when required by recent ABI change (#101023 ) A recent atomics ABI change / fix requires that for the "A6C" and A6S" atomics ABIs (i.e. both of those supported by LLVM currently), an additional fence is inserted for an atomic_compare_exchange with seq_cst failure ordering. <https://github.com/riscv-non-isa/riscv-elf-psabi-doc/pull/445> This isn't trivial to support through the hooks used by AtomicExpandPass because that pass assumes that when fences are inserted, the original atomics ordering information can be removed from the instruction. Rather than try to change and complicate that API, this patch implements the needed fence insertion through a small special purpose pass.	2024-09-19 13:39:56 +01:00
Kyungwoo Lee	2cfdcfb490	Fix for llvm/test/CodeGen/RISCV/O3-pipeline.ll (#108050 ) The previous `Fix for Attempt to fix [CGData][MachineOutliner] Global Outlining (#90074) #108037 (#108047)` somehow dropped this file.	2024-09-10 09:25:00 -07:00
Kyungwoo Lee	0f52545289	[CGData][MachineOutliner] Global Outlining (#90074 ) This commit introduces support for outlining functions across modules using codegen data generated from previous codegen. The codegen data currently manages the outlined hash tree, which records outlining instances that occurred locally in the past. The machine outliner now operates in one of three modes: 1. CGDataMode::None: This is the default outliner mode that uses the suffix tree to identify (local) outlining candidates within a module. This mode is also used by (full)LTO to maintain optimal behavior with the combined module. 2. CGDataMode::Write (`-codegen-data-generate`): This mode is identical to the default mode, but it also publishes the stable hash sequences of instructions in the outlined functions into a local outlined hash tree. It then encodes this into the `__llvm_outline` section, which will be dead-stripped at link time. 3. CGDataMode::Read (`-codegen-data-use-path={.cgdata}`): This mode reads a codegen data file (.cgdata) and initializes a global outlined hash tree. This tree is used to generate global outlining candidates. Note that the codegen data file has been post-processed with the raw `__llvm_outline` sections from all native objects using the `llvm-cgdata` tool (or a linker, `LLD`, or a new ThinLTO pipeline later). This depends on https://github.com/llvm/llvm-project/pull/105398. After this PR, LLD (https://github.com/llvm/llvm-project/pull/90166) and Clang (https://github.com/llvm/llvm-project/pull/90304) will follow for each client side support. This is a patch for https://discourse.llvm.org/t/rfc-enhanced-machine-outliner-part-2-thinlto-nolto/78753.	2024-09-10 06:56:31 -07:00
Stephen Tozer	3d08ade7bd	[ExtendLifetimes] Implement llvm.fake.use to extend variable lifetimes (#86149 ) This patch is part of a set of patches that add an `-fextend-lifetimes` flag to clang, which extends the lifetimes of local variables and parameters for improved debuggability. In addition to that flag, the patch series adds a pragma to selectively disable `-fextend-lifetimes`, and an `-fextend-this-ptr` flag which functions as `-fextend-lifetimes` for this pointers only. All changes and tests in these patches were written by Wolfgang Pieb (@wolfy1961), while Stephen Tozer (@SLTozer) has handled review and merging. The extend lifetimes flag is intended to eventually be set on by `-Og`, as discussed in the RFC here: https://discourse.llvm.org/t/rfc-redefine-og-o1-and-add-a-new-level-of-og/72850 This patch implements a new intrinsic instruction in LLVM, `llvm.fake.use` in IR and `FAKE_USE` in MIR, that takes a single operand and has no effect other than "using" its operand, to ensure that its operand remains live until after the fake use. This patch does not emit fake uses anywhere; the next patch in this sequence causes them to be emitted from the clang frontend, such that for each variable (or this) a fake.use operand is inserted at the end of that variable's scope, using that variable's value. This patch covers everything post-frontend, which is largely just the basic plumbing for a new intrinsic/instruction, along with a few steps to preserve the fake uses through optimizations (such as moving them ahead of a tail call or translating them through SROA). Co-authored-by: Stephen Tozer <stephen.tozer@sony.com>	2024-08-29 17:53:32 +01:00
Philip Reames	27a62ec72a	[LSR] Split the -lsr-term-fold transformation into it's own pass (#104234 ) This transformation doesn't actually use any of the internal state of LSR and recomputes all information from SCEV. Splitting it out makes it easier to test. Note that long term I would like to write a version of this transform which is integrated with LSR's solver, but if that happens, we'll just delete the extra pass. Integration wise, I switched from using TTI to using a pass configuration variable. This seems slightly more idiomatic, and means we don't run the extra logic on any target other than RISCV.	2024-08-17 18:34:23 -07:00
Peter Rong	74e4694b8c	[LTO] enable `ObjCARCContractPass` only on optimized build (#101114 ) \#92331 tried to make `ObjCARCContractPass` by default, but it caused a regression on O0 builds and was reverted. This patch trys to bring that back by: 1. reverts the [revert](`1579e9ca9c`). 2. `createObjCARCContractPass` only on optimized builds. Tests are updated to refelect the changes. Specifically, all `O0` tests should not include `ObjCARCContractPass` Signed-off-by: Peter Rong <PeterRong@meta.com>	2024-08-09 13:04:25 -07:00
Yeting Kuo	e80d8e1b42	[RISCV] Insert simple landing pad before indirect jumps for Zicfilp. (#91860 ) This patch is based on https://github.com/llvm/llvm-project/pull/91855. This patch inserts simple landing pad ([pr])before indirct jumps. And this also make option riscv-landing-pad-label influence this feature. [pr]: https://github.com/riscv-non-isa/riscv-elf-psabi-doc/pull/417	2024-08-08 13:22:59 +08:00
Yeting Kuo	9fb196b469	[RISCV] Insert simple landing pad for taken address labels. (#91855 ) This patch implements simple landing pad labels ([pr]). When Zicfilp enabled, this patch inserts `lpad 0` at the beginning of basic blocks which are possible to be landed by indirect jumps. This patch also supports option riscv-landing-pad-label to make users cpable to set nonzero fixed labels. Using nonzero fixed label force setting t2 before indirect jumps. It's less portable but more strict than original implementation. [pr]: https://github.com/riscv-non-isa/riscv-elf-psabi-doc/pull/417	2024-08-06 22:04:48 +08:00
Alexis Engelke	fa92d51f9e	[VP] Merge ExpandVP pass into PreISelIntrinsicLowering (#101652 ) Similar to #97727; avoid an extra pass over the entire IR by performing the lowering as part of the pre-isel-intrinsic-lowering pass.	2024-08-06 09:27:59 +02:00
Alexis Engelke	b5fc083dc3	[CodeGen] Merge lowerConstantIntrinsics into pre-isel lowering (#97727 ) Currently, the LowerConstantIntrinsics pass does an RPO traversal of every function... only to find that many functions don't have constant intrinsics (is.constant, objectsize). In the CodeGen pipeline, there is already a pre-isel intrinsic lowering pass, which iterates over intrinsic declarations and lowers all users. Call lowerConstantIntrinsics from this pass to avoid the extra iteration over the entire IR and the RPO traversal.	2024-08-01 17:44:32 +02:00
Luke Lau	6f65a39785	[RISCV] Update RISCVVectorPeephole pass name It was previously called RISCVFoldMasks	2024-07-26 10:33:38 +08:00
Yunzezhu94	a833fa7d3e	[RISCV] Move Machine Copy Propagation Pass before Branch relaxation pass (#97261 ) Machine Copy Propagation Pass may enlarge branch relaxation distance by breaking generation of compressed insts. This commit moves Machine Copy Propagation Pass before Branch relaxation pass so the results of Branch relaxation pass won't be affected by Machine Copy Propagation Pass.	2024-07-02 09:58:00 +08:00
Egor Pasko	cab81dd038	[EntryExitInstrumenter] Move passes out of clang into LLVM default pipelines (#92171 ) Move EntryExitInstrumenter(PostInlining=true) to as late as possible and EntryExitInstrumenter(PostInlining=false) to an early pre-inlining stage (but skip for ThinLTO post-link). This should fix the issues reported in https://github.com/rust-lang/rust/issues/92109 and https://github.com/llvm/llvm-project/issues/52853. These are caused by https://reviews.llvm.org/D97608.	2024-05-31 12:48:45 -07:00
Luke Lau	1cff74130f	[RISCV] Merge RISCVCoalesceVSETVLI back into RISCVInsertVSETVLI (#92869 ) We no longer need to separate the passes now that #70549 is landed and this will unblock #89089. It's not strictly NFC because it will move coalescing before register allocation when -riscv-vsetvl-after-rvv-regalloc is disabled. But this makes it closer to the original behaviour.	2024-05-29 20:59:34 +01:00
Nikita Popov	1579e9ca9c	Revert "Run ObjCContractPass in Default Codegen Pipeline (#92331 )" This reverts commit 8cc8e5d6c6ac9bfc888f3449f7e424678deae8c2. This reverts commit dae55c89835347a353619f506ee5c8f8a2c136a7. Causes major compile-time regressions for unoptimized builds.	2024-05-24 08:14:26 +02:00
Nuri Amari	8cc8e5d6c6	Run ObjCContractPass in Default Codegen Pipeline (#92331 ) Prior to this patch, when using -fthinlto-index= the ObjCARCContractPass isn't run prior to CodeGen, and instruction selection fails on IR containing arc intrinsics. This patch is motivated by that usecase. The pass was previously added in various places codegen is performed. This patch adds the pass to the default codegen pipepline, makes sure it bails immediately if no arc intrinsics are found, and removes the adhoc scheduling of the pass. Co-authored-by: Nuri Amari <nuriamari@fb.com>	2024-05-23 10:04:55 -07:00
Piyou Chen	675e7bd1b9	[RISCV] Support postRA vsetvl insertion pass (#70549 ) This patch try to get rid of vsetvl implict vl/vtype def-use chain and improve the register allocation quality by moving the vsetvl insertion pass after RVV register allocation It will gain the benefit for the following optimization from 1. unblock scheduler's constraints by removing vl/vtype def-use chain 2. Support RVV re-materialization 3. Support partial spill This patch add a new option `-riscv-vsetvl-after-rvv-regalloc=<1\|0>` to control this feature and default set as disable.	2024-05-21 14:42:55 +08:00
Luke Lau	1a58e88690	[RISCV] Move RISCVInsertVSETVLI to after phi elimination (#91440 ) Split off from #70549, this patch moves RISCVInsertVSETVLI to after phi elimination where we exit SSA and need to move to LiveVariables. The motivation for splitting this off is to avoid the large scheduling diffs from moving completely to after regalloc, and instead focus on converting the pass to work on LiveIntervals. The two main changes required are updating VSETVLIInfo to store VNInfos instead of MachineInstrs, which allows us to still check for PHI defs in needVSETVLIPHI, and fixing up the live intervals of any AVL operands after inserting new instructions. On O3 the pass is inserted after the register coalescer, otherwise we end up with a bunch of COPYs around eliminated PHIs that trip up needVSETVLIPHI. Co-authored-by: Piyou Chen <piyou.chen@sifive.com>	2024-05-15 11:44:32 +08:00
Luke Lau	0ebe48f068	[RISCV] Move RISCVInsertVSETVLI after CSR/VXRM passes (#91701 ) This further splits off #91440 to inch RISCVInsertVSETVLI closer to post vector regalloc. As noted in #91440, most of the diffs are from moving vsetvli insertion after the vxrm/csr insertion passes, but these are getting conflated with the changes from moving to LiveIntervals. One idea was that we could try and remove some of these diffs by manually moving back the vsetvlis past the vxrm/csr instructions. But this meant having to touch up the LiveIntervals again which seemed to lead to even more diffs. This instead just moves RISCVInsertVSETVLI after RISCVInsertReadWriteCSR and RISCVInsertWriteVXRM so we can isolate those changes.	2024-05-10 14:31:43 +08:00
Luke Lau	52187b9f2e	[RISCV] Move RISCVDeadRegisterDefinitions to post vector regalloc (#90636 ) Currently RISCVDeadRegisterDefinitions runs after vsetvli insertion, but in #70549 vsetvli insertion runs after vector regalloc and as a result we no longer convert some vsetvli a0, a0s to vsetvli x0, a0. This patch moves it to after vector regalloc, but before scalar regalloc so we still get the benefits of reducing register pressure.	2024-05-07 00:36:47 +08:00
Luke Lau	af82d01fbb	Reapply "[RISCV] Separate doLocalPostpass into new pass and move to post vector regalloc (#88295 )" The original commit was calling shrinkToUses on an interval for a virtual register whose def was erased. This fixes it by calling shrinkToUses first and removing the interval if we erase the old VL def.	2024-04-25 00:42:30 +08:00
Luke Lau	fc13353e10	Revert "[RISCV] Separate doLocalPostpass into new pass and move to post vector regalloc (#88295 )" Seems to cause an address sanitizer failure on one of the buildbots related to live intervals.	2024-04-24 23:27:01 +08:00
Luke Lau	603ba4c596	[RISCV] Separate doLocalPostpass into new pass and move to post vector regalloc (#88295 ) This patch splits off part of the work to move vsetvli insertion to post regalloc in #70549. The doLocalPostpass operates outside of RISCVInsertVSETVLI's dataflow, so we can move it to its own pass. We can then move it to post vector regalloc which should be a smaller change. A couple of things that are different from #70549: - This manually fixes up the LiveIntervals rather than recomputing it via createAndComputeVirtRegInterval. I'm not sure if there's much of a difference with either. - For the postpass it's sufficient enough to just check isUndef() in hasUndefinedMergeOp, i.e. we don't need to lookup the def in VNInfo. Running on llvm-test-suite and SPEC CPU 2017 there aren't any changes in the number of vsetvlis removed. There are some minor scheduling diffs as well as extra spills and less spills in some cases (caused by transient vsetvlis existing between RISCVInsertVSETVLI and RISCVCoalesceVSETVLI when vec regalloc happens), but they are minor and should go away once we finish moving the rest of RISCVInsertVSETVLI. We could also potentially turn off this pass for unoptimised builds.	2024-04-24 16:31:40 +08:00
Jack Styles	28233408a2	[CodeGen] [ARM] Make RISC-V Init Undef Pass Target Independent and add support for the ARM Architecture. (#77770 ) When using Greedy Register Allocation, there are times where early-clobber values are ignored, and assigned the same register. This is illeagal behaviour for these intructions. To get around this, using Pseudo instructions for early-clobber registers gives them a definition and allows Greedy to assign them to a different register. This then meets the ARM Architecture Reference Manual and matches the defined behaviour. This patch takes the existing RISC-V patch and makes it target independent, then adds support for the ARM Architecture. Doing this will ensure early-clobber restraints are followed when using the ARM Architecture. Making the pass target independent will also open up possibility that support other architectures can be added in the future.	2024-02-26 12:12:31 +00:00
Craig Topper	7d40ea85d5	[RISCV] Enable the TypePromotion pass from AArch64/ARM. This pass looks for unsigned icmps that have illegal types and tries to widen the use/def graph to improve the placement of the zero extends that type legalization would need to insert. I've explicitly disabled it for i32 by adding a check for isSExtCheaperThanZExt to the pass. The generated code isn't perfect, but my data shows a net dynamic instruction count improvement on spec2017 for both base and Zba+Zbb+Zbs.	2024-02-13 09:57:48 -08:00
Piyou Chen	d0a39e617b	[RISCV] default enable splitting regalloc between RVV and other (#72950 ) This patch make riscv-split-regalloc as true by default. It will not affect the codegen result if it vector register allocation doesn't exist. If there is the vector register allocation, it may affect the non-rvv register LiveInterval's segment/weight. It will make the allocation in a different order.	2023-11-30 21:12:46 -06:00
Wang Pengcheng	9bb69c1d96	[RISCV] Enable LoopDataPrefetch pass (#66201 ) So that we can benefit from data prefetch when `Zicbop` extension is supported. Tune information for data prefetching are added in `RISCVTuneInfo`.	2023-11-10 15:39:58 +08:00
Craig Topper	014390d937	[RISCV] Implement cross basic block VXRM write insertion. (#70382 ) This adds a new pass to insert VXRM writes for vector instructions. With the goal of avoiding redundant writes. The pass does 2 dataflow algorithms. The first is a forward data flow to calculate where a VXRM value is available. The second is a backwards dataflow to determine where a VXRM value is anticipated. Finally, we use the results of these two dataflows to insert VXRM writes where a value is anticipated, but not available. The pass does not split critical edges so we aren't always able to eliminate all redundancy. The pass will only insert vxrm writes on paths that always require it.	2023-11-02 14:09:27 -07:00
Luke Lau	72e6c1c70d	[RISCV] Begin moving post-isel vector peepholes to a MF pass (#70342 ) We currently have three postprocess peephole optimisations for vector pseudos: 1) Masked pseudo with all ones mask -> unmasked pseudo 2) Merge vmerge pseudo into operand pseudo's mask 3) vmerge pseudo with all ones mask -> vmv.v.v pseudo This patch aims to move these peepholes out of SelectionDAG and into a separate RISCVFoldMasks MachineFunction pass. There are a few motivations for doing this: * The current SelectionDAG implementation operates on MachineSDNodes, which are essentially MachineInstrs but require a bunch of logic to reason about chain and glue operands. The RISCVII::hasOp helper functions also don't exactly line up with the SDNode operands. Mutating these pseudos and their operands in place becomes a good bit easier at the MachineInstr level. For example, we would no longer need to check for cycles in the DAG during performCombineVMergeAndVOps. Although it's further down the line, moving this code out of SelectionDAG allows it to be reused by GlobalISel later on. * In performCombineVMergeAndVOps, it may be possible to commute the operands to enable folding in more cases (see test/CodeGen/RISCV/rvv/vmadd-vp.ll). There is existing machinery to commute operands in TII::commuteInstruction, but it's implemented on MachineInstrs. The pass runs straight after ISel, before any of the other machine SSA optimization passes run. This is so that dead-mi-elimination can mop up any vmsets that are no longer used (but if preferred we could try and erase them from inside RISCVFoldMasks itself). This also means that these peepholes are no longer run at codegen -O0, so this patch isn't strictly NFC. Only the performVMergeToVMv peephole is refactored in this patch, the remaining two would be implemented later. And as noted by @preames, it should be possible to move doPeepholeSExtW out of SelectionDAG as well.	2023-10-30 15:17:00 +00:00
Craig Topper	109aa586f0	[RISCV] Add an experimental pseudoinstruction to represent a rematerializable constant materialization sequence. (#69983 ) Rematerialization during register allocation is currently limited to a single instruction with no inputs. This patch introduces a pseudoinstruction that represents the materialization of a constant. I've started with a sequence of 2 instructions for now, which covers at least the common LUI+ADDI(W) case. This instruction will be expanded into real instructions immediately after register allocation using a new pass. This gives the post-RA scheduler a chance to separate the 2 instructions to improve ILP. I believe this matches the approach used by AArch64. Unfortunately, this loses some CSE opportunies when an LUI value is used by multiple constants with different LSBs. This feature is off by default and a new backend command line option is added to enable it for testing. This avoids the spill and reloads reported in #69586.	2023-10-25 17:20:32 -07:00
Wang Pengcheng	f4231bf446	[RISCV] Replace PostRAScheduler with PostMachineScheduler (#68696 ) Just like what other targets have done. And this will make DAG mutations like MacroFusion take effect.	2023-10-19 13:30:41 +08:00
Yingwei Zheng	93fde2ea1b	[RISCV] Add a pass to rewrite rd to x0 for non-computational instrs whose return values are unused When AMOs are used to implement parallel reduction operations, typically the return value would be discarded. This patch adds a peephole pass `RISCVDeadRegisterDefinitions`. It rewrites `rd` to `x0` when `rd` is marked as dead. It may improve the register allocation and reduce pipeline hazards on CPUs without register renaming and OOO. Comparison with GCC: https://godbolt.org/z/bKaxnEcec Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D158759	2023-09-20 01:02:19 +08:00
WuXinlong	c0221e006d	[RISCV] Add a pass to combine `cm.pop` and `ret` insts `RISCVPushPopOptimizer.cpp` combine `cm.pop` and `ret` to generates `cm.popretz` or `cm.popret` . Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D150416	2023-07-07 14:04:11 +08:00
Sami Tolvanen	83835e22c7	[RISCV] Implement KCFI operand bundle lowering With `-fsanitize=kcfi` (Kernel Control-Flow Integrity), Clang emits "kcfi" operand bundles to indirect call instructions. Similarly to the target-specific lowering added in D119296, implement KCFI operand bundle lowering for RISC-V. This patch disables the generic KCFI pass for RISC-V in Clang, and adds the KCFI machine function pass in `RISCVPassConfig::addPreSched` to emit target-specific `KCFI_CHECK` pseudo instructions before calls that have KCFI operand bundles. The machine function pass also bundles the instructions to ensure we emit the checks immediately before the calls, which is not possible with the generic pass. `KCFI_CHECK` instructions are lowered in `RISCVAsmPrinter` to a contiguous code sequence that traps if the expected hash in the operand bundle doesn't match the hash before the target function address. This patch emits an `ebreak` instruction for error handling to match the Linux kernel's `BUG()` implementation. Just like for X86, we also emit trap locations to a `.kcfi_traps` section to support error handling, as we cannot embed additional information to the trap instruction itself. Relands commit 62fa708ceb027713b386c7e0efda994f8bdc27e2 with fixed tests. Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D148385	2023-06-23 22:57:56 +00:00
Sami Tolvanen	e809ebeb6c	Revert "[RISCV] Implement KCFI operand bundle lowering" This reverts commit 62fa708ceb027713b386c7e0efda994f8bdc27e2. Reverting to investigate -verify-machineinstrs errors in MIR tests.	2023-06-23 21:42:57 +00:00
Sami Tolvanen	62fa708ceb	[RISCV] Implement KCFI operand bundle lowering With `-fsanitize=kcfi` (Kernel Control-Flow Integrity), Clang emits "kcfi" operand bundles to indirect call instructions. Similarly to the target-specific lowering added in D119296, implement KCFI operand bundle lowering for RISC-V. This patch disables the generic KCFI pass for RISC-V in Clang, and adds the KCFI machine function pass in `RISCVPassConfig::addPreSched` to emit target-specific `KCFI_CHECK` pseudo instructions before calls that have KCFI operand bundles. The machine function pass also bundles the instructions to ensure we emit the checks immediately before the calls, which is not possible with the generic pass. `KCFI_CHECK` instructions are lowered in `RISCVAsmPrinter` to a contiguous code sequence that traps if the expected hash in the operand bundle doesn't match the hash before the target function address. This patch emits an `ebreak` instruction for error handling to match the Linux kernel's `BUG()` implementation. Just like for X86, we also emit trap locations to a `.kcfi_traps` section to support error handling, as we cannot embed additional information to the trap instruction itself. Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D148385	2023-06-23 18:25:24 +00:00

1 2

94 Commits