329 Commits

Author SHA1 Message Date
Craig Topper
71b6f6b8a1
[RISCV] Add missing hasPostISelHook = 1 to vector pseudos that might read FRM. (#114186)
We need an implicit FRM read operand anytime the rounding mode is
dynamic. The post isel hook is responsible for this when isel creates an
instruction with dynamic rounding mode.

Add a MachineVerifier check to verify the operand is present.
2024-10-30 11:47:40 -07:00
Craig Topper
c3724ba866
[RISCV] Add OperandType for vector rounding mode operands. (#114179)
Use TSFlags to distinquish which type of rounding mode it is. We use the same tablegen base classes for vxrm and frm sometimes so its hard to have different types for different instructions.
2024-10-30 11:46:15 -07:00
Craig Topper
e7262c15d3
[RISCV] Add OperandType for sew and vecpolicy operands. (#114168) 2024-10-29 22:34:47 -07:00
Craig Topper
f672cc1ee1
[RISCV] Add OperandType for condition code arguments used by select and SFB pseudos. (#114163) 2024-10-29 21:23:31 -07:00
Craig Topper
13a3c4f97c
[RISCV] Add OperandType to frmarg and rtzarg. (#114142)
Teach RISCVInstrInfo::verifyInstruction to validate them.

This is partially extracted from #89047, but that did not include the
verification.
2024-10-29 17:46:52 -07:00
Luke Lau
0cbccb13d6
[RISCV] Remove support for pre-RA vsetvli insertion (#110796)
Now that LLVM 19.1.0 has been out for a while with post-vector-RA
vsetvli insertion enabled by default, this proposes to remove the flag
that restores the old pre-RA behaviour so we only have one configuration
going forward.

That flag was mainly meant as a fallback in case users ran into issues,
but I haven't seen anything reported so far.
2024-10-28 11:31:18 +00:00
Michael Maitland
ae68d532f8
[RISCV][VLOPT] Allow propagation even when VL isn't VLMAX (#112228)
The original goal of this pass was to focus on vector operations with
VLMAX. However, users often utilize only part of the result, and such
usage may come from the vectorizer.

We found that relaxing this constraint can capture more optimization
opportunities, such as non-power-of-2 code generation and vector
operation sequences with different VLs.

---------

Co-authored-by: Kito Cheng <kito.cheng@sifive.com>
2024-10-16 14:58:00 -04:00
Craig Topper
bc91f3cdd5
[RISCV] Add 32 bit GPR sub-register for Zfinx. (#108336)
This patches adds a 32 bit register class for use with Zfinx instructions. This makes them more similar to F instructions and allows us to only spill 32 bits.
    
I've added CodeGenOnly instructions for load/store using GPRF32 as that gave better results than insert_subreg/extract_subreg.
    
Function arguments use this new GPRF32 register class for f32 arguments with Zfinx. Eliminating the need to use RISCVISD::FMV* nodes.
    
This is similar to #107446 which adds a 16 bit register class.
2024-10-01 22:09:27 -07:00
Luke Lau
a81902ffc9
[RISCV] Fold vfmv.f.s of f16 into load from stack (#110214)
After #110144, we can finish off #110129 and fold f16 vfmv.f.s into a
flh.
vfmv.f.s is only available for f16 with zvfh, which in turn requires
zfhmin so we can use flh.

bf16 has no vfmv.f.s so the extract_vector_elt is lowered as an integer
in #110144, and gets the existing integer vmv.x.s fold.
2024-10-01 14:09:56 +08:00
Simon Pilgrim
44478ba5f6 Fix MSVC signed/unsigned mismatch warning. NFC. 2024-09-28 11:26:15 +01:00
Craig Topper
8a7843ca0f
[RISCV] Add 16 bit GPR sub-register for Zhinx. (#107446)
This patches adds a 16 bit register class for use with Zhinx
instructions. This makes them more similar to Zfh instructions and
allows us to only spill 16 bits.

I've added CodeGenOnly instructions for load/store using GPRF16 as that
gave better results than insert_subreg/extract_subreg. I'm using FSGNJ
for GPRF16 copy with Zhinx as that gave better results. Zhinxmin will
use ADDI+subreg operations.

Function arguments use this new GPRF16 register class for f16 arguments
with Zhinxmin. Eliminating the need to use RISCVISD::FMV* nodes.

I plan to extend this idea to Zfinx next.
2024-09-26 22:56:12 -07:00
Luke Lau
f6dacda949
[RISCV] Fold vfmv.f.s into load from stack (#110129)
This is the f64/f32 version of #109774.
I've left out f16 and bf16 for now because there's a separate issue
where we can't select extract_vector_elt when f16/bf16 is a legal type,
see #110126.
2024-09-27 09:35:14 +08:00
Luke Lau
63b534be17
[RISCV] Fold vmv.x.s into load from stack (#109774)
If a vector is reloaded from the stack to be used in vmv.x.s, we can
tell foldMemoryOperandImpl to fold it into a scalar load.

If XLEN < SEW then this currently just bails. I couldn't think of a way
to express a vmv.x.s that truncates in LLVM IR.
2024-09-25 18:49:52 +08:00
Craig Topper
6c7134b266
[RISCV] Don't create MachineMemOperand in foldMemoryOperandImpl. (#109840)
The caller already does this after we return. I think it will overwrite
any MMO we add.

I'm the original author of this code and I'm not sure why I did it.
2024-09-24 15:54:02 -07:00
Youngsuk Kim
d31e314131 [llvm] Don't call raw_string_ostream::flush() (NFC)
Don't call raw_string_ostream::flush(), which is essentially a no-op.
As specified in the docs, raw_string_ostream is always unbuffered.
( 65b13610a5226b84889b923bae884ba395ad084d for further reference )
2024-09-20 12:19:59 -05:00
Jonathon Penix
866b93e6b3
[RISCV] Don't outline pcrel_lo when the function has a section prefix (#107943)
GNU ld will error when encountering a pcrel_lo whose corresponding
pcrel_hi is in a different section. [1] introduced a check to help
prevent this issue by preventing outlining in a few circumstances.
However, we can also hit this same issue when outlining from functions
with prefixes ("hot"/"unlikely"/"unknown" from profile information, for
example) as the outlined function might not have the same prefix,
possibly resulting in a "paired" pcrel_lo and pcrel_hi ending up in
different sections.

To prevent this issue, take a similar approach as [1] and additionally
prevent outlining when we see a pcrel_lo and the function has a prefix.

[1]
96c85f80f0

Fixes #107520
2024-09-11 09:53:11 -07:00
Luke Lau
933fc63a1d
[RISCV] Rematerialize vmv.s.x and vfmv.s.f (#108012)
Continuing with #107993 and #108007, this handles the last of the main
rematerializable vector instructions.

There's an extra spill in one of the test cases, but it's likely noise
from the spill weights and isn't an issue in practice.
2024-09-11 09:44:57 +08:00
Luke Lau
21a0176c58
[RISCV] Rematerialize vfmv.v.f (#108007)
This is the same principle as vmv.v.x in #107993, but for floats.
2024-09-11 09:38:29 +08:00
Luke Lau
77fc8dae22
[RISCV] Rematerialize vmv.v.x (#107993)
Even though vmv.v.x has a non constant scalar operand, we can still
rematerialize it because we have split register allocation between
vectors and scalars.

InlineSpiller will check to make sure that the scalar operand is live at
the point where the rematerialization occurs, so this won't extend any
scalar live ranges. However this also means we may not be able to
rematerialize in some cases, as shown in @vmv.v.x_needs_extended.

It might be worthwhile teaching InlineSpiller to extend scalar live
ranges in a future patch. I experimented with this locally and it
reduced spills on 531.deepsjeng_r by a further 3%.
2024-09-11 09:13:23 +08:00
Luke Lau
65dc53baca
[RISCV] Rematerialize vmv.v.i (#107550)
This continues the line of work started in #97520, and gives a 2.5%
reduction in the number of spills on SPEC CPU 2017.


    Program            regalloc.NumSpills               
                       lhs                rhs      diff 
             605.mcf_s   141.00             141.00  0.0%
             505.mcf_r   141.00             141.00  0.0%
             519.lbm_r    73.00              73.00  0.0%
             619.lbm_s    68.00              68.00  0.0%
       631.deepsjeng_s   354.00             353.00 -0.3%
       531.deepsjeng_r   354.00             353.00 -0.3%
            625.x264_s  1896.00            1886.00 -0.5%
            525.x264_r  1896.00            1886.00 -0.5%
            508.namd_r  6665.00            6598.00 -1.0%
             644.nab_s   761.00             753.00 -1.1%
             544.nab_r   761.00             753.00 -1.1%
         638.imagick_s  4287.00            4181.00 -2.5%
         538.imagick_r  4287.00            4181.00 -2.5%
             602.gcc_s 12771.00           12450.00 -2.5%
             502.gcc_r 12771.00           12450.00 -2.5%
          510.parest_r 43876.00           42740.00 -2.6%
       500.perlbench_r  4297.00            4179.00 -2.7%
       600.perlbench_s  4297.00            4179.00 -2.7%
         526.blender_r 13503.00           13103.00 -3.0%
          511.povray_r  2006.00            1937.00 -3.4%
         620.omnetpp_s   984.00             946.00 -3.9%
         520.omnetpp_r   984.00             946.00 -3.9%
              657.xz_s   302.00             289.00 -4.3%
              557.xz_r   302.00             289.00 -4.3%
           541.leela_r   378.00             356.00 -5.8%
           641.leela_s   378.00             356.00 -5.8%
       623.xalancbmk_s  1646.00            1548.00 -6.0%
       523.xalancbmk_r  1646.00            1548.00 -6.0%
    Geomean difference                             -2.5%


I initially held off submitting this patch because it surprisingly
introduced a lot of spills in the test diffs, but after #107290 the
vmv.v.is that caused them are now gone.

The gist is that marking vmv.v.i as spillable decreased its spill
weight, which actually resulted in more m8 registers getting evicted and
spilled during register allocation.

The SPEC results show this isn't an issue in practice though, and I plan
on posting a separate patch to explain this in more detail.
2024-09-09 13:11:08 +08:00
Luke Lau
3d729571fd
[RISCV] Model dest EEW and fix peepholes not checking EEW (#105945)
Previously for vector peepholes that fold based on VL, we checked if the
VLMAX is the same as a proxy to check that the EEWs were the same. This
only worked at LMUL >= 1 because the EMULs of the Src output and user's
input had to be the same because the register classes needed to match.

At fractional LMULs we would have incorrectly folded something like
this:

    %x:vr = PseudoVADD_VV_MF4 $noreg, $noreg, $noreg, 4, 4 /* e16 */, 0
    %y:vr = PseudoVMV_V_V_MF8 $noreg, %x, 4, 3 /* e8 */, 0

This models the EEW of the destination operands of vector instructions
with a TSFlag, which is enough to fix the incorrect folding.

There's some overlap with the TargetOverlapConstraintType and
IsRVVWideningReduction. If we model the source operands as well we may
be able to subsume them.
2024-09-05 15:27:48 +08:00
Kyungwoo Lee
93b8d07a75
[MachineOutliner][NFC] Refactor (#105398)
This patch prepares the NFC groundwork for global outlining using
CGData, which will follow
https://github.com/llvm/llvm-project/pull/90074.

- The `MinRepeats` parameter is now explicitly passed to the
`getOutliningCandidateInfo` function, rather than relying on a default
value of 2. For local outlining, the minimum number of repetitions is
typically 2, but for the global outlining (mentioned above), we will
optimistically create a single `Candidate` for each `OutlinedFunction`
if stable hashes match a specific code sequence. This parameter is
adjusted accordingly in global outlining scenarios.
- I have also implemented `unique_ptr` for `OutlinedFunction` to ensure
safe and efficient memory management within `FunctionList`, avoiding
unnecessary implicit copies.

This depends on https://github.com/llvm/llvm-project/pull/101461.
This is a patch for
https://discourse.llvm.org/t/rfc-enhanced-machine-outliner-part-2-thinlto-nolto/78753.
2024-08-27 14:38:36 -07:00
Piyou Chen
b01c006f73
[TII][RISCV] Add renamable bit to copyPhysReg (#91179)
The renamable flag is useful during MachineCopyPropagation but renamable
flag will be dropped after lowerCopy in some case.

This patch introduces extra arguments to pass the renamable flag to
copyPhysReg.
2024-08-27 10:08:43 +08:00
Pengcheng Wang
27b608055f
[RISCV] Increase default tail duplication threshold to 6 at -O3 (#98873)
This is just like AArch64.

Changing the threshold to 6 will increase the code size, but will
also decrease unconditional branches. CPUs with wide fetch/issue units
can benefit from it.

The value 6 may be debatable, we can set it to `SchedModel.IssueWidth`.
2024-08-01 12:24:25 +08:00
Pengcheng Wang
ed4e75d5e5
[CodeGen] Remove AA parameter of isSafeToMove (#100691)
This `AA` parameter is not used and for most uses they just pass
a nullptr.

The use of `AA` was removed since 8d0383e.
2024-07-26 15:47:47 +08:00
Matt Arsenault
3cb5604d2c
MachineOutliner: Use PM to query MachineModuleInfo (#99688)
Avoid getting this from the MachineFunction
2024-07-24 13:22:56 +04:00
Craig Topper
dbc3df1718 [RISCV] Remove unnecessary call to MachineFunction::getSubtarget. NFC
RISCVInstrInfo already caches a reference to the subtarget object
that owns it. We can use that.
2024-07-16 15:39:20 -07:00
R
3c5f929ad0
[RISCV] Add QingKe "XW" compressed opcode extension (#97925)
This extension consists of 8 additional 16-bit compressed forms for
existing standard load/store opcodes.

These opcodes are found in some RISC-V microcontrollers from WCH /
Nanjing Qinheng Microelectronics.

As discussed in the Discourse forums, this uses incompatible extension
and opcode names vs the vendor binary toolchain. The chosen names
instead follow the conventions for other vendor extensions listed on the
"riscv-non-isa" project.
2024-07-11 11:10:02 +08:00
Craig Topper
6299681665 [RISCV] Remove unnecessary cast to RISCVTargetMachine in getInstSizeInBytes. NFC
getMCAsmInfo is a method on the base class so we don't need a cast here.
2024-07-10 10:42:18 -07:00
Luke Lau
ac20135605
[RISCV] Rematerialize vid.v (#97520)
This adds initial support for rematerializing vector instructions,
starting with vid.v since it's simple and has the least number of
operands. It has one passthru operand which we need to check is
undefined. It also has an AVL operand, but it's fine to rematerialize
with it because it's scalar and register allocation is split between
vector and scalar.
    
RISCVInsertVSETVLI can still happen before vector regalloc if
-riscv-vsetvl-after-rvv-regalloc is false, so this makes sure that we
only rematerialize after regalloc by checking for the implicit uses that
are added.
2024-07-04 11:34:25 +08:00
Nikita Popov
4169338e75
[IR] Don't include Module.h in Analysis.h (NFC) (#97023)
Replace it with a forward declaration instead. Analysis.h is pulled in
by all passes, but not all passes need to access the module.
2024-06-28 14:30:47 +02:00
Jianjian Guan
7625465651
[RISCV] Make M imply Zmmul (#95070)
According to the spec, M implies Zmmul.
2024-06-21 11:11:10 +08:00
Liao Chunyu
f4d2f7a3b7
[RISCV] Codegen support for XCVbi extension (#89719)
spec:
https://github.com/openhwgroup/cv32e40p/blob/master/docs/source/instruction_set_extensions.rst#immediate-branching-operations

Contributors: @CharKeaney, @jeremybennett, @lewis-revill,
@NandniJamnadas,
@PaoloS02, @simonpcook, @xingmingjie, @realqhc, @PhilippvK,@melonedo
2024-05-08 11:22:16 +08:00
Pengcheng Wang
2c1c887c8e
[RISCV] Make fixed-point instructions commutable (#90035)
This PR includes:
* vsadd.vv/vsaddu.vv
* vaadd.vv/vaaddu.vv
* vsmul.vv
2024-04-28 12:04:09 +08:00
Min-Yih Hsu
5f67ce5611
[RISCV][MachineCombiner] Add reassociation optimizations for RVV instructions (#88307)
This patch covers a really basic reassociation optimizations for VADD_VV and VMUL_VV.
2024-04-25 16:36:11 -07:00
Wang Pengcheng
2125080fd5 [RISCV][NFC] Undef CASE_RVV_OPCODE* macros after using 2024-04-25 17:56:26 +08:00
Michael Maitland
12d47247e5 [RISCV][NFC] Move RISCVMaskedPseudoTable to RISCVInstrInfo 2024-04-24 09:29:20 -07:00
Xu Zhang
f6d431f208
[CodeGen] Make the parameter TRI required in some functions. (#85968)
Fixes #82659

There are some functions, such as `findRegisterDefOperandIdx` and  `findRegisterDefOperand`, that have too many default parameters. As a result, we have encountered some issues due to the lack of TRI  parameters, as shown in issue #82411.

Following @RKSimon 's suggestion, this patch refactors 9 functions, including `{reads, kills, defines, modifies}Register`,  `registerDefIsDead`, and `findRegister{UseOperandIdx, UseOperand, DefOperandIdx, DefOperand}`, adjusting the order of the TRI parameter and making it required. In addition, all the places that call these functions have also been updated correctly to ensure no additional impact.

After this, the caller of these functions should explicitly know whether to pass the `TargetRegisterInfo` or just a `nullptr`.
2024-04-24 14:24:14 +01:00
Min-Yih Hsu
5fe93b0a4d
[CodeGen][TII] Allow reassociation on custom operand indices (#88306)
This opens up a door for reusing reassociation optimizations on
target-specific binary operations with non-standard operand list.

This is effectively a NFC.
2024-04-23 11:10:37 -07:00
Craig Topper
fca2a49325
[RISCV] Simplify FindRegWithEncoding in copyPhysRegVector. NFC (#89001)
Instead of searching all encodings, we can convert the encoding back to
a register and use getMatchingSuperReg.
2024-04-16 21:46:57 -07:00
Harald van Dijk
8cee94e989 [RISCV] Fix obvious copy paste error.
CASE_VFMA_OPCODE_VV and CASE_VFMA_CHANGE_OPCODE_VV need to match up if we are
are to avoid "Unexpected opcode" errors, but in CASE_VFMA_CHANGE_OPCODE_VV,
CASE_VFMA_CHANGE_OPCODE_LMULS_MF2 had mistakenly been used instead of
CASE_VFMA_CHANGE_OPCODE_LMULS_MF4.
2024-04-16 16:32:57 +01:00
Pengcheng Wang
d34a2c2adb
[RISCV] Make more vector pseudos commutable
This PR includes:
* vadd.vv/vand.vv/vor.vv/vxor.vv
* vmseq.vv/vmsne.vv
* vmin.vv/vminu.vv/vmax.vv/vmaxu.vv
* vmul.vv/vmulh.vv/vmulhu.vv
* vwadd.vv/vwaddu.vv
* vwmul.vv/vwmulu
* vwmacc.vv/vwmaccu.vv
* vadc.vvm

There is no test change, I may add it later.

Fixes part of #64422

Reviewers: michaelmaitland, preames, lukel97, topperc, asb

Reviewed By: topperc, lukel97

Pull Request: https://github.com/llvm/llvm-project/pull/88379
2024-04-16 15:55:14 +08:00
Yingwei Zheng
b5b17bf613
[RISCV] Fix assertion failure in genShXAddAddShift (#88757)
Fix assertion failure in our downstream CI
https://github.com/dtcxzyw/llvm-codegen-benchmark/issues/1.
2024-04-16 01:50:22 +08:00
Craig Topper
040efafa9f
[RISCV] Support uimm32 immediates in RISCVInstrInfo::movImm for RV32. (#88464)
This allows us to support larger stack offsets for FrameLowering.

Fixes #88365.
2024-04-12 08:45:03 -07:00
Michael Maitland
43248ffea7 [RISCV] Split widening floating point fused multiple-add pseudo instructions by SEW
Co-authored-by: Wang Pengcheng <wangpengcheng.pp@bytedance.com>
2024-04-12 07:06:40 -07:00
Michael Maitland
c6b7944be4 [RISCV] Split single width floating point fused multiple-add pseudo instructions by SEW
Co-authored-by: Wang Pengcheng <wangpengcheng.pp@bytedance.com>
2024-04-12 07:06:40 -07:00
Michael Maitland
aece68269c [RISCV] Split PseudoVFWADD, PseudoVFWSUB, and PseudoVFWMUL by SEW
Co-authored-by: Wang Pengcheng <wangpengcheng.pp@bytedance.com>
2024-04-12 07:06:39 -07:00
Pengcheng Wang
b564036933
[MachineCombiner][NFC] Split target-dependent patterns
We split target-dependent MachineCombiner patterns into their target
folder.

This makes MachineCombiner much more target-independent.

Reviewers:
davemgreen, asavonic, rotateright, RKSimon, lukel97, LuoYuanke, topperc, mshockwave, asi-sc

Reviewed By: topperc, mshockwave

Pull Request: https://github.com/llvm/llvm-project/pull/87991
2024-04-11 12:20:27 +08:00
Sacha Coppey
53003e36e9
[RISCV] Implement Statepoint and Patchpoint lowering to call instructions (#77337)
This patch adds stackmap support for RISC-V with call targets.

Based on patch from https://reviews.llvm.org/D129848.
2024-04-11 12:19:56 +08:00
Craig Topper
7f1b9adfc8
[RISCV] Add MachineCombiner to fold (sh3add Z, (add X, (slli Y, 6))) -> (sh3add (sh3add Y, Z), X). (#87884)
This improves a pattern that occurs in 531.deepsjeng_r. Reducing the
dynamic instruction count by 0.5%.

This may be possible to improve in SelectionDAG, but given the special
cases around shXadd formation, it's not obvious it can be done in a
robust way without adding multiple special cases.

I've used a GEP with 2 indices because that mostly closely resembles the
motivating case. Most of the test cases are the simplest GEP case. One
test has a logical right shift on an index which is closer to the
deepsjeng code. This requires special handling in isel to reverse a
DAGCombiner canonicalization that turns a pair of shifts into (srl (and
X, C1), C2).
2024-04-10 08:39:56 -07:00