165 Commits

Author SHA1 Message Date
Alex Bradbury
3787fbf040
[RISCV] Enable merging of external globals by default (#117880)
This follows up #115495 by enabling merging of external globals by
default, which had been left as a next step in order to make the
previous change more incremental and so we can more easily narrow down
on any identified regressions.

Enabling merging of external globals matches what Arm does (for non
mach-o targets), though AArch64 doesn't as there were [some
concerns](https://reviews.llvm.org/D61947) it might cause regressions in
some cases.

See https://github.com/llvm/llvm-project/pull/117880 for benchmark figures and discussion.
2024-12-11 16:06:49 +00:00
Raphael Moreira Zinsly
708a478d67
[RISCV] Add stack clash protection (#117612)
Enable `-fstack-clash-protection` for RISCV and stack probe for function
prologues.
We probe the stack by creating a loop that allocates and probe the stack
in ProbeSize chunks.
We emit an unrolled probe loop for small allocations and emit a variable
length probe loop for bigger ones.
2024-12-10 16:48:26 +00:00
Pengcheng Wang
01a15dca09
[RISCV] Set a barrier between mask producer and user of V0 (#114012)
Here we add a scheduling mutation in pre-ra scheduling, which will
add an artificial dependency edge between mask producer and its
previous nearest instruction that uses V0 register.

This prevents the overlap of live intervals of mask registers and
as a consequence we can reduce some spills/moves.

From the test changes, we can see some improvements and also some
regressions (more vtype toggles).

Partially fixes #113489.
2024-11-29 13:25:53 +08:00
Alex Bradbury
9d02264b03
[RISCV] Enable global merging by default (#115495)
From the discussion at the round-table at the RISC-V Summit it was clear
people see cases where global merging would help. So the direction of
enabling it by default and iteratively working to enable it in more
cases or to improve the heuristics seems sensible. This patch tries to
make a minimal step in that direction.
2024-11-15 13:22:34 +00:00
Matin Raayai
bb3f5e1fed
Overhaul the TargetMachine and LLVMTargetMachine Classes (#111234)
Following discussions in #110443, and the following earlier discussions
in https://lists.llvm.org/pipermail/llvm-dev/2017-October/117907.html,
https://reviews.llvm.org/D38482, https://reviews.llvm.org/D38489, this
PR attempts to overhaul the `TargetMachine` and `LLVMTargetMachine`
interface classes. More specifically:
1. Makes `TargetMachine` the only class implemented under
`TargetMachine.h` in the `Target` library.
2. `TargetMachine` contains target-specific interface functions that
relate to IR/CodeGen/MC constructs, whereas before (at least on paper)
it was supposed to have only IR/MC constructs. Any Target that doesn't
want to use the independent code generator simply does not implement
them, and returns either `false` or `nullptr`.
3. Renames `LLVMTargetMachine` to `CodeGenCommonTMImpl`. This renaming
aims to make the purpose of `LLVMTargetMachine` clearer. Its interface
was moved under the CodeGen library, to further emphasis its usage in
Targets that use CodeGen directly.
4. Makes `TargetMachine` the only interface used across LLVM and its
projects. With these changes, `CodeGenCommonTMImpl` is simply a set of
shared function implementations of `TargetMachine`, and CodeGen users
don't need to static cast to `LLVMTargetMachine` every time they need a
CodeGen-specific feature of the `TargetMachine`.
5. More importantly, does not change any requirements regarding library
linking.

cc @arsenm @aeubanks
2024-11-14 13:30:05 -08:00
Kazu Hirata
82d5dd28b4
[RISCV] Remove unused includes (NFC) (#115814)
Identified with misc-include-cleaner.
2024-11-11 22:54:54 -08:00
Alex Bradbury
ae4fc80574
[RISCV] When using global merging, don't enable merging of external globals by default (#115484)
AArch64 left this disabled after seeing some cases of slightly worse
codegen that weren't tracked down, so I suggest as a path to
incrementally moving towards enable globals merging we follow suit, and
evaluate turning on later.

This patch disables merging of external globals, but also adds a flag to
override that. This reduces churn in test cases, simplifies benchmarking
runs, and this flag can be removed later.

A follow-on PR enables the globals merging pass by default (and as it's
based on this commit, merging of external globals is disabled just as
they are for AArch64).
2024-11-09 10:02:50 +00:00
BoyaoWang430
69d0bab826
[RISCV] Add load/store clustering in post machine schedule (#111504)
#73789 added load clustering and #73796 tried to add store clustering.
If post machine schedule is used, previous cluster of load/store which
formed in machine schedule may break. In order to solve this, add
load/sotre clustering to post machine schedule.
2024-11-06 16:21:30 +08:00
dlav-sc
97982a8c60
[RISCV][CFI] add function epilogue cfi information (#110810)
This patch adds CFI instructions in the function epilogue.

Before patch:
addi sp, s0, -32
ld ra, 24(sp) # 8-byte Folded Reload
ld s0, 16(sp) # 8-byte Folded Reload
ld s1, 8(sp) # 8-byte Folded Reload
addi sp, sp, 32
ret

After patch:
addi sp, s0, -32
.cfi_def_cfa sp, 32
ld ra, 24(sp) # 8-byte Folded Reload
ld s0, 16(sp) # 8-byte Folded Reload
ld s1, 8(sp) # 8-byte Folded Reload
.cfi_restore ra
.cfi_restore s0
.cfi_restore s1
addi sp, sp, 32
.cfi_def_cfa_offset 0
ret

This functionality is already present in `riscv-gcc`, but it’s not in
`clang` and this slightly impairs the `lldb` debugging experience, e.g.
backtrace.
2024-11-06 00:20:21 +03:00
Luke Lau
0cbccb13d6
[RISCV] Remove support for pre-RA vsetvli insertion (#110796)
Now that LLVM 19.1.0 has been out for a while with post-vector-RA
vsetvli insertion enabled by default, this proposes to remove the flag
that restores the old pre-RA behaviour so we only have one configuration
going forward.

That flag was mainly meant as a fallback in case users ran into issues,
but I haven't seen anything reported so far.
2024-10-28 11:31:18 +00:00
Luke Lau
db57fc4edc
[RISCV][VLOPT] Fix passthru check in getOperandInfo (#112244)
If a pseudo has a passthru, I believe the first source operand will have
operand no 2, not 1.
2024-10-14 20:54:17 +01:00
Alex Bradbury
2967e5f800
[RISCV] Enable store clustering by default (#73796)
Builds on #73789, enabling store clustering by default using the same
heuristic.
2024-10-11 20:25:53 +01:00
Michael Maitland
1c94388f38
[RISCV] Introduce VLOptimizer pass (#108640)
The purpose of this optimization is to make the VL argument, for
instructions that have a VL argument, as small as possible. This is
implemented by visiting each instruction in reverse order and checking
that if it has a VL argument, whether the VL can be reduced.

By putting this pass before VSETVLI insertion, we see three kinds of
changes to generated code:
1. Eliminate VSETVLI instructions
2. Reduce the VL toggle on VSETVLI instructions that also change vtype
3. Reduce the VL set by a VSETVLI instruction

The list of supported instructions is currently whitelisted for safety.
In the future, we could add more instructions to `isSupportedInstr` to
support even more VL optimization.

We originally wrote this pass because vector GEP instructions do not
take a VL, which leads us to emit code that uses VL=VLMAX to implement
GEP in the RISC-V backend. As a result, some of the vector instructions
will write to lanes, specifically between the intended VL and VLMAX,
that will never be read. As an alternative to this pass, we considered
adding a vector predicated GEP instruction, but this would not fit well
into the intrinsic type system since GEP has a variable number of
arguments, each with arbitrary types. The second approach we considered
was to put this pass after VSETVLI insertion, but we found that it was
more difficult to recognize optimization opportunities, especially
across basic block boundaries -- the data flow analysis was also a bit
more expensive and complex.

While this pass solves the GEP problem, we have expanded it to handle
more cases of VL optimization, and there is opportunity for the analysis
to be improved to enable even more optimization. We have a few follow up
patches to post, but figured this would be a good start.

---------

Co-authored-by: Craig Topper <craig.topper@sifive.com>
Co-authored-by: Kito Cheng <kito.cheng@sifive.com>
2024-10-11 09:45:35 -04:00
Craig Topper
9c15ff21aa
[RISCV][GISel] Add RISCVPassConfig::getCSEConfig() to match other targets. (#110755) 2024-10-01 21:07:26 -07:00
Alex Bradbury
14c4f28ec1
[RISCV] Enable load clustering by default (#73789)
We believe this is neutral or slightly better in the majority of cases.
2024-10-01 13:45:30 +01:00
Michael Maitland
9c5ad62e74 Revert "[RISCV][GISEL] Introduce the RISCVPostLegalizerLowering pass (#108991)"
This reverts commit 64972834c193632cbc47e54c0f0c721636b077e6.

Based on the discussions in #108991 that happened post merge, we have decided
to remove this pass in favor of generating `RISCV::G_*` opcodes in the legalizer.

We may reconsider moving that code elsewhere in the future so that we can do
a better job during generic combines. We don't feel that doing it in instruciton
selection is the right decision today. Firstly, it requires us to manually
do regbankselect on the newly introduced instructions. Secondly, it is more
difficult to test since the test output will contain whatever `RISCV::G_*`
instructions select to (instead of `RISCV::G_*`).

My personal opinion is that the legalizer pass can be split into an early
legalizer and a late legalizer, both before regbankselect. The first legalizer
would not introduce target specific generic opcodes and the generic combiner
would run after it. The second legalizer would introduce the target specific
generic opcodes. I think this approach is better than the lowerer because the
legalizer guarantees that whatever we lower to is legal, and apparently because
it is more performant at compared to the lowerer (although, I'm not sure how
true this is).
2024-09-20 07:01:17 -07:00
Alex Bradbury
0ee10e9466
[RISCV] Add additional fence for amocas when required by recent ABI change (#101023)
A recent atomics ABI change / fix requires that for the "A6C" and A6S"
atomics ABIs (i.e. both of those supported by LLVM currently), an
additional fence is inserted for an atomic_compare_exchange with seq_cst
failure ordering.
<https://github.com/riscv-non-isa/riscv-elf-psabi-doc/pull/445>

This isn't trivial to support through the hooks used by AtomicExpandPass
because that pass assumes that when fences are inserted, the original
atomics ordering information can be removed from the instruction. Rather
than try to change and complicate that API, this patch implements the
needed fence insertion through a small special purpose pass.
2024-09-19 13:39:56 +01:00
Michael Maitland
64972834c1
[RISCV][GISEL] Introduce the RISCVPostLegalizerLowering pass (#108991)
This is mostly a copy of the AArch64PostLegalizerLoweringPass, except it
removes all of the AArch64 combines.

This pass allows us to lower instructions after the generic
post-legalization combiner has had a chance to run.

We will be adding combines to this pass in future patches.
2024-09-17 13:18:35 -04:00
Philip Reames
27a62ec72a
[LSR] Split the -lsr-term-fold transformation into it's own pass (#104234)
This transformation doesn't actually use any of the internal state of
LSR and recomputes all information from SCEV.  Splitting it out makes
it easier to test.
    
Note that long term I would like to write a version of this transform
which *is* integrated with LSR's solver, but if that happens, we'll
just delete the extra pass.
    
Integration wise, I switched from using TTI to using a pass configuration
variable.  This seems slightly more idiomatic, and means we don't run
the extra logic on any target other than RISCV.
2024-08-17 18:34:23 -07:00
Yeting Kuo
e80d8e1b42
[RISCV] Insert simple landing pad before indirect jumps for Zicfilp. (#91860)
This patch is based on https://github.com/llvm/llvm-project/pull/91855.
This patch inserts simple landing pad
([pr])before indirct jumps. And this also make option
riscv-landing-pad-label influence this feature.
[pr]: https://github.com/riscv-non-isa/riscv-elf-psabi-doc/pull/417
2024-08-08 13:22:59 +08:00
Yeting Kuo
9fb196b469
[RISCV] Insert simple landing pad for taken address labels. (#91855)
This patch implements simple landing pad labels ([pr]). When Zicfilp
enabled, this patch inserts `lpad 0` at the beginning of basic blocks
which are possible to be landed by indirect jumps.
This patch also supports option riscv-landing-pad-label to make users
cpable to set nonzero fixed labels. Using nonzero fixed label force
setting t2 before indirect jumps. It's less portable but more strict
than original implementation.

[pr]: https://github.com/riscv-non-isa/riscv-elf-psabi-doc/pull/417
2024-08-06 22:04:48 +08:00
Christudasan Devadasan
15b41d207e
[CodeGen] change prototype of regalloc filter function (#93525)
[CodeGen] Change the prototype of regalloc filter function

Change the prototype of the filter function so that we can
filter not just by RegClass. We need to implement more
complicated filter based upon some other info associated
with each register.

Patch provided by: Gang Chen (gangc@amd.com)
2024-07-22 16:49:39 +05:30
Luke Lau
c74ba57e0b
[RISCV] Convert AVLs with vlenb to VLMAX where possible (#97800)
Given an AVL that's computed from vlenb, if it's equal to VLMAX then we
can replace it with the VLMAX sentinel value.

The main motiviation is to be able to express an EVL of VLMAX in VP
intrinsics whilst emitting vsetvli a0, zero, so that we can replace
llvm.riscv.masked.strided.{load,store} with their VP counterparts.

This is done in RISCVVectorPeephole (previously RISCVFoldMasks, renamed
to account for the fact that it no longer just folds masks) instead of
SelectionDAG since there are multiple places places where VP nodes are
lowered that would have need to have been handled.

This also avoids doing it in RISCVInsertVSETVLI as it's much harder to
lookup the value of the AVL, and in RISCVVectorPeephole we can take
advantage of DeadMachineInstrElim to remove any leftover
PseudoReadVLENBs.
2024-07-11 14:22:00 +08:00
Min-Yih Hsu
8b55d342b6
[RISCV][LoopIdiomVectorize] Support VP intrinsics in LoopIdiomVectorize (#94082)
Teach LoopIdiomVectorize to use VP intrinsics to replace the byte
compare loops. Right now only RISC-V uses LoopIdiomVectorize of this
style.
2024-07-02 18:48:28 -07:00
Yunzezhu94
a833fa7d3e
[RISCV] Move Machine Copy Propagation Pass before Branch relaxation pass (#97261)
Machine Copy Propagation Pass may enlarge branch relaxation distance by
breaking generation of compressed insts. This commit moves Machine Copy
Propagation Pass before Branch relaxation pass so the results of Branch
relaxation pass won't be affected by Machine Copy Propagation Pass.
2024-07-02 09:58:00 +08:00
paperchalice
7652a59407
Reland "[NewPM][CodeGen] Port selection dag isel to new pass manager" (#94149)
- Fix build with `EXPENSIVE_CHECKS`
- Remove unused `PassName::ID` to resolve warning
- Mark `~SelectionDAGISel` virtual so AArch64 backend can work properly
2024-06-04 08:10:58 +08:00
paperchalice
8917afaf0e
Revert "[NewPM][CodeGen] Port selection dag isel to new pass manager" (#94146)
This reverts commit de37c06f01772e02465ccc9f538894c76d89a7a1 to
de37c06f01772e02465ccc9f538894c76d89a7a1

It still breaks EXPENSIVE_CHECKS build. Sorry.
2024-06-02 14:31:52 +08:00
paperchalice
d2cdc8ab45
[NewPM][CodeGen] Port selection dag isel to new pass manager (#83567)
Port selection dag isel to new pass manager.
Only `AMDGPU` and `X86` support new pass version. `-verify-machineinstrs` in new pass manager belongs to verify instrumentation, it is enabled by default.
2024-06-02 09:12:33 +08:00
Luke Lau
1cff74130f
[RISCV] Merge RISCVCoalesceVSETVLI back into RISCVInsertVSETVLI (#92869)
We no longer need to separate the passes now that #70549 is landed and
this will unblock #89089.

It's not strictly NFC because it will move coalescing before register
allocation when -riscv-vsetvl-after-rvv-regalloc is disabled. But this
makes it closer to the original behaviour.
2024-05-29 20:59:34 +01:00
Piyou Chen
675e7bd1b9
[RISCV] Support postRA vsetvl insertion pass (#70549)
This patch try to get rid of vsetvl implict vl/vtype def-use chain and
improve the register allocation quality by moving the vsetvl insertion
pass after RVV register allocation

It will gain the benefit for the following optimization from

1. unblock scheduler's constraints by removing vl/vtype def-use chain
2. Support RVV re-materialization
3. Support partial spill

This patch add a new option `-riscv-vsetvl-after-rvv-regalloc=<1|0>` to
control this feature and default set as disable.
2024-05-21 14:42:55 +08:00
Luke Lau
566fbb4500
[RISCV] Defer creating RISCVInsertVSETVLI to avoid leak with -stop-after (#92303)
As noted in
https://github.com/llvm/llvm-project/pull/91440#discussion_r1601976425,
if the pass pipeline stops early because of -stop-after any allocated
passes added with insertPass will not be freed if they haven't already
been added.

This was showing up as a failure on the address sanitizer buildbots. We
can fix it by instead passing the pass ID instead so that allocation is
deferred.
2024-05-16 12:57:28 +08:00
Luke Lau
1a58e88690
[RISCV] Move RISCVInsertVSETVLI to after phi elimination (#91440)
Split off from #70549, this patch moves RISCVInsertVSETVLI to after phi
elimination where we exit SSA and need to move to LiveVariables.

The motivation for splitting this off is to avoid the large scheduling
diffs from moving completely to after regalloc, and instead focus on
converting the pass to work on LiveIntervals.

The two main changes required are updating VSETVLIInfo to store VNInfos
instead of MachineInstrs, which allows us to still check for PHI defs in
needVSETVLIPHI, and fixing up the live intervals of any AVL operands
after inserting new instructions.

On O3 the pass is inserted after the register coalescer, otherwise we
end up with a bunch of COPYs around eliminated PHIs that trip up
needVSETVLIPHI.

Co-authored-by: Piyou Chen <piyou.chen@sifive.com>
2024-05-15 11:44:32 +08:00
Luke Lau
0ebe48f068
[RISCV] Move RISCVInsertVSETVLI after CSR/VXRM passes (#91701)
This further splits off #91440 to inch RISCVInsertVSETVLI closer to post
vector regalloc.

As noted in #91440, most of the diffs are from moving vsetvli insertion
after the vxrm/csr insertion passes, but these are getting conflated
with the changes from moving to LiveIntervals.

One idea was that we could try and remove some of these diffs by
manually moving back the vsetvlis past the vxrm/csr instructions. But
this meant having to touch up the LiveIntervals again which seemed to
lead to even more diffs.

This instead just moves RISCVInsertVSETVLI after RISCVInsertReadWriteCSR
and RISCVInsertWriteVXRM so we can isolate those changes.
2024-05-10 14:31:43 +08:00
Luke Lau
52187b9f2e
[RISCV] Move RISCVDeadRegisterDefinitions to post vector regalloc (#90636)
Currently RISCVDeadRegisterDefinitions runs after vsetvli insertion, but
in #70549 vsetvli insertion runs after vector regalloc and as a result
we no longer convert some vsetvli a0, a0s to vsetvli x0, a0. This patch
moves it to after vector regalloc, but before scalar regalloc so we
still get the benefits of reducing register pressure.
2024-05-07 00:36:47 +08:00
Luke Lau
af82d01fbb Reapply "[RISCV] Separate doLocalPostpass into new pass and move to post vector regalloc (#88295)"
The original commit was calling shrinkToUses on an interval for a virtual
register whose def was erased. This fixes it by calling shrinkToUses first
and removing the interval if we erase the old VL def.
2024-04-25 00:42:30 +08:00
Luke Lau
fc13353e10 Revert "[RISCV] Separate doLocalPostpass into new pass and move to post vector regalloc (#88295)"
Seems to cause an address sanitizer failure on one of the buildbots related
to live intervals.
2024-04-24 23:27:01 +08:00
Luke Lau
603ba4c596
[RISCV] Separate doLocalPostpass into new pass and move to post vector regalloc (#88295)
This patch splits off part of the work to move vsetvli insertion to post
regalloc in #70549.

The doLocalPostpass operates outside of RISCVInsertVSETVLI's dataflow,
so we can move it to its own pass. We can then move it to post vector
regalloc which should be a smaller change.

A couple of things that are different from #70549:

- This manually fixes up the LiveIntervals rather than recomputing it
via createAndComputeVirtRegInterval. I'm not sure if there's much of a
difference with either.
- For the postpass it's sufficient enough to just check isUndef() in
hasUndefinedMergeOp, i.e. we don't need to lookup the def in VNInfo.

Running on llvm-test-suite and SPEC CPU 2017 there aren't any changes in
the number of vsetvlis removed. There are some minor scheduling diffs as
well as extra spills and less spills in some cases (caused by transient
vsetvlis existing between RISCVInsertVSETVLI and RISCVCoalesceVSETVLI
when vec regalloc happens), but they are minor and should go away once
we finish moving the rest of RISCVInsertVSETVLI.

We could also potentially turn off this pass for unoptimised builds.
2024-04-24 16:31:40 +08:00
Luke Lau
ad4a42bbc7
[RISCV] Remove -riscv-split-regalloc flag (#89715)
Split vector and scalar regalloc has been enabled by default for 5
months now since d0a39e617ba301a76d28e2d82e1f657999c9dcfb, and shipped
with 18.1.0. I haven't heard of any issues with it so far, so this
proposes to remove the flag to reduce the number of configurations we
have to support.
2024-04-24 15:04:37 +08:00
Wang Pengcheng
68d07bf34f
[RISCV][NFC] Add helpers for RVV register classes
There are two places in tree that use these helpers and there will
be more future usages.

Reviewers: asb, BeMg, lukel97

Reviewed By: BeMg, lukel97

Pull Request: https://github.com/llvm/llvm-project/pull/84144
2024-03-07 14:18:37 +08:00
Jack Styles
28233408a2
[CodeGen] [ARM] Make RISC-V Init Undef Pass Target Independent and add support for the ARM Architecture. (#77770)
When using Greedy Register Allocation, there are times where
early-clobber values are ignored, and assigned the same register. This
is illeagal behaviour for these intructions. To get around this, using
Pseudo instructions for early-clobber registers gives them a definition
and allows Greedy to assign them to a different register. This then
meets the ARM Architecture Reference Manual and matches the defined
behaviour.

This patch takes the existing RISC-V patch and makes it target
independent, then adds support for the ARM Architecture. Doing this will
ensure early-clobber restraints are followed when using the ARM
Architecture. Making the pass target independent will also open up
possibility that support other architectures can be added in the future.
2024-02-26 12:12:31 +00:00
Rishabh Bali
fe42e72db2
[CodeGen] Port AtomicExpand to new Pass Manager (#71220)
Port the `atomicexpand` pass to the new Pass Manager. 
Fixes #64559
2024-02-25 18:42:22 +05:30
Craig Topper
5b53fa04db
[RISCV] Enable -riscv-enable-sink-fold by default. (#82026)
AArch64 has had it enabled since late November, so hopefully the main
issues have been resolved.

I see a small reduction in dynamic instruction count on every benchmark
in specint2017. The best improvement was 0.3% so nothing amazing.
2024-02-22 09:07:21 -08:00
Craig Topper
7d40ea85d5 [RISCV] Enable the TypePromotion pass from AArch64/ARM.
This pass looks for unsigned icmps that have illegal types and tries
to widen the use/def graph to improve the placement of the zero
extends that type legalization would need to insert.

I've explicitly disabled it for i32 by adding a check for
isSExtCheaperThanZExt to the pass.

The generated code isn't perfect, but my data shows a net
dynamic instruction count improvement on spec2017 for both base and
Zba+Zbb+Zbs.
2024-02-13 09:57:48 -08:00
Jie Fu
ea4f44e85f [RISCV] Remove unused variable 'ST' in RISCVTargetMachine.cpp (NFC)
llvm-project/llvm/lib/Target/RISCV/RISCVTargetMachine.cpp:358:27:
error: unused variable 'ST' [-Werror,-Wunused-variable]
    const RISCVSubtarget &ST = C->MF->getSubtarget<RISCVSubtarget>();
                          ^
1 error generated.
2024-02-07 15:44:13 +08:00
Wang Pengcheng
cb7561ac5a
[Sched] Add MacroFusion mutation if fusions are not empty (#72227)
We can get the fusions list by `getMacroFusions` and if it is not
empty, then we will add the MacroFusion mutation automatically.
2024-02-07 15:38:02 +08:00
Wang Pengcheng
3fdb431b63
[RISCV] Use TableGen-based macro fusion (#72224)
We convert existed macro fusions to TableGen.
    
Bacause `Fusion` depend on `Instruction` definitions which is defined
below `RISCVFeatures.td`, so we recommend user to add fusion features
when defining new processor.
2024-01-25 17:10:49 +08:00
Wang Pengcheng
3ac9fe69f7
[RISCV] CodeGen of RVE and ilp32e/lp64e ABIs (#76777)
This commit includes the necessary changes to clang and LLVM to support
codegen of `RVE` and the `ilp32e`/`lp64e` ABIs.

The differences between `RVE` and `RVI` are:
* `RVE` reduces the integer register count to 16(x0-x16).
* The ABI should be `ilp32e` for 32 bits and `lp64e` for 64 bits.

`RVE` can be combined with all current standard extensions.

The central changes in ilp32e/lp64e ABI, compared to ilp32/lp64 are:
* Only 6 integer argument registers (rather than 8).
* Only 2 callee-saved registers (rather than 12).
* A Stack Alignment of 32bits (rather than 128bits).
* ilp32e isn't compatible with D ISA extension.

If `ilp32e` or `lp64` is used with an ISA that has any of the registers
x16-x31 and f0-f31, then these registers are considered temporaries.

To be compatible with the implementation of ilp32e in GCC, we don't use
aligned registers to pass variadic arguments and set stack alignment\
to 4-bytes for types with length of 2*XLEN.

FastCC is also supported on RVE, while GHC isn't since there is only one
avaiable register.

Differential Revision: https://reviews.llvm.org/D70401
2024-01-16 20:44:30 +08:00
Alex Bradbury
84f7fb6217
[MachineScheduler] Add option to control reordering for store/load clustering (#75338)
Reordering based on the sort order of the MemOpInfo array was disabled
in <https://reviews.llvm.org/D72706>. However, it's not clear this is
desirable for al targets. It also makes it more difficult to compare the
incremental benefit of enabling load clustering in the selectiondag
scheduler as well was the machinescheduler, as the sdag scheduler does
seem to allow this reordering.

This patch adds a parameter that can control the behaviour on a
per-target basis.

Split out from #73789.
2024-01-16 07:17:41 +00:00
Wang Pengcheng
ea85345eb6
[RISCV][NFC] Use raw_svector_ostream to construct key of SubtargetMap (#72964)
To simplify some code.
2023-12-08 18:34:31 +08:00
Craig Topper
efc32f5e06 [RISCV] Use Triple::isRISCV64(). NFC 2023-12-07 18:21:20 -08:00