51 Commits

Author SHA1 Message Date
Jesse Huang
e5ad7f4556
[RISCV] Move RISCVIndirectBranchTracking before Branch Relaxation (#139993)
The `RISCVIndirectBranchTracking` pass inserts the `lpad` instruction and
can change basic block alignment, so it must not run after branch
relaxation: the adjusted offsets could otherwise exceed the branch
range.
2025-06-17 17:21:24 +08:00
Frederik Harwath
6962cf1700
Rename ExpandLargeFpConvertPass to ExpandFpPass (#131128)
This is meant as a preparation for PR #130988 "[AMDGPU] Implement IR
expansion for frem instruction" which implements the expansion of
another instruction in this pass. The more general name seems more
appropriate given this change and quite reasonable even without it.
2025-03-14 13:11:45 +01:00
Luke Lau
df96b56b9f
[RISCV] Move VMV0 elimination past machine SSA opts (#126850)
This is the follow up to #125026 that keeps mask operands in virtual
register form for as long as possible throughout the backend.

The diffs in this patch are from MachineCSE/MachineSink/RISCVVLOptimizer
kicking in.

The invariant that the mask COPY never has a subreg no longer holds
after MachineCSE (it coalesces some copies), so it needed to be relaxed.
2025-02-20 12:41:05 +08:00
Luke Lau
cc7e83601d
[RISCV] Select mask operands as virtual registers and eliminate uses of vmv0 (#125026)
This is another attempt at #88496 to keep mask operands in SSA after
instruction selection.

Previously we selected the mask operands into vmv0, a singleton register
class with exactly one register, V0.

But the register allocator doesn't really support singleton register
classes and we ran into errors like "ran out of registers during
register allocation in function".

This patch avoids the issue by introducing a pass just before register
allocation that converts any use of vmv0 to a copy to $v0, i.e. what
isel does today.

That way the register allocator doesn't need to deal with the singleton
register class, but we get the benefits of having the mask registers in
SSA throughout the backend:

- This allows RISCVVLOptimizer to reduce the VLs of instructions that
define mask registers
- It enables CSE and code sinking in more places
- It removes the need to peek through mask copies in RISCVISelDAGToDAG
and keep track of V0 defs in RISCVVectorPeephole

This patch initially eliminates uses of vmv0s after RISCVVectorPeephole
to keep the diff to a minimum, and a follow up patch will move it past
the other MachineInstr SSA passes.

Note that it doesn't try to remove any defs of vmv0, since no
instruction should have a vmv0 output.

As a further follow up, we can move the elimination pass to after phi
elimination and outside of SSA, which would unblock the pre-RA scheduler
around masked pseudos. This might also help the issue that
RISCVVectorMaskDAGMutation tries to solve.
2025-02-12 12:06:55 +08:00
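
As an illustration of why the mask ends up in a singleton class (a sketch, not taken from the patch): masked RVV instructions can only encode v0 as the mask operand, so the selected mask value must ultimately land in v0.

vmseq.vi  v0, v12, 0            # produce a mask; the masked op below can only read v0
vadd.vv   v8, v9, v10, v0.t     # the ".t" form always uses v0 as the mask register
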
dlav-sc
97982a8c60
[RISCV][CFI] add function epilogue cfi information (#110810)
This patch adds CFI instructions in the function epilogue.

Before patch:
addi sp, s0, -32
ld ra, 24(sp) # 8-byte Folded Reload
ld s0, 16(sp) # 8-byte Folded Reload
ld s1, 8(sp) # 8-byte Folded Reload
addi sp, sp, 32
ret

After patch:
addi sp, s0, -32
.cfi_def_cfa sp, 32
ld ra, 24(sp) # 8-byte Folded Reload
ld s0, 16(sp) # 8-byte Folded Reload
ld s1, 8(sp) # 8-byte Folded Reload
.cfi_restore ra
.cfi_restore s0
.cfi_restore s1
addi sp, sp, 32
.cfi_def_cfa_offset 0
ret

This functionality is already present in `riscv-gcc`, but it’s not in
`clang` and this slightly impairs the `lldb` debugging experience, e.g.
backtrace.
2024-11-06 00:20:21 +03:00
Alex Bradbury
0ee10e9466
[RISCV] Add additional fence for amocas when required by recent ABI change (#101023)
A recent atomics ABI change / fix requires that for the "A6C" and "A6S"
atomics ABIs (i.e. both of those currently supported by LLVM), an
additional fence is inserted for an atomic_compare_exchange with seq_cst
failure ordering.
<https://github.com/riscv-non-isa/riscv-elf-psabi-doc/pull/445>

This isn't trivial to support through the hooks used by AtomicExpandPass
because that pass assumes that when fences are inserted, the original
atomics ordering information can be removed from the instruction. Rather
than try to change and complicate that API, this patch implements the
needed fence insertion through a small special purpose pass.
2024-09-19 13:39:56 +01:00
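
A rough sketch of the resulting code shape, assuming the extra fence is a leading `fence rw, rw` as specified by the linked psABI pull request (operand order of the CAS is illustrative only):

fence          rw, rw           # additional fence for seq_cst failure ordering
amocas.w.aqrl  a0, a1, (a2)     # Zacas compare-and-swap
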
Stephen Tozer
3d08ade7bd
[ExtendLifetimes] Implement llvm.fake.use to extend variable lifetimes (#86149)
This patch is part of a set of patches that add an `-fextend-lifetimes`
flag to clang, which extends the lifetimes of local variables and
parameters for improved debuggability. In addition to that flag, the
patch series adds a pragma to selectively disable `-fextend-lifetimes`,
and an `-fextend-this-ptr` flag which functions as `-fextend-lifetimes`
for `this` pointers only. All changes and tests in these patches were
written by Wolfgang Pieb (@wolfy1961), while Stephen Tozer (@SLTozer)
has handled review and merging. The extend lifetimes flag is intended to
eventually be set on by `-Og`, as discussed in the RFC
here:

https://discourse.llvm.org/t/rfc-redefine-og-o1-and-add-a-new-level-of-og/72850

This patch implements a new intrinsic instruction in LLVM,
`llvm.fake.use` in IR and `FAKE_USE` in MIR, that takes a single operand
and has no effect other than "using" its operand, to ensure that its
operand remains live until after the fake use. This patch does not emit
fake uses anywhere; the next patch in this sequence causes them to be
emitted from the clang frontend, such that for each variable (or `this`) a
fake.use operand is inserted at the end of that variable's scope, using
that variable's value. This patch covers everything post-frontend, which
is largely just the basic plumbing for a new intrinsic/instruction,
along with a few steps to preserve the fake uses through optimizations
(such as moving them ahead of a tail call or translating them through
SROA).

Co-authored-by: Stephen Tozer <stephen.tozer@sony.com>
2024-08-29 17:53:32 +01:00
Yeting Kuo
e80d8e1b42
[RISCV] Insert simple landing pad before indirect jumps for Zicfilp. (#91860)
This patch is based on https://github.com/llvm/llvm-project/pull/91855.
This patch inserts a simple landing pad ([pr]) before indirect jumps,
and also makes the option riscv-landing-pad-label influence this
feature.
[pr]: https://github.com/riscv-non-isa/riscv-elf-psabi-doc/pull/417
2024-08-08 13:22:59 +08:00
Yeting Kuo
9fb196b469
[RISCV] Insert simple landing pad for taken address labels. (#91855)
This patch implements simple landing pad labels ([pr]). When Zicfilp is
enabled, it inserts `lpad 0` at the beginning of basic blocks that can
be reached by indirect jumps.
It also supports the option riscv-landing-pad-label to let users set
nonzero fixed labels. Using a nonzero fixed label forces setting t2
before indirect jumps; this is less portable but stricter than the
original implementation.

[pr]: https://github.com/riscv-non-isa/riscv-elf-psabi-doc/pull/417
2024-08-06 22:04:48 +08:00
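
A minimal sketch (illustrative, not taken from the patch) of what the inserted landing pad looks like for an indirectly reachable block:

    la    a5, .Ltarget
    jr    a5                    # indirect jump; with Zicfilp the target must begin with a landing pad
.Ltarget:
    lpad  0                     # label 0 = the simple/unlabeled landing pad
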
Alexis Engelke
fa92d51f9e
[VP] Merge ExpandVP pass into PreISelIntrinsicLowering (#101652)
Similar to #97727; avoid an extra pass over the entire IR by performing
the lowering as part of the pre-isel-intrinsic-lowering pass.
2024-08-06 09:27:59 +02:00
Alexis Engelke
b5fc083dc3
[CodeGen] Merge lowerConstantIntrinsics into pre-isel lowering (#97727)
Currently, the LowerConstantIntrinsics pass does an RPO traversal of
every function... only to find that many functions don't have constant
intrinsics (is.constant, objectsize). In the CodeGen pipeline, there is
already a pre-isel intrinsic lowering pass, which iterates over
intrinsic declarations and lowers all users. Call
lowerConstantIntrinsics from this pass to avoid the extra iteration over
the entire IR and the RPO traversal.
2024-08-01 17:44:32 +02:00
Philip Reames
8756043467 [RISCV] Teach RISCVInsertVSETVLI to work without LiveIntervals
(Reapplying with corrected commit message)

We recently moved RISCVInsertVSETVLI from before vector register allocation
to after vector register allocation.  When doing so, we added an unconditional
dependency on LiveIntervals - even at O0 where LiveIntervals hadn't previously
run.  As reported in #93587, this was apparently not safe to do.

This change makes LiveIntervals optional, and adjusts all the update code to
only run when LiveIntervals is present.  The only real tricky part of this
change is the abstract state tracking in the dataflow.  We need to represent
a "register w/unknown definition" state - but only when we don't have
LiveIntervals.

This adjusts the abstract state definition so that the AVLIsReg state can
represent either a register + valno, or a register + unknown definition.
With LiveIntervals, we have an exact definition for each AVL use.  Without
LiveIntervals, we treat the definition of a register AVL as being unknown.

The key semantic change is that we now have a state in the lattice for which
something is known about the AVL value, but for which two identical lattice
elements do *not* necessarily represent the same AVL value at runtime.
Previously, the only case which could result in such an unknown AVL was the
fully unknown state (where VTYPE is also fully unknown).  This requires a
small adjustment to hasSameAVL and lattice state equality to draw this
important distinction.

The net effect of this patch is that we remove the LiveIntervals dependency
at O0, and O0 code quality will regress for cases involving register AVL values.
In practice, this means we pessimize code written with intrinsics at O0.

This patch is an alternative to #93796 and #94340.  It is very directly
inspired by review conversation around them, and thus should be considered
coauthored by Luke.
2024-06-17 12:05:43 -07:00
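
As an illustration of the "register AVL" case referred to above (a sketch, not from the patch): the AVL lives in a scalar register, so without LiveIntervals the pass cannot prove that two such vsetvlis observe the same runtime value.

vsetvli  zero, a0, e32, m2, ta, ma   # AVL taken from register a0; its definition is unknown without LiveIntervals
vadd.vv  v8, v8, v12
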
Philip Reames
1d028151c9 Revert "[RISCV] Teach RISCVInsertVSETVLI to work without LiveIntervals (#94686)"
This reverts commit 111507ed4ce49bbb8cfbf36a3e143bb25f0f13c0.  Accidentally landed with a stale commit message; will reapply shortly.
2024-06-17 12:04:44 -07:00
Philip Reames
111507ed4c
[RISCV] Teach RISCVInsertVSETVLI to work without LiveIntervals (#94686)
Stacked on https://github.com/llvm/llvm-project/pull/94658.
    
We recently moved RISCVInsertVSETVLI from before vector register
allocation to after vector register allocation. When doing so, we added
an unconditional dependency on LiveIntervals - even at O0 where
LiveIntervals hadn't previously run. As reported in #93587, this was
apparently not safe to do.
    
This change makes LiveIntervals optional, and adjusts all the update
code to only run when LiveIntervals is present. The only real tricky
part of this change is the abstract state tracking in the dataflow. We
need to represent a "register w/unknown definition" state - but only
when we don't have LiveIntervals.
    
This adjusts the abstract state definition so that the AVLIsReg state can
represent either a register + valno, or a register + unknown definition.
With LiveIntervals, we have an exact definition for each AVL use.
Without LiveIntervals, we treat the definition of a register AVL as
being unknown.
    
The key semantic change is that we now have a state in the lattice for
which something is known about the AVL value, but for which two
identical lattice elements do *not* necessarily represent the same AVL
value at runtime. Previously, the only case which could result in such
an unknown AVL was the fully unknown state (where VTYPE is also fully
unknown). This requires a small adjustment to hasSameAVL and lattice
state equality to draw this important distinction.
    
The net effect of this patch is that we remove the LiveIntervals
dependency at O0, and O0 code quality will regress for cases involving
register AVL values.
    
This patch is an alternative to
https://github.com/llvm/llvm-project/pull/93796 and
https://github.com/llvm/llvm-project/pull/94340. It is very directly
inspired by review conversation around them, and thus should be
considered coauthored by Luke.
2024-06-17 12:01:51 -07:00
Egor Pasko
cab81dd038
[EntryExitInstrumenter] Move passes out of clang into LLVM default pipelines (#92171)
Move EntryExitInstrumenter(PostInlining=true) to as late as possible and
EntryExitInstrumenter(PostInlining=false) to an early pre-inlining stage
(but skip for ThinLTO post-link).

This should fix the issues reported in
https://github.com/rust-lang/rust/issues/92109 and
https://github.com/llvm/llvm-project/issues/52853. These are caused
by https://reviews.llvm.org/D97608.
2024-05-31 12:48:45 -07:00
Luke Lau
1cff74130f
[RISCV] Merge RISCVCoalesceVSETVLI back into RISCVInsertVSETVLI (#92869)
We no longer need to separate the passes now that #70549 is landed and
this will unblock #89089.

It's not strictly NFC because it will move coalescing before register
allocation when -riscv-vsetvl-after-rvv-regalloc is disabled. But this
makes it closer to the original behaviour.
2024-05-29 20:59:34 +01:00
Nikita Popov
1579e9ca9c Revert "Run ObjCContractPass in Default Codegen Pipeline (#92331)"
This reverts commit 8cc8e5d6c6ac9bfc888f3449f7e424678deae8c2.
This reverts commit dae55c89835347a353619f506ee5c8f8a2c136a7.

Causes major compile-time regressions for unoptimized builds.
2024-05-24 08:14:26 +02:00
Nuri Amari
8cc8e5d6c6
Run ObjCContractPass in Default Codegen Pipeline (#92331)
Prior to this patch, when using -fthinlto-index= the ObjCARCContractPass isn't run prior to CodeGen, and instruction selection fails on IR containing arc intrinsics. This patch is motivated by that usecase.

The pass was previously added in the various places codegen is performed. This patch adds the pass to the default codegen pipeline, makes sure it bails immediately if no ARC intrinsics are found, and removes the ad-hoc scheduling of the pass.

Co-authored-by: Nuri Amari <nuriamari@fb.com>
2024-05-23 10:04:55 -07:00
Piyou Chen
675e7bd1b9
[RISCV] Support postRA vsetvl insertion pass (#70549)
This patch tries to get rid of the implicit vl/vtype def-use chain of
vsetvl and improve register allocation quality by moving the vsetvl
insertion pass after RVV register allocation.

This enables the following optimizations:

1. Unblocking the scheduler by removing the vl/vtype def-use chain
2. Supporting RVV re-materialization
3. Supporting partial spill

This patch adds a new option `-riscv-vsetvl-after-rvv-regalloc=<1|0>` to
control this feature; it defaults to disabled.
2024-05-21 14:42:55 +08:00
Luke Lau
1a58e88690
[RISCV] Move RISCVInsertVSETVLI to after phi elimination (#91440)
Split off from #70549, this patch moves RISCVInsertVSETVLI to after phi
elimination where we exit SSA and need to move to LiveVariables.

The motivation for splitting this off is to avoid the large scheduling
diffs from moving completely to after regalloc, and instead focus on
converting the pass to work on LiveIntervals.

The two main changes required are updating VSETVLIInfo to store VNInfos
instead of MachineInstrs, which allows us to still check for PHI defs in
needVSETVLIPHI, and fixing up the live intervals of any AVL operands
after inserting new instructions.

On O3 the pass is inserted after the register coalescer, otherwise we
end up with a bunch of COPYs around eliminated PHIs that trip up
needVSETVLIPHI.

Co-authored-by: Piyou Chen <piyou.chen@sifive.com>
2024-05-15 11:44:32 +08:00
Luke Lau
0ebe48f068
[RISCV] Move RISCVInsertVSETVLI after CSR/VXRM passes (#91701)
This further splits off #91440 to inch RISCVInsertVSETVLI closer to post
vector regalloc.

As noted in #91440, most of the diffs are from moving vsetvli insertion
after the vxrm/csr insertion passes, but these are getting conflated
with the changes from moving to LiveIntervals.

One idea was that we could try and remove some of these diffs by
manually moving back the vsetvlis past the vxrm/csr instructions. But
this meant having to touch up the LiveIntervals again which seemed to
lead to even more diffs.

This instead just moves RISCVInsertVSETVLI after RISCVInsertReadWriteCSR
and RISCVInsertWriteVXRM so we can isolate those changes.
2024-05-10 14:31:43 +08:00
Luke Lau
af82d01fbb Reapply "[RISCV] Separate doLocalPostpass into new pass and move to post vector regalloc (#88295)"
The original commit was calling shrinkToUses on an interval for a virtual
register whose def was erased. This fixes it by calling shrinkToUses first
and removing the interval if we erase the old VL def.
2024-04-25 00:42:30 +08:00
Luke Lau
fc13353e10 Revert "[RISCV] Separate doLocalPostpass into new pass and move to post vector regalloc (#88295)"
Seems to cause an address sanitizer failure on one of the buildbots related
to live intervals.
2024-04-24 23:27:01 +08:00
Luke Lau
603ba4c596
[RISCV] Separate doLocalPostpass into new pass and move to post vector regalloc (#88295)
This patch splits off part of the work to move vsetvli insertion to post
regalloc in #70549.

The doLocalPostpass operates outside of RISCVInsertVSETVLI's dataflow,
so we can move it to its own pass. We can then move it to post vector
regalloc which should be a smaller change.

A couple of things that are different from #70549:

- This manually fixes up the LiveIntervals rather than recomputing it
via createAndComputeVirtRegInterval. I'm not sure if there's much of a
difference with either.
- For the postpass it's sufficient to just check isUndef() in
hasUndefinedMergeOp, i.e. we don't need to look up the def in VNInfo.

Running on llvm-test-suite and SPEC CPU 2017, there are no changes in
the number of vsetvlis removed. There are some minor scheduling diffs,
as well as extra spills in some cases and fewer in others (caused by
transient vsetvlis existing between RISCVInsertVSETVLI and
RISCVCoalesceVSETVLI when vector regalloc happens), but they are minor
and should go away once we finish moving the rest of
RISCVInsertVSETVLI.

We could also potentially turn off this pass for unoptimised builds.
2024-04-24 16:31:40 +08:00
Jack Styles
28233408a2
[CodeGen] [ARM] Make RISC-V Init Undef Pass Target Independent and add support for the ARM Architecture. (#77770)
When using Greedy Register Allocation, there are times where
early-clobber values are ignored and assigned the same register. This
is illegal behaviour for these instructions. To get around this, using
Pseudo instructions for early-clobber registers gives them a definition
and allows Greedy to assign them to a different register. This then
meets the ARM Architecture Reference Manual and matches the defined
behaviour.

This patch takes the existing RISC-V patch and makes it target
independent, then adds support for the ARM Architecture. Doing this will
ensure early-clobber constraints are followed when using the ARM
Architecture. Making the pass target independent also opens up the
possibility of adding support for other architectures in the future.
2024-02-26 12:12:31 +00:00
Piyou Chen
d0a39e617b
[RISCV] default enable splitting regalloc between RVV and other (#72950)
This patch makes riscv-split-regalloc true by default.

It does not affect the codegen result if no vector register allocation
exists. If there is vector register allocation, it may affect the
segments/weights of the non-RVV registers' LiveIntervals, causing the
allocation to happen in a different order.
2023-11-30 21:12:46 -06:00
Craig Topper
014390d937
[RISCV] Implement cross basic block VXRM write insertion. (#70382)
This adds a new pass to insert VXRM writes for vector instructions, with
the goal of avoiding redundant writes.

The pass runs two dataflow algorithms. The first is a forward dataflow to
calculate where a VXRM value is available. The second is a backwards
dataflow to determine where a VXRM value is anticipated.

Finally, we use the results of these two dataflows to insert VXRM writes
where a value is anticipated, but not available.

The pass does not split critical edges so we aren't always able to
eliminate all redundancy.

The pass will only insert vxrm writes on paths that always require it.
2023-11-02 14:09:27 -07:00
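
A small sketch (example values assumed) of the redundancy the pass avoids: once a VXRM value is available on a path, later fixed-point instructions on that path need no further write.

csrwi    vxrm, 0            # single write covering the whole path
vaadd.vv v8, v9, v10
# ... no intervening VXRM write needed ...
vasub.vv v11, v12, v13      # the VXRM value is still available here
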
Craig Topper
109aa586f0
[RISCV] Add an experimental pseudoinstruction to represent a rematerializable constant materialization sequence. (#69983)
Rematerialization during register allocation is currently limited to a
single instruction with no inputs.

This patch introduces a pseudoinstruction that represents the
materialization of a constant. I've started with a sequence of 2
instructions for now, which covers at least the common LUI+ADDI(W) case.
This instruction will be expanded into real instructions immediately
after register allocation using a new pass. This gives the post-RA
scheduler a chance to separate the 2 instructions to improve ILP.

I believe this matches the approach used by AArch64.

Unfortunately, this loses some CSE opportunities when an LUI value is used
by multiple constants with different LSBs.

This feature is off by default and a new backend command line option is
added to enable it for testing.

This avoids the spill and reloads reported in #69586.
2023-10-25 17:20:32 -07:00
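
For illustration (constant value assumed), the kind of two-instruction sequence the new pseudo stands for, so that it remains rematerializable through register allocation and is expanded only afterwards:

lui    a0, 0x12345          # upper 20 bits of the constant
addiw  a0, a0, 0x678        # a0 = 0x12345678
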
Philip Reames
a63bd7e99b [RISCV] Use NoReg in place of IMPLICIT_DEF for undefined passthru operands
In a recent series of refactorings (described here: https://discourse.llvm.org/t/riscv-transition-in-vector-pseudo-structure-policy-variants/71295), I greatly increased the number of IMPLICIT_DEF operands to our vector instructions. This has turned out to have an unexpected negative impact because MachineCSE does not CSE IMPLICIT_DEFs, and thus does not CSE any instruction with an IMPLICIT_DEF operand. SelectionDAG *does* CSE the same case, but that only covers the same block case, not the cross block case. This led to the performance regression reported in https://github.com/llvm/llvm-project/issues/64282.

This change is a slightly ugly hack to side step the issue. Instead of fixing the root cause (lack of CSE for IMPLICIT_DEF) or undoing the operand changes, we leave the extra operand in place, and use NoReg in place of IMPLICIT_DEF. I then convert back to IMPLICIT_DEF just before register allocation so that ProcessImplicitDefs and TwoAddressInstructions can do the normal transforms to Undef tied registers.

We may end up backporting this into the 17.x release branch.  Given how late in the release cycle this is landing, that's much less likely now, but still a possibility.

Differential Revision: https://reviews.llvm.org/D156909
2023-08-14 12:57:38 -07:00
Sami Tolvanen
83835e22c7 [RISCV] Implement KCFI operand bundle lowering
With `-fsanitize=kcfi` (Kernel Control-Flow Integrity), Clang emits
"kcfi" operand bundles to indirect call instructions. Similarly to
the target-specific lowering added in D119296, implement KCFI operand
bundle lowering for RISC-V.

This patch disables the generic KCFI pass for RISC-V in Clang, and
adds the KCFI machine function pass in `RISCVPassConfig::addPreSched`
to emit target-specific `KCFI_CHECK` pseudo instructions before calls
that have KCFI operand bundles. The machine function pass also bundles
the instructions to ensure we emit the checks immediately before the
calls, which is not possible with the generic pass.

`KCFI_CHECK` instructions are lowered in `RISCVAsmPrinter` to a
contiguous code sequence that traps if the expected hash in the
operand bundle doesn't match the hash before the target function
address. This patch emits an `ebreak` instruction for error handling
to match the Linux kernel's `BUG()` implementation. Just like for X86,
we also emit trap locations to a `.kcfi_traps` section to support
error handling, as we cannot embed additional information to the trap
instruction itself.

Relands commit 62fa708ceb027713b386c7e0efda994f8bdc27e2 with fixed
tests.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D148385
2023-06-23 22:57:56 +00:00
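
A rough sketch of the general shape of a `KCFI_CHECK` expansion (registers, offset, and the hash value here are assumptions, not the exact lowering):

lw     t1, -4(a1)           # load the type hash placed in front of the callee
lui    t2, 0x12345          # materialize the expected hash from the operand bundle
addiw  t2, t2, 0x678
beq    t1, t2, .Lcheck_ok
ebreak                      # trap; its location is also recorded in .kcfi_traps
.Lcheck_ok:
jalr   ra, 0(a1)            # the protected indirect call
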
Sami Tolvanen
e809ebeb6c Revert "[RISCV] Implement KCFI operand bundle lowering"
This reverts commit 62fa708ceb027713b386c7e0efda994f8bdc27e2.

Reverting to investigate -verify-machineinstrs errors in MIR tests.
2023-06-23 21:42:57 +00:00
Sami Tolvanen
62fa708ceb [RISCV] Implement KCFI operand bundle lowering
With `-fsanitize=kcfi` (Kernel Control-Flow Integrity), Clang emits
"kcfi" operand bundles to indirect call instructions. Similarly to
the target-specific lowering added in D119296, implement KCFI operand
bundle lowering for RISC-V.

This patch disables the generic KCFI pass for RISC-V in Clang, and
adds the KCFI machine function pass in `RISCVPassConfig::addPreSched`
to emit target-specific `KCFI_CHECK` pseudo instructions before calls
that have KCFI operand bundles. The machine function pass also bundles
the instructions to ensure we emit the checks immediately before the
calls, which is not possible with the generic pass.

`KCFI_CHECK` instructions are lowered in `RISCVAsmPrinter` to a
contiguous code sequence that traps if the expected hash in the
operand bundle doesn't match the hash before the target function
address. This patch emits an `ebreak` instruction for error handling
to match the Linux kernel's `BUG()` implementation. Just like for X86,
we also emit trap locations to a `.kcfi_traps` section to support
error handling, as we cannot embed additional information to the trap
instruction itself.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D148385
2023-06-23 18:25:24 +00:00
eopXD
7c8365121a [2/3][RISCV][POC] Model vxrm in LLVM intrinsics and machine instructions for RVV fixed-point instructions
Depends on D151395.

This is the 2nd patch of the patch-set. For the cover letter of the
patch-set, please checkout D151395. This patch originates from
D121376.

This commit models vxrm by adding an immediate operand into intrinsics
and machine instructions of RVV fixed-point instruction `vaadd`,
`vaaddu`, `vasub`, and `vasubu`. This commit only covers intrinsics of
the four instructions, the proceeding patches of the patch-set will do
the same to other RVV fixed-point instructions.

The current naive approach is to have a write to vxrm inserted before
every fixed-point instruction. This is done by the newly added pass
`RISCVInsertReadWriteCSR`. The pass is named in a more general term
because we will also model the rounding mode for the RVV floating-point
instructions. The approach will be improved in the future by applying
partial redundancy elimination algorithms to it.

The original LLVM intrinsics and machine instructions that do not model
the rounding mode (take `vaadd` as an example) are not removed in this
patch. That is, `int.riscv.vaadd.*` co-exists with
`int.riscv.vaadd.rm.*` after this patch. The next patch will add C
intrinsics of vaadd with an additional operand that models the control
of the rounding mode; in that patch, `int.riscv.vaadd.rm.*` will
replace `int.riscv.vaadd.*`.

Authored-by: ShihPo Hung <shihpo.hung@sifive.com>
Co-Authored-by: eop Chen <eop.chen@sifive.com>

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D151396
2023-06-20 11:07:01 -07:00
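
A sketch of the naive scheme described above (rounding-mode values assumed): the pass writes vxrm immediately before each fixed-point instruction, using the new immediate operand.

csrwi    vxrm, 0            # rounding mode for the next instruction
vaadd.vv v8, v9, v10
csrwi    vxrm, 2            # a different rounding mode requested here
vasub.vv v11, v12, v13
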
Craig Topper
13fe673301 [RISCV] Move NTLH hint emission into RISCVAsmPrinter.cpp.
Rather than having a separate pass to add the hint instructions,
emit them directly into the streamer during asm printing.

Reviewed By: BeMg, kito-cheng

Differential Revision: https://reviews.llvm.org/D149511
2023-05-01 12:05:18 -07:00
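
For reference, the emitted hint is just an extra instruction in front of the memory access; a sketch, assuming the `ntl.all` Zihintntl mnemonic:

ntl.all                     # non-temporal locality hint for the following access
lw   a0, 0(a1)
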
Piyou Chen
8d7c865c2e [RISCV] Support __builtin_nontemporal_load/store by MachineMemOperand
Differential Revision: https://reviews.llvm.org/D143361
2023-04-05 22:57:49 -07:00
Craig Topper
0f4c9c016c [RISCV] Replace RISCV->RISC-V in strings.
To be consistent with RISC-V branding guidelines
https://riscv.org/about/risc-v-branding-guidelines/
Think we should be using RISC-V where possible.

D146449 already updated comments. Strings may have more user impact.

Reviewed By: asb

Differential Revision: https://reviews.llvm.org/D146451
2023-03-27 09:50:17 -07:00
Nick Desaulniers
a3a84c9e25 [llvm] add CallBrPrepare pass to pipelines
Capstone of
https://discourse.llvm.org/t/rfc-syncing-asm-goto-with-outputs-with-gcc/65453/8

Clang changes are still necessary to enable the use of outputs along
indirect edges of asm goto statements.

Link: https://github.com/llvm/llvm-project/issues/53562

Reviewed By: void

Differential Revision: https://reviews.llvm.org/D140180
2023-02-16 17:58:34 -08:00
OCHyams
99c12afeb4 [Assignment Tracking] Fix tests for buildbot failure (2)
Follow-up for 4ece50737d5385fb80cfa23f5297d1111f8eed39 (D142027).

Assignment Tracking Analysis now always runs and is skipped internally if
assignment tracking is disabled. Update these tests to expect to see the
pass run.

Buildbot failure: https://lab.llvm.org/buildbot/#/builders/57/builds/24094
2023-01-20 15:58:35 +00:00
Paul Kirth
557a5bc336 [codegen] Add StackFrameLayoutAnalysisPass
Issue #58168 describes the difficulty of diagnosing stack size issues
identified by -Wframe-larger-than. For simple code, it's easy to
understand the stack layout and where space is being allocated, but in
more complex programs, where code may be heavily inlined, unrolled, and
have duplicated code paths, it is no longer easy to manually inspect the
source program and understand where stack space can be attributed.

This patch implements a machine function pass that emits remarks with a
textual representation of stack slots, and also outputs any available
debug information to map source variables to those slots.

The new behavior can be used by adding `-Rpass-analysis=stack-frame-layout`
to the compiler invocation. Like other remarks the diagnostic
information can be saved to a file in a machine-readable format by
adding -fsave-optimization-record.

Fixes: #58168

Reviewed By: nickdesaulniers, thegameg

Differential Revision: https://reviews.llvm.org/D135488
2023-01-19 01:51:14 +00:00
Paul Kirth
fdc0bf6adc Revert "[codegen] Add StackFrameLayoutAnalysisPass"
This breaks on some AArch64 bots

This reverts commit 0a652c540556a118bbd9386ed3ab7fd9e60a9754.
2023-01-13 22:59:36 +00:00
Paul Kirth
0a652c5405 [codegen] Add StackFrameLayoutAnalysisPass
Issue #58168 describes the difficulty of diagnosing stack size issues
identified by -Wframe-larger-than. For simple code, it's easy to
understand the stack layout and where space is being allocated, but in
more complex programs, where code may be heavily inlined, unrolled, and
have duplicated code paths, it is no longer easy to manually inspect the
source program and understand where stack space can be attributed.

This patch implements a machine function pass that emits remarks with a
textual representation of stack slots, and also outputs any available
debug information to map source variables to those slots.

The new behavior can be used by adding `-Rpass-analysis=stack-frame-layout`
to the compiler invocation. Like other remarks the diagnostic
information can be saved to a file in a machine-readable format by
adding -fsave-optimization-record.

Fixes: #58168

Reviewed By: nickdesaulniers, thegameg

Differential Revision: https://reviews.llvm.org/D135488
2023-01-13 20:52:48 +00:00
Dmitry Vyukov
dbe8c2c316 Use-after-return sanitizer binary metadata
Currently per-function metadata consists of:
(start-pc, size, features)

This adds a new UAR feature and if it's set an additional element:
(start-pc, size, features, stack-args-size)

Reviewed By: melver

Differential Revision: https://reviews.llvm.org/D136078
2022-12-05 14:40:31 +01:00
Freddy Ye
89f36dd8f3 [X86] Add ExpandLargeFpConvert Pass and enable for X86
As stated in
https://discourse.llvm.org/t/rfc-llc-add-expandlargeintfpconvert-pass-for-fp-int-conversion-of-large-bitint/65528,
this implementation is very similar to ExpandLargeDivRem, which expands
‘fptoui .. to’, ‘fptosi .. to’, ‘uitofp .. to’, ‘sitofp .. to’ instructions
with a bitwidth above a threshold into auto-generated functions. This is
useful for targets like x86_64 that cannot lower fp conversions with more
than 128 bits. The expanded code follows the IR generated by
`compiler-rt/lib/builtins/floattidf.c`, `compiler-rt/lib/builtins/fixdfti.c`,
etc.

Corner cases:
1. For fp16: since there are no related builtins in compiler-rt, the
implementation mainly uses the fp32 <-> fp16 lib calls.
2. For fp80: this pass is soft-fp emulation and no fp80 instructions can
help here, so users are encouraged to avoid this usage. For now, the
implementation uses fp128 as the temporary conversion type and inserts
fptrunc/fpext at the top/end of the function.
3. For bf16: since the clang FE currently doesn't support bf16 arithmetic
operations (convert to int, float, +, -, *, ...), this patch doesn't
consider bf16 for now.
4. For unsigned FPToI: both the default hardware behavior and libgcc
ignore the "returns 0 for negative input" spec, so this pass follows
that old behavior for unsigned FPToI. See this example:
https://gcc.godbolt.org/z/bnv3jqW1M

The end-to-end tests are uploaded at https://reviews.llvm.org/D138261

Reviewed By: LuoYuanke, mgehre-amd

Differential Revision: https://reviews.llvm.org/D137241
2022-12-01 13:47:43 +08:00
Marco Elver
b95646fe70 Revert "Use-after-return sanitizer binary metadata"
This reverts commit d3c851d3fc8b69dda70bf5f999c5b39dc314dd73.

Some bots broke:

- https://luci-milo.appspot.com/ui/p/fuchsia/builders/toolchain.ci/clang-linux-x64/b8796062278266465473/overview
- https://lab.llvm.org/buildbot/#/builders/124/builds/5759/steps/7/logs/stdio
2022-11-30 23:35:50 +01:00
Dmitry Vyukov
d3c851d3fc Use-after-return sanitizer binary metadata
Currently per-function metadata consists of:
(start-pc, size, features)

This adds a new UAR feature and if it's set an additional element:
(start-pc, size, features, stack-args-size)

Reviewed By: melver

Differential Revision: https://reviews.llvm.org/D136078
2022-11-30 14:50:22 +01:00
Matthias Gehre
af3758d678 Fix remaining test failures for "[llvm/CodeGen] Enable the ExpandLargeDivRem pass for X86, Arm and AArch64" 2022-09-06 16:38:43 +01:00
Craig Topper
e07a8155f5 [RISCV] Move Pre-RA pseudo expansion from addMachineSSAOptimization to addPreRegAlloc.
addMachineSSAOptimization is skipped for -O0, but this pass is
required for -O0.
2022-08-01 13:44:43 -07:00
Craig Topper
ee6267c443 [RISCV] Remove Gather/Scatter Opt from the O0 pipeline. 2022-07-17 10:58:33 -07:00
eopXD
2cadf84fc8 [RISCV] Pass OptLevel to RISCVDAGToDAGISel correctly
Originally, `OptLevel` wasn't passed into the `MachineFunctionPass`,
so the default parameter of `SelectionDAGISel`, which is
`CodeGenOpt::Default`, was used instead. OptLevelChanger captures the
optimization level from that parameter rather than from the value
within `TargetMachine`, so the optimization level could be
unintentionally overwritten whenever a value other than
`CodeGenOpt::Default` was passed.

This patch fixes this by passing the optimization level rather
than using the default value.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D126641
2022-05-30 17:22:50 -07:00
Lewis Revill
29a5a7c6d4 [RISCV] Add pre-emit pass to make more instructions compressible
When optimizing for size, this pass searches for instructions that are
prevented from being compressed by one of the following:

1. The use of a single uncompressed register.
2. A base register + offset where the offset is too large to be
   compressed and the base register may or may not already be compressed.

In the first case, if there is a compressed register available, then the
uncompressed register is copied to the compressed register and its uses
replaced. This is only done if there are enough uses that code size
would be improved.

In the second case, if a compressed register is available, then the
original base register is copied and adjusted such that:

new_base_register = base_register + adjustment
base_register + large_offset = new_base_register + small_offset

and the uses of the base register are replaced with the new base
register. Again this is only done if there are enough uses for code size
to be improved.

This pass was authored by Lewis Revill, with large offset optimization
added by Craig Blackmore.

Differential Revision: https://reviews.llvm.org/D92105
2022-05-25 09:25:02 +01:00
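
A small sketch of the second case (registers and offsets assumed): the base is adjusted into a new register so that the remaining offsets fit the compressed encodings.

# before: offsets too large for c.lw
lw    a0, 400(s1)
lw    a1, 404(s1)
# after: new_base_register = base_register + adjustment
addi  a4, s1, 384
lw    a0, 16(a4)            # 400 = 384 + 16, now compressible
lw    a1, 20(a4)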