5717 Commits

Author SHA1 Message Date
Sudharsan Veeravalli
c4645ffeda
[RISCV] Add Qualcomm uC Xqcicsr (CSR) extension (#117169)
The Qualcomm uC Xqcicsr extension adds 2 instructions that can read and
write CSRs.

The current spec can be found at:
https://github.com/quic/riscv-unified-db/releases/latest

This patch adds assembler only support.
2024-11-28 12:46:15 +05:30
Pengcheng Wang
d36a4c0715
[RISCV] Rename some Feature* to Tune* (#117966)
These features should be tune features.
2024-11-28 15:01:49 +08:00
Craig Topper
80afdbe6a5 [RISCV] Use RISCVSubtarget::is64Bit() instead of hasFeature(RISCV::Feature64Bit). NFC 2024-11-27 14:02:15 -08:00
Philip Reames
c6f2d35c4d Fix a build warning introduce by my febbf910 2024-11-27 13:41:29 -08:00
Felipe Magno de Almeida
e3fdc3aa81
[RISCV] Allow hoisting VXRM writes out of loops speculatively (#110044)
Change the intersect for the anticipated algorithm to ignore unknown
when anticipating. This effectively allows VXRM writes speculatively
because it could do a VXRM write even when there's branches where VXRM
is unneeded.

The importance of this change is because VXRM writes causes pipeline
flushes in some micro-architectures and so it makes sense to allow more
aggressive hoisting even if it causes some degradation for the slow
path.

An example is this code:
```
typedef unsigned char uint8_t;
__attribute__ ((noipa))
void foo (uint8_t *dst,  int i_dst_stride,
           uint8_t *src1, int i_src1_stride,
           uint8_t *src2, int i_src2_stride,
           int i_width, int i_height )
{
   for( int y = 0; y < i_height; y++ )
     {
       for( int x = 0; x < i_width; x++ )
         dst[x] = ( src1[x] + src2[x] + 1 ) >> 1;
       dst  += i_dst_stride;
       src1 += i_src1_stride;
       src2 += i_src2_stride;
     }
}
```
With this patch, the code above generates a hoisting VXRM writes out of
the outer loop.
2024-11-27 13:31:39 -08:00
Philip Reames
febbf9105f
[RISCV] Match vcompress during shuffle lowering (#117748)
This change matches a subset of vcompress patterns during shuffle
lowering. The subset implemented requires a contiguous prefix of
demanded elements followed by undefs. This subset was chosen for two
reasons: 1) which elements to spurious demand is a non-obvious problem,
and 2) my first several attempts at implementing the general case were
buggy. I decided to go with the simple case to start with.

vcompress scales better with LMUL than a general vrgather, and at least
the SpaceMit X60, has higher throughput even at m1. It also has the
advantage of requiring smaller vector constants at one bit per element
as opposed to vrgather which is a minimum of 8 bits per element. The
downside to using vcompress is that we can't fold a vselect into it, as
there is no masked vcompress variant.

For reference, here are the relevant throughputs from camel-cdr's data
table on BP3 (X60):
  vrgather.vv v8,v16,v24    4.0  16.0  64.0  256.0
  vcompress.vm v8,v16,v24   3.0  10.0  36.0  136.
  vmerge.vvm v8,v16,v24,v0  2.0  4.0   8.0   16.0

The largest concern with the extra vmerge is that we locally increase
register pressure. If we do have masking, we also have a passthru,
without the ability to fold that into the vcompress, we need to keep it
alive a bit longer. This can hurt at e.g. m8 where we have very few
architectural registers. As compared with the vrgather.vv sequence, this
is only one additional m1 VREG - since we no longer need the index
vector. It compares slightly worse against vrgatherie16.vv which can use
index vectors smaller than other operands. Note that we could
potentially fold the vmerge if only tail elements are being preserved; I
haven't investigated this.

It is unfortunately hard given our current lowering structure to know if
we're emitting a shuffle where masking will follow. Thankfully, it
doesn't seem to show up much in practice, so I think we can probably
ignore it.

This patch only handles single source compress idioms at the moment.
This is an effort to avoid interacting with other patches on review for
changing how we canonicalize length changing shuffles.
2024-11-27 13:23:18 -08:00
Craig Topper
175051b05e [RISCV][GISel] Support libcalls for f32/f64 acos/asin/atan/atan2/cosh/sinh/tanh. 2024-11-27 12:23:12 -08:00
Craig Topper
d7643e8610 [RISCV][GISel] Support f32/f64 llvm.exp10 intrinsics. 2024-11-27 10:24:33 -08:00
Craig Topper
50dfb0772b [RISCV] Support f32/f64 libcalls for sin/cos/pow/log/log2/log10/exp/exp2
Test cases copied from SelectionDAG.
2024-11-26 23:35:52 -08:00
Brandon Wu
4a7dbede6b
[RISCV] Support svukte extension (#115657)
This is the extension for "Address-Independent Latency of User-Mode
Faults to Supervisor Addresses".
Spec: https://github.com/riscv/riscv-isa-manual/pull/1564,
https://lf-riscv.atlassian.net/browse/RVS-2977
The spec states that the `svukte` depends on `sv39`, but we don't have
`sv39` yet, so I didn't add it to the implied list.
2024-11-27 10:54:57 +08:00
Craig Topper
43b6b78771
[RISCV][GISel] Use libcalls for f32/f64 G_FCMP without F/D extensions. (#117660)
LegalizerHelp only supported f128 libcalls and incorrectly assumed that
the destination register for the G_FCMP was s32.
2024-11-26 15:48:49 -08:00
Mark Goncharov
80df56e03b
Reapply "[RISCV] Implement tail call optimization in machine outliner" (#117700)
This MR fixes failed test `CodeGen/RISCV/compress-opt-select.ll`.

It was failed due to previously merged commit `[TTI][RISCV]
Unconditionally break critical edges to sink ADDI (PR #108889)`.

So, regenerated `compress-opt-select` test.
2024-11-26 23:39:45 +08:00
Mehdi Amini
f94bd3c933
Revert "[RISCV] Implement tail call optimization in machine outliner" (#117710)
Reverts llvm/llvm-project#115297
Bots are broken
2024-11-26 13:45:47 +01:00
Mark Goncharov
29062329f3
[RISCV] Implement tail call optimization in machine outliner (#115297)
Following up issue #89822, this patch adds opportunity to use tail call
in machine outliner pass.
Also it enables outline patterns with X5(T0) register.
2024-11-26 12:30:37 +03:00
LiqinWeng
c3377af4c3
[RISCV][CostModel] add cost for cttz/ctlz under the non-zvbb (#117515) 2024-11-26 11:40:52 +08:00
Philip Reames
6657d4bd70
[TTI][RISCV] Unconditionally break critical edges to sink ADDI (#108889)
This looks like a rather weird change, so let me explain why this isn't
as unreasonable as it looks. Let's start with the problem it's solving.

```
define signext i32 @overlap_live_ranges(ptr %arg, i32 signext %arg1) { bb:
  %i = icmp eq i32 %arg1, 1
  br i1 %i, label %bb2, label %bb5

bb2:                                              ; preds = %bb
  %i3 = getelementptr inbounds nuw i8, ptr %arg, i64 4
  %i4 = load i32, ptr %i3, align 4
  br label %bb5

bb5:                                              ; preds = %bb2, %bb
  %i6 = phi i32 [ %i4, %bb2 ], [ 13, %bb ]
  ret i32 %i6
}
```

Right now, we codegen this as:

```
	li	a3, 1
	li	a2, 13
	bne	a1, a3, .LBB0_2
	lw	a2, 4(a0)
.LBB0_2:
	mv	a0, a2
	ret
```

In this example, we have two values which must be assigned to a0 per the
ABI (%arg, and the return value). SelectionDAG ensures that all values
used in a successor phi are defined before exit the predecessor block.
This creates an ADDI to materialize the immediate in the entry block.

Currently, this ADDI is not sunk into the tail block because we'd have
to split a critical edges to do so. Note that if our immediate was
anything large enough to require two instructions we *would* split this
critical edge.

Looking at other targets, we notice that they don't seem to have this
problem. They perform the sinking, and tail duplication that we don't.
Why? Well, it turns out for AArch64 that this is entirely an accident of
the existance of the gpr32all register class. The immediate is
materialized into the gpr32 class, and then copied into the gpr32all
register class. The existance of that copy puts us right back into the
two instruction case noted above.

This change essentially just bypasses this emergent behavior aspect of
the aarch64 behavior, and implements the same "always sink immediates"
behavior for RISCV as well.
2024-11-25 18:59:31 -08:00
Pengcheng Wang
6633916ef5
[RISCV] Remove getPostRAMutations (#117527)
We are using `PostMachineScheduler` instead of `PostRAScheduler`
since #68696.

The hook `getPostRAMutations` is only used in `PostRAScheduler` so
it is actually dead code for RISC-V now.
2024-11-26 10:55:43 +08:00
LiqinWeng
dd7aabf7c0
[TTI][RISCV] Deduplicate type-based VP costing of vpcmp/vpcast (#117520)
Refered to: https://github.com/llvm/llvm-project/pull/115983
2024-11-26 10:49:24 +08:00
Craig Topper
c2bb056482
[SelectionDAG][RISCV][AArch64] Allow f16 STRICT_FLDEXP to be promoted. Fix integer promotion of STRICT_FLDEXP in type legalizer. (#117633)
A special case in type legalization wasn't accounting for different
operand numbering between FLDEXP and STRICT_FLDEXP.

AArch64 already asked STRICT_FLDEXP to be promoted, but had no test for
it.
2024-11-25 16:12:45 -08:00
Kazu Hirata
8e510b8472 [RISCV] Fix a warning
This patch fixes:

  llvm/lib/Target/RISCV/RISCVRegisterInfo.cpp:476:25: error: unused
  variable 'ST' [-Werror,-Wunused-variable]
2024-11-25 12:57:05 -08:00
Philip Reames
d733fa1c90
[RISCV] Consolidate VLS codepaths in stack frame manipulation [nfc] (#117605)
We can move the logic from adjustStackForRVV into adjustReg, which
results in the remaining logic being trivially inlined to the two
callers and allows a duplicate copy of the same logic in
eliminateFrameIndex to be pruned.
2024-11-25 12:40:37 -08:00
Craig Topper
ed6749a405 [RISCV] Promote frexp with Zfh.
The default expansion tries to create an illegal integer type after
legalization.
2024-11-25 10:27:37 -08:00
Craig Topper
29828b26fa
[RISCV] Fix double counting scalar CSRs with Zcmp when emitting cfi_offset for RVV CSRs. (#117408)
getCalleeSavedStackSize() already contains RVPushStackSize. Don't
subtract it again.
2024-11-25 10:03:48 -08:00
Raphael Moreira Zinsly
d88ed9357a
[NFC][RISCV] Refactor allocation of the stack space (#116625)
Separates the stack allocations from prologue in preparation for the
stack clash protection support.
2024-11-25 09:36:15 -08:00
Craig Topper
20bd029a40
[RISCV] Promote fldexp with Zfh. (#117396)
The default expansion tries to create i16 operations after type
legalization.

Fixes #117349
2024-11-25 09:08:56 -08:00
David Sherwood
9b76e7fc60
Revert "[DAGCombiner] Add support for scalarising extracts of a vector setcc (#116031)" (#117556)
This reverts commit 22ec44f509ff266b581dbb490d7b040473b7c31a.
2024-11-25 13:49:21 +00:00
Luke Lau
15fadeb2aa
[RISCV] Add cost for @llvm.experimental.vp.splat (#117313)
This is split off from #115274. There doesn't seem to be an easy way to
share this with getShuffleCost since that requires passing in a real
insert_element operand to get it to recognise it's a scalar splat.

For i1 vectors we can't currently lower them so it returns an invalid
cost.

---------

Co-authored-by: Shih-Po Hung <shihpo.hung@sifive.com>
2024-11-25 11:28:46 +01:00
LiqinWeng
db14010405
[RISCV][TTI] Implement cost of intrinsic abs with LMUL (#115813) 2024-11-25 17:35:58 +08:00
David Sherwood
22ec44f509
[DAGCombiner] Add support for scalarising extracts of a vector setcc (#116031)
For IR like this:

  %icmp = icmp ult <4 x i32> %a, splat (i32 5)
  %res = extractelement <4 x i1> %icmp, i32 1

where there is only one use of %icmp we can take a similar approach
to what we already do for binary ops such add, sub, etc. and convert
this into

  %ext = extractelement <4 x i32> %a, i32 1
  %res = icmp ult i32 %ext, 5

For AArch64 targets at least the scalar boolean result will almost
certainly need to be in a GPR anyway, since it will probably be
used by branches for control flow. I've tried to reuse existing code
in scalarizeExtractedBinop to also work for setcc.

NOTE: The optimisations don't apply for tests such as
extract_icmp_v4i32_splat_rhs in the file

CodeGen/AArch64/extract-vector-cmp.ll

because scalarizeExtractedBinOp only works if one of the input
operands is a constant.
2024-11-25 09:25:01 +00:00
Craig Topper
3fb0bea859 [RISCV][GISel] Add register class to some isel output patterns so they can be imported.
This makes (fcopysign X, (fneg Y)) patterns work.
2024-11-24 19:29:52 -08:00
hev
e26af0938c
[llvm] Add BasicTTIImpl::areInlineCompatible for target feature subset checks (#117493)
This patch moves the `areInlineCompatible` implementation from multiple
subclasses (`AArch64TTIImpl`, `RISCVTTIImpl`, `WebAssemblyTTIImpl`) to
the base class `BasicTTIImpl`. The new implementation checks whether the
callee's target features are a subset of the caller's, enabling
consistent behavior across targets. Subclasses now simply delegate to
the base implementation, reducing code duplication and improving
maintainability.
2024-11-25 11:22:49 +08:00
Craig Topper
bb5bbe523d [RISCV][GISel] Support s32/s64 G_FSUB/FDIV/FNEG without F/D extensions.
Use libcalls for G_FSUB/FDIV. Use integer operations for G_FNEG.

Copy most of the IR tests for arithmetic from SelectionDAG.
2024-11-24 18:22:12 -08:00
LiqinWeng
48b13ca48b
[RISCV][CostModel] cost of vector cttz/ctlz under ZVBB (#115800) 2024-11-24 09:18:18 +08:00
Craig Topper
213b849c5e [RISCV][GISel] Use libcalls for some FP instructions when F/D aren't present.
This is based on what fails when adding integer only RUN lines to
float-intrinsics.ll and double-intrinsics.ll.

We're still missing a lot of test cases that SelectionDAG has. These
will be added in future patches.
2024-11-23 11:43:14 -08:00
Alexey Bataev
7523086a05
[SLP]Use getExtendedReduction cost and fix reduction cost calculations
Patch uses getExtendedReduction for reductions of ext-based nodes + adds
cost estimation for ctpop-kind reductions into basic implementation and
RISCV-V specific vcpop cost estimation.

Reviewers: RKSimon, preames

Reviewed By: preames

Pull Request: https://github.com/llvm/llvm-project/pull/117350
2024-11-22 16:12:53 -05:00
Pengcheng Wang
4da960b898 [RISCV] Add mvendorid/marchid/mimpid to CPU definitions (#116202)
We can get these information via `sys_riscv_hwprobe`.

This can be used to implement `__builtin_cpu_is`.
2024-11-22 22:58:54 +08:00
Mikhail Goncharov
d1dae1e861 Revert "[RISCV] Add mvendorid/marchid/mimpid to CPU definitions (#116202)" chain
This reverts commit b36fcf4f493ad9d30455e178076d91be99f3a7d8.
This reverts commit c11b6b1b8af7454b35eef342162dc2cddf54b4de.
This reverts commit 775148f2367600f90d28684549865ee9ea2f11be.

multiple bot build breakages, e.g. https://lab.llvm.org/buildbot/#/builders/3/builds/8076
2024-11-22 14:09:13 +01:00
Pengcheng Wang
775148f236
[RISCV] Add mvendorid/marchid/mimpid to CPU definitions (#116202)
We can get these information via `sys_riscv_hwprobe`.

This can be used to implement `__builtin_cpu_is`.
2024-11-22 19:54:45 +08:00
Craig Topper
f84fc44f1a [RISCV][GISel] Make s16->s32 G_ANYEXT/SEXT/ZEXT legal. 2024-11-21 22:45:25 -08:00
Jim Lin
bd15c7c1ca
[RISCV] Make A implies Zaamo and Zalrsc (#116907)
Ref:
https://github.com/riscv/riscv-isa-manual/blob/main/src/a-st-ext.adoc.
2024-11-22 10:35:38 +08:00
Craig Topper
8e65b72691
[RISCV] Fix double counting CSRs with Zcmp in RISCVFrameLowering::getFrameIndexReference. (#117207)
The Zcmp callee saved registers are already accounted for in
getCalleeSavedStackSize(). Subtracting RVPushStackSize subtracts
them a second time leading to incorrect stack offsets during frame
index elimination.
    
This should have been removed in
0de2b26942f890a6ec84cd75ac7abe3f6f2b2e37
when Zcmp handling was changed. Prior to that, RVPushStackSize was
not included in getCalleeSavedStackSize(). The commit message at the
time noted that Zcmp+RVV was likely broken.
2024-11-21 13:53:15 -08:00
Craig Topper
29afbd5893
[RISCV] Add DAG combine to convert (iX ctpop (bitcast (vXi1 A))) into vcpop.m. (#117062)
This only handles the simplest case where vXi1 is a legal vector type.
If the vector type isn't legal we need to go through type legalization,
but the pattern gets much harder to recognize after that. Either because
ctpop gets expanded due to Zbb not being enabled, or the bitcast
becoming a bitcast+extractelt, or the ctpop being split into multiple
ctpops and adds, etc.
2024-11-21 11:12:07 -08:00
Min-Yih Hsu
0165f8817c
[RISCV] Fix the worst case for VSHA2MS in SiFive P400/P600 scheduling models (#116893)
For each RVV instruction we should have a single WriteRes assignment to
the worst case scheduling class. This assignment is usually equal to
that of the largest LMUL + smallest SEW. My #114317 accidentally made
two of these assignments on `WriteVSHA2MSV_WorstCase`. This won't affect
our MachineScheduler nor most of our llvm-mca use cases (assuming you
populate the correct LMUL and SEW), yet it's not ideal either.

This patch fixes this issue by assigning the correct numbers and
resource mapping to `WriteVSHA2MSV_WorstCase`, which is equal to that of
largest LMUL + _largest_ SEW (Zvknh's scheduling properties are
special). I also added a MCA test to make sure we always pick up the
correct worst case numbers for P600's scheduling model.

Original issue was reported by @reidtatge
2024-11-21 10:59:46 -08:00
Craig Topper
cdd1e27124
[X86][RISCV] Don't emit JumpTableDebugInfo unless triple is OSBinFormatCOFF. (#117083)
This makes the override in RISCV and X86 consistent with the base class
implementation of expandIndirectJTBranch.
2024-11-21 09:38:16 -08:00
Craig Topper
e9c561e934 [RISCV][GISel] Add atomic load/store test. Add additional atomic load/store isel patterns." 2024-11-20 22:23:18 -08:00
Craig Topper
4087b871c5
[RISCV][GISel] Move G_BRJT expansion to legalization (#73711)
Instead of custom selecting a bunch of instructions, we can expand to
generic MIR during legalization.
2024-11-20 13:43:36 -08:00
Sam Elliott
408659c5b5
[RISCV] Merge GPRPair and GPRF64Pair (#116094)
As suggested by Craig, this tries to merge the two sets of register
classes created in #112983, GPRPair* and GPRF64Pair*.

- I added some explicit annotations to `RISCVInstrInfoD.td` which fixed
the type inference issues I was seeing from tablegen for select
patterns.
- I've had to make the behaviour of `splitValueIntoRegisterParts` and
`joinRegisterPartsIntoValue` cover more cases, because you cannot
bitcast to/from untyped (the bitcast would otherwise have been inserted
automatically by TargetLowering code).
- I apparently didn't need to change `getNumRegisters` again, which
continues to tell me there's a bug in the code for tied inputs. I added
some more test coverage of this case but it didn't seem to help find the
asserts I was finding before - I think the difference is between the
default behaviour for integers which doesn't apply to floats.
- There's still a difference between BuildGPRPair and BuildPairF64 (and
the same for SplitGPRPair and SplitF64). I'm not happy with this, I
think it's quite confusing, as they're very similar, just differing in
whether they give a `untyped` or a `f64`. I haven't really worked out
how the DAGCombiner copes if one meets the other, I know we have some of
this for the f64 variants already, but they're a lot more complex than
the GPRPair variants anyway.
2024-11-20 10:08:55 +00:00
Craig Topper
2bf6751522 [RISCV] Add IsRV32 some patterns in RISCVInstrInfoXTHead.td.
This restores the code to its original state before I experimented
with making i32 a legal type.
2024-11-19 21:41:14 -08:00
Petr Penzin
41c86ca714
[RISCV] Add TT-Ascalon-d8 processor (#115100)
Ascalon is an out-of-order CPU core from Tenstorrent. Overview:
https://tenstorrent.com/ip/tt-ascalon

Adding 8-wide version, -mcpu=tt-ascalon-d8. Scheduling model will be
added in a separate PR.

---------

Co-authored-by: Anton Blanchard <antonb@tenstorrent.com>
2024-11-19 14:20:55 -08:00
Craig Topper
eff60d83b0 [RISCV][GISel] Make extended loads and truncating stores with s16 register type and s8 memory type legal.
This addresses some failures I've seen in testing on real code.
2024-11-19 11:57:35 -08:00