yingopq
754ed95b66
[Mips] Fix compiler crash when returning fp128 after calling a function returning { i8, i128 } (#117525)

Fixes https://github.com/llvm/llvm-project/issues/96432.
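A minimal IR sketch of the triggering shape (an illustrative reduction, not the reproducer from the linked issue):

  define fp128 @f(ptr %p) {
    %agg = call { i8, i128 } @g()   ; call whose return type contains an i128
    %v = load fp128, ptr %p
    ret fp128 %v                    ; returning fp128 after such a call crashed the Mips backend
  }

  declare { i8, i128 } @g()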
2025-01-20 16:47:40 +08:00
Patryk Wychowaniec
814b34f31e
[AVR] Force relocations for non-encodable jumps (#121498)
This commit changes the branch emission logic so that instead of
throwing the "branch target out of range" error, we emit a relocation.
2025-01-20 09:23:57 +08:00
Philip Reames
143c33c6df
[RISCV] Consider only legally typed splats to be legal shuffles (#123415)
Given the comment, I'd expected test coverage. There was none, so let's
do the simple thing which benefits the one thing we have tests for.
2025-01-17 19:13:04 -08:00
Craig Topper
0c6e03eea0
[RISCV] Fold vp.store(vp.reverse(VAL), ADDR, MASK) -> vp.strided.store(VAL, NEW_ADDR, -1, MASK) (#123123)
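Roughly, at the IR level and with illustrative nxv2i32 types and value names (the actual combine works on the corresponding VP SelectionDAG nodes), the fold is:

  ; before: store a reversed vector (%allones is an all-true mask)
  %rev = call <vscale x 2 x i32> @llvm.experimental.vp.reverse.nxv2i32(<vscale x 2 x i32> %val, <vscale x 2 x i1> %allones, i32 %evl)
  call void @llvm.vp.store.nxv2i32.p0(<vscale x 2 x i32> %rev, ptr %addr, <vscale x 2 x i1> %allones, i32 %evl)

  ; after: one negatively strided store; %end = %addr + (%evl - 1) * 4 bytes,
  ; i.e. the address of the last active element, and the byte stride is -4
  call void @llvm.experimental.vp.strided.store.nxv2i32.p0.i64(<vscale x 2 x i32> %val, ptr %end, i64 -4, <vscale x 2 x i1> %allones, i32 %evl)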
Co-authored-by: Brandon Wu <brandon.wu@sifive.com>
2025-01-17 14:22:25 -08:00
Luke Lau
a761e26b23
[RISCV] Allow non-loop invariant steps in RISCVGatherScatterLowering (#122244)
The motivation for this is to allow us to match strided accesses that
are emitted from the loop vectorizer with EVL tail folding (see #122232).

In these loops the step isn't loop invariant and is based on
@llvm.experimental.get.vector.length.

We can relax this as long as we make sure to construct the updates after
the definition inside the loop, instead of the preheader.

I presume the restriction was previously added so that the step would
dominate the insertion point in the preheader. I can't think of why it
wouldn't be safe to calculate it in the loop otherwise.
2025-01-17 08:58:56 +08:00
Philip Reames
bb6e94a05d
[RISCV] Custom legalize <N x i128>, <4 x i256>, etc.. shuffles (#122352)
I have a particular user downstream who likes to write shuffles in terms
of unions involving _BitInt(128) types. This isn't completely crazy
because there's a bunch of code in the wild which was written with SSE
in mind, so 128 bits is a common data fragment size.

The problem is that generic lowering scalarizes this to ELEN, and we end
up with really terrible extract/insert sequences if the i128 shuffle is
between other (non-i128) operations.
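For a concrete flavor of the pattern (element count and mask below are purely illustrative), the IR contains shuffles whose element type is wider than ELEN:

  ; swap the two 256-bit halves of a 512-bit value, 128 bits at a time
  %swapped = shufflevector <4 x i128> %a, <4 x i128> poison, <4 x i32> <i32 2, i32 3, i32 0, i32 1>

Scalarizing each i128 lane down to ELEN-sized pieces is what produces the extract/insert sequences mentioned above.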

I explored trying to do this via generic lowering infrastructure, and
frankly got lost. Doing this in target-specific DAG lowering is a bit ugly -
really, there's nothing hugely target specific here - but oh well. If
reviewers prefer, I could probably phrase this as a generic DAG combine,
but I'm not sure that's hugely better. If reviewers have a strong
preference on how to handle this, let me know, but I may need a bit of
help.

A couple notes:

* The argument passing weirdness is due to a missing combine to turn a
build_vector of adjacent i64 loads back into a vector load. I'm a bit
surprised we don't get that, but the isel output clearly has the
build_vector at i64.
* The splat case I plan to revisit in another patch. That's a relatively
common pattern, and the fact I have to scalarize that to avoid an
infinite loop is non-ideal.
2025-01-16 14:55:45 -08:00
Raphael Moreira Zinsly
01d7f434d2
[RISCV] Stack clash protection for dynamic alloca (#122508)
Create a probe loop for dynamic allocation and add the corresponding
SelectionDAG support in order to use it.
2025-01-16 11:58:42 -08:00
Craig Topper
fc7a1ed0ba
[RISCV] Fold vp.reverse(vp.load(ADDR, MASK)) -> vp.strided.load(ADDR, -1, MASK). (#123115)
Co-authored-by: Brandon Wu <brandon.wu@sifive.com>
2025-01-16 08:20:17 -08:00
Luke Lau
437e1a70ca
[RISCV][VLOPT] Handle tied pseudos in getOperandInfo (#123170)
For .wv widening instructions, when checking whether the operand is vs1 or
vs2 we take into account whether or not the instruction has a passthru. For
tied pseudos, though, the passthru is the vs2, and we weren't taking this
into account.
2025-01-16 23:00:13 +08:00
Luke Lau
ec5d17b587 [RISCV] Explicitly check for passthru in doPeepholeMaskedRVV. NFC
We were previously checking a combination of the vector policy op and
the opcode to determine if we needed to skip copying the passthru from
a masked pseudo to an unmasked pseudo.

However, we can just do this by checking
RISCVII::isFirstDefTiedToFirstUse, which is a proxy for whether or not
a pseudo has a passthru operand.

This should hopefully remove the need for the changes in #123106.
2025-01-16 11:28:05 +08:00
Guy David
1a935d7a17
[llvm] Mark scavenging spill-slots as *spilled* stack objects. (#122673)
This seems like an oversight when copying code from other backends.
2025-01-14 10:18:31 +02:00
Piotr Fusik
cfe5a0847a
[RISCV] Enable Zbb ANDN/ORN/XNOR for more 64-bit constants (#122698)
This extends PR #120221 to 64-bit constants that don't match
the 12-low-bits-set pattern.
2025-01-14 09:15:14 +01:00
Luke Lau
ffe5cddb68
[RISCV] Support vp.{gather,scatter} in RISCVGatherScatterLowering (#122232)
This adds support for lowering llvm.vp.{gather,scatter}s to
experimental.vp.strided.{load,store}.

This will help us handle strided accesses with EVL tail folding that are
emitted from the loop vectorizer, but note that it's still not enough.
We will also need to handle the vector step not being loop-invariant
(i.e. produced by @llvm.experimental.get.vector.length) in a future patch.
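As a rough IR-level sketch (types and names are illustrative): when the gathered pointers form a constant-stride sequence, the gather can be rewritten as a single strided load.

  ; before: %steps is <0, 1, 2, ...>, so the pointers stride by 8 bytes
  %ptrs = getelementptr i64, ptr %base, <vscale x 1 x i64> %steps
  %v = call <vscale x 1 x i64> @llvm.vp.gather.nxv1i64.nxv1p0(<vscale x 1 x ptr> %ptrs, <vscale x 1 x i1> %mask, i32 %evl)

  ; after
  %v = call <vscale x 1 x i64> @llvm.experimental.vp.strided.load.nxv1i64.p0.i64(ptr %base, i64 8, <vscale x 1 x i1> %mask, i32 %evl)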
2025-01-14 12:51:01 +08:00
Alexey Bataev
bab7920fd7
[RISCV][CG] Use processShuffleMasks for per-register shuffles
The patch adds usage of processShuffleMasks in codegen in
lowerShuffleViaVRegSplitting. This function is already used for X86
shuffle estimation and in DAGTypeLegalizer::SplitVecRes_VECTOR_SHUFFLE;
this unifies the code.

Reviewers: topperc, wangpc-pp, lukel97, preames

Reviewed By: preames

Pull Request: https://github.com/llvm/llvm-project/pull/121765
2025-01-13 17:06:25 -05:00
Michael Maitland
e44f03dd4e
[RISCV][VLOPT] Add floating point widening and narrowing bf16 convert support (#122353)
We already have getOperandInfo tests that cover this instruction.
2025-01-13 15:38:03 -05:00
quic_hchandel
171d3edd05
[RISCV] Add Qualcomm uC Xqciint (Interrupts) extension (#122256)
This extension adds eleven instructions to accelerate interrupt
servicing.

The current spec can be found at:
https://github.com/quic/riscv-unified-db/releases/latest

This patch adds assembler only support.

---------

Co-authored-by: Harsh Chandel <hchandel@qti.qualcomm.com>
2025-01-13 16:36:05 +05:30
Craig Topper
7979e1ba29 [RISCV] Add a default assignment of Inst{12-7} to RVInst16CSS. NFC
Some bits need to be overwritten by child classes, but at
least a few of the upper bits are common to all child classes.
2025-01-10 14:28:54 -08:00
Raphael Moreira Zinsly
6f53886a9a
[RISCV] Add stack clash vector support (#119458)
Use the probe loop structure to allocate vector stack space as well. We
add the pseudo instruction RISCV::PROBED_STACKALLOC_RVV to differentiate
it from the normal loop.
2025-01-10 09:48:21 -08:00
Philip Reames
24bb180e8a
[RISCV] Attempt to widen SEW before generic shuffle lowering (#122311)
This takes inspiration from AArch64, which does the same thing to assist
with zip/trn/etc. Doing this recursion unconditionally when the mask
allows is slightly questionable, but seems to work out okay in practice.

As a bit of context, it's helpful to realize that we have existing logic
in both DAGCombine and InstCombine which mutates the element width in
an analogous manner. However, that code has two restrictions which
prevent it from handling the motivating cases here. First, it only
triggers if there is a bitcast involving a different element type.
Second, the matcher used considers a partially undef wide element to be
a non-match. I considered trying to relax those assumptions, but the
information loss for undef in mid-level opt seemed more likely to open a
can of worms than I wanted.
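A minimal sketch of the idea (types and mask are illustrative, not from the patch): a byte shuffle that only moves adjacent pairs as units can be rewritten at twice the element width with half the element count:

  ; i8 shuffle whose mask keeps adjacent byte pairs together
  %r = shufflevector <8 x i8> %v, <8 x i8> poison,
       <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 0, i32 1, i32 2, i32 3>

  ; equivalent shuffle after widening the element type from e8 to e16
  %v16 = bitcast <8 x i8> %v to <4 x i16>
  %r16 = shufflevector <4 x i16> %v16, <4 x i16> poison, <4 x i32> <i32 2, i32 3, i32 0, i32 1>
  %r8  = bitcast <4 x i16> %r16 to <8 x i8>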
2025-01-10 07:12:24 -08:00
LiqinWeng
98e5962b7c
[RISCV][CostModel] Add cost for fabs/fsqrt of type bf16/f16 (#118608) 2025-01-10 17:22:51 +08:00
Craig Topper
6829f30883
[RISCV] Add a default common assignment of Inst{6-2} to the RVInst16CI base class. NFC (#122377)
Many instructions assign all or a subset of Inst{6-2} to Imm{4-0}. Make
this the default. Subsets of Inst{6-2} can be overridden as needed by
derived classes/records which we already do with Inst{12} in a few
places.
2025-01-09 22:11:04 -08:00
Shao-Ce SUN
369c61744a
[RISCV] Fix the cost of llvm.vector.reduce.and (#119160)
I added some CodeGen test cases related to reduce. To maintain
consistency, I also added cases for instructions like
`vector.reduce.or`.

For cases where `v1i1` type generates `VFIRST`, please refer to:
https://reviews.llvm.org/D139512.
2025-01-10 10:10:42 +08:00
Craig Topper
41e4018f9c
[RISCV][VLOPT] Simplify code by removing extra temporary variables. NFC (#122333)
Just do the conditional operator in the return statement.
2025-01-09 18:05:41 -08:00
Craig Topper
b11fe33aea
[RISCV] Correct the cost model for the i1 reduce.add and reduce.or. (#122349)
reduce.add uses the same sequence as reduce.xor. reduce.or should use
vmor not vmxor.
2025-01-09 18:05:22 -08:00
Michael Maitland
d0373dbe7c
[RISCV][VLOPT] Add vadc to isSupportedInstr (#122345) 2025-01-09 19:44:40 -05:00
Michael Maitland
04e54cc19f
[RISCV][VLOPT] Add Vector Single-Width Averaging Add and Subtract to isSupportedInstr (#122351) 2025-01-09 19:39:12 -05:00
Craig Topper
5d88a84ecd [RISCV] Simplify some RISCVInstrInfoC classes by removing arguments that never change. NFC 2025-01-09 16:21:55 -08:00
Michael Maitland
328c3a843f
[RISCV][VLOPT] Add vmerge to isSupportedInstr (#122340) 2025-01-09 16:10:40 -05:00
Craig Topper
b16777afb0
[RISCV] Return MILog2SEW for mask instructions getOperandLog2EEW. NFC (#122332)
The SEW operand for these instructions should have a value of 0. This
matches what was done for vcpop/vfirst.
2025-01-09 11:36:09 -08:00
Michael Maitland
5f70fea79f [RISCV][VLOPT] Add Vector Floating-Point Compare Instructions to isSupportedInstr 2025-01-09 10:50:32 -08:00
Michael Maitland
b419edeec3 [RISCV][VLOPT] Add widening floating point multiply to isSupportedInstr 2025-01-09 10:50:32 -08:00
Michael Maitland
a484fa1d0a [RISCV][VLOPT] Add floating point multiply/divide instructions to isSupportedInstr 2025-01-09 10:50:31 -08:00
Michael Maitland
8beb9d393d [RISCV][VLOPT] Add vector widening floating point add subtract instructions to isSupportedInstr 2025-01-09 10:50:31 -08:00
Michael Maitland
c036a9a2c2 [RISCV][VLOPT] Add vector single width floating point add subtract instructions to isSupportedInstr 2025-01-09 10:50:31 -08:00
Michael Maitland
d5145715f7
[RISCV][VLOPT] Add vfirst and vcpop to getOperandInfo (#122295) 2025-01-09 13:31:02 -05:00
Michael Maitland
550841f839
[RISCV][VLOPT] Add fp-reductions to getOperandInfo (#122151) 2025-01-09 09:43:26 -05:00
Michael Maitland
f77a7dd875
[RISCV][VLOPT] Add getOperandInfo for integer and floating point widening reductions (#122176) 2025-01-09 09:35:06 -05:00
Luke Lau
c8ee1164bd
[RISCV] Fix masked->unmasked peephole handling masked pseudos with no passthru (#122253)
Some masked pseudos like PseudoVCPOP_M_B8_MASK don't have a passthru,
but in the masked->unmasked peephole we assumed the masked pseudo always
had one.

This checks for a passthru first and fixes #122245.
2025-01-09 19:54:37 +08:00
Craig Topper
5d03235c73
[RISCV] Add -mcpu=sifive-p550. (#122164)
This is the CPU in SiFive's HiFive Premier P550 development board.

Scheduler model will come in a later patch.
2025-01-08 21:02:46 -08:00
Craig Topper
b0f11dfc75
[RISCV] Add call preserved regmask to tail calls. (#122181)
Every call should have a regmask operand to indicate what registers are
preserved or clobbered by the call. VirtRegRewriter uses this to tell
MachineRegisterInfo what registers are clobbered by a function. If the
mask isn't present, the registers potentially clobbered by a tail-called
function aren't counted. I have checked ARM, AArch64, and X86, and they
all have a regmask operand on their tail calls.

I believe this fixes an issue I'm seeing with IPRA.
2025-01-08 16:19:31 -08:00
Philip Reames
0b4fca5b75
[RISCV][VLOpt] Remove State field from OperandInfo [nfc] (#122160)
We can just use a std::optional to wrap the operand info instead. The
State field is confusing, as we have a "partially known" state where EEW
is known and EMUL is nullopt, but it's still "Known".
2025-01-08 12:37:28 -08:00
Philip Reames
983a957768
[RISCV][VLOpt] Consolidate EMUL=EEW/SEW*LMUL logic [NFC] (#122021)
All but one of the cases in tree today have EMUL=EEW/SEW*LMUL. Repeating
this each time is verbose and introduces opportunity for error. (For
instance, the comment associated with vwmul.vv was out of sync with the
corresponding code.)

Introduce getOperandLog2EEW and move most of the complexity into it. Then
introduce getOperandInfo as a wrapper around the former, and special case
the one case which requires it.
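Spelled out, the relationship being consolidated is:

  EMUL = (EEW / SEW) * LMUL

For example, the destination of vwmul.vv has EEW = 2 * SEW, so with SEW=32
and LMUL=1 the destination operand has EEW=64 and EMUL = (64/32) * 1 = 2.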

---------

Co-authored-by: Luke Lau <luke_lau@icloud.com>
2025-01-08 10:58:37 -08:00
Michael Maitland
e93181bf13
[RISCV][VLOPT] Add vector fp-conversion instruction to isSupportedInstr (#122033)
When these instructions are marked nofpexcept, we can optimize them.
There are some added toggles in the output, likely because other
nofpexcept FP instructions are not part of isSupportedInstr yet. We may
want to avoid marking an instruction as supported in the future if any
of its FP users are missing nofpexcept, to avoid added toggles. However,
we seem to get some GPRs back as a result of this change, which may
outweigh the cost of the extra toggles.

The plan is to follow this patch up with added support for more FP
instructions in the same way. The instructions in this patch are a
natural starting point because they allow us to test with integer
instructions which have good support already.
2025-01-08 13:30:40 -05:00
Michael Maitland
b253a80f54
[RISCV][VLOPT] Add mask load to isSupported and getOperandInfo (#122030)
Add mask store to getOperandInfo since it has the same behavior.
2025-01-07 22:07:57 -05:00
Luke Quinn
dde5546b79
[RISCV] GISel custom lowering for G_ADD/G_SUB (#121587)
Custom lowering of s32 G_ADD/G_SUB to better match SelectionDAG.
Specifically, on RV64 an s32 result is produced as an add plus a sext of
the output; this lets a couple of patterns sign extend with fewer
instructions and allows the generation of addiw, subw, and negw,
reducing the instructions required to load values.

Log2_ceil_i32 in rvzbb.ll shows a more obvious improvement case.
2025-01-07 18:53:10 -08:00
Philip Reames
4c4364869c
[RISCV][VLOpt] Kill all uses of and remove twoTimesVLMUL [NFC] (#122003)
Case analysis:
* EEW=SEW*2, getEMULEqualsEEWDivSEWTimesLMUL(EEW) returns 2 x VLMUL
* EEW=SEW, getEMULEqualsEEWDivSEWTimesLMUL(EEW) returns VLMUL
2025-01-07 15:14:45 -08:00
Min-Yih Hsu
90d79ca4c7
[RISCV] Update the latencies of MUL and CPOP in SiFive P400 scheduling model (#122007)
According to llvm-exegesis, they should have around 2 cycles of latency
on P400 cores.
2025-01-07 15:01:05 -08:00
Michael Maitland
142787d368
[RISCV][VLOPT] Add support for checkUsers when UserMI is a Single-Width Integer Reduction (#120345)
Reductions are weird because for some operands, they are vector
registers but only read the first lane. For these operands, we do not
need to check to make sure the EEW and EMUL ratios match. The EEWs,
however, do need to match.
2025-01-07 17:56:07 -05:00
Michael Maitland
36e4176f1d
[RISCV][VLOPT] Add strided, unit strided, and indexed loads to isSupported (#121705)
Add to getOperandInfo too since that is needed to reduce the VL.
2025-01-07 17:45:06 -05:00
Craig Topper
afa8aeeeec
[RISCV][llvm-exegesis] Add default Pfm cycle counter. (#121866)
Also tested with Ubuntu on SiFive's HiFive Premier P550 board. Curiously,
latency is reported as ~1.5 for basic scalar arithmetic, ~3.5 for scalar
mul, and ~36.5 for div. This is 0.5 cycles higher than I expected.
2025-01-07 09:51:34 -08:00