The patch handles fixed type strict-fp by new RISCVISD::STRICT_ prefixed
isd nodes.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D145900
D144535 enabled machine copy propagation for RISC-V and added it to the
pass pipeline in addPreEmitPass2 (after the MachineOutliner).
Unfortunately, the MachineCopyPropagation pass is unable to correctly
analyse outlined functions, and will delete copy instructions where a
register is set that is intended to be live-out.
RISCVInstrInfo::buildOutlinedFrame will directly insert a JALR, while a
similar function going through the normal codegen path would have a
PseudoRet with operands indicating registers that are live-out.
This patch does the simplest fix, which is to run MachineCopyPropagation
before the MachineOutliner.
Differential Revision: https://reviews.llvm.org/D146037
i8 and i16 are not using overflow.
Reduce the number of zero extension instructions.
To reduce the uncertainty of the unknown,
most of the checks of the virtual function are kept
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D143646
Fix mistake in f16 table in previous patch.
Original commit message:
Use separate lookup tables instead of trying to reuse the fli.s
table.
We were missing the 2 denormal cases for fli.h. We also had an issue
where fli.d was only checking 8 bits of the 11 bit exponent.
Use separate lookup tables instead of trying to reuse the fli.s
table.
We were missing the 2 denormal cases for fli.h. We also had an issue
where fli.d was only checking 8 bits of the 11 bit exponent.
We fail to use fli.h for the 2 denormal values.
We use fli.d for some values where the value is larger than a float
can represent due to truncating the exponent to 8 bits without checking
if it fits in 8 bits.
-Return false from RISCVDAGToDAGISel::selectFPImm for fli
constants so we don't try to use integer expansion.
-Support fli.h with Zvfh+Zfhmin.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D145766
Previously, we printed all constants in scientific notation with
6 digits of precision. This is not enough to accurately display
the smallest value, but increasing the precision would be too much
for other values.
This patch prints values with fractional bits using only as many digits as
needed. 1*2^-15 and 1*2^-16 will be printed in scientific notation while
the others are printed without scientific notation. The integer values
are printed with a single 0 after the decimal point.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D145645
Refer from: https://reviews.llvm.org/D44782
After https://reviews.llvm.org/D130302, LW+SEXT.B can be folded into LB
as partially reload stack slot. This gains incorrect optimization result
from `StackSlotColoring` without given the number of bytes exactly load
from stack. LB+SW are mis-interpreted as fully reload/restore from stack
slot without the sign-extension. SW would be considered as a redundant store.
The testcase is copied from llvm/test/CodeGen/X86/pr30821.mir.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D145471
The patch mainly does two things. The first is allowing scalable vector
ISD::STRICT_FP_EXTEND. The second is making RISC-V customized lower
strict_fpextend to riscv_strict_fpextend_vl, the strict version of
riscv_fpextend_vl.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D145548
fli.h requires Zfh or Zvfh. We need to check for this in
isFPImmLegal. Zvfh support will come in another patch.
I had to split the test file because there are other issues with
Zfhmin and some intrinsics.
There are a couple bugs in the current support for this:
-We do all the parsing in single precision so any value less than or
equal to the minimum fp32 is accepted as the minimum value for f64.
-To support fp16 minimum value, getLoadFP32Imm has a special case, but
that causes a miscompile in CodeGen.
Differential Revision: https://reviews.llvm.org/D145542
These intrinsics are equivalent to the regular @llvm.riscv.vssegNF
intrinsics, only they accept fixed length vectors in their overloaded
types: The regular intrinsics only operate on scalable vectors.
These intrinsics convert the fixed length vectors to scalable ones, and
then lower it on to the regular scalable intrinsic.
This mirrors the intrinsics added in 0803dba7dd998ad073d75a32b65296734c10ae70
This will be used in a later patch with the Interleaved Access pass.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D145022
The f32 matching code for fli was hacked to allow the f16 minimum value
to match for the fli.h instruction in the assembler. This was done
because the assembler parses the floating point literal for fli.h,
fli.s, and fli.d as a single precision value.
Unfortunately, this function is also used by CodeGen and causes
this value to be miscompiled for f32.
We can form a MU TU operation and remove the merge if they use the
same merge value.
My primary interest was a case involving VP intrinsics from our downstream,
but it requires another optimization that isn't in upstream yet. So I've used
RVV intrinsics to get the desired instructions.
Co-authored-by: Nitin John Raj <nitin.raj@sifive.com>
Reviewed By: fakepaper56
Differential Revision: https://reviews.llvm.org/D145272
Like what has been done in AArch64 (D125335).
We enable this under `-O2` to show the codegen diffs here but we
may only do this under `-O3` like AArch64.
There are two cases that we may produce these eliminable copies:
1. ISel of `FrameIndex`. Like `rvv/fixed-vectors-calling-conv.ll`.
2. Tail duplication. Like `select-optimize-multiple.ll`.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D144535
Sign-extending the input is irrelevant to the test in question, which
merely needs some arbitrary input to switch on. This also removes the
only difference between the RV32I and RV64I cases for @below_threshold
and will allow unifying them.
Default MaxDivRemBitWidthSupported to 128, so that divisions larger
than 128 bits are always expanded, without requiring additional
configuration from the target.
Note that this may still emit calls to __udivti3 on 32-bit targets,
which likely don't have an implementation of that builtin. However,
I believe this is sufficient to fix
https://github.com/llvm/llvm-project/issues/60531, because Zig must
already be defining those builtins.
Differential Revision: https://reviews.llvm.org/D144871
On targets without ADDCARRY or ADDE, we need to emit a separate
SETCC to determine carry from the low half to the high half. Usually
we do (setult Lo, LHSLo). If RHSLo is 1 we can instead do (seteq Lo, 0).
This can reduce the live range of LHSLo.
On targets that lack ADDCARRY support we split a wide uaddo into
an ADD and a SETCC that both need to be split.
For (uaddo X, 1) we can observe that when the add overflows the result
will be 0. We can emit (seteq (or Lo, Hi), 0) to detect this.
This improves D142071.
There is an alternative here. We could use either ~(lo(X) & hi(X)) == 0 or
(lo(X) & hi(X)) == -1 before the addition. That would be closer to the
code before D142071.
Reviewed By: liaolucy
Differential Revision: https://reviews.llvm.org/D144614
Lower the two intrinsics introduced in D141924.
These intrinsics can be combined with loads and stores into the much more efficient segmented load and store instructions in a following patch.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D144092
The patch ensures last two operands of vp.abs/ctlz/cttz are mask and evl.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D144536
This introduces R_RISCV_PLT32, PC-relative data relocation that takes
the 32-bit relative offset to a function or its PLT entry from its
relocation location.
This is needed to support relative vtables on RISCV.
Github PR: https://github.com/riscv-non-isa/riscv-elf-psabi-doc/pull/363
The lld handling of this reloc is D143115.
Differential Revision: https://reviews.llvm.org/D143226
A mistake in the control flow of performMemPairCombine resulted in paired
loads/stores for types that were not supported by the instructions (i8/i16).
These loads/stores could not match the constraints of the patterns defined
in the THead td file and the compiler would throw a 'Cannot select' error.
This is now fixed and two new test functions have been added in xtheadmempair.ll
which would previously crash the compiler. The compiler was additionally tested
with a wide range of benchmarks and no issues were observed.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D144559
RISC-V vector instruction has register overlapping constraint for certain
instructions, and will cause illegal instruction trap if violated, we use
early clobber to model this constraint, but it can't prevent register allocator
allocated same or overlapped if the input register is undef value, so convert
IMPLICIT_DEF to temporary pseudo could prevent that happen, it's not best way
to resolve this. Ideally we should model the constraint right, but before we
model the constraint right, it's the approach to prevent that happen.
See also: https://github.com/llvm/llvm-project/issues/50157
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D129735
Interleaves generated with vwaddu.vv and vwmaccu.vx were using VLs that
were twice the number of elements actually needed in the vector.
This also pulls the interleaving logic out into its own function so it
can be reused by later patches, and adapts it for scalable vectors.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D144386
Support for the unratified 1.0-rc3 specification was introduced in
D133443. The specification has since been ratified (in November 2022
according to the recently ratified extensions list
<https://wiki.riscv.org/display/HOME/Recently+Ratified+Extensions>.
A review of the diff
<https://github.com/riscv/riscv-zawrs/compare/V1.0-rc3...main> of the
1.0-rc3 spec vs the current/ratified document shows no changes to the
instruction encoding or naming. At one point, a note was added
<e84f42406a>
indicating Zawrs depends on the Zalrsc extension (not officially
specified, but I believe to be just the LR/SC instructions from the A
extension). The final text ended up as "The instructions in the Zawrs
extension are only useful in conjunction with the LR instructions, which
are provided by the A extension, and which we also expect to be provided
by a narrower Zalrsc extension in the future." I think it's consistent
with this phrasing to not require the A extension for Zawrs, which
matches what was implemented.
No intrinsics are implemented for Zawrs currently, meaning we don't need
to additionally review whether those intrinsics can be considered
finalised and ready for exposure to end users.
Differential Revision: https://reviews.llvm.org/D143507
Experimental support for the zfa extension was recently added in https://reviews.llvm.org/D141984. A couple of the normal test changes and clang plumbing got missed in that change. This commit updates the usual suspects.
Differential Revision: https://reviews.llvm.org/D144288
The XTHeadBb extension hab both signed and unsigned bitfield
extraction instructions (TH.EXT and TH.EXTU, respectively) which have
previously only been supported for sign extension on byte, halfword,
and word-boundaries.
This adds the infrastructure to use TH.EXT and TH.EXTU for arbitrary
bitfield extraction.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D144229
The instructions in the XTHeadMac extension (multiply accumulate
instructions) were marked as commutative but because the destination
register was also an input (accumulate) register and was connected to
the destination register with a register allocator constraint, all
three operands (instead of two) were incorrectly considered
commutative. To fix that an appropriate fixCommutedOpIndices call was
added for these instructions in findCommutedOpIndices
New test functions have been added to test the correct behaviour in
xtheadmac.ll.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D144278