The patch handles fixed type strict-fp by new RISCVISD::STRICT_ prefixed
isd nodes.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D145900
We currently have 3 functions and 3 lookup tables. This was the
most expediant and obvious way to fix several bugs.
This patch uses a single function and single lookup
table. It uses APFloat::convert to convert from the half or double
to single precision. If the conversion doesn't have any errors or
lose any information we use the f32 table to finish the lookup.
Reviewed By: asb
Differential Revision: https://reviews.llvm.org/D145897
This adds two new methods to ShuffleVectorInst, isInterleave and
isInterleaveMask, so that the logic to check if a shuffle mask is an
interleave can be shared across the TTI, codegen and the interleaved
access pass.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D145971
I don't have a test case that fails for this, but it seemed like
we should only handle legal types. The callers I looked at in
DAGCombine either check the type is legal or don't even call
isFPImmLegal unless LegalOperations is true.
Written in a slightly odd way because switches on EVT require
an additional isSimple check so an if/else chain is easier. Used a bool
to shorten the code instead of having multiple ifs and returns.
AArch64 uses a similarish structure.
-Return false from RISCVDAGToDAGISel::selectFPImm for fli
constants so we don't try to use integer expansion.
-Support fli.h with Zvfh+Zfhmin.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D145766
The patch mainly does two things. The first is allowing scalable vector
ISD::STRICT_FP_EXTEND. The second is making RISC-V customized lower
strict_fpextend to riscv_strict_fpextend_vl, the strict version of
riscv_fpextend_vl.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D145548
fli.h requires Zfh or Zvfh. We need to check for this in
isFPImmLegal. Zvfh support will come in another patch.
I had to split the test file because there are other issues with
Zfhmin and some intrinsics.
These intrinsics are equivalent to the regular @llvm.riscv.vssegNF
intrinsics, only they accept fixed length vectors in their overloaded
types: The regular intrinsics only operate on scalable vectors.
These intrinsics convert the fixed length vectors to scalable ones, and
then lower it on to the regular scalable intrinsic.
This mirrors the intrinsics added in 0803dba7dd998ad073d75a32b65296734c10ae70
This will be used in a later patch with the Interleaved Access pass.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D145022
Lower the two intrinsics introduced in D141924.
These intrinsics can be combined with loads and stores into the much more efficient segmented load and store instructions in a following patch.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D144092
A mistake in the control flow of performMemPairCombine resulted in paired
loads/stores for types that were not supported by the instructions (i8/i16).
These loads/stores could not match the constraints of the patterns defined
in the THead td file and the compiler would throw a 'Cannot select' error.
This is now fixed and two new test functions have been added in xtheadmempair.ll
which would previously crash the compiler. The compiler was additionally tested
with a wide range of benchmarks and no issues were observed.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D144559
Not entirely sure we'll end up reusing the body of the transform, but personally I find this structure easier to follow anyways.
Differential Revision: https://reviews.llvm.org/D144532
Interleaves generated with vwaddu.vv and vwmaccu.vx were using VLs that
were twice the number of elements actually needed in the vector.
This also pulls the interleaving logic out into its own function so it
can be reused by later patches, and adapts it for scalable vectors.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D144386
Instead of special casing Zfa in the custom inserters, select the
correct instructions during isel.
BuildPairF64 we can do with pattern, but SplitF64 requires custom
selection due to the two destinations.
If we didn't need SplitF64 without Zfa, I would have an extract low
and extract high ISD opcode for Zfa to avoid that issue.
Rather than using operator[] on getFeatureBits we can use
hasFeature to shorten the code.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D144300
The RISC-V backend thus far does not provide strength-reduction, which
causes a long (but not complete) list of 3-instruction patterns listed
to utilize the shift-and-add instruction from Zba and XTHeadBa in
strength-reduction.
This adds the logic to perform strength-reduction through the DAG
combine for ISD::MUL. Initially, we wire this up for XTheadBa only,
until this has had some time to settle and get real-world test
exposure.
The following strength-reductions strategies are currently supported:
- XTheadBa
- C = (n + 1) // th.addsl
- C = (n + 1)k // th.addsl, slli
- C = (n + 1)(m + 1) // th.addsl, th.addsl
- C = (n + 1)(m + 1)k // th.addsl, th.addsl, slli
- C = ((n + 1)m + 1) // th.addsl, th.addsl
- C = ((n + 1)m + 1)k // th.addslm th.addsl, slli
- base ISA
- C being 2 set-bits // slli, slli, add
(possibly slli, th.addsl)
Even though the slli+slli+add sequence would we supported without
XTheadBa, this currently is gated to avoid having to update a large
number of test cases (i.e., anything that has a multiplication with a
constant where only 2 bits are set) in this commit.
With the strength reduction now being performed in performMUL combine,
we drop the (now redundant) patterns from RISCVInstrInfoXTHead.td.
Depends on D143029
Differential Revision: https://reviews.llvm.org/D143394
Recommit by preames with commit message, various style cleanups, and unaddressed review comments corrected.
This patch implements experimental codegen support for the RISCV Zfa extension as specified here: https://github.com/riscv/riscv-isa-manual/releases/download/draft-20221119-5234c63/riscv-spec.pdf, Ch. 25. This extension has not been ratified.
This change does not include support for FLI (upcoming in a follow up change) or FCVTMOD (not relevant for C/C++).
Differential Revision: https://reviews.llvm.org/D143982
This is needed to support the new interleave intrinsics from D141924 for
fixed vectors.
I've reworked the core loop to operate in terms of half of a source. Making 4
possible half sources. The first element of the half is used to indicate which
source using the same numbering as the shuffle where the second source elements
are numbered after the first source.
I've added restrictions to only match the first half of two vectors or the
first and second half of a single vector. This was done to prevent regressions
on the cases we have coverage for. I saw cases where generic DAG combine split
a single interleave into 2 smaller interleaves a concat. We can revisit in the
future.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D144143
These seem like properties we will want to adjust based on -mtune.
Move them to subtarget like is done on ARM and AArch64. Don't add
any overrides yet.
Note there's a slight change here. We are now passing Align(1) for
preferred function alignment where we previously passed the minimum
alignment. As far as I could tell, it will be maxed with min when
it used so this should be ok.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D144048
This doesn't make sense as an option. fneg and fabs are bit
preserving by definition. If a target has some fneg or fabs
instruction that are not bitpreserving it's incorrect to lower
fneg/fabs to use it.
Fix the following warning:
/lib/Target/RISCV/RISCVISelLowering.cpp:315:24: error: braces around scalar initializer [-Werror,-Wbraced-scalar-init]
setOperationAction({ISD::CTLZ}, XLenVT, Legal);
^~~~~~~~~~~
This change implements a proposed lowering from LLVM's memory model to the TSO memory model defined by the Ztso extension. Selecting the proposed mapping turns out to be an involved conversation that really didn't fit within a review description, so let me refer you to https://github.com/preames/public-notes/blob/master/riscv-tso-mappings.rst. This review implements the WMO compatible variant (the proposed one in that document).
Ztso is currently accepted as an experimental extension in LLVM. Despite the fact the extension was recently ratified, I think we need to leave it as experimental until we have wide agreement on the chosen mapping for ABI purposes.
I need to note that the current in-tree implementation defaults to generating WMO compatible fences. This is entirely compatible with the proposed mapping in this patch, but is unfortunately not compatible with the major alternative. The in tree implementation is explicitly experimental so the impact of this is limited, but it is worth calling out that if settle on the alternative we will have a minor ABI break. My apologies for not calling this out in the original patch; I had not realized at the time that one of our realistic choices for mappings wouldn't be WMO compatible.
This patch only contains the changes for load/store and fence. That is, it does not change the lowering for atomicrmw operations. This is a sound thing to do under the proposed mapping since the existing WMO mappings remain compatible. I do plan to change these; I'm just working incrementally.
Differential Revision: https://reviews.llvm.org/D143076
The RISC-V backend thus far does not provide strength-reduction, which
causes a long (but not complete) list of 3-instruction patterns listed
to utilize the shift-and-add instruction from Zba and XTHeadBa in
strength-reduction.
This adds the logic to perform strength-reduction through the DAG
combine for ISD::MUL. Initially, we wire this up for XTheadBa only,
until this has had some time to settle and get real-world test
exposure.
The following strength-reductions strategies are currently supported:
- XTheadBa
- C = (n + 1) // th.addsl
- C = (n + 1)k // th.addsl, slli
- C = (n + 1)(m + 1) // th.addsl, th.addsl
- C = (n + 1)(m + 1)k // th.addsl, th.addsl, slli
- C = ((n + 1)m + 1) // th.addsl, th.addsl
- C = ((n + 1)m + 1)k // th.addslm th.addsl, slli
- base ISA
- C being 2 set-bits // slli, slli, add
(possibly slli, th.addsl)
Even though the slli+slli+add sequence would we supported without
XTheadBa, this currently is gated to avoid having to update a large
number of test cases (i.e., anything that has a multiplication with a
constant where only 2 bits are set) in this commit.
With the strength reduction now being performed in performMUL combine,
we drop the (now redundant) patterns from RISCVInstrInfoXTHead.td.
Depends on D143029
Differential Revision: https://reviews.llvm.org/D143394
Doing so makes it easier to do printf style debugging in idiomatic manner. I followed the code structure of Value with only the definition of dump being #ifdef out in non-debug builds. Not sure if this is the "right" option; we don't seem to have any single consistent scheme on how dump is handled.
Note: This is a follow up to D143454 which did the same for EVT.
Differential Revision: https://reviews.llvm.org/D143511
Fuchsia provides a slot relative to tp for the stack-guard value,
which is cheaper to materialize than the default GOT load.
Reviewed By: phosek
Differential Revision: https://reviews.llvm.org/D143353