Also the patch loose the fixed vector contraint in llvm/lib/IR/Verifier.cpp.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D147380
Compiler hits infinite loop in DAGCombine. For force-streaming-compatible-sve
mode we have custom lowering for 128-bit vector splats and later in
DAGCombiner::SimplifyVCastOp() we scalarized SPLAT because we have custom
lowering for SME. Later, we restored SPLAT opertion via performMulCombine().
This enables the interleaved access pass on O1 and above, and causes
interleaving/deinterleaving shuffles of fixed length vectors with
stores/loads to be lowered into vssegN/vlsegN.
We need to be careful and make sure that we only lower vsseg/vlseg
whenever we know the fixed vector type will fit within the minimum vlen,
and that the interleaving factor is supported for the given LMUL.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D145085
This enables the interleaved access pass on O1 and above, and causes
interleaving/deinterleaving shuffles of fixed length vectors with
stores/loads to be lowered into vssegN/vlsegN.
We need to be careful and make sure that we only lower vsseg/vlseg
whenever we know the fixed vector type will fit within the minimum vlen,
and that the interleaving factor is supported for the given LMUL.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D145085
The patch customized lower vector type ISD::STRICT_FP_ROUND to RISCVISD::STRICT_FP_ROUND.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D147113
Like D145900, the patch also supports fixed vector strict_fma nodes in RISC-V by
customized lowering them to riscv_strict_vfmadd_vl nodes. riscv_strict_vfmadd_vl
is created to avoid some riscv_vfmadd_vl optimizations happening to original
strict_fma nodes. The patch also adds combine patterns for riscv_strict_fmadd_vl
nodes with negation operands.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D146939
We custom isel for ConstantFP that has higher priority than isel
patterns. We were previously detecting valid FP constants for fli
to early exit from the custom code. This detection called
getLoadFPImm. Then we would run the isel patterns which would call
getLoadFPImm a second time.
With a little bit more code we can directly select the fli instruction
in the custom handler and avoid a second call.
Remove the incorrect mayRaiseFPException flag from the FLI instructions.
Reviewed By: joshua-arch1
Differential Revision: https://reviews.llvm.org/D146093
The patch handles fixed type strict-fp by new RISCVISD::STRICT_ prefixed
isd nodes.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D145900
i8 and i16 are not using overflow.
Reduce the number of zero extension instructions.
To reduce the uncertainty of the unknown,
most of the checks of the virtual function are kept
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D143646
-Return false from RISCVDAGToDAGISel::selectFPImm for fli
constants so we don't try to use integer expansion.
-Support fli.h with Zvfh+Zfhmin.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D145766
The patch mainly does two things. The first is allowing scalable vector
ISD::STRICT_FP_EXTEND. The second is making RISC-V customized lower
strict_fpextend to riscv_strict_fpextend_vl, the strict version of
riscv_fpextend_vl.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D145548
Lower the two intrinsics introduced in D141924.
These intrinsics can be combined with loads and stores into the much more efficient segmented load and store instructions in a following patch.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D144092
This doesn't make sense as an option. fneg and fabs are bit
preserving by definition. If a target has some fneg or fabs
instruction that are not bitpreserving it's incorrect to lower
fneg/fabs to use it.
Fuchsia provides a slot relative to tp for the stack-guard value,
which is cheaper to materialize than the default GOT load.
Reviewed By: phosek
Differential Revision: https://reviews.llvm.org/D143353
Grabs the same logic and reasoning from the X86 implementation of the
hook. The benefit is slightly less clear for when the soft float ABI is
used (i.e. there's no transfer from an FPR to a GPR), but I've opted not
to gate it based on ABI.
Differential Revision: https://reviews.llvm.org/D140408
Adds new pseudo instructions to make sure that the fcvt instructions
have all rounding mode (RM) and unsigned (XU) variants across
single-width, widening and narrowing conversions.
And likewise, extends the VL patterns to accompany them. We don't add
new VL nodes for the widening/narrowing conversions though, instead we
just add specific patterns for vfcvts on those wider/narrower types.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D142102
Like in the scalar domain, combine calls to (fp_to_int (ftrunc X)) on
scalable and fixed-length vectors into a single vfcvt instruction.
For truncating rounds, the static vfcvt.rtz rounding mode is used.
Otherwise use the VFCVT_RM_ variants to set the rounding mode
dynamically.
Closes https://github.com/llvm/llvm-project/issues/56737
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D141599
https://reviews.llvm.org/D140493 is going to teach SROA how to promote allocas
that have variably-indexed loads. That does bring up questions of cost model,
since that requires creating wide shifts.
Indeed, our legalization for them is not optimal.
We either split it into parts, or lower it into a libcall.
But if the shift amount is by a multiple of CHAR_BIT,
we can also legalize it throught stack.
The basic idea is very simple:
1. Get a stack slot 2x the width of the shift type
2. store the value we are shifting into one half of the slot
3. pad the other half of the slot. for logical shifts, with zero, for arithmetic shift with signbit
4. index into the slot (starting from the base half into which we spilled, either upwards or downwards)
5. load
6. split loaded integer
This works for both little-endian and big-endian machines:
https://alive2.llvm.org/ce/z/YNVwd5
And better yet, if the original shift amount was not a multiple of CHAR_BIT,
we can just shift by that remainder afterwards: https://alive2.llvm.org/ce/z/pz5G-K
I think, if we are going perform shift->shift-by-parts expansion more than once,
we should instead go through stack, which is what this patch does.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D140638
Previously lowerCTLZ_CTTZ_ZERO_UNDEF converted the source to float value by
ISD::UINT_TO_FP. ISD::UINT_TO_FP uses dynamic rounding mode, so the rounding
may make the exponent of the result not as expected when converting i32/i64 to f32.
This is the reason why we constrained lowerCTLZ_CTTZ_ZERO_UNDEF to only handle
an i32 source when the f64 type having the same element count as source is legal.
The patch teaches lowerCTLZ_CTTZ_ZERO_UNDEF converts i32/i64 vectors to f32
vectors by vfcvt.f.xu.v with RTZ rounding mode. Using RTZ is to make sure the
exponent of results is correct, although f32 could not totally represent each
value in i32/i64.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D140782
Rework the change to prevent build failures. NFCI.
The failing code was submitted as
cf7a8305a2b4ddfd299c748136cb9a2960ef7089 and reverted via
8bd65e535fb33bc48805bafed8217b16a853e158.
The rework in this new commit prevents failures like the following:
FAILED: tools/clang/lib/Basic/CMakeFiles/obj.clangBasic.dir/Targets/RISCV.cpp.o
/usr/bin/c++ [bunch of non interesting stuff] -c <path-to>/llvm-project/clang/lib/Basic/Targets/RISCV.cpp
In file included from <path-to>/llvm-project/clang/lib/Basic/Targets/RISCV.cpp:19:
<path-to>/llvm-project/llvm/include/llvm/TargetParser/RISCVTargetParser.h:29:10: fatal error: llvm/TargetParser/RISCVTargetParserDef.inc: No such file or directory
29 | #include "llvm/TargetParser/RISCVTargetParserDef.inc"
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
These failures happen because the library LLVMTargetParser depends on
RISCVTargetParserTableGen, which is a tablegen target that generates
the list of CPUs in
llvm/TargetParser/RISCVTargetParserDef.inc. This *.inc file is
included by the public header file
llvm/TargetParser/RISCVTargetParser.h.
The header file llvm/TargetParser/RISCVTargetParser.h is also used in
components (clangDriver and clangBasic) that link into
LLVMTargetParser, but on some configurations such components might end
up being built before TargetParser is ready.
The fix is to make sure that clangDriver and clangBasic depend on the
tablegen target RISCVTargetParserTableGen, which generates the .inc
file whether or not LLVMTargetParser is ready.
WRT the original patch at https://reviews.llvm.org/D137517, this
commit is just adding RISCVTargetParserTableGen in the DEPENDS list of
clangDriver and clangBasic.
This patch removes the file `llvm/include/llvm/TargetParser/RISCVTargetParser.def` and replaces it with a tablegen-generated `.inc` file out of `llvm/lib/Target/RISCV/RISCV.td`.
The module system has been updated to make sure we can build clang/llvm with `-DLLVM_ENABLE_MODULES=On`
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D137517
If the divisor is repeated at least twice, we will convert the FDIVs to the
calculation of the reciprocal and FMULs.
We perform the transformation only under fast-math mode. FDIVs must have
'arcp' flag.
Differential Revision: https://reviews.llvm.org/D140024
This patch implements shouldFoldSelectWithIdentityConstant for RISCV. It would try to generate vmerge after the binary instruction and let them folded to maksed instruction later.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D131551
At the IR level, we generally assume that constants are free to materialize. However, for RISCV due to some quirks of the ISA, materializing arbitrary constants can be rather expensive. We frequently fallback to constant pool loads.
We've been slowly moving in the direction of modeling the cost of the remat as part of the instruction cost. This has the effect of disincentivizing vectorization - mostly SLP - when we'd have to materialize an expensive constant.
We need better modeling of which constants are expensive and not, but the moment let's be consistent with how we model arithmetic and memory instructions. The difference between the two is that arithmetic can sometimes fold a splat operation which stores can not.
Differential Revision: https://reviews.llvm.org/D138941
A target can return if a misaligned access is 'fast' as defined
by the target or not. In reality there can be different levels
of 'fast' and 'slow'. This patch changes the boolean 'Fast'
argument of the allowsMisalignedMemoryAccesses family of functions
to an unsigned representing its speed.
A target can still define it as it wants and the direct translation
of the current code uses 0 and 1 for current false and true. This
makes the change an NFC.
Subsequent patch will start using an actual value of speed in
the load/store vectorizer to compare if a vectorized access going
to be not just fast, but not slower than before.
Differential Revision: https://reviews.llvm.org/D124217
nearbyint has the property to execute without exception.
For not modifying fflags, the patch added new machine opcode
PseudoVFROUND_NOEXCEPT_V that expands vfcvt.x.f.v and vfcvt.f.x.v between a pair
of frflags and fsflags.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D137685
This adds a RISCVISD::ABSW to remember that we started with an i32
abs. Previously we used a DAG combine of (sext_inreg (abs)) to
delay emitting a freeze from type legalization in order to make
ComputeNumSignBits optimizations work on other promoted nodes.
This new approach always uses negw+max even if the result doesn't
need to be sign extended. This helps the RISCVSExtWRemoval pass
if the sext.w is in another basic block.
This avoids the call overhead as well as the the save/restore of
fflags and the snan handling in the libm function.
The save/restore of fflags and snan handling are needed to be
correct for -ftrapping-math. I think we can ignore them in the
default environment.
The inline sequence will generate an invalid exception for nan
and an inexact exception if fractional bits are discarded.
I've used a custom inserter to explicitly create the control flow
around the float->int->float conversion.
We can probably avoid the final fsgnj after the conversion for
no signed zeros FMF, but I'll leave that for future work.
Note the comparison constant is slightly different than glibc uses.
They use 1<<53 for double, I'm using 1<<52. I believe either are valid.
Numbers >= 1<<52 can't have any fractional bits. It's ok to do the
float->int->float conversion on numbers between 1<<53 and 1<<52 since
they will all fit in 64. We only have a problem if the double can't fit
in i64
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D136508
RISCV doesn't actually support a scaled form of indexed load and store. We previously handled this by forming the scaled SDNode, and then doing custom legalization during lowering. This patch instead adds a callback via TLI to prevent formation entirely.
This has two effects:
* First, the GEP gets expanded (and used). Instead of the shift being created with an SDLoc of the memory operation, it has the SDLoc of the GEP instruction. This avoids the scheduler perturbing IR order when there's no reason to.
* Second, we fix what appears to be a bug in index calculation with RV32. The rules for GEPs require index calculation be done in particular bitwidth, and it appears the custom legalization code got this wrong for the case where index type exceeds pointer width. (Or at least, I trust the generic GEP lowering to be correct a lot more.)
The DAGCombiner change to handle VPScatter/VPGather is technically separate, but is required to prevent a regression on those intrinsics.
Differential Revision: https://reviews.llvm.org/D134382