303 Commits

Author SHA1 Message Date
Yeting Kuo
d83620d101 [RISCV] Support vector strict_fsetcc/fsetccs.
The patch supports vector strict_fsetcc/fsetccs. Instead of revserving fflags,
the method to implement scalar quiet compares, the patch implement quiet
compares by masking the signaling compares when either input is NaN [0].

[0]: https://github.com/riscv/riscv-v-spec/blob/master/v-spec.adoc#vector-floating-point-compare-instructions

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D147998
2023-04-14 09:10:41 +08:00
Yeting Kuo
6858a920b8 [RISCV] Support vector type strict_[su]int_to_fp and strict_fp_to_[su]int.
Also the patch loose the fixed vector contraint in llvm/lib/IR/Verifier.cpp.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D147380
2023-04-06 10:09:44 +08:00
Dinar Temirbulatov
7f05bdf4ee [AArch64][SME] Fix an infinite loop in DAGCombine related to adding -force-streaming-compatible-sve flag.
Compiler hits infinite loop in DAGCombine. For force-streaming-compatible-sve
mode we have custom lowering for 128-bit vector splats and later in
DAGCombiner::SimplifyVCastOp() we scalarized SPLAT because we have custom
lowering for SME. Later, we restored SPLAT opertion via performMulCombine().
2023-04-05 10:10:55 +00:00
Craig Topper
219ff07f72 [Targets] Rename Flag->Glue. NFC
Long long ago Glue was called Flag, and it was never completely
renamed.
2023-04-02 19:28:51 -07:00
Luke Lau
ec26c9cdc0 [RISCV] Lower fixed length interleaved accesses via vssegN/vlsegN
This enables the interleaved access pass on O1 and above, and causes
interleaving/deinterleaving shuffles of fixed length vectors with
stores/loads to be lowered into vssegN/vlsegN.

We need to be careful and make sure that we only lower vsseg/vlseg
whenever we know the fixed vector type will fit within the minimum vlen,
and that the interleaving factor is supported for the given LMUL.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D145085
2023-04-02 16:47:44 +01:00
Luke Lau
80f3be9603 Revert "[RISCV] Lower fixed length interleaved accesses via vssegN/vlsegN"
This reverts commit b95913e8c3a3521b85d689a358e620d89a4e83de.
2023-04-02 15:56:24 +01:00
Luke Lau
b95913e8c3 [RISCV] Lower fixed length interleaved accesses via vssegN/vlsegN
This enables the interleaved access pass on O1 and above, and causes
interleaving/deinterleaving shuffles of fixed length vectors with
stores/loads to be lowered into vssegN/vlsegN.

We need to be careful and make sure that we only lower vsseg/vlseg
whenever we know the fixed vector type will fit within the minimum vlen,
and that the interleaving factor is supported for the given LMUL.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D145085
2023-04-02 15:20:21 +01:00
Yeting Kuo
84c8c2b4b4 [DAG][RISCV] Allow scalable vector ISD::STRICT_FP_ROUND and support vector ISD::STRICT_FP_ROUND for RISC-V.
The patch customized lower vector type ISD::STRICT_FP_ROUND to RISCVISD::STRICT_FP_ROUND.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D147113
2023-03-30 08:20:02 +08:00
Yeting Kuo
5cb4619a1f [RISCV][NFC] Fix ident in RISCVISelLowering.h.
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D147120
2023-03-29 16:11:10 +08:00
Yeting Kuo
0676c6d91f [RISCV] Support vector type strict_fma.
Like D145900, the patch also supports fixed vector strict_fma nodes in RISC-V by
customized lowering them to riscv_strict_vfmadd_vl nodes. riscv_strict_vfmadd_vl
is created to avoid some riscv_vfmadd_vl optimizations happening to original
strict_fma nodes. The patch also adds combine patterns for riscv_strict_fmadd_vl
nodes with negation operands.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D146939
2023-03-28 09:01:46 +08:00
Craig Topper
29463612d2 [RISCV] Replace RISCV -> RISC-V in comments. NFC
To be consistent with RISC-V branding guidelines
https://riscv.org/about/risc-v-branding-guidelines/
Think we should be using RISC-V where possible.

More patches will follow.

Reviewed By: asb

Differential Revision: https://reviews.llvm.org/D146449
2023-03-27 09:50:17 -07:00
Yeting Kuo
946d29e7e9 [RISCV] Support vector type strict_fsqrt.
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D146911
2023-03-27 14:02:22 +08:00
Craig Topper
b50c6857a4 [RISCV] Move fli selection in RISCVISelDAGToDAG.cpp. NFC
We custom isel for ConstantFP that has higher priority than isel
patterns. We were previously detecting valid FP constants for fli
to early exit from the custom code. This detection called
getLoadFPImm. Then we would run the isel patterns which would call
getLoadFPImm a second time.

With a little bit more code we can directly select the fli instruction
in the custom handler and avoid a second call.

Remove the incorrect mayRaiseFPException flag from the FLI instructions.

Reviewed By: joshua-arch1

Differential Revision: https://reviews.llvm.org/D146093
2023-03-21 19:33:27 -07:00
Yeting Kuo
9637e950cb [RISCV] Support ISD::STRICT_FADD/FSUB/FMUL/FDIV for vector types.
The patch handles fixed type strict-fp by new RISCVISD::STRICT_ prefixed
isd nodes.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D145900
2023-03-15 07:47:16 +08:00
LiaoChunyu
eb54254b6e [RISCV] Return false from shouldFormOverflowOp when type is i8 and i16
i8 and i16 are not using overflow.
Reduce the number of zero extension instructions.

To reduce the uncertainty of the unknown,
most of the checks of the virtual function are kept

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D143646
2023-03-14 20:42:55 +08:00
Craig Topper
30705e9770 [RISCV] Support Zfa fli instructions with vector splats.
-Return false from RISCVDAGToDAGISel::selectFPImm for fli
 constants so we don't try to use integer expansion.
-Support fli.h with Zvfh+Zfhmin.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D145766
2023-03-10 09:16:21 -08:00
Yeting Kuo
b2c48559c8 [IR][DAG][RISCV] Allow scalable vector ISD::STRICT_FP_EXTEND and RISC-V supports for vector ISD::STRICT_FP_EXTEND.
The patch mainly does two things. The first is allowing scalable vector
ISD::STRICT_FP_EXTEND. The second is making RISC-V customized lower
strict_fpextend to riscv_strict_fpextend_vl, the strict version of
riscv_fpextend_vl.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D145548
2023-03-09 17:37:59 +08:00
LiaoChunyu
fbace95408 [RISCV] Enable preferZeroCompareBranch to optimize branch on zero in codegenprepare
Similar to ARM and SystemZ.

Related Patchs: D101778(preferZeroCompareBranch)
https://reviews.llvm.org/rG9a9421a461166482465e786a46f8cced63cd2e9f   ( == 0 to u< 1)

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D142071
2023-02-28 14:43:40 +08:00
Manolis Tsamis
f6262201d8 [RISCV] Add vendor-defined XTheadMemIdx (Indexed Memory Operations) extension
The vendor-defined XTHeadMemIdx (no comparable standard extension exists
at the time of writing) extension adds indexed load/store instructions
as well as load/store and update register instructions.

It is supported by the C9xx cores (e.g., found in the wild in the
Allwinner D1) by Alibaba T-Head.

The current (as of this commit) public documentation for this
extension is available at:
  https://github.com/T-head-Semi/thead-extension-spec/releases/download/2.2.2/xthead-2023-01-30-2.2.2.pdf

Support for these instructions has already landed in GNU Binutils:
  https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=27cfd142d0a7e378d19aa9a1278e2137f849b71b

Depends on D144002

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D144249
2023-02-24 00:17:58 +01:00
Luke Lau
8d15e7275f [RISCV] Lower interleave and deinterleave intrinsics
Lower the two intrinsics introduced in D141924.

These intrinsics can be combined with loads and stores into the much more efficient segmented load and store instructions in a following patch.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D144092
2023-02-23 16:23:02 +00:00
Philip Reames
9168c98553 [RISCV] Extract a helper routine for computing (runtime) VLMax [nfc] 2023-02-21 09:55:59 -08:00
Manolis Tsamis
bbb58a2302 [RISCV] Add vendor-defined XTheadMemPair (two-GPR Memory Operations) extension
The vendor-defined XTHeadMemPair (no comparable standard extension exists
at the time of writing) extension adds two-GPR load/store pair instructions.

It is supported by the C9xx cores (e.g., found in the wild in the
Allwinner D1) by Alibaba T-Head.

The current (as of this commit) public documentation for this
extension is available at:
  https://github.com/T-head-Semi/thead-extension-spec/releases/download/2.2.2/xthead-2023-01-30-2.2.2.pdf

Support for these instructions has already landed in GNU Binutils:
  https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=6e17ae625570ff8f3c12c8765b8d45d4db8694bd

Depends on D143847

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D144002
2023-02-21 12:21:49 +01:00
Luke Lau
7e2f2f0fc8 [RISCV][NFC] Make a note of the operands for RISCVISD::VNSRL_VL
Split out from https://reviews.llvm.org/D144092

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D144387
2023-02-21 09:45:09 +00:00
Philipp Tomsich
16a66af0a0 Revert "[RISCV] Add vendor-defined XTheadMemPair (two-GPR Memory Operations) extension"
This reverts commit d2918544a7fc4b5443879fe12f32a712e6dfe325.
2023-02-17 19:45:55 +01:00
Manolis Tsamis
d2918544a7 [RISCV] Add vendor-defined XTheadMemPair (two-GPR Memory Operations) extension
The vendor-defined XTHeadMemPair (no comparable standard extension exists
at the time of writing) extension adds two-GPR load/store pair instructions.

It is supported by the C9xx cores (e.g., found in the wild in the
Allwinner D1) by Alibaba T-Head.

The current (as of this commit) public documentation for this
extension is available at:
  https://github.com/T-head-Semi/thead-extension-spec/releases/download/2.2.2/xthead-2023-01-30-2.2.2.pdf

Support for these instructions has already landed in GNU Binutils:
  https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=6e17ae625570ff8f3c12c8765b8d45d4db8694bd

Depends on D143847

Differential Revision: https://reviews.llvm.org/D144002
2023-02-17 19:45:22 +01:00
Matt Arsenault
09dd4d870e DAG: Remove hasBitPreservingFPLogic
This doesn't make sense as an option. fneg and fabs are bit
preserving by definition. If a target has some fneg or fabs
instruction that are not bitpreserving it's incorrect to lower
fneg/fabs to use it.
2023-02-14 10:25:24 -04:00
Roland McGrath
34b21e817f [RISCV] Use OS-specific stack-guard ABI for Fuchsia
Fuchsia provides a slot relative to tp for the stack-guard value,
which is cheaper to materialize than the default GOT load.

Reviewed By: phosek

Differential Revision: https://reviews.llvm.org/D143353
2023-02-05 18:45:59 -08:00
Alex Bradbury
ae14754612 [RISCV] Implement isMultiStoresCheaperThanBitsMerge hook
Grabs the same logic and reasoning from the X86 implementation of the
hook. The benefit is slightly less clear for when the soft float ABI is
used (i.e. there's no transfer from an FPR to a GPR), but I've opted not
to gate it based on ABI.

Differential Revision: https://reviews.llvm.org/D140408
2023-01-31 12:47:48 +00:00
Luke Lau
f5a6447196 [RISCV] Combine FP_TO_INT to vfwcvt/fvncvt
Adds new pseudo instructions to make sure that the fcvt instructions
have all rounding mode (RM) and unsigned (XU) variants across
single-width, widening and narrowing conversions.
And likewise, extends the VL patterns to accompany them. We don't add
new VL nodes for the widening/narrowing conversions though, instead we
just add specific patterns for vfcvts on those wider/narrower types.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D142102
2023-01-24 09:44:57 +00:00
Luke Lau
a0d80c2398 [RISCV] Generalize performFP_TO_INTCombine to vectors
Like in the scalar domain, combine calls to (fp_to_int (ftrunc X)) on
scalable and fixed-length vectors into a single vfcvt instruction.
For truncating rounds, the static vfcvt.rtz rounding mode is used.
Otherwise use the VFCVT_RM_ variants to set the rounding mode
dynamically.
Closes https://github.com/llvm/llvm-project/issues/56737

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D141599
2023-01-18 10:53:24 +00:00
Roman Lebedev
cc39c3b17f
[Codegen][LegalizeIntegerTypes] New legalization strategy for scalar shifts: shift through stack
https://reviews.llvm.org/D140493 is going to teach SROA how to promote allocas
that have variably-indexed loads. That does bring up questions of cost model,
since that requires creating wide shifts.

Indeed, our legalization for them is not optimal.
We either split it into parts, or lower it into a libcall.
But if the shift amount is by a multiple of CHAR_BIT,
we can also legalize it throught stack.

The basic idea is very simple:
1. Get a stack slot 2x the width of the shift type
2. store the value we are shifting into one half of the slot
3. pad the other half of the slot. for logical shifts, with zero, for arithmetic shift with signbit
4. index into the slot (starting from the base half into which we spilled, either upwards or downwards)
5. load
6. split loaded integer

This works for both little-endian and big-endian machines:
https://alive2.llvm.org/ce/z/YNVwd5

And better yet, if the original shift amount was not a multiple of CHAR_BIT,
we can just shift by that remainder afterwards: https://alive2.llvm.org/ce/z/pz5G-K

I think, if we are going perform shift->shift-by-parts expansion more than once,
we should instead go through stack, which is what this patch does.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D140638
2023-01-14 19:12:18 +03:00
Yeting Kuo
5280d3e738 [RISCV] Teach lowerCTLZ_CTTZ_ZERO_UNDEF to handle conversion i32/i64 vectors to f32 vectors.
Previously lowerCTLZ_CTTZ_ZERO_UNDEF converted the source to float value by
ISD::UINT_TO_FP. ISD::UINT_TO_FP uses dynamic rounding mode, so the rounding
may make the exponent of the result not as expected when converting i32/i64 to f32.
This is the reason why we constrained lowerCTLZ_CTTZ_ZERO_UNDEF to only handle
an i32 source when the f64 type having the same element count as source is legal.

The patch teaches lowerCTLZ_CTTZ_ZERO_UNDEF converts i32/i64 vectors to f32
vectors by vfcvt.f.xu.v with RTZ rounding mode. Using RTZ is to make sure the
exponent of results is correct, although f32 could not totally represent each
value in i32/i64.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D140782
2023-01-12 14:42:47 +08:00
Francesco Petrogalli
ac1ffd3cac [TargetParser] Generate the defs for RISCV CPUs using llvm-tblgen.
Rework the change to prevent build failures. NFCI.

The failing code was submitted as
cf7a8305a2b4ddfd299c748136cb9a2960ef7089 and reverted via
8bd65e535fb33bc48805bafed8217b16a853e158.

The rework in this new commit prevents failures like the following:

FAILED: tools/clang/lib/Basic/CMakeFiles/obj.clangBasic.dir/Targets/RISCV.cpp.o
/usr/bin/c++  [bunch of non interesting stuff]  -c <path-to>/llvm-project/clang/lib/Basic/Targets/RISCV.cpp
In file included from <path-to>/llvm-project/clang/lib/Basic/Targets/RISCV.cpp:19:
<path-to>/llvm-project/llvm/include/llvm/TargetParser/RISCVTargetParser.h:29:10: fatal error: llvm/TargetParser/RISCVTargetParserDef.inc: No such file or directory
  29 | #include "llvm/TargetParser/RISCVTargetParserDef.inc"
     |          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

These failures happen because the library LLVMTargetParser depends on
RISCVTargetParserTableGen, which is a tablegen target that generates
the list of CPUs in
llvm/TargetParser/RISCVTargetParserDef.inc. This *.inc file is
included by the public header file
llvm/TargetParser/RISCVTargetParser.h.

The header file llvm/TargetParser/RISCVTargetParser.h is also used in
components (clangDriver and clangBasic) that link into
LLVMTargetParser, but on some configurations such components might end
up being built before TargetParser is ready.

The fix is to make sure that clangDriver and clangBasic depend on the
tablegen target RISCVTargetParserTableGen, which generates the .inc
file whether or not LLVMTargetParser is ready.

WRT the original patch at https://reviews.llvm.org/D137517, this
commit is just adding RISCVTargetParserTableGen in the DEPENDS list of
clangDriver and clangBasic.
2023-01-11 11:18:44 +01:00
Francesco Petrogalli
8bd65e535f Revert "[TargetParser] Generate the defs for RISCV CPUs using llvm-tblgen."
This reverts commit cf7a8305a2b4ddfd299c748136cb9a2960ef7089.
2023-01-11 10:22:56 +01:00
Francesco Petrogalli
cf7a8305a2 [TargetParser] Generate the defs for RISCV CPUs using llvm-tblgen.
This patch removes the file `llvm/include/llvm/TargetParser/RISCVTargetParser.def` and replaces it with a tablegen-generated `.inc` file out of `llvm/lib/Target/RISCV/RISCV.td`.

The module system has been updated to make sure we can build clang/llvm with `-DLLVM_ENABLE_MODULES=On`

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D137517
2023-01-11 10:00:04 +01:00
jacquesguan
db3f3243bb [RISCV] Use vfirst.m to extract the first element from mask vector.
This patch uses vfirst.m to extract the first bit of mask.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D139512
2023-01-03 11:24:18 +08:00
Anton Sidorenko
37f9eec142 [RISCV] Allow conversion of fp divisions to fp multiplications by the reciprocal
If the divisor is repeated at least twice, we will convert the FDIVs to the
calculation of the reciprocal and FMULs.

We perform the transformation only under fast-math mode. FDIVs must have
'arcp' flag.

Differential Revision: https://reviews.llvm.org/D140024
2022-12-15 13:00:36 +03:00
jacquesguan
c2f199fa48 [DAGCombiner] Scalarize extend/truncate for splat vector.
This revision scalarizes extend/truncate for splat vector.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D122875
2022-12-12 14:53:10 +08:00
jacquesguan
f7a46aa8fb [RISCV] Fold vector binary operatrion into select with identity constant.
This patch implements shouldFoldSelectWithIdentityConstant for RISCV. It would try to generate vmerge after the binary instruction and let them folded to maksed instruction later.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D131551
2022-12-06 11:19:31 +08:00
Fangrui Song
b0df70403d [Target] llvm::Optional => std::optional
The updated functions are mostly internal with a few exceptions (virtual functions in
TargetInstrInfo.h, TargetRegisterInfo.h).
To minimize changes to LLVMCodeGen, GlobalISel files are skipped.

https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
2022-12-04 22:43:14 +00:00
Krzysztof Parzyszek
864aaa21b4 TargetLowering: convert Optional to std::optional 2022-12-01 16:19:10 -08:00
Philip Reames
7d82c99403 [RISCV][TTI] Account for constant materialization cost when costing arithmetic operations
At the IR level, we generally assume that constants are free to materialize. However, for RISCV due to some quirks of the ISA, materializing arbitrary constants can be rather expensive. We frequently fallback to constant pool loads.

We've been slowly moving in the direction of modeling the cost of the remat as part of the instruction cost. This has the effect of disincentivizing vectorization - mostly SLP - when we'd have to materialize an expensive constant.

We need better modeling of which constants are expensive and not, but the moment let's be consistent with how we model arithmetic and memory instructions. The difference between the two is that arithmetic can sometimes fold a splat operation which stores can not.

Differential Revision: https://reviews.llvm.org/D138941
2022-11-30 07:20:51 -08:00
Philip Reames
b25672ba82 [RISCV] Separate out helper for checking if vector splat supported for operand [nfc] 2022-11-29 11:05:46 -08:00
Stanislav Mekhanoshin
bcaf31ec3f [AMDGPU] Allow finer grain control of an unaligned access speed
A target can return if a misaligned access is 'fast' as defined
by the target or not. In reality there can be different levels
of 'fast' and 'slow'. This patch changes the boolean 'Fast'
argument of the allowsMisalignedMemoryAccesses family of functions
to an unsigned representing its speed.

A target can still define it as it wants and the direct translation
of the current code uses 0 and 1 for current false and true. This
makes the change an NFC.

Subsequent patch will start using an actual value of speed in
the load/store vectorizer to compare if a vectorized access going
to be not just fast, but not slower than before.

Differential Revision: https://reviews.llvm.org/D124217
2022-11-17 09:23:53 -08:00
Yeting Kuo
ed9638c44b [VP][RISCV] Add vp.nearbyint and RISC-V support.
nearbyint has the property to execute without exception.
For not modifying fflags, the patch added new machine opcode
PseudoVFROUND_NOEXCEPT_V that expands vfcvt.x.f.v and vfcvt.f.x.v between a pair
of frflags and fsflags.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D137685
2022-11-16 14:05:35 +08:00
Craig Topper
dde8423f21 [RISCV] Expand i32 abs to negw+max at isel.
This adds a RISCVISD::ABSW to remember that we started with an i32
abs. Previously we used a DAG combine of (sext_inreg (abs)) to
delay emitting a freeze from type legalization in order to make
ComputeNumSignBits optimizations work on other promoted nodes.

This new approach always uses negw+max even if the result doesn't
need to be sign extended. This helps the RISCVSExtWRemoval pass
if the sext.w is in another basic block.
2022-11-14 19:44:05 -08:00
Craig Topper
6254495c6b [RISCV] Move RVVBitsPerBlock to TargetParser.h so we can use it in clang. NFC
Differential Revision: https://reviews.llvm.org/D137266
2022-11-02 13:09:14 -07:00
Yeting Kuo
71e4e35581 [VP][RISCV] Add vp.rint and RISC-V support.
FRINT uses dynamic rounding mode instead of static rounding mode. The patch
rename VFCVT_X_F_VL to VFCVT_RM_X_F_VL for static rounding mode uses and added
new ISDNode VFCVT_X_F_VL directly selected to PseudoVFCVT_X_F_V.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D136662
2022-11-01 14:52:47 +08:00
Craig Topper
e94dc58dff [RISCV] Inline scalar ceil/floor/trunc/rint/round/roundeven.
This avoids the call overhead as well as the the save/restore of
fflags and the snan handling in the libm function.

The save/restore of fflags and snan handling are needed to be
correct for -ftrapping-math. I think we can ignore them in the
default environment.

The inline sequence will generate an invalid exception for nan
and an inexact exception if fractional bits are discarded.

I've used a custom inserter to explicitly create the control flow
around the float->int->float conversion.

We can probably avoid the final fsgnj after the conversion for
no signed zeros FMF, but I'll leave that for future work.

Note the comparison constant is slightly different than glibc uses.
They use 1<<53 for double, I'm using 1<<52. I believe either are valid.
Numbers >= 1<<52 can't have any fractional bits. It's ok to do the
float->int->float conversion on numbers between 1<<53 and 1<<52 since
they will all fit in 64. We only have a problem if the double can't fit
in i64

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D136508
2022-10-26 14:36:49 -07:00
Philip Reames
60c91fd364 [RISCV] Disallow scale for scatter/gather
RISCV doesn't actually support a scaled form of indexed load and store. We previously handled this by forming the scaled SDNode, and then doing custom legalization during lowering. This patch instead adds a callback via TLI to prevent formation entirely.

This has two effects:
* First, the GEP gets expanded (and used). Instead of the shift being created with an SDLoc of the memory operation, it has the SDLoc of the GEP instruction. This avoids the scheduler perturbing IR order when there's no reason to.
* Second, we fix what appears to be a bug in index calculation with RV32. The rules for GEPs require index calculation be done in particular bitwidth, and it appears the custom legalization code got this wrong for the case where index type exceeds pointer width. (Or at least, I trust the generic GEP lowering to be correct a lot more.)

The DAGCombiner change to handle VPScatter/VPGather is technically separate, but is required to prevent a regression on those intrinsics.

Differential Revision: https://reviews.llvm.org/D134382
2022-09-22 15:31:26 -07:00