970 Commits

Author SHA1 Message Date
Fraser Cormack
f924a3d474 [SelectionDAG] Support scalable-vector splats in yet more cases
This patch extends support for (scalable-vector) splats in the
DAGCombiner via the `ISD::matchBinaryPredicate` function, which enable a
variety of simple combines of constants.

Users of this function may now have to distinguish between
`BUILD_VECTOR` and `SPLAT_VECTOR` vector operands. The way of dealing
with this in-tree follows the approach added for
`ISD::matchUnaryPredicate` implemented in D94501.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D106575
2021-07-26 10:15:08 +01:00
Craig Topper
c63dbd8501 [RISCV] Custom lower (i32 (fptoui/fptosi X)).
I stumbled onto a case where our (sext_inreg (assertzexti32 (fptoui X)), i32)
isel pattern can cause an fcvt.wu and fcvt.lu to be emitted if
the assertzexti32 has an additional user. If we add a one use check
it would just cause a fcvt.lu followed by a sext.w when only need
a fcvt.wu to satisfy both users.

To mitigate this I've added custom isel and new ISD opcodes for
fcvt.wu. This allows us to keep know it started life as a conversion
to i32 without needing to match multiple nodes. ComputeNumSignBits
has been taught that this new nodes produces 33 sign bits. To
prevent regressions when we need to zero extend the result of an
(i32 (fptoui X)), I've added a DAG combine to convert it to an
(i64 (fptoui X)) before type legalization. In most cases this would
happen in InstCombine, but a zero_extend can be created for function
returns or arguments.

To keep everything consistent I've added new nodes for fptosi as well.

Reviewed By: luismarques

Differential Revision: https://reviews.llvm.org/D106346
2021-07-24 10:50:43 -07:00
Fraser Cormack
1ffc369394 [RISCV] Add a test showing an incorrect vsetvli insertion
This patch adds a reduced test case which identifies an illegal vsetvli
inserted by the compiler. The compiler emits a vsetvli which is intended
to preserve VL with the SEW/LMUL ratio e32/m1 when in fact the VL could
have been set by e64/m2 in a predecessor block.

Differential Revision: https://reviews.llvm.org/D106286
2021-07-23 09:27:06 -07:00
Craig Topper
5edccc4581 [RISCV] Avoid using x0,x0 vsetvli for vmv.x.s and vfmv.f.s unless we know the sew/lmul ratio is constant.
Since we're changing VTYPE, we may change VLMAX which could
invalidate the previous VL. If we can't tell if it is safe we
should use an AVL of 1 instead of keeping the old VL.

This is a quick fix. We may want to thread VL to the pseudo
instruction instead of making up a value. That will require ISD
opcode changes and changes to the C intrinsic interface.

This fixes the issue raised in D106286.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D106403
2021-07-23 09:12:05 -07:00
Fraser Cormack
99ed6ce2bd [SelectionDAG][RISCV] Add tests showing missed scalable-splat optimizations
These tests show missed opportunities in the SelectionDAG layer when
dealing with scalable-vector splats. All of these are handled for the
equivalent `ISD::BUILD_VECTOR` code, and the tests have largely been
translated from the equivalent X86 tests.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D106574
2021-07-23 06:58:16 +01:00
Hsiangkai Wang
4b2dd318dd [RISCV] Add FrameSetup/FrameDestroy flag to prologue/epilog instructions.
Differential Revision: https://reviews.llvm.org/D105086
2021-07-23 11:35:19 +08:00
Fraser Cormack
b115c038d2 [RISCV] Fix a crash when lowering split float arguments
Lowering certain float vectors without legal vector types could cause a
crash due to a bad interaction between passing floats via GPRs and
argument splitting. Split vector floats appear just like scalar floats.
Under certain situations we choose to pass these float arguments via
GPRs and use an XLenVT location and set the 'BCvt' info to track how
they must be converted back to floating-point values. However, later
logic for handling split arguments may take over, in which case we lose
the previous information and set the 'Indirect' info, thus incorrectly
lowering to integer types.

I don't believe that we would have come across the notion of split
floating-point arguments before. This patch addresses the issue by
updating the lowering so that split arguments are only passed indirectly
when they are scalar integer types.

This has some change to how we lower some larger illegal float vectors,
as can be seen in 'fastcc-float.ll' where the vector is now passed
partly in registers and partly on the stack.

Reviewed By: luismarques

Differential Revision: https://reviews.llvm.org/D102852
2021-07-22 09:55:26 +01:00
Fraser Cormack
7b3a69bc16 [RISCV] Lower more BUILD_VECTOR sequences to RVV's VID
This relands a6ca88e908b5befcd9b0f8c8cb40f53095cc17bc which was originally
reverted due to overflow bugs in e3fa2b1eab60342dc882b7b888658b03c472fa2b.

This patch teaches the compiler to identify a wider variety of
`BUILD_VECTOR`s which form integer arithmetic sequences, and to lower
them to `vid.v` with modifications for non-unit steps and non-zero
addends.

The sequences handled by this optimization must either be monotonically
increasing or decreasing. Consecutive elements holding the same value
indicate a fractional step which, while simple mathematically,
becomes more complex to handle both in the realm of lossy integer
division and in the presence of `undef`s.

For example, a common "interleaving" shuffle index will be lowered by
LLVM to both `<0,u,1,u,2,...>` and `<u,0,u,1,u,...>` `BUILD_VECTOR`
nodes. Either of these would ideally be lowered to `vid.v` shifted right
by 1. Detection of this sequence in presence of general `undef` values
is more complicated, however: `<0,u,u,1,>` could match either
`<0,0,0,1,>` or `<0,0,1,1,>` depending on later values in the sequence.
Both are possible, so backtracking or multiple passes is inevitable.

Sticking to monotonic sequences keeps the logic simpler as it can be
done in one pass. Fractional steps will likely be a separate
optimization in a future patch.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D104921
2021-07-22 09:36:12 +01:00
ShihPo Hung
8d86562e5f [RegisterCoalescer] Make resolveConflicts aware of earlyclobber
Prior to this patch, it skipped the instruction defining VNI when checking if the tainted lanes are used.
In the given example, VRGATHER is an illegal instruction because its DstReg overlaps with SrcReg.

Therefore we need to check the defining instruction as well when there is an earlyclobber constraint.

Reviewed By: qcolombet

Differential Revision: https://reviews.llvm.org/D105684
2021-07-22 12:11:10 +08:00
Ben Shi
9e5c5afc7e [RISCV] Optimize multiplication in the zba extension with SH*ADD
This patch make the following optimization.

(mul x, 3 * power_of_2) -> (SLLI (SH1ADD x, x), bits)
(mul x, 5 * power_of_2) -> (SLLI (SH2ADD x, x), bits)
(mul x, 9 * power_of_2) -> (SLLI (SH3ADD x, x), bits)

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D105796
2021-07-22 10:28:41 +08:00
Eli Friedman
0ca46a1757 [SelectionDAG] Fix the representation of ISD::STEP_VECTOR.
The existing rule about the operand type is strange.  Instead, just say
the operand is a TargetConstant with the right width.  (Legalization
ignores TargetConstants, so it doesn't matter if that width is legal.)

Highlights:

1. I had to substantially rewrite the AArch64 isel patterns to expect a
TargetConstant.  Nothing too exotic, but maybe a little hairy. Maybe
worth considering a target-specific node with some dagcombines instead
of this complicated nest of isel patterns.
2. Our behavior on RV32 for vectors of i64 has changed slightly. In
particular, we correctly preserve the width of the arithmetic through
legalization.  This changes the DAG a bit. Maybe room for
improvement here.
3. I explicitly defined the behavior around overflow. This is necessary
to make the DAGCombine transforms legal, and I don't think it causes any
practical issues.

Differential Revision: https://reviews.llvm.org/D105673
2021-07-21 10:58:40 -07:00
Ben Shi
d3738a09fb [RISCV][test] Add tests for mul optimization in the zba extension with SH*ADD
These tests will show the following optimization by future patches.

(mul x, 11) -> (SH1ADD (SH2ADD x, x), x)
(mul x, 19) -> (SH1ADD (SH3ADD x, x), x)
(mul x, 13) -> (SH2ADD (SH1ADD x, x), x)
(mul x, 21) -> (SH2ADD (SH2ADD x, x), x)
(mul x, 37) -> (SH2ADD (SH3ADD x, x), x)
(mul x, 25) -> (SH3ADD (SH1ADD x, x), x)
(mul x, 41) -> (SH3ADD (SH2ADD x, x), x)
(mul x, 73) -> (SH3ADD (SH3ADD x, x), x)

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D106031
2021-07-21 10:16:56 +08:00
Craig Topper
81efb82570 [RISCV] Teach RISCVMatInt about cases where it can use LUI+SLLI to replace LUI+ADDI+SLLI for large constants.
If we need to shift left anyway we might be able to take advantage
of LUI implicitly shifting its immediate left by 12 to cover part
of the shift. This allows us to use more bits of the LUI immediate
to avoid an ADDI.

isDesirableToCommuteWithShift now considers compressed instruction
opportunities when deciding if commuting should be allowed.

I believe this is the same or similar to one of the optimizations
from D79492.

Reviewed By: luismarques, arcbbb

Differential Revision: https://reviews.llvm.org/D105417
2021-07-20 09:22:06 -07:00
Craig Topper
2ad2c5d457 [RISCV] Add -mattr=+c command lines to add-before-shl.ll to prepare for D105417. NFC 2021-07-20 09:22:06 -07:00
Craig Topper
98d4adc2d1 [RISCV] Add custom isel to select (and (srl X, C1), C2) and (and (shl X, C1), C2)
Replace some existing isel patterns that are covered by the new
code. SLLIUWPat has been removed in favor of folding its root case
into the new code. The other uses in isel patterns for shXadd.uw
have been switched to using hardcoded AND masks.

This is based on the original version of D49585 from ARM. The final
version of that was made a DAG combine, but I've chosen to keep it
as custom isel. I'm not convinced DAG combine is as good with
shift pairs as it is with and+shift. I saw some issues optimizing
the shifts created by vscale lowering if an and isn't created for
from a shift pair.

Reviewed By: luismarques

Differential Revision: https://reviews.llvm.org/D106230
2021-07-20 08:53:55 -07:00
Craig Topper
84877a098a [RISCV] Use unordered indexed loads for MGATHER.
I don't think the semantics of the llvm masked gather intrinsic care
about the order the elements are loaded. For example, type legalization
by splitting will chain them in parallel. This is different than
scatter which we do chain in order.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D106025
2021-07-20 08:46:02 -07:00
Craig Topper
4f1270a61e [RISCV] Add test cases to show an issue with our fcvt.wu isel patterns on RV64.
The pattern we match is (sext_inreg (assertzexti32 (fp_to_uint)), i32). If
the assertzexti32 has an additional user we'll end up emitting
an fcvt.wu and an fcvt.lu.

This can happen if the original fp_to_uint before type legalization
has one user that causes a sext_inreg to be emitted and one that
doesn't.
2021-07-19 22:58:42 -07:00
Craig Topper
50302feb1d [SelectionDAG][RISCV] Use isSExtCheaperThanZExt to control whether sext or zext is used for constant folding any_extend.
RISCV would prefer a sign extended constant since that works better
with our constant materialization. We have an existing TLI hook we
use to control sign extension of setcc operands in type legalization.
That hook happens to do the right check we need here, but might be
straying from its original purpose. With only RISCV defining this
hook in tree, I wasn't sure if it was worth adding another hook
with identical behavior.

This is an alternative to D105785 where I tried to handle this in
the RISCV backend by not creating ANY_EXTENDs in some places.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D105918
2021-07-19 09:25:28 -07:00
Craig Topper
00c1cc867f [RISCV] Add more i32 srem/sdiv with power of 2 constant tests. NFC
Add a small power 2 srem test to match existing sdiv test. Add
larger power of 2 test to both.

The larger constant test shows materialization of a constant
for an AND in the RV64 code. We should be using W shift instructions
to match the RV32 code.
2021-07-18 00:21:14 -07:00
Craig Topper
d0f8047d37 [RISCV] Teach computeKnownBitsForTargetNode that VLENB will never be more than 65536/8. 2021-07-17 11:24:20 -07:00
ShihPo Hung
be8159bfa5 [RISCV][RVV] Precommit a test case for D105684
Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D105685
2021-07-18 00:43:17 +08:00
Craig Topper
173332d175 [RISCV] Manually emit the best shift for VSCALE lowering to improve codegen.
We assume VLENB is a multiple of 8 and previously relied on shift
pairs being optimized to an AND+SHL/SHR and computeKnownBits
removing the AND. This doesn't happen if (vlenb >> 3) gets CSEd
to have multiple uses. This patch manually emits the best shift
to workaround this.
2021-07-17 00:52:07 -07:00
Craig Topper
2e65ec1010 [RISCV] Rename the fixed vector vwmacc tests to have the 'm' in their filenames. NFC 2021-07-16 10:43:17 -07:00
Craig Topper
8f0343cc9c [RISCV] Use tail agnostic policy for fixed vector vwmacc(u).
This adds new pseudoinstructions with ForceTailAgnostic set. This
matches what we did for non-widening VMACC. We should move to a
tail policy operand on the pseudos when we expand the intrinsic
interface to include the tail policy.
2021-07-16 10:41:09 -07:00
Craig Topper
4dbb788068 [RISCV] Teach constant materialization that it can use zext.w at the end with Zba to reduce number of instructions.
If the upper 32 bits are zero and bit 31 is set, we might be able to
use zext.w to fill in the zeros after using an lui and/or addi.

Most of this patch is plumbing the subtarget features into the constant
materialization.

Reviewed By: luismarques

Differential Revision: https://reviews.llvm.org/D105509
2021-07-16 09:35:56 -07:00
Fraser Cormack
e3fa2b1eab Revert "[RISCV] Lower more BUILD_VECTOR sequences to RVV's VID"
This reverts commit a6ca88e908b5befcd9b0f8c8cb40f53095cc17bc.

More caution is required to avoid overflow/underflow. Thanks to the
santizers for catching this.
2021-07-16 15:00:20 +01:00
Fraser Cormack
a6ca88e908 [RISCV] Lower more BUILD_VECTOR sequences to RVV's VID
This patch teaches the compiler to identify a wider variety of
`BUILD_VECTOR`s which form integer arithmetic sequences, and to lower
them to `vid.v` with modifications for non-unit steps and non-zero
addends.

The sequences handled by this optimization must either be monotonically
increasing or decreasing. Consecutive elements holding the same value
indicate a fractional step which, while simple mathematically,
becomes more complex to handle both in the realm of lossy integer
division and in the presence of `undef`s.

For example, a common "interleaving" shuffle index will be lowered by
LLVM to both `<0,u,1,u,2,...>` and `<u,0,u,1,u,...>` `BUILD_VECTOR`
nodes. Either of these would ideally be lowered to `vid.v` shifted right
by 1. Detection of this sequence in presence of general `undef` values
is more complicated, however: `<0,u,u,1,>` could match either
`<0,0,0,1,>` or `<0,0,1,1,>` depending on later values in the sequence.
Both are possible, so backtracking or multiple passes is inevitable.

Sticking to monotonic sequences keeps the logic simpler as it can be
done in one pass. Fractional steps will likely be a separate
optimization in a future patch.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D104921
2021-07-16 10:35:13 +01:00
Fraser Cormack
03a4702c88 [RISCV] Fix the neutral element in vector 'fadd' reductions
Using positive zero as the neutral element in 'fadd' reductions, while
it generates better code, is incorrect. The correct neutral element is
negative zero: 0.0 + -0.0 = 0.0, whereas -0.0 + -0.0 = -0.0.

There are perhaps more optimal lowerings of negative zero avoiding
constant-pool loads which could be left as future work.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D105902
2021-07-14 10:18:38 +01:00
Craig Topper
1e670dc7d7 [RISCV] Use DIVUW/REMUW/DIVW instructions for i8/i16/i32 udiv/urem/sdiv when LHS is constant.
We don't really have optimizations for division with a constant
LHS. If we don't use a W instruction we end up needing to sign
or zero extend the RHS to use the 64-bit instruction.

I had to sign_extend i32 constants on the LHS instead of using
any_extend which becomes zero_extend. If we don't do this, constants
that were originally negative become harder to materialize. I think
this problem exists for more of our W instruction cases. For example
(i32 (shl -1, X)), but we don't have lit tests. I'll work on that
as a follow up.

I also left a FIXME for enabling W instruction for RHS constants
under -Oz.

Reviewed By: luismarques

Differential Revision: https://reviews.llvm.org/D105769
2021-07-13 10:33:57 -07:00
Craig Topper
46e8970817 [RISCV] Prevent use of t0(aka x5) as rs1 for jalr instructions.
Some microarchitectures treat rs1=x1/x5 on jalr as a hint to pop
the return-address stack. We should avoid using x5 on jalr
instructions since we aren't using x5 as an alternate link register.

Differential Revision: https://reviews.llvm.org/D105875
2021-07-13 09:46:21 -07:00
Fangrui Song
3d89fb4d13 [RISCV] Support machine constraint "S"
Similar to D46745, "S" represents an absolute symbolic operand, which
can be used to specify the access models, e.g.

  extern int var;
  void *addr_via_asm() {
    void *ret;
    asm("lui %0, %%hi(%1)\naddi %0,%0,%%lo(%1)" : "=r"(ret) : "S"(&var));
    return ret;
  }

'S' is documented in trunk GCC: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101275

Reviewed By: luismarques

Differential Revision: https://reviews.llvm.org/D105254
2021-07-13 09:30:09 -07:00
Fraser Cormack
d991b7212b [RISCV] Pass undef VECTOR_SHUFFLE indices on to BUILD_VECTOR
Often when lowering vector shuffles, we split the shuffle into two
LHS/RHS shuffles which are then blended together. To do so we split the
original indices into two, indexed into each respective vector. These
two index vectors are then separately lowered as BUILD_VECTORs.

This patch forwards on any undef indices to the BUILD_VECTOR, rather
than having the VECTOR_SHUFFLE lowering decide on an optimal concrete
index. The motiviation for ths change is so that we don't duplicate
optimization logic between the two lowering methods and let BUILD_VECTOR
do what it does best.

Propagating undef in this way allows us, for example, to generate
`vid.v` to produce the LHS indices of commonly-used interleave-type
shuffles. I have designs on further optimizing interleave-type and other
common shuffle patterns in the near future.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D104789
2021-07-13 10:41:54 +01:00
Eli Friedman
ec1cdee6aa [SelectionDAG][RISCV] Support @llvm.vscale.i64() on 32-bit targets.
Not really useful on its own, but D105673 depends on it.

Differential Revision: https://reviews.llvm.org/D105840
2021-07-12 14:53:42 -07:00
Craig Topper
f0393deb33 [RISCV] Add tests for suboptimal handling of negative constants for i32 uaddo/usubo on RV64. NFC
We end up zero extending constants when we promote to i64. We
should sign extend instead to allow use of addiw or improve
constant materialization.
2021-07-11 12:38:51 -07:00
Craig Topper
6644a61121 [RISCV] Add tests for suboptimal handling of negative constants on the LHS of i32 shifts/rotates/subtracts on RV64. NFC
The constants end up getting zero extended to i64, but sign extend
would be better for constant materialization. We're using W
instructions so either behavior is correct since the upper bits
aren't read.
2021-07-11 11:54:34 -07:00
Craig Topper
1410aab622 [RISCV] Remove stale FIXME from a test. NFC
sext has been used for sltu/sltiu since e0e62e97.
2021-07-11 10:25:07 -07:00
Craig Topper
99b8c46828 [RISCV] Restore non-constant srem test I accidentally deleted. NFC 2021-07-10 18:02:13 -07:00
Craig Topper
86109fa9e8 [RISCV] Add test cases for div/rem with constant left hand side. NFC
Some of these would produce better code if we used W instructions,
but constant LHS currently prevents that.
2021-07-10 17:22:40 -07:00
Ben Shi
ed102ce20a [RISCV][test] Add new tests for mul optimization in the zba extension with SH*ADD
This patch will show the following optimization by future patches.

(mul x imm) -> (SH1ADD x, (SLLI x, bits)) when imm = 2^n + 2.
(mul x imm) -> (SH2ADD x, (SLLI x, bits)) when imm = 2^n + 4.
(mul x imm) -> (SH3ADD x, (SLLI x, bits)) when imm = 2^n + 8.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D105614
2021-07-09 09:48:23 +08:00
Craig Topper
12d51f95fe [RISCV] Implement lround*/llround*/lrint*/llrint* with fcvt instruction with -fno-math-errno
These are fp->int conversions using either RMM or dynamic rounding modes.

The lround and lrint opcodes have a return type of either i32 or
i64 depending on sizeof(long) in the frontend which should follow
xlen. llround/llrint should always return i64 so we'll need a libcall
for those on rv32.

The frontend will only emit the intrinsics if -fno-math-errno is in
effect otherwise a libcall will be emitted which will not use
these ISD opcodes.

gcc also does this optimization.

Reviewed By: arcbbb

Differential Revision: https://reviews.llvm.org/D105206
2021-07-06 11:43:22 -07:00
Craig Topper
2b5e53111a [RISCV] Add support for matching vwmul(u) and vwmacc(u) from fixed vectors.
This adds a DAG combine to detect sext/zext inputs and emit a
new ISD opcode. The extends will either be removed or replaced
with narrower extends.

Isel patterns are used to match add and widening mul to vwmacc
similar to the recently added vmacc patterns.

There's still some work to be to match vmulsu.
We should also rewrite splats that were extended as scalars and
then splatted.

Reviewed By: arcbbb

Differential Revision: https://reviews.llvm.org/D104802
2021-07-06 10:24:31 -07:00
Matt Arsenault
fae05692a3 CodeGen: Print/parse LLTs in MachineMemOperands
This will currently accept the old number of bytes syntax, and convert
it to a scalar. This should be removed in the near future (I think I
converted all of the tests already, but likely missed a few).

Not sure what the exact syntax and policy should be. We can continue
printing the number of bytes for non-generic instructions to avoid
test churn and only allow non-scalar types for generic instructions.

This will currently print the LLT in parentheses, but accept parsing
the existing integers and implicitly converting to scalar. The
parentheses are a bit ugly, but the parser logic seems unable to deal
without either parentheses or some keyword to indicate the start of a
type.
2021-06-30 16:54:13 -04:00
Craig Topper
3b6dfa381e [RISCV] Protect the SHL/SRA/SRL handlers in LowerOperation against being called for an illegal i32 shift amount.
It seems it is possible for DAG combine to create a shl with an
i64 result type and an i32 shift amount. This is ok before type
legalization since the type don't need to match in SelectionDAG.
This results in type legalization calling LowerOperation to
legalize just the amount. We weren't expecting this so we
asserted for not finding a fixed vector shift.

To fix this, I've added a check for the fixed vector case and
returned SDValue() to get the default type legalizer. I've
factored all shifts together and added a fixed vector specific
handler to avoid repeating similar code for each in
LowerOperation.

The particular case I found was exposed by D104581, but the bad
shift is created after that patch triggers.
2021-06-29 09:45:13 -07:00
Craig Topper
4c92e31dd0 [RISCV] Add tests for __builtin_parity idiom.
We use (and (ctpop X), 1) to represent parity.

The generated code for i32 parity on RV64 has more instructions than
necessary which I hope to improve in a followup patch.

Also add missing test for i64 ctpop.
2021-06-27 12:37:29 -07:00
Craig Topper
010f0f000f Revert "[RISCV] Use zexti32/sexti32 in srliw/sraiw isel patterns to improve usage of those instructions."
I thought this might help with another optimization I was
thinking about, but I don't think it will. So it just wastes
compile time calling computeKnownBits for no benefit.

This reverts commit 81b2f95971edd47a0057ac4a77b674d7ea620c01.
2021-06-27 10:33:43 -07:00
Craig Topper
81b2f95971 [RISCV] Use zexti32/sexti32 in srliw/sraiw isel patterns to improve usage of those instructions. 2021-06-26 11:57:26 -07:00
Craig Topper
d4f4a1ba62 [RISCV] Add DAG combine to detect opportunities to replace (i64 (any_extend (i32 X)) with sign_extend.
If type legalization is going to insert a sign_extend for other users
of X and we can fold the sign_extend into ADDW/MULW/SUBW, it is
better to replace the ANY_EXTEND so we don't end up with a separate
ADD/MUL/SUB instruction for the users of the ANY_EXTEND.

I'm only handling setcc uses right now, but there are other
instructions that force sign_extends like ashr.

There are probably other *W instructions we could use in addition
to ADDW/SUBW/MULW.

My motivating case was a loop terminating compare and a phi use
as seen in the new test file.

Reviewed By: asb

Differential Revision: https://reviews.llvm.org/D104581
2021-06-25 23:16:37 -07:00
Fraser Cormack
ab1bd25593 [RISCV] Permit larger RVV stacks and stack offsets
This patch teaches the compiler to generate code to handle larger RVV
stack sizes and stack offsets which resolve an amount larger than 2047
vector registers in size.

The previous behaviour was asserting on such large values as it was only
able to materialize the constant by feeding it to the 12-bit immediate
of an `ADDI` instruction. The compiler can now materialize this amount
into a temporary register before continuing with the computation.

A test case for this scenario is included which also checks that the
temporary register used to materialize the amount doesn't require an
additional spill slot over what we're already reserving for RVV code.

Reviewed By: rogfer01

Differential Revision: https://reviews.llvm.org/D104727
2021-06-25 07:17:33 +01:00
Fraser Cormack
a4729f7f88 [RISCV] Lower RVV vector SELECTs to VSELECTs
This patch optimizes the code generation of vector-type SELECTs (LLVM
select instructions with scalar conditions) by custom-lowering to
VSELECTs (LLVM select instructions with vector conditions) by splatting
the condition to a vector. This avoids the default expansion path which
would either introduce control flow or fully scalarize.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D104772
2021-06-24 10:12:51 +01:00
Craig Topper
91319534ba [CGP][RISCV] Teach CodeGenPrepare::optimizeSwitchInst to honor isSExtCheaperThanZExt.
This optimization pre-promotes the input and constants for a
switch instruction to a legal type so that all the generated compares
share the same extend. Since RISCV prefers sext for i32 to i64
extends, we should honor that to use sext.w instead of a pair
of shifts.

Reviewed By: jrtc27

Differential Revision: https://reviews.llvm.org/D104612
2021-06-23 15:38:11 -07:00