If we expand (uaddo X, 1) we previously expanded the overflow calculation
as (X + 1) <u X. This potentially increases the live range of X and
can prevent X+1 from reusing the register that previously held X.
Since we're adding 1, overflow only occurs if X was UINT_MAX in which
case (X+1) would be 0. So this patch adds a special case to expand
the overflow calculation to (X+1) == 0.
This seems to help with uaddo intrinsics that get introduced by
CodeGenPrepare after LSR. Alternatively, we could block the uaddo
transform in CodeGenPrepare for this case.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D122933
The splat_vector will be legalized to build_vector eventually
anyway. This patch makes it take fewer steps.
Unfortunately, this results in some codegen changes. It looks
like it comes down to how the nodes were ordered in the topological
sort for isel. Because the build_vector is created earlier we end up
with a different ordering of nodes.
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D122185
This patch fixes a (seemingly very rare) crash during vector constant
folding introduced in D113300.
Normally, during legalization, if we create an illegally-typed node during
a failed attempt at constant folding it's cleaned up before being
visited, due to it having no uses.
If, however, an illegally-typed node is created during one round of
legalization and isn't cleaned up, it's possible for a second round of
legalization to create new illegally-typed nodes which add extra uses to
the old illegal nodes. This means that we can end up visiting the old
nodes before they're known to be dead, at which point we crash.
I'm not happy about this fix. Creating illegal types at all seems like a
bad idea, but we all-too-often rely on illegal constants being
successfully folded and being fixed up afterwards. However, we can't
rely on constant folding actually happening, and we don't have a
foolproof way of peering into the future.
Perhaps the correct fix is to revisit the node-iteration order during
legalization, ensuring we visit all uses of nodes before the nodes
themselves. Or alternatively we could try and clean up dead nodes
immediately after failing constant folding.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D122382
masked compare and vmsbf/vmsif/vmsof are always tail agnostic, we could
check maskedoff value to decide mask policy rather than have a addtional
policy operand.
Reviewed By: craig.topper, arcbbb
Differential Revision: https://reviews.llvm.org/D122456
This reverts commit 10fd2822b77e12215b4ea82fc6d0a052961eb9d9.
I have a better implementation for those operations without the
additional policy operand.
masked compare and vmsbf/vmsif/vmsof are always tail agnostic so we could
assume undef maskedoff is mask agnostic.
Differential Revision: https://reviews.llvm.org/D122455
On RV32, we need to type legalize i64 scalar arguments to intrinsics.
We usually do this by splatting the value into a vector separately.
If the scalar happens to be sign extended, we can continue using a .vx
intrinsic.
We already special cased sign extended constants, this extends it
to any sign extended value.
I've only added tests for one case of vadd. Most intrinsics go
through the same check.
Reviewed By: khchen
Differential Revision: https://reviews.llvm.org/D122186
The mask being NoRegister prevented the existing aliases from matching
since NoRegister isn't in the VMV0 register class.
To workaround this I've added new aliases that look for zero_reg.
I had to motify tablegen to generate matching code for zero_reg.
And as a consequence, I had to change the EmitPriority for an ARM
alias that used zero_reg that started printing.
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D121496
intrinsics.
Those operations are updated under a tail agnostic policy, but they
could have mask agnostic or undisturbed.
Reviewed By: rogfer01
Differential Revision: https://reviews.llvm.org/D120228
Add the UsesMaskPolicy flag to indicate the operations result
would be effected by the mask policy. (ex. mask operations).
It means RISCVInsertVSETVLI should decide the mask policy according
by mask policy operand or passthru operand.
If UsesMaskPolicy is false (ex. unmasked, store, and reduction operations),
the mask policy could be either mask undisturbed or agnostic.
Currently, RISCVInsertVSETVLI sets UsesMaskPolicy operations default to
MA, otherwise to MU to keep the current mask policy would not be changed
for unmasked operations.
Add masked-tama, masked-tamu, masked-tuma and masked-tumu test cases.
I didn't add all operations because most of implementations are using
the same pseudo multiclass. Some tests maybe be duplicated in different
tests. (ex. masked vmacc with tumu shows in vmacc-rv32.ll and masked-tumu)
I think having different tests only for policy would make the testing
clear.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D120226
On RV32, we need to type legalize i64 scalar arguments to intrinsics.
We usually do this by splatting the value into a vector separately.
If the scalar happens to be sign extended, we can continue using a .vx
intrinsic.
We already special cased sign extended constants, this extends it
to any sign extended value.
I've only added tests for one case of vadd. Most intrinsics go
through the same check. I can add more tests if we're concerned.
Differential Revision: https://reviews.llvm.org/D122186
Currently we allow half types in vectors if the scalar Zfh extension
is enabled. This behavior is not inline with the vector spec. For f32
and f64 types, the Zve32f, Zve64f, Zve64d, and V explicitly control
the availablity of floating point types in vectors.
In order to make our compiler compliant, we either need to remove all support
for half in vectors or we need an extension to control it.
Draft spec here https://github.com/riscv/riscv-v-spec/pull/780
Reviewed By: kito-cheng
Differential Revision: https://reviews.llvm.org/D121345
The InstAlias framework cannot match registers against zero_reg, which
RVV uses to encode unmasked operations.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D92228
We already do this for RISCVISD::VFMV_S_F_VL and the vfmv.v.f
intrinsic.
Reviewed By: kito-cheng
Differential Revision: https://reviews.llvm.org/D121429
This helps us form vfnmsub, vfnmadd, and vfmusb from masked VP
intrinsics.
I've used "srcvalue" for the mask parameter in the fneg nodes. We
can't match "V0" because that doesn't ensure the mask the is the same.
Instead it matches two different nodes and generates two copies to
V0 of those separate values.
Reviewed By: rogfer01
Differential Revision: https://reviews.llvm.org/D120287
vmsgeu.vi with 0 is always true, but in the masked with mask undisturbed
policy, we still need to keep inactive elelemt which come from maskedoff.
We could return mask directly if it's mask agnostic policy in the future.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D121080
We should not emit a tail agnostic vlse for a tail undisturbed vmv.s.x
In D119688:
- if (IsScalarMove && !Node->getOperand(0).isUndef())
+ bool HasPassthruOperand = Node->getOpcode() != ISD::SPLAT_VECTOR;
+ if (HasPassthruOperand && !IsScalarMove &&
!Node->getOperand(0).isUndef())
break;
The IsScalarMove check in the if statement had been changed.
Differential Revision: https://reviews.llvm.org/D120963
This lowers VECTOR_SPLICE of scalable vectors to a slidedown follow by a slideup.
Fixed vectors are encouraged to use shufflevector instruction. The equivalent patch
for fixed vectors is D119039.
I've used a tail agnostic slidedown and limited the VL to only the
elements that will not be overwritten by the slideup. The slideup
uses VLMax for its VL. It unfortunately uses tail undisturbed policy
but it isn't required as there is no tail. We just need the merge
operand to carry the bits for the lower portion of the result.
Care was taken to ensure that either the slideup or slidedown will
be able to use a .vi instruction when the immediate is small. Which
one uses the immediate depends on the sign of the immediate.
Reviewed By: frasercrmck, ABataev
Differential Revision: https://reviews.llvm.org/D119303
vcpop and vfirst are still useful when VL=0.
vcpop equivalents to li 0 and vfirst equivalents to li -1,
since no mask elements are active.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D120302
Clang computes the default ABI if -mabi is empty
and encode it in LLVM IR module flag since D105555.
For correctness, llc need to give the same target-abi
(Options.MCOptions.ABIName) with ABI encoded in IR.
The getSubtargetImpl already has a check for them only if
Options.MCOptions.ABIName is not empty.
In order to get more robustness we could have a check for
explicit ABI, but now we have two different logic to
compute the default ABI.
The front-end ABI is defautl to the ilp32/ilp32e/lp64, and
ilp32d/lp64d when hardware support for extension D.
The backend ABI is default to the ilp32/ilp32e/lp64.
Reviewed by: asb, jrtc27
Differential Revision: https://reviews.llvm.org/D118333
Trying to reduce the diffs from D118333 for cases where it makes
more sense to use an FP ABI.
Reviewed By: asb, kito-cheng
Differential Revision: https://reviews.llvm.org/D120447
The goal is support tail and mask policy in RVV builtins.
We focus on IR part first.
The nomask vector Multiply-Add need a policy operand
because merge value could not be undef.
Reviewed By: monkchiang
Differential Revision: https://reviews.llvm.org/D119727
This generalizes isElementRotate to work when there's only a single
slide needed. I've removed matchShuffleAsSlideDown which is now
redundant.
Reviewed By: frasercrmck, khchen
Differential Revision: https://reviews.llvm.org/D119759