Using AArch64's original implementation for reference, this patch
implements a pass to remove unneeded copies of X0. This pass runs
after register allocation and looks to see if a register is implied
to be 0 by a branch in the predecessor basic block.
Reviewed By: asb
Differential Revision: https://reviews.llvm.org/D118160
`__gnu_h2f_ieee` and `__gnu_f2h_ieee` are introduce by ARM and set that as
default name for fp16 and fp32 conversion in LLVM.
However RISC-V GCC using default naming scheme for that, which is
`__extendhfsf2` and `__truncsfhf2` for that, that cause runtime ABI
incompatible issue.
Although we didn't have formal runtime ABI spec to specify those naming
convention yet, but I think it would be great to fix the incompatible
issue first.
And I've plan to create a runtime ABI spec undere psABI spec this year.
Reviewed By: asb
Differential Revision: https://reviews.llvm.org/D118207
After D86836, we can define multiple cost values for
different cost models. So here we set CostPerUse to
1 iff RVC is enabled to avoid potential impact on RA.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D117741
Now AND is used for zero extension when both Zbb and Zbp are not enabled.
It may be better to use shift operation if the trailing ones mask exceeds simm12.
This patch optimzes LUI+ADDI+AND to SLLI+SRLI.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D116720
These nodes should saturate to their saturating VT. We can use this
information to know the bits past the VT are all zeros or all sign bits.
I think we might only have test coverage for the unsigned case. I'll
verify and add tests.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D116870
This is similar to what is done for targets that prefer zero extend
where we avoid using a zero extend if the promoted values are sign
extended.
We'll also check for zero extended operands for ugt, ult, uge, and ule when the
target prefers sign extend. This is different than preferring zero extend, where
we only check for sign bits on equality comparisons.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D116421
Currently, we restore the return address register as the last restoring
instruction in the epilog. The next instruction is `ret` usually. It is
a use of return address register. In some microarchitectures, there is
load-to-use data hazard. To avoid the load-to-use data hazard, we could
separate the load instruction from its use as far as possible. In this
patch, we reverse the order of restoring callee-saved registers to
increase the distance of `load ra` and `ret` in the epilog.
Differential Revision: https://reviews.llvm.org/D113967
Add an alias of `addi [x], zero, imm` to generate pseudo
instruction li, which makes assembly mush more readable.
For existed tests, users can update them by running script
`llvm/utils/update_llc_test_checks.py`.
Reviewed By: asb
Differential Revision: https://reviews.llvm.org/D112692
This improves our coverage of soft float libcalls lowering.
Remove most of the test cases from rv64i-single-softfloat.ll. They
were duplicated in the test files that now test softflow. Only
a couple test cases for constrained FP remain. Those should be
removed when we start supporting constrained FP.
This is follow up from D113528.
By not using ADDIW we can cause both an ADDIW and ADDI to be emitted
when the add has multiple users.
These instructions needed be added to the list of instructions that
only use the lower 32 bits of input.
I've also added tests for the wu versions, but I'm having trouble
showing bad codegen from it.
The fcvt fp to integer instructions saturate if their input is
infinity or out of range, but the instructions produce a maximum
integer for nan instead of 0 required for the ISD opcodes.
This means we can use the instructions to do the saturating
conversion, but we'll need to fix up the nan case at the end.
We can probably improve the i8 and i16 default codegen as well,
but I'll leave that for a follow up.
Reviewed By: luismarques
Differential Revision: https://reviews.llvm.org/D107230
I stumbled onto a case where our (sext_inreg (assertzexti32 (fptoui X)), i32)
isel pattern can cause an fcvt.wu and fcvt.lu to be emitted if
the assertzexti32 has an additional user. If we add a one use check
it would just cause a fcvt.lu followed by a sext.w when only need
a fcvt.wu to satisfy both users.
To mitigate this I've added custom isel and new ISD opcodes for
fcvt.wu. This allows us to keep know it started life as a conversion
to i32 without needing to match multiple nodes. ComputeNumSignBits
has been taught that this new nodes produces 33 sign bits. To
prevent regressions when we need to zero extend the result of an
(i32 (fptoui X)), I've added a DAG combine to convert it to an
(i64 (fptoui X)) before type legalization. In most cases this would
happen in InstCombine, but a zero_extend can be created for function
returns or arguments.
To keep everything consistent I've added new nodes for fptosi as well.
Reviewed By: luismarques
Differential Revision: https://reviews.llvm.org/D106346
The pattern we match is (sext_inreg (assertzexti32 (fp_to_uint)), i32). If
the assertzexti32 has an additional user we'll end up emitting
an fcvt.wu and an fcvt.lu.
This can happen if the original fp_to_uint before type legalization
has one user that causes a sext_inreg to be emitted and one that
doesn't.
This helps us select W instructions in more cases. Most of the
affected tests have had the sign_extend_inreg or AND folded into
sextload/zextload.
Differential Revision: https://reviews.llvm.org/D104079
The loads end up becoming sextload/zextload which prevent our
isel patterns from finding the sign_extend_inreg or AND instruction
we need.
The easiest way to fix this is to use computeKnownBits or
ComputeNumSignBits in our isel matching to catch this.
Add PromoteIntOp_FP_TO_XINT_SAT to type legalize the bit width
operand from i32 to i64 for RV64.
Add test cases for the saturating intrinsics for half/float/double
and i32/i64. CodeGen is definitely not optimal. We can probably
make use of the native behavior of fcvt instructions in many cases.
Fixes PR50083
The patterns that use this really want to know if the operand has at
least 32 sign/zero bits.
This increases opportunities to use W instructions when the original
source used i8/i16. Not sure how much this matters for performance,
but it makes i8/i16 code more consistent with i32.
Regenerated using:
./llvm/utils/update_llc_test_checks.py -u llvm/test/CodeGen/RISCV/*.ll
This has added comments to spill-related instructions and added @plt to
some symbols.
Differential Revision: https://reviews.llvm.org/D92841