When optimizing vmv.s.x/vmv.v.x of scalar loads, if VL is known to be
1 then we don't need to perform a stride-of-x0 load and can just do a
regular scalar load.
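To illustrate the semantics (a standalone C++ model, not the actual
codegen): a zero-stride load over VL elements rereads the same scalar,
so with VL=1 it is exactly one scalar load.

  #include <cassert>
  #include <cstdint>

  // Models a vlse32.v-style load with a byte stride over vl elements.
  void strided_load(uint32_t *dst, const uint32_t *src, long stride,
                    unsigned vl) {
    for (unsigned i = 0; i < vl; ++i)
      dst[i] = *reinterpret_cast<const uint32_t *>(
          reinterpret_cast<const char *>(src) + i * stride);
  }

  int main() {
    uint32_t v[4] = {0, 0, 0, 0}, x = 42;
    strided_load(v, &x, /*stride=*/0, /*vl=*/1);
    assert(v[0] == x); // identical to a single lw
    return 0;
  }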
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D147609
As part of D145584, we noticed that llvm was generating suboptimal code
for constraint m when the operand can be lowered to reg+imm form: it
was being selected as a single register rather than register+imm. This
caused an unnecessary 'addi' to be generated for each m constraint.
This patch changes llvm to select register+imm. This might generate code
that cannot be assembled, but matches gcc's behavior.
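For example (GNU-style inline asm; the exact output depends on the
compiler version, so treat this as illustrative):

  // With register+imm selection the operand can be printed as "4(a0)"
  // directly, instead of an addi followed by "0(a0)".
  void store_zero(int *p) {
    asm volatile("sw zero, %0" : "=m"(p[1]));
  }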
Reviewed By: craig.topper, kito-cheng
Differential Revision: https://reviews.llvm.org/D146245
We have custom isel for ConstantFP that runs with higher priority than
the isel patterns. We were previously detecting FP constants that are
valid for fli in order to exit early from the custom code. This
detection called getLoadFPImm. Then we would run the isel patterns,
which would call getLoadFPImm a second time.
With a little bit more code we can directly select the fli instruction
in the custom handler and avoid a second call.
Remove the incorrect mayRaiseFPException flag from the FLI instructions.
Reviewed By: joshua-arch1
Differential Revision: https://reviews.llvm.org/D146093
- Return false from RISCVDAGToDAGISel::selectFPImm for fli constants
  so we don't try to use integer expansion.
- Support fli.h with Zvfh+Zfhmin.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D145766
fli.h requires Zfh or Zvfh. We need to check for this in
isFPImmLegal. Zvfh support will come in another patch.
I had to split the test file because there are other issues with
Zfhmin and some intrinsics.
We can form an MU, TU operation and remove the merge if both use the
same merge value.
My primary interest was a case involving VP intrinsics from our downstream,
but it requires another optimization that isn't upstream yet, so I've used
RVV intrinsics to get the desired instructions.
Co-authored-by: Nitin John Raj <nitin.raj@sifive.com>
Reviewed By: fakepaper56
Differential Revision: https://reviews.llvm.org/D145272
To do this we need to remove the always-matching behavior from condop.
This requires us to add more 'select' isel patterns with a bare GPR
as the condition.
Rename condop/invcondop to riscv_setne/riscv_seteq.
This centralizes the ADDI/XORI/XOR tricks into one place.
XVentanaCondOps check the condition operand for zero or non-zero.
We use this to optimize seteq/setne that would otherwise become
xor/xori/addi+snez/seqz. These patterns avoid the snez/seqz.
This patch adds two ComplexPatterns to match the various cases and
emit the xor/xori/addi instruction.
These patterns can also be used by D144681.
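The identity these patterns rely on, checked standalone in C++: an
equality compare can feed a zero/non-zero conditional op as a bare
xor, with no seqz/snez.

  #include <cassert>
  #include <cstdint>

  int main() {
    // (a == b) <=> ((a ^ b) == 0); with an immediate, xori or addi of
    // the negated immediate plays the same role.
    for (uint32_t a = 0; a < 64; ++a)
      for (uint32_t b = 0; b < 64; ++b)
        assert(((a ^ b) == 0) == (a == b));
    return 0;
  }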
Reviewed By: philipp.tomsich
Differential Revision: https://reviews.llvm.org/D144700
Instead of special casing Zfa in the custom inserters, select the
correct instructions during isel.
BuildPairF64 we can do with a pattern, but SplitF64 requires custom
selection due to its two destinations.
If we didn't need SplitF64 without Zfa, I would have added extract-low
and extract-high ISD opcodes for Zfa to avoid that issue.
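For reference, the value semantics of the two nodes (a standalone C++
model, not the selection code):

  #include <cassert>
  #include <cstdint>
  #include <cstring>

  // SplitF64: two results, the low and high halves of the f64 bit
  // pattern, which is why it can't be a single-output isel pattern.
  void splitF64(double d, uint32_t &lo, uint32_t &hi) {
    uint64_t bits;
    std::memcpy(&bits, &d, sizeof bits);
    lo = static_cast<uint32_t>(bits);
    hi = static_cast<uint32_t>(bits >> 32);
  }

  // BuildPairF64: one result, so a plain pattern suffices.
  double buildPairF64(uint32_t lo, uint32_t hi) {
    uint64_t bits = (static_cast<uint64_t>(hi) << 32) | lo;
    double d;
    std::memcpy(&d, &bits, sizeof d);
    return d;
  }

  int main() {
    uint32_t lo, hi;
    splitF64(3.14, lo, hi);
    assert(buildPairF64(lo, hi) == 3.14);
    return 0;
  }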
The XTHeadBb extension has both signed and unsigned bitfield
extraction instructions (TH.EXT and TH.EXTU, respectively), which were
previously only supported for sign extension at byte, halfword, and
word boundaries.
This adds the infrastructure to use TH.EXT and TH.EXTU for arbitrary
bitfield extraction.
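A standalone C++ model of the extract semantics (my reading of the
XTHeadBb spec; the th.ext rd, rs1, msb, lsb operand order is assumed):

  #include <cassert>
  #include <cstdint>

  // th.extu rd, rs1, msb, lsb : rd = zero-extend(rs1[msb:lsb])
  uint64_t th_extu(uint64_t rs1, unsigned msb, unsigned lsb) {
    unsigned width = msb - lsb + 1;
    uint64_t mask = width == 64 ? ~0ULL : (1ULL << width) - 1;
    return (rs1 >> lsb) & mask;
  }

  // th.ext rd, rs1, msb, lsb : rd = sign-extend(rs1[msb:lsb])
  int64_t th_ext(uint64_t rs1, unsigned msb, unsigned lsb) {
    // Shift the field to the top, then arithmetic-shift it back down.
    return static_cast<int64_t>(rs1 << (63 - msb)) >> (63 - msb + lsb);
  }

  int main() {
    assert(th_extu(0xABCD, 11, 4) == 0xBC);
    assert(th_ext(0x0F80, 11, 4) == -8); // field 0xF8 sign-extends to -8
    return 0;
  }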
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D144229
The vsetvli insertion pass can replace MA with MU if needed by
a using instruction. The vsetvli insertion pass will not convert
MU to MA, so we need to start at MA.
Reviewed By: eopXD
Differential Revision: https://reviews.llvm.org/D143790
After D142953, non-zero cases were handled in RISCVDAGToDAGISel::Select
and zero cases were handled with isel patterns. The zero cases are
sufficiently similar to the non-zero cases that we might as well handle
them all together; we already needed to detect these cases in order to
skip out to the tablegen patterns.
Delete the opt intrinsics since they are now identical.
I left the side effects due to user expectations about how these
intrinsics interact with things like inline assembly or function
calls, or the expectation that they won't be hoisted. I think we
should look at other ways to address those.
If I could, I'd rename them somehow to distance them from the
vsetvli instruction. In some sense they only query the VL for a
particular SEW and LMUL; they don't guarantee a vsetvli instruction
will be emitted.
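For example, with the current intrinsic spelling (which postdates some
of this history, so take the name as illustrative):

  #include <riscv_vector.h>
  #include <cstddef>

  // Answers "how many e32/m1 elements may be processed for this AVL?".
  // Nothing here promises that a vsetvli instruction survives into the
  // final output.
  size_t chunk_size(size_t avl) {
    return __riscv_vsetvl_e32m1(avl);
  }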
Fixes https://github.com/llvm/llvm-project/issues/59359
Reviewed By: rogfer01, kito-cheng
Differential Revision: https://reviews.llvm.org/D143220
We try to shift both X and C left by 32 to replace the zext.w
with a SLLI and use mulhu. If C is already a simm32, this likely
makes a constant that is more expensive to materialize.
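The identity behind the transform, checked standalone (my
reconstruction; uses the __int128 extension): the 64-bit product of
the two low halves lands exactly in the high 64 bits of the 128-bit
result.

  #include <cassert>
  #include <cstdint>

  // mulhu: high 64 bits of an unsigned 64x64 multiply.
  uint64_t mulhu(uint64_t a, uint64_t b) {
    return static_cast<uint64_t>((static_cast<unsigned __int128>(a) * b) >> 64);
  }

  int main() {
    uint64_t X = 0xDEADBEEF12345678, C = 0x7FFFFFFF; // C fits in 32 bits
    // zext.w(X) * C == mulhu(X << 32, C << 32), but note that C << 32
    // is the constant we now have to materialize.
    assert((X & 0xFFFFFFFF) * C == mulhu(X << 32, C << 32));
    return 0;
  }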
If the shift amount is (sub C, X) where C is -1 modulo the size of
the shift, we can replace the sub with a NOT.
We could also use XORI X, size-1, but NOT would work better with
c.not from the future Zce extension.
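The identity, checked standalone: RISC-V shifts read only the low bits
of the amount, and ~x == -1 - x in two's complement, so (C - x) and ~x
select the same shift amount whenever C ≡ -1 (mod 32).

  #include <cassert>
  #include <cstdint>

  int main() {
    const uint32_t C = 63; // any C with C % 32 == 31
    for (uint32_t x = 0; x < 1024; ++x)
      assert(((C - x) & 31) == (~x & 31)); // amounts agree mod 32
    return 0;
  }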
There is no compressed form of ORI, but there is a compressed form
of ADDI.
This also works for XORI, since DAGCombine will turn an Xor with
disjoint bits into an Or.
Note: The compressed forms require a simm6 immediate, but I'm doing
this for the full simm12 range.
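The substitution is sound because OR, ADD, and XOR agree whenever the
immediate's set bits are disjoint from the operand's possible set
bits; a standalone check:

  #include <cassert>
  #include <cstdint>

  int main() {
    const uint32_t imm = 0x7; // set bits confined to the low 3 bits
    for (uint32_t i = 0; i < 1024; ++i) {
      uint32_t x = i << 3; // low 3 bits known zero, disjoint from imm
      assert((x | imm) == (x + imm));
      assert((x | imm) == (x ^ imm));
    }
    return 0;
  }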
Reviewed By: kito-cheng
Differential Revision: https://reviews.llvm.org/D140674
We can recursively look through SRLI if the shift amount is less
than the demanded bits. We can reduce the demanded bit count by
the shift amount and check the users of the SRLI.
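A standalone check of the reasoning: if a user consumes only the low
(N - shamt) bits of (x >> shamt), then only the low N bits of x matter.

  #include <cassert>
  #include <cstdint>

  int main() {
    const unsigned N = 16, shamt = 4; // shamt < N, the demanded bit count
    for (uint64_t x = 0; x < (1u << 20); x += 12345) {
      uint64_t lowN = x & ((1u << N) - 1);     // zap bits >= N
      uint64_t used = (1u << (N - shamt)) - 1; // what the user demands
      assert(((x >> shamt) & used) == ((lowN >> shamt) & used));
    }
    return 0;
  }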
Follow up to the series:
1. https://reviews.llvm.org/D140161
2. https://reviews.llvm.org/D140349
3. https://reviews.llvm.org/D140331
4. https://reviews.llvm.org/D140323
Completes the work from the previous two patches for the remaining
targets.
This creates the following named passes that can be run via
`llc -{start|stop}-{before|after}`:
- arc-isel
- arm-isel
- avr-isel
- bpf-isel
- csky-isel
- hexagon-isel
- lanai-isel
- loongarch-isel
- m68k-isel
- msp430-isel
- mips-isel
- nvptx-isel
- ppc-codegen
- riscv-isel
- sparc-isel
- systemz-isel
- ve-isel
- wasm-isel
- xcore-isel
A nice way to write tests for SelectionDAGISel might be to use a RUN:
line like:
llc -mtriple=<triple> -start-before=<arch>-isel -stop-after=finalize-isel -o -
Fixes: https://github.com/llvm/llvm-project/issues/59538
Reviewed By: asb, zixuan-wu
Differential Revision: https://reviews.llvm.org/D140364
Previously we had a shared ID in SelectionDAGISel. AMDGPU has an
initializePass function for its subclass of SelectionDAGISel. No
other target does.
This causes all target-specific SelectionDAGISel passes to be known
as "amdgpu-isel".
I'm not sure what would happen if another target tried to implement
an initializePass function too, since the ID is already claimed.
This patch gives all targets their own ID and passes it down through
SelectionDAGISel's constructor to MachineFunctionPass's constructor.
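A minimal self-contained sketch of the shape of the fix (stand-in
types, not LLVM's real classes): the subclass supplies its own static
ID and threads it through to the base pass.

  // Stand-ins for MachineFunctionPass / SelectionDAGISel / a target
  // subclass.
  struct MachineFunctionPassSketch {
    explicit MachineFunctionPassSketch(char &ID) : PassID(&ID) {}
    const char *PassID; // identity used by the pass registry
  };

  struct SelectionDAGISelSketch : MachineFunctionPassSketch {
    // No shared static ID here anymore; the subclass provides one.
    explicit SelectionDAGISelSketch(char &ID) : MachineFunctionPassSketch(ID) {}
  };

  struct RISCVDAGToDAGISelSketch : SelectionDAGISelSketch {
    static char ID; // each target now has a distinct identity
    RISCVDAGToDAGISelSketch() : SelectionDAGISelSketch(ID) {}
  };
  char RISCVDAGToDAGISelSketch::ID = 0;

  int main() {
    RISCVDAGToDAGISelSketch P;
    return P.PassID == &RISCVDAGToDAGISelSketch::ID ? 0 : 1;
  }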
Unfortunately, I think this causes most targets to lose
print-before/after-all support for their SelectionDAGISel pass, and
they probably no longer support start/stop-before/after either. We
can add initializePass functions to fix this as a follow-up. NOTE:
this was probably also broken if the AMDGPU target isn't compiled in.
Step 1 toward fixing PR59538.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D140161
We don't have W versions of AND/OR/XOR/ANDN/ORN/XNOR, so we should
recursively check their users. We limit the recursion to
SelectionDAG::MaxRecursionDepth levels.
This adds a Depth argument: existing callers pass 0, and the new
recursive calls increment it by 1. At the top of the function we give
up and return false if Depth >= SelectionDAG::MaxRecursionDepth.
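A minimal sketch of the resulting shape (a toy node model, not the
actual RISCVDAGToDAGISel code):

  #include <vector>

  enum Op { AND, OR, XOR, SEXT_W, UNKNOWN };
  struct Node {
    Op Opc;
    std::vector<const Node *> Users;
  };
  constexpr unsigned MaxRecursionDepth = 6; // stand-in for SelectionDAG's limit

  // Returns true if every transitive user reads only the low Bits bits.
  bool hasAllNBitUsers(const Node &N, unsigned Bits, unsigned Depth = 0) {
    if (Depth >= MaxRecursionDepth)
      return false; // give up rather than recurse without bound
    for (const Node *U : N.Users) {
      switch (U->Opc) {
      case AND: case OR: case XOR:
        // No W form exists and bitwise ops keep each bit in place,
        // so look through to their own users at Depth + 1.
        if (!hasAllNBitUsers(*U, Bits, Depth + 1))
          return false;
        break;
      case SEXT_W:
        if (Bits < 32) // sext.w reads the low 32 bits of its input
          return false;
        break;
      default:
        return false; // conservatively assume every bit is used
      }
    }
    return true;
  }

  int main() {
    Node sext{SEXT_W, {}};
    Node bitop{XOR, {&sext}};
    Node def{UNKNOWN, {&bitop}};
    return hasAllNBitUsers(def, 32) ? 0 : 1;
  }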
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D139462