-Return false from RISCVDAGToDAGISel::selectFPImm for fli
constants so we don't try to use integer expansion.
-Support fli.h with Zvfh+Zfhmin.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D145766
fli.h requires Zfh or Zvfh. We need to check for this in
isFPImmLegal. Zvfh support will come in another patch.
I had to split the test file because there are other issues with
Zfhmin and some intrinsics.
We can form a MU TU operation and remove the merge if they use the
same merge value.
My primary interest was a case involving VP intrinsics from our downstream,
but it requires another optimization that isn't in upstream yet. So I've used
RVV intrinsics to get the desired instructions.
Co-authored-by: Nitin John Raj <nitin.raj@sifive.com>
Reviewed By: fakepaper56
Differential Revision: https://reviews.llvm.org/D145272
To do this we need to remove the always matching behavior from condop.
This requires us to add more 'select' isel patterns with a bare GPR
as the condition.
Rename condop/invcondop to riscv_setne/riscv_seteq.
This centralizes the ADDI/XORI/XOR tricks into one place.
XVentanaCondOps check the condition operand for zero or non-zero.
We use this to optimize seteq/setne that would otherwise becomes
xor/xori/addi+snez/seqz. These patterns avoid the snez/seqz.
This patch adds two ComplexPatterns to match the varous cases and
emit the xor/xori/addi instruction.
These patterns can also be used by D144681.
Reviewed By: philipp.tomsich
Differential Revision: https://reviews.llvm.org/D144700
Instead of special casing Zfa in the custom inserters, select the
correct instructions during isel.
BuildPairF64 we can do with pattern, but SplitF64 requires custom
selection due to the two destinations.
If we didn't need SplitF64 without Zfa, I would have an extract low
and extract high ISD opcode for Zfa to avoid that issue.
The XTHeadBb extension hab both signed and unsigned bitfield
extraction instructions (TH.EXT and TH.EXTU, respectively) which have
previously only been supported for sign extension on byte, halfword,
and word-boundaries.
This adds the infrastructure to use TH.EXT and TH.EXTU for arbitrary
bitfield extraction.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D144229
The vsetvli insertion pass can replace it with MU if needed by
a using instruction. The vsetvli insertion pass will not convert
MU to MA so we need to start at MA.
Reviewed By: eopXD
Differential Revision: https://reviews.llvm.org/D143790
After D142953, non-zero cases were handled in RISCVDAGToDAGISel::Select
and zeros were handled with isel patterns. The zeros cases are
sufficiently similar to zero that we might as well handle them all
together. We already needed to detect the cases to skip out to
tablegen.
Delete the opt intrinsics since they are now identical.
I left the side effects due to user expectations about how these
interact with things like inline assembly or function calls. Or
that they wouldn't be hoisted. I think we should look at other
ways to address thoughs.
If I could, I'd rename them these somehow to distance them from
the vsetvli instruction. In some sense they only query the VL for
a particular SEW and LMUL. They don't guarantee a vsetvli
instruction will be emitted.
Fixes https://github.com/llvm/llvm-project/issues/59359
Reviewed By: rogfer01, kito-cheng
Differential Revision: https://reviews.llvm.org/D143220
We try to shift both X and C left by 32 to replace the zext.w
with a SLLI and use mulhu. If C is already a simm32, this likely
makes a constant that is more expensive to materialize.
If the shift amount is (sub C, X) where C is -1 modulo the size of
the shift, we can replace the sub with a NOT.
We could also use XORI X, size-1, but NOT would work better with
c.not from the future Zce extension.
There is no compressed form of ORI but there is a compressed form
for ADDI.
This also works for XORI since DAGCombine will turn Xor with disjoint
bits in Or.
Note: The compressed forms require a simm6 immediate, but I'm doing
this for the full simm12 range.
Reviewed By: kito-cheng
Differential Revision: https://reviews.llvm.org/D140674
We can recursively look through SRLI if the shift amount is less
than the demanded bits. We can reduce the demanded bit count by
the shift amount and check the users of the SRLI.
Follow up to the series:
1. https://reviews.llvm.org/D140161
2. https://reviews.llvm.org/D140349
3. https://reviews.llvm.org/D140331
4. https://reviews.llvm.org/D140323
Completes the work from the previous two for remaining targets.
This creates the following named passes that can be run via
`llc -{start|stop}-{before|after}`:
- arc-isel
- arm-isel
- avr-isel
- bpf-isel
- csky-isel
- hexagon-isel
- lanai-isel
- loongarch-isel
- m68k-isel
- msp430-isel
- mips-isel
- nvptx-isel
- ppc-codegen
- riscv-isel
- sparc-isel
- systemz-isel
- ve-isel
- wasm-isel
- xcore-isel
A nice way to write tests for SelectionDAGISel might be to use a RUN:
line like:
llc -mtriple=<triple> -start-before=<arch>-isel -stop-after=finalize-isel -o -
Fixes: https://github.com/llvm/llvm-project/issues/59538
Reviewed By: asb, zixuan-wu
Differential Revision: https://reviews.llvm.org/D140364
Previously we had a shared ID in SelectionDAGISel. AMDGPU has an
initializePass function for its subclass of SelectionDAGISel. No
other target does.
This causes all target specific SelectionDAGISel passes to be known
as "amdgpu-isel".
I'm not sure what would happen if another target tried to implement
an initializePass function too since the ID is already claimed.
This patch gives all targets their own ID and passes it down to
SelectionDAGISel constructor to MachineFunctionPass's constructor.
Unfortunately, I think this causes most targets to lose
print-before/after-all support for their SelectionDAGISel pass.
And they probably no longer support start/stop-before/after. We
can add initializePass functions to fix this as a follow up. NOTE:
This was probably also broken if the AMDGPU target isn't compiled in.
Step 1 to fixing PR59538.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D140161
We don’t have W versions of AND/OR/XOR/ANDN/ORN/XNOR so we should recursively check their users. We should limit the recursion to SelectionDAG::MaxRecursionDepth levels.
We need to add a Depth argument, all existing callers should pass 0 to the Depth. The new recursive calls should increment it by 1. At the top of the function we should give up and return false if Depth >= SelectionDAG::MaxRecursionDepth.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D139462
This matches what we get for something like.
%0 = shl i32 %x, C
%1 = zext i32 %0 to i64
%2 = getelementptr i32, ptr %y, %1
The shift before the zext and the shift implied by the GEP get
combined with an AND after them. We need to split it back into
2 shifts so we can fold one into shXadd.uw.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D137886
For vector strided instructions, as the RVV spec says:
> When rs2=x0, then an implementation is allowed, but not required, to
> perform fewer memory operations than the number of active elements, and
> may perform different numbers of memory operations across different
> dynamic executions of the same static instruction.
So compiler shouldn't assume that fewer memory operations will be
performed when rs2=x0.
We add a target feature to specify whether u-arch supports optimized
zero-stride vector load. And we do vector splat optimization iff this
feature is supported.
This feature is enabled by default since most designs implement this
optimization.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D137699