The Zicond extension was ratified in the last few months, with no
changes that affect the LLVM implementation. Although there's surely
more tuning that could be done on when to select Zicond, there
are no known correctness issues. Therefore, we should mark support as
non-experimental.
DAGCombiner folds (select_cc seteq (and x, y), 0, 0, A) to (and (sra
(shl x)) A) where y has a single bit set. Previously, DAGCombiner relied
on `shouldAvoidTransformToShift` to decide when to do the combine, but
`shouldAvoidTransformToShift` is only about shift cost. This patch
introduces a specific hook to decide when to do the combine and disables
the combine when Zicond is enabled and AndMask <= 1024.
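As a rough illustration of why the fold is valid, the C sketch below models both forms on 64-bit values: shifting the tested bit into the sign position and arithmetic-shifting it back across the register gives an all-ones or all-zeros mask. The function names are hypothetical, and `hypothetical_should_fold` only mirrors the policy stated above (skip the fold when Zicond is enabled and AndMask <= 1024); it is not LLVM's hook. One plausible reading of the 1024 bound, which is an assumption, is that such single-bit masks still fit an ANDI immediate, so an ANDI plus a czero instruction stays shorter than the shift sequence.

```c
/* Illustrative sketch; all names are hypothetical. */
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Portable arithmetic shift right of a 64-bit pattern
   (C leaves >> on negative signed values implementation-defined). */
static uint64_t sra64(uint64_t v, unsigned sh) {
    return (v >> 63) ? ~(~v >> sh) : (v >> sh);
}

/* Original form: (select_cc seteq (and x, y), 0, 0, A). */
static uint64_t select_form(uint64_t x, uint64_t y, uint64_t a) {
    return ((x & y) == 0) ? 0 : a;
}

/* Folded form: (and (sra (shl x, 63 - log2(y)), 63), A).
   Precondition: y has exactly one bit set. */
static uint64_t shift_form(uint64_t x, uint64_t y, uint64_t a) {
    unsigned bit = 0;
    while (((y >> bit) & 1) == 0)
        bit++;                                   /* bit = log2(y) */
    uint64_t mask = sra64(x << (63 - bit), 63);  /* all ones iff the bit was set */
    return mask & a;
}

/* Hypothetical model of the RISC-V policy described above:
   do the shift fold unless Zicond is enabled and AndMask <= 1024. */
static bool hypothetical_should_fold(bool has_zicond, uint64_t and_mask) {
    return !(has_zicond && and_mask <= 1024);
}
```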
The motivation for this change is simply to reduce test duplication. As
can be seen in the (massive) test delta, we have many tests whose output
differs only due to the use of addi on rv32 vs addiw on rv64 when the
high bits are don't care.
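A small C model of the two RV64 instructions shows why the choice only matters when the upper bits are observed: ADDI and ADDIW always agree on the low 32 bits and differ only in bits 63:32. The function names are illustrative, and the immediate is assumed to already be within the 12-bit signed range.

```c
/* Illustrative model; function names are hypothetical. */
#include <assert.h>
#include <stdint.h>

/* Sign-extend a 32-bit pattern to 64 bits without relying on
   implementation-defined signed conversions. */
static uint64_t sext32(uint32_t v) {
    return (v & UINT32_C(0x80000000)) ? (UINT64_C(0xFFFFFFFF00000000) | v) : v;
}

/* RV64 ADDI: 64-bit add of a sign-extended 12-bit immediate. */
static uint64_t rv64_addi(uint64_t rs1, int32_t imm12) {
    return rs1 + (uint64_t)(int64_t)imm12;
}

/* RV64 ADDIW: same add, but only the low 32 bits are kept and sign-extended. */
static uint64_t rv64_addiw(uint64_t rs1, int32_t imm12) {
    return sext32((uint32_t)(rs1 + (uint64_t)(int64_t)imm12));
}
```

When every consumer reads only the low 32 bits (for example SW, or a W-suffixed instruction), either result is acceptable, so the same instruction can be emitted on both targets.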
As an aside, we don't need to worry about the non-zero immediate
restriction on the compressed variants because we're not directly
forming the compressed variants. If we happen to get a zero immediate
for the ADDI, then either a later optimization will strip the useless
instruction or the encoder is responsible for not compressing the
instruction.
This makes Zicond and XVentanaCondOps use the same code path.
The instructions have identical semantics.
Reviewed By: wangpc
Differential Revision: https://reviews.llvm.org/D155391
This patch is a step towards altering how we handle the emission of
condops. Marking ISD::SELECT as legal is a major change in the codegen
path, and gives few options for maintaining the old codegen path when
it is believed to be better (e.g. a better branchless sequence is
possible using non-zicond instructions, or the branch-based sequence is
preferable).
This removes the existing SelectionDAG patterns and moves the logic into
lowerSELECT. Along with some small codegen changes, you'll note a few minor
regressions in the generated code quality; these are due to the fact
that by lowering the SELECT node early we miss out on combines that
would kick in later when setcc condcodes that aren't natively supported
have been expanded (thus exposing opportunities for optimisation by
performing logical negation and swapping truev/falsev). I've opted to
split out work that addresses these into follow-on patches (especially
as zicond is still 'experimental').
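To illustrate the kind of combine being missed, consider a SETGE condition: the base RISC-V ISA has SLT but no "set if greater or equal" instruction, so a >= b is typically materialised as (slt a, b) xor 1. If the select is still visible when that expansion happens, the negation can be folded away by swapping the true and false values instead. The C sketch below, with hypothetical function names, checks that the two forms agree; it illustrates the idea and is not the actual combine code.

```c
/* Illustrative sketch; function names are hypothetical. */
#include <assert.h>
#include <stdint.h>

/* Models SLT: 1 if a < b (signed), else 0. */
static int64_t slt(int64_t a, int64_t b) { return a < b; }

/* Naive expansion: build (a >= b) as (slt a, b) ^ 1, then select on it. */
static int64_t select_ge_naive(int64_t a, int64_t b, int64_t t, int64_t f) {
    int64_t cond = slt(a, b) ^ 1;   /* extra XORI */
    return cond != 0 ? t : f;
}

/* After logical negation + swap: select on (slt a, b) with t and f swapped. */
static int64_t select_ge_swapped(int64_t a, int64_t b, int64_t t, int64_t f) {
    int64_t cond = slt(a, b);       /* no XORI needed */
    return cond != 0 ? f : t;
}
```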
matchSetCC is a straightforward translation from the version in
RISCVISelDAGToDAG. Ideally, in the future it can be converted to a
helper shared between both files.
Differential Revision: https://reviews.llvm.org/D155083
This directly matches the codegen for xventanacondops with vt.maskcn =>
czero.nez and vt.maskc => czero.eqz. An additional difference is that
zicond is available on RV32 in addition to RV64 (xventanacondops is RV64
only).
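The correspondence can be written out as a C model of the four instructions, following their specified semantics: czero.eqz rd, rs1, rs2 yields 0 when rs2 is zero and rs1 otherwise, while czero.nez yields 0 when rs2 is non-zero. A general select then needs two condops and an OR. This is a sketch for checking the equivalence, not LLVM code; the helper names are hypothetical.

```c
/* Illustrative model; helper names are hypothetical. */
#include <assert.h>
#include <stdint.h>

/* Zicond */
static uint64_t czero_eqz(uint64_t rs1, uint64_t rs2) { return rs2 == 0 ? 0 : rs1; }
static uint64_t czero_nez(uint64_t rs1, uint64_t rs2) { return rs2 != 0 ? 0 : rs1; }

/* XVentanaCondOps (RV64 only) */
static uint64_t vt_maskc(uint64_t rs1, uint64_t rs2)  { return rs2 != 0 ? rs1 : 0; }
static uint64_t vt_maskcn(uint64_t rs1, uint64_t rs2) { return rs2 == 0 ? rs1 : 0; }

/* select c, t, f == (czero.eqz t, c) | (czero.nez f, c):
   exactly one side survives, the other is zeroed. */
static uint64_t select_zicond(uint64_t c, uint64_t t, uint64_t f) {
    return czero_eqz(t, c) | czero_nez(f, c);
}
```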
Differential Revision: https://reviews.llvm.org/D147147
Add patterns with seteq/setne conditions.
We don't have instructions for seteq/setne except for comparing
with zero and need to emit an ADDI or XOR before a seqz/snez to
compare other values.
The select ISD node takes a 0/1 value for the condition, but the
VT_MASKC(N) instructions check all XLen bits for zero or non-zero.
We can use this to avoid the seqz/snez in many cases.
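The observation can be sketched in C: for select (x == y), a, b, the value x ^ y is zero exactly when x == y, so it can feed VT_MASKCN/VT_MASKC directly as the condition with no seqz. Likewise, for a comparison against a small immediate, the ADDI of -imm produces a value that is zero exactly when x == imm. The helper names are hypothetical and the immediate is assumed to be small enough for ADDI.

```c
/* Illustrative sketch; helper names are hypothetical. */
#include <assert.h>
#include <stdint.h>

static uint64_t vt_maskc(uint64_t rs1, uint64_t rs2)  { return rs2 != 0 ? rs1 : 0; }
static uint64_t vt_maskcn(uint64_t rs1, uint64_t rs2) { return rs2 == 0 ? rs1 : 0; }

/* select (x == y), a, b: the XOR result is the condition, no seqz needed. */
static uint64_t select_eq(uint64_t x, uint64_t y, uint64_t a, uint64_t b) {
    uint64_t diff = x ^ y;                       /* zero iff x == y */
    return vt_maskcn(a, diff) | vt_maskc(b, diff);
}

/* select (x == imm), a, b: ADDI x, -imm is the condition. */
static uint64_t select_eq_imm(uint64_t x, int64_t imm, uint64_t a, uint64_t b) {
    uint64_t diff = x - (uint64_t)imm;           /* same bits as x + (-imm); zero iff x == imm */
    return vt_maskcn(a, diff) | vt_maskc(b, diff);
}

/* select (x != y), a, b: same XOR, with the two condops swapped. */
static uint64_t select_ne(uint64_t x, uint64_t y, uint64_t a, uint64_t b) {
    uint64_t diff = x ^ y;
    return vt_maskc(a, diff) | vt_maskcn(b, diff);
}
```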
This is a pretty ridiculous number of patterns. I wonder if we could
use some ComplexPatterns to merge them, but I'd like to do that as
a follow up and focus on correctness of the result in this patch.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D140421
Similarly for sub, or, and xor. These are all operations that have 0
as a neutral value. This is based on a similar transform in InstCombine.
This allows us to remove some XVentanaCondOps patterns and
some code from DAGCombine for RISCVISD::SELECT_CC.
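The message does not spell out the add case it is modelled on; assuming it is the fold (select c, (add y, x), y) -> (add y, (select c, x, 0)), the C sketch below checks it together with the sub/or/xor variants. Because (op y, 0) == y for each of these operations, the inner select against 0 maps to a single czero.eqz (or vt.maskc). All names are hypothetical.

```c
/* Illustrative sketch of an assumed fold; names are hypothetical. */
#include <assert.h>
#include <stdint.h>

/* czero.eqz: 0 when the condition is zero, otherwise rs1. */
static uint64_t czero_eqz(uint64_t rs1, uint64_t c) { return c == 0 ? 0 : rs1; }

enum op { OP_ADD, OP_SUB, OP_OR, OP_XOR };

static uint64_t apply(enum op o, uint64_t y, uint64_t x) {
    switch (o) {
    case OP_ADD: return y + x;
    case OP_SUB: return y - x;
    case OP_OR:  return y | x;
    default:     return y ^ x;
    }
}

/* Before: (select c, (op y, x), y) */
static uint64_t fold_before(enum op o, uint64_t c, uint64_t x, uint64_t y) {
    return c != 0 ? apply(o, y, x) : y;
}

/* After: (op y, (select c, x, 0)); relies on (op y, 0) == y. */
static uint64_t fold_after(enum op o, uint64_t c, uint64_t x, uint64_t y) {
    return apply(o, y, czero_eqz(x, c));
}
```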
Reviewed By: asb
Differential Revision: https://reviews.llvm.org/D140465
These are tests for (select ((x & 0x1) == 0), (z ^ y), y) and (select ((x & 0x1) == 0), (z | y), y).
These can be made branchless by using (((x & 0x1) - 1) & z) ^ y and (((x & 0x1) - 1) & z) | y respectively.
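A C check of both rewrites, writing the tested bit explicitly as (x & 1): subtracting 1 from that bit yields all ones when it is clear and zero when it is set, which is exactly the mask needed to keep or drop z. The function names are hypothetical.

```c
/* Illustrative check; function names are hypothetical. */
#include <assert.h>
#include <stdint.h>

/* select ((x & 1) == 0), (z ^ y), y */
static uint64_t sel_xor(uint64_t x, uint64_t y, uint64_t z) {
    return ((x & 1) == 0) ? (z ^ y) : y;
}

/* Branchless: ((x & 1) - 1) is all ones when the bit is clear, 0 when set. */
static uint64_t sel_xor_branchless(uint64_t x, uint64_t y, uint64_t z) {
    return (((x & 1) - 1) & z) ^ y;
}

/* select ((x & 1) == 0), (z | y), y */
static uint64_t sel_or(uint64_t x, uint64_t y, uint64_t z) {
    return ((x & 1) == 0) ? (z | y) : y;
}

/* Branchless OR form using the same mask. */
static uint64_t sel_or_branchless(uint64_t x, uint64_t y, uint64_t z) {
    return (((x & 1) - 1) & z) | y;
}
```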
The negate operation is never compressible (as the destination and rs1 registers must differ). The two shift versions will be the same size if the input GPR is reused, or smaller if this is the only use of the input.
For clarity, the operation being performed is (select (low-bit-of x), -1, 0).
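Both candidate sequences for (select (low-bit-of x), -1, 0) can be modelled in C: ANDI with 1 followed by NEG, or SLLI by XLEN-1 followed by SRAI by XLEN-1 (XLEN = 64 here). The instruction pairings in the comments are an assumption about the sequences being compared, and the function names are hypothetical; the arithmetic itself is exact.

```c
/* Illustrative model; function names are hypothetical. */
#include <assert.h>
#include <stdint.h>

/* Portable arithmetic shift right of a 64-bit pattern. */
static uint64_t sra64(uint64_t v, unsigned sh) {
    return (v >> 63) ? ~(~v >> sh) : (v >> sh);
}

/* andi t, x, 1 ; neg t, t   (neg is sub t, x0, t) */
static uint64_t low_bit_mask_neg(uint64_t x) {
    return 0 - (x & 1);
}

/* slli t, x, 63 ; srai t, t, 63 */
static uint64_t low_bit_mask_shifts(uint64_t x) {
    return sra64(x << 63, 63);
}
```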
Differential Revision: https://reviews.llvm.org/D140319
Similar to previous patches for ADDI/ADDIW/SLLI/ADD, but restricted
to only cases where the register is x8-x15 (GPRC reg class).
I've restricted it so that we can be precise about whether the
resulting instruction would be compressible. Changing the register
allocation may make some other instruction not compressible so we
should try to be accurate.
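For background, the 3-bit register fields used by many RVC instructions (rd', rs1', rs2') can only name x8-x15, encoded as the register number minus 8, which is why compressibility depends on allocation landing in GPRC. A minimal C sketch of that encoding constraint, with hypothetical helper names; it does not reproduce the patch itself.

```c
/* Illustrative sketch; helper names are hypothetical. */
#include <assert.h>
#include <stdbool.h>

/* True for x8-x15, the registers reachable from a 3-bit RVC register field. */
static bool is_gprc(unsigned reg) {
    return reg >= 8 && reg <= 15;
}

/* Encode a GPRC register into a 3-bit compressed field. */
static unsigned encode_creg(unsigned reg) {
    assert(is_gprc(reg));
    return reg - 8;
}

/* Decode a 3-bit compressed field back to the full register number. */
static unsigned decode_creg(unsigned field) {
    return (field & 7u) + 8;
}
```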
Reviewed By: asb
Differential Revision: https://reviews.llvm.org/D138740
This code is directly ported from the X86 backend, which applies the same rewrite (along with several others). I'm planning to look more closely at the other branchless variants from X86 to see if any are worth porting in future changes.
The motivation here is the CoreMark crc8 routine from https://github.com/eembc/coremark/blob/main/core_util.c#L165. This patch significantly reduces the number of unpredictable branches in the workload.
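As an illustration of the pattern, here is a CRC-style step written with a branch and with the mask trick, assuming the ported rewrite has the form (select ((x & 1) == 0), y, (z ^ y)) -> ((-(x & 1)) & z) ^ y. This is not the CoreMark source; the function names and the 0x4002 constant are illustrative.

```c
/* Illustrative sketch, not CoreMark code; names and constant are hypothetical. */
#include <assert.h>
#include <stdint.h>

/* Branchy: XOR in the polynomial only when the low bit is set. */
static uint16_t crc_step_branchy(uint16_t crc, uint8_t bit) {
    if ((bit & 1u) == 0)
        return crc;
    return (uint16_t)(crc ^ 0x4002u);
}

/* Branchless: -(bit & 1) is all ones when the bit is set and 0 otherwise. */
static uint16_t crc_step_branchless(uint16_t crc, uint8_t bit) {
    uint16_t mask = (uint16_t)(0u - (bit & 1u));
    return (uint16_t)(crc ^ (mask & 0x4002u));
}
```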
Differential Revision: https://reviews.llvm.org/D134881