Always try to fold freeze(op(....)) -> op(freeze(),freeze(),freeze(),...).
This patch proposes that we drop the opt-in limit on which opcodes are
allowed to have a freeze pushed through them to freeze all their
operands, moving freezes through the tree towards the roots.
I'm struggling to find a strong reason for this limit apart from the DAG
freeze handling having been immature for so long; as we've improved
coverage in canCreateUndefOrPoison/isGuaranteedNotToBeUndefOrPoison, it
looks like the regressions are not as severe.
Hopefully this will help some of the regression issues in #143102 etc.
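As a rough illustration of the shape of the fold (a toy expression-node model, not the actual DAG combiner code; the Node type and pushFreezeThroughOp helper are made up for this sketch):
```
#include <memory>
#include <vector>

// Toy stand-in for an SDNode; illustration only.
struct Node {
  enum Kind { Value, Freeze, Op } kind = Value;
  std::vector<std::shared_ptr<Node>> operands;
};

static std::shared_ptr<Node> makeFreeze(std::shared_ptr<Node> n) {
  auto f = std::make_shared<Node>();
  f->kind = Node::Freeze;
  f->operands = {std::move(n)};
  return f;
}

// freeze(op(a, b, ...)) -> op(freeze(a), freeze(b), ...). Only legal when
// the op itself cannot introduce undef or poison (what the real code asks
// canCreateUndefOrPoison about).
static std::shared_ptr<Node> pushFreezeThroughOp(const Node &freeze,
                                                 bool opCanCreateUndefOrPoison) {
  if (freeze.kind != Node::Freeze || opCanCreateUndefOrPoison)
    return nullptr;
  const auto &op = freeze.operands.front();
  if (op->kind != Node::Op)
    return nullptr;
  auto newOp = std::make_shared<Node>(*op); // copy the op node
  for (auto &operand : newOp->operands)     // freeze every operand
    operand = makeFreeze(operand);
  return newOp;
}
```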
Make the fixed-vector lowering of ISD::[L]LRINT use the custom-lowering
routine, lowerVectorXRINT, and fix issues in lowerVectorXRINT related to
this new functionality.
Based on the comments and tests, we only want to call
EmitLoweredCascadedSelect on selects of FP registers.
Every time we add a new branch-with-immediate opcode, we've been
excluding it here.
This patch switches to checking that the comparison operands are both
registers so branch on immediate is automatically excluded.
This wasn't scalable and made the RISCVCC enum effectively just
a different way of spelling the branch opcodes.
This patch reduces RISCVCC back down to 6 enum values. The primary user
is select pseudoinstructions, which now share the same encoding across
all vendor extensions. The select opcode and condition code are used to
determine the branch opcode when expanding the pseudo.
The Cond SmallVector returned by analyzeBranch now contains the opcode
instead of the RISCVCC value. reverseBranchCondition now works directly on
opcodes. getOppositeBranchCondition is also retained.
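As a rough sketch of the resulting split in responsibilities (illustrative names only, not the real RISCVCC definition or pseudo-expansion code): the condition code distinguishes just the six base comparisons, and the select pseudo's opcode is what picks the vendor-specific branch when the pseudo is expanded.
```
// Illustrative only -- six base condition kinds, no vendor-specific entries.
enum class CondCode { EQ, NE, LT, GE, LTU, GEU };

// Hypothetical helper: the base branch mnemonic for a condition code. The
// real expansion combines this kind of mapping with the select pseudo's
// opcode to choose a vendor branch where one applies.
static const char *baseBranchMnemonic(CondCode CC) {
  switch (CC) {
  case CondCode::EQ:  return "beq";
  case CondCode::NE:  return "bne";
  case CondCode::LT:  return "blt";
  case CondCode::GE:  return "bge";
  case CondCode::LTU: return "bltu";
  case CondCode::GEU: return "bgeu";
  }
  return "";
}
```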
Stacked on #145622
As of 20b5728b7b1ccc4509a316efb270d46cc9526d69, C always enables Zca, so
the check `C || Zca` is equivalent to just checking for `Zca`.
This replaces any uses of `HasStdExtCOrZca` with a new `HasStdExtZca`
(with the same assembler description, to avoid changes in error
messages), and simplifies every place where C++ code needed to check
for either C or Zca.
The Subtarget function is just deprecated for the moment.
With proper co-author.
Original message:
We need to pass the operand of LLA to GetSupportedConstantPool.
This replaces #142292 with test from there added as a pre-commit
for both medlow and pic.
Co-authored-by: Carl Nettelblad <carl.nettelblad@rapidity-space.com>
I happened to notice that when legalizing get.active.lane.mask with
large vectors we were materializing via constant pool instead of just
shifting by a constant.
We should probably be doing a full cost comparison for the different
lowering strategies as opposed to our current ad hoc heuristics, but the
few cases this regresses seem pretty minor. (Given the reduction in vset
toggles, they might not be regressions at all.)
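For context, a scalar model of what get.active.lane.mask computes (ignoring the overflow wording in the LangRef); nothing here is specific to the new lowering, it just shows the value being materialized:
```
#include <cstdint>
#include <vector>

// Lane i of get.active.lane.mask(base, n) is set iff base + i < n
// (unsigned compare).
static std::vector<bool> activeLaneMask(uint64_t base, uint64_t n,
                                        unsigned numLanes) {
  std::vector<bool> mask(numLanes);
  for (unsigned i = 0; i < numLanes; ++i)
    mask[i] = base + i < n;
  return mask;
}
```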
---------
Co-authored-by: Craig Topper <craig.topper@sifive.com>
We can convert non-power-of-2 types into extended value types
and then they will be widened.
Reviewers: lukel97
Reviewed By: lukel97
Pull Request: https://github.com/llvm/llvm-project/pull/114971
Put one copy on RISCVTargetLowering as a static function so that both
locations can use it, and rename the method to getM1VT for slightly
improved readability.
See #143580 for MR with the test commit.
Performs the following transformations:
(select c, c1, t) -> (add (czero_nez (t - c1), c), c1)
(select c, t, c1) -> (add (czero_eqz (t - c1), c), c1)
@mgudim
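A scalar sketch of why the folds are sound, using helper functions that model the Zicond semantics (czero.nez zeroes its result when the condition is non-zero, czero.eqz when it is zero); the names here are just for illustration:
```
#include <cassert>
#include <cstdint>

// Scalar models of the Zicond instructions.
static int64_t czero_nez(int64_t val, int64_t cond) { return cond != 0 ? 0 : val; }
static int64_t czero_eqz(int64_t val, int64_t cond) { return cond == 0 ? 0 : val; }

// (select c, c1, t) -> (add (czero_nez (t - c1), c), c1)
static int64_t selectTrueConst(int64_t c, int64_t c1, int64_t t) {
  return czero_nez(t - c1, c) + c1;
}

// (select c, t, c1) -> (add (czero_eqz (t - c1), c), c1)
static int64_t selectFalseConst(int64_t c, int64_t t, int64_t c1) {
  return czero_eqz(t - c1, c) + c1;
}

int main() {
  assert(selectTrueConst(/*c=*/1, /*c1=*/42, /*t=*/7) == 42);
  assert(selectTrueConst(/*c=*/0, /*c1=*/42, /*t=*/7) == 7);
  assert(selectFalseConst(/*c=*/1, /*t=*/7, /*c1=*/42) == 7);
  assert(selectFalseConst(/*c=*/0, /*t=*/7, /*c1=*/42) == 42);
  return 0;
}
```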
This patch adds the support of generating vector instructions for
`memcmp`. This implementation is inspired by X86's.
We convert integer comparisons (eq/ne only) into vector comparisons
and do a vector reduction to get the result.
The range of supported load sizes is (XLEN, VLEN * LMUL8] and
non-power-of-2 types are not supported.
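A scalar sketch of the expansion shape for the equality-only case; the real lowering replaces the byte loop below with wide vector loads, a vector compare, and a reduction of the resulting mask:
```
#include <cstddef>
#include <cstdint>

// Models expanded memcmp(a, b, len) == 0 for a known, supported length:
// compare element-wise and OR-reduce the "not equal" bits.
static bool memEq(const void *a, const void *b, size_t len) {
  const auto *pa = static_cast<const uint8_t *>(a);
  const auto *pb = static_cast<const uint8_t *>(b);
  bool anyDiff = false;
  for (size_t i = 0; i < len; ++i)  // stands in for the vectorized compare
    anyDiff |= pa[i] != pb[i];      // per-element mismatch bit
  return !anyDiff;                  // reduction, then invert for "equal"
}
```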
Fixes #143294.
Reviewers: lukel97, asb, preames, topperc, dtcxzyw
Reviewed By: topperc, lukel97
Pull Request: https://github.com/llvm/llvm-project/pull/114517
This involves a codegen regression at the moment due to the issue
described in 443cdd0b, but this aligns the lowering paths for this case
and makes it less likely future bugs go undetected.
We have recently added the partial_reduce_smla and partial_reduce_umla
nodes to represent Acc += ext(a) * ext(b), where the two extends have to
have the same source type and the same extend kind.
For riscv64 w/zvqdotq, we have the vqdot and vqdotu instructions which
correspond to the existing nodes, but we also have vqdotsu which
represents the case where the two extends are sign and zero
respectively (i.e. not the same type of extend).
This patch adds a partial_reduce_sumla node which has sign extension for
A, and zero extension for B. The addition is somewhat mechanical.
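A scalar model of the accumulation each flavour represents (the partial/grouped reduction into fewer accumulator lanes is elided for brevity):
```
#include <cstddef>
#include <cstdint>
#include <vector>

// partial_reduce_umla: Acc += zext(a[i]) * zext(b[i])
static int64_t umla(int64_t acc, const std::vector<uint8_t> &a,
                    const std::vector<uint8_t> &b) {
  for (size_t i = 0; i < a.size(); ++i)
    acc += int64_t(uint64_t(a[i]) * uint64_t(b[i]));
  return acc;
}

// partial_reduce_smla: Acc += sext(a[i]) * sext(b[i])
static int64_t smla(int64_t acc, const std::vector<int8_t> &a,
                    const std::vector<int8_t> &b) {
  for (size_t i = 0; i < a.size(); ++i)
    acc += int64_t(a[i]) * int64_t(b[i]);
  return acc;
}

// New partial_reduce_sumla: Acc += sext(a[i]) * zext(b[i]) -- the mixed
// case that vqdotsu covers.
static int64_t sumla(int64_t acc, const std::vector<int8_t> &a,
                     const std::vector<uint8_t> &b) {
  for (size_t i = 0; i < a.size(); ++i)
    acc += int64_t(a[i]) * int64_t(b[i]);
  return acc;
}
```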
The trampoline will use an alternative sequence when branch CFI is on.
The stack of the test is organized as follows:
```
sp+56 $ra
sp+44 $a0 f
sp+36 $a1 p
sp+32 00038067 jalr t2
sp+28 010e3e03 ld t3, 16(t3)
sp+24 018e3383 ld t2, 24(t3)
sp+20 00000e17 auipc t3, 0
sp+16 00000023 lpad 0
```
As done for the existing vnsrl cases, we can split a two source
deinterleave2 into two single source deinterleave2 and a slideup.
We can also use a concat-then-deinterleave2 tactic. Both are equally
valid (except in the m8 source type case), and the
concat-then-deinterleave2 saves one instruction for fractional LMUL cases.
Additionally, if we happen to know the exact VLEN and our fixed vectors
are an even number of vector registers, we can avoid the need to split or
concat entirely and just use both register sources.
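A scalar sketch of the splitting tactic (deinterleave2 here means taking every other element; only the even-element half is shown, the odd half is the same with the start index at 1):
```
#include <cstddef>
#include <vector>

// The even-element half of a single-source deinterleave2.
static std::vector<int> deinterleave2Even(const std::vector<int> &v) {
  std::vector<int> out;
  for (size_t i = 0; i < v.size(); i += 2)
    out.push_back(v[i]);
  return out;
}

// Two-source deinterleave2(a, b) equals the two single-source results
// glued together: even(a ++ b) = even(a) followed by even(b) -- the
// "followed by" is the slideup (or concat) in the real lowering --
// provided a has an even number of elements.
static std::vector<int> twoSourceEven(const std::vector<int> &a,
                                      const std::vector<int> &b) {
  std::vector<int> out = deinterleave2Even(a);
  std::vector<int> tail = deinterleave2Even(b);
  out.insert(out.end(), tail.begin(), tail.end());
  return out;
}
```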
ISD::SELECT is legal by default, so this change to the conditional makes
it clearer that XTHeadCondMov and XMipsCMove both leave this operation
legal rather than custom expanding it.
If we have xrivosvizip, we can use the vunzip2{a,b} instructions for
these cases *provided* that we can prove the layout in the two registers
matches the fixed length semantics.
The majority of this patch is a straightforward port of the existing
vnsrl logic which has the same requirement (though for slightly
different reasoning).
The one complicated bit is the addition of the scalable splitting logic
inside lowerVZIP to exploit the independent register operands, and allow
the use of lower LMUL. This bit is annoyingly complicated, and really
"should" be a DAG combine - except that the VL and mask reduction
becomes hard when it's not known to be a constant.
I'd missed a bitcast in the lowering. Unfortunately, that bitcast
happens to be semantically required here as the partial_reduce_* source
expects an i8 element type, but the pseudos and patterns expect an i32
element type.
This appears to only influence the .vx matching from the cases I've
found so far, and LV does not yet generate anything which will exercise
this. The reduce path (instead of the partial.reduce one) used by SLP
currently manually constructs the i32 value, and then goes directly to
the pseudos with their i32 arguments, not the partial_reduce nodes.
We're basically losing the .vx matching on this path until we teach
splat matching to be able to manually splat the i8 value into an i32 via
LUI/ADDI.
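For reference, the scalar value such a splat would need to materialize is just the i8 byte replicated into all four byte lanes of the i32; a sketch (the helper name is made up):
```
#include <cstdint>

// Replicate an i8 value into every byte of an i32 -- the constant a .vx
// operand would hold when packed i8 data is viewed as i32 elements.
static uint32_t splatByteToWord(uint8_t v) {
  uint32_t b = v;
  return b | (b << 8) | (b << 16) | (b << 24);
}
// e.g. splatByteToWord(0x2a) == 0x2a2a2a2a, materializable with LUI/ADDI.
```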
On its own, this change should be non-functional. This is a preparatory
change for https://github.com/llvm/llvm-project/pull/141267 which adds a
new form of PARTIAL_REDUCE_*MLA. As noted in the discussion on that
review, AArch64 needs a different set of legal and custom types for the
PARTIAL_REDUCE_SUMLA variant than the currently existing
PARTIAL_REDUCE_UMLA/SMLA.
If we're only reversing a single byte, we can use BREV8 directly.
If we let it type legalize we'll get (srl (bitreverse X), XLen-8). In op
legalization, we'll expand that to (srl (brev8 (bswap X)), XLen - 8).
Then, SimplifyDemandedBits can reduce it to (srl (brev8 (shl X, XLen -
8)), XLen - 8). We could add a DAGCombine to pull the shl through the
brev8 to put it next to the srl which will allow it to become (and
(brev8 X), 255). Unless we can prove the upper XLen-8 bits are 0 or that
they aren't demanded, we can't remove the `and`.
By emitting BREV8 directly when we still know the type is i8, we can
avoid this. We already DAGCombine i16 and i32 (bitreverse (bswap X)) to
BREV8 early for the same reason.
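A scalar model of brev8 and of why the i8 case needs nothing else (brev8 reverses the bit order within each byte independently, leaving byte positions alone):
```
#include <cstdint>

// Scalar model of brev8: reverse the bits within each byte of the register.
static uint64_t brev8(uint64_t x) {
  uint64_t out = 0;
  for (int byte = 0; byte < 8; ++byte) {
    uint8_t b = (x >> (8 * byte)) & 0xff;
    uint8_t r = 0;
    for (int bit = 0; bit < 8; ++bit)
      r |= ((b >> bit) & 1) << (7 - bit);
    out |= static_cast<uint64_t>(r) << (8 * byte);
  }
  return out;
}

// For an i8 value sitting in the low byte, bitreverse(v) is just the low
// byte of brev8 -- no shifts or masking of the upper XLen-8 bits needed.
static uint8_t bitreverse8(uint8_t v) { return static_cast<uint8_t>(brev8(v)); }
```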
I've added an i7 test case so we can still see the opportunity for
improvement on weird sizes.
Fixes the RISC-V part of #141863.
This change adds new parameters to the method
`shouldFoldSelectWithIdentityConstant()`. The method now takes the
opcode of the select node and the non-identity operand of the select
node. To gain access to the appropriate arguments, the call of
`shouldFoldSelectWithIdentityConstant()` is moved after all other checks
have been performed. Moreover, this change adjusts the precondition of
the fold so that it would work for `SELECT` nodes in addition to
`VSELECT` nodes.
No functional change is intended because all implementations of
`shouldFoldSelectWithIdentityConstant()` are adjusted such that they
restrict the fold to a `VSELECT` node; the same restriction as before.
The rationale of this change is to enable more fine-grained decisions
about when to revert the InstCombine canonicalization of
`(select c (binop x y) y)` to `(binop (select c x idc) y)` in the
backends.
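A scalar sketch of the two equivalent forms the hook chooses between, using add and its identity constant 0 (the function names are just for illustration):
```
#include <cassert>
#include <cstdint>

// The canonicalization this hook controls, with add and identity 0:
//   (select c, (add x, y), y)  <==>  (add (select c, x, 0), y)
static int64_t foldedForm(bool c, int64_t x, int64_t y) {
  return (c ? x : 0) + y;   // binop of a select against the identity
}
static int64_t selectForm(bool c, int64_t x, int64_t y) {
  return c ? (x + y) : y;   // select between the binop result and y
}

int main() {
  for (bool c : {false, true})
    assert(foldedForm(c, 5, 7) == selectForm(c, 5, 7));
  return 0;
}
```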
This is a follow-on to 9b4de7, which handles the fixed vector cases. In
retrospect, this is simple enough that it probably should have just been
part of the original commit, but oh well.
RISC-V does not use address spaces and leaves them available for user
code to make use of. Intrinsics, however, required pointer types to use
the default address space, which complicated lowering for non-default
address spaces. When the intrinsics are overloaded, this is handled
without extra effort.
This commit does not yet update Clang builtin functions to also permit
pointers to non-default address spaces.