The elements that aren't sNaNs need to get passed through this fadd
instruction unchanged. With the agnostic mask policy they might be
forced to all ones.
I think the behaviors are the same, assuming the following describes them correctly.
AVGFLOORS sign extends the inputs by 1 bit, adds them, then does an
arithmetic shift right by 1 before truncating to the original bit width.
This is vaadd with rdn rounding mode.
AVGCEILS sign extends the inputs by 1 bit, adds them, then does an
arithmetic shift right by 1. If the bit shifted out is 1, it adds 1 to
the shifted value. Then truncates to the original bit width. This is vaadd
with rnu rounding mode.
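For reference, a minimal C sketch of the two operations as described above (int8_t element width chosen arbitrarily; this illustrates the semantics, not the lowering code):
```c
#include <stdint.h>

static int8_t avgfloors(int8_t a, int8_t b) {
  int16_t sum = (int16_t)a + (int16_t)b;   /* widen (sign extend) and add */
  return (int8_t)(sum >> 1);               /* arithmetic shift right by 1: vaadd with rdn */
}

static int8_t avgceils(int8_t a, int8_t b) {
  int16_t sum = (int16_t)a + (int16_t)b;   /* widen (sign extend) and add */
  return (int8_t)((sum >> 1) + (sum & 1)); /* add back the shifted-out bit: vaadd with rnu */
}
```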
I think this wasn't implemented previously because there was some
confusion about what average means. Some may expect average to round
towards zero, but there is no way to do that in RISC-V or with the
SelectionDAG nodes. Related issue
https://github.com/riscv/riscv-v-spec/issues/935
This patch fixes:
llvm/lib/Target/RISCV/RISCVISelLowering.cpp:19848:11: error:
enumeration value 'SW_GUARDED_BRIND' not handled in switch
[-Werror,-Wswitch]
Doing so avoids negative interactions with other combines which don't
know the shl_add is a single instruction. From the commit log, we've had
several combine loops already.
This was originally posted as part of #88791, where a bug was pointed
out. That bug was fixed by #89789, which hits the same issue from another
angle. To confirm the fix, I included the reduced test case here.
Android supports per-thread stack protectors that are individually
managed and initialized, which can provide stronger protections than
using the global stack protector cookie. This patch matches the
convention for other architectures targeting Android platforms.
The vlseg and vsseg intrinsic functions are not overloaded on pointer
type, so they cannot handle non-default address spaces.
This fixes an error we see after #90583.
This moves our last major category of tablegen-driven multiply strength
reduction into the post-legalize combine framework. The one slightly
tricky bit is making sure that we use a leading shl if we can form a
slli.uw, and a trailing shl otherwise. Having the trailing shl is critical
for shNadd matching, and folding any following sext.w.
As can be seen in the TD deltas, this allows us to kill off both the
actual multiply patterns and the explicit (add (mul X, C), Y) patterns. The
latter are now handled by the generic shNadd matching code, with the
exception of the THead-only C=200 case because we don't (yet) have a
multiply expansion with two shNadd + a shift.
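As a rough illustration of the leading vs. trailing shl choice (the constant 40 = 5 * 8 is picked purely for this sketch):
```c
#include <stdint.h>

/* Both orderings compute x * 40; which one we emit decides what the
 * surrounding combines can see. */
static uint64_t mul40_trailing_shl(uint64_t x) {
  uint64_t t = (x << 2) + x;   /* sh2add: x * 5 */
  return t << 3;               /* trailing shl keeps the shNadd shape visible for later folds */
}

static uint64_t mul40_leading_shl(uint64_t x) {
  uint64_t t = x << 3;         /* leading shl; with a zero-extended i32 operand this can become slli.uw */
  return (t << 2) + t;         /* sh2add on the shifted value */
}
```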
---------
Co-authored-by: Yingwei Zheng <dtcxzyw@qq.com>
This is the insert_subvector equivalent to #79949, where we can avoid
sliding up by the full LMUL amount if we know the exact subregister the
subvector will be inserted into.
This mirrors the lowerEXTRACT_SUBVECTOR changes in that we handle this
in two parts:
- We handle fixed length subvector types by converting the subvector to
a scalable vector. But unlike EXTRACT_SUBVECTOR, we may also need to
convert the vector being inserted into.
- Whenever we don't need a vslideup because either the subvector fits
exactly into a vector register group *or* the vector is undef, we need
to emit an insert_subreg ourselves because RISCVISelDAGToDAG::Select
doesn't correctly handle fixed length subvectors yet: see d7a28f7ad
A subvector exactly fits into a vector register group if its size is a
known multiple of the size of a vector register, and this adds a new
overload for TypeSize::isKnownMultipleOf for scalable to scalable
comparisons to help reason about this.
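A small sketch of that "exact fit" test with the scalable sizes written as (vscale x N) bits (the helper name is made up for illustration; 64 is RISC-V's RVVBitsPerBlock):
```c
/* nxv2i64 has a minimum size of vscale x 128 bits; 128 is a multiple of the
 * 64-bit per-vscale register size, so it fills a register group for every
 * value of vscale. */
static int fits_register_group_exactly(unsigned subvec_bits_per_vscale,
                                       unsigned vreg_bits_per_vscale) {
  return subvec_bits_per_vscale % vreg_bits_per_vscale == 0;
}
```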
I've left RISCVISelDAGToDAG::Select untouched for now (minus relaxing an
invariant), so that the insert_subvector and extract_subvector code
paths are the same.
We should teach it to properly handle fixed length subvectors in a
follow-up patch, so that the "exact subregsiter" logic is handled in one
place instead of being spread across both RISCVISelDAGToDAG.cpp and
RISCVISelLowering.cpp.
We can think of this as two separate combines
(czero_eqz x, (setne y, 0)) -> (czero_eqz x, y)
and
(czero_eqz x, x) -> x
Similarly the (czero_nez x, (seteq x, 0)) -> x combine can be broken into
(czero_nez x, (seteq y, 0)) -> (czero_eqz x, y)
and
(czero_eqz x, x) -> x
isel already does the (czero_eqz x, (setne y, 0)) -> (czero_eqz x, y)
and (czero_nez x, (seteq y, 0)) -> (czero_eqz x, y) combines, but doing
them early could expose other opportunities.
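For reference, the instruction semantics these rewrites rely on, as a plain C sketch (not the ISel code itself):
```c
#include <stdint.h>

/* czero.eqz: result is zero if the condition equals zero, else x.
 * czero.nez: result is zero if the condition is nonzero, else x. */
static uint64_t czero_eqz(uint64_t x, uint64_t cond) { return cond == 0 ? 0 : x; }
static uint64_t czero_nez(uint64_t x, uint64_t cond) { return cond != 0 ? 0 : x; }

/* (czero_eqz x, (setne y, 0)) keeps x exactly when y != 0, i.e. czero_eqz(x, y).
 * (czero_eqz x, x) keeps x when x != 0, and yields 0 == x when x == 0, so it is x. */
```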
This patch moves the following intrinsics out of the experimental
namespace:
* vector.interleave2/deinterleave2
* vector.reverse
* vector.splice
All of these intrinsics have existed in LLVM for more than a year and are
widely used, so they should no longer be considered experimental.
According to the LangRef, llvm.maximum/minimum has -0.0 < +0.0 semantics
and propagates NaN.
Expand the nodes on targets not supporting the operation by adding an
extra check for NaN and using is_fpclass to check the sign of zeroes.
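A minimal C sketch of the expanded semantics for the minimum case (the real expansion is built from DAG nodes; isnan/signbit stand in for the NaN check and the is_fpclass-based sign check):
```c
#include <math.h>

static double minimum_expanded(double a, double b) {
  if (isnan(a) || isnan(b))      /* extra NaN check: NaN propagates */
    return NAN;
  if (a == 0.0 && b == 0.0)      /* -0.0 and +0.0 compare equal, so... */
    return signbit(a) ? a : b;   /* ...pick the negative zero, honouring -0.0 < +0.0 */
  return a < b ? a : b;
}
```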
In RISC-V ISel, the instruction `czero.eqz a0, a0, a0` is effectively a
no-op: the result is a0 whether or not a0 is zero.
This patch does the following folds in ISel:
```
czero_eqz x, (setcc x, 0, ne) -> x
czero_nez x, (setcc x, 0, eq) -> x
```
---------
Signed-off-by: Zhijin Zeng <zhijin.zeng@spacemit.com>
With the tag merging in place, we can safely make +seq-cst-trailing-fence
the default, according to the recommendation in
https://github.com/riscv-non-isa/riscv-elf-psabi-doc/blob/master/riscv-atomic.adoc
This change flips the default for the feature flag and moves to more
consistent naming with respect to existing features.
This is a change I really don't like posting, but I think we're out of
other options. As can be seen in the test differences, we have cases
where adding the freeze inhibits real optimizations.
Given no other target handles the undef semantics correctly here, I
think the practical answer is that we shouldn't either. Yuck.
As examples, consider:
* combineMulSpecial in X86.
* performMulCombine in AArch64.
The only other real option I see here is to move all of the strength
reduction code out of ISEL. We could do this either via tablegen rules,
or as an MI pass, but other than shifting the point where we ignore undef
semantics, I don't think this is meaningfully different.
Note that the particular tests included here would be fixed if we added
SHA/SHL to canCreateUndefOrPoison. However, a) that's already been tried
twice and exposes its own set of regressions, and b) these are simply
examples. You can create many alternate examples.
Fixes #82659
There are some functions, such as `findRegisterDefOperandIdx` and `findRegisterDefOperand`, that have too many default parameters. As a result, we have encountered some issues due to the lack of TRI parameters, as shown in issue #82411.
Following @RKSimon's suggestion, this patch refactors 9 functions, including `{reads, kills, defines, modifies}Register`, `registerDefIsDead`, and `findRegister{UseOperandIdx, UseOperand, DefOperandIdx, DefOperand}`, adjusting the order of the TRI parameter and making it required. In addition, all the places that call these functions have also been updated correctly to ensure no additional impact.
After this, the caller of these functions should explicitly know whether to pass the `TargetRegisterInfo` or just a `nullptr`.
According to `RISCVTargetLowering::getTgtMemIntrinsic`, the MemoryVT
is the scalar element VT for a strided store, while for a unit-stride
store it is the same as the store value's VT.
After combining `riscv.masked.strided.store` to `masked.store`, we
just use the scalar element VT to construct the `masked.store`, which is
wrong.
With the wrong MemoryVT, the DAGCombiner will combine `trunc+masked.store`
to truncated `masked.store` because `TLI.canCombineTruncStore` returns
true.
So, we should use the store value's VT as the MemoryVT.
This fixes #89833.
The interesting bit is the zext folding. This is the first case where we
end up with a profitable fold of shNadd (zext x), y to shNadd.uw x, y.
See zext_mul68 from rv64zba.ll.
The test differences are cases where we can legally fold (only because
there's no one-use check). These are not profitable or harmful, but we
can't add a one-use check without breaking the zext_mul68 case.
Note that XTHeadBa doesn't appear to have the equivalent patterns so
this only shows up in Zba.
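As a plain C sketch of the identity behind the fold (illustrative only; the real code operates on DAG nodes):
```c
#include <stdint.h>

/* shNadd (zext x), y and shNadd.uw x, y both compute this value: the
 * zero-extended 32-bit operand shifted left by N and added to y. */
static uint64_t shnadd_uw(uint32_t x, uint64_t y, unsigned n) {
  return ((uint64_t)x << n) + y;
}
```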
Changes since original commit:
* Rebase over improved test coverage for theadba
* Revert change to use TargetConstant as it appears to prevent the uimm2
clause from matching in the XTheadBa patterns.
* Fix an order of operands bug in the THeadBa pattern visible in the new
test coverage.
Original commit message follows:
This implements a RISCV specific version of the SHL_ADD node proposed in
https://github.com/llvm/llvm-project/pull/88791.
If that lands, the infrastructure from this patch should seamlessly
switch over to the generic DAG node. I'm posting this separately because
I've run out of useful multiply strength reduction work to do without
having a way to represent MUL X, 3/5/9 as a single instruction.
The majority of this change is moving two sets of patterns out of
tablegen and into the post-legalize combine. The major reason for this is
that I have an upcoming change which needs to reuse the expansion logic,
but it also helps common up some code between zba and the THeadBa
variants.
On the test changes, there are a couple of major categories:
* We chose a different lowering for mul x, 25. The new lowering involves
one fewer register and the same critical path, so this seems like a win.
* The order of the two multiplies changes in (3,5,9)*(3,5,9) in some
cases. I don't believe this matters.
* I'm removing the one use restriction on the multiply. This restriction
doesn't really make sense to me, and the test changes appear positive.
This reverts commit 5a7c80ca58c628fab80aa4f95bb6d18598c70c80. Noticed failures
with the following command:
$ llc -mtriple=riscv64 -mattr=+m,+xtheadba -verify-machineinstrs < test/CodeGen/RISCV/rv64zba.ll
I think I know the cause and will likely reland with a fix tomorrow.
This implements a RISCV specific version of the SHL_ADD node proposed in
https://github.com/llvm/llvm-project/pull/88791.
If that lands, the infrastructure from this patch should seamlessly
switch over to the generic DAG node. I'm posting this separately because
I've run out of useful multiply strength reduction work to do without
having a way to represent MUL X, 3/5/9 as a single instruction.
The majority of this change is moving two sets of patterns out of
tablegen and into the post-legalize combine. The major reason for this is
that I have an upcoming change which needs to reuse the expansion logic,
but it also helps common up some code between zba and the THeadBa
variants.
On the test changes, there are a couple of major categories:
* We chose a different lowering for mul x, 25 (see the arithmetic sketch
after this list). The new lowering involves one fewer register and the
same critical path, so this seems like a win.
* The order of the two multiplies changes in (3,5,9)*(3,5,9) in some
cases. I don't believe this matters.
* I'm removing the one use restriction on the multiply. This restriction
doesn't really make sense to me, and the test changes appear positive.
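The arithmetic behind a two-shNadd form for mul x, 25 (25 = 5 * 5); this is a hedged illustration of the kind of lowering involved, not a claim about the exact sequence the combine emits:
```c
#include <stdint.h>

static uint64_t mul25(uint64_t x) {
  uint64_t t = (x << 2) + x;   /* sh2add: x * 5 */
  return (t << 2) + t;         /* sh2add: t * 5 = x * 25 */
}
```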
topperc pointed this out in review of
https://github.com/llvm/llvm-project/pull/88791, but I believe the
problem applies here as well. Worth noting is that the code I introduced
with this bug was mostly copied from other targets, which also have this
bug.
Planning to declare all extensions in tablegen so we can generate the
tables for RISCVISAInfo.cpp. This requires making "e" consistent with
other extensions.
This is largely a revert of commit
e81796671890b59c110f8e41adc7ca26f8484d20.
As #88029 shows, there exists hardware that only supports unaligned
scalar accesses.
I'm leaving how this gets exposed to the clang interface to a future
patch.
This vendor extension has the same shift_add as zba, and most of the same
patterns are duplicated. Enable it here too so the configurations don't
diverge.
With zba, we can expand this to (add (shl X, C1), (shXadd X, X)).
Note that this is our first expansion to a three instruction sequence. I
believe this to generally be a reasonable tradeoff for most architectures,
but we may want to (someday) consider a tuning flag here.
I plan to support 2^n + (2/4/8 + 1) eventually as well, but that comes
behind 2^N - 2^M. Both are also three instruction sequences.
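A worked example of the three-instruction form described above, with C = 11 = 8 + 3 chosen purely for illustration:
```c
#include <stdint.h>

static uint64_t mul11(uint64_t x) {
  uint64_t hi = x << 3;        /* shl:    x * 8  */
  uint64_t lo = (x << 1) + x;  /* sh1add: x * 3  */
  return hi + lo;              /* add:    x * 11 */
}
```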
---------
Co-authored-by: Min-Yih Hsu <min@myhsu.dev>
The former is better as a zero extend can be folded into the sll,
whereas the latter currently produces a separate zext.w due to bad
interactions with other combines.
Bug fix: Handle RVV return type in calling convention correctly.
Return values are handled in the same way as function arguments.
One thing to mention is that if a type can be broken down into homogeneous
vector types, e.g. {<vscale x 4 x i32>, {<vscale x 4 x i32>, <vscale x 4 x i32>}},
it is considered a vector tuple type and needs to be handled by the tuple
type rule.
Copy the pointer info, flags, alignment, AAInfo, and ranges, but let
getLoad rebuild the MMO using the scalable type used for the new
load/store. This makes sure the LLT minimum size matches the ContainerVT
minimum size. This is important since vscale_range may have been used to
determine that the fixed vector was the exact size of a scalable vector.
Fixes #88799
For experimental.cttz.elts, we can use a vfirst instruction, but we need
to correct the result if the input vector can be all zeroes: in that case
cttz.elts returns the vector length while vfirst returns -1.
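A conceptual sketch of that correction (the helper name is made up; the real code selects between the vfirst result and the vector length in the DAG):
```c
/* vfirst yields -1 when no element of the mask is set; cttz.elts must
 * return the vector length in that case. */
static long correct_cttz_elts(long vfirst_result, unsigned long vl) {
  return vfirst_result < 0 ? (long)vl : vfirst_result;
}
```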
This expansion is directly inspired by the analogous code in the x86
backend for LEA. shXadd and (this sub-case of) LEA are largely
equivalent.
This is an alternative to
https://github.com/llvm/llvm-project/pull/87105.
This expansion is also supported via the decomposeMulByConstant
callback, but restricted because of interactions with other combines
since that code runs before legalization. As discussed in the other
review, my original plan had been to support post legalization expansion
through the same interface, but that ended up being more complicated
than seems justified.
Instead, let's go ahead and do the general expansion post-legalize. Other
targets use the combine approach, and matching that structure makes it
easier for us to adapt ideas from other targets to RISCV.