Pass the original MMO instead of different individual values.
getAlign() was used before where actually getOriginalAlign() would have been
better, and this patch has the same effect.
Avoid the pre-truncate of BUILD_VECTOR sources when there is more than
one use. This can avoid using unnecessary movs later down the
instruction selection pipeline.
We were only folding cases which remained extloads, but DAG.getExtLoad can also handle the cases which don't need to extend at all (we just can't do truncloads).
reduceLoadWidth can handle this for scalar loads, but not for vectors.
Noticed while triaging D152928
If we're lowering a fixed length vector load or store which happens to
exactly VLEN in size (when VLEN is exactly known), we can use a whole
register load or store instead of the unit strided variants. This
doesn't require a vsetvli in some cases, allows additional flexibility
of vsetvli cases in others, and doesn't have a runtime dependency on the
value of VL.
This ensures we clip the index to be in bounds of the vector we are
inserting into. If the index is out of bounds the results of the insert
element is poison. If we don't clip the index we can write memory that
was not part of the original store.
Fixes#74248#75557.
This ensures we clip the index to be in bounds of the vector we are
inserting into. If the index is out of bounds the results of the insert
element is poison. If we don't clip the index we can write memory that
was not part of the original store.
Fixes#74248.
The combine was implicitly assuming that the index on the outer
insert_subvector meant the same thing when the source was switched to be
the index of the inner insert_subvector. This is not true if the
innermost sub-vector is fixed, and the outer subvector is scalable.
I could do a less restrictive fix here - i.e. allow the case where the
scalability of the subvectors are the same - but there's no test
coverage which shows this transform actually has profit. Given that, go
for the simplest fix.
It seems TypeSize is currently broken in the sense that:
TypeSize::Fixed(4) + TypeSize::Scalable(4) => TypeSize::Fixed(8)
without failing its assert that explicitly tests for this case:
assert(LHS.Scalable == RHS.Scalable && ...);
The reason this fails is that `Scalable` is a static method of class
TypeSize,
and LHS and RHS are both objects of class TypeSize. So this is
evaluating
if the pointer to the function Scalable == the pointer to the function
Scalable,
which is always true because LHS and RHS have the same class.
This patch fixes the issue by renaming `TypeSize::Scalable` ->
`TypeSize::getScalable`, as well as `TypeSize::Fixed` to
`TypeSize::getFixed`,
so that it no longer clashes with the variable in
FixedOrScalableQuantity.
The new methods now also better match the coding standard, which
specifies that:
* Variable names should be nouns (as they represent state)
* Function names should be verb phrases (as they represent actions)
DAGCombiner folds (select_cc seteq (and x, y), 0, 0, A) to (and (sra
(shl x)) A) where y has a single bit set. Previously, DAGCombiner relies
on `shouldAvoidTransformToShift` to decide when to do the combine, but
`shouldAvoidTransformToShift` is only about shift cost. This patch
introuduces a specific hook to decide when to do the combine and disable
the combine when Zicond enabled and AndMask <= 1024.
Avoids infinite issues in some upcoming patches to help D152928 - x86 sees a number of regressions that are addressed by extending SimplifyDemandedVectorEltsForTargetNode to cover more binop opcodes
We currently limit the fold to cases where we're extending from the same source type, but we can safely perform this using the wider of mismatching source types (we're really just interested in having extension bits on both sources), ensuring we don't create additional extensions/truncations.
- Revert "[DAGCombiner] Transform `(icmp eq/ne (and X,C0),(shift X,C1))`
to use rotate or to getter constants." - causes a miscompile, see
112e49b381 (commitcomment-131943923)
- Revert "[X86] Fix gcc warning about mix of enumeral and non-enumeral
types. NFC", which fixes a compiler warning in the commit above
This adds the nneg flag to SDNodeFlags and the node printing code.
SelectionDAGBuilder will add this flag to the node if the target doesn't
prefer sign extend.
A future RISC-V patch can remove the sign extend preference from
SelectionDAGBuilder.
I've also added the flag to the DAG combine that converts
ISD::SIGN_EXTEND to ISD::ZERO_EXTEND.
The issue #55208 noticed that std::rint is vectorized by the
SLPVectorizer, but a very similar function, std::lrint, is not.
std::lrint corresponds to ISD::LRINT in the SelectionDAG, and
std::llrint is a familiar cousin corresponding to ISD::LLRINT. Now,
neither ISD::LRINT nor ISD::LLRINT have a corresponding vector variant,
and the LangRef makes this clear in the documentation of llvm.lrint.*
and llvm.llrint.*.
This patch extends the LangRef to include vector variants of
llvm.lrint.* and llvm.llrint.*, and lays the necessary ground-work of
scalarizing it for all targets. However, this patch would be devoid of
motivation unless we show the utility of these new vector variants.
Hence, the RISCV target has been chosen to implement a custom lowering
to the vfcvt.x.f.v instruction. The patch also includes a CostModel for
RISCV, and a trivial follow-up can potentially enable the SLPVectorizer
to vectorize std::lrint and std::llrint, fixing #55208.
The patch includes tests, obviously for the RISCV target, but also for
the X86, AArch64, and PowerPC targets to justify the addition of the
vector variants to the LangRef.
If `C0` is a mask and `C1` shifts out all the masked bits (to
essentially compare two subsets of `X`), we can arbitrarily re-order
shift as `srl` or `shl`.
If `C1` (shift amount) is a power of 2, we can replace the and+shift
with a rotate.
Otherwise, based on target preference we can arbitrarily swap `shl`
and `shl` in/out to get better constants.
On x86 we can use this re-ordering to:
1) get better `and` constants for `C0` (zero extended moves or
avoid imm64).
2) covert `srl` to `shl` if `shl` will be implementable with `lea`
or `add` (both of which can be preferable).
Proofs: https://alive2.llvm.org/ce/z/qzGM_w
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D152116
The BuildVectorSDNode::isConstantSplat function could depend on
endianness, and it takes a bool argument that can be used to indicate
if big or little endian should be considered when internally casting
from a vector to a scalar. However, that argument is default set to
false (= little endian). And in many situations, even in target
generic code such as DAGCombiner, the endianness isn't specified when
using the function.
The intent with this patch is to highlight that endianness doesn't
matter, depending on the context in which the function is used.
In DAGCombiner the code is slightly refactored. Back in the days when
the code was written it wasn't possible to request a MinSplatBits
size when calling isConstantSplat. Instead the code re-expanded the
found SplatValue to match with the EltBitWidth. Now we can just
provide EltBitWidth as MinSplatBits and remove the logic for doing
the re-expand.
While being at it, tidying up around isConstantSplat, this patch also
adds an explicit check in BuildVectorSDNode::isConstantSplat to break
out from the loop if trying to split an on VecWidth into two halves.
Haven't been able to prove that there could be miscompiles involved
if not doing so. There are lit tests that trigger that scenario,
although I think they happen to later discard the returned SplatValue
for other reasons.
When narrowing logic ops(OR/XOR) with constant rhs, `DAGCombiner` will fixup the constant rhs node.
It is incorrect when lhs is also a constant. For example, we will incorrectly replace `xor OpaqueConstant:i64<8191>, Constant:i64<-1>` with `xor (and OpaqueConstant:i64<8191>, Constant:i64<65535>), Constant:i64<-1>`.
Fixes#68855.
Need to add NumSrcElts param to is..Mask functions in
ShuffleVectorInstruction class for better mask analysis. Mask.size() not
always matches the sizes of the permuted vector(s). Allows to better
estimate the cost in SLP and fix uses of the functions in other cases.
Differential Revision: https://reviews.llvm.org/D158449
Need to add NumSrcElts param to is..Mask functions in
ShuffleVectorInstruction class for better mask analysis. Mask.size() not
always matches the sizes of the permuted vector(s). Allows to better
estimate the cost in SLP and fix uses of the functions in other cases.
Differential Revision: https://reviews.llvm.org/D158449
Need to add NumSrcElts param to is..Mask functions in
ShuffleVectorInstruction class for better mask analysis. Mask.size() not
always matches the sizes of the permuted vector(s). Allows to better
estimate the cost in SLP and fix uses of the functions in other cases.
Differential Revision: https://reviews.llvm.org/D158449
Need to add NumSrcElts param to is..Mask functions in
ShuffleVectorInstruction class for better mask analysis. Mask.size() not
always matches the sizes of the permuted vector(s). Allows to better
estimate the cost in SLP and fix uses of the functions in other cases.
Differential Revision: https://reviews.llvm.org/D158449
Following on from D135150, this patch fixes another crash caused by this
DAG combine:
fadd (fma A, B, (fmul C, D)), E --> fma A, B, (fma C, D, E)
The combine calls ReplaceAllUsesOfValueWith to replace (fmul C, D) with
(fma C, D, E). This can cause nodes to get CSEd. In D135150 the problem
was that the (fma C, D, E) node got CSEd away. In this new case, the
problem is that the outer fadd node gets CSEd away. To fix it we have
to return SDValue(N, 0) from the combine and be careful not to add a
deleted node to the worklist.
Need to add NumSrcElts param to is..Mask functions in
ShuffleVectorInstruction class for better mask analysis. Mask.size() not
always matches the sizes of the permuted vector(s). Allows to better
estimate the cost in SLP and fix uses of the functions in other cases.
Differential Revision: https://reviews.llvm.org/D158449
Previously, we where taking `CurVT` before finalizing `ToCast` which
meant potentially returning an `SDValue` with an illegal `ValueType`
for the operation.
Fix is to just take `CurVT` after we have finalized `ToCast` with
`PeekThroughCastsAndTrunc`.
Do so by extending `matchUnaryPredicate` to also work for
`ConstantFPSDNode` types then encapsulate the constant checks in a
lambda and pass it to `matchUnaryPredicate`.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D154868
Note: This is moving D154678 which previously implemented this in
InstCombine. Concerns where brought up that this was de-canonicalizing
and really targeting a codegen improvement, so placing in DAGCombiner.
This implements:
```
(fmul C, (uitofp Pow2))
-> (bitcast_to_FP (add (bitcast_to_INT C), Log2(Pow2) << mantissa))
(fdiv C, (uitofp Pow2))
-> (bitcast_to_FP (sub (bitcast_to_INT C), Log2(Pow2) << mantissa))
```
The motivation is mostly fdiv where 2^(-p) is a fairly common
expression.
The patch is intentionally conservative about the transform, only
doing so if we:
1) have IEEE floats
2) C is normal
3) add/sub of max(Log2(Pow2)) stays in the min/max exponent
bounds.
Alive2 can't realistically prove this, but did test float16/float32
cases (within the bounds of the above rules) exhaustively.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D154805