Model zext i1 %x to in as select i1 %x, in 1, in 0 in case, if there are
other select instructions, which can be combined into a bundle.
Fixes#178403
Recommit after revert in 993e1f66afcfe9da03bd813e669eada341b11d2f
Reviewers: hiraditya, RKSimon
Pull Request: https://github.com/llvm/llvm-project/pull/180635
Model zext i1 %x to in as select i1 %x, in 1, in 0 in case, if there are
other select instructions, which can be combined into a bundle.
Fixes#178403
Reviewers: hiraditya, RKSimon
Pull Request: https://github.com/llvm/llvm-project/pull/180635
LoadCombine pattern handling was added as a workaround for the cases,
where the SLP vectorizer could not vectorize the code effectively. With
the copyables support, it can handle it directly.
Also, patch adds support for scalar loads[ + bswap] pattern for byte
sized loads (+ reverse bytes for bswap)
Recommit after revert in 6377c86d718232fe60c548dfd7ab439f7ff84df7
Reviewers: RKSimon, hiraditya
Pull Request: https://github.com/llvm/llvm-project/pull/174205
LoadCombine pattern handling was added as a workaround for the cases,
where the SLP vectorizer could not vectorize the code effectively. With
the copyables support, it can handle it directly.
Also, patch adds support for scalar loads[ + bswap] pattern for byte
sized loads (+ reverse bytes for bswap)
Reviewers: RKSimon, hiraditya
Pull Request: https://github.com/llvm/llvm-project/pull/174205
Before casting the value to FP type, need to check, if the type for
reduced during minbitwidth analysis and need to restore the original
source type to generate correct bitcast operation.
Fixes#178884
If the reduction forms reversed bitcast, we can represent it as
a bitcast + bswap, if the source elements are byte sized
Reviewers: hiraditya, RKSimon
Pull Request: https://github.com/llvm/llvm-project/pull/178513
Added support for reorder reduction of shl(zext)-like construct. Such
constructs are modelled currently as shuffle + bitcast.
Reviewers: RKSimon, hiraditya
Pull Request: https://github.com/llvm/llvm-project/pull/178292
Gathered loads forming DAG instead of trees in SLP vectorizer. When
doing the throttling analysis for such graphs, need to consider partially
matched gathered loads DAG nodes and consider extract and/or gather
operations and their costs.
The patch adds this analysis and allows cutting off the expensive
sub-graphs with gathered loads.
Reviewers: hiraditya, RKSimon
Pull Request: https://github.com/llvm/llvm-project/pull/177855
Recommit after revert in d733771113339608aff6002d1fa89aaf4a51c502, which
was related to a crash in SelectionDAG
…The cose modeling logic was attempting to set a bit in APInt for an
out-of-bounds index, causing an assertion failure. This patch ignores
OOB indices as they produce poison- which is already handled.
Fixes#176780
this is the same test result which produces this bug
<img width="1600" height="964" alt="image"
src="https://github.com/user-attachments/assets/80593902-9d15-4e18-850b-a558bca8518e"
/>
Fixes#176208. Scaled back version of #176515 that only affects the RISCV backend.
Only modifies the cost for cases when DIV is a legal operation.
Updates the cost for both Scalar and Vector types.
Used `TTI::TCC_Expensive` as suggested by
https://github.com/llvm/llvm-project/issues/176208#issuecomment-3760902537.
---------
Co-authored-by: Luke Lau <luke_lau@icloud.com>
Gathered loads forming DAG instead of trees in SLP vectorizer. When
doing the throttling analysis for such graphs, need to consider partially
matched gathered loads DAG nodes and consider extract and/or gather
operations and their costs.
The patch adds this analysis and allows cutting off the expensive
sub-graphs with gathered loads.
Reviewers: hiraditya, RKSimon
Pull Request: https://github.com/llvm/llvm-project/pull/177855
If multiple nodes are generated from same PHI node for the same block,
still need to vectorize vector nodes, even if the value for the incoming block was already emitted.
Fixes#177124
If the copyables have parents, used in PHI nodes, this causes complex
schedulable/non-schedulable dependecies, which require complex
processing, but with small profitability. Cut such case early for now to
prevent compiler crashes and compile time blow up.
Fixes#176658
If the the node to throttle is a vector node, which is used in split
node, and at least one scalar of such a node is used in many split
nodes, such vector node should be throttled. otherise there might be
wrong def-use chain, which crashes the compiler.
Fixes#175967
If the user instruction is used several times in the node, and in one
cases its operand is copyable, but in another is not, need to check all
operands to be sure we do not miss scheduling
If the user instruction is used several times in the node, and in one
cases its operand is copyable, but in another is not, need to check all
operands to be sure we do not miss scheduling
strided-stores-vectorized.ll crashes for RV32 without fixing the
relevant logic in vectorizeTree, because the argument can't be
represented as a 32-bit unsigned value:
```
llvm::APInt::APInt(unsigned int, uint64_t, bool, bool): Assertion `llvm::isUIntN(BitWidth, val) && "Value is not an N-bit unsigned value"' failed.
```
It is intended to be signed, so we simply use ConstantInt::getSigned
instead. This fixes other stride-related instances in the file as well.
For further context, this change is part of unblocking rv32gcv
llvm-test-suite in CI.
The compiler should not generate subvectors with the same extractelement
instructions, it may cause a crash and leads to inefficient
vectorization.
Fixes#174773
If the node is non-scedulable, all instructions are used outside only
and parent is non-schedulable non-phi node, the dependency count should be
increased for such nodes
Fixes#174599
Suppose we are given pointers of the form: `%b + x * %s + y * %c_i`
where `%c_i`s are constants and %s is a run-time fixed value.
If the pointers can be rearranged as follows:
```
%b + 0 * %s + 0
%b + 0 * %s + 1
%b + 0 * %s + 2
...
%b + 0 * %s + w
%b + 1 * %s + 0
%b + 1 * %s + 1
%b + 1 * %s + 2
...
%b + 1 * %s + w
...
```
It means that the memory can be accessed with a strided loads of width `w`
and stride `%s`.
This is motivated by x264 benchmark.
Currently reductions can handles only same/alternate instructions,
skipping potential support for copyables. Patch adds support for
copyables in the reduced values.
Recommit after revert in 1febc3f088ef444af378c0a90aaba2195c30472b
Currently reductions can handles only same/alternate instructions,
skipping potential support for copyables. Patch adds support for
copyables in the reduced values.
If any user of the extractelement instruction is part of the node to be
deleted/gathered, such extractelements instructions should not be
considered for deletion.
Fixes#174020