Split out from https://github.com/llvm/llvm-project/pull/150248:
Specify that the argument of lifetime.start/lifetime.end is ignored and
will be removed in the future.
Remove lifetime size handling from SDAG. The size was previously
discarded during isel, so was always ignored for stack coloring anyway.
Where necessary, obtain the size of the full frame index.
getNode updates flags correctly for CSE. Calling setFlags after getNode
may set the flags where they don't apply.
I've added a Flags argument to getSelectCC and to the getNode overload that takes
an ArrayRef of EVTs.
This is a partial revert of #145939 (I've kept the BUILD_VECTOR(FREEZE(UNDEF), FREEZE(UNDEF), elt2, ...) canonicalization) as we're getting reports of infinite loops (#148084).
The issue appears to be due to deep chains of nodes and how visitFREEZE replaces all instances of an operand with a common frozen version - other users of the original frozen node then get added back to the worklist but might no longer be able to confirm a node isn't poison due to recursion depth limits on isGuaranteedNotToBeUndefOrPoison.
The issue still exists with the old implementation but by only allowing a single frozen operand it helps prevent cases of interdependent frozen nodes.
I'm still working on supporting multiple operands, as it's critical for topological DAG handling, but I need to get a fix in for trunk and 21.x.
Fixes #148084
After https://github.com/llvm/llvm-project/pull/149310 we are guaranteed
that the argument is an alloca, so we don't need to look at underlying
objects (which was not a correct thing to do anyway).
This also drops the offset argument for lifetime nodes in SDAG. The
offset is fixed to zero now. (Peculiarly, while SDAG pretended to have
an offset, it just gets silently dropped during selection.)
calculateByteProvider only cares about scalars or a single element
within a vector. For the latter there is the VectorIndex parameter to
identify the element. All other properties, and specifically Index, are
related to the underlying scalar type and thus when taking the size of a
type it's the scalar size that matters.
Fixes https://github.com/llvm/llvm-project/issues/148387
getNode has logic to intersect flags correctly if the new node happens
to CSE with an existing node. Setting node flags after getNode bypasses
this logic and may change the node for other uses where the flags don't
hold.
Change isBuildVectorAll* -> isConstantSplatVectorAll* in VSelect in case
the fold happens after BuildVector has been canonically transformed to
Splat, or if the Splat is already in the vselect initially.
- Fixes #73454
- Update related test cases, add extra tests in wasm
---------
Co-authored-by: Simon Pilgrim <llvm-dev@redking.me.uk>
This allows truncated splat / buildvector in isBoolConstant, to allow
certain not instructions to be recognized post-legalization, and allow
vselect to optimize.
An override for x86 avx512 predicated vectors is required to avoid an
infinite recursion from the code that detects zero vectors. From:
```
// Check if the first operand is all zeros and Cond type is vXi1.
// If this an avx512 target we can improve the use of zero masking by
// swapping the operands and inverting the condition.
```
DAGCombiner can already constant fold build vectors of constants/undefs
to a new vector type, but it has to be incredibly careful after
legalization to not affect a target's canonicalized constants.
This patch proposes we move the implementation inside SelectionDAG to
make it easier for targets to manually use the constant folding whenever
it deems it safe to do so.
I've also altered the method to take the BuildVectorSDNode input
directly and consistently use the same SDLoc.
ISD::ABDS can be used if the signed subtraction will not overflow (this
is an extension to handle cases where the NSW flag has been lost).
ISD::ABDU can be used if both operands have at least 1 zero sign bit.
Fixes #147049
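To make the ABDU condition concrete, here is a scalar C++ sketch on 32-bit values. The function names are hypothetical and this is not the DAGCombiner code; it only models the ABDU part, treating both nodes as `max(a, b) - min(a, b)` in their own signedness and checking that they agree once both sign bits are zero.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical scalar models of ISD::ABDS / ISD::ABDU on 32-bit values:
// both compute max(a, b) - min(a, b) in their own signedness.
uint32_t abds32(int32_t a, int32_t b) {
  // Widen to 64 bits so computing the difference cannot overflow.
  int64_t d = static_cast<int64_t>(a) - static_cast<int64_t>(b);
  return static_cast<uint32_t>(d < 0 ? -d : d);
}

uint32_t abdu32(uint32_t a, uint32_t b) { return a > b ? a - b : b - a; }

// When both sign bits are zero, the signed and unsigned orderings agree,
// so ABDU and ABDS produce the same value.
bool abduMatchesAbds(int32_t a, int32_t b) {
  assert(a >= 0 && b >= 0 && "both sign bits must be zero");
  return abdu32(static_cast<uint32_t>(a), static_cast<uint32_t>(b)) ==
         abds32(a, b);
}
```

With a set sign bit the two diverge: `abdu32(0xFFFFFFFF, 1)` is `0xFFFFFFFE`, while `abds32(-1, 1)` is 2.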
This PR resolves https://github.com/llvm/llvm-project/issues/144513
The modification includes five patterns:
1. vselect Cond, 0, 0 → 0
2. vselect Cond, -1, 0 → bitcast Cond
3. vselect Cond, -1, x → or Cond, x
4. vselect Cond, x, 0 → and Cond, x
5. vselect Cond, 000..., X → andn Cond, X
Patterns 1-4 have been migrated to DAGCombine; pattern 5 is still in x86 code.
The reason is that the andn instruction cannot be used directly in
DAGCombine; only and+xor can, which introduces optimization-order issues.
For example, in the x86 backend, for select Cond, 0, x → (~Cond) & x, the
backend first checks whether the cond node of (~Cond) is a setcc node. If
so, it modifies the comparison operator of the condition, so the x86
backend cannot complete the andn optimization. In short, I think it is
better to keep the vselect Cond, 000..., X pattern in the x86 backend
instead of turning it into and+xor in DAGCombine.
The first commit contains the code changes and x86 tests; the second
contains tests for other backends.
---------
Co-authored-by: Simon Pilgrim <llvm-dev@redking.me.uk>
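The five vselect patterns above can be checked lane by lane with a small C++ model. This is a sketch under the assumption that every Cond lane is all-ones or all-zeros (as a vector compare produces); the helper names are hypothetical, and since the model keeps Cond and the result in the same type, pattern 2's bitcast is the identity here.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using Vec4 = std::array<uint32_t, 4>;

// Lane-wise vselect. Each Cond lane is assumed to be all-ones or all-zeros.
Vec4 vselect(const Vec4 &c, const Vec4 &t, const Vec4 &f) {
  Vec4 r{};
  for (std::size_t i = 0; i < r.size(); ++i)
    r[i] = c[i] ? t[i] : f[i];
  return r;
}

// Pattern 3 target: or Cond, X.
Vec4 vor(const Vec4 &a, const Vec4 &b) {
  Vec4 r{};
  for (std::size_t i = 0; i < r.size(); ++i)
    r[i] = a[i] | b[i];
  return r;
}

// Pattern 4 target: and Cond, X.
Vec4 vand(const Vec4 &a, const Vec4 &b) {
  Vec4 r{};
  for (std::size_t i = 0; i < r.size(); ++i)
    r[i] = a[i] & b[i];
  return r;
}

// Pattern 5 target: andn Cond, X, i.e. (~Cond) & X.
Vec4 vandn(const Vec4 &c, const Vec4 &x) {
  Vec4 r{};
  for (std::size_t i = 0; i < r.size(); ++i)
    r[i] = ~c[i] & x[i];
  return r;
}
```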
Always try to fold freeze(op(....)) -> op(freeze(),freeze(),freeze(),...).
This patch proposes we drop the opt-in limit for opcodes that are allowed to push a freeze through the op to freeze all its operands, through the tree towards the roots.
I'm struggling to find a strong reason for this limit apart from the DAG freeze handling being immature for so long - as we've improved coverage in canCreateUndefOrPoison/isGuaranteedNotToBeUndefOrPoison it looks like the regressions are not as severe.
Hopefully this will help some of the regression issues in #143102 etc.
After 901e1390c9778a191256335d37802bc631c2d183 (#127770), the DAG
combine would transform `fma(x, 0.0, 1.0)` into `1.0` if
`-fp-contract=fast` was enabled, in addition to when 'x' is marked
nnan/ninf.
It's only valid in the latter case, not the former, so delete the extra
condition.
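A quick way to see why only the nnan/ninf condition justifies the fold: `x * 0.0` is a signed zero for finite `x`, but NaN for an infinite or NaN `x`. The sketch below uses `std::fma` and assumes default IEEE semantics (no `-ffast-math`); the function name is illustrative.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// fma(x, 0.0, 1.0): equals 1.0 only when x * 0.0 is a (signed) zero,
// i.e. when x is finite. inf * 0.0 and NaN * 0.0 both give NaN.
double fmaZeroOne(double x) { return std::fma(x, 0.0, 1.0); }
```

`-fp-contract=fast` only permits fusing a separate multiply and add; it says nothing about `x` being finite, so it cannot justify replacing the result with `1.0`.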
Although it is nice to be able to prove that the freeze can be moved, this
check can fail immediately after the freeze(op(...)) -> op(freeze(),freeze(),...)
node is created if any of the new freeze nodes prevents value tracking from
seeing through to the source values (e.g. that shift amounts/element indices
are in bounds).
This will allow us to remove the isGuaranteedNotToBeUndefOrPoison checks
inside canCreateUndefOrPoison that were discussed on #146361
Remove `UnsafeFPMath` in `visitFMULForFMADistributiveCombine`,
`visitFSUBForFMACombine` and `visitFDIV`.
All affected tests are fixed by adding fast-math flags manually.
Propagate fast math flags when lowering fdiv in NVPTX backend, so it can
produce optimized dag when `unsafe-fp-math` is absent.
This patch focuses on generic DAG combines, plus an AMDGPU-target-specific one
that is closely connected.
The generic DAG combine is based on a part of PR #105669 by rgwott, which was
adapted from work by jrtc27, arichardson, davidchisnall in the CHERI/Morello
LLVM tree. I added some parts and removed several disjuncts from the
reassociation condition:
- `isNullConstant(X)`, since there are address spaces where 0 is a perfectly
normal value that shouldn't be treated specially,
- `(YIsConstant && ZOneUse)` and `(N0OneUse && ZOneUse && !ZIsConstant)`, since
they cause regressions in AMDGPU.
For SWDEV-516125.
This patch reassociates `add(add(vecreduce(a), b), add(vecreduce(c),
d))` into `add(vecreduce(add(a, c)), add(b, d))`, to combine the
reductions into a single node. This comes up after unrolling vectorized
loops.
There is another small change to move reassociateReduction inside fadd
outside of an AllowNewConst block, as new constants will not be created
and it should be OK to perform the combine later after legalization.
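The reassociation can be modelled with scalar integer code: one reduction of `a + c` replaces two separate reductions. This is a sketch with hypothetical helper names that covers the integer (wrapping) add case only; for fadd the reordering additionally depends on reassociation being allowed.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using V4 = std::array<uint32_t, 4>;

// Scalar model of vecreduce_add, wrapping like an LLVM integer add.
uint32_t vecreduceAdd(const V4 &v) {
  uint32_t sum = 0;
  for (uint32_t e : v)
    sum += e;
  return sum;
}

// Element-wise vector add.
V4 addV(const V4 &a, const V4 &b) {
  V4 r{};
  for (std::size_t i = 0; i < r.size(); ++i)
    r[i] = a[i] + b[i];
  return r;
}

// Before: add(add(vecreduce(a), b), add(vecreduce(c), d)), two reductions.
uint32_t beforeCombine(const V4 &a, uint32_t b, const V4 &c, uint32_t d) {
  return (vecreduceAdd(a) + b) + (vecreduceAdd(c) + d);
}

// After: add(vecreduce(add(a, c)), add(b, d)), a single reduction.
uint32_t afterCombine(const V4 &a, uint32_t b, const V4 &c, uint32_t d) {
  return vecreduceAdd(addV(a, c)) + (b + d);
}
```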
If we simplify a udiv/sdiv using the exact flag, we shouldn't
propagate that simplification to any urem/srem that happens to
use the same operands. If the exact flag is wrong, the udiv/sdiv
will produce poison, but that doesn't mean we can make the urem/srem
simplify to 0.
Fixes #145360.
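A minimal numeric illustration, with hypothetical helper names: for x = 7 and y = 2, a `udiv exact 7, 2` carries a wrong flag and is poison, yet `urem 7, 2` is still a well-defined 1. Folding the urem to 0 on the strength of the udiv's flag would change that value.

```cpp
#include <cassert>
#include <cstdint>

// `udiv exact x, y` asserts that x is a multiple of y; if that does not
// hold, the udiv result is poison. The sibling urem stays well defined.
bool exactFlagIsCorrect(uint32_t x, uint32_t y) {
  return y != 0 && x % y == 0;
}

// What `urem x, y` actually evaluates to for the same operands.
uint32_t urem(uint32_t x, uint32_t y) { return x % y; }
```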
If X is known non-negative, that's still true if we fold the truncate
to create a smaller zext.
In the i128 tests, SelectionDAGBuilder aggressively truncates the
`zext nneg` to i64 to match `getShiftAmountTy`. If we don't preserve
the `nneg` we can't see that the shift amount argument being `signext`
means we don't need to do any extension.
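The reasoning can be sketched with concrete widths (i16 → i64 → i32 here is an illustrative choice, not the widths from the tests): truncating a wide zext gives the same bits as a narrower zext, and when X's sign bit is clear that narrower zext also equals a sext, which is exactly what `nneg` records.

```cpp
#include <cassert>
#include <cstdint>

// trunc (zext nneg i16 X to i64) to i32, evaluated step by step.
uint32_t truncOfWideZext(uint16_t x) {
  uint64_t wide = x;                  // zext i16 -> i64
  return static_cast<uint32_t>(wide); // trunc i64 -> i32
}

// The folded form: a single, narrower zext i16 -> i32.
uint32_t narrowZext(uint16_t x) { return x; }

// sext i16 -> i32, for comparison with the zext.
uint32_t narrowSext(uint16_t x) {
  return static_cast<uint32_t>(static_cast<int32_t>(static_cast<int16_t>(x)));
}
```

For `X = 0x1234` all three agree; for `X = 0x8000` (sign bit set, so `nneg` would not hold) the zext gives `0x8000` while the sext gives `0xFFFF8000`.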
### Summary
This PR resolves https://github.com/llvm/llvm-project/issues/143864
Add the `(~a | x) & (a | y) -> (a & (x ^ y)) ^ y` fold to `foldMaskedMerge`
using SDPatternMatch.
After adding this pattern and running `ninja check-llvm-codegen`, all
other test cases remained unchanged, so I added a
test case (fold-masked-merge-demorgan.ll) for it.
---------
Co-authored-by: Simon Pilgrim <llvm-dev@redking.me.uk>
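Both sides of the fold are a masked merge: where a bit of `a` is set the result takes `x`, otherwise `y`. Because every operator involved is bitwise, checking the eight single-bit combinations proves the identity for any width. The helper names in this sketch are illustrative.

```cpp
#include <cassert>
#include <cstdint>

// Original form: (~a | x) & (a | y)
uint32_t maskedMergeDemorgan(uint32_t a, uint32_t x, uint32_t y) {
  return (~a | x) & (a | y);
}

// Folded form: (a & (x ^ y)) ^ y
uint32_t maskedMergeFolded(uint32_t a, uint32_t x, uint32_t y) {
  return (a & (x ^ y)) ^ y;
}

// Every operator is bitwise, so checking all 8 single-bit combinations
// covers every bit position of any width.
bool identityHoldsForAllBits() {
  for (uint32_t a = 0; a < 2; ++a)
    for (uint32_t x = 0; x < 2; ++x)
      for (uint32_t y = 0; y < 2; ++y)
        if ((maskedMergeDemorgan(a, x, y) & 1u) !=
            (maskedMergeFolded(a, x, y) & 1u))
          return false;
  return true;
}
```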
This saves 2 instructions in the ARM soft float case for fcmp ueq.
This code is written in a confusingly overly general way. The point
of getCmpLibcallCC is to express that the compiler-rt implementations
of the FP compares are different aliases around functions which may
return -1 in some cases. This does not apply to the call for unordered,
which returns a normal boolean.
Also stop overriding the default value for the unordered compare for ARM.
This was setting it to the same value as the default, which is now assumed.
We have recently added the partial_reduce_smla and partial_reduce_umla
nodes to represent Acc += ext(a) * ext(b) where the two extends have to
have the same source type, and have the same extend kind.
For riscv64 w/zvqdotq, we have the vqdot and vqdotu instructions which
correspond to the existing nodes, but we also have vqdotsu which
represents the case where the two extends are sign and zero respectively
(i.e. not the same type of extend).
This patch adds a partial_reduce_sumla node which has sign extension for
A, and zero extension for B. The addition is somewhat mechanical.
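The mixed-extension semantics can be sketched for a single 32-bit accumulator lane fed by four i8 element pairs. This is a scalar model with a hypothetical name; how a real PARTIAL_REDUCE node distributes input elements across accumulator lanes is not modelled here.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// One accumulator lane of a partial_reduce_sumla-style reduction:
// Acc += sext(a[i]) * zext(b[i]) over four i8 element pairs.
int32_t partialReduceSumla(int32_t acc, const std::array<int8_t, 4> &a,
                           const std::array<uint8_t, 4> &b) {
  for (std::size_t i = 0; i < a.size(); ++i)
    // int8_t -> int32_t sign-extends; uint8_t -> int32_t zero-extends.
    acc += static_cast<int32_t>(a[i]) * static_cast<int32_t>(b[i]);
  return acc;
}
```

Treating `b` as signed instead would read the byte `255` as `-1`, which is why this case needs its own node rather than reusing smla or umla.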
Allow freeze to sink through fmul by treating it as a
non-poison-generating op when operands are not poison.
Adding `ISD::FMUL` to `AllowMultipleMaybePoisonOperands` lets DAG
combine push freeze through fmul. This helps expose patterns like `fmul+fadd`
for `FMA` fusion.
When rebuilding the node, we drop flags like nnan/ninf/nsz that imply
poison, but keep contract, reassoc, afn, and arcp.
Closes: https://github.com/llvm/llvm-project/issues/141622
If opaque constants are involved we can have an AND with 2 constant
operands that hasn't been simplified. If this is the case, we need
to modify at least one of the constants if it is out of range.
Fixes #142004
On its own, this change should be non-functional. This is a preparatory
change for https://github.com/llvm/llvm-project/pull/141267 which adds a
new form of PARTIAL_REDUCE_*MLA. As noted in the discussion on that
review, AArch64 needs a different set of legal and custom types for the
PARTIAL_REDUCE_SUMLA variant than the currently existing
PARTIAL_REDUCE_UMLA/SMLA.