This teaches dagcombiner to fold:
`(asr (add nsw x, y), 1) -> (avgfloors x, y)`
`(lsr (add nuw x, y), 1) -> (avgflooru x, y)`
as well the combine them to a ceil variant:
`(avgfloors (add nsw x, y), 1) -> (avgceils x, y)`
`(avgflooru (add nuw x, y), 1) -> (avgceilu x, y)`
iff valid for the target.
Removes some of the ARM MVE patterns that are now dead code.
It adds the avg opcodes to `IsQRMVEInstruction` as to preserve the
immediate splatting as before.
On ANDNOT capable targets we can always do this profitably, without ANDNOT we only attempt this if we don't introduce an additional NOT
Followup to #112547
Alter both isConstantIntBuildVectorOrConstantInt + isConstantFPBuildVectorOrConstantFP to return a bool instead of the underlying SDNode, and adjust usage to account for this.
Update isConstantIntBuildVectorOrConstantInt to peek though bitcasts when attempting to find a constant, in particular this improves canonicalization of constants to the RHS on commutable instructions.
X86 is the beneficiary here as it often bitcasts rematerializable 0/-1 vector constants as vXi32 and bitcasts to the requested type
Minor cleanup that helps with #107423
Reapplied after regression fix ba1255def64a9c3c68d97ace051eec76f546eeb0
Noticed while triaging the regression from #112710 noticed by @mstorsjo - don't rely on isConstantIntBuildVectorOrConstantInt+getNode to guarantee constant folding (if it fails to constant fold it will infinite loop), use FoldConstantArithmetic instead.
Alter both isConstantIntBuildVectorOrConstantInt + isConstantFPBuildVectorOrConstantFP to return a bool instead of the underlying SDNode, and adjust usage to account for this.
Update isConstantIntBuildVectorOrConstantInt to peek though bitcasts when attempting to find a constant, in particular this improves canonicalization of constants to the RHS on commutable instructions.
X86 is the beneficiary here as it often bitcasts rematerializable 0/-1 vector constants as vXi32 and bitcasts to the requested type
Minor cleanup that helps with #107423
Don't rely on isConstantFPBuildVectorOrConstantFP followed by getNode() will constant fold - FoldConstantArithmetic will do all of this for us.
Cleanup for #112682
Don't rely on isConstantFPBuildVectorOrConstantFP followed by getNode() will constant fold - FoldConstantArithmetic will do all of this for us.
Cleanup for #112682
Don't rely on isConstantFPBuildVectorOrConstantFP followed by getNode() will constant fold - FoldConstantArithmetic will do all of this for us.
Cleanup for #112682
Don't rely on isConstantIntBuildVectorOrConstantInt followed by getNode() will constant fold - FoldConstantArithmetic will do all of this for us.
Cleanup for #112682
Fix a check for extending loads in DAGCombiner,
where if the result type has more bits than the
loaded type it should count as an extending load.
All backends apart from AArch64 ignore this
ExtTy argument to shouldReduceLoadWidth, so this
change currently only impacts AArch64.
This patch adds icmp+select patterns for integer min/max matchers in
SDPatternMatch, similar to those in IR PatternMatch.
Reapply #111774.
Closes#108218.
This DAG combine replaces a floating-point load/store pair which has no
other uses with an integer one, but did not copy the memory operand
flags to the new instructions, resulting in it dropping the volatile
flag. This optimisation is still valid if one or both of the
instructions is volatile, so we can copy over the whole
MachineMemOperand to generate volatile integer loads and stores where
needed.
This PR is related to #99591. In this PR, instead of modifying how the
legalisation occurs depending on surrounding instructions, we refine
after legalisation.
This PR has two parts:
* `SDPatternMatch/MatchContext`: Modify a little bit the code to match
Operands (used by `m_Node(...)`) and Unary/Binary/Ternary Patterns to
make it compatible with `VPMatchContext`, instead of only `m_Opc`
supported. Some tests were added to ensure no regressions.
* `DAGCombiner`: Add a `foldSubCtlzNot` which detect and rewrite the
patterns using matching context.
Remaining Tasks:
- [ ] GlobalISel
- [ ] Currently the pattern matching will occur even before
legalisation. Should I restrict it to specific stages instead ?
- [ ] Style: Add a visitVP_SUB ?? Move `foldSubCtlzNot` in another
location for style consistency purpose ?
@topperc
---------
Co-authored-by: v01dxyz <v01dxyz@v01d.xyz>
Currently limited to cases which have legal/custom ABDS/ABDU handling - I'll extend this for all targets in future (similar to how we support neg(abs(x))) once I've addressed some outstanding regressions on aarch64/riscv.
Helps avoid a lot of extra cmov instructions on x86 in particular, and allows us to more easily improve the codegen in future commits.
Cache negative search result from getStoreMergeCandidates() so that
mergeConsecutiveStores() does not iterate quadratically over a
potentially long sequence of unmergeable stores.
In the (zext (shl (zext x), cst)) -> (shl (zext x), cst) fold, don't use a bitmask / MaskedValueIsZero as we can't guarantee that the shift amount is in bounds.
Fixes#106202
In visitFREEZE we have been collecting a set/vector of
MaybePoisonOperands that later was iterated over, applying a freeze to
those operands. However, C-level fuzzy testing has discovered that the
recursiveness of ReplaceAllUsesOfValueWith may cause later operands in
the MaybePoisonOperands vector to be replaced when replacing an earlier
operand. That would then turn up as
Assertion `N1.getOpcode() != ISD::DELETED_NODE &&
"Operand is DELETED_NODE!"' failed.
failures when trying to freeze those later operands.
So we need to make sure that the vector with MaybePoisonOperands is
mutated as well when needed. Or as the solution used in this patch, make
sure to keep track of operand numbers that should be frozen instead of
having a vector of SDValues. And then we can refetch the operands while
iterating over operand numbers.
The problem was seen after adding SELECT_CC to the set of operations
including in "AllowMultipleMaybePoisonOperands". I'm not sure, but I
guess that this could happen for other operations as well for which we
allow multiple maybe poison operands.
Type legalization lacks generic support for these operations. They are
normally only created when the type is legal. This scalarization case is
new.
We could update type legalization, but there some corner cases that make
it not straightforward. For example, if the promoted type isn't 2x the
narrow type we need to over promote.
Fixes#104480