If we are lowering a frem and the divisor is known to be an integer power-2, we
can use the formula 'frem = x - trunc(x / d) * d'. This avoids the more
expensive call to fmod. The results are identical as fmod so long as d is a
power-2 (so the mul does not round incorrectly), and the sign of the return is
either always positive or not important for zeroes (nsz).
Unfortunately Alive2 does not handle this well at the moment. I was using
exhaustive checking to test this:
(https://gist.github.com/davemgreen/6078015f30d3bacd1e9572f8db5d4b64).
I found this in cpythons implementation of float_pow. I currently added it as a
DAG combine for frem with power-2 fp constants.
reassociationCanBreakAddressingModePattern tries to prevent bad add
reassociations that would break adrressing mode patterns. This adds
support for vscale offset addressing modes, making sure we don't break
patterns that already exist. It does not optimize _to_ the correct
addressing modes yet, but prevents us from optimizating _away_ from
them.
This is useful when the inner add has multiple uses, and so cannot be
canonicalized by pushing the constants down through the mul. This patch
adds patterns for both `add(mul(add(A, CA), CM), CB)` and with an extra add
`add(add(mul(add(A, CA), CM), B) CB)` as the second can come up when
lowering geps.
Previously we recursively looked through extends and truncates on both
SourceValue and WideVal.
SourceValue is the largest source found for each of the stores we are
combining. WideVal is the source for the current store.
Previously we could incorrectly look through a (zext (trunc X)) pair and
incorrectly believe X to be a good source.
I think we could also look through a zext on one store and a sext on
another store and arbitrarily pick one of the extends as the final
source.
With this patch we only look through one level of extend or truncate.
And we don't look through extends/truncs on both SourceValue and WideVal
at the same time.
This may lose some optimization cases, but keeps everything we had tests
for.
Fixes#90936.
If the shuffle mask contains no undef elements, then we can move the freeze through a shuffle node.
This requires special case handling to create a new ShuffleVectorSDNode.
Includes VECTOR_SHUFFLE support for isGuaranteedNotToBeUndefOrPoison / canCreateUndefOrPoison.
When looking through a right shift, we need to make sure that all of
the bits we are using from the shift come from the shift input and
not the sign or zero bits that are shifted in.
Fixes#90936.
In #70452 DAGCombiner::mayAlias was taught to handle scalable sizes, but
when it checks via AA->isNoAlias it didn't take into account the case
where the size is scalable but there was an offset too.
For the fixed length case the offset was just accounted for by adding to
the LocationSize, but for the scalable case there doesn't seem to be a
way to represent both a scalable and fixed part in it. So this patch
works around it by bailing if there is an offset.
Fixes#90559
This reverts commit 16bd10a38730fed27a3bf111076b8ef7a7e7b3ee.
Re-applies:
b3c55b707110084a9f50a16aade34c3be6fa18da - "[SelectionDAG] Handle more opcodes in canCreateUndefOrPoison (#84921)"
8e2f6495c0bac1dd6ee32b6a0d24152c9c343624 - "[DAGCombiner] Do not always fold FREEZE over BUILD_VECTOR (#85932)"
73472c5996716cda0dbb3ddb788304e0e7e6a323 - "[SelectionDAG] Treat CopyFromReg as freezing the value (#85932)"
with a fix in DAGCombiner::visitFREEZE.
This reverts:
b3c55b707110084a9f50a16aade34c3be6fa18da - "[SelectionDAG] Handle more opcodes in canCreateUndefOrPoison (#84921)"
(because it updates a test case that I don't know how to resolve the conflict for)
8e2f6495c0bac1dd6ee32b6a0d24152c9c343624 - "[DAGCombiner] Do not always fold FREEZE over BUILD_VECTOR (#85932)"
73472c5996716cda0dbb3ddb788304e0e7e6a323 - "[SelectionDAG] Treat CopyFromReg as freezing the value (#85932)"
Due to a test suite failure on AArch64 when compiling for SVE.
https://lab.llvm.org/buildbot/#/builders/197/builds/13955
clang: ../llvm/llvm/include/llvm/CodeGen/ValueTypes.h:307: MVT llvm::EVT::getSimpleVT() const: Assertion `isSimple() && "Expected a SimpleValueType!"' failed.
[SelectionDAG] Handle more opcodes in canCreateUndefOrPoison
Handle SELECT_CC similarly as SETCC.
Handle these operations that only propagate poison/undef based on the
input operands:
SADDSAT, UADDSAT, SSUBSAT, USUBSAT, MULHU, MULHS,
SMIN, SMAX, UMIN, UMAX
These operations may create poison based on shift amount and exact
flag being violated:
SRL, SRA
One goal here is to allow pushing freeze through these operations
when allowed, as well as letting analyses such as
isGuaranteedNotToBeUndefOrPoison to not break on such operations.
Since some problems have been observed with pushing freeze through
SRA/SRL we block that explicitly in DAGCombiner::visitFreeze now.
That way we can still model SRA/SRL properly in
SelectionDAG::canCreateUndefOrPoison, e.g. when used by
isGuaranteedNotToBeUndefOrPoison, even if we do not want to push
freeze through those instructions.
It's really unfortunate that STORE and ATOMIC_STORE are separate
opcodes. This duplicates a basic simplify demanded for the truncating
case. This avoids some AMDGPU lit regressions in a future patch.
I'm not sure how to craft a test that exposes this without first
introducing the regressions by promoting half to i16.
(Sorry, not an actual build_pair node just a similar pattern).
For cases where we're concatenating 2 integers into a double width integer, see if both integer sources are NOT patterns.
We could take this further and handle all logic ops with a constant operands, but I just wanted to handle the case reported on #89533 initially.
Fixes#89533
Avoid turning a BUILD_VECTOR that can be recognized as "all zeros",
"all ones" or "constant" into something that depends on
freeze(undef), as that would destroy those properties.
Instead we replace undef by 0/-1 in such vectors, making it possible
to fold away the freeze. We typically use -1 if the BUILD_VECTOR
would identify as "all ones", and otherwise we use the value 0.
Ensure that the sum of the shift amounts does not overflow the
shift amount type when combining shifts in combineShiftOfShiftedLogic.
Solves a miscompile bug found when testing the C23 BitInt feature.
Targets like X86 that only use an i8 for shift amounts after
legalization seems to be extra susceptible for bugs like this as it
isn't legal to shift more than 255 steps.
This requires type legalization to keep them the same. This means we no
longer need to legalize the operand since it will be legalized when we
legalize the second result.
In the attach test case we were creating a UADDO_CARRY with i1 carry out
and i41 carry in. i41 exceeds is larger than the setcc result type for
AArch64 which is i32. i41 needs to be promoted to i64 since it is larger
than i32. The type legalizer tried to use promoteTargetBoolean, but that
can only promote from a type smaller than setcc result type.
The easiest fix here is to force the carryin type to match the carryout
type at the type of creation. This should ensure the node won't exceeed
setcc result type as long as the output type doesn't.
I think we should explore requiring the types to match for this node.
Fixes#88966
If the extract_subvector is cheap, attempt to extract directly from an inserted subvector
Reapplied with a check to ensure we only attempt this for fixed vectors
As we canonicalizes smin with non-negative operands into umin in the
middle-end, the saturation pattern will be broken.
This patch reverts the transform in DAGCombine to fix the regression on
ARM.
Fixes https://github.com/llvm/llvm-project/issues/85706.
Only allow to change build_vector to concat_vector when the splat type
and vector element type is same. It's to fix assertion of failing to
bitcast types of different sizes.
SVE has some non-temporal masked loads and stores. The metadata coming
from the nodes is not copied to the MMO at the moment though, meaning it
will generate a normal instruction. This patch ensures that the right
flags are set if the instruction has non-temporal metadata.
Reverse the fold with handling inside canCreateUndefOrPoison for cases where we know that the extract index is in bounds.
This exposed a number or regressions, and required some initial freeze handling of SCALAR_TO_VECTOR, which will require us to properly improve demandedelts support to handle its undef upper elements.
There is still one outstanding regression to be addressed in the future - how do we want to handle folds involving frozen loads?
Fixes#86968
Hi all,
This patch is a follow-up of #79101. It migrates logic from
`visitVSELECT` to `visitVP_SELECT` to simplify `vp.select`. With this
patch we can do the following combinations:
```
vp.select undef, T, F --> T (if T is a constant), F otherwise
vp.select <condition>, undef, F --> F
vp.select <condition>, T, undef --> T
vp.select false, T, F --> F
vp.select <condition>, T, T --> T
```
I'm a total newbie to llvm and I'm sure there's room for improvements in
this patch. Please let me know if you have any advice. Thank you in
advance!
The folding of sign/zero extensions into an atomic load by specifying an
extension type is not target specific, and therefore belongs in the
DAGCombiner rather than in the SystemZ backend.
- Handle atomic loads similarly to regular loads by adding
AtomicLoadExtActions with set/get methods.
- Move SystemZ extendAtomicLoad() to DagCombiner.cpp.
We check DAG.haveNoCommonBitsSet so the operands will be known to be
disjoint.
I couldn't think of a codegen test case since most targets aren't
checking hasDisjoint yet, apart from RISCV in the or_is_add pattern, but
it also falls back to computeKnownBits.
Remove getSizeOrUnknown call when MachineMemOperand is created. For Scalable
TypeSize, the MemoryType created becomes a scalable_vector.
2 MMOs that have scalable memory access can then use the updated BasicAA that
understands scalable LocationSize.
Original Patch by Harvin Iriawan
Co-authored-by: David Green <david.green@arm.com>
Fixes#84831
When matching carry pattern with `getAsCarry`, it may produce different
type of carryout. This patch checks such case and does early exit.
I'm new to DAG, any suggestion is appreciated.