SVE has some non-temporal masked loads and stores. The metadata coming
from the nodes is not copied to the MMO at the moment though, meaning it
will generate a normal instruction. This patch ensures that the right
flags are set if the instruction has non-temporal metadata.
Reverse the fold with handling inside canCreateUndefOrPoison for cases where we know that the extract index is in bounds.
This exposed a number or regressions, and required some initial freeze handling of SCALAR_TO_VECTOR, which will require us to properly improve demandedelts support to handle its undef upper elements.
There is still one outstanding regression to be addressed in the future - how do we want to handle folds involving frozen loads?
Fixes#86968
Hi all,
This patch is a follow-up of #79101. It migrates logic from
`visitVSELECT` to `visitVP_SELECT` to simplify `vp.select`. With this
patch we can do the following combinations:
```
vp.select undef, T, F --> T (if T is a constant), F otherwise
vp.select <condition>, undef, F --> F
vp.select <condition>, T, undef --> T
vp.select false, T, F --> F
vp.select <condition>, T, T --> T
```
I'm a total newbie to llvm and I'm sure there's room for improvements in
this patch. Please let me know if you have any advice. Thank you in
advance!
The folding of sign/zero extensions into an atomic load by specifying an
extension type is not target specific, and therefore belongs in the
DAGCombiner rather than in the SystemZ backend.
- Handle atomic loads similarly to regular loads by adding
AtomicLoadExtActions with set/get methods.
- Move SystemZ extendAtomicLoad() to DagCombiner.cpp.
We check DAG.haveNoCommonBitsSet so the operands will be known to be
disjoint.
I couldn't think of a codegen test case since most targets aren't
checking hasDisjoint yet, apart from RISCV in the or_is_add pattern, but
it also falls back to computeKnownBits.
Remove getSizeOrUnknown call when MachineMemOperand is created. For Scalable
TypeSize, the MemoryType created becomes a scalable_vector.
2 MMOs that have scalable memory access can then use the updated BasicAA that
understands scalable LocationSize.
Original Patch by Harvin Iriawan
Co-authored-by: David Green <david.green@arm.com>
Fixes#84831
When matching carry pattern with `getAsCarry`, it may produce different
type of carryout. This patch checks such case and does early exit.
I'm new to DAG, any suggestion is appreciated.
Add ones for every high bit that will cleared.
This will allow us to evaluate variables that have their bits known to
see if they have no risk of overflow despite the shift amount being
greater than the difference between the two types.
This is part of #70452 that changes the type used for the external
interface of MMO to LocationSize as opposed to uint64_t. This means the
constructors take LocationSize, and convert ~UINT64_C(0) to
LocationSize::beforeOrAfter(). The getSize methods return a
LocationSize.
This allows us to be more precise with unknown sizes, not accidentally
treating them as unsigned values, and in the future should allow us to
add proper scalable vector support but none of that is included in this
patch. It should mostly be an NFC.
Global ISel is still expected to use the underlying LLT as it needs, and
are not expected to see unknown sizes for generic operations. Most of
the changes are hopefully fairly mechanical, adding a lot of getValue()
calls and protecting them with hasValue() where needed.
Resolves#84745.
Based on SDPatternMatch introduced by #78654, this patch replaces some
of basic patterns in `visitADDLike` with corresponding patterns in
SDPatternMatch.
This patch only replaces original folds, instead of introducing new ones.
An EXTRACT_VECTOR_ELT can extend the element to the width of its result
type, leaving the high bits undefined. Previously if we attempted to
query the bytes in these high bits we would recurse and hit an
assertion. This fixes it by bailing if the index is outside of the
vector element size.
I think the assertion Index < ByteWidth may still be incorrect, since
ByteWidth is calculated from Op.getValueSizeInBits(). I believe this
should be Op.getScalarValueSizeInBits() whenever VectorIndex is set
since we're querying the element now, not the vector. But I couldn't
think of a test case to trigger it. It can be addressed in a follow-up
patch.
Fixes#83920
This is another smaller step of #70452, changing the signature of
computeAliasing() from optional<uint64_t> to LocationSize, and follow-up
changes in DAGCombiner::mayAlias(). There are some test change due to
the previous AA->isNoAlias call incorrectly using an unknown size
(~UINT64_T(0)). This should then be improved again in #70452 when the
types are known to be scalable.
This patch also pick the MatchContext framework from DAGCombiner to an
indiviual header file to make the framework be used from other files in
llvm/lib/CodeGen/SelectionDAG/.
If we have a sext and a zext nneg with the same types and operand
we should combine them into the sext. We can't go the other way
because the nneg flag may only be valid in the context of the uses
of the zext nneg.
This treats the zext nneg as sext if X is known to have sufficient sign
bits to allow the zext or truncate or both to removed. This code is
taken from the same optimization for sext.
We already have the PtrOff factored into MachinePointerInfo. Any calls
to getAlign on the new load with do commonAlignment with the
MachinePointerInfo offset and the base alignment.
The getAlign function for a load returns the commonAlignment of the
"base align" and the offset stored in the MachinePointerInfo.
We're splitting a load here, so we should take the base alignment from
the original load without any offset that may already exist in the
original load. The new load can then maintain its own alignment using
just the base alignment and its own offset.
Noticed by inspection.
The load combine replaces a number of original loads with one new loads
and also replaces the output chains of the original loads with the
output chain of the new load. This is incorrect if the original load is
retained (due to multi-use), as it may get incorrectly reordered.
Fix this by using makeEquivalentMemoryOrdering() instead, which will
create a TokenFactor with both chains.
Fixes https://github.com/llvm/llvm-project/issues/80911.