These vectors are resized in the constructor and never change size.
We can manually allocate two arrays instead.
This reduces the size of TreePatternNode by removing the
unneeded capacity end pointer fields from the std::vector.
We reserved 16 AddrSpaces in every TypeSetByHwMode. But we only ever
use the first one on targets that make use of the AddrSpace feature.
The vector was populated by pushing for each entry in the ArrayRef
passed to the TypeSetByHwMode constructor. Each entry is a
ValueTypeByHwMode that stores one VT for each HwMode.
The vector is accessed by a loop in TypeSetByHwMode::getValueTypeByHwMode.
That loop is over HwModes with in the TypeSetByHwMode. This is
unrelated to how the vector was created. The entries in the vector
don't represent HwModes.
The targets that use AddrSpace don't make use of HwModes so the
loop in getValueTypeByHwMode will only run 1 iteration. So we only
the first entry in the vector is meaningful used.
This patch simplifies things by storing only 1 AddrSpace in
TypeSetByMode. Reducing the memory used by TypeSetByHwMode.
More work will be needed to support HwModes with AddrSpace if we
need a different AddrSpace for each HwMode.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D148194
An intrusive reference counter uses less memory than the control
block of std::shared_ptr.
This should allow some additional code simplifications if we
don't need to pass around shared_ptr in order to create new
shared_ptrs.
We don't need to copy the ChildVariants vector into a new vector.
We're using std::move on every entry we copy so they clearly
aren't needed again. We can swap entries in the vector and reuse
it instead.
Previously we were passing 'this' and the std::shared_ptr version
of 'this'. This replaces all uses of 'this' with the shared_ptr and
makes the method static.
We add entries to the map in two places. One already used
std::piecewise_construct with map::emplace. The other was using
map::insert. Change both to map::try_emplace.
InfoByHwMode is the base class of TypeSetByHwMode. The two methods
did the same thing. No reason to have two ways to do it.
Also use the getSimple() access instead of Map.begin()->second.
Comparing types is quite expensive when hardware modes are being
used. Checking the operator first can let us detect mismatches
earlier without checking types.
Use the predicate condition instead of checkFeatures in *GenDAGISel.inc.
This makes the code similar to isel pattern predicates.
checkFeatures is still used by code created by SubtargetEmitter so
we can't remove the string. Backends need to be careful to keep
the string and predicates in sync, but I don't think that's a big issue.
I haven't measured it, but this should be a compile time improvement
for isel since we don't have to do any of the string processing that's
inside checkFeatures.
Reviewed By: kparzysz
Differential Revision: https://reviews.llvm.org/D146012
RFC to add a way to ignore COPY instructions when pattern-matching MIR in GISel.
- Add a new "GISelFlags" class to TableGen. Both `Pattern` and `PatFrags` defs can use it to alter matching behaviour.
- Flags start at zero and are scoped: the setter returns a `SaveAndRestore` object so that when the current scope ends, the flags are restored to their previous values. This allows child patterns to modify the flags without affecting the parent pattern.
- Child patterns always reuse the parent's pattern, but they can override its values. For more examples, see `GlobalISelEmitterFlags.td` tests.
- [AMDGPU] Use the IgnoreCopies flag in BFI patterns, which are known to be bothered by cross-regbank copies.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D136234
Simplifies the implementation of `TypeSize` while retaining its interface.
There is no need for abstract concepts like `LinearPolyBase`, `UnivariateLinearPolyBase` or `LinearPolySize`.
Differential Revision: https://reviews.llvm.org/D140263
Simplifies the implementation of `TypeSize` while retaining its interface.
There is no need for abstract concepts like `LinearPolyBase`, `UnivariateLinearPolyBase` or `LinearPolySize`.
Differential Revision: https://reviews.llvm.org/D140263
The TableGen implementation was using a homegrown implementation of
FunctionModRefInfo. This switches it to use MemoryEffects instead.
This makes the code simpler, and will allow exposing the full
representational power of MemoryEffects in the future. Among other
things, this will allow us to map IntrHasSideEffects to an
inaccessiblemem readwrite, rather than just ignoring it entirely
in most cases.
To avoid layering issues, this moves the ModRef.h header from IR
to Support, so that it can be included in the TableGen layer.
Differential Revision: https://reviews.llvm.org/D137641
SelectionDAG has a target hook, getExtendForAtomicOps, which it uses
in the computeKnownBits implementation for ATOMIC_LOAD. This is pretty
ugly (as is having a separate load opcode for atomics), so instead
allow making use of atomic zextload. Enable this for AArch64 since the
DAG path defaults in to the zext behavior.
The tablegen changes are pretty ugly, but partially helps migrate
SelectionDAG from using ISD::ATOMIC_LOAD to regular ISD::LOAD with
atomic memory operands. For now the DAG emitter will emit matchers for
patterns which the DAG will not produce.
I'm still a bit confused by the intent of the isLoad/isStore/isAtomic
bits. The DAG implementation rejects trying to use any of these in
combination. For now I've opted to make the isLoad checks also check
isAtomic, although I think having isLoad and isAtomic set on these
makes most sense.
This change introduces the HasNoUse builtin predicate in PatFrags that
checks for the absence of use of the first result operand.
GlobalISelEmitter will allow source PatFrags with this predicate to be
matched with destination instructions with empty outs. This predicate is
required for selecting the no-return variant of atomic instructions in
AMDGPU.
Differential Revision: https://reviews.llvm.org/D125212
The previous code had a bug when dealing with matching iPTR against a
set of integer types. It was trying to handle it all in a compact way,
but that implementation couldn't be modified to correct the problem in
a simple way. The code wasn't long, and it was easier to rewrite it.
The actual issue was that non-scalar-integer types were considered when
matching against iPTR. For example {iPTR} intersected with {i32 f32}
was {iPTR} (due to multiple types in the other set), but should be just
{i32}, because i32 is the only integer scalar in the other set.
This commits removes TableGens reliance on managed static global record state
by moving the RecordContext into the RecordKeeper. The RecordKeeper is now
treated similarly to a (LLVM|MLIR|etc)Context object and is passed to static
construction functions. This is an important step forward in removing TableGens
reliance on global state, and in a followup will allow for users that parse tablegen
to parse multiple tablegen files without worrying about Record lifetime.
Differential Revision: https://reviews.llvm.org/D125276
Previously, all children would be checked to see if any were an
explicit Register. If anywhere no commutable patterns would be
generated. This patch loosens the restriction to only check the
children that are being commuted.
Digging back through history, this code predates the existence of
commutable intrinsics and commutable SDNodes with more than 2
operands. At that time the loop would count the number of children that
weren't registers and if that was equal to 2 it would allow commuting.
I don't think this loop was re-considered when commutable
intrinsics were added or when we allowed SDNodes with more than 2
operands.
This important for RISCV were our isel patterns have a V0 mask
operand after the commutable operands on some RISCVISD opcodes.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D117955
Since 65b13610a5226b84889b923bae884ba395ad084d, raw_string_ostream has
been unbuffered by default. Based on an audit of llvm/utils/, this
commit removes every call to `raw_string_ostream::flush()` and any call
to `raw_string_ostream::str()` whose result is ignored or that doesn't
help with clarity.
I left behind a few calls to `str()`. In these cases, the underlying
std::string was declared pretty far away and never used again, whereas
stream recently had its last write. The code is easier to read as-is;
the no-op call to `flush()` inside `str()` isn't harmful, and when
https://reviews.llvm.org/D115421 lands it'll be gone anyway.
When used as a non-leaf node, TableGen does not currently use the type
of a ComplexPattern for type inference, which also means it does not
check it doesn't conflict with the use. This differs from when used as a
leaf value, where the type is used for inference. This addresses that
discrepancy. The test case is not representative of most real-world uses
but is sufficient to demonstrate inference is working.
Some of these uses also make use of ValueTypeByHwMode rather than
SimpleValueType and so the existing type inference is extended to
support that alongside the new type inference.
There are also currently various cases of using ComplexPatterns with an
untyped type, but only for non-leaf nodes. For compatibility this is
permitted, and uses the old behaviour of not inferring for non-leaf
nodes, but the existing logic is still used for leaf values. This
remaining discrepancy should eventually be eliminated, either by
removing all such uses of untyped so the special case goes away (I
imagine Any, or a more specific type in certain cases, would be
perfectly sufficient), or by copying it to the leaf value case so
they're consistent with one another if this is something that does need
to keep being supported.
All non-experimental targets have been verified to produce bit-for-bit
identical TableGen output with this change applied.
Reviewed By: kparzysz
Differential Revision: https://reviews.llvm.org/D109035
The operand of the second any_of in EnforceSmallerThan should be
B not S like the FP code in the if below.
Unfortunately, fixing that causes an infinite loop in the build
of RISCV. So I've added a workaround for that as well.
Fixes PR44768.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D111502
This gives a nice message about the location of errors in a large
tablegen file, which is much more useful for users
Differential Revision: https://reviews.llvm.org/D102740
If a cmpxchg specifies acquire or seq_cst on failure, make sure we
generate code consistent with that ordering even if the success ordering
is not acquire/seq_cst.
At one point, it was ambiguous whether this sort of construct was valid,
but the C++ standad and LLVM now accept arbitrary combinations of
success/failure orderings.
This doesn't address the corresponding issue in AtomicExpand. (This was
reported as https://bugs.llvm.org/show_bug.cgi?id=33332 .)
Fixes https://bugs.llvm.org/show_bug.cgi?id=50512.
Differential Revision: https://reviews.llvm.org/D103284
getVectorNumElements() returns a value for scalable vectors
without any warning so it is effectively getVectorMinNumElements().
By renaming it and making getVectorNumElements() forward to
it, we can insert a check for scalable vectors into getVectorNumElements()
similar to EVT. I didn't do that in this patch because there are still more
fixes needed, but I was able to temporarily do it and passed the RISCV
lit tests with these changes.
The changes to isPow2VectorType and getPow2VectorType are copied from EVT.
The change to TypeInfer::EnforceSameNumElts reduces the size of AArch64's isel table.
We're now considering SameNumElts to require the scalable property to match which
removes some unneeded type checks.
This was motivated by the bug I fixed yesterday in 80b9510806cf11c57f2dd87191d3989fc45defa8
Reviewed By: frasercrmck, sdesmalen
Differential Revision: https://reviews.llvm.org/D102262
After D100691, predicates should be cheap to compare again so
we don't need to filter anymore.
This is mostly just a revert of several patches going back to 2018.
Reviewed By: kparzysz
Differential Revision: https://reviews.llvm.org/D100695