Currently isGuaranteedNotToBeUndef() is the same as
isGuaranteedNotToBeUndefOrPoison(). This function is used in places
where we only care about undef (due to multi-use issues), not poison.
Make it more precise by only considering instructions that can create
undef (like loads or call), and ignore those that can only create
poison. In particular, we can ignore poison-generating flags.
This means that inferring more flags has less chance to pessimize other
transforms.
Shufflevector semantics have changed so that poison mask elements
return poison rather than undef. Reflect this in the
canCreateUndefOrPoison() implementation.
[TLI] Pass replace-with-veclib works with Scalable Vectors.
The pass is heavily refactored.
It uses the Masked variant of a TLI method when the Intrinsic operates on Scalable Vectors.
Improve tests for ArmPL and SLEEF Intrinsics:
- Auto-generate test `armpl-intrinsics.ll`, and use active lane mask to have shorter `shufflevector` check lines.
- Update scripts now add `@llvm.compiler.used` instead of using the regex: `@[[LLVM_COMPILER_USED:[a-zA-Z0-9_$"\\.-]+]]`
- Add simplifycfg pass and noalias to ensure tail folding. `noalias` attribute was added only to the `%in.ptr` parameter of the ArmPL Intrinsics.
When the sub arguments are ptr2int it is not possible to determine
computeKnownBits() of its arguments.
For scalar case generally sub of 2 ptr2int are converted to sub of
indexes.
However a loop with recursive GEP/PHI where the arguments to sub is of
type ptr2int, if it is possible to determine that a sub of this GEP and
another pointer with the same base is KnownNonZero we can return this.
This helps subsequent passes to optimize the loop further.
`createFunctionType` returns a FunctionType that may contain a mask,
which is currently placed as the last parameter to the Function.
The placement happens according to `VFParameters` of `VFInfo`, and it
should be able to handle VFABI specification changes.
Regarding the return type, it uses the scalar type of the input instruction,
as the specification does not encode in the mangled name such information.
If that ever happens, that information should be available from `VFInfo`.
The specialisation will not be valid when ConstantInt gains native
support for vector types.
This is largely a mechanical change but with extra attention paid to constant
folding, InstCombineVectorOps.cpp, LoopFlatten.cpp and Verifier.cpp to
remove the need to call `getIntegerType()`.
Co-authored-by: Nikita Popov <github@npopov.com>
With commit https://reviews.llvm.org/D152366 I introduced functionality
that permitted the hoisting of runtime memory checks from a vectorised
inner loop to the preheader of the next outer-most loop. This is useful
for benchmarks like SPEC2017's x264 where the inner loop is vectorised
and only has a small trip count. In such cases the runtime memory checks
become expensive and since the checks never fail in the case of x264 it
makes sense to do this. However, this behaviour was controlled by the
flag -hoist-runtime-checks which was off by default.
This patch enables this flag by default for all targets, since I believe
this is a generally beneficial thing to do. I have tested this with
SPEC2017 and I see 2.3% and 2.6% improvements with x264 on neoverse-v1
and neoverse-n1, respectively. Similarly, I saw slight improvements in
the overall geomean on both machines. The only other notable changes
were a 1% drop in the roms benchmark, which was compensated for by a 1%
improvement in fotonik3d.
Add the `dead_on_unwind` attribute, which states that the caller will
not read from this argument if the call unwinds. This allows eliding
stores that could otherwise be visible on the unwind path, for example:
```
declare void @may_unwind()
define void @src(ptr noalias dead_on_unwind %out) {
store i32 0, ptr %out
call void @may_unwind()
store i32 1, ptr %out
ret void
}
define void @tgt(ptr noalias dead_on_unwind %out) {
call void @may_unwind()
store i32 1, ptr %out
ret void
}
```
The optimization is not valid without `dead_on_unwind`, because the `i32
0` value might be read if `@may_unwind` unwinds.
This attribute is primarily intended to be used on sret arguments. In
fact, I previously wanted to change the semantics of sret to include
this "no read after unwind" property (see D116998), but based on the
feedback there it is better to keep these attributes orthogonal (sret is
an ABI attribute, dead_on_unwind is an optimization attribute). This is
a reboot of that change with a separate attribute.
Code generation can sometimes simplify expensive operations when
an operand is constant. An example of this is divides on AArch64
where they can be rewritten using a cheaper sequence of multiplies
and subtracts. Doing this is often better than hoisting expensive
constants which are likely to be hoisted by MachineLICM anyway.
The general convention inside LVI is that std::nullopt means that
a value has been pushed to the worklist. However, getEdgeValueLocal()
used it as an additional spelling for getOverdefined() instead.
Make the layering here similar to all the other methods:
LazyValueInfoImpl implements the underlying API returning a
ValueLatticeElement, and then LazyValueInfo exposes this as a
ConstantRange publicly.
Use the helper in the getConstantRange() and
getConstantRangeAtUse() APIs as well. For that purpose move the
handling of isUnknown() into the helper as well.
The code only works on integer casts, and the only bitcasts
involving integers are trivial. The code as previously written
would try to handle things like float to integer bitcasts by
fetching a ConstantRange of a float value, which is an ill-defined
operation.
The current implementation using a worklist and visited map adds
a significant amount of additional complexity and compile-time
overhead. All we really care about here is that we don't overflow
the stack or cause exponential complexity in degenerate cases. We
can achieve this with a simple depth limit.
When attempting to hoist runtime checks out of a loop we currently avoid
creating pointer diff checks and prefer to do expanded range checks
instead. This gives us the opportunity to hoist runtime checks out of a
loop, since these checks are loop invariant. However, in some cases the
pointer diff checks would also be loop invariant and so will naturally
get hoisted. Therefore, since diff checks are cheaper so we should
prefer to use those instead.
Loop guards tend to provide better results when it comes to reasoning
about ranges than isLoopEntryGuardedByCond(). See the test change for
the motivating case.
I have retained both the loop guard check and the implied cond based
check for now, though the latter only seems to impact a single test and
only via side effects (nowrap flag calculation) at that.
This patch replaces uses of StringRef::{starts,ends}with with
StringRef::{starts,ends}_with for consistency with
std::{string,string_view}::{starts,ends}_with in C++20.
I'm planning to deprecate and eventually remove
StringRef::{starts,ends}with.
The returned attribute can be used when it is possible to
"losslessly bitcast" between the argument and return type,
including between two vector types.
computeKnownBits() would crash in this case, isKnownNonZero()
would potentially produce a miscompile.
Fixes https://github.com/llvm/llvm-project/issues/74722.
Operator allows the phi operand to be a ConstantExpr. A ConstantExpr is
a valid operand to a phi, but is never going to be a recurrence.
We can only match a BinaryOperator so use that instead.
I'm not sure whether it's possible to cause a miscompile due to
the missing check right now, as the affected values mechanism
effectively protects us against this. This becomes a problem for
an upcoming patch though.
AST checks aliasing with MustAlias sets by only checking the
representative pointer (getSomePointer). This is only correct if the
Size and AATags information of that pointer also includes the
Size/AATags of all other pointers in the set.
When we add a new pointer to the AliasSet, we do perform this update
(see the code in AliasSet::addPointer). However, if a pointer already in
the MustAlias set is used with a new size, we currently do not update
the representative pointer, resulting in miscompilations. Fix this by
adding the missing update.
This is a targeted fix using the current representation. There are a
couple of alternatives:
* For MustAlias sets, don't store per-pointer Size/AATags at all. This
would make it clear that there is only one set of common Size/AATags for
all pointers.
* Check against all pointers in the set even for MustAlias. This is what
https://github.com/llvm/llvm-project/pull/65731 proposes to do as part
of a larger change to AST representation.
Fixes https://github.com/llvm/llvm-project/issues/64897.
Minor simplification applied to VFShape::getScalarShape,
VFShape::get, and VFABI::tryDemangleForVFABI methods.
Also, remove unnecessary `static_cast` in `SLPVectorizer.cpp`
This adds support for a HasTailCall flag on function call edges in the
ThinLTO summary. It is intended for use in aiding discovery of missing
frames from tail calls in profiled call stacks for MemProf of profiled
binaries that did not disable tail call elimination. A follow on change
will add the use of this new flag during MemProf context disambiguation.
The new flag is encoded in the bitcode along with either the hotness
flag from the profile, or the relative block frequency under the
-write-relbf-to-summary flag when there is no profile data.
Because we now will always have some additional call edge information, I
have removed the non-profile function summary record format, and we
simply encode the tail call flag along with a hotness type of none when
there is no profile information or relative block frequency. The change
of record format and name caused most of the test case changes.
I have added explicit testing of generation of the new tail call flag
into the bitcode and IR assembly format as part of the changes to
llvm/test/Bitcode/thinlto-function-summary-refgraph.ll. I have also
added round trip testing through assembly and bitcode to
llvm/test/Assembler/thinlto-summary.ll.
This adds support for using dominating conditions in computeKnownBits()
when called from InstCombine. The implementation uses a
DomConditionCache, which stores which branches may provide information
that is relevant for a given value.
DomConditionCache is similar to AssumptionCache, but does not try to do
any kind of automatic tracking. Relevant branches have to be explicitly
registered and invalidated values explicitly removed. The necessary
tracking is done inside InstCombine.
The reason why this doesn't just do exactly the same thing as
AssumptionCache is that a lot more transforms touch branches and branch
conditions than assumptions. AssumptionCache is an immutable analysis
and mostly gets away with this because only a handful of places have to
register additional assumptions (mostly as a result of cloning). This is
very much not the case for branches.
This change regresses compile-time by about ~0.2%. It also improves
stage2-O0-g builds by about ~0.2%, which indicates that this change results
in additional optimizations inside clang itself.
Fixes https://github.com/llvm/llvm-project/issues/74242.
It's not safe for InstCombine to add disjoint metadata when converting
Add to Or otherwise.
I've added noundef attribute to preserve existing test behavior.
Use the disjoint flag to convert or to add instead of calling the
haveNoCommonBitsSet() ValueTracking query. This ensures that we can
reliably undo add -> or canonicalization, even in cases where the
necessary information has been lost or is too complex to reinfer in
SCEV.
I have updated the bulk of the test coverage to add the necessary
disjoint flags in advance.
As shown in #70473, the following loop was not considered safe to
vectorize. When determining the memory access dependencies in
a loop which has negative iteration step, we invert the source and
sink of the dependence. Perhaps we should just invert the operands
to getMinusSCEV(). This way the dependency is not regarded to be
true, since the users of the `IsWrite` variables, which correspond to
each of the memory accesses, rely on program order and therefore
should not be swapped.
void vectorizable_Read_Write(int *A) {
for (unsigned i = 1022; i >= 0; i--)
A[i+1] = A[i] + 1;
}
Because AA does not support vectors of pointers, we have to
treat pointers that are inserted into a vector as captures. We
mostly already do so, but missed the case where getelementptr
is used to produce a vector.
We have a bunch of places where we have to guard against undef
to avoid multi-use issues, but would be fine with poison. Use a
different function for these to make it clear, and to indicate that
this check can be removed once we no longer support undef. I've
replaced some of the obvious cases, but there's probably more.
For now, the implementation is the same as UndefOrPoison, it just
has a more precise name.
This patch passes `SimplifyQuery` to `computeKnownBits` directly in
`InstSimplify` and `InstCombine`.
As the `DomConditionCache` in #73662 is only used in `InstCombine`, it
is inconvenient to introduce a new argument `DC` to `computeKnownBits`.
When using `BranchProbabilityPrinterPass`, if a BB has no name, we get pretty unusable information like `edge -> has probability...` (i.e. we have no idea what the vertices of that edge are).
This patch uses `printAsOperand`, which uses the same naming scheme as `Function::dump`, so for example during debugging sessions, the IR obtained from a function and the names used by `BranchProbabilityPrinterPass` will match.
A shortcoming is that `printAsOperand` will result in the numbering algorithm re-running for every edge and every vertex (when `BranchProbabilityPrinterPass` is run on a function). If, for the given scenario, this is a problem, we can revisit this subsequently.
Another nuance is that the entry basic block will be numbered, which may be slightly confusing when it's anonymous, but it's easily identifiable - the first edge would have it as source (and the number should be easily recognizable)
If both icmps have the same operands and the RHS is constant, we
would currently go into the isImpliedCondMatchingOperands() code
path, instead of the isImpliedCondCommonOperandWithConstants()
path. Both are correct, but the latter can produce more accurate
results if the implication is dependent on the sign.