Correctly handle the case of splitting an alloca which backs contiguous
distinct variables, where a slice's size equals the size of a backed variable.
We need to ensure that we don't generate fragment expressions with fragments
of the same size as the variable, as this is a verifier error.
Prior to this patch a fragment expression would be created in this situation.
For example, when splitting an i64 alloca backing two adjacent 32-bit
variables into two 32-bit allocas, the new dbg.assign expressions would
contain (DW_OP_LLVM_fragment, 0, 32) and (DW_OP_LLVM_fragment, 32, 32) even
though those fragments cover each variable entirely.
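A rough sketch of the situation (names are illustrative):

  %both = alloca i64   ; backs two adjacent 32-bit variables
  ; after splitting:
  %lo = alloca i32     ; slice [0, 32)  covers the first variable entirely
  %hi = alloca i32     ; slice [32, 64) covers the second variable entirely
  ; the dbg.assign for each new alloca should now use an empty
  ; !DIExpression() rather than a same-sized DW_OP_LLVM_fragment.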
Reviewed By: jmorse
Differential Revision: https://reviews.llvm.org/D147696
Rewrite the code to avoid iterating over the users of constants and
global values.
Reviewed By: mkazantsev
Differential Revision: https://reviews.llvm.org/D147450
In getInstructionCost, if we know a zext/sext is going to be shrunk, we
should only change the destination type and leave the source type
unchanged. For example, we may change a zext from
  zext <16 x i8> %a to <16 x i32>
to
  zext <16 x i8> %a to <16 x i16>
However, we were previously calculating the cost of
  zext <16 x i16> %a to <16 x i16>
which is incorrect.
Differential Revision: https://reviews.llvm.org/D147152
This patch makes sure that we use getTopMostExitingLoop when determining
which loops to forget in unswitchNontrivialInvariants and
unswitchTrivialSwitch. It is needed at least for
unswitchNontrivialInvariants, as detected by the included test case.
Note that unswitchTrivialBranch already used getTopMostExitingLoop.
This was done in commit 4a9cde5a791cd49b96993e6. The commit
message in that commit says "If the patch makes sense, I will also
update those places to a similar approach ...", referring to these
functions mentioned above. As far as I can tell that never happened,
but this is an attempt to finally fix that.
Fixes https://github.com/llvm/llvm-project/issues/61080
Differential Revision: https://reviews.llvm.org/D147058
The patch generalizes the analysis of scalars. The main part is outlined
into a lambda, which can be used to find reused inserted scalars and emit a
shuffle for them instead of multiple insertelement instructions if the
permutation has already been found. I.e. some scalars are produced by a
permutation of previously vectorized nodes, and some are inserted
directly.
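For illustration (names invented), instead of rebuilding the vector
element by element:

  %v0 = insertelement <4 x float> poison, float %s0, i32 0
  %v1 = insertelement <4 x float> %v0, float %s1, i32 1
  ...

the lambda can recognize that the scalars come from an already-vectorized
node and emit a single permutation of it:

  %v = shufflevector <4 x float> %vec, <4 x float> poison, <4 x i32> <i32 2, i32 3, i32 0, i32 1>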
Reworked part of D110978
Differential Revision: https://reviews.llvm.org/D146564
(JFYI - This has been heavily reframed since original attempt at landing.)
This change updates the InductionDescriptor logic to allow matching a pointer IV with a non-constant stride, but also updates the LoopVectorizer to bail out on such descriptors by default. This preserves the default vectorizer behavior.
In review, it was pointed out that there are multiple unfortunate performance implications which need to be addressed before this can be enabled. Having a flag allows us to exercise the behavior, and write test cases for logic which is otherwise unreachable (or hard to reach).
This will also enable non-constant stride pointer recurrences for other consumers. I've audited said code, and don't see any obvious issues.
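A minimal example of such a recurrence, assuming %stride is loop-invariant
but not a compile-time constant:

  loop:
    %ptr = phi ptr [ %start, %entry ], [ %ptr.next, %loop ]
    ...
    %ptr.next = getelementptr i8, ptr %ptr, i64 %stride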
Differential Revision: https://reviews.llvm.org/D147336
Limit dot product lowering to column-major matrices for now. This
simplifies the code and the reasoning for upcoming planned improvements.
Support for row-major matrices can be added later as an extension.
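For reference, a dot product here is a 1xK times Kx1 multiply through the
matrix intrinsics, e.g.:

  %dot = call <1 x float> @llvm.matrix.multiply.v1f32.v4f32.v4f32(
             <4 x float> %row, <4 x float> %col, i32 1, i32 4, i32 1)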
/data/llvm-project/llvm/lib/Transforms/InstCombine/InstCombineCasts.cpp:32:15: error: function 'decomposeSimpleLinearExpr' is not needed and will not be emitted [-Werror,-Wunneeded-internal-declaration]
static Value *decomposeSimpleLinearExpr(Value *Val, unsigned &Scale,
^
1 error generated.
canonicalizeLogicFirst reorders a logic op and a math op for suitable
constants; this commit makes the function pass through the nsw/nuw flags
on the Add.
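An illustrative example, with constants chosen so that the add cannot
carry into the masked-off bits:

  %a = add nuw i8 %x, 16
  %r = and i8 %a, -16
    -->
  %m = and i8 %x, -16
  %r = add nuw i8 %m, 16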
Differential Revision: https://reviews.llvm.org/D147568
For poison-generating (rather than immediate-UB) metadata, only copy it
from the dominating must-exec load if it is combined with !noundef.
This could be further extended by additionally intersecting the
metadata from all loads, which does not require !noundef.
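Sketch of the requirement (metadata numbering elided):

  %x = load i32, ptr %p, !range !0, !noundef !1  ; dominating, must-executed
  ...
  %y = load i32, ptr %p  ; may inherit !range only because the !noundef
                         ; above makes a violation immediate UB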
createFragmentExpression will fail if it determines that the expression cannot
be split over fragments. Handle this case in SROA. As with D147312, this
should be a rare occurrence, as the `dbg.assign` will usually reference the
`Value` being stored without modifying it with a `DIExpression`.
Reviewed By: jmorse
Differential Revision: https://reviews.llvm.org/D147431
LLVM has the ability to vectorize using function variants that require a
mask by creating an all-true mask, and to vectorize a conditional call via
scalarization. This patch joins the two parts together and uses a masked
variant when a mask is required.
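As a sketch (the variant name and signature are made up), a conditional
scalar call such as

  %r = call float @foo(float %x)   ; executed only under some condition

can now be vectorized by calling a masked variant with the block mask,
instead of requiring an all-true mask or scalarization:

  %r.v = call <4 x float> @masked_foo(<4 x float> %x.v, <4 x i1> %mask)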
Reviewed By: david-arm
Differential Revision: https://reviews.llvm.org/D136251
LICM currently requests optimized use MSSA form. This is wasteful,
because LICM doesn't actually care about most uses, only those of
invariant pointers in loops. Everything else doesn't need to be
optimized.
LICM already uses the clobber walker in most places. This patch
adjusts one place that was using getDefiningAccess() to use it as
well, so we no longer have a dependence on pre-optimized uses.
This change is not NFC: the fallback to the defining access when there
are too many clobber calls may now return an unoptimized use. In
practice, I've not seen any problems with this though.
though. If desired, we could also increase licm-mssa-optimization-cap
to a higher value (increasing this from 100 to 200 has no impact on
average compile-time -- but also doesn't appear to have any impact
on LICM quality either).
This makes for a 0.9% geomean compile-time improvement on CTMark.
Differential Revision: https://reviews.llvm.org/D147437
Ironically, MSan copies uninitialized data off the stack into
VAArgTLSCopy in the callee-side handling of va_start. Clamp the copy
size to the actual length of the buffer, and zero-initialize the
remainder.
Differential Revision: https://reviews.llvm.org/D146858
Since memory does not have an intrinsic type, we do not need to require value type matching on stores in order to sink them. To facilitate that, this patch finds stores which are sinkable, but have conflicting types, and bitcasts the ValueOperand so they are easily sinkable into a PHINode. Rather than doing fancy analysis to optimally insert the bitcast, we always insert right before the relevant store in the diamond branch. The assumption is that later passes (e.g. GVN, SimplifyCFG) will clean up bitcasts as needed.
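A sketch of the transformation (names invented):

  if.then:
    store i32 %i, ptr %p
    br label %merge
  if.else:
    %f.i32 = bitcast float %f to i32   ; inserted right before the store
    store i32 %f.i32, ptr %p
    br label %merge
  merge:
    ; the stores now agree on the value type and can be sunk behind a phi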
Differential Revision: https://reviews.llvm.org/D147348
Given just how many arguments we pass to
preferPredicateOverEpilogue and considering this list may
grow over time I've decided to pass in a pointer to a new
TailFoldingInfo structure instead, similar to what we do
with IntrinsicCostAttributes, etc. In addition, many of the
arguments we pass in are actually available in the
LoopVectorizationLegality class so I've managed to
reduce the set of pointers that we need to pass in the
TailFoldingInfo struct.
Differential Revision: https://reviews.llvm.org/D146127
After D141386, violation of nonnull, range and align metadata
results in poison rather than immediate undefined behavior,
which means that these are now safe to retain when speculating.
We only need to remove UB-implying metadata like noundef.
This is done by adding a dropUBImplyingAttrsAndMetadata() helper, which
lists the metadata known to be safe to retain on speculation.
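For example (metadata numbering elided), when hoisting a load above its
controlling branch:

  %q = load ptr, ptr %pp, !nonnull !0, !noundef !1
    -->
  %q = load ptr, ptr %pp, !nonnull !0   ; poison-generating !nonnull kept,
                                        ; UB-implying !noundef dropped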
Differential Revision: https://reviews.llvm.org/D146629
There is no getNullValue in ConstantFP. Due to inheritance, we're calling
Constant::getNullValue which handles any type including FP.
Since we already know we want an FP constant we can use ConstantFP::getZero
which might be faster and is a more readable name for an FP zero.
As a result of -ftrivial-auto-var-init, clang generates instructions to
set alloca'd memory to a given pattern, right after the allocation site.
In some cases, this (somewhat costly) operation can be delayed, leading
to conditional execution.
This is not an uncommon situation: it happens ~500 times on the cPython
code base, and much more on the LLVM codebase. The benefit varies greatly
with the execution path, but it should not regress performance.
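Schematically (with 0xAA standing in for the chosen pattern):

  entry:
    %buf = alloca [64 x i8]
    call void @llvm.memset.p0.i64(ptr %buf, i8 -86, i64 64, i1 false)
    br i1 %cond, label %use, label %exit
    ; the memset can be sunk into %use, so the %exit path never pays for it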
This is a recommit of cca01008cc31a891d0ec70aff2201b25d05d8f1b with
MemorySSA update fixes.
Differential Revision: https://reviews.llvm.org/D137707
The counters for the repeated scalars are ordered in the natural order,
but the original scalars might be reordered during SLP graph reordering,
and that order can be lost. We need to use the scalars after the
reordering, not the original ones, to emit correct code for the
same-value counters.
Per LangRef:
> If a load instruction tagged with the !invariant.load metadata
> is executed, the memory location referenced by the load has to
> contain the same value at all points in the program where the
> memory location is dereferenceable; otherwise, the behavior is
> undefined.
As invariant.load violation is immediate undefined behavior, it
is sufficient for it to be present on the dominating load (for
the case where K does not move).
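For instance (metadata elided):

  %a = load i32, ptr %p, !invariant.load !0   ; dominating load
  %b = load i32, ptr %p                       ; CSE'd into %a

Keeping !invariant.load on the combined load is fine here because any
violation at %a is already immediate UB.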
patchReplacementInstruction() is used for CSE-style transforms.
Avoid the need to maintain two separate lists of known metadata IDs,
which can and do go out of sync.
As a result of -ftrivial-auto-var-init, clang generates instructions to
set alloca'd memory to a given pattern, right after the allocation site.
In some cases, this (somewhat costly) operation can be delayed, leading
to conditional execution.
This is not an uncommon situation: it happens ~500 times on the cPython
code base, and much more on the LLVM codebase. The benefit varies greatly
with the execution path, but it should not regress performance.
Differential Revision: https://reviews.llvm.org/D137707
If there is an optimisation opportunity but the function argument hasn't
been added to the constraint system through previous facts, we fail to
optimise it. It might be a good idea to start the constraint system with
all the function arguments already added to the system.
Reviewed By: fhahn
Differential Revision: https://reviews.llvm.org/D144879
Even if there are no thread-safety concerns, we should not promote
(not guaranteed-to-execute) stores to globals without further
analysis: While the global may be writable, we may not have
provenance to perform the write. The @promote_global_noalias test
case illustrates a miscompile in the presence of a noalias pointer
to the global.
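A rough sketch of the problem (simplified; the real test is
@promote_global_noalias):

  @g = global i32 0
  define void @f(ptr noalias %p, i1 %c) {
    ; in a loop: a store to @g guarded by %c, where %c is never true in
    ; executions where the caller passed @g as %p. Promotion makes the
    ; store to @g unconditional, introducing a write that conflicts with
    ; the noalias accesses through %p.
  }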
Worth noting that load-only promotion may also not be well-defined
depending on the precise semantics (we don't specify whether a load
violating noalias is poison or UB; I believe the general inclination is
to make it poison, and only stores UB), but that's a more general issue.
This is inspired by https://github.com/llvm/llvm-project/issues/60860,
which is a related issue with TBAA metadata.
Differential Revision: https://reviews.llvm.org/D146233
When we replace poison with frozen poison, the user of the poison might
be a constant (for example, a vector constant). In that case the constant
would end up with a non-constant operand. Moreover, replacing poison and
GlobalValues everywhere in the module seems to be overkill, so the
solution is to make the replacement only in the instructions we visited
(those contributing to the hoisted condition). Additionally, if the user
of the poison is a constant, that constant itself needs a freeze; it does
not make sense to replace the poison inside it with a frozen version,
just freeze the whole constant instead.
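For example:

  %v = add <2 x i32> %x, <i32 1, i32 poison>
    -->
  %c.fr = freeze <2 x i32> <i32 1, i32 poison>
  %v = add <2 x i32> %x, %c.fr

Replacing the poison lane in place would have given the vector constant a
non-constant (frozen) operand.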
Reviewed By: mkazantsev
Differential Revision: https://reviews.llvm.org/D147429
Extract a helper that does the clobber walk while taking into
account the cap. Slightly reflow things to check this first in
the store case, before we start walking over all accesses in the
loop.
Fixes an assertion failure (`... instruction with users.' failed): if
the externally used scalar is part of the tree and is replaced by an
extractelement instruction, we need to add the generated extractelement
instruction to the list of ExternallyUsedValues to avoid its deletion
during vectorization.
Move this handling to a centralized place and extend it to handle
saturating add/sub intrinsics.
I originally wanted to make this fully generic rather than
whitelist based, because this is legal and likely profitable for all
speculatable intrinsics. The caveat is that for vector selects,
the intrinsic can't perform cross-lane operations like a shuffle
or reduction, which we don't really expose as a generic property
right now. So for now I'm just extending the list.
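One shape of the fold, sketched for @llvm.sadd.sat (legal because the
intrinsic is speculatable and operates lane-wise):

  %s = select i1 %c, i32 %a, i32 %b
  %r = call i32 @llvm.sadd.sat.i32(i32 %s, i32 %x)
    -->
  %r.a = call i32 @llvm.sadd.sat.i32(i32 %a, i32 %x)
  %r.b = call i32 @llvm.sadd.sat.i32(i32 %b, i32 %x)
  %r   = select i1 %c, i32 %r.a, i32 %r.b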
Now that we canonicalize to min/max intrinsics, we no longer need
to guard against this here.
In fact, it seems like the issue from PR46271 was the final push
for introducing the intrinsics in the first place...