Instead of the abstract cost of the scalar reduction ops, try to use
the cost of the actual reduction instructions, where possible. Also,
remove the estimation of the vectorized GEP pointers for reduced loads,
since it is already handled in the tree.
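For illustration only (a rough sketch, not code from the patch; the
actual SLP call sites and cost-kind handling differ), the idea is to ask
TargetTransformInfo for the cost of a real reduction instruction rather
than approximating it from the scalar op cost:

  #include "llvm/Analysis/TargetTransformInfo.h"
  #include "llvm/IR/DerivedTypes.h"
  #include "llvm/IR/Instruction.h"
  #include <optional>
  using namespace llvm;

  // Sketch only: cost of an actual add reduction over VecTy, instead of
  // NumElts * (cost of a scalar add).
  static InstructionCost addReductionCost(const TargetTransformInfo &TTI,
                                          FixedVectorType *VecTy) {
    return TTI.getArithmeticReductionCost(Instruction::Add, VecTy,
                                          /*FMF=*/std::nullopt,
                                          TTI::TCK_RecipThroughput);
  }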
Differential Revision: https://reviews.llvm.org/D148036
GVN load widening was disabled in D24096. This removes various
support code that is no longer relevant.
The way this works nowadays is that we return PartialAlias with
an offset from BasicAA and this gets passed on as a clobber by
MDA. However, PartialAlias will only be returned if the load is
properly nested inside the other load.
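As a hand-written illustration of what "properly nested" means here (not
code from the patch): the narrow load reads bytes that lie entirely
within the bytes read by the wide load, at a known offset, which is what
lets BasicAA report PartialAlias with an offset.

  #include <cstdint>

  // Illustration only: the 1-byte load reads byte 2 of the 4 bytes already
  // read by the 4-byte load, so it is properly nested inside it.
  uint32_t nestedLoads(const uint32_t *P) {
    uint32_t Wide = *P;                                           // 4-byte load
    uint8_t Narrow = *(reinterpret_cast<const uint8_t *>(P) + 2); // 1-byte load at offset 2
    return Wide + Narrow;
  }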
This just removes the bulk of the code, but some additional
cleanup can be done here now that we don't need to distinguish
between load and store cases.
Perform dot-product lowering before instruction fusion to avoid a crash
in the newly added test. Also update lowerDotProduct to properly mark
the optimized matmul as fused.
This could also move the initialization of sret args, causing
actually-initialized parts of such return values to become
uninitialized. See the discussion on the code review.
> As a result of -ftrivial-auto-var-init, clang generates instructions to
> set alloca'd memory to a given pattern, right after the allocation site.
> In some cases, this (somewhat costly) operation could be delayed, leading
> to conditional execution in some cases.
>
> This is not an uncommon situation: it happens ~500 times on the cPython
> code base, and much more on the LLVM codebase. The benefit greatly
> varies with the execution path, but it should not regress performance.
>
> This is a recommit of cca01008cc31a891d0ec70aff2201b25d05d8f1b with
> MemorySSA update fixes.
>
> Differential Revision: https://reviews.llvm.org/D137707
This reverts commit 50b2a113db197a97f60ad2aace8b7382dc9b8c31
and follow-up commit ad9ad3735c4821ff4651fab7537a75b8f0bb60f8.
Added a ShuffleCostEstimator class and the first adjustExtracts member,
which is just a copy of the previous AdjustExtractCost lambda.
Differential Revision: https://reviews.llvm.org/D147787
Some dbg.assigns using poison become un-poisoned in SROA. The reason this
happens at all is that dbg.assigns linked to memory intrinsics use poison to
indicate they can't describe the stored value, but the value becomes available
after some optimisations. This needs reworking eventually, but for now we need
to ensure that when it does occur we don't create invalid expressions.
D147312 prevented this occurring when the dbg.assign uses DIArgLists, but that
wasn't a complete fix. We also need to ensure we avoid un-poisoning when the
existing expression uses more than one location operand (DW_OP_arg, n).
Reviewed By: jmorse
Differential Revision: https://reviews.llvm.org/D148020
- As the address space cast may not be valid on a specific target,
  `addrspacecast` is not handled when an `alloca` can be replaced with
  the source of memcpy/memmove. This patch addresses that by querying a
  target hook on whether that address space cast is valid. For example,
  on most GPU targets, the cast from a global pointer to a generic
  pointer is valid.
- If that cast is allowed (by querying `isValidAddrSpaceCast`), the
  replacement is enhanced to handle that `addrspacecast` as well, as
  sketched below.
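A minimal sketch of the gating, assuming a TTI hook of roughly this
shape (the hook name is taken from the text above; the exact signature
and call site are assumptions, not the actual patch):

  #include "llvm/Analysis/TargetTransformInfo.h"
  using namespace llvm;

  // Sketch only: permit the replacement either when no cast is needed or
  // when the target reports the address space cast as valid.
  static bool replacementCastAllowed(const TargetTransformInfo &TTI,
                                     unsigned SrcAS, unsigned DestAS) {
    return SrcAS == DestAS || TTI.isValidAddrSpaceCast(SrcAS, DestAS);
  }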
Reviewed By: yaxunl
Differential Revision: https://reviews.llvm.org/D147025
As described in [0], this extends IRPGO to support //Temporal Profiling//.
When `-pgo-temporal-instrumentation` is used we add the
`llvm.instrprof.timestamp()` intrinsic to the entry of functions, which
in turn is lowered to a call to the compiler-rt function
`INSTR_PROF_PROFILE_SET_TIMESTAMP()`. A new field in the `llvm_prf_cnts`
section stores each function's timestamp. Then in `llvm-profdata merge`
we convert these function timestamps into a //trace// and add it to the
indexed profile.
Since these traces could significantly increase the profile size, we've
added `-max-temporal-profile-trace-length` and
`-temporal-profile-trace-reservoir-size` to limit the length of a trace
and the number of traces in a profile, respectively.
In a future diff we plan to use these traces to construct an optimized
function order to reduce the number of page faults during startup.
Special thanks to Julian Mestre for helping with reservoir sampling.
[0] https://discourse.llvm.org/t/rfc-temporal-profiling-extension-for-irpgo/68068
Reviewed By: snehasish
Differential Revision: https://reviews.llvm.org/D147287
We have now seen two miscompiles because of widening widenable
conditions at incorrect IR points and thereby changing a branch's loop
invariant condition to a loop-varying one (see PR60234 and PR61963).
This patch adds asserts in common guard utilities that we use for
widening to proactively catch these bugs in future.
Note that these asserts will not fire if we were to sink a widenable
condition from outside a loop into the loop (that's also incorrect for the
same reason as above).
Tested this without the fix for PR60234 (guard widening miscompile) and
confirmed the assert fires.
WARNING: Sometimes, the assert can fire if we failed to hoist the
invariant condition out of the loop. This is a pass-ordering issue or a
limitation in LICM, which would need investigation. See details in the
review.
Differential Revision: https://reviews.llvm.org/D147752
Update the planning code that constructs VPlans to allow VPlan building
to fail. This allows us to gradually shift some legality checks to VPlan
construction. The first candidate is checking if all users of
first-order recurrence phis can be sunk past the recipe computing the
previous value.
The new functionality will be used by D142886 which is approved and will
be landed shortly.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D142885
Avoid divergence between different kinds of hoisting with reassociation.
Make them all collect the general NumHoisted stat as well as specific
stats for each particular transform.
In this case the source GEP might not be hoisted even though it
has invariant operands. For now just bail out, but we might need
additional checks for AllowSpeculation in these special-case
reassociation folds.
Reassociate gep (gep ptr, idx1), idx2 to gep (gep ptr, idx2), idx1
if this would make the inner GEP loop invariant and thus hoistable.
This is intended to replace an InstCombine fold that does this (in
04f61fb73d/llvm/lib/Transforms/InstCombine/InstructionCombining.cpp (L2006)).
The problem with the InstCombine fold is that LoopInfo is an optional
dependency, so it is not performed reliably.
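As a hand-written C++ analogue of the effect (illustration only; the
actual transform operates on GEP instructions inside LICM), swapping the
index order exposes a loop-invariant sub-address that can be hoisted:

  // Before: the inner address (p + i) depends on the loop-variant i, so
  // nothing can be hoisted; this corresponds to gep (gep p, i), off.
  void before(int *p, long off, long n) {
    for (long i = 0; i < n; ++i)
      *((p + i) + off) = 0;
  }

  // After reassociation: p + off is loop invariant and can be hoisted;
  // this corresponds to gep (gep p, off), i.
  void after(int *p, long off, long n) {
    int *base = p + off; // loop invariant, hoisted out of the loop
    for (long i = 0; i < n; ++i)
      *(base + i) = 0;
  }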
Differential Revision: https://reviews.llvm.org/D146813
Suggested as independent cleanup in D147567. Either VF or UF needs to
be > 1. Note that if the condition were false, the code below would use
a nullptr and crash.
Move the code that collects live-outs earlier and only generate
extracts for exit values if there are any live-outs that use them.
Depends on D147472.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D147567
Loop predication's predicateLoopExit pass does two incorrect things:
- It sinks the widenable call into the loop, thereby converting an
  invariant condition to a variant one.
- It widens the widenable call at a branch, thereby converting the
  branch into a loop-varying one.
The latter is problematic when the branch may have been loop-invariant
and prior optimizations (such as indvars) may have relied on this fact
and updated the deopt state accordingly. Now, when we widen this with a
loop-varying condition, the deopt state is no longer correct.
Fixes https://github.com/llvm/llvm-project/issues/61963.
Differential Revision: https://reviews.llvm.org/D147662
Instead of iterating over all LCSSA phis in the exit block, collect all
LiveOut users of the FOR splice VPInstruction and only update those
users.
Building on top of D147471, this removes an access to the cost model
after VPlan execution.
Depends on D147471.
Reviewed By: Ayal, michaelmaitland
Differential Revision: https://reviews.llvm.org/D147472
It is not actually used for any computations. Its only purpose is to
check that the loop is finite and to find out the type of the computed
exit count. Refactor the code so that we only store this type.
Instead of clearing live outs when a scalar epilogue is required late,
don't add live outs during VPlan construction if a scalar epilogue is
required.
This enables more VPlan-based DCE (if the live out would be the only
user in the plan) and is a step towards removing an access to the cost
model in fixVectorizedLoop (which runs after VPlan execution).
Depends on D147468.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D147471
This removes the need to convert the end of the range to the next
power-of-2 for the end iterator after 4bd3fda5124962 and was suggested
as a follow-up TODO in D147468.
Add an iterator to iterate over all VFs in VFRange. This simplifies some
existing code and allows using all_of, any_of and none_of on a VFRange.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D147468
Use clone to keep the metadata; the issue was reported by aeubanks on
D141188.
Reviewed By: nikic, paulwalker-arm
Differential Revision: https://reviews.llvm.org/D146702
Using the more robust log2 search allows us to fold more cases (same
logic as exists for idiv/irem).
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D146347
dependencies.
Improved compile time by precomputing the mapping between gathered
scalars and their gather/buildvector nodes for later use in
isGatherShuffledEntry, to avoid recomputing this map each time the
function is called.
This patch adds a NeedsMaskForGaps field to VPInterleaveRecipe to record
whether a mask for gaps is needed. This removes a dependence on the cost
model in VPlan code-generation.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D147467
Unfortunately, alive2 cannot prove the correctness because it times out
even for the half float type. However, it should be correct: if a and b
are not NaN, maximum and minimum simply return the two values (a and b),
and since a + b == b + a the result is the same. If a or b is NaN, then
maximum and minimum are both NaN, and NaN + NaN is NaN; a + b is also
NaN.
In terms of preserving fast-math flags, we cannot preserve ninf:
minimum(NaN, Infinity) == maximum(NaN, Infinity) == NaN, so
minimum(NaN, Infinity) +ninf maximum(NaN, Infinity) == NaN +ninf NaN == NaN,
while the transformation would change this to NaN +ninf Infinity == poison.
But if the fadd is also marked nnan, we can preserve ninf, because
NaN +ninf/nnan NaN is poison as well.
The same optimization is added for
maximum(a,b) * minimum(a,b) => a * b
and everything said above for fadd also holds for fmul.
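As a hand-written illustration (not code from the patch; maximumNum and
minimumNum are local helpers mimicking the NaN-propagating semantics of
llvm.maximum/llvm.minimum, unlike std::fmax/std::fmin, which ignore NaN
operands):

  #include <cassert>
  #include <cmath>
  #include <limits>

  // Local helpers mirroring llvm.maximum/llvm.minimum: NaN is propagated.
  static double maximumNum(double A, double B) {
    if (std::isnan(A) || std::isnan(B))
      return std::numeric_limits<double>::quiet_NaN();
    return A > B ? A : B;
  }
  static double minimumNum(double A, double B) {
    if (std::isnan(A) || std::isnan(B))
      return std::numeric_limits<double>::quiet_NaN();
    return A < B ? A : B;
  }

  int main() {
    double A = 3.5, B = -2.0;
    // Non-NaN operands: max and min pick each value exactly once, so the
    // sum (and analogously the product) equals A + B (resp. A * B).
    assert(maximumNum(A, B) + minimumNum(A, B) == A + B);
    assert(maximumNum(A, B) * minimumNum(A, B) == A * B);

    // NaN operand: both sides of the fold are NaN, so it stays consistent.
    double N = std::numeric_limits<double>::quiet_NaN();
    assert(std::isnan(maximumNum(N, B) + minimumNum(N, B)));
    assert(std::isnan(N + B));
    return 0;
  }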
Reviewed By: mkazantsev
Differential Revision: https://reviews.llvm.org/D147299
Revert commit due to failure on buildbot:
error: 'match_combine_or' may not intend to support class template argument deduction
This reverts commit b86a06ef284f2637bef89bf5bb20157a8b195568.
Make adjustExtracts/needToDelay lambdas members of
ShuffleInstructionBuilder so they can be overloaded later for the cost
model.
Differential Revision: https://reviews.llvm.org/D147730
The original loop is O(MxN) since `is_contained` iterates over
all incoming values. This change iterates only over the phis that use
the value as an incoming value, so it is now O(M).
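A rough sketch of the shape of the change (an illustrative helper, not
the actual code from the patch):

  #include "llvm/ADT/STLFunctionalExtras.h"
  #include "llvm/IR/Instructions.h"
  using namespace llvm;

  // Sketch only: walk the uses of V directly, visiting just the phis that
  // actually have V as an incoming value, instead of scanning every phi's
  // incoming value list with is_contained.
  static void forEachPhiIncomingUse(Value *V,
                                    function_ref<void(PHINode &, unsigned)> Fn) {
    for (Use &U : V->uses())
      if (auto *PN = dyn_cast<PHINode>(U.getUser()))
        Fn(*PN, PHINode::getIncomingValueNumForOperand(U.getOperandNo()));
  }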
Differential Revision: https://reviews.llvm.org/D146999
Conditionally setting MaskForGaps is only needed for loads. This avoids
re-computing MaskForGaps for stores.
Suggested as independent cleanup in D147467.
If the value is used in the expression, the mask needs to be adjusted
before being applied. Also, fix the analysis of the phi nodes for
reused scalars.
Made the condition for erasing the gathered extractelements stricter:
remove one only if it has a single vectorized use, otherwise leave it
for instcombine/instsimplify analysis.