33436 Commits

Author SHA1 Message Date
Craig Topper
d66e42ca41 [LoopIdiomRecognize] Replace getNegativeSCEV(getOne()) with getMinusOne. NFC 2023-04-12 13:42:35 -07:00
Alexey Bataev
b28f407df9 [SLP]Improve reduction cost model for scalars.
Instead of abstract cost of the scalar reduction ops, try to use the
cost of actual reduction operation instructions, where possible. Also,
remove the estimation of the vectorized GEPs pointers for reduced loads,
since it is already handled in the tree.

Differential Revision: https://reviews.llvm.org/D148036
2023-04-12 11:32:51 -07:00
Anna Thomas
76e4070843 Revert "[GuardUtils] Add asserts about loop varying widenable conditions"
This reverts commit 5675757f5fc6e27ce01b3b12bdfd04044df53aa3.

Assert maybe too strict. revert and investigate why assert fires.
2023-04-12 10:58:45 -04:00
Nikita Popov
ae28e016d3 [VNCoercion] Drop some redundant functions (NFC)
These load and store APIs now do the same thing, so combine them
into one.
2023-04-12 16:46:54 +02:00
Nikita Popov
6e78fd58cd [GVN][VNCoercion] Remove load widening leftovers (NFCI)
GVN load widening was disabled in D24096. This removes various
support code that is no longer relevant.

The way this works nowadays is that we return PartialAlias with
an offset from BasicAA and this gets passed on as a clobber by
MDA. However, PartialAlias will only be returned if the load is
properly nested inside the other load.

This just removes the bulk of the code, but some additional
cleanup can be done here now that we don't need to distinguish
between load and store cases.
2023-04-12 16:32:46 +02:00
Florian Hahn
78148eba49
[Matrix] Fix crash during dot product lowering.
Perform dot-product lowering before instruction fusion to avoid crash in
newly added test. Also update lowerDotProduct to properly mark optimized
matmul as fused.
2023-04-12 15:08:39 +01:00
Max Kazantsev
3b73892b43 [SimpleLoopUnswitch] Do not try to inject pointer conditions. PR62058
As shown in https://github.com/llvm/llvm-project/issues/62058, canonicalication
may fail with pointer types (and basically this transform is not expected to
work with pointers).
2023-04-12 20:38:17 +07:00
Dmitry Makogon
797da79a92 [LoopUtils] Add isKnownPositiveInLoop and isKnownNonPositiveInLoop functions 2023-04-12 18:46:11 +07:00
Hans Wennborg
a6d9730f40 Revert "Move "auto-init" instructions to the dominator of their users"
This could also move initialization of sret args, causing actually
initialized parts of such return values to be uninitialized. See
discussion on the code review.

> As a result of -ftrivial-auto-var-init, clang generates instructions to
> set alloca'd memory to a given pattern, right after the allocation site.
> In some cases, this (somehow costly) operation could be delayed, leading
> to conditional execution in some cases.
>
> This is not an uncommon situation: it happens ~500 times on the cPython
> code base, and much more on the LLVM codebase. The benefit greatly
> varies on the execution path, but it should not regress on performance.
>
> This is a recommit of cca01008cc31a891d0ec70aff2201b25d05d8f1b with
> MemorySSA update fixes.
>
> Differential Revision: https://reviews.llvm.org/D137707

This reverts commit 50b2a113db197a97f60ad2aace8b7382dc9b8c31
and follow-up commit ad9ad3735c4821ff4651fab7537a75b8f0bb60f8.
2023-04-12 13:37:21 +02:00
Alexey Bataev
d00158cd28 [SLP][NFC]Introduce ShuffleCostEstimator and adjustExtracts member function.
Added ShuffleCostEstimator class and the first adjustExtracts member,
which is just a copy of previous AdjustExtractCost lambda.

Differential Revision: https://reviews.llvm.org/D147787
2023-04-11 12:47:07 -07:00
OCHyams
9106960724 [Assignment Tracking][SROA] Don't un-poison dbg.assigns using multiple loc ops
Some dbg.assigns using poison become un-poisoned in SROA. The reason this
happens at all is because dbg.assigns linked to memory intrinsics use poison to
indicate they can't describe the stored value, but the value becomes available
after some optimisations. This needs reworking eventually, but for now we need
to ensure that when it does occur we don't create invalid expressions.

D147312 prevented this occuring when the dbg.assign uses DIArgLists, but that
wasn't a complete fix. We also need to ensure we avoid un-poisoning when the
existing expression uses more than one location operand (DW_OP_arg, n).

Reviewed By: jmorse

Differential Revision: https://reviews.llvm.org/D148020
2023-04-11 18:18:11 +01:00
Florian Hahn
68afaa3f48
[LV] Use std::make_optional to fix build failure after 082a0046.
Some compilers require std::make_optional(std::move()) to force construction
of the std::optional return value. This should fix the build failure in
  https://lab.llvm.org/buildbot#builders/67/builds/10991
2023-04-11 17:56:15 +01:00
Michael Liao
72fc08a541 [InstCombine] Teach alloca replacement to handle addrspacecast
- As the address space cast may not be valid on a specific target,
  `addrspacecast` is not handled when an `alloca` is able to be replaced
  with the source of memcpy/memmove. This patch addresses that by
  querying a target hook on whether that address space cast is valid.
  For example, on most GPU targets, the cast from a global pointer to a
  generic pointer is valid.
- If that cast is allowedd (by querying `isValidAddrSpaceCast`), the
  replacement is enhanced to handle that `addrspacecast` as well.

Reviewed By: yaxunl

Differential Revision: https://reviews.llvm.org/D147025
2023-04-11 11:47:37 -04:00
Ellis Hoag
244be0b0de [InstrProf] Temporal Profiling
As described in [0], this extends IRPGO to support //Temporal Profiling//.

When `-pgo-temporal-instrumentation` is used we add the `llvm.instrprof.timestamp()` intrinsic to the entry of functions which in turn gets lowered to a call to the compiler-rt function `INSTR_PROF_PROFILE_SET_TIMESTAMP()`. A new field in the `llvm_prf_cnts` section stores each function's timestamp. Then in `llvm-profdata merge` we convert these function timestamps into a //trace// and add it to the indexed profile.

Since these traces could significantly increase the profile size, we've added `-max-temporal-profile-trace-length` and `-temporal-profile-trace-reservoir-size` to limit the length of a trace and the number of traces in a profile, respectively.

In a future diff we plan to use these traces to construct an optimized function order to reduce the number of page faults during startup.

Special thanks to Julian Mestre for helping with reservoir sampling.

[0] https://discourse.llvm.org/t/rfc-temporal-profiling-extension-for-irpgo/68068

Reviewed By: snehasish

Differential Revision: https://reviews.llvm.org/D147287
2023-04-11 08:30:52 -07:00
Anna Thomas
5675757f5f [GuardUtils] Add asserts about loop varying widenable conditions
We have now seen two miscompiles because of widening widenable
conditions at incorrect IR points and thereby changing a branch's loop
invariant condition to a loop-varying one (see PR60234 and PR61963).

This patch adds asserts in common guard utilities that we use for
widening to proactively catch these bugs in future.
Note that these asserts will not fire if we were to sink a widenable
condition from out of a loop into a loop (that's also incorrect for the
same reason as above).

Tested this without the fix for PR60234 (guard widening miscompile) and
confirmed the assert fires.

WARNING: Sometimes, the assert can fire if we failed to hoist the
invariant condition out of the loop. This is a pass-ordering issue or a
limitation in LICM, which would need an investigation. See details in
review.

Differential Revision: https://reviews.llvm.org/D147752
2023-04-11 10:54:07 -04:00
Florian Hahn
082a004690
[VPlan] Allow building a VPlan to may fail.
Update the planning code constructing VPlan to allow building VPlans to
fail. This allows us to gradually shift some legality checks to VPlan
construction. The first candidate is checking if all users of
first-order recurrence phis can be sunk past the recipe computing the
previous value.

The new functionality will be used by D142886 which is approved and will
be landed shortly.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D142885
2023-04-11 15:41:18 +01:00
Max Kazantsev
a42f589197 [LICM][NFC] Unify arithmetic statistics collection
Avoid divergence b/w different kinds of hoisting with reassociation.
Make them all collect general stat NumHoisted and also specific stats
for each particular transform.
2023-04-11 17:20:02 +07:00
Max Kazantsev
7b8692a55c [LICM][NFC] Do not forward declaration of hoistMinMax
They all are now handled by hoistArithmetics, and only it should be
forwarded.
2023-04-11 17:06:20 +07:00
Nikita Popov
243df834c6 [LICM] Fix assert failure in no-allowspeculation mode
In this case the source GEP might not be hoisted even though it
has invariant operands. For now just bail out, but we might need
additional checks for AllowSpeculation in these special-case
reassociation folds.
2023-04-11 11:55:54 +02:00
Nikita Popov
b8917ac62a [LICM] Reassociate GEPs to allow hoisting
Reassociate gep (gep ptr, idx1), idx2 to gep (gep ptr, idx2), idx1
if this would make the inner GEP loop invariant and thus hoistable.

This is intended to replace an InstCombine fold that does this (in
04f61fb73d/llvm/lib/Transforms/InstCombine/InstructionCombining.cpp (L2006)).
The problem with the InstCombine fold is that LoopInfo is an optional
dependency, so it is not performed reliably.

Differential Revision: https://reviews.llvm.org/D146813
2023-04-11 10:34:04 +02:00
Max Kazantsev
cd24665f13 [NFC] Fix typo in statistic description 2023-04-11 14:18:53 +07:00
Max Kazantsev
e5dc4dbe87 [LICM][NFC] Restructure code to have one entry point for reassociation-based hoistings
We already hoist min/max functions and want to do more of this kind. Some
refactoring to make growth points for it.
2023-04-11 14:18:53 +07:00
Florian Hahn
f9d0b35d22
[LV] Re-use already computed runtime VF in fixFixedOrderRecurrence.
This was suggested as independent cleanup in D147472.

This removes a redundant runtime VF computation when using scalable
vectors.
2023-04-10 21:25:12 +01:00
Florian Hahn
954befe2a7
[LV] Turn check into assert in fixFixedOrderRecurrence (NFCI).
Suggested as independent cleanup in D147567. Either VF or UF need to be
> 1. Note that if the condition would be false, the code below would use
a nullptr and crash.
2023-04-10 21:11:41 +01:00
Florian Hahn
35af27c30a
[VPlan] Only create extracts for recurrence exits if there are live-outs.
Move the code to collect live-out earlier and only generate extracts for
exit values if there are any live-outs that use them.

Depends on D147472.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D147567
2023-04-10 21:08:34 +01:00
Anna Thomas
27f8a62a54 [LoopPredication] Fix where we generate widened condition. PR61963
Loop predication's predicateLoopExit pass does two incorrect things:

It sinks the widenable call into the loop, thereby converting an invariant condition to a variant one
It widens the widenable call at a branch thereby converting the branch into a loop-varying one.

The latter is problematic when the branch may have been loop-invariant
and prior optimizations (such as indvars) may have relied on this
fact, and updated the deopt state accordingly.

Now, when we widen this with a loop-varying condition, the deopt state
is no longer correct.
https://github.com/llvm/llvm-project/issues/61963 fixed.

Differential Revision: https://reviews.llvm.org/D147662
2023-04-10 10:37:05 -04:00
Florian Hahn
c255eb2c4b
[VPlan] Use VPLiveOut to update FOR live-out users.
Instead of iterating over all LCSSA phis in the exit block, collect all
LiveOut users of the FOR splice VPInstruction and only update those
users.

Building on top of D147471, this removes an access to the cost model
after VPlan execution.

Depends on D147471.

Reviewed By: Ayal, michaelmaitland

Differential Revision: https://reviews.llvm.org/D147472
2023-04-10 13:02:44 +01:00
Max Kazantsev
d0950d05a6 [NFC][IRCE] Do not store latch exit count
It is not actually used for any computations. Its only purpose is to
check that the loop is finite and find out the type of computed exit
count. Refactor code so that we only store this type.
2023-04-10 14:00:14 +07:00
Florian Hahn
0dbcbfe0d0
[VPlan] Don't assign slots for external defs (NFCI).
External defs are VPValues wrapping an IR value and hence will get
printed as ir<>. We don't need to assign a slot for a VPValue number.
2023-04-09 21:01:21 +01:00
Florian Hahn
620e011a25
[VPlan] Don't add live-outs if scalar epilogue is required.
Instead of clearing live outs when a scalar epilogue is required late,
don't add live outs during VPlan construction if a scalar epilogue is
required.

This enables more VPlan-based DCE (if the live out would be the only
user in the plan) and is a step towards removing an access of the cost
model in fixedVectorizedLoop (which is after VPlan execution).

Depends on D147468.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D147471
2023-04-09 09:18:24 +01:00
Florian Hahn
c7a34d355a
[VPlan] Require VFRange.End to be a power-of-2. (NFCI)
This removes the need to convert the end of the range to the next
power-of-2 for the end iterator after 4bd3fda5124962 and was suggested
as follow-up TODO in D147468.
2023-04-08 13:04:08 +01:00
Florian Hahn
4bd3fda512
[VPlan] Add VFRange::begin() and end() iterators. (NFCI)
Add an iterator to iterate over all VFs in VFRange. This simplifies some
existing code and allows using all_of,any_of and none_of on a VFRange.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D147468
2023-04-08 10:22:25 +01:00
Zhongyunde
0e739ddd17 [MergeICmps] Attach metadata to new created loads
Use clone to keep the metadata, the issue is reported
by aeubanks on D141188.

Reviewed By: nikic, paulwalker-arm

Differential Revision: https://reviews.llvm.org/D146702
2023-04-08 10:45:58 +08:00
Alexey Bataev
85327f307b [SLP][NFC]Make clusterSortPtrAccesses static. 2023-04-07 13:24:24 -07:00
Noah Goldstein
513251b765 [InstCombine] Improve transforms for (mul X, Y) -> (shl X, log2(Y)
Using the more robust log2 search allows us to fold more cases (same
logic as exists for idiv/irem).

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D146347
2023-04-07 14:58:20 -05:00
Alexey Bataev
6ff177d928 [SLP][NFC]Improve SLP time by precomputing value<->gather nodes
dependencies.

Improved compiled time by the precomputing the mapping between gathered
scalars and their gather/buildvector nodes for later use in
isGatherShuffledEntry to avoid recomputing this map each time this
function is called.
2023-04-07 12:12:02 -07:00
Florian Hahn
11896357d4
[VPlan] Add VPInterleaveRecipe::NeedsMaskForGaps field (NFCI).
This patch adds a NeedsMaskForGaps field to VPInterleaveRecipe to record
whether a mask for gaps is needed. This removes a dependence on the cost
model in VPlan code-generation.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D147467
2023-04-07 13:11:03 +01:00
Serguei Katkov
f38365aef4 [InstCombine] Add support for maximum(a,b) + minimum(a,b) => a + b
Unfortunately alive2 cannot prove the correctness due to fails by timeout even for
float type half.

However it should be correct. If a and b are not NaN, maximum and minimum will just
return different values (a and b) and take into account a + b == b + a this is the same.
If a or b is NaN, than maximum and minimum are equal to NaN and NaN + NaN is NaN.
a + b is also a NaN.

In terms of preserving fast flags, we cannot preserve ninf due to
minimum(NaN, Infinity) == maximum(NaN, Infinity) == NaN,
minimum(NaN, Infinity) +ninf maximum(NaN, Infinity) == NaN +ninf NaN = NaN
However transformation will change
minimum(NaN, Infinity) + maximum(NaN, Infinity) to NaN +ninf Infinity == poison.

But if fadd is marked as nnan, we can preserve because NaN +ninf/nnan NaN = poison as well.

The same optimization for
  maximum(a,b) * minimum(a,b) => a * b
is added.
All said above for fadd is correct for fmul.

Reviewed By: mkazantsev
Differential Revision: https://reviews.llvm.org/D147299
2023-04-07 12:38:04 +07:00
Serguei Katkov
624973806c [InstCombine] Add support for max(a,b) + min(a,b) => a + b. Re-land.
The same optimization for
  max(a,b) * min(a,b) => a * b
is added.

Correctness check:
uadd: https://alive2.llvm.org/ce/z/2rXDek
sadd: https://alive2.llvm.org/ce/z/zNu_er
uadd + nuw/nsw: https://alive2.llvm.org/ce/z/EaiNjB
sadd + nuw/nsw: https://alive2.llvm.org/ce/z/w_2Nrs

umul: https://alive2.llvm.org/ce/z/dgXRLr
smul: https://alive2.llvm.org/ce/z/hBjGzz
umul + nuw/nsw: https://alive2.llvm.org/ce/z/EaiNjB
smul + nuw/nsw: https://alive2.llvm.org/ce/z/87MNeS

Reviewed By: goldstein.w.n
Differential Revision: https://reviews.llvm.org/D147296
2023-04-07 11:56:05 +07:00
Serguei Katkov
4665f3c838 Revert "[InstCombine] Add support for max(a,b) + min(a,b) => a + b."
Revert commit due to failure on buildbot:
error: 'match_combine_or' may not intend to support class template argument deduction

This reverts commit b86a06ef284f2637bef89bf5bb20157a8b195568.
2023-04-07 11:14:28 +07:00
Serguei Katkov
b86a06ef28 [InstCombine] Add support for max(a,b) + min(a,b) => a + b.
The same optimization for
  max(a,b) * min(a,b) => a * b
is added.

Correctness check:
uadd: https://alive2.llvm.org/ce/z/2rXDek
sadd: https://alive2.llvm.org/ce/z/zNu_er
uadd + nuw/nsw: https://alive2.llvm.org/ce/z/EaiNjB
sadd + nuw/nsw: https://alive2.llvm.org/ce/z/w_2Nrs

umul: https://alive2.llvm.org/ce/z/dgXRLr
smul: https://alive2.llvm.org/ce/z/hBjGzz
umul + nuw/nsw: https://alive2.llvm.org/ce/z/EaiNjB
smul + nuw/nsw: https://alive2.llvm.org/ce/z/87MNeS

Reviewed By: goldstein.w.n
Differential Revision: https://reviews.llvm.org/D147296
2023-04-07 10:24:07 +07:00
Alexey Bataev
52dd72a37a [SLP][NFC]Make adjustExtracts/needToDelay members of ShuffleInstructionBuilder.
Make adjustExtracts/needToDelay lambdas members of ShuffleInstructionBuilder to allow to overload them later for cost model.

Differential Revision: https://reviews.llvm.org/D147730
2023-04-06 16:27:19 -07:00
Michael Maitland
e86ed9bf2a [LV][NFC] Improve complexity of fixing users of recurrences
The original loop has O(MxN) since `is_contained` iterates over
all incoming values. This change makes it so only the phis
which use the value as an incoming value are iterated over so
it is now O(M).

Differential Revision: https://reviews.llvm.org/D146999
2023-04-06 16:15:51 -07:00
Florian Hahn
3f36b9b456
[LV] Move conditional MaskForGaps construction to load case.
Conditionally setting MaskForGaps is only needed for loads. This avoid
re-computing MaskForGaps for stores.

Suggested as independent cleanup in D147467.
2023-04-06 21:16:37 +01:00
Alexey Bataev
e58a49300e [SLP][NFC]Evaluate FMF for reductions before the loop, no need to
reevaluate it.
2023-04-06 11:57:20 -07:00
Alexey Bataev
50af6ab0ab [SLP]Fix emission of the masks in shuffles for undefs.
If the value is used in the expression, need to adjust the mask before
applying the mask. Plus, need to fix the analysis of the phi nodes for
reused scalars.
2023-04-06 10:16:58 -07:00
Alexey Bataev
cf62adbbd8 [SLP]Fix delete of the extractelement with users.
Made the condition for the erasing of the gathered extractelements
stricter, remove it only if it has single vectorized use, otherwise
leave it for instcombiner/instsimplify analysis.
2023-04-06 09:15:30 -07:00
Philip Reames
92aae9e725 [LV] Remove a cover function with a single use [nfc]
And more importantly, move the fixme to the sole caller where it actually makes sense in context.
2023-04-06 08:27:57 -07:00
Dávid Bolvanský
d5fe5604a6 Revert "xxx"
This reverts commit f60592438a7446595cfbfa3944681c689952d859.
2023-04-06 16:54:00 +02:00
Dávid Bolvanský
f60592438a xxx 2023-04-06 16:51:31 +02:00