31103 Commits

Author SHA1 Message Date
Johannes Doerfert
c72d93a08a [Attributor][NFC] Remove unnecessary overwritten methods 2022-07-21 21:57:02 -05:00
Chenbing Zheng
1a0187c9e7 [InstCombine] remove useless ‘InstCombiner::’. nfc
Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D130220
2022-07-22 09:24:24 +08:00
Philip Reames
bd75350180 [LV] Fix a conceptual mistake around meaning of uniform in isPredicatedInst
This code confuses LV's "Uniform" and LVL/LAI's "Uniform".  Despite the
common name, these are different.
* LVs notion means that only the first lane *of each unrolled part* is
  required.  That is, lanes within a single unroll factor are considered
  uniform.  This allows e.g. widenable memory ops to be considered
  uses of uniform computations.
* LVL and LAI's notion refers to all lanes across all unrollings.

IsUniformMem is in turn defined in terms of LAI's notion.  Thus a
UniformMemOpmeans is a memory operation with a loop invariant address.
This means the same address is accessed in every iteration.

The tweaked piece of code was trying to match a uniform mem op (i.e.
fully loop invariant address), but instead checked for LV's notion of
uniformity.  In theory, this meant with UF > 1, we could speculate
a load which wasn't safe to execute.

This ends up being mostly silent in current code as it is nearly
impossible to create the case where this difference is visible.  The
closest I've come in the test case from 54cb87, but even then, the
incorrect result is only visible in the vplan debug output; before this
change we sink the unsafely speculated load back into the user's predicate
blocks before emitting IR.  Both before and after IR are correct so the
differences aren't "interesting".

The other test changes are uninteresting.  They're cases where LV's uniform
analysis is slightly weaker than SCEV isLoopInvariant.
2022-07-21 15:44:34 -07:00
Alexander Shaposhnikov
e9afdf838e [GlobalOpt] Enable evaluation of atomic loads
Relax the check to allow evaluation of atomic loads
(but still skip volatile loads).

Test plan:
1/ ninja check-llvm check-clang
2/ Bootstrapped LLVM/Clang pass tests

Differential revision: https://reviews.llvm.org/D130211
2022-07-21 21:36:11 +00:00
Augie Fackler
bd6aa67e02 BuildLibCalls: move inference of freeing memory later
This probably should have been part of D123089, but the effects of it
don't show up until we start removing functions from the table in
D130107. Oops.

Differential Revision: https://reviews.llvm.org/D130184
2022-07-21 15:31:16 -04:00
Sanjay Patel
78c09f0f24 [PatternMatch][InstCombine] match a vector with constant expression element(s) as a constant expression
The InstCombine test is reduced from issue #56601. Without the more
liberal match for ConstantExpr, we try to rearrange constants in
Negator forever.

Alternatively, we could adjust the definition of m_ImmConstant to be
more conservative, but that's probably a larger patch, and I don't
see any downside to changing m_ConstantExpr. We never capture and
modify a ConstantExpr; transforms just want to avoid it.

Differential Revision: https://reviews.llvm.org/D130286
2022-07-21 15:23:57 -04:00
David Sherwood
f15b6b2907 [AArch64] Add target hook for preferPredicateOverEpilogue
This patch adds the AArch64 hook for preferPredicateOverEpilogue,
which currently returns true if SVE is enabled and one of the
following conditions (non-exhaustive) is met:

1. The "sve-tail-folding" option is set to "all", or
2. The "sve-tail-folding" option is set to "all+noreductions"
and the loop does not contain reductions,
3. The "sve-tail-folding" option is set to "all+norecurrences"
and the loop has no first-order recurrences.

Currently the default option is "disabled", but this will be
changed in a later patch.

I've added new tests to show the options behave as expected here:

  Transforms/LoopVectorize/AArch64/sve-tail-folding-option.ll

Differential Revision: https://reviews.llvm.org/D129560
2022-07-21 17:20:06 +01:00
Nikita Popov
1f69503107 [MemoryBuiltins] Add getReallocatedOperand() function (NFC)
Replace the value-accepting isReallocLikeFn() overload with a
getReallocatedOperand() function, which returns which operand is
the one being reallocated. Currently, this is always the first one,
but once allockind(realloc) is respected, the reallocated operand
will be determined by the allocptr parameter attribute.
2022-07-21 14:54:16 +02:00
Nikita Popov
46e6dd84b7 [MemoryBuiltins] Remove isFreeCall() function (NFC)
Remove isFreeCall() in favor of getFreedOperand(). Replace the
two remaining uses with a getFreedOperand() != nullptr check, as
they only care that something is getting freed. (The usage in DSE
is correct as such. The allocator-related checks in CFLGraph look
rather questionable in general.)
2022-07-21 14:44:23 +02:00
Nikita Popov
5e856a8578 [InstCombine] Use getFreedOperand() (NFC)
Use getFreedOperand() instead of isFreeCall() to remove the
implicit assumption that any pointer operand to a free function
is the operand being freed. This won't actually matter until we
handle allockind(free).
2022-07-21 14:33:55 +02:00
Nikita Popov
3ac8587a2b [Attributor] Use getFreedOperand() (NFC)
Track which operand is actually freed, to avoid the implicit
assumption that it is the first call argument.
2022-07-21 14:26:47 +02:00
Nikita Popov
c81dff3c30 [MemoryBuiltins] Add getFreedOperand() function (NFCI)
We currently assume in a number of places that free-like functions
free their first argument. This is true for all hardcoded free-like
functions, but with the new attribute-based design, the freed
argument is supposed to be indicated by the allocptr attribute.

To make sure we handle this correctly once allockind(free) is
respected, add a getFreedOperand() helper which returns the freed
argument, rather than just indicating whether the call frees *some*
argument.

This migrates most but not all users of isFreeCall() to the new
API. The remaining users are a bit more tricky.
2022-07-21 12:39:35 +02:00
Nikita Popov
8d58c8e57b Reapply [InstCombine] Don't check for alloc fn before fetching alloc size
Reapply the patch with getObjectSize() replaced by getAllocSize().
The former will also look through calls that return their argument,
and we'll end up placing dereferenceable attributes on intrinsics
like llvm.launder.invariant.group. While this isn't wrong, it also
doesn't seem to be particularly useful. For now, use getAllocSize()
instead, which sticks closer to the original behavior of this code.

-----

This code is just interested in the allocsize, not any other
allocator properties.
2022-07-21 11:48:24 +02:00
Nikita Popov
70056d04e2 Revert "[InstCombine] Don't check for alloc fn before fetching object size"
This reverts commit c72c22c04df992c95c5912d0075e5263c88f9fec.

This affected an Analysis test that I missed. Reverting for now.
2022-07-21 10:59:12 +02:00
Nikita Popov
c72c22c04d [InstCombine] Don't check for alloc fn before fetching object size
This code is just interested in the allocsize, not any other
allocator properties.
2022-07-21 10:45:03 +02:00
Nikita Popov
f45ab43332 [MemoryBuiltins] Avoid isAllocationFn() call before checking removable alloc
Alloc directly checking whether a given call is a removable
allocation, instead of first checking whether it is an allocation
first.
2022-07-21 09:39:19 +02:00
Chenbing Zheng
8c124c9088 [InstCombine] (ShiftValC >> Y) >s -1/<s 0 --> Y != 0/==0
We can do folds (ShiftValC >> Y) >s -1 --> Y != 0 and
(ShiftValC >> Y) <s 0 --> Y == 0, with ShiftValC < 0.

Alive2: https://alive2.llvm.org/ce/z/-PRHfD

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D129726
2022-07-21 10:12:29 +08:00
Chenbing Zheng
8075f680c8 [InstCombine] add fold (X > C - 1) ^ (X < C + 1) --> X != C
Considering the correctness of this pattern, we should avoid that C - 1
is non-negative and C + 1 is negative.

Alive2: https://alive2.llvm.org/ce/z/c_rBaq

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D129622
2022-07-21 10:08:21 +08:00
Johannes Doerfert
ad98ef8be4 [Attributor] Deal with complex PHI nodes better during AAPointerInfo
We were quite conservative when it came to PHI node handling to avoid
recursive reasoning. Now we check more direct if we have seen a PHI
already or not. This allows non-recursive PHI chains to be handled.

This also exposed a bug as we did only model the effect of one loop
traversal. `phi_no_store_3` has been adapted to show how we would have
used `undef` instead of `1` before. With this patch we don't replace
it at all, which is expected as we do not argue about loop iterations
(or alignments).
2022-07-20 17:34:50 -05:00
Johannes Doerfert
142897dd7d [Attributor] Only non-exact accesses require a uniform bit-pattern (=0)
If we only have exact accesses we should never require the bit-pattern
to be uniform (in this case 0). Only a non-exact access should force us
to require only 0 values.
2022-07-20 17:34:50 -05:00
Alexander Shaposhnikov
67f1fe8597 [GlobalOpt] Enable evaluation of atomic stores
Relax the check to allow evaluation of atomic stores
(but still skip volatile stores).

Test plan:
1/ ninja check-llvm check-clang
2/ Bootstrapped LLVM/Clang pass tests

Differential revision: https://reviews.llvm.org/D129841
2022-07-20 22:33:58 +00:00
Schrodinger ZHU Yifan
304027206c [ThinLTO] Support aliased GlobalIFunc
Fixes https://github.com/llvm/llvm-project/issues/56290: when an ifunc is
aliased in LTO, clang will attempt to create an alias summary; however, as ifunc
is not included in the module summary, doing so will lead to crash.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D129009
2022-07-20 15:30:38 -07:00
Craig Topper
d76c8f5127 [InstCombine] Add mul with negated power of 2 constant to canEvaluateShifted.
If we are right shifting a multiply by a negated power of 2 where
the power of 2 is the same as the shift amount, we can replace with
a negate followed by an And.

New tests have not been committed yet but the patch shows the diffs.
Let me know if you want any changes or additional tests.

Differential Revision: https://reviews.llvm.org/D130103
2022-07-20 11:00:22 -07:00
Ruobing Han
2b98b8e8fb fix bug for useless malloc elimination in CodeGenPrepare
Put AllocationFn check before I->willReturn can allow CodeGenPrepare to remove useless malloc instruction

Differential Revision: https://reviews.llvm.org/D130126
2022-07-20 16:29:51 +00:00
Philip Reames
523a526a02 [LV] Fix miscompile due to srem/sdiv speculation safety condition
An srem or sdiv has two cases which can cause undefined behavior, not just one. The existing code did not account for this, and as a result, we miscompiled when we encountered e.g. a srem i64 %v, -1 in a conditional block.

Instead of hand rolling the logic, just use the utility function which exists exactly for this purpose.

Differential Revision: https://reviews.llvm.org/D130106
2022-07-20 05:35:23 -07:00
Nicolai Hähnle
1ddc51d89d Inliner: don't mark call sites as 'nounwind' if that would be redundant
When F calls G calls H, G is nounwind, and G is inlined into F, then the
inlined call-site to H should be effectively nounwind so as not to lose
information during inlining.

If H itself is nounwind (which often happens when H is an intrinsic), we
no longer mark the callsite explicitly as nounwind. Previously, there
were cases where the inlined call-site of H differs from a pre-existing
call-site of H in F *only* in the explicitly added nounwind attribute,
thus preventing common subexpression elimination.

v2:
- just check CI->doesNotThrow

v3 (resubmit after revert at 344378808778c61d5599f4e0ac783ef7e6f8ed05):
- update Clang tests

Differential Revision: https://reviews.llvm.org/D129860
2022-07-20 14:17:23 +02:00
Florian Hahn
5124b21648
[VPlan] Initial def-use verification.
This patch introduces some initial def-use verification. This catches
cases like the one fixed by D129436.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D129717
2022-07-20 11:06:32 +01:00
Fangrui Song
e931c2e870 [LegacyPM] Remove InstrOrderFileLegacyPass
Following recent changes removing non-core features of the legacy
PM/optimization pipeline.
2022-07-19 23:58:51 -07:00
Kazu Hirata
0387da6f4f Use value instead of getValue (NFC) 2022-07-19 21:18:26 -07:00
Kazu Hirata
41ae78ea3a Use has_value instead of hasValue (NFC) 2022-07-19 20:15:44 -07:00
Johannes Doerfert
f84712f0b8 [Attributor] Teach checkForAllUses to follow returns into callers
If we can determine all call sites we can follow a use in a return
instruction into the caller. AAPointerInfo utilizes this feature.
2022-07-19 18:17:40 -05:00
Johannes Doerfert
4f2ccdd0b1 [Attributor][NFC] Improve debug messages 2022-07-19 18:17:40 -05:00
Nick Desaulniers
1cf6b93df1 Revert "[Local] Allow creating callbr with duplicate successors"
This reverts commit 08860f525a2363ccd697ebb3ff59769e37b1be21.

Crashes during PPC64LE linux kernel builds as reported by @nathanchance.
https://reviews.llvm.org/D129997#3663632
2022-07-19 15:03:27 -07:00
Johannes Doerfert
bf789b1957 [Attributor] Replace AAValueSimplify with AAPotentialValues
For the longest time we used `AAValueSimplify` and
`genericValueTraversal` to determine "potential values". This was
problematic for many reasons:
- We recomputed the result a lot as there was no caching for the 9
  locations calling `genericValueTraversal`.
- We added the idea of "intra" vs. "inter" procedural simplification
  only as an afterthought. `genericValueTraversal` did offer an option
  but `AAValueSimplify` did not. Thus, we might end up with "too much"
  simplification in certain situations and then gave up on it.
- Because `genericValueTraversal` was not a real `AA` we ended up with
  problems like the infinite recursion bug (#54981) as well as code
  duplication.

This patch introduces `AAPotentialValues` and replaces the
`AAValueSimplify` uses with it. `genericValueTraversal` is folded into
`AAPotentialValues` as are the instruction simplifications performed in
`AAValueSimplify` before. We further distinguish "intra" and "inter"
procedural simplification now.

`AAValueSimplify` was not deleted as we haven't ported the
re-materialization of instructions yet. There are other differences over
the former handling, e.g., we may not fold trivially foldable
instructions right now, e.g., `add i32 1, 1` is not folded to `i32 2`
but if an operand would be simplified to `i32 1` we would fold it still.

We are also even more aware of function/SCC boundaries in CGSCC passes,
which is good even if some tests look like they regress.

Fixes: https://github.com/llvm/llvm-project/issues/54981

Note: A previous version was flawed and consequently reverted in
      6555558a80589d1c5a1154b92cc3af9495f8f86c.
2022-07-19 16:24:42 -05:00
Arthur Eubanks
13aa2c1c3b [DSE] Revisit pointers that may no longer escape after removing another store
In dependent-capture, previously we'd see that %tmp4 is captured due to
the first store. We'd cache this info in CapturedBeforeReturn and
InvisibleToCallerAfterRet. Then the first store is then removed, causing
the cached values to be wrong.

We also need to revisit everything because normally we work backwards
when removing stores at the end of the function, but in this case
removing an earlier store causes a later store to be removable.

No compile time impact:
https://llvm-compile-time-tracker.com/compare.php?from=56796ae1a8db4c85dada28676f8303a5a3609c63&to=21b7e5248ffc423cd36c9d4a020085e363451465&stat=instructions

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D123686
2022-07-19 09:30:34 -07:00
Sanjay Patel
3d6c10dcf3 [SimplifyLibCalls] avoid converting pow() to powi() with no FMF
powi() is not a standard math library function; it is specified
with non-strict semantics in the LangRef. We currently require
'afn' to do this transform when it needs a sqrt(), so I just
extended that requirement to the whole-number exponent too.

This bug was introduced with:
b17754bcaa14
...where we deferred expansion of pow() to later passes.
2022-07-19 12:26:53 -04:00
Arnold Schwaighofer
bc4870f09e [coro async] Add missing llvm.coro.id.async intrinsic to declaresCoroCleanupIntrinsics
rdar://97214593

Differential Revision: https://reviews.llvm.org/D130038
2022-07-19 07:25:04 -07:00
Andrew Turner
b850762b62 Add the FreeBSD AArch64 memory layout
Use the FreeBSD AArch64 memory layout values when building for it.
These are based on the x86_64 values, scaled to take into account the
larger address space on AArch64.

Reviewed by: vitalybuka

Differential Revision: https://reviews.llvm.org/D125883
2022-07-19 09:58:07 -04:00
Andrew Turner
e13bd2644e Add the FreeBSD AArch64 shadow offset to llvm
AArch64 has a larger address space than 64 but x86. Use the larger
shadow offset on FreeBSD AArch64.

Reviewed by: vitalybuka

Differential Revision: https://reviews.llvm.org/D125873
2022-07-19 09:58:07 -04:00
William Schmidt
bccc9aa81c Don't vectorize PHIs in catchswitch blocks
We currently assert in vectorizeTree(TreeEntry*) when processing a PHI
bundle in a block containing a catchswitch.  We attempt to set the
IRBuilder insertion point following the catchswitch, which is invalid.
This is done so that ShuffleBuilder.finalize() knows where to insert
a shuffle if one is needed.

To avoid this occurring, watch out for catchswitch blocks during
buildTree_rec() processing, and avoid adding PHIs in such blocks to
the vectorizable tree.  It is unlikely that constraining vectorization
over an exception path will cause a noticeable performance loss, so
this seems preferable to trying to anticipate when a shuffle will and
will not be required.
2022-07-19 06:10:17 -07:00
Nikita Popov
08860f525a [Local] Allow creating callbr with duplicate successors
Since D129288, callbr is allowed to have duplicate successors. This
patch removes a limitation which prevents optimizations from actually
producing such callbrs.

Differential Revision: https://reviews.llvm.org/D129997
2022-07-19 14:28:22 +02:00
Florian Hahn
a75760a269
[LV] Remove unnecessary cast in widenCallInstruction. (NFC) 2022-07-19 11:23:24 +01:00
Max Kazantsev
82309831c3 [LoopSimplifyCFG] Prevent use-def dominance breach by handling dead exits. PR56243
One of the transforms in LoopSimplifyCFG demands that the LCSSA form is
truly maintained for all values, tokens included, otherwise it may end up creating
a use that is not dominated by def (and Phi creation for tokens is impossible).
Detect this situation and prevent transform for it early.

Differential Revision: https://reviews.llvm.org/D129984
Reviewed By: efriedma
2022-07-19 15:54:12 +07:00
Ellis Hoag
3580daacf3 [InstrProf] Allow CSIRPGO function entry coverage
The flag `-fcs-profile-generate` for enabling CSIRPGO moves the pass
`pgo-instrumentation` after inlining. Function entry coverage works fine
with this change, so remove the assert. I had originally left this
assert in because I had not tested this at the time.

Reviewed By: davidxl, MaskRay

Differential Revision: https://reviews.llvm.org/D129407
2022-07-18 15:10:11 -07:00
Florian Hahn
30e53b8c03
[LV] Sink module variable and use State to set it in widenCall. (NFC)
Limits the lifetime of the variable and makes it independent of
CallInst.
2022-07-18 19:41:48 +01:00
Arnold Schwaighofer
28ebd13d63 [coro async] Fix code to run coro.async.end cleanup like the legacy pass did
The code executed for the Switch ABI does not change.

rdar://97074714

Differential Revision: https://reviews.llvm.org/D129865
2022-07-18 10:41:29 -07:00
Nicolai Hähnle
3443788087 Revert "Inliner: don't mark call sites as 'nounwind' if that would be redundant"
This reverts commit 9905c379819fafdc2246bcd24dd7165bd72d7659.

Looks like there are Clang changes that are affected in trivial ways. Will look into it.
2022-07-18 17:43:35 +02:00
Nicolai Hähnle
9905c37981 Inliner: don't mark call sites as 'nounwind' if that would be redundant
When F calls G calls H, G is nounwind, and G is inlined into F, then the
inlined call-site to H should be effectively nounwind so as not to lose
information during inlining.

If H itself is nounwind (which often happens when H is an intrinsic), we
no longer mark the callsite explicitly as nounwind. Previously, there
were cases where the inlined call-site of H differs from a pre-existing
call-site of H in F *only* in the explicitly added nounwind attribute,
thus preventing common subexpression elimination.

v2:
- just check CI->doesNotThrow

Differential Revision: https://reviews.llvm.org/D129860
2022-07-18 17:28:52 +02:00
Sanjay Patel
26fbb79c33 [InstCombine] reduce code for signbit folds; NFC 2022-07-18 11:04:58 -04:00
Nikita Popov
21e2f133a8 [LoopSimplifyCFG] Revert accidental change
This change was included in an unrelated change
b57d61384c9938e3dfa54b55bf8b2a0a05e67e28
and was of course not intended for commit...
2022-07-18 15:30:13 +02:00