31753 Commits

Author SHA1 Message Date
Sanjay Patel
8d76fbb5f0 [VectorCombine] fix crashing on match of non-canonical fneg
We can't assume that operand 0 is the negated operand because
the matcher handles "fsub -0.0, X" (and also +0.0 with FMF).

By capturing the extract within the match, we avoid the bug
and make the transform more robust (can't assume that this
pass will only see canonical IR).
2022-10-17 10:47:48 -04:00
Nikita Popov
779fd39684 Reapply [InstCombine] Switch foldOpIntoPhi() to use InstSimplify
Relative to the previous attempt, this is rebased over the
InstSimplify fix in ac74e7a7806480a000c9a3502405c3dedd8810de,
which addresses the miscompile reported in PR58401.

-----

foldOpIntoPhi() currently only folds operations into the phi if all
but one operands constant-fold. The two exceptions to this are freeze
and select, where we allow more general simplification.

This patch makes foldOpIntoPhi() generally simplification based and
removes all the instruction-specific logic. We just try to simplify
the instruction for each operand, and for the (potentially) one
non-simplified operand, we move it into the new block with adjusted
operands.

This fixes https://github.com/llvm/llvm-project/issues/57448, which
was my original motivation for the change.

Differential Revision: https://reviews.llvm.org/D134954
2022-10-17 16:11:05 +02:00
Florian Hahn
699396131f
Revert "Reapply [InstCombine] Switch foldOpIntoPhi() to use InstSimplify"
This reverts commit 333246b48ea4a70842e78c977cc92d365720465f.

It looks like this patch causes a mis-compile:
https://github.com/llvm/llvm-project/issues/58401

Fixes #58401.
2022-10-17 12:56:28 +01:00
Sjoerd Meijer
233659c7ae [LoopFlatten] Enable it by default
LoopFlatten has been in the code base off by default for years, but this
enables it to run by default. Downstream this has been running for
years, so it has been exposed to quite some code. Then around the time
we switched to the NPM, several fixes went in related to updating the
MemorySSA state and we moved it to a loop pass manager, which both
helped preventing rerunning certain analysis passes, and thus helped a
bit with compile-times.

About compile-times, adding a pass isn't free, but this should see only
very minor increases. The pass is relatively simple and there shouldn't
be anything algorithmically expensive because all it does is looking at
inner/outer loops and it checks assumptions on loop increments and
indices. If we see increases, I expect this to mainly come from
invalidation of analysis info, and perhaps subsequent passes to trigger
and do more. Despite its simplicity/restrictions, it triggers in most
code-bases, which makes it worth to enable this by default.

Differential Revision: https://reviews.llvm.org/D109958
2022-10-17 17:11:39 +05:30
Chuanqi Xu
1cedc51ff5 [Coroutines] Don't merge readnone calls in presplit coroutines
Another alternative to fix the thread identification problem in
coroutines.

We plan to fix this problem by unifying memory effecting attributes. See
https://discourse.llvm.org/t/rfc-unify-memory-effect-attributes/65579.
But it may be a long-term project. And it is a pity that the coroutines
can't resume in different threads for years. So this one is temporary
fix. It may cause unnecessary performance regression for coroutines. But
correctness are more important. And this one is planned to be reverted
after we are able to unify the memory effecting attributes actually.

Reviewed By: jdoerfert, rjmccall

Differential Revision: https://reviews.llvm.org/D135550
2022-10-17 10:22:43 +08:00
Kazu Hirata
5ea3155565 [llvm] Use llvm::find (NFC) 2022-10-16 16:21:00 -07:00
Florian Hahn
462ab9810d
[ConstraintElim] Fix signed integer overflow for inbounds GEP.
For inbounds GEPs, signed overflow yields poison, so it is fine for the
coefficients to wrap as well. This fixes an UBSan failure.
2022-10-16 23:25:28 +01:00
Florian Hahn
aec0c1009f
[ConstraintElim] Replace custom GEP index handling by using existing code
Instead of duplicating the existing decomposition code for GEP indices
just use the existing code by calling the existing decompose function on
the index expression and multiply the result's coefficients by the scale of
the index.

This both reduces code duplication and generalizes the pattern we can
handle.
2022-10-16 21:53:11 +01:00
Florian Hahn
a4635ec710
[ConstraintElim] Support add nsw for unsigned preds with positive ops.
If both operands of an `add nsw` are known positive, it can be treated
the same as `add nuw` and added to the unsigned system.

https://alive2.llvm.org/ce/z/6gprff
2022-10-16 20:25:14 +01:00
Sanjay Patel
e5ee0b06d6 [InstCombine] try to determine "exact" for sdiv
If the divisor is a power-of-2 or negative-power-of-2 and the dividend
is known to have >= trailing zeros than the divisor, the division is exact:
https://alive2.llvm.org/ce/z/UGBksM (general proof)
https://alive2.llvm.org/ce/z/D4yPS- (examples based on regression tests)

This isn't the most direct optimization (we could create ashr in these
examples instead of relying on existing folds for exact divides), but
it's possible that there's a more general constraint than just a pow2
divisor, so this might be extended in the future.

This should solve issue #58348.

Differential Revision: https://reviews.llvm.org/D135970
2022-10-16 10:59:56 -04:00
Sanjay Patel
340ae45be0 [InstCombine] use isKnownNonNegative() for readability; NFCI
This should be functionally equivalent - both calls are thin
wrappers around computeKnownBits(). We'll probably want to use
known-bits directly in follow-up patches because that could
determine "exact" for example (see issue #58348).
2022-10-16 10:59:56 -04:00
Kazu Hirata
b2f41e9ac1 [Vectorize] Use std::conditional_t (NFC) 2022-10-15 14:52:25 -07:00
Florian Hahn
7c1b80e35c
[ConstraintElim] Support unsigned decomposition of mul/shl nuw..const
Support decomposition for `mul/shl nuw` with constant operand for unsigned
queries. Those expressions should not wrap in the unsigned sense and can
be added directly to the unsigned system.
2022-10-15 21:28:08 +01:00
Florian Hahn
f12684d36e
[ConstraintElim] Support signed decomposition of add nsw.
Add support decomposition for `add nsw` for signed queries.
`add nsw` won't wrap and can be directly added to the signed
system.
2022-10-15 18:34:03 +01:00
Zequan Wu
82035ec777 Revert "[PGO] Make emitted symbols hidden"
This reverts commit ecac223b0e4b05a65cf918f90824380db6b9ce64.

The commit causes instrprof-darwin-dead-strip.c to fail on mac.
2022-10-14 15:23:26 -07:00
Florian Hahn
16cf666bb7
[Loop] Move block and loop dispo invalidation to makeLoopInvariant.
makeLoopInvariant may recursively move its operands to make them
invariant, before moving the passed in instruction. Those recursively
moved instructions are currently missed when invalidating block and loop
dispositions.

To address this, move the invalidation code to Loop::makeLoopInvariant.

Fixes #58314.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D135909
2022-10-14 21:58:14 +01:00
Zain Jaffal
0c8dde551c [ConstraintElimination] Move logic for replacing ssub overflow users (NFC)
Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D134044
2022-10-14 21:14:21 +01:00
Argyrios Kyrtzidis
d877e3fe71 [Transforms/ObjCARC] Fix non-deterministic output of ObjCARCOptPass
`ProvenanceAnalysis::related()` was assuming that the order of parameters for `relatedCheck()` was not affecting
the result but this was not the case when both parameters were `PHINode`s.
Due to this assumption `ProvenanceAnalysis::related()` was ordering the parameters based on pointer value which resulted in
non-deterministic behavior.

To address this change `relatedPHI()` so that it gives the same result independent of the parameter order.

rdar://100325456

Differential Revision: https://reviews.llvm.org/D135376
2022-10-14 12:26:58 -07:00
Craig Topper
d3366efd43 [LV] Simplify register usage code and avoid double map lookups. NFC
Instead of checking whether a map entry exists to decide if we should
initialize it or add to it, we can rely on the map entry being constructed
and initialized to 0 before the addition happens.

For the std::max case, I've made a reference to the map entry to
avoid looking it up twice.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D135977
2022-10-14 11:55:48 -07:00
Florian Hahn
5a68e578ca
[ConstraintElim] Add debug message when decomposition fails. 2022-10-14 11:02:05 +01:00
Wolfgang Pieb
b43a1d1bd9 [PGO] Do not create block count annotations when all weights are 0,
avoiding an assertion.

A BB with a nonzero count, whose successor blocks all have 0 counts, could
cause an assertion. Don't create any branch weights in this case.

Reviewed By: xur

Differential Revision: https://reviews.llvm.org/D134203
2022-10-13 14:57:42 -07:00
Sanjay Patel
d85505a932 [InstCombine] fold logical and/or to xor
(A | B) & ~(A & B) --> A ^ B

https://alive2.llvm.org/ce/z/qpFMns

We already have the equivalent fold for real
logic instructions, but this pattern may occur
with selects too.

This is part of solving issue #58313.
2022-10-13 16:12:20 -04:00
Florian Hahn
572d5d374c
[ConstraintElim] Add support for GEPs with multiple indices.
Lift restriction on GEPs with a single index by iterating over all
indices and joining the {Coefficient, Variable} entries for all indices
together.
2022-10-13 21:08:33 +01:00
Alex Brachet
ecac223b0e [PGO] Make emitted symbols hidden
This was reverted because it was breaking when targeting Darwin which
tried to export these symbols which are now hidden. It should be safe
to just stop attempting to export these symbols in the clang driver,
though Apple folks will need to change their TAPI allow list described
in the commit where these symbols were originally exported
f538018562

Bug: https://github.com/llvm/llvm-project/issues/58265

Differential Revision: https://reviews.llvm.org/D135340
2022-10-13 19:47:15 +00:00
Florian Hahn
71c49d189a
[ConstraintElim] Move check-and-replace logic to helper function (NFC).
Move logic to check and replace conditions to a helper function. This
isolates the code, allows using early returns, reduces the
indentation and simplifies eliminateConstraints.
2022-10-13 18:58:37 +01:00
Nikita Popov
b54b84fde6 [MemCpyOpt] Add additional debug output (NFC) 2022-10-13 17:03:44 +02:00
Alexey Bataev
c787986cdd [SLP]Improve costs of vectorized loads/stores by analyzing GEPs.
When generating masked gathers nodes, SLP vectorizer accounts the cost
of the GEPs for loads as part of the scalar-vector transformation cost
estimation. But it does not do it for vectorized loads/stores, while it
may completely remove some of the GEPs completely. Because of this in
some cases masked gather operation can be much more profitable rather
than regular vectorization (masked-gather cost + vector GEP - scalar
loads + GEPs comparing to vectorized loads - scalar loads).
Added the analysis of the removed scalarGEPs for vectorized load/store nodes for better cost estimation.

Differential Revision: https://reviews.llvm.org/D135282
2022-10-13 07:20:41 -07:00
Philip Reames
fe755af3a9 Revert "Remove PlaceSafepoints pass"
This reverts commit cb66e123c6bc82a793300b6fb3ecbed79c58f557.  It was reported via https://reviews.llvm.org/rGcb66e123c6bc82a793300b6fb3ecbed79c58f557#1132969 that the Microsoft.NET compiler is still using this pass.
2022-10-13 07:17:25 -07:00
Florian Hahn
019049a1ca
[ConstraintElim] Use MulOverflow to avoid UB on signed overflow.
This fixes an UBSan failure after 359bc5c541ae. For inbounds GEP with
index sizes <= 64, having the coefficients overflowing is fine.
2022-10-13 13:57:43 +01:00
Nikita Popov
d44cd1bbeb Revert "[FunctionAttrs] Make location classification more precise"
This reverts commit b05f5b90a12098660a4fc16da0b4d421ddfe14e2.

There are thread sanitizer buildbot failures in simple_stack.c.
I think that's because this ended up affecting the handling of
volatile accesses to allocas. Reverting for now.
2022-10-13 12:11:04 +02:00
Nikita Popov
b05f5b90a1 [FunctionAttrs] Make location classification more precise
Don't add argmem if the pointer is clearly not an argument (e.g.
a global). I don't think this makes a difference right now, but
gives more obvious results with D135780.
2022-10-13 11:24:23 +02:00
Florian Hahn
359bc5c541
[ConstraintElim] Bail out for GEPs when index size > 64 bits.
Limit pointer decomposition to pointers with index sizes of at most 64
bits. int64_t is used for coefficients, so as long as the index size <=
64 bits we should be able to represent all pointer offsets.

Pointer decomposition is limited to inbounds GEPs, so if a index
computation would overflow the result is poison, so it doesn't matter
that the coefficient overflows.

This allows replacing MulOverflow with regular multiplications.
2022-10-13 10:19:30 +01:00
Nikita Popov
440ce05fbf [FunctionAttrs] Handle potential access of captured argument
We have to account for accesses to argument memory via captures.
I don't think there's any way to make this produce incorrect
results right now (because as soon as "other" is set, we lose the
ability to infer argmemonly), but this avoids incorrect results
once we have more precise representation.
2022-10-13 11:15:36 +02:00
Nikita Popov
5b3776842f [FunctionAttrs] Account for memory effects of inalloca/preallocated
The code for inferring memory attributes on arguments claims that
inalloca/preallocated arguments are always clobbered:
d71ad41080/llvm/lib/Transforms/IPO/FunctionAttrs.cpp (L640-L642)

However, we would still infer memory attributes for the whole
function without taking this into account, so we could still end
up inferring readnone for the function. This adds an argument
clobber if there are any inalloca/preallocated arguments.

Differential Revision: https://reviews.llvm.org/D135783
2022-10-13 10:20:17 +02:00
Florian Hahn
0ebd288338
[ConstraintElim] Move GEP decomposition code to separate fn (NFC).
Breaks up a large function and allows for the use to early exits.
2022-10-12 20:39:05 +01:00
Arthur Eubanks
f59e1bcc22 [PrintPipeline] Handle CoroConditionalWrapper and add more verification
Add a check (can be disabled via a flag) that the pipeline we generate is actually parsable.
Can be disabled because we don't expect to handle every pass in -print-pipeline-passes.

Fixes #58280.

Reviewed By: ChuanqiXu

Differential Revision: https://reviews.llvm.org/D135703
2022-10-12 09:36:45 -07:00
Sanjay Patel
7b9482df3d [InstCombine] fold sdiv with common shl amount in operands
(X << Z) / (Y << Z) --> X / Y

https://alive2.llvm.org/ce/z/CLKzqT

This requires a surprising "nuw" constraint because we have
to guard against immediate UB via signed-div overflow with
-1 divisor.

This extends 008a89037a49ca0d9 and is another transform
derived from issue #58137.
2022-10-12 11:32:15 -04:00
Alexey Bataev
d71ad41080 [SLP]Fix insertpoint of the extractellements instructions to avoid reshuffle crash.
Need to set the insertpoint for extractelement to point to the first
instruction in the node to avoid possible crash during external uses
combine  process. Without it we may endup with the incorrect
transformation.

Differential Revision: https://reviews.llvm.org/D135591
2022-10-12 08:18:30 -07:00
Sanjay Patel
008a89037a [InstCombine] fold udiv with common shl amount in operands
(X << Z) / (Y << Z) --> X / Y

https://alive2.llvm.org/ce/z/E5eaxU

This fixes the motivating example from issue #58137,
but it is not the most general transform. We should
probably also convert left-shift in the divisor to
right-shift in the dividend for that, but that exposes
another missed canonicalization for shifts and adds.
2022-10-12 11:12:26 -04:00
Jordan Rupprecht
cbae57c0e1 [NFC] Ignore unused var in no-asserts builds 2022-10-12 08:11:10 -07:00
Alexey Bataev
1be3428ea0 [SLP]Fix PR58177: Improve isUndefVector function to avoid extra freeze.
Freeze instruction in some cases makes codegen worse, so need to be very
careful when emitting it. Instead improve analysis in isUndefVector
function to generate mask of unused elements and use it in the analysis.

Differential Revision: https://reviews.llvm.org/D135382
2022-10-12 07:32:54 -07:00
Sanjay Patel
fe97f95036 [InstCombine] propagate "exact" through folds of div
These folds were added recently with:
6b869be8100d
8da2fa856f1b
...but they didn't account for the "exact" attribute,
and that can be safely propagated:
https://alive2.llvm.org/ce/z/F_WhnR
https://alive2.llvm.org/ce/z/ft9Cgr
2022-10-12 09:25:05 -04:00
Sanjay Patel
d117ee25b8 [InstCombine] add helper function for div+shl folds; NFC
There are at least 2 similar patterns that could be added here,
and the existing fold can be improved because it fails to
propagate "exact".
2022-10-12 09:25:04 -04:00
Florian Hahn
c1fe52bfa6
[VPlan] Remove dead recipes before sinking.
optimizeInductions may leave dead recipes which can prevent sinking.
Sinking on the other hand should not introduce new dead recipes, so
clean up dead recipes before sinking.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D133762
2022-10-12 12:49:42 +01:00
Max Kazantsev
fbad5fdc03 [NFC] Perform all legality checks for non-trivial unswitch in one function
They have been scattered over the code. For better structuring, perform
them in one place. Potential CT drop is possible because we collect exit
blocks twice, but it's small price to pay for much better code structure.
2022-10-12 18:35:12 +07:00
Max Kazantsev
6bfcac612f [SimpleLoopUnswitch][NFC] Separate legality checks from cost computation
These are semantically two different stages, but were entwined in the
old implementation. Now cost computation does not do legality checks,
and they all are done beforehead.
2022-10-12 13:31:36 +07:00
Max Kazantsev
421728b40c [NFC] Factor out computation of best unswitch cost candidate
Split out a major peice of this method to make code more readable.
2022-10-12 12:36:46 +07:00
Fangrui Song
8ef3fd8d59 [LTO] Make local linkage GlobalValue in non-prevailing COMDAT available_externally
For a local linkage GlobalObject in a non-prevailing COMDAT, it remains defined while its
leader has been made available_externally. This violates the COMDAT rule that
its members must be retained or discarded as a unit.

To fix this, update the regular LTO change D34803 to track local linkage
GlobalValues, and port the code to ThinLTO (GlobalAliases are not handled.)

This fixes two problems.

(a) `__cxx_global_var_init` in a non-prevailing COMDAT group used to
linger around (unreferenced, hence benign), and is now correctly discarded.
```
int foo();
inline int v = foo();
```

(b) Fix https://github.com/llvm/llvm-project/issues/58215:
as a size optimization, we place private `__profd_` in a COMDAT with a
`__profc_` key. When FuncImport.cpp makes `__profc_` available_externally due to
a non-prevailing COMDAT, `__profd_` incorrectly remains private. This change
makes the `__profd_` available_externally.

```
cat > c.h <<'eof'
extern void bar();
inline __attribute__((noinline)) void foo() {}
eof
cat > m1.cc <<'eof'
int main() {
  bar();
  foo();
}
eof
cat > m2.cc <<'eof'
__attribute__((noinline)) void bar() {
  foo();
}
eof

clang -O2 -fprofile-generate=./t m1.cc m2.cc -flto -fuse-ld=lld -o t_gen
rm -fr t && ./t_gen && llvm-profdata show -function=foo t/default_*.profraw

clang -O2 -fprofile-generate=./t m1.cc m2.cc -flto=thin -fuse-ld=lld -o t_gen
rm -fr t && ./t_gen && llvm-profdata show -function=foo t/default_*.profraw
```

Reviewed By: tejohnson

Differential Revision: https://reviews.llvm.org/D135427
2022-10-11 15:30:07 -07:00
Florian Hahn
be611ef7fa
[LoopRotation] Also drop block dispositions.
LoopRotation may also fold basic blocks, so cached block dispositions
also need to be dropped.

Fixes #58291.
2022-10-11 15:25:27 +01:00
Sanjay Patel
7ec604a317 [InstCombine] try harder to cancel out mul/div
((Op1 * X) / Y) / Op1 --> X / Y
https://alive2.llvm.org/ce/z/JYxWjA

InstSimplify handles the more basic mul+div pattern with
shared operand, but we don't seem to have any reassociation
folds to handle cases where the common op is further away.

This is a generalization of 9cff4711ac72 and another
transform derived from issue #58137.
2022-10-11 09:51:51 -04:00