31400 Commits

Author SHA1 Message Date
Momchil Velikov
078899cd64 [SimplifyCFG] Allow SimplifyCFG hoisting to skip over non-matching instructions
SimplifyCFG does some common code hoisting, which is limited
to hoisting a sequence of identical instructions in identical
order and stops at the first non-identical instruction.

This patch allows hoisting instruction pairs over
same-length sequences of non-matching instructions. The
linear asymptotic complexity of the algorithm stays the
same; there's an extra parameter
`simplifycfg-hoist-common-skip-limit` serving to limit
compilation time and/or the size of the hoisted live ranges.
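
As a rough illustration of the idea, here is a toy model rather than the patch's code. It assumes instructions can be treated as opaque strings, that the two blocks are walked in lockstep, and that the limit counts the total number of skipped positions. The name `hoistCommon` and its `SkipLimit` argument are hypothetical, and the safety checks that real hoisting needs before moving an instruction past skipped ones are left out.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Toy model only: an "instruction" is a string and two instructions match
// when the strings are equal. Real hoisting must also prove that moving an
// instruction above the skipped ones is safe; that check is omitted here.
std::vector<std::string> hoistCommon(const std::vector<std::string> &Then,
                                     const std::vector<std::string> &Else,
                                     std::size_t SkipLimit) {
  std::vector<std::string> Hoisted;
  std::size_t Skipped = 0;
  const std::size_t N = std::min(Then.size(), Else.size());
  for (std::size_t I = 0; I < N; ++I) {
    if (Then[I] == Else[I]) {
      Hoisted.push_back(Then[I]); // identical pair: hoist it
      continue;
    }
    // Non-matching pair: skip over it unless the budget is used up.
    if (++Skipped > SkipLimit)
      break;
  }
  return Hoisted;
}
```

With a limit of 0 the sketch stops at the first non-identical pair, which mirrors the old behaviour described above. A larger limit lets later matching pairs be collected in the same single pass.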

The patch improves SPECv6/525.x264_r by about 10%.

Reviewed By: nikic, dmgreen

Differential Revision: https://reviews.llvm.org/D129370
2022-09-05 15:13:46 +01:00
Tian Zhou
8fa432be4f [InstCombine] reduce test-for-overflow of shifted value
Fixes #57338.

The added code makes the following transformations:

For unsigned predicates / eq / ne:
icmp pred (x << 1), x --> icmp getSignedPredicate(pred) x, 0
icmp pred x, (x << 1) --> icmp getSignedPredicate(pred) 0, x

Some examples:
https://alive2.llvm.org/ce/z/ckn4cj
https://alive2.llvm.org/ce/z/h-4bAQ
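
The first form of the fold can also be checked by brute force on 8-bit values. This is only a sketch: the `Pred` enum and the helper names are made up for the illustration, and it does not reflect how InstCombine implements the transform.

```cpp
#include <cstdint>

// Predicates covered by the fold: unsigned comparisons plus eq/ne.
enum class Pred { EQ, NE, UGT, UGE, ULT, ULE };

// Before the fold: icmp pred (x << 1), x on an 8-bit value.
bool originalCompare(Pred P, uint8_t X) {
  const uint8_t Shifted = static_cast<uint8_t>(X << 1); // wraps modulo 2^8
  switch (P) {
  case Pred::EQ:  return Shifted == X;
  case Pred::NE:  return Shifted != X;
  case Pred::UGT: return Shifted > X;
  case Pred::UGE: return Shifted >= X;
  case Pred::ULT: return Shifted < X;
  case Pred::ULE: return Shifted <= X;
  }
  return false;
}

// After the fold: the signed version of the predicate, comparing x with 0.
bool foldedCompare(Pred P, uint8_t X) {
  const int S = X < 128 ? X : X - 256; // two's-complement value of the bits
  switch (P) {
  case Pred::EQ:  return S == 0;
  case Pred::NE:  return S != 0;
  case Pred::UGT: return S > 0;
  case Pred::UGE: return S >= 0;
  case Pred::ULT: return S < 0;
  case Pred::ULE: return S <= 0;
  }
  return false;
}

// Exhaustively compare both sides for every predicate and every i8 value.
bool foldHoldsForAllI8() {
  const Pred All[] = {Pred::EQ,  Pred::NE,  Pred::UGT,
                      Pred::UGE, Pred::ULT, Pred::ULE};
  for (Pred P : All)
    for (int V = 0; V < 256; ++V)
      if (originalCompare(P, static_cast<uint8_t>(V)) !=
          foldedCompare(P, static_cast<uint8_t>(V)))
        return false;
  return true;
}
```

The second form, with the operands swapped, follows by symmetry and is not checked separately here.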

Differential Revision: https://reviews.llvm.org/D132888
2022-09-05 09:51:51 -04:00
Florian Hahn
408ebe5e3a [VPlan] Move VPWidenCallRecipe to VPlanRecipes.cpp (NFC).
Depends on D132585.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D132586
2022-09-05 10:48:29 +01:00
Nikita Popov
388b684354 [LICM] Separate check for writability and thread-safety (NFCI)
This used a single check to make sure that the object is both
writable and thread-local. Separate them out to make the
deficiencies in the current code more obvious.
2022-09-05 09:43:17 +02:00
Florian Hahn
ba3d29f871 [LCSSA] Update unreachable uses with poison.
Users of LCSSA may not expect non-phi uses when checking the uses
outside a loop, which may cause crashes. This is due to the fact that we
do not update uses in unreachable blocks.

To ensure all reachable uses outside the loop are phis, update uses in
unreachable blocks to use poison in dead code.

Fixes #57508.
2022-09-04 22:26:18 +01:00
Kazu Hirata
7d8c2d17eb [llvm] Use range-based for loops (NFC)
Identified with modernize-loop-convert.
2022-09-03 23:27:25 -07:00
Fangrui Song
9fc679b87c [SanitizerCoverage] Simplify pc-table and improve test. NFC
2022-09-03 14:29:21 -07:00
Kazu Hirata
9eca5ed790 [llvm] Use std::enable_if_t (NFC)
2022-09-03 11:17:44 -07:00
Kazu Hirata
fedc59734a [llvm] Use range-based for loops (NFC)
2022-09-03 11:17:40 -07:00
Sanjay Patel
22e1f66f26 [SCCP] add helper function for replacing signed operations; NFC
Preliminary refactoring for planned enhancement in D133198.
2022-09-03 10:30:10 -04:00
Sanjay Patel
5c759edc57 [InstCombine] reduce another or-xor bitwise logic pattern
~(A & ?) | (A ^ B) --> ~((A & ?) & B)
https://alive2.llvm.org/ce/z/mxex6V

This is similar to 9d218b61cc50 where we peeked through
another logic op to find a common operand.
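
Because every operation in the pattern is bitwise, the identity can be confirmed by trying all combinations of small operands. The sketch below writes the `?` operand as `C`; the function name is hypothetical.

```cpp
#include <cstdint>

// Check ~(A & C) | (A ^ B) == ~((A & C) & B) for all 4-bit A, B, C.
// The operations act on each bit independently, so 4-bit inputs already
// exercise every combination of bits that can occur at one position.
bool orXorFoldHolds() {
  for (unsigned A = 0; A < 16; ++A)
    for (unsigned B = 0; B < 16; ++B)
      for (unsigned C = 0; C < 16; ++C) {
        const uint8_t Before = static_cast<uint8_t>(~(A & C) | (A ^ B));
        const uint8_t After = static_cast<uint8_t>(~((A & C) & B));
        if (Before != After)
          return false;
      }
  return true;
}
```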
2022-09-03 09:32:08 -04:00
Richard Smith
053841c562 Revert "[AggressiveInstCombine] Lower Table Based CTTZ"
This reverts commit fec01ee3f5244bb9a04bc4310fc892c56c5b6bab.

According to asan, this patch introduces a heap use after free.
2022-09-02 16:19:09 -07:00
Francis Visoiu Mistrih
c5b10f348e [Matrix] Use print instead of dump for matrix-print-after-transpose-opt
We should be able to use this option even if LLVM_ENABLE_DUMP is not on.

(should fix the bots too)
2022-09-02 16:12:21 -07:00
Francis Visoiu Mistrih
81bdb4068d [Matrix] Simplify matmuls with scalars
If one of the operands is a transposed splat, the transpose can be
removed.

This is useful to simplify when transposes are distributed to operands
of a matmul:

* k^T -> k
* (A * k)^t -> A^t * k

Differential Revision: https://reviews.llvm.org/D130177
2022-09-02 15:50:25 -07:00
Sameer Sahasrabuddhe
46b293cb3f [Attributor] Simplify offset calculation for a constant GEP
Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D132931
2022-09-02 23:53:51 +05:30
Arthur Eubanks
57fd866551 [LoopPassManager] Implement and use LoopNestAnalysis::run() instead of manually creating LoopNests
The current code is basically just emulating what the analysis manager does.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D132581
2022-09-02 10:55:53 -07:00
Djordje Todorovic
fec01ee3f5 [AggressiveInstCombine] Lower Table Based CTTZ
This patch introduces recognition of table-based ctz implementation
during the AggressiveInstCombine.

This fixes [0].

[0] https://bugs.llvm.org/show_bug.cgi?id=46434

Differential Revision: https://reviews.llvm.org/D113291
2022-09-02 17:26:55 +02:00
Jolanta Jensen
958abe864a [LoopLoadElim] Add stores with matching sizes as load-store candidates
We are not building up a proper list of load-store candidates because
we are throwing away stores where the type doesn't match the load.
This patch adds stores with matching store sizes as candidates.
Author of the original patch: David Sherwood.

Differential Revision: https://reviews.llvm.org/D130233
2022-09-02 13:11:25 +01:00
Muhammad Omair Javaid
18de7c6a3b Revert "[InstCombine] Treat passing undef to noundef params as UB"
This reverts commit c911befaec494c52a63e3b957e28d449262656fb.

It has broken LLDB Arm/AArch64 Linux buildbots. I don't really understand
the underlying reason. Reverting for now to make the buildbots green.

https://reviews.llvm.org/D133036
2022-09-02 16:09:50 +05:00
Mikael Holmen
51d4c7ceea [GlobalOpt] Fix debug variance problem in hasOnlyColdCalls
hasOnlyColdCalls skipped over calls to intrinsics, but it did so after
checking the linkage of the called function. This meant that the presence
of a call to a debug intrinsic could affect the outcome of the
optimization.

In my original reproducer (for an out of tree target) it was particularly
interesting, because the actual IR after GlobalOpt was not different with
debug intrinsics present, so -print-after-all printouts didn't show
anything there.

However, without debuginfo, GlobalOpt went further and ran
BlockFrequencyAnalysis and (more importantly) LoopAnalysis, and later on in
the pipeline, instcombine behaved in different ways when LoopInfo was
present.

So a call to a dbg.declare prevented running LoopAnalysis in
GlobalOpt, which later prevented InstCombine from doing an optimization.

The dbg-intrinsic-loopanalysis.ll testcase tries to expose this.

Then I also noted that adding a dbg.declare actually made the existing
testcase colccc_coldsites.ll generate different code, so I modified that
to now test it behaves the same way with and without the dbg.declare.

Reviewed By: nikic, fhahn

Differential Revision: https://reviews.llvm.org/D133193
2022-09-02 12:29:44 +02:00
Sergey Kachkov
be37caca00 [JumpThreading] Process range comparisons with non-local cmp instructions
Use the getPredicateOnEdge method if the value is a non-local
compare-with-a-constant instruction; this can give more precise
results than getConstantOnEdge.

Differential Revision: https://reviews.llvm.org/D131956
2022-09-02 12:22:45 +02:00
Nikita Popov
c453e5b901 Revert "[DSE] Eliminate noop store even through has clobbering between LoadI and StoreI"
This reverts commit cd8f3e75813995c1d2da35370ffcf5af3aff9c2f.

As pointed out by Eli on the review, this is missing an alignment
check. The value might be written at an offset.
2022-09-02 09:28:48 +02:00
Nikita Popov
639d912282 [LICM] Allow load-only scalar promotion in the presence of unwinding
Currently, we bail out of scalar promotion if the loop may unwind
and the memory may be visible on unwind. This is because we can't
insert stores of the promoted value on unwind edges.

However, nowadays scalar promotion also has support for only
promoting loads, while leaving stores in place. This kind of
promotion is safe even in the presence of unwinding.

Differential Revision: https://reviews.llvm.org/D133111
2022-09-02 09:27:13 +02:00
luxufan
cd8f3e7581 [DSE] Eliminate noop store even through has clobbering between LoadI and StoreI
For a noop store formed by a LoadI/StoreI pair, the invariant that
should be kept is that the memory state of the related MemoryLoc
before LoadI is the same as before StoreI.
For this example:
```
define void @pr49927(i32* %q, i32* %p) {
  %v = load i32, i32* %p, align 4
  store i32 %v, i32* %q, align 4
  store i32 %v, i32* %p, align 4
  ret void
}
```
Here the definition of the store's destination is different from the
definition of the load's location, so it seems that the invariant
mentioned above is broken. But the definition of the store's
destination writes the value produced by LoadI, so the invariant
is actually still kept and we can safely ignore it.

Differential Revision: https://reviews.llvm.org/D132657
2022-09-02 06:37:41 +00:00
Vitaly Buka
ad3a77df2d [msan] Fix debug info with getNextNode
When we want to add instrumentation after
an instruction, instrumentation still should
keep debug info of the instruction.

Reviewed By: kda, kstoimenov

Differential Revision: https://reviews.llvm.org/D133091
2022-09-01 20:13:56 -07:00
Chenbing Zheng
d30cf77cb1 [InstCombine] complete fold extractvalue (any_mul_with_overflow X, -1)
The fold extractvalue (any_mul_with_overflow X, -1) --> (-X and icmp)
partly failed to match vector constants with a poison element.
This patch tries to fix that.

Alive2: https://alive2.llvm.org/ce/z/2rGp_3
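
For scalar i8 values, the arithmetic behind the fold can be checked exhaustively. The sketch below assumes the "icmp" part amounts to `X == INT_MIN` for the signed intrinsic and `X u> 1` for the unsigned one; that reading is an assumption, not quoted from the patch. `smulI8`, `umulI8` and the struct are hypothetical helpers that model the intrinsics with wider integer arithmetic.

```cpp
#include <cstdint>

// Wrapped 8-bit result bits plus the overflow flag.
struct MulWithOverflow {
  uint8_t Bits;
  bool Overflow;
};

// Model of a signed i8 multiply-with-overflow, computed exactly in int.
MulWithOverflow smulI8(int A, int B) {
  const int Wide = A * B;
  return {static_cast<uint8_t>(Wide), Wide < -128 || Wide > 127};
}

// Model of an unsigned i8 multiply-with-overflow, computed in unsigned.
MulWithOverflow umulI8(unsigned A, unsigned B) {
  const unsigned Wide = A * B;
  return {static_cast<uint8_t>(Wide), Wide > 255u};
}

// Multiplying by -1 yields -X; overflow reduces to a single comparison.
bool mulByMinusOneFoldHolds() {
  for (int X = -128; X <= 127; ++X) {
    const MulWithOverflow R = smulI8(X, -1);
    if (R.Bits != static_cast<uint8_t>(-X) || R.Overflow != (X == -128))
      return false;
  }
  for (unsigned X = 0; X <= 255; ++X) {
    const MulWithOverflow R = umulI8(X, 255u); // 255 is the i8 pattern of -1
    if (R.Bits != static_cast<uint8_t>(0u - X) || R.Overflow != (X > 1u))
      return false;
  }
  return true;
}
```

The vector case with poison elements that the patch addresses is not modelled here.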

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D132996
2022-09-02 10:58:42 +08:00
Vitaly Buka
ad2b356f85 [msan] Use no-origin functions when possible
Saves 1.8% of .text size on CTMark

Reviewed By: kda

Differential Revision: https://reviews.llvm.org/D133077
2022-09-01 19:18:38 -07:00
Arthur Eubanks
c911befaec [InstCombine] Treat passing undef to noundef params as UB
Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D133036
2022-09-01 15:16:45 -07:00
Rong Xu
0caa4a9559 [PGO] Support PGO annotation of CallBrInst
We currently instrument CallBrInst but do not annotate it with
the branch weight. This patch enables PGO annotation of CallBrInst.

Differential Revision: https://reviews.llvm.org/D133040
2022-09-01 14:13:50 -07:00
Vitaly Buka
ef0f866718 [msan] Combine shadow check of the same instruction
Reduces .text size by 1% on our large binary.

On CTMark (-O2 -fsanitize=memory -fsanitize-memory-use-after-dtor -fsanitize-memory-param-retval)
Size -0.4%
Time -0.8%

Reviewed By: kda

Differential Revision: https://reviews.llvm.org/D133071
2022-09-01 13:55:59 -07:00
Vitaly Buka
9110673062 [nfc][msan] Group checks per instruction
This prepares for combining shadow checks of the same instruction.

Reviewed By: kda, kstoimenov

Differential Revision: https://reviews.llvm.org/D133065
2022-09-01 13:10:16 -07:00
Jordan Rupprecht
3031a250de [MSan] Fix determinism issue when using msan-track-origins.
When instrumenting `alloca`s, we use a `SmallSet` (i.e. `SmallPtrSet`). When there are fewer elements than the `SmallSet` size, it behaves like a vector, offering stable iteration order. Once we have too many `alloca`s to instrument, the iteration order becomes unstable. This manifests as non-deterministic builds because of the global constant we create while instrumenting the alloca.

The test added is a simple IR file, but was discovered while building `libcxx/src/filesystem/operations.cpp` from libc++. A reduced C++ example from that:

```
// clang++ -fsanitize=memory -fsanitize-memory-track-origins \
//   -fno-discard-value-names -S -emit-llvm \
//   -c op.cpp -o op.ll
struct Foo {
  ~Foo();
};
bool func1(Foo);
void func2(Foo);
void func3(int) {
  int f_st, t_st;
  Foo f, t;
  func1(f) || func1(f) || func1(t) || func1(f) && func1(t);
  func2(f);
}
```
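
One common remedy for this class of problem is a set that also records insertion order, so that iteration no longer depends on pointer values or hashing. The sketch below uses only the standard library. `InsertionOrderedSet` is a hypothetical name, and whether the patch takes this exact approach is not stated in this message. LLVM's `SetVector` provides a container with this property.

```cpp
#include <cstddef>
#include <unordered_set>
#include <vector>

// Hypothetical helper, not an LLVM class: rejects duplicates like a set,
// but iterates in the order in which elements were first inserted.
template <typename T> class InsertionOrderedSet {
public:
  // Returns true if V was newly inserted, false if it was already present.
  bool insert(const T &V) {
    if (!Seen.insert(V).second)
      return false;
    Order.push_back(V);
    return true;
  }

  // Elements in first-insertion order; stable from run to run.
  const std::vector<T> &items() const { return Order; }

  std::size_t size() const { return Order.size(); }

private:
  std::unordered_set<T> Seen; // membership test only, never iterated
  std::vector<T> Order;       // deterministic iteration order
};
```

Iterating over `items()` rather than over the hash set keeps any output derived from the iteration reproducible.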

Reviewed By: kda

Differential Revision: https://reviews.llvm.org/D133034
2022-09-01 09:15:57 -07:00
Nuno Lopes
858fe8664e Expand Div/Rem: consider the case where the dividend is zero
So we can't use ctlz in poison-producing mode
2022-09-01 17:04:26 +01:00
Nikita Popov
f5c178b6a4 [LICM] Remove unnecessary condition (NFC)
2022-09-01 15:42:35 +02:00
Nikita Popov
315aef667e [LICM] Fix thread safety checks for promotion of byval args
This code was relying on a very subtle contract: The expectation
was that for non-allocas, the unwind safety check would already
perform a capture check, so we don't need to perform it later.
This held true when this unwind safety was only handled for allocas
and noalias calls, but became incorrect when byval support was
added.

To avoid this kind of issue, just remove the dependency between the
unwind and thread-safety checks entirely. At worst, this means we
perform a redundant capture check. If this should turn out to be
problematic for compile-time, we can cache that query in a more
explicit way.
2022-09-01 15:33:46 +02:00
Sanjay Patel
c3d1504d63 [InstCombine] fix crash on type mismatch with fcmp fold
The existing predicate doesn't work for a single-element
vector, so make sure we are not crossing scalar/vector types.

Test (was crashing) based on the post-commit example for:
482777123427
2022-09-01 08:57:55 -04:00
Sanjay Patel
addbdac5d5 [InstCombine] fold power-of-2 ctlz/cttz with inverted result
When X is a power-of-two or zero and zero input is poison:
ctlz(i32 X) ^ 31 --> cttz(X)
cttz(i32 X) ^ 31 --> ctlz(X)

https://alive2.llvm.org/ce/z/Cs7sFE
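
For a 32-bit power of two X = 1 << K, ctlz(X) is 31 - K and cttz(X) is K, and XOR with 31 maps one to the other. A small check over all 32 powers of two illustrates this. The counting helpers are plain loops written for the sketch, not LLVM intrinsics, and zero is excluded to match the "zero input is poison" precondition.

```cpp
#include <cstdint>

// Number of leading zero bits of a nonzero 32-bit value.
unsigned countLeadingZeros32(uint32_t X) {
  unsigned N = 0;
  for (uint32_t Bit = 0x80000000u; Bit != 0 && (X & Bit) == 0; Bit >>= 1)
    ++N;
  return N;
}

// Number of trailing zero bits of a nonzero 32-bit value.
unsigned countTrailingZeros32(uint32_t X) {
  unsigned N = 0;
  for (uint32_t Bit = 1u; Bit != 0 && (X & Bit) == 0; Bit <<= 1)
    ++N;
  return N;
}

// Check both directions of the fold for every 32-bit power of two.
bool invertedCountFoldHolds() {
  for (unsigned K = 0; K < 32; ++K) {
    const uint32_t X = uint32_t{1} << K;
    if ((countLeadingZeros32(X) ^ 31u) != countTrailingZeros32(X))
      return false;
    if ((countTrailingZeros32(X) ^ 31u) != countLeadingZeros32(X))
      return false;
  }
  return true;
}
```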
2022-09-01 08:57:55 -04:00
Nikita Popov
3f8b1d0f15 [LICM] Add some debug output to scalar promotion (NFC)
2022-09-01 14:46:30 +02:00
Alexey Bataev
982d9ef1c1 [SLP]Fix PR55734: SLP vectorizer's reduce_and formation introduces poison.
We need to either follow the original order of the operands for bool
logical ops, or emit a freeze instruction to avoid poison propagation.

Differential Revision: https://reviews.llvm.org/D126877
2022-09-01 05:34:45 -07:00
Yuanbo Li
ebd0249fcf [DebugInfo] Missing debug location after replacement in processSRem function
This patch fixes an issue in which CorrelatedValuePropagation::processSRem
would create new instructions to represent the SRem instruction, but would not
correctly copy any existing debug location metadata to the new instruction.

Differential Revision: https://reviews.llvm.org/D132218
2022-09-01 13:18:17 +01:00
Florian Hahn
fc444ddc77 [VPlan] Add field to track if intrinsic should be used for call. (NFC)
This patch moves the cost-based decision whether to use an intrinsic or
library call to the point where the recipe is created. This untangles
code-gen from the cost model and also avoids doing some extra work as
the information is already computed at construction.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D132585
2022-09-01 13:14:40 +01:00
Nuno Lopes
fa154a9170 Revert "Expand Div/Rem: consider the case where the dividend is zero"
This reverts commit 4aed09868b5a51a29aade11d9d412c3313310f29.
2022-09-01 12:11:22 +01:00
Nuno Lopes
4aed09868b Expand Div/Rem: consider the case where the dividend is zero
So we can't use ctlz in poison-producing mode
2022-09-01 12:00:03 +01:00
Pavel Samolysov
527b9a9d90 [DeadArgElim] Use structured bindings in foreach loops. NFC
Differential Revision: https://reviews.llvm.org/D133026
2022-09-01 13:48:46 +03:00
Nikita Popov
43e7d9af1d [InstCombine] Fold extractvalue of phi
Just as we do for most other operations, we should push
extractvalue instructions through phis, if this does not increase
unfolded instruction count.
2022-09-01 10:51:54 +02:00
Arthur Eubanks
04f3c20989 [NFC][LICM] Stop passing around unused BFI
Uses of this were removed in 1a25d0bfbb6b587caa03bacd121b67086a774598.
2022-08-31 19:15:34 -07:00
Vitaly Buka
53d1ae88f8 [nfc][msan] Prepare the code for check sorting
2022-08-31 15:36:49 -07:00
Nikita Popov
ab6876a40d reland: [Local] Allow creating callbr with duplicate successors
Since D129288, callbr is allowed to have duplicate successors. This patch removes a limitation which prevents optimizations from actually producing such callbrs.

This is probably the riskiest of all the recent callbr changes, because code with incorrect assumptions might be lurking somewhere. I fixed the one case I encountered ahead of time in 8201e3ef5c.

Reviewed By: nickdesaulniers

Differential Revision: https://reviews.llvm.org/D129997

Originally landed as
commit 08860f525a23 ("[Local] Allow creating callbr with duplicate successors")

Reverted in
commit 1cf6b93df168 ("Revert "[Local] Allow creating callbr with duplicate successors"")
2022-08-31 13:23:00 -07:00
Alexey Bataev
588115c117 [SLP][NFC]Add a check for SelectInst to match description, NFC.
2022-08-31 13:04:21 -07:00
Alexey Bataev
d8d9ee10bb [SLP][NFC]Fix comment and make function following naming standard, NFC.
2022-08-31 12:37:55 -07:00