34262 Commits

Author SHA1 Message Date
Matt Arsenault
fbeda975d2 InstCombine: Drop some typed pointer cast handling 2023-07-31 10:34:31 -04:00
Alexey Bataev
662efdee9b [SLP][NFC]Improve handling of MinBWs container, NFC.
Replaced by DenseMap instead of MapVector(the order is not important,
just lookup is used) + reduced number of lookups.
2023-07-31 07:26:55 -07:00
Nikita Popov
72ec2c007e [InstCombine] Fix handling of irreducible loops (PR64259)
Fixes a regression introduced by D75362 for irreducible control
flow. In that case, we may visit the predecessor that renders
the current block live only later, and incorrectly determine
that a block is dead.

Instead, switch to using the same DeadEdges based implementation
we also use during the main InstCombine iteration.

This temporarily regresses some cases that need replacement of
dead phi operands with poison, which is currently only done during
the main run, but not worklist population. This will be addressed
in a followup, to keep it separate from the correctness fix here.

Fixes https://github.com/llvm/llvm-project/issues/64259.
2023-07-31 16:20:22 +02:00
Alexey Bataev
85635c7f60 [SLP][NFC]Use ScalarTy consistently in getEntryCost, NFC. 2023-07-31 06:52:56 -07:00
Nikita Popov
09156b36c6 [InstCombine] Move worklist preparation into InstCombinerImpl (NFC) 2023-07-31 15:18:12 +02:00
Matt Arsenault
d74c89fdb4 InstCombine: Drop some typed pointer bitcasts 2023-07-31 08:05:58 -04:00
Matt Arsenault
d388222be2 InstCombine: Drop some typed pointer bitcast handling 2023-07-31 08:05:12 -04:00
Nikita Popov
41895843b5 [InstCombine] Only perform one iteration
InstCombine is a worklist-driven algorithm, which works roughly
as follows:

* All instructions are initially pushed to the worklist.
  The initial order is in RPO program order.
* All newly inserted instructions get added to the worklist.
* When an instruction is folded, its users get added back to the
  worklist.
* When the use-count of an instruction decreases, it gets added
  back to the worklist.
* And a few of other heuristics on when we should revisit
  instructions.

On top of the worklist algorithm, InstCombine layers an additional
fix-point iteration: If any fold was performed in the previous
iteration, then InstCombine will re-populate the worklist from
scratch and fold the entire function again. This continues until
a fix-point is reached.

In the vast majority of cases, InstCombine will reach a fix-point
within a single iteration: However, a second iteration is performed
to verify that this is indeed the fixpoint. We can see this in the
statistics for llvm-test-suite:

    "instcombine.NumOneIteration": 411380,
    "instcombine.NumTwoIterations": 117921,
    "instcombine.NumThreeIterations": 236,
    "instcombine.NumFourOrMoreIterations": 2,

The way to read these numbers is that in 411380 cases, InstCombine
performs no folds. In 117921 cases it performs a fold and reaches
the fix-point within one iteration (the second iteration verifies
the fixpoint). In the remaining 238 cases, more than one iteration
is needed to reach the fixpoint.

In other words, only in 0.04% of cases are additional iterations
needed to reach a fixpoint. Conversely, in 22.3% of cases InstCombine
performs a completely useless extra iteration to verify the fix point.

This patch removes the fixpoint iteration from InstCombine, and always
only perform a single iteration. This results in a major compile-time
improvement of around 4% at negligible codegen impact.

This explicitly does accept that we will not reach a fixpoint in all
cases. However, this is mitigated by two factors: First, the data
suggests that this happens very rarely in practice. Second,
InstCombine runs many times during the optimization pipeline
(8 times even without LTO), so there are many chances to recover
such cases.

In order to prevent accidental optimization regressions in the
future, this implements a verify-fixpoint option, which is enabled
by default when instcombine is specified in -passes and disabled
when InstCombinePass() is constructed from C++. This means that
test cases need to explicitly use the no-verify-fixpoint option
if they fail to reach a fixed point (for a well understand reason
we cannot / do not want to avoid).

Differential Revision: https://reviews.llvm.org/D154579
2023-07-31 10:56:49 +02:00
Alexandros Lamprineas
893d3a61c0 Reland [FuncSpec] Add Phi nodes to the InstCostVisitor.
This patch allows constant folding of PHIs when estimating the user
bonus. Phi nodes are a special case since some of their inputs may
remain unresolved until all the specialization arguments have been
processed by the InstCostVisitor. Therefore, we keep a list of dead
basic blocks and then lazily visit the Phi nodes once the user bonus
has been computed for all the specialization arguments.

Differential Revision: https://reviews.llvm.org/D154852
2023-07-31 08:25:48 +01:00
Noah Goldstein
09b6765e7d [InstCombine] Remove trailing space in comment; NFC 2023-07-30 18:05:56 -05:00
Nikita Popov
ad7f02010f [InstCombine] Process blocks in RPO
InstComine currently processes blocks in an arbitrary
depth-first order. This can break the usual invariant that the
operands of an instruction should be simplified before the
instruction itself, if uses across basic blocks (particularly
inside phi nodes) are involved.

This patch switches the initial worklist population to use RPO
instead, which will ensure that predecessors are visited before
successors (back-edges notwithstanding).

This allows us to fold more cases within a single InstCombine
iteration, in preparation for D154579. This change by itself
is a minor compile-time regression of about 0.1%, which will
be more than recovered by switching to single-iteration InstCombine.

Differential Revision: https://reviews.llvm.org/D75362
2023-07-30 18:38:45 +02:00
Florian Hahn
822c749aec
[LV] Shrink operands before creating new instr to force eval order.
Shrink operands before creating the new instruction to make sure the
same evaluation order is used on all platforms. This fixes buildbot
failures due to different argument evaluation order on different
systems.
2023-07-30 17:16:37 +01:00
Aleksandr Popov
236e6787de [IRCE] Add NSW to OverflowingBinaryOperator but not BinaryOperator
Fix incorrect setting NSW flag to non-overflowing indvar base (D154954)

Reviewed By: danilaml

Differential Revision: https://reviews.llvm.org/D156577
2023-07-30 11:14:26 +02:00
Nuno Lopes
eb1617a582 [InstCombineVectorOps] Use poison instead of undef as placeholder [NFC]
It's used to create a vector where only 1 element is used
While at it, change OOB extractelement to yield poison per LangRef
2023-07-29 15:28:13 +01:00
Aaron Ballman
1a53b5c367 Revert "[llvm-profdata] Refactoring Sample Profile Reader to increase FDO build speed using MD5 as key to Sample Profile map"
This reverts commit 66ba71d913df7f7cd75e92c0c4265932b7c93292.

Addressing issues found by:
https://lab.llvm.org/buildbot/#/builders/245/builds/11732
https://lab.llvm.org/buildbot/#/builders/187/builds/12251
https://lab.llvm.org/buildbot/#/builders/186/builds/11099
https://lab.llvm.org/buildbot/#/builders/182/builds/6976
2023-07-28 09:41:38 -04:00
Alexey Bataev
48bc5b0a29 [SLP][PR64099]Fix unsound undef to poison transformation when handling
insertelement instructions.

If the original vector has undef, not poison values, which are not
rewritten by later insertelement instructions, need to transform shuffle
with the undef vector, not a poison vector, and actual indices, not
PoisonMaskElem, otherwise the transformation may produce more poisons
output than the input.
2023-07-27 16:09:49 -07:00
William Huang
66ba71d913 [llvm-profdata] Refactoring Sample Profile Reader to increase FDO build speed using MD5 as key to Sample Profile map
This is phase 1 of multiple planned improvements on the sample profile loader.   The major change is to use MD5 hash code ((instead of the function itself) as the key to look up the function offset table and the profiles, which significantly reduce the time it takes to construct the map.

The optimization is based on the fact that many practical sample profiles are using MD5 values for function names to reduce profile size, so we shouldn't need to convert the MD5 to a string and then to a SampleContext and use it as the map's key, because it's extremely slow.

Several changes to note:

(1) For non-CS SampleContext, if it is already MD5 string, the hash value will be its integral value, instead of hashing the MD5 again. In phase 2 this is going to be optimized further using a union to represent MD5 function (without converting it to string) and regular function names.

(2) The SampleProfileMap is a wrapper to *map<uint64_t, FunctionSamples>, while providing interface allowing using SampleContext as key, so that existing code still work. It will check for MD5 collision (unlikely but not too unlikely, since we only takes the lower 64 bits) and handle it to at least guarantee compilation correctness (conflicting old profile is dropped, instead of returning an old profile with inconsistent context). Other code should not try to use MD5 as key to access the map directly, because it will not be able to handle MD5 collision at all. (see exception at (5) )

(3) Any SampleProfileMap::emplace() followed by SampleContext assignment if newly inserted, should be replaced with SampleProfileMap::Create(), which does the same thing.

(4) Previously we ensure an invariant that in SampleProfileMap, the key is equal to the Context of the value, for profile map that is eventually being used for output (as in llvm-profdata/llvm-profgen). Since the key became MD5 hash, only the value keeps the context now, in several places where an intermediate SampleProfileMap is created, each new FunctionSample's context is set immediately after insertion, which is necessary to "remember" the context otherwise irretrievable.

(5) When reading a profile, we cache the MD5 values of all functions, because they are used at least twice (one to index into FuncOffsetTable, the other into SampleProfileMap, more if there are additional sections), in this case the SampleProfileMap is directly accessed with MD5 value so that we don't recalculate it each time (expensive)

Performance impact:
When reading a ~1GB extbinary profile (fixed length MD5, not compressed) with 10 million function names and 2.5 million top level functions (non CS functions, each function has varying nesting level from 0 to 20), this patch improves the function offset table loading time by 20%, and improves full profile read by 5%.

Reviewed By: davidxl, snehasish

Differential Revision: https://reviews.llvm.org/D147740
2023-07-27 23:08:27 +00:00
Noah Goldstein
edf2e0e075 [InstCombine] Folding @llvm.ptrmask with itself
`@llvm.ptrmask` is basically just `and` with a `ptr` operand. This is
a trivial combine to do with `and` (many others could also be added).

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D154006
2023-07-27 17:43:08 -05:00
Douglas Yung
32683b231e Revert "[FuncSpec] Add Phi nodes to the InstCostVisitor."
This reverts commit 96ff464dd3aac255adc52787a1e28487a9cd4c35.

The test in this change was failing on many buildbots:

https://lab.llvm.org/buildbot/#/builders/164/builds/41292
https://lab.llvm.org/buildbot/#/builders/258/builds/4491
https://lab.llvm.org/buildbot/#/builders/192/builds/3566
https://lab.llvm.org/buildbot/#/builders/123/builds/20411
https://lab.llvm.org/buildbot/#/builders/58/builds/42553
https://lab.llvm.org/buildbot/#/builders/247/builds/7037
https://lab.llvm.org/buildbot/#/builders/139/builds/46259
https://lab.llvm.org/buildbot/#/builders/216/builds/24650
https://lab.llvm.org/buildbot/#/builders/234/builds/12571
https://lab.llvm.org/buildbot/#/builders/232/builds/12574
https://lab.llvm.org/buildbot/#/builders/235/builds/975
2023-07-27 13:47:52 -07:00
Alexandros Lamprineas
96ff464dd3 [FuncSpec] Add Phi nodes to the InstCostVisitor.
This patch allows constant folding of PHIs when estimating the user
bonus. Phi nodes are a special case since some of their inputs may
remain unresolved until all the specialization arguments have been
processed by the InstCostVisitor. Therefore, we keep a list of dead
basic blocks and then lazily visit the Phi nodes once the user bonus
has been computed for all the specialization arguments.

In addition to the last revision this one fixes the bug reported on
Phabricator.

Differential Revision: https://reviews.llvm.org/D154852
2023-07-27 19:24:11 +01:00
spupyrev
bc59faa863 A new code layout algorithm for function reordering [2/3]
We are bringing a new algorithm for function layout (reordering) based on the
call graph (extracted from a profile data). The algorithm is an improvement of
top of a known heuristic, C^3. It tries to co-locate hot and frequently executed
together functions in the resulting ordering. Unlike C^3, it explores a larger
search space and have an objective closely tied to the performance of
instruction and i-TLB caches. Hence, the name CDS = Cache-Directed Sort.
The algorithm can be used at the linking or post-linking (e.g., BOLT) stage.

The algorithm shares some similarities with C^3 and an approach for basic block
reordering (ext-tsp). It works with chains (ordered lists)
of functions. Initially all chains are isolated functions. On every iteration,
we pick a pair of chains whose merging yields the biggest increase in the
objective, which is a weighted combination of frequency-based and distance-based
locality. That is, we try to co-locate hot functions together (so they can share
the cache lines) and functions frequently executed together. The merging process
stops when there is only one chain left, or when merging does not improve the
objective. In the latter case, the remaining chains are sorted by density in the
decreasing order.

**Complexity**
We regularly apply the algorithm for large data-center binaries containing 10K+
(hot) functions, and the algorithm takes only a few seconds. For some extreme
cases with 100K-1M nodes, the runtime is within minutes.

**Perf-impact**
We extensively tested the implementation extensively on a benchmark of isolated
binaries and prod services. The impact is measurable for "larger" binaries that
are front-end bound: the cpu time improvement (on top of C^3) is in the range
of [0% .. 1%], which is a result of a reduced i-TLB miss rate (by up to 20%) and
i-cache miss rate (up to 5%).

Reviewed By: rahmanl

Differential Revision: https://reviews.llvm.org/D152834
2023-07-27 09:20:53 -07:00
Maksim Kita
cbfcf90152 [AggressiveInstCombine] Fold strcmp for short string literals with size 2
Fold strcmp for short string literals with size 2.
Depends D155742.

Differential Revision: https://reviews.llvm.org/D155743
2023-07-27 18:45:21 +03:00
Nikita Popov
70aca7b122 [InstCombine] Explicitly track dead edges
This allows us to handle dead blocks with multiple incoming edges,
where we can determine that all of those edges are dead (or cycles).

This allows InstCombine to handle certain dead code patterns that
can be produced by LoopVectorize in a single iteration.

This is in preparation for D154579.
2023-07-27 16:41:03 +02:00
Ramkumar Ramachandra
23caf9e9e7 Local: fix debug output of replaceDominatedUsesWith()
The debug output of replaceDominatedUsesWith() prints incorrect
information, and the user is left confused about what exactly was
replaced. Fix this.

Differential Revision: https://reviews.llvm.org/D156318
2023-07-27 13:23:38 +01:00
witstorm95
77ef88d7ee [Coroutines] Add an O(n) algorithm for computing the cross suspend point
information.

Fixed https://github.com/llvm/llvm-project/issues/62348

Propagate cross suspend point information by visiting CFG.

Just only go through two times at most, you can get all the cross
suspend point information.

Before the patch:

```
n: 20000
4.31user 0.11system 0:04.44elapsed 99%CPU (0avgtext+0avgdata
552352maxresident)k
0inputs+8848outputs (0major+126254minor)pagefaults 0swaps

n: 40000
11.24user 0.40system 0:11.66elapsed 99%CPU (0avgtext+0avgdata
1788404maxresident)k
0inputs+17600outputs (0major+431105minor)pagefaults 0swaps

n: 60000
21.65user 0.96system 0:22.62elapsed 99%CPU (0avgtext+0avgdata
3809836maxresident)k
0inputs+26352outputs (0major+934749minor)pagefaults 0swaps

n: 80000
37.05user 1.53system 0:38.58elapsed 99%CPU (0avgtext+0avgdata
6602396maxresident)k
0inputs+35096outputs (0major+1622584minor)pagefaults 0swaps

n: 100000
51.87user 2.67system 0:54.54elapsed 99%CPU (0avgtext+0avgdata
10210736maxresident)k
0inputs+43848outputs (0major+2518945minor)pagefaults 0swaps

```
After the patch:

```
n: 20000
3.17user 0.16system 0:03.33elapsed 100%CPU (0avgtext+0avgdata
551736maxresident)k
0inputs+8848outputs (0major+126192minor)pagefaults 0swaps

n: 40000
6.10user 0.42system 0:06.54elapsed 99%CPU (0avgtext+0avgdata
1787848maxresident)k
0inputs+17600outputs (0major+432212minor)pagefaults 0swaps

n: 60000
9.13user 0.89system 0:10.03elapsed 99%CPU (0avgtext+0avgdata
3809108maxresident)k
0inputs+26352outputs (0major+931280minor)pagefaults 0swaps

n: 80000
12.44user 1.57system 0:14.02elapsed 99%CPU (0avgtext+0avgdata
6603432maxresident)k
0inputs+35096outputs (0major+1624635minor)pagefaults 0swaps

n: 100000
16.29user 2.28system 0:18.59elapsed 99%CPU (0avgtext+0avgdata
10212808maxresident)k
0inputs+43848outputs (0major+2522200minor)pagefaults 0swaps

```

Reviewed By: ChuanqiXu

Differential Revision: https://reviews.llvm.org/D154695
2023-07-27 17:28:51 +08:00
Chuanqi Xu
97615ed2f0 Revert "[Coroutines] Add an O(n) algorithm for computing the cross suspend point"
This reverts commit bb4121e65251275b5b16a63423c2bb2be79aeebb. Sorry for
forgetting adding Differential Revision information. It may worth
reverting this one and commit it again given this is a relative big
patch.
2023-07-27 17:27:45 +08:00
witstorm95
bb4121e652 [Coroutines] Add an O(n) algorithm for computing the cross suspend point
information.

Fixed https://github.com/llvm/llvm-project/issues/62348

Propagate cross suspend point information by visiting CFG.

Just only go through two times at most, you can get all the cross
suspend point information.

Before the patch:

```
n: 20000
4.31user 0.11system 0:04.44elapsed 99%CPU (0avgtext+0avgdata
552352maxresident)k
0inputs+8848outputs (0major+126254minor)pagefaults 0swaps

n: 40000
11.24user 0.40system 0:11.66elapsed 99%CPU (0avgtext+0avgdata
1788404maxresident)k
0inputs+17600outputs (0major+431105minor)pagefaults 0swaps

n: 60000
21.65user 0.96system 0:22.62elapsed 99%CPU (0avgtext+0avgdata
3809836maxresident)k
0inputs+26352outputs (0major+934749minor)pagefaults 0swaps

n: 80000
37.05user 1.53system 0:38.58elapsed 99%CPU (0avgtext+0avgdata
6602396maxresident)k
0inputs+35096outputs (0major+1622584minor)pagefaults 0swaps

n: 100000
51.87user 2.67system 0:54.54elapsed 99%CPU (0avgtext+0avgdata
10210736maxresident)k
0inputs+43848outputs (0major+2518945minor)pagefaults 0swaps

```
After the patch:

```
n: 20000
3.17user 0.16system 0:03.33elapsed 100%CPU (0avgtext+0avgdata
551736maxresident)k
0inputs+8848outputs (0major+126192minor)pagefaults 0swaps

n: 40000
6.10user 0.42system 0:06.54elapsed 99%CPU (0avgtext+0avgdata
1787848maxresident)k
0inputs+17600outputs (0major+432212minor)pagefaults 0swaps

n: 60000
9.13user 0.89system 0:10.03elapsed 99%CPU (0avgtext+0avgdata
3809108maxresident)k
0inputs+26352outputs (0major+931280minor)pagefaults 0swaps

n: 80000
12.44user 1.57system 0:14.02elapsed 99%CPU (0avgtext+0avgdata
6603432maxresident)k
0inputs+35096outputs (0major+1624635minor)pagefaults 0swaps

n: 100000
16.29user 2.28system 0:18.59elapsed 99%CPU (0avgtext+0avgdata
10212808maxresident)k
0inputs+43848outputs (0major+2522200minor)pagefaults 0swaps

```
2023-07-27 17:25:32 +08:00
Florian Mayer
9a67c6beb2 [NFC] [HWASan] simplify code
Reviewed By: vitalybuka

Differential Revision: https://reviews.llvm.org/D156382
2023-07-26 17:05:19 -07:00
Florian Mayer
12982d250d [NFC] [HWASan] remove unused include 2023-07-26 14:34:31 -07:00
Shilei Tian
10068cd654 [OpenMP] Introduce kernel environment
This patch introduces per kernel environment. Previously, flags such as execution mode are set through global variables with name like `__kernel_name_exec_mode`. They are accessible on the host by reading the corresponding global variable, but not from the device. Besides, some assumptions, such as no nested parallelism, are not per kernel basis, preventing us applying per kernel optimization in the device runtime.

This is a combination and refinement of patch series D116908, D116909, and D116910.

Depend on D155886.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D142569
2023-07-26 13:35:14 -04:00
Maksim Kita
ac357a4773 [InstCombine] Fold icmp or sub chain ((x1 - y1) | (x2 - y2)) == 0
Improve ((x1 ^ y1) | (x2 ^ y2)) == 0 transform to also support sub ((x1 - y1) | (x2 - y2)) == 0.
Depends D155703.

Differential Revision: https://reviews.llvm.org/D155704
2023-07-26 19:16:41 +03:00
Ivan Kosarev
e9df4c9892 [ADT] Support iterating size-based integer ranges.
It seems the ranges start with 0 in most cases.

Reviewed By: dblaikie, gchatelet

Differential Revision: https://reviews.llvm.org/D156135
2023-07-26 16:28:41 +01:00
Alexandros Lamprineas
c52ab9ea2f Revert "[FuncSpec] Add Phi nodes to the InstCostVisitor."
Reverting due to the crash reported in D154852.

Also reverting the subsequent commit as collateral damage:

"[FuncSpec] Split the specialization bonus into CodeSize and Latency."
2023-07-26 12:33:41 +01:00
Alexandros Lamprineas
20c8f58c11 [FuncSpec] Split the specialization bonus into CodeSize and Latency.
Currently we use a combined metric TargetTransformInfo::TCK_SizeAndLatency
when estimating the specialization bonus. This is suboptimal, and in some
cases erroneous. For example we shouldn't be weighting the codesize decrease
attributed to constant propagation by the block frequency of the dead code.
Instead only the latency savings should be weighted by block frequency. The
total codesize savings from all the specialization arguments should be
deducted from the specialization cost.

Differential Revision: https://reviews.llvm.org/D155103
2023-07-26 12:03:46 +01:00
Johannes Doerfert
2f7ef7be1f [Attributor][FIX] Swap cases in ternary op to avoid nullptr reference
The case was wrong before, and somehow I only looked at the condition
before.
2023-07-26 00:03:06 -07:00
Johannes Doerfert
a598e39063 [AAInterFnReachabilityFunction][NFC] Remove unused members 2023-07-26 00:03:06 -07:00
Johannes Doerfert
b3fec1067a [Attributor] Improve NonNull deduction
We can improve our deduction if we stop at PHI and select instructions
and also iterate the returned values explicitly. The latter helps with
isImpliedByIR deductions.
2023-07-25 20:31:21 -07:00
Johannes Doerfert
88b5d23021 [Attributor] Allow multiple LHS/RHS values when simplifying comparisons
We use to deal with multiple values but not in the handleCmp function.
Now we also allow multiple simplified operands there.
2023-07-25 20:31:21 -07:00
Johannes Doerfert
0cd8a28941 [Attributor][FIX] No IntraFnReachability does not mean unreachable
Also, first check inter fn reachability as it seems to be cheaper in
practise.
2023-07-25 17:47:33 -07:00
Johannes Doerfert
ba0be698c5 [Attributor][NFC] Rename variable to be less confusing 2023-07-25 17:47:33 -07:00
Johannes Doerfert
4223c9b354 [Attributor] Always deduce nosync from readonly + non-convergent
This adds the deduction also if the function is not IPO amendable.
2023-07-25 17:47:33 -07:00
Johannes Doerfert
ae6ad31763 [Attributor] Try to remove argmem effects after signature rewrite
If the new signature has only readnone pointers we can remove any argmem
effect from the function signature.
2023-07-25 17:47:33 -07:00
Johannes Doerfert
e31724f1a6 [Attributor] Check readonly call sites for nosync in AANoSync
See @nosync_convergent_callee_test() in nosync.ll. The other changes are
call sites now annotated with `nosync`.
2023-07-25 17:47:33 -07:00
Florian Mayer
6cc9244baa Enable hwasan-use-after-scope by default
This has been in use for a long time without any issues.

Reviewed By: vitalybuka

Differential Revision: https://reviews.llvm.org/D156267
2023-07-25 17:36:15 -07:00
Teresa Johnson
5986559caa [SimplifyCFG] Guard branch folding by speculate blocks flag
Guard FoldBranchToCommonDest in SimplifyCFG with the SpeculateBlocks
flag as it can also speculate instructions.

This was split out of D155997.

Differential Revision: https://reviews.llvm.org/D156194
2023-07-25 06:46:19 -07:00
Matt Arsenault
e93ae281db Attributor: Fix typo 2023-07-25 07:25:11 -04:00
Alexandros Lamprineas
59a5c582dc [FuncSpec][NFC] Leave a comment for future improvements.
Adds a TODO for checking inlinining opportunities while traversing
the users of the specialization arguments. This was brought up in
the review of D154852.
2023-07-25 11:58:42 +01:00
Alexandros Lamprineas
03f1d09fe4 [FuncSpec] Add Phi nodes to the InstCostVisitor.
This patch allows constant folding of PHIs when estimating the user
bonus. Phi nodes are a special case since some of their inputs may
remain unresolved until all the specialization arguments have been
processed by the InstCostVisitor. Therefore, we keep a list of dead
basic blocks and then lazily visit the Phi nodes once the user bonus
has been computed for all the specialization arguments.

Differential Revision: https://reviews.llvm.org/D154852
2023-07-25 11:00:20 +01:00
Martin Storsjö
245ec675a4 Revert "[LV] Re-use existing broadcast value for live-ins."
This reverts commit eea9258648ce73507f6f85c395de978af659d498.

That commit triggered crashes in the following testcase:

$ cat reduced.c
typedef struct {
  int a[8]
} b;
typedef struct {
  b *c;
  short d
} e;
void f() {
  int g;
  char *h;
  e *i = f;
  short j = i->d;
  int a = i->c->a[0];
  for (;;)
    for (; g < a; g++) {
      *h = j * i->d >> 8;
      h++;
    }
}
$ clang -target aarch64-linux-gnu -w -c -O2 reduced.c
2023-07-25 10:35:41 +03:00
Fangrui Song
e8e7a959c7 [SLP] Fix -Wunused-variable in -DLLVM_ENABLE_ASSERTIONS=off builds after D154891 2023-07-24 09:47:50 -07:00