33580 Commits

Author SHA1 Message Date
Yingwei Zheng
6d667d4b26
[InstCombine] Combine const GEP chains
This patch reverts rGae739aefd7473517d3f08b5c8d08a66c7f469198 to address performance regressions reported by our [CI](https://github.com/dtcxzyw/llvm-ci/issues/137) after rG2ec1d0f427c7822540352c0c14d057e7bfe4f77b.

For example:
```
define ptr @const_gep_chain(ptr %p, i64 %a) {
    %p1 = getelementptr inbounds i8, ptr %p, i64 %a
    %p2 = getelementptr inbounds i8, ptr %p1, i64 1
    %p3 = getelementptr inbounds i8, ptr %p2, i64 2
    %p4 = getelementptr inbounds i8, ptr %p3, i64 3
    ret ptr %p4
}
```
The last three GEPs will not be folded since rG2ec1d0f427c7822540352c0c14d057e7bfe4f77b.
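
As a rough illustration of the fold this patch restores (not InstCombine's actual implementation), the sketch below models the chain as a list of i8 offsets applied to one base pointer and merges adjacent constant offsets. The list representation and the name `fold_const_gep_chain` are hypothetical, and `inbounds` handling is ignored.

```python
def fold_const_gep_chain(offsets):
    """Merge adjacent constant offsets in a chain of i8 GEPs.

    `offsets` lists the offsets applied in order to one base pointer:
    an int is a constant offset, a str is a variable offset such as "%a".
    """
    folded = []
    for off in offsets:
        if folded and isinstance(off, int) and isinstance(folded[-1], int):
            # gep (gep p, C1), C2  ->  gep p, C1 + C2
            folded[-1] += off
        else:
            folded.append(off)
    return folded


# The chain from @const_gep_chain: %a, then 1, 2 and 3.
print(fold_const_gep_chain(["%a", 1, 2, 3]))  # ['%a', 6]
```

Under this model the three constant GEPs collapse into a single GEP with offset 6 on top of `%p1`.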

I think it is appropriate to remove this code because there is no compile-time regression reported in our benchmarks.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D149240
2023-05-02 00:28:39 +08:00
Florian Hahn
6303fa369c
[VPlan] Remove DeadInsts arg from VPInstructionsToVPRecipes (NFC)
The argument isn't used. VPlan-based dead recipe removal can be used
instead.
2023-05-01 15:03:29 +01:00
Vitaly Buka
e8893133d1 Revert "[NFC][HWASAN] Handle tags as Int8"
More tests need updates.

This reverts commit e876ba5db98830db427395ed9b3718d20bf519fb.
2023-04-30 20:59:43 -07:00
Vitaly Buka
e876ba5db9 [NFC][HWASAN] Handle tags as Int8 2023-04-30 19:58:01 -07:00
Vitaly Buka
0b97aff4d2 [NFC][HWASAN] Rename local variable 2023-04-30 19:49:25 -07:00
Vitaly Buka
f42f863c33 [NFC][HWASAN] Set constant type from another operand 2023-04-30 19:07:57 -07:00
Vitaly Buka
37f6c9f852 [HWASAN] Untag before tagging alloca pointers
This is a follow-up to b5595836, which missed the
Replacement variable.

Before b5595836 the code assumed that alloca
ptrs are not tagged so tagging is implemented
as simple OR.

So this patch completes support of tagged SP
by passing untagged alloca pointers into
tagPointer.
2023-04-30 18:26:58 -07:00
Valentin Churavy
bf08973277 Don't loop unswitch vector selects
Otherwise we could produce `br <2x i1>`, which is of course not legal.

```
Branch condition is not 'i1' type!
  br <2 x i1> %cond.fr1, label %entry.split.us, label %entry.split
  %cond.fr1 = freeze <2 x i1> %cond
LLVM ERROR: Broken module found, compilation aborted!
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:
0.	Program arguments: /home/vchuravy/builds/llvm/bin/opt -passes=simple-loop-unswitch<nontrivial> -S
```

Fixes change introduced by https://reviews.llvm.org/D138526

Reviewed By: caojoshua

Differential Revision: https://reviews.llvm.org/D149560
2023-04-30 19:19:29 -04:00
Florian Hahn
0b24436591
[LV] Clarify comment for selectVectorizationFactor (NFC).
The comment is stale, as UserVF is handled before selectVectorizationFactor
is called. Clarify the comment by removing the mention of UserVF.

Suggested as independent improvement in D143938.
2023-04-30 21:12:15 +01:00
Florian Hahn
a431402fd2
[LV] Remove loop arg from CM::isCandidateForEpilogueVectorization (NFC)
LVP operates on the loop it stores in TheLoop. Use it instead of the
argument, to be in line with other member functions.

Suggested as independent improvement in D143938.
2023-04-30 21:11:12 +01:00
Florian Hahn
6fa07a87ab
[LV] Document selectEpilogueVectorizationFactor (NFC).
Add missing documentation for selectEpilogueVectorizationFactor.

Suggested as independent improvement in D143938.
2023-04-30 21:09:24 +01:00
Florian Hahn
9fce1fc6f8
[LVP] Fix comment for hasPlanWithVF (NFC).
The function checks if there's a plan with the specified VF. Update the
comment to match the implementation.

Pointed out as independent improvement in D143938.
2023-04-30 19:13:53 +01:00
Florian Hahn
8d3ff24e11
[LV] Sink collect* calls to LVP::plan() (NFC).
Move calls of collect* helpers closer to where the cost-model is used.
Should help simplifying D142669 & D142670.

Differential Revision: https://reviews.llvm.org/D142674
2023-04-30 11:41:22 +01:00
Nuno Lopes
8a1373d308 Revert "[InstCombine] Generate better code for std::bit_floor from libstdc++"
This reverts commit d775fc390d3c78cc81872e276c4b1314f19af577.

The patch is wrong wrt undef and the author didn't fix it after 2 weeks.
2023-04-30 09:56:34 +01:00
Joshua Cao
e479ed90b5 [SimpleLoopUnswitch] unswitch selects
The old LoopUnswitch pass unswitched selects, but the changes were never
ported to the new SimpleLoopUnswitch.

We unswitch by turning:

```
S = select %cond, %a, %b
```

into:

```
head:
br %cond, label %then, label %tail

then:
br label %tail

tail:
S = phi [ %a, %then ], [ %b, %head ]
```

Unswitch selects are always nontrivial, since the successors do not exit
the loop and the loop body always needs to be cloned.

Differential Revision: https://reviews.llvm.org/D138526

Co-authored-by: Sergey Kachkov <sergey.kachkov@syntacore.com>
2023-04-29 21:24:26 -07:00
Vitaly Buka
d3c37e2cd1 [NFC][HWASAN] Use pointercast instead of bitcast 2023-04-29 17:51:19 -07:00
Vitaly Buka
a1cca2e2d1 [NFC][HWASAN] Add const to parameter 2023-04-29 17:51:19 -07:00
Vitaly Buka
2db925659e [NFC][HWASAN] Fix comment 2023-04-29 17:51:19 -07:00
Vitaly Buka
87d473af69 [NFC][HWASAN] Remove unused parameter 2023-04-29 17:51:18 -07:00
Noah Goldstein
dc13624e88 [InstCombine] Fold (cmp eq/ne (umax X, Y),0) -> (cmp eq/ne (or X, Y),0)
`or` is almost always preferable.
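
The equivalence behind this fold can be checked exhaustively for a small bit width. The following is only a brute-force sanity check over 8-bit unsigned values, not a proof for all widths; `umax` here is a plain Python stand-in for the intrinsic.

```python
def umax(x, y):
    """Stand-in for llvm.umax on unsigned integers."""
    return max(x, y)


VALUES = range(1 << 8)  # all 8-bit unsigned values

# Pairs where (umax(X, Y) == 0) and ((X | Y) == 0) disagree.
mismatches = [
    (x, y)
    for x in VALUES
    for y in VALUES
    if (umax(x, y) == 0) != ((x | y) == 0)
]
print(len(mismatches))  # 0
```

Both expressions are zero only when X and Y are both zero, so the list is empty.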

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D149426
2023-04-29 12:38:43 -05:00
Noah Goldstein
ecad53c3f4 [InstCombine] Don't fold uadd.sat to or if it increases instruction count
In the `(cmp eq/ne (uadd.sat X, Y),0)` case, we were missing a
`hasOneUse` check.
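
A small sketch of why the one-use check matters, under a simplified cost model that just counts instructions. `uadd_sat` is a Python stand-in for the intrinsic and `instruction_count_after_fold` is a hypothetical helper, not an LLVM API.

```python
def uadd_sat(x, y, bits=8):
    """Unsigned saturating add on `bits`-wide integers."""
    return min(x + y, (1 << bits) - 1)


def instruction_count_after_fold(uadd_sat_uses):
    """Instructions feeding the compare after folding to `or`.

    Before the fold there are two: uadd.sat and icmp.
    After the fold there are `or` and icmp, plus uadd.sat itself
    if some other user still needs it.
    """
    other_users = uadd_sat_uses - 1
    return 2 + (1 if other_users > 0 else 0)


# The fold itself is sound: both sides are zero only for X == Y == 0.
equivalent = all(
    (uadd_sat(x, y) == 0) == ((x | y) == 0)
    for x in range(256)
    for y in range(256)
)
print(equivalent)                       # True
print(instruction_count_after_fold(1))  # 2, no growth
print(instruction_count_after_fold(2))  # 3, one more than before
```

With a single use the fold is neutral in instruction count; with an extra user the original `uadd.sat` stays alive and the fold adds an instruction.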

Differential Revision: https://reviews.llvm.org/D149425
2023-04-29 12:38:41 -05:00
Matt Arsenault
b52db60cbb GlobalOpt: Drop code to handle typed pointers
Fixes assert with pointers with different address spaces. We
could keep looking through addrspacecast, but it would require
checking for null handling of the access address space.

Fixes #62384
2023-04-29 09:48:21 -04:00
Vitaly Buka
67caff6f32 [msan] Improve handling of Intrinsic::is_fpclass after c55fffe
c55fffe replaced fcmp with fpclass.

```
declare i1 @llvm.is.fpclass(<fptype> <op>, i32 <test>)
declare <N x i1> @llvm.is.fpclass(<vector-fptype> <op>, i32 <test>)
```

Perfect fix will require checking bits of <op> corresponding to <test>
argument. For now just propagate shadow without reporting before
intrinsic. Still existing handling of fcmp is also simple OR, so it's
not making it worse.

Reviewed By: eugenis

Differential Revision: https://reviews.llvm.org/D149491
2023-04-28 16:27:31 -07:00
wlei
ba3cbc7aad fix a use-after-free failure 2023-04-28 15:55:51 -07:00
AdityaK
1ce2015f7e [NFC] check for UnreachableInst first as it is cheaper compared to getTerminatingDeoptimizeCall
Reviewers: craig.topper, aeubanks, Peter
Differential Revision: https://reviews.llvm.org/D134490
2023-04-28 15:29:44 -07:00
Florian Hahn
4583d7ef7c
[LV] Rename Preheader -> VecPreheader (NFC).
Clarify variable name as suggested in D147964 to reduce diff.
2023-04-28 22:15:47 +01:00
Teresa Johnson
39f7b48671 [MemProf] Use updated version of hot/cold operator new
Switch to the just updated versions of the API in tcmalloc that change
the name of the hot cold parameter to a reserved identifier __hot_cold_t.
This was based on feedback from Richard Smith, as I also need to add
some follow-on handling to clang so they are annotated properly.

Differential Revision: https://reviews.llvm.org/D149475
2023-04-28 13:35:46 -07:00
wlei
892daede72 [SamplePGO] Stale profile matching(part 2)
Part 2 of https://reviews.llvm.org/D147456
Use callee name on IR as an anchor to match the call target/inlinee name in the profile. The advantages of this in particular:
- Different from the traditional way of encoding hash signatures to every block that would affect binary/profile size and build speed, it doesn't require any additional information for this, all the data is already in the IR and profiles.
- Effective for current nested profile layout in which once a callsite is mismatched all the inlinee's profiles are dropped.
**The input of the algorithm:**
- IR locations: the anchor is the callee name of direct callsite.
- Profile locations: the anchor is the call target name for `BodySample`s or inlinee's profile name for `CallsiteSamples`.
The two lists are populated by parsing the IR and profile and both can be generalized as a sequence of locations with an optional anchor.
For example: say location `1.2(foo)` refers to a callsite at `1.2` with callee name `foo` and `1.3` refers to a non-direct-call location `1.3`.
```
// The current build source code:
   int main() {
1.     ...
2.     foo();
3.     ...
4.     ...
5.     ...
6.     bar();
7.     ...
   }
```
IR locations are populated and simplified as: `[1, 2(foo), 3, 5, 6(bar), 7]`.
```
; The "stale" profile:
main:350:1
 1: 1
 2: 3
 3: 100 foo:100
 4: 2
 7: 2
 8: 200 bar:200
 9: 30
```
Profile locations are populated and simplified as `[1, 2, 3(foo), 4, 7, 8(bar), 9]`
**Matching heuristic:**
- Match all the anchors in lexical order first.
- Match non-anchors evenly between two anchors: Split the non-anchor range, the first half is matched based on the start anchor, the second half is matched based on the end anchor.
So the example above is matched like:
```
   [1,    2(foo), 3,  5,  6(bar), 7]
    |     |       |   |     |     |
   [1, 2, 3(foo), 4,  7,  8(bar), 9]
```
3 -> 4 matching is based on anchor `foo`, 5 -> 7 matching is based on anchor `bar`.
The output mapping of matching is [2->3, 3->4, 5->7, 6->8, 7->9].
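
The heuristic above can be sketched as follows. This is a simplified model, not the code in the patch: locations are plain integers, `match_stale_profile` is a hypothetical name, an anchor is matched to the first unused profile anchor with the same callee name, identity matches are left out of the result, and for an odd-sized non-anchor range the middle element is assumed to stay with the start anchor.

```python
from collections import defaultdict


def match_stale_profile(ir_locs, profile_locs):
    """Map IR locations to profile locations using callee-name anchors.

    Both inputs are ordered lists of (location, anchor) pairs, where
    anchor is a callee name or None for a non-anchor location.
    Returns {ir_location: profile_location} without identity entries.
    """
    # Profile anchors grouped by callee name, in profile order.
    candidates = defaultdict(list)
    for loc, name in profile_locs:
        if name is not None:
            candidates[name].append(loc)

    mapping = {}
    delta = 0     # offset taken from the most recently matched anchor
    pending = []  # non-anchors seen since that anchor

    def record(ir_loc, profile_loc):
        if ir_loc == profile_loc:
            mapping.pop(ir_loc, None)
        else:
            mapping[ir_loc] = profile_loc

    for loc, name in ir_locs:
        if name is not None and candidates[name]:
            profile_loc = candidates[name].pop(0)
            record(loc, profile_loc)
            delta = profile_loc - loc
            # The second half of the pending range follows the end anchor.
            for late in pending[len(pending) - len(pending) // 2:]:
                record(late, late + delta)
            pending = []
        else:
            # First pass: follow the start anchor (identity before any anchor).
            record(loc, loc + delta)
            pending.append(loc)
    return mapping


ir = [(1, None), (2, "foo"), (3, None), (5, None), (6, "bar"), (7, None)]
profile = [(1, None), (2, None), (3, "foo"), (4, None),
           (7, None), (8, "bar"), (9, None)]
print(match_stale_profile(ir, profile))
# {2: 3, 3: 4, 5: 7, 6: 8, 7: 9}
```

On the example lists this reproduces the mapping given above: 3 follows `foo` (offset +1) and 5 is re-matched against `bar` (offset +2).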

For the implementation, the anchors are saved in a map for fast look-up. The result mapping is saved into `IRToProfileLocationMap`(see https://reviews.llvm.org/D147456) and distributed to all FunctionSamples(`distributeIRToProfileLocationMap`)

**Clang-self build benchmark:**
Current build version: clang-10
The profiled version:  clang-9
Results are compared against a refreshed profile (collected on clang-10). To be fair, we invalidated new functions' profiles (both the refreshed and the stale profile use the same profile list).
1) Regression to using refresh profile with this off : -3.93%
2) Regression to using refresh profile with this on  : -1.1%
So this algorithm can recover ~72% of the regression.
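
The recovery figure follows directly from the two regression numbers above; the variable names below are only for illustration.

```python
regression_off = 3.93  # % regression vs. refreshed profile, matching off
regression_on = 1.1    # % regression vs. refreshed profile, matching on

# Share of the original regression that the matching wins back.
recovered = (regression_off - regression_on) / regression_off
print(f"{recovered:.0%}")  # 72%
```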
**Internal(Meta) large-scale services.**
We saw one real instance of a 3-week stale profile, where it delivered a ~1.8% win.

**Notes or future work:**
- Classic AutoFDO support: the current version only supports pseudo-probe, but I believe it's not hard to extend to classic line-number based AutoFDO since pseudo-probe and line-number share the `LineLocation` structure.
- The fuzzy matching is an open-ended area and there could be more heuristics to try out, but since the current version already recovers a reasonable percentage of regression(with some pseudo probe order change, it can recover close to 90%), I'm submitting the patch for review and we will try more heuristics in future.
- Profile call target names are only available when the call is hit by samples, so a missing anchor might mislead the matching. This can be mitigated in llvm-profgen by generating the call target for zero-sample calls.
- This doesn't handle function name mismatch; we plan to solve it in the future.

Reviewed By: hoy, wenlei

Differential Revision: https://reviews.llvm.org/D147545
2023-04-28 13:07:32 -07:00
wlei
a98d6a11ea [SamplePGO] Stale profile matching(part 1)
AutoFDO/CSSPGO often has to deal with stale profiles collected on binaries built from several revisions behind release. It’s likely to get incorrect profile annotations using the stale profile, which results in unstable or low performing binaries. Currently for source location based profile, once a code change causes a profile mismatch, all the locations afterward are mismatched, the affected samples or inlining info are lost. If we can provide a matching framework to reuse parts of the mismatched profile - aka incremental PGO, it will make PGO more stable, also increase the optimization coverage and boost the performance of binary.

This patch is the part 1 of stale profile matching, summary of the implementation:
 - Added a structure for the matching result:`LocToLocMap`, which is a location to location map meaning the location of current build is matched to the location of the previous build(to be used to query the “stale” profile).
 - In order to use the matching results for sample query, we need to pass them to all the location queries. For code cleanliness, we added a new pointer field(`IRToProfileLocationMap`) to `FunctionSamples`.
 - Added a wrapper(`mapIRLocToProfileLoc`) for the query to the location, the location from input IR will be remapped to the matched profile location.
 - Added a new switch `--salvage-stale-profile`.
 - Some refactoring for the staleness detection.
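
A minimal sketch of how such a remapping wrapper might sit in front of a sample query. It assumes integer locations, a plain dict as the location-to-location map, and that an IR location with no entry falls back to itself; the function names mirror the description above but the code is illustrative, not the patch's implementation.

```python
def map_ir_loc_to_profile_loc(ir_loc, ir_to_profile_location_map=None):
    """Return the profile location to query for an IR location."""
    if ir_to_profile_location_map is None:
        return ir_loc  # no matching result attached: query unchanged
    # Assumption: unmatched locations are queried as-is.
    return ir_to_profile_location_map.get(ir_loc, ir_loc)


def sample_count_at(body_samples, ir_loc, ir_to_profile_location_map=None):
    """Look up the sample count for an IR location via the wrapper."""
    profile_loc = map_ir_loc_to_profile_loc(ir_loc, ir_to_profile_location_map)
    return body_samples.get(profile_loc, 0)


# Hypothetical stale profile: the hot call moved from line 3 to line 2.
stale_body_samples = {1: 10, 2: 5, 3: 100}
loc_to_loc_map = {2: 3}

print(sample_count_at(stale_body_samples, 2))                  # 5
print(sample_count_at(stale_body_samples, 2, loc_to_loc_map))  # 100
```

Keeping the map behind one wrapper means every location query picks up the matching result without each call site knowing about staleness.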

Test case is in part 2 with the matching algorithm.

Reviewed By: wenlei

Differential Revision: https://reviews.llvm.org/D147456
2023-04-28 13:07:32 -07:00
Vasileios Porpodas
fe1e50cd15 [NFC][SLP] Cleanup: Replace Value* operand with Instruction* in vectorizeRootInstruction() and vectorizeHorReduction()
This makes it explicit that these functions work with instructions, and avoids
calling them if the operand is not an instruction.

Differential Revision: https://reviews.llvm.org/D149465
2023-04-28 12:35:09 -07:00
Vasileios Porpodas
92b2a266e9 [NFC][SLP] Cleanup: Moves code that changes the reduction root into a separate function.
This makes `matchAssociativeReduction()` a bit simpler.

Differential Revision: https://reviews.llvm.org/D149452
2023-04-28 10:05:32 -07:00
Jay Foad
56af0e913c [EarlyCSE] Do not CSE convergent calls in different basic blocks
"convergent" is documented as meaning that the call cannot be made
control-dependent on more values, but in practice we also require that
it cannot be made control-dependent on fewer values, e.g. it cannot be
hoisted out of the body of an "if" statement.

In code like this, if we allow CSE to combine the two calls:

  x = convergent_call();
  if (cond) {
    y = convergent_call();
    use y;
  }

then we get this:

  x = convergent_call();
  if (cond) {
    use x;
  }

This is conceptually equivalent to moving the second call out of the
body of the "if", up to the location of the first call, so it should be
disallowed.
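
To make the difference concrete, here is a toy simulation, assuming a convergent operation that returns to each participating thread the number of threads executing the call together. The thread model and all names are hypothetical and only stand in for a real cross-thread operation.

```python
def convergent_call(active):
    """Toy convergent operation: every active thread receives the
    number of threads that execute the call together."""
    return {tid: len(active) for tid in active}


def original(threads, cond):
    x = convergent_call(threads)  # executed by all threads, result unused
    taken = [t for t in threads if cond[t]]
    y = convergent_call(taken)    # only threads inside the "if" take part
    return {t: y[t] for t in taken}  # use y


def after_cse(threads, cond):
    x = convergent_call(threads)
    taken = [t for t in threads if cond[t]]
    return {t: x[t] for t in taken}  # use x in place of y


threads = [0, 1, 2, 3]
cond = {0: True, 1: False, 2: True, 3: False}
print(original(threads, cond))   # {0: 2, 2: 2}
print(after_cse(threads, cond))  # {0: 4, 2: 4}
```

The value observed inside the "if" changes once the call is shared, because the set of threads participating in the call is different.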

Differential Revision: https://reviews.llvm.org/D149348
2023-04-28 14:50:48 +01:00
Nikita Popov
0659000ff7 [LICM] Don't duplicate instructions just because they're free
D37076 makes LICM duplicate instructions into exit blocks if the
instruction is free. For GEPs, the motivation appears to be that
this allows the GEP to be folded into addressing modes, while
non-foldable users outside the loop might prevent this. TBH I don't
think LICM is the place to do this (why doesn't CGP apply this
heuristic itself?) but at least I understand the motivation.

However, the transform is also applied to all other "free"
instructions, which are just that (removed during lowering and not
"folded" in some way). For such instructions, this transform seems
somewhere between useless, counter-productive (undoing CSE/GVN) and
actively incorrect. For example, this transform can duplicate freeze
instructions, which is illegal.

This patch limits the transform to just foldable GEPs, though we
might want to drop it from LICM entirely as a followup.

This is a small compile-time improvement, because querying TTI cost
model for every single instruction is expensive.

Differential Revision: https://reviews.llvm.org/D149136
2023-04-28 14:31:23 +02:00
Florian Hahn
2c9d21a2a3
[VPlan] Turn Plan entry node into VPBasicBlock (NFCI).
The entry to the plan is the preheader of the vector loop and
guaranteed to be a VPBasicBlock. Make sure this is the case by
adjusting the type.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D149005
2023-04-28 12:29:06 +01:00
Bjorn Pettersson
cf59f649ff Re-apply "[Passes] Remove legacy PM versions of InstructionNamer and MetaRenamer"
A new attempt after removing uses of -instnamer in polly lit tests
in D148530.
2023-04-28 13:18:45 +02:00
Nikita Popov
0aed0dbec2 [LCSSA] Don't invalidate entire loop in SCEV
We already invalidate each individual instruction for which LCSSA
is formed in formLCSSAForInstructions(), so I don't see a reason
why we would need to invalidate the entire loop on top of that.

I believe we also no longer need the instruction-level invalidation
now that SCEV looks through LCSSA phis, but I'll leave that for a
separate patch, as it's less obvious.

Differential Revision: https://reviews.llvm.org/D149331
2023-04-28 12:17:26 +02:00
Jay Foad
31ec0a6845 [SimplifyCFG] Improve the way hoisting skips over non-matching instructions
D129370 introduced the idea that hoisting could skip over non-matching
instructions and continue to look for matching (hoistable) instructions,
but certain types of mismatch still aborted the whole hoisting attempt.

Fix this by splitting out some of the instruction matching checks into a
helper function.

Also forbid hoisting allocas past stacksave/stackrestore, completing the
fix started in D133730, to avoid regressing tests.

Differential Revision: https://reviews.llvm.org/D149365
2023-04-28 10:03:32 +01:00
Alexey Bataev
8bacd75125 [SLP][NFC]Fix a warning because of the missing parens, NFC. 2023-04-27 16:59:37 -07:00
Mingming Liu
b3cb950cf3 [PGO]Implement metadata combine for 'branch_weights' of direct
callsites when none of the instructions folds the rest away.

- Merge cases are added for simplify-cfg {sink,hoist}, based on https://gcc.godbolt.org/z/avGvc38W7 and https://gcc.godbolt.org/z/dbWbjGhaE
- When one instruction folds the others in, do not update branch_weights
  with sum (see test/Transforms/GVN/calls-readonly.ll)

Differential Revision: https://reviews.llvm.org/D148877
2023-04-27 13:04:17 -07:00
Christian Ulmann
c67079f1be [PGO] Fix dead StringRef access
This commit fixes a dead StringRef access introduced in
https://reviews.llvm.org/D149324
2023-04-27 19:42:56 +00:00
Mircea Trofin
460ea85014 [nfc][thinlto] Handle global constant importing separately
This makes the logic for referenced globals reusable for import criteria
that don't use thresholds - in fact, we currently didn't consider any
thresholds when importing.

Differential Revision: https://reviews.llvm.org/D149298
2023-04-27 12:21:50 -07:00
Arthur Eubanks
3db8ae1f68 Revert "[MergeICmps] Adapt to non-eq comparisons, bugfix"
This reverts commit ca94b02e559242e6d1fcdd65320334438be69448.

Causes miscompiles, see D141188
2023-04-27 11:46:36 -07:00
Alexey Bataev
1604a100f1 [SLP][NFC]Avoid extra useless ConstantVector creation, use PointerUnion
instead, NFC.

Better to use PointerUnion<Value *, const TreeEntry *> instead of extra
attempts of creating null vector values, where possible.
2023-04-27 10:48:14 -07:00
ManuelJBrito
d22edb9794 [IR][NFC] Change UndefMaskElem to PoisonMaskElem
Following the change in shufflevector semantics,
poison will be used to represent undefined elements in shufflevector masks.

Differential Revision: https://reviews.llvm.org/D149256
2023-04-27 18:01:54 +01:00
Alexey Bataev
cf792f664a [SLP]Fix a crash for the replaced vectorized value.
If two nodes share the same value, which is replaced in one of the
nodes, we need to automatically replace the same value in all nodes. Better to
use WeakTrackingVH for this to fix the compiler crash.
2023-04-27 09:32:00 -07:00
Christian Ulmann
d8e15dc4ae [PGO] Minor instrumentation code cleanup (NFC)
This commit cleans up some parts of the PGO instrumentation. Most
importantly, it removes a template parameter shadowing of a class name
that could lead to confusion.

Reviewed By: gysit

Differential Revision: https://reviews.llvm.org/D149324
2023-04-27 16:10:10 +00:00
Jay Foad
ee88cd82a9 [SimplifyCFG] Remove some unnecessary TTI arguments. NFC.
TTI was already available in the SimplifyCFGOpt class.
2023-04-27 17:05:14 +01:00
Christian Ulmann
a8dd375cbf [PGO] Move CFGMST.h into the include directory
This commit moves the CFGMST.h file into the include directory. The
implemented algorithm can be helpful for downstream projects that
want to use the PGO data in a non-standard way.

Reviewed By: gysit

Differential Revision: https://reviews.llvm.org/D149336
2023-04-27 14:11:04 +00:00
Zhongyunde
90d30fde12 [InstCombine] Add frozen for the condition value of SelectInst
If the condition value of a SelectInst may be poison or undef,
inferring a constant range at the SelectInst use is incorrect, similar to D143883.
Fixes https://github.com/llvm/llvm-project/issues/62401

Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D149339
2023-04-27 21:35:54 +08:00
OCHyams
bd1109307a [DebugInfo][InstCombine] Fix missing source and variable locations after foldOpIntoPhi
Reviewed By: fdeazeve

Differential Revision: https://reviews.llvm.org/D149335
2023-04-27 13:56:09 +01:00