29605 Commits

Author SHA1 Message Date
Johannes Doerfert
3c8a4c6f47 [OpenMP] Eliminate redundant barriers in the same block
Patch originally by Giorgis Georgakoudis (@ggeorgakoudis), typos and
bugs introduced later by me.

This patch allows us to remove redundant barriers if they are part
of a "consecutive" pair of barriers in a basic block with no impacted
memory effect (read or write) in-between them. Memory accesses to
local (=thread private) or constant memory are allowed to appear.
Technically we could also allow any other memory that is not used to
share information between threads, e.g., the result of a malloc that
is also not captured. However, it will be easier to do more reasoning
once the code is put into an AA. That will also allow us to look through
phis/selects reasonably. At that point we should also deal with calls,
barriers in different blocks, and other complexities.

Differential Revision: https://reviews.llvm.org/D118002
2022-02-01 01:07:50 -06:00
Johannes Doerfert
989674f110 [OpenMP] Ensure to remove noinline from all runtime functions eventually
We used to remove noinline from known OpenMP runtime functions (which
are declared in OMPKinds.td). Now we remove noinline from all functions
with the proper prefixes: __kmpc, _ZN4_OMP (= namespace omp), omp_
2022-02-01 01:07:50 -06:00
Serguei Katkov
28c5e1b760 [RS4GC] Make PointerToBase mapping be independent on call site. NFC.
PointerToBase is a mapping between potentially derived pointer to its base.
As soon as we are in SSA form if there is a base of derived pointer and it
is available at def of derived pointer, the same base will be available at any
point where derived pointer is alive.

So the mapping of derived pointer to base pointer is not a property
of a call site but the same on function level.

Reviewers: reames, yrouban
Reviewed By: reames
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D118604
2022-02-01 11:47:36 +07:00
Fangrui Song
7aaf024dac [BitcodeWriter] Fix cases of some functions
`WriteIndexToFile` is used by external projects so I do not touch it.
2022-01-31 16:46:11 -08:00
Fangrui Song
85dfe19b36 [ModuleUtils] Move EmbedBufferInModule to LLVMTransformsUtils
D116542 adds EmbedBufferInModule which introduces a layer violation
(https://llvm.org/docs/CodingStandards.html#library-layering).
See 2d5f857a1eaf5f7a806d12953c79b96ed8952da8 for detail.

EmbedBufferInModule does not use BitcodeWriter functionality and should be moved
LLVMTransformsUtils. While here, change the function case to the prevailing
convention.

It seems that EmbedBufferInModule just follows the steps of
EmbedBitcodeInModule. EmbedBitcodeInModule calls WriteBitcodeToFile but has IR
update operations which ideally should be refactored to another library.

Reviewed By: jhuber6

Differential Revision: https://reviews.llvm.org/D118666
2022-01-31 16:33:57 -08:00
Kirill Stoimenov
a5dd6c7419 [ASan] Fixed null pointer bug introduced in D112098.
Also added some more test to cover the "else if" part.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D118645
2022-01-31 21:50:10 +00:00
William S. Moses
8cb9c73609 [LoopIdiom] Keep TBAA when creating memcpy/memmove
When upgrading a loop of load/store to a memcpy, the existing pass does not keep existing aliasing information. This patch allows existing aliasing information to be kept.

Reviewed By: jeroen.dobbelaere

Differential Revision: https://reviews.llvm.org/D108221
2022-01-31 16:28:13 -05:00
Alexey Bataev
afaaecc88c [SLP]Alternate vectorization for cmp instructions.
Added support for alternate ops vectorization of the cmp instructions.
It allows to vectorize either cmp instructions with same/swapped
predicate but different (swapped) operands kinds or cmp instructions
with different predicates and compatible operands kinds.

Differential Revision: https://reviews.llvm.org/D115955
2022-01-31 11:11:25 -08:00
Jay Foad
8faad29634 Revert "[Local] invertCondition: try modifying an existing ICmpInst"
This reverts commit a6b54ddaba2d5dc0f72dcc4591c92b9544eb0016.

Apparently it is not safe to modify the condition even if it passes the
hasOneUse test, because StructurizeCFG might have other references to
the condition that are not manifest in the IR use-def chains.
2022-01-31 14:55:36 +00:00
Jay Foad
a6b54ddaba [Local] invertCondition: try modifying an existing ICmpInst
This avoids various cases where StructurizeCFG would otherwise insert an
xor i1 instruction, and it since it generally runs late in the pipeline,
instcombine does not clean up the xor-of-cmp pattern.

Differential Revision: https://reviews.llvm.org/D118478
2022-01-31 10:44:17 +00:00
Nikita Popov
4810051a82 [Inline][Cloning] Reliably remove unreachable blocks during cloning (PR53206)
The pruning cloner already tries to remove unreachable blocks. The
original cloning process will simplify instructions and constant
terminators, and only clone blocks that are reachable at that point.
However, phi nodes can only be simplified after everything has been
cloned. For that reason, additional blocks may become unreachable
after phi simplification.

The code does try to handle this as well, but only removes blocks
that don't have predecessors. It misses unreachable cycles. This
can cause issues if SEH exception handling code is part of an
unreachable cycle, as the inliner is not prepared to deal with that.

This patch instead performs an explicit scan for reachable blocks,
and drops everything else.

Fixes https://github.com/llvm/llvm-project/issues/53206.

Differential Revision: https://reviews.llvm.org/D118449
2022-01-31 09:31:34 +01:00
Max Kazantsev
70b3beb0e2 [InstCombine] Generalize and-reduce pattern to handle ne case as well as eq
Following Sanjay's proposal from discussion in D118317, this patch
generalizes and-reduce handling to fold the following pattern
```
  icmp ne (bitcast(icmp ne (lhs, rhs)), 0)
```
into
```
  icmp ne (bitcast(lhs), bitcast(rhs))
```

https://alive2.llvm.org/ce/z/WDcuJ_

Differential Revision: https://reviews.llvm.org/D118431
Reviewed By: lebedev.ri
2022-01-31 12:14:08 +07:00
Ricky Zhou
30ac5f9e64 [InstCombine] Do not combine atomic and non-atomic loads
Before this change, InstCombine was willing to fold atomic and
non-atomic loads through a PHI node as long as the first PHI argument
is not an atomic load. The combined load would be non-atomic, which is
incorrect.

Fix this by only combining the loads in a PHI node when all of the
arguments are non-atomic loads.

Thanks to Eli Friedman for pointing out the bug at
https://github.com/llvm/llvm-project/issues/50777#issuecomment-981045342!

Fixes #50777

Differential Revision: https://reviews.llvm.org/D115113
2022-01-30 10:05:11 -05:00
Ricky Zhou
de80b53d1a [InstCombine] Use range for loops (NFC)
Preliminary clean-up for D115113

Differential Revision: https://reviews.llvm.org/D116086
2022-01-30 09:10:39 -05:00
Ricky Zhou
4aabed05a8 [InstCombine] Uppercase some variable names (NFC)
Uppercase some variable names, per LLVM coding standards. This change
intentionally does not rename every miscased variable, as a follow-up
change ( D116086 ) intends to eliminate many of those by switching
loops to range for loops.

Differential Revision: https://reviews.llvm.org/D118553
2022-01-30 09:10:39 -05:00
Florian Hahn
8f12175fed
[VPlan] Use VPlan to check if only the first lane is used.
This removes the remaining dependence on LoopVectorizationCostModel from
buildScalarSteps and is required so it can be moved out of ILV.

It also improves allows us to remove a few unneeded instructions.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D116554
2022-01-30 13:07:29 +00:00
Nuno Lopes
dd995aceda [InstCombine] remove incorrect gep(x, undef) -> undef optimization
gep(x, undef) carries the provenance of x, so we can't replace it with any
pointer like undef.
This leaves room for improvement for the poison case, but that's currently
not possible as the demanded bits API doesn't distinguish between undef &
poison bits.

Fixes #44790
2022-01-30 11:34:32 +00:00
Nuno Lopes
f1c18acb07 [NewGVN] do phi(undef, x) -> x only if x is not poison
phi([undef, A], [x, B]) -> x is only correct x is guaranteed to be
a non-poison value.
Otherwise we would be changing an undef to poison in the branch A.

Differential Revision: https://reviews.llvm.org/D117907
2022-01-29 21:43:57 +00:00
Florian Hahn
efd4938723
[VPlan] Handle IV vector splat using VPWidenCanonicalIV.
This patch tries to use an existing VPWidenCanonicalIVRecipe
instead of creating another step-vector for canonical
induction recipes in widenIntOrFpInduction.

This has the following benefits:

 1. First step to avoid setting both vector and scalar values for the
    same induction def.
 2. Reducing complexity of widenIntOrFpInduction through making things
    more explicit in VPlan
 3. Only need to splat the vector IV for block in masks.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D116123
2022-01-29 16:25:27 +00:00
Max Kazantsev
3b194ca7ab Recommit "[InstCombine] Fold and-reduce idiom"
Checks of original vector types made more thorough.

Differential Revision: https://reviews.llvm.org/D118317
2022-01-29 11:27:48 +07:00
Philip Reames
6888081e32 [SLP] Use moveBefore to simplify code [NFC] 2022-01-28 12:44:07 -08:00
Ahmed Bougacha
634ca7349d [ObjCARC] Require the function argument in the clang.arc.attachedcall bundle.
Currently, the clang.arc.attachedcall bundle takes an optional function
argument.  Depending on whether the argument is present, calls with this
bundle have the following semantics:

- on x86, with the argument present, the call is lowered to:
    call _target
    mov rax, rdi
    call _objc_retainAutoreleasedReturnValue

- on AArch64, without the argument, the call is lowered to:
    bl _target
    mov x29, x29

  and the objc runtime call is expected to be emitted separately.

That's because, on x86, the objc runtime checks for both the mov and
the call on x86, and treats the combination as the ARC autorelease elision
marker.

But on AArch64, it only checks for the dedicated NOP marker, as that's
historically been sufficiently unique.  Thanks to that, the runtime call
wasn't required to be adjacent to the NOP marker, so it wasn't emitted
as part of the bundle sequence.

This patch unifies both architectures: on AArch64, we now emit all
3 instructions for the bundle.  This guarantees that the runtime call
is adjacent to the marker in the sequence, and that's information the
runtime can use to further optimize this.

This helps simplify some of the handling, in particular
BundledRetainClaimRVs, which no longer needs to know whether the bundle
is sufficient or not: it now always should be.

Note that this does not include an AutoUpgrade for the nullary bundles,
as they are only produced in ObjCContract as part of the obj/asm emission
pipeline, and are not expected to be in bitcode.

Differential Revision: https://reviews.llvm.org/D118214
2022-01-28 12:41:45 -08:00
Philip Reames
746e435ff7 Revert "[SLP] Add a clarifying assert in block scheduling [NFC]"
This reverts commit db49a78900f5e4b59714565876b5dbb5e2dfe840.  The reasoning in the patch applied to a downstream branch, and I got myself confused when trying to split apart pieces.  Thankfully, the assert was simply weaker than the actual invariant currently upstream which is that ReadyInsts is not empty.
2022-01-28 12:10:31 -08:00
Andrew Litteken
3785c1d055 [IRSim][IROutliner] Allowing Intrinsic Calls to be Used in Similarity Matching and Outlined Regions
Due to some complications with lifetime, and assume-like intrinsics, intrinsics were not included as outlinable instructions. This patch opens up most intrinsics, excluding lifetime and assume-like intrinsics, to be outlined. For similarity, it is required that the intrinsic IDs, and the intrinsics names match exactly, as well as the function type. This puts intrinsics in a different class than normal call instructions (https://reviews.llvm.org/D109448), where the name will no longer have to match.

This also adds an additional command line flag debug option to disable outlining intrinsics.

Recommit of: 8de76bd569732acae6a10fdcb0152a49f7d4cd39
Adds extra checking of intrinsic function calls names to avoid taking the address of intrinsic calls when extracting function calls.

Reviewers: paquette, jroelofs

Differential Revision: https://reviews.llvm.org/D109450
2022-01-28 13:52:21 -06:00
Philip Reames
db49a78900 [SLP] Add a clarifying assert in block scheduling [NFC]
The fact we could have a block with a valid scheduling window, but nothing to schedule was surprising to me.  After digging through the code, this can only happen if we don't find anything to directly vectorize.  However, the reduction handling code relies on this mode, so we can't simply consider such trees unvectorizeable.  The assert conveys both that this situation can happen, but also that it can *only* happen for an immediate gather.

Context: We built the bundle before deciding that vectorization of a bundle is possible.  A side effect of bundle construction is manipulating the scheduling window, so a bundle which isn't vectorizable can cause the creation or expansion of a scheduling window.
2022-01-28 11:08:59 -08:00
Ellis Hoag
eea002a9c4 [InstrProf][NFC] Move function out of InstrProf.h
`createIRLevelProfileFlagVar()` seems to be only used in
`PGOInstrumentation.cpp` so we move it to that file. Then it can also
take advantage of directly using options rather than passing them as
arguments.

Reviewed By: kyulee, phosek

Differential Revision: https://reviews.llvm.org/D118097
2022-01-28 09:24:26 -08:00
Alexey Bataev
cec8b614f3 [SLP]Do not reorder top nodes if they do not require reordering.
No need to reorder the top nodes, if they are not stores or
insertelement instructions and each node should be analized only
once, when the bottom-to-top analysis is performed.
We still endup with extractelements for the top node scalars and
the final shuffle just adds an extra cost and currently
crashes the compiler for PHI nodes.

Differential Revision: https://reviews.llvm.org/D116760
2022-01-28 09:16:18 -08:00
Nikita Popov
7d176844d0 [CodeExtractor] Fix warning in assert (NFC) 2022-01-28 16:33:34 +01:00
Nikita Popov
cf0357a545 [BasicBlockUtils] Fix typo in API name (NFC)
detatch -> detach. As this requires touching all uses, also
lower-case it in accordance with the style guide.
2022-01-28 16:32:13 +01:00
Nikita Popov
0ebbf3435f [ArgPromotion] Don't assume all entry block instrs are executed
We should abort this walk if we hit any instruction that is not
guaranteed to transfer.
2022-01-28 16:08:42 +01:00
Nikita Popov
8b36c437df [ArgPromotion] Make areFunctionArgsABICompatible() static (NFC)
This function used to be shared with the Attributor, but can now
be made private.
2022-01-28 15:26:36 +01:00
Hans Wennborg
fabaca10b8 Revert "[InstCombine] Fold and-reduce idiom"
It causes builds to fail with

llvm/include/llvm/Support/Casting.h:269:
typename llvm::cast_retty<X, Y*>::ret_type llvm::cast(Y*)
[with X = llvm::IntegerType; Y = const llvm::Type; typename llvm::cast_retty<X, Y*>::ret_type = const llvm::IntegerType*]:
Assertion `isa<X>(Val) && "cast<Ty>() argument of incompatible type!"' failed.

See the code review for link to a reproducer.

> This patch introduces folding of and-reduce idiom and generates code
> that is easier to read and which is lest costly in terms of icmp operations.
> The folding is
> ```
>   icmp eq (bitcast(icmp ne (lhs, rhs)), 0)
> ```
> into
> ```
>   icmp eq(bitcast(lhs), bitcast(rhs))
> ```
>
> See PR53419.
>
> Differential Revision: https://reviews.llvm.org/D118317
> Reviewed By: lebedev.ri, spatel

This reverts commit 8599bb0f26738ed88aae62aba57d82f7cf326cf9.

This also revertes the dependent change:

"[Test] Add 'ne' tests for and-reduce pattern folding"

This reverts commit a4aaa5995308ac2ba1bf180c9ce9c321cdb9f28a.
2022-01-28 12:16:03 +01:00
Florian Hahn
b339bbdb19
[Matrix] Use ArrayType for allocas instead of VectorType.
When creating an alloca to copy a matrix due to memory conflicts, those
allocas used to use VectorTypes, which forced them to have huge
alignments for large vectors.

This patch updates LowerMatrixIntrinsics to use a corresponding array
type, like Clang already does, to get more manageable alignments.

Reviewed By: anemet, thegameg

Differential Revision: https://reviews.llvm.org/D118239
2022-01-28 10:47:52 +00:00
Nikita Popov
91e5096d82 [InlineFunction] Use phis() iterator (NFC) 2022-01-28 10:36:28 +01:00
Florian Hahn
96400f179f
[VPlan] Record whether scalar IVs are need in induction recipe. (NFC)
This explicitly records whether a scalar IV is needed in the
VPWidenIntOrFpInductionRecipe, to remove a dependence on the cost-model
during its ::execute.

It will also be used in D116123 to determine if a vector phi will be
generated.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D118167
2022-01-28 09:34:03 +00:00
Max Kazantsev
8599bb0f26 [InstCombine] Fold and-reduce idiom
This patch introduces folding of and-reduce idiom and generates code
that is easier to read and which is lest costly in terms of icmp operations.
The folding is
```
  icmp eq (bitcast(icmp ne (lhs, rhs)), 0)
```
into
```
  icmp eq(bitcast(lhs), bitcast(rhs))
```

See PR53419.

Differential Revision: https://reviews.llvm.org/D118317
Reviewed By: lebedev.ri, spatel
2022-01-28 11:20:08 +07:00
Ellis Hoag
11d3074267 [InstrProf] Add single byte coverage mode
Use the llvm flag `-pgo-function-entry-coverage` to create single byte "counters" to track functions coverage. This mode has significantly less size overhead in both code and data because
  * We mark a function as "covered" with a store instead of an increment which generally requires fewer assembly instructions
  * We use a single byte per function rather than 8 bytes per block

The trade off of course is that this mode only tells you if a function has been covered. This is useful, for example, to detect dead code.

When combined with debug info correlation [0] we are able to create an instrumented Clang binary that is only 150M (the vanilla Clang binary is 143M). That is an overhead of 7M (4.9%) compared to the default instrumentation (without value profiling) which has an overhead of 31M (21.7%).

[0] https://groups.google.com/g/llvm-dev/c/r03Z6JoN7d4

Reviewed By: kyulee

Differential Revision: https://reviews.llvm.org/D116180
2022-01-27 17:38:55 -08:00
Vitaly Buka
bddc814b44 [msan] Copy origin of byval arguments
Depends on D117278

Reviewed By: kda, eugenis

Differential Revision: https://reviews.llvm.org/D117285
2022-01-27 16:24:07 -08:00
Florian Hahn
9fd7a2e379
[ConstraintElimination] Use constraints with 0 or 1 coefficients.
isConditionImplied is able to correctly handle 0 or 1 coefficients, so
let it handle those cases, rather than skipping them.
2022-01-27 18:41:33 +00:00
Florian Hahn
258a0a3a55
[ConstraintElimination] Use simplified constraint for == 0.
When checking x == 0, checking x u<= 0 is sufficient and simpler than
x u>= 0 && x u<= 0.

https://alive2.llvm.org/ce/z/btM7d3

    ----------------------------------------
    define i1 @src(i4 %a) {
    %0:
      %c = icmp eq i4 %a, 0
      ret i1 %c
    }
    =>
    define i1 @tgt(i4 %a) {
    %0:
       %c = icmp ule i4 %a, 0
       ret i1 %c
    }
    Transformation seems to be correct!
2022-01-27 13:31:23 +00:00
Florian Hahn
a78ce48c37
[ConstraintElimination] Introduce struct to manage constraints. (NFC)
This patch adds a struct to manage a list of constraints. It simplifies
a follow-up change, that adds pre-conditions that must hold before a
list of constraints can be used.
2022-01-27 12:40:09 +00:00
Nikita Popov
d839afe3f9 [InstCombine] Avoid pointer element type access in PointerReplacer
This code replaces the address space of the pointers while keeping
the element type. Use the appropriate helpers to make this work
with opaque pointers.
2022-01-27 12:28:32 +01:00
Nikita Popov
648faa3b5d [InstCombine] Mark element type access as non-opaque (NFC)
Also make the function static to make it more obvious that it is
only used in the one place.
2022-01-27 11:40:29 +01:00
Florian Hahn
bb5c1b0691
[LoopVersioning] Use IRBuilder for OR simplification. 2022-01-27 09:55:51 +00:00
Nikita Popov
2c736f666b [InstCombine] Skip GEP of bitcast transform with opaque pointers
This transform is fundamentally incompatible with opaque pointers.
Usually we would not hit it anyway because the bitcast is folded
away earlier, but due to worklist order it might survive until
here, so make sure we bail out explicitly.
2022-01-27 10:51:45 +01:00
Nikita Popov
b7179d9279 [InstCombine] Extract GEP of bitcast folds into separate function (NFC) 2022-01-27 10:48:00 +01:00
Nikita Popov
73cd8e29ad [InstCombine] Skip PromoteCastOfAllocation() transform under opaque pointers
I think this can't be hit anyway (because a ptr-to-ptr bitcast would
get folded earlier), but in the interest of being explicit skip
this transform for opaque pointers entirely.
2022-01-27 10:25:45 +01:00
Nikita Popov
8d992862a0 [InstCombine] Remove some pointer element type accesses
One of these is guarded against opaque pointers, and the others
were accessing the call function type in a rather convoluted way.
2022-01-27 10:15:35 +01:00
Benjamin Kramer
f15014ff54 Revert "Rename llvm::array_lengthof into llvm::size to match std::size from C++17"
This reverts commit ef8206320769ad31422a803a0d6de6077fd231d2.

- It conflicts with the existing llvm::size in STLExtras, which will now
  never be called.
- Calling it without llvm:: breaks C++17 compat
2022-01-26 16:55:53 +01:00
serge-sans-paille
ef82063207 Rename llvm::array_lengthof into llvm::size to match std::size from C++17
As a conquence move llvm::array_lengthof from STLExtras.h to
STLForwardCompat.h (which is included by STLExtras.h so no build
breakage expected).
2022-01-26 16:17:45 +01:00