32643 Commits

Author SHA1 Message Date
OCHyams
4ece50737d [Assignment Tracking][NFC] Replace LLVM command line option with a module flag
Remove LLVM flag -experimental-assignment-tracking. Assignment tracking is
still enabled from Clang with the command line -Xclang
-fexperimental-assignment-tracking which tells Clang to ask LLVM to run the
pass declare-to-assign. That pass converts conventional debug intrinsics to
assignment tracking metadata. With this patch it now also sets a module flag
debug-info-assignment-tracking with the value `i1 true` (using the flag conflict
rule `Max` since enabling assignment tracking on IR that contains only
conventional debug intrinsics should cause no issues).
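
As a rough illustration of the `Max` conflict rule, the sketch below models module flags as a plain map and merges two modules by keeping the larger value. The `ModuleFlags` alias and the `mergeMax` helper are hypothetical names for this example only and are not LLVM's linker code; the point is only that `true` wins over `false` when modules are combined.

```cpp
#include <algorithm>
#include <map>
#include <string>

// Hypothetical model of module flags: flag name -> integer value.
using ModuleFlags = std::map<std::string, unsigned>;

// Merge Src into Dst. On a conflict keep the maximum value, which is the
// behavior the `Max` rule asks for.
inline void mergeMax(ModuleFlags &Dst, const ModuleFlags &Src) {
  for (const auto &Entry : Src) {
    auto It = Dst.find(Entry.first);
    if (It == Dst.end())
      Dst.emplace(Entry.first, Entry.second);
    else
      It->second = std::max(It->second, Entry.second);
  }
}
```

Merging a module that has the flag set to 1 into one that has it set to 0 (or lacks it) leaves the flag at 1 in the result.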

Update the docs and tests too.

Reviewed By: CarlosAlbertoEnciso

Differential Revision: https://reviews.llvm.org/D142027
2023-01-20 14:24:15 +00:00
Nikita Popov
d49b842ea2 [SROA] Use copyMetadataForLoad() helper
Instead of copying just nonnull metadata, use the generic helper
to copy metadata to the new load. This helper is specifically
designed for the case where the load type may change, so it's
safe to use in this context.
2023-01-20 15:24:10 +01:00
Nikita Popov
bf23b4031e [ValueTracking] Take poison-generating metadata into account (PR59888)
In canCreateUndefOrPoison(), take not only poison-generating flags,
but also poison-generating metadata into account. The helpers are
written generically, but I believe the only case that can actually
matter is !range on calls -- !nonnull and !align are only valid on
loads, and those can create undef/poison anyway.

Unfortunately, this negatively impacts logical to bitwise and/or
conversion: For ctpop/ctlz/cttz we always attach !range metadata,
which will now block the transform, because it might introduce
poison. It would be possible to recover this regression by supporting
a ConsiderFlagsAndMetadata=false mode in impliesPoison() and clearing
flags/metadata on visited instructions.

Fixes https://github.com/llvm/llvm-project/issues/59888.

Differential Revision: https://reviews.llvm.org/D142115
2023-01-20 12:18:32 +01:00
Sergey Kachkov
e1a702db2f [GVN] Refactor findDominatingLoad function
Improve findDominatingLoad implementation:
1. Result is saved into gvn::AvailableValue struct
2. Search is done in extended BB (while there is a single predecessor or
   limit is reached)
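
The "extended BB" search in item 2 can be pictured with a toy model. The sketch below is not the GVN code: `Block`, `findDominatingLoadSketch` and the integer "addresses" are hypothetical stand-ins, and the result is a plain pointer rather than a gvn::AvailableValue. It only shows the walk up a chain of single predecessors with a step limit.

```cpp
#include <vector>

// Toy block: integers stand in for the addresses loaded in the block.
struct Block {
  std::vector<int> Loads;
  std::vector<const Block *> Preds;
};

// Walk upwards while the current block has exactly one predecessor.
// Every visited block and every scanned load costs one step; give up once
// MaxSteps is exceeded. Returns the block holding a load of Addr, if any.
inline const Block *findDominatingLoadSketch(const Block *From, int Addr,
                                             unsigned MaxSteps) {
  unsigned Steps = 0;
  const Block *BB = From;
  while (BB) {
    if (++Steps > MaxSteps)
      return nullptr;
    for (auto It = BB->Loads.rbegin(); It != BB->Loads.rend(); ++It) {
      if (++Steps > MaxSteps)
        return nullptr;
      if (*It == Addr)
        return BB;
    }
    // Stop at merge points and at the entry block.
    BB = BB->Preds.size() == 1 ? BB->Preds[0] : nullptr;
  }
  return nullptr;
}
```

Counting visited blocks as steps also guarantees termination if the single-predecessor chain forms a cycle.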

Differential Revision: https://reviews.llvm.org/D141680
2023-01-20 11:54:11 +03:00
Arthur Eubanks
c5ea42bcf4 Revert "[LoopUnroll] Directly update DT instead of DTU."
This reverts commit d0907ce7ed9f159562ca3f4cfd8d87e89e93febe.

Causes `opt -passes=loop-unroll-full` to crash on

```
define void @foo() {
bb:
  br label %bb1

bb1:                                              ; preds = %bb1, %bb1, %bb
  switch i1 true, label %bb1 [
    i1 true, label %bb2
    i1 false, label %bb1
  ]

bb2:                                              ; preds = %bb1
  ret void
}
```
2023-01-19 17:01:15 -08:00
Alexey Bataev
9bdcf8778a [SLP]Improve isGatherShuffledEntry by looking deeper through the reused scalars.
The compiler may produce better results if it does not look for
constants, uses an extra analysis of phi nodes, and looks through all tree
nodes without skipping the cases where the very first set of nodes is
empty. It also tries to reshuffle the nodes only when that is clearly
profitable, i.e. at least 2 scalars are used for a single-node permutation
and at least 3 scalars are used for a permutation of 2 nodes.

Part of D110978

Differential Revision: https://reviews.llvm.org/D141512
2023-01-19 13:46:25 -08:00
Florian Hahn
e2c43a547b [VPlan] Add vp_depth_first_deep (NFC)
Similar to vp_depth_first_shallow (D140512) add vp_depth_first_deep to
make existing code clearer and more compact.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D142055
2023-01-19 20:34:23 +00:00
Arthur Eubanks
1f3f3c0ea7 Revert "Reland [pgo] Avoid introducing relocations by using private alias"
This reverts commit da5a8d14b8cc6cea16ee0929413c0672b47c93d9.

Causes more duplicate symbol errors, see https://bugs.chromium.org/p/chromium/issues/detail?id=1408161.
2023-01-19 10:20:38 -08:00
Florian Hahn
d0907ce7ed [LoopUnroll] Directly update DT instead of DTU.
The scope of DT updates is very limited when unrolling loops: the DT
should only need updating for
* new blocks added
* exiting blocks whose branches we simplified

This can be done manually without too much extra work.
MergeBlockIntoPredecessor also needs to be updated to support direct
DT updates.

This fixes excessive time spent in DTU for some cases. In an internal
example, time spent in LoopUnroll with this patch goes from ~200s to 2s.

It also is slightly positive for CTMark:
* NewPM-O3: -0.13%
* NewPM-ReleaseThinLTO: -0.11%
* NewPM-ReleaseLTO-g: -0.13%

Notable improvements are mafft (~ -0.50%) and lencod (~ -0.30%), with no
workload regressed.

https://llvm-compile-time-tracker.com/compare.php?from=78a9ee7834331fb4360457cc565fa36f5452f7e0&to=687e08d011b0dc6d3edd223612761e44225c7537&stat=instructions:u

Reviewed By: kuhar

Differential Revision: https://reviews.llvm.org/D141487
2023-01-19 18:11:54 +00:00
Nikita Popov
b3b049a824 [Local] Preserve noundef metadata in copyMetadataForLoad()
If we're only changing the type of the load, preserve the noundef
metadata.
2023-01-19 16:56:09 +01:00
Christian Ulmann
e741b8c2e5 [llvm][ir] Purge MD_prof custom accessors
This commit purges direct accesses to MD_prof metadata and replaces them
with the accessors provided from the utility file wherever possible.
This commit can be seen as the first step towards switching the branch weights to 64 bits.
See post here: https://discourse.llvm.org/t/extend-md-prof-branch-weights-metadata-from-32-to-64-bits/67492

Reviewed By: davidxl, paulkirth

Differential Revision: https://reviews.llvm.org/D141393
2023-01-19 14:26:26 +01:00
Florian Hahn
655c88ca36 [VPlan] Add vp_depth_first_shallow + graph traits for wrapper (NFC)
This patch adds a new VPBlockShallowTraversalWrapper struct to
provide a graph traits specialization that does not traverse through
VPRegionBlocks. This matches the behavior of the existing traits for
plain VPBlockBase and is a step before moving the graph traits for
VPBlockBase to traverse through VPRegionBlocks to enable cross region
support in VPDominatorTree.

Depends on D140511.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D140512
2023-01-19 12:07:27 +00:00
Quentin Colombet
6b85fa6d81 [InstCombine] Don't optimize idempotent atomicrmw <op>, 0 into load atomic
Turning idempotent `atomicrmw`s into `load atomic` is perfectly legal
with respect to how the loading happens, but it may not be legal for the
semantics of the whole program.

Indeed, this optimization removes a store that may have some effects on
the legality of other optimizations.
Essentially, we lose some information and depending on the backend
it may or may not produce incorrect code, so don't do it!

This fixes llvm.org/PR56450.

Differential Revision: https://reviews.llvm.org/D141277
2023-01-19 10:04:07 +01:00
Kazu Hirata
83d56fb17a Drop the ZeroBehavior parameter from countLeadingZeros and the like (NFC)
This patch drops the ZeroBehavior parameter from bit counting
functions like countLeadingZeros.  ZeroBehavior specifies the behavior
when the input to count{Leading,Trailing}Zeros is zero and when the
input to count{Leading,Trailing}Ones is all ones.

ZeroBehavior was first introduced on May 24, 2013 in commit
eb91eac9fb866ab1243366d2e238b9961895612d.  While that patch did not
state the intention, I would guess ZeroBehavior was for performance
reasons.  The x86 machines around that time required a conditional
branch to implement countLeadingZeros<uint32_t> that returns 32 on
zero:

        test    edi, edi
        je      .LBB0_2
        bsr     eax, edi
        xor     eax, 31
        ret
.LBB0_2:
        mov     eax, 32
        ret

That is, we can remove the conditional branch if we don't care about
the behavior on zero.
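
For readers who prefer source over assembly, here is a minimal portable sketch of the same idea. The helper names `highestSetBit` and `countLeadingZerosSketch` are made up for this example and are not the LLVM functions; the loop stands in for `bsr`, and the explicit zero test corresponds to the conditional branch.

```cpp
#include <cstdint>

// Index of the highest set bit. Like x86 `bsr`, the result is only
// meaningful when X is nonzero.
inline unsigned highestSetBit(uint32_t X) {
  unsigned Index = 0;
  while (X >>= 1)
    ++Index;
  return Index;
}

// Variant that is defined on zero: the extra test is the conditional branch.
inline unsigned countLeadingZerosSketch(uint32_t X) {
  if (X == 0)
    return 32;
  // For an index in [0, 31], index ^ 31 equals 31 - index.
  return highestSetBit(X) ^ 31;
}
```

A caller that already knows the input is nonzero could call the bit scan directly and skip the test.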

IIUC, Intel's Haswell architecture, launched on June 4, 2013,
introduced several bit manipulation instructions, including lzcnt and
tzcnt, which eliminated the need for the conditional branch.

I think it's time to retire ZeroBehavior as its utility is very
limited.  If you care about compilation speed, you should build LLVM
with an appropriate -march= to take advantage of lzcnt and tzcnt.
Even if not, modern host compilers should be able to optimize away
quite a few conditional branches because the input is often known to
be nonzero from dominating conditional branches.

Differential Revision: https://reviews.llvm.org/D141798
2023-01-18 19:58:44 -08:00
Jonas Paulsson
dc3875e468 Add parameter extension attributes in various instrumentation passes.
For the targets whose ABI requires that arguments and return values are
extended to the full register bitwidth, it is important that newly built
calls also take care of this detail.

With this patch, the OMPIRBuilder, AddressSanitizer, GCOVProfiling,
MemorySanitizer and ThreadSanitizer passes should now do this properly.

Reviewed By: Eli Friedman, Ulrich Weigand, Johannes Doerfert

Differential Revision: https://reviews.llvm.org/D133949
2023-01-18 18:29:12 -06:00
Paul Kirth
da5a8d14b8 Reland [pgo] Avoid introducing relocations by using private alias
In many cases, we can use an alias to avoid symbolic relocations,
instead of using the public, interposable symbol. When the instrumented
function is in a COMDAT, we can use a hidden alias, and still avoid
references to discarded sections.

Previous versions of this patch allowed the compiler to name the
generated alias, but that would only be valid when the functions were
local. Since the alias may be used across TUs we use a more
deterministic naming convention, and add a ".local" suffix to the alias
name just as we do for relative vtables aliases.

https://reviews.llvm.org/rG20894a478da224bdd69c91a22a5175b28bc08ed9
removed an incorrect assertion on Mach-O which caused assertion failures in LLD.

We addressed the link errors under ThinLTO + PGO + CFI by being more
selective about which comdat functions can be given aliases.
Specifically, we now do not emit an alias in the case of a comdat
function with hidden visibility, since the alias would have the same
linkage and visibility, giving no benefit over using the symbol
directly. This also prevents LowerTypeTest from incorrectly updating the
dangling alias after GlobalOpt replaces uses, and introducing a
duplicate symbol.

Reviewed By: phosek

Differential Revision: https://reviews.llvm.org/D137982
2023-01-18 23:56:35 +00:00
Sanjay Patel
1378e7d8b8 [InstSimplify] add no-wrap parameters to simplifyMul and add more tests; NFC
This gives mul the same capabilities as add/sub.
A potential improvement with nsw was noted in:
1720ec6da040729f17
2023-01-18 13:29:30 -05:00
Sanjay Patel
1720ec6da0 [InstCombine] restrict no-wrap propagation for i1/i2 to avoid miscompiles
This transform was added with 68c197f07eeae71b9b7,
and the post-commit review noted the potential
for miscompiles at narrow bitwidths.

I'm not sure how to expose the i1 nuw bug because we
already simplify that, but other cases show that
there are missing transforms to add in follow-up
patches.
2023-01-18 10:32:12 -05:00
Sanjay Patel
830ac677b7 [InstCombine] reduce code duplication in visitSub(); NFC
2023-01-18 10:17:07 -05:00
Florian Hahn
feee22db52 [VPlan] Disconnect VPRegionBlock from successors in graph iterator (NFCI)
This updates the VPAllSuccessorsIterator to not connect the
VPRegionBlock itself to its successors. The successors are connected to
the exit block of the region. At the moment, this doesn't change any
existing functionality.

But the new schema ensures the following property when used for
VPDominatorTree:

1. Entry & exit blocks of regions dominate the successors of the region.

This allows for convenient checking of dominance between defs and uses
that are not defined in the same region. I will share a follow-up patch
to use it for the VPDominatorTree soon.

Depends on D140500.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D140511
2023-01-18 15:02:41 +00:00
Florian Hahn
22c9f4cf2d [VPlan] Replace VPInterleaveRecipe::classof with VP_CLASSOF_IMPL. (NFC)
2023-01-18 14:23:22 +00:00
Sanjay Patel
c2ab7e2abd [InstCombine] simplify code for matching shift-logic-shift pattern; NFC
We can match and capture in one statement. Also, make the
code more closely resemble the description comment by using
the constant name of an operand value.
2023-01-18 08:13:37 -05:00
Florian Hahn
f615de7e26 [VPlan] Replace VPBranchOnMaskSC::classof with VP_CLASSOF_IMPL. (NFC)
2023-01-18 12:14:58 +00:00
Matt Arsenault
e7cd42f8e4 Utils: Add utility pass to lower ifuncs
Create a global constructor which will initialize a global table of
function pointers. For now, this is only used as a reduction technique
for llvm-reduce.

In the future this may be useful to support ifunc on systems where the
program loader doesn't natively support it.
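
The lowering strategy can be sketched in ordinary C++. Everything named below (`hasFastPath`, `resolveAddOne`, `IFuncTable`, `addOne`) is hypothetical and only mirrors the shape described above, not the pass's actual output: a resolver picks an implementation, a constructor that runs before `main` stores it in a global table, and former calls to the ifunc become indirect calls through that table.

```cpp
#include <cassert>

using Impl = int (*)(int);

static int addOneGeneric(int X) { return X + 1; }
static int addOneFast(int X) { return X + 1; } // pretend this is CPU-specific

// Stand-in for a CPU feature check; always false in this sketch.
static bool hasFastPath() { return false; }

// Plays the role of the ifunc resolver.
static Impl resolveAddOne() {
  return hasFastPath() ? addOneFast : addOneGeneric;
}

// Global table of function pointers, one slot per lowered ifunc.
static Impl IFuncTable[1];

// Constructor that fills the table during static initialization.
struct IFuncTableInit {
  IFuncTableInit() { IFuncTable[0] = resolveAddOne(); }
};
static IFuncTableInit InitIFuncs;

// Former direct calls to the ifunc load the pointer and call through it.
inline int addOne(int X) {
  assert(IFuncTable[0] && "table must be initialized before use");
  return IFuncTable[0](X);
}
```

Because the resolver runs once in the constructor, every later call pays only for the table load and the indirect call.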
2023-01-17 22:33:56 -05:00
Arthur Eubanks
c43f38ec63 Revert ""Reland "[pgo] Avoid introducing relocations by using private alias""
This reverts commit 6e5cbc097a5ac7fa95a8f425af8b03958151c763.

Causes link errors, see http://go/crb/1408161.
2023-01-17 15:41:26 -08:00
Florian Hahn
cdd8fcdbd7 [VPlan] Replace VPExpandSCEVRecipe::classof with VP_CLASSOF_IMPL. (NFC)
2023-01-17 21:11:33 +00:00
Florian Hahn
bf1ba6bb52 [VPlan] Replace VPScalarIVStepsRecipe::classof with VP_CLASSOF_IMPL (NFC)
2023-01-17 20:53:14 +00:00
Sanjay Patel
68c197f07e [InstCombine] factor difference-of-squares to reduce multiplication
(X * X) - (Y * Y) --> (X + Y) * (X - Y)
https://alive2.llvm.org/ce/z/BAuRCf

The no-wrap propagation could be relaxed in some cases,
but there does not seem to be an obvious rule for that.
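
As an informal sanity check alongside the Alive2 proof, the identity can be tested exhaustively at 8 bits with wrapping arithmetic. This is only a sketch: the function name is invented for the example, and it does not model the nsw/nuw flags or their propagation.

```cpp
#include <cstdint>

// Returns true if (X * X) - (Y * Y) == (X + Y) * (X - Y) for every pair of
// 8-bit values when all arithmetic wraps modulo 2^8.
inline bool differenceOfSquaresHolds() {
  for (unsigned X = 0; X < 256; ++X) {
    for (unsigned Y = 0; Y < 256; ++Y) {
      // Unsigned arithmetic wraps; truncating to uint8_t gives the i8 result.
      uint8_t Before = static_cast<uint8_t>(X * X - Y * Y);
      uint8_t After = static_cast<uint8_t>((X + Y) * (X - Y));
      if (Before != After)
        return false;
    }
  }
  return true;
}
```

The factored form uses one multiplication instead of two, which is the saving the transform is after.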
2023-01-17 14:58:40 -05:00
Anshil Gandhi
2449cbabdd [InstCombine] Handle PHI nodes in PtrReplacer
This patch adds on to the functionality implemented
in rG42ab5dc5a5dd6c79476104bdc921afa2a18559cf,
where PHI nodes are supported in the use-def traversal
algorithm to determine if an alloca is ever overwritten
in addition to a memmove/memcpy. This patch implements
the support needed by the PointerReplacer to collect
all (indirect) users of the alloca in cases where a PHI
is involved. Finally, a new PHI is defined in the replace
method which takes in replaced incoming values and
updates the WorkMap accordingly.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D136201
2023-01-17 10:56:03 -07:00
Nikita Popov
61bb549cfd [CVP] Avoid duplicate range calculation (NFC)
Calculate the range once for all the sdiv/srem transforms.
2023-01-17 16:54:51 +01:00
Nikita Popov
004e613ce4 [CVP] Avoid duplicate range calculation (NFC)
Calculate the range once and use it in processURem() and
narrowUDivOrURem().
2023-01-17 16:39:27 +01:00
Nikita Popov
a444fe07dd [CVP] Handle use-site conditions in domain-based folds
As a side-effect, this switches them to use getConstantRange() rather
than getPredicateAt(). getPredicateAt() is not supposed to be more
powerful than getConstantRange() for non-equality comparisons (as
long as block values are used).
2023-01-17 16:35:18 +01:00
Nikita Popov
5c38c6a3aa [CVP] Handle use-site conditions in more folds
2023-01-17 16:14:55 +01:00
Florian Hahn
d47bdae28e [VPlan] Remove duplicated VPValue IDs (NFCI).
At the moment, both VPValue and VPDef have an ID used when casting via
classof. This duplication is cumbersome, because it requires adding IDs
for new recipes twice and also requires setting them twice. In a few
cases, there's only a VPDef ID and no VPValue ID, which can cause some
confusion.

To simplify things, remove the VPValue IDs for different recipes.
Instead, only retain the generic VPValue ID (= used VPValues without a
corresponding defining recipe) and VPVRecipe for VPValues that are
defined by recipes that inherit from VPValue.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D140848
2023-01-17 15:11:38 +00:00
luxufan
0ad5909958 [InstCombine] Don't combine smul of i1 type constant one
Fixes: https://github.com/llvm/llvm-project/issues/59876

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D141214
2023-01-17 22:04:48 +08:00
Florian Hahn
c95138392a [VPlan] Remove unnecessary getNumSuccessors call (NFC).
If ParentWithSuccs is nullptr, the number of successors is guaranteed to
be 0. Simplify the code as suggested by @Ayal in D140511.
2023-01-17 11:44:50 +00:00
Florian Hahn
133f017479 [VPlan] Remove unneeded VPUser::classof(const VPDef *) (NFC).
This specialization is not needed any longer as VPRecipeBase inherits
from VPUser and getDefiningRecipe returns a VPRecipeBase.
2023-01-17 09:08:33 +00:00
Sergey Kachkov
bfd2dd49ff [GVN] Refactor handling of pointer-select in GVN pass
This patch extends Def memory dependency with support of select
instructions to consistently handle pointer-select conversion.

Differential Revision: https://reviews.llvm.org/D141619
2023-01-17 11:32:06 +03:00
Joe Loser
a288d7f937 [llvm][ADT] Replace uses of makeMutableArrayRef with deduction guides
Similar to how `makeArrayRef` is deprecated in favor of deduction guides, do the
same for `makeMutableArrayRef`.

Once all of the places in-tree are using the deduction guides for
`MutableArrayRef`, we can mark `makeMutableArrayRef` as deprecated.
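
To show what "using deduction guides" means in practice, here is a small C++17 sketch built around a toy `MutableSpan` class. It is a hypothetical stand-in and not LLVM's `MutableArrayRef`; `makeMutableSpan` plays the role of the factory function being phased out.

```cpp
#include <cstddef>

// Toy mutable view over a contiguous range of T.
template <typename T> class MutableSpan {
public:
  MutableSpan(T *Data, std::size_t Size) : Data(Data), Size(Size) {}
  template <std::size_t N>
  MutableSpan(T (&Arr)[N]) : Data(Arr), Size(N) {}

  T &operator[](std::size_t I) const { return Data[I]; }
  std::size_t size() const { return Size; }

private:
  T *Data;
  std::size_t Size;
};

// Deduction guides: let the compiler work out T from constructor arguments.
template <typename T> MutableSpan(T *, std::size_t) -> MutableSpan<T>;
template <typename T, std::size_t N> MutableSpan(T (&)[N]) -> MutableSpan<T>;

// Old-style factory that exists only to deduce T for the caller.
template <typename T>
MutableSpan<T> makeMutableSpan(T *Data, std::size_t Size) {
  return MutableSpan<T>(Data, Size);
}
```

With the guides in place, `MutableSpan Span(Values);` on an `int Values[3]` array deduces `MutableSpan<int>` directly, so a call such as `makeMutableSpan(Values, 3)` has no remaining purpose.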

Differential Revision: https://reviews.llvm.org/D141814
2023-01-16 14:49:37 -07:00
Ram-NK
ee7188c8b2 [LoopInterchange] Correcting the profitability check
Before D135808, loop interchange could repeat endlessly: the
profitability check had no proper priority, so any check that reported a
profit could lead to an interchange. With this patch there is no endless
interchange, because the profitability checks now have a defined priority.
The order of decision is the 'Cache cost' check, then 'InstrOrderCost', then
'Vectorization'. Also corrected the dependency checking inside
isProfitableForVectorization() and the checking of bad-order loops in
isProfitablePerInstrOrderCost().
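
One way to picture a prioritized profitability decision is sketched below. It assumes, purely for illustration, that each rule can answer "profitable", "not profitable" or "no opinion"; the `Verdict` alias and `isProfitableSketch` are hypothetical and are not taken from the patch.

```cpp
#include <initializer_list>
#include <optional>

// true = profitable, false = not profitable, nullopt = no opinion.
using Verdict = std::optional<bool>;

// Consult the rules in priority order: cache cost first, then instruction
// order cost, then vectorization. The first rule with an opinion decides.
inline bool isProfitableSketch(Verdict CacheCost, Verdict InstrOrderCost,
                               Verdict Vectorization) {
  for (Verdict V : {CacheCost, InstrOrderCost, Vectorization})
    if (V.has_value())
      return *V;
  return false; // No rule had an opinion: do not interchange.
}
```

Because a lower-priority rule can never override a higher-priority one, two rules that disagree cannot keep flipping the same loop nest back and forth.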

Reviewed By: Meinersbur, bmahjour, #loopoptwg

Differential Revision: https://reviews.llvm.org/D135808
2023-01-16 14:36:06 -05:00
Sanjay Patel
dedc58da49 [InstCombine] canonicalize a signum (spaceship) that ends in add
(A s>> (BW - 1)) + (zext (A s> 0)) --> (A s>> (BW - 1)) | (zext (A != 0))

https://alive2.llvm.org/ce/z/V-nM8N
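
Alongside the Alive2 proof, the equivalence can be checked exhaustively at 8 bits. In this sketch the function name is invented, and `A < 0 ? -1 : 0` is used as a stand-in for the arithmetic shift `A s>> (BW - 1)` to avoid relying on how a given C++ standard defines right shifts of negative values.

```cpp
#include <cstdint>

// Returns true if both signum forms agree for every 8-bit value of A.
inline bool signumFormsAgree() {
  for (int V = -128; V <= 127; ++V) {
    int8_t A = static_cast<int8_t>(V);
    // A s>> (BW - 1): all ones for negative A, zero otherwise.
    int8_t SignMask = static_cast<int8_t>(A < 0 ? -1 : 0);
    // Original form: add the zero-extended (A s> 0) bit.
    int8_t WithAdd = static_cast<int8_t>(SignMask + (A > 0 ? 1 : 0));
    // Canonical form: or in the zero-extended (A != 0) bit.
    int8_t WithOr = static_cast<int8_t>(SignMask | (A != 0 ? 1 : 0));
    if (WithAdd != WithOr)
      return false;
  }
  return true;
}
```

Both forms produce -1, 0 or 1; for negative A the `or` with 1 changes nothing because the mask is already all ones.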

This is not the form that we currently match as m_Signum(),
but I'm not sure if one is better than the other, so there's
a follow-up patch needed either way.

For this patch, it should be better for analysis to use a
not-null test and bitwise logic rather than >0 with add.
Codegen doesn't seem significantly different on any targets
that I looked at.

Also note that none of these variants is shown in issue #60012 -
those generally include at least one 'select', so that's likely
where these patterns will end up.
2023-01-16 12:47:21 -05:00
Guillaume Chatelet
135f23d67b Deprecate MemIntrinsicBase::getDestAlignment() and MemTransferBase::getSourceAlignment()
Differential Revision: https://reviews.llvm.org/D141840
2023-01-16 14:22:03 +00:00
Florian Hahn
a6549718d9 [LoopUnroll] Don't update DT for changeToUnreachable.
There is no need to update the DT here, because there must be a unique
latch. Hence if the latch is not exiting it must directly branch back
to the original loop header and does not dominate any nodes.

Skipping a DT update here simplifies D141487.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D141810
2023-01-16 12:25:34 +00:00
Sergey Kachkov
868abc471d Revert "[GVN] Refactor handling of pointer-select in GVN pass"
This reverts commit fc7cdaa373308ce3d72218b4d80101ae19850a6c.
2023-01-16 15:13:17 +03:00
Max Kazantsev
82cee24e3d [JumpThreading] Preserve profile metadata during select unfolding, take 2
Jump threading can replace a select and an unconditional branch with a
conditional branch, but doing so loses profile information.

This destructive transform can eventually lead to a performance
degradation due to folding of branches in
shouldFoldCondBranchesToCommonDestination as branch probabilities
are no longer known.

The first version was reverted because of an assertion failure caused by
i32 overflow, which is fixed in this version.

Patch by Roman Paukner!

Differential Revision: https://reviews.llvm.org/D138132
Reviewed By: mkazantsev
2023-01-16 19:04:23 +07:00
Sergey Kachkov
fc7cdaa373 [GVN] Refactor handling of pointer-select in GVN pass
This patch introduces a new type of memory dependency, Select, to
handle it consistently, like the Def/Clobber dependencies.

Differential Revision: https://reviews.llvm.org/D141619
2023-01-16 14:12:28 +03:00
Florian Hahn
56ffd39c3d [VPlan] Use VPDef prefix for VPDef IDs instead of VPRecipeBase (NFC).
Various places in the code were still using the VPRecipeBase:: prefix
for VPDef IDs or no prefix at all. Now that the VPDef IDs have been
moved to VPDef, use this prefix instead and use it consistently.
2023-01-16 10:23:52 +00:00
Craig Topper
8e317e693a [InstCombine] Remove dead code from foldICmpShlOne. NFC
This code handles (icmp eq/ne (1 << Y), C) if C is a power of 2.

This case is also handled by the more general foldICmpShlConstConst
which is called before we reach foldICmpShlOne.
2023-01-15 19:10:17 -08:00
Benjamin Kramer
db6961db7a [FunctionComparator] Clamp StringRef compare output to [-1,1]
The comparison can have different values (but same sign) on big endian
platforms, avoid that to make the unit test green there.
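
A minimal sketch of the clamping idea follows. It uses `std::string::compare` as a stand-in for the comparison in question, and both `clampCompare` and `compareNames` are hypothetical helper names; the point is only that any negative, zero or positive result is reduced to -1, 0 or 1.

```cpp
#include <string>

// Reduce any three-way comparison result to -1, 0 or 1.
inline int clampCompare(int Result) {
  return (Result > 0) - (Result < 0);
}

// Compare two strings and report only the sign of the result, so the value
// is the same no matter which magnitude the underlying compare returns.
inline int compareNames(const std::string &L, const std::string &R) {
  return clampCompare(L.compare(R));
}
```

A test that expects exactly -1 or 1 then passes regardless of the magnitude the platform's comparison produces.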
2023-01-16 01:44:55 +01:00
Craig Topper
77f2f34d69 [InstCombine] Generalize (icmp sgt (1 << Y), -1) -> (icmp ne Y, BitWidth-1) to any negative constant.
Similar for the sle version which will be canonicalized to slt first.
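
The sgt fold can also be checked exhaustively at 8 bits, as an informal companion to the Alive2 proofs. The function name is made up for this sketch, it covers only in-range shift amounts and the sgt form, and it assumes the usual two's-complement result when converting 128 to `int8_t`.

```cpp
#include <cstdint>

// Returns true if (icmp sgt (1 << Y), C) matches (icmp ne Y, BitWidth - 1)
// for every shift amount Y in [0, 7] and every negative 8-bit constant C.
inline bool shlOneSgtNegativeFoldHolds() {
  for (unsigned Y = 0; Y < 8; ++Y) {
    // 1 << Y as a signed 8-bit value; Y == 7 yields -128.
    int8_t Shl = static_cast<int8_t>(static_cast<uint8_t>(1u << Y));
    for (int C = -128; C < 0; ++C)
      if ((Shl > C) != (Y != 7))
        return false;
  }
  return true;
}
```

The intuition is that `1 << Y` is positive, and so greater than any negative constant, unless the set bit lands in the sign position, where the value becomes the minimum signed integer.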

Alive2 proof as implemented: https://alive2.llvm.org/ce/z/_YawdM

@spatel's original Alive2: https://alive2.llvm.org/ce/z/3YB2vs

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D141773
2023-01-15 13:36:57 -08:00