Remove LLVM flag -experimental-assignment-tracking. Assignment tracking is
still enabled from Clang with the command line -Xclang
-fexperimental-assignment-tracking which tells Clang to ask LLVM to run the
pass declare-to-assign. That pass converts conventional debug intrinsics to
assignment tracking metadata. With this patch it now also sets a module flag
debug-info-assignment-tracking with the value `i1 true` (using the flag conflict
rule `Max` since enabling assignment tracking on IR that contains only
conventional debug intrinsics should cause no issues).
Update the docs and tests too.
Reviewed By: CarlosAlbertoEnciso
Differential Revision: https://reviews.llvm.org/D142027
Instead of copying just nonnull metadata, use the generic helper
to copy metadata to the new load. This helper is specifically
designed for the case where the load type may change, so it's
safe to use in this context.
In canCreateUndefOrPoison(), take not only poison-generating flags,
but also poison-generating metadata into account. The helpers are
written generically, but I believe the only case that can actually
matter is !range on calls -- !nonnull and !align are only valid on
loads, and those can create undef/poison anyway.
Unfortunately, this negatively impacts logical to bitwise and/or
conversion: For ctpop/ctlz/cttz we always attach !range metadata,
which will now block the transform, because it might introduce
poison. It would be possible to recover this regression by supporting
a ConsiderFlagsAndMetadata=false mode in impliesPoison() and clearing
flags/metadata on visited instructions.
Fixes https://github.com/llvm/llvm-project/issues/59888.
Differential Revision: https://reviews.llvm.org/D142115
Improve findDominatingLoad implementation:
1. Result is saved into gvn::AvailableValue struct
2. Search is done in extended BB (while there is a single predecessor or
limit is reached)
Differential Revision: https://reviews.llvm.org/D141680
The compiler may produce better results if it does not look for
constants, uses an extra analysis of phi nodes, looks through all tree
nodes without skipping the cases, where the very first set of nodes is
empty. Also, it tries to reshufle the nodes if it is profitable for
sure, i.e. at least 2 scalars are used for single node permutation and at
least 3 scalars are used for the permutation of 2 nodes.
Part of D110978
Differential Revision: https://reviews.llvm.org/D141512
Similar to vp_depth_first_shallow (D140512) add vp_depth_first_deep to
make existing code clearer and more compact.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D142055
The scope of DT updates are very limited when unrolling loops: the DT
should only need updating for
* new blocks added
* exiting blocks we simplified branches
This can be done manually without too much extra work.
MergeBlockIntoPredecessor also needs to be updated to support direct
DT updates.
This fixes excessive time spent in DTU for same cases. In an internal
example, time spent in LoopUnroll with this patch goes from ~200s to 2s.
It also is slightly positive for CTMark:
* NewPM-O3: -0.13%
* NewPM-ReleaseThinLTO: -0.11%
* NewPM-ReleaseLTO-g: -0.13%
Notable improvements are mafft (~ -0.50%) and lencod (~ -0.30%), with no
workload regressed.
https://llvm-compile-time-tracker.com/compare.php?from=78a9ee7834331fb4360457cc565fa36f5452f7e0&to=687e08d011b0dc6d3edd223612761e44225c7537&stat=instructions:u
Reviewed By: kuhar
Differential Revision: https://reviews.llvm.org/D141487
This patch adds a new VPBlockShallowTraversalWrapper struct to
provide graph traits specialization that do not traverse through
VPRegionBlocks. This matches the behavior of the existing traits for
plain VPBlockBase and is a step before moving the graph traits for
VPBlockBase to traverse through VPRegionBlocks to enable cross region
support in VPDominatorTree.
Depends on D140511.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D140512
Turning idempotent `atomicrmw`s into `load atomic` is perfectly legal
with respect to how the loading happens, but it may not be legal for the
whole program semantic.
Indeed, this optimization removes a store that may have some effects on
the legality of other optimizations.
Essentially, we lose some information and depending on the backend
it may or may not produce incorrect code, so don't do it!
This fixes llvm.org/PR56450.
Differential Revision: https://reviews.llvm.org/D141277
This patch drops the ZeroBehavior parameter from bit counting
functions like countLeadingZeros. ZeroBehavior specifies the behavior
when the input to count{Leading,Trailing}Zeros is zero and when the
input to count{Leading,Trailing}Ones is all ones.
ZeroBehavior was first introduced on May 24, 2013 in commit
eb91eac9fb866ab1243366d2e238b9961895612d. While that patch did not
state the intention, I would guess ZeroBehavior was for performance
reasons. The x86 machines around that time required a conditional
branch to implement countLeadingZero<uint32_t> that returns the 32 on
zero:
test edi, edi
je .LBB0_2
bsr eax, edi
xor eax, 31
.LBB1_2:
mov eax, 32
That is, we can remove the conditional branch if we don't care about
the behavior on zero.
IIUC, Intel's Haswell architecture, launched on June 4, 2013,
introduced several bit manipulation instructions, including lzcnt and
tzcnt, which eliminated the need for the conditional branch.
I think it's time to retire ZeroBehavior as its utility is very
limited. If you care about compilation speed, you should build LLVM
with an appropriate -march= to take advantage of lzcnt and tzcnt.
Even if not, modern host compilers should be able to optimize away
quite a few conditional branches because the input is often known to
be nonzero from dominating conditional branches.
Differential Revision: https://reviews.llvm.org/D141798
For the targets that have in their ABI the requirement that arguments and
return values are extended to the full register bitwidth, it is important
that calls when built also take care of this detail.
The OMPIRBuilder, AddressSanitizer, GCOVProfiling, MemorySanitizer and
ThreadSanitizer passes are with this patch hopefully now doing this properly.
Reviewed By: Eli Friedman, Ulrich Weigand, Johannes Doerfert
Differential Revision: https://reviews.llvm.org/D133949
In many cases, we can use an alias to avoid a symbolic relocations,
instead of using the public, interposable symbol. When the instrumented
function is in a COMDAT, we can use a hidden alias, and still avoid
references to discarded sections.
Previous versions of this patch allowed the compiler to name the
generated alias, but that would only be valid when the functions were
local. Since the alias may be used across TUs we use a more
deterministic naming convention, and add a ".local" suffix to the alias
name just as we do for relative vtables aliases.
https://reviews.llvm.org/rG20894a478da224bdd69c91a22a5175b28bc08ed9
removed an incorrect assertion on Mach-O which caused assertion failures in LLD.
We addressed the link errors under ThinLTO + PGO + CFI by being more
selective about which comdat functions can be given aliases.
Specifically, we now do not emit an alias in the case of a comdat
function with hidden visibility, since the alias would have the same
linkage and visibility, giving no benefit over using the symbol
directly. This also prevents LowerTypeTest from incorrectly updating the
dangling alias after GlobalOpt replaces uses, and introducing a
duplicate symbol.
Reviewed By: phosek
Differential Revision: https://reviews.llvm.org/D137982
This transform was added with 68c197f07eeae71b9b7,
and the post-commit review noted the potential
for miscompiles at narrow bitwidths.
I'm not sure how to expose the i1 nuw bug because we
already simplify that, but other cases show that
there are missing transforms to add in follow-up
patches.
This updates the VPAllSuccessorsIterator to not connect the
VPRegionBlock itself to its successors. The successors are connected to
the exit block of the region. At the moment, this doesn't change any
exisint functionality.
But the new schema ensures the following property when used for
VPDominatorTree:
1. Entry & exit blocks of regions dominate the successors of the region.
This allows for convenient checking of dominance between defs and uses
that are not defined in the same region. I will share a follow-up patch
to use it for the VPDominatorTree soon.
Depends on D140500.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D140511
We can match and capture in one statement. Also, make the
code more closely resemble the description comment by using
the constant name of an operand value.
Create a global constructor which will initialize a global table of
function pointers. For now, this is only used as a reduction technique
for llvm-reduce.
In the future this may be useful to support ifunc on systems where the
program loader doesn't natively support it.
(X * X) - (Y * Y) --> (X + Y) * (X - Y)
https://alive2.llvm.org/ce/z/BAuRCf
The no-wrap propagation could be relaxed in some cases,
but there does not seem to be an obvious rule for that.
This patch adds on to the functionality implemented
in rG42ab5dc5a5dd6c79476104bdc921afa2a18559cf,
where PHI nodes are supported in the use-def traversal
algorithm to determine if an alloca ever overwritten
in addition to a memmove/memcpy. This patch implements
the support needed by the PointerReplacer to collect
all (indirect) users of the alloca in cases where a PHI
is involved. Finally, a new PHI is defined in the replace
method which takes in replaced incoming values and
updates the WorkMap accordingly.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D136201
As a side-effect, this switchem them to use getConstantRange() rather
than getPredicateAt(). getPredicateAt() is not supposed to be more
powerful than getConstantRange() for non-equality comparisons (as
long as block values are used).
At the moment, both VPValue and VPDef have an ID used when casting via
classof. This duplication is cumbersome, because it requires adding IDs
for new recipes twice and also requires setting them twice. In a few
cases, there's only a VPDef ID and no VPValue ID, which can cause same
confusion.
To simplify things, remove the VPValue IDs for different recipes.
Instead, only retain the generic VPValue ID (= used VPValues without a
corresponding defining recipe) and VPVRecipe for VPValues that are
defined by recipes that inherit from VPValue.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D140848
This patch extends Def memory dependency with support of select
instructions to consistently handle pointer-select conversion.
Differential Revision: https://reviews.llvm.org/D141619
Similar to how `makeArrayRef` is deprecated in favor of deduction guides, do the
same for `makeMutableArrayRef`.
Once all of the places in-tree are using the deduction guides for
`MutableArrayRef`, we can mark `makeMutableArrayRef` as deprecated.
Differential Revision: https://reviews.llvm.org/D141814
Before D135808, There would be endless loop interchange posibility (no
proper priority was there in profitability check. Any profitable check
may leads to loop-interchange). With this patch, there is no endless
interchange (priority in profitable check is defined. Order of decision
is 'Cache cost' check, 'InstrOrderCost', 'Vectorization'). Corrected the
dependency checking inside isProfitableForVectorization(), corrected the
checking of bad order loops in isProfitablePerInstrOrderCost().
Reviewed By: Meinersbur, bmahjour, #loopoptwg
Differential Revision: https://reviews.llvm.org/D135808
(A s>> (BW - 1)) + (zext (A s> 0)) --> (A s>> (BW - 1)) | (zext (A != 0))
https://alive2.llvm.org/ce/z/V-nM8N
This is not the form that we currently match as m_Signum(),
but I'm not sure if one is better than the other, so there's
a follow-up patch needed either way.
For this patch, it should be better for analysis to use a
not-null test and bitwise logic rather than >0 with add.
Codegen doesn't seem significantly different on any targets
that I looked at.
Also note that none of these variants is shown in issue #60012 -
those generally include at least one 'select', so that's likely
where these patterns will end up.
There is no need to update the DT here, because there must be a unique
latch. Hence if the latch is not exiting it must directly branch back
to the original loop header and does not dominate any nodes.
Skipping a DT update here simplifies D141487.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D141810
Jump threading can replace select and unconditional branch with
conditional branch, but when doing so loses profile information.
This destructive transform can eventually lead to a performance
degradation due to folding of branches in
shouldFoldCondBranchesToCommonDestination as branch probabilities
are no longer known.
The first version was reverted due to assert caused by i32 overflow,
fixed in this version.
Patch by Roman Paukner!
Differential Revision: https://reviews.llvm.org/D138132
Reviewed By: mkazantsev
This patch introduces new type of memory dependency - Select to
consistently handle it like Def/Clobber dependency.
Differential Revision: https://reviews.llvm.org/D141619
Various places in the code where still using the VPRecipeBase:: prefix
for VPDef IDs or not prefix at all. Now that the VPDef IDs have been
moved to VPDef, use this prefix instead and consistently use it.
This code handles (icmp eq/ne (1 << Y), C) if C is a power of 2.
This case is also handled by the more general foldICmpShlConstConst
which is called before we reach foldICmpShlOne.