This is a test case for D139582.
In this test case, %v4 and %v5 can be moved to predecessors, %v3 can be changed to a PHI instruction.
Differential Revision: https://reviews.llvm.org/D140234
Target-extension types represent types that need to be preserved through
optimization, but otherwise are not introspectable by target-independent
optimizations. This patch doesn't add any uses of these types by an existing
backend, it only provides basic infrastructure such that these types would work
correctly.
Reviewed By: nikic, barannikov88
Differential Revision: https://reviews.llvm.org/D135202
Should cover most of the tests for GVN, GVNHoist, GVNSink, GlobalOpt,
GlobalSplit, InstCombine, Reassociate, SROA and TailCallElim that
had not been updated earlier.
If PRE is performed as part of the main GVN pass (to PRE GEP
operands before processing loads), and it is performed across a
backedge, we will end up adding the new instruction to the leader
table of a block that has not yet been processed. When it will be
processed, GVN will incorrectly assume that the value is already
available, even though it is only available at the end of the
block.
Avoid this by not performing PRE across backedges.
Fixes https://github.com/llvm/llvm-project/issues/58418.
Differential Revision: https://reviews.llvm.org/D136095
This is the first patch in a series intended for removing flag
-enable-new-pm=0 from lit tests. This is part of a bigger
effort of completely removing legacy code related to legacy
pass manager in favor of currently default new pass manager.
In this patch flag has been removed only from tests where no significant
change has been required because checks has been duplicated for
both PMs.
Reviewed By: fhahn
Differential Revision: https://reviews.llvm.org/D134150
IntrArgMemOnly is only valid for intrinsics that use a scalar
pointer argument. These intrinsics use a vector of pointer.
Alias analysis will try to find a scalar pointer argument and
will return incorrect alias results when it doesn't find one.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D133898
This is a fix for PR57025 and an alternative to D131776. The problem
in the phi-translation-to-wrong-context.ll test case is that phi
translation of %gep.j into if2 pick %gep.i as the result. While this
instruction has the correct pointer address, it occurs in a context
where %i != 0. As such, we get a NoAlias result for the store in
if2, even though they do alias for %i == 0 (which is legal in the
original context of the pointer).
PHITranslateValue already has a MustDominate option, which can be
used to restrict PHI translation results to values that dominate the
translated-into block. However, this is more aggressive than what we
need and would significantly regress GVN results. In particular, if
we have a pointer value that does not require any translation, then
it is fine to continue using that value in the predecessor, because
the context is still correct for the original query. We only run into
problems if PHITranslateSubExpr() picks a completely random
instruction in a context that may have preconditions that do not hold.
Fix this by always performing the dominance checks in
PHITranslateSubExpr(), without enabling the more general MustDominate
requirement.
Fixes https://github.com/llvm/llvm-project/issues/57025. This also
fixes the test case for https://github.com/llvm/llvm-project/issues/30999,
but I'm not sure whether that's just the particular test case,
or a general solution to the problem.
Differential Revision: https://reviews.llvm.org/D132935
This commit adds a reproducer for
https://github.com/llvm/llvm-project/issues/57025
showing a miscompile in GVN.
Not sure how likely this kind of faults would be in a normal pipeline,
considering that the input IR has some dead code in it. On the other
hand, GVN itself sometimes creates dead basic blocks when splitting
critical edges. Anyway, the fault was found when doing fuzzy testing
using random pass pipelines.
Differential Revision: https://reviews.llvm.org/D131775
As my goal is to remove at least _some_ functions from the static list
in MemoryBuiltins.cpp, these tests either need to run inferattrs or
statically declare these attributes to keep passing. A couple of tests
had alternate cases which are no longer meaningful, e.g.
`malloc-load-removal.ll`.
Differential Revision: https://reviews.llvm.org/D123087
The code in this `#if 0` block appears to be a net benefit. Put it
behind a switch defaulting to off to support experimentation and as a
request for comment.
The codegen impact of enabling this that I'm currently persuing is that
it allows PRE to take place more frequently, particularly in loops with
second order recurrences.
Preliminary experimental data:
Across LNT on AArch64, 54 benchmarks are sped up by >1%, and 42 are
regressed by >1%, the geomean (exec_time_enabled / exec_time_disabled)
of these 96 "1% or greater significance" benchmarks is 0.991. For the
full set of 770 benchmarks it's 0.998.
There are two benchmarks which experience a >30% speedup, and the worst
slowdown is ~12%, and for every benchmark with a slowdown there is a
benckmark which is sped up by a greater factor.
Differential Revision: https://reviews.llvm.org/D130241
Following some recent discussions, this changes the representation
of callbrs in IR. The current blockaddress arguments are replaced
with `!` label constraints that refer directly to callbr indirect
destinations:
; Before:
%res = callbr i8* asm "", "=r,r,i"(i8* %x, i8* blockaddress(@test8, %foo))
to label %asm.fallthrough [label %foo]
; After:
%res = callbr i8* asm "", "=r,r,!i"(i8* %x)
to label %asm.fallthrough [label %foo]
The benefit of this is that we can easily update the successors of
a callbr, without having to worry about also updating blockaddress
references. This should allow us to remove some limitations:
* Allow unrolling/peeling/rotation of callbr, or any other
clone-based optimizations
(https://github.com/llvm/llvm-project/issues/41834)
* Allow duplicate successors
(https://github.com/llvm/llvm-project/issues/45248)
This is just the IR representation change though, I will follow up
with patches to remove limtations in various transformation passes
that are no longer needed.
Differential Revision: https://reviews.llvm.org/D129288
After D129205, we support SplitBlockPredecessors() for predecessors
with callbr terminators. This means that it is now also safe to
invoke critical edge splitting for an edge coming from a callbr
terminator. Remove checks in various passes that were protecting
against that.
Differential Revision: https://reviews.llvm.org/D129256
Currently, we only remove dead blocks and non-feasible edges in
IPSCCP, but not in SCCP. I'm not aware of any strong reason for
that difference, so this patch updates SCCP to perform the CFG
cleanup as well.
Compile-time impact seems to be pretty minimal, in the 0.05%
geomean range on CTMark.
For the test case from https://reviews.llvm.org/D126962#3611579
the result after -sccp now looks like this:
define void @test(i1 %c) {
entry:
br i1 %c, label %unreachable, label %next
next:
unreachable
unreachable:
call void @bar()
unreachable
}
-jump-threading does nothing on this, but -simplifycfg will produce
the optimal result.
Differential Revision: https://reviews.llvm.org/D128796
This option was added in D89854. It prevents GVN from performing
load PRE in a loop, if doing so would require critical edge
splitting on the backedge. From the review:
> I know that GVN Load PRE negatively impacts peeling,
> loop predication, so the passes expecting that latch has
> a conditional branch.
In the PhaseOrdering test in this patch, splitting the backedge
negatively affects vectorization: After critical edge splitting,
the loop gets rotated, effectively peeling off the first loop
iteration. The effect is that the first element is handled
separately, then the bulk of the elements use a vectorized
reduction (but using unaligned, off-by-one memory accesses) and
then a tail of 15 elements is handled separately again.
It's probably worth noting that the loop load PRE from D99926 is
not affected by this change (as it does not need backedge
splitting). This is about normal load PRE that happens to occur
inside a loop.
Differential Revision: https://reviews.llvm.org/D126382
This reverts commit 9b1e00738c5ddba681e17e5cb7c260d9afc4c3a7.
Nikic reported in commit thread that I had forgotten history here, and that a) we'd tried this before, and b) had to revert due to an unexpected codegen impact. Current measurements confirm the same issue still exists.
This code pre-exists the generic handling for inaccessiblememonly. If we remove it and update one test with inaccessiblememonly, nothing else changes. Note that simply running O1 on that test would annotate malloc with the missing inaccessiblememonly.
When using opaque pointers, convert GEPs into offset representation
of the form P + V1 * Scale1 + V2 * Scale2 + ... + ConstantOffset.
This allows us to recognize equivalent address calculations even if
the GEPs don't use the same source element type.
This fixes an opaque pointer codegen regression seen in rustc.
Differential Revision: https://reviews.llvm.org/D124527
To avoid incorrectly merging GEPs with different source types
under opaque pointers.
To avoid increasing the Expression structure size, this reuses the
existing type member. The code does not rely on this to be the
expression result type, it's only used as a disambiguator.
This patch extends the available-value logic to detect loads
of pointer-selects that can be replaced by a value select.
For example, consider the code below:
loop:
%sel.phi = phi i32* [ %start, %ph ], [ %sel, %ph ]
%l = load %ptr
%l.sel = load %sel.phi
%sel = select cond, %ptr, %sel.phi
...
exit:
%res = load %sel
use(%res)
The load of the pointer phi can be replaced by a load of the start value
outside the loop and a new phi/select chain based on the loaded values,
as illustrated below
%l.start = load %start
loop:
sel.phi.prom = phi i32 [ %l.start, %ph ], [ %sel.prom, %ph ]
%l = load %ptr
%sel.prom = select cond, %l, %sel.phi.prom
...
exit:
use(%sel.prom)
This is a first step towards alllowing vectorizing loops using common libc++
library functions, like std::min_element (https://clang.godbolt.org/z/6czGzzqbs)
#include <vector>
#include <algorithm>
int foo(const std::vector<int> &V) {
return *std::min_element(V.begin(), V.end());
}
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D118143
Allocation functions should be marked with onlyAccessesInaccessibleMemory (when that is correct for the given function) which is checked elsewhere so this check is no longer needed.
Differential Revision: https://reviews.llvm.org/D117180
In D115311, we're looking to modify clang to emit i constraints rather
than X constraints for callbr's indirect destinations. Prior to doing
so, update all of the existing tests in llvm/ to match.
Reviewed By: void, jyknight
Differential Revision: https://reviews.llvm.org/D115410