The TranspBlocks set was used to cache aliasing decision for all
processed loads in the parent loop. This is incorrect, because each load
can access a different location, which means one load not being modified
in a block doesn't translate to another load not being modified in the
same block.
All loads access the same underlying object, so we could perhaps use a
location without size for all loads and retain the cache, but that would
mean we loose precision.
For now, just drop the cache.
Fixes https://github.com/llvm/llvm-project/issues/84807
PR: https://github.com/llvm/llvm-project/pull/84835
Debugify is extremely useful as a testing and debugging tool, and a good
number of LLVM-IR transform tests use it. We need it to support "new"
non-instruction debug-info to get test coverage, but it's not important
enough to completely convert right now (and it'd be a large
undertaking). Thus: convert to/from dbg.value/DPValue mode on entry and
exit of the pass, which gives us the functionality without any further
work. The cost is compile-time, but again this is only happening during
tests.
Tested by: the large set of debugify tests enabled here. Note the
InstCombine test (cast-mul-select.ll) that hasn't been fully enabled:
this is because there's a debug-info sinking piece of code there that
hasn't been instrumented.
This allows use with non-0 address space stacks. llvm_ptr_ty should
never be used. This could use some more percolation up through mlir,
but this is enough to fix existing tests.
https://reviews.llvm.org/D156666
This is a follow-up to b71edfaa4ec3c998aadb35255ce2f60bba2940b0
since I forgot the lit.local.cfg files in that one.
Reformatting is done with `black`.
If you end up having problems merging this commit because you
have made changes to a python file, the best way to handle that
is to run git checkout --ours <yourfile> and then reformat it
with black.
If you run into any problems, post to discourse about it and
we will try to help.
RFC Thread below:
https://discourse.llvm.org/t/rfc-document-and-standardize-python-code-style
Reviewed By: barannikov88, kwk
Differential Revision: https://reviews.llvm.org/D150762
This reverts commit 8b8466fd31e5a194fd8ba7a73a0f23d32f164318.
This is causing size regressions with -Oz and FullLTO. Revert while I
come up with a repro.
Argument promotion mostly works on functions with more than one caller (otherwise the function would be inlined or is dead), so there's a good chance that performing this increases code size since we introduce loads at every call site. If any caller is marked minsize, bail.
We could compare the number of loads/stores removed from the function with the number of loads introduced in callers, but that's TODO.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D149768
With LLVM patch https://reviews.llvm.org/D148269, we hit a linux kernel
bpf selftest compilation failure like below:
...
progs/test_xdp_noinline.c:739:8: error: too many args to t8: i64 = GlobalAddress<ptr @encap_v4> 0, progs/test_xdp_noinline.c:739:8
if (!encap_v4(xdp, cval, &pckt, dst, pkt_bytes))
^
...
progs/test_xdp_noinline.c:321:6: error: defined with too many args
bool encap_v4(struct xdp_md *xdp, struct ctl_value *cval,
^
...
Note that bpf selftests are compiled with -O2 which is
the recommended flag for bpf community.
The bpf backend calling convention is only allowing 5
parameters in registers and does not allow pass arguments
through stacks. In the above case, ArgumentPromotionPass
replaced parameter '&pckt' as two parameters, so the total
number of arguments after ArgumentPromotion pass becomes 6
and this caused later compilation failure during instruction
selection phase.
This patch added a TargetTransformInfo hook getMaxNumArgs()
which returns 5 for BPF and UINT_MAX for other targets.
Differential Revision: https://reviews.llvm.org/D148551
For poison-generating (rather than IUB) metadata, only copy it
from the dominating must-exec load if it is combined with !noundef.
This could be further extended by additionall intersecting the
metadata from all loads, which does not require !noundef.
ArgPromotion currently produces phantom / dead loads. A good example of this is store-into-inself.ll. First, ArgPromo finds the promotable argument %p in @l. Then it inserts a load of %p in the caller, and passes instead the loaded value / transforms the function body. PromoteMem2Reg is able to optimize out the entire function body, resulting in an unused argument. In a subsequent ArgPromotion pass, it removes the dead argument, resulting in a dead load in the caller. These dead loads may reduce effectiveness of other transformations (e.g. SimplifyCFG, MergedLoadStoreMotion).
This patch removes loads and geps that are made dead in the caller after removal of dead args.
Differential Revision: https://reviews.llvm.org/D146327
After D141386 !nonnull violation returns poison rather than
resulting in immediate undefined behavior. However, converting
it into an assume would result in IUB. As such, we can only
perform this transform if !noundef is also present.
It makes sense to handle byval promotion in the same way as non-byval
but also allowing `store` instructions. However, these should
use the same checks as the `load` instructions do, i.e. be part of the
`ArgsToPromote` collection. For these instructions, the check for
interfering modifications can be disabled, though. The promotion
algorithm itself has been modified a lot: all the accesses (i.e. loads
and stores) are rewritten to the emitted `alloca` instructions. To
optimize these new `alloca`s out, the `PromoteMemToReg` function from
`Transforms/Utils/PromoteMemoryToRegister.cpp` file is invoked after
promotion.
In order to let the `PromoteMemToReg` promote as many `alloca`s as it
is possible, there should be no `GEP`s from the `alloca`s. To
eliminate the `GEP`s, its own `alloca` is generated for every argument
part because a single `alloca` for the whole argument (that
significantly simplifies the code of the pass though) unfortunately
cannot be used.
The idea comes from the following discussion:
https://reviews.llvm.org/D124514#3479676
Differential Revision: https://reviews.llvm.org/D125485
Support for the legacy pass manager in ArgPromotion causes
complications in D125485. As the legacy pass manager for middle-end
optimizations is unsupported, drop ArgPromotion from the legacy
pipeline, rather than introducing additional complexity to deal
with it.
Differential Revision: https://reviews.llvm.org/D128536
If a pointer argument is unused within the callee, this argument should
be removed from the function's signature while all used pointer
arguments should be promoted as it is expected. The ArgumentPromotion
pass doesn't touch unused non-pointer arguments at all.
If a load with the same offset has already been seen but the load had
a lower alignment, the pass has to check whether the pointer is
dereferenceable and is sufficiently aligned (so, the new alignment must
be taken into account).
It makes sense to make a non-byval promotion attempt first and then
fall back to the byval one. The non-byval ('usual') promotion is
generally better, for example it does promotion even when a structure
has more elements than 'MaxElements' but not all of them are actually
used in the function.
Differential Revision: https://reviews.llvm.org/D124514
X86 codegen uses function attribute `min-legal-vector-width` to select the proper ABI. The intention of the attribute is to reflect user's requirement when they passing or returning vector arguments. So Clang front-end will iterate the vector arguments and set `min-legal-vector-width` to the width of the maximum for both caller and callee.
It is assumed any middle end optimizations won't care of the attribute expect inlining and argument promotion.
- For inlining, we will propagate the attribute of inlined functions because the inlining functions become the newer caller.
- For argument promotion, we check the `min-legal-vector-width` of the caller and callee and refuse to promote when they don't match.
The problem comes from the optimizations' combination, as shown by https://godbolt.org/z/zo3hba8xW. The caller `foo` has two callees `bar` and `baz`. When doing argument promotion, both `foo` and `bar` has the same `min-legal-vector-width`. So the argument was promoted to vector. Then the inlining inlines `baz` to `foo` and updates `min-legal-vector-width`, which results in ABI mismatch between `foo` and `bar`.
This patch fixes the problem by expanding the concept of `min-legal-vector-width` to indicator of functions arguments. That says, any passes touch functions arguments have to set `min-legal-vector-width` to the value reflects the width of vector arguments. It makes sense to me because any arguments modifications are ABI related and should response for the ABI compatibility.
Differential Revision: https://reviews.llvm.org/D123284
The condition should be 'ArgParts.size() > MaxElements', so that if we
have exactly 3 elements in the 'ArgParts' vector, the promotion should
be allowed because the 'MaxElement' threshold is not exceeded yet.
The default value for 'MaxElement' has been decreased to 2 in order
to avoid an actual change in argument promoting behavior. However,
this changes byval argument transformation behavior by allowing
adding not more than 2 arguments to the function instead of 3 allowed
before.
Reviewed By: aeubanks
Differential Revision: https://reviews.llvm.org/D124178
add –allow-unused-prefixes.
This test has two runs that differ in what functions are left after the inliner,
for example: barney exists on OLDPM path but don’t exist on NEWPM path.
I restored prefixes this test had had after automatic checks were introduced
for this test.
For now there are no checks left for ALL_NEWPM path, but the behavior seem to
change over time so I added –allow-unused-prefixes to ease following check updates.
Renamed %tmp => %temp IR values to avoid update warning.
Differential revision: https://reviews.llvm.org/D120207
In addition to the self-recursion check, also check whether there
is more than one node in the SCC, which implies that there is a
larger cycle. I believe checking SCC structure (rather than
something like norecurse) is the right thing to do here, because
this is specifically about preventing infinite loops over the SCC.
Fixes https://github.com/llvm/llvm-project/issues/42028.
Differential Revision: https://reviews.llvm.org/D119418
This rewrites ArgPromotion to be based on offsets rather than GEP
structure. We inspect all loads at constant offsets and remember
which types are loaded at which offsets. Then we promote based on
those types.
This generalizes ArgPromotion to work with bitcasted loads, and
is compatible with opaque pointers.
This patch also fixes incorrect handling of alignment during
argument promotion. Previously, the implementation only checked
that the pointer is dereferenceable, but was happy to speculate
overaligned loads. (I would have fixed this separately in advance,
but I found this hard to do with the previous implementation
approach).
Differential Revision: https://reviews.llvm.org/D118685
Before walking all the callers, check whether we have a
dereferenceable attribute directly on the argument.
Also make it clearer that the code currently does not treat
alignment correctly.
This updates transform test cases for
ADCE
AddDiscriminators
AggressiveInstCombine
AlignmentFromAssumptions
ArgumentPromotion
BDCE
CalledValuePropagation
DCE
Reg2Mem
WholeProgramDevirt
to use the -passes syntax when specifying the pipeline.
Given that LLVM_ENABLE_NEW_PASS_MANAGER isn't set to off (which is
a deprecated feature) the updated test cases already used the new
pass manager, but they were using the legacy syntax when specifying
the passes to run. This patch can be seen as a step toward deprecating
that interface.
This patch also removes some redundant RUN lines. Here I am
referring to test cases that had multiple RUN lines verifying both
the legacy "-passname" syntax and the new "-passes=passname" syntax.
Since we switched the default pass manager to "new PM" both RUN lines
have verified the new PM version of the pass (more or less wasting
time running the same test twice), unless LLVM_ENABLE_NEW_PASS_MANAGER
is set to "off". It is assumed that it is enough to run these tests
with the new pass manager now.
Differential Revision: https://reviews.llvm.org/D108472
Make sure the alignment of the generated operations matches the
alignment of the byval argument. Previously, we were just ignoring
alignment and getting lucky.
While I'm here, also delete the unnecessary "tail" handling.
Passing a pointer to a byval argument to a "tail" call is UB, so
rewriting to an alloca doesn't require any special handling.
Differential Revision: https://reviews.llvm.org/D89819