Adds logic to the IR verifier that checks whether !tbaa.struct nodes are
well-formed. That is, it checks that the operands of !tbaa.struct nodes
are in groups of three, that each group of three operands consists of
two integers and a valid tbaa node, and that the regions described by
the offset and size operands are non-overlapping.
PR: https://github.com/llvm/llvm-project/pull/86709
If a gep has only one phi as one of its operands and the remaining
indexes are constant, we can unfold `gep ptr, (phi idx1, idx2)` to `phi
((gep ptr, idx1), (gep ptr, idx2))`.
Take care not to unfold recursive phis.
Followup to #80983.
This was initially was #83087. Initial PR did not handle allocas in
entry block that weren't at the beginning of the function, causing GEPs
to be inserted after the first chunk of allocas but potentially before
an alloca not at the beginning. Insert GEPs at the end of the entry
block instead since constants/arguments/static allocas can all be used
there.
This reverts commit 2eb63982e88b9ed8336158d35884b1a1d04a0f78.
This caused verifier error
```
Instruction does not dominate all uses!
```
for some projects using Halide.
The verifier error happens inside `Halide::Internal::CodeGen_LLVM::optimize_module`
and looks like a genuine SROA issue.
If a gep has only one phi as one of its operands and the remaining
indexes are constant, we can unfold `gep ptr, (phi idx1, idx2)` to `phi
((gep ptr, idx1), (gep ptr, idx2))`.
Take care not to unfold recursive phis.
Followup to #80983.
Reverted due to failures on buildbots, where a new cl flag was placed
in the wrong file, resulting in link errors.
https://lab.llvm.org/buildbot/#/builders/198/builds/8548
This reverts commit 0b398256b3f72204ad1f7c625efe4990204e898a.
This patch adds support for printing the proposed non-instruction debug
info ("RemoveDIs") out to textual IR. This patch does not add any
bitcode support, parsing support, or documentation.
Printing of the new format is controlled by a flag added in this patch,
`--write-experimental-debuginfo`, which defaults to false. The new
format will be printed *iff* this flag is true, so whether we use the IR
format is completely independent of whether we use non-instruction debug
info during LLVM passes (which is controlled by the
`--try-experimental-debuginfo-iterators` flag).
Even with the flag disabled, some existing tests need to be updated, as this
patch causes debug intrinsic declarations to be changed in a round trip,
such that they always appear at the end of a module and have no attributes
(this has no functional change on the module).
The design of this new IR format was proposed previously on
Discourse, and any further discussion about the design can still be
contributed there:
https://discourse.llvm.org/t/rfc-debuginfo-proposed-changes-to-the-textual-ir-representation-for-debug-values/73491
If a split memory access introduced by SROA accesses precisely a single
field of the original operation's !tbaa.struct, use the !tbaa tag for
the accessed field directly instead of the full !tbaa.struct.
InstCombine already had a similar logic.
Motivation for this and follow-on patches is to improve codegen for
libc++, where using memcpy limits optimizations, like vectorization for
code iteration over std::vector<std::complex<float>>:
https://godbolt.org/z/f3vqYos3c
Depends on https://github.com/llvm/llvm-project/pull/81285.
SROA currently supports converting a gep of select into select of gep if
the select is in the pointer operand. This patch expands support to
selects in an index operand.
This is intended to address the regression reported in
https://github.com/llvm/llvm-project/pull/68882#issuecomment-1924909922.
With this, I get a clean test suite running under RemoveDIs, the
non-intrinsic representation of debug-info, including under asan. We've
previously established that we generate identical binaries for some
large projects, so this i just edge-case cleanup. The changes:
* CodeGenPrepare fixups need to apply to dbg.assigns as well as
dbg.values (a dbg.assign is a dbg.value).
* Pin a test for constant-deletion to intrinsic debug-info: this very
rare scenario uses a different kill-location sigil in dbg.value mode to
RemoveDIs mode, which generates spurious test differences.
* Suppress a memory leak in a unit test: the code for dealing with
trailing debug-info in a block is necessarily fiddly, leading to this
leak when testing it. Developer-facing interfaces for moving
instructions around always deal with this behind the scenes.
* SROA, when replacing some vector-loads, needs to insert the
replacement loads ahead of any debug-info records so that their values
remain dominated by a definition. Set the head-bit indicating our
insertion should come before debug-info.
f9c2a341b9
causes regressions when we have a slice with integer vector type that is
the same size as the partition, and a ptr load/store slice that is not
the size of the element type.
Ref `vector-promotion.ll:ptrLoadStoreTys`.
Before the patch, we would only consider `<4 x i32>` as a candidate type
for vector promotion, and would find that it is a viable type for all
the slices.
After the patch, we now add `<2 x ptr>` as a candidate type due to slice
with user `store ptr %val0, ptr %obj, align 8` -- and flag that we
`HaveVecPtrTy`. The pre-existing behavior of this flag results in
removing the viable `<4 x i32>` and keeping only the unviable `<2 x
ptr>`, which results in a failure to promote.
The end result is failing to promote an alloca that was previously
promoted -- this does not appear to be the intent of that patch, which
has the goal of increasing promotions by providing more promotion
opportunities.
This PR preserves this behavior via a simple reorganization of the
implemention: try first the slice types with same size as the partition,
then, if there is no promotable type, try the `LoadStoreTys.`
Debugify is extremely useful as a testing and debugging tool, and a good
number of LLVM-IR transform tests use it. We need it to support "new"
non-instruction debug-info to get test coverage, but it's not important
enough to completely convert right now (and it'd be a large
undertaking). Thus: convert to/from dbg.value/DPValue mode on entry and
exit of the pass, which gives us the functionality without any further
work. The cost is compile-time, but again this is only happening during
tests.
Tested by: the large set of debugify tests enabled here. Note the
InstCombine test (cast-mul-select.ll) that hasn't been fully enabled:
this is because there's a debug-info sinking piece of code there that
hasn't been instrumented.
This patch re-implements a variety of debug-info maintenence functions
to use DPValues instead of DbgValueInst's: supporting the "new"
non-intrinsic representation of debug-info. As per [0], we need to have
parallel implementations of various utilities for a time, and these are
the most fundamental utilities used throughout the compiler.
I've added --try-experimental-debuginfo-iterators to a variety of RUN
lines: this is a flag that turns on "new debug-info" if it's built into
LLVM, and not otherwise. This should ensure that we have the same
behaviour for the same IR inputs, but using a different internal
representation. For the most part these changes affect SROA/Mem2Reg
promotion of dbg.declares into dbg.value intrinsics (now DPValues),
we're leaving dbg.declares as instructions until later in the day.
There's also some salvaging changes made.
I believe the tests that I've added cover almost all the code being
updated here. The only thing I'm not confident about is SimplifyCFG,
which calls rewriteDebugUsers down a variety of code paths. Those
changes can't immediately get full coverage as an additional patch is
needed that updates handling of Unreachable instructions, will upload
that shortly.
[0]
https://discourse.llvm.org/t/rfc-instruction-api-changes-needed-to-eliminate-debug-intrinsics-from-ir/68939/9
Let `llvm.launder.invariant.group` intrinsic as well as instructions
operating on memory addresses, whose invariance may be broken by the
intrinsic, to be rewritten.
Fixes: https://github.com/llvm/llvm-project/issues/72035.
For volatile atomic, this may result in a verifier errors, if the
new alloca type is not legal for atomic accesses.
I've opted to disable this special case for volatile accesses in
general, as changing the size of the volatile access seems
dubious in any case.
Fixes https://github.com/llvm/llvm-project/issues/64721.
When IRBuilder is given an insertion position and there is debug-info, it
sets the DebugLoc of newly inserted instructions to the DebugLoc of the
insertion position. Unfortunately, that means if you insert in front of a
debug intrinsics, your "real" instructions get potentially-misleading
source locations from the debug intrinsics. Worse, if you compile -gmlt to
get source locations but no variable locations, you'll get different source
locations to a normal -g build, which is silly.
Rectify this with the getStableDebugLoc method, which skips over any debug
intrinsics to find the next "real" instruction. This is the source location
that you would get if you compile with -gmlt, and it remains stable in the
presence of debug intrinsics. The changed tests show a few locations where
this has been happening, for example selecting line-zero locations for
instrumentation on a perfectly valid call site.
Differential Revision: https://reviews.llvm.org/D159485
This patch adds a hidden CLI option "--sroa-max-alloca-slices", which is
an integer that controls the maximum number of alloca slices SROA can
consider before bailing out. This is useful because it may not be
profitable to split memcpys into (possibly tens of) thousands of loads/stores.
This also prevents an issue with exponential compile time explosion in passes
like DSE and MemCpyOpt caused by excessive alloca splitting.
Fixes https://github.com/rust-lang/rust/issues/88580.
Differential Revision: https://reviews.llvm.org/D159354
When visiting load and store instructions in SROA skip scalable vectors.
This is relevant in the implementation of the 'arm_sve_vector_bits'
attribute that is used to define VLS types, similar to D85725.
Fix https://gcc.godbolt.org/z/o561P9zj4
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D158631
This was an oversight in D146872, where function calls with tuple type
was not covered. This commit fixes this.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D157953
In order to enable the LLVM frontend to better analyze buffer
operations (and to potentially enable more precise analyses on the
backend), define versions of the raw and structured buffer intrinsics
that use `ptr addrspace(8)` instead of `<4 x i32>` to represent their
rsrc arguments.
The new intrinsics are named by replacing `buffer.` with `buffer.ptr`.
One advantage to these intrinsic definitions is that, instead of
specifying that a buffer load/store will read/write some memory, we
can indicate that the memory read or written will be based on the
pointer argument. This means that, for example, a read from a
`noalias` buffer can be pulled out of a loop that is modifying a
distinct buffer.
In the future, we will define custom PseudoSourceValues that will
allow us to package up the (buffer, index, offset) triples that buffer
intrinsics contain and allow for more precise backend analysis.
This work also enables creating address space 7, which represents
manipulation of raw buffers using native LLVM load and store
instructions.
Where tests simply used a buffer intrinsic while testing some other
code path (such as the tests for VGPR spills), they have been updated
to use the new intrinsic form. Tests that are "about" buffer
intrinsics (for instance, those that ensure that they codegen as
expected) have been duplicated, either within existing files or into
new ones.
Depends on D145441
Reviewed By: arsenm, #amdgpu
Differential Revision: https://reviews.llvm.org/D147547
This patch-set aims to simplify the existing RVV segment load/store
intrinsics to use a type that represents a tuple of vectors instead.
To achieve this, first we need to relax the current limitation for an
aggregate type to be a target of load/store/alloca when the aggregate
type contains homogeneous scalable vector types. Then to adjust the
prolog of an LLVM function during lowering to clang. Finally we
re-define the RVV segment load/store intrinsics to use the tuple types.
The pull request under the RVV intrinsic specification is
riscv-non-isa/rvv-intrinsic-doc#198
---
This is the 1st patch of the patch-set. This patch is originated from
D98169.
This patch allows aggregate type (StructType) that contains homogeneous
scalable vector types to be a target of load/store/alloca. The RFC of
this patch was posted in LLVM Discourse.
https://discourse.llvm.org/t/rfc-ir-permit-load-store-alloca-for-struct-of-the-same-scalable-vector-type/69527
The main changes in this patch are:
Extend `StructLayout::StructSize` from `uint64_t` to `TypeSize` to
accommodate an expression of scalable size.
Allow `StructType:isSized` to also return true for homogeneous
scalable vector types.
Let `Type::isScalableTy` return true when `Type` is `StructType`
and contains scalable vectors
Extra description is added in the LLVM Language Reference Manual on the
relaxation of this patch.
Authored-by: Hsiangkai Wang <kai.wang@sifive.com>
Co-Authored-by: eop Chen <eop.chen@sifive.com>
Reviewed By: craig.topper, nikic
Differential Revision: https://reviews.llvm.org/D146872
With this patch an undefined mask in a shufflevector will be printed as poison.
This change is done to support the new shufflevector semantics
for undefined mask elements.
Differential Revision: https://reviews.llvm.org/D149210
After D138238 introduced the then/else blocks, we should remove UB-implying metadata for the promoted speculative instruction.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D148456
SROA may convert a wide integer load into a narrow pointer load,
make sure we don't crash. It would not be legal to transfer the
metadata in this case.
Second try at A-Wadhwani's https://reviews.llvm.org/D132096, which was reverted.
The original patch had three issues:
* https://reviews.llvm.org/D134032, which bjope kindly fixed. That patch is merged into this one.
* [GHI #57796](https://github.com/llvm/llvm-project/issues/57796). Fixed and added a test.
* [GHI #57821](https://github.com/llvm/llvm-project/issues/57821). I believe this is an undefined behavior which is not the fault of the original patch. Please see the issue for more details.
Original diff summary:
This patch adds additional vector types to be considered when doing promotion in
SROA, based on the types of the store and load slices. This provides more
promotion opportunities, by potentially using an optimal "intermediate" vector
type.
For example, the following code would currently not be promoted to a vector,
since `__m128i` is a `<2 x i64>` vector.
```
__m128i packfoo0(int a, int b, int c, int d) {
int r[4] = {a, b, c, d};
__m128i rm;
std::memcpy(&rm, r, sizeof(rm));
return rm;
}
```
```
packfoo0(int, int, int, int):
mov dword ptr [rsp - 24], edi
mov dword ptr [rsp - 20], esi
mov dword ptr [rsp - 16], edx
mov dword ptr [rsp - 12], ecx
movaps xmm0, xmmword ptr [rsp - 24]
ret
```
By also considering the types of the elements, we could find that the `<4 x i32>` type would be valid for promotion, hence removing the memory accesses for this function. In other words, we can explore other new vector types, with the same size but different element types based on the load and store instructions from the Slices, which can
provide us more promotion opportunities.
Additionally, the step for removing duplicate elements from the `CandidateTys` vector was not using an equality comparator, which has been fixed.
Differential Revision: https://reviews.llvm.org/D143225
`RankVectorTypes` is a not an equivalence relation so when it is used in
`std::unique`, the behavior is undefined. Create `RankVectorTypesEq` and use
that instead.
The C and C++ Language Extensions for AArch64 SME2 [1] adds a new type called
`svcount_t` which describes a predicate. This is not a predicate vector
mask, but rather a description of a predicate vector mask that can be
expanded into a mask using explicit instructions. The type is a scalable
opaque type.
To implement `svcount_t` type this patch uses the existing Target Extension Type
mechanism, but adds further support so that this type can be a scalable type.
AArch64 CodeGen support will follow in a separate patch.
[1] https://github.com/ARM-software/acle/pull/217
Reviewed By: jcranmer-intel, nikic
Differential Revision: https://reviews.llvm.org/D136861
After D141386 !nonnull violation returns poison rather than
resulting in immediate undefined behavior. However, converting
it into an assume would result in IUB. As such, we can only
perform this transform if !noundef is also present.
In copyRangeMetadata() and by extension copyLoadMetadata(),
handle the trivial case where the type did not change, in which
case we can simply preserve the range metadata as is.
Instead of copying just nonnull metadata, use the generic helper
to copy metadata to the new load. This helper is specifically
designed for the case where the load type may change, so it's
safe to use in this context.