This moves combineAAMetadata() into Local and implements it via a new
AAOnly flag, which will intersect only AA metadata and keep other known
metadata.
The existing KnownIDs list is dropped, because it is redundant with the
switch in combineMetadata(), which already drops unknown metadata.
I tried a few variants of this, and ultimately went with the AAOnly flag
because this way we make an explicit choice for each metadata kind
supported by combineMetadata(), and ignoring the flag gives you
conservatively correct behavior.
I checked that the memcpy tests still pass if we adjust the logic for
MD_memprof/MD_callsite to drop the metadata instead of arbitrarily
picking one.
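As a rough illustration of the AAOnly design, here is a toy model, not the code in Local. `Kind`, `MetadataMap`, `combineKind` and the string payloads are hypothetical names, and "intersection" is simplified to "keep only if both sides agree". Each kind gets an explicit case, and a case that does not look at the flag falls back to the conservative merge:

```cpp
#include <cassert>
#include <map>
#include <string>

// Toy stand-ins for a few metadata kinds; not LLVM's real enumerators.
enum class Kind { TBAA, AliasScope, NoAlias, Prof, Unknown };

// Metadata attached to one instruction: kind -> (hypothetical) payload.
using MetadataMap = std::map<Kind, std::string>;

// Decide what survives for one kind. KNode is the payload on the kept
// instruction K, JNode the payload on the removed instruction J (or nullptr).
// Returns true and fills Out if some metadata is kept.
bool combineKind(Kind KindID, const std::string &KNode,
                 const std::string *JNode, bool AAOnly, std::string &Out) {
  switch (KindID) {
  case Kind::TBAA:
  case Kind::AliasScope:
  case Kind::NoAlias:
    // AA metadata: always merged, regardless of the flag.
    if (JNode && *JNode == KNode) {
      Out = KNode;
      return true;
    }
    return false;
  case Kind::Prof:
    // Non-AA metadata: with AAOnly, K's metadata is left as it is.
    if (AAOnly) {
      Out = KNode;
      return true;
    }
    // Without the flag, fall back to the conservative merge.
    if (JNode && *JNode == KNode) {
      Out = KNode;
      return true;
    }
    return false;
  case Kind::Unknown:
    // Kinds without an explicit case are dropped.
    return false;
  }
  return false;
}

// Walk K's metadata and keep whatever combineKind allows.
MetadataMap combineMetadata(const MetadataMap &K, const MetadataMap &J,
                            bool AAOnly) {
  MetadataMap Result;
  for (const auto &Entry : K) {
    auto It = J.find(Entry.first);
    const std::string *JNode = (It == J.end()) ? nullptr : &It->second;
    std::string Out;
    if (combineKind(Entry.first, Entry.second, JNode, AAOnly, Out))
      Result[Entry.first] = Out;
  }
  return Result;
}
```

In this model, a kind added later without an AAOnly check would simply take the conservative path.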
Fixes https://github.com/llvm/llvm-project/issues/121495.
gnu::always_inline functions, which lower to available_externally, may
not have definitions external to the module. The -finstrument-functions
family of options instruments the function (which takes the function's
address), and this may lead to a linker error if the function is not
optimized out, e.g.
```
// -std=c++17 or above with libstdc++
#include <string>
std::string str;
int main() {}
```
Simplified reproducer:
```
template <typename T>
struct A {
[[gnu::always_inline]] T bar(T a) { return a * 2; }
};
extern template class A<int>;
int main(int argc, char **argv) {
return A<int>().bar(argc);
}
```
GCC's -finstrument-functions instrumentation skips such functions
(https://gcc.gnu.org/PR78333). Let's skip such functions
(available_externally) as well.
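The skip rule can be sketched as a small predicate. This is a toy model under stated assumptions, not the pass's code: `Linkage`, `FunctionInfo` and `shouldInstrument` are hypothetical names, and only the available_externally check comes from the description above.

```cpp
#include <cassert>

// Toy model of linkage kinds; the names mirror LLVM's, but this is not LLVM code.
enum class Linkage { External, Internal, LinkOnceODR, AvailableExternally };

struct FunctionInfo {
  Linkage linkage;
  bool isDeclaration; // true if the function has no body in this module
};

// Returns true if entry/exit instrumentation, which takes the function's
// address, may be inserted into the function.
bool shouldInstrument(const FunctionInfo &F) {
  // Nothing to instrument without a body (assumption of this sketch).
  if (F.isDeclaration)
    return false;
  // available_externally bodies may have no definition outside the module,
  // so taking their address could produce an undefined symbol at link time.
  if (F.linkage == Linkage::AvailableExternally)
    return false;
  return true;
}
```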
Fix #50742
Pull Request: https://github.com/llvm/llvm-project/pull/121452
This patch fixes a couple of places where memprof-related metadata
(!memprof and !callsite) were being dropped, and one place where PGO
metadata (!prof) was being dropped.
All were due to instances of combineMetadata() being invoked. That
function drops all metadata not in the list provided by the client, and
also drops any not in its switch statement.
Memprof metadata needed a case in the combineMetadata switch statement.
For now we simply keep the metadata of the instruction being kept, which
doesn't retain all the profile information when two calls with
memprof metadata are being combined, but at least retains some.
For the memprof metadata being dropped during call CSE, add memprof and
callsite metadata to the list of known ids in combineMetadataForCSE.
Neither memprof nor regular prof metadata were in the list of known ids
for the callsite in MemCpyOptimizer, which was added to combine AA
metadata after optimization of byval arguments fed by memcpy
instructions, and similar types of optimizations of memcpy uses.
There is one other callsite of combineMetadata, but it is only invoked
on load instructions, which do not carry these types of metadata.
On RISC-V, AArch64, and LoongArch, the `_mcount` function takes
`__builtin_return_address(0)` as an argument since
`__builtin_return_address(1)` is not available on these platforms. This
patch fixes the argument passing to match the behavior of glibc/gcc.
Closes https://github.com/llvm/llvm-project/issues/121103.
As discussed in #94468, this causes switch lookup table entries which
are unreachable to be poison instead of filling them with a value from
one of the reachable cases.
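A minimal sketch of the idea, assuming a dense table over a case range. `TableEntry` and `buildLookupTable` are hypothetical names, and the `isPoison` flag only stands in for an IR poison value:

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

struct TableEntry {
  bool isPoison; // true for holes that no reachable case maps to
  int value;     // meaningful only when !isPoison
};

// Build a dense table for case values in [minCase, maxCase]. Holes are marked
// as poison instead of being filled with the result of some reachable case.
std::vector<TableEntry> buildLookupTable(const std::map<int, int> &cases,
                                         int minCase, int maxCase) {
  assert(minCase <= maxCase);
  std::vector<TableEntry> table(
      static_cast<std::size_t>(maxCase - minCase + 1), TableEntry{true, 0});
  for (const auto &c : cases) {
    if (c.first < minCase || c.first > maxCase)
      continue; // outside the table range
    table[static_cast<std::size_t>(c.first - minCase)] =
        TableEntry{false, c.second};
  }
  return table;
}
```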
---------
Co-authored-by: DianQK <dianqk@dianqk.net>
This PR is motivated by a mismatch we discovered between compilation
results with vs. without `-g3`. We noticed this when compiling SPEC2017
testcases. The specific instance we saw is fixed in this PR by modifying
a guard (see below), but it is likely similar instances exist elsewhere
in the codebase.
The specific case fixed in this PR manifests itself in the `SimplifyCFG`
pass doing different things depending on whether DebugInfo is generated
or not. At the end of this comment, there is reduced example code that
shows the behavior in question.
The differing behavior has two root causes:
1. Commit https://github.com/llvm/llvm-project/commit/c07e19b adds loop
metadata including debug locations to loops that otherwise would not
have loop metadata
2. Commit https://github.com/llvm/llvm-project/commit/ac28efa6c100 adds
a guard to a simplification action in `SimplifyCFG` that prevents it
from simplifying away loop metadata
So, the change in 2. does not consider that when compiling with debug
symbols, loops that otherwise would not have metadata that needs
preserving, now have debug locations in their loop metadata. Thus, with
`-g3`, `SimplifyCFG` behaves differently than without it.
The larger issue is that while debug info is not supposed to influence
the final compilation result, commits like 1. blur the line between what
is and is not debug info, and not all optimization passes account for
this.
This PR does not address that and rather just modifies this particular
guard in order to restore equivalent behavior between debug and
non-debug builds in this one instance.
---
Here is a reduced version of a file from `f526.blender_r` that showcases
the behavior in question:
```C
struct LinkNode;
typedef struct LinkNode {
struct LinkNode *next;
void *link;
} LinkNode;
void do_projectpaint_thread_ph_v_state() {
int *ps = do_projectpaint_thread_ph_v_state;
LinkNode *node;
while (do_projectpaint_thread_ph_v_state)
for (node = ps; node; node = node->next)
;
}
```
Compiling this with and without DebugInfo, and then disassembling the
results, leads to different outcomes (tested on SystemZ and X86). The
reason for this is that the `SimplifyCFG` pass does different things in
either case.
This patch introduces the LLVM components of a type sanitizer: a
sanitizer for type-based aliasing violations.
It is based on Hal Finkel's https://reviews.llvm.org/D32198.
C/C++ have type-based aliasing rules, and LLVM's optimizer can exploit
these given TBAA metadata added by Clang. Roughly, a pointer of given
type cannot be used to access an object of a different type (with, of
course, certain exceptions). Unfortunately, there's a lot of code in the
wild that violates these rules (e.g. for type punning), and such code
often must be built with -fno-strict-aliasing. Performance is often
sacrificed as a result. Part of the problem is the difficulty of finding
TBAA violations. Hopefully, this sanitizer will help.
For each TBAA type-access descriptor, encoded in LLVM's IR using
metadata, the corresponding instrumentation pass generates descriptor
tables. Thus, for each type (and access descriptor), we have a unique
pointer representation. Excepting anonymous-namespace types, these
tables are comdat, so the pointer values should be unique across the
program. The descriptors refer to other descriptors to form a type
aliasing tree (just like LLVM's TBAA metadata does). The instrumentation
handles the "fast path" (where the types match exactly and no
partial-overlaps are detected), and defers to the runtime to handle all
of the more-complicated cases. The runtime, of course, is also
responsible for reporting errors when those are detected.
The runtime uses essentially the same shadow memory region as tsan, and
we use 8 bytes of shadow memory, the size of the pointer to the type
descriptor, for every byte of accessed data in the program. The value 0
is used to represent an unknown type. The value -1 is used to represent
an interior byte (a byte that is part of a type, but not the first
byte). The instrumentation first checks for an exact match between the
type of the current access and the type for that address recorded in the
shadow memory. If it matches, it then checks the shadow for the
remainder of the bytes in the type to make sure that they're all -1. If
not, we call the runtime. If the exact match fails, we next check if the
value is 0 (i.e. unknown). If it is, then we check the shadow for the
remainder of the bytes in the type (to make sure they're all 0). If
they're not, we call the runtime. We then set the shadow for the access
address and set the shadow for the remaining bytes in the type to -1
(i.e. marking them as interior bytes). If the type indicated by the
shadow memory for the access address is neither an exact match nor 0, we
call the runtime.
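The fast-path check described above can be sketched as follows. This is a toy model under stated assumptions, not the instrumentation itself: the shadow is a plain vector with one slot per data byte, `Shadow`, `CheckResult` and `checkAccess` are hypothetical names, and the shadow is only updated when the whole access was previously unknown.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Shadow = std::vector<std::intptr_t>; // one descriptor slot per data byte
const std::intptr_t kUnknown = 0;          // type not known yet
const std::intptr_t kInterior = -1;        // non-first byte of a typed object

enum class CheckResult { FastPathOk, CallRuntime };

// True if shadow bytes addr+1 .. addr+size-1 all hold `expected`.
static bool restAll(const Shadow &shadow, std::size_t addr, std::size_t size,
                    std::intptr_t expected) {
  for (std::size_t i = 1; i < size; ++i)
    if (shadow[addr + i] != expected)
      return false;
  return true;
}

// Check an access of `size` bytes at `addr` with type descriptor `typeDesc`.
CheckResult checkAccess(Shadow &shadow, std::size_t addr, std::size_t size,
                        std::intptr_t typeDesc) {
  assert(size > 0 && addr + size <= shadow.size());
  assert(typeDesc != kUnknown && typeDesc != kInterior);

  const std::intptr_t first = shadow[addr];
  if (first == typeDesc) {
    // Exact match: the remaining bytes must all be interior bytes.
    return restAll(shadow, addr, size, kInterior) ? CheckResult::FastPathOk
                                                  : CheckResult::CallRuntime;
  }
  if (first == kUnknown) {
    // Unknown type: the remaining bytes must be unknown as well.
    if (!restAll(shadow, addr, size, kUnknown))
      return CheckResult::CallRuntime;
    // Record the type: descriptor on the first byte, -1 on the rest.
    shadow[addr] = typeDesc;
    for (std::size_t i = 1; i < size; ++i)
      shadow[addr + i] = kInterior;
    return CheckResult::FastPathOk;
  }
  // Neither an exact match nor unknown: let the runtime decide.
  return CheckResult::CallRuntime;
}
```

Everything that returns `CallRuntime` here corresponds to the cases deferred to the runtime, which then applies the full TBAA rules.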
The instrumentation pass inserts calls to the memset intrinsic to set
the memory updated by memset, memcpy, and memmove, as well as
allocas/byval (and for lifetime.start/end) to reset the shadow memory to
reflect that the type is now unknown. The runtime intercepts memset,
memcpy, etc. to perform the same function for the library calls.
The runtime essentially repeats these checks, but uses the full TBAA
algorithm, just as the compiler does, to determine when two types are
permitted to alias. In a situation where access overlap has occurred and
aliasing is not permitted, an error is generated.
Clang's TBAA representation currently has a problem representing unions,
as demonstrated by the one XFAIL'd test in the runtime patch. We'll
update the TBAA representation to fix this, and at the same time, update
the sanitizer.
When the sanitizer is active, we disable actually using the TBAA
metadata for AA. This way we're less likely to use TBAA to remove memory
accesses that we'd like to verify.
As a note, this implementation does not use the compressed shadow-memory
scheme discussed previously
(http://lists.llvm.org/pipermail/llvm-dev/2017-April/111766.html). That
scheme would not handle the struct-path (i.e. structure offset)
information that our TBAA represents. I expect we'll want to further
work on compressing the shadow-memory representation, but I think it
makes sense to do that as follow-up work.
It goes together with the corresponding clang changes
(https://github.com/llvm/llvm-project/pull/76260) and compiler-rt
changes (https://github.com/llvm/llvm-project/pull/76261)
PR: https://github.com/llvm/llvm-project/pull/76259
Summary:
Previously, we'd add all SPs distinct from the cloned one into a set.
Then when cloning a local scope we'd check if it's from one of those
'distinct' SPs by checking if it's in the set. We don't need to do that.
We can just check against the cloned SP directly and drop the set.
Test Plan:
ninja check-llvm-unit check-llvm
Summary:
The new API expects the caller to populate the VMap. We need it this way
for a subsequent change around coroutine cloning.
Test Plan:
ninja check-llvm-unit check-llvm
Loop optimizations expect the input loop to be in LCSSA form, but
LoopVersioning does not check whether the loop actually is in LCSSA
form. As a result, if we give it a loop that is not in LCSSA form but is
still semantically correct, the result of the transformation fails the
verifier with the following error:
Instruction does not dominate all uses!
%inc = add nsw i16 undef, 1
store i16 %inc, ptr @c, align 1
Because the loop is not in LCSSA form, LoopVersioning's transformation
produces invalid IR in which some instructions do not dominate all their
uses.
This patch checks whether the loop is in LCSSA form and, if it is not,
calls formLCSSARecursively on the loop before passing it to
LoopVersioning.
Fixes: #36998
Given an alloca that potentially has many uses in big complex code and
escapes into a call that is readonly+nocapture, we cannot easily split
up the alloca. There are several optimizations that will attempt to take
a value that is stored and a reload, and replace the load with the
original stored value. Instcombine has some simple heuristics, GVN can
sometimes do it, as can CSE in limited situations. They all suffer from
the same issue with complex code - they start from a load/store and need
to prove no-alias for all code between, which in complex cases might be
a lot to look through. Especially if the ptr is an alloca with many uses
that is over the normal escape capture limits.
The pass that does do well with allocas is SROA, as it has a complete
view of all of the uses. This patch adds a case to SROA where it can
detect allocas that are passed into calls that are no-capture readonly.
It can then optimize the reloaded values inside the alloca slice with
the stored value knowing that it is valid no matter the location of the
loads/stores from the no-escaping nature of the alloca.
With the introduction of CmpPredicate in 51a895a (IR: introduce struct
with CmpInst::Predicate and samesign), PatternMatch is one of the first
key pieces of infrastructure that must be updated to match a CmpInst
respecting samesign information. Implement this change to Cmp-matchers.
This is a preparatory step in migrating the codebase over to
CmpPredicate. Since no functional changes are desired at this stage,
we have chosen not to migrate CmpPredicate::operator==(CmpPredicate)
calls to use CmpPredicate::getMatching(), as that would have visible
impact on tests that are not yet written: instead, we call
CmpPredicate::operator==(Predicate), preserving the old behavior, while
also inserting a few FIXME comments for follow-ups.
Allow a duplicate basic block with multiple predecessors to the
jump table to be simplified, by considering that the same basic
block may appear in more switch cases.
The global ctor evaluator tries to evaluate function calls where the
call's function type and the callee's function type do not match, by
performing bitcasts. This currently causes a crash when a void function
is called with a non-void return type.
I've opted to remove this functionality entirely rather than fixing this
specific case. With opaque pointers, there shouldn't be a legitimate use
case for this anymore, as we don't need to look through pointer type
casts. Doing other bitcasts is very iffy because it ignores ABI
considerations. We should at least leave adjusting the signatures to
make them line up to InstCombine (which also does some iffy things, but
is at least somewhat more constrained).
Fixes https://github.com/llvm/llvm-project/issues/118725.
Consider the following loop:
```
int rdx = init;
for (int i = 0; i < n; ++i)
rdx = (a[i] > b[i]) ? i : rdx;
```
We can vectorize this loop if `i` is an increasing induction variable.
The final reduced value will be the maximum `i` for which the condition
`a[i] > b[i]` is satisfied, or the start value `init` if it never is.
This patch adds new RecurKind enums: IFindLastIV and FFindLastIV.
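To see why an increasing `i` makes this a max-reduction, here is a scalar emulation of one possible vectorization. It is a sketch under stated assumptions, not the vectorizer's output: the vector factor of 4, the `INT_MIN` sentinel and both function names are hypothetical.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <climits>
#include <vector>

// Reference: the original scalar loop.
int findLastScalar(const std::vector<int> &a, const std::vector<int> &b,
                   int init) {
  assert(a.size() == b.size());
  int rdx = init;
  for (int i = 0; i < static_cast<int>(a.size()); ++i)
    rdx = (a[i] > b[i]) ? i : rdx;
  return rdx;
}

// Emulates a vector loop with VF lanes, followed by a max-reduction.
int findLastVectorStyle(const std::vector<int> &a, const std::vector<int> &b,
                        int init) {
  assert(a.size() == b.size());
  constexpr int VF = 4;          // assumed vector factor
  const int kSentinel = INT_MIN; // assumed to be below every valid index
  std::array<int, VF> lanes;
  lanes.fill(kSentinel);

  const int n = static_cast<int>(a.size());
  int i = 0;
  for (; i + VF <= n; i += VF)
    for (int l = 0; l < VF; ++l) // one "vector select" per iteration
      lanes[l] = (a[i + l] > b[i + l]) ? i + l : lanes[l];

  // Since i only grows, each lane holds its largest matching index, so the
  // overall answer is the maximum across lanes.
  const int best = *std::max_element(lanes.begin(), lanes.end());
  int rdx = (best == kSentinel) ? init : best;

  // Scalar epilogue for the remaining iterations.
  for (; i < n; ++i)
    rdx = (a[i] > b[i]) ? i : rdx;
  return rdx;
}
```

The sentinel is what lets the reduction distinguish "no iteration matched" from a real index, so the start value `init` can be restored afterwards.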
---------
Co-authored-by: Alexey Bataev <5361294+alexey-bataev@users.noreply.github.com>
When using non-integral pointer types, such as on CHERI targets, size_t
is equivalent to the index size, which is allowed to be smaller than the
size of the pointer.
This reapplies #119138 with a defensive fix for the assertion failure
when building libcxx.
Unfortunately the failure does not reproduce on my machine, so I am not
able to extract a test case.
The key insight for the fix comes from Jessica Clarke, who observes that
`VTablePtr` may, in fact, not be a pointer on return from
`FindAvailableLoadedValue`.
Co-authored-by: Alexander Richardson <alexander.richardson@cl.cam.ac.uk>
The `poison` values are used to substitute debug information of values
moved from the original header into the preheader that are no longer
available in the former.
Summary:
Moving the cloning of BBs after the metadata makes the flow of the
function a bit more straightforward and makes it easier to extract more
into helper functions.
Test Plan:
ninja check-llvm-unit check-llvm
Summary:
There was a single usage of CloneBasicBlock with a non-default
DebugInfoFinder, inside CloneFunctionInto, which has since been
refactored to be more focused.
Test Plan:
ninja check-llvm-unit check-llvm
If the GEP is nusw/inbounds and all of its offsets are non-negative,
infer nuw as well.
This doesn't have measurable compile-time impact.
Proof: https://alive2.llvm.org/ce/z/ihztLy
Writing a test for this transitively exposed a number of places in
BuildLibCalls where we were failing to propagate address spaces
properly; these are fixed as well.
Add an extra knob to UnrollingPreferences to let backends control the
maximum budget for SCEV expansions.
This gives backends more fine-grained control on the cost of the runtime
checks for runtime unrolling.
PR: https://github.com/llvm/llvm-project/pull/118316
Preserve !alias.scope, !noalias and !mem.parallel_loop_access metadata
on the replacement instruction if it does not move. In that case, the
program would have UB if the aliasing property encoded in the metadata
did not hold. This makes use of the clarification that aliasing metadata
implies UB if the property does not hold: #116220
Same as #115868, but for !alias.scope, !noalias and
!mem.parallel_loop_access.
PR: https://github.com/llvm/llvm-project/pull/117716
This should act like range.
Previously ConstantRangeList assumed a 64-bit range. Now query from the
actual entries. This also means that the empty range has no bitwidth, so
move asserts to avoid checking the bitwidth of empty ranges.
This is a follow-up to #96878 to support hoisting loads/stores from BBs
that have the same predecessor, if the loads/stores are the only
instructions and the branch is unpredictable, e.g.:
```
void test (int a, int *c, int *d) {
if (a)
*c = a;
else
*d = a;
}
```
Summary:
Consolidate the logic in a single function. We do an extra pass over
Instructions but this is necessary to untangle things and extract
metadata cloning in a future diff.
Test Plan:
```
$ ninja check-llvm-unit check-llvm
[211/213] Running the LLVM regression tests
Testing Time: 106.06s
Total Discovered Tests: 62601
Skipped : 17 (0.03%)
Unsupported : 2518 (4.02%)
Passed : 59911 (95.70%)
Expectedly Failed: 155 (0.25%)
[212/213] Running lit suite
Testing Time: 12.47s
Total Discovered Tests: 8474
Skipped: 17 (0.20%)
Passed : 8457 (99.80%)
```
Extracted from #109032 (commit 3) (there are more refactors and cleanups
in subsequent commits)
Preserve llvm.access.group metadata on the replacement instruction if
it does not move. In that case, the program would have UB if the
parallel property encoded in the metadata did not hold.
This matches the LangRef recently updated in #116220
PR https://github.com/llvm/llvm-project/pull/115868
Currently when we merge invokes as part of SimplifyCFG we apply a merge
of the invoke DILocations to the merged invoke. We also insert an
unconditional branch to the merged invoke at the positions previously
occupied by the original invokes; as this branch is part of the
substitution for the invoke it has replaced, we should propagate the
original invoke DebugLoc to it.
Relands 7ff3a9acd84654c9ec2939f45ba27f162ae7fbc3 after regenerating the
test case.
Supersedes the draft PR #94992, taking a different approach following
feedback:
* Lower in PreISelIntrinsicLowering
* Don't require that the number of bytes to set is a compile-time
constant
* Define llvm.memset_pattern rather than llvm.memset_pattern.inline
As discussed in the [RFC
thread](https://discourse.llvm.org/t/rfc-introducing-an-llvm-memset-pattern-inline-intrinsic/79496),
the intent is that the intrinsic will be lowered to loops, a sequence of
stores, or libcalls depending on the expected cost and availability of
libcalls on the target. Right now, there's just a single lowering path
that aims to handle all cases. My intent would be to follow up with
additional PRs that add additional optimisations when possible (e.g.
when libcalls are available, when arguments are known to be constant
etc).
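The simplest of these lowerings, a store loop, can be sketched like this. It is an illustration only: `memsetPatternLoop` is a hypothetical helper, and it assumes the count is expressed in pattern-sized elements and is only known at run time.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Store `pattern` `count` times, back to back, starting at `dst`.
// std::memcpy is used for each store so that `dst` need not be aligned for T.
template <typename T>
void memsetPatternLoop(void *dst, T pattern, std::size_t count) {
  unsigned char *p = static_cast<unsigned char *>(dst);
  for (std::size_t i = 0; i < count; ++i)
    std::memcpy(p + i * sizeof(T), &pattern, sizeof(T));
}
```

A target with a suitable libcall, or a small constant count, could instead use a call or an unrolled sequence of stores, which is the kind of follow-up optimisation mentioned above.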
This reverts commit 7ff3a9acd84654c9ec2939f45ba27f162ae7fbc3.
Recent scheduling changes mean tests need to be re-generated. Reverting
to green while I do that.