This patch introduces the LLVM components of a type sanitizer: a
sanitizer for type-based aliasing violations.
It is based on Hal Finkel's https://reviews.llvm.org/D32198.
C/C++ have type-based aliasing rules, and LLVM's optimizer can exploit
these given TBAA metadata added by Clang. Roughly, a pointer of given
type cannot be used to access an object of a different type (with, of
course, certain exceptions). Unfortunately, there's a lot of code in the
wild that violates these rules (e.g. for type punning), and such code
often must be built with -fno-strict-aliasing. Performance is often
sacrificed as a result. Part of the problem is the difficulty of finding
TBAA violations. Hopefully, this sanitizer will help.
For each TBAA type-access descriptor, encoded in LLVM's IR using
metadata, the corresponding instrumentation pass generates descriptor
tables. Thus, for each type (and access descriptor), we have a unique
pointer representation. Excepting anonymous-namespace types, these
tables are comdat, so the pointer values should be unique across the
program. The descriptors refer to other descriptors to form a type
aliasing tree (just like LLVM's TBAA metadata does). The instrumentation
handles the "fast path" (where the types match exactly and no
partial-overlaps are detected), and defers to the runtime to handle all
of the more-complicated cases. The runtime, of course, is also
responsible for reporting errors when those are detected.
The runtime uses essentially the same shadow memory region as tsan, and
we use 8 bytes of shadow memory, the size of the pointer to the type
descriptor, for every byte of accessed data in the program. The value 0
is used to represent an unknown type. The value -1 is used to represent
an interior byte (a byte that is part of a type, but not the first
byte). The instrumentation first checks for an exact match between the
type of the current access and the type for that address recorded in the
shadow memory. If it matches, it then checks the shadow for the
remainder of the bytes in the type to make sure that they're all -1. If
not, we call the runtime. If the exact match fails, we next check if the
value is 0 (i.e. unknown). If it is, then we check the shadow for the
remainder of the byes in the type (to make sure they're all 0). If
they're not, we call the runtime. We then set the shadow for the access
address and set the shadow for the remaining bytes in the type to -1
(i.e. marking them as interior bytes). If the type indicated by the
shadow memory for the access address is neither an exact match nor 0, we
call the runtime.
The instrumentation pass inserts calls to the memset intrinsic to set
the memory updated by memset, memcpy, and memmove, as well as
allocas/byval (and for lifetime.start/end) to reset the shadow memory to
reflect that the type is now unknown. The runtime intercepts memset,
memcpy, etc. to perform the same function for the library calls.
The runtime essentially repeats these checks, but uses the full TBAA
algorithm, just as the compiler does, to determine when two types are
permitted to alias. In a situation where access overlap has occurred and
aliasing is not permitted, an error is generated.
Clang's TBAA representation currently has a problem representing unions,
as demonstrated by the one XFAIL'd test in the runtime patch. We'll
update the TBAA representation to fix this, and at the same time, update
the sanitizer.
When the sanitizer is active, we disable actually using the TBAA
metadata for AA. This way we're less likely to use TBAA to remove memory
accesses that we'd like to verify.
As a note, this implementation does not use the compressed shadow-memory
scheme discussed previously
(http://lists.llvm.org/pipermail/llvm-dev/2017-April/111766.html). That
scheme would not handle the struct-path (i.e. structure offset)
information that our TBAA represents. I expect we'll want to further
work on compressing the shadow-memory representation, but I think it
makes sense to do that as follow-up work.
It goes together with the corresponding clang changes
(https://github.com/llvm/llvm-project/pull/76260) and compiler-rt
changes (https://github.com/llvm/llvm-project/pull/76261)
PR: https://github.com/llvm/llvm-project/pull/76259
Summary:
Previously, we'd add all SPs distinct from the cloned one into a set.
Then when cloning a local scope we'd check if it's from one of those
'distinct' SPs by checking if it's in the set. We don't need to do that.
We can just check against the cloned SP directly and drop the set.
Test Plan:
ninja check-llvm-unit check-llvm
Summary:
The new API expects the caller to populate the VMap. We need it this way
for a subsequent change around coroutine cloning.
Test Plan:
ninja check-llvm-unit check-llvm
Loop Optimizations expect the input loop to be in LCSSA form. But it
seems that LoopVersioning doesn't have any check to see if the loop is
actually in LCSSA form. As a result, if we give it a loop which is not
in LCSSA form but still correct semantically, the resulting
transformation fails to pass through verifier pass with the following
error.
Instruction does not dominate all uses!
%inc = add nsw i16 undef, 1
store i16 %inc, ptr @c, align 1
As the loop is not in LCSSA form, LoopVersioning's transformations leads
to invalid IR! As some instructions do not dominate all their uses.
This patch checks if a loop is in LCSSA form, if not it will call
formLCSSARecursively on the loop before passing it to LoopVersioning.
Fixes: #36998
Given an alloca that potentially has many uses in big complex code and
escapes into a call that is readonly+nocapture, we cannot easily split
up the alloca. There are several optimizations that will attempt to take
a value that is stored and a reload, and replace the load with the
original stored value. Instcombine has some simple heuristics, GVN can
sometimes do it, as can CSE in limited situations. They all suffer from
the same issue with complex code - they start from a load/store and need
to prove no-alias for all code between, which in complex cases might be
a lot to look through. Especially if the ptr is an alloca with many uses
that is over the normal escape capture limits.
The pass that does do well with allocas is SROA, as it has a complete
view of all of the uses. This patch adds a case to SROA where it can
detect allocas that are passed into calls that are no-capture readonly.
It can then optimize the reloaded values inside the alloca slice with
the stored value knowing that it is valid no matter the location of the
loads/stores from the no-escaping nature of the alloca.
With the introduction of CmpPredicate in 51a895a (IR: introduce struct
with CmpInst::Predicate and samesign), PatternMatch is one of the first
key pieces of infrastructure that must be updated to match a CmpInst
respecting samesign information. Implement this change to Cmp-matchers.
This is a preparatory step in migrating the codebase over to
CmpPredicate. Since we no functional changes are desired at this stage,
we have chosen not to migrate CmpPredicate::operator==(CmpPredicate)
calls to use CmpPredicate::getMatching(), as that would have visible
impact on tests that are not yet written: instead, we call
CmpPredicate::operator==(Predicate), preserving the old behavior, while
also inserting a few FIXME comments for follow-ups.
Allow a duplicate basic block with multiple predecessors to the
jump table to be simplified, by considering that the same basic
block may appear in more switch cases.
Given an alloca that potentially has many uses in big complex code and
escapes into a call that is readonly+nocapture, we cannot easily split
up the alloca. There are several optimizations that will attempt to take
a value that is stored and a reload, and replace the load with the
original stored value. Instcombine has some simple heuristics, GVN can
sometimes do it, as can CSE in limited situations. They all suffer from
the same issue with complex code - they start from a load/store and need
to prove no-alias for all code between, which in complex cases might be
a lot to look through. Especially if the ptr is an alloca with many uses
that is over the normal escape capture limits.
The pass that does do well with allocas is SROA, as it has a complete
view of all of the uses. This patch adds a case to SROA where it can
detect allocas that are passed into calls that are no-capture readonly.
It can then optimize the reloaded values inside the alloca slice with
the stored value knowing that it is valid no matter the location of the
loads/stores from the no-escaping nature of the alloca.
The global ctor evaluator tries to evalute function calls where the call
function type and function type do not match, by performing bitcasts.
This currently causes a crash when calling a void function with non-void
return type.
I've opted to remove this functionality entirely rather than fixing this
specific case. With opaque pointers, there shouldn't be a legitimate use
case for this anymore, as we don't need to look through pointer type
casts. Doing other bitcasts is very iffy because it ignores ABI
considerations. We should at least leave adjusting the signatures to
make them line up to InstCombine (which also does some iffy things, but
is at least somewhat more constrained).
Fixes https://github.com/llvm/llvm-project/issues/118725.
Consider the following loop:
```
int rdx = init;
for (int i = 0; i < n; ++i)
rdx = (a[i] > b[i]) ? i : rdx;
```
We can vectorize this loop if `i` is an increasing induction variable.
The final reduced value will be the maximum of `i` that the condition
`a[i] > b[i]` is satisfied, or the start value `init`.
This patch added new RecurKind enums - IFindLastIV and FFindLastIV.
---------
Co-authored-by: Alexey Bataev <5361294+alexey-bataev@users.noreply.github.com>
When using non-integral pointer types, such as on CHERI targets, size_t
is equivalent
to the index size, which is allowed to be smaller than the size of the
pointer.
This reapplies #119138 with a defensive fix for the assertion failure
when building libcxx.
Unfortunately the failure does not reproduce on my machine, so I am not
able to extract a test case.
The key insight for the fix comes from Jessica Clarke, who observes that
`VTablePtr` may, in fact,
not be a pointer on return from `FindAvailableLoadedValue`.
Co-authored-by: Alexander Richardson <alexander.richardson@cl.cam.ac.uk>
The `poison` values are used to substitute debug information of values
moved from the original header into the preheader that are no longer
available in the former.
Summary:
Moving the cloning of BBs after the metadata makes the flow of the
function a bit more straightforward and makes it easier to extract more
into helper functions.
Test Plan:
ninja check-llvm-unit check-llvm
Summary:
There was a single usage of CloneBasicBlock with non-default
DebugInfoFinder inside CloneFunctionInto which has been refactored in
more focused.
Test Plan:
ninja check-llvm-unit check-llvm
If the GEP is nusw/inbounds and has all-non-negative offsets infer nuw
as well.
This doesn't have measurable compile-time impact.
Proof: https://alive2.llvm.org/ce/z/ihztLy
Writing a test for this transitively exposed a number of places in
BuildLibCalls where
we were failing to propagate address spaces properly, which are
additionally fixed.
Add an extra know to UnrollingPreferences to let backends control the
maximum budget for SCEV expansions.
This gives backends more fine-grained control on the cost of the runtime
checks for runtime unrolling.
PR: https://github.com/llvm/llvm-project/pull/118316
Preserve !alias.scope, !noalias and !mem.parallel_loop_access metadata
on the replacement instruction, if it does not move. In that case, the
program would be UB, if the aliasing property encoded in the metadata
does not hold. This makes use of the clarification re aliasing metadata
implying UB if the property does not hold: #116220
Same as #115868, but for !alias.scope, !noalias and
!mem.parallel_loop_access.
PR: https://github.com/llvm/llvm-project/pull/117716
This should act like range.
Previously ConstantRangeList assumed a 64-bit range. Now query from the
actual entries. This also means that the empty range has no bitwidth, so
move asserts to avoid checking the bitwidth of empty ranges.
This is a follow up of #96878 to support hoisting load/store from BBs
have the same predecessor, if load/store are the only instructions and
the branch is unpredictable, e.g.:
```
void test (int a, int *c, int *d) {
if (a)
*c = a;
else
*d = a;
}
```
Summary:
Consolidate the logic in a single function. We do an extra pass over
Instructions but this is necessary to untangle things and extract
metadata cloning in a future diff.
Test Plan:
```
$ ninja check-llvm-unit check-llvm
[211/213] Running the LLVM regression tests
Testing Time: 106.06s
Total Discovered Tests: 62601
Skipped : 17 (0.03%)
Unsupported : 2518 (4.02%)
Passed : 59911 (95.70%)
Expectedly Failed: 155 (0.25%)
[212/213] Running lit suite
Testing Time: 12.47s
Total Discovered Tests: 8474
Skipped: 17 (0.20%)
Passed : 8457 (99.80%)
```
Extracted from #109032 (commit 3) (there are more refactors and cleanups
in subsequent commits)
Preserve llvm.access.group metadata on the replacement instruction, if
it does not move. In that case, the program would be UB, if the parallel
property encoded in the metadata does not hold.
This matches the LangRef recently updated in #116220
PR https://github.com/llvm/llvm-project/pull/115868
Currently when we merge invokes as part of SimplifyCFG we apply a merge
of the invoke DILocations to the merged invoke. We also insert an
unconditional branch to the merged invoke at the positions previously
occupied by the original invokes; as this branch is part of the
substitution for the invoke it has replaced, we should propagate the
original invoke DebugLoc to it.
Relands 7ff3a9acd84654c9ec2939f45ba27f162ae7fbc3 after regenerating the
test case.
Supersedes the draft PR #94992, taking a different approach following
feedback:
* Lower in PreISelIntrinsicLowering
* Don't require that the number of bytes to set is a compile-time
constant
* Define llvm.memset_pattern rather than llvm.memset_pattern.inline
As discussed in the [RFC
thread](https://discourse.llvm.org/t/rfc-introducing-an-llvm-memset-pattern-inline-intrinsic/79496),
the intent is that the intrinsic will be lowered to loops, a sequence of
stores, or libcalls depending on the expected cost and availability of
libcalls on the target. Right now, there's just a single lowering path
that aims to handle all cases. My intent would be to follow up with
additional PRs that add additional optimisations when possible (e.g.
when libcalls are available, when arguments are known to be constant
etc).
This reverts commit 7ff3a9acd84654c9ec2939f45ba27f162ae7fbc3.
Recent scheduling changes means tests need to be re-generated. Reverting
to green while I do that.
I noticed that the two C functions emitted different IR:
```
int switch_duplicate_arms(int switch_val, int v, int w) {
switch (switch_val) {
default:
break;
case 0:
w = v;
break;
case 1:
w = v;
break;
}
return w;
}
int if_duplicate_arms(int switch_val, int v, int w) {
if (switch_val == 0)
w = v;
else if (switch_val == 1)
w = v;
return v0;
}
```
We generate IR that looks like this:
```
define i32 @switch_duplicate_arms(i32 %0, i32 %1, i32 %2, i32 %3) {
switch i32 %1, label %7 [
i32 0, label %5
i32 1, label %6
]
5:
br label %7
6:
br label %7
7:
%8 = phi i32 [ %3, %4 ], [ %2, %6 ], [ %2, %5 ]
ret i32 %8
}
define i32 @if_duplicate_arms(i32 %0, i32 %1, i32 %2, i32 %3) {
%5 = icmp ult i32 %1, 2
%6 = select i1 %5, i32 %2, i32 %3
ret i32 %6
}
```
For `switch_duplicate_arms`, taking case 0 and 1 are the same since %5
and %6
branch to the same location and the incoming values for %8 are the same
from
those blocks. We could remove one on the duplicate switch targets and
update
the switch with the single target.
On RISC-V, prior to this patch, we generate the following code:
```
switch_duplicate_arms:
li a4, 1
beq a1, a4, .LBB0_2
mv a0, a3
bnez a1, .LBB0_3
.LBB0_2:
mv a0, a2
.LBB0_3:
ret
if_duplicate_arms:
li a4, 2
mv a0, a2
bltu a1, a4, .LBB1_2
mv a0, a3
.LBB1_2:
ret
```
After this patch, the O3 code is optimized to the icmp + select pair,
which
gives us the same code gen as `if_duplicate_arms`, as desired. This
results
is one less branch instruction in the final assembly.
This may help with both code size and further switch simplification. I
found
that this patch causes no significant impact to spec2006/int/ref and
spec2017/intrate/ref.
---------
Co-authored-by: Min Hsu <min@myhsu.dev>
Supersedes the draft PR #94992, taking a different approach following
feedback:
* Lower in PreISelIntrinsicLowering
* Don't require that the number of bytes to set is a compile-time
constant
* Define llvm.memset_pattern rather than llvm.memset_pattern.inline
As discussed in the [RFC
thread](https://discourse.llvm.org/t/rfc-introducing-an-llvm-memset-pattern-inline-intrinsic/79496),
the intent is that the intrinsic will be lowered to loops, a sequence of
stores, or libcalls depending on the expected cost and availability of
libcalls on the target. Right now, there's just a single lowering path
that aims to handle all cases. My intent would be to follow up with
additional PRs that add additional optimisations when possible (e.g.
when libcalls are available, when arguments are known to be constant
etc).
`IRNormalizer` will reorder instructions. Thus, we need to invalidate
analyses. Done in cd500d28cba3177c213f2f2faf50f14ea56e230b. This should
resolve the [BuildBot
failure](https://github.com/llvm/llvm-project/pull/68176#issuecomment-2428243474).
---
Original PR: #68176
Original commit: 1295d2e6da2fe90f3b770ab1d35bf5caecd38bed
Reverted with: 8a12e0131f3d84b470fac63af042aa96a1b19f56
---
Add the llvm-canon tool. Description from the [original
PR](https://reviews.llvm.org/D66029#change-wZv3yOpDdxIu):
> Added a new llvm-canon tool which aims to transform LLVM Modules into
a canonical form by reordering and renaming instructions while
preserving the same semantics. This tool makes it easier to spot
semantic differences while diffing two modules which have undergone
different transformation passes.
The current version of this tool can:
- Reorder instructions within a function.
- Rename instructions based on the operands.
- Sort commutative operands.
This code was originally written by @michalpaszkowski and [submitted to
mainline
LLVM](14d358537f).
However, it was quickly
[reverted](335de55fa3)
to do BuildBot errors.
Michal presented his version of the tool in [LLVM-Canon: Shooting for
Clear Diffs](https://www.youtube.com/watch?v=c9WMijSOEUg).
@AidanGoldfarb and I ported the code to the new pass manager, added more
tests, and fixed some bugs related to PHI nodes that may have been the
root cause of the BuildBot errors that caused the patch to be reverted.
Additionally, we rewrote the implementation of instruction reordering to
fix cases where the original algorithm would break use-def chains.
Note that this is @AidanGoldfarb and I's first time submitting to LLVM.
Please liberally critique the PR!
CC @plotfi for initial review.
---------
Co-authored-by: Aidan <aidan.goldfarb@mail.mcgill.ca>
Reorganize the code into phases:
* Analyze/normalize
* Create extracted function prototype
* Generate the new function's implementation
* Generate call to new function
* Connect call to original function's CFG
The motivation is #114669 to optionally clone the selected code region
into the new function instead of moving it. The current structure made
it difficult to add such functionality since there was no obvious place
to do so, not made easier by some functions doing more than their name
suggests. For instance, constructFunction modifies code outside the
constructed function, but also function properties such as
setPersonalityFn are derived somewhere else. Another example is
emitCallAndSwitchStatement, which despite its name also inserts stores
for output parameters.
Many operations also implicitly depend on the order they are applied
which this patch tries to reduce. For instance, ExtractedFuncRetVals
becomes the list exit blocks which also defines the return value when
leaving via that block. It is computed early such that the new
function's return instructions and the switch can be generated
independently. Also, ExtractedFuncRetVals is combining the lists
ExitBlocks and OldTargets which were not always kept consistent with
each other or NumExitBlocks. The method recomputeExitBlocks() will
update it when necessary.
The coding style partially contradict the current coding standard. For
instance some local variable start with lower case letters. I updated
some, but not all occurrences to make the diff match at least some lines
as unchanged.
The patch [D96854](https://reviews.llvm.org/D96854) introduced some
confusion of function argument indexes this is fixed here as well, hence
the patch is not NFC anymore. Tested in modified CodeExtractorTest.cpp.
Patch [D121061](https://reviews.llvm.org/D121061) introduced
AllocationBlock, but not all allocas were inserted there.
Efectively includes the following fixes:
1. ce73b1672a
2. 4aaa925786
3. Missing allocas, still unfixed
Originally submitted as https://reviews.llvm.org/D115218
The expansion of a SCEVUnknown is trivial (it's just the wrapped value).
If we try to reuse an existing value it might be a more complex
expression that simplifies to the SCEVUnknown.
This is inspired by https://github.com/llvm/llvm-project/issues/114879,
because SCEVExpander replacing a constant with a phi node is just silly.
(I don't consider this a fix for that issue though.)
Note that PointerUnion::{is,get,dyn_cast} have been soft deprecated in
PointerUnion.h:
// FIXME: Replace the uses of is(), get() and dyn_cast() with
// isa<T>, cast<T> and the llvm::dyn_cast<T>
This patch fixes a simple error where as part of loop unrolling we
optimize conditional loop-exiting branches into unconditional branches
when we know that they will or won't exit the loop, but does not
propagate the source location of the original branch to the new one.
Found using https://github.com/llvm/llvm-project/pull/107279.