Propagate alignment through ptrmask based on potential constant values
of mask and align of ptr.
---------
Co-authored-by: Shilei Tian <i@tianshilei.me>
Goal is simply to reduce direct usage of getLength and setLength so that
if we end up moving memset.pattern (whose length is in elements) there
are fewer places to audit.
std::make_optional<T> is a lot like std::make_unique<T> in that it
performs perfect forwarding of arguments for T's constructor. As a
result, we don't have to repeat type names twice.
Patched and tested the `AAInvariantLoadPointer` attributor from #141800,
which identifies pointers whose loads are eligible to be marked as
`!invariant.load`.
The bug in the attributor was due to `AAMemoryBehavior` always
identifying pointers obtained from `alloca`s as having no writes. I'm
not entirely sure why `AAMemoryBehavior` behaves this way, but it seems
to be beceause it identifies the scope of an `alloca` to be limited to
only that instruction (and, certainly, no memory writes occur within the
`alloca` instructin). This patch just adds a check to disallow all loads
from `alloca` pointers from being marked `!invariant.load` (since any
well-defined program will have to write to stack pointers at some
point).
Reverts llvm/llvm-project#141800
The implementation critically misunderstands the `AAMemoryBehavior`
attributor, which it relies on heavily.
@shiltian, since I do not have commit permissions.
The attributor conservatively marks pointers whose loads are eligible to
be marked as `!invariant.load`.
It does so by identifying:
1. Pointers marked `noalias` and `readonly`
2. Pointers whose underlying objects are all eligible for invariant
loads.
The attributor then manifests this attribute at non-atomic non-volatile
load instructions.
Note: This relands #140615 adding a ".count" suffix to the non-".all"
variants.
Our current intrinsic support for barrier intrinsics is confusing and
incomplete, with multiple intrinsics mapping to the same instruction and
intrinsic names not clearly conveying intrinsic semantics. Further, we
lack support for some variants. This change unifies the IR
representation to a single consistently named set of intrinsics.
- llvm.nvvm.barrier.cta.sync.aligned.all(i32)
- llvm.nvvm.barrier.cta.sync.aligned.count(i32, i32)
- llvm.nvvm.barrier.cta.arrive.aligned.count(i32, i32)
- llvm.nvvm.barrier.cta.sync.all(i32)
- llvm.nvvm.barrier.cta.sync.count(i32, i32)
- llvm.nvvm.barrier.cta.arrive.count(i32, i32)
The following Auto-Upgrade rules are used to maintain compatibility with
IR using the legacy intrinsics:
* llvm.nvvm.barrier0 --> llvm.nvvm.barrier.cta.sync.aligned.all(0)
* llvm.nvvm.barrier.n --> llvm.nvvm.barrier.cta.sync.aligned.all(x)
* llvm.nvvm.bar.sync --> llvm.nvvm.barrier.cta.sync.aligned.all(x)
* llvm.nvvm.barrier --> llvm.nvvm.barrier.cta.sync.aligned.count(x, y)
* llvm.nvvm.barrier.sync --> llvm.nvvm.barrier.cta.sync.all(x)
* llvm.nvvm.barrier.sync.cnt --> llvm.nvvm.barrier.cta.sync.count(x, y)
Our current intrinsic support for barrier intrinsics is confusing and
incomplete, with multiple intrinsics mapping to the same instruction and
intrinsic names not clearly conveying intrinsic semantics. Further, we
lack support for some variants. This change unifies the IR
representation to a single consistently named set of intrinsics.
- llvm.nvvm.barrier.cta.sync.aligned.all(i32)
- llvm.nvvm.barrier.cta.sync.aligned(i32, i32)
- llvm.nvvm.barrier.cta.arrive.aligned(i32, i32)
- llvm.nvvm.barrier.cta.sync.all(i32)
- llvm.nvvm.barrier.cta.sync(i32, i32)
- llvm.nvvm.barrier.cta.arrive(i32, i32)
The following Auto-Upgrade rules are used to maintain compatibility with
IR using the legacy intrinsics:
* llvm.nvvm.barrier0 --> llvm.nvvm.barrier.cta.sync.aligned.all(0)
* llvm.nvvm.barrier.n --> llvm.nvvm.barrier.cta.sync.aligned.all(x)
* llvm.nvvm.bar.sync --> llvm.nvvm.barrier.cta.sync.aligned.all(x)
* llvm.nvvm.barrier --> llvm.nvvm.barrier.cta.sync.aligned(x, y)
* llvm.nvvm.barrier.sync --> llvm.nvvm.barrier.cta.sync.all(x)
* llvm.nvvm.barrier.sync.cnt --> llvm.nvvm.barrier.cta.sync(x, y)
Remove the special case where comparing a dereferenceable_or_null
pointer with null results in captures(none) instead of
captures(address_is_null).
This special case is not entirely correct. Let's say we have an
allocated object of size 2 at address 1 and have a pointer `%p` pointing
either to address 1 or 2. Then passing `gep p, -1` to a
`dereferenceable_or_null(1)` function is well-defined, and allows us to
distinguish between the two possible pointers, capturing information
about the address.
Now that we ignore address captures in alias analysis, I think we're
ready to drop this special case. Additionally, if there are regressions
in other places, the fact that this is inferred as address_is_null
should allow us to easily address them if necessary.
These should not really have uselists, and it's not worth the compile
time of looking at all uses of trivial constants. The main observable
change of this is it no longer adds align attributes on constant null
uses, but those are not useful. Some of these cases should potentially
be more aggressive and not look at any Constant users.
DenseSet, SmallPtrSet, SmallSet, SetVector, and StringSet recently
gained C++23-style insert_range. This patch replaces:
Dest.insert(Src.begin(), Src.end());
with:
Dest.insert_range(Src);
This patch does not touch custom begin like succ_begin for now.
Relative to the previous attempt this includes two fixes:
* Adjust callCapturesBefore() to not skip captures(ret: address,
provenance) arguments, as these will not count as a capture
at the call-site.
* When visiting uses during stack slot optimization, don't skip
the ModRef check for passthru captures. Calls can both modref
and be passthru for captures.
------
This extends CaptureTracking to support inferring non-trivial
CaptureInfos. The focus of this patch is to only support FunctionAttrs,
other users of CaptureTracking will be updated in followups.
The key API changes here are:
* DetermineUseCaptureKind() now returns a UseCaptureInfo where the UseCC
component specifies what is captured at that Use and the ResultCC
component specifies what may be captured via the return value of the
User. Usually only one or the other will be used (corresponding to
previous MAY_CAPTURE or PASSTHROUGH results), but both may be set for
call captures.
* The CaptureTracking::captures() extension point is passed this
UseCaptureInfo as well and then can decide what to do with it by
returning an Action, which is one of: Stop: stop traversal.
ContinueIgnoringReturn: continue traversal but don't follow the
instruction return value. Continue: continue traversal and follow the
instruction return value if it has additional CaptureComponents.
For now, this patch retains the (unsound) special logic for comparison
of null with a dereferenceable pointer. I'd like to switch key code to
take advantage of address/address_is_null before dropping it.
This PR mainly intends to introduce necessary API changes and basic
inference support, there are various possible improvements marked with
TODOs.
Relative to the previous attempt, this adjusts isEscapeSource()
to not treat calls with captures(ret: address, provenance) or similar
arguments as escape sources. This addresses the miscompile reported at:
https://github.com/llvm/llvm-project/pull/125880#issuecomment-2656632577
The implementation uses a helper function on CallBase to make this
check a bit more efficient (e.g. by skipping the byval checks) as
checking attributes on all arguments if fairly expensive.
------
This extends CaptureTracking to support inferring non-trivial
CaptureInfos. The focus of this patch is to only support FunctionAttrs,
other users of CaptureTracking will be updated in followups.
The key API changes here are:
* DetermineUseCaptureKind() now returns a UseCaptureInfo where the UseCC
component specifies what is captured at that Use and the ResultCC
component specifies what may be captured via the return value of the
User. Usually only one or the other will be used (corresponding to
previous MAY_CAPTURE or PASSTHROUGH results), but both may be set for
call captures.
* The CaptureTracking::captures() extension point is passed this
UseCaptureInfo as well and then can decide what to do with it by
returning an Action, which is one of: Stop: stop traversal.
ContinueIgnoringReturn: continue traversal but don't follow the
instruction return value. Continue: continue traversal and follow the
instruction return value if it has additional CaptureComponents.
For now, this patch retains the (unsound) special logic for comparison
of null with a dereferenceable pointer. I'd like to switch key code to
take advantage of address/address_is_null before dropping it.
This PR mainly intends to introduce necessary API changes and basic
inference support, there are various possible improvements marked with
TODOs.
This extends CaptureTracking to support inferring non-trivial
CaptureInfos. The focus of this patch is to only support FunctionAttrs,
other users of CaptureTracking will be updated in followups.
The key API changes here are:
* DetermineUseCaptureKind() now returns a UseCaptureInfo where the UseCC
component specifies what is captured at that Use and the ResultCC
component specifies what may be captured via the return value of the
User. Usually only one or the other will be used (corresponding to
previous MAY_CAPTURE or PASSTHROUGH results), but both may be set for
call captures.
* The CaptureTracking::captures() extension point is passed this
UseCaptureInfo as well and then can decide what to do with it by
returning an Action, which is one of: Stop: stop traversal.
ContinueIgnoringReturn: continue traversal but don't follow the
instruction return value. Continue: continue traversal and follow the
instruction return value if it has additional CaptureComponents.
For now, this patch retains the (unsound) special logic for comparison
of null with a dereferenceable pointer. I'd like to switch key code to
take advantage of address/address_is_null before dropping it.
This PR mainly intends to introduce necessary API changes and basic
inference support, there are various possible improvements marked with
TODOs.
This PR removes the old `nocapture` attribute, replacing it with the new
`captures` attribute introduced in #116990. This change is
intended to be essentially NFC, replacing existing uses of `nocapture`
with `captures(none)` without adding any new analysis capabilities.
Making use of non-`none` values is left for a followup.
Some notes:
* `nocapture` will be upgraded to `captures(none)` by the bitcode
reader.
* `nocapture` will also be upgraded by the textual IR reader. This is to
make it easier to use old IR files and somewhat reduce the test churn in
this PR.
* Helper APIs like `doesNotCapture()` will check for `captures(none)`.
* MLIR import will convert `captures(none)` into an `llvm.nocapture`
attribute. The representation in the LLVM IR dialect should be updated
separately.
As part of the "RemoveDIs" project, BasicBlock::iterator now carries a
debug-info bit that's needed when getFirstNonPHI and similar feed into
instruction insertion positions. Call-sites where that's necessary were
updated a year ago; but to ensure some type safety however, we'd like to
have all calls to moveBefore use iterators.
This patch adds a (guaranteed dereferenceable) iterator-taking
moveBefore, and changes a bunch of call-sites where it's obviously safe
to change to use it by just calling getIterator() on an instruction
pointer. A follow-up patch will contain less-obviously-safe changes.
We'll eventually deprecate and remove the instruction-pointer
insertBefore, but not before adding concise documentation of what
considerations are needed (very few).
` forallInterferingAccesses` is a hotspot and for large modules these
changes make a measurable improvement in compilation time.
For LTO kernel compilation of 519.clvleaf (SPEChpc 2021) I measured the
following:
```
| Measured times (s) | Average | speedup
--------------------+------------------------+---------+---------
Baseline | 33.268 33.332 33.275 | 33.292 | 0%
Cache "kernel" | 30.543 30.339 30.607 | 30.496 | 9.2%
templatize callback | 30.981 30.97 30.964 | 30.972 | 7.5%
Both changes | 29.284 29.201 29.053 | 29.179 | 14.1%
```
Currently `AAAddressSpace` relies on identifying the address spaces of
all underlying objects. However, it might infer sub-optimal address
space when the underlying object is a function argument. In
`AMDGPUPromoteKernelArgumentsPass`, the promotion of a pointer kernel
argument is by adding a series of `addrspacecast` instructions (as shown
below), and hoping `InferAddressSpacePass` can pick it up and do the
rewriting accordingly.
Before promotion:
```
define amdgpu_kernel void @kernel(ptr %to_be_promoted) {
%val = load i32, ptr %to_be_promoted
...
ret void
}
```
After promotion:
```
define amdgpu_kernel void @kernel(ptr %to_be_promoted) {
%ptr.cast.0 = addrspace cast ptr % to_be_promoted to ptr addrspace(1)
%ptr.cast.1 = addrspace cast ptr addrspace(1) %ptr.cast.0 to ptr
# all the use of %to_be_promoted will use %ptr.cast.1
%val = load i32, ptr %ptr.cast.1
...
ret void
}
```
When `AAAddressSpace` analyzes the code after promotion, it will take
`%to_be_promoted` as the underlying object of `%ptr.cast.1`, and use its
address space (which is 0) as its final address space, thus simply do
nothing in `manifest`. The attributor framework will them eliminate the
address space cast from 0 to 1 and back to 0, and replace `%ptr.cast.1`
with `%to_be_promoted`, which basically reverts all changes by
`AMDGPUPromoteKernelArgumentsPass`.
IMHO I'm not sure if `AMDGPUPromoteKernelArgumentsPass` promotes the
argument in a proper way. To improve the handling of this case, this PR
adds an extra handling when iterating over all underlying objects. If an
underlying object is a function argument, it means it reaches a terminal
such that we can't futher deduce its underlying object further. In this
case, we check all uses of the argument. If they are all `addrspacecast`
instructions and their destination address spaces are same, we take the
destination address space.
Fixes: SWDEV-482640.
If the pointer returned by a function is not "the base pointer" but has
an offset, we need to track the offset such that users can apply it to
their offset chain when they create accesses.
This was reported by @ye-luo and reduced test cases are included. The
OffsetInfo was moved and the container was replaced with a set to avoid
excessive growth. Otherwise, the patch just replaces the "returns
pointer" flag with the "returned offsets", and deals with the applying
to offsets at the call site.
---------
Co-authored-by: Johannes Doerfert <jdoerfert@llnl.gov>
This replaces some of the most frequent offenders of using a DenseMap that
cause a malloc, where the typical element-count is small enough to fit in
an initial stack allocation.
Most of these are fairly obvious, one to highlight is the collectOffset
method of GEP instructions: if there's a GEP, of course it's going to have
at least one offset, but every time we've called collectOffset we end up
calling malloc as well for the DenseMap in the MapVector.