Currently `AAAddressSpace` relies on identifying the address spaces of
all underlying objects. However, it might infer sub-optimal address
space when the underlying object is a function argument. In
`AMDGPUPromoteKernelArgumentsPass`, the promotion of a pointer kernel
argument is by adding a series of `addrspacecast` instructions (as shown
below), and hoping `InferAddressSpacePass` can pick it up and do the
rewriting accordingly.
Before promotion:
```
define amdgpu_kernel void @kernel(ptr %to_be_promoted) {
%val = load i32, ptr %to_be_promoted
...
ret void
}
```
After promotion:
```
define amdgpu_kernel void @kernel(ptr %to_be_promoted) {
%ptr.cast.0 = addrspace cast ptr % to_be_promoted to ptr addrspace(1)
%ptr.cast.1 = addrspace cast ptr addrspace(1) %ptr.cast.0 to ptr
# all the use of %to_be_promoted will use %ptr.cast.1
%val = load i32, ptr %ptr.cast.1
...
ret void
}
```
When `AAAddressSpace` analyzes the code after promotion, it will take
`%to_be_promoted` as the underlying object of `%ptr.cast.1`, and use its
address space (which is 0) as its final address space, thus simply do
nothing in `manifest`. The attributor framework will them eliminate the
address space cast from 0 to 1 and back to 0, and replace `%ptr.cast.1`
with `%to_be_promoted`, which basically reverts all changes by
`AMDGPUPromoteKernelArgumentsPass`.
IMHO I'm not sure if `AMDGPUPromoteKernelArgumentsPass` promotes the
argument in a proper way. To improve the handling of this case, this PR
adds an extra handling when iterating over all underlying objects. If an
underlying object is a function argument, it means it reaches a terminal
such that we can't futher deduce its underlying object further. In this
case, we check all uses of the argument. If they are all `addrspacecast`
instructions and their destination address spaces are same, we take the
destination address space.
Fixes: SWDEV-482640.
If the pointer returned by a function is not "the base pointer" but has
an offset, we need to track the offset such that users can apply it to
their offset chain when they create accesses.
This was reported by @ye-luo and reduced test cases are included. The
OffsetInfo was moved and the container was replaced with a set to avoid
excessive growth. Otherwise, the patch just replaces the "returns
pointer" flag with the "returned offsets", and deals with the applying
to offsets at the call site.
---------
Co-authored-by: Johannes Doerfert <jdoerfert@llnl.gov>
This replaces some of the most frequent offenders of using a DenseMap that
cause a malloc, where the typical element-count is small enough to fit in
an initial stack allocation.
Most of these are fairly obvious, one to highlight is the collectOffset
method of GEP instructions: if there's a GEP, of course it's going to have
at least one offset, but every time we've called collectOffset we end up
calling malloc as well for the DenseMap in the MapVector.
Instead of visiting call sites in Attribute::checkForAllUses, we now
keep track of returns in AAPointerInfo and use the call site return
information as required. This way, the user of
AAPointerInfo(CallSite)Argument can determine if the call return should
be visited. We do not collect them as "may accesses" in the
AAPointerInfo(CallSite)Argument itself in case a return user is found.
When we propagate call site arguments we always need to translate them,
this is important as we ended up picking the function argument for a
recurisve call not the call site argument. `@recBad` and `@recGood` in
`returned.ll` show the problem as they used to transform them the same
way. The restructuring cleans the code up and helps derive more
"returned" arguments and better information in the presence of recursive
calls. The "dropped" attributes are simply dropped because we do not
query them anymore, not because we cannot derive them.
Before, we kept the call site access kind (may/must) when we translated
the access. However, the pointer we access it through (by passing it to
the callee) might not be the underlying object. We have similar logic
when we add store and load accesses.
- Allocas and GlobalValues cannot be simplified, so we should not try.
- If we never used any assumed state, the AAUnderlyingObjects doesn't
require an additional update.
- If we have seen an object (or it's underlying object) before, we do
not need to inspect it anymore.
The original logic for "SeenObjects" was flawed and caused us to add
intermediate values to the underlying object list if a PHI or select
instruction referenced the same underlying object twice. The test
changes are all instances of this situation and we now correctly derive
`memory(none)` for the functions that only access stack memory.
---------
Co-authored-by: Shilei Tian <i@tianshilei.me>
When copying map entries, we might run into resizing and invalidate the
RHS of the assignment. We dealt with this before and now use the proper
helper to avoid the problem in another place.
Fixes: https://github.com/llvm/llvm-project/issues/104397
When we check if an access can be skipped, there is a case that an
inter-procedural interference access exists after a dominant write.
Currently we
rely on `AAInterFnReachability` to tell if the access can be reachable.
If it is
not, we can safely skip the access. However, it is based on an
assumption that
the AA exists. It is possible that the AA doesn't exist. In this case,
we can't
safely assume the acess can be skipped because we have to assume the
access can
reach. This can happen when `AAInterFnReachability` is not in the
allowed AA
list when creating the attributor, such as AMDGPUAttributor.
Co-authored-by: Mark de Wever <koraq@xs4all.nl>
Since `raw_string_ostream` doesn't own the string buffer, it is
desirable (in terms of memory safety) for users to directly reference
the string buffer rather than use `raw_string_ostream::str()`.
Work towards TODO comment to remove `raw_string_ostream::str()`.
This is a helper to avoid writing `getModule()->getDataLayout()`. I
regularly try to use this method only to remember it doesn't exist...
`getModule()->getDataLayout()` is also a common (the most common?)
reason why code has to include the Module.h header.
Uses the new InsertPosition class (added in #94226) to simplify some of
the IRBuilder interface, and removes the need to pass a BasicBlock
alongside a BasicBlock::iterator, using the fact that we can now get the
parent basic block from the iterator even if it points to the sentinel.
This patch removes the BasicBlock argument from each constructor or call
to setInsertPoint.
This has no functional effect, but later on as we look to remove the
`Instruction *InsertBefore` argument from instruction-creation
(discussed
[here](https://discourse.llvm.org/t/psa-instruction-constructors-changing-to-iterator-only-insertion/77845)),
this will simplify the process by allowing us to deprecate the
InsertPosition constructor directly and catch all the cases where we use
instructions rather than iterators.
Remove support for the icmp and fcmp constant expressions.
This is part of:
https://discourse.llvm.org/t/rfc-remove-most-constant-expressions/63179
As usual, many of the updated tests will no longer test what they were
originally intended to -- this is hard to preserve when constant
expressions get removed, and in many cases just impossible as the
existence of a specific kind of constant expression was the cause of the
issue in the first place.
The old use of must-be-executed-context (MBEC) did propagate
through calls even if that was not allowed. We now only propagate from
call site arguments. If there are calls/intrinsics that allows
propagation, we need to add them explicitly.
Fixes: https://github.com/llvm/llvm-project/issues/78507
---------
Co-authored-by: Matt Arsenault <arsenm2@gmail.com>
If the calling function has the null_pointer_is_valid attribute, somehow
a null constant reaches here. I'm not sure why exactly, it doesn't
happen for other types of constants.
Fixes#87856
Prior to #85863, the required parameters of llvm::isKnownNonZero were
Value and DataLayout. After, they are Value, Depth, and SimplifyQuery,
where SimplifyQuery is implicitly constructible from DataLayout. The
change to move Depth before SimplifyQuery needed callers to be updated
unnecessarily, and as commented in #85863, we actually want Depth to be
after SimplifyQuery anyway so that it can be defaulted and the caller
does not need to specify it.
CBBB will keep same after the first iteration so
registerManifestAddedBasicBlock would always register the same basic
block later.
Co-authored-by: laichunfeng <laichunfeng@tencent.com>
As part of the RemoveDIs project we need LLVM to insert instructions using
iterators wherever possible, so that the iterators can carry a bit of
debug-info. This commit implements some of that by updating the contents of
llvm/lib/Transforms/Utils to always use iterator-versions of instruction
constructors.
There are two general flavours of update:
* Almost all call-sites just call getIterator on an instruction
* Several make use of an existing iterator (scenarios where the code is
actually significant for debug-info)
The underlying logic is that any call to getFirstInsertionPt or similar
APIs that identify the start of a block need to have that iterator passed
directly to the insertion function, without being converted to a bare
Instruction pointer along the way.
Noteworthy changes:
* FindInsertedValue now takes an optional iterator rather than an
instruction pointer, as we need to always insert with iterators,
* I've added a few iterator-taking versions of some value-tracking and
DomTree methods -- they just unwrap the iterator. These are purely
convenience methods to avoid extra syntax in some passes.
* A few calls to getNextNode become std::next instead (to keep in the
theme of using iterators for positions),
* SeparateConstOffsetFromGEP has it's insertion-position field changed.
Noteworthy because it's not a purely localised spelling change.
All this should be NFC.
The use of std::pair makes the values it holds opaque. Using classes
improves this while keeping the POD aspect of a std::pair. As a nice
addition, the "known" functions held inappropriately in the Visitor
classes can now properly reside in the value classes. :-)
For the two remaining uses that did not explicitly specify it,
set UndefAllowed=false. In both cases, I believe that treating
undef as a full range is the correct behavior.