We can't assume closed world even in full LTO post-link stage. It is
only true
if we are building a "GPU executable". However, AMDGPU does support
"dyamic
library". I'm not aware of any approach to tell if it is relocatable
link when
we create the pass. For now let's revert the patch as it is currently
breaking things.
We can re-enable it once we can handle it correctly.
This patch removes the conservative uniformity check in the indirect
call
specialization callback, as whether the function pointer is uniform
doesn't
matter too much. Instead, we add an argument to control specialization.
I'm planning to remove StringRef::equals in favor of
StringRef::operator==.
- StringRef::operator==/!= outnumber StringRef::equals by a factor of
38 under llvm/ in terms of their usage.
- The elimination of StringRef::equals brings StringRef closer to
std::string_view, which has operator== but not equals.
- S == "foo" is more readable than S.equals("foo"), especially for
!Long.Expression.equals("str") vs Long.Expression != "str".
SIMachineFunctionInfo has a scan of the function body for inline asm
which may use AGPRs, or callees in SIMachineFunctionInfo. Move this
into the attributor, so it actually works interprocedurally.
Could probably avoid most of the test churn if this bothered to avoid
adding this on subtargets without AGPRs. We should also probably
try to delete the MIR scan in usesAGPRs but it seems to be trickier
to eliminate.
Named '.amdhsa_code_object_version'. This directive sets the
e_ident[ABIVERSION] in the ELF header, and should be used as the assumed
COV for the rest of the asm file.
This commit also weakens the --amdhsa-code-object-version CL flag.
Previously, the CL flag took precedence over the IR flag. Now the IR
flag/asm directive take precedence over the CL flag. This is implemented
by merging a few COV-checking functions in AMDGPUBaseInfo.h.
This patch is the first in a series that adds support for pre-loading
kernel arguments into SGPRs. The command-line argument
'amdgpu-kernarg-preload-count' is used to specify the number of
arguments sequentially from the first that we should attempt to preload,
the default is 0.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D156852
AANonNull is now the first AA that is always queried via the new APIs
and not created manually. Others will follow shortly to avoid trivial
AAs whenever possible.
This commit introduced some helper logic that will make it simpler to
port the next one. It also untangles AADereferenceable and AANonNull
such that the former does not keep a handle on the latter. Finally,
we stop deducing `nonnull` for `undef`, which was incorrect.
Before, we checked and manifested attributes right in the IR. This was
bad as we modified the IR before the manifest stage. Now we can
add/remove/inspect attributes w/o going to the IR (except for the
initial query).
This is a partial cleanup to centralize the initialization and update
decisions for AAs. Lifting the burdon and boilerplate on users and
making it harder to accidentally perform unsound deductions.
The two static helpers show how we can lift the decisions to generate an
AA into the Attributor, avoiding trivial AAs that just cost us compile
time and maintenance code (to check for pre-conditions).
This will do a value range merging down the callgraph, unlike the
current pass which can only propagate values to undecorated functions
from a kernel.
This one is a bit weird due to the interaction with the implied range
from amdgpu-flat-workgroup-size. At the default group range of 1,1024,
the minimum implied bounds is 4 so this ends up introducing the
attribute on undecorated functions. We could probably simplify this by
ignoring it and propagating the raw values. The subtarget interaction
and the interaction with amdgpu-flat-workgroup-size only really clamp
invalid values (plus the lower bound doesn't seem to do anything as
far as I can tell anyway).
Summary:
This is part of the leftover work for https://reviews.llvm.org/D143138.
In this work, we pass code object version as an argument to initialize target ID
and use it for targetID dump.
Reviewers: arsenm
Differential Revision
https://reviews.llvm.org/D143293
Summary:
This patch introduces a mechanism to check the code object version from the module flag, This avoids checking from command line.
In case the module flag is missing, we use the current default code object version supported in the compiler.
For tools whose inputs are not IR, we may need other approach (directive, for example) to check the code
object version, That will be in a separate patch later.
For LIT tests update, we directly add module flag if there is only a single code object version associated with all checks in one file.
In cause of multiple code object version in one file, we use the "sed" method to "clone" the checks to achieve the goal.
Reviewer: arsenm
Differential Revision:
https://reviews.llvm.org/D14313
This mostly reverts commit 270e96f435596449002fc89962595497481c8770.
Keep the attributor related changes around, but functionally restore
the old behavior as a workaround. Device enqueue goes back to not
working at -O0 with this version.
Invert the sense of the attribute and let the attributor figure this
out like everything else. If needed we can have the not-OpenCL
languages set amdgpu-no-default-queue and amdgpu-no-completion-action
up front so they never have to pay the cost.
There are also so many of these now, the offset use API should
probably consider all of them at once. Maybe they should merge into
one attribute with used fields. Having separate functions for each
field in AMDGPUBaseInfo is also not the greatest API (might as well
fix this when the patch to get the object version from the module
lands).
Previously reverted in 8b446ea2ba39e406bcf940ea35d6efb4bb9afe95
Reapplying because this commit is NOT DEPENDENT on the reverted commit
fc21f2d7bae2e0be630470cc7ca9323ed5859892, which broke the ASAN buildbot.
See https://reviews.llvm.org/rGfc21f2d7bae2e0be630470cc7ca9323ed5859892 for
more information.
The arguments to a PHI may represent a recurrence by eventually using the output
of the PHI itself. This is now handled by checking for cycles in the control
flow. If a PHI is not in a recurrence, it is now able to report multiple offsets
instead of conservatively reporting unknown.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D138991
Previously reverted in 12696d302d146ffe616eecab3feceba9d29be2db
The arguments to a PHI may represent a recurrence by eventually using the output
of the PHI itself. This is now handled by checking for cycles in the control
flow. If a PHI is not in a recurrence, it is now able to report multiple offsets
instead of conservatively reporting unknown.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D138991
The arguments to a PHI may represent a recurrence by eventually using the output
of the PHI itself. This is now handled by checking for cycles in the control
flow. If a PHI is not in a recurrence, it is now able to report multiple offsets
instead of conservatively reporting unknown.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D138991
An expression of the form `gep(base, select(pred, const1, const2))` can result
in a set of offsets instead of just one. PointerInfo can now track these sets
instead of conservatively modeling them as Unknown. In general, AAPointerInfo
now uses AAPotentialConstantValues to examine the operands of the GEP.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D138646
An expression of the form `gep(base, select(pred, const1, const2))` can result
in a set of offsets instead of just one. PointerInfo can now track these sets
instead of conservatively modeling them as Unknown. In general, AAPointerInfo
now uses AAPotentialConstantValues to examine the operands of the GEP.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D138646
This is in preparation for future changes that introduce an actual list of
ranges per Access, to be called a RangeList.
Differential Revision: https://reviews.llvm.org/D138644
If we have a constant aggregate, e.g., as an initializer, we usually
failed to extract the proper value/type from it. This patch provides the
size and offset information necessary to extract the right part of the
constant.
Revert "[Attributor] Teach AAPointerInfo to look into aggregates"
This reverts commit 844f6c5d03d58e7ac0c6b838e4a7834ac575ab9b and
4ed0a88cd8a77370073feb270d77a9e8b27bd68c as they broke the buildbots
that run openmp/libomptarget/test/offloading/bug49021.cpp.
If we have a constant aggregate, e.g., as an initializer, we usually
failed to extract the proper value/type from it. This patch provides the
size and offset information necessary to extract the right part of the
constant.
For the longest time we used `AAValueSimplify` and
`genericValueTraversal` to determine "potential values". This was
problematic for many reasons:
- We recomputed the result a lot as there was no caching for the 9
locations calling `genericValueTraversal`.
- We added the idea of "intra" vs. "inter" procedural simplification
only as an afterthought. `genericValueTraversal` did offer an option
but `AAValueSimplify` did not. Thus, we might end up with "too much"
simplification in certain situations and then gave up on it.
- Because `genericValueTraversal` was not a real `AA` we ended up with
problems like the infinite recursion bug (#54981) as well as code
duplication.
This patch introduces `AAPotentialValues` and replaces the
`AAValueSimplify` uses with it. `genericValueTraversal` is folded into
`AAPotentialValues` as are the instruction simplifications performed in
`AAValueSimplify` before. We further distinguish "intra" and "inter"
procedural simplification now.
`AAValueSimplify` was not deleted as we haven't ported the
re-materialization of instructions yet. There are other differences over
the former handling, e.g., we may not fold trivially foldable
instructions right now, e.g., `add i32 1, 1` is not folded to `i32 2`
but if an operand would be simplified to `i32 1` we would fold it still.
We are also even more aware of function/SCC boundaries in CGSCC passes,
which is good even if some tests look like they regress.
Fixes: https://github.com/llvm/llvm-project/issues/54981
Note: A previous version was flawed and consequently reverted in
6555558a80589d1c5a1154b92cc3af9495f8f86c.