Currently, `__constant__` variables do not get unconditionally marked as
`constant` in IR, which seems a bit odd given their definition. This is
generally inconsequential for NVPTX/AMDGPU, since said variables get
emitted in the constant address space for those BEs. However, it is
potentially significant for e.g. HIP-on-SPIR-V cases, as SPIR-V does not
allow casts to/from the constant AS (`UniformConstant`), which forces
`__constant__` variables to be emitted in the global AS, thus making IR
constness meaningful.
Fix codegen of consteval functions returning an empty class, and related
issues
If a class is empty, don't store it to memory: the store might overwrite
useful data. Similarly, if a class has tail padding that might overlap
other fields, don't store the tail padding to memory.
The problem here turned out a bit more general than I initially thought:
basically all uses of EmitAggregateStore were broken. Call lowering had
a method that did mostly the right thing, though: CreateCoercedStore.
Adapt CreateCoercedStore so it always does the conservatively right
thing, and use it for both calls and ConstantExpr.
Also, along the way, fix the "overlap" bit in AggValueSlot: the bit was
set incorrectly for empty classes in some cases.
Fixes#93040.
This reverts commit adaff46d087799072438dd744b038e6fd50a2d78.
Drop the -O3 checks from default-attributes.hip. I don't know why they
are different on some bots but reverting this is far too disruptive.
There are two problems with _BitInt prior to this patch:
1. For at least some values of N, we cannot use LLVM's iN for the type
of struct elements, array elements, allocas, global variables, and so
on, because the LLVM layout for that type does not match the high-level
layout of _BitInt(N).
Example: Currently for i128:128 targets correct implementation is
possible either for __int128 or for _BitInt(129+) with lowering to iN,
but not both, since we have now correct implementation of __int128 in
place after a21abc7.
When this happens, opaque [M x i8] types used, where M =
sizeof(_BitInt(N)).
2. LLVM doesn't guarantee any particular extension behavior for integer
types that aren't a multiple of 8. For this reason, all _BitInt types
are now have in-memory representation that is a whole number of bytes.
I.e. for example _BitInt(17) now will have memory layout type i32.
This patch also introduces concept of load/store type and adds an API to
CodeGenTypes that returns the IR type that should be used for load and
store operations. This is particularly useful for the case when a
_BitInt ends up having array of bytes as memory layout type. For
_BitInt(N), let M = sizeof(_BitInt(N)), and let BITS = M * 8. Loads and
stores of iM would both (1) produce far better code from the backends
and (2) be far more optimizable by IR passes than loads and stores of [M
x i8].
Fixes https://github.com/llvm/llvm-project/issues/85139
Fixes https://github.com/llvm/llvm-project/issues/83419
---------
Co-authored-by: John McCall <rjmccall@gmail.com>
Removing it from the codegen pipeline induces a lot of test churn
because llc is no longer optimizing out implicit arguments to kernels.
Mostly mechanical, but there are some creative test updates. I preferred
to take the changes as-is in tests where the ABI isn't relevant. In
cases where it's more relevant, or the optimize out logic was too
ingrained in the test, I pre-run the optimization. Some cases manually
add attributes to disable inputs.
This enables the AMDGPU specific implementation of `printf` when
compiling for AMDGCN flavoured SPIR-V, the consequence being that the
expansion into ROCDL calls & friends gets expanded before "lowering" to
SPIR-V and gets carried through. The only relatively "novel" aspect is
that the `callAppendStringN` is simplified to take the type of the
passed in arguments, as opposed to querying them from the module. This
is a neutral change since the arguments were passed directly to the
call, without any attempt to cast them, hence the assumption that the
actual types match the formal ones was already baked in.
This change seeks to add support for vendor flavoured SPIRV - more
specifically, AMDGCN flavoured SPIRV. The aim is to generate SPIRV that
carries some extra bits of information that are only usable by AMDGCN
targets, forfeiting absolute genericity to obtain greater expressiveness
for target features:
- AMDGCN inline ASM is allowed/supported, under the assumption that the
[SPV_INTEL_inline_assembly](https://github.com/intel/llvm/blob/sycl/sycl/doc/design/spirv-extensions/SPV_INTEL_inline_assembly.asciidoc)
extension is enabled/used
- AMDGCN target specific builtins are allowed/supported, under the
assumption that e.g. the `--spirv-allow-unknown-intrinsics` option is
enabled when using the downstream translator
- the featureset matches the union of AMDGCN targets' features
- the datalayout string is overspecified to affix both the program
address space and the alloca address space, the latter under the
assumption that the
[SPV_INTEL_function_pointers](https://github.com/intel/llvm/blob/sycl/sycl/doc/design/spirv-extensions/SPV_INTEL_function_pointers.asciidoc)
extension is enabled/used, case in which the extant SPIRV datalayout
string would lead to pointers to function pointing to the private
address space, which would be wrong.
Existing AMDGCN tests are extended to cover this new target. It is
currently dormant / will require some additional changes, but I thought
I'd rather put it up for review to get feedback as early as possible. I
will note that an alternative option is to place this under AMDGPU, but
that seems slightly less natural, since this is still SPIRV, albeit
relaxed in terms of preconditions & constrained in terms of
postconditions, and only guaranteed to be usable on AMDGCN targets (it
is still possible to obtain pristine portable SPIRV through usage of the
flavoured target, though).
The previous name 'amdgpu_code_object_version', was misleading since
this is really a property of the HSA OS. The new spelling also matches
the asm directive I added in bc82cfb.
The Clang declaration of the wave-64 builtin uses "UL" as the return
type, which is interpreted as a 32-bit unsigned integer on Windows. This
emits an incorrect LLVM declaration with i32 return type instead of i64.
The clang declaration needs to be fixed to use "WU" instead.
amdgcn_update_dpp intrinsic (#71139)""
This reverts commit d1fb9307951319eea3e869d78470341d603c8363 and fixes
the lit test clang/test/CodeGenHIP/dpp-const-fold.hip
---------
Authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com>
Operands of `__builtin_amdgcn_update_dpp` need to evaluate to constant
to match the intrinsic requirements.
Fixes: SWDEV-426822, SWDEV-431138
---------
Authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com>
rename it to __AMDGCN_WAVEFRONT_SIZE__ for consistency.
__AMDGCN_WAVEFRONT_SIZE will be deprecated in the future.
Reviewed by: Matt Arsenault, Johannes Doerfert
Differential Revision: https://reviews.llvm.org/D154207
This is an alternative to currently existing hostcall implementation and uses printf buffer similar to OpenCL,
The data stored in the buffer (i.e the data frame) for each printf call are as follows,
1. Control DWord - contains info regarding stream, format string constness and size of data frame
2. Hash of the format string (if constant) else the format string itself
3. Printf arguments (each aligned to 8 byte boundary)
The format string Hash is generated using LLVM's MD5 Message-Digest Algorithm implementation and only low 64 bits are used.
The implementation still uses amdhsa metadata and hash is stored as part of format string itself to ensure
minimal changes in runtime.
Differential Revision: https://reviews.llvm.org/D150427
This is an ongoing series of commits that are reformatting our
Python code.
Reformatting is done with `black`.
If you end up having problems merging this commit because you
have made changes to a python file, the best way to handle that
is to run git checkout --ours <yourfile> and then reformat it
with black.
If you run into any problems, post to discourse about it and
we will try to help.
RFC Thread below:
https://discourse.llvm.org/t/rfc-document-and-standardize-python-code-style
Reviewed By: MatzeB
Differential Revision: https://reviews.llvm.org/D150761
Predefine __AMDGCN_CUMODE__ as 1 or 0 when compilation assumes CU or WGP modes.
If WGP mode is not supported, ignore -mno-cumode and emit a warning.
This is needed for implementing device functions like __smid
(312dff7b79/include/hip/amd_detail/amd_device_functions.h (L957))
Reviewed by: Matt Arsenault, Artem Belevich, Brian Sumner
Differential Revision: https://reviews.llvm.org/D145343
This mostly reverts commit 270e96f435596449002fc89962595497481c8770.
Keep the attributor related changes around, but functionally restore
the old behavior as a workaround. Device enqueue goes back to not
working at -O0 with this version.
This is a dirty, dirty hack to workaround bot failures at
-O0. Currently these fields are only used by OpenCL features and
evidently the HIP runtime isn't expecting to see them in HIP
programs. The code objects should be language agnostic, so just force
optimize these out until the runtime is fixed.
This was assuming a direct reference to the global variable. The
constant string is placed in addrspace 4, and has a constexpr
addrspacecast to the generic address space.
Add the ability to put __attribute__((maybe_undef)) on function arguments.
Clang codegen introduces a freeze instruction on the argument.
Differential Revision: https://reviews.llvm.org/D130224
This adds -no-opaque-pointers to clang tests whose output will
change when opaque pointers are enabled by default. This is
intended to be part of the migration approach described in
https://discourse.llvm.org/t/enabling-opaque-pointers-by-default/61322/9.
The patch has been produced by replacing %clang_cc1 with
%clang_cc1 -no-opaque-pointers for tests that fail with opaque
pointers enabled. Worth noting that this doesn't cover all tests,
there's a remaining ~40 tests not using %clang_cc1 that will need
a followup change.
Differential Revision: https://reviews.llvm.org/D123115
This issue is an oversight in D108621.
Literals in HIP are emitted as global constant variables with default
address space which maps to Generic address space for HIPSPV. In
SPIR-V such variables translate to OpVariable instructions with
Generic storage class which are not legal. Fix by mapping literals
to CrossWorkGroup address space.
The literals are not mapped to UniformConstant because the “flat”
pointers in HIP may reference them and “flat” pointers are modeled
as Generic pointers in SPIR-V. In SPIR-V/OpenCL UniformConstant
pointers may not be casted to Generic.
Patch by: Henry Linjamäki
Reviewed by: Yaxun Liu
Differential Revision: https://reviews.llvm.org/D118876
Turning on `enable_noundef_analysis` flag allows better codegen by removing freeze instructions.
I modified clang by renaming `enable_noundef_analysis` flag to `disable-noundef-analysis` and turning it off by default.
Test updates are made as a separate patch: D108453
Reviewed By: eugenis
Differential Revision: https://reviews.llvm.org/D105169
This patch translates HIP kernels to SPIR-V kernels when the HIP
compilation mode is targeting SPIR-S. This involves:
* Setting Cuda calling convention to CC_OpenCLKernel (which maps to
SPIR_KERNEL in LLVM IR later on).
* Coercing pointer arguments with default address space (AS) qualifier
to CrossWorkGroup AS (__global in OpenCL). HIPSPV's device code is
ultimately SPIR-V for OpenCL execution environment (as
starter/default) where Generic or Function (OpenCL's private) is not
supported as storage class for kernel pointer types. This leaves the
CrossWorkGroup to be the only reasonable choice for HIP buffers.
Reviewed By: yaxunl
Differential Revision: https://reviews.llvm.org/D109818
Add mapping for CUDA address spaces for HIP to SPIR-V
translation. This change allows HIP device code to be
emitted as valid SPIR-V by mapping unqualified pointers
to generic address space and by mapping __device__ and
__shared__ AS to their equivalent AS in SPIR-V
(CrossWorkgroup and Workgroup, respectively).
Cuda's __constant__ AS is handled specially. In HIP
unqualified pointers (aka "flat" pointers) can point to
__constant__ objects. Mapping this AS to ConstantMemory
would produce to illegal address space casts to
generic AS. Therefore, __constant__ AS is mapped to
CrossWorkgroup.
Patch by linjamaki (Henry Linjamäki)!
Differential Revision: https://reviews.llvm.org/D108621
Summary:
This change implements the expansion in two parts:
- Add a utility function emitAMDGPUPrintfCall() in LLVM.
- Invoke the above function from Clang CodeGen, when processing a HIP
program for the AMDGPU target.
The printf expansion has undefined behaviour if the format string is
not a compile-time constant. As a sufficient condition, the HIP
ToolChain now emits -Werror=format-nonliteral.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D71365