The previous name, 'amdgpu_code_object_version', was misleading since
this is really a property of the HSA OS. The new spelling also matches
the asm directive I added in bc82cfb.
The Clang declaration of the wave-64 builtin uses "UL" as the return
type, which is interpreted as a 32-bit unsigned integer on Windows. This
emits an incorrect LLVM declaration with i32 return type instead of i64.
The Clang declaration needs to be fixed to use "WU" instead.
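
The width difference behind this bug can be checked on the host. The snippet below is a minimal sketch, not part of the patch; the helper names are made up for illustration. It compares `unsigned long` (32 bits on LLP64 targets such as Windows, 64 bits on LP64 targets) with `std::uint64_t`, which is 64 bits wherever it is defined.

```cpp
#include <cassert>
#include <climits>
#include <cstdint>

// Width in bits of `unsigned long` on the current target.
// LLP64 (Windows) gives 32; LP64 (e.g. Linux x86-64) gives 64.
constexpr unsigned unsignedLongBits() {
  return sizeof(unsigned long) * CHAR_BIT;
}

// Width in bits of the fixed-width type; the same on every target.
constexpr unsigned fixed64Bits() {
  return sizeof(std::uint64_t) * CHAR_BIT;
}

static_assert(fixed64Bits() == 64, "uint64_t is always 64 bits wide");

// A wave-64 result carries one bit per lane, so it needs 64 bits.
// Hypothetical helper: true if `unsigned long` is wide enough here.
bool unsignedLongHoldsWave64Mask() {
  assert(unsignedLongBits() == 32 || unsignedLongBits() == 64);
  return unsignedLongBits() >= 64;
}
```

On a Windows build `unsignedLongHoldsWave64Mask()` is expected to return false, which is the situation described above.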
amdgcn_update_dpp intrinsic (#71139)""
This reverts commit d1fb9307951319eea3e869d78470341d603c8363 and fixes
the lit test clang/test/CodeGenHIP/dpp-const-fold.hip
---------
Authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com>
Operands of `__builtin_amdgcn_update_dpp` need to evaluate to constants
to match the intrinsic's requirements.
Fixes: SWDEV-426822, SWDEV-431138
---------
Authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com>
rename it to __AMDGCN_WAVEFRONT_SIZE__ for consistency.
__AMDGCN_WAVEFRONT_SIZE will be deprecated in the future.
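
A minimal sketch of how code might prefer the new spelling while still accepting the old one. The constant `kWavefrontSize`, the helper `lanesPerWave`, and the 0 fallback for host compilation are assumptions made for this illustration, not part of the patch.

```cpp
#include <cassert>

// Prefer the new macro; fall back to the spelling that is slated for
// deprecation. Neither macro is defined in a host-only compilation, so
// 0 is used here as a hypothetical "unknown" marker.
#if defined(__AMDGCN_WAVEFRONT_SIZE__)
constexpr int kWavefrontSize = __AMDGCN_WAVEFRONT_SIZE__;
#elif defined(__AMDGCN_WAVEFRONT_SIZE)
constexpr int kWavefrontSize = __AMDGCN_WAVEFRONT_SIZE;
#else
constexpr int kWavefrontSize = 0;
#endif

// Hypothetical accessor that sanity-checks the value before use.
int lanesPerWave() {
  assert(kWavefrontSize == 0 || kWavefrontSize == 32 || kWavefrontSize == 64);
  return kWavefrontSize;
}
```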
Reviewed by: Matt Arsenault, Johannes Doerfert
Differential Revision: https://reviews.llvm.org/D154207
This is an alternative to the existing hostcall implementation and uses a printf buffer similar to OpenCL's.
The data stored in the buffer (i.e., the data frame) for each printf call is as follows:
1. Control DWord - contains info regarding stream, format string constness and size of data frame
2. Hash of the format string (if constant) else the format string itself
3. Printf arguments (each aligned to 8 byte boundary)
The format string hash is generated using LLVM's MD5 Message-Digest Algorithm implementation, and only the low 64 bits are used.
The implementation still uses amdhsa metadata, and the hash is stored as part of the format string itself to ensure
minimal changes in the runtime.
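
The frame layout can be sketched on the host as follows. This is an illustration only: the bit positions inside the control DWord, the padding of the control DWord to 8 bytes, and all function names are assumptions, since the commit message does not specify them. The 64-bit hash is taken as an input rather than computed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Round a byte count up to the next multiple of 8.
constexpr std::size_t alignTo8(std::size_t n) {
  return (n + 7) & ~std::size_t{7};
}

// Hypothetical control-word encoding (NOT taken from the patch):
// bit 0 = stream (1 for stderr), bit 1 = format string is constant,
// bits 2..31 = size of the data frame in bytes.
constexpr std::uint32_t makeControlDWord(bool isStderr, bool constFmt,
                                         std::uint32_t frameBytes) {
  return (isStderr ? 1u : 0u) | (constFmt ? 2u : 0u) | (frameBytes << 2);
}

// Build one frame for a constant format string: control DWord (padded to
// 8 bytes, an assumption), 64-bit format hash, then 8-byte argument slots.
std::vector<std::uint8_t> buildFrame(std::uint64_t fmtHash,
                                     const std::vector<std::uint64_t>& args,
                                     bool isStderr = false) {
  const std::size_t header = alignTo8(sizeof(std::uint32_t));
  const std::size_t frameBytes = header + sizeof(fmtHash) + args.size() * 8;
  assert(frameBytes < (std::size_t{1} << 30) && "size must fit in 30 bits");

  std::vector<std::uint8_t> frame(frameBytes, 0);
  const std::uint32_t control =
      makeControlDWord(isStderr, true, static_cast<std::uint32_t>(frameBytes));
  std::memcpy(frame.data(), &control, sizeof(control));
  std::memcpy(frame.data() + header, &fmtHash, sizeof(fmtHash));
  for (std::size_t i = 0; i < args.size(); ++i)
    std::memcpy(frame.data() + header + 8 + 8 * i, &args[i], 8);
  return frame;
}
```

With three arguments this sketch produces a 40-byte frame: 8 bytes of padded control word, 8 bytes of hash, and three 8-byte argument slots.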
Differential Revision: https://reviews.llvm.org/D150427
This is an ongoing series of commits that are reformatting our
Python code.
Reformatting is done with `black`.
If you end up having problems merging this commit because you
have made changes to a Python file, the best way to handle that
is to run `git checkout --ours <yourfile>` and then reformat it
with `black`.
If you run into any problems, post to Discourse about it and
we will try to help.
RFC Thread below:
https://discourse.llvm.org/t/rfc-document-and-standardize-python-code-style
Reviewed By: MatzeB
Differential Revision: https://reviews.llvm.org/D150761
Predefine __AMDGCN_CUMODE__ as 1 or 0 when compilation assumes CU or WGP mode, respectively.
If WGP mode is not supported, ignore -mno-cumode and emit a warning.
This is needed for implementing device functions like __smid
(312dff7b79/include/hip/amd_detail/amd_device_functions.h (L957))
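
A minimal sketch of how a header might branch on the new macro. The names `kCuMode` and `assumesCuMode` and the -1 marker for "macro not predefined" are assumptions for this illustration; a host-only compilation takes the fallback branch.

```cpp
#include <cassert>

// 1 = CU mode, 0 = WGP mode, as predefined by the compiler for AMDGCN
// device compilation. -1 is a hypothetical marker for "not predefined".
#if defined(__AMDGCN_CUMODE__)
constexpr int kCuMode = __AMDGCN_CUMODE__;
#else
constexpr int kCuMode = -1;
#endif

// Hypothetical helper: true only when the compilation assumes CU mode.
bool assumesCuMode() {
  assert(kCuMode >= -1 && kCuMode <= 1);
  return kCuMode == 1;
}
```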
Reviewed by: Matt Arsenault, Artem Belevich, Brian Sumner
Differential Revision: https://reviews.llvm.org/D145343
This mostly reverts commit 270e96f435596449002fc89962595497481c8770.
Keep the attributor related changes around, but functionally restore
the old behavior as a workaround. Device enqueue goes back to not
working at -O0 with this version.
This is a dirty, dirty hack to work around bot failures at
-O0. Currently these fields are only used by OpenCL features and
evidently the HIP runtime isn't expecting to see them in HIP
programs. The code objects should be language agnostic, so just force
optimize these out until the runtime is fixed.
This was assuming a direct reference to the global variable. The
constant string is placed in addrspace 4, and has a constexpr
addrspacecast to the generic address space.
Add the ability to put __attribute__((maybe_undef)) on function arguments.
Clang codegen introduces a freeze instruction on the argument.
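
A minimal usage sketch, assuming a Clang recent enough to know the attribute. The `MAYBE_UNDEF` macro and the `selectOr` function are hypothetical names; the `__has_attribute` guard lets the snippet compile with compilers that lack the attribute.

```cpp
#include <cassert>

// Expand to the attribute only when the compiler knows it.
#if defined(__has_attribute)
#  if __has_attribute(maybe_undef)
#    define MAYBE_UNDEF __attribute__((maybe_undef))
#  endif
#endif
#ifndef MAYBE_UNDEF
#  define MAYBE_UNDEF
#endif

// `value` is marked as possibly undefined by design; the function only
// reads it when `useValue` is true.
int selectOr(int MAYBE_UNDEF value, bool useValue, int fallback) {
  return useValue ? value : fallback;
}

// Hypothetical self-check using well-defined arguments only.
void selfCheck() {
  assert(selectOr(5, true, 9) == 5);
  assert(selectOr(5, false, 9) == 9);
}
```

The calls here pass initialized values on purpose; the attribute affects how Clang emits the argument, not what standard C++ permits at the call site.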
Differential Revision: https://reviews.llvm.org/D130224
This adds -no-opaque-pointers to clang tests whose output will
change when opaque pointers are enabled by default. This is
intended to be part of the migration approach described in
https://discourse.llvm.org/t/enabling-opaque-pointers-by-default/61322/9.
The patch has been produced by replacing %clang_cc1 with
%clang_cc1 -no-opaque-pointers for tests that fail with opaque
pointers enabled. Note that this doesn't cover all tests;
about 40 tests that do not use %clang_cc1 remain and will need
a follow-up change.
Differential Revision: https://reviews.llvm.org/D123115
This issue is an oversight in D108621.
Literals in HIP are emitted as global constant variables with default
address space which maps to Generic address space for HIPSPV. In
SPIR-V such variables translate to OpVariable instructions with
Generic storage class which are not legal. Fix by mapping literals
to CrossWorkGroup address space.
The literals are not mapped to UniformConstant because the “flat”
pointers in HIP may reference them and “flat” pointers are modeled
as Generic pointers in SPIR-V. In SPIR-V/OpenCL UniformConstant
pointers may not be cast to Generic.
Patch by: Henry Linjamäki
Reviewed by: Yaxun Liu
Differential Revision: https://reviews.llvm.org/D118876
Turning on the `enable_noundef_analysis` flag allows better codegen by removing freeze instructions.
I modified Clang by renaming the `enable_noundef_analysis` flag to `disable-noundef-analysis` and turning it off by default.
Test updates are made as a separate patch: D108453
Reviewed By: eugenis
Differential Revision: https://reviews.llvm.org/D105169
This patch translates HIP kernels to SPIR-V kernels when the HIP
compilation mode is targeting SPIR-V. This involves:
* Setting CUDA calling convention to CC_OpenCLKernel (which maps to
SPIR_KERNEL in LLVM IR later on).
* Coercing pointer arguments with default address space (AS) qualifier
to CrossWorkGroup AS (__global in OpenCL). HIPSPV's device code is
ultimately SPIR-V for the OpenCL execution environment (as
starter/default), where Generic and Function (OpenCL's private) are not
supported as storage classes for kernel pointer types. This leaves
CrossWorkGroup as the only reasonable choice for HIP buffers.
Reviewed By: yaxunl
Differential Revision: https://reviews.llvm.org/D109818
Add mapping for CUDA address spaces for HIP to SPIR-V
translation. This change allows HIP device code to be
emitted as valid SPIR-V by mapping unqualified pointers
to generic address space and by mapping __device__ and
__shared__ AS to their equivalent AS in SPIR-V
(CrossWorkgroup and Workgroup, respectively).
CUDA's __constant__ AS is handled specially. In HIP,
unqualified pointers (aka "flat" pointers) can point to
__constant__ objects. Mapping this AS to ConstantMemory
would produce illegal address space casts to the
generic AS. Therefore, __constant__ AS is mapped to
CrossWorkgroup.
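
The mapping described above can be summarized as a small lookup. This is a sketch for illustration only: the enum and function names are hypothetical and do not correspond to Clang's internal address space tables.

```cpp
#include <cassert>

// Hypothetical stand-ins for the HIP/CUDA-side address spaces.
enum class HipAS { Unqualified, Device, Shared, Constant };

// Hypothetical stand-ins for the SPIR-V-side address spaces named above.
enum class SpirvAS { Generic, CrossWorkgroup, Workgroup };

// Mapping as described in the commit message. __constant__ goes to
// CrossWorkgroup so that flat (Generic) pointers may legally refer to it.
SpirvAS mapHipToSpirv(HipAS as) {
  switch (as) {
  case HipAS::Unqualified: return SpirvAS::Generic;
  case HipAS::Device:      return SpirvAS::CrossWorkgroup;
  case HipAS::Shared:      return SpirvAS::Workgroup;
  case HipAS::Constant:    return SpirvAS::CrossWorkgroup;
  }
  assert(false && "unknown HIP address space");
  return SpirvAS::Generic;
}
```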
Patch by linjamaki (Henry Linjamäki)!
Differential Revision: https://reviews.llvm.org/D108621
Summary:
This change implements the expansion in two parts:
- Add a utility function emitAMDGPUPrintfCall() in LLVM.
- Invoke the above function from Clang CodeGen, when processing a HIP
program for the AMDGPU target.
The printf expansion has undefined behaviour if the format string is
not a compile-time constant. As a sufficient condition, the HIP
ToolChain now emits -Werror=format-nonliteral.
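
A host-side sketch of what the diagnostic accepts and rejects. The function names are made up for this illustration, and plain `std::printf` stands in for the device-side printf discussed above.

```cpp
#include <cassert>
#include <cstdio>

// Accepted: the format string is a literal known at compile time.
int printCount(int n) {
  return std::printf("count = %d\n", n);
}

// Rejected when -Werror=format-nonliteral is in effect, because the
// format string is only known at run time:
//
//   int printWith(const char* fmt, int n) { return std::printf(fmt, n); }

// Common rewrite: pass the runtime text as an argument to a literal format.
int printText(const char* text) {
  assert(text != nullptr);
  return std::printf("%s", text);
}
```

Both accepted functions return the number of characters written, as `std::printf` does.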
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D71365