A new function attribute named amdgpu_num_work_groups is added. This
attribute, which consists of three integers, allows programmers to let
the compiler know the number of workgroups to be launched in each of the
three dimensions and do optimizations based on that information.
---------
Co-authored-by: Jun Wang <jun.wang7@amd.com>
By default, clang assumes HIP kernels are launched with uniform block size,
which is the case for kernels launched through triple chevron or
hipLaunchKernelGGL. Clang adds uniform-work-group-size function attribute
to HIP kernels to allow the backend to do optimizations on that.
However, in some rare cases, HIP kernels can be launched
through hipExtModuleLaunchKernel where global work size is specified,
which may result in non-uniform block size.
To be able to support non-uniform block size for HIP kernels,
an option `-f[no-]offload-uniform-block is added. This option
is generic for offloading languages. Its default value is on for
CUDA/HIP and off otherwise.
Make -cl-uniform-work-group-size an alias to -foffload-uniform-block.
Reviewed by: Siu Chi Chan, Matt Arsenault, Fangrui Song, Johannes Doerfert
Differential Revision: https://reviews.llvm.org/D155213
Fixes: SWDEV-406592
For a default visibility external linkage definition, dso_local is set for ELF
-fno-pic/-fpie and COFF and Mach-O. Since default clang -cc1 for ELF is similar
to -fpic ("PIC Level" is not set), this nuance causes unneeded binary format differences.
To make emitted IR similar, ELF -cc1 -fpic will default to -fno-semantic-interposition,
which sets dso_local for default visibility external linkage definitions.
To make this flip smooth and enable future (dso_local as definition default),
this patch replaces (function) `define ` with `define{{.*}} `,
(variable/constant/alias) `= ` with `={{.*}} `, or inserts appropriate `{{.*}} `.
Temporarily revert commit 04abbb3a78186aa92809866b43217c32cba90b71
due to regressions in some HIP apps due backend issues revealed by
this change.
Will re-commit it when backend issues are fixed.
There are HIP applications e.g. Tensorflow 1.3 using amdgpu kernel attributes, however
currently they are only allowed on OpenCL kernel functions.
This patch will allow amdgpu kernel attributes to be applied to CUDA/HIP __global__
functions.
Differential Revision: https://reviews.llvm.org/D47958
llvm-svn: 334561