11 Commits

Author SHA1 Message Date
Jun Wang
c4e517f59c [AMDGPU] Adding the amdgpu_num_work_groups function attribute (#79035)
A new function attribute, amdgpu_num_work_groups, is added. The
attribute takes three integers, which tell the compiler how many
workgroups will be launched in each of the three dimensions so that it
can optimize based on that information.

---------

Co-authored-by: Jun Wang <jun.wang7@amd.com>
2024-03-12 10:30:39 -07:00
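The commit does not say which optimizations the three integers enable. As one illustration (an assumption, not taken from the patch): once the workgroup count in each dimension is known, every workgroup ID lies in [0, n - 1] for that dimension, so any ID arithmetic has a known range. The sketch models the attribute's three integers as an (x, y, z) tuple; the function names are hypothetical.

```python
def total_work_groups(num_work_groups):
    """Total workgroups launched, given (x, y, z) counts as in amdgpu_num_work_groups."""
    nx, ny, nz = num_work_groups
    return nx * ny * nz


def linear_work_group_id(wg_id, num_work_groups):
    """Flatten an (x, y, z) workgroup ID with x varying fastest.

    Because each component is bounded by its count, the result is always
    in range(total_work_groups(num_work_groups)).
    """
    (x, y, z), (nx, ny, nz) = wg_id, num_work_groups
    if not (0 <= x < nx and 0 <= y < ny and 0 <= z < nz):
        raise ValueError("workgroup ID outside the declared launch grid")
    return x + nx * (y + ny * z)
```

For a (4, 2, 3) grid, the 24 possible IDs map one-to-one onto 0..23, so the largest flattened ID is known at compile time.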
Yaxun (Sam) Liu
ac72531043 [Driver] Add -f[no-]offload-uniform-block
By default, clang assumes that HIP kernels are launched with a uniform block size,
which is the case for kernels launched through the triple-chevron syntax or
hipLaunchKernelGGL. Clang adds the uniform-work-group-size function attribute
to HIP kernels so that the backend can optimize based on that assumption.

However, in some rare cases, HIP kernels can be launched
through hipExtModuleLaunchKernel, where the global work size is specified;
this may result in a non-uniform block size.

To support non-uniform block sizes for HIP kernels, the option
`-f[no-]offload-uniform-block` is added. This option is generic
across offloading languages. It defaults to on for CUDA/HIP and
off otherwise.

Make -cl-uniform-work-group-size an alias to -foffload-uniform-block.

Reviewed by: Siu Chi Chan, Matt Arsenault, Fangrui Song, Johannes Doerfert

Differential Revision: https://reviews.llvm.org/D155213

Fixes: SWDEV-406592
2023-07-27 16:36:02 -04:00
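The difference between the two launch styles can be sketched in one dimension. A triple-chevron launch specifies the grid in whole blocks, so every block is full. A launch that specifies a global work size may leave a remainder; this sketch assumes, as with OpenCL 2.0 non-uniform work-groups, that the remainder forms a smaller final block. The function names are hypothetical and are not part of HIP or clang.

```python
def block_sizes(global_size, local_size):
    """Split a 1-D global work size into blocks of at most local_size items.

    The last block is smaller when global_size is not a multiple of
    local_size, i.e. the launch is non-uniform.
    """
    if global_size < 0 or local_size < 1:
        raise ValueError("need global_size >= 0 and local_size >= 1")
    full, rem = divmod(global_size, local_size)
    return [local_size] * full + ([rem] if rem else [])


def is_uniform(global_size, local_size):
    """True when every block is full, which is what uniform-work-group-size lets the backend assume."""
    return global_size % local_size == 0
```

For example, a global size of 1000 with blocks of 256 yields three full blocks and a final block of 232, which is the kind of launch `-fno-offload-uniform-block` is meant to accommodate.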
Yaxun (Sam) Liu
053e61d54e Relands "[HIP] Change default --gpu-max-threads-per-block value to 1024"
This reverts commit e384e94fbe7c1d5c89fcdde33ffda04e9802c2ce.
2021-02-12 10:53:59 -05:00
Fangrui Song
fd739804e0 [test] Add {{.*}} to make ELF tests immune to dso_local/dso_preemptable/(none) differences
For a default-visibility definition with external linkage, dso_local is set for ELF
-fno-pic/-fpie, COFF, and Mach-O. Since the default clang -cc1 mode for ELF is similar
to -fpic ("PIC Level" is not set), this nuance causes unnecessary differences between binary formats.

To make emitted IR similar, ELF -cc1 -fpic will default to -fno-semantic-interposition,
which sets dso_local for default visibility external linkage definitions.

To make this flip smooth and to enable a future change (dso_local as the default for definitions),
this patch replaces (function) `define ` with `define{{.*}} ` and
(variable/constant/alias) `= ` with `={{.*}} `, or inserts `{{.*}} ` where appropriate.
2020-12-31 00:27:11 -08:00
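Why the `{{.*}}` insertions make the tests immune can be checked with a simplified model of FileCheck matching: text outside `{{...}}` is matched literally, text inside is a regular expression, and a pattern may match anywhere in a line. The helpers below are a hypothetical approximation using Python's re module; real FileCheck uses its own regex dialect and supports more syntax (for example `[[VAR]]` captures), none of which is modeled here.

```python
import re


def filecheck_pattern_to_regex(pattern):
    """Translate a simplified FileCheck pattern into a Python regex.

    Text outside {{...}} is escaped and matched literally; text inside
    {{...}} is used verbatim as a regular expression.
    """
    # With one capture group, re.split alternates literal, regex, literal, ...
    parts = re.split(r"\{\{(.*?)\}\}", pattern)
    return "".join(p if i % 2 else re.escape(p) for i, p in enumerate(parts))


def filecheck_matches(pattern, line):
    """True if the pattern occurs anywhere in the line."""
    return re.search(filecheck_pattern_to_regex(pattern), line) is not None
```

The old pattern `define void @foo(` fails on `define dso_local void @foo()`, while `define{{.*}} void @foo(` matches it and still matches `define void @foo()`, because `.*` can also match the empty string.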
Yaxun (Sam) Liu
e384e94fbe Revert "[HIP] Change default --gpu-max-threads-per-block value to 1024"
This reverts commit 187658b8a6112446d9e7797d495bc7542ac83905 due to
AMDGPU backend issues.
2020-10-15 17:25:55 -04:00
Yaxun (Sam) Liu
187658b8a6 Recommit "[HIP] Change default --gpu-max-threads-per-block value to 1024"
Recommit 04abbb3a78186aa92809866b43217c32cba90b71
2020-09-28 22:43:17 -04:00
Yaxun (Sam) Liu
62dbb7e54c Revert "[HIP] Change default --gpu-max-threads-per-block value to 1024"
Temporarily revert commit 04abbb3a78186aa92809866b43217c32cba90b71
due to regressions in some HIP apps caused by backend issues that
this change revealed.

Will re-commit it when the backend issues are fixed.
2020-09-02 16:12:28 -04:00
Yaxun (Sam) Liu
04abbb3a78 [HIP] Change default --gpu-max-threads-per-block value to 1024
Differential Revision: https://reviews.llvm.org/D76795
2020-06-03 11:09:22 -04:00
Yaxun (Sam) Liu
0ffb12ca67 [HIP] Mark kernels with uniform-work-group-size=true
Differential Revision: https://reviews.llvm.org/D76076
2020-03-13 06:56:56 -04:00
Yaxun (Sam) Liu
9f2d8b5c0c [HIP] Add option --gpu-max-threads-per-block=n
Add this option to change the default launch bounds.

Differential Revision: https://reviews.llvm.org/D71221
2020-01-07 11:18:00 -05:00
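The option changes the default launch bound, i.e. the limit used for kernels that do not declare their own. A minimal model, assuming that an explicit `__launch_bounds__` value on a kernel takes precedence over the command-line default and that a launch is expected to fail if its threads per block exceed the bound; the function names are hypothetical.

```python
def max_threads_per_block(launch_bounds, gpu_max_threads_per_block):
    """Effective per-block thread limit for a kernel.

    launch_bounds is the kernel's explicit bound, or None if it has none;
    gpu_max_threads_per_block is the value of --gpu-max-threads-per-block.
    """
    return launch_bounds if launch_bounds is not None else gpu_max_threads_per_block


def launch_fits(block_dim, limit):
    """True if an (x, y, z) block has no more threads than the limit."""
    x, y, z = block_dim
    return x * y * z <= limit
```

Under this model, raising the default only affects kernels without an explicit bound; a kernel that declares its own bound keeps it.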
Yaxun Liu
aa24601f98 [CUDA][HIP] Allow CUDA __global__ functions to have amdgpu kernel attributes
Some HIP applications, e.g. TensorFlow 1.3, use amdgpu kernel attributes; however,
these attributes are currently allowed only on OpenCL kernel functions.

This patch allows amdgpu kernel attributes to be applied to CUDA/HIP __global__
functions.

Differential Revision: https://reviews.llvm.org/D47958

llvm-svn: 334561
2018-06-12 23:58:59 +00:00