32 Commits

Author SHA1 Message Date
Fraser Cormack
9705500582
[libclc] Move nextafter to the CLC library (#124097)
There were two implementations of this - one that implemented nextafter
in software, and another that called a clang builtin. No in-tree targets
called the builtin, so all targets build the software version. The
builtin version has been removed, and the software version has been
renamed to be the "default".

This commit also optimizes nextafter, to avoid scalarization as much as
possible. Note however that the (CLC) relational builtins still
scalarize; those will be optimized in a separate commit.

Since nextafter is used by some convert_type builtins, the diff to IR
codegen is not limited to the builtin itself.
2025-01-23 12:24:16 +00:00
Fraser Cormack
d2d1b5897e
[libclc] Move clcmacro.h to CLC library. NFC (#114845) 2024-11-04 22:00:01 +00:00
Jan Vesely
0ba9339cde Remove redundant OVERRRIDES file
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-By: Aaron Watry <awatry@gmail.com>
llvm-svn: 346086
2018-11-04 00:54:46 +00:00
Jan Vesely
70a270da5f Add initial support for half precision builtins
v2: fix fmax implementation
    use consistent checks for __CLC_FP_SIZE
    add missing TODOs
    fix whitespace in definitions.h
v3: undef ZERO in modf.inc

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
reviewer: Jeroen Ketema <j.ketema@xs4all.nl>
Reviewed-by: Aaron Watry <awatry@gmail.com>
Tested-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 332677
2018-05-17 22:55:30 +00:00
Jan Vesely
b424954682 amdgpu/half_recip: Switch implementation to native_recip
Reviewer: Tom Stellard <tstellar@redhat.com>
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 325061
2018-02-13 22:09:46 +00:00
Jan Vesely
ed28c4458a amdgpu/half_log2: Switch implementation to native_log2
Reviewer: Tom Stellard <tstellar@redhat.com>
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 325060
2018-02-13 22:09:44 +00:00
Jan Vesely
86cbf56a4b amdgpu/half_log10: Switch implementation to native_log10
Reviewer: Tom Stellard <tstellar@redhat.com>
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 325059
2018-02-13 22:09:42 +00:00
Jan Vesely
65fd65efbf amdgpu/half_log: Switch implementation to native_log
Reviewer: Tom Stellard <tstellar@redhat.com>
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 325058
2018-02-13 22:09:41 +00:00
Jan Vesely
2d3b6dfdca amdgpu/half_exp2: Switch implementation to native_exp2
Reviewer: Tom Stellard <tstellar@redhat.com>
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 325057
2018-02-13 22:09:38 +00:00
Jan Vesely
021264c75a amdgpu/half_exp10: Switch implementation to native_exp10
Reviewer: Tom Stellard <tstellar@redhat.com>
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 325056
2018-02-13 22:09:37 +00:00
Jan Vesely
4879dd7471 amdgpu/half_exp: Switch implementation to native_exp
Reviewer: Tom Stellard <tstellar@redhat.com>
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 325055
2018-02-13 22:09:35 +00:00
Jan Vesely
bca92445ba amdgpu/half_sqrt: Switch implementation to native_sqrt
Reviewer: Tom Stellard <tstellar@redhat.com>
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 325054
2018-02-13 22:09:33 +00:00
Jan Vesely
aad28681c2 amdgpu/half_rsqrt: Switch implementation to native_rsqrt
Reviewer: Tom Stellard <tstellar@redhat.com>
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 325053
2018-02-13 22:09:31 +00:00
Jan Vesely
8dc6e98d47 amdgpu: Add workaround for unimplemented llvm.exp intrinsic
Reviewer: Jeroen Ketema
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 317935
2017-11-10 22:16:25 +00:00
Jan Vesely
47e093da9b math: Implement native_log10
Use llvm instrinsic by default
Provide amdgpu workaround

v2: drop old amd copyrights

Reviewer: Aaron Watry
Reviewed-by: Vedran Miletić <vedran@miletic.net>
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 316588
2017-10-25 16:49:22 +00:00
Jan Vesely
9fedbb9d8e amdgpu/math: Don't use llvm instrinsic for native_log
AMDGPU targets don't have insturction for it,
so it'll be expanded to C * log2 anyway.

v2: use native_log2 instead of the more precise sw implementation
v3: move to amdgpu
v4: drop old AMD copyright

Reviewer: Aaron Watry
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 316587
2017-10-25 16:49:17 +00:00
Jan Vesely
3d349ea98e Make image builtins r600/llvm-3.9 only
The implementation uses r600 sepcific intrinsics
LLVM-4 switched to _ro_t and _rw_t image types
Portions of the code can be moved back as more targets/llvm versions add image support

Reviewer: Aaron Watry
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 315341
2017-10-10 18:10:21 +00:00
Jan Vesely
1de1444d62 Do not include clc_nextafter header globally
Drop unused clc/math/clc_nextafter.h header

Reviewer: Jeroen Ketema
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 315190
2017-10-08 19:33:58 +00:00
Jan Vesely
ce29e8cde1 Restore support for llvm-3.9
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Acked-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 314543
2017-09-29 19:06:41 +00:00
Jan Vesely
1fa727d615 Rework atomic ops to use clang builtins rather than llvm asm
reviewer: Aaron Watry

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 314112
2017-09-25 16:07:34 +00:00
Jan Vesely
285d2fb85c Implement vload_half{,n} and vload(half)
v2: add vload(half) as well
    make helpers amdgpu specific (NVPTX uses different private AS numbering)
    use clang builtin on clang >= 6

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Tom Stellard <tstellar@redhat.com>
llvm-svn: 312839
2017-09-08 23:59:00 +00:00
Jan Vesely
661ac03a1b vstore: Cleanup and add vstore(half)
Add missing undefs
Make helpers amdgpu specific (NVPTX uses different numbering for private AS)
Use clang builtins on clang >= 6

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Tom Stellard <tstellar@redhat.com>
llvm-svn: 312838
2017-09-08 23:58:57 +00:00
Jan Vesely
e337b30c7d r600: Cleanup barrier implementation.
We don't have memory fences for r600 so just call group barrier directly
Make sure that barrier is called even with 0 flags

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 312492
2017-09-04 15:52:05 +00:00
Matt Arsenault
fbfd828d2a Replace nextafter implementation
This one passes conformance.

llvm-svn: 280961
2016-09-08 16:37:56 +00:00
Matt Arsenault
958fce3192 amdgcn: Fix return type of get_num_groups
llvm-svn: 279723
2016-08-25 07:31:40 +00:00
Matt Arsenault
26d9c41ff6 amdgcn: Fix return type for get_global_size
llvm-svn: 279644
2016-08-24 17:52:04 +00:00
Matt Arsenault
220268d177 amdgcn: Fix get_local_size IR return type
llvm-svn: 279350
2016-08-20 00:01:21 +00:00
Jan Vesely
74f02db922 AMDGPU: Use clang intrinsics for workitem builtins
v2: split into 2 patches
    use clang builtins for other intrinsics as well

v3: Fix warnings
    Switch r600 to use implictarg.ptr

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 276442
2016-07-22 17:24:20 +00:00
Matt Arsenault
633d749da7 amdgpu: Use right builtn for rsq
The r600 path has never actually worked sinced double is not implemented
there.

llvm-svn: 276009
2016-07-19 19:02:01 +00:00
Matt Arsenault
b456c6dd56 Replace llvm.AMDGPU.ldexp with llvm.amdgcn.ldexp
It didn't really work on r600 to begin with, which should
get its own intrinsic.

llvm-svn: 275813
2016-07-18 16:42:50 +00:00
Matt Arsenault
45e6eaaa05 amdgcn: Use new workitem intrinsics
llvm-svn: 261042
2016-02-17 00:27:27 +00:00
Matt Arsenault
a48e15c6cb Split sources for amdgcn and r600
Most files remain in a common amdgpu directory.

Also switches barriers to to use convergent,
and use llvm.amdgcn.s.barrier.

This now requires 3.9/trunk to build amdgcn.

llvm-svn: 260777
2016-02-13 01:01:59 +00:00