There were two implementations of this - one that implemented nextafter
in software, and another that called a clang builtin. No in-tree targets
called the builtin, so all targets build the software version. The
builtin version has been removed, and the software version has been
renamed to be the "default".
This commit also optimizes nextafter, to avoid scalarization as much as
possible. Note however that the (CLC) relational builtins still
scalarize; those will be optimized in a separate commit.
Since nextafter is used by some convert_type builtins, the diff to IR
codegen is not limited to the builtin itself.
Use llvm instrinsic by default
Provide amdgpu workaround
v2: drop old amd copyrights
Reviewer: Aaron Watry
Reviewed-by: Vedran Miletić <vedran@miletic.net>
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 316588
AMDGPU targets don't have insturction for it,
so it'll be expanded to C * log2 anyway.
v2: use native_log2 instead of the more precise sw implementation
v3: move to amdgpu
v4: drop old AMD copyright
Reviewer: Aaron Watry
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 316587
The implementation uses r600 sepcific intrinsics
LLVM-4 switched to _ro_t and _rw_t image types
Portions of the code can be moved back as more targets/llvm versions add image support
Reviewer: Aaron Watry
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 315341
v2: add vload(half) as well
make helpers amdgpu specific (NVPTX uses different private AS numbering)
use clang builtin on clang >= 6
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Tom Stellard <tstellar@redhat.com>
llvm-svn: 312839
Add missing undefs
Make helpers amdgpu specific (NVPTX uses different numbering for private AS)
Use clang builtins on clang >= 6
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Tom Stellard <tstellar@redhat.com>
llvm-svn: 312838
We don't have memory fences for r600 so just call group barrier directly
Make sure that barrier is called even with 0 flags
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 312492
v2: split into 2 patches
use clang builtins for other intrinsics as well
v3: Fix warnings
Switch r600 to use implictarg.ptr
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 276442
Most files remain in a common amdgpu directory.
Also switches barriers to to use convergent,
and use llvm.amdgcn.s.barrier.
This now requires 3.9/trunk to build amdgcn.
llvm-svn: 260777