94 Commits

Author SHA1 Message Date
Matt Arsenault
daecc303bb
AMDGPU: Replace sqrt OpenCL libcalls with llvm.sqrt (#74197)
The library implementation is just a wrapper around a call to the
intrinsic, but loses metadata. Swap out the call site to the intrinsic
so that the lowering can see the !fpmath metadata and fast math flags.

Since d56e0d07cc5ee8e334fd1ad403eef0b1a771384f, clang started placing
!fpmath on OpenCL library sqrt calls. Also don't bother emitting
native_sqrt anymore; it's just another wrapper around llvm.sqrt.
2024-01-09 15:13:58 +07:00
Jakub Chlanda
a34db9bdef
[AMDGPU][NFC] Simplify needcopysign logic (#75176)
This was caught by Coverity, reported as: `dead_error_condition`.
Since the conditional revolves around `CF`, it is guaranteed to be null
in the else clause, hence making the second part of the statement
redundant.
2023-12-18 12:07:22 +01:00
Youngsuk Kim
67aec2f58b [llvm] Remove no-op ptr-to-ptr casts (NFC)
Remove calls to CreatePointerCast which are just doing no-op ptr-to-ptr
bitcasts.

Opaque ptr cleanup effort (NFC).
2023-12-15 11:04:48 -06:00
Matt Arsenault
ee795fd1cf AMDGPU: Handle rounding intrinsic exponents in isKnownIntegral
https://reviews.llvm.org/D158999
2023-09-01 08:22:16 -04:00
Matt Arsenault
def228553c AMDGPU: Use pown instead of pow if known integral
https://reviews.llvm.org/D158998
2023-09-01 08:22:16 -04:00
Matt Arsenault
deefda7074 AMDGPU: Use exp2 and log2 intrinsics directly for f16/f32
The f16 and f32 cases codegen correctly, but f64 doesn't. This prevents losing fast
math flags on the way to the underlying intrinsic.

https://reviews.llvm.org/D158997
2023-09-01 08:22:16 -04:00
Matt Arsenault
dac8f974b5 AMDGPU: Handle sitofp and uitofp exponents in fast pow expansion
https://reviews.llvm.org/D158996
2023-09-01 08:22:16 -04:00
Matt Arsenault
699685b718 AMDGPU: Enable assumptions in AMDGPULibCalls
https://reviews.llvm.org/D159006
2023-09-01 08:22:16 -04:00
Matt Arsenault
a45b787c91 AMDGPU: Turn pow libcalls into powr
powr is just pow with the assumption that x >= 0; otherwise the result is
nan. This fires at least 6 times in luxmark.

https://reviews.llvm.org/D158908
2023-09-01 08:22:16 -04:00
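As a rough illustration of the pow-to-powr rewrite in a45b787c91, the sketch below models the two functions in Python. `powr_model` is a hypothetical helper, not the device library implementation, and it ignores the zero and infinity edge cases that OpenCL defines for powr. It only shows that the two agree for x >= 0 and differ for a negative base, which is why the rewrite depends on knowing the sign of x.

```python
import math

def powr_model(x: float, y: float) -> float:
    """Hypothetical model of powr: pow restricted to x >= 0, nan otherwise."""
    if x < 0.0:
        return math.nan
    return math.pow(x, y)

# For a non-negative base the two agree.
agree = powr_model(2.0, 3.0) == math.pow(2.0, 3.0)

# For a negative base with an integral exponent, pow is well defined,
# but the powr model yields nan.
pow_negative = math.pow(-2.0, 3.0)
powr_negative = powr_model(-2.0, 3.0)
```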
Matt Arsenault
f5d8a9b1bb AMDGPU: Simplify handling of constant vectors in libcalls
Also fixes not handling the partially undef case.

https://reviews.llvm.org/D158905
2023-09-01 08:22:16 -04:00
Matt Arsenault
afb24cbb69 AMDGPU: Don't require all flags to expand fast powr
This was requiring all fast math flags, which is practically
useless. This wouldn't fire using all the standard OpenCL fast math
flags. This only needs afn, nnan, and ninf.

https://reviews.llvm.org/D158904
2023-09-01 08:22:16 -04:00
Matt Arsenault
bfe6bc05cd AMDGPU: Cleanup check for integral exponents in pow folds
Also improves undef handling

https://reviews.llvm.org/D159006
2023-08-30 10:37:24 -04:00
Matt Arsenault
80e5b46e45 AMDGPU: Fix assertion on half typed pow with constant exponents
https://reviews.llvm.org/D158993
2023-08-28 13:54:49 -04:00
Matt Arsenault
35c2a7542c AMDGPU: Fix asserting on fast f16 pown
https://reviews.llvm.org/D158903
2023-08-25 19:56:20 -04:00
Matt Arsenault
b24dab0ec6 AMDGPU: Trim dead includes
2023-08-25 19:55:53 -04:00
Matt Arsenault
66ee794064 AMDGPU: Fix verifier error on splatted opencl fmin/fmax and ldexp calls
Apparently the spec has overloads for fmin/fmax and ldexp with one of
the operands as scalar. We need to broadcast the scalars to the vector
type.

https://reviews.llvm.org/D158077
2023-08-16 09:42:26 -04:00
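A minimal sketch of the broadcast described in 66ee794064, written in Python with lists standing in for vector types. The names `splat`, `fmin_scalar`, and `fmin_vec` are illustrative only; the actual fix operates on LLVM IR.

```python
import math

def splat(scalar: float, width: int) -> list[float]:
    """Broadcast a scalar to a vector of the given width."""
    return [scalar] * width

def fmin_scalar(a: float, b: float) -> float:
    """fmin semantics: if one operand is nan, return the other."""
    if math.isnan(a):
        return b
    if math.isnan(b):
        return a
    return min(a, b)

def fmin_vec(x: list[float], y) -> list[float]:
    """Elementwise fmin; a scalar second operand is splatted first."""
    if not isinstance(y, list):
        y = splat(y, len(x))
    if len(x) != len(y):
        raise ValueError("vector widths must match")
    return [fmin_scalar(a, b) for a, b in zip(x, y)]

# Mixed vector/scalar call, as in the fmin(floatN, float) overload.
result = fmin_vec([1.0, 5.0, math.nan, -2.0], 3.0)
```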
Matt Arsenault
d251761660 AMDGPU: Replace log libcalls with log intrinsics
2023-08-15 10:48:46 -04:00
Matt Arsenault
d45022b094 AMDGPU: Remove special case constant folding of divide
We should probably just swap this out for the fdiv, but that's what
the implementation is anyway.
2023-08-14 18:36:01 -04:00
Matt Arsenault
483cc21866 AMDGPU: Remove special case folding of sqrt
2023-08-14 18:36:01 -04:00
Matt Arsenault
416f6af976 AMDGPU: Remove special case folding of fma/mad
These just get replaced with an intrinsic now. This was also
introducing host dependence on the result since it relied on the
compiler's choice to contract or not.
2023-08-14 18:36:01 -04:00
Matt Arsenault
0eabe65bfb AMDGPU: Replace ldexp libcalls with intrinsic
2023-08-14 18:36:01 -04:00
Matt Arsenault
f337a77c99 AMDGPU: Replace rounding libcalls with intrinsics
2023-08-14 18:36:01 -04:00
Matt Arsenault
c7876c55ac AMDGPU: Replace fabs and copysign libcalls with intrinsics
Preserves flags and metadata like the other cases.
2023-08-14 18:28:21 -04:00
Matt Arsenault
a70006c4c5 AMDGPU: Replace some libcalls with intrinsics
OpenCL loses fast math information by going through libcall wrappers
around intrinsics.

Do this to preserve call site flags which are lost when inlining. It's
not safe in general to propagate flags during inline, so avoid dealing
with this by just special casing some of the useful calls.
2023-08-14 18:20:47 -04:00
Matt Arsenault
f44beecb78 AMDGPU: Try to use private version of sincos if available
The comment was out of date; the device libs build does provide all
the pointer overloads. An extremely pedantic interpretation of the
spec would suggest only the flat version exists, but the overloads do
exist in the implementation.

https://reviews.llvm.org/D156720
2023-08-14 11:40:04 -04:00
Matt Arsenault
42c6e4209c AMDGPU: Handle multiple uses when matching sincos
Match how the generic implementation handles this. We will now leave
behind the other, dead user for later passes to deal with.

https://reviews.llvm.org/D156707
2023-08-14 11:28:41 -04:00
Matt Arsenault
6dbd458128 AMDGPU: Remove pointless libcall optimization of fma/mad
After the library is linked and trivially inlined, the generic fma and
fmuladd intrinsics already handle these cases, and with precise flag
handling. This was requiring all fast math flags when we really just
need nsz for the fma(a, b, 0) case.

https://reviews.llvm.org/D156677
2023-08-09 19:37:52 -04:00
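The nsz requirement mentioned in 6dbd458128 comes from signed zeros: a * b + 0.0 is not always a * b. A short Python illustration follows, using plain floating-point arithmetic rather than a fused operation (the product here is exact, so fusing would not change the result):

```python
import math

a, b = -1.0, 0.0

product = a * b            # -0.0
with_zero = a * b + 0.0    # (-0.0) + (+0.0) rounds to +0.0

# copysign exposes the sign bit, which == cannot distinguish for zeros.
product_sign = math.copysign(1.0, product)
with_zero_sign = math.copysign(1.0, with_zero)
```

The two results compare equal but carry different sign bits, so dropping the addition is only safe when the sign of zero is allowed to change.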
Matt Arsenault
6448d5ba58 AMDGPU: Remove pointless libcall recognition of native_{divide|recip}
This was trying to constant fold these calls, and also turn some of
them into a regular fmul/fdiv. There's no point in doing that; the
underlying library implementation should be using those in the first
place. Even when the library does use the rcp intrinsics, the backend
handles constant folding of those. This was also only performing the
folds under overly strict fast-everything-is-required conditions.

The one possible plus this gained over linking in the library is if
you were using all fast math flags, it would propagate them to the new
instructions. We could address this in the library by adding more fast
math flags to the native implementations.

The constant fold case also had no test coverage.

https://reviews.llvm.org/D156676
2023-08-09 18:48:46 -04:00
Matt Arsenault
b0f4b6587a AMDGPU: Delete probably wrong constant folding of expm1
It's not really correct to implement this as exp(x) - 1; it was maybe
OK for the restricted float-as-double case handled here. There's not a
strong reason to special case it with the host function, as the
implementation naturally constant folds anyway. InstSimplify can fully
handle everything in it, so just running the inliner alone with a
constant argument produces the fully constant folded result. This also
had no tests.

https://reviews.llvm.org/D156892
2023-08-04 21:01:44 -04:00
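To illustrate why exp(x) - 1 is a poor stand-in for expm1, as b0f4b6587a notes, the Python snippet below compares the two for a small argument. It uses the host `math` module purely for illustration and says nothing about the device library's implementation.

```python
import math

x = 1e-10

naive = math.exp(x) - 1.0     # cancellation: exp(x) is rounded near 1.0 first
accurate = math.expm1(x)      # computed without forming exp(x)

# Reference from the Taylor series x + x**2/2; higher terms are negligible here.
reference = x + x * x / 2.0

naive_rel_err = abs(naive - reference) / reference
accurate_rel_err = abs(accurate - reference) / reference
```

For arguments this small, the naive form loses roughly half of the significant digits, while expm1 stays close to the reference.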
Matt Arsenault
54bda79335 AMDGPU: Simplify and improve sincos matching
The first trivial example I tried failed to merge due to the user scan
logic. Remove the complicated scan of users handling with distance
thresholds, with a same block restriction. The actual expansion of
sincos is basically the same size as sin or cos individually. Copy the
technique the generic optimization uses, which is to just use the
input instruction as the insert point or just insert at the start of
the entry block.

https://reviews.llvm.org/D156706
2023-08-02 17:48:35 -04:00
Matt Arsenault
9a806551a0 AMDGPU: Delete old PM support for libcall passes
This has no reason to run in the codegen pipeline.
2023-08-01 18:22:02 -04:00
Matt Arsenault
5dfdd3494b AMDGPU: Don't try to fold wavefrontsize intrinsic in libcall simplify
It's not a libcall so doesn't really belong here to begin
with. Relying on checking the target name and explicit features isn't
particularly sound either. The library doesn't use the intrinsic
anymore, so it doesn't matter anyway.
2023-08-01 18:20:50 -04:00
Matt Arsenault
3b2f3238a4 AMDGPU: Don't try memory optimizations in libcall optimizer
This was trying to find a loaded value for some reason when looking
for sincos arguments. This is untested and shouldn't be necessary.

https://reviews.llvm.org/D156746
2023-08-01 18:10:22 -04:00
Matt Arsenault
db4d6ef9ef AMDGPU: Directly emit fabs intrinsic instead of new libcall
2023-07-31 19:19:56 -04:00
Matt Arsenault
f63cdfc4cf AMDGPU: Move check of compatible libcall
https://reviews.llvm.org/D156681
2023-07-31 16:47:07 -04:00
Matt Arsenault
c2c22c6c95 AMDGPU: Don't store current instruction in AMDGPULibCalls member
This was adding confusing global state which was shadowed most of the
time.

https://reviews.llvm.org/D156680
2023-07-31 11:44:21 -04:00
Matt Arsenault
8f38138090 AMDGPU: Refactor libcall simplify to help with future refined fast math flag usage
https://reviews.llvm.org/D156678
2023-07-31 11:23:12 -04:00
Matt Arsenault
94d55450d2 AMDGPU: Don't parse name of sin/cos twice in libcall simplify
2023-07-31 11:23:12 -04:00
Matt Arsenault
8a677a7ff0 AMDGPU: Partially respect nobuiltin in libcall simplifier
There are more contexts where it's not handled correctly but this is
the simplest one.

https://reviews.llvm.org/D156682
2023-07-31 10:56:46 -04:00
Matt Arsenault
2dc1a27449 AMDGPU: Some AMDGPULibCalls cleanups
dyn_cast instead of isa+cast, and initialize on declaration.
2023-07-31 10:53:09 -04:00
Matt Arsenault
ab6cd2d498 AMDGPU: Simplify early exit handling for libcall simplify
Early exit on intrinsics and don't duplicate indirect call
checks. Also let the IRBuilder constructor figure out the insert point
rather than doing it manually. Also avoid debug print about trying to
simplify calls in more unhandled scenarios.
2023-07-31 08:18:12 -04:00
Matt Arsenault
360a5d5612 AMDGPU: Remove some typed pointer handling
2023-07-31 08:05:12 -04:00
Nico Weber
90f7f24b20 try to fix build yet more after 16544cbe64b8
2022-09-28 15:40:52 -04:00
Kazu Hirata
ae998555ba [AMDGPU] Remove a redundant variable (NFC)
ArrayRef has operator[], so we don't need to access the contents via
data().
2022-07-23 12:29:05 -07:00
Kazu Hirata
6cbfffb3a3 [AMDGPU] Declare TableRef in terms of ArrayRef (NFC)
2022-07-16 10:56:20 -07:00
Simon Pilgrim
8de7297374 [AMDGPU] Pull out repeated getVecSize() calls. NFC.
This is guaranteed to be evaluated so we can avoid repeated calls.

Helps the static analyzer as it couldn't recognise that each getVecSize() would return the same value.
2022-02-10 16:31:36 +00:00
serge-sans-paille
e188aae406 Cleanup header dependencies in LLVMCore
Based on the output of include-what-you-use.

This is a big chunk of changes. It is very likely to break downstream code
unless they took a lot of care in avoiding hidden header dependencies, something
the LLVM codebase doesn't do that well :-/

I've tried to summarize the biggest changes below:

- llvm/include/llvm-c/Core.h: no longer includes llvm-c/ErrorHandling.h
- llvm/IR/DIBuilder.h no longer includes llvm/IR/DebugInfo.h
- llvm/IR/IRBuilder.h no longer includes llvm/IR/IntrinsicInst.h
- llvm/IR/LLVMRemarkStreamer.h no longer includes llvm/Support/ToolOutputFile.h
- llvm/IR/LegacyPassManager.h no longer includes llvm/Pass.h
- llvm/IR/Type.h no longer includes llvm/ADT/SmallPtrSet.h
- llvm/IR/PassManager.h no longer includes llvm/Pass.h nor llvm/Support/Debug.h

And the usual count of preprocessed lines:
$ clang++ -E  -Iinclude -I../llvm/include ../llvm/lib/IR/*.cpp -std=c++14 -fno-rtti -fno-exceptions | wc -l
before: 6400831
after:  6189948

200k fewer lines to process is not that bad ;-)

Discourse thread on the topic: https://llvm.discourse.group/t/include-what-you-use-include-cleanup

Differential Revision: https://reviews.llvm.org/D118652
2022-02-02 06:54:20 +01:00
Kazu Hirata
bc360fd83a [AMDGPU] Remove unused declarations fold_exp* and fold_log* (NFC)
2021-12-31 16:50:18 -08:00
Kazu Hirata
5c4b9ea4a7 [AMDGPU] Remove replaceWithNative (NFC)
The function was introduced without any use on Aug 11, 2017 in commit
7f37794ebd2c6c36224597800e4d1e5a99ad80e9.
2021-12-31 16:43:06 -08:00
Kazu Hirata
5a667c0e74 [llvm] Use nullptr instead of 0 (NFC)
Identified with modernize-use-nullptr.
2021-12-28 08:52:25 -08:00