130 Commits

Author SHA1 Message Date
Simon Pilgrim
8a7413b141
Fix MSVC "not all control paths return a value" warning. NFC. (#182262) 2026-02-19 14:17:24 +00:00
Matt Arsenault
fdc4274e2f
AMDGPU: Perform libcall recognition to replace fast OpenCL pow (#182135)
If a float-typed call site is marked with afn, replace the 4
flavors of pow with a faster variant.

This transforms pow, powr, pown, and rootn to __pow_fast,
__powr_fast, __pown_fast, and __rootn_fast if available. Also
attempts to handle all of the same basic folds on the new fast
variants that were already performed with the base forms. This
maintains optimizations with OpenCL when the device libs unsafe
math control library is deleted. This maintains the status quo
of how libcalls work, and only handles 4 new entry points. This
only helps with the elimination of the control library, and not
general libcall emission problems.

This makes no practical difference for HIP, which is the status
quo for libcall optimizations. AMDGPULibCalls recognizes the OpenCL
mangled names. e.g., OpenCL float "pow" is really _Z3powff but the
HIP provided function "powf" is really named _ZL4powfff, and std::pow
with float is _ZL3powff. The pass still runs for HIP, so by accident
if you used the OpenCL mangled function names, this would trigger.
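
The names quoted above can be reproduced with a very small model of Itanium-style name mangling. This is only a sketch: `mangle_float_fn` is a hypothetical helper, not an LLVM API, and it assumes unqualified function names whose parameters are all `float`, plus the `L` marker that GCC and Clang emit for internal-linkage functions.

```python
def mangle_float_fn(name: str, num_float_params: int, internal: bool = False) -> str:
    """Mangle an unqualified function that takes only float parameters.

    Covers just the subset needed for the names in this message:
    "_Z" prefix, optional "L" for internal linkage, a length-prefixed
    source name, then one "f" per float parameter.
    """
    prefix = "_ZL" if internal else "_Z"
    return f"{prefix}{len(name)}{name}{'f' * num_float_params}"


# OpenCL pow(float, float): external linkage.
opencl_pow = mangle_float_fn("pow", 2)
# HIP-provided powf(float, float): internal linkage, per the message above.
hip_powf = mangle_float_fn("powf", 2, internal=True)
# std::pow with float, using the mangled name given in the message above.
hip_std_pow = mangle_float_fn("pow", 2, internal=True)
```

Because the three strings differ, a recognizer keyed on the OpenCL spelling would not match the HIP names.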

Since the functions cannot yet be relied on from the library,
introduce a temporary module flag check. I'm not planning on emitting
it anywhere and it's a poor substitute for versioning the target.
2026-02-19 11:49:32 +01:00
Matt Arsenault
3b5ed86983
AMDGPU: Libcall expand fast pow/powr/pown/rootn for float case (#180553)
This is to eliminate the special case global unsafe math options
in these functions from the library. The core operation only
uses about 4 instructions, and then there's an additional prolog
and/or epilog to fixup special cases.
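
A rough numerical sketch of that shape, assuming the core sequence is `exp2(y * log2(x))` and that the epilog for integer exponents restores the sign of a negative base. The function names are hypothetical, this is not the pass's actual output, and the zero, infinity and NaN fixups are deliberately left out.

```python
import math


def powr_fast_core(x: float, y: float) -> float:
    """Core identity x**y == 2**(y * log2(x)); only valid for finite x > 0."""
    return 2.0 ** (y * math.log2(x))


def pown_fast_sketch(x: float, n: int) -> float:
    """Integer-exponent variant for finite, nonzero x.

    Runs the core on |x|, then the epilog copies the sign of x onto the
    result when n is odd (an odd power of a negative base is negative).
    """
    magnitude = powr_fast_core(abs(x), float(n))
    if n % 2 != 0:
        return math.copysign(magnitude, x)
    return magnitude
```

For example, `pown_fast_sketch(-2.0, 3)` takes the odd-exponent path and yields a negative result, while `pown_fast_sketch(-2.0, 2)` does not touch the sign.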

I have an alternative patch which implements this by using separate
entrypoints in the library, and having the pass replace the calls
instead of this full handling. However, given the unfortunate state
of library development, it requires a full year to make cross project
changes. This is the most expedient path to deleting the control
library; in the future we can do libcall emission when the compiler
has the real ability to properly emit new calls.

This is mostly a direct port of these functions:

https://github.com/ROCm/llvm-project/blob/amd-staging/amd/device-libs/ocml/src/powF_base.h

I used copilot to do the heavy lifting on the drudgery of writing out
all the IRBuilder calls. It mostly worked, though it made mistakes in
porting != (used the ONE instead of UNE), plus some API usage errors.
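
The ONE/UNE difference only shows up when a NaN is involved. C's `!=` is true if either operand is NaN, which corresponds to `fcmp une`, whereas `fcmp one` is false whenever an operand is NaN. A small model, with hypothetical function names, relying on Python's float `!=` following the same IEEE 754 behaviour:

```python
import math


def fcmp_une(a: float, b: float) -> bool:
    """Unordered-or-not-equal: true if a != b or either operand is NaN."""
    return a != b  # float != already evaluates to True when a NaN is involved


def fcmp_one(a: float, b: float) -> bool:
    """Ordered-and-not-equal: false as soon as either operand is NaN."""
    return not (math.isnan(a) or math.isnan(b)) and a != b


nan = float("nan")
une_with_nan = fcmp_une(nan, 1.0)  # matches C's `nan != 1.0f`
one_with_nan = fcmp_one(nan, 1.0)  # differs: the wrong predicate for `!=`
```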

The output isn't exactly the same. The ordering of some
instructions is different and some conditions and selects are inverted.
This expansion also preserves the flags, so the finite only case optimizes
better than with the library code (this is largely because fast math flags aren't
propagated yet, except for the few places SimplifyDemandedFPClass is called).

Alive2 verifies that the library functions with the special fast math path
have the same behavior as the result of this expansion. afn
only case: https://alive2.llvm.org/ce/z/uoPBC2

I did not bother trying to specially optimize the finite only cases
here. They get cleaned up by instcombine anyway, so doing so would only
be a hypothetical compile time improvement for a lot of additional complexity.
The exception would be the overlap with existing code in the pass. The
afn+ninf+nnan case for powr already does the rewrite to the 4 instruction core
sequence, which takes precedence over this. In the future this should be merged,
but it's more annoying now since the old path also handles it for the f64 case
via libcall emission since the intrinsic won't work.
2026-02-18 20:52:49 +01:00
Matt Arsenault
82799a448e
Reapply "AMDGPU: Use real copysign in fast pow (#97152)" (#178036)
This reverts commit bff619f91015a633df659d7f60f842d5c49351df.

This was reverted due to regressions caused by poor copysign
optimization, which have been fixed.
2026-02-05 20:40:38 +00:00
Matt Arsenault
8461579298
AMDGPU: Add nofpclass when expanding pow (#177933)
The codegen regression is tracked in #177913
2026-02-05 07:40:21 +01:00
Matt Arsenault
a25c7d7ade
ValueTracking: Extract isKnownIntegral out of AMDGPU (#177912)
Also do some basic conversions to use SimplifyQuery and add tests to
show assume works in a new context.
2026-01-26 19:55:14 +01:00
Jameson Nash
ba2bd3fbba
Use AllocaInst::getAllocationSize instead of manual size calculations (#176486)
Replace patterns that manually compute allocation sizes by multiplying
getTypeAllocSize(getAllocatedType()) by the array size with calls to the
getAllocationSize(DL) API, which handles this correctly and concisely,
returning nullopt for VLAs.
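
As a sketch of the difference, here is a toy model of an alloca with an element size and an optional constant array count. `AllocaModel`, `naive_size` and `allocation_size` are hypothetical stand-ins rather than the LLVM classes, `None` plays the role of `nullopt`, and scalable sizes are ignored.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AllocaModel:
    elem_alloc_size: int        # bytes per element of the allocated type
    array_count: Optional[int]  # constant element count; None models a VLA


def naive_size(a: AllocaModel) -> int:
    """Pattern being fixed: uses the type size and forgets the array count."""
    return a.elem_alloc_size


def allocation_size(a: AllocaModel) -> Optional[int]:
    """Model of the helper: total bytes, or None when the count is not constant."""
    if a.array_count is None:
        return None
    return a.elem_alloc_size * a.array_count


fixed_array = AllocaModel(elem_alloc_size=4, array_count=8)
vla = AllocaModel(elem_alloc_size=4, array_count=None)
```

Callers of the optional-returning form are forced to handle the VLA case explicitly instead of silently using a size that is too small.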

This fixes several places that were not accounting for array allocations
when computing sizes, simplifies code that was doing this manually, and
adds some explicit isFixed checks where implicit conversion was being used.

This PR is because now that we have opaque pointers, I hate that some
AllocaInst still has type information being consumed by some passes
instead of just using the size, since passes rarely handle that type
information well or correctly. I hope this will grow into a sequence of
commits to slowly eliminate uses of getAllocatedType from AllocaInst.
And similarly later to remove type information from GlobalValue too (it
can be replaced with just dereferenceable bytes, similar to arguments).

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-19 09:55:52 -05:00
Mehdi Amini
efd9dc83f2
Revert "[APFloat] Add exp function for APFloat::IEEESsingle using expf implementation from LLVM libc. (#143959)" (#172325)
This reverts commit 4190d576823c18f45ee0632baee7d798448178ac.

See https://lab.llvm.org/buildbot/#/builders/181/builds/33524

```
                 from /home/buildbot/buildbot-root/cross-project-tests-sie-ubuntu/llvm-project/llvm/lib/Support/APFloat.cpp:32:
/home/buildbot/buildbot-root/cross-project-tests-sie-ubuntu/llvm-project/llvm/../libc/src/__support/math/asin_utils.h:548:1:   in ‘constexpr’ expansion of ‘__llvm_libc::fputil::DyadicFloat<128>(__llvm_libc::Sign::POS, -127, __llvm_libc::BigInt<128, false, long unsigned int>(__llvm_libc::operator""_u128(((const char*)"0x80000000\'00000000\'00000000\'00000000"))))’
/home/buildbot/buildbot-root/cross-project-tests-sie-ubuntu/llvm-project/llvm/../libc/src/__support/FPUtil/dyadic_float.h:109:5:   in ‘constexpr’ expansion of ‘((__llvm_libc::fputil::DyadicFloat<128>*)this)->__llvm_libc::fputil::DyadicFloat<128>::normalize()’
/home/buildbot/buildbot-root/cross-project-tests-sie-ubuntu/llvm-project/llvm/../libc/src/__support/FPUtil/dyadic_float.h:118:16:   in ‘constexpr’ expansion of ‘((__llvm_libc::fputil::DyadicFloat<128>*)this)->__llvm_libc::fputil::DyadicFloat<128>::mantissa.__llvm_libc::BigInt<128, false, long unsigned int>::operator<<=(((size_t)shift_length))’
/home/buildbot/buildbot-root/cross-project-tests-sie-ubuntu/llvm-project/llvm/../libc/src/__support/big_int.h:829:52:   in ‘constexpr’ expansion of ‘__llvm_libc::multiword::shift<__llvm_libc::multiword::LEFT, false, long unsigned int, 2>(((__llvm_libc::BigInt<128, false, long unsigned int>*)this)->__llvm_libc::BigInt<128, false, long unsigned int>::val, s)’
/home/buildbot/buildbot-root/cross-project-tests-sie-ubuntu/llvm-project/llvm/../libc/src/__support/big_int.h:264:35: error: ‘constexpr __llvm_libc::cpp::enable_if_t<((((sizeof (To) == sizeof (From)) && __llvm_libc::cpp::is_trivially_constructible<To>::value) && __llvm_libc::cpp::is_trivially_copyable<T>::value) && __llvm_libc::cpp::is_trivially_copyable<From>::value), To> __llvm_libc::cpp::bit_cast(const From&) [with To = __int128 unsigned; From = __llvm_libc::cpp::array<long unsigned int, 2>; __llvm_libc::cpp::enable_if_t<((((sizeof (To) == sizeof (From)) && __llvm_libc::cpp::is_trivially_constructible<To>::value) && __llvm_libc::cpp::is_trivially_copyable<T>::value) && __llvm_libc::cpp::is_trivially_copyable<From>::value), To> = __int128 unsigned]’ called in a constant expression
  264 |     auto tmp = cpp::bit_cast<type>(array);
      |                ~~~~~~~~~~~~~~~~~~~^~~~~~~
                 from /home/buildbot/buildbot-root/cross-project-tests-sie-ubuntu/llvm-project/llvm/lib/Support/APFloat.cpp:32:
/home/buildbot/buildbot-root/cross-project-tests-sie-ubuntu/llvm-project/llvm/../libc/src/__support/CPP/bit.h:48:1: note: ‘constexpr __llvm_libc::cpp::enable_if_t<((((sizeof (To) == sizeof (From)) && __llvm_libc::cpp::is_trivially_constructible<To>::value) && __llvm_libc::cpp::is_trivially_copyable<T>::value) && __llvm_libc::cpp::is_trivially_copyable<From>::value), To> __llvm_libc::cpp::bit_cast(const From&) [with To = __int128 unsigned; From = __llvm_libc::cpp::array<long unsigned int, 2>; __llvm_libc::cpp::enable_if_t<((((sizeof (To) == sizeof (From)) && __llvm_libc::cpp::is_trivially_constructible<To>::value) && __llvm_libc::cpp::is_trivially_copyable<T>::value) && __llvm_libc::cpp::is_trivially_copyable<From>::value), To> = __int128 unsigned]’ is not usable as a ‘constexpr’ function because:
   48 | bit_cast(const From &from) {
      | ^~~~~~~~
/home/buildbot/buildbot-root/cross-project-tests-sie-ubuntu/llvm-project/llvm/../libc/src/__support/CPP/bit.h:56:28: error: call to non-‘constexpr’ function ‘void __llvm_libc::cpp::inline_copy(const char*, char*) [with unsigned int N = 16]’
   56 |   inline_copy<sizeof(From)>(src, dst);
```
2025-12-15 16:57:37 +01:00
lntue
4190d57682
[APFloat] Add exp function for APFloat::IEEESsingle using expf implementation from LLVM libc. (#143959)
Discourse RFC:
https://discourse.llvm.org/t/rfc-make-clang-builtin-math-functions-constexpr-with-llvm-libc-to-support-c-23-constexpr-math-functions/86450

- The implementation in LLVM libc is header-only.
- expf implementation in LLVM libc is correctly rounded for all rounding
modes.
- LLVM libc implementation will round to the floating point
environment's rounding mode.
- No CMake build dependency between LLVM and LLVM libc; it only requires
that the LLVM libc source be present in the llvm-project/libc folder.
2025-12-15 10:21:45 -05:00
Jay Foad
72c69aefba
[AMDGPU] Make use of getFunction and getMF. NFC. (#167872) 2025-11-14 11:00:57 +00:00
paperchalice
8bacfb2538
[AMDGPU] Remove UnsafeFPMath uses (#151079)
Remove `UnsafeFPMath` uses in the AMDGPU part. It blocks some bugfixes related to
clang, and the ultimate goal is to remove the `resetTargetOptions` method in
`TargetMachine`; see the FIXME in `resetTargetOptions`.
See also
https://discourse.llvm.org/t/rfc-honor-pragmas-with-ffp-contract-fast

https://discourse.llvm.org/t/allowfpopfusion-vs-sdnodeflags-hasallowcontract

---------

Co-authored-by: Matt Arsenault <arsenm2@gmail.com>
2025-07-31 17:36:57 +08:00
Ramkumar Ramachandra
b40e4ceaa6
[ValueTracking] Make Depth last default arg (NFC) (#142384)
Having a finite Depth (or recursion limit) for computeKnownBits is very
limiting, but is currently a load-bearing necessity, as all KnownBits
are recomputed on each call and there is no caching. As a prerequisite
for an effort to remove the recursion limit altogether, either using a
clever caching technique, or writing an easily-invalidable KnownBits
analysis, make the Depth argument in APIs in ValueTracking uniformly the
last argument with a default value. This would aid in removing the
argument when the time comes, as many callers that currently pass 0
explicitly are now updated to omit the argument altogether.
2025-06-03 17:12:24 +01:00
Kazu Hirata
1e8e662174
[AMDGPU] Remove unused includes (NFC) (#141376)
These are identified by misc-include-cleaner.  I've filtered out those
that break builds.  Also, I'm staying away from llvm-config.h,
config.h, and Compiler.h, which likely cause platform- or
compiler-specific build failures.
2025-05-24 14:48:46 -07:00
Matt Arsenault
04bb8ecb05
AMDGPU: Disable sincos fold for constant inputs (#134579) 2025-04-07 15:20:23 +07:00
Matt Arsenault
b446c208a5
AMDGPU: Verify function type matches when matching libcalls (#119043)
Previously this would recognize a call to a mangled ldexp(float, float)
as a candidate to replace with the intrinsic. We need to verify the second
parameter is in fact an integer.
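
The bug class can be illustrated with a toy recognizer. The table and function names are hypothetical and only model the idea of matching by name alone versus matching by name and parameter types.

```python
# Hypothetical table: the parameter types a recognized libcall must have.
EXPECTED_PARAMS = {"ldexp": ("float", "int")}


def matches_by_name_only(name: str, param_types: tuple) -> bool:
    """Old behaviour in this model: any function with the right name is a candidate."""
    return name in EXPECTED_PARAMS


def matches_name_and_type(name: str, param_types: tuple) -> bool:
    """Fixed behaviour in this model: the parameter types must match as well."""
    return EXPECTED_PARAMS.get(name) == tuple(param_types)


bogus_call = ("ldexp", ("float", "float"))  # second parameter is not an integer
real_call = ("ldexp", ("float", "int"))
```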

Fixes: SWDEV-501389
2024-12-16 15:01:48 +09:00
Kazu Hirata
be187369a0
[AMDGPU] Remove unused includes (NFC) (#116154)
Identified with misc-include-cleaner.
2024-11-13 21:10:03 -08:00
goldsteinn
c85611e858
[SimplifyLibCall][Attribute] Fix bug where we may keep range attr with incompatible type (#112649)
In a variety of places we change the bitwidth of a parameter but don't
update the attributes.

The issue in this case is from the `range` attribute when inlining
`__memset_chk`. `optimizeMemSetChk` will replace an `i32` with an
`i8`, and if the `i32` had a `range` attr associated it will cause an
error.

Fixes #112633
2024-10-17 10:32:55 -05:00
Rahul Joshi
fa789dffb1
[NFC] Rename Intrinsic::getDeclaration to getOrInsertDeclaration (#111752)
Rename the function to reflect its correct behavior and to be consistent
with `Module::getOrInsertFunction`. This is also in preparation of
adding a new `Intrinsic::getDeclaration` that will have behavior similar
to `Module::getFunction` (i.e., just lookup, no creation).
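
The lookup-versus-create distinction behind the rename can be sketched with a dictionary-backed module. `ModuleModel` and its methods are hypothetical stand-ins that only mirror the naming pattern, not the LLVM API.

```python
class ModuleModel:
    """Toy module: maps function names to declaration strings."""

    def __init__(self) -> None:
        self.functions = {}

    def get_function(self, name: str):
        """Pure lookup: returns None if the function is not declared."""
        return self.functions.get(name)

    def get_or_insert_function(self, name: str) -> str:
        """Lookup that creates the declaration as a side effect when missing."""
        return self.functions.setdefault(name, f"declare @{name}")


module = ModuleModel()
before = module.get_function("llvm.sqrt.f32")            # nothing declared yet
created = module.get_or_insert_function("llvm.sqrt.f32")  # mutates the module
after = module.get_function("llvm.sqrt.f32")
```

A name containing "get or insert" makes the mutation visible at the call site, which is the point of the rename.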
2024-10-11 05:26:03 -07:00
Jay Foad
c7309dadbf
[AMDGPU] Use range-based for loops. NFC. (#99047) 2024-07-17 10:18:03 +01:00
Jay Foad
74b87b02d2 [AMDGPU] Fix and add namespace closing comments. NFC. 2024-07-16 16:56:31 +01:00
Jay Foad
aeafdc21d2 [AMDGPU] Use using instead of typedef. NFC. 2024-07-16 16:44:12 +01:00
Jay Foad
63a1242ae3 [AMDGPU] clang-tidy: define trivial constructors with = default. NFC. 2024-07-16 15:41:54 +01:00
Matt Arsenault
bff619f910 Revert "AMDGPU: Use real copysign in fast pow (#97152)"
This reverts commit d3e7c4ce7a3d7f08cea02cba8f34c590a349688b.
2024-07-01 20:54:50 +02:00
Matt Arsenault
d3e7c4ce7a
AMDGPU: Use real copysign in fast pow (#97152)
Previously this would introduce some codegen regressions, but
those have been avoided by simplifying demanded bits on copysign
operations.
2024-07-01 20:16:22 +02:00
Stephen Tozer
d75f9dd1d2 Revert "[IR][NFC] Update IRBuilder to use InsertPosition (#96497)"
Reverts the above commit, as it updates a common header function and
did not update all callsites:

  https://lab.llvm.org/buildbot/#/builders/29/builds/382

This reverts commit 6481dc57612671ebe77fe9c34214fba94e1b3b27.
2024-06-24 18:00:22 +01:00
Stephen Tozer
6481dc5761
[IR][NFC] Update IRBuilder to use InsertPosition (#96497)
Uses the new InsertPosition class (added in #94226) to simplify some of
the IRBuilder interface, and removes the need to pass a BasicBlock
alongside a BasicBlock::iterator, using the fact that we can now get the
parent basic block from the iterator even if it points to the sentinel.
This patch removes the BasicBlock argument from each constructor or call
to setInsertPoint.

This has no functional effect, but later on as we look to remove the
`Instruction *InsertBefore` argument from instruction-creation
(discussed
[here](https://discourse.llvm.org/t/psa-instruction-constructors-changing-to-iterator-only-insertion/77845)),
this will simplify the process by allowing us to deprecate the
InsertPosition constructor directly and catch all the cases where we use
instructions rather than iterators.
2024-06-24 17:27:43 +01:00
Matt Arsenault
b932da16b7
AMDGPU: Fix vector handling in pown libcall simplification (#95832)
The isIntegerTy check would not work as you would hope in
the vector case.
2024-06-18 19:17:42 +02:00
Matt Arsenault
dab1f7c8d3
AMDGPU: Emit 1/llvm.sqrt(x) instead of rsqrt calls in libcall handling (#92863)
With the contract flag we should end up codegening to the rsqrt
instruction, or denormal corrected rsqrt sequence present in the
library.
2024-05-21 18:42:45 +02:00
Matt Arsenault
66b76faffb
AMDGPU: Directly emit sqrt intrinsic when folding rootn(x, 2) (#92598)
This avoids depending on pre/post link runs.

Depends #92595
2024-05-21 07:57:04 +02:00
Matt Arsenault
3cb1fe60fb
AMDGPU: Don't fold rootn(x, 1) to input for strictfp functions (#92595)
We need to insert a constrained canonicalize.

Depends #92594
2024-05-20 22:23:02 +02:00
Matt Arsenault
586ecd7560
AMDGPU: Relax vector restriction for rootn libcall folds (#92594)
We could try harder for nonsplat vectors, but it's probably not worth the
effort.
2024-05-20 18:36:17 +02:00
Matt Arsenault
48b23c09c0
AMDGPU: Handle undef correctly in isKnownIntegral (#92566) 2024-05-17 20:51:27 +02:00
Nikita Popov
1baa385065
[IR][PatternMatch] Only accept poison in getSplatValue() (#89159)
In #88217 a large set of matchers was changed to only accept poison
values in splats, but not undef values. This is because we now use
poison for non-demanded vector elements, and allowing undef can cause
correctness issues.

This patch covers the remaining matchers by changing the AllowUndef
parameter of getSplatValue() to AllowPoison instead. We also carry out
corresponding renames in matchers.

As a followup, we may want to change the default for things like m_APInt
to m_APIntAllowPoison (as this is much less risky when only allowing
poison), but this change doesn't do that.

There is one caveat here: We have a single place
(X86FixupVectorConstants) which does require handling of vector splats
with undefs. This is because this works on backend constant pool
entries, which currently still use undef instead of poison for
non-demanded elements (because SDAG as a whole does not have an explicit
poison representation). As it's just the single use, I've open-coded a
getSplatValueAllowUndef() helper there, to discourage use in any other
places.
2024-04-18 15:44:12 +09:00
Kevin P. Neal
f5296df97c
[FPEnv][AMDGPU] Correct AMDGPUSimplifyLibCalls handling of strictfp attribute. (#86705)
The AMDGPUSimplifyLibCalls pass was lowering function calls with the
strictfp attribute to sequences that included function calls incorrectly
lacking the attribute. This patch corrects that.

The pass now also emits the correct constrained fp call instead of
normal FP instructions when in a function with the strictfp attribute.
Replacing non-constrained calls with constrained calls when required
is still on the IRBuilder's TODO list.
2024-03-27 10:20:00 -04:00
Jeremy Morse
b9d83eff25
[NFC][RemoveDIs] Use iterators for insertion at various call-sites (#84736)
These are the last remaining "trivial" changes to passes that use
Instruction pointers for insertion. All of this should be NFC, it's just
changing the spelling of how we identify a position.

In one or two locations, I'm also switching uses of getNextNode etc to
using std::next with iterators. This too should be NFC.

---------

Merged by: Stephen Tozer <stephen.tozer@sony.com>
2024-03-19 16:36:29 +00:00
Yingwei Zheng
930996e9e4
[ValueTracking][NFC] Pass SimplifyQuery to computeKnownFPClass family (#80657)
This patch refactors the interface of the `computeKnownFPClass` family
to pass `SimplifyQuery` directly.
The motivation of this patch is to compute known fpclass with
`DomConditionCache`, which was introduced by
https://github.com/llvm/llvm-project/pull/73662. With
`DomConditionCache`, we can do more optimization with context-sensitive
information.

Example (extracted from
[fmt/format.h](e17bc67547/include/fmt/format.h (L3555-L3566))):
```
define float @test(float %x, i1 %cond) {
  %i32 = bitcast float %x to i32
  %cmp = icmp slt i32 %i32, 0
  br i1 %cmp, label %if.then1, label %if.else

if.then1:
  %fneg = fneg float %x
  br label %if.end

if.else:
  br i1 %cond, label %if.then2, label %if.end

if.then2:
  br label %if.end

if.end:
  %value = phi float [ %fneg, %if.then1 ], [ %x, %if.then2 ], [ %x, %if.else ]
  %ret = call float @llvm.fabs.f32(float %value)
  ret float %ret
}
```
We can prove the signbit of `%value` is always zero. Then the fabs can
be eliminated.
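
The claim can be checked by hand with a direct transliteration of the IR into Python. This is only a sketch: the helper names are hypothetical, `struct` is used to model the `bitcast`, and NaN inputs are not exercised.

```python
import struct


def bits_of(x: float) -> int:
    """Model of `bitcast float %x to i32`: float32 bits as a signed 32-bit int."""
    return struct.unpack("<i", struct.pack("<f", x))[0]


def value_before_fabs(x: float, cond: bool) -> float:
    """Value of the phi %value for a given input and branch condition."""
    if bits_of(x) < 0:  # icmp slt i32 %i32, 0: the sign bit is set
        return -x       # if.then1: fneg clears the sign bit
    if cond:
        return x        # if.then2 forwards %x, sign bit already clear
    return x            # if.else forwards %x, sign bit already clear


def sign_bit_set(x: float) -> bool:
    return bits_of(x) < 0


samples = [0.0, -0.0, 1.5, -1.5, float("inf"), float("-inf")]
results = [value_before_fabs(x, c) for x in samples for c in (True, False)]
```

Every entry in `results` has a clear sign bit, so applying fabs to it would be a no-op.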
2024-02-06 02:30:12 +08:00
Matt Arsenault
daecc303bb
AMDGPU: Replace sqrt OpenCL libcalls with llvm.sqrt (#74197)
The library implementation is just a wrapper around a call to the
intrinsic, but loses metadata. Swap out the call site to the intrinsic
so that the lowering can see the !fpmath metadata and fast math flags.

Since d56e0d07cc5ee8e334fd1ad403eef0b1a771384f, clang started placing
!fpmath on OpenCL library sqrt calls. Also don't bother emitting
native_sqrt anymore, it's just another wrapper around llvm.sqrt.
2024-01-09 15:13:58 +07:00
Jakub Chlanda
a34db9bdef
[AMDGPU][NFC] Simplify needcopysign logic (#75176)
This was caught by coverity, reported as: `dead_error_condition`.
Since the conditional revolves around `CF`, it is guaranteed to be null
in the else clause, hence making the second part of the statement
redundant.
2023-12-18 12:07:22 +01:00
Youngsuk Kim
67aec2f58b [llvm] Remove no-op ptr-to-ptr casts (NFC)
Remove calls to CreatePointerCast which are just doing no-op ptr-to-ptr
bitcasts.

Opaque ptr cleanup effort (NFC).
2023-12-15 11:04:48 -06:00
Matt Arsenault
ee795fd1cf AMDGPU: Handle rounding intrinsic exponents in isKnownIntegral
https://reviews.llvm.org/D158999
2023-09-01 08:22:16 -04:00
Matt Arsenault
def228553c AMDGPU: Use pown instead of pow if known integral
https://reviews.llvm.org/D158998
2023-09-01 08:22:16 -04:00
Matt Arsenault
deefda7074 AMDGPU: Use exp2 and log2 intrinsics directly for f16/f32
These codegen correctly but f64 doesn't. This prevents losing fast
math flags on the way to the underlying intrinsic.

https://reviews.llvm.org/D158997
2023-09-01 08:22:16 -04:00
Matt Arsenault
dac8f974b5 AMDGPU: Handle sitofp and uitofp exponents in fast pow expansion
https://reviews.llvm.org/D158996
2023-09-01 08:22:16 -04:00
Matt Arsenault
699685b718 AMDGPU: Enable assumptions in AMDGPULibCalls
https://reviews.llvm.org/D159006
2023-09-01 08:22:16 -04:00
Matt Arsenault
a45b787c91 AMDGPU: Turn pow libcalls into powr
powr is just pow with the assumption that x >= 0, otherwise nan. This
fires at least 6 times in luxmark.

https://reviews.llvm.org/D158908
2023-09-01 08:22:16 -04:00
Matt Arsenault
f5d8a9b1bb AMDGPU: Simplify handling of constant vectors in libcalls
Also fixes not handling the partially undef case.

https://reviews.llvm.org/D158905
2023-09-01 08:22:16 -04:00
Matt Arsenault
afb24cbb69 AMDGPU: Don't require all flags to expand fast powr
This was requiring all fast math flags, which is practically
useless. This wouldn't fire using all the standard OpenCL fast math
flags. This only needs afn nnan and ninf.
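
The change amounts to replacing an "all flags" test with a subset test. A minimal sketch, with hypothetical function names and an illustrative call-site flag set:

```python
# The seven LLVM fast-math flags; `fast` is shorthand for all of them.
ALL_FAST_FLAGS = frozenset(
    {"nnan", "ninf", "nsz", "arcp", "contract", "afn", "reassoc"}
)
# The flags this message says the expansion actually needs.
REQUIRED_FLAGS = frozenset({"afn", "nnan", "ninf"})


def old_check(call_flags) -> bool:
    """Previous behaviour: only fire when every fast-math flag is present."""
    return ALL_FAST_FLAGS <= frozenset(call_flags)


def new_check(call_flags) -> bool:
    """New behaviour: fire as soon as afn, nnan and ninf are all present."""
    return REQUIRED_FLAGS <= frozenset(call_flags)


# Illustrative call site carrying some, but not all, fast-math flags.
call_site_flags = {"afn", "nnan", "ninf", "contract"}
```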

https://reviews.llvm.org/D158904
2023-09-01 08:22:16 -04:00
Matt Arsenault
bfe6bc05cd AMDGPU: Cleanup check for integral exponents in pow folds
Also improves undef handling

https://reviews.llvm.org/D159006
2023-08-30 10:37:24 -04:00
Matt Arsenault
80e5b46e45 AMDGPU: Fix assertion on half typed pow with constant exponents
https://reviews.llvm.org/D158993
2023-08-28 13:54:49 -04:00
Matt Arsenault
35c2a7542c AMDGPU: Fix asserting on fast f16 pown
https://reviews.llvm.org/D158903
2023-08-25 19:56:20 -04:00