llvm-project

Author	SHA1	Message	Date
Simon Pilgrim	8a7413b141	Fix MSVC "not all control paths return a value" warning. NFC. (#182262 )	2026-02-19 14:17:24 +00:00
Matt Arsenault	fdc4274e2f	AMDGPU: Perform libcall recognition to replace fast OpenCL pow (#182135 ) If a float-typed call site is marked with afn, replace the 4 flavors of pow with a faster variant. This transforms pow, powr, pown, and rootn to __pow_fast, __powr_fast, __pown_fast, and __rootn_fast if available. Also attempts to handle all of the same basic folds on the new fast variants that were already performed with the base forms. This maintains optimizations with OpenCL when the device libs unsafe math control library is deleted. This maintains the status quo of how libcalls work, and only handles 4 new entry points. This only helps with the elimination of the control library, and not general libcall emission problems. This makes no practical difference for HIP, which is the status quo for libcall optimizations. AMDGPULibCalls recognizes the OpenCL mangled names. e.g., OpenCL float "pow" is really _Z3powff but the HIP provided function "powf" is really named _ZL4powfff, and std::pow with float is _ZL3powff. The pass still runs for HIP, so by accident if you used the OpenCL mangled function names, this would trigger. Since the functions cannot yet be relied on from the library, introduce a temporary module flag check. I'm not planning on emitting it anywhere and it's a poor substitute for versioning the target.	2026-02-19 11:49:32 +01:00
Matt Arsenault	3b5ed86983	AMDGPU: Libcall expand fast pow/powr/pown/rootn for float case (#180553 ) This is to eliminate the special case global unsafe math options in these functions from the library. The core operation only uses about 4 instructions, and then there's an additional prolog and/or epilog to fix up special cases. I have an alternative patch which implements this by using separate entrypoints in the library, and having the pass replace the calls instead of this full handling. However, given the unfortunate state of library development, it requires a full year to make cross project changes. This is the most expedient path to deleting the control library; in the future we can do libcall emission when the compiler has the real ability to properly emit new calls. This is mostly a direct port of these functions: https://github.com/ROCm/llvm-project/blob/amd-staging/amd/device-libs/ocml/src/powF_base.h I used copilot to do the heavy lifting on the drudgery of writing out all the IRBuilder calls. It mostly worked, though it made mistakes in porting != (used the ONE instead of UNE), plus some API usage errors. The output isn't exactly the same. The ordering of some instructions is different and some conditions and selects are inverted. This expansion also preserves the flags, so the finite only case optimizes better than with the library code (this is largely because fast math flags aren't propagated yet, except for the few places SimplifyDemandedFPClass is called). Alive2 verifies the library function with the special fast math path functions have the same behavior as the result of this expansion. afn only case: https://alive2.llvm.org/ce/z/uoPBC2 I did not bother trying to specially optimize the finite only cases here. They get cleaned up by instcombine anyway, so doing so would only be a hypothetical compile time improvement for a lot of additional complexity. The exception would be existing, overlapped code in the pass. 
There's some overlap here with existing code in the pass. The afn+ninf+nnan case for powr already does the rewrite to the 4 instruction core sequence, which takes precedence over this. In the future this should be merged, but it's more annoying now since the old path also handles it for the f64 case via libcall emission since the intrinsic won't work.	2026-02-18 20:52:49 +01:00
Matt Arsenault	82799a448e	Reapply "AMDGPU: Use real copysign in fast pow (#97152 )" (#178036 ) This reverts commit bff619f91015a633df659d7f60f842d5c49351df. This was reverted due to regressions caused by poor copysign optimization, which have been fixed.	2026-02-05 20:40:38 +00:00
Matt Arsenault	8461579298	AMDGPU: Add nofpclass when expanding pow (#177933 ) The codegen regression is tracked in #177913	2026-02-05 07:40:21 +01:00
Matt Arsenault	a25c7d7ade	ValueTracking: Extract isKnownIntegral out of AMDGPU (#177912 ) Also do some basic conversions to use SimplifyQuery and add tests to show assume works in a new context.	2026-01-26 19:55:14 +01:00
Jameson Nash	ba2bd3fbba	Use AllocaInst::getAllocationSize instead of manual size calculations (#176486 ) Replace patterns that manually compute allocation sizes by multiplying getTypeAllocSize(getAllocatedType()) by the array size with calls to the getAllocationSize(DL) API, which handles this correctly and concisely, returning nullopt for VLAs. This fixes several places that were not accounting for array allocations when computing sizes, simplifies code that was doing this manually, and adds some explicit isFixed checks where implied convert was being used. This PR is because now that we have opaque pointers, I hate that some AllocaInst still has type information being consumed by some passes instead of just using the size, since passes rarely handle that type information well or correctly. I hope this will grow into a sequence of commits to slowly eliminate uses of getAllocatedType from AllocaInst. And similarly later to remove type information from GlobalValue too (it can be replaced with just dereferenceable bytes, similar to arguments). Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-19 09:55:52 -05:00
Mehdi Amini	efd9dc83f2	Revert "[APFloat] Add exp function for APFloat::IEEESsingle using expf implementation from LLVM libc. (#143959 )" (#172325 ) This reverts commit 4190d576823c18f45ee0632baee7d798448178ac. See https://lab.llvm.org/buildbot/#/builders/181/builds/33524 ``` from /home/buildbot/buildbot-root/cross-project-tests-sie-ubuntu/llvm-project/llvm/lib/Support/APFloat.cpp:32: /home/buildbot/buildbot-root/cross-project-tests-sie-ubuntu/llvm-project/llvm/../libc/src/__support/math/asin_utils.h:548:1: in ‘constexpr’ expansion of ‘__llvm_libc::fputil::DyadicFloat<128>(__llvm_libc::Sign::POS, -127, __llvm_libc::BigInt<128, false, long unsigned int>(__llvm_libc::operator""_u128(((const char)"0x80000000\'00000000\'00000000\'00000000"))))’ /home/buildbot/buildbot-root/cross-project-tests-sie-ubuntu/llvm-project/llvm/../libc/src/__support/FPUtil/dyadic_float.h:109:5: in ‘constexpr’ expansion of ‘((__llvm_libc::fputil::DyadicFloat<128>)this)->__llvm_libc::fputil::DyadicFloat<128>::normalize()’ /home/buildbot/buildbot-root/cross-project-tests-sie-ubuntu/llvm-project/llvm/../libc/src/__support/FPUtil/dyadic_float.h:118:16: in ‘constexpr’ expansion of ‘((__llvm_libc::fputil::DyadicFloat<128>)this)->__llvm_libc::fputil::DyadicFloat<128>::mantissa.__llvm_libc::BigInt<128, false, long unsigned int>::operator<<=(((size_t)shift_length))’ /home/buildbot/buildbot-root/cross-project-tests-sie-ubuntu/llvm-project/llvm/../libc/src/__support/big_int.h:829:52: in ‘constexpr’ expansion of ‘__llvm_libc::multiword::shift<__llvm_libc::multiword::LEFT, false, long unsigned int, 2>(((__llvm_libc::BigInt<128, false, long unsigned int>)this)->__llvm_libc::BigInt<128, false, long unsigned int>::val, s)’ /home/buildbot/buildbot-root/cross-project-tests-sie-ubuntu/llvm-project/llvm/../libc/src/__support/big_int.h:264:35: error: ‘constexpr __llvm_libc::cpp::enable_if_t<((((sizeof (To) == sizeof (From)) && __llvm_libc::cpp::is_trivially_constructible<To>::value) && 
__llvm_libc::cpp::is_trivially_copyable<T>::value) && __llvm_libc::cpp::is_trivially_copyable<From>::value), To> __llvm_libc::cpp::bit_cast(const From&) [with To = __int128 unsigned; From = __llvm_libc::cpp::array<long unsigned int, 2>; __llvm_libc::cpp::enable_if_t<((((sizeof (To) == sizeof (From)) && __llvm_libc::cpp::is_trivially_constructible<To>::value) && __llvm_libc::cpp::is_trivially_copyable<T>::value) && __llvm_libc::cpp::is_trivially_copyable<From>::value), To> = __int128 unsigned]’ called in a constant expression 264 \| auto tmp = cpp::bit_cast<type>(array); \| ~~~~~~~~~~~~~~~~~~~^~~~~~~ from /home/buildbot/buildbot-root/cross-project-tests-sie-ubuntu/llvm-project/llvm/lib/Support/APFloat.cpp:32: /home/buildbot/buildbot-root/cross-project-tests-sie-ubuntu/llvm-project/llvm/../libc/src/__support/CPP/bit.h:48:1: note: ‘constexpr __llvm_libc::cpp::enable_if_t<((((sizeof (To) == sizeof (From)) && __llvm_libc::cpp::is_trivially_constructible<To>::value) && __llvm_libc::cpp::is_trivially_copyable<T>::value) && __llvm_libc::cpp::is_trivially_copyable<From>::value), To> __llvm_libc::cpp::bit_cast(const From&) [with To = __int128 unsigned; From = __llvm_libc::cpp::array<long unsigned int, 2>; __llvm_libc::cpp::enable_if_t<((((sizeof (To) == sizeof (From)) && __llvm_libc::cpp::is_trivially_constructible<To>::value) && __llvm_libc::cpp::is_trivially_copyable<T>::value) && __llvm_libc::cpp::is_trivially_copyable<From>::value), To> = __int128 unsigned]’ is not usable as a ‘constexpr’ function because: 48 \| bit_cast(const From &from) { \| ^~~~~~~~ /home/buildbot/buildbot-root/cross-project-tests-sie-ubuntu/llvm-project/llvm/../libc/src/__support/CPP/bit.h:56:28: error: call to non-‘constexpr’ function ‘void __llvm_libc::cpp::inline_copy(const char, char) [with unsigned int N = 16]’ 56 \| inline_copy<sizeof(From)>(src, dst); ```	2025-12-15 16:57:37 +01:00
lntue	4190d57682	[APFloat] Add exp function for APFloat::IEEESsingle using expf implementation from LLVM libc. (#143959 ) Discourse RFC: https://discourse.llvm.org/t/rfc-make-clang-builtin-math-functions-constexpr-with-llvm-libc-to-support-c-23-constexpr-math-functions/86450 - The implementation in LLVM libc is header-only. - expf implementation in LLVM libc is correctly rounded for all rounding modes. - LLVM libc implementation will round to the floating point environment's rounding mode. - No cmake build dependency between LLVM and LLVM libc, only requires LLVM libc source to be present in llvm-project/libc folder.	2025-12-15 10:21:45 -05:00
Jay Foad	72c69aefba	[AMDGPU] Make use of getFunction and getMF. NFC. (#167872 )	2025-11-14 11:00:57 +00:00
paperchalice	8bacfb2538	[AMDGPU] Remove `UnsafeFPMath` uses (#151079 ) Remove `UnsafeFPMath` in AMDGPU part, it blocks some bugfixes related to clang and the ultimate goal is to remove `resetTargetOptions` method in `TargetMachine`, see FIXME in `resetTargetOptions`. See also https://discourse.llvm.org/t/rfc-honor-pragmas-with-ffp-contract-fast https://discourse.llvm.org/t/allowfpopfusion-vs-sdnodeflags-hasallowcontract --------- Co-authored-by: Matt Arsenault <arsenm2@gmail.com>	2025-07-31 17:36:57 +08:00
Ramkumar Ramachandra	b40e4ceaa6	[ValueTracking] Make Depth last default arg (NFC) (#142384 ) Having a finite Depth (or recursion limit) for computeKnownBits is very limiting, but is currently a load-bearing necessity, as all KnownBits are recomputed on each call and there is no caching. As a prerequisite for an effort to remove the recursion limit altogether, either using a clever caching technique, or writing an easily-invalidable KnownBits analysis, make the Depth argument in APIs in ValueTracking uniformly the last argument with a default value. This would aid in removing the argument when the time comes, as many callers that currently pass 0 explicitly are now updated to omit the argument altogether.	2025-06-03 17:12:24 +01:00
Kazu Hirata	1e8e662174	[AMDGPU] Remove unused includes (NFC) (#141376 ) These are identified by misc-include-cleaner. I've filtered out those that break builds. Also, I'm staying away from llvm-config.h, config.h, and Compiler.h, which likely cause platform- or compiler-specific build failures.	2025-05-24 14:48:46 -07:00
Matt Arsenault	04bb8ecb05	AMDGPU: Disable sincos fold for constant inputs (#134579 )	2025-04-07 15:20:23 +07:00
Matt Arsenault	b446c208a5	AMDGPU: Verify function type matches when matching libcalls (#119043 ) Previously this would recognize a call to a mangled ldexp(float, float) as a candidate to replace with the intrinsic. We need to verify the second parameter is in fact an integer. Fixes: SWDEV-501389	2024-12-16 15:01:48 +09:00
Kazu Hirata	be187369a0	[AMDGPU] Remove unused includes (NFC) (#116154 ) Identified with misc-include-cleaner.	2024-11-13 21:10:03 -08:00
goldsteinn	c85611e858	[SimplifyLibCall][Attribute] Fix bug where we may keep `range` attr with incompatible type (#112649 ) In a variety of places we change the bitwidth of a parameter but don't update the attributes. The issue in this case is from the `range` attribute when inlining `__memset_chk`. `optimizeMemSetChk` will replace an `i32` with an `i8`, and if the `i32` had a `range` attr associated it will cause an error. Fixes #112633	2024-10-17 10:32:55 -05:00
Rahul Joshi	fa789dffb1	[NFC] Rename `Intrinsic::getDeclaration` to `getOrInsertDeclaration` (#111752 ) Rename the function to reflect its correct behavior and to be consistent with `Module::getOrInsertFunction`. This is also in preparation of adding a new `Intrinsic::getDeclaration` that will have behavior similar to `Module::getFunction` (i.e, just lookup, no creation).	2024-10-11 05:26:03 -07:00
Jay Foad	c7309dadbf	[AMDGPU] Use range-based for loops. NFC. (#99047 )	2024-07-17 10:18:03 +01:00
Jay Foad	74b87b02d2	[AMDGPU] Fix and add namespace closing comments. NFC.	2024-07-16 16:56:31 +01:00
Jay Foad	aeafdc21d2	[AMDGPU] Use using instead of typedef. NFC.	2024-07-16 16:44:12 +01:00
Jay Foad	63a1242ae3	[AMDGPU] clang-tidy: define trivial constructors with = default. NFC.	2024-07-16 15:41:54 +01:00
Matt Arsenault	bff619f910	Revert "AMDGPU: Use real copysign in fast pow (#97152 )" This reverts commit d3e7c4ce7a3d7f08cea02cba8f34c590a349688b.	2024-07-01 20:54:50 +02:00
Matt Arsenault	d3e7c4ce7a	AMDGPU: Use real copysign in fast pow (#97152 ) Previously this would introduce some codegen regressions, but those have been avoided by simplifying demanded bits on copysign operations.	2024-07-01 20:16:22 +02:00
Stephen Tozer	d75f9dd1d2	Revert "[IR][NFC] Update IRBuilder to use InsertPosition (#96497 )" Reverts the above commit, as it updates a common header function and did not update all callsites: https://lab.llvm.org/buildbot/#/builders/29/builds/382 This reverts commit 6481dc57612671ebe77fe9c34214fba94e1b3b27.	2024-06-24 18:00:22 +01:00
Stephen Tozer	6481dc5761	[IR][NFC] Update IRBuilder to use InsertPosition (#96497 ) Uses the new InsertPosition class (added in #94226) to simplify some of the IRBuilder interface, and removes the need to pass a BasicBlock alongside a BasicBlock::iterator, using the fact that we can now get the parent basic block from the iterator even if it points to the sentinel. This patch removes the BasicBlock argument from each constructor or call to setInsertPoint. This has no functional effect, but later on as we look to remove the `Instruction *InsertBefore` argument from instruction-creation (discussed [here](https://discourse.llvm.org/t/psa-instruction-constructors-changing-to-iterator-only-insertion/77845)), this will simplify the process by allowing us to deprecate the InsertPosition constructor directly and catch all the cases where we use instructions rather than iterators.	2024-06-24 17:27:43 +01:00
Matt Arsenault	b932da16b7	AMDGPU: Fix vector handling in pown libcall simplification (#95832 ) The isIntegerTy check would not work as you would hope in the vector case.	2024-06-18 19:17:42 +02:00
Matt Arsenault	dab1f7c8d3	AMDGPU: Emit 1/llvm.sqrt(x) instead of rsqrt calls in libcall handling (#92863 ) With the contract flag we should end up codegening to the rsqrt instruction, or denormal corrected rsqrt sequence present in the library.	2024-05-21 18:42:45 +02:00
Matt Arsenault	66b76faffb	AMDGPU: Directly emit sqrt intrinsic when folding rootn(x, 2) (#92598 ) This avoids depending on pre/post link runs. Depends #92595	2024-05-21 07:57:04 +02:00
Matt Arsenault	3cb1fe60fb	AMDGPU: Don't fold rootn(x, 1) to input for strictfp functions (#92595 ) We need to insert a constrained canonicalize. Depends #92594	2024-05-20 22:23:02 +02:00
Matt Arsenault	586ecd7560	AMDGPU: Relax vector restriction for rootn libcall folds (#92594 ) We could try harder for nonsplat vectors but probably not worth the effort.	2024-05-20 18:36:17 +02:00
Matt Arsenault	48b23c09c0	AMDGPU: Handle undef correctly in isKnownIntegral (#92566 )	2024-05-17 20:51:27 +02:00
Nikita Popov	1baa385065	[IR][PatternMatch] Only accept poison in getSplatValue() (#89159 ) In #88217 a large set of matchers was changed to only accept poison values in splats, but not undef values. This is because we now use poison for non-demanded vector elements, and allowing undef can cause correctness issues. This patch covers the remaining matchers by changing the AllowUndef parameter of getSplatValue() to AllowPoison instead. We also carry out corresponding renames in matchers. As a followup, we may want to change the default for things like m_APInt to m_APIntAllowPoison (as this is much less risky when only allowing poison), but this change doesn't do that. There is one caveat here: We have a single place (X86FixupVectorConstants) which does require handling of vector splats with undefs. This is because this works on backend constant pool entries, which currently still use undef instead of poison for non-demanded elements (because SDAG as a whole does not have an explicit poison representation). As it's just the single use, I've open-coded a getSplatValueAllowUndef() helper there, to discourage use in any other places.	2024-04-18 15:44:12 +09:00
Kevin P. Neal	f5296df97c	[FPEnv][AMDGPU] Correct AMDGPUSimplifyLibCalls handling of strictfp attribute. (#86705 ) The AMDGPUSimplifyLibCalls pass was lowering function calls with the strictfp attribute to sequences that included function calls incorrectly lacking the attribute. This patch corrects that. The pass now also emits the correct constrained fp call instead of normal FP instructions when in a function with the strictfp attribute. Replacing non-constrained calls with constrained calls when required is still on the IRBuilder's TODO list.	2024-03-27 10:20:00 -04:00
Jeremy Morse	b9d83eff25	[NFC][RemoveDIs] Use iterators for insertion at various call-sites (#84736 ) These are the last remaining "trivial" changes to passes that use Instruction pointers for insertion. All of this should be NFC, it's just changing the spelling of how we identify a position. In one or two locations, I'm also switching uses of getNextNode etc to using std::next with iterators. This too should be NFC. --------- Merged by: Stephen Tozer <stephen.tozer@sony.com>	2024-03-19 16:36:29 +00:00
Yingwei Zheng	930996e9e4	[ValueTracking][NFC] Pass `SimplifyQuery` to `computeKnownFPClass` family (#80657 ) This patch refactors the interface of the `computeKnownFPClass` family to pass `SimplifyQuery` directly. The motivation of this patch is to compute known fpclass with `DomConditionCache`, which was introduced by https://github.com/llvm/llvm-project/pull/73662. With `DomConditionCache`, we can do more optimization with context-sensitive information. Example (extracted from [fmt/format.h](`e17bc67547/include/fmt/format.h (L3555-L3566)`)): ``` define float @test(float %x, i1 %cond) { %i32 = bitcast float %x to i32 %cmp = icmp slt i32 %i32, 0 br i1 %cmp, label %if.then1, label %if.else if.then1: %fneg = fneg float %x br label %if.end if.else: br i1 %cond, label %if.then2, label %if.end if.then2: br label %if.end if.end: %value = phi float [ %fneg, %if.then1 ], [ %x, %if.then2 ], [ %x, %if.else ] %ret = call float @llvm.fabs.f32(float %value) ret float %ret } ``` We can prove the signbit of `%value` is always zero. Then the fabs can be eliminated.	2024-02-06 02:30:12 +08:00
Matt Arsenault	daecc303bb	AMDGPU: Replace sqrt OpenCL libcalls with llvm.sqrt (#74197 ) The library implementation is just a wrapper around a call to the intrinsic, but loses metadata. Swap out the call site to the intrinsic so that the lowering can see the !fpmath metadata and fast math flags. Since d56e0d07cc5ee8e334fd1ad403eef0b1a771384f, clang started placing !fpmath on OpenCL library sqrt calls. Also don't bother emitting native_sqrt anymore, it's just another wrapper around llvm.sqrt.	2024-01-09 15:13:58 +07:00
Jakub Chlanda	a34db9bdef	[AMDGPU][NFC] Simplify needcopysign logic (#75176 ) This was caught by coverity, reported as: `dead_error_condition`. Since the conditional revolves around `CF`, it is guaranteed to be null in the else clause, hence making the second part of the statement redundant.	2023-12-18 12:07:22 +01:00
Youngsuk Kim	67aec2f58b	[llvm] Remove no-op ptr-to-ptr casts (NFC) Remove calls to CreatePointerCast which are just doing no-op ptr-to-ptr bitcasts. Opaque ptr cleanup effort (NFC).	2023-12-15 11:04:48 -06:00
Matt Arsenault	ee795fd1cf	AMDGPU: Handle rounding intrinsic exponents in isKnownIntegral https://reviews.llvm.org/D158999	2023-09-01 08:22:16 -04:00
Matt Arsenault	def228553c	AMDGPU: Use pown instead of pow if known integral https://reviews.llvm.org/D158998	2023-09-01 08:22:16 -04:00
Matt Arsenault	deefda7074	AMDGPU: Use exp2 and log2 intrinsics directly for f16/f32 These codegen correctly but f64 doesn't. This prevents losing fast math flags on the way to the underlying intrinsic. https://reviews.llvm.org/D158997	2023-09-01 08:22:16 -04:00
Matt Arsenault	dac8f974b5	AMDGPU: Handle sitofp and uitofp exponents in fast pow expansion https://reviews.llvm.org/D158996	2023-09-01 08:22:16 -04:00
Matt Arsenault	699685b718	AMDGPU: Enable assumptions in AMDGPULibCalls https://reviews.llvm.org/D159006	2023-09-01 08:22:16 -04:00
Matt Arsenault	a45b787c91	AMDGPU: Turn pow libcalls into powr powr is just pow with the assumption that x >= 0, otherwise nan. This fires at least 6 times in luxmark https://reviews.llvm.org/D158908	2023-09-01 08:22:16 -04:00
Matt Arsenault	f5d8a9b1bb	AMDGPU: Simplify handling of constant vectors in libcalls Also fixes not handling the partially undef case. https://reviews.llvm.org/D158905	2023-09-01 08:22:16 -04:00
Matt Arsenault	afb24cbb69	AMDGPU: Don't require all flags to expand fast powr This was requiring all fast math flags, which is practically useless. This wouldn't fire using all the standard OpenCL fast math flags. This only needs afn nnan and ninf. https://reviews.llvm.org/D158904	2023-09-01 08:22:16 -04:00
Matt Arsenault	bfe6bc05cd	AMDGPU: Cleanup check for integral exponents in pow folds Also improves undef handling https://reviews.llvm.org/D159006	2023-08-30 10:37:24 -04:00
Matt Arsenault	80e5b46e45	AMDGPU: Fix assertion on half typed pow with constant exponents https://reviews.llvm.org/D158993	2023-08-28 13:54:49 -04:00
Matt Arsenault	35c2a7542c	AMDGPU: Fix asserting on fast f16 pown https://reviews.llvm.org/D158903	2023-08-25 19:56:20 -04:00

130 Commits