llvm-project

Author	SHA1	Message	Date
Matt Arsenault	b446c208a5	AMDGPU: Verify function type matches when matching libcalls (#119043) Previously this would recognize a call to a mangled ldexp(float, float) as a candidate to replace with the intrinsic. We need to verify the second parameter is in fact an integer. Fixes: SWDEV-501389	2024-12-16 15:01:48 +09:00
Kazu Hirata	be187369a0	[AMDGPU] Remove unused includes (NFC) (#116154) Identified with misc-include-cleaner.	2024-11-13 21:10:03 -08:00
goldsteinn	c85611e858	[SimplifyLibCall][Attribute] Fix bug where we may keep `range` attr with incompatible type (#112649) In a variety of places we change the bitwidth of a parameter but don't update the attributes. The issue in this case is from the `range` attribute when inlining `__memset_chk`. `optimizeMemSetChk` will replace an `i32` with an `i8`, and if the `i32` had a `range` attr associated it will cause an error. Fixes #112633	2024-10-17 10:32:55 -05:00
Rahul Joshi	fa789dffb1	[NFC] Rename `Intrinsic::getDeclaration` to `getOrInsertDeclaration` (#111752) Rename the function to reflect its correct behavior and to be consistent with `Module::getOrInsertFunction`. This is also in preparation of adding a new `Intrinsic::getDeclaration` that will have behavior similar to `Module::getFunction` (i.e, just lookup, no creation).	2024-10-11 05:26:03 -07:00
Jay Foad	c7309dadbf	[AMDGPU] Use range-based for loops. NFC. (#99047)	2024-07-17 10:18:03 +01:00
Jay Foad	74b87b02d2	[AMDGPU] Fix and add namespace closing comments. NFC.	2024-07-16 16:56:31 +01:00
Jay Foad	aeafdc21d2	[AMDGPU] Use using instead of typedef. NFC.	2024-07-16 16:44:12 +01:00
Jay Foad	63a1242ae3	[AMDGPU] clang-tidy: define trivial constructors with = default. NFC.	2024-07-16 15:41:54 +01:00
Matt Arsenault	bff619f910	Revert "AMDGPU: Use real copysign in fast pow (#97152)" This reverts commit d3e7c4ce7a3d7f08cea02cba8f34c590a349688b.	2024-07-01 20:54:50 +02:00
Matt Arsenault	d3e7c4ce7a	AMDGPU: Use real copysign in fast pow (#97152) Previously this would introduce some codegen regressions, but those have been avoided by simplifying demanded bits on copysign operations.	2024-07-01 20:16:22 +02:00
Stephen Tozer	d75f9dd1d2	Revert "[IR][NFC] Update IRBuilder to use InsertPosition (#96497)" Reverts the above commit, as it updates a common header function and did not update all callsites: https://lab.llvm.org/buildbot/#/builders/29/builds/382 This reverts commit 6481dc57612671ebe77fe9c34214fba94e1b3b27.	2024-06-24 18:00:22 +01:00
Stephen Tozer	6481dc5761	[IR][NFC] Update IRBuilder to use InsertPosition (#96497) Uses the new InsertPosition class (added in #94226) to simplify some of the IRBuilder interface, and removes the need to pass a BasicBlock alongside a BasicBlock::iterator, using the fact that we can now get the parent basic block from the iterator even if it points to the sentinel. This patch removes the BasicBlock argument from each constructor or call to setInsertPoint. This has no functional effect, but later on as we look to remove the `Instruction *InsertBefore` argument from instruction-creation (discussed [here](https://discourse.llvm.org/t/psa-instruction-constructors-changing-to-iterator-only-insertion/77845)), this will simplify the process by allowing us to deprecate the InsertPosition constructor directly and catch all the cases where we use instructions rather than iterators.	2024-06-24 17:27:43 +01:00
Matt Arsenault	b932da16b7	AMDGPU: Fix vector handling in pown libcall simplification (#95832) The isIntegerTy check would not work as you would hope in the vector case.	2024-06-18 19:17:42 +02:00
Matt Arsenault	dab1f7c8d3	AMDGPU: Emit 1/llvm.sqrt(x) instead of rsqrt calls in libcall handling (#92863) With the contract flag we should end up codegening to the rsqrt instruction, or denormal corrected rsqrt sequence present in the library.	2024-05-21 18:42:45 +02:00
Matt Arsenault	66b76faffb	AMDGPU: Directly emit sqrt intrinsic when folding rootn(x, 2) (#92598) This avoids depending on pre/post link runs. Depends #92595	2024-05-21 07:57:04 +02:00
Matt Arsenault	3cb1fe60fb	AMDGPU: Don't fold rootn(x, 1) to input for strictfp functions (#92595) We need to insert a constrained canonicalize. Depends #92594	2024-05-20 22:23:02 +02:00
Matt Arsenault	586ecd7560	AMDGPU: Relax vector restriction for rootn libcall folds (#92594) We could try harder for nonsplat vectors but probably not worth the effort.	2024-05-20 18:36:17 +02:00
Matt Arsenault	48b23c09c0	AMDGPU: Handle undef correctly in isKnownIntegral (#92566)	2024-05-17 20:51:27 +02:00
Nikita Popov	1baa385065	[IR][PatternMatch] Only accept poison in getSplatValue() (#89159) In #88217 a large set of matchers was changed to only accept poison values in splats, but not undef values. This is because we now use poison for non-demanded vector elements, and allowing undef can cause correctness issues. This patch covers the remaining matchers by changing the AllowUndef parameter of getSplatValue() to AllowPoison instead. We also carry out corresponding renames in matchers. As a followup, we may want to change the default for things like m_APInt to m_APIntAllowPoison (as this is much less risky when only allowing poison), but this change doesn't do that. There is one caveat here: We have a single place (X86FixupVectorConstants) which does require handling of vector splats with undefs. This is because this works on backend constant pool entries, which currently still use undef instead of poison for non-demanded elements (because SDAG as a whole does not have an explicit poison representation). As it's just the single use, I've open-coded a getSplatValueAllowUndef() helper there, to discourage use in any other places.	2024-04-18 15:44:12 +09:00
Kevin P. Neal	f5296df97c	[FPEnv][AMDGPU] Correct AMDGPUSimplifyLibCalls handling of strictfp attribute. (#86705) The AMDGPUSimplifyLibCalls pass was lowering function calls with the strictfp attribute to sequences that included function calls incorrectly lacking the attribute. This patch corrects that. The pass now also emits the correct constrained fp call instead of normal FP instructions when in a function with the strictfp attribute. Replacing non-constrained calls with constrained calls when required is still on the IRBuilder's TODO list.	2024-03-27 10:20:00 -04:00
Jeremy Morse	b9d83eff25	[NFC][RemoveDIs] Use iterators for insertion at various call-sites (#84736) These are the last remaining "trivial" changes to passes that use Instruction pointers for insertion. All of this should be NFC, it's just changing the spelling of how we identify a position. In one or two locations, I'm also switching uses of getNextNode etc to using std::next with iterators. This too should be NFC. --------- Merged by: Stephen Tozer <stephen.tozer@sony.com>	2024-03-19 16:36:29 +00:00
Yingwei Zheng	930996e9e4	[ValueTracking][NFC] Pass `SimplifyQuery` to `computeKnownFPClass` family (#80657) This patch refactors the interface of the `computeKnownFPClass` family to pass `SimplifyQuery` directly. The motivation of this patch is to compute known fpclass with `DomConditionCache`, which was introduced by https://github.com/llvm/llvm-project/pull/73662. With `DomConditionCache`, we can do more optimization with context-sensitive information. Example (extracted from [fmt/format.h](`e17bc67547/include/fmt/format.h (L3555-L3566)`)): ``` define float @test(float %x, i1 %cond) { %i32 = bitcast float %x to i32 %cmp = icmp slt i32 %i32, 0 br i1 %cmp, label %if.then1, label %if.else if.then1: %fneg = fneg float %x br label %if.end if.else: br i1 %cond, label %if.then2, label %if.end if.then2: br label %if.end if.end: %value = phi float [ %fneg, %if.then1 ], [ %x, %if.then2 ], [ %x, %if.else ] %ret = call float @llvm.fabs.f32(float %value) ret float %ret } ``` We can prove the signbit of `%value` is always zero. Then the fabs can be eliminated.	2024-02-06 02:30:12 +08:00
Matt Arsenault	daecc303bb	AMDGPU: Replace sqrt OpenCL libcalls with llvm.sqrt (#74197) The library implementation is just a wrapper around a call to the intrinsic, but loses metadata. Swap out the call site to the intrinsic so that the lowering can see the !fpmath metadata and fast math flags. Since d56e0d07cc5ee8e334fd1ad403eef0b1a771384f, clang started placing !fpmath on OpenCL library sqrt calls. Also don't bother emitting native_sqrt anymore, it's just another wrapper around llvm.sqrt.	2024-01-09 15:13:58 +07:00
Jakub Chlanda	a34db9bdef	[AMDGPU][NFC] Simplify needcopysign logic (#75176) This was caught by coverity, reported as: `dead_error_condition`. Since the conditional revolves around `CF`, it is guaranteed to be null in the else clause, hence making the second part of the statement redundant.	2023-12-18 12:07:22 +01:00
Youngsuk Kim	67aec2f58b	[llvm] Remove no-op ptr-to-ptr casts (NFC) Remove calls to CreatePointerCast which are just doing no-op ptr-to-ptr bitcasts. Opaque ptr cleanup effort (NFC).	2023-12-15 11:04:48 -06:00
Matt Arsenault	ee795fd1cf	AMDGPU: Handle rounding intrinsic exponents in isKnownIntegral https://reviews.llvm.org/D158999	2023-09-01 08:22:16 -04:00
Matt Arsenault	def228553c	AMDGPU: Use pown instead of pow if known integral https://reviews.llvm.org/D158998	2023-09-01 08:22:16 -04:00
Matt Arsenault	deefda7074	AMDGPU: Use exp2 and log2 intrinsics directly for f16/f32 These codegen correctly but f64 doesn't. This prevents losing fast math flags on the way to the underlying intrinsic. https://reviews.llvm.org/D158997	2023-09-01 08:22:16 -04:00
Matt Arsenault	dac8f974b5	AMDGPU: Handle sitofp and uitofp exponents in fast pow expansion https://reviews.llvm.org/D158996	2023-09-01 08:22:16 -04:00
Matt Arsenault	699685b718	AMDGPU: Enable assumptions in AMDGPULibCalls https://reviews.llvm.org/D159006	2023-09-01 08:22:16 -04:00
Matt Arsenault	a45b787c91	AMDGPU: Turn pow libcalls into powr powr is just pow with the assumption that x >= 0, otherwise nan. This fires at least 6 times in luxmark https://reviews.llvm.org/D158908	2023-09-01 08:22:16 -04:00
Matt Arsenault	f5d8a9b1bb	AMDGPU: Simplify handling of constant vectors in libcalls Also fixes not handling the partially undef case. https://reviews.llvm.org/D158905	2023-09-01 08:22:16 -04:00
Matt Arsenault	afb24cbb69	AMDGPU: Don't require all flags to expand fast powr This was requiring all fast math flags, which is practically useless. This wouldn't fire using all the standard OpenCL fast math flags. This only needs afn nnan and ninf. https://reviews.llvm.org/D158904	2023-09-01 08:22:16 -04:00
Matt Arsenault	bfe6bc05cd	AMDGPU: Cleanup check for integral exponents in pow folds Also improves undef handling https://reviews.llvm.org/D159006	2023-08-30 10:37:24 -04:00
Matt Arsenault	80e5b46e45	AMDGPU: Fix assertion on half typed pow with constant exponents https://reviews.llvm.org/D158993	2023-08-28 13:54:49 -04:00
Matt Arsenault	35c2a7542c	AMDGPU: Fix asserting on fast f16 pown https://reviews.llvm.org/D158903	2023-08-25 19:56:20 -04:00
Matt Arsenault	b24dab0ec6	AMDGPU: Trim dead includes	2023-08-25 19:55:53 -04:00
Matt Arsenault	66ee794064	AMDGPU: Fix verifier error on splatted opencl fmin/fmax and ldexp calls Apparently the spec has overloads for fmin/fmax and ldexp with one of the operands as scalar. We need to broadcast the scalars to the vector type. https://reviews.llvm.org/D158077	2023-08-16 09:42:26 -04:00
Matt Arsenault	d251761660	AMDGPU: Replace log libcalls with log intrinsics	2023-08-15 10:48:46 -04:00
Matt Arsenault	d45022b094	AMDGPU: Remove special case constant folding of divide We should probably just swap this out for the fdiv, but that's what the implementation is anyway.	2023-08-14 18:36:01 -04:00
Matt Arsenault	483cc21866	AMDGPU: Remove special case folding of sqrt	2023-08-14 18:36:01 -04:00
Matt Arsenault	416f6af976	AMDGPU: Remove special case folding of fma/mad These just get replaced with an intrinsic now. This was also introducing host dependence on the result since it relied on the compiler choice to contract or not.	2023-08-14 18:36:01 -04:00
Matt Arsenault	0eabe65bfb	AMDGPU: Replace ldexp libcalls with intrinsic	2023-08-14 18:36:01 -04:00
Matt Arsenault	f337a77c99	AMDGPU: Replace rounding libcalls with intrinsics	2023-08-14 18:36:01 -04:00
Matt Arsenault	c7876c55ac	AMDGPU: Replace fabs and copysign libcalls with intrinsics Preserves flags and metadata like the other cases.	2023-08-14 18:28:21 -04:00
Matt Arsenault	a70006c4c5	AMDGPU: Replace some libcalls with intrinsics OpenCL loses fast math information by going through libcall wrappers around intrinsics. Do this to preserve call site flags which are lost when inlining. It's not safe in general to propagate flags during inline, so avoid dealing with this by just special casing some of the useful calls.	2023-08-14 18:20:47 -04:00
Matt Arsenault	f44beecb78	AMDGPU: Try to use private version of sincos if available The comment was out of date, the device libs build does provide all the pointer overloads. An extremely pedantic interpretation of the spec would suggest only the flat version exists, but the overloads do exist in the implementation. https://reviews.llvm.org/D156720	2023-08-14 11:40:04 -04:00
Matt Arsenault	42c6e4209c	AMDGPU: Handle multiple uses when matching sincos Match how the generic implementation handles this. We now will leave behind the dead other user for later passes to deal with. https://reviews.llvm.org/D156707	2023-08-14 11:28:41 -04:00
Matt Arsenault	6dbd458128	AMDGPU: Remove pointless libcall optimization of fma/mad After the library is linked and trivially inlined, the generic fma and fmuladd intrinsics already handle these cases, and with precise flag handling. This was requiring all fast math flags when we really just need nsz for the fma(a, b, 0) case. https://reviews.llvm.org/D156677	2023-08-09 19:37:52 -04:00
Matt Arsenault	6448d5ba58	AMDGPU: Remove pointless libcall recognition of native_{divide|recip} This was trying to constant fold these calls, and also turn some of them into a regular fmul/fdiv. There's no point to doing that, the underlying library implementation should be using those in the first place. Even when the library does use the rcp intrinsics, the backend handles constant folding of those. This was also only performing the folds under overly strict fast-everything-is-required conditions. The one possible plus this gained over linking in the library is if you were using all fast math flags, it would propagate them to the new instructions. We could address this in the library by adding more fast math flags to the native implementations. The constant fold case also had no test coverage. https://reviews.llvm.org/D156676	2023-08-09 18:48:46 -04:00
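
Note on the pow-related commits above: a45b787c91 turns pow into powr when the base is known to be non-negative, and def228553c uses pown when the exponent is known to be integral. The following is only a minimal standalone sketch of that decision, written in plain C++ rather than against LLVM's API. The names `PowKind`, `isIntegralExponent`, `classifyPow` and `powrReference` are hypothetical, the order of the checks is an assumption, and the real pass reasons about IR values and fast math flags rather than concrete doubles. `powrReference` follows only the commit message's wording ("pow with the assumption that x >= 0, otherwise nan") and does not model the other edge cases of the OpenCL function.

```cpp
#include <cmath>
#include <limits>

// Hypothetical labels for the call a generic pow(x, y) could be narrowed to.
enum class PowKind { Pow, Powr, Pown };

// True when Y is finite, has no fractional part, and fits in an int,
// so it could be passed as the integer exponent of a pown-style call.
bool isIntegralExponent(double Y) {
  if (!std::isfinite(Y))
    return false;
  if (std::trunc(Y) != Y)
    return false;
  return std::fabs(Y) <= static_cast<double>(std::numeric_limits<int>::max());
}

// Assumed decision order: prefer powr when the base is known non-negative,
// then pown when the exponent is integral, otherwise keep pow.
PowKind classifyPow(bool BaseKnownNonNegative, double Y) {
  if (BaseKnownNonNegative)
    return PowKind::Powr;
  if (isIntegralExponent(Y))
    return PowKind::Pown;
  return PowKind::Pow;
}

// Reference behaviour as described in a45b787c91: pow for x >= 0, NaN otherwise.
double powrReference(double X, double Y) {
  if (X < 0.0)
    return std::numeric_limits<double>::quiet_NaN();
  return std::pow(X, Y);
}
```

With these definitions, `classifyPow(false, 3.0)` selects `PowKind::Pown`, while `classifyPow(false, 0.5)` stays at `PowKind::Pow` because nothing is known about the base and the exponent is not integral.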

116 Commits