llvm-project

Author	SHA1	Message	Date
Matt Arsenault	daecc303bb	AMDGPU: Replace sqrt OpenCL libcalls with llvm.sqrt (#74197) The library implementation is just a wrapper around a call to the intrinsic, but loses metadata. Swap out the call site to the intrinsic so that the lowering can see the !fpmath metadata and fast math flags. Since d56e0d07cc5ee8e334fd1ad403eef0b1a771384f, clang started placing !fpmath on OpenCL library sqrt calls. Also don't bother emitting native_sqrt anymore, it's just another wrapper around llvm.sqrt.	2024-01-09 15:13:58 +07:00
Jakub Chlanda	a34db9bdef	[AMDGPU][NFC] Simplify needcopysign logic (#75176) This was caught by coverity, reported as: `dead_error_condition`. Since the conditional revolves around `CF`, it is guaranteed to be null in the else clause, hence making the second part of the statement redundant.	2023-12-18 12:07:22 +01:00
Youngsuk Kim	67aec2f58b	[llvm] Remove no-op ptr-to-ptr casts (NFC) Remove calls to CreatePointerCast which are just doing no-op ptr-to-ptr bitcasts. Opaque ptr cleanup effort (NFC).	2023-12-15 11:04:48 -06:00
Matt Arsenault	ee795fd1cf	AMDGPU: Handle rounding intrinsic exponents in isKnownIntegral https://reviews.llvm.org/D158999	2023-09-01 08:22:16 -04:00
Matt Arsenault	def228553c	AMDGPU: Use pown instead of pow if known integral https://reviews.llvm.org/D158998	2023-09-01 08:22:16 -04:00
Matt Arsenault	deefda7074	AMDGPU: Use exp2 and log2 intrinsics directly for f16/f32 These codegen correctly but f64 doesn't. This prevents losing fast math flags on the way to the underlying intrinsic. https://reviews.llvm.org/D158997	2023-09-01 08:22:16 -04:00
Matt Arsenault	dac8f974b5	AMDGPU: Handle sitofp and uitofp exponents in fast pow expansion https://reviews.llvm.org/D158996	2023-09-01 08:22:16 -04:00
Matt Arsenault	699685b718	AMDGPU: Enable assumptions in AMDGPULibCalls https://reviews.llvm.org/D159006	2023-09-01 08:22:16 -04:00
Matt Arsenault	a45b787c91	AMDGPU: Turn pow libcalls into powr powr is just pow with the assumption that x >= 0, otherwise nan. This fires at least 6 times in luxmark https://reviews.llvm.org/D158908	2023-09-01 08:22:16 -04:00
Matt Arsenault	f5d8a9b1bb	AMDGPU: Simplify handling of constant vectors in libcalls Also fixes not handling the partially undef case. https://reviews.llvm.org/D158905	2023-09-01 08:22:16 -04:00
Matt Arsenault	afb24cbb69	AMDGPU: Don't require all flags to expand fast powr This was requiring all fast math flags, which is practically useless. This wouldn't fire using all the standard OpenCL fast math flags. This only needs afn nnan and ninf. https://reviews.llvm.org/D158904	2023-09-01 08:22:16 -04:00
Matt Arsenault	bfe6bc05cd	AMDGPU: Cleanup check for integral exponents in pow folds Also improves undef handling https://reviews.llvm.org/D159006	2023-08-30 10:37:24 -04:00
Matt Arsenault	80e5b46e45	AMDGPU: Fix assertion on half typed pow with constant exponents https://reviews.llvm.org/D158993	2023-08-28 13:54:49 -04:00
Matt Arsenault	35c2a7542c	AMDGPU: Fix asserting on fast f16 pown https://reviews.llvm.org/D158903	2023-08-25 19:56:20 -04:00
Matt Arsenault	b24dab0ec6	AMDGPU: Trim dead includes	2023-08-25 19:55:53 -04:00
Matt Arsenault	66ee794064	AMDGPU: Fix verifier error on splatted opencl fmin/fmax and ldexp calls Apparently the spec has overloads for fmin/fmax and ldexp with one of the operands as scalar. We need to broadcast the scalars to the vector type. https://reviews.llvm.org/D158077	2023-08-16 09:42:26 -04:00
Matt Arsenault	d251761660	AMDGPU: Replace log libcalls with log intrinsics	2023-08-15 10:48:46 -04:00
Matt Arsenault	d45022b094	AMDGPU: Remove special case constant folding of divide We should probably just swap this out for the fdiv, but that's what the implementation is anyway.	2023-08-14 18:36:01 -04:00
Matt Arsenault	483cc21866	AMDGPU: Remove special case folding of sqrt	2023-08-14 18:36:01 -04:00
Matt Arsenault	416f6af976	AMDGPU: Remove special case folding of fma/mad These just get replaced with an intrinsic now. This was also introducing host dependence on the result since it relied on the compiler choice to contract or not.	2023-08-14 18:36:01 -04:00
Matt Arsenault	0eabe65bfb	AMDGPU: Replace ldexp libcalls with intrinsic	2023-08-14 18:36:01 -04:00
Matt Arsenault	f337a77c99	AMDGPU: Replace rounding libcalls with intrinsics	2023-08-14 18:36:01 -04:00
Matt Arsenault	c7876c55ac	AMDGPU: Replace fabs and copysign libcalls with intrinsics Preserves flags and metadata like the other cases.	2023-08-14 18:28:21 -04:00
Matt Arsenault	a70006c4c5	AMDGPU: Replace some libcalls with intrinsics OpenCL loses fast math information by going through libcall wrappers around intrinsics. Do this to preserve call site flags which are lost when inlining. It's not safe in general to propagate flags during inline, so avoid dealing with this by just special casing some of the useful calls.	2023-08-14 18:20:47 -04:00
Matt Arsenault	f44beecb78	AMDGPU: Try to use private version of sincos if available The comment was out of date, the device libs build does provide all the pointer overloads. An extremely pedantic interpretation of the spec would suggest only the flat version exists, but the overloads do exist in the implementation. https://reviews.llvm.org/D156720	2023-08-14 11:40:04 -04:00
Matt Arsenault	42c6e4209c	AMDGPU: Handle multiple uses when matching sincos Match how the generic implementation handles this. We now will leave behind the dead other user for later passes to deal with. https://reviews.llvm.org/D156707	2023-08-14 11:28:41 -04:00
Matt Arsenault	6dbd458128	AMDGPU: Remove pointless libcall optimization of fma/mad After the library is linked and trivially inlined, the generic fma and fmuladd intrinsics already handle these cases, and with precise flag handling. This was requiring all fast math flags when we really just need nsz for the fma(a, b, 0) case. https://reviews.llvm.org/D156677	2023-08-09 19:37:52 -04:00
Matt Arsenault	6448d5ba58	AMDGPU: Remove pointless libcall recognition of native_{divide|recip} This was trying to constant fold these calls, and also turn some of them into a regular fmul/fdiv. There's no point to doing that, the underlying library implementation should be using those in the first place. Even when the library does use the rcp intrinsics, the backend handles constant folding of those. This was also only performing the folds under overly strict fast-everything-is-required conditions. The one possible plus this gained over linking in the library is if you were using all fast math flags, it would propagate them to the new instructions. We could address this in the library by adding more fast math flags to the native implementations. The constant fold case also had no test coverage. https://reviews.llvm.org/D156676	2023-08-09 18:48:46 -04:00
Matt Arsenault	b0f4b6587a	AMDGPU: Delete probably wrong constant folding of expm1 It's not really correct to implement this as exp(x) - 1, it was maybe OK for the restricted float-as-double case handled here. There's not a strong reason to special case it with the host function, as the implementation naturally constant folds anyway. InstSimplify can fully handle everything in it, so just running the inliner alone with a constant argument produces the fully constant folded result. This also had no tests. https://reviews.llvm.org/D156892	2023-08-04 21:01:44 -04:00
Matt Arsenault	54bda79335	AMDGPU: Simplify and improve sincos matching The first trivial example I tried failed to merge due to the user scan logic. Remove the complicated scan of users handling with distance thresholds, with a same block restriction. The actual expansion of sincos is basically the same size as sin or cos individually. Copy the technique the generic optimization uses, which is to just use the input instruction as the insert point or just insert at the start of the entry block. https://reviews.llvm.org/D156706	2023-08-02 17:48:35 -04:00
Matt Arsenault	9a806551a0	AMDGPU: Delete old PM support for libcall passes This has no reason to run in the codegen pipeline.	2023-08-01 18:22:02 -04:00
Matt Arsenault	5dfdd3494b	AMDGPU: Don't try to fold wavefrontsize intrinsic in libcall simplify It's not a libcall so doesn't really belong here to begin with. Relying on checking the target name and explicit features isn't particularly sound either. The library doesn't use the intrinsic anymore, so it doesn't matter anyway.	2023-08-01 18:20:50 -04:00
Matt Arsenault	3b2f3238a4	AMDGPU: Don't try memory optimizations in libcall optimizer This was trying to find a loaded value for some reason when looking for sincos arguments. This is untested and shouldn't be necessary. https://reviews.llvm.org/D156746	2023-08-01 18:10:22 -04:00
Matt Arsenault	db4d6ef9ef	AMDGPU: Directly emit fabs intrinsic instead of new libcall	2023-07-31 19:19:56 -04:00
Matt Arsenault	f63cdfc4cf	AMDGPU: Move check of compatible libcall https://reviews.llvm.org/D156681	2023-07-31 16:47:07 -04:00
Matt Arsenault	c2c22c6c95	AMDGPU: Don't store current instruction in AMDGPULibCalls member This was adding confusing global state which was shadowed most of the time. https://reviews.llvm.org/D156680	2023-07-31 11:44:21 -04:00
Matt Arsenault	8f38138090	AMDGPU: Refactor libcall simplify to help with future refined fast math flag usage https://reviews.llvm.org/D156678	2023-07-31 11:23:12 -04:00
Matt Arsenault	94d55450d2	AMDGPU: Don't parse name of sin/cos twice in libcall simplify	2023-07-31 11:23:12 -04:00
Matt Arsenault	8a677a7ff0	AMDGPU: Partially respect nobuiltin in libcall simplifier There are more contexts where it's not handled correctly but this is the simplest one. https://reviews.llvm.org/D156682	2023-07-31 10:56:46 -04:00
Matt Arsenault	2dc1a27449	AMDGPU: Some AMDGPULibCalls cleanups dyn_cast instead of isa+cast, and initialize on declaration.	2023-07-31 10:53:09 -04:00
Matt Arsenault	ab6cd2d498	AMDGPU: Simplify early exit handling for libcall simplify Early exit on intrinsics and don't duplicate indirect call checks. Also let the IRBuilder constructor figure out the insert point rather than doing it manually. Also avoid debug print about trying to simplify calls in more unhandled scenarios.	2023-07-31 08:18:12 -04:00
Matt Arsenault	360a5d5612	AMDGPU: Remove some typed pointer handling	2023-07-31 08:05:12 -04:00
Nico Weber	90f7f24b20	try to fix build yet more after 16544cbe64b8	2022-09-28 15:40:52 -04:00
Kazu Hirata	ae998555ba	[AMDGPU] Remove a redundant variable (NFC) ArrayRef has operator[], so we don't need to access the contents via data().	2022-07-23 12:29:05 -07:00
Kazu Hirata	6cbfffb3a3	[AMDGPU] Declare TableRef in terms of ArrayRef (NFC)	2022-07-16 10:56:20 -07:00
Simon Pilgrim	8de7297374	[AMDGPU] Pull out repeated getVecSize() calls. NFC. This is guaranteed to be evaluated so we can avoid repeated calls. Helps the static analyzer as it couldn't recognise that each getVecSize() would return the same value.	2022-02-10 16:31:36 +00:00
serge-sans-paille	e188aae406	Cleanup header dependencies in LLVMCore Based on the output of include-what-you-use. This is a big chunk of changes. It is very likely to break downstream code unless they took a lot of care in avoiding hidden header dependencies, something the LLVM codebase doesn't do that well :-/ I've tried to summarize the biggest change below: - llvm/include/llvm-c/Core.h: no longer includes llvm-c/ErrorHandling.h - llvm/IR/DIBuilder.h no longer includes llvm/IR/DebugInfo.h - llvm/IR/IRBuilder.h no longer includes llvm/IR/IntrinsicInst.h - llvm/IR/LLVMRemarkStreamer.h no longer includes llvm/Support/ToolOutputFile.h - llvm/IR/LegacyPassManager.h no longer includes llvm/Pass.h - llvm/IR/Type.h no longer includes llvm/ADT/SmallPtrSet.h - llvm/IR/PassManager.h no longer includes llvm/Pass.h nor llvm/Support/Debug.h And the usual count of preprocessed lines: $ clang++ -E -Iinclude -I../llvm/include ../llvm/lib/IR/*.cpp -std=c++14 -fno-rtti -fno-exceptions | wc -l before: 6400831 after: 6189948 200k lines less to process is not that bad ;-) Discourse thread on the topic: https://llvm.discourse.group/t/include-what-you-use-include-cleanup Differential Revision: https://reviews.llvm.org/D118652	2022-02-02 06:54:20 +01:00
Kazu Hirata	bc360fd83a	[AMDGPU] Remove unused declarations fold_exp* and fold_log* (NFC)	2021-12-31 16:50:18 -08:00
Kazu Hirata	5c4b9ea4a7	[AMDGPU] Remove replaceWithNative (NFC) The function was introduced without any use on Aug 11, 2017 in commit 7f37794ebd2c6c36224597800e4d1e5a99ad80e9.	2021-12-31 16:43:06 -08:00
Kazu Hirata	5a667c0e74	[llvm] Use nullptr instead of 0 (NFC) Identified with modernize-use-nullptr.	2021-12-28 08:52:25 -08:00

94 Commits