llvm-project

Author	SHA1	Message	Date
pvanhout	b3b3cb2d2f	[AMDGPU] Less aggressively break large PHIs In some cases, breaking large PHIs can very negatively affect performance (3x more instructions observed in a particular test case). This patch adds some basic profitability heuristics to help with some of these issues without affecting the "good" cases. e.g. avoid breaking PHIs if it causes back-and-forth between vector/scalar form for no good reason. Fixes SWDEV-392803 Fixes SWDEV-393781 Fixes SWDEV-394228 Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D147786	2023-04-14 15:41:26 +02:00
pvanhout	d892521076	[AMDGPU] Break-up large PHIs for DAGISel DAGISel uses CopyToReg/CopyFromReg to lower PHI nodes. With large PHIs, this can result in poor codegen. This is because it introduces a need to have a build_vector before copying the PHI value, and that build_vector may have many undef elements. This can cause very high register pressure and abnormal stack usage in some cases. This scalarization/phi "break-up" can be easily tuned/disabled through CL options in case it's not beneficial for some users. It's also only enabled for DAGIsel and GlobalISel handles PHIs much better (as it works on the whole function). This can both scalarize (break a vector into its elements) and simplify (break a vector into smaller, more manageable subvectors) PHIs. Fixes SWDEV-321581 Reviewed By: kzhuravl Differential Revision: https://reviews.llvm.org/D143731	2023-03-28 09:38:47 +02:00
pvanhout	dbebebf6f6	[AMDGPU] Use UniformityAnalysis in CodeGenPrepare A little extra change was needed in UA because it didn't consider InvokeInst and it made call-constexpr.ll assert. Reviewed By: sameerds, arsenm Differential Revision: https://reviews.llvm.org/D145358	2023-03-06 13:26:51 +01:00
Jay Foad	dcb834843e	[AMDGPU] Split SIModeRegisterDefaults out of AMDGPUBaseInfo. NFC. This is only used by CodeGen. Moving it out of AMDGPUBaseInfo simplifies future changes to make some of it depend on the subtarget. Differential Revision: https://reviews.llvm.org/D144650	2023-02-23 16:38:15 +00:00
Kazu Hirata	64dad4ba9a	Use llvm::bit_cast (NFC)	2023-02-14 01:22:12 -08:00
Jay Foad	6443c0ee02	[AMDGPU] Stop using make_pair and make_tuple. NFC. C++17 allows us to call constructors pair and tuple instead of helper functions make_pair and make_tuple. Differential Revision: https://reviews.llvm.org/D139828	2022-12-14 13:22:26 +00:00
Matt Arsenault	3830e4e58c	AMDGPU: Create poison values instead of undef These placeholders don't care about the finer points on the difference between the two.	2022-11-16 14:47:24 -08:00
Matt Arsenault	838fd611b7	AMDGPU: Fix assertion on <1 x i16> vectors Fixes issue 58331.	2022-10-12 17:25:24 -07:00
Nikita Popov	8e70258b18	[AMDGPUCodeGenPrepare] Check result of ConstantFoldBinaryOpOperands() This function will become fallible once we don't support constant expressions for all binops, so make sure to check the result.	2022-07-04 14:20:23 +02:00
Sebastian Neubauer	6527b2a4d5	[AMDGPU][NFC] Fix typos Fix some typos in the amdgpu backend. Differential Revision: https://reviews.llvm.org/D119235	2022-02-18 15:05:21 +01:00
Craig Topper	cbcbbd6ac8	[ValueTracking][SelectionDAG] Rename ComputeMinSignedBits->ComputeMaxSignificantBits. NFC This function returns an upper bound on the number of bits needed to represent the signed value. Use "Max" to match similar functions in KnownBits like countMaxActiveBits. Rename APInt::getMinSignedBits->getSignificantBits. Keeping the old name around to keep this patch size down. Will do a bulk rename as follow up. Rename KnownBits::countMaxSignedBits->countMaxSignificantBits. Reviewed By: lebedev.ri, RKSimon, spatel Differential Revision: https://reviews.llvm.org/D116522	2022-01-03 11:33:30 -08:00
Craig Topper	361216f3c4	[AMDGPU] Use ComputeMinSignedBits and KnownBits::countMaxActiveBits to simplify some code. NFC Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D116516	2022-01-03 10:09:51 -08:00
Jay Foad	21a1d4cf71	[AMDGPU] Change numBitsSigned for simplicity and document it. NFC. Change numBitsSigned to return the minimum size of a signed integer that can hold the value. This is different by one from the previous result but is more consistent with numBitsUnsigned. Update all callers. All callers are now more consistent between the signed and unsigned cases, and some callers get simpler, especially the ones that deal with quantities like numBitsSigned(LHS) + numBitsSigned(RHS). Differential Revision: https://reviews.llvm.org/D112813	2021-10-29 14:22:06 +01:00
Abinav Puthan Purayil	781dd39b7b	[AMDGPU] Enable 48-bit mul in AMDGPUCodeGenPrepare. We were bailing out of creating 24-bit muls for results wider than 32 bits in AMDGPUCodeGenPrepare. With the 24-bit mulhi intrinsic, this change teaches AMDGPUCodeGenPrepare to generate the 48-bit mul correctly. Differential Revision: https://reviews.llvm.org/D112395	2021-10-26 18:53:07 +05:30
Abinav Puthan Purayil	de3038400b	[AMDGPU] Avoid redundant calls to numBits in AMDGPUCodeGenPrepare::replaceMulWithMul24(). The isU24() and isI24() calls numBits to make its decision. This change replaces them with the internal numBits call so that we can use its result for the > 32 bit width cases. Differential Revision: https://reviews.llvm.org/D111864	2021-10-15 19:49:44 +05:30
Abinav Puthan Purayil	0379263f23	[AMDGPU] Fix width check for signed mul24 generation. This changes fixes a case in which the highest set bit of the original result is at bit 31 and sign-extending the mul24 for it would make the result negative. Differential Revision: https://reviews.llvm.org/D111823	2021-10-15 18:53:41 +05:30
Abinav Puthan Purayil	b3c9d84e5a	[AMDGPU] Fix 24-bit mul intrinsic generation for > 32-bit result. The 24-bit mul intrinsics yields the low-order 32 bits. We should only do the transformation if the operands are known to be not wider than 24 bits and the result is known to be not wider than 32 bits. Differential Revision: https://reviews.llvm.org/D111523	2021-10-14 09:00:19 +05:30
Jacob Lambert	dc6e8dfdfe	[AMDGPU][NFC] Correct typos in lib/Target/AMDGPU/AMDGPU*.cpp files. Test commit for new contributor.	2021-09-20 14:48:50 -07:00
Jay Foad	477b9bc9f7	[AMDGPU] Minor cleanup after D109483. NFC.	2021-09-13 10:27:15 +01:00
Anshil Gandhi	2e5dc4a1ef	[AMDGPU] [CodeGen] Fold negate llvm.amdgcn.class into test mask Implemented the transformation of xor (llvm.amdgcn.class x, mask), -1 into llvm.amdgcn.class(x, ~mask). Added LIT tests as well. Differential Revision: https://reviews.llvm.org/D104049	2021-06-18 13:04:12 -06:00
Nikita Popov	9914200393	[CodeGen] Add missing includes (NFC) These currently rely on the IRBuilder.h include in TargetLowering.h. Make them explicit.	2021-06-06 15:48:27 +02:00
Serge Guelton	d6de1e1a71	Normalize interaction with boolean attributes Such attributes can either be unset, or set to "true" or "false" (as string). throughout the codebase, this led to inelegant checks ranging from if (Fn->getFnAttribute("no-jump-tables").getValueAsString() == "true") to if (Fn->hasAttribute("no-jump-tables") && Fn->getFnAttribute("no-jump-tables").getValueAsString() == "true") Introduce a getValueAsBool that normalize the check, with the following behavior: no attributes or attribute set to "false" => return false attribute set to "true" => return true Differential Revision: https://reviews.llvm.org/D99299	2021-04-17 08:17:33 +02:00
Matt Arsenault	2a0db8d70e	AMDGPU: Use more accurate fast f64 fdiv A raw v_rcp_f64 isn't accurate enough, so start applying correction.	2021-01-21 10:51:36 -05:00
dfukalov	560d7e0411	[NFC][AMDGPU] Split AMDGPUSubtarget.h to R600 and GCN subtargets ... to reduce headers dependency. Reviewed By: rampitec, arsenm Differential Revision: https://reviews.llvm.org/D95036	2021-01-20 22:22:45 +03:00
dfukalov	6a87e9b08b	[NFC][AMDGPU] Reduce include files dependency. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D93813	2021-01-07 22:22:05 +03:00
Simon Pilgrim	1673a08044	SelectionDAG.h - remove unnecessary FunctionLoweringInfo.h include. NFCI. Use forward declarations and move the include down to dependent files that actually use it. This also exposes a number of implicit dependencies on KnownBits.h	2020-09-03 18:33:25 +01:00
Matt Arsenault	75e6f0b3d4	AMDGPU: Add flag to disable promotion of uniform i16 ops This interferes with GlobalISel's much better handling of the situation. This should really be disable for GlobalISel. However, the fallback only re-runs the selection passes, and doesn't go back and rerun any codegen IR passes. I haven't come up with a good solution to this problem.	2020-08-24 14:39:27 -04:00
Jay Foad	f4bd01c191	[AMDGPU] Fix and simplify AMDGPUCodeGenPrepare::expandDivRem32 Fix the division/remainder algorithm by adding a second quotient refinement step, which is required in some cases like 0xFFFFFFFFu / 0x11111111u (https://bugs.llvm.org/show_bug.cgi?id=46212). Also document, rewrite and simplify it by ensuring that we always have a lower bound on inv(y), which simplifies the UNR step and the quotient refinement steps. Differential Revision: https://reviews.llvm.org/D83381	2020-07-08 19:14:48 +01:00
Guillaume Chatelet	52911428ef	[Alignment][NFC] Migrate AMDGPU backend to Align This patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Differential Revision: https://reviews.llvm.org/D82743	2020-06-29 11:56:06 +00:00
Stanislav Mekhanoshin	9ee272f13d	[AMDGPU] Add gfx1030 target Differential Revision: https://reviews.llvm.org/D81886	2020-06-15 16:18:05 -07:00
Christopher Tetreault	3254a001fc	[SVE] Remove usages of VectorType::getNumElements() from AMDGPU Reviewers: efriedma, arsenm, david-arm, fpetrogalli Reviewed By: efriedma Subscribers: dmgreen, arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, tschuett, hiraditya, rkruppe, psnobl, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D79807	2020-05-13 15:57:55 -07:00
Nikita Popov	5fa87ec004	[AMDGPU] Try to determine sign bit during div/rem expansion This is preparation for D79294, which removes an expensive InstSimplify optimization, on the assumption that it will be picked up by InstCombine instead. Of course, this does not hold up if a backend performs non-trivial IR expansions without running a canonicalization pipeline afterwards, which turned up as an issue in the context of AMDGPU div/rem expansion. This patch mitigates the issue by explicitly performing a known bits calculation where it matters. No test changes, as those would only be visible after the other patch lands. Differential Revision: https://reviews.llvm.org/D79596	2020-05-08 10:11:26 +02:00
Florian Hahn	a7aaadc135	[TTI] Clean up includes (NFC). Remove some unnecessary includes, replace some with forward declarations. This also exposed a few places that were missing some includes.	2020-04-19 20:11:59 +01:00
Stanislav Mekhanoshin	44920e8566	[AMDGPU] Disable sub-dword scralar loads IR widening These will be widened in the DAG. In the meanwhile early widening prevents otherwise possible vectorization of such loads. Differential Revision: https://reviews.llvm.org/D77835	2020-04-10 08:20:49 -07:00
Matt Arsenault	5660bb6bc9	AMDGPU: Remove denormal subtarget features Switch to using the denormal-fp-math/denormal-fp-math-f32 attributes.	2020-04-02 17:17:12 -04:00
Nikita Popov	98ed613ccc	[IRBuilder] Avoid passing IRBuilder by value; NFC I've fixed most of these before, but missed some occurrences in targets I don't usually build.	2020-02-17 18:14:47 +01:00
Matt Arsenault	65dbdc329f	AMDGPU: Don't preserve analyses with div64 IR expansion The dominator tree needs to be updated, but that isn't handled now.	2020-02-14 20:06:02 -05:00
Matt Arsenault	9ec668606b	AMDGPU: Add option to disable CGP division expansion The division expansions in AMDGPUCodeGenPrepare can't be relied on for correctness, since they punt to later optimization and possibly legalization in some cases. We still need a way to be able to write tests for the legalizer versions of the expansion. This is mostly for GlobalISel, since the expected optimzations is expecting aren't implemented. The interaction with the flag to expand 64-bit division in the IR is pretty confusing, but these flags have different purposes.	2020-02-14 11:37:07 -08:00
Matt Arsenault	34d9a16e54	AMDGPU: Add option to expand 64-bit integer division in IR I didn't realize we were already expanding 24/32-bit division here already. Use the available IntegerDivision utilities. This uses loops, so produces significantly smaller code than the inline DAG expansion. This now requires width reductions of 64-bit divisions before introducing the expanded loops. This helps work around missing legalization in GlobalISel for division, which are the only remaining core instructions that didn't work at all. I think this is plausibly a better implementation than exists in the DAG, although turning it on by default misses out on the constant value optimizations and also needs benchmarking.	2020-02-14 11:16:08 -08:00
Matt Arsenault	6d4ebada79	AMDGPU: Use conditions directly in division expansion This was creating a select on true/false values, and then comparing that later. This produced more work for later combines, which can be avoided by just using the boolean values. This was copied from the original DAG expansion, which also has the same problem. This doesn't have a observable change using SelectionDAG, but since GlobalISel is missing these optimizations, the final code was noticeably longer.	2020-02-11 23:11:30 -05:00
Matt Arsenault	b30e122333	AMDGPU: Don't expand more special div cases in IR These have nicer expansions implemented in the DAG. Ideally we would either directly implement all of these special expansions, or stop expanding division in the IR.	2020-02-11 19:01:06 -05:00
Matt Arsenault	92c62582fc	AMDGPU: Directly use rcp intrinsic in idiv expansions Since natural fdiv lowering is now more conservative even with denormals disabled, we get a slower expansion from just a plain 1.0/fdiv. Directly emit the rcp intrinsic when using it to implement integer division to avoid a pointlessly complex sequence.	2020-02-11 18:11:39 -05:00
Matt Arsenault	b87e3e2d0d	AMDGPU: Don't create potentially dead rcp declarations This will introduce unused declarations if this doesn't reach any of the paths that will really use it.	2020-02-11 18:11:39 -05:00
Changpeng Fang	884acbb9e1	AMDGPU: Enhancement on FDIV lowering in AMDGPUCodeGenPrepare Summary: The accuracy limit to use rcp is adjusted to 1.0 ulp from 2.5 ulp. Also, afn instead of arcp is used to allow inaccurate rcp to be used. Reviewers: arsenm Differential Revision: https://reviews.llvm.org/D73588	2020-02-07 11:46:23 -08:00
Changpeng Fang	2531535984	AMDGPU: Implement FDIV optimizations in AMDGPUCodeGenPrepare Summary: RCP has the accuracy limit. If FDIV fpmath require high accuracy rcp may not meet the requirement. However, in DAG lowering, fpmath information gets lost, and thus we may generate either inaccurate rcp related computation or slow code for fdiv. In patch implements fdiv optimizations in the AMDGPUCodeGenPrepare, which could exactly know !fpmath. FastUnsafeRcpLegal: We determine whether it is legal to use rcp based on unsafe-fp-math, fast math flags, denormals and fpmath accuracy request. RCP Optimizations: 1/x -> rcp(x) when fast unsafe rcp is legal or fpmath >= 2.5ULP with denormals flushed. a/b -> a*rcp(b) when fast unsafe rcp is legal. Use fdiv.fast: a/b -> fdiv.fast(a, b) when RCP optimization is not performed and fpmath >= 2.5ULP with denormals flushed. 1/x -> fdiv.fast(1,x) when RCP optimization is not performed and fpmath >= 2.5ULP with denormals. Reviewers: arsenm Differential Revision: https://reviews.llvm.org/D71293	2020-01-23 16:57:43 -08:00
Matt Arsenault	dfec702290	AMDGPU: Check for other uses when looking through casted select Fixes mesa regression on ext_transform_feedback-max-varyings	2020-01-23 11:31:24 -05:00
Matt Arsenault	e93e1b621c	AMDGPU: Fix typo	2020-01-22 10:17:46 -05:00
Matt Arsenault	2fe500ab5b	AMDGPU: Look through casted selects to constant fold bin ops The promotion of the uniform select to i32 interfered with this fold.	2020-01-22 10:16:39 -05:00
Matt Arsenault	bcd91778fe	AMDGPU: Do binop of select of constant fold in AMDGPUCodeGenPrepare DAGCombiner does this, but divisions expanded here miss this optimization. Since 67aa18f165640374cf0e0a6226dc793bbda6e74f, divisions have been expanded here and missed out on this optimization. Avoids test regressions in a future patch.	2020-01-22 10:16:39 -05:00
Fangrui Song	5721483b64	[AMDGPU] Fix -Wunused-variable after e5823bf806ca9fa6f87583065b3898a2edabce57	2020-01-20 22:41:13 -08:00

1 2

92 Commits