llvm-project

Author	SHA1	Message	Date
Leon Clark	2759cfa0c3	[AMDGPU] Remove unnecessary add instructions in ctlz.i8 (#77615 ) Add custom lowering for ctlz.i8 to avoid multiple add/sub operations. --------- Co-authored-by: Leon Clark <leoclark@amd.com> Co-authored-by: Matt Arsenault <Matthew.Arsenault@amd.com>	2024-01-19 10:16:46 +00:00
Mariusz Sikora	3e6589f21c	[AMDGPU][GFX12] Add 16 bit atomic fadd instructions (#75917 ) - image_atomic_pk_add_f16 - image_atomic_pk_add_bf16 - ds_pk_add_bf16 - ds_pk_add_f16 - ds_pk_add_rtn_bf16 - ds_pk_add_rtn_f16 - flat_atomic_pk_add_f16 - flat_atomic_pk_add_bf16 - global_atomic_pk_add_f16 - global_atomic_pk_add_bf16 - buffer_atomic_pk_add_f16 - buffer_atomic_pk_add_bf16	2024-01-18 14:01:09 +01:00
Mariusz Sikora	c99da46fc1	[AMDGPU][GFX12] Add Atomic cond_sub_u32 (#76224 ) Co-authored-by: Vang Thao <Vang.Thao@amd.com>	2024-01-17 19:23:42 +01:00
Jay Foad	4a77414660	[AMDGPU] CodeGen for GFX12 8/16-bit SMEM loads (#77633 )	2024-01-17 10:28:03 +00:00
Matt Arsenault	bdbaf6e61b	AMDGPU: Make v8bf16/v16bf16 legal types (#76678 ) Depends #76217	2024-01-08 18:59:01 +07:00
Matt Arsenault	47685633a7	AMDGPU: Make v4bf16 a legal type (#76217 ) Gets a few code quality improvements. A few cases are worse from losing load narrowing. Depends #76213 #76214 #76215	2024-01-05 08:35:07 +07:00
Matt Arsenault	460ffcddd9	AMDGPU: Make bf16/v2bf16 legal types (#76215 ) Some intrinsics are using i16 vectors in place of bfloat vectors. Move towards making bf16 vectors legal so these can migrate. Leave the larger vectors for a later change. Depends #76213 #76214	2024-01-04 22:31:18 +07:00
Alex Bradbury	80aeb62211	[llvm][NFC] Use SDValue::getConstantOperandVal(i) where possible (#76708 ) This helper function shortens examples like `cast<ConstantSDNode>(Node->getOperand(1))->getZExtValue();` to `Node->getConstantOperandVal(1);`. Implemented with: `git grep -l "cast<ConstantSDNode>\(.->getOperand\(.\)\)->getZExtValue\(\)" \| xargs sed -E -i 's/cast<ConstantSDNode>\((.)->getOperand\((.)\)\)->getZExtValue\(\)/\1->getConstantOperandVal(\2)/` and `git grep -l "cast<ConstantSDNode>\(.\.getOperand\(.\)\)->getZExtValue\(\)" \| xargs sed -E -i 's/cast<ConstantSDNode>\((.)\.getOperand\((.)\)\)->getZExtValue\(\)/\1.getConstantOperandVal(\2)/'`. With a couple of simple manual fixes needed. Result then processed by `git clang-format`.	2024-01-02 13:14:28 +00:00
Acim Maravic	48f36c6e74	[LLVM] Make use of s_flbit_i32_b64 and s_ff1_i32_b64 (#75158 ) Update DAG ISel to support 64bit versions S_FF1_I32_B64 and S_FLBIT_I32_B64 --------- Co-authored-by: Acim Maravic <Acim.Maravic@amd.com>	2023-12-25 11:55:20 +01:00
Matt Arsenault	9e574a3936	DAG: Fix expansion of bf16 sourced extloads Also fix assorted vector extload failures for AMDGPU.	2023-12-20 19:24:27 +07:00
James Y Knight	137f785fa6	[AMDGPU] Set MaxAtomicSizeInBitsSupported. (#75185 ) This will result in larger atomic operations getting expanded to `__atomic_*` libcalls via AtomicExpandPass, which matches what Clang already does in the frontend. While AMDGPU currently disables the use of all libcalls, I've changed it to instead disable all of them _except_ the atomic ones. Those are already emitted by the Clang frontend, and enabling them in the backend allows the same behavior there.	2023-12-18 16:51:06 -05:00
Piotr Sobczak	6eec80133b	[AMDGPU] Min/max changes for GFX12 (#75214 ) Co-authored-by: Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>	2023-12-13 14:18:10 +01:00
Matt Arsenault	db8b85ac58	AMDGPU: Support llvm.exp10 (#65860 )	2023-12-02 21:56:35 +07:00
Sander de Smalen	81b7f115fb	[llvm][TypeSize] Fix addition/subtraction in TypeSize. (#72979 ) It seems TypeSize is currently broken in the sense that: TypeSize::Fixed(4) + TypeSize::Scalable(4) => TypeSize::Fixed(8) without failing its assert that explicitly tests for this case: assert(LHS.Scalable == RHS.Scalable && ...); The reason this fails is that `Scalable` is a static method of class TypeSize, and LHS and RHS are both objects of class TypeSize. So this is evaluating if the pointer to the function Scalable == the pointer to the function Scalable, which is always true because LHS and RHS have the same class. This patch fixes the issue by renaming `TypeSize::Scalable` -> `TypeSize::getScalable`, as well as `TypeSize::Fixed` to `TypeSize::getFixed`, so that it no longer clashes with the variable in FixedOrScalableQuantity. The new methods now also better match the coding standard, which specifies that: * Variable names should be nouns (as they represent state) * Function names should be verb phrases (as they represent actions)	2023-11-22 08:52:53 +00:00
Acim-Maravic	f3138524db	[AMDGPU] Generic lowering for rint and nearbyint (#69596 ) There are three different rounding intrinsics that are brought down to the same instruction. Co-authored-by: Acim Maravic <acim.maravic@amd.com>	2023-11-14 18:49:21 +01:00
Diana	7f5d59b38d	[AMDGPU] ISel for @llvm.amdgcn.cs.chain intrinsic (#68186 ) The @llvm.amdgcn.cs.chain intrinsic is essentially a call. The call parameters are bundled up into 2 intrinsic arguments, one for those that should go in the SGPRs (the 3rd intrinsic argument), and one for those that should go in the VGPRs (the 4th intrinsic argument). Both will often be some kind of aggregate. Both instruction selection frameworks have some internal representation for intrinsics (G_INTRINSIC[_WITH_SIDE_EFFECTS] for GlobalISel, ISD::INTRINSIC_[VOID\|WITH_CHAIN] for DAGISel), but we can't use those because aggregates are dissolved very early on during ISel and we'd lose the inreg information. Therefore, this patch shortcircuits both the IRTranslator and SelectionDAGBuilder to lower this intrinsic as a call from the very start. It tries to use the existing infrastructure as much as possible, by calling into the code for lowering tail calls. This has already gone through a few rounds of review in Phab: Differential Revision: https://reviews.llvm.org/D153761	2023-11-06 12:30:07 +01:00
Changpeng Fang	8ceb72ffe5	[AMDGPU] make v32i16/v32f16 legal (#70484 ) Some upcoming intrinsics will be using these new types	2023-10-27 15:28:31 -07:00
Pierre van Houtryve	40a426fac6	[AMDGPU] Constant fold FMAD_FTZ (#69443 ) Solves #68315	2023-10-19 16:05:51 +02:00
Jay Foad	21c2ba4bdb	[GlobalISel] Remove TargetLowering::isConstantUnsignedBitfieldExtractLegal Use LegalizerInfo::isLegalOrCustom instead. Differential Revision: https://reviews.llvm.org/D116807	2023-09-27 15:58:01 +01:00
Matt Arsenault	1328a8534b	AMDGPU: Fix handling of -0 in round lowering (#65761 )	2023-09-19 09:14:17 +03:00
Matt Arsenault	edecb60481	Reapply "AMDGPU: Drop and auto-upgrade llvm.amdgcn.ldexp to llvm.ldexp" This reverts commit d9333e360a7c52587ab6e4328e7493b357fb2cf3.	2023-09-13 08:38:48 +03:00
Matt Arsenault	c48248d7f9	AMDGPU: Teach valueIsKnownNeverF32Denorm about frexp https://reviews.llvm.org/D158130	2023-09-12 23:23:10 +03:00
Matt Arsenault	72a7024add	AMDGPU: Correctly lower llvm.sqrt.f32 Make codegen emit correctly rounded sqrt by default. Emit the fast but only kind of fast expansion in AMDGPUCodeGenPrepare based on !fpmath, like the fdiv case. Hack around visitation ordering problems from AMDGPUCodeGenPrepare using forward iteration instead of a well behaved combiner. https://reviews.llvm.org/D158129	2023-09-12 23:22:54 +03:00
Kazu Hirata	57390c914b	[AMDGPU] Use isNullConstant and isOneConstant (NFC)	2023-08-27 08:26:52 -07:00
Bjorn Pettersson	a23e01ada7	[AMDGPU] Fix -Wenum-compare warnings Avoiding warnings like this when building with GCC: warning: enumeral mismatch in conditional expression: 'llvm::AMDGPUISD::NodeType' vs 'llvm::ISD::NodeType' [-Wenum-compare]	2023-08-23 14:24:30 +02:00
Diana Picus	26dc284498	[AMDGPU] ISel for amdgpu_cs_chain[_preserve] functions Lower formal arguments and returns for functions with the `amdgpu_cs_chain` and `amdgpu_cs_chain_preserve` calling conventions: * Put `inreg` arguments into SGPRs, starting at s0, and other arguments into VGPRs, starting at v8. No arguments should end up on the stack, if we don't have enough registers we should error out. * Lower the return (which is always void) as an S_ENDPGM. * Set the ScratchRSrc register to s48:51, as described in the docs. * Set the SP to s32, matching amdgpu_gfx. This might be revisited in a future patch. Differential Revision: https://reviews.llvm.org/D153517	2023-08-21 11:16:17 +02:00
Matt Arsenault	81b278e613	AMDGPU: Fix fast f32 exp2 Mirror of the previous log changes, OpenCL conformance doesn't like interpreting afn as ignore denormal handling but was previously hidden by flag dropping.	2023-08-15 10:48:46 -04:00
Matt Arsenault	4b7b4b9458	AMDGPU: Fix fast f32 log/log10 OpenCL conformance didn't like interpreting afn as ignore the denormal handling. https://reviews.llvm.org/D157940	2023-08-15 10:48:46 -04:00
Matt Arsenault	e09b3593ba	AMDGPU: Fix fast math log2 f32 Apparently afn doesn't allow you to drop the denormal handling according to OpenCL conformance. This was hidden by losing the flags during the library linking process. Fast log is still broken and needs more work. https://reviews.llvm.org/D157936	2023-08-15 10:48:46 -04:00
Matt Arsenault	1faa4797ca	AMDGPU: Handle unsafe exp.f32 with denormal handling I somehow missed this path when adding the new expansions. Saves a lot of instructions for afn + IEEE. https://reviews.llvm.org/D157867	2023-08-14 18:36:01 -04:00
Matt Arsenault	9a53f5f5c4	AMDGPU: Handle llvm.stacksave and llvm.stackrestore Not sure if the only valid use is to have stackrestore directly consume stacksave outputs or not. Handled exactly like a regular stack pointer so all the edge cases theoretically should work. https://reviews.llvm.org/D156669	2023-08-11 10:25:01 -04:00
Matt Arsenault	055a7f2512	AMDGPU: Adjust outdated comment	2023-07-31 08:05:13 -04:00
Matt Arsenault	0295513238	AMDGPU: Filter out contract flags when lowering exp It is unsafe to contract the fsub into the fmul. It also increases code size by duplicating a constant.	2023-07-20 18:14:24 -04:00
Matt Arsenault	fbe4ff8149	AMDGPU: Partially fix not respecting dynamic denormal mode The most notable issue was producing v_mad_f32 in functions with the dynamic mode, since it just ignores the mode. fdiv lowering is still somewhat broken because it involves a mode switch and we need to query the original mode.	2023-07-11 15:14:52 -04:00
Amara Emerson	3a80bdb316	[GlobalISel] Remove an erroneous oneuse check in the G_ADD reassociation combine. This check was unnecessary/incorrect, it was already being done by the target hook default implementation, and the one in the matcher was checking for a completely different thing. This change: 1) Removes the check and updates affected tests which now do some more reassociations. 2) Modifies the AMDGPU hooks which were stubbed with "return true" to also do the oneuse check. Not sure why I didn't do this the first time.	2023-07-10 01:03:12 -07:00
Matt Arsenault	8ee1cc82c9	AMDGPU: Fold out sign bit ops on frexp_exp The sign bit has no impact on the exponent, so strip these away. Saves on the source modifier encoding cost. I left the GlobalISel handling until there's a resolution to issue #62628. We should do this in instcombine too, but legalization should be introducing more frexps than it currently is where this would occur.	2023-07-06 10:26:21 -04:00
Matt Arsenault	5491666248	AMDGPU: Correctly lower llvm.exp.f32 The library expansion has too many paths for all the permutations of DAZ, unsafe and the 3 exp functions. It's easier to expand it in the backend when we know all of these things. The library currently misses the no-infinity check on the overflow, which this handles optimizing out. Some of the <3 x half> fast tests regress due to vector widening dropping flags which will be fixed separately. Apparently there is no exp10 intrinsic, but there should be. Adds some deadish code in preparation for adding one while I'm following along with the current library expansion.	2023-07-05 17:23:49 -04:00
Matt Arsenault	ed556a1ad5	AMDGPU: Correctly lower llvm.exp2.f32 Previously this did a fast math expansion only.	2023-07-05 17:23:48 -04:00
Matt Arsenault	4e15f378ee	AMDGPU: Correctly lower llvm.log.f32 and llvm.log10.f32 Previously we expanded these in a fast-math way and the device libraries were relying on this behavior. The libraries have a pending change to switch to the new target intrinsic. Unlike the library version, this takes advantage of no-infinities on the result overflow check.	2023-07-05 15:30:35 -04:00
Matt Arsenault	89ccfa1b39	AMDGPU: Use correct lowering for llvm.log2.f32 We previously directly codegened to v_log_f32, which is broken for denormals. The lowering isn't complicated, you simply need to scale denormal inputs and adjust the result. Note log and log10 are still not accurate enough, and will be fixed separately.	2023-06-23 08:37:37 -04:00
Matt Arsenault	d9333e360a	Revert "AMDGPU: Drop and auto-upgrade llvm.amdgcn.ldexp to llvm.ldexp" This reverts commit 1159c670d40e3ef302264c681fe7e0268a550874. Accidentally pushed wrong patch	2023-06-16 18:13:07 -04:00
Matt Arsenault	1159c670d4	AMDGPU: Drop and auto-upgrade llvm.amdgcn.ldexp to llvm.ldexp	2023-06-16 18:06:27 -04:00
Matt Arsenault	28f3edd2be	AMDGPU: Add llvm.amdgcn.exp2 intrinsic Provide direct access to v_exp_f32 and v_exp_f16, so we can start correctly lowering the generic exp intrinsics. Unfortunately have to break from the usual naming convention of matching the instruction name and stripping the v_ prefix. exp is already taken by the export intrinsic. On the clang builtin side, we have a choice of maintaining the convention to the instruction name, or following the intrinsic name.	2023-06-15 07:00:07 -04:00
Matt Arsenault	d0923a7739	AMDGPU: Correct constants used in fast math log expansion The division between float constants was done with less precision. Performing the divide in double and truncating to float provides the same value as used in the library fast math expansion.	2023-06-12 21:11:41 -04:00
Matt Arsenault	eccc89b26c	AMDGPU: Add llvm.amdgcn.log intrinsic This will map directly to the hardware instruction which does not handle denormals for f32. This will allow moving the generic intrinsic to be lowered correctly. Also handles selecting the f16 version, but there's no reason to use it over the generic intrinsic.	2023-06-12 21:10:30 -04:00
Matt Arsenault	abff7668ab	AMDGPU: Implement known bits functions for min3/max3/med3	2023-06-10 10:58:44 -04:00
Matt Arsenault	4e4c351ae5	AMDGPU: Avoid endpgm in middle of block for fallback trap lowering. This was inserting an s_endpgm in the middle of the block when it has to be a terminator. Split the block and insert a branch to a new block with the trap if it's not in a terminator position. Fixes verifier error on LDS in function with no trap support (and other trap sources).	2023-06-09 21:04:38 -04:00
Amara Emerson	086601eac2	[GlobalISel] Implement some binary reassociations, G_ADD for now - (op (op X, C1), C2) -> (op X, (op C1, C2)) - (op (op X, C1), Y) -> (op (op X, Y), C1) Some code duplication with the G_PTR_ADD reassociations unfortunately but no easy way to avoid it that I can see. Differential Revision: https://reviews.llvm.org/D150230	2023-06-08 21:14:58 -07:00
Matt Arsenault	c01f284fbb	AMDGPU: Fix regressions in integer mad matching Undo the canonicalize done in 0cfc6510323fbb5a56a5de23cbc65f7cc30fd34c. Restores some regressed matching of integer mad. The selection patterns for the actual mads don't seem to be properly commuting, so some of the commuted cases are still missed. Fixes: SWDEV-363009	2023-06-08 16:48:47 -04:00
Matt Arsenault	3d0350b762	AMDGPU: Add MF independent version of getImplicitParameterOffset	2023-06-07 08:26:31 -04:00
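
Commit 89ccfa1b39 above describes the llvm.log2.f32 lowering as scaling denormal inputs and adjusting the result. A minimal C++ sketch of that idea follows. It is an illustration, not the backend code: `hwLog2FlushDenormals` is a hypothetical stand-in for a log2 instruction that treats denormal inputs as zero, and the scale factor 2^32 with a matching offset of 32 is an assumption chosen for the example.

```cpp
#include <cassert>
#include <cfloat>
#include <cmath>

// Hypothetical stand-in for a hardware log2 that does not handle denormals:
// any input smaller in magnitude than FLT_MIN is treated as zero.
static float hwLog2FlushDenormals(float X) {
  if (std::fabs(X) < FLT_MIN)
    X = 0.0f;
  return std::log2(X); // log2(0) yields -infinity
}

// Sketch of the lowering idea: scale small inputs into the normal range,
// take the log, then subtract the log of the scale factor.
// Relies on log2(x * 2^32) - 32 == log2(x).
static float log2WithDenormalSupport(float X) {
  assert(!std::isnan(X) && "sketch does not model NaN inputs");
  const bool NeedsScale = X < FLT_MIN;       // below the smallest normal float
  const float Scale = NeedsScale ? 4294967296.0f /* 2^32 */ : 1.0f;
  const float Offset = NeedsScale ? 32.0f : 0.0f;
  return hwLog2FlushDenormals(X * Scale) - Offset;
}
```

Multiplying by a power of two is exact as long as the result does not overflow, so the scaling itself introduces no rounding error; zero and negative inputs also take the scaled path and still produce -infinity and NaN respectively.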

579 Commits