llvm-project

Author	SHA1	Message	Date
David Green	601e102bdb	[CodeGen] Use LocationSize for MMO getSize (#84751 ) This is part of #70452 that changes the type used for the external interface of MMO to LocationSize as opposed to uint64_t. This means the constructors take LocationSize, and convert ~UINT64_C(0) to LocationSize::beforeOrAfter(). The getSize methods return a LocationSize. This allows us to be more precise with unknown sizes, not accidentally treating them as unsigned values, and in the future should allow us to add proper scalable vector support but none of that is included in this patch. It should mostly be an NFC. Global ISel is still expected to use the underlying LLT as it needs, and is not expected to see unknown sizes for generic operations. Most of the changes are hopefully fairly mechanical, adding a lot of getValue() calls and protecting them with hasValue() where needed.	2024-03-17 18:15:56 +00:00
Shilei Tian	e963d0740e	[AMDGPU] Replace `isInlinableLiteral16` with specific version (#84402 ) The current implementation of `isInlinableLiteral16` assumes a 16-bit inlinable literal is either an `i16` or a `fp16`. This is not always true because of `bf16`. However, we can't tell `fp16` and `bf16` apart by just looking at the value. This patch splits `isInlinableLiteral16` into three versions, `i16`, `fp16`, `bf16` respectively, and calls the corresponding version.	2024-03-08 14:49:52 -05:00
Sameer Sahasrabuddhe	60822637bf	Restore "Implement convergence control in MIR using SelectionDAG (#71785 )" This restores commit c7fdd8c11e54585dc9d15d63de9742067e0506b9. Previously reverted in f010b1bef4dda2c7082cbb41dbabf1f149cce306. LLVM function calls carry convergence control tokens as operand bundles, where the tokens themselves are produced by convergence control intrinsics. This patch implements convergence control tokens in MIR as follows: 1. Introduce target-independent ISD opcodes and MIR opcodes for convergence control intrinsics. 2. Model token values as untyped virtual registers in MIR. The change also introduces an additional ISD opcode CONVERGENCECTRL_GLUE and a corresponding machine opcode with the same spelling. This glues the convergence control token to SDNodes that represent calls to intrinsics. The glued token is later translated to an implicit argument in the MIR. The lowering of calls to user-defined functions is target-specific. On AMDGPU, the convergence control operand bundle at a non-intrinsic call is translated to an explicit argument to the SI_CALL_ISEL instruction. Post-selection adjustment converts this explicit argument to an implicit argument on the SI_CALL instruction.	2024-03-06 12:19:32 +05:30
Noah Goldstein	61c06775c9	[KnownBits] Add API for `nuw` flag in `computeForAddSub`; NFC	2024-03-05 12:59:58 -06:00
Mitch Phillips	f010b1bef4	Revert "Restore "Implement convergence control in MIR using SelectionDAG (#71785 )"" This reverts commit c7fdd8c11e54585dc9d15d63de9742067e0506b9. Reason: Broke the sanitizer buildbots. See the comments at https://github.com/llvm/llvm-project/pull/71785 for more information.	2024-03-04 17:05:34 +01:00
Sameer Sahasrabuddhe	c7fdd8c11e	Restore "Implement convergence control in MIR using SelectionDAG (#71785 )" Original commit 79889734b940356ab3381423c93ae06f22e772c9. Previously reverted in commit a2afcd5721869d1d03c8146bae3885b3385ba15e. LLVM function calls carry convergence control tokens as operand bundles, where the tokens themselves are produced by convergence control intrinsics. This patch implements convergence control tokens in MIR as follows: 1. Introduce target-independent ISD opcodes and MIR opcodes for convergence control intrinsics. 2. Model token values as untyped virtual registers in MIR. The change also introduces an additional ISD opcode CONVERGENCECTRL_GLUE and a corresponding machine opcode with the same spelling. This glues the convergence control token to SDNodes that represent calls to intrinsics. The glued token is later translated to an implicit argument in the MIR. The lowering of calls to user-defined functions is target-specific. On AMDGPU, the convergence control operand bundle at a non-intrinsic call is translated to an explicit argument to the SI_CALL_ISEL instruction. Post-selection adjustment converts this explicit argument to an implicit argument on the SI_CALL instruction.	2024-03-04 13:28:04 +05:30
Martin Wehking	4bf06c16fc	Initialize unsigned integer when declared (#81894 ) Initialize ModOpcode directly before the loop execution to silence static analyzer warnings about the usage of an uninitialized variable. This leads to a redundant assignment of EltsV2F16 inside the first loop execution, but also avoids superfluous emptiness checks of EltsV2F16 after the first execution of the loop.	2024-02-25 18:26:12 +05:30
Sameer Sahasrabuddhe	a2afcd5721	Revert "Implement convergence control in MIR using SelectionDAG (#71785 )" This reverts commit 79889734b940356ab3381423c93ae06f22e772c9. Encountered multiple buildbot failures.	2024-02-21 11:07:02 +05:30
Sameer Sahasrabuddhe	79889734b9	Implement convergence control in MIR using SelectionDAG (#71785 ) LLVM function calls carry convergence control tokens as operand bundles, where the tokens themselves are produced by convergence control intrinsics. This patch implements convergence control tokens in MIR as follows: 1. Introduce target-independent ISD opcodes and MIR opcodes for convergence control intrinsics. 2. Model token values as untyped virtual registers in MIR. The change also introduces an additional ISD opcode CONVERGENCECTRL_GLUE and a corresponding machine opcode with the same spelling. This glues the convergence control token to SDNodes that represent calls to intrinsics. The glued token is later translated to an implicit argument in the MIR. The lowering of calls to user-defined functions is target-specific. On AMDGPU, the convergence control operand bundle at a non-intrinsic call is translated to an explicit argument to the SI_CALL_ISEL instruction. Post-selection adjustment converts this explicit argument to an implicit argument on the SI_CALL instruction.	2024-02-21 10:06:37 +05:30
Shilei Tian	9c6a2de24b	[AMDGPU] Clean up functions for checking inline literals (#81282 ) This patch cleans up functions for checking inline literals.	2024-02-15 12:11:51 -05:00
Valery Pykhtin	b8025d1482	Reapply "[AMDGPU] Add InstCombine rule for ballot.i64 intrinsic in wave32 mode." (#80303 ) Reapply #71556 with added lit test constraint: `REQUIRES: amdgpu-registered-target`. This reverts commit 9791e5414960f92396582b9e9ee503ac15799312.	2024-02-02 13:09:25 +01:00
Kazu Hirata	8582d41789	[Target] Use SDValue::getConstantOperandVal (NFC)	2024-01-29 18:46:16 -08:00
Mirko Brkušanin	7fdf608cef	[AMDGPU] Add GFX12 WMMA and SWMMAC instructions (#77795 ) Co-authored-by: Petar Avramovic <Petar.Avramovic@amd.com> Co-authored-by: Piotr Sobczak <piotr.sobczak@amd.com>	2024-01-24 13:43:07 +01:00
Jay Foad	ea9d75aa2a	[AMDGPU] Misc formatting fixes. NFC.	2024-01-19 13:50:26 +00:00
Jay Foad	c111dc72e9	[AMDGPU] Allow potentially negative flat scratch offsets on GFX12 (#78193 ) https://github.com/llvm/llvm-project/pull/70634 has disabled use of potentially negative scratch offsets, but we can use it on GFX12. --------- Co-authored-by: Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>	2024-01-18 10:02:40 +00:00
Valery Pykhtin	9791e54149	Revert "[AMDGPU] Add InstCombine rule for ballot.i64 intrinsic in wave32 mode." (#78429 ) Reverts llvm/llvm-project#71556 Fixes failures: https://lab.llvm.org/buildbot/#/builders/188/builds/40541 https://lab.llvm.org/buildbot/#/builders/91/builds/21847 https://lab.llvm.org/buildbot/#/builders/98/builds/31671 https://lab.llvm.org/buildbot/#/builders/139/builds/57289	2024-01-17 14:12:07 +01:00
Jay Foad	4a77414660	[AMDGPU] CodeGen for GFX12 8/16-bit SMEM loads (#77633 )	2024-01-17 10:28:03 +00:00
Valery Pykhtin	57b50ef017	[AMDGPU] Add InstCombine rule for ballot.i64 intrinsic in wave32 mode. (#71556 ) Substitute with zero-extended to i64 ballot.i32 intrinsic.	2024-01-17 17:02:05 +07:00
Mirko Brkušanin	2adbf254a1	[AMDGPU][NFC] Rename DotIUVOP3PMods to VOP3PModsNeg (#77785 ) This is used to select the source modifier (neg) from the immediate operand. After a follow up commit this will no longer be DOTIU specific. Co-authored-by: Changpeng Fang <changpeng.fang@amd.com>	2024-01-12 10:57:24 +01:00
Kazu Hirata	1e05236dbd	[Target] Use isNullConstant (NFC)	2024-01-10 20:25:24 -08:00
Alex Bradbury	2d54ec36f7	[SelectionDAG] Add and use SDNode::getAsAPIntVal() helper (#77455 ) This is the logical equivalent for #76710 for APInt and uses the same naming scheme. Converted existing users through: `git grep -l "cast<ConstantSDNode>\(.\).getAPIntValueValue" \| xargs sed -E -i 's/cast<ConstantSDNode>\((.*)\)->getAPIntValue/\1->getAsAPIntVal/'`	2024-01-09 14:27:07 +00:00
Alex Bradbury	197214e39b	[RFC][SelectionDAG] Add and use SDNode::getAsZExtVal() helper (#76710 ) This follows on from #76708, allowing `cast<ConstantSDNode>(N)->getZExtValue()` to be replaced with just `N->getAsZExtVal();` Introduced via `git grep -l "cast<ConstantSDNode>\(.\).getZExtValue" \| xargs sed -E -i 's/cast<ConstantSDNode>\((.*)\)->getZExtValue/\1->getAsZExtVal/'` and then using `git clang-format` on the result.	2024-01-09 12:25:17 +00:00
Matt Arsenault	460ffcddd9	AMDGPU: Make bf16/v2bf16 legal types (#76215 ) Some intrinsics are using i16 vectors in place of bfloat vectors. Move towards making bf16 vectors legal so these can migrate. Leave the larger vectors for a later change. Depends #76213 #76214	2024-01-04 22:31:18 +07:00
Nicolai Hähnle	49b492048a	AMDGPU: Fix packed 16-bit inline constants (#76522 ) Consistently treat packed 16-bit operands as 32-bit values, because that's really what they are. The attempt to treat them differently was ultimately incorrect and led to miscompiles, e.g. when using non-splat constants such as (1, 0) as operands. Recognize 32-bit float constants for i/u16 instructions. This is a bit odd conceptually, but it matches HW behavior and SP3. Remove isFoldableLiteralV216; there was too much magic in the dependency between it and its use in SIFoldOperands. Instead, we now simply rely on checking whether a constant is an inline constant, and trying a bunch of permutations of the low and high halves. This is more obviously correct and leads to some new cases where inline constants are used as shown by tests. Move the logic for switching packed add vs. sub into SIFoldOperands. This has two benefits: all logic that optimizes for inline constants in packed math is now in one place; and it applies to both SelectionDAG and GISel paths. Disable the use of opsel with v_dot* instructions on gfx11. They are documented to ignore opsel on src0 and src1. It may be interesting to re-enable the use of opsel on src2 as a future optimization. A similar "proper" fix of what inline constants mean could potentially be applied to unpacked 16-bit ops. However, it's less clear what the benefit would be, and there are surely places where we'd have to carefully audit whether values are properly sign- or zero-extended. It is best to keep such a change separate. Fixes: Corruption in FSR 2.0 (latent bug exposed by an LLPC change)	2024-01-04 00:10:15 +01:00
Alex Bradbury	a181b42565	[llvm][NFC] Use SDValue::getConstantOperandAPInt(i) where possible The helper function allows examples like `cast<ConstantSDNode>(Op.getOperand(0))->getAPIntValue();` to be changed to `Op.getConstantOperandAPInt(0);`. See #76708 for further context. Although there are far fewer opportunities for replacement, I used a similar git grep and sed combo as before, given I already had it to hand: `git grep -l "cast<ConstantSDNode>\(.->getOperand\(.\)\)->getAPIntValue\(\)" \| xargs sed -E -i 's/cast<ConstantSDNode>\((.)->getOperand\((.)\)\)->getAPIntValue\(\)/\1->getConstantOperandAPInt(\2)/'` and `git grep -l "cast<ConstantSDNode>\(.\.getOperand\(.\)\)->getAPIntValue\(\)" \| xargs sed -E -i 's/cast<ConstantSDNode>\((.)\.getOperand\((.)\)\)->getAPIntValue\(\)/\1.getConstantOperandAPInt(\2)/'`	2024-01-02 14:43:55 +00:00
Alex Bradbury	80aeb62211	[llvm][NFC] Use SDValue::getConstantOperandVal(i) where possible (#76708 ) This helper function shortens examples like `cast<ConstantSDNode>(Node->getOperand(1))->getZExtValue();` to `Node->getConstantOperandVal(1);`. Implemented with: `git grep -l "cast<ConstantSDNode>\(.->getOperand\(.\)\)->getZExtValue\(\)" \| xargs sed -E -i 's/cast<ConstantSDNode>\((.)->getOperand\((.)\)\)->getZExtValue\(\)/\1->getConstantOperandVal(\2)/'` and `git grep -l "cast<ConstantSDNode>\(.\.getOperand\(.\)\)->getZExtValue\(\)" \| xargs sed -E -i 's/cast<ConstantSDNode>\((.)\.getOperand\((.)\)\)->getZExtValue\(\)/\1.getConstantOperandVal(\2)/'`. With a couple of simple manual fixes needed. Result then processed by `git clang-format`.	2024-01-02 13:14:28 +00:00
Mirko Brkušanin	07a6d73664	[AMDGPU] CodeGen for GFX12 VFLAT, VSCRATCH and VGLOBAL instructions (#75493 )	2023-12-15 15:01:40 +01:00
Mirko Brkušanin	5879162f7f	[AMDGPU] CodeGen for GFX12 VBUFFER instructions (#75492 )	2023-12-15 13:45:03 +01:00
Piotr Sobczak	fac093dd08	[AMDGPU] Update IEEE and DX10_CLAMP for GFX12 (#75030 ) Co-authored-by: Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>	2023-12-13 13:52:40 +01:00
Kazu Hirata	28a78e2a4a	[AMDGPU] Use isNullConstant (NFC)	2023-12-07 22:33:46 -08:00
Ruiling, Song	c1511a65d5	[AMDGPU] Folding imm offset in more cases for scratch access (#70634 ) For scratch load/store, our hardware only accepts non-negative values in SGPR/VGPR. Besides the case that we can prove from known bits, we can also prove that the value in `base` will be non-negative: 1.) When the ADD for the address calculation has the NoUnsignedWrap flag. 2.) When the immediate offset is already negative.	2023-11-29 12:46:45 +08:00
Acim-Maravic	f3138524db	[AMDGPU] Generic lowering for rint and nearbyint (#69596 ) There are three different rounding intrinsics that are brought down to the same instruction. Co-authored-by: Acim Maravic <acim.maravic@amd.com>	2023-11-14 18:49:21 +01:00
Valery Pykhtin	fe6893b1d8	Improve selection of conditional branch on amdgcn.ballot!=0 condition in SelectionDAG. (#68714 ) Improve selection of the following pattern: bool cnd = ... if (amdgcn.ballot(cnd) != 0) { ... } which means "execute _then_ if any lane has satisfied the _cnd_ condition".	2023-11-06 15:16:49 +01:00
Stanislav Mekhanoshin	fe8335babb	[AMDGPU] Select 64-bit imm moves if can be encoded as 32 bit operand (#70395 ) This allows folding of 64-bit operands if they fit into 32 bits. Fixes https://github.com/llvm/llvm-project/issues/67781	2023-10-30 08:12:28 -07:00
Mirko Brkušanin	ecfdc23dd2	[AMDGPU] Select gfx1150 SALU Float instructions (#66885 )	2023-09-21 12:22:55 +02:00
Arthur Eubanks	0a1aa6cda2	[NFC][CodeGen] Change CodeGenOpt::Level/CodeGenFileType into enum classes (#66295 ) This will make it easy for callers to see issues with and fix up calls to createTargetMachine after a future change to the params of TargetMachine. This matches other nearby enums. For downstream users, this should be a fairly straightforward replacement, e.g. s/CodeGenOpt::Aggressive/CodeGenOptLevel::Aggressive or s/CGFT_/CodeGenFileType::	2023-09-14 14:10:14 -07:00
Kazu Hirata	57390c914b	[AMDGPU] Use isNullConstant and isOneConstant (NFC)	2023-08-27 08:26:52 -07:00
Matt Arsenault	9a53f5f5c4	AMDGPU: Handle llvm.stacksave and llvm.stackrestore Not sure if the only valid use is to have stackrestore directly consume stacksave outputs or not. Handled exactly like a regular stack pointer so all the edge cases theoretically should work. https://reviews.llvm.org/D156669	2023-08-11 10:25:01 -04:00
Jay Foad	c2093b8504	[AMDGPU] Add target features for GDS and GWS GFX9 subtargets from GFX90A onwards lack GDS but still have GWS. Differential Revision: https://reviews.llvm.org/D156713	2023-08-02 09:02:07 +01:00
Matt Arsenault	bd203072e6	AMDGPU: Silence a gcc warning	2023-07-22 08:07:49 -04:00
Matt Arsenault	fb54afd1b7	AMDGPU: Fold fsub [+-0] into fneg when folding source modifiers This isn't always folded to fneg for a freestanding fsub depending on the denormal mode. When matching source modifiers, we're implicitly canonicalizing the input so we can fold it here. Doesn't bother handling the VOP3P case since it's only relevant with DAZ, which nobody really uses with f16. For f64, tests show an existing bug where DAGCombiner tries to respect the denormal mode for fsub -0, x, but not after it's lowered to fadd -0, (fneg x). Either the fold is wrong or we shouldn't restrict the fsub case based on the denormal mode. https://reviews.llvm.org/D155652	2023-07-20 19:29:40 -04:00
David Green	2802739dfd	[NFC] Replace ;; with ;	2023-06-11 10:25:24 +01:00
Matt Arsenault	eece6ba283	IR: Add llvm.ldexp and llvm.experimental.constrained.ldexp intrinsics AMDGPU has native instructions and target intrinsics for this, but these really should be subject to legalization and generic optimizations. This will enable legalization of f16->f32 on targets without f16 support. Implement a somewhat horrible inline expansion for targets without libcall support. This could be better if we could introduce control flow (GlobalISel version not yet implemented). Support for strictfp legalization is less complete but works for the simple cases.	2023-06-06 17:07:18 -04:00
Sergei Barannikov	e744e51b12	[SelectionDAG] Rename ADDCARRY/SUBCARRY to UADDO_CARRY/USUBO_CARRY (NFC) This will make them consistent with other overflow-aware nodes. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D148196	2023-04-29 21:59:58 +03:00
Jessica Del	04317d4da7	[AMDGPU][GISel] Add inverse ballot intrinsic The inverse ballot intrinsic takes in a boolean mask for all lanes and returns the boolean for the current lane. See SPIR-V's `subgroupInverseBallot()` in the [[ https://github.com/KhronosGroup/GLSL/blob/master/extensions/khr/GL_KHR_shader_subgroup.txt \| GL_KHR_shader_subgroup extension ]]. This allows decision making via branch and select instructions with a manually manipulated mask. Implemented in GlobalISel and SelectionDAG, since currently both are supported. The SelectionDAG required pseudo instructions to use the custom inserter. The boolean mask needs to be uniform for all lanes. Therefore we expect SGPR input. In case the source is in a VGPR, we insert one or more `v_readfirstlane` instructions. Reviewed By: nhaehnle Differential Revision: https://reviews.llvm.org/D146287	2023-04-06 07:46:50 +02:00
Kazu Hirata	7bb6d1b32e	[llvm] Skip getAPIntValue (NFC) ConstantSDNode provides some convenience functions like isZero, getZExtValue, and isMinSignedValue that are named identically to those provided by APInt, so we can "skip" getAPIntValue.	2023-03-22 22:10:25 -07:00
pvanhout	1f1fea6c38	Reland: [DAG/AMDGPU] Use UniformityAnalysis in DAGISel Switch DAGISel over to UniformityAnalysis, which was one of the last remaining users of the DivergenceAnalysis. No explosions seen during internal testing so this looks like a smooth transition. Reviewed By: sameerds Differential Revision: https://reviews.llvm.org/D145918	2023-03-14 14:38:45 +01:00
pvanhout	0e79106fc9	Revert "[DAG/AMDGPU] Use UniformityAnalysis in DAGISel" This reverts commit 0022b5803fd4f5a4e9fcf233267c0ffa1b88f763.	2023-03-14 11:48:58 +01:00
pvanhout	0022b5803f	[DAG/AMDGPU] Use UniformityAnalysis in DAGISel Switch DAGISel over to UniformityAnalysis, which was one of the last remaining users of the DivergenceAnalysis. No explosions seen during internal testing so this looks like a smooth transition. Reviewed By: sameerds Differential Revision: https://reviews.llvm.org/D145918	2023-03-14 11:18:28 +01:00
Petar Avramovic	ded69779be	Fix SGPR + VGPR + offset Scratch offset folding Values in SGPR and VGPR register are treated as unsigned by hardware. When value in 32-bit SGPR or VGPR base can be negative calculate offset using 32-bit add instructions, otherwise use sgpr(unsigned) + vgpr(unsigned) + offset. LoopStrengthReduce.cpp changes offsets to negative and in some iterations value in SGPR or VGPR register could be negative. Differential Revision: https://reviews.llvm.org/D144957	2023-03-09 10:53:41 +01:00


383 Commits