llvm-project

Author	SHA1	Message	Date
Jay Foad	8d13e7b8c3	[AMDGPU] Qualify auto. NFC. (#110878 ) Generated automatically with: $ clang-tidy -fix -checks=-*,llvm-qualified-auto $(find lib/Target/AMDGPU/ -type f)	2024-10-03 13:07:54 +01:00
Petar Avramovic	83fe85115d	AMDGPU: Fix inst-selection of large scratch offsets with sgpr base (#110256 ) Use i32 for offset instead of i16, this way it does not get interpreted as negative 16 bit offset.	2024-09-30 10:44:59 +02:00
Jay Foad	73b8074e68	[AMDGPU] Do not use APInt for simple 64-bit arithmetic. NFC. (#109414 )	2024-09-20 13:45:04 +01:00
Nikita Popov	cee0bf9626	[AMDGPU] Use Lo_32 and Hi_32 helpers (NFC) (#109413 )	2024-09-20 14:35:38 +02:00
Diana Picus	3356208531	Reland "[amdgpu] Add llvm.amdgcn.init.whole.wave intrinsic" (#108512 ) This reverts commit `7792b4ae79`. The problem was a conflict with `e55d6f5ea2` "[AMDGPU] Simplify and improve codegen for llvm.amdgcn.set.inactive (https://github.com/llvm/llvm-project/pull/107889)" which changed the syntax of V_SET_INACTIVE (and thus made my MIR test crash). ...if only we had a merge queue.	2024-09-13 11:54:30 +02:00
Diana Picus	7792b4ae79	Revert "Reland "[amdgpu] Add llvm.amdgcn.init.whole.wave intrinsic" (#108054 )"" (#108341 ) Reverts llvm/llvm-project#108173 si-init-whole-wave.mir crashes on some buildbots (although it passed both locally with sanitizers enabled and in pre-merge tests). Investigating.	2024-09-12 10:12:09 +02:00
Diana Picus	703ebca869	Reland "[amdgpu] Add llvm.amdgcn.init.whole.wave intrinsic" (#108054 )" (#108173 ) This reverts commit `c7a7767fca`. The buildbots failed because I removed a MI from its parent before updating LIS. This PR should fix that.	2024-09-12 09:11:41 +02:00
Vitaly Buka	c7a7767fca	Revert "[amdgpu] Add llvm.amdgcn.init.whole.wave intrinsic" (#108054 ) Breaks bots, see #105822. Reverts llvm/llvm-project#105822	2024-09-10 09:51:43 -07:00
Diana Picus	44556e64f2	[amdgpu] Add llvm.amdgcn.init.whole.wave intrinsic (#105822 ) This intrinsic is meant to be used in functions that have a "tail" that needs to be run with all the lanes enabled. The "tail" may contain complex control flow that makes it unsuitable for the use of the existing WWM intrinsics. Instead, we will pretend that the function starts with all the lanes enabled, then branches into the actual body of the function for the lanes that were meant to run it, and then finally all the lanes will rejoin and run the tail. As such, the intrinsic will return the EXEC mask for the body of the function, and is meant to be used only as part of a very limited pattern (for now only in amdgpu_cs_chain functions): ``` entry: %func_exec = call i1 @llvm.amdgcn.init.whole.wave() br i1 %func_exec, label %func, label %tail func: ; ... stuff that should run with the actual EXEC mask br label %tail tail: ; ... stuff that runs with all the lanes enabled; ; can contain more than one basic block ``` It's an error to use the result of this intrinsic for anything other than a branch (but unfortunately checking that in the verifier is non-trivial because SIAnnotateControlFlow will introduce an amdgcn.if between the intrinsic and the branch). The intrinsic is lowered to a SI_INIT_WHOLE_WAVE pseudo, which for now is expanded in si-wqm (which is where SI_INIT_EXEC is handled too); however the information that the function was conceptually started in whole wave mode is stored in the machine function info (hasInitWholeWave). This will be useful in prolog epilog insertion, where we can skip saving the inactive lanes for CSRs (since if the function started with all the lanes active, then there are no inactive lanes to preserve).	2024-09-10 13:24:53 +02:00
Juan Manuel Martinez Caamaño	cbf34a5f77	[AMDGPU] Remove dead pass: AMDGPUMachineCFGStructurizer (#105645 )	2024-08-23 14:06:17 +02:00
Simon Pilgrim	11ba72e651	[KnownBits] Add KnownBits::add and KnownBits::sub helper wrappers. (#99468 )	2024-08-12 10:21:28 +01:00
Matt Arsenault	dd094b2647	NewPM/AMDGPU: Port AMDGPUPerfHintAnalysis to new pass manager (#102645 ) This was much more difficult than I anticipated. The pass is not in a good state, with poor test coverage. The legacy PM does seem to be relying on maintaining the map state between different SCCs, which seems bad. The pass is going out of its way to avoid putting the attributes it introduces onto non-callee functions. If it just added them, we could use them directly instead of relying on the map, I would think. The NewPM path uses a ModulePass; I'm not sure if we should be using CGSCC here but there seems to be some missing infrastructure to support backend defined ones.	2024-08-11 15:11:10 +04:00
Kazu Hirata	f4fb735840	[llvm] Construct SmallVector<SDValue> with ArrayRef (NFC) (#102578 )	2024-08-09 09:15:42 -07:00
Ivan Kosarev	430cf6537b	[AMDGPU][NFCI] Declare offset0/1 operands to be i32. (#100560 ) Being of type i8 makes them signed, which they aren't, and requires extra work masking them on verbalisation. Part of <https://github.com/llvm/llvm-project/issues/62629>.	2024-07-25 14:32:19 +01:00
Jay Foad	0ce3ea1bff	[AMDGPU] Simplify selection of llvm.amdgcn.inverse.ballot. NFCI. (#99345 )	2024-07-18 07:45:13 +01:00
Jay Foad	bf536cc7db	[AMDGPU] Fix unwanted LICM/CSE of llvm.amdgcn.pops.exiting.wave.id (#96190 ) Mark both the intrinsic and the selected MachineInstr as having side effects to prevent MachineLICM and MachineCSE from moving/removing them.	2024-06-27 09:27:52 +01:00
vangthao95	3aef525aa4	[AMDGPU] Fix negative immediate offset for unbuffered smem loads (#89165 ) For unbuffered smem loads, it is illegal for the immediate offset to be negative if the resulting IOFFSET + (SGPR[Offset] or M0 or zero) is negative. New PR of https://github.com/llvm/llvm-project/pull/79553.	2024-06-24 14:18:23 -07:00
Jay Foad	90779fdc19	[AMDGPU] Preserve chain when selecting llvm.amdgcn.pops.exiting.wave.id (#96167 ) Without this SelectionDAG could fail assertions when using the intrinsic in a non-entry BB.	2024-06-20 12:30:34 +01:00
Matt Arsenault	8520061281	AMDGPU: Support local atomicrmw fmin/fmax for float/double (#95590 ) This has always been supported. Somehow, we ended up with 2 copies of clang builtins for this case, and the newer one erroneously requires gfx8-insts.	2024-06-18 18:34:34 +02:00
paperchalice	7652a59407	Reland "[NewPM][CodeGen] Port selection dag isel to new pass manager" (#94149 ) - Fix build with `EXPENSIVE_CHECKS` - Remove unused `PassName::ID` to resolve warning - Mark `~SelectionDAGISel` virtual so AArch64 backend can work properly	2024-06-04 08:10:58 +08:00
paperchalice	8917afaf0e	Revert "[NewPM][CodeGen] Port selection dag isel to new pass manager" (#94146 ) This reverts commit de37c06f01772e02465ccc9f538894c76d89a7a1 to de37c06f01772e02465ccc9f538894c76d89a7a1 It still breaks EXPENSIVE_CHECKS build. Sorry.	2024-06-02 14:31:52 +08:00
paperchalice	d2cdc8ab45	[NewPM][CodeGen] Port selection dag isel to new pass manager (#83567 ) Port selection dag isel to new pass manager. Only `AMDGPU` and `X86` support new pass version. `-verify-machineinstrs` in new pass manager belongs to verify instrumentation, it is enabled by default.	2024-06-02 09:12:33 +08:00
Jay Foad	990bed64fb	[AMDGPU] New intrinsic llvm.amdgcn.pops.exiting.wave.id (#89612 ) This provides access to the special scalar source value SRC_POPS_EXITING_WAVE_ID on GFX9 and GFX10.	2024-05-22 19:47:59 +01:00
Jay Foad	6eb9e214b3	RFC: [AMDGPU] Check subtarget features for consistency (#86957 ) Implement GCNSubtarget::checkSubtargetFeatures as a canonical place to check subtarget features for consistency and diagnose any inconsistencies. To start with, the implementation just checks that either wavefrontsize32 or wavefrontsize64 is selected. checkSubtargetFeatures is called at the start of instruction selection. This is pretty arbitrary. It is just a convenient point at which we have access to the subtarget that we're going to use for codegenning a particular function.	2024-05-09 11:37:28 +01:00
David Green	601e102bdb	[CodeGen] Use LocationSize for MMO getSize (#84751 ) This is part of #70452 that changes the type used for the external interface of MMO to LocationSize as opposed to uint64_t. This means the constructors take LocationSize, and convert ~UINT64_C(0) to LocationSize::beforeOrAfter(). The getSize methods return a LocationSize. This allows us to be more precise with unknown sizes, not accidentally treating them as unsigned values, and in the future should allow us to add proper scalable vector support but none of that is included in this patch. It should mostly be an NFC. Global ISel is still expected to use the underlying LLT as it needs, and are not expected to see unknown sizes for generic operations. Most of the changes are hopefully fairly mechanical, adding a lot of getValue() calls and protecting them with hasValue() where needed.	2024-03-17 18:15:56 +00:00
Shilei Tian	e963d0740e	[AMDGPU] Replace `isInlinableLiteral16` with specific version (#84402 ) The current implementation of `isInlinableLiteral16` assumes that a 16-bit inlinable literal is either an `i16` or a `fp16`. This is not always true because of `bf16`. However, we can't tell `fp16` and `bf16` apart by just looking at the value. This patch splits `isInlinableLiteral16` into three versions, `i16`, `fp16`, `bf16` respectively, and calls the corresponding version.	2024-03-08 14:49:52 -05:00
Sameer Sahasrabuddhe	60822637bf	Restore "Implement convergence control in MIR using SelectionDAG (#71785 )" This restores commit c7fdd8c11e54585dc9d15d63de9742067e0506b9. Previously reverted in f010b1bef4dda2c7082cbb41dbabf1f149cce306. LLVM function calls carry convergence control tokens as operand bundles, where the tokens themselves are produced by convergence control intrinsics. This patch implements convergence control tokens in MIR as follows: 1. Introduce target-independent ISD opcodes and MIR opcodes for convergence control intrinsics. 2. Model token values as untyped virtual registers in MIR. The change also introduces an additional ISD opcode CONVERGENCECTRL_GLUE and a corresponding machine opcode with the same spelling. This glues the convergence control token to SDNodes that represent calls to intrinsics. The glued token is later translated to an implicit argument in the MIR. The lowering of calls to user-defined functions is target-specific. On AMDGPU, the convergence control operand bundle at a non-intrinsic call is translated to an explicit argument to the SI_CALL_ISEL instruction. Post-selection adjustment converts this explicit argument to an implicit argument on the SI_CALL instruction.	2024-03-06 12:19:32 +05:30
Noah Goldstein	61c06775c9	[KnownBits] Add API for `nuw` flag in `computeForAddSub`; NFC	2024-03-05 12:59:58 -06:00
Mitch Phillips	f010b1bef4	Revert "Restore "Implement convergence control in MIR using SelectionDAG (#71785 )"" This reverts commit c7fdd8c11e54585dc9d15d63de9742067e0506b9. Reason: Broke the sanitizer buildbots. See the comments at https://github.com/llvm/llvm-project/pull/71785 for more information.	2024-03-04 17:05:34 +01:00
Sameer Sahasrabuddhe	c7fdd8c11e	Restore "Implement convergence control in MIR using SelectionDAG (#71785 )" Original commit 79889734b940356ab3381423c93ae06f22e772c9. Previously reverted in commit a2afcd5721869d1d03c8146bae3885b3385ba15e. LLVM function calls carry convergence control tokens as operand bundles, where the tokens themselves are produced by convergence control intrinsics. This patch implements convergence control tokens in MIR as follows: 1. Introduce target-independent ISD opcodes and MIR opcodes for convergence control intrinsics. 2. Model token values as untyped virtual registers in MIR. The change also introduces an additional ISD opcode CONVERGENCECTRL_GLUE and a corresponding machine opcode with the same spelling. This glues the convergence control token to SDNodes that represent calls to intrinsics. The glued token is later translated to an implicit argument in the MIR. The lowering of calls to user-defined functions is target-specific. On AMDGPU, the convergence control operand bundle at a non-intrinsic call is translated to an explicit argument to the SI_CALL_ISEL instruction. Post-selection adjustment converts this explicit argument to an implicit argument on the SI_CALL instruction.	2024-03-04 13:28:04 +05:30
Martin Wehking	4bf06c16fc	Initialize unsigned integer when declared (#81894 ) Initialize ModOpcode directly before the loop execution to silence static analyzer warnings about the usage of an uninitialized variable. This leads to a redundant assignment of EltsV2F16 inside the first loop execution, but also avoids superfluous emptiness checks of EltsV2F16 after the first execution of the loop.	2024-02-25 18:26:12 +05:30
Sameer Sahasrabuddhe	a2afcd5721	Revert "Implement convergence control in MIR using SelectionDAG (#71785 )" This reverts commit 79889734b940356ab3381423c93ae06f22e772c9. Encountered multiple buildbot failures.	2024-02-21 11:07:02 +05:30
Sameer Sahasrabuddhe	79889734b9	Implement convergence control in MIR using SelectionDAG (#71785 ) LLVM function calls carry convergence control tokens as operand bundles, where the tokens themselves are produced by convergence control intrinsics. This patch implements convergence control tokens in MIR as follows: 1. Introduce target-independent ISD opcodes and MIR opcodes for convergence control intrinsics. 2. Model token values as untyped virtual registers in MIR. The change also introduces an additional ISD opcode CONVERGENCECTRL_GLUE and a corresponding machine opcode with the same spelling. This glues the convergence control token to SDNodes that represent calls to intrinsics. The glued token is later translated to an implicit argument in the MIR. The lowering of calls to user-defined functions is target-specific. On AMDGPU, the convergence control operand bundle at a non-intrinsic call is translated to an explicit argument to the SI_CALL_ISEL instruction. Post-selection adjustment converts this explicit argument to an implicit argument on the SI_CALL instruction.	2024-02-21 10:06:37 +05:30
Shilei Tian	9c6a2de24b	[AMDGPU] Clean up functions for checking inline literals (#81282 ) This patch cleans up functions for checking inline literals.	2024-02-15 12:11:51 -05:00
Valery Pykhtin	b8025d1482	Reapply "[AMDGPU] Add InstCombine rule for ballot.i64 intrinsic in wave32 mode." (#80303 ) Reapply #71556 with added lit test constraint: `REQUIRES: amdgpu-registered-target`. This reverts commit 9791e5414960f92396582b9e9ee503ac15799312.	2024-02-02 13:09:25 +01:00
Kazu Hirata	8582d41789	[Target] Use SDValue::getConstantOperandVal (NFC)	2024-01-29 18:46:16 -08:00
Mirko Brkušanin	7fdf608cef	[AMDGPU] Add GFX12 WMMA and SWMMAC instructions (#77795 ) Co-authored-by: Petar Avramovic <Petar.Avramovic@amd.com> Co-authored-by: Piotr Sobczak <piotr.sobczak@amd.com>	2024-01-24 13:43:07 +01:00
Jay Foad	ea9d75aa2a	[AMDGPU] Misc formatting fixes. NFC.	2024-01-19 13:50:26 +00:00
Jay Foad	c111dc72e9	[AMDGPU] Allow potentially negative flat scratch offsets on GFX12 (#78193 ) https://github.com/llvm/llvm-project/pull/70634 has disabled use of potentially negative scratch offsets, but we can use it on GFX12. --------- Co-authored-by: Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>	2024-01-18 10:02:40 +00:00
Valery Pykhtin	9791e54149	Revert "[AMDGPU] Add InstCombine rule for ballot.i64 intrinsic in wave32 mode." (#78429 ) Reverts llvm/llvm-project#71556 Fixes failures: https://lab.llvm.org/buildbot/#/builders/188/builds/40541 https://lab.llvm.org/buildbot/#/builders/91/builds/21847 https://lab.llvm.org/buildbot/#/builders/98/builds/31671 https://lab.llvm.org/buildbot/#/builders/139/builds/57289	2024-01-17 14:12:07 +01:00
Jay Foad	4a77414660	[AMDGPU] CodeGen for GFX12 8/16-bit SMEM loads (#77633 )	2024-01-17 10:28:03 +00:00
Valery Pykhtin	57b50ef017	[AMDGPU] Add InstCombine rule for ballot.i64 intrinsic in wave32 mode. (#71556 ) Substitute with zero-extended to i64 ballot.i32 intrinsic.	2024-01-17 17:02:05 +07:00
Mirko Brkušanin	2adbf254a1	[AMDGPU][NFC] Rename DotIUVOP3PMods to VOP3PModsNeg (#77785 ) This is used to select the source modifier (neg) from the immediate operand. After a follow up commit this will no longer be DOTIU specific. Co-authored-by: Changpeng Fang <changpeng.fang@amd.com>	2024-01-12 10:57:24 +01:00
Kazu Hirata	1e05236dbd	[Target] Use isNullConstant (NFC)	2024-01-10 20:25:24 -08:00
Alex Bradbury	2d54ec36f7	[SelectionDAG] Add and use SDNode::getAsAPIntVal() helper (#77455 ) This is the logical equivalent for #76710 for APInt and uses the same naming scheme. Converted existing users through: `git grep -l "cast<ConstantSDNode>$.$.getAPIntValueValue" \| xargs sed -E -i 's/cast<ConstantSDNode>$(.*)$->getAPIntValue/\1->getAsAPIntVal/'`	2024-01-09 14:27:07 +00:00
Alex Bradbury	197214e39b	[RFC][SelectionDAG] Add and use SDNode::getAsZExtVal() helper (#76710 ) This follows on from #76708, allowing `cast<ConstantSDNode>(N)->getZExtValue()` to be replaced with just `N->getAsZExtVal();` Introduced via `git grep -l "cast<ConstantSDNode>$.$.getZExtValue" \| xargs sed -E -i 's/cast<ConstantSDNode>$(.*)$->getZExtValue/\1->getAsZExtVal/'` and then using `git clang-format` on the result.	2024-01-09 12:25:17 +00:00
Matt Arsenault	460ffcddd9	AMDGPU: Make bf16/v2bf16 legal types (#76215 ) Some intrinsics are using i16 vectors in place of bfloat vectors. Move towards making bf16 vectors legal so these can migrate. Leave the larger vectors for a later change. Depends on #76213 and #76214.	2024-01-04 22:31:18 +07:00
Nicolai Hähnle	49b492048a	AMDGPU: Fix packed 16-bit inline constants (#76522 ) Consistently treat packed 16-bit operands as 32-bit values, because that's really what they are. The attempt to treat them differently was ultimately incorrect and led to miscompiles, e.g. when using non-splat constants such as (1, 0) as operands. Recognize 32-bit float constants for i/u16 instructions. This is a bit odd conceptually, but it matches HW behavior and SP3. Remove isFoldableLiteralV216; there was too much magic in the dependency between it and its use in SIFoldOperands. Instead, we now simply rely on checking whether a constant is an inline constant, and trying a bunch of permutations of the low and high halves. This is more obviously correct and leads to some new cases where inline constants are used as shown by tests. Move the logic for switching packed add vs. sub into SIFoldOperands. This has two benefits: all logic that optimizes for inline constants in packed math is now in one place; and it applies to both SelectionDAG and GISel paths. Disable the use of opsel with v_dot* instructions on gfx11. They are documented to ignore opsel on src0 and src1. It may be interesting to re-enable the use of opsel on src2 as a future optimization. A similar "proper" fix of what inline constants mean could potentially be applied to unpacked 16-bit ops. However, it's less clear what the benefit would be, and there are surely places where we'd have to carefully audit whether values are properly sign- or zero-extended. It is best to keep such a change separate. Fixes: Corruption in FSR 2.0 (latent bug exposed by an LLPC change)	2024-01-04 00:10:15 +01:00
Alex Bradbury	a181b42565	[llvm][NFC] Use SDValue::getConstantOperandAPInt(i) where possible The helper function allows examples like `cast<ConstantSDNode>(Op.getOperand(0))->getAPIntValue();` to be changed to `Op.getConstantOperandAPInt(0);`. See #76708 for further context. Although there are far fewer opportunities for replacement, I used a similar git grep and sed combo as before, given I already had it to hand: `git grep -l "cast<ConstantSDNode>$.->getOperand\(.$\)->getAPIntValue" \| xargs sed -E -i 's/cast<ConstantSDNode>$(.)->getOperand\((.)$\)->getAPIntValue/\1->getConstantOperandAPInt(\2)/'` and `git grep -l "cast<ConstantSDNode>$.\.getOperand\(.$\)->getAPIntValue" \| xargs sed -E -i 's/cast<ConstantSDNode>$(.)\.getOperand\((.)$\)->getAPIntValue/\1.getConstantOperandAPInt(\2)/'`	2024-01-02 14:43:55 +00:00
Alex Bradbury	80aeb62211	[llvm][NFC] Use SDValue::getConstantOperandVal(i) where possible (#76708 ) This helper function shortens examples like `cast<ConstantSDNode>(Node->getOperand(1))->getZExtValue();` to `Node->getConstantOperandVal(1);`. Implemented with: `git grep -l "cast<ConstantSDNode>$.->getOperand\(.$\)->getZExtValue" \| xargs sed -E -i 's/cast<ConstantSDNode>$(.)->getOperand\((.)$\)->getZExtValue/\1->getConstantOperandVal(\2)/` and `git grep -l "cast<ConstantSDNode>$.\.getOperand\(.$\)->getZExtValue" \| xargs sed -E -i 's/cast<ConstantSDNode>$(.)\.getOperand\((.)$\)->getZExtValue/\1.getConstantOperandVal(\2)/'`. With a couple of simple manual fixes needed. Result then processed by `git clang-format`.	2024-01-02 13:14:28 +00:00

407 Commits
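
Several entries above (83fe85115d on large scratch offsets, 430cf6537b on the offset0/1 operands) turn on the same point: an immediate stored in a narrow signed type can read back as negative. The sketch below is only an illustration of that arithmetic, not code from the LLVM tree. The helper `as_signed` and the offset value 40000 are hypothetical, chosen so that the value needs more than 15 bits.

```python
def as_signed(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of `value` as a two's-complement integer."""
    mask = (1 << bits) - 1
    value &= mask                     # keep only the bits the type can hold
    sign_bit = 1 << (bits - 1)
    # If the top bit is set, the signed reading wraps around to a negative value.
    return value - (1 << bits) if value & sign_bit else value


offset = 40000  # hypothetical offset; does not fit in a signed 16-bit field

print(as_signed(offset, 16))  # read as i16: -25536
print(as_signed(offset, 32))  # read as i32: 40000
```

Read as a 16-bit signed quantity the offset becomes negative, while a 32-bit type leaves it unchanged. Under these assumptions, that is the effect the commit message describes as the offset being "interpreted as negative 16 bit offset".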