llvm-project

Author	SHA1	Message	Date
Nikita Popov	3317c9ceac	[AMDGPU] Use getSignedConstant() where necessary (#117328 ) Create signed constant using getSignedConstant(), to avoid future assertion failures when we disable implicit truncation in getConstant(). This also touches some generic legalization code, which apparently only AMDGPU tests.	2024-11-25 09:49:34 +01:00
Matt Arsenault	d1cca3133a	AMDGPU: Add v_permlane16_swap_b32 and v_permlane32_swap_b32 for gfx950 (#117260 ) This was a bit annoying because these introduce a new special case encoding usage. op_sel is repurposed as a subset of dpp controls, and is eligible for VOP3->VOP1 shrinking. For some reason fi also uses an enum value, so we need to convert the raw boolean to 1 instead of -1. The 2 registers are swapped, so this has 2 defs. Ideally the builtin would return a pair, but that's difficult so return a vector instead. This would make a hypothetical builtin that supports v2f16 directly uglier.	2024-11-22 20:12:50 -08:00
Changpeng Fang	e3e7c756fb	AMDGPU: Update pattern matching from "x&(-1>>(32-y))" to "bfe x, 0, y" (#116115 ) It is not correct to lower "x&(-1>>(32-y))" to "bfe x, 0, y". When y equals 32, "-1" is not shifted, so x&(-1>>(32-32) is still x, but "bfe x, 0, 32" is 0. However, if we know y is at most of 5 bits (< 32), we can still do the pattern matching.	2024-11-14 12:21:34 -08:00
Kazu Hirata	be187369a0	[AMDGPU] Remove unused includes (NFC) (#116154 ) Identified with misc-include-cleaner.	2024-11-13 21:10:03 -08:00
Changpeng Fang	9778fc76e3	Revert "AMDGPU: Don't avoid clamp of bit shift in BFE pattern (#115372 )" (#116091 ) Based on the suggestion from https://github.com/llvm/llvm-project/pull/115543, we should not do the pattern matching from x << (32-y) >> (32-y) to "bfe x, 0, y" at all. This reverts commits a2bacf8ab58af4c1a0247026ea131443d6066602 and `bdf8e308b7`.	2024-11-13 14:01:21 -08:00
Changpeng Fang	bdf8e308b7	AMDGPU: Don't avoid clamp of bit shift in BFE pattern (#115372 ) Enable pattern matching from "x<<32-y>>32-y" to "bfe x, 0, y" when we know y is in [0,31]. This is the follow-up for the PR: https://github.com/llvm/llvm-project/pull/114279 to fix the issue: https://github.com/llvm/llvm-project/issues/114282	2024-11-07 14:15:33 -08:00
Jay Foad	8d13e7b8c3	[AMDGPU] Qualify auto. NFC. (#110878 ) Generated automatically with: $ clang-tidy -fix -checks=-*,llvm-qualified-auto $(find lib/Target/AMDGPU/ -type f)	2024-10-03 13:07:54 +01:00
Petar Avramovic	83fe85115d	AMDGPU: Fix inst-selection of large scratch offsets with sgpr base (#110256 ) Use i32 for offset instead of i16, this way it does not get interpreted as negative 16 bit offset.	2024-09-30 10:44:59 +02:00
Jay Foad	73b8074e68	[AMDGPU] Do not use APInt for simple 64-bit arithmetic. NFC. (#109414 )	2024-09-20 13:45:04 +01:00
Nikita Popov	cee0bf9626	[AMDGPU] Use Lo_32 and Hi_32 helpers (NFC) (#109413 )	2024-09-20 14:35:38 +02:00
Diana Picus	3356208531	Reland "[amdgpu] Add llvm.amdgcn.init.whole.wave intrinsic" (#108512 ) This reverts commit `7792b4ae79`. The problem was a conflict with `e55d6f5ea2` "[AMDGPU] Simplify and improve codegen for llvm.amdgcn.set.inactive (https://github.com/llvm/llvm-project/pull/107889)" which changed the syntax of V_SET_INACTIVE (and thus made my MIR test crash). ...if only we had a merge queue.	2024-09-13 11:54:30 +02:00
Diana Picus	7792b4ae79	Revert "Reland "[amdgpu] Add llvm.amdgcn.init.whole.wave intrinsic" (#108054 )"" (#108341 ) Reverts llvm/llvm-project#108173 si-init-whole-wave.mir crashes on some buildbots (although it passed both locally with sanitizers enabled and in pre-merge tests). Investigating.	2024-09-12 10:12:09 +02:00
Diana Picus	703ebca869	Reland "[amdgpu] Add llvm.amdgcn.init.whole.wave intrinsic" (#108054 )" (#108173 ) This reverts commit `c7a7767fca`. The buildbots failed because I removed a MI from its parent before updating LIS. This PR should fix that.	2024-09-12 09:11:41 +02:00
Vitaly Buka	c7a7767fca	Revert "[amdgpu] Add llvm.amdgcn.init.whole.wave intrinsic" (#108054 ) Breaks bots, see #105822. Reverts llvm/llvm-project#105822	2024-09-10 09:51:43 -07:00
Diana Picus	44556e64f2	[amdgpu] Add llvm.amdgcn.init.whole.wave intrinsic (#105822 ) This intrinsic is meant to be used in functions that have a "tail" that needs to be run with all the lanes enabled. The "tail" may contain complex control flow that makes it unsuitable for the use of the existing WWM intrinsics. Instead, we will pretend that the function starts with all the lanes enabled, then branches into the actual body of the function for the lanes that were meant to run it, and then finally all the lanes will rejoin and run the tail. As such, the intrinsic will return the EXEC mask for the body of the function, and is meant to be used only as part of a very limited pattern (for now only in amdgpu_cs_chain functions): ``` entry: %func_exec = call i1 @llvm.amdgcn.init.whole.wave() br i1 %func_exec, label %func, label %tail func: ; ... stuff that should run with the actual EXEC mask br label %tail tail: ; ... stuff that runs with all the lanes enabled; ; can contain more than one basic block ``` It's an error to use the result of this intrinsic for anything other than a branch (but unfortunately checking that in the verifier is non-trivial because SIAnnotateControlFlow will introduce an amdgcn.if between the intrinsic and the branch). The intrinsic is lowered to a SI_INIT_WHOLE_WAVE pseudo, which for now is expanded in si-wqm (which is where SI_INIT_EXEC is handled too); however the information that the function was conceptually started in whole wave mode is stored in the machine function info (hasInitWholeWave). This will be useful in prolog epilog insertion, where we can skip saving the inactive lanes for CSRs (since if the function started with all the lanes active, then there are no inactive lanes to preserve).	2024-09-10 13:24:53 +02:00
Juan Manuel Martinez Caamaño	cbf34a5f77	[AMDGPU] Remove dead pass: AMDGPUMachineCFGStructurizer (#105645 )	2024-08-23 14:06:17 +02:00
Simon Pilgrim	11ba72e651	[KnownBits] Add KnownBits::add and KnownBits::sub helper wrappers. (#99468 )	2024-08-12 10:21:28 +01:00
Matt Arsenault	dd094b2647	NewPM/AMDGPU: Port AMDGPUPerfHintAnalysis to new pass manager (#102645 ) This was much more difficult than I anticipated. The pass is not in a good state, with poor test coverage. The legacy PM does seem to be relying on maintaining the map state between different SCCs, which seems bad. The pass is going out of its way to avoid putting the attributes it introduces onto non-callee functions. If it just added them, we could use them directly instead of relying on the map, I would think. The NewPM path uses a ModulePass; I'm not sure if we should be using CGSCC here but there seems to be some missing infrastructure to support backend defined ones.	2024-08-11 15:11:10 +04:00
Kazu Hirata	f4fb735840	[llvm] Construct SmallVector<SDValue> with ArrayRef (NFC) (#102578 )	2024-08-09 09:15:42 -07:00
Ivan Kosarev	430cf6537b	[AMDGPU][NFCI] Declare offset0/1 operands to be i32. (#100560 ) Being of type i8 makes them signed, which they aren't, and requires extra work masking them on verbalisation. Part of <https://github.com/llvm/llvm-project/issues/62629>.	2024-07-25 14:32:19 +01:00
Jay Foad	0ce3ea1bff	[AMDGPU] Simplify selection of llvm.amdgcn.inverse.ballot. NFCI. (#99345 )	2024-07-18 07:45:13 +01:00
Jay Foad	bf536cc7db	[AMDGPU] Fix unwanted LICM/CSE of llvm.amdgcn.pops.exiting.wave.id (#96190 ) Mark both the intrinsic and the selected MachineInstr as having side effects to prevent MachineLICM and MachineCSE from moving/removing them.	2024-06-27 09:27:52 +01:00
vangthao95	3aef525aa4	[AMDGPU] Fix negative immediate offset for unbuffered smem loads (#89165 ) For unbuffered smem loads, it is illegal for the immediate offset to be negative if the resulting IOFFSET + (SGPR[Offset] or M0 or zero) is negative. New PR of https://github.com/llvm/llvm-project/pull/79553.	2024-06-24 14:18:23 -07:00
Jay Foad	90779fdc19	[AMDGPU] Preserve chain when selecting llvm.amdgcn.pops.exiting.wave.id (#96167 ) Without this SelectionDAG could fail assertions when using the intrinsic in a non-entry BB.	2024-06-20 12:30:34 +01:00
Matt Arsenault	8520061281	AMDGPU: Support local atomicrmw fmin/fmax for float/double (#95590 ) This has always been supported. Somehow, we ended up with 2 copies of clang builtins for this case, and the newer one erroneously requires gfx8-insts.	2024-06-18 18:34:34 +02:00
paperchalice	7652a59407	Reland "[NewPM][CodeGen] Port selection dag isel to new pass manager" (#94149 ) - Fix build with `EXPENSIVE_CHECKS` - Remove unused `PassName::ID` to resolve warning - Mark `~SelectionDAGISel` virtual so AArch64 backend can work properly	2024-06-04 08:10:58 +08:00
paperchalice	8917afaf0e	Revert "[NewPM][CodeGen] Port selection dag isel to new pass manager" (#94146 ) This reverts commit de37c06f01772e02465ccc9f538894c76d89a7a1 to de37c06f01772e02465ccc9f538894c76d89a7a1 It still breaks EXPENSIVE_CHECKS build. Sorry.	2024-06-02 14:31:52 +08:00
paperchalice	d2cdc8ab45	[NewPM][CodeGen] Port selection dag isel to new pass manager (#83567 ) Port selection dag isel to new pass manager. Only `AMDGPU` and `X86` support new pass version. `-verify-machineinstrs` in new pass manager belongs to verify instrumentation, it is enabled by default.	2024-06-02 09:12:33 +08:00
Jay Foad	990bed64fb	[AMDGPU] New intrinsic llvm.amdgcn.pops.exiting.wave.id (#89612 ) This provides access to the special scalar source value SRC_POPS_EXITING_WAVE_ID on GFX9 and GFX10.	2024-05-22 19:47:59 +01:00
Jay Foad	6eb9e214b3	RFC: [AMDGPU] Check subtarget features for consistency (#86957 ) Implement GCNSubtarget::checkSubtargetFeatures as a canonical place to check subtarget features for consistency and diagnose any inconsistencies. To start with, the implementation just checks that either wavefrontsize32 or wavefrontsize64 is selected. checkSubtargetFeatures is called at the start of instruction selection. This is pretty arbitrary. It is just a convenient point at which we have access to the subtarget that we're going to use for codegenning a particular function.	2024-05-09 11:37:28 +01:00
David Green	601e102bdb	[CodeGen] Use LocationSize for MMO getSize (#84751 ) This is part of #70452 that changes the type used for the external interface of MMO to LocationSize as opposed to uint64_t. This means the constructors take LocationSize, and convert ~UINT64_C(0) to LocationSize::beforeOrAfter(). The getSize methods return a LocationSize. This allows us to be more precise with unknown sizes, not accidentally treating them as unsigned values, and in the future should allow us to add proper scalable vector support but none of that is included in this patch. It should mostly be an NFC. Global ISel is still expected to use the underlying LLT as it needs, and are not expected to see unknown sizes for generic operations. Most of the changes are hopefully fairly mechanical, adding a lot of getValue() calls and protecting them with hasValue() where needed.	2024-03-17 18:15:56 +00:00
Shilei Tian	e963d0740e	[AMDGPU] Replace `isInlinableLiteral16` with specific version (#84402 ) The current implementation of `isInlinableLiteral16` assumes, a 16-bit inlinable literal is either an `i16` or a `fp16`. This is not always true because of `bf16`. However, we can't tell `fp16` and `bf16` apart by just looking at the value. This patch splits `isInlinableLiteral16` into three versions, `i16`, `fp16`, `bf16` respectively, and call the corresponding version.	2024-03-08 14:49:52 -05:00
Sameer Sahasrabuddhe	60822637bf	Restore "Implement convergence control in MIR using SelectionDAG (#71785 )" This restores commit c7fdd8c11e54585dc9d15d63de9742067e0506b9. Previously reverted in f010b1bef4dda2c7082cbb41dbabf1f149cce306. LLVM function calls carry convergence control tokens as operand bundles, where the tokens themselves are produced by convergence control intrinsics. This patch implements convergence control tokens in MIR as follows: 1. Introduce target-independent ISD opcodes and MIR opcodes for convergence control intrinsics. 2. Model token values as untyped virtual registers in MIR. The change also introduces an additional ISD opcode CONVERGENCECTRL_GLUE and a corresponding machine opcode with the same spelling. This glues the convergence control token to SDNodes that represent calls to intrinsics. The glued token is later translated to an implicit argument in the MIR. The lowering of calls to user-defined functions is target-specific. On AMDGPU, the convergence control operand bundle at a non-intrinsic call is translated to an explicit argument to the SI_CALL_ISEL instruction. Post-selection adjustment converts this explicit argument to an implicit argument on the SI_CALL instruction.	2024-03-06 12:19:32 +05:30
Noah Goldstein	61c06775c9	[KnownBits] Add API for `nuw` flag in `computeForAddSub`; NFC	2024-03-05 12:59:58 -06:00
Mitch Phillips	f010b1bef4	Revert "Restore "Implement convergence control in MIR using SelectionDAG (#71785 )"" This reverts commit c7fdd8c11e54585dc9d15d63de9742067e0506b9. Reason: Broke the sanitizer buildbots. See the comments at https://github.com/llvm/llvm-project/pull/71785 for more information.	2024-03-04 17:05:34 +01:00
Sameer Sahasrabuddhe	c7fdd8c11e	Restore "Implement convergence control in MIR using SelectionDAG (#71785 )" Original commit 79889734b940356ab3381423c93ae06f22e772c9. Perviously reverted in commit a2afcd5721869d1d03c8146bae3885b3385ba15e. LLVM function calls carry convergence control tokens as operand bundles, where the tokens themselves are produced by convergence control intrinsics. This patch implements convergence control tokens in MIR as follows: 1. Introduce target-independent ISD opcodes and MIR opcodes for convergence control intrinsics. 2. Model token values as untyped virtual registers in MIR. The change also introduces an additional ISD opcode CONVERGENCECTRL_GLUE and a corresponding machine opcode with the same spelling. This glues the convergence control token to SDNodes that represent calls to intrinsics. The glued token is later translated to an implicit argument in the MIR. The lowering of calls to user-defined functions is target-specific. On AMDGPU, the convergence control operand bundle at a non-intrinsic call is translated to an explicit argument to the SI_CALL_ISEL instruction. Post-selection adjustment converts this explicit argument to an implicit argument on the SI_CALL instruction.	2024-03-04 13:28:04 +05:30
Martin Wehking	4bf06c16fc	Initialize unsigned integer when declared (#81894 ) Initialize ModOpcode directly before the loop execution to silence static analyzer warnings about the usage of an uninitialized variable. This leads to a redundant assignment of ElV2F16 inside the first loop execution, but also avoids superfluous emptiness checks of EltsV2F16 after the first execution of the loop.	2024-02-25 18:26:12 +05:30
Sameer Sahasrabuddhe	a2afcd5721	Revert "Implement convergence control in MIR using SelectionDAG (#71785 )" This reverts commit 79889734b940356ab3381423c93ae06f22e772c9. Encountered multiple buildbot failures.	2024-02-21 11:07:02 +05:30
Sameer Sahasrabuddhe	79889734b9	Implement convergence control in MIR using SelectionDAG (#71785 ) LLVM function calls carry convergence control tokens as operand bundles, where the tokens themselves are produced by convergence control intrinsics. This patch implements convergence control tokens in MIR as follows: 1. Introduce target-independent ISD opcodes and MIR opcodes for convergence control intrinsics. 2. Model token values as untyped virtual registers in MIR. The change also introduces an additional ISD opcode CONVERGENCECTRL_GLUE and a corresponding machine opcode with the same spelling. This glues the convergence control token to SDNodes that represent calls to intrinsics. The glued token is later translated to an implicit argument in the MIR. The lowering of calls to user-defined functions is target-specific. On AMDGPU, the convergence control operand bundle at a non-intrinsic call is translated to an explicit argument to the SI_CALL_ISEL instruction. Post-selection adjustment converts this explicit argument to an implicit argument on the SI_CALL instruction.	2024-02-21 10:06:37 +05:30
Shilei Tian	9c6a2de24b	[AMDGPU] Clean up functions for checking inline literals (#81282 ) This patch cleans up functions for checking inline literals.	2024-02-15 12:11:51 -05:00
Valery Pykhtin	b8025d1482	Reapply "[AMDGPU] Add InstCombine rule for ballot.i64 intrinsic in wave32 mode." (#80303 ) Reapply #71556 with added lit test constraint: `REQUIRES: amdgpu-registered-target`. This reverts commit 9791e5414960f92396582b9e9ee503ac15799312.	2024-02-02 13:09:25 +01:00
Kazu Hirata	8582d41789	[Target] Use SDValue::getConstantOperandVal (NFC)	2024-01-29 18:46:16 -08:00
Mirko Brkušanin	7fdf608cef	[AMDGPU] Add GFX12 WMMA and SWMMAC instructions (#77795 ) Co-authored-by: Petar Avramovic <Petar.Avramovic@amd.com> Co-authored-by: Piotr Sobczak <piotr.sobczak@amd.com>	2024-01-24 13:43:07 +01:00
Jay Foad	ea9d75aa2a	[AMDGPU] Misc formatting fixes. NFC.	2024-01-19 13:50:26 +00:00
Jay Foad	c111dc72e9	[AMDGPU] Allow potentially negative flat scratch offsets on GFX12 (#78193 ) https://github.com/llvm/llvm-project/pull/70634 has disabled use of potentially negative scratch offsets, but we can use it on GFX12. --------- Co-authored-by: Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>	2024-01-18 10:02:40 +00:00
Valery Pykhtin	9791e54149	Revert "[AMDGPU] Add InstCombine rule for ballot.i64 intrinsic in wave32 mode." (#78429 ) Reverts llvm/llvm-project#71556 Fixes failures: https://lab.llvm.org/buildbot/#/builders/188/builds/40541 https://lab.llvm.org/buildbot/#/builders/91/builds/21847 https://lab.llvm.org/buildbot/#/builders/98/builds/31671 https://lab.llvm.org/buildbot/#/builders/139/builds/57289	2024-01-17 14:12:07 +01:00
Jay Foad	4a77414660	[AMDGPU] CodeGen for GFX12 8/16-bit SMEM loads (#77633 )	2024-01-17 10:28:03 +00:00
Valery Pykhtin	57b50ef017	[AMDGPU] Add InstCombine rule for ballot.i64 intrinsic in wave32 mode. (#71556 ) Substitute with zero-extended to i64 ballot.i32 intrinsic.	2024-01-17 17:02:05 +07:00
Mirko Brkušanin	2adbf254a1	[AMDGPU][NFC] Rename DotIUVOP3PMods to VOP3PModsNeg (#77785 ) This is used to select the source modifier (neg) from the immediate operand. After a follow up commit this will no longer be DOTIU specific. Co-authored-by: Changpeng Fang <changpeng.fang@amd.com>	2024-01-12 10:57:24 +01:00
Kazu Hirata	1e05236dbd	[Target] Use isNullConstant (NFC)	2024-01-10 20:25:24 -08:00

1 2 3 4 5 ...

413 Commits