llvm-project

Author	SHA1	Message	Date
Stanislav Mekhanoshin	c6e560a25b	[AMDGPU] Select scale_offset for scratch instructions on gfx1250 (#150111 )	2025-07-22 15:24:55 -07:00
Stanislav Mekhanoshin	a0973de745	[AMDGPU] Select scale_offset for global instructions on gfx1250 (#150107 ) Also switches immediate offset to signed for the subtarget.	2025-07-22 15:04:52 -07:00
Stanislav Mekhanoshin	a0aebb1935	[AMDGPU] Select scale_offset with SMEM instructions (#150078 )	2025-07-22 13:26:28 -07:00
Chris Jackson	b3e016e05f	Revert "[AMDGPU] Recognise bitmask operations as srcmods" (#150000 ) Reverts llvm/llvm-project#149110 due to various buildbot failures.	2025-07-22 12:16:03 +01:00
Chris Jackson	c51b48be47	[AMDGPU] Recognise bitmask operations as srcmods on integer types (#149110 ) Add to the VOP patterns to recognise when or/xor/and are masking only the most significant bit of i32/v2i32/i64 and replace with the appropriate source modifier.	2025-07-22 11:35:17 +01:00
Stanislav Mekhanoshin	cfa918bec1	[AMDGPU] Select flat GVS atomics on gfx1250 (#149554 )	2025-07-18 12:31:29 -07:00
Changpeng Fang	868793fa8e	AMDGPU: Support intrinsic selection for gfx1250 wmma instructions (#148957 ) Co-authored-by: Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com> Co-authored-by: Shilei Tian <Shilei.Tian@amd.com>	2025-07-15 15:25:05 -07:00
Stanislav Mekhanoshin	a32040e483	[AMDGPU] Use 64-bit literals in codegen on gfx1250 (#148727 )	2025-07-14 15:47:24 -07:00
Kazu Hirata	f1791c0ae3	[AMDGPU] Remove unnecessary casts (NFC) (#148340 ) getRegisterInfo() already returns const SIRegisterInfo . Likewise, getInstrInfo() already returns const SIInstrInfo .	2025-07-12 11:28:41 -07:00
Fabian Ritter	a1edb1dbc6	[AMDGPU] Fix broken uses of isLegalFLATOffset and splitFlatOffset (#147469 ) The last parameter of these functions used to be `Signed`, and it looks like a few calls weren't updated when that was changed to `FlatVariant`. Effectively, the functions were called with `FlatVariant=SALU` due to integer promotions, which doesn't make any sense.	2025-07-08 11:18:36 +02:00
Matt Arsenault	dd4776d429	AMDGPU: Remove AMDGPUInstrInfo class (#144984 ) This was never constructed and only provided one static helper function.	2025-06-20 18:26:56 +09:00
Brox Chen	cd6c4b6103	[AMDGPU][True16][CodeGen] optimize codegen for mad-mix in true16 (#124995 ) remove unnecessary COPY for SDAG for mad-mix pattern	2025-05-05 23:08:03 -04:00
Kazu Hirata	aa15596b5f	[llvm] Remove unused local variables (NFC) (#138478 )	2025-05-04 21:33:54 -07:00
Mariusz Sikora	6b47bba440	[AMDGPU] Add intrinsics and MIs for ds_bvh_stack_* (#130007 ) New intrinsics / instructions : int_amdgcn_ds_bvh_stack_push4_pop1_rtn / ds_bvh_stack_push4_pop1_rtn_b32 int_amdgcn_ds_bvh_stack_push8_pop1_rtn / ds_bvh_stack_push8_pop1_rtn_b32 int_amdgcn_ds_bvh_stack_push8_pop2_rtn / ds_bvh_stack_push8_pop2_rtn_b64 Co-authored-by: Mateja Marjanovic <mateja.marjanovic@amd.com>	2025-03-17 09:13:20 +01:00
Shilei Tian	dccc0a836c	[NFC][AMDGPU] Replace more direct arch comparison with isAMDGCN() (#131379 ) This is an extension of #131357. Hopefully this would be the last one.	2025-03-14 17:02:15 -04:00
Shilei Tian	51c706c119	[NFC][AMDGPU] Replace direct arch comparison with `isAMDGCN()` (#131357 )	2025-03-14 14:21:44 -04:00
Matt Arsenault	e28e93550a	AMDGPU: Make vector_shuffle legal for v2i32 with v_pk_mov_b32 (#123684 ) For VALU shuffles, this saves an instruction in some cases.	2025-01-23 20:58:02 +07:00
Jay Foad	f33e3d422d	[AMDGPU] Fix DAG types for V_MAD_I64_I32 and V_MAD_U64_U32. NFC. (#123629 ) These instructions return a 64-bit result and a 1-bit carry, unlike smul_lohi and umul_lohi which return a pair of 32-bit results. This does not appear to make any difference in practice because the DAG types are not used for anything before these nodes are converted to MachineInstrs.	2025-01-20 16:29:23 +00:00
jofrn	c8bbbaa5c7	[SelectionDAG][AMDGPU] Negative offset when selecting scratch sv offsets (#122251 ) APInt will fail when given a negative offset. SelectScratchSVAddr utilizes this function and can be given a negative offset as well, so this change modifies it to use APSInt instead.	2025-01-15 06:56:28 -05:00
Changpeng Fang	68694259b2	AMDGPU: Use getSignedTargetConstant for ImmOffset in SelectScratchSVAddr (#121978 ) ImmOffset is signed and we will hit an assert with negative ImmOffset when getTargetConstant is used. Fixes: SWDEV-506453	2025-01-07 12:02:18 -08:00
Konstantina Mitropoulou	d3508ccd15	[AMDGPU] Emit S_CBRANCH_SCC for floating-point conditions. (#120588 ) - [AMDGPU] Add new test. - [AMDGPU] Emit S_CBRANCH_SCC for floating-point conditions. --------- Co-authored-by: Konstantina Mitropoulou <KonstantinaMitropoulou@amd.com>	2024-12-19 11:20:43 -08:00
Craig Topper	f139bde8d8	[SelectionDAG] Move SDNode::use_iterator::getOperandNo to SDUse. (#120536 ) This allows us to write more range based for loops because we no longer need the iterator. It also matches IR's Use class.	2024-12-19 09:07:42 -08:00
Craig Topper	e6b2495545	[SelectionDAG] Split SDNode::use_iterator into user_iterator and use_iterator. (#120531 ) SDNode::use_iterator now returns an SDUse& when dereferenced. SDNode::user_iterator returns SDNode*. SDNode::use_begin/use_end/uses work on use_iterator. SDNode::user_begin/user_end/users work on user_iterator. We can now write range based for loops using SDUse& and SDNode::uses(). I've converted many of these in this patch. I didn't update loops that have additional variables updated in their for statement. Some loops use SDNode::use_iterator::getOperandNo() which also prevents using range based for loops. I plan to move this into SDUse in a follow up patch.	2024-12-19 08:35:32 -08:00
Jay Foad	a161e73fcc	[AMDGPU] Remove unnecessary casts to GCNSubtarget	2024-12-19 15:50:53 +00:00
Matt Arsenault	b4a16a78c2	AMDGPU: Match and Select BITOP3 on gfx950 (#117843 ) Co-authored-by: Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>	2024-11-27 01:31:19 -05:00
Nikita Popov	3317c9ceac	[AMDGPU] Use getSignedConstant() where necessary (#117328 ) Create signed constant using getSignedConstant(), to avoid future assertion failures when we disable implicit truncation in getConstant(). This also touches some generic legalization code, which apparently only AMDGPU tests.	2024-11-25 09:49:34 +01:00
Matt Arsenault	d1cca3133a	AMDGPU: Add v_permlane16_swap_b32 and v_permlane32_swap_b32 for gfx950 (#117260 ) This was a bit annoying because these introduce a new special case encoding usage. op_sel is repurposed as a subset of dpp controls, and is eligible for VOP3->VOP1 shrinking. For some reason fi also uses an enum value, so we need to convert the raw boolean to 1 instead of -1. The 2 registers are swapped, so this has 2 defs. Ideally the builtin would return a pair, but that's difficult so return a vector instead. This would make a hypothetical builtin that supports v2f16 directly uglier.	2024-11-22 20:12:50 -08:00
Changpeng Fang	e3e7c756fb	AMDGPU: Update pattern matching from "x&(-1>>(32-y))" to "bfe x, 0, y" (#116115 ) It is not correct to lower "x&(-1>>(32-y))" to "bfe x, 0, y". When y equals 32, "-1" is not shifted, so "x&(-1>>(32-32))" is still x, but "bfe x, 0, 32" is 0. However, if we know y has at most 5 bits (< 32), we can still do the pattern matching.	2024-11-14 12:21:34 -08:00
Kazu Hirata	be187369a0	[AMDGPU] Remove unused includes (NFC) (#116154 ) Identified with misc-include-cleaner.	2024-11-13 21:10:03 -08:00
Changpeng Fang	9778fc76e3	Revert "AMDGPU: Don't avoid clamp of bit shift in BFE pattern (#115372 )" (#116091 ) Based on the suggestion from https://github.com/llvm/llvm-project/pull/115543, we should not do the pattern matching from x << (32-y) >> (32-y) to "bfe x, 0, y" at all. This reverts commits a2bacf8ab58af4c1a0247026ea131443d6066602 and `bdf8e308b7`.	2024-11-13 14:01:21 -08:00
Changpeng Fang	bdf8e308b7	AMDGPU: Don't avoid clamp of bit shift in BFE pattern (#115372 ) Enable pattern matching from "x<<32-y>>32-y" to "bfe x, 0, y" when we know y is in [0,31]. This is the follow-up for the PR: https://github.com/llvm/llvm-project/pull/114279 to fix the issue: https://github.com/llvm/llvm-project/issues/114282	2024-11-07 14:15:33 -08:00
Jay Foad	8d13e7b8c3	[AMDGPU] Qualify auto. NFC. (#110878 ) Generated automatically with: $ clang-tidy -fix -checks=-*,llvm-qualified-auto $(find lib/Target/AMDGPU/ -type f)	2024-10-03 13:07:54 +01:00
Petar Avramovic	83fe85115d	AMDGPU: Fix inst-selection of large scratch offsets with sgpr base (#110256 ) Use i32 for offset instead of i16, this way it does not get interpreted as negative 16 bit offset.	2024-09-30 10:44:59 +02:00
Jay Foad	73b8074e68	[AMDGPU] Do not use APInt for simple 64-bit arithmetic. NFC. (#109414 )	2024-09-20 13:45:04 +01:00
Nikita Popov	cee0bf9626	[AMDGPU] Use Lo_32 and Hi_32 helpers (NFC) (#109413 )	2024-09-20 14:35:38 +02:00
Diana Picus	3356208531	Reland "[amdgpu] Add llvm.amdgcn.init.whole.wave intrinsic" (#108512 ) This reverts commit `7792b4ae79`. The problem was a conflict with `e55d6f5ea2` "[AMDGPU] Simplify and improve codegen for llvm.amdgcn.set.inactive (https://github.com/llvm/llvm-project/pull/107889)" which changed the syntax of V_SET_INACTIVE (and thus made my MIR test crash). ...if only we had a merge queue.	2024-09-13 11:54:30 +02:00
Diana Picus	7792b4ae79	Revert "Reland "[amdgpu] Add llvm.amdgcn.init.whole.wave intrinsic" (#108054 )"" (#108341 ) Reverts llvm/llvm-project#108173 si-init-whole-wave.mir crashes on some buildbots (although it passed both locally with sanitizers enabled and in pre-merge tests). Investigating.	2024-09-12 10:12:09 +02:00
Diana Picus	703ebca869	Reland "[amdgpu] Add llvm.amdgcn.init.whole.wave intrinsic" (#108054 )" (#108173 ) This reverts commit `c7a7767fca`. The buildbots failed because I removed a MI from its parent before updating LIS. This PR should fix that.	2024-09-12 09:11:41 +02:00
Vitaly Buka	c7a7767fca	Revert "[amdgpu] Add llvm.amdgcn.init.whole.wave intrinsic" (#108054 ) Breaks bots, see #105822. Reverts llvm/llvm-project#105822	2024-09-10 09:51:43 -07:00
Diana Picus	44556e64f2	[amdgpu] Add llvm.amdgcn.init.whole.wave intrinsic (#105822 ) This intrinsic is meant to be used in functions that have a "tail" that needs to be run with all the lanes enabled. The "tail" may contain complex control flow that makes it unsuitable for the use of the existing WWM intrinsics. Instead, we will pretend that the function starts with all the lanes enabled, then branches into the actual body of the function for the lanes that were meant to run it, and then finally all the lanes will rejoin and run the tail. As such, the intrinsic will return the EXEC mask for the body of the function, and is meant to be used only as part of a very limited pattern (for now only in amdgpu_cs_chain functions): ``` entry: %func_exec = call i1 @llvm.amdgcn.init.whole.wave() br i1 %func_exec, label %func, label %tail func: ; ... stuff that should run with the actual EXEC mask br label %tail tail: ; ... stuff that runs with all the lanes enabled; ; can contain more than one basic block ``` It's an error to use the result of this intrinsic for anything other than a branch (but unfortunately checking that in the verifier is non-trivial because SIAnnotateControlFlow will introduce an amdgcn.if between the intrinsic and the branch). The intrinsic is lowered to a SI_INIT_WHOLE_WAVE pseudo, which for now is expanded in si-wqm (which is where SI_INIT_EXEC is handled too); however the information that the function was conceptually started in whole wave mode is stored in the machine function info (hasInitWholeWave). This will be useful in prolog epilog insertion, where we can skip saving the inactive lanes for CSRs (since if the function started with all the lanes active, then there are no inactive lanes to preserve).	2024-09-10 13:24:53 +02:00
Juan Manuel Martinez Caamaño	cbf34a5f77	[AMDGPU] Remove dead pass: AMDGPUMachineCFGStructurizer (#105645 )	2024-08-23 14:06:17 +02:00
Simon Pilgrim	11ba72e651	[KnownBits] Add KnownBits::add and KnownBits::sub helper wrappers. (#99468 )	2024-08-12 10:21:28 +01:00
Matt Arsenault	dd094b2647	NewPM/AMDGPU: Port AMDGPUPerfHintAnalysis to new pass manager (#102645 ) This was much more difficult than I anticipated. The pass is not in a good state, with poor test coverage. The legacy PM does seem to be relying on maintaining the map state between different SCCs, which seems bad. The pass is going out of its way to avoid putting the attributes it introduces onto non-callee functions. If it just added them, we could use them directly instead of relying on the map, I would think. The NewPM path uses a ModulePass; I'm not sure if we should be using CGSCC here but there seems to be some missing infrastructure to support backend defined ones.	2024-08-11 15:11:10 +04:00
Kazu Hirata	f4fb735840	[llvm] Construct SmallVector<SDValue> with ArrayRef (NFC) (#102578 )	2024-08-09 09:15:42 -07:00
Ivan Kosarev	430cf6537b	[AMDGPU][NFCI] Declare offset0/1 operands to be i32. (#100560 ) Being of type i8 makes them signed, which they aren't, and requires extra work masking them on verbalisation. Part of <https://github.com/llvm/llvm-project/issues/62629>.	2024-07-25 14:32:19 +01:00
Jay Foad	0ce3ea1bff	[AMDGPU] Simplify selection of llvm.amdgcn.inverse.ballot. NFCI. (#99345 )	2024-07-18 07:45:13 +01:00
Jay Foad	bf536cc7db	[AMDGPU] Fix unwanted LICM/CSE of llvm.amdgcn.pops.exiting.wave.id (#96190 ) Mark both the intrinsic and the selected MachineInstr as having side effects to prevent MachineLICM and MachineCSE from moving/removing them.	2024-06-27 09:27:52 +01:00
vangthao95	3aef525aa4	[AMDGPU] Fix negative immediate offset for unbuffered smem loads (#89165 ) For unbuffered smem loads, it is illegal for the immediate offset to be negative if the resulting IOFFSET + (SGPR[Offset] or M0 or zero) is negative. New PR of https://github.com/llvm/llvm-project/pull/79553.	2024-06-24 14:18:23 -07:00
Jay Foad	90779fdc19	[AMDGPU] Preserve chain when selecting llvm.amdgcn.pops.exiting.wave.id (#96167 ) Without this SelectionDAG could fail assertions when using the intrinsic in a non-entry BB.	2024-06-20 12:30:34 +01:00
Matt Arsenault	8520061281	AMDGPU: Support local atomicrmw fmin/fmax for float/double (#95590 ) This has always been supported. Somehow, we ended up with 2 copies of clang builtins for this case, and the newer one erroneously requires gfx8-insts.	2024-06-18 18:34:34 +02:00
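
The y = 32 corner case described in e3e7c756fb (#116115) can be checked with a small host-side model. This is only a sketch: `bfe_u32_model` and `mask_low_bits` are hypothetical helper names, not LLVM or hardware APIs, and the model assumes that the bitfield extract uses only the low 5 bits of its width operand, which is what the commit message implies when it says "bfe x, 0, 32" is 0.

```cpp
#include <cstdint>

// Hypothetical model of an unsigned bitfield extract. Assumption: only the
// low 5 bits of offset and width are used, so a width of 32 behaves like 0.
inline uint32_t bfe_u32_model(uint32_t x, uint32_t offset, uint32_t width) {
  const uint32_t w = width & 31u;
  const uint32_t mask = (1u << w) - 1u; // w == 0 gives an empty mask
  return (x >> (offset & 31u)) & mask;
}

// The source pattern x & (-1 >> (32 - y)). Valid for y in [1, 32] only:
// y == 0 would shift by 32, which is undefined in C++.
inline uint32_t mask_low_bits(uint32_t x, uint32_t y) {
  return x & (0xFFFFFFFFu >> (32u - y));
}

// True when the two forms produce the same value for this x and y.
inline bool forms_agree(uint32_t x, uint32_t y) {
  return mask_low_bits(x, y) == bfe_u32_model(x, 0, y);
}
```

Under these assumptions the two forms agree for every y in [1, 31] and differ at y = 32 whenever x is nonzero, which is why the pattern is only matched when y is known to be below 32.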

438 Commits