llvm-project

Author	SHA1	Message	Date
Matt Arsenault	3e6f3508ad	AMDGPU: Add v_smfmac_i32_32x32x64_i8 for gfx950 (#117214 )	2024-11-21 15:01:03 -08:00
Matt Arsenault	8c53036146	AMDGPU: Add v_smfmac_i32_16x16x128_i8 for gfx950 (#117213 )	2024-11-21 14:58:11 -08:00
Matt Arsenault	42dd114a46	AMDGPU: Add v_smfmac_f32_32x32x32_bf16 for gfx950 (#117212 )	2024-11-21 14:52:11 -08:00
Matt Arsenault	95ddc1a63b	AMDGPU: Add v_smfmac_f32_16x16x64_bf16 for gfx950 (#117211 )	2024-11-21 14:46:43 -08:00
Matt Arsenault	e50eaa2cf1	AMDGPU: Add v_smfmac_f32_32x32x32_f16 for gfx950 (#117205 )	2024-11-21 14:43:33 -08:00
Matt Arsenault	2ab178820b	AMDGPU: Add v_smfmac_f32_16x16x64_f16 for gfx950 (#117202 )	2024-11-21 14:40:30 -08:00
Matt Arsenault	01c9a14ccf	AMDGPU: Define v_mfma_f32_{16x16x128\|32x32x64}_f8f6f4 instructions (#116723 ) These use a new VOP3PX encoding for the v_mfma_scale_* instructions, which bundles the pre-scale v_mfma_ld_scale_b32. None of the modifiers are supported yet (op_sel, neg or clamp). I'm not sure the intrinsic should really expose op_sel (or any of the others). If I'm reading the documentation correctly, we should be able to just have the raw scale operands and auto-match op_sel to byte extract patterns. The op_sel syntax also seems extra horrible in this usage, especially with the usual assumed op_sel_hi=-1 behavior.	2024-11-21 08:51:58 -08:00
Jay Foad	ade0750e35	[AMDGPU] Fix some cache policy checks for GFX12+ (#116396 ) Fix coding errors found by inspection and check that the swz bit still serves to prevent merging of buffer loads/stores on GFX12+.	2024-11-21 08:22:59 +00:00
Matt Arsenault	927032807d	AMDGPU: Handle gfx950 96/128-bit buffer_load_lds (#116681 ) Enforcing this limit in the clang builtin will come later.	2024-11-18 22:01:56 -08:00
Matt Arsenault	50224bd5ba	AMDGPU: Handle gfx950 global_load_lds_* instructions (#116680 ) Define global_load_lds_dwordx3 and global_load_lds_dwordx4. Oddly it seems dwordx2 was skipped.	2024-11-18 21:58:02 -08:00
Matt Arsenault	9eefa922f8	AMDGPU/GlobalISel: Remove getVRegDef null checks in selector (#115530 ) We should be able to assume every virtual register is defined.	2024-11-11 12:58:06 -08:00
Kazu Hirata	10b80ff0cc	[Target] Migrate away from PointerUnion::{is,get,dyn_cast} (NFC) (#115623 ) Note that PointerUnion::{is,get,dyn_cast} have been soft deprecated in PointerUnion.h: // FIXME: Replace the uses of is(), get() and dyn_cast() with // isa<T>, cast<T> and the llvm::dyn_cast<T>	2024-11-09 17:22:57 -08:00
Gang Chen	8c752900dd	[AMDGPU] modify named barrier builtins and intrinsics (#114550 ) Use a local pointer type to represent the named barrier in builtin and intrinsic. This makes the definitions more user friendly because users do not need to worry about the hardware ID assignment. This approach is also closer to other popular GPU programming languages. Named barriers should be represented as global variables of addrspace(3) in LLVM-IR. The compiler assigns the special LDS offsets for those variables during the AMDGPULowerModuleLDS pass. Those addresses are converted to hw barrier IDs during instruction selection. The rest of the instruction-selection changes are primarily due to the intrinsic-definition changes.	2024-11-06 10:37:22 -08:00
Rahul Joshi	fa789dffb1	[NFC] Rename `Intrinsic::getDeclaration` to `getOrInsertDeclaration` (#111752 ) Rename the function to reflect its correct behavior and to be consistent with `Module::getOrInsertFunction`. This is also in preparation of adding a new `Intrinsic::getDeclaration` that will have behavior similar to `Module::getFunction` (i.e, just lookup, no creation).	2024-10-11 05:26:03 -07:00
Petar Avramovic	7b0d56be1d	AMDGPU/GlobalISel: Fix inst-selection of ballot (#109986 ) Both input and output of ballot are lane-masks: the result is a lane-mask with 'S32/S64 LLT and SGPR bank'; the input is a lane-mask with 'S1 LLT and VCC reg bank'. Ballot copies bits from the input lane-mask for all active lanes and puts 0 for inactive lanes. GlobalISel did not set 0 in the result for inactive lanes for non-constant input.	2024-10-11 11:40:27 +02:00
Matt Arsenault	c36f902372	AMDGPU/GlobalISel: Insert m0 initialization before sextload/zextload (#111720 ) Fixes missing m0 initialization for pre-gfx9 targets with local extending loads.	2024-10-10 14:01:49 +04:00
Shilei Tian	48ac846fbc	[AMDGPU][GlobalISel] Align `selectVOP3PMadMixModsImpl` with the `SelectionDAG` counterpart (#110168 ) The current `selectVOP3PMadMixModsImpl` can produce `V_MAD_FIX_F32` instruction that violates constant bus restriction, while its `SelectionDAG` counterpart doesn't. The culprit is in the copy stripping while the `SelectionDAG` version only has a bitcast stripping. This PR simply aligns the two versions.	2024-10-08 09:41:24 -04:00
Jay Foad	8d13e7b8c3	[AMDGPU] Qualify auto. NFC. (#110878 ) Generated automatically with: $ clang-tidy -fix -checks=-*,llvm-qualified-auto $(find lib/Target/AMDGPU/ -type f)	2024-10-03 13:07:54 +01:00
Nikita Popov	cee0bf9626	[AMDGPU] Use Lo_32 and Hi_32 helpers (NFC) (#109413 )	2024-09-20 14:35:38 +02:00
Diana Picus	3356208531	Reland "[amdgpu] Add llvm.amdgcn.init.whole.wave intrinsic" (#108512 ) This reverts commit `7792b4ae79`. The problem was a conflict with `e55d6f5ea2` "[AMDGPU] Simplify and improve codegen for llvm.amdgcn.set.inactive (https://github.com/llvm/llvm-project/pull/107889)" which changed the syntax of V_SET_INACTIVE (and thus made my MIR test crash). ...if only we had a merge queue.	2024-09-13 11:54:30 +02:00
Diana Picus	7792b4ae79	Revert "Reland "[amdgpu] Add llvm.amdgcn.init.whole.wave intrinsic" (#108054 )"" (#108341 ) Reverts llvm/llvm-project#108173 si-init-whole-wave.mir crashes on some buildbots (although it passed both locally with sanitizers enabled and in pre-merge tests). Investigating.	2024-09-12 10:12:09 +02:00
Diana Picus	703ebca869	Reland "[amdgpu] Add llvm.amdgcn.init.whole.wave intrinsic" (#108054 )" (#108173 ) This reverts commit `c7a7767fca`. The buildbots failed because I removed a MI from its parent before updating LIS. This PR should fix that.	2024-09-12 09:11:41 +02:00
Vitaly Buka	c7a7767fca	Revert "[amdgpu] Add llvm.amdgcn.init.whole.wave intrinsic" (#108054 ) Breaks bots, see #105822. Reverts llvm/llvm-project#105822	2024-09-10 09:51:43 -07:00
Diana Picus	44556e64f2	[amdgpu] Add llvm.amdgcn.init.whole.wave intrinsic (#105822 ) This intrinsic is meant to be used in functions that have a "tail" that needs to be run with all the lanes enabled. The "tail" may contain complex control flow that makes it unsuitable for the use of the existing WWM intrinsics. Instead, we will pretend that the function starts with all the lanes enabled, then branches into the actual body of the function for the lanes that were meant to run it, and then finally all the lanes will rejoin and run the tail. As such, the intrinsic will return the EXEC mask for the body of the function, and is meant to be used only as part of a very limited pattern (for now only in amdgpu_cs_chain functions): ``` entry: %func_exec = call i1 @llvm.amdgcn.init.whole.wave() br i1 %func_exec, label %func, label %tail func: ; ... stuff that should run with the actual EXEC mask br label %tail tail: ; ... stuff that runs with all the lanes enabled; ; can contain more than one basic block ``` It's an error to use the result of this intrinsic for anything other than a branch (but unfortunately checking that in the verifier is non-trivial because SIAnnotateControlFlow will introduce an amdgcn.if between the intrinsic and the branch). The intrinsic is lowered to a SI_INIT_WHOLE_WAVE pseudo, which for now is expanded in si-wqm (which is where SI_INIT_EXEC is handled too); however the information that the function was conceptually started in whole wave mode is stored in the machine function info (hasInitWholeWave). This will be useful in prolog epilog insertion, where we can skip saving the inactive lanes for CSRs (since if the function started with all the lanes active, then there are no inactive lanes to preserve).	2024-09-10 13:24:53 +02:00
Stanislav Mekhanoshin	0745219d4a	[AMDGPU] Add target intrinsic for s_buffer_prefetch_data (#107293 )	2024-09-06 11:41:21 -07:00
Stanislav Mekhanoshin	bd840a4004	[AMDGPU] Add target intrinsic for s_prefetch_data (#107133 )	2024-09-05 15:14:31 -07:00
Changpeng Fang	26b0bef192	AMDGPU: Use pattern to select instruction for intrinsic llvm.fptrunc.round (#105761 ) Use GCNPat instead of Custom Lowering to select instructions for intrinsic llvm.fptrunc.round. "SupportedRoundMode : TImmLeaf" is used as a predicate to select only when the rounding mode is supported. "as_hw_round_mode : SDNodeXForm" is developed to translate the round modes to the corresponding ones that hardware recognizes.	2024-08-29 11:43:58 -07:00
Juan Manuel Martinez Caamaño	cbf34a5f77	[AMDGPU] Remove dead pass: AMDGPUMachineCFGStructurizer (#105645 )	2024-08-23 14:06:17 +02:00
Brox Chen	afd42fb303	[AMDGPU][True16][CodeGen] Support AND/OR/XOR and LDEXP True16 format (#102620 ) Support AND/OR/XOR true16 and LDEXP true/fake16 format. These instructions were previously implemented with the fake16 profile. Fixing the implementation. Added an RA hint so that when using a 16bit register in a 32bit instruction, try to use the register directly without an extra 16bit move. --------- Co-authored-by: guochen2 <guochen2@amd.com>	2024-08-13 12:23:39 -04:00
Simon Pilgrim	11ba72e651	[KnownBits] Add KnownBits::add and KnownBits::sub helper wrappers. (#99468 )	2024-08-12 10:21:28 +01:00
Petar Avramovic	269cefbc02	AMDGPU/GlobalISel: Fix isExtractHiElt when selecting fma_mix (#102130 ) isExtractHiElt should return new source register instead of returning instruction that defines it. Src = MI.getOperand(0).getReg() is not correct when MI(for example G_UNMERGE_VALUES) defines multiple registers. Refactor existing code to work with source registers only.	2024-08-07 12:13:39 +02:00
Matt Arsenault	42d641ef5c	AMDGPU/GlobalISel: Select all constants in tablegen (#100788 ) This regresses the arbitrary address space pointer case. Ideally we could write a pattern that matches a pointer based only on its size, but using iPTR/iPTRAny seem to not work for this.	2024-07-30 18:31:18 +04:00
Matt Arsenault	b356aa3e2d	AMDGPU/GlobalISel: Partially move constant selection to patterns (#100786 ) This is still relying on the manual code for splitting 64-bit constants, and handling pointers. We were missing some of the tablegen patterns for all immediate types, so this has some side effect DAG path improvements. This also reduces the diff in the 2 selector outputs.	2024-07-30 18:18:16 +04:00
Matt Arsenault	3113bfbfa7	AMDGPU/GlobalISel: Use getSubRegFromChannel (#100732 )	2024-07-26 18:35:35 +04:00
Carl Ritson	62aa596ba1	[AMDGPU] Add no return image_sample intrinsics and instructions (#97542 ) An appropriately configured image resource descriptor can trigger image_sample instructions to store outputs directly to a linked memory location instead of returning to VGPRs. This is opaque to the backend as instruction encoding is unchanged; however, a mechanism is required to allow frontends to communicate that these instructions do not require destination VGPRs and store to memory. Flagging these as stores means they will not be optimized away.	2024-07-20 17:26:58 +09:00
Jay Foad	0ce3ea1bff	[AMDGPU] Simplify selection of llvm.amdgcn.inverse.ballot. NFCI. (#99345 )	2024-07-18 07:45:13 +01:00
Matt Arsenault	2ff22d7485	AMDGPU/GlobalISel: Reorganize select switch cases	2024-06-30 10:28:58 +02:00
Jay Foad	bf536cc7db	[AMDGPU] Fix unwanted LICM/CSE of llvm.amdgcn.pops.exiting.wave.id (#96190 ) Mark both the intrinsic and the selected MachineInstr as having side effects to prevent MachineLICM and MachineCSE from moving/removing them.	2024-06-27 09:27:52 +01:00
vangthao95	3aef525aa4	[AMDGPU] Fix negative immediate offset for unbuffered smem loads (#89165 ) For unbuffered smem loads, it is illegal for the immediate offset to be negative if the resulting IOFFSET + (SGPR[Offset] or M0 or zero) is negative. New PR of https://github.com/llvm/llvm-project/pull/79553.	2024-06-24 14:18:23 -07:00
Matt Arsenault	8520061281	AMDGPU: Support local atomicrmw fmin/fmax for float/double (#95590 ) This has always been supported. Somehow, we ended up with 2 copies of clang builtins for this case, and the newer one erroneously requires gfx8-insts.	2024-06-18 18:34:34 +02:00
Matt Arsenault	08d168c56d	AMDGPU/GlobalISel: Use correct type for intrinsic ID	2024-05-30 14:31:19 +02:00
Jay Foad	4e0f8a4919	[AMDGPU] Fix EXPENSIVE_CHECKS failure in #89612	2024-05-23 12:22:25 +01:00
Jay Foad	990bed64fb	[AMDGPU] New intrinsic llvm.amdgcn.pops.exiting.wave.id (#89612 ) This provides access to the special scalar source value SRC_POPS_EXITING_WAVE_ID on GFX9 and GFX10.	2024-05-22 19:47:59 +01:00
Jay Foad	6eb9e214b3	RFC: [AMDGPU] Check subtarget features for consistency (#86957 ) Implement GCNSubtarget::checkSubtargetFeatures as a canonical place to check subtarget features for consistency and diagnose any inconsistencies. To start with, the implementation just checks that either wavefrontsize32 or wavefrontsize64 is selected. checkSubtargetFeatures is called at the start of instruction selection. This is pretty arbitrary. It is just a convenient point at which we have access to the subtarget that we're going to use for codegenning a particular function.	2024-05-09 11:37:28 +01:00
David Stuttard	75e528fdd9	[AMDGPU] Extend zero initialization of return values for TFE (#85759 ) buffer_load instructions that use TFE also need to zero initialize return values similar to how the image instructions currently work. Add support for this with standard zero init of all results + zero init of just TFE flag when enable-prt-strict-null subtarget feature is disabled.	2024-03-25 09:01:46 +00:00
Noah Goldstein	61c06775c9	[KnownBits] Add API for `nuw` flag in `computeForAddSub`; NFC	2024-03-05 12:59:58 -06:00
Petar Avramovic	0d572c41f9	AMDGPU\GlobalISel: remove amdgpu-global-isel-risky-select flag (#83426 ) AMDGPUInstructionSelector should no longer attempt to select S1 G_PHIs. Remove MIR test that attempts to inst-select divergent vcc(S1) G_PHI. Lane mask merging algorithm for GlobalISel is now responsible for selecting divergent S1 G_PHIs in AMDGPUGlobalISelDivergenceLowering. Uniform S1 G_PHIs should be lowered to S32 G_PHIs in reg bank select pass. In summary S1 G_PHIs should not reach AMDGPUInstructionSelector.	2024-02-29 15:38:54 +01:00
Martin Wehking	4bf06c16fc	Initialize unsigned integer when declared (#81894 ) Initialize ModOpcode directly before the loop execution to silence static analyzer warnings about the usage of an uninitialized variable. This leads to a redundant assignment of ElV2F16 inside the first loop execution, but also avoids superfluous emptiness checks of EltsV2F16 after the first execution of the loop.	2024-02-25 18:26:12 +05:30
Shilei Tian	9c6a2de24b	[AMDGPU] Clean up functions for checking inline literals (#81282 ) This patch cleans up functions for checking inline literals.	2024-02-15 12:11:51 -05:00
Petar Avramovic	06f711a906	AMDGPU/GlobalISelDivergenceLowering: select divergent i1 phis (#80003 ) Implement PhiLoweringHelper for GlobalISel in DivergenceLoweringHelper. Use machine uniformity analysis to find divergent i1 phis and select them as lane mask phis in the same way SILowerI1Copies selects VReg_1 phis. Note that divergent i1 phis include phis created by LCSSA, and all cases of uses outside of a cycle are actually covered by "lowering LCSSA phis". GlobalISel lane masks are registers with sgpr register class and S1 LLT. TODO: The general goal is that instructions created in this pass are fully instruction-selected so that selection of lane mask phis is not split across multiple passes. patch 3 from: https://github.com/llvm/llvm-project/pull/73337	2024-02-05 14:07:01 +01:00

563 Commits