llvm-project

Author	SHA1	Message	Date
Diana Picus	a910a6a8b5	[AMDGPU] AsmPrinter: Unify arg handling (#151672) When computing the number of registers required by entry functions, the `AMDGPUAsmPrinter` needs to take into account both the register usage computed by the `AMDGPUResourceUsageAnalysis` pass, and the number of registers initialized by the hardware. At the moment, the way it computes the latter is different for graphics vs compute, due to differences in the implementation. For kernels, all the information needed is available in the `SIMachineFunctionInfo`, but for graphics shaders we would iterate over the `Function` arguments in the `AMDGPUAsmPrinter`. This pretty much repeats some of the logic from instruction selection. This patch introduces 2 new members to `SIMachineFunctionInfo`, one for SGPRs and one for VGPRs. Both will be computed during instruction selection and then used during `AMDGPUAsmPrinter`, removing the need to refer to the `Function` when printing assembly. This patch is NFC except for the fact that we now add the extra SGPRs (VCC, XNACK etc) to the number of SGPRs computed for graphics entry points. I'm not sure why these weren't included before. It would be nice if someone could confirm if that was just an oversight or if we have some docs somewhere that I haven't managed to find. Only one test is affected (its SGPR usage increases because we now take into account the XNACK registers).	2025-08-08 12:00:37 +02:00
Stanislav Mekhanoshin	469863111f	[AMDGPU] Enable CodeGen for v_pk_fma_bf16 (#152578)	2025-08-07 16:19:59 -07:00
Stanislav Mekhanoshin	abc22f771e	[AMDGPU] Fix buffer addressing mode matching (#152584) Starting in gfx1250, voffset and immoffset are zero-extended from 32 bits to 45 bits before being added together.	2025-08-07 14:23:41 -07:00
Stanislav Mekhanoshin	c2eddec4ff	[AMDGPU] System scope atomics are emulated over PCIe in gfx1250 (#152369) HW will emulate unsupported PCIe atomics via a CAS loop, so we do not need to expand these anymore.	2025-08-06 13:08:12 -07:00
Stanislav Mekhanoshin	b8eb61adc9	[AMDGPU] Implement addrspacecast from flat <-> private on gfx1250 (#152218)	2025-08-05 16:25:23 -07:00
Matt Arsenault	1d7a0fa08a	AMDGPU: Move asm constraint physreg parsing to utils (#150903) Also fixes an assertion on out-of-bounds physical register indexes.	2025-08-01 16:11:11 +09:00
paperchalice	8bacfb2538	[AMDGPU] Remove `UnsafeFPMath` uses (#151079) Remove `UnsafeFPMath` uses in the AMDGPU backend; it blocks some bugfixes related to clang, and the ultimate goal is to remove the `resetTargetOptions` method in `TargetMachine` (see the FIXME in `resetTargetOptions`). See also https://discourse.llvm.org/t/rfc-honor-pragmas-with-ffp-contract-fast https://discourse.llvm.org/t/allowfpopfusion-vs-sdnodeflags-hasallowcontract --------- Co-authored-by: Matt Arsenault <arsenm2@gmail.com>	2025-07-31 17:36:57 +08:00
Kazu Hirata	c6a376371d	[AMDGPU] Remove an unnecessary cast (NFC) (#151279) value() already returns uint64_t.	2025-07-30 07:29:57 -07:00
Stanislav Mekhanoshin	3dfd939a16	[AMDGPU] gfx1250 V_{MIN|MAX}_{I|U}64 opcodes (#151256)	2025-07-29 19:13:51 -07:00
Changpeng Fang	3b66d4a987	[AMDGPU] Support builtin/intrinsics for async loads/stores on gfx1250 (#151058)	2025-07-29 08:20:05 -07:00
Daniil Fukalov	e650c4b9ef	[NFC][AMDGPU] Move cmp+select arguments optimization to SIISelLowering. (#150929) As requested in #148740.	2025-07-28 22:11:36 +02:00
Changpeng Fang	400ce1a3d3	[AMDGPU] Support AMDGPUClamp for bf16 on gfx1250 (#150663) Scalar version uses V_MAX_BF16_PSEUDO, which is expanded to V_PK_MAX_BF16 with unused high bits. If V_PK_MAX_BF16 is produced directly instead, that creates a problem with folding of the clamp into other scalar instructions due to incompatible clamp bits. FIXME-TRUE16: enable bf16 clamp with true16 --------- Co-authored-by: Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>	2025-07-25 12:13:06 -07:00
Changpeng Fang	d7a38a94cd	[AMDGPU] Support builtin/intrinsics for load monitors on gfx1250 (#150540)	2025-07-24 16:23:33 -07:00
Stanislav Mekhanoshin	96e5eed92a	[AMDGPU] Select VMEM prefetch for llvm.prefetch on gfx1250 (#150493) We have a choice to use a scalar or vector prefetch for a uniform pointer. Since we do not have scalar stores, our scalar cache is practically read-only. The rw argument of the prefetch intrinsic is used to force a vector operation even in the uniform case. On GFX12, scalar prefetch will be used anyway; it is still useful, but it will only bring data to L2.	2025-07-24 13:22:50 -07:00
Stanislav Mekhanoshin	9deb7f6062	[AMDGPU] gfx1250 vmem prefetch target intrinsics and builtins (#150466)	2025-07-24 12:13:59 -07:00
Changpeng Fang	473bc0d188	[AMDGPU] Support V_FMA_MIX*_BF16 instructions on gfx1250 (#150381) Co-authored-by: Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>	2025-07-24 09:43:49 -07:00
Changpeng Fang	9a563b08e2	[AMDGPU] Support V_PK_MIN3/MAX3_NUM_F16 on gfx1250 (#150326) Co-authored-by: Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>	2025-07-23 15:15:19 -07:00
Stanislav Mekhanoshin	2346968807	[AMDGPU] Add V_ADD|SUB|MUL_U64 gfx1250 opcodes (#150291)	2025-07-23 13:17:56 -07:00
Changpeng Fang	bc1f85d234	AMDGPU: Support packed bf16 instructions on gfx1250 (#150283) Co-authored-by: Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>	2025-07-23 12:01:23 -07:00
Shilei Tian	7fc65569c1	[AMDGPU] Mark `amdgcn_tanh` as canonicalized (#150059) Co-authored-by: Mekhanoshin, Stanislav <Stanislav.Mekhanoshin@amd.com>	2025-07-22 20:03:39 -04:00
Shilei Tian	e801a10b44	[AMDGPU] Add the code generation support for `llvm.[sin/cos].bf16` (#149631) This is partial support because some other instructions have not been upstreamed yet.	2025-07-21 11:01:59 -04:00
Shilei Tian	ba81903196	[gfx1250][SDAG] Lower unsafe bf16 divisions (#149628) Co-authored-by: Kosarev, Ivan <Ivan.Kosarev@amd.com>	2025-07-21 10:58:08 -04:00
Diana Picus	20d8398825	[AMDGPU] ISel & PEI for whole wave functions (#145858) Whole wave functions are functions that will run with a full EXEC mask. They will not be invoked directly, but instead will be launched by way of a new intrinsic, `llvm.amdgcn.call.whole.wave` (to be added in a future patch). These functions are meant as an alternative to the `llvm.amdgcn.init.whole.wave` or `llvm.amdgcn.strict.wwm` intrinsics. Whole wave functions will set EXEC to -1 in the prologue and restore the original value of EXEC in the epilogue. They must have a special first argument, `i1 %active`, that is going to be mapped to EXEC. They may have either the default calling convention or amdgpu_gfx. The inactive lanes need to be preserved for all registers used, active lanes only for the CSRs. At the IR level, arguments to a whole wave function (other than `%active`) contain poison in their inactive lanes. Likewise, the return value for the inactive lanes is poison. This patch contains the following work: * 2 new pseudos, SI_SETUP_WHOLE_WAVE_FUNC and SI_WHOLE_WAVE_FUNC_RETURN used for managing the EXEC mask. SI_SETUP_WHOLE_WAVE_FUNC will return a SReg_1 representing `%active`, which needs to be passed into SI_WHOLE_WAVE_FUNC_RETURN. * SelectionDAG support for generating these 2 new pseudos and the special handling of %active. Since the return may be in a different basic block, it's difficult to add the virtual reg for %active to SI_WHOLE_WAVE_FUNC_RETURN, so we initially generate an IMPLICIT_DEF which is later replaced via a custom inserter. * Expansion of the 2 pseudos during prolog/epilog insertion. PEI also marks any used VGPRs as WWM registers, which are then spilled and restored with the usual logic. Future patches will include the `llvm.amdgcn.call.whole.wave` intrinsic and a lot of optimization work (especially in order to reduce spills around function calls). --------- Co-authored-by: Matt Arsenault <Matthew.Arsenault@amd.com> Co-authored-by: Shilei Tian <i@tianshilei.me>	2025-07-21 10:39:09 +02:00
Fabian Ritter	daa6de37ba	[AMDGPU][SDAG] Add target-specific ISD::PTRADD combines (#143673) This patch adds several (AMDGPU-)target-specific DAG combines for ISD::PTRADD nodes that reproduce existing similar transforms for ISD::ADD nodes. There is no functional change intended for the existing target-specific PTRADD combine. For SWDEV-516125.	2025-07-18 10:00:54 +02:00
Shoreshen	f761d73265	[AMDGPU] Add freeze for LowerSELECT (#148796) Trying to solve https://github.com/llvm/llvm-project/issues/147635. Add a freeze in the legalizer when breaking an i64 select into two i32 selects. Several tests changed; still need to investigate why. --------- Co-authored-by: Shilei Tian <i@tianshilei.me>	2025-07-18 13:29:33 +08:00
Kazu Hirata	2da59287aa	[Target] Remove unnecessary casts (NFC) (#149342) getFunction().getParent() already returns Module *.	2025-07-17 15:24:25 -07:00
Pierre van Houtryve	7d52b72239	[AMDGPU] Compute GISel KnownBits for S_BFE instructions (#141588) Next patches in the stack will emit them in the RegBankCombiner. With this, S_BFE instructions will hopefully interfere less with optimizations.	2025-07-16 09:56:45 +02:00
Stanislav Mekhanoshin	2d6534b7da	[AMDGPU] gfx1250 64-bit relocations and fixups (#148951)	2025-07-15 17:13:42 -07:00
Changpeng Fang	868793fa8e	AMDGPU: Support intrinsic selection for gfx1250 wmma instructions (#148957) Co-authored-by: Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com> Co-authored-by: Shilei Tian <Shilei.Tian@amd.com>	2025-07-15 15:25:05 -07:00
Shilei Tian	23ac7b938d	[AMDGPU] Add support for `v_sqrt_bf16` on gfx1250 (#148921) Co-authored-by: Mekhanoshin, Stanislav <Stanislav.Mekhanoshin@amd.com>	2025-07-15 16:15:47 -04:00
Stanislav Mekhanoshin	a32040e483	[AMDGPU] Use 64-bit literals in codegen on gfx1250 (#148727)	2025-07-14 15:47:24 -07:00
Shilei Tian	d7ec80c897	[AMDGPU] Add support for `v_tanh_bf16` on gfx1250 (#147425) Co-authored-by: Mekhanoshin, Stanislav <Stanislav.Mekhanoshin@amd.com>	2025-07-14 16:30:18 -04:00
Kazu Hirata	f1791c0ae3	[AMDGPU] Remove unnecessary casts (NFC) (#148340) getRegisterInfo() already returns const SIRegisterInfo *. Likewise, getInstrInfo() already returns const SIInstrInfo *.	2025-07-12 11:28:41 -07:00
Boyao Wang	697beb3f17	[TargetLowering] Change getOptimalMemOpType and findOptimalMemOpLowering to take LLVM Context (#147664) Add LLVM Context to getOptimalMemOpType and findOptimalMemOpLowering so that we can use EVT::getVectorVT to generate an EVT type in getOptimalMemOpType. Related to [#146673](https://github.com/llvm/llvm-project/pull/146673).	2025-07-10 11:11:09 +08:00
Stanislav Mekhanoshin	d0a4af725e	[AMDGPU] Add FeatureIEEEMinimumMaximumInsts. NFCI. (#147594) Co-authored-by: Mirko Brkušanin <Mirko.Brkusanin@amd.com>	2025-07-08 14:32:44 -07:00
Changpeng Fang	5035d20dcb	AMDGPU: Implement ds_atomic_async_barrier_arrive_b64/ds_atomic_barrier_arrive_rtn_b64 (#146409) These two instructions are supported by gfx1250. We define the instructions and implement the corresponding intrinsic and builtin. Co-authored-by: Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>	2025-07-01 11:08:49 -07:00
Changpeng Fang	1f5f381920	AMDGPU: Implement intrinsic/builtins for gfx1250 load transpose instructions (#146289)	2025-06-29 14:33:31 -07:00
Fabian Ritter	215e61c088	[AMDGPU][SDAG] Add ISD::PTRADD DAG combines (#142739) This patch focuses on generic DAG combines, plus an AMDGPU-target-specific one that is closely connected. The generic DAG combine is based on a part of PR #105669 by rgwott, which was adapted from work by jrtc27, arichardson, davidchisnall in the CHERI/Morello LLVM tree. I added some parts and removed several disjuncts from the reassociation condition: - `isNullConstant(X)`, since there are address spaces where 0 is a perfectly normal value that shouldn't be treated specially, - `(YIsConstant && ZOneUse)` and `(N0OneUse && ZOneUse && !ZIsConstant)`, since they cause regressions in AMDGPU. For SWDEV-516125.	2025-06-26 09:40:04 +02:00
Matt Arsenault	020fefb6af	AMDGPU: Avoid report_fatal_error in image intrinsic lowering (#145201)	2025-06-26 00:00:36 +09:00
Chinmay Deshpande	3413aa83f3	Revert "[AMDGPU] Implement IR variant of isFMAFasterThanFMulAndFAdd (… (#145580) …#121465)" This reverts commit 211bcf67aadb1175af382f55403ae759177281c7.	2025-06-24 16:10:27 -04:00
Matt Arsenault	48155f93dd	CodeGen: Emit error if getRegisterByName fails (#145194) This avoids using report_fatal_error and standardizes the error message in a subset of the error conditions.	2025-06-23 16:33:35 +09:00
Matt Arsenault	16607f6437	AMDGPU: Fix typo in argument allocation error message (#145265)	2025-06-23 16:26:10 +09:00
Aaditya	6a0593b0a3	[AMDGPU] Extend wave reduce intrinsics for i32 type (#126469) Currently, wave reduction intrinsics are supported for `umin` and `umax` operations for `i32` type only. This patch extends support for the following operations: `add`, `sub`, `min`, `max`, `and`, `or`, `xor` for `i32` type. --------- Co-authored-by: Matt Arsenault <arsenm2@gmail.com>	2025-06-23 10:31:22 +05:30
Matt Arsenault	ed155ff9f2	AMDGPU: Avoid report_fatal_error on ds ordered intrinsics (#145202)	2025-06-23 13:09:09 +09:00
Matt Arsenault	584a2c2e7c	AMDGPU: Avoid report_fatal_error for reporting libcalls (#145134)	2025-06-22 23:10:18 +09:00
Matt Arsenault	f280d3b705	AMDGPU: Avoid report_fatal_error for getRegisterByName subtarget case (#145173)	2025-06-22 08:19:19 +09:00
Jay Foad	6e86b7e34b	[AMDGPU] Do not replace SALU floating point multiply with VALU-only ldexp (#145048)	2025-06-20 16:52:43 +01:00
Matt Arsenault	dd4776d429	AMDGPU: Remove AMDGPUInstrInfo class (#144984) This was never constructed and only provided one static helper function.	2025-06-20 18:26:56 +09:00
Nicolai Hähnle	3bee9ba015	AMDGPU/GFX12: Fix s_barrier_signal_isfirst for single-wave workgroups (#143634) Barrier instructions are no-ops in single-wave workgroups. This includes s_barrier_signal_isfirst, which will leave SCC unmodified. Model this correctly (via an implicit use of SCC) and ensure SCC==1 before the barrier instruction (if the wave is the only one of the workgroup, then it is the first). --------- Co-authored-by: Matt Arsenault <arsenm2@gmail.com>	2025-06-19 11:22:49 -07:00
Matt Arsenault	c80282d333	AMDGPU: Directly select minimumnum/maximumnum with ieee_mode=0 (#141903) The hardware min/max follow the IR rules with IEEE mode disabled, so we can avoid the canonicalizes of the input. We lose the quieting of a signaling nan if both inputs are nans, but we only require that with strictfp.	2025-06-18 00:27:41 +09:00

1622 Commits