llvm-project

Author	SHA1	Message	Date
Jun Wang	54470176af	[AMDGPU] Add inreg support for SGPR arguments (#67182 ) Function parameters marked with inreg are supposed to be allocated to SGPRs. However, for compute functions, this is ignored and function parameters are allocated to VGPRs. This fix modifies CC_AMDGPU_Func in AMDGPUCallingConv.td to use SGPRs if input arg is marked inreg. --------- Co-authored-by: Jun Wang <jun.wang7@amd.com>	2023-11-08 11:35:52 -08:00
Diana Picus	3b905a0be5	[AMDGPU] ISel for llvm.amdgcn.set.inactive.chain.arg Add patterns to select int_amdgcn_set_inactive_chain_arg to V_SET_INACTIVE. This could probably use some more testing, but at least for simple cases V_SET_INACTIVE seems to mostly work out of the box. Differential Revision: https://reviews.llvm.org/D158605	2023-11-08 09:53:47 +01:00
Diana Picus	39830fea28	[AMDGPU][PEI] Set up SP for chain functions Initialize the SP to 0 in the prologue of functions with the `amdgpu_cs_chain` or `amdgpu_cs_chain_preserve` calling conventions, but only if they need one (i.e. if they contain calls to `amdgpu_gfx` functions or if they have stack objects). Also make sure we don't try to realign the stack (since 0 is aligned enough). Differential Revision: https://reviews.llvm.org/D156413	2023-11-08 09:27:34 +01:00
Diana	1fa58c7790	[AMDGPU] Callee saves for amdgpu_cs_chain[_preserve] (#71526 ) Teach prolog epilog insertion how to handle functions with the amdgpu_cs_chain or amdgpu_cs_chain_preserve calling conventions. For amdgpu_cs_chain functions, we only need to preserve the inactive lanes of VGPRs above v8, and only in the presence of calls via @llvm.amdgcn.cs.chain. For amdgpu_cs_chain_preserve functions, we will also need to preserve the active lanes for registers above the last argument VGPR. AFAICT there's no direct way to find out what the last argument VGPR is, so instead the patch uses the fact that chain calls from amdgpu_cs_chain_preserve functions can't use more VGPRs than the caller's VGPR arguments. In other words, it removes the operands of SI_CS_CHAIN_TC instructions from the list of callee saved registers. For both calling conventions, registers v0-v7 never need to be saved and restored, so we should never add them as WWM spills. Differential Revision: https://reviews.llvm.org/D156412	2023-11-08 08:28:15 +01:00
Carl Ritson	af6ff98c53	[AMDGPU] Move WWM register pre-allocation to during regalloc (#70618 ) Move SIPreAllocateWWMRegs pass to just before VGPR allocation. This saves recomputation of the virtual matrix and live reg map, with the slight regression in O0 that live intervals and slot indexes must be computed.	2023-11-08 11:54:28 +09:00
Pierre van Houtryve	5db63d29fd	[AMDGPU] PromoteAlloca: Handle load/store subvectors using non-constant indexes (#71505 ) I assumed indexes were always ConstantInts, but that's not always the case. They can be other things as well. We can easily handle that by just emitting an add and let InstSimplify do the constant folding for cases where it's really a ConstantInt. Solves SWDEV-429935	2023-11-07 15:29:41 +01:00
Pierre van Houtryve	4428b01faa	Reland: [AMDGPU] Remove Code Object V3 (#67118 ) V3 has been deprecated for a while as well, so it can safely be removed like V2 was removed. - [Clang] Set minimum code object version to 4 - [lld] Fix tests using code object v3 - Remove code object V3 from the AMDGPU backend, and delete or port v3 tests to v4. - Update docs to make it clear V3 can no longer be emitted.	2023-11-07 12:23:03 +01:00
Nikita Popov	17764d2c87	[IR] Remove FP cast constant expressions (#71408 ) Remove support for the fptrunc, fpext, fptoui, fptosi, uitofp and sitofp constant expressions. All places creating them have been removed beforehand, so this just removes the APIs and uses of these constant expressions in tests. With this, the only remaining FP operation that still has constant expression support is fcmp. This is part of https://discourse.llvm.org/t/rfc-remove-most-constant-expressions/63179.	2023-11-07 09:34:16 +01:00
Matt Arsenault	d34a10a47d	AMDGPU: Port AMDGPUAttributor to new pass manager (#71349 )	2023-11-07 15:40:40 +09:00
Amara Emerson	6b69584660	[GlobalISel] Fall back for bf16 conversions. (#71470 ) We don't support these correctly since we don't yet have FP types. AMDGPU tests were silently miscompiling bf16 as if they were fp16.	2023-11-06 21:18:57 -08:00
Jay Foad	521ac12a25	[AMDGPU] Remove AMDGPUAsmPrinter::isBlockOnlyReachableByFallthrough (#71407 ) The special handling for blocks ending with a long branch has been unnecessary since D106445: "[amdgpu] Add 64-bit PC support when expanding unconditional branches."	2023-11-06 16:29:52 +00:00
Jay Foad	1c6102d19b	[AMDGPU] Regenerate checks for long-branch-reserve-register.ll	2023-11-06 15:33:23 +00:00
Nikita Popov	f9404a1b57	[AMDGPU] Regenerate test to fix failure	2023-11-06 15:42:02 +01:00
Valery Pykhtin	fe6893b1d8	Improve selection of conditional branch on amdgcn.ballot!=0 condition in SelectionDAG. (#68714 ) Improve selection of the following pattern: bool cnd = ... if (amdgcn.ballot(cnd) != 0) { ... } which means "execute _then_ if any lane has satisfied the _cnd_ condition".	2023-11-06 15:16:49 +01:00
sstipanovic	22a323e3db	[AMDGPU] Select v_lshl_add_u32 instead of v_mul_lo_u32 by constant (#71035 ) Instead of: v_mul_lo_u32 v0, v0, 5 we should generate: v_lshl_add_u32 v0, v0, 2, v0.	2023-11-06 14:52:27 +01:00
Diana	7f5d59b38d	[AMDGPU] ISel for @llvm.amdgcn.cs.chain intrinsic (#68186 ) The @llvm.amdgcn.cs.chain intrinsic is essentially a call. The call parameters are bundled up into 2 intrinsic arguments, one for those that should go in the SGPRs (the 3rd intrinsic argument), and one for those that should go in the VGPRs (the 4th intrinsic argument). Both will often be some kind of aggregate. Both instruction selection frameworks have some internal representation for intrinsics (G_INTRINSIC[_WITH_SIDE_EFFECTS] for GlobalISel, ISD::INTRINSIC_[VOID|WITH_CHAIN] for DAGISel), but we can't use those because aggregates are dissolved very early on during ISel and we'd lose the inreg information. Therefore, this patch shortcircuits both the IRTranslator and SelectionDAGBuilder to lower this intrinsic as a call from the very start. It tries to use the existing infrastructure as much as possible, by calling into the code for lowering tail calls. This has already gone through a few rounds of review in Phab: Differential Revision: https://reviews.llvm.org/D153761	2023-11-06 12:30:07 +01:00
Carl Ritson	19bfe08c7f	Reapply [AMDGPU] Generate wwm-reserved.ll (NFC) Fix target triple so address locations are host independent.	2023-11-06 13:26:06 +09:00
Jessica Del	6e4692c9ee	[AMDGPU] - Add s_wqm intrinsics (#71048 ) Add intrinsics to generate `s_wqm_b32` and `s_wqm_b64`. Support VGPR arguments by inserting a `v_readfirstlane`.	2023-11-03 14:48:59 +01:00
Nikita Popov	e4a4122eb6	[IR] Remove zext and sext constant expressions (#71040 ) Remove support for zext and sext constant expressions. All places creating them have been removed beforehand, so this just removes the APIs and uses of these constant expressions in tests. There is some additional cleanup that can be done on top of this, e.g. we can remove the ZExtInst vs ZExtOperator footgun. This is part of https://discourse.llvm.org/t/rfc-remove-most-constant-expressions/63179.	2023-11-03 10:46:07 +01:00
Nico Weber	6acd1671e6	Revert "[AMDGPU] Generate wwm-reserved.ll (NFC)" This reverts commit b3523d7e6d8834468cfcb66e629adbe17da90ea5. Breaks tests on mac, see: https://github.com/llvm/llvm-project/commit/b3523d7e6d88344#commitcomment-131547708	2023-11-02 14:55:41 -04:00
Jay Foad	b90cfe4601	[AMDGPU] New ttracedata intrinsics (#70235 ) Add llvm.amdgcn.s.ttracedata and llvm.amdgcn.s.ttracedata.imm which map directly to the corresponding instructions s_ttracedata and s_ttracedata_imm. These are inherently whole-wave operations so any non-uniform inputs are readfirstlaned.	2023-11-02 10:35:15 +00:00
Jay Foad	65bad23e43	[AMDGPU] Fix test for #70532 (Implement moveToVALU for S_CSELECT_B64)	2023-11-02 10:31:02 +00:00
Jay Foad	1590cac494	[AMDGPU] Implement moveToVALU for S_CSELECT_B64 (#70352 ) moveToVALU previously only handled S_CSELECT_B64 in the trivial case where it was semantically equivalent to a copy. Implement the general case using V_CNDMASK_B64_PSEUDO and implement post-RA expansion of V_CNDMASK_B64_PSEUDO with immediate as well as register operands.	2023-11-02 10:08:09 +00:00
Jessica Del	41cf94e6b8	[AMDGPU] - Add s_quadmask intrinsics (#70804 ) Add intrinsics to generate `s_quadmask_b32` and `s_quadmask_b64`. Support VGPR arguments by inserting a `v_readfirstlane`.	2023-11-02 10:37:52 +01:00
Thomas Symalla	18839aec4e	[AMDGPU] Detect kills in register sets when trying to form V_CMPX instructions. (#68293 ) During the SIOptimizeExecMasking pass, we try to form V_CMPX instructions by detecting S_AND_SAVEEXEC and V_MOV instructions. Generally, we require the input operand of the V_MOV, which is the input operand to the to-be-formed V_CMPX, to be alive. This is forced by clearing the kill flags on the operand after V_CMPX has been generated. However, if we have a kill of a register set that contains said register, this will not be detected by clearKillFlags. With this change, possible additional kill-flag candidates will be detected during the final call to findInstrBackwards and then, the kill flag will be removed to keep all registers in the set alive. Co-authored-by: Thomas Symalla <thomas.symalla@amd.com>	2023-11-02 10:36:27 +01:00
Carl Ritson	b3523d7e6d	[AMDGPU] Generate wwm-reserved.ll (NFC)	2023-11-02 17:50:42 +09:00
Carl Ritson	0eb516817d	[AMDGPU] Remove dom tree requirements from SIWholeQuadMode pass (#71012 ) SIWholeQuadMode preserves dominator and post dominator trees, but does not require them.	2023-11-02 17:16:19 +09:00
Tobias Stadler	373c343a77	Reland: [GlobalISel] LegalizationArtifactCombiner: Elide redundant G_AND Reland 3686a0b after fixing an exposed miscompile in #68840 Differential Revision: https://reviews.llvm.org/D159140	2023-11-02 00:18:19 +01:00
Valery Pykhtin	e808f8a616	[AMDGPU] GCNRegPressurePrinter pass to print GCNRegPressure values for testing. (#70031 ) Using GCNDownwardRPTracker or GCNUpwardRPTracker the pass collects register pressure values for a function and prints these values next to instructions. Output can be used to generate Filecheck rules in mir tests.	2023-11-01 23:01:39 +01:00
Jay Foad	86f2e09250	[AMDGPU] Tweak handling of GlobalAddress operands in SI_PC_ADD_REL_OFFSET (#70960 ) When SI_PC_ADD_REL_OFFSET is expanded to S_GETPC/S_ADD/S_ADDC, the GlobalAddress operands have to be adjusted by 4 or 12 bytes to account for the offset from the end of the S_GETPC instruction to the literal operands. Do this all in SIInstrInfo::expandPostRAPseudo instead of duplicating the adjustment code in both AMDGPULegalizerInfo and SITargetLowering. NFCI.	2023-11-01 19:48:30 +00:00
Simon Pilgrim	51d4ad6701	[AMDGPU] amdgpu-codegenprepare-idiv.ll - regenerate checks. NFC. Reduces diffs in a future patch	2023-10-31 13:24:27 +00:00
Jay Foad	a6dabed348	[AMDGPU] Fix nondeterminism in SIFixSGPRCopies (#70644 ) There are a couple of loops that iterate over V2SCopies. The iteration order needs to be deterministic, otherwise we can call moveToVALU in different orders, which causes temporary vregs to be allocated in different orders, which can affect register allocation heuristics.	2023-10-31 11:47:42 +00:00
Jessica Del	b8d3ccdff1	[AMDGPU] - Add s_bitreplicate intrinsic (#69209 ) Add intrinsic for s_bitreplicate. Lower to S_BITREPLICATE_B64_B32 machine instruction in both GISel and Selection DAG. Support VGPR arguments by inserting a `v_readfirstlane`.	2023-10-31 11:26:45 +01:00
Craig Topper	9a7c26a399	[GISel] Restrict G_BSWAP to multiples of 16 bits. (#70245 ) This is consistent with the IR verifier and SelectionDAG's getNode. Update tests accordingly. I tried to keep some coverage of non-pow2 when possible. X86 didn't like a G_UNMERGE_VALUES from s48 to 3 s16 that got created when I tried s48.	2023-10-30 10:27:57 -07:00
Jay Foad	101008be83	[AMDGPU] CodeGen for 64-bit buffer atomic cmpswap intrinsics (#70475 ) Implement codegen for: llvm.amdgcn.raw.buffer.atomic.cmpswap.i64 llvm.amdgcn.raw.ptr.buffer.atomic.cmpswap.i64 llvm.amdgcn.struct.buffer.atomic.cmpswap.i64 llvm.amdgcn.struct.ptr.buffer.atomic.cmpswap.i64	2023-10-30 16:44:22 +00:00
Jessica Del	849297c97d	[AMDGPU][wmma] - Add tied wmma intrinsic (#69903 ) These new intrinsics, `amdgcn_wmma_tied_f16_16x16x16_f16` and `amdgcn_wmma_tied_f16_16x16x16_f16`, explicitly tie the destination accumulator matrix to the input accumulator matrix. The `wmma_f16` and `wmma_bf16` intrinsics only write to 16-bit of the 32-bit destination VGPRs. Which half is determined via the `op_sel` argument. The other half of the destination registers remains unchanged. In some cases however, we expect the destination to copy the other halves from the input accumulator. For instance, when packing two separate accumulator matrices into one. In that case, the two matrices are tied into the same registers, but separate halves. Then it is important to copy the other matrix values to the new destination.	2023-10-30 16:23:49 +01:00
Stanislav Mekhanoshin	fe8335babb	[AMDGPU] Select 64-bit imm moves if can be encoded as 32 bit operand (#70395 ) This allows folding of 64-bit operands if fit into 32-bit. Fixes https://github.com/llvm/llvm-project/issues/67781	2023-10-30 08:12:28 -07:00
Stanislav Mekhanoshin	ee6d62db99	[AMDGPU] Prevent folding of the negative i32 literals as i64 (#70274 ) We can use sign extended 64-bit literals, but only for signed operands. At the moment we do not know if an operand is signed. Such operand will be encoded as its low 32 bits and then either correctly sign extended or incorrectly zero extended by HW.	2023-10-30 08:07:43 -07:00
Simon Pilgrim	d96529af3c	[DAG] Attempt shl narrowing in SimplifyDemandedBits (REAPPLIED) If a shl node leaves the upper half bits zero / undemanded, then see if we can profitably perform this with a half-width shl and a free trunc/zext. Followup to D146121 Reapplied - moved after the ShrinkDemandedOp call; reuse the existing KnownBits result; ensure that we only attempt this if all the upper bits are demanded; 547dc461225ba should address the remaining regressions that were noticed in the previous commit. Differential Revision: https://reviews.llvm.org/D155472	2023-10-29 15:38:46 +00:00
Changpeng Fang	8ceb72ffe5	[AMDGPU] make v32i16/v32f16 legal (#70484 ) Some upcoming intrinsics will be using these new types	2023-10-27 15:28:31 -07:00
Stanislav Mekhanoshin	d136432038	[AMDGPU] Remove unneeded implicit-def from shrink-i32-kimm.mir. NFC. (#70489 )	2023-10-27 13:32:48 -07:00
Guozhi Wei	9a091de7fe	[X86, Peephole] Enable FoldImmediate for X86 Enable FoldImmediate for X86 by implementing X86InstrInfo::FoldImmediate. Also enhanced peephole by deleting identical instructions after FoldImmediate. Differential Revision: https://reviews.llvm.org/D151848	2023-10-27 19:47:23 +00:00
Christudasan Devadasan	a0eb6b88f9	[AMDGPU] Try to fix the block prologs broken by RA inserted instructions (#69924 ) The insertion point determined by RA while attempting spills and liverange split at the beginning of a block goes wrong at times, and the newly inserted vector instructions are placed before the exec-mask restore instruction, which is wrong. It occurs mainly due to the dependency on isBasicBlockPrologue, which doesn't account for early inserted instructions (spills and splits) during RA and causes the block prolog break. A better approach for deciding the insertion point should be worked out. For now, improving the helper function to consider all possible early insertions. This patch includes the spill instructions. The copies associated with liverange split should also be included in the block prolog.	2023-10-27 19:10:18 +05:30
Christudasan Devadasan	f9cd789658	[AMDGPU] Add pseudo instructions for SGPR spill to VGPR (#69923 ) For a future patch, it is important for the lowered SGPR spills to be recognized as spill instructions during regalloc. Directly lowering them into V_WRITELANE/V_READLANE won't allow us to attach the SPILL flag to their instructions. This patch introduces the pseudo instructions with the SGPRSpill flag set in their Desc. They will get lowered to equivalent instructions later during post RA pseudo expansion.	2023-10-27 17:24:10 +05:30
Matt Arsenault	b8b491c9d7	AMDGPU: Add infinite looping testcase after subrange spilling change This infinite looped after d8127b2ba8a87a610851b9a462f2fc2526c36e37	2023-10-27 17:42:14 +09:00
Alex Richardson	e39f6c1844	[opt] Infer DataLayout from triple if not specified There are many tests that specify a target triple/CPU flags but no DataLayout which can lead to IR being generated that has unusual behaviour. This commit attempts to use the default DataLayout based on the relevant flags if there is no explicit override on the command line or in the IR file. One thing that is currently not possible is to differentiate a missing datalayout from `target datalayout = ""` in the IR file, since the current APIs don't allow detecting this case. If it is considered useful to support this case (instead of passing "-data-layout=" on the command line), I can change IR parsers to track whether they have seen such a directive and change the callback type. Differential Revision: https://reviews.llvm.org/D141060	2023-10-26 12:07:37 -07:00
Alexander Richardson	f118d474eb	[AMDGPU] Use alloca address space in rewrite-out-arguments.ll (#70269 ) This is needed for the transform to fire with a correct data layout. Pre-committing this change to keep the diff of D141060 smaller.	2023-10-26 15:08:58 +01:00
Christudasan Devadasan	bb2b7530ad	[AMDGPU] precommit lit test for PR 69924.	2023-10-26 17:43:14 +05:30
Jay Foad	e9c4dc18bc	Revert "[AMDGPU] Use `S_CSELECT` for uniform i1 ext (#69703 )" This reverts commit a1260b5209968c08886e3c6183aa793de8931578. It was causing some Vulkan CTS failures.	2023-10-26 12:56:32 +01:00
Christudasan Devadasan	16fbc45f48	Revert "[AMDGPU] Cleanup hasUnwantedEffectsWhenEXECEmpty function (#70206 )" This reverts commit 7ce613fc77af092dd6e9db71ce3747b75bc5616e.	2023-10-26 17:04:28 +05:30
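
Note on 22a323e3db (v_lshl_add_u32 instead of v_mul_lo_u32 by constant): the rewrite relies on the identity x * 5 == (x << 2) + x in 32-bit wraparound arithmetic. The sketch below is only a software model of that identity, not code from the patch. The Python function names merely mirror the instruction mnemonics, and the 5-bit shift mask is an assumption about how the hardware treats the shift operand.

```python
MASK32 = 0xFFFFFFFF  # model 32-bit unsigned wraparound


def v_mul_lo_u32(a: int, b: int) -> int:
    """Model: low 32 bits of an unsigned 32-bit multiply."""
    return (a * b) & MASK32


def v_lshl_add_u32(src: int, shift: int, addend: int) -> int:
    """Model: (src << shift) + addend, truncated to 32 bits.

    The shift amount is masked to 5 bits here (assumption).
    """
    return ((src << (shift & 31)) + addend) & MASK32


def mul5_via_shift_add(x: int) -> int:
    """x * 5 expressed as one shift-and-add: (x << 2) + x."""
    return v_lshl_add_u32(x, 2, x)
```

The two forms agree for every 32-bit input because both results are reduced modulo 2^32, so overflow in the shift does not break the equivalence.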

6927 Commits