llvm-project

Author	SHA1	Message	Date
Joseph Huber	a1da746157	[AMDGPU] Place global constructors in .init_array and .fini_array For the GPU, we emit external kernels that call the initializers and constructors; however, if we had a persistent kernel like the `_start` kernel for the `libc` project, we could use the standard way of calling constructors. This patch adds new global variables containing pointers to the constructors to be called. If these are placed in the `.init_array` and `.fini_array` sections, then the backend will handle them specially. The linker will then provide the `__init_array_start`/`__init_array_end` and `__fini_array_start`/`__fini_array_end` symbols to traverse them. An implementation would look like this. ``` extern uintptr_t __init_array_start[]; extern uintptr_t __init_array_end[]; extern uintptr_t __fini_array_start[]; extern uintptr_t __fini_array_end[]; using InitCallback = void(int, char **, char **); using FiniCallback = void(void); extern "C" [[gnu::visibility("protected"), clang::amdgpu_kernel]] void _start(int argc, char **argv, char **envp) { uint64_t init_array_size = __init_array_end - __init_array_start; for (uint64_t i = 0; i < init_array_size; ++i) reinterpret_cast<InitCallback *>(__init_array_start[i])(argc, argv, envp); uint64_t fini_array_size = __fini_array_end - __fini_array_start; for (uint64_t i = 0; i < fini_array_size; ++i) reinterpret_cast<FiniCallback *>(__fini_array_start[i])(); } ``` Reviewed By: yaxunl Differential Revision: https://reviews.llvm.org/D149340	2023-04-29 08:40:19 -05:00
Jay Foad	56af0e913c	[EarlyCSE] Do not CSE convergent calls in different basic blocks "convergent" is documented as meaning that the call cannot be made control-dependent on more values, but in practice we also require that it cannot be made control-dependent on fewer values, e.g. it cannot be hoisted out of the body of an "if" statement. In code like this, if we allow CSE to combine the two calls: x = convergent_call(); if (cond) { y = convergent_call(); use y; } then we get this: x = convergent_call(); if (cond) { use x; } This is conceptually equivalent to moving the second call out of the body of the "if", up to the location of the first call, so it should be disallowed. Differential Revision: https://reviews.llvm.org/D149348	2023-04-28 14:50:48 +01:00
Jay Foad	5534d1d834	[CSE] Precommit an AMDGPU test case for D149348 Differential Revision: https://reviews.llvm.org/D149349	2023-04-28 14:50:48 +01:00
Jeffrey Byrnes	7f0a881e6c	[AMDGPU] Track liveins for max-ilp-sched-strategy Even if optimizing for ILP, it is still useful to track RP to avoid spilling. Given that, we need to maintain a consistent liveness state with the RP tracker. This patch makes RP tracking consistent by updating for liveins. Otherwise, we should completely eliminate RP tracking for this scheduler (checkScheduling, initCandidate). Differential Revision: https://reviews.llvm.org/D149358	2023-04-27 16:45:45 -07:00
Changpeng Fang	1ab8b9ae15	AMDGPU: Define sub-class of SGPR_64 for tail call return Summary: Registers for tail call return should not be clobbered by callee. So we need a sub-class of SGPR_64 (excluding callee saved registers (CSR)) to hold the tail call return address. Because GFX and C calling conventions have different CSR, we need to define the sub-class separately. This work is an extension of D147096 with the consideration of GFX calling convention. Based on the calling conventions, different instructions will be selected with different sub-class of SGPR_64 as the input. Reviewers: arsenm, cdevadas and sebastian-ne Differential Revision: https://reviews.llvm.org/D148824	2023-04-27 10:45:11 -07:00
skc7	e016fb57b3	[AMDGPU] Legalize soffset of buffer instructions. Use Waterfall loop logic. Legalize soffset of buffer instructions using waterfall loop. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D141030	2023-04-27 19:36:50 +05:30
ManuelJBrito	8b56da5e9f	[IR] Change shufflevector undef mask to poison With this patch an undefined mask in a shufflevector will be printed as poison. This change is done to support the new shufflevector semantics for undefined mask elements. Differential Revision: https://reviews.llvm.org/D149210	2023-04-27 14:41:10 +01:00
Jay Foad	47d3cbcf84	[BranchFolder] Skip redundant IMPLICIT_DEFs of subregs Differential Revision: https://reviews.llvm.org/D148509	2023-04-27 09:40:06 +01:00
Jay Foad	12b70ad68c	[BranchFolder] Precommit AMDGPU test case for D148509	2023-04-27 09:40:06 +01:00
Nicolai Hähnle	1e63f8272e	AMDGPU: Fix an assertion in SIOptimizeVGPRLiveRange As the comment notes, the shader results in an INSERT_SUBREG with "undef" (dead) operand in the Endif block. The same can happen with REG_SEQUENCE. The register is considered dead from a liveness analysis perspective. The correct thing to do seems to be nothing: we keep the undef use of the register, the register allocator should still be able to take the liveness into account correctly. Differential Revision: https://reviews.llvm.org/D149161	2023-04-27 09:39:44 +02:00
Matt Arsenault	bbc7b30fbf	AMDGPU: Remove invalid testcase for enqueue kernel The call didn't have the right calling convention, but calls to kernels are supposed to be illegal anyway.	2023-04-26 17:25:30 -04:00
Jay Foad	22516593ae	[AMDGPU] Add GFX11 ds_min_f32 / ds_max_f32 tests	2023-04-26 17:09:12 +01:00
Joe Nash	f8ec7a0944	[AMDGPU] Delete test for illegal v_cndmask_b16_dpp There are no VOP2 or VOP2 with dpp forms of v_cndmask_b16. Delete the test. NFC. Reviewed By: critson Differential Revision: https://reviews.llvm.org/D149184	2023-04-26 09:50:44 -04:00
Janek van Oirschot	124acb7ca3	[AMDGPU] Fix negative offset values interpretation in getMemOperandsWithOffset for DS The offset values may result in an erroneous scheduling of a load before write for a memory location if the offset values are represented as negative values in MIR, despite actually being unsigned values. This representation in MIR happens as SelectionDAG::getConstant could go through APInt to represent the encoding which assumes the MSB of the encoding as a sign-bit, regardless of whether it is supposed to be a signed value. The 8-bit negative (interpreted) value gets cast to an unsigned 32 bit value in getMemOperandsWithOffset used for comparisons in areMemAccessesTriviallyDisjoint eventually leading to an erroneous schedule in the machine scheduler. Reviewed By: arsenm, foad Differential Revision: https://reviews.llvm.org/D149080	2023-04-26 14:10:25 +01:00
OCHyams	2b3c13b716	[DebugInfo] Treat empty metadata operands the same as undef operands in SelectionDAG Without this patch SelectionDAG silently drops dbg.values using `!{}` operands. Related to https://discourse.llvm.org/t/auto-undef-debug-uses-of-a-deleted-value Reviewed By: StephenTozer Differential Revision: https://reviews.llvm.org/D140990	2023-04-26 09:03:07 +01:00
Jay Foad	0c13e0b748	[AMDGPU] Do not handle _SGPR SMEM instructions in SILoadStoreOptimizer After D147334 we never select _SGPR forms of SMEM instructions on subtargets that also support the _SGPR_IMM form, so there is no need to handle them here. Differential Revision: https://reviews.llvm.org/D149139	2023-04-25 15:40:13 +01:00
Matt Arsenault	2fce50e8f5	AMDGPU: Fix assertion with multiple uses of f64 fneg of select A bitcast needs to be inserted back to the original type. Just skip the multiple use case for a safer quick fix. Handling the multiple use case seems to be beneficial in some but not all cases.	2023-04-20 10:15:18 -04:00
Matt Arsenault	99d4c722e3	AMDGPU: Really invert handling of enqueued block detection Remove the broken call graph analysis in the block enqueue lowering pass. The previous iteration was reverted due to a runtime bug when the completion action was unconditionally enabled.	2023-04-20 06:58:24 -04:00
Pravin Jagtap	21a69bdb66	[NewPM][AMDGPU] Port amdgpu-atomic-optimizer Reviewed By: arsenm, sameerds, gandhi21299 Differential Revision: https://reviews.llvm.org/D148628	2023-04-20 00:27:47 -04:00
Jay Foad	e1ae0e2b7d	[AMDGPU] Fix some check prefixes	2023-04-19 16:15:14 +01:00
Jay Foad	141c476a36	[AMDGPU] Remove unused check lines from tests	2023-04-19 16:15:14 +01:00
Jay Foad	43b035b483	[AMDGPU] Remove unused check lines from GlobalISel IR tests	2023-04-19 15:13:02 +01:00
Jay Foad	d9ed0dee0c	[AMDGPU] Remove unused check lines from GlobalISel MIR tests	2023-04-19 15:03:32 +01:00
Jay Foad	bf4dc4381e	[AMDGPU] Don't transform illegal intrinsics to V_ILLEGAL This reverts parts of D123693. The functionality of allowing unsupported intrinsics to select has been superseded by D139000 "Remove function with incompatible features". Retain assembler/disassembler support for v_illegal on GFX10+ only, where it is documented. Differential Revision: https://reviews.llvm.org/D148127	2023-04-19 09:59:46 +01:00
Chen Zheng	3f4055dec4	[GlobalISelEmitter] handle operand without MVT/class There are some patterns in td files without MVT/class set for some operands in the target pattern that come from the source pattern. This prevents GlobalISelEmitter from adding them as a valid rule, because the target child operand is an unsupported kind of operand. For now, for a leaf child, only IntInit and DefInit are handled in GlobalISelEmitter. This issue can be worked around by adding MVT/class to the patterns in the td files, like the workarounds for patterns anyext and setcc in PPCInstrInfo.td in D140878. To avoid adding the same workarounds for other patterns in td files, this patch tries to handle the UnsetInit case in GlobalISelEmitter. Adding the new handling allows us to remove the workarounds in the td files and also generates many selection rules for the PPC target. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D141247	2023-04-19 07:00:57 +00:00
chenglin.bi	9356097206	Revert "[AMDGPU] Ressociate patterns with sub to use SALU" The patch caused an infinite loop because of DAGCombiner's canonicalization: // (x + C) - y -> (x - y) + C // y - (x + C) -> (y - x) - C // (x - C) - y -> (x - y) - C // (C - x) - y -> C - (x + y) This reverts commit b3529b5bf3ba2cd7f38665de16450afefb263c9b.	2023-04-19 11:15:14 +08:00
David Stuttard	4d54565436	[AMDGPU] Remove unnecessary assert Also remove the function attributes from the test. For PAL based shaders this isn't required. Differential Revision: https://reviews.llvm.org/D148625	2023-04-18 13:41:38 +01:00
pvanhout	ec82188451	[AMDGPU] Do not crash on agpr_hi16 in AMDGPUResourceUsageAnalysis Reviewed By: #amdgpu, arsenm Differential Revision: https://reviews.llvm.org/D148438	2023-04-18 13:53:01 +02:00
Kriti Gupta	a3dfa4e083	[test] Remove occurrences of br undef in CodeGen/AMDGPU tests Differential Revision: https://reviews.llvm.org/D148041	2023-04-18 08:47:29 +01:00
chenglin.bi	b3529b5bf3	[AMDGPU] Ressociate patterns with sub to use SALU Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D148463	2023-04-18 06:49:41 +08:00
Jay Foad	2be3189601	[AMDGPU] Don't select _SGPR forms of SMEM instructions on GFX9+ On GFX9+, SMEM instructions have an _SGPR_IMM form which is strictly more powerful than the _SGPR form. It simplifies codegen if we always select the _SGPR_IMM form with an immediate offset of 0 instead of the _SGPR form. Note that this patch just makes minimal changes to the selection patterns to prove the concept. Further simplifications are possible to reduce the number of selection patterns. On GFX9 the _SGPR form of the Real instruction is still required for assembly/disassembly but on GFX10+ it can be removed completely. Differential Revision: https://reviews.llvm.org/D147334	2023-04-17 16:23:30 +01:00
pvanhout	ae77aceba5	[Analysis] Remove DA & LegacyDA UniformityAnalysis offers all of the same features and much more, there is no reason left to use the legacy DAs. See RFC: https://discourse.llvm.org/t/rfc-deprecate-divergenceanalysis-legacydivergenceanalysis/69538 - Remove LegacyDivergenceAnalysis.h/.cpp - Remove DivergenceAnalysis.h/.cpp + Unit tests - Remove SyncDependenceAnalysis - it was not a real registered analysis and was only used by DAs - Remove/adjust references to the passes in the docs where applicable - Remove TTI hook associated with those passes. - Move tests to UniformityAnalysis folder. - Remove RUN lines for the DA, leave only the UA ones. - Some tests had to be adjusted/removed depending on how they used the legacy DAs. Reviewed By: foad, sameerds Differential Revision: https://reviews.llvm.org/D148116	2023-04-17 09:01:22 +02:00
Jay Foad	6b5067a81a	[AMDGPU] Don't assert that image intrinsics are supported Unsupported intrinsics should give a regular "cannot select" error. Differential Revision: https://reviews.llvm.org/D148147	2023-04-16 19:54:55 +01:00
pvanhout	b3b3cb2d2f	[AMDGPU] Less aggressively break large PHIs In some cases, breaking large PHIs can very negatively affect performance (3x more instructions observed in a particular test case). This patch adds some basic profitability heuristics to help with some of these issues without affecting the "good" cases. e.g. avoid breaking PHIs if it causes back-and-forth between vector/scalar form for no good reason. Fixes SWDEV-392803 Fixes SWDEV-393781 Fixes SWDEV-394228 Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D147786	2023-04-14 15:41:26 +02:00
David Stuttard	fc83f1de5d	[AMDGPU] Add backend support for new PAL ELF Metadata 3.0 PAL Metadata 3.0 introduces an explicit structure in metadata for the programmable registers written out by the compiler backend, rather than using opaque registers, which can change between architectures and require encoding bitfield information in the backend that may change between versions. This is the initial minimal implementation that enables the use of PAL Metadata 3.0. The change itself should be NFC for non-PAL, although the way the RSRC2 register is handled has been changed slightly. The test is fairly minimal, but checks that the metadata format looks as expected and verifies a couple of special cases such as tgid_[xyz]_en handling and PsInputAddr/Ena which also change to explicit fields. Differential Revision: https://reviews.llvm.org/D147143	2023-04-14 09:57:13 +01:00
Diana Picus	b9ba05360e	[AMDGPU] Don't S_MOV_B32 into $scc The peephole optimizer tries to replace ``` %n:sgpr_32 = S_MOV_B32 x $scc = COPY %n ``` with a `S_MOV_B32` directly into `$scc`. This crashes because `S_MOV_B32` cannot take `$scc` as input. We currently generate code like this from GlobalISel when lowering a G_BRCOND with a constant condition. We should probably look into removing this kind of branch altogether, but until then we should at least not crash. This patch fixes the issue by making sure we don't apply the peephole optimization when trying to move into a physical register that doesn't belong to the correct register class. Differential Revision: https://reviews.llvm.org/D148117	2023-04-14 10:24:43 +02:00
Jay Foad	2d39f5b5cd	[AMDGPU] Allow use of TTMP registers in AMDGPUResourceUsageAnalysis With architected SGPRs, workgroup IDs are passed into a compute shader in TTMP registers. Allow for this in AMDGPUResourceUsageAnalysis instead of failing an assertion. Differential Revision: https://reviews.llvm.org/D148239	2023-04-13 16:56:22 +01:00
pvanhout	fd1d60873f	[AMDGPU] Remove CC exception for Promote Alloca Limits Apparently it was used to work around some issue that has been fixed. Removing it helps with high scratch usage observed in some cases due to failed alloca promotion. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D145586	2023-04-13 08:48:34 +02:00
Sebastian Neubauer	fee3980df5	[AMDGPU] Fix amdgpu_gfx tail-call test The inreg argument prevented the tail call optimization from kicking in. Remove the inreg, so this test actually uses a tail call. Note that it now uses s[4:5] for the return address, which is invalid, because these registers are supposed to be callee-save. D147096 tried to fix that problem for the C calling convention. Differential Revision: https://reviews.llvm.org/D148119	2023-04-12 16:15:09 +02:00
Matt Arsenault	f608ac6286	AMDGPU: Push fneg into bitcast of integer select Avoids some regressions in the math libraries in a future patch.	2023-04-12 06:48:58 -04:00
Joe Nash	57fe7f450d	[AMDGPU] NFC. Clean up check prefixes in fcmp test Fix the test introduced in D136592 which appeared to have a few check lines containing an un-checked prefix "GISEL-GFX". Also canonicalize the other prefixes to minimize churn if SDag and GISel diverge. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D147958	2023-04-11 09:58:03 -04:00
skc7	97f8d6b2ec	[AMDGPU][NFC] Regenerate test checks for sub.ll	2023-04-11 17:55:49 +05:30
Matt Arsenault	0f59720e1c	AMDGPU: Fold fneg into bitcast of build_vector The math libraries have a lot of code that performs manual sign bit operations by bitcasting doubles to int2 and doing bithacking on them. This is a bad canonical form we should rewrite to use high level sign operations directly on double. To avoid codegen regressions, we need to do a better job moving fnegs to operate only on the high 32-bits. This is only halfway to fixing the real case.	2023-04-11 07:12:01 -04:00
Diana Picus	d9bf8aba23	[AMDGPU] Add MMOs for GFX11 Streamout Instructions The GFX11 NGG Streamout Instructions perform atomic operations on dedicated registers. At the moment, they lack machine memory operands, which causes the si-memory-legalizer pass to treat them conservatively and introduce several unnecessary waits and cache invalidations. This patch introduces a new address space to represent these special registers and teaches instruction selection to add memory operands with this new address space to DS_ADD/SUB_GS_REG_RTN. Since this address space is meant to be compiler-internal, we move it up a bit from the other address spaces and give it the number 128. According to the LLVM Language Reference, address space numbers can go all the way up to 2^24, but I'm not sure how well this is supported in practice [1], so using a smaller number seems safer. [1] `0107513fe7/llvm/utils/TableGen/IntrinsicEmitter.cpp (L401)` Differential Revision: https://reviews.llvm.org/D146031	2023-04-11 11:11:32 +02:00
pvanhout	0ff02cf015	[AMDGPU] Hide "removing function" diagnostics by default Use an `OptimizationRemark` for them even though it's not really an optimization. It just integrates better with the other diagnostics (enabling is easy with `-pass-remark`). Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D147703	2023-04-11 09:26:20 +02:00
Changpeng Fang	3bc1e084ee	AMDGPU: Created a subclass for the return address operand in the tail call return instruction Summary: This is to avoid using the callee saved registers for the return address of the tail call return instruction. Reviewers: arsenm, cdevadas Differential Revision: https://reviews.llvm.org/D147096	2023-04-10 10:53:33 -07:00
skc7	635c725b30	[AMDGPU][NFC] Regenerate test checks for merge-tbuffer.mir	2023-04-10 18:47:52 +05:30
mmarjano	f6e70ed1c7	[AMDGPU] Extend tbuffer_load_format merge Add support for merging _IDXEN and _BOTHEN variants of TBUFFER_LOAD_FORMAT instruction.	2023-04-10 12:24:21 +02:00
skc7	b434051dc8	[AMDGPU] Introduce SIInstrWorklist to process instructions in moveToVALU Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D147168	2023-04-10 11:34:14 +05:30
Sergei Barannikov	790252dfb3	[AMDGPU] Update mcp-overlap-after-propagation.mir test The test was added in D69953, but since D67794 landed a bit later it no longer catches the bug. Update the test to be representative again. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D146934	2023-04-09 01:09:00 +03:00
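
The DS offset fix listed above (124acb7ca3) describes an 8-bit offset encoding that is read back as a negative value and then widened to an unsigned 32-bit value before being compared. The sketch below is a simplified model of that failure mode, not LLVM code: `widenAsSigned`, `widenAsUnsigned`, and `triviallyDisjoint` are hypothetical helpers invented for illustration, and the offsets and widths are made-up values.

```cpp
#include <cstdint>

// Hypothetical model of the bug: the 8-bit encoding is treated as signed,
// so any value with the MSB set is sign-extended when widened to 32 bits.
constexpr uint32_t widenAsSigned(uint8_t Encoded) {
  return Encoded >= 0x80 ? (0xFFFFFF00u | static_cast<uint32_t>(Encoded))
                         : static_cast<uint32_t>(Encoded);
}

// Hypothetical model of the intended behaviour: the encoding is unsigned,
// so widening is a plain zero-extension.
constexpr uint32_t widenAsUnsigned(uint8_t Encoded) {
  return static_cast<uint32_t>(Encoded);
}

// Simplified disjointness check on byte ranges [Off, Off + Width).
// 64-bit arithmetic keeps the end-of-range computation from wrapping.
constexpr bool triviallyDisjoint(uint32_t OffA, uint32_t WidthA,
                                 uint32_t OffB, uint32_t WidthB) {
  return OffA <= OffB
             ? static_cast<uint64_t>(OffA) + WidthA <= OffB
             : static_cast<uint64_t>(OffB) + WidthB <= OffA;
}

// An 8-byte store at offset 0x7E and a 4-byte load at offset 0x80 overlap.
static_assert(!triviallyDisjoint(widenAsUnsigned(0x7E), 8,
                                 widenAsUnsigned(0x80), 4),
              "zero-extended offsets report the overlap");

// With sign-extension, 0x80 becomes 0xFFFFFF80 and the overlap is missed.
static_assert(triviallyDisjoint(widenAsSigned(0x7E), 8,
                                widenAsSigned(0x80), 4),
              "sign-extended offsets wrongly look disjoint");
```

In this model, a scheduler that trusts the second result could reorder the load ahead of the store, which is the kind of erroneous schedule the commit message describes.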

6326 Commits