llvm-project

Author	SHA1	Message	Date
Ronak Chauhan	5f0b92e580	[AMDGPU] Also consider global and scratch instructions when flushing vmcnt counter in loop preheader Reviewed By: foad Differential Revision: https://reviews.llvm.org/D149332	2023-05-05 21:12:10 +05:30
Nicolai Hähnle	ef13308b26	AMDGPU/SDAG: Improve {extract,insert}_subvector lowering for 16-bit vectors v2: - simplify the escape to TableGen patterns Differential Revision: https://reviews.llvm.org/D149841	2023-05-05 10:55:18 +02:00
Nicolai Hähnle	909095a880	AMDGPU: Precommit test showing codegen weakness The code sequence on gfx9 has a lot of useless v_bfi instructions. Differential Revision: https://reviews.llvm.org/D149840	2023-05-04 14:11:04 +02:00
Krzysztof Drewniak	fc05b7f0d0	[AMDGPU] Add gfx940 to fp64 atomic tests in global ISel This changes the test in GlobalISel, which makes it match the test elsewhere. Differential Revision: https://reviews.llvm.org/D149795	2023-05-03 22:40:16 +00:00
Krzysztof Drewniak	f0415f2a45	Re-land "[AMDGPU] Define data layout entries for buffers" Re-land D145441 with data layout upgrade code fixed to not break OpenMP. This reverts commit 3f2fbe92d0f40bcb46db7636db9ec3f7e7899b27. Differential Revision: https://reviews.llvm.org/D149776	2023-05-03 19:43:56 +00:00
Krzysztof Drewniak	3f2fbe92d0	Revert "[AMDGPU] Define data layout entries for buffers" This reverts commit f9c1ede2543b37fabe9f2d8f8fed5073c475d850. Differential Revision: https://reviews.llvm.org/D149758	2023-05-03 16:11:00 +00:00
Mateja Marjanovic	cf76074a36	[AMDGPU][GlobalISel] Check exact width in get*ClassForBitWidth and widen if necessary Instead of checking if the given bitwidth is less or equal to a bitwidth of an existing RegClass, check if it has the exact same value. For LLVM vector types that don't have a corresponding Register Class, widen them during legalization. That goes for G_EXTRACT_VECTOR_ELT, G_INSERT_VECTOR_ELT and G_BUILD_VECTOR. Differential revision: https://reviews.llvm.org/D148096 Reviewers: foad, arsenm	2023-05-03 17:32:24 +02:00
Mateja Marjanovic	6175ec0bb6	Revert "[AMDGPU][GlobalISel] Widen the vector operand in G_BUILD/INSERT/EXTRACT_VECTOR" This reverts commit b25c7cafcbe1b52ea2d1ff5e5c2f13674b5f297d.	2023-05-03 17:28:01 +02:00
Krzysztof Drewniak	f9c1ede254	[AMDGPU] Define data layout entries for buffers Per discussion at https://discourse.llvm.org/t/representing-buffer-descriptors-in-the-amdgpu-target-call-for-suggestions/68798, we define two new address spaces for AMDGCN targets. The first is address space 7, a non-integral address space (which was already in the data layout) that has 160-bit pointers (which are 256-bit aligned) and uses a 32-bit offset. These pointers combine a 128-bit buffer descriptor and a 32-bit offset, and will be usable with normal LLVM operations (load, store, GEP). However, they will be rewritten out of existence before code generation. The second of these is address space 8, the address space for "buffer resources". These will be used to represent the resource arguments to buffer instructions, and new buffer intrinsics will be defined that take them instead of <4 x i32> as resource arguments. These ptr addrspace(8) pointers are 128 bits long (with the same alignment). They must not be used as the arguments to getelementptr or otherwise used in address computations, since they can have arbitrarily complex inherent addressing semantics that can't be represented in LLVM. Even though, like their address space 7 cousins, these pointers have deterministic ptrtoint/inttoptr semantics, they are defined to be non-integral in order to prevent optimizations that rely on pointers being a [0, [addr_max]] value from applying to them. Future work includes: - Defining new buffer intrinsics that take ptr addrspace(8) resources. - A late rewrite to turn address space 7 operations into buffer intrinsics and offset computations. This commit also updates the "fallback address space" for buffer intrinsics to the buffer resource, and updates the alias analysis table. Depends on D143437 Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D145441	2023-05-03 15:25:58 +00:00
Mateja Marjanovic	b25c7cafcb	[AMDGPU][GlobalISel] Widen the vector operand in G_BUILD/INSERT/EXTRACT_VECTOR Widen the vector operand type in G_BUILD_VECTOR, G_INSERT_VECTOR_ELT, G_EXTRACT_VECTOR_ELT to the nearest larger RegClass.	2023-05-03 17:14:38 +02:00
pvanhout	415956fe7e	[llvm-readobj][AMDGPU] Bypass MD verification for PAL Small split change from D146023. Migrate elf-notes to v4 and fix llvm-readobj to work with PAL metadata. Reviewed By: kzhuravl Differential Revision: https://reviews.llvm.org/D146119	2023-05-03 08:45:24 +02:00
Joseph Huber	a1da746157	[AMDGPU] Place global constructors in .init_array and .fini_array For the GPU, we emit external kernels that call the initializers and constructors, however if we had a persistent kernel like in the `_start` kernel for the `libc` project, we could initialize the standard way of calling constructors. This patch adds new global variables containing pointers to the constructors to be called. If these are placed in the `.init_array` and `.fini_array` sections, then the backend will handle them specially. The linker will then provide the `__init_array_` and `__fini_array_` sections to traverse them. An implementation would look like this. ``` extern uintptr_t __init_array_start[]; extern uintptr_t __init_array_end[]; extern uintptr_t __fini_array_start[]; extern uintptr_t __fini_array_end[]; using InitCallback = void(int, char **, char **); using FiniCallback = void(void); extern "C" [[gnu::visibility("protected"), clang::amdgpu_kernel]] void _start(int argc, char **argv, char **envp) { uint64_t init_array_size = __init_array_end - __init_array_start; for (uint64_t i = 0; i < init_array_size; ++i) reinterpret_cast<InitCallback *>(__init_array_start[i])(argc, argv, envp); uint64_t fini_array_size = __fini_array_end - __fini_array_start; for (uint64_t i = 0; i < fini_array_size; ++i) reinterpret_cast<FiniCallback *>(__fini_array_start[i])(); } ``` Reviewed By: yaxunl Differential Revision: https://reviews.llvm.org/D149340	2023-04-29 08:40:19 -05:00
Jay Foad	56af0e913c	[EarlyCSE] Do not CSE convergent calls in different basic blocks "convergent" is documented as meaning that the call cannot be made control-dependent on more values, but in practice we also require that it cannot be made control-dependent on fewer values, e.g. it cannot be hoisted out of the body of an "if" statement. In code like this, if we allow CSE to combine the two calls: x = convergent_call(); if (cond) { y = convergent_call(); use y; } then we get this: x = convergent_call(); if (cond) { use x; } This is conceptually equivalent to moving the second call out of the body of the "if", up to the location of the first call, so it should be disallowed. Differential Revision: https://reviews.llvm.org/D149348	2023-04-28 14:50:48 +01:00
Jay Foad	5534d1d834	[CSE] Precommit an AMDGPU test case for D149348 Differential Revision: https://reviews.llvm.org/D149349	2023-04-28 14:50:48 +01:00
Jeffrey Byrnes	7f0a881e6c	[AMDGPU] Track liveins for max-ilp-sched-strategy Even if optimizing for ILP, it is still useful to track RP to avoid spilling. Given that, we need to maintain consistent liveness state with the RP tracker. This patch makes RP tracking consistent by updating for liveins. Otherwise, we should completely eliminate RP tracking for this scheduler (checkScheduling, initCandidate). Differential Revision: https://reviews.llvm.org/D149358	2023-04-27 16:45:45 -07:00
Changpeng Fang	1ab8b9ae15	AMDGPU: Define sub-class of SGPR_64 for tail call return Summary: Registers for tail call return should not be clobbered by callee. So we need a sub-class of SGPR_64 (excluding callee saved registers (CSR)) to hold the tail call return address. Because GFX and C calling conventions have different CSR, we need to define the sub-class separately. This work is an extension of D147096 with the consideration of GFX calling convention. Based on the calling conventions, different instructions will be selected with different sub-class of SGPR_64 as the input. Reviewers: arsenm, cdevadas and sebastian-ne Differential Revision: https://reviews.llvm.org/D148824	2023-04-27 10:45:11 -07:00
skc7	e016fb57b3	[AMDGPU] Legalize soffset of buffer instructions. Use Waterfall loop logic. Legalize soffset of buffer instructions using waterfall loop. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D141030	2023-04-27 19:36:50 +05:30
ManuelJBrito	8b56da5e9f	[IR] Change shufflevector undef mask to poison With this patch an undefined mask in a shufflevector will be printed as poison. This change is done to support the new shufflevector semantics for undefined mask elements. Differential Revision: https://reviews.llvm.org/D149210	2023-04-27 14:41:10 +01:00
Jay Foad	47d3cbcf84	[BranchFolder] Skip redundant IMPLICIT_DEFs of subregs Differential Revision: https://reviews.llvm.org/D148509	2023-04-27 09:40:06 +01:00
Jay Foad	12b70ad68c	[BranchFolder] Precommit AMDGPU test case for D148509	2023-04-27 09:40:06 +01:00
Nicolai Hähnle	1e63f8272e	AMDGPU: Fix an assertion in SIOptimizeVGPRLiveRange As the comment notes, the shader results in an INSERT_SUBREG with "undef" (dead) operand in the Endif block. The same can happen with REG_SEQUENCE. The register is considered dead from a liveness analysis perspective. The correct thing to do seems to be nothing: we keep the undef use of the register, the register allocator should still be able to take the liveness into account correctly. Differential Revision: https://reviews.llvm.org/D149161	2023-04-27 09:39:44 +02:00
Matt Arsenault	bbc7b30fbf	AMDGPU: Remove invalid testcase for enqueue kernel The call didn't have the right calling convention, but calls to kernels are supposed to be illegal anyway.	2023-04-26 17:25:30 -04:00
Jay Foad	22516593ae	[AMDGPU] Add GFX11 ds_min_f32 / ds_max_f32 tests	2023-04-26 17:09:12 +01:00
Joe Nash	f8ec7a0944	[AMDGPU] Delete test for illegal v_cndmask_b16_dpp There are no VOP2 or VOP2 with dpp forms of v_cndmask_b16. Delete the test. NFC. Reviewed By: critson Differential Revision: https://reviews.llvm.org/D149184	2023-04-26 09:50:44 -04:00
Janek van Oirschot	124acb7ca3	[AMDGPU] Fix negative offset values interpretation in getMemOperandsWithOffset for DS The offset values may result in an erroneous scheduling of a load before write for a memory location if the offset values are represented as negative values in MIR, despite actually being unsigned values. This representation in MIR happens as SelectionDAG::getConstant could go through APInt to represent the encoding which assumes the MSB of the encoding as a sign-bit, regardless of whether it is supposed to be a signed value. The 8-bit negative (interpreted) value gets cast to an unsigned 32 bit value in getMemOperandsWithOffset used for comparisons in areMemAccessesTriviallyDisjoint eventually leading to an erroneous schedule in the machine scheduler. Reviewed By: arsenm, foad Differential Revision: https://reviews.llvm.org/D149080	2023-04-26 14:10:25 +01:00
OCHyams	2b3c13b716	[DebugInfo] Treat empty metadata operands the same as undef operands in SelectionDAG Without this patch SelectionDAG silently drops dbg.values using `!{}` operands. Related to https://discourse.llvm.org/t/auto-undef-debug-uses-of-a-deleted-value Reviewed By: StephenTozer Differential Revision: https://reviews.llvm.org/D140990	2023-04-26 09:03:07 +01:00
Jay Foad	0c13e0b748	[AMDGPU] Do not handle _SGPR SMEM instructions in SILoadStoreOptimizer After D147334 we never select _SGPR forms of SMEM instructions on subtargets that also support the _SGPR_IMM form, so there is no need to handle them here. Differential Revision: https://reviews.llvm.org/D149139	2023-04-25 15:40:13 +01:00
Matt Arsenault	2fce50e8f5	AMDGPU: Fix assertion with multiple uses of f64 fneg of select A bitcast needs to be inserted back to the original type. Just skip the multiple use case for a safer quick fix. Handling the multiple use case seems to be beneficial in some but not all cases.	2023-04-20 10:15:18 -04:00
Matt Arsenault	99d4c722e3	AMDGPU: Really invert handling of enqueued block detection Remove the broken call graph analysis in the block enqueue lowering pass. The previous iteration was reverted due to a runtime bug when the completion action was unconditionally enabled.	2023-04-20 06:58:24 -04:00
Pravin Jagtap	21a69bdb66	[NewPM][AMDGPU] Port amdgpu-atomic-optimizer Reviewed By: arsenm, sameerds, gandhi21299 Differential Revision: https://reviews.llvm.org/D148628	2023-04-20 00:27:47 -04:00
Jay Foad	e1ae0e2b7d	[AMDGPU] Fix some check prefixes	2023-04-19 16:15:14 +01:00
Jay Foad	141c476a36	[AMDGPU] Remove unused check lines from tests	2023-04-19 16:15:14 +01:00
Jay Foad	43b035b483	[AMDGPU] Remove unused check lines from GlobalISel IR tests	2023-04-19 15:13:02 +01:00
Jay Foad	d9ed0dee0c	[AMDGPU] Remove unused check lines from GlobalISel MIR tests	2023-04-19 15:03:32 +01:00
Jay Foad	bf4dc4381e	[AMDGPU] Don't transform illegal intrinsics to V_ILLEGAL This reverts parts of D123693. The functionality of allowing unsupported intrinsics to select has been superseded by D139000 "Remove function with incompatible features". Retain assembler/disassembler support for v_illegal on GFX10+ only, where it is documented. Differential Revision: https://reviews.llvm.org/D148127	2023-04-19 09:59:46 +01:00
Chen Zheng	3f4055dec4	[GlobalISelEmitter] handle operand without MVT/class There are some patterns in td files without MVT/class set for some operands in target pattern that are from the source pattern. This prevents GlobalISelEmitter from adding them as a valid rule, because the target child operand is an operand of an unsupported kind. For now, for a leaf child, only IntInit and DefInit are handled in GlobalISelEmitter. This issue can be worked around by adding MVT/class to the patterns in the td files, like the workarounds for patterns anyext and setcc in PPCInstrInfo.td in D140878. To avoid adding the same workarounds for other patterns in td files, this patch tries to handle the UnsetInit case in GlobalISelEmitter. Adding the new handling allows us to remove the workarounds in the td files and also generates many selection rules for PPC target. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D141247	2023-04-19 07:00:57 +00:00
chenglin.bi	9356097206	Revert "[AMDGPU] Ressociate patterns with sub to use SALU" The patch caused an infinite loop because of DAGCombiner's canonicalization: // (x + C) - y -> (x - y) + C // y - (x + C) -> (y - x) - C // (x - C) - y -> (x - y) - C // (C - x) - y -> C - (x + y) This reverts commit b3529b5bf3ba2cd7f38665de16450afefb263c9b.	2023-04-19 11:15:14 +08:00
David Stuttard	4d54565436	[AMDGPU] Remove unnecessary assert Also remove the function attributes from the test. For PAL based shaders this isn't required. Differential Revision: https://reviews.llvm.org/D148625	2023-04-18 13:41:38 +01:00
pvanhout	ec82188451	[AMDGPU] Do not crash on agpr_hi16 in AMDGPUResourceUsageAnalysis Reviewed By: #amdgpu, arsenm Differential Revision: https://reviews.llvm.org/D148438	2023-04-18 13:53:01 +02:00
Kriti Gupta	a3dfa4e083	[test] Remove occurrences of br undef in CodeGen/AMDGPU tests Differential Revision: https://reviews.llvm.org/D148041	2023-04-18 08:47:29 +01:00
chenglin.bi	b3529b5bf3	[AMDGPU] Ressociate patterns with sub to use SALU Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D148463	2023-04-18 06:49:41 +08:00
Jay Foad	2be3189601	[AMDGPU] Don't select _SGPR forms of SMEM instructions on GFX9+ On GFX9+, SMEM instructions have an _SGPR_IMM form which is strictly more powerful than the _SGPR form. It simplifies codegen if we always select the _SGPR_IMM form with an immediate offset of 0 instead of the _SGPR form. Note that this patch just makes minimal changes to the selection patterns to prove the concept. Further simplifications are possible to reduce the number of selection patterns. On GFX9 the _SGPR form of the Real instruction is still required for assembly/disassembly but on GFX10+ it can be removed completely. Differential Revision: https://reviews.llvm.org/D147334	2023-04-17 16:23:30 +01:00
pvanhout	ae77aceba5	[Analysis] Remove DA & LegacyDA UniformityAnalysis offers all of the same features and much more, there is no reason left to use the legacy DAs. See RFC: https://discourse.llvm.org/t/rfc-deprecate-divergenceanalysis-legacydivergenceanalysis/69538 - Remove LegacyDivergenceAnalysis.h/.cpp - Remove DivergenceAnalysis.h/.cpp + Unit tests - Remove SyncDependenceAnalysis - it was not a real registered analysis and was only used by DAs - Remove/adjust references to the passes in the docs where applicable - Remove TTI hook associated with those passes. - Move tests to UniformityAnalysis folder. - Remove RUN lines for the DA, leave only the UA ones. - Some tests had to be adjusted/removed depending on how they used the legacy DAs. Reviewed By: foad, sameerds Differential Revision: https://reviews.llvm.org/D148116	2023-04-17 09:01:22 +02:00
Jay Foad	6b5067a81a	[AMDGPU] Don't assert that image intrinsics are supported Unsupported intrinsics should give a regular "cannot select" error. Differential Revision: https://reviews.llvm.org/D148147	2023-04-16 19:54:55 +01:00
pvanhout	b3b3cb2d2f	[AMDGPU] Less aggressively break large PHIs In some cases, breaking large PHIs can very negatively affect performance (3x more instructions observed in a particular test case). This patch adds some basic profitability heuristics to help with some of these issues without affecting the "good" cases. e.g. avoid breaking PHIs if it causes back-and-forth between vector/scalar form for no good reason. Fixes SWDEV-392803 Fixes SWDEV-393781 Fixes SWDEV-394228 Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D147786	2023-04-14 15:41:26 +02:00
David Stuttard	fc83f1de5d	[AMDGPU] Add backend support for new PAL ELF Metadata 3.0 PAL Metadata 3.0 introduces an explicit structure in metadata for the programmable registers written out by the compiler backend, rather than using opaque registers, which can change between different architectures and require encoding the bitfield information in the backend, which may change between versions. This is the initial minimal implementation that enables the use of PAL Metadata 3.0. The change itself should be NFC for non-PAL, although the way RSRC2 register is handled has been changed slightly. The test is fairly minimal, but checks that the metadata format looks as expected and verifies a couple of special cases such as tgid_[xyz]_en handling and PsInputAddr/Ena which also change to explicit fields. Differential Revision: https://reviews.llvm.org/D147143	2023-04-14 09:57:13 +01:00
Diana Picus	b9ba05360e	[AMDGPU] Don't S_MOV_B32 into $scc The peephole optimizer tries to replace ``` %n:sgpr_32 = S_MOV_B32 x $scc = COPY %n ``` with a `S_MOV_B32` directly into `$scc`. This crashes because `S_MOV_B32` cannot take `$scc` as input. We currently generate code like this from GlobalISel when lowering a G_BRCOND with a constant condition. We should probably look into removing this kind of branch altogether, but until then we should at least not crash. This patch fixes the issue by making sure we don't apply the peephole optimization when trying to move into a physical register that doesn't belong to the correct register class. Differential Revision: https://reviews.llvm.org/D148117	2023-04-14 10:24:43 +02:00
Jay Foad	2d39f5b5cd	[AMDGPU] Allow use of TTMP registers in AMDGPUResourceUsageAnalysis With architected SGPRs, workgroup IDs are passed into a compute shader in TTMP registers. Allow for this in AMDGPUResourceUsageAnalysis instead of failing an assertion. Differential Revision: https://reviews.llvm.org/D148239	2023-04-13 16:56:22 +01:00
pvanhout	fd1d60873f	[AMDGPU] Remove CC exception for Promote Alloca Limits Apparently it was used to work around some issue that has been fixed. Removing it helps with high scratch usage observed in some cases due to failed alloca promotion. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D145586	2023-04-13 08:48:34 +02:00
Sebastian Neubauer	fee3980df5	[AMDGPU] Fix amdgpu_gfx tail-call test The inreg argument prevented the tail call optimization from kicking in. Remove the inreg, so this test actually uses a tail call. Note that it now uses s[4:5] for the return address, which is invalid, because these registers are supposed to be callee-save. D147096 tried to fix that problem for the C calling convention. Differential Revision: https://reviews.llvm.org/D148119	2023-04-12 16:15:09 +02:00
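
Illustration for 124acb7ca3 (negative DS offsets): the sketch below is not the LLVM code, only a minimal model of the failure the commit message describes. It assumes an 8-bit offset encoding and a simple range-overlap check; the helper names widenAsSigned, widenAsUnsigned and triviallyDisjoint are hypothetical and exist only for this example. It shows how reading the encoding as signed before widening to 32 bits can make two overlapping accesses look disjoint.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: misreads the raw 8-bit encoding as a signed value
// before widening, so 0x80 becomes 0xFFFFFF80 instead of 128.
uint32_t widenAsSigned(uint8_t Enc) {
  return static_cast<uint32_t>(static_cast<int32_t>(static_cast<int8_t>(Enc)));
}

// Hypothetical helper: zero-extends the encoding, which is the intended
// interpretation for an unsigned offset field.
uint32_t widenAsUnsigned(uint8_t Enc) { return Enc; }

// Hypothetical stand-in for a "trivially disjoint" check: two accesses of
// the same width do not overlap if their offsets are at least Width apart.
bool triviallyDisjoint(uint32_t OffA, uint32_t OffB, uint32_t Width) {
  uint32_t Lo = OffA < OffB ? OffA : OffB;
  uint32_t Hi = OffA < OffB ? OffB : OffA;
  return Hi - Lo >= Width;
}
```

With offsets 0x7E and 0x80 and a 4-byte access width, the zero-extended values (126 and 128) overlap, while the sign-misread values (126 and 0xFFFFFF80) appear far apart, which is the kind of wrong "disjoint" answer that could let a scheduler reorder a load ahead of a store.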


6337 Commits