llvm-project

Author	SHA1	Message	Date
Yeaseen	96c723374a	[llvm] Remove `br i1 undef` from some `llvm/test/CodeGen` tests (#128272 )	2025-02-23 09:23:33 +00:00
Matt Arsenault	ccad5e7744	AMDGPU: Respect amdgpu-no-agpr in functions and with calls (#128147 ) Remove the MIR scan to detect whether AGPRs are used or not, and the special case for callable functions. This behavior was confusing, and not overridable. The amdgpu-no-agpr attribute was intended to avoid this imprecise heuristic for how many AGPRs to allocate. It was also too confusing to make this interact with the pending amdgpu-num-agpr replacement for amdgpu-no-agpr. Also adds an xfail-ish test where the register allocator asserts after allocation fails which I ran into. Future work should reintroduce a more refined MIR scan to estimate AGPR pressure for how to split AGPRs and VGPRs.	2025-02-23 09:00:37 +07:00
Matt Arsenault	1bb43068f1	PeepholeOpt: Allow introducing subregister uses on reg_sequence (#127052 ) This reverts d246cc618adc52fdbd69d44a2a375c8af97b6106. We now handle composing subregister extracts through reg_sequence.	2025-02-22 09:16:14 +07:00
Brox Chen	61c6e0061c	[AMDGPU][True16][CodeGen] flat/global/scratch load/store pseudo for true16 (#127945 ) The T16D16 table is implemented in https://github.com/llvm/llvm-project/pull/127673. This is a follow-up patch that adds load/store pseudos for flat_store, global_load/global_store, and scratch_load/scratch_store in true16 mode, and updates the codegen test file.	2025-02-21 17:06:48 -05:00
Brox Chen	c896f7bdaa	[AMDGPU][True16][CodeGen] build_vector pattern in true16 (#118904 ) build_vector pattern in true16 SDAG	2025-02-21 14:02:12 -05:00
Matt Arsenault	0c50054820	Revert "RegAlloc: Fix verifier error after failed allocation (#119690 )" This reverts commit 34167f99668ce4d4d6a1fb88453a8d5b56d16ed5. Different set of verifier errors appears after other regalloc failure tests with EXPENSIVE_CHECKS.	2025-02-22 00:23:21 +07:00
Matt Arsenault	34167f9966	RegAlloc: Fix verifier error after failed allocation (#119690 ) In some cases after reporting an allocation failure, this would fail the verifier. It picks the first allocatable register and assigns it, but didn't update the liveness appropriately. When VirtRegRewriter relied on the liveness to set kill flags, it would incorrectly add kill flags if there was another overlapping kill of the virtual register. We can't properly assign the register to an overlapping range, so break the liveness of the failing register (and any other interfering registers) instead. Give the virtual register dummy liveness by effectively deleting all the uses by setting them to undef. The edge case not tested here which I'm worried about is if the read of the register is a def of a subregister. I've been unable to come up with a test where this occurs. https://reviews.llvm.org/D122616	2025-02-21 22:11:51 +07:00
Akshat Oke	bd16a87d05	[AMDGPU][NewPM] Port SIPostRABundler to NPM (#123717 )	2025-02-21 16:05:58 +05:30
Matt Arsenault	cc46d00a86	AMDGPU: Form v2f16 minimum3/maximum3 on gfx950 (#128123 )	2025-02-21 12:11:51 +07:00
Matt Arsenault	e729dc759d	AMDGPU: Widen f16 minimum/maximum to v2f16 on gfx950 (#128121 ) Unfortunately we only have the vector versions of v2f16 minimum3 and maximum3. Widen to v2f16 so we can lower as minimum3(x, y, y).	2025-02-21 12:08:49 +07:00
Pravin Jagtap	7c2ebe5dbb	AMDGPU: Restrict src0 to VGPRs only for certain cvt scale opcodes. (#127464 ) Src0 operands wider than 32 bits of cvt_scale opcodes operating on FP6/BF6/FP4 need to be restricted to take only VGPRs.	2025-02-21 07:27:25 +05:30
Akshat Oke	9855d761f3	[AMDGPU][NewPM] Port SIOptimizeExecMaskingPreRA to NPM (#125351 )	2025-02-20 17:35:56 +05:30
Diana Picus	611a648327	[AMDGPU] Add llvm.amdgcn.dead intrinsic (#123190 ) Shaders that use the llvm.amdgcn.init.whole.wave intrinsic need to explicitly preserve the inactive lanes of VGPRs of interest by adding them as dummy arguments. The code usually looks something like this: ``` define amdgcn_cs_chain void f(active vgpr args..., i32 %inactive.vgpr1, ..., i32 %inactive.vgprN) { entry: %c = call i1 @llvm.amdgcn.init.whole.wave() br i1 %c, label %shader, label %tail shader: [...] tail: %inactive.vgpr.arg1 = phi i32 [ %inactive.vgpr1, %entry], [poison, %shader] [...] ; %inactive.vgpr* then get passed into a llvm.amdgcn.cs.chain call ``` Unfortunately, this kind of phi node will get optimized away and the backend won't be able to figure out that it's ok to use the active lanes of `%inactive.vgpr*` inside `shader`. This patch fixes the issue by introducing a llvm.amdgcn.dead intrinsic, whose result can be used as a PHI operand instead of the poison. This will be selected to an IMPLICIT_DEF, which the backend can work with. At the moment, the llvm.amdgcn.dead intrinsic works only on i32 values. Support for other types can be added later if needed.	2025-02-20 09:25:48 +01:00
Matt Arsenault	37c341df28	Revert "AMDGPU: Don't canonicalize fminnum/fmaxnum if targets support IEEE fminimum(maximum)_num (#127711 )" This reverts commit 36eaf0daf5d6dd665d7c7a9ec38ea22f27709fed. This is not a sound approach to dealing with this instruction change. The new behavior is a different opcode pair, not a modifier on the existing opcode.	2025-02-20 10:19:14 +07:00
Changpeng Fang	36eaf0daf5	AMDGPU: Don't canonicalize fminnum/fmaxnum if targets support IEEE fminimum(maximum)_num (#127711 ) For targets that support IEEE fminimum_num/fmaximum_num, the corresponding _min_num_fXY/_max_num_fXY instructions themselves already did the canonicalization for the inputs. As a result, we do not need to explicitly canonicalize the inputs for fminnum/fmaxnum.	2025-02-19 11:16:43 -08:00
Brox Chen	210036a22e	[AMDGPU][True16][CodeGen] true16 codegen pattern for fma (#127240 ) The previous PR, https://github.com/llvm/llvm-project/pull/122950, was reverted because it hit a buildbot failure. Another patch was merged while that PR was under review, which left one test out of date. Reopened this PR and fixed the issue.	2025-02-19 11:37:24 -05:00
Fabian Ritter	8615f9aaff	[AMDGPU] Replace gfx940 and gfx941 with gfx942 in llvm (#126763 ) gfx940 and gfx941 are no longer supported. This is one of a series of PRs to remove them from the code base. This PR removes all non-documentation occurrences of gfx940/gfx941 from the llvm directory, and the remaining occurrences in clang. Documentation changes will follow. For SWDEV-512631	2025-02-19 10:20:48 +01:00
Shilei Tian	a44284c02f	[AMDGPU] Add `isAsCheapAsAMove` for `v_pk_mov_b32` (#127632 ) Co-authored-by: Matt Arsenault <Matthew.Arsenault@amd.com>	2025-02-19 00:51:57 -05:00
Shilei Tian	8187caf8e3	[NFC][AMDGPU] Pre-commit a test case of checking register coalescer on `v_pk_mov_b32` (#127715 ) This PR serves as a preliminary step, adding a test case for register coalescer on v_pk_mov_b32. It is intended to demonstrate the code changes introduced in an upcoming PR.	2025-02-18 23:42:51 -05:00
Matt Arsenault	22d65d8989	AMDGPU: Teach isOperandLegal about SALU literal restrictions (#127626 ) isOperandLegal mostly implemented the VALU operand rules, and largely ignored SALU restrictions. This theoretically avoids folding literals into SALU insts which already have a literal operand. This issue is currently avoided due to a bug in SIFoldOperands; this change will allow using raw operand legality rules. This breaks the formation of s_fmaak_f32 in SIFoldOperands, but it probably should not have been forming there in the first place. TwoAddressInsts or RA should generally handle that, and this only worked by accident.	2025-02-19 10:53:03 +07:00
Chaitanya	aed9f11965	[AMDGPU] Handle lowering addrspace casts from LDS to FLAT address in amdgpu-sw-lower-lds. (#121214 ) The "infer-address-spaces" pass replaces all refinable generic pointers with equivalent specific pointers, but it doesn't run in the pipeline at the -O0 optimisation level. The "amdgpu-sw-lower-lds" pass instruments memory operations on addrspace(3) ptrs. Since extra addrspacecasts from lds to flat addrspaces are present at -O0, and the actual store/load memory instructions are then on the flat addrspace, these addrspacecasts need to be handled in the amdgpu-sw-lower-lds pass itself. This patch first lowers the lds ptr to the corresponding ptr in the global memory from the asan_malloc, then replaces the original cast with an addrspacecast from the global ptr to a flat ptr.	2025-02-19 08:50:23 +05:30
Brox Chen	7c24041895	[AMDGPU][True16][CodeGen] reopen "FLAT_load using D16 pseudo instruction" (#127673 ) The previous patch, https://github.com/llvm/llvm-project/pull/114500, was merged but hit a buildbot failure and was reverted. It seems AMDGPU::OpName::OPERAND_LAST was removed around the time the previous patch was merged, which caused the compile error. Fixed and reopened here.	2025-02-18 18:16:23 -05:00
Stanislav Mekhanoshin	8529bd7b96	[AMDGPU] Respect MBB alignment in the getFunctionCodeSize() (#127142 )	2025-02-18 13:19:33 -08:00
Krzysztof Drewniak	f7d03707d1	[AMDGPU] Generalize amdgcn.make.buffer.rsrc to fat pointers (#126828 ) Attempting to pass a `ptr addrspace(7)` to functions that take `ptr` arguments produces undesirable `addrspacecast(addrspacecast(p8 x to p7) to p0) => addrspacecast(p8 x to p0)` folds. This results in illegal GEP operations on buffer resources, which can't be GEP'd. (However, note that, while unimplemented, addrspacecast from ptr addrspace(7) to ptr is legal - it's just an effective address computation) To resolve this problem, and thus prevent illegal `getelementptr T, ptr addrspace(8) %x, ...` instructions from being produced, this commit extends amdgcn.make.buffer.rsrc to also be variadic in its result type, auto-upgrading old manglings. The logic for handling a make.buffer.rsrc in instruction selection remains untouched and expects the output type to be a ptr addrspace(8), as does the Clang lowering for its builtin (the pointer-to-pointer version might want a different name in clang). LowerBufferFatPointers has been updated to lower amdgcn.make.buffer.rsrc.p7.p* to amdgcn.make.buffer.rsrc.p8.p* . This'll also make exposing buffer fat pointers in Clang easier, since you don't have to cast between a `__amdgcn_rsrc_t` and a pointer.	2025-02-18 14:15:28 -06:00
Nikita Popov	2cb5241c77	Revert "[AMDGPU][True16][CodeGen] FLAT_load using D16 pseudo instruction (#114500 )" This reverts commit f7a5f067885b7f6cc4a000c8392adf6b777a9108. Fails to build with: llvm/lib/Target/AMDGPU/AMDGPUMCInstLower.cpp:126:37: error: no member named 'OPERAND_LAST' in 'llvm::AMDGPU::OpName' 126 \| uint16_t OpName = AMDGPU::OpName::OPERAND_LAST;	2025-02-18 17:16:12 +01:00
Brox Chen	f7a5f06788	[AMDGPU][True16][CodeGen] FLAT_load using D16 pseudo instruction (#114500 ) Implement new pseudos with the suffix _t16 for FLAT_LOAD which have VGPR_16 as the load dst. Lower the pseudos to the existing real instructions with VGPR_32 src or dst (which makes them consistent with the hardware encoding). This patch reduces VGPR usage by making hi halves of VGPRs available for other values. There are more 8/16 bits ld/st instructions to be supported in the up-coming patches	2025-02-18 11:05:25 -05:00
Matt Arsenault	eb7c947272	AMDGPU: Correct legal literal operand logic for multiple uses (#127594 ) The same literal can be used multiple times in an instruction, not just once. We were not tracking the used value to verify this, so correct this. This helps avoid regressions in a future patch.	2025-02-18 19:58:42 +07:00
Matt Arsenault	cd10c01767	AMDGPU: Handle subregister uses in SIFoldOperands constant folding (#127485 )	2025-02-18 17:19:53 +07:00
Stanislav Mekhanoshin	bc4f05d8a8	[AMDGPU] Early bail in getFunctionCodeSize for meta inst. NFC. (#127129 ) It does not change the estimate because getInstSizeInBytes() already returns 0 for meta instructions, but added a test and early bail.	2025-02-18 02:08:28 -08:00
Matt Arsenault	c5def84ca4	AMDGPU: Handle brev and not cases in getConstValDefinedInReg (#127483 ) We should not encounter these cases in the peephole-opt use today, but get the common helper function to handle these.	2025-02-18 11:23:49 +07:00
Matt Arsenault	4dee305ce2	AMDGPU: Fix foldImmediate breaking register class constraints (#127481 ) This fixes a verifier error when folding an immediate materialized into an aligned vgpr class into a copy to an unaligned virtual register.	2025-02-18 10:34:48 +07:00
Matt Arsenault	fe1ef413ab	AMDGPU: Add more tests for peephole-opt immediate folding (#127480 )	2025-02-18 10:31:46 +07:00
Matt Arsenault	b5b8a59a53	AMDGPU: Implement getRequiredProperties for SIFoldOperands (#127522 ) Fix the broken MIR tests violating isSSA.	2025-02-18 08:22:45 +07:00
Matt Arsenault	ed38d6702f	PeepholeOpt: Handle subregister compose when looking through reg_sequence (#127051 ) Previously this would give up on folding subregister copies through a reg_sequence if the input operand already had a subregister index. d246cc618adc52fdbd69d44a2a375c8af97b6106 stopped introducing these subregister uses, and this is the first step to lifting that restriction. I was expecting to be able to implement this purely with compose / reverse compose, but I wasn't able to make it work, so this relies on testing the lanemasks for whether the copy reads a subset of the input.	2025-02-18 08:07:29 +07:00
Scott Linder	29ca3b8b28	[AMDGPU] Push amdgpu-preload-kern-arg-prolog after livedebugvalues (#126148 ) This is effectively a workaround for a bug in livedebugvalues, but seems to potentially be a general improvement, as BB sections seems like it could ruin the special 256-byte prelude scheme that amdgpu-preload-kern-arg-prolog requires anyway. Moving it even later doesn't seem to have any material impact, and just adds livedebugvalues to the list of things which no longer have to deal with pseudo multiple-entry functions. AMDGPU debug-info isn't supported upstream yet, so the bug being avoided isn't testable here. I am posting the patch upstream to avoid an unnecessary diff with AMD's fork.	2025-02-17 13:29:56 -05:00
Scott Linder	eaa460ca49	[AMDGPU] Remove dead function metadata after amdgpu-lower-kernel-arguments (#126147 ) The verifier ensures function !dbg metadata is unique across the module, so ensure the old nameless function we leave behind doesn't violate this invariant. Removing the function via e.g. eraseFromParent seems like a better option, but doesn't seem to be legal from a FunctionPass.	2025-02-17 13:27:23 -05:00
Shilei Tian	8aff59d3f4	[NFC][AMDGPU] Auto generate check lines for three test cases (#127352 ) - `CodeGen/AMDGPU/spill_more_than_wavesize_csr_sgprs.ll` - `CodeGen/AMDGPU/call-preserved-registers.ll` - `CodeGen/AMDGPU/stack-realign.ll` This is to make preparation for another PR.	2025-02-17 11:22:08 -05:00
Matt Arsenault	18ea6c9280	AMDGPU: Stop emitting an error on illegal addrspacecasts (#127487 ) These cannot be static compile errors, and should be treated as poison. Invalid casts may be introduced which are dynamically dead. For example: ``` void foo(volatile generic int* x) { __builtin_assume(is_shared(x)); *x = 4; } void bar() { private int y; foo(&y); // violation, wrong address space } ``` This could produce a compile time backend error or not depending on the optimization level. Similarly, the new test demonstrates a failure on a lowered atomicrmw which required inserting runtime address space checks. The invalid cases are dynamically dead, we should not error, and the AtomicExpand pass shouldn't have to consider the details of the incoming pointer to produce valid IR. This should go to the release branch. This fixes broken -O0 compiles with 64-bit atomics which would have started failing in 1d0370872f28ec9965448f33db1b105addaf64ae.	2025-02-17 21:03:50 +07:00
Vikram Hegde	06a3abd9e8	[AMDGPU][NewPM] Port "SIFormMemoryClauses" to NPM (#127181 )	2025-02-17 11:07:17 +05:30
Yeaseen	6e94007623	[llvm] Remove `br i1 undef` in some `llvm/test/CodeGen` tests (#127368 ) This PR replaces some instances of `br i1 undef` with a function argument value in several tests under the `llvm/test/CodeGen/` directory. This PR is a continuation of PR #125460	2025-02-16 18:44:46 +00:00
Jeffrey Byrnes	a1120c9b79	[AMDGPU] NFC: Fix some details for lit test (#127141 ) Addressed comments in https://github.com/llvm/llvm-project/pull/126976	2025-02-16 19:34:20 +07:00
Brox Chen	cf1165cb9c	Revert "[AMDGPU][True16][CodeGen] true16 codegen pattern for fma (#12… (#127175 ) Reverting this patch since it raised a buildbot failure. This reverts commit 2a7487cc2e0fb8bd91784e2d9636a65baa6d90ed.	2025-02-14 02:28:45 -05:00
Brox Chen	2a7487cc2e	[AMDGPU][True16][CodeGen] true16 codegen pattern for fma (#122950 ) True16 codegen pattern for f16 fma. Created a duplicate shrink-mad-fma-gfx10.mir from shrink-mad-fma to separate the pre-GFX11 and GFX11 MIR tests.	2025-02-14 02:16:00 -05:00
Scott Linder	0aafb8aca3	[AMDGPU] Add test for failure with function !dbg info in amdgpu-lower-kernel-arguments (#126146 )	2025-02-13 15:58:45 -05:00
LU-JOHN	5decab178f	AMDGPU: Reduce shl64 to shl32 if shift range is [63-32] (#125574 ) Reduce: DST = shl i64 X, Y where Y is in the range [63-32] to: DST = [0, shl i32 X, (Y & 0x1F)] Alive2 analysis: https://alive2.llvm.org/ce/z/w_u5je --------- Signed-off-by: John Lu <John.Lu@amd.com>	2025-02-13 13:40:25 -06:00
Robert Imschweiler	41e49fadd4	[AMDGPU] Fix llvm.amdgcn.workitem.id-unsupported-calling-convention.ll (#127041 ) Follow-up fix for #126058. (@arsenm)	2025-02-13 22:23:47 +07:00
Robert Imschweiler	0da8d0f9b7	[AMDGPU] Change handling of unsupported non-compute shaders with HSA (#126798 ) Previous handling in `SITargetLowering::LowerFormalArguments` only reported a diagnostic message and continued execution by returning a non-usable `SDValue`. This results in llvm crashing later with an unrelated error. This commit changes the detection of an unsupported non-compute shader to be a fatal error right away. As an example situation, take the usage of an `amdgpu_ps` function and the `amdgcn-unknown-amdhsa` target triple. ``` define amdgpu_ps void @foo(ptr %p, i32 %i) { store i32 %i, ptr %p ret void } ``` Compiling this code (with `llc -mtriple=amdgcn-unknown-amdhsa -mcpu=gfx942`, for example) fails with: ``` error: <unknown>:0:0: in function foo void (ptr, i32): unsupported non-compute shaders with HSA llc: [...]/git/trunk21.0/llvm-project/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp:11790: void llvm::SelectionDAGISel::LowerArguments(const llvm::Function&): Assertion `InVals.size() == Ins.size() && "LowerFormalArguments didn't emit the correct number of values!"' failed. [...] ```	2025-02-13 22:23:08 +07:00
Fabian Ritter	a33a84ee63	[AMDGPU][NFC] Replace gfx940 and gfx941 with gfx942 in llvm/test (#125711 ) gfx940 and gfx941 are no longer supported. This is one of a series of PRs to remove them from the code base. This PR uses gfx942 instead of gfx940 and gfx941 in the test RUN-lines (unless there is already a RUN-line for gfx942). The only notable difference in the test output is that gfx942 does not force the use of sc0 and sc1 on stores while gfx940 and gfx941 do (cf. https://reviews.llvm.org/D149986). For SWDEV-512631	2025-02-13 15:17:12 +01:00
Matt Arsenault	eef0205345	AMDGPU: Add baseline test for treating v_pk_mov_b32 like reg_sequence (#125656 )	2025-02-13 18:12:09 +07:00
Jay Foad	0b0f3da6a8	[AMDGPU] Add a regression test for -mattr=dumpcode (#116982 )	2025-02-13 11:04:08 +00:00
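
The shl64-to-shl32 reduction described in commit 5decab178f can be sanity-checked with a small model. The following is only an illustrative sketch in Python, not the code from the patch: the function names `shl64` and `shl64_via_shl32` are hypothetical, and it assumes the reduced form keeps the low five bits of the shift amount (which equals Y - 32 for Y in [32, 63]) and shifts only the low 32 bits of X.

```python
MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1


def shl64(x: int, y: int) -> int:
    """Reference semantics: 64-bit shift left, truncated to 64 bits."""
    return (x << y) & MASK64


def shl64_via_shl32(x: int, y: int) -> int:
    """Hypothetical reduced form for shift amounts in [32, 63].

    The low 32 bits of the result are always zero; the high 32 bits are
    a 32-bit shift of the low half of x by the masked shift amount.
    """
    assert 32 <= y <= 63
    lo = 0
    hi = ((x & MASK32) << (y & 31)) & MASK32
    return (hi << 32) | lo


def check() -> bool:
    """Compare both forms over a few sample values and every legal shift."""
    samples = [0, 1, 0xDEADBEEF, 0x0123456789ABCDEF, MASK64]
    return all(
        shl64(x, y) == shl64_via_shl32(x, y)
        for x in samples
        for y in range(32, 64)
    )
```

Calling `check()` compares the two forms for every shift amount from 32 to 63. The check is a spot test over a handful of inputs, so it complements rather than replaces the Alive2 analysis linked in the commit message.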

8310 Commits