llvm-project

Author	SHA1	Message	Date
Vikram Hegde	f1fa292cd6	[AMDGPU] Pre-commit tests for "lshr + mad" fold (#119509 )	2025-01-01 10:17:37 +05:30
Jay Foad	2d6d723a85	[AMDGPU] Add some more GFX12 test coverage (#120581 )	2024-12-23 09:42:52 +00:00
Chaitanya	21996bd69c	[AMDGPU] Remove amdgpu-no-heap-ptr and amdgpu-no-lds-kernel-id attributes from lowered kernels in amdgpu-sw-lower-lds pass (#120887 ) 'amdgpu-sw-lower-lds' pass internally calls '__asan_malloc_impl' for heap memory allocation. Pass also uses 'amdgcn_lds_kernel_id' for non-kernel lds accesses lowering. This patch removes 'amdgpu-no-heap-ptr' and 'amdgpu-no-lds-kernel-id' from all kernels lowered by the pass.	2024-12-23 12:42:31 +05:30
Aaditya	c7606710f9	[AMDGPU] Update base addr of dyn alloca considering GrowingUp stack (#119822 ) Currently, the compiler calculates the base address of a dynamically sized stack object (alloca) as follows: 1. `NewSP = Align(CurrSP + Size)` _where_ `Size = # of elements * wave size * alloca type` 2. `BaseAddr = NewSP` 3. The alignment is computed as: `AlignedAddr = Addr & ~(Alignment - 1)` 4. Return the `BaseAddr` This makes sense when the stack grows downwards. The AMDGPU stack grows upwards, so the base address needs to be aligned first and the SP bumped by the required size afterwards: 1. `BaseAddr = Align(CurrSP)` 2. `NewSP = BaseAddr + Size` 3. `AlignedAddr = (Addr + (Alignment - 1)) & ~(Alignment - 1)` 4. Return the `BaseAddr`.	2024-12-20 10:27:27 +05:30
Brox Chen	08db696c87	[AMDGPU][True16][MC] V_MED3_I/U16_fake16 CodeGen pattern (#120600 ) The patch https://github.com/llvm/llvm-project/pull/113603 replaced `V_MED3_I/U16` with `V_MED3_I/U16_fake16` for Post-GFX11, but it missed updating the CodeGen pattern. This patch updates and corrects the CodeGen pattern.	2024-12-20 10:53:58 +07:00
Matt Arsenault	44201679c6	AMDGPU: Fix mishandling of search for constantexpr addrspacecasts (#120346 )	2024-12-20 07:37:19 +07:00
Konstantina Mitropoulou	d3508ccd15	[AMDGPU] Emit S_CBRANCH_SCC for floating-point conditions. (#120588 ) - [AMDGPU] Add new test. - [AMDGPU] Emit S_CBRANCH_SCC for floating-point conditions. --------- Co-authored-by: Konstantina Mitropoulou <KonstantinaMitropoulou@amd.com>	2024-12-19 11:20:43 -08:00
Jun Wang	d57230c72e	[AMDGPU][MC] Disallow op_sel in some VOP3P dot instructions (#100485 ) In v_dot4 and v_dot8 instructions with 4- or 8-bit packed data (e.g., v_dot4_u32_u8, v_dot8_u32_u4), the op_sel modifier should not be allowed.	2024-12-18 10:50:47 -08:00
Brox Chen	c6f753b9a0	[AMDGPU][True16][MC] true16 for v_pack_b32_f16 (#119630 ) Support true16 format for v_pack_b32_f16 in MC. Since we are replacing v_alignbit_b32 with `v_pack_b32_f16_t16/v_pack_b32_f16_fake16` in Post-GFX11, we have to update the CodeGen pattern for `v_pack_b32_f16_fake16` to get the CodeGen tests passing. No pattern is modified or created; `v_pack_b32_f16` is just replaced with the fake16 format. Some of the true16 CodeGen tests are impacted since the `v_pack_b32_f16` selection is removed in Post-GFX11 while `v_pack_b32_f16_t16` is not yet supported. The CodeGen patch for `v_pack_b32_f16_t16` will be done in a following patch.	2024-12-18 13:28:42 -05:00
Aaditya	0446990cc7	Reapply "[NFC][AMDGPU] Pre-commit clang and llvm tests for dynamic allocas" (#120410 ) This reapplies commit https://github.com/llvm/llvm-project/pull/120063. A machine-verifier bug was causing a crash in the previous commit. This has been addressed in https://github.com/llvm/llvm-project/pull/120393.	2024-12-18 18:20:45 +05:30
Sergei Barannikov	1941f34172	[TableGen][GISel] Import more "multi-level" patterns (#120332 ) Previously, if the destination DAG had an untyped leaf, we would import the pattern only if that leaf was defined by the top-level source DAG. This is an unnecessary restriction. Here is an example of such a pattern: ``` def : Pat<(add (mul v8i16:$vA, v8i16:$vB), v8i16:$vC), (VMLADDUHM $vA, $vB, $vC)>; ``` Previously, it failed to import because `add` defines neither `$vA` nor `$vB`. This change reduces the number of skipped patterns as follows: ``` AArch64: 8695 -> 8548 (-147) AMDGPU: 11333 -> 11240 (-93) ARM: 4297 -> 4278 (-19) PowerPC: 3955 -> 3010 (-945) ``` Other GISel-enabled targets are unaffected.	2024-12-18 14:44:55 +03:00
Aaditya	414c462a83	[AMDGPU] Modify Dyn Alloca test to account for Machine-Verifier bug (#120393 ) Machine-Verifier crashes in kernel functions, but fails gracefully in device functions. This is due to the buffer resource descriptor selected during G-ISEL, before the fallback path. Device functions use `$sgpr0_sgpr1_sgpr2_sgpr3`, while kernel functions select `$private_rsrc_reg`, where the machine-verifier complains: `$private_rsrc_reg is not a SReg_128 register.` This modifies the test case to capture both behaviors. It is related to https://github.com/llvm/llvm-project/pull/120063	2024-12-18 16:08:17 +05:30
Aaditya	d6e8ab1fa6	Revert "[NFC][AMDGPU] Pre-commit clang and llvm tests for dynamic allocas" (#120369 ) Reverts llvm/llvm-project#120063 due to build-bot failures	2024-12-18 14:06:49 +07:00
Aaditya	99c2e3b782	[NFC][AMDGPU] Pre-commit clang and llvm tests for dynamic allocas (#120063 ) For #119822	2024-12-18 12:14:37 +05:30
Ruiling, Song	67c55b1ffc	[AMDGPU] Make max dwords of memory cluster configurable (#119342 ) We find it helpful to increase the value for graphics workload. Make it configurable so we can experiment with a different value.	2024-12-18 14:17:27 +08:00
Mirko Brkušanin	f7988a338d	[AMDGPU][SIPreEmitPeephole] Fix mustRetainExeczBranch (#120121 ) Do not remove S_CBRANCH_EXECZ if one of the following blocks contains an unconditional branch to a block other than the one immediately following it. This can cause unwanted behavior like infinite loops.	2024-12-17 11:47:38 +01:00
Matt Arsenault	3508d8f6dd	RegAllocFast: Avoid using temporary DiagnosticInfo (#120184 ) This reverts commit 1297933f35b4948b4d281259627a72094c407a75.	2024-12-17 16:19:26 +07:00
Matt Arsenault	8387cbd0f9	AMDGPU: Delete spills of undef values (#119684 ) AMDGPU: Delete spills of undef values It would be a bit more logical to preserve the undef and do the normal expansion, but this is less work. This avoids verifier errors in a future patch which starts deleting liveness from registers after allocation failures which results in spills of undef values. https://reviews.llvm.org/D122607 Move where undef sgpr spills are deleted	2024-12-17 13:08:38 +07:00
Thurston Dang	1297933f35	[CodeGen] Disable ran-out-of-registers-error* tests (#120142 ) Two tests are failing on the buildbot in stage2/asan with a stack use-after-scope: https://lab.llvm.org/buildbot/#/builders/52/builds/4533 (first failure here; contains https://github.com/llvm/llvm-project/pull/119492 and https://github.com/llvm/llvm-project/pull/119640) ... https://lab.llvm.org/buildbot/#/builders/52/builds/4550 This patch disables the tests for now, to allow the bots to return to green (instead of reverting the patch series).	2024-12-16 12:39:03 -08:00
Matt Arsenault	d866005f69	AMDGPU: Do not assert on unhandled types when demangling libcalls (#120068 )	2024-12-16 20:27:06 +07:00
Juan Manuel Martinez Caamaño	ace87ec04c	[AMDGPU][AMDGPURegBankInfo] Map S_BUFFER_LOAD_XXX to its corresponding BUFFER_LOAD_XXX (#117574 ) In one test, code generation diverged between GISEL and DAG. For example, this intrinsic > %ld = call i8 @llvm.amdgcn.s.buffer.load.u8(<4 x i32> %src, i32 %offset, i32 0) would be lowered into these two cases: * `buffer_load_u8 v2, v2, s[0:3], null offen` * `buffer_load_b32 v2, v2, s[0:3], null offen` This patch fixes this issue.	2024-12-16 10:24:33 +01:00
Matt Arsenault	1100d6a995	AMDGPU: Fix libcall recognition of image array types (#119832 ) Add tests with get_image_width as a sample for all of the non-extension image types. The transform doesn't do anything, but this runs through all the mangled libfunc parsing and shows it does not crash. It would probably be smarter to check for exact match of the types, rather than checking the prefix.	2024-12-16 15:04:53 +09:00
Matt Arsenault	b446c208a5	AMDGPU: Verify function type matches when matching libcalls (#119043 ) Previously this would recognize a call to a mangled ldexp(float, float) as a candidate to replace with the intrinsic. We need to verify the second parameter is in fact an integer. Fixes: SWDEV-501389	2024-12-16 15:01:48 +09:00
David Green	9ba7e2da00	[GlobalISel] Use replaceRegOrBuildCopy when legalizer-combining s/zext(undef) (#119850 ) Similar to #119721, this helps remove some of the COPYs created by the CSE builder.	2024-12-16 05:57:11 +00:00
Matt Arsenault	818bffcb1c	RegAlloc: Fix failure on undef use when all registers are reserved (#119647 ) Greedy and fast would hit different assertions on undef uses if all registers in a class were reserved.	2024-12-16 10:56:45 +09:00
Matt Arsenault	61f99a1c75	RegAlloc: Do not fatal error if there are no registers in the alloc order (#119640 ) Try to use DiagnosticInfo if every register in the class is reserved by forcing assignment to a reserved register. Also reduces the number of redundant errors emitted, particularly with fast. This is still broken in the case of undef uses. There are additional complications in greedy and fast, so leave it for a separate fix.	2024-12-16 10:52:49 +09:00
Matt Arsenault	bb18e49edb	RegAlloc: Use DiagnosticInfo to report register allocation failures (#119492 ) Improve the non-fatal cases to use DiagnosticInfo, which will now provide a location. The allocators attempt to report different errors if they happen to see that inline assembly is involved (this detection is quite unreliable), using srcloc instead of dbgloc. For now, leave this behavior unchanged. I think reporting the full location and context function would be more useful.	2024-12-16 10:49:08 +09:00
Fangrui Song	9afaf9c6c8	[AMDGPU,test] Change llc -march= to -mtriple= Follow-up to 806761a7629df268c8aed49657aeccffa6bca449	2024-12-15 10:54:21 -08:00
Kirill Stoimenov	e821f642fd	Revert "[AMDGPU][CodeGen] Do not backtrace invalid -regalloc param (#119687 )" Causes bot failure: https://lab.llvm.org/buildbot/#/builders/55/builds/4246/steps/11/logs/stdio This reverts commit 7a648554f886fbc043c4f3f58ca88f6c4535f2cf.	2024-12-14 03:47:53 +00:00
Akshat Oke	7a648554f8	[AMDGPU][CodeGen] Do not backtrace invalid -regalloc param (#119687 ) No need to generate a stack trace and a GitHub issue prompt on a wrongly set regalloc option.	2024-12-13 11:58:53 +05:30
Pedro Lobo	05137cc507	[AsmParser] Convert empty arrays to `poison` (#119754 ) Empty arrays can be converted to `poison` instead of `undef`.	2024-12-12 22:44:10 +01:00
choikwa	463e93b95f	Reapply [AMDGPU] prevent shrinking udiv/urem if either operand exceeds signed max (#119325 ) This reverts commit 254d206ee2a337cb38ba347c896f7c6a14c7f218. Added a fix in ExpandDivRem24 to disqualify if DivNumBits exceeds 24. Original commit & msg: ce6e955ac374f2b86cbbb73b2f32174dffd85f25. Handle the signed and unsigned paths differently in getDivNumBits. Using computeKnownBits, this rejects shrinking unsigned div/rem if the operands exceed signed max, since we know NumSignBits will always be 0.	2024-12-12 15:24:34 -05:00
Craig Topper	7ece560a50	[GISel] Support narrowing G_ICMP with more than 2 parts. (#119335 ) This allows us to support i128 G_ICMP on RV32. I'm not sure how to test the "left over" part of this as RISC-V always widens to a power of 2 before narrowing.	2024-12-12 09:50:26 -08:00
Pravin Jagtap	bdaa82a7bb	[AMDGPU] Mark AGPR tuple implicit in the first instr of AGPR spills. (#115285 ) When AGPRs are spilled to stack through VGPRs, the pei only marks the AGPR tuple as implicit-def. To preserve the liveness, it should also mark the tuple implicit. Fixes: SWDEV-462189	2024-12-12 19:47:17 +05:30
Matt Arsenault	ea632e1b34	Reapply "DiagnosticInfo: Clean up usage of DiagnosticInfoInlineAsm" (#119575 ) (#119634 ) This reverts commit 40986feda8b1437ed475b144d5b9a208b008782a. Reapply with fix to prevent temporary Twine from going out of scope.	2024-12-11 16:01:48 -08:00
Shilei Tian	f4037277bb	[AMDGPU][Attributor] Make `AAAMDWavesPerEU` honor existing attribute (#114438 )	2024-12-11 16:50:06 -05:00
Shilei Tian	7dbd6cd294	[AMDGPU][Attributor] Make `AAAMDFlatWorkGroupSize` honor existing attribute (#114357 ) If a function has `amdgpu-flat-work-group-size`, honor it in `initialize` by taking its value directly; otherwise, use the default range as a starting point. We will no longer manipulate the known range, which can cause issues because the known range is a "throttle" on the assumed range, such that the assumed range can't be widened properly in `updateImpl` if the known range is not set properly for whatever reason. Another benefit of not touching the known range is that, if we indicate a pessimistic state, it also invalidates the AA, so `manifest` will not be called. Since we honor the attribute, we do not want to, and will not, add any half-baked attribute to a function.	2024-12-11 16:47:51 -05:00
Vitaly Buka	40986feda8	Revert "DiagnosticInfo: Clean up usage of DiagnosticInfoInlineAsm" (#119575 ) Reverts llvm/llvm-project#119485 Breaks builders, details in llvm/llvm-project#119485	2024-12-11 07:51:36 -08:00
Pravin Jagtap	5e007afa9d	[AMDGPU] Handle hazard in v_scalef32_sr_fp4_* conversions (#118589 ) Presently, the compiler selectively adds a nop when opsel != 0, i.e. only when partially writing to the high bytes. Experiments in SWDEV-499733 and SWDEV-501347 suggest that we need a nop for the above cases irrespective of the opsel values. Note: We might need to add a few others to the same table.	2024-12-11 18:38:10 +05:30
Matt Arsenault	884f2ad6f9	DiagnosticInfo: Clean up usage of DiagnosticInfoInlineAsm (#119485 ) Currently LLVMContext::emitError emits any error as an "inline asm" error which does not make any sense. InlineAsm appears to be special, in that it uses a "LocCookie" from srcloc metadata, which looks like a parallel mechanism to ordinary source line locations. This meant that other types of failures had degraded source information reported when available. Introduce some new generic error types, and only use inline asm in the appropriate contexts. The DiagnosticInfo types are still a bit of a mess, and I'm not sure why DiagnosticInfoWithLocationBase exists instead of just having an optional DiagnosticLocation in the base class. DK_Generic is for any error that derives from an IR level instruction, and thus can pull debug locations directly from it. DK_GenericWithLoc is functionally the generic codegen error, since it does not depend on the IR and instead can construct a DiagnosticLocation from the MI debug location.	2024-12-11 17:16:07 +09:00
Shilei Tian	15f87bc10c	[NFC][AMDGPU] Auto generate check lines for `llvm/test/CodeGen/AMDGPU/attributor-flatscratchinit.ll`	2024-12-10 12:40:28 -05:00
Dan Gohman	e665e781dc	[SelectionDAG] Use the nuw flag when expanding loads. (#119288 ) When expanding a load into two loads, use nuw for the add that computes the offset from the base of the second load, because the original load doesn't straddle the address space. It turns out there's already a dedicated helper function for doing this, `getObjectPtrOffset`. This is in target-independent code; however, in practice it only seems to affect WebAssembly code, because WebAssembly load and store instructions' constant offsets don't perform wrapping, so constant folding often depends on the nuw flag being present. This was noticed in the development of #119204.	2024-12-10 06:28:09 -08:00
Piotr Sobczak	a2d086af2c	[AMDGPU] Fix FMA combine (#119217 ) Update the check in the FMA combine to check dot10-insts instead of dot7-insts. The target of the combine, v_dot2_f32_f16, is available only if dot10-insts target feature is enabled. The issue probably dates back to the change that split out dot10-insts out of dot7-insts. As far as I can see, this does not affect any current targets, but if a future target has dot7-insts, but not dot10-insts that would cause a crash ("cannot select") for the input ir in the test.	2024-12-10 10:11:19 +01:00
Jun Wang	41ed16c3b3	Reapply "[AMDGPU] Infer amdgpu-no-flat-scratch-init attribute in AMDGPUAttributor (#94647 )" (#118907 ) This reverts commit 1ef9410a96c1d9669a6feaf03fcab8d0a4a13bd5. This fixes the test file attributor-flatscratchinit-globalisel.ll.	2024-12-09 16:44:48 -08:00
Brox Chen	85142f5b35	[AMDGPU][True16][CodeGen] support for true16 for vinterp 16bit instructions (#116702 ) CodeGen support for vinterp 16bit instructions in True16 format. Currently only two tests are enabled; more will be enabled when more true16 instructions are supported.	2024-12-09 11:52:05 -05:00
Matt Arsenault	009368f130	AMDGPU: Mark grid size loads with range metadata (#113019 ) Only handles the v5 case.	2024-12-09 11:01:55 -05:00
Matt Arsenault	664a226bf6	AMDGPU: Propagate amdgpu-max-num-workgroups attribute (#113018 ) I'm not sure what the interpretation of 0 is supposed to be, AMDGPUUsage doesn't say.	2024-12-09 09:57:27 -06:00
Joseph Huber	254d206ee2	Revert "Reapply "[AMDGPU] prevent shrinking udiv/urem if either operand is in… (#118928 )" This reverts commit 509893b58ff444a6f080946bd368e9bde7668f13. This broke the libc build again https://lab.llvm.org/buildbot/#/builders/73/builds/9787.	2024-12-09 08:10:49 -06:00
David Green	9a415f6d6b	[GlobalISel] Fold ptrtoint(undef) and inttoptr(undef) to undef. (#119073 ) This helps with shuffles a little, and one of the amd gpu tests is now equivalent to the SDAG version.	2024-12-09 08:52:22 +00:00
Vikash Gupta	0b0d9a3bee	[CodeGen] [AMDGPU] Attempt DAGCombine for fmul with select to ldexp (#111109 ) Materializing a 32-bit non-inline constant for an fmul is relatively expensive, so where possible it is better to combine the fmul into an ldexp instruction in specific scenarios (for datatypes like f64, f32 and f16). This patch handles the DAG combine for any pair of select values that are exact powers of 2. ``` fmul x, select(y, A, B) -> ldexp (x, select i32 (y, a, b)) fmul x, select(y, -A, -B) -> ldexp ((fneg x), select i32 (y, a, b)) where, A=2^a & B=2^b ; a and b are integers. ``` This DAG combine is handled separately in fmulCombine (newly defined in SIISelLowering), which targets an fmul with a select-type operand and fuses it into ldexp. Thus, it fixes #104900.	2024-12-09 12:52:04 +05:30
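
The identity behind the fmul-with-select combine in 0b0d9a3bee can be checked numerically. The following is a minimal Python sketch, not the LLVM implementation: the function names `exact_log2`, `fmul_select` and `ldexp_select` are hypothetical and exist only for this illustration, and Python doubles stand in for the f64 case.

```python
import math

def exact_log2(v):
    """Return e if v == 2**e for an integer e, else None (hypothetical helper)."""
    if not (v > 0.0) or math.isinf(v):
        return None
    mantissa, exponent = math.frexp(v)  # v == mantissa * 2**exponent, 0.5 <= mantissa < 1
    return exponent - 1 if mantissa == 0.5 else None

def fmul_select(x, cond, A, B):
    """Original form: fmul x, select(cond, A, B)."""
    return x * (A if cond else B)

def ldexp_select(x, cond, A, B):
    """Combined form: ldexp(x, select(cond, a, b)), with A = 2**a and B = 2**b.

    Falls back to the plain multiply when either constant is not an exact
    power of two, or when the two constants differ in sign.
    """
    negate = A < 0 and B < 0
    a = exact_log2(-A if negate else A)
    b = exact_log2(-B if negate else B)
    if a is None or b is None:
        return fmul_select(x, cond, A, B)
    # The negative pair maps to ldexp(fneg x, ...), as in the commit message.
    return math.ldexp(-x if negate else x, a if cond else b)
```

Scaling by a power of two only adjusts the exponent, so for finite inputs away from overflow and underflow both forms give bit-identical results, which is what makes the rewrite safe.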


8103 Commits
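
The two address computations described in c7606710f9 (dynamic alloca on an upward-growing stack) can be compared with a small Python sketch. It only mirrors the arithmetic quoted in the commit message; the function names are hypothetical, `alignment` is assumed to be a power of two, and `size` is assumed to already be the full byte size.

```python
def align_down(addr, alignment):
    # Addr & ~(Alignment - 1), as in the old scheme
    return addr & ~(alignment - 1)

def align_up(addr, alignment):
    # (Addr + (Alignment - 1)) & ~(Alignment - 1), as in the new scheme
    return (addr + alignment - 1) & ~(alignment - 1)

def alloca_old(curr_sp, size, alignment):
    """Old order: bump SP first, align, and use the new SP as the base."""
    new_sp = align_down(curr_sp + size, alignment)
    base_addr = new_sp
    return base_addr, new_sp

def alloca_growing_up(curr_sp, size, alignment):
    """New order: align the base first, then bump SP past the object."""
    base_addr = align_up(curr_sp, alignment)
    new_sp = base_addr + size
    return base_addr, new_sp
```

With `curr_sp = 100`, `size = 64` and `alignment = 32`, the new order yields a base of 128 and a new SP of 192, so the object lies entirely between the old and new SP. The old order yields 160 for both, which leaves the object's bytes above the new SP on a stack that grows upwards.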