llvm-project

Author	SHA1	Message	Date
Matt Arsenault	0db4393762	AMDGPU: Add baseline tests for f64 rsq pattern handling (#172052)	2025-12-19 10:12:25 +01:00
vangthao95	031e9c989e	[AMDGPU][GlobalISel] Add RegBankLegalize support for G_FPTRUNC (#171723)	2025-12-18 13:16:46 -08:00
vangthao95	55089733b6	[AMDGPU][GlobalISel] Add readanylane combines for merge-like instructions (#172546) When a merge-like instruction has all readanylane sources and the result is copied to VGPRs, eliminate the readanylanes by either using the original unmerge source directly or building a new merge with the VGPR sources.	2025-12-18 08:04:06 -08:00
macurtis-amd	e741cd88a1	AMDGPU/PromoteAlloca: Fix handling of users of multiple allocas (#172771) With recent refactoring, LDS promotion worklists for all allocas are populated upfront. In some cases, this results in a User appearing in multiple lists. Then, as each list is processed, a User might get deleted via removeFromParent, potentially leaving a dangling pointer in a subsequent worklist. Currently this only occurs for memcpy and memmove. Prior to refactoring, these were handled by DeferredInstr, and were processed after the last use of the then-singular worklist. This change moves processing of DeferredInstr to after all worklists have been processed.	2025-12-18 08:41:21 -06:00
Frederik Harwath	5c05824d2b	[CodeGen] Rename expand-fp to expand-ir-insts (#172681) The pass now contains a non-fp expansion and should be used for any similar expansions regardless of the types involved. Hence a generic name seems apt. Rename the source files, pass, and adjust the pass description. Move all tests for the expansions that have previously been merged into the pass to a single directory.	2025-12-18 11:15:04 +00:00
Matt Arsenault	d6f159dd05	AMDGPU: Add pattern for copysign of 0 (#172699) Avoiding v_bfi_b32 is desirable since on gfx9 it requires materializing the constant. Similar could be done for infinity, with or 0x7fffffff	2025-12-18 11:34:24 +01:00
Frederik Harwath	71760f324f	[CodeGen] Merge ExpandLargeDivRem into ExpandFp (#172680) Both passes expand instructions at the IR level. They use the same kind of instruction visitation logic and contain significant code duplication, e.g. for scalarization.	2025-12-18 09:22:47 +01:00
Matt Arsenault	399b33086f	AMDGPU: Add baseline tests for fcopysign with 0 magnitude (#172698)	2025-12-17 20:22:52 +01:00
Pankaj Dwivedi	28d4e33b65	[AMDGPU][SIInsertWaitCnt] Optimize loadcnt insertion at function boundaries (#169647) On GFX12+, GLOBAL_INV increments the loadcnt counter but does not write results to any VGPRs. Previously, we unconditionally inserted s_wait_loadcnt 0 at function returns even when the only pending loadcnt was from GLOBAL_INV instructions. This patch optimizes waitcnt insertion by skipping the loadcnt wait at function boundaries when no VGPRs have pending loads. This is determined by checking if any VGPR has a score greater than the lower bound for LOAD_CNT - if not, the pending loadcnt must be from non-VGPR-writing instructions like GLOBAL_INV. The optimization is limited to GFX12+ targets where GLOBAL_INV exists and uses the extended wait count instructions. This is a follow-up optimization to PR #135340 which added tracking for GLOBAL_INV in the waitcnt pass.	2025-12-17 17:53:00 +05:30
Matt Arsenault	68aea8e202	AMDGPU: Avoid introducing unnecessary fabs in fast fdiv lowering (#172553) If the sign bit of the denominator is known 0, do not emit the fabs. Also, extend this to handle min/max with fabs inputs. I originally tried to do this as the general combine on fabs, but it proved to be too much trouble at this time. This is mostly complexity introduced by expanding the various min/maxes into canonicalizes, and then not being able to assume the sign bit of canonicalize (fabs x) without nnan. This defends against future code size regressions in the atan2 and atan2pi library functions.	2025-12-17 00:22:12 +01:00
Matt Arsenault	b971b510d6	AMDGPU: Add baseline test for redundant fabs on fdiv expansion (#172552)	2025-12-16 23:26:55 +01:00
Matt Arsenault	eb1876c960	DAG: Fix arith_fence handling in SignBitIsZeroFP (#172537)	2025-12-16 20:10:38 +00:00
Frederik Harwath	51cdebf339	[AMDGPU] SIOptimizeExecMaskingPreRA: Fix crash on exec copy fold into INLINEASM (#172481) The optimization crashed attempting to fix a fold of a COPY $exec instruction into a use in an INLINEASM instruction because it attempts to call isOperandLegal which crashes since the index is out of the MCInstrDesc's operands array bounds. Change SIOptimizeExecMaskingPreRA to skip the optimization if the operand index is out of bounds. --------- Co-authored-by: Matt Arsenault <arsenm2@gmail.com>	2025-12-16 17:55:05 +01:00
vangthao95	a3b3c027bb	[AMDGPU][NFC] Pre-commit tests for readanylane combines (#172398)	2025-12-16 08:25:29 -08:00
Dark Steve	7d381f2a56	[AMDGPU] Schedule independent instructions between s_barrier_signal and s_barrier_wait (#172057) On gfx12+, the unified `s_barrier` is lowered to split `s_barrier_signal`/`s_barrier_wait` pairs. By default, the dependency edge between signal and wait has zero latency, causing the scheduler to emit them adjacent to each other. This misses the opportunity to hide barrier latency. This patch adds synthetic latency to the signal-wait barrier edge to encourage latency hiding. Independent instructions are scheduled in the gap between split barrier signal and wait. The latency is tunable via -amdgpu-barrier-signal-wait-latency. Fixes: SWDEV-567090	2025-12-16 11:48:50 +05:30
vangthao95	e9b1f56d35	[AMDGPU][GlobalISel] Add RegBankLegalize support for G_BITREVERSE (#172101)	2025-12-15 17:45:33 -08:00
michaelselehov	3645cef1ef	[AMDGPU] LiveRegOptimizer: consider i8/i16 binops on SDWA (#155800) PHI-node part was merged with PR#160909. Extend `isOpLegal` to treat 8/16-bit vector add/sub/and/or/xor as profitable on SDWA targets (stores and intrinsics remain profitable). This repacks loop-carried values to i32 across BBs and restores SDWA lowering instead of scattered lshr/lshl/or sequences. Testing: - Local: `check-llvm-codegen-amdgpu` is green (4314/4320 passed, 6 XFAIL). - Additional: validated in AMD internal CI	2025-12-15 12:04:33 -05:00
Petar Avramovic	f024026a21	AMDGPU/GlobalISel: Regbanklegalize for G_CONCAT_VECTORS (#171471) RegBankLegalize uses the trivial mapping helper, which assigns the same reg bank (vgpr or sgpr) to all operands. This uncovers multiple codegen and regbank combiner regressions related to looking through sgpr to vgpr copies. Skip regbankselect-concat-vector.mir since agprs are not yet supported.	2025-12-15 10:37:40 +01:00
Juan Manuel Martinez Caamaño	c13bf9eb26	Reapply "[AMDGPU][SDAG] Add missing cases for SI_INDIRECT_SRC/DST (#170323)" (#171838) A buildbot failed for the original patch. https://github.com/llvm/llvm-project/pull/171835 addresses the issue raised by the buildbot. After the fix is merged, the original patch is reapplied without any change.	2025-12-15 09:05:00 +01:00
Craig Topper	0cdc1b6dd4	[SelectionDAG] Support integer types with multiple registers in ComputePHILiveOutRegInfo. (#172081) PHIs that are larger than a legal integer type are split into multiple virtual registers that are numbered sequentially. We can propagate the known bits for each of these registers individually. Big endian is not supported yet because the register order needs to be reversed. Fixes #171671	2025-12-13 13:24:41 -08:00
Jeffrey Byrnes	e45241a4fe	[AMDGPU] Hoist s_set_vgpr_msb past SALU program state instructions (#172108) Hoisting past the program state instructions is legal and allows for better coissue.	2025-12-12 18:04:20 -08:00
Syadus Sefat	f3c16454b4	[Reland][AMDGPU][GlobalISel] Add register bank legalization for buffer_load byte and short (#172065) This patch adds register bank legalization support for buffer load byte and short operations in the AMDGPU GlobalISel pipeline. This is a re-land of #167798. I have fixed the failing test /CodeGen/AMDGPU/GlobalISel/buffer-load-byte-short.ll	2025-12-12 14:47:08 -06:00
Aiden Grossman	b8816a4e83	Revert "[AMDGPU][GlobalISel] Add register bank legalization for buffer_load byte and short (#167798)" This reverts commit 4dbd16bb62ca18b0c588e2f387ac5cc94a782efb. This was causing buildbot failures, including on premerge when running check-llvm. https://lab.llvm.org/buildbot/#/builders/185/builds/30323	2025-12-12 17:35:25 +00:00
Matt Arsenault	2af693bbec	AMDGPU: Fix selection failure on bf16 inverse sqrt (#172044) On !hasBF16TransInsts targets, an illegal rsq would form and fail to select.	2025-12-12 18:10:08 +01:00
Syadus Sefat	4dbd16bb62	[AMDGPU][GlobalISel] Add register bank legalization for buffer_load byte and short (#167798) This patch adds register bank legalization support for buffer load byte and short operations in the AMDGPU GlobalISel pipeline.	2025-12-12 10:35:15 -06:00
Juan Manuel Martinez Caamaño	55c0e2e20f	[AMDGPU] Add missing cases for V_INDIRECT_REG_{READ/WRITE}_GPR_IDX and V/S_INDIRECT_REG_WRITE_MOVREL (#171835) A buildbot failure in https://github.com/llvm/llvm-project/pull/170323 when expensive checks were used highlighted that some of these patterns were missing. This patch adds `V_INDIRECT_REG_{READ/WRITE}_GPR_IDX` and `V/S_INDIRECT_REG_WRITE_MOVREL` for `V6` and `V7` vector sizes.	2025-12-12 15:45:34 +00:00
Pierre van Houtryve	025d0c0d1d	(reland) [AMDGPU][SIInsertWaitCnts] Use RegUnits-based tracking (#162077) (#171779) Fixed a crash in Blender due to some weird control flow. The issue was with the "merge" function which was only looking at the keys of the "Other" VMem/SGPR maps. It needs to look at the keys of both maps and merge them. Original commit message below ---- The pass was already "reinventing" the concept just to deal with 16 bit registers. Clean up the entire tracking logic to only use register units. There are no test changes because functionality didn't change, except: - We can now track more LDS DMA IDs if we need it (up to `1 << 16`) - The debug prints also changed a bit because we now talk in terms of register units. This also changes the tracking to use a DenseMap instead of a massive fixed size table. This trades a bit of access speed for a smaller memory footprint. Allocating and memsetting a huge table to zero caused a non-negligible performance impact (I've observed up to 50% of the time in the pass spent in the `memcpy` built-in on a big test file). I also think we don't access these often enough to really justify using a vector. We do a few accesses per instruction, but not much more. In a huge 120MB LL file, I can barely see the trace of the DenseMap accesses.	2025-12-12 09:41:04 +01:00
Nicolai Hähnle	e760d0619f	AMDGPU/PromoteAlloca: Refactor into analysis / commit phases (#170512) This change is motivated by the overall goal of finding alternative ways to promote allocas to VGPRs. The current solution is effectively limited to allocas whose size matches a register class, and we can't keep adding more register classes. We have some downstream work in this direction, and I'm currently looking at cleaning that up to bring it upstream. This refactor paves the way to adding a third way of promoting allocas, on top of the existing alloca-to-vector and alloca-to-LDS. Much of the analysis can be shared between the different promotion techniques. Additionally, the idea behind splitting the pass into an analysis phase and a commit phase is that it ought to allow us to more easily make better "big picture" decisions about which allocas to promote, and how, in the future.	2025-12-12 01:24:38 +00:00
vangthao95	854ef8df06	[AMDGPU][GlobalISel] Add RegBankLegalize support for G_FSUB (#171244)	2025-12-11 11:55:28 -08:00
Brox Chen	16c0893f04	[AMDGPU][True16] remove pack32 pattern from true16 mode (#171756) Remove pack32 so that isel uses reg_sequence in true16 mode for build_vector. This generates better code.	2025-12-11 09:49:27 -05:00
Stephen Thomas	7c328d8a0a	[AMDGPU][GCNHazardRecognizer] Remove instances of hardcoded S_WAITCNT_DEPCTR operand values (#171811) Two S_WAITCNT_DEPCTR instructions are constructed with hardcoded operand values. Replace these with appropriate calls to AMDGPU::DepCtr::encodeFieldVmVsrc(). NFC, except that the original code was setting reserved operand bits that should-be-zero, and this is now corrected.	2025-12-11 13:26:54 +00:00
Juan Manuel Martinez Caamaño	c02978867e	Revert "[AMDGPU][SDAG] Add missing cases for SI_INDIRECT_SRC/DST (#170323)" (#171787) ``` Step 7 (test-check-all) failure: Test just built components: check-all completed (failure) ****************** TEST 'LLVM :: CodeGen/AMDGPU/insert_vector_dynelt.ll' FAILED ****************** Exit Code: 1 Command Output (stdout): -- # RUN: at line 2 /home/buildbot/worker/as-builder-4/ramdisk/expensive-checks/build/bin/llc -mtriple=amdgcn -mcpu=fiji < /home/buildbot/worker/as-builder-4/ramdisk/expensive-checks/llvm-project/llvm/test/CodeGen/AMDGPU/insert_vector_dynelt.ll | /home/buildbot/worker/as-builder-4/ramdisk/expensive-checks/build/bin/FileCheck -enable-var-scope -check-prefixes=GCN /home/buildbot/worker/as-builder-4/ramdisk/expensive-checks/llvm-project/llvm/test/CodeGen/AMDGPU/insert_vector_dynelt.ll # executed command: /home/buildbot/worker/as-builder-4/ramdisk/expensive-checks/build/bin/llc -mtriple=amdgcn -mcpu=fiji # executed command: /home/buildbot/worker/as-builder-4/ramdisk/expensive-checks/build/bin/FileCheck -enable-var-scope -check-prefixes=GCN /home/buildbot/worker/as-builder-4/ramdisk/expensive-checks/llvm-project/llvm/test/CodeGen/AMDGPU/insert_vector_dynelt.ll # RUN: at line 3 /home/buildbot/worker/as-builder-4/ramdisk/expensive-checks/build/bin/llc -O0 -mtriple=amdgcn -mcpu=fiji < /home/buildbot/worker/as-builder-4/ramdisk/expensive-checks/llvm-project/llvm/test/CodeGen/AMDGPU/insert_vector_dynelt.ll | /home/buildbot/worker/as-builder-4/ramdisk/expensive-checks/build/bin/FileCheck --check-prefixes=GCN-O0 /home/buildbot/worker/as-builder-4/ramdisk/expensive-checks/llvm-project/llvm/test/CodeGen/AMDGPU/insert_vector_dynelt.ll # executed command: /home/buildbot/worker/as-builder-4/ramdisk/expensive-checks/build/bin/llc -O0 -mtriple=amdgcn -mcpu=fiji # .---command stderr------------ # | # | # After Instruction Selection # | # Machine code for function insert_dyn_i32_6: IsSSA, TracksLiveness # | Function Live Ins: $sgpr16 in %8, $sgpr17 in %9, $sgpr18 in %10, $sgpr19 in %11, $sgpr20 in %12, $sgpr21 in %13, $vgpr0 in %14, $vgpr1 in %15 # | # | bb.0 (%ir-block.0): # | successors: %bb.1(0x80000000); %bb.1(100.00%) # | liveins: $sgpr16, $sgpr17, $sgpr18, $sgpr19, $sgpr20, $sgpr21, $vgpr0, $vgpr1 # | %15:vgpr_32 = COPY $vgpr1 # | %14:vgpr_32 = COPY $vgpr0 # | %13:sgpr_32 = COPY $sgpr21 # | %12:sgpr_32 = COPY $sgpr20 # | %11:sgpr_32 = COPY $sgpr19 # | %10:sgpr_32 = COPY $sgpr18 # | %9:sgpr_32 = COPY $sgpr17 # | %8:sgpr_32 = COPY $sgpr16 # | %17:sgpr_192 = REG_SEQUENCE %8:sgpr_32, %subreg.sub0, %9:sgpr_32, %subreg.sub1, %10:sgpr_32, %subreg.sub2, %11:sgpr_32, %subreg.sub3, %12:sgpr_32, %subreg.sub4, %13:sgpr_32, %subreg.sub5 # | %16:sgpr_192 = COPY %17:sgpr_192 # | %19:vreg_192 = COPY %17:sgpr_192 # | %28:sreg_64_xexec = IMPLICIT_DEF # | %27:sreg_64_xexec = S_MOV_B64 $exec # | # | bb.1: # | ; predecessors: %bb.1, %bb.0 # | successors: %bb.1(0x40000000), %bb.3(0x40000000); %bb.1(50.00%), %bb.3(50.00%) # | # | %26:vreg_192 = PHI %19:vreg_192, %bb.0, %18:vreg_192, %bb.1 # | %29:sreg_64 = PHI %28:sreg_64_xexec, %bb.0, %30:sreg_64, %bb.1 # | %31:sreg_32_xm0 = V_READFIRSTLANE_B32 %14:vgpr_32, implicit $exec # | %32:sreg_64 = V_CMP_EQ_U32_e64 %31:sreg_32_xm0, %14:vgpr_32, implicit $exec # | %30:sreg_64 = S_AND_SAVEEXEC_B64 killed %32:sreg_64, implicit-def $exec, implicit-def $scc, implicit $exec # | $m0 = COPY killed %31:sreg_32_xm0 # | %18:vreg_192 = V_INDIRECT_REG_WRITE_MOVREL_B32_V8 %26:vreg_192(tied-def 0), %15:vgpr_32, 3, implicit $m0, implicit $exec # | $exec = S_XOR_B64_term $exec, %30:sreg_64, implicit-def $scc # | S_CBRANCH_EXECNZ %bb.1, implicit $exec # | # | bb.3: ``` This reverts commit 15df9e701f1f1194a25e6123612cc735ad392ae4.	2025-12-11 10:08:20 +00:00
Juan Manuel Martinez Caamaño	15df9e701f	[AMDGPU][SDAG] Add missing cases for SI_INDIRECT_SRC/DST (#170323) Before this patch, `insertelement/extractelement` with dynamic indices would fail to select with `-O0` for vector 32-bit element types with sizes 3, 5, 6 and 7, which did not map to a `SI_INDIRECT_SRC/DST` pattern. Other "weird" sizes bigger than 8 (like 13) are properly handled already. To solve this issue we add the missing patterns for the problematic sizes. Solves SWDEV-568862	2025-12-11 09:17:43 +01:00
Jay Foad	6ae0b9f586	[AMDGPU] Implement codegen for GFX11+ V_CVT_PK_[IU]16_F32 (#168719)	2025-12-10 22:26:59 +00:00
vangthao95	d162afa912	[AMDGPU][GlobalISel] Add RegBankLegalize support for G_FPEXT (#171483)	2025-12-10 08:58:27 -08:00
Nikita Popov	5a24dfa339	[SDAG] Remove most non-canonical libcall handling (#171288) This is a followup to https://github.com/llvm/llvm-project/pull/171114, removing the handling for most libcalls that are already canonicalized to intrinsics in the middle-end. The only remaining one is fabs, which has more test coverage than the others.	2025-12-10 11:45:26 +01:00
Diana Picus	578a26ada2	[AMDGPU] Relax restrictions on amdgcn.cs.chain intrinsic (#169785) We have a new use-case for chain functions, so slightly relax the restriction on which calling conventions may contain calls to chain functions.	2025-12-10 11:12:46 +01:00
Mirko Brkušanin	5759a3a779	[AMDGPU] Add s_wakeup_barrier instruction for gfx1250 (#170501)	2025-12-10 09:45:13 +01:00
Vikram Hegde	aebab0578b	[NPM] Schedule PhysicalRegisterUsageAnalysis before RegUsageInfoCollectorPass (#168832) RegUsageInfoCollectorPass requires PhysicalRegisterUsageAnalysis to be valid. This change is required since it's a module analysis.	2025-12-10 11:15:37 +05:30
anjenner	27651133e2	AMDGPU: Drop and upgrade llvm.amdgcn.atomic.csub/cond.sub to atomicrmw (#105553) These both perform conditional subtraction, returning the minuend and zero respectively, if the difference is negative.	2025-12-09 23:13:33 +00:00
Anshil Gandhi	5052b6ce1d	[AMDGPU] Scavenge a VGPR to eliminate a frame index (#166979) If the subtarget supports flat scratch SVS mode and there is no SGPR available to replace a frame index, convert a scratch instruction in SS form into SV form and replace the frame index with a scavenged VGPR. Resolves #155902 Co-authored-by: Matt Arsenault <matthew.arsenault@amd.com>	2025-12-09 13:59:36 -05:00
pvanhout	4572f4f5b1	Revert "[AMDGPU][SIInsertWaitCnts] Use RegUnits-based tracking (#162077)" Fails on https://lab.llvm.org/buildbot/#/builders/123/builds/31922 This reverts commit bf9344099c63549b2f19f8ede29f883669b0baca.	2025-12-09 14:48:19 +01:00
Pierre van Houtryve	bf9344099c	[AMDGPU][SIInsertWaitCnts] Use RegUnits-based tracking (#162077) The pass was already "reinventing" the concept just to deal with 16 bit registers. Clean up the entire tracking logic to only use register units. There are no test changes because functionality didn't change, except: - We can now track more LDS DMA IDs if we need it (up to `1 << 16`) - The debug prints also changed a bit because we now talk in terms of register units. This also changes the tracking to use a DenseMap instead of a massive fixed size table. This trades a bit of access speed for a smaller memory footprint. Allocating and memsetting a huge table to zero caused a non-negligible performance impact (I've observed up to 50% of the time in the pass spent in the `memcpy` built-in on a big test file). I also think we don't access these often enough to really justify using a vector. We do a few accesses per instruction, but not much more. In a huge 120MB LL file, I can barely see the trace of the DenseMap accesses.	2025-12-09 13:51:19 +01:00
Guy David	29611f4cbe	[DAGCombiner] Relax nsz constraint for FP optimizations (#165011) Some floating-point optimizations don't trigger because they can produce incorrect results around signed zeros, and rely on the existence of the nsz flag, which commonly appears when fast-math is enabled. However, this flag is not a hard requirement when all of the users of the combined value are either guaranteed to overwrite the sign-bit or simply ignore it (comparisons, etc.). The optimizations affected: - fadd x, +0.0 -> x - fsub x, -0.0 -> x - fsub +0.0, x -> fneg x - fdiv(x, sqrt(x)) -> sqrt(x) - frem lowering with power-of-2 divisors	2025-12-09 12:07:46 +02:00
Vikram Hegde	c590b35f0f	[AMDGPU][NPM] Enable SIModeRegister and SIInsertHardclauses passes (#168831) Passes already ported.	2025-12-09 14:01:15 +05:30
Matt Arsenault	786498b281	AMDGPU: Fix truncstore from v6f32 to v6f16 (#171212) The v6bf16 cases work, but that's likely because v6bf16 isn't currently an MVT. Fixes: SWDEV-570985	2025-12-08 22:46:36 +00:00
Fei Peng	f803e463f9	Reland "Redesign Straight-Line Strength Reduction (SLSR) (#162930)" (#169614) This PR implements parts of https://github.com/llvm/llvm-project/issues/162376 - Broader equivalence than constant index deltas: - Add Base-delta and Stride-delta matching for Add and GEP forms using ScalarEvolution deltas. - Reuse enabled for both constant and variable deltas when an available IR value dominates the user. - Dominance-aware dictionary instead of linear scans: - Tuple-keyed candidate dictionary grouped by basic block. - Walk the immediate-dominator chain to find the nearest dominating basis quickly and deterministically. - Simple cost model and best-rewrite selection: - Score candidate expressions and rewrites; select the highest-profit rewrite per instruction. - Skip rewriting when expressions are already foldable or high-efficiency. - Path compression for better ILP: - Compress chains of rewrites to a deeper dominating basis when a constant delta exists along the path, reducing dependent bumps on critical paths. - Dependency-aware rewrite ordering: - Build a dependency graph (basis, stride, variable delta producers) and rewrite in topological order. - This dependency graph will be needed by the next PR that adds partial strength reduction. - Correctness enhancement - Fix a correctness issue where reusing instructions with the same SCEV may introduce poison. --------- Co-authored-by: Kazu Hirata <kazu@google.com>	2025-12-08 16:07:27 -06:00
Shilei Tian	3ccd67295b	[AMDGPU] Fix a crash when a bool variable is used in inline asm (#171004) Fixes SWDEV-570184.	2025-12-08 14:44:21 -05:00
Dark Steve	cc19f420b9	[AMDGPU][NPM] Port AMDGPUArgumentUsageInfo to NPM (#170886) Port AMDGPUArgumentUsageInfo analysis to the NPM to fix suboptimal code generation when NPM is enabled by default. Previously, DAG.getPass() returned nullptr when using NPM, causing the argument usage info to be unavailable during ISel. This resulted in a fallback to FixedABIFunctionInfo, which assumes all implicit arguments are needed, generating unnecessary register setup code for entry functions. Fixes LLVM::CodeGen/AMDGPU/cc-entry.ll Changes: - Split AMDGPUArgumentUsageInfo into a data class and NPM analysis wrapper - Update SIISelLowering to use DAG.getMFAM() for NPM path - Add RequireAnalysisPass in addPreISel() to ensure analysis availability This follows the same pattern used for PhysicalRegisterUsageInfo.	2025-12-08 20:38:00 +05:30
Jay Foad	07bafab83d	[AMDGPU] Do not generate V_FMAC_DX9_ZERO_F32 on GFX12 (#171116) GFX12 does not have the FMAC form of this instruction, only the FMA form. Fixes: #170437	2025-12-08 13:20:02 +00:00

9730 Commits