llvm-project

Author	SHA1	Message	Date
Fabian Ritter	125af56867	[AMDGPU][SDAG] Only fold flat offsets if they are inbounds PTRADDs (#165427 ) For flat memory instructions where the address is supplied as a base address register with an immediate offset, the memory aperture test ignores the immediate offset. Currently, SDISel does not respect that, which leads to miscompilations where valid input programs crash when the address computation relies on the immediate offset to get the base address in the proper memory aperture. Global or scratch instructions are not affected. This patch only selects flat instructions with immediate offsets from PTRADD address computations with the inbounds flag: If the PTRADD does not leave the bounds of the allocated object, it cannot leave the bounds of the memory aperture and is therefore safe to handle with an immediate offset. Affected tests: - CodeGen/AMDGPU/fold-gep-offset.ll: Offsets are no longer wrongly folded, added new positive tests where we still do fold them. - CodeGen/AMDGPU/infer-addrspace-flat-atomic.ll: Offset folding doesn't seem integral to this test, so the test is not changed to make offset folding still happen. - CodeGen/AMDGPU/loop-prefetch-data.ll: loop-reduce transforms inbounds addresses for accesses to be based on potentially OOB addresses used for prefetching. - I think the remaining ones suffer from the limited preservation of the inbounds flag in PTRADD DAGCombines due to the provenance problems pointed out in PR #165424 and the fact that `AMDGPUTargetLowering::SplitVector{Load\|Store}` legalizes too-wide accesses by repeatedly splitting them in half. Legalizing a V32S32 memory accesses therefore leads to inbounds ptradd chains like (ptradd inbounds (ptradd inbounds (ptradd inbounds P, 64), 32), 16). The DAGCombines fold them into a single ptradd, but the involved transformations generally cannot preserve the inbounds flag (even though it would be valid in this case). Similar previous PR that relied on `ISD::ADD inbounds` instead of `ISD::PTRADD inbounds` (closed): #132353 Analogous PR for GISel (merged): #153001 Fixes SWDEV-516125.	2025-11-19 11:48:12 +01:00
Shilei Tian	e7f47e70e7	[AMDGPU] Enable XNACK on gfx1250 (#161457 ) This should be always on. Fixes SWDEV-555931.	2025-10-03 11:04:55 -04:00
Fabian Ritter	bf847a8b9d	[AMDGPU][SDAG] Enable ISD::PTRADD for 64-bit AS by default (#146076 ) Also removes the command line option to control this feature. There seem to be mainly two kinds of test changes: - Some operands of addition instructions are swapped; that is to be expected since PTRADD is not commutative. - Improvements in code generation, probably because the legacy lowering enabled some transformations that were sometimes harmful. For SWDEV-516125.	2025-10-02 10:06:25 +02:00
Pierre van Houtryve	d6d0f4f156	[AMDGPU][gfx1250] Remove SCOPE_SE for scratch stores (#157640 )	2025-09-10 11:03:58 +02:00
Nikita Popov	0a41e7c87e	[LICM] Do not reassociate constant offset GEP (#151492 ) LICM tries to reassociate GEPs in order to hoist an invariant GEP. Currently, it also does this in the case where the GEP has a constant offset. This is usually undesirable. From a back-end perspective, constant GEPs are usually free because they can be folded into addressing modes, so this just increases register pressume. From a middle-end perspective, keeping constant offsets last in the chain makes it easier to analyze the relationship between multiple GEPs on the same base, especially after CSE. The worst that can happen here is if we start with something like ``` loop { p + 4x p + 4x + 1 p + 4x + 2 p + 4x + 3 } ``` And LICM converts it into: ``` p.1 = p + 1 p.2 = p + 2 p.3 = p + 3 loop { p + 4x p.1 + 4x p.2 + 4x p.3 + 4x } ``` Which is much worse than leaving it for CSE to convert to: ``` loop { p2 = p + 4*x p2 + 1 p2 + 2 p2 + 3 } ```	2025-08-01 09:43:15 +02:00
Nikita Popov	e2bd92e56b	[AMDGPU] Regenerate test checks (NFC)	2025-07-31 17:02:48 +02:00
Stanislav Mekhanoshin	96e5eed92a	[AMDGPU] Select VMEM prefetch for llvm.prefetch on gfx1250 (#150493 ) We have a choice to use a scalar or vector prefetch for an uniform pointer. Since we do not have scalar stores our scalar cache is practically readonly. The rw argument of the prefetch intrinsic is used to force vector operation even for an uniform case. On GFX12 scalar prefetch will be used anyway, it is still useful but it will only bring data to L2.	2025-07-24 13:22:50 -07:00
Mariusz Sikora	e9de91e989	[AMDGPU] Add safe-smem-prefetch SubtargetFeature off by default (#130050 ) S_PREFETCH_* instructions may cause host to terminate process in case of the invalid address. Co-authored-by: Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>	2025-03-07 11:10:21 +01:00
Shilei Tian	6548b6354d	Reapply "[AMDGPU] Still set up the two SGPRs for queue ptr even it is COV5 (#112403 )" This reverts commit ca33649abe5fad93c57afef54e43ed9b3249cd86.	2024-11-08 20:21:16 -05:00
Shilei Tian	ca33649abe	Revert "[AMDGPU] Still set up the two SGPRs for queue ptr even it is COV5 (#112403 )" This reverts commit e215a1e27d84adad2635a52393621eb4fa439dc9 as it broke both hip and openmp buildbots.	2024-11-08 16:36:35 -05:00
Shilei Tian	e215a1e27d	[AMDGPU] Still set up the two SGPRs for queue ptr even it is COV5 (#112403 )	2024-11-08 13:05:35 -05:00
Stanislav Mekhanoshin	3277c7cd28	[AMDGPU] Skip VGPR deallocation for waveslot limited kernels (#112765 ) MSG_DEALLOC_VGPRS slows down very small waveslot limited kernels. It's been identified this message is only really needed for VGPR limited kernels. A kernel becomes VGPR limited if a total number of VGPRs per SIMD / number of used VGPRs is more than a number of wave slots.	2024-10-21 09:39:52 -07:00
Carl Ritson	86627149f6	[AMDGPU] Mitigate GFX12 VALU read SGPR hazard (#100067 ) Any SGPR read by a VALU can potentially obscure SALU writes to the same register. Insert s_wait_alu instructions to mitigate the hazard on affected paths. Compute a global cache of SGPRs with any VALU reads and use this to avoid inserting mitigation for SGPRs never accessed by VALUs. To avoid excessive search when compile time is priority implement secondary mode where all SALU writes are mitigated. Co-authored-by: Shilei Tian <shilei.tian@amd.com>	2024-09-04 12:15:20 +09:00
Matt Arsenault	b1bcb7ca46	Reapply "AMDGPU: Move attributor into optimization pipeline (#83131 )" and follow up commit "clang/AMDGPU: Defeat attribute optimization in attribute test" (#98851 ) This reverts commit adaff46d087799072438dd744b038e6fd50a2d78. Drop the -O3 checks from default-attributes.hip. I don't know why they are different on some bots but reverting this is far too disruptive.	2024-07-15 11:51:44 +04:00
dyung	adaff46d08	Revert "AMDGPU: Move attributor into optimization pipeline (#83131 )" and follow up commit "clang/AMDGPU: Defeat attribute optimization in attribute test" (#98851 ) This reverts commits 677cc15e0ff2e0e6aa30538eb187990a6a8f53c0 and 78bc1b64a6dc3fb6191355a5e1b502be8b3668e7. The test CodeGenHIP/default-attributes.hip is failing on multiple bots even after the attempted fix including the following: - https://lab.llvm.org/buildbot/#/builders/3/builds/1473 - https://lab.llvm.org/buildbot/#/builders/65/builds/1380 - https://lab.llvm.org/buildbot/#/builders/161/builds/595 - https://lab.llvm.org/buildbot/#/builders/154/builds/1372 - https://lab.llvm.org/buildbot/#/builders/133/builds/1547 - https://lab.llvm.org/buildbot/#/builders/81/builds/755 - https://lab.llvm.org/buildbot/#/builders/40/builds/570 - https://lab.llvm.org/buildbot/#/builders/13/builds/748 - https://lab.llvm.org/buildbot/#/builders/12/builds/1845 - https://lab.llvm.org/buildbot/#/builders/11/builds/1695 - https://lab.llvm.org/buildbot/#/builders/190/builds/1829 - https://lab.llvm.org/buildbot/#/builders/193/builds/962 - https://lab.llvm.org/buildbot/#/builders/23/builds/991 - https://lab.llvm.org/buildbot/#/builders/144/builds/2256 - https://lab.llvm.org/buildbot/#/builders/46/builds/1614 These bots have been broken for a day, so reverting to get everything back to green.	2024-07-14 18:48:54 -07:00
Matt Arsenault	78bc1b64a6	AMDGPU: Move attributor into optimization pipeline (#83131 ) Removing it from the codegen pipeline induces a lot of test churn because llc is no longer optimizing out implicit arguments to kernels. Mostly mechanical, but there are some creative test updates. I preferred to take the changes as-is in tests where the ABI isn't relevant. In cases where it's more relevant, or the optimize out logic was too ingrained in the test, I pre-run the optimization. Some cases manually add attributes to disable inputs.	2024-07-14 08:36:33 +04:00
Jay Foad	ba52f06f9d	[AMDGPU] CodeGen for GFX12 S_WAIT_* instructions (#77438 ) Update SIMemoryLegalizer and SIInsertWaitcnts to use separate wait instructions per counter (e.g. S_WAIT_LOADCNT) and split VMCNT into separate LOADCNT, SAMPLECNT and BVHCNT counters.	2024-01-18 10:47:45 +00:00
Fangrui Song	9e9907f1cf	[AMDGPU,test] Change llc -march= to -mtriple= (#75982 ) Similar to 806761a7629df268c8aed49657aeccffa6bca449. For IR files without a target triple, -mtriple= specifies the full target triple while -march= merely sets the architecture part of the default target triple, leaving a target triple which may not make sense, e.g. amdgpu-apple-darwin. Therefore, -march= is error-prone and not recommended for tests without a target triple. The issue has been benign as we recognize $unknown-apple-darwin as ELF instead of rejecting it outrightly. This patch changes AMDGPU tests to not rely on the default OS/environment components. Tests that need fixes are not changed: ``` LLVM :: CodeGen/AMDGPU/fabs.f64.ll LLVM :: CodeGen/AMDGPU/fabs.ll LLVM :: CodeGen/AMDGPU/floor.ll LLVM :: CodeGen/AMDGPU/fneg-fabs.f64.ll LLVM :: CodeGen/AMDGPU/fneg-fabs.ll LLVM :: CodeGen/AMDGPU/r600-infinite-loop-bug-while-reorganizing-vector.ll LLVM :: CodeGen/AMDGPU/schedule-if-2.ll ```	2024-01-16 21:54:58 -08:00
Jay Foad	ba131b7017	[AMDGPU] Do not generate s_set_inst_prefetch_distance for GFX12 (#78190 ) GFX12 can still encode the s_set_inst_prefetch_distance instruction but it has no effect.	2024-01-15 18:20:45 +00:00
Mariusz Sikora	a018c8cdbb	GFX12: Add LoopDataPrefetchPass (#75625 ) It is currently disabled by default. It will need experiments on a real HW to tune and decide on the profitability. --------- Co-authored-by: Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>	2023-12-19 08:32:16 +01:00

20 Commits