llvm-project

Author	SHA1	Message	Date
Ruiling, Song	c6fa976d5b	AMDGPU: Make VarIndex WeakTrackingVH in AMDGPUPromoteAlloca (#188921 ) The test appeared to pass, but the result was actually wrong. A WeakVH just makes itself null after the value it points to is replaced, so a zero value was used because VarIndex had become null, and the test checks still looked fine. Only WeakTrackingVH is updated to the new value. Change the test slightly so that using a zero index gives a wrong result.	2026-03-28 09:50:25 +08:00
Ruiling, Song	28497b7e43	AMDGPU: Make VarIndex a WeakVH in AMDGPUPromoteAlloca (#188662 ) The VarIndex might come from (for example, a load of) another alloca which may have been promoted earlier. The value will be replaced in this case. WeakVH correctly handles this.	2026-03-26 13:59:28 +08:00
Ruiling, Song	d69fc653c1	AMDGPU: Simplify placeholder replacement in AMDGPUPromoteAlloca (#188202 ) If `promoteAllocaUserToVector` returns the placeholder, it means the instruction does not actually modify the alloca. We don't need to add the placeholder as the block's available value for correctness. Instructions that appear afterwards in the same block can still get the placeholder as source value through the GetCurVal() call. Instructions in other blocks which access the alloca will be set up later when we really do placeholder replacement. This helps simplify the placeholder replacement logic.	2026-03-25 15:26:44 +08:00
Ruiling, Song	c378b79c14	Revert "AMDGPU: Delay value replacement in PromoteAlloca (#186944 )" (#188180 ) This reverts commit 5624cce586c74ec7cfcbd0243f65cb1870677af7. This is causing a libclc failure. Revert so it can be fixed properly.	2026-03-24 06:42:28 +00:00
Ruiling, Song	5624cce586	AMDGPU: Delay value replacement in PromoteAlloca (#186944 ) When we do alloca promotion, there might be cross references to the values derived from different allocas. RAUW immediately during promotion may fail to update the values cached in the AllocaAnalysis structure. Solve the problem by postponing the value replacement, and also the value deletion, to make this possible.	2026-03-24 08:31:46 +08:00
Jameson Nash	a6ceae48f5	[AMDGPU] Assert non-array alloca does have a size (#183834 ) Refs https://github.com/llvm/llvm-project/pull/179523/changes#r2851952141	2026-02-28 10:32:36 -05:00
Harrison Hao	1afd7d40af	[AMDGPU] Support i8/i16 GEP indices when promoting allocas to vectors (#175489 ) Allow promote alloca to vector to form a vector element index from i8/i16 GEPs when the dynamic offset is known to be element size aligned. Example: ```llvm %alloca = alloca <3 x float>, addrspace(5) %idx = select i1 %idx_select, i32 0, i32 4 %p = getelementptr inbounds i8, ptr addrspace(5) %alloca, i32 %idx ``` Or: ```llvm %alloca = alloca <3 x float>, addrspace(5) %idx = select i1 %idx_select, i32 0, i32 2 %p = getelementptr inbounds i16, ptr addrspace(5) %alloca, i32 %idx ```	2026-02-27 18:24:43 +08:00
Jameson Nash	bddc8e20bd	[AMDGPU] Replace getAllocatedType with getAllocationSize in PromoteAlloca (#179523 ) Some progress towards using size-based APIs instead of unreliable querying of alloca element types. The removal of the mis-accounting of alignment to global variable size might have a minor functional impact in edge cases where the overestimation of size used pushed it just over the threshold to stop optimizing, and it wasn't already canonicalized by an earlier pass. Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>	2026-02-24 09:30:39 -05:00
Shilei Tian	a56993a694	[AMDGPU] Remove `FeaturePromoteAlloca` (#177636 ) It looks like `+promote-alloca` is always enabled, and `-promote-alloca` is simply used as a switch to toggle the pass.	2026-02-23 15:24:57 -05:00
Shilei Tian	70905e0afa	[RFC][IR] Remove `Constant::isZeroValue` (#181521 ) `Constant::isZeroValue` currently behaves the same as `Constant::isNullValue` for all types except floating-point, where it additionally returns true for negative zero (`-0.0`). However, in practice, almost all callers operate on integer/pointer types where the two are equivalent, and the few FP-relevant callers have no meaningful dependence on the `-0.0` behavior. This PR removes `isZeroValue` to eliminate the confusing API. All callers are changed to `isNullValue` with no test failures. `isZeroValue` will be reintroduced in a future change with clearer semantics: when null pointers may have non-zero bit patterns, `isZeroValue` will check for bitwise-all-zeros, while `isNullValue` will check for the semantic null (which may be non-zero).	2026-02-15 12:06:42 -05:00
Steffen Larsen	c7408d17fa	[AMDGPU][SROA] Unify cast chain implementations (#177945 ) The AMDGPU promote alloca pass is missing a conversion link when casting between vectors of pointers and pointers or vectors of pointers with different number of elements. This causes codegen to crash due to invalid casts being generated. To address this, this commit adds the missing conversion link. In addition to this, the commit moves the common load/store cast logic into a new function `createLoadStoreCastChain`. --------- Signed-off-by: Steffen Holst Larsen <HolstLarsen.Steffen@amd.com> Co-authored-by: Steffen Holst Larsen <HolstLarsen.Steffen@amd.com>	2026-02-03 11:12:02 +00:00
Shilei Tian	786a20710d	[NFCI][AMDGPU] Use `GET_SUBTARGETINFO_MACRO` in `GCNSubtarget.h` and `R600Subtarget.h` (#177402 ) We can finally get rid of the manually defined boolean variables, like other targets. Even though most of them are now defined by macros, we still need to add the entries.	2026-01-25 09:38:42 -05:00
Jameson Nash	d10b2b566a	[NFCI] replace getValueType with new getGlobalSize query (#177186 ) Returns uint64_t to simplify callers. The goal is to eventually replace getValueType with this query, which should return the known minimum reference-able size, as provided (instead of a Type) during create. Additionally the common isSized query would be replaced with an isExactKnownSize query to test if that size is an exact definition.	2026-01-22 13:55:53 -05:00
Shilei Tian	4e74fba5b2	[AMDGPU] Fix a potential use-after-erase in `AMDGPUPromoteAlloca` pass (#174529 ) In some cases, the placeholder itself can be used as the value for its corresponding block in `SSAUpdater`, and later used as an incoming value in another block in `GetValueInMiddleOfBlock`. If we erase it too early, this can lead to a use-after-erase.	2026-01-08 11:44:16 -05:00
Kevin Choi	5897f276a5	[AMDGPU] In promote-alloca, if index is dynamic, sandwich load with bitcasts to reduce excessive codegen (#171253 ) Investigation revealed that scalarized copy results in a long chain of extract/insert elements which can explode in generated temps in the AMDGPU backend as there is no efficient representation for extracting subvector with dynamic index. Using identity bitcasts can reduce the number of extract/insert elements down to 1 and produce much smaller, efficient generated code. Credit: ruiling	2025-12-19 14:06:52 -05:00
macurtis-amd	e741cd88a1	AMDGPU/PromoteAlloca: Fix handling of users of multiple allocas (#172771 ) With recent refactoring, LDS promotion worklists for all allocas are populated upfront. In some cases, this results in a User in multiple lists. Then as each list is processed, a User might get deleted via removeFromParent, potentially leaving a dangling pointer in a subsequent worklist. Currently this only occurs for memcpy and memmove. Prior to refactoring, these were handled by DeferredInstr, and were processed after the last use of the then singular worklist. This change moves processing of DeferredInstr to after all worklists have been processed.	2025-12-18 08:41:21 -06:00
Nicolai Hähnle	e760d0619f	AMDGPU/PromoteAlloca: Refactor into analysis / commit phases (#170512 ) This change is motivated by the overall goal of finding alternative ways to promote allocas to VGPRs. The current solution is effectively limited to allocas whose size matches a register class, and we can't keep adding more register classes. We have some downstream work in this direction, and I'm currently looking at cleaning that up to bring it upstream. This refactor paves the way to adding a third way of promoting allocas, on top of the existing alloca-to-vector and alloca-to-LDS. Much of the analysis can be shared between the different promotion techniques. Additionally, the idea behind splitting the pass into an analysis phase and a commit phase is that it ought to allow us to more easily make better "big picture" decisions about which allocas to promote how in the future.	2025-12-12 01:24:38 +00:00
Jan Patrick Lehr	ec787501dc	Revert "[AMDGPU] Enable i8 GEP promotion for vector allocas" (#171087 ) Reverts llvm/llvm-project#166132 Broke libc on GPU tests. https://lab.llvm.org/buildbot/#/builders/10/builds/18635	2025-12-08 08:25:48 +00:00
Harrison Hao	6ec8c4351c	[AMDGPU] Enable i8 GEP promotion for vector allocas (#166132 ) This patch adds support for the pattern: ```llvm %index = select i1 %idx_sel, i32 0, i32 4 %elt = getelementptr inbounds i8, ptr addrspace(5) %alloca, i32 %index ``` by scaling the byte offset to an element index (index >> log2(ElemSize)), allowing the vector element to be updated with insertelement instead of using scratch memory.	2025-12-08 12:13:09 +08:00
Nicolai Hähnle	8dee997a85	Reland "AMDGPU/PromoteAlloca: Always use i32 for indexing (#170511 )" (#170956 ) Create more canonical code that may even lead to slightly better codegen.	2025-12-06 08:54:44 -08:00
Nicolai Hähnle	ee77c58e5b	Reland "AMDGPU/PromoteAlloca: Simplify how deferred loads work (#170510 )" (#170955 ) The second pass of promotion to vector can be quite simple. Reflect that simplicity in the code for better maintainability. v2: - don't put placeholders into the SSAUpdater, and add a test that shows the problem	2025-12-06 01:15:28 +00:00
Nicolai Hähnle	0e0ec4c348	Revert "AMDGPU/PromoteAlloca: Simplify how deferred loads work (#170510 )" This reverts commit 22a2c27a0aa0d3aa5d4222f6e766646166450543. Failure on clang-hip-vega20: https://lab.llvm.org/buildbot/#/builders/123/builds/31779	2025-12-05 13:23:05 -08:00
Nicolai Hähnle	de86696dba	Revert "AMDGPU/PromoteAlloca: Always use i32 for indexing (#170511 )" This reverts commit f558c30146e51d5ef72bf3d4b3f0e86ca19e4b99. Failure on clang-hip-vega20: https://lab.llvm.org/buildbot/#/builders/123/builds/31779	2025-12-05 13:22:41 -08:00
Nicolai Hähnle	f558c30146	AMDGPU/PromoteAlloca: Always use i32 for indexing (#170511 ) Create more canonical code that may even lead to slightly better codegen.	2025-12-05 12:54:57 -08:00
Nicolai Hähnle	22a2c27a0a	AMDGPU/PromoteAlloca: Simplify how deferred loads work (#170510 ) The second pass of promotion to vector can be quite simple. Reflect that simplicity in the code for better maintainability.	2025-12-05 12:54:25 -08:00
Nicolai Hähnle	3c5fd492d4	AMDGPU/PromoteAlloca: Extract getVectorTypeForAlloca helper (#170509 )	2025-12-03 22:24:07 -08:00
Jay Foad	72c69aefba	[AMDGPU] Make use of getFunction and getMF. NFC. (#167872 )	2025-11-14 11:00:57 +00:00
Fabian Ritter	7982980e07	[AMDGPUPromoteAlloca][NFC] Avoid unnecessary APInt/int64_t conversions (#157864 ) Follow-up to #157682	2025-09-12 09:51:55 +02:00
Fabian Ritter	5b81367960	[AMDGPU] Generate canonical additions in AMDGPUPromoteAlloca (#157810 ) When we know that one operand of an addition is a constant, we might as well put it on the right-hand side and avoid the work to canonicalize it in a later pass.	2025-09-10 14:46:46 +02:00
Fabian Ritter	b965f26538	[AMDGPU] Treat GEP offsets as signed in AMDGPUPromoteAlloca (#157682 ) AMDGPUPromoteAlloca can transform i32 GEP offsets that operate on allocas into i64 extractelement indices. Before this patch, negative GEP offsets would be zero-extended, leading to wrong extractelement indices with values around (2**32-1). This fixes failing LlvmLibcCharacterConverterUTF32To8Test tests for AMDGPU.	2025-09-10 11:32:14 +02:00
Carl Ritson	1f6648ccaa	[AMDGPU] AMDGPUPromoteAlloca: increase default max-regs to 32 (#155076 ) Increase promote-alloca-to-vector-max-regs to 32 from 16. This restores default promotion of 16 x double which was disabled by #127973. Fixes SWDEV-525817.	2025-08-26 09:30:16 +09:00
Diana Picus	a201f8872a	[AMDGPU] Replace dynamic VGPR feature with attribute (#133444 ) Use a function attribute (amdgpu-dynamic-vgpr) instead of a subtarget feature, as requested in #130030.	2025-06-24 11:09:36 +02:00
Matt Arsenault	1cae21da47	AMDGPU: Remove legacy PM version of AMDGPUPromoteAllocaToVector (#144986 ) This is only run in the middle end with the new pass manager now, so garbage collect the old PM version.	2025-06-20 16:43:39 +09:00
zGoldthorpe	4692f0d344	Revert "[AMDGPU] Extended vector promotion to aggregate types." (#144366 ) Reverts llvm/llvm-project#143784 Patch fails some internal tests. Will investigate more thoroughly before attempting to remerge.	2025-06-16 11:06:18 -04:00
zGoldthorpe	79e06bf1ae	[AMDGPU] Extended vector promotion to aggregate types. (#143784 ) Extends the `amdgpu-promote-alloca-to-vector` pass to also promote aggregate types whose elements are all the same type to vector registers. The motivation for this extension was to account for IR generated by the frontend containing several singleton struct types containing vectors or vector-like elements, though the implementation is strictly more general.	2025-06-13 14:22:21 -04:00
Harrison Hao	1a7f5f5833	[AMDGPU] Promote nestedGEP allocas to vectors (#141199 ) Supports the `nestedGEP` pattern that appears when an alloca is first indexed as an array element and then shifted with a byte‑offset GEP: ```llvm %SortedFragments = alloca [10 x <2 x i32>], addrspace(5), align 8 %row = getelementptr [10 x <2 x i32>], ptr addrspace(5) %SortedFragments, i32 0, i32 %j %elt1 = getelementptr i8, ptr addrspace(5) %row, i32 4 %val = load i32, ptr addrspace(5) %elt1 ``` The pass folds the two levels of addressing into a single vector lane index and keeps the whole object in a VGPR: ```llvm %vec = freeze <20 x i32> poison ; alloca promote <20 x i32> %idx0 = mul i32 %j, 2 ; j * 2 %idx = add i32 %idx0, 1 ; j * 2 + 1 %val = extractelement <20 x i32> %vec, i32 %idx ``` This eliminates the scratch read.	2025-06-02 16:20:14 +08:00
Robert Imschweiler	dc29901efb	[AMDGPU] PromoteAlloca: handle out-of-bounds GEP for shufflevector (#139700 ) This LLVM defect was identified via the AMD Fuzzing project. --------- Co-authored-by: Matt Arsenault <arsenm2@gmail.com>	2025-05-21 15:28:30 +02:00
Lucas Ramirez	e377dc4d38	[AMDGPU] Max. WG size-induced occupancy limits max. waves/EU (#137807 ) The default maximum waves/EU returned by the family of `AMDGPUSubtarget::getWavesPerEU` is currently the maximum number of waves/EU supported by the subtarget (only a valid occupancy range in "amdgpu-waves-per-eu" may lower that maximum). This ignores maximum achievable occupancy imposed by flat workgroup size and LDS usage, resulting in situations where `AMDGPUSubtarget::getWavesPerEU` produces a maximum higher than the one from `AMDGPUSubtarget::getOccupancyWithWorkGroupSizes`. This limits the waves/EU range's maximum to the maximum achievable occupancy derived from flat workgroup sizes and LDS usage. This only has an impact on functions which restrict flat workgroup size with "amdgpu-flat-work-group-size", since the default range of flat workgroup sizes achieves the maximum number of waves/EU supported by the subtarget. Improvements to the handling of "amdgpu-waves-per-eu" are left for a follow up PR (e.g., I think the attribute should be able to lower the full range of waves/EU produced by these methods).	2025-05-01 13:22:23 +02:00
Fabian Ritter	cf188d650c	[AMDGPU] Avoid crashes for non-byte-sized types in PromoteAlloca (#134042 ) This patch addresses three problems when promoting allocas to vectors: - Element types with size < 1 byte in allocas with a vector type caused divisions by zero. - Element types whose size doesn't match their AllocSize hit an assertion. - Access types whose size doesn't match their AllocSize hit an assertion. With this patch, we do not attempt to promote affected allocas to vectors. In principle, we could handle these cases in PromoteAlloca, e.g., by truncating and extending elements from/to their allocation size. It's however unclear if we ever encounter such cases in practice, so that doesn't seem worth the added complexity. For SWDEV-511252	2025-04-14 09:13:54 +02:00
Rahul Joshi	74b7abf154	[IRBuilder] Add new overload for CreateIntrinsic (#131942 ) Add a new `CreateIntrinsic` overload with no `Types`, useful for creating calls to non-overloaded intrinsics that don't need additional mangling.	2025-03-31 08:10:34 -07:00
Kazu Hirata	71935281e0	[Target] Use *Set::insert_range (NFC) (#132140 ) DenseSet, SmallPtrSet, SmallSet, SetVector, and StringSet recently gained C++23-style insert_range. This patch replaces: Dest.insert(Src.begin(), Src.end()); with: Dest.insert_range(Src); This patch does not touch custom begin like succ_begin for now.	2025-03-20 09:09:30 -07:00
Carl Ritson	0e4116a6b9	[AMDGPU] Fix typing error in multi dimensional promote alloca (#131763 ) Fix type error when GEP uses i64 index introduced in #127973.	2025-03-19 08:17:04 +09:00
Matt Arsenault	c5fe075eaf	AMDGPU: Use freeze poison instead of undef in alloca promotion (#131285 ) Previously the value created to represent the uninitialized memory of the alloca was undef. Use freeze poison instead. Enables some optimization improvements (which need defeating in the limit tests), but also a few regressions. Seems to leave behind dead code in some cases too.	2025-03-18 17:27:02 +07:00
Shilei Tian	51c706c119	[NFC][AMDGPU] Replace direct arch comparison with `isAMDGCN()` (#131357 )	2025-03-14 14:21:44 -04:00
Carl Ritson	525d412cae	[AMDGPU] Fix typing error introduced in promote alloca change Fix type error when GEP uses i64 offset introduced in #127973.	2025-03-12 17:32:57 +09:00
Carl Ritson	d921bf233c	[AMDGPU] Extend promotion of alloca to vectors (#127973 ) * Add multi dimensional array support * Make maximum vector size tunable * Make ratio of VGPRs used for vector promotion tunable * Maximum array size now based on VGPR count (32b) instead of element count	2025-03-12 15:11:30 +09:00
Jay Foad	44607666b3	[AMDGPU] Simplify conditional expressions. NFC. (#129228 ) Simplify `cond ? val : false` to `cond && val` and similar.	2025-03-03 10:40:49 +00:00
Sumanth Gundapaneni	4c9e14b3ad	[AMDGPU] Update PromoteAlloca to handle GEPs with variable offset. (#122342 ) In case of variable offset of a GEP that can be optimized out, promote alloca is updated to use the refreshed index to avoid an assertion. Issue found by fuzzer. --------- Co-authored-by: Matt Arsenault <arsenm2@gmail.com>	2025-02-24 13:36:30 -06:00
Kazu Hirata	aa066e36f8	[AMDGPU] Avoid repeated hash lookups (NFC) (#126430 )	2025-02-09 13:34:28 -08:00
Shilei Tian	6e4105574e	[NFC][AMDGPU] Improve code introduced in #124607 (#124672 )	2025-01-27 22:57:16 -05:00
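
Note on the signed-offset fix (#157682 ) in the list above: the difference between zero-extending and sign-extending a negative i32 offset to i64 can be sketched with plain integers. This is an illustration only, not code from the pass; the helper names `zext_i32_to_i64` and `sext_i32_to_i64` are hypothetical and exist only for this example.

```python
MASK32 = (1 << 32) - 1  # low 32 bits


def zext_i32_to_i64(value: int) -> int:
    """Zero-extend: treat the low 32 bits as an unsigned number."""
    return value & MASK32


def sext_i32_to_i64(value: int) -> int:
    """Sign-extend: treat the low 32 bits as two's-complement signed."""
    bits = value & MASK32
    # If bit 31 is set, the i32 value is negative.
    return bits - (1 << 32) if bits & (1 << 31) else bits


offset = -1  # a negative GEP offset held in an i32
print(zext_i32_to_i64(offset))  # 4294967295, i.e. 2**32 - 1
print(sext_i32_to_i64(offset))  # -1
```

With zero-extension the offset `-1` turns into an index near `2**32 - 1`, which matches the wrong extractelement indices that the commit message describes.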


205 Commits
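
Note on the i8/i16 GEP and nestedGEP entries (#175489 , #166132 , #141199 ): the index arithmetic those messages describe (`index >> log2(ElemSize)`, and `j * 2 + 1` for the nested case) can be sketched on concrete integers. This is a rough model under stated assumptions, not the pass's implementation: the real pass works on IR values, and the function names `lane_index` and `nested_lane_index` are hypothetical.

```python
def lane_index(byte_offset: int, elem_size: int) -> int:
    """Scale a byte offset to a vector lane index.

    Assumes elem_size is a power of two and the offset is a multiple
    of it; otherwise the offset does not name a whole element.
    """
    if elem_size <= 0 or elem_size & (elem_size - 1):
        raise ValueError("element size must be a power of two")
    if byte_offset % elem_size:
        raise ValueError("offset is not element-size aligned")
    # byte_offset >> log2(elem_size)
    return byte_offset >> (elem_size.bit_length() - 1)


def nested_lane_index(j: int, row_elems: int,
                      inner_byte_offset: int, elem_size: int) -> int:
    """Fold an array index plus an inner byte offset into one lane index."""
    return j * row_elems + lane_index(inner_byte_offset, elem_size)


# i8 GEP with byte offset 4 into <3 x float> (4-byte elements): lane 1.
print(lane_index(4, 4))
# [10 x <2 x i32>] indexed by j = 3, then a 4-byte offset: 3 * 2 + 1 = 7.
print(nested_lane_index(3, 2, 4, 4))
```

An i16 GEP with index 2 corresponds to a byte offset of 4, so it lands on the same lane as the i8 example.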