llvm-project

Author	SHA1	Message	Date
Fabian Ritter	2697c8cb45	[LowerMemIntrinsics] Factor control flow generation out of the memcpy lowering (#169039) So far, memcpy with known size, memcpy with unknown size, memmove with known size, and memmove with unknown size have individual optimized loop lowering implementations, while memset and memset.pattern use an unoptimized loop lowering. This patch extracts the parts of the memcpy lowerings (for known and unknown sizes) that generate the control flow for the loop expansion into an `insertLoopExpansion` function. The `createMemCpyLoop(Unk|K)nownSize` functions then only collect the necessary arguments for `insertLoopExpansion`, call it, and fill the generated loop basic blocks. The immediate benefit of this is that logic from the two memcpy lowerings is deduplicated. Moreover, it enables follow-up patches that will use `insertLoopExpansion` to optimize the memset and memset.pattern implementations similarly to memcpy, since they can use the exact same control flow patterns. The test changes are due to more consistent and useful basic block names in the loop expansion and an improvement in basic block ordering: previously, the basic block that determines whether the residual loop is executed was placed at the end of the function; now it is placed before the residual loop body. Otherwise, the generated code should be equivalent. This patch doesn't affect memmove; deduplicating its logic would also be nice, but to extract all CF generation from the memmove lowering, `insertLoopExpansion` would also need to be able to create code that iterates backwards over the argument buffers. That would make `insertLoopExpansion` a lot more complex for a code path that's only used for memmove, so it's probably not worth refactoring. For SWDEV-543208.	2025-12-03 11:13:52 +01:00
Matt Arsenault	c7019c7eda	AMDGPU: Really use AV classes by default for vector classes (#166483) AMDGPU: Really use AV classes by default for vector classes Update getRegClassFor to use AV classes in place of VGPRs for gfx90a-gfx950. There are a handful of regressions. Most enable unprofitable rematerialization, which reduces the register count by 1 but adds an unnecessary instruction.	2025-11-13 18:54:02 +00:00
Nicolai Hähnle	fa050eadab	Reland: CodeGen: Record MMOs in finalizeBundle (#166689) (original PR: #166210) This allows more accurate alias analysis to apply at the bundle level. This has a bunch of minor effects in post-RA scheduling that look mostly beneficial to me, all of them in AMDGPU (the Thumb2 change is cosmetic). The pre-existing (and unchanged) test in CodeGen/MIR/AMDGPU/custom-pseudo-source-values.ll tests that MIR with a bundle with MMOs can be parsed successfully. v2: use cloneMergedMemRefs; add another test to explicitly check the MMO bundling behavior. v3: use poison instead of undef to initialize the global variable in the test. v4: treat bundle memory accesses as never trivially disjoint.	2025-11-06 07:34:36 -08:00
Jan Patrick Lehr	833983918d	Revert "CodeGen: Record MMOs in finalizeBundle" (#166520) Reverts llvm/llvm-project#166210. Buildbot failures in the libc on GPU bot: https://lab.llvm.org/buildbot/#/builders/10/builds/16711	2025-11-05 11:11:08 +01:00
Nicolai Hähnle	304d2ff4d9	CodeGen: Record MMOs in finalizeBundle (#166210) This allows more accurate alias analysis to apply at the bundle level. This has a bunch of minor effects in post-RA scheduling that look mostly beneficial to me, all of them in AMDGPU (the Thumb2 change is cosmetic). The pre-existing (and unchanged) test in CodeGen/MIR/AMDGPU/custom-pseudo-source-values.ll tests that MIR with a bundle with MMOs can be parsed successfully. v2: use cloneMergedMemRefs; add another test to explicitly check the MMO bundling behavior. v3: use poison instead of undef to initialize the global variable in the test.	2025-11-05 06:56:19 +00:00
Mirko Brkušanin	bdec5bf69c	[AMDGPU][GlobalISel] Combine (or s64, zext(s32)) (#151519) If we only deal with one part of a 64-bit value, we can just generate a merge and an unmerge, which will either be combined away or selected into a copy / mov_b32.	2025-10-24 17:25:00 +02:00
Gang Chen	640644d68a	[AMDGPU] Move LowerBufferFatPointers after LoadStoreVectorizer and remove the fixme (#161531) Move the LowerBufferFatPointers pass after the CodeGenPrepare and LoadStoreVectorizer passes, and remove the FIXME about that.	2025-10-01 17:52:15 -07:00
Matt Arsenault	1614c3b3c7	AMDGPU: Always use AV spill pseudos on targets with AGPRs (#149099) This increases the allocator's freedom to inflate register classes to the AV class; we don't need to introduce a new restriction by basing the opcode on the current virtual register class. Ideally we would avoid this if we don't have any allocatable AGPRs for the function, but it probably doesn't make much difference in the end result if they are excluded from the final allocation order.	2025-07-18 15:31:50 +09:00
Ruiling, Song	3e47d8deba	MachineScheduler: Reset next cluster candidate for each node (#139513) When a node is picked, we should reset its next cluster candidate to null before releasing its successors/predecessors.	2025-05-28 14:53:46 +08:00
Austin Kerbow	2c9a46cce3	[AMDGPU] Move kernarg preload logic to separate pass (#130434) Moves kernarg preload logic to its own module pass. Cloned function declarations are removed when preloading hidden arguments. The inreg attribute is now added in this pass instead of in AMDGPUAttributor. The rest of the logic is copied from AMDGPULowerKernelArguments, which now only checks whether an argument is marked inreg to avoid replacing direct uses of preloaded arguments. This change requires test updates to remove inreg from lit tests with kernels that don't actually want preloading.	2025-05-11 21:18:11 -07:00
Alexander Richardson	ee13638362	[AMDGPU] Remove explicit datalayout from tests where not needed Since e39f6c1844fab59c638d8059a6cf139adb42279a opt will infer the correct datalayout when given a triple. Avoid explicitly specifying it in tests that depend on the AMDGPU target being present to avoid the string becoming out of sync with the TargetInfo value. Only tests with REQUIRES: amdgpu-registered-target or a local lit.cfg were updated to ensure that tests for non-target-specific passes that happen to use the AMDGPU layout still pass when building with a limited set of targets. Reviewed By: shiltian, arsenm Pull Request: https://github.com/llvm/llvm-project/pull/137921	2025-04-30 10:58:17 -07:00
Ana Mihajlovic	459b4e3fe1	Reland "[AMDGPU] Remove s_delay_alu for VALU->SGPR->SALU (#127212)" (#131111) We have a VALU->SGPR->SALU sequence (a VALU writing to an SGPR and an SALU reading from it). When the VALU is issued, it increments the internal counter VA_SDST, which is used to track use of this SGPR. The SALU will not issue until VA_SDST is zero, that is, until the VALU has finished writing. Therefore, delays added by s_delay_alu are not needed in this situation.	2025-03-13 10:26:20 +01:00
Kazu Hirata	aa008e0008	Revert "[AMDGPU] Remove s_delay_alu for VALU->SGPR->SALU (#127212)" This reverts commit 71582c6667a6334c688734cae628e906b3c1ac1d. Multiple buildbot failures have been reported: https://github.com/llvm/llvm-project/pull/127212	2025-03-12 12:09:09 -07:00
Ana Mihajlovic	71582c6667	[AMDGPU] Remove s_delay_alu for VALU->SGPR->SALU (#127212) We have a VALU->SGPR->SALU sequence (a VALU writing to an SGPR and an SALU reading from it). When the VALU is issued, it increments the internal counter VA_SDST, which is used to track use of this SGPR. The SALU will not issue until VA_SDST is zero, that is, until the VALU has finished writing. Therefore, delays added by s_delay_alu are not needed in this situation.	2025-03-12 09:33:07 -07:00
Krzysztof Drewniak	f8cc509b69	Reapply "[AMDGPU] Handle memcpy()-like ops in LowerBufferFatPointers (#126621)" (#129078) This reverts commit 1559a65efaf327f9c72e14d4bb1834f076e7fc20. Fixed the test (I suspect it was broken by an unrelated change in the merge).	2025-02-27 11:26:13 -06:00
Kazu Hirata	1559a65efa	Revert "[AMDGPU] Handle memcpy()-like ops in LowerBufferFatPointers (#126621)" This reverts commit 469757efafebdd5772d993fca4dc0dfa7cbda17c. Multiple buildbot failures have been reported: https://github.com/llvm/llvm-project/pull/126621	2025-02-26 14:35:07 -08:00
Krzysztof Drewniak	469757efaf	[AMDGPU] Handle memcpy()-like ops in LowerBufferFatPointers (#126621) Since LowerBufferFatPointers runs before PreISelIntrinsicLowering, which normally handles unsupported memcpy()s, and since you can't have a `noalias {ptr addrspace(8), i32}` because it crashes later passes, manually expand memcpy()s involving buffer fat pointers to loops. Additionally, though they're unlikely to be used, this commit adds support for memset(). This commit doesn't implement writing direct-to-LDS loads as the intrinsics, but leaves the option open for the future.	2025-02-26 16:03:32 -06:00

17 Commits