llvm-project

Author	SHA1	Message	Date
Yingwei Zheng	c37b2549ff	Revert "[InstSimplify] Fold `getelementptr inbounds null, idx -> null` (#130742 )" (#138168 ) Revert #130742 for now to avoid breaking glibc until the workaround patches have landed.	2025-05-01 14:21:59 -07:00
Shilei Tian	cf9d4048fb	Reapply "[NFC][AMDGPU] Correct the check line update script for `llvm/test/CodeGen/AMDGPU/attributor-flatscratchinit-undefined-behavior.ll`" This reverts commit 74f55c744a18b848cc780c42f0e3dde7e7c96195 with a fix for the check lines.	2025-05-01 14:13:23 -04:00
Shilei Tian	74f55c744a	Revert "[NFC][AMDGPU] Correct the check line update script for `llvm/test/CodeGen/AMDGPU/attributor-flatscratchinit-undefined-behavior.ll`" This reverts commit 49a5dd3dac285ba12f3fcaa55cacbea5968f5a37.	2025-05-01 14:12:03 -04:00
Shilei Tian	49a5dd3dac	[NFC][AMDGPU] Correct the check line update script for `llvm/test/CodeGen/AMDGPU/attributor-flatscratchinit-undefined-behavior.ll`	2025-05-01 14:10:46 -04:00
Shilei Tian	e25ddf9081	[NFC][AMDGPU] Auto generate check lines for `llvm/test/CodeGen/AMDGPU/inline-attr.ll`	2025-05-01 13:52:27 -04:00
Lucas Ramirez	e377dc4d38	[AMDGPU] Max. WG size-induced occupancy limits max. waves/EU (#137807 ) The default maximum waves/EU returned by the family of `AMDGPUSubtarget::getWavesPerEU` is currently the maximum number of waves/EU supported by the subtarget (only a valid occupancy range in "amdgpu-waves-per-eu" may lower that maximum). This ignores maximum achievable occupancy imposed by flat workgroup size and LDS usage, resulting in situations where `AMDGPUSubtarget::getWavesPerEU` produces a maximum higher than the one from `AMDGPUSubtarget::getOccupancyWithWorkGroupSizes`. This limits the waves/EU range's maximum to the maximum achievable occupancy derived from flat workgroup sizes and LDS usage. This only has an impact on functions which restrict flat workgroup size with "amdgpu-flat-work-group-size", since the default range of flat workgroup sizes achieves the maximum number of waves/EU supported by the subtarget. Improvements to the handling of "amdgpu-waves-per-eu" are left for a follow up PR (e.g., I think the attribute should be able to lower the full range of waves/EU produced by these methods).	2025-05-01 13:22:23 +02:00
mssefat	71039bbc58	[AMDGPU] Fix register class constraints for si-fold-operands pass when folding immediate into copies (#131387 ) Fixes https://github.com/llvm/llvm-project/issues/130020 This fixes an issue where the si-fold-operands pass would incorrectly fold immediate values into COPY instructions targeting av_32 registers. The pass now checks register class constraints before attempting to fold the immediate.	2025-04-30 17:36:46 -05:00
Alexander Richardson	ee13638362	[AMDGPU] Remove explicit datalayout from tests where not needed Since e39f6c1844fab59c638d8059a6cf139adb42279a opt will infer the correct datalayout when given a triple. Avoid explicitly specifying it in tests that depend on the AMDGPU target being present to avoid the string becoming out of sync with the TargetInfo value. Only tests with REQUIRES: amdgpu-registered-target or a local lit.cfg were updated to ensure that tests for non-target-specific passes that happen to use the AMDGPU layout still pass when building with a limited set of targets. Reviewed By: shiltian, arsenm Pull Request: https://github.com/llvm/llvm-project/pull/137921	2025-04-30 10:58:17 -07:00
mssefat	7495f92f08	[AMDGPU] Fix undefined scc register in successor block of SI_KILL terminators (#134718 ) Fix issue 131298 where an undefined $scc register causes verifier errors when using SI_KILL_F32_COND_IMM_TERMINATOR instructions. The problem occurs because the $scc register defined in a comparison before the kill terminator is used in successor blocks, but was not properly marked as live-in. This patch: - Adds code to check if SCC is used in the successor block - Adds SCC as a live-in to successor blocks - Handles both explicit and implicit uses of SCC With this patch the machine verifier no longer reports undefined $scc errors in blocks following a kill terminator instruction. Fixes #131298 --------- Co-authored-by: Matt Arsenault <arsenm2@gmail.com>	2025-04-30 09:02:45 -05:00
Akshat Oke	e91cbd4f29	[CodeGen][NPM] Port VirtRegRewriter to NPM (#130564 )	2025-04-30 14:10:46 +05:30
Vikram Hegde	86d8e8d9a6	[CodeGen][NewPM] Port "PrologEpilogInserter" to NPM (#130550 )	2025-04-29 13:13:45 +05:30
Sirish Pande	abec9ff47d	[AMDGPU] Correctly merge noalias scopes during lowering of LDS data. (#131664 ) Currently, if noalias metadata is already present on loads and stores, the lower module LDS pass generates a more conservative aliasing set. This inhibits scheduling intrinsics that would otherwise have generated a better pipelined instruction. The fix is not to always intersect existing noalias metadata with the noalias created for lowering of LDS, but to intersect only if the noalias scopes are from the same domain; otherwise, concatenate the existing noalias sets with the LDS noalias. There are a few patches that have come for scopedAA in the past. The following three should be enough background information. https://reviews.llvm.org/D91576 https://reviews.llvm.org/D108315 https://reviews.llvm.org/D110049 Essentially, after a pass that might change aliasing info, one should check whether that pass changes the number of MayAlias or ModRef results using the following: `opt -S -aa-pipeline=basic-aa,scoped-noalias-aa -passes=aa-eval -evaluate-aa-metadata -print-all-alias-modref-info -disable-output`	2025-04-28 14:02:18 -05:00
Shilei Tian	3570908519	[NFC][AMDGPU] Auto generate check lines for some codegen tests (#137534 ) Make preparation for #137488.	2025-04-28 09:25:05 -04:00
David Stuttard	1a32613dac	[AMDGPU] Update pal metadata for v3.6 and fix v3.0 (#135196 ) Update entry_point for all pal versions below 3.6. 3.6 and above removes entry_point.	2025-04-28 13:31:14 +01:00
John Brawn	dd87127f4e	[DAGCombiner] Eliminate fp casts if we have the right fast math flags (#131345 ) When floating-point operations are legalized to operations of a higher precision (e.g. f16 fadd being legalized to f32 fadd) then we get narrowing then widening operations between each operation. With the appropriate fast math flags (nnan ninf contract) we can eliminate these casts.	2025-04-28 11:21:51 +01:00
Brox Chen	72bc0525d8	[AMDGPU][True16][CodeGen] update wwm reg sorting check condition (#135053 ) We currently only need to shift down 32-bit wwm registers. The previous check condition mistakenly selected 16-bit registers in true16 mode. Update the check condition to skip 16-bit registers in wwm reg sorting.	2025-04-27 14:30:34 -04:00
Shilei Tian	3bc125490a	[AMDGPU][Verifier] Check address space of `alloca` instruction (#135820 ) This PR updates the `Verifier` to enforce that `alloca` instructions on AMDGPU must be in AS5. This prevents hitting a misleading backend error like "unable to select FrameIndex," which makes it look like a backend bug when it's actually an IR-level issue.	2025-04-26 00:54:00 -04:00
Diana Picus	5bad5d84a1	Reland [AMDGPU] Support block load/store for CSR #130013 (#137169 ) Add support for using the existing SCRATCH_STORE_BLOCK and SCRATCH_LOAD_BLOCK instructions for saving and restoring callee-saved VGPRs. This is controlled by a new subtarget feature, block-vgpr-csr. It does not include WWM registers - those will be saved and restored individually, just like before. This patch does not change the ABI. Use of this feature may lead to slightly increased stack usage, because the memory is not compacted if certain registers don't have to be transferred (this will happen in practice for calling conventions where the callee and caller saved registers are interleaved in groups of 8). However, if the registers at the end of the block of 32 don't have to be transferred, we don't need to use a whole 128-byte stack slot - we can trim some space off the end of the range. In order to implement this feature, we need to rely less on the target-independent code in the PrologEpilogInserter, so we override several new methods in SIFrameLowering. We also add new pseudos, SI_BLOCK_SPILL_V1024_SAVE/RESTORE. One peculiarity is that both the SI_BLOCK_V1024_RESTORE pseudo and the SCRATCH_LOAD_BLOCK instructions will have all the registers that are not transferred added as implicit uses. This is done in order to inform LiveRegUnits that those registers are not available before the restore (since we're not really restoring them - so we can't afford to scavenge them). Unfortunately, this trick doesn't work with the save, so before the save all the registers in the block will be unavailable (see the unit test). This was reverted due to failures in the builds with expensive checks on, now fixed by always updating LiveIntervals and SlotIndexes in SILowerSGPRSpills.	2025-04-25 11:29:27 +02:00
Matt Arsenault	4f5cfa81dc	AMDGPU: Remove amdhsa_code_object_version module flags from most tests (#136363 ) These were added to the migration from v4 to v5 and should be removed now that the default has changed.	2025-04-24 17:13:03 +02:00
Craig Topper	d43ce35048	[TableGen][GISel] Allow isTrivialOperatorNode to import patterns with isStore and a memory VT. (#137080 ) This removes the need to explicitly set isTruncStore on truncstorei8 and other similar PatFrags that include truncstore in their frags DAG. This allows some new patterns to be imported for AMDGPU as you can see in the changed test. The extra isTruncStore were added in ae2b36e8bdfa6, along with some other tablegen changes to look for MemoryVT along with isTruncStore. I did not remove the code, because I'm not sure if any out of tree users have become dependent on it. It's no longer exercised in tree.	2025-04-24 08:10:07 -07:00
anjenner	a3d05e8987	Remove an incorrect assert in MFMASmallGemmSingleWaveOpt. (#130131 ) This assert was failing in a fuzzing test. I consulted with @jrbyrnes who said: The MFMASmallGemmSingleWaveOpt::apply() method is invoked if and only if the user has inserted an intrinsic llvm.amdgcn.iglp.opt(i32 1) into their source code. This intrinsic applies a highly specialized DAG mutation to result in specific scheduling for a specific set of kernels. These assertions are really just confirming that the characteristics of the kernel match what is expected (i.e. the kernels are similar to the ones this DAG mutation strategy was designed against). However, if we apply this DAG mutation to kernels for which it was not designed, then we may not find the types of instructions we are looking for, and may end up with empty caches. I think it should be fine to just return false if the cache is empty instead of the assert.	2025-04-24 09:22:24 +01:00
Brox Chen	6dbc01e801	[AMDGPU][True16][CodeGen] update GFX11Plus codegen test with true16 flag (#135078 ) This is an NFC patch. It runs a bulk update on CodeGen tests that are impacted by the true16 features. This patch: 1. duplicates the GFX11plus runlines and applies them with "+mattr=+real-true16" and "+mattr=-real-true16" 2. updates the tests with the update script. For some GISEL runlines, the current CodeGen does not fully support the true16 version. Those runlines are still updated, but the failing ones are commented out and a "FIXME-TRUE16" comment is added to the test for easier tracking. These tests will be fixed in following patches. We are in a transition state in which we support both "+real-true16" and "-real-true16" in our code base. We plan to move to "+real-true16" as the default, and finally remove the "-real-true16" mode and test lines.	2025-04-23 13:06:52 -04:00
Diana Picus	6bb2f90557	Revert "[AMDGPU] Support block load/store for CSR" (#136846 ) Reverts llvm/llvm-project#130013 due to failures with expensive checks on.	2025-04-23 14:01:00 +02:00
Diana Picus	4a58071d87	[AMDGPU] Support block load/store for CSR (#130013 ) Add support for using the existing `SCRATCH_STORE_BLOCK` and `SCRATCH_LOAD_BLOCK` instructions for saving and restoring callee-saved VGPRs. This is controlled by a new subtarget feature, `block-vgpr-csr`. It does not include WWM registers - those will be saved and restored individually, just like before. This patch does not change the ABI. Use of this feature may lead to slightly increased stack usage, because the memory is not compacted if certain registers don't have to be transferred (this will happen in practice for calling conventions where the callee and caller saved registers are interleaved in groups of 8). However, if the registers at the end of the block of 32 don't have to be transferred, we don't need to use a whole 128-byte stack slot - we can trim some space off the end of the range. In order to implement this feature, we need to rely less on the target-independent code in the PrologEpilogInserter, so we override several new methods in `SIFrameLowering`. We also add new pseudos, `SI_BLOCK_SPILL_V1024_SAVE/RESTORE`. One peculiarity is that both the SI_BLOCK_V1024_RESTORE pseudo and the SCRATCH_LOAD_BLOCK instructions will have all the registers that are not transferred added as implicit uses. This is done in order to inform LiveRegUnits that those registers are not available before the restore (since we're not really restoring them - so we can't afford to scavenge them). Unfortunately, this trick doesn't work with the save, so before the save all the registers in the block will be unavailable (see the unit test).	2025-04-23 10:33:36 +02:00
zhijian lin	afda4c295b	Reland [SelectionDAG] Folding ZERO-EXTEND/SIGN_EXTEND poison to Poison value in getNode (#136701 ) This patch addresses the signed/zero extension of poison by using a poison value of the extended type instead of a constant zero of the extended type.	2025-04-22 17:36:41 -04:00
Pierre van Houtryve	ec3a90509d	[AMDGPU][InsertWaitCnts] Track global_wb/inv/wbinv (#135340 ) wb/wbinv use storecnt, inv uses loadcnt. Track them as VMEM_WRITE_ACCESS and VMEM_READ_ACCESS to avoid InsertWaitCnt incorrectly eliminating the waitcnts after these instructions. Solves SWDEV-526604	2025-04-22 14:53:55 +02:00
Pierre van Houtryve	47903e3372	[AMDGPU][InsertWaitCnts] Add test for global_wb/inv/wbinv tracking (#135339 )	2025-04-22 14:50:43 +02:00
Pankaj Dwivedi	a25fdd7aca	Reapply "[AMDGPU] Insert readfirstlane in the function returns in sgpr." (#136678 ) Reapply #135326 and fix the target-dependent constant check. --------- Co-authored-by: Matt Arsenault <arsenm2@gmail.com>	2025-04-22 17:48:55 +05:30
Mariusz Sikora	1a48e1df45	[AMDGPU] Do not fold COPY with implicit operands (#136003 ) Folding may remove COPY from inside of the divergent loop.	2025-04-22 13:33:06 +02:00
Frederik Harwath	f541a3aad8	[AMDGPU] SIInstrInfo: Fix resultDependsOnExec for VOPC instructions (#134629 ) SIInstrInfo::resultDependsOnExec assumes that operand 0 of a comparison is always the destination of the instruction. This is not true for instructions in VOPC form where it is "src0". This led to a crash in machine-cse. --------- Co-authored-by: Matt Arsenault <arsenm2@gmail.com>	2025-04-22 10:17:35 +02:00
Changpeng Fang	a945f5917c	AMDGPU: Add global-isel checks and rename fptrunc.v2f16.fpmath.ll (#136609 ) Also remove the checks with -enable-unsafe-fp-math (already in fptrunc.f16.ll)	2025-04-21 14:38:02 -07:00
Shilei Tian	9968ba8652	Revert "[AMDGPU] Insert readfirstlane in the function returns in sgpr. (#135326 )" This reverts commit 76ced7fa782f0d7db9efea871fa6de74706dd9cc since it breaks a lot of bots.	2025-04-21 14:31:10 -04:00
Pankaj Dwivedi	76ced7fa78	[AMDGPU] Insert readfirstlane in the function returns in sgpr. (#135326 ) insert `readfirstlane` in the function returns in sgpr.	2025-04-21 21:57:16 +05:30
Nico Weber	e18a77cfbe	Revert "[SelectionDAG] Folding ZERO-EXTEND/SIGN_EXTEND poison to Poison value in getNode (#122741 )" This reverts commit f12078e72601e7c03e5d66afab034313caf8f791. Breaks `check-llvm`, see comments on https://github.com/llvm/llvm-project/pull/122741	2025-04-21 10:51:03 -04:00
zhijian lin	f12078e726	[SelectionDAG] Folding ZERO-EXTEND/SIGN_EXTEND poison to Poison value in getNode (#122741 ) The PR will fix the issue https://github.com/llvm/llvm-project/issues/122728 This patch addresses the signed/zero extension of poison by using a poison value of the extended type instead of a constant zero of the extended type.	2025-04-21 10:02:21 -04:00
Matt Arsenault	a3f8836ae8	AMDGPU: Regenerate baseline checks Clean up now unnecessary second check prefix.	2025-04-18 22:07:47 +02:00
Matt Arsenault	9bdd9dc895	AMDGPU: Mark workitem ID intrinsics with range attribute (#136196 ) This avoids the need to have special handling at every use site. Unfortunately this means we unnecessarily emit AssertZext in the DAG (where we already directly understand the range of the intrinsic), and we regress in undefined cases as we don't fold out asserts on undef.	2025-04-18 12:27:38 +02:00
Simon Pilgrim	64ffecfc43	[DAG] isKnownNeverNaN - add DemandedElts element mask to isKnownNeverNaN calls (#135952 ) Matches what we've done for computeKnownBits etc. to improve vector handling	2025-04-18 09:24:02 +01:00
Shoreshen	a3f38f27cd	Revert "[AMDGPU] Implement vop3p complex pattern optmization for gisel" (#136249 ) Reverts llvm/llvm-project#130234	2025-04-17 23:45:30 -04:00
Shoreshen	a04580f71b	[AMDGPU] Implement vop3p complex pattern optmization for gisel (#130234 ) Seeking opportunities to optimize VOP3P instructions by altering opsel, opsel_hi, neg, neg_hi bits Tests differences: 1. fix op_sel_hi bit for inline constant: 1. `CodeGen/AMDGPU/packed-fp32.ll` 2. use neg bit to remove xor with 0x80008000 1. `CodeGen/AMDGPU/strict_fsub.f16.ll` 2. `CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.fdot2.ll` 3. `CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.sdot4.ll` 4. `CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.sdot8.ll` 5. `CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.udot2.ll` 6. `CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.udot4.ll` 7. `CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.udot8.ll` 3. Remove xor 0x80008000, and use opsel, opsel_hi to remove alignbit 1. `CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.sdot2.ll`	2025-04-18 10:56:20 +08:00
Changpeng Fang	8b46b98b91	AMDGPU: Fix the double rounding issue in v2f64 -> v2f16 conversion (#135659 ) On targets that support v_cvt_pk_f16_f32 instruction, if we make v2f64 -> v2f16 Legal, we will generate the following sequence of instructions: v_cvt_f32_f64_e32 v1, s[6:7] v_cvt_f32_f64_e32 v2, s[4:5] v_cvt_pk_f16_f32 v1, v2, v1 It possibly returns imprecise results due to double rounding. This patch fixes the issue by not setting the conversion Legal. While we may still expect the above sequence of code when unsafe fpmath is set, I hope https://github.com/llvm/llvm-project/pull/134738 can address that performance concern. Fixes: SWDEV-523856	2025-04-17 11:15:49 -07:00
Yingwei Zheng	5a993558c5	[InstSimplify] Fold `getelementptr inbounds null, idx -> null` (#130742 ) Proof: https://alive2.llvm.org/ce/z/5ZkPx- See also https://github.com/llvm/llvm-project/pull/130734 for the motivation.	2025-04-17 20:44:46 +08:00
Shoreshen	121cd7c6f0	Re apply 130577 narrow math for and operand (#133896 ) Re-apply https://github.com/llvm/llvm-project/pull/130577 which was reverted in https://github.com/llvm/llvm-project/pull/133880 The original patch failed under address sanitizer because `tryNarrowMathIfNoOverflow` was called after `I.eraseFromParent();` in `AMDGPUCodeGenPrepareImpl::visitBinaryOperator`, which created a use-after-free failure. To fix this, `tryNarrowMathIfNoOverflow` is now called first, and the function returns directly if `tryNarrowMathIfNoOverflow` returns true.	2025-04-17 17:03:32 +08:00
Shoreshen	d647d66da6	[AMDGPU] Add illegal type convertion (#135729 ) Add more bit-convert tests for illegal types conversion	2025-04-17 12:14:34 +08:00
Vikram Hegde	123b0e2a1e	Reapply "[AMDGPU][GlobalISel] Properly handle lane op lowering for larger vector types (#132358 )" (#135758 ) reapply https://github.com/llvm/llvm-project/pull/132358, tests updated.	2025-04-16 11:28:28 +05:30
Jun Wang	31f39c8325	[AMDGPU] Remove the AnnotateKernelFeatures pass (#130198 ) Previously the AnnotateKernelFeatures pass infers two attributes: amdgpu-calls and amdgpu-stack-objects, which are used to help determine if flat scratch init is allowed. PR #118907 created the amdgpu-no-flat-scratch-init attribute. Continuing with that work, this patch makes use of this attribute to determine flat scratch init, replacing amdgpu-calls and amdgpu-stack-objects. This also leads to the removal of the AnnotateKernelFeatures pass.	2025-04-15 15:17:33 -07:00
Kazu Hirata	f46cea5b42	Revert "[AMDGPU][GlobalISel] Properly handle lane op lowering for larger vector types (#132358 )" This reverts commit 62ef10a0f62c668e1fa7e357f56052f3364544c5. Multiple buildbot failures have been reported: https://github.com/llvm/llvm-project/pull/132358	2025-04-14 23:03:55 -07:00
Vikram Hegde	62ef10a0f6	[AMDGPU][GlobalISel] Properly handle lane op lowering for larger vector types (#132358 ) Fixes https://github.com/llvm/llvm-project/issues/128650 Also adds a few previously existing permlane64 tests that somehow got removed in between.	2025-04-15 10:51:58 +05:30
Pierre van Houtryve	c9eebc7af4	[GlobalISel] Combine redundant sext_inreg (#131624 )	2025-04-14 11:48:08 +02:00
Pierre van Houtryve	931a78a1db	[AMDGPU] Add sext_trunc in RegBankCombiner (#131623 )	2025-04-14 10:15:29 +02:00

8607 Commits