llvm-project

Author	SHA1	Message	Date
Jay Foad	466d266945	[AMDGPU] Fix GFX90x check prefixes in tests (#92254 )	2024-05-15 15:13:53 +01:00
Yingwei Zheng	3aae916ff7	Reland "[ValueTracking] Compute knownbits from known fp classes" (#92084 ) This patch relands https://github.com/llvm/llvm-project/pull/86409. I mistakenly thought that `Known.makeNegative()` clears the sign bit of `Known.Zero`. This patch fixes the assertion failure by explicitly clearing the sign bit.	2024-05-14 18:10:28 +08:00
Martin Storsjö	2e165a2c4b	Revert "[ValueTracking] Compute knownbits from known fp classes (#86409 )" This reverts commit d03a1a6e5838c7c2c0836d71507dfdf7840ade49. This change caused failed assertions, see https://github.com/llvm/llvm-project/pull/86409#issuecomment-2109469845 for details.	2024-05-14 10:37:23 +03:00
Stanislav Mekhanoshin	efc7bbb917	[AMDGPU] Make v2bf16 BUILD_VECTOR legal (#92022 ) There is nothing specific here and it is not different from i16 or f16.	2024-05-13 14:53:26 -07:00
Leon Clark	bd679865c0	[AMDGPU] Add tests for vector rebroadcast. (#91322 ) Co-authored-by: Leon Clark <leoclark@amd.com>	2024-05-13 19:39:26 +01:00
Yingwei Zheng	d03a1a6e58	[ValueTracking] Compute knownbits from known fp classes (#86409 ) This patch calculates knownbits from fp instructions/dominating fcmp conditions. It will enable more optimizations with signbit idioms.	2024-05-14 01:58:45 +08:00
Vikram Hegde	cb4fca929b	[AMDGPU] Extend llvm.amdgcn.update.dpp intrinsic to support f64 (#91190 ) Follow up patch to https://github.com/llvm/llvm-project/pull/89217, before we make changes to atomic optimizer.	2024-05-13 10:20:37 +05:30
Stanislav Mekhanoshin	5d18d575d8	[AMDGPU] Make fneg/fabs/copysign legal for bf16 (#91676 ) These are just bit operations, exactly the same as with f16.	2024-05-10 14:33:47 -07:00
Petar Avramovic	f5e49279c0	AMDGPU: fix isSafeToSink expecting exactly one predecessor (#89224 ) isSafeToSink needs to check if machine cycle has divergent exit branch but first it needs the MBB that contains cycle exit branch. Early-tailduplication can delete exit block created by structurize-cfg so there is still exactly one cycle exit block but the new cycle exit block can have multiple predecessors. Simplify search for MBBs that contain cycle exit branch by introducing helper method getExitingBlocks in GenericCycle. Fixes #89200	2024-05-10 13:02:05 +02:00
Matt Arsenault	6a8d30b1c1	DAG: Skip 0 sign handling in minimum/maximum lowering for _ieee case (#91326 ) dc9664a8adae17f2083fbcc8e96cfce606c56d57 changed the documentation to assume these order -0 as less than +0.	2024-05-09 14:41:13 +02:00
Jay Foad	6eb9e214b3	RFC: [AMDGPU] Check subtarget features for consistency (#86957 ) Implement GCNSubtarget::checkSubtargetFeatures as a canonical place to check subtarget features for consistency and diagnose any inconsistencies. To start with, the implementation just checks that either wavefrontsize32 or wavefrontsize64 is selected. checkSubtargetFeatures is called at the start of instruction selection. This is pretty arbitrary. It is just a convenient point at which we have access to the subtarget that we're going to use for codegenning a particular function.	2024-05-09 11:37:28 +01:00
Jay Foad	2cbfe4a823	[AMDGPU] Remove duplicate -mtriple options in tests (#91576 )	2024-05-09 10:59:25 +01:00
Matt Arsenault	b5afda8d76	AMDGPU: Add some more ctlz_zero_undef tests	2024-05-08 16:13:22 +02:00
Stanislav Mekhanoshin	2a3903fa0e	[AMDGPU] Prevent FMINIMUM and FMAXIMUM being fully scalarized (#91378 ) This is the same logic as with FMINNUM_IEEE/FMAXNUM_IEEE.	2024-05-07 14:20:13 -07:00
Shilei Tian	0b50d095bc	[AMDGPU] Don't optimize agpr phis if the operand doesn't have subreg use (#91267 ) If the operand doesn't have any subreg use, the optimization could potentially generate `V_ACCVGPR_READ_B32_e64` with wrong register class. The following example demonstrates the issue. Input MIR: ``` bb.0: %0:sgpr_32 = S_MOV_B32 0 %1:sgpr_128 = REG_SEQUENCE %0:sgpr_32, %subreg.sub0, %0:sgpr_32, %subreg.sub1, %0:sgpr_32, %subreg.sub2, %0:sgpr_32, %subreg.sub3 %2:vreg_128 = COPY %1:sgpr_128 %3:areg_128 = COPY %2:vreg_128, implicit $exec bb.1: %4:areg_128 = PHI %3:areg_128, %bb.0, %6:areg_128, %bb.1 %5:areg_128 = PHI %3:areg_128, %bb.0, %7:areg_128, %bb.1 ... ``` Output of current implementation: ``` bb.0: %0:agpr_32 = V_ACCVGPR_WRITE_B32_e64 0, implicit $exec %1:agpr_32 = V_ACCVGPR_WRITE_B32_e64 0, implicit $exec %2:agpr_32 = V_ACCVGPR_WRITE_B32_e64 0, implicit $exec %3:agpr_32 = V_ACCVGPR_WRITE_B32_e64 0, implicit $exec %4:areg_128 = REG_SEQUENCE %0:agpr_32, %subreg.sub0, %1:agpr_32, %subreg.sub1, %2:agpr_32, %subreg.sub2, %3:agpr_32, %subreg.sub3 %5:vreg_128 = V_ACCVGPR_READ_B32_e64 %4:areg_128, implicit $exec %6:areg_128 = COPY %46:vreg_128 bb.1: %7:areg_128 = PHI %6:areg_128, %bb.0, %9:areg_128, %bb.1 %8:areg_128 = PHI %6:areg_128, %bb.0, %10:areg_128, %bb.1 ... ``` The problem is the generated `V_ACCVGPR_READ_B32_e64` instruction. Apparently the operand `%4:areg_128` is not valid for this. In this patch, we don't count the none-subreg use because `V_ACCVGPR_READ_B32_e64` can't handle none-32-bit operand. Fixes: SWDEV-459556	2024-05-07 16:44:00 -04:00
Matt Arsenault	f548c4d83c	AMDGPU: Add mode register use to s_getreg_b32 This should fix reading the wrong mode after setting the mode. Ideally we would have separate pseudos for the case that we know does not read mode.	2024-05-07 16:06:51 +02:00
Sameer Sahasrabuddhe	8a65ee8b2a	[AMDGPU] don't mark control-flow intrinsics as convergent (#90026 ) This is really a workaround to allow control flow lowering in the presence of convergence control tokens. Control-flow intrinsics in LLVM IR are convergent because they indirectly represent the wave CFG, i.e., sets of threads that are "converged" or "execute in lock-step". But they exist during a small window in the lowering process, inserted after the structurizer and then translated to equivalent MIR pseudos. So rather than create convergence tokens for these builtins, we simply mark them as not convergent. The corresponding MIR pseudos are marked as having side effects, which is sufficient to prevent optimizations without having to mark them as convergent.	2024-05-06 14:07:11 +05:30
Matt Arsenault	d654278bde	Reapply "AMDGPU: Implement llvm.set.rounding (#88587 )" series (#91113 ) Revert "Revert 4 last AMDGPU commits to unbreak Windows bots" This reverts commit 0d493ed2c6e664849a979b357a606dcd8273b03f. MSVC does not like constexpr on the definition after an extern declaration of a global.	2024-05-06 09:09:19 +02:00
Mehdi Amini	0d493ed2c6	Revert 4 last AMDGPU commits to unbreak Windows bots Revert "AMDGPU: Try to fix build error with old gcc" This reverts commit c7ad12d0d7606b0b9fb531b0b273bdc5f1490ddb. Revert "AMDGPU: Use umin in set.rounding expansion" This reverts commit a56f0b51dd988ad2b533de759c98457c1ed42456. Revert "AMDGPU: Optimize set_rounding if input is known to fit in 2 bits (#88588)" This reverts commit b4e751e2ab0ff152ed18dea59ebf9691e963e1dd. Revert "AMDGPU: Implement llvm.set.rounding (#88587)" This reverts commit 9731b77e80261c627d79980f8c275700bdaf6591.	2024-05-04 19:57:33 +02:00
Matt Arsenault	7ec698e6ed	AMDGPU: Add tests for minimum and maximum intrinsics (#90997 ) Baseline tests for new expansion. I think we can do better and avoid the classes.	2024-05-03 21:43:30 +02:00
Abhinav Garg	76508dce43	[AMDGPU] Fix mode register pass for constrained FP operations (#90085 ) This PR will fix the si-mode-register pass which is inserting an extra setreg instruction in case of constrained FP operations. This pass will be ignored for strictfp functions.	2024-05-03 19:47:15 +02:00
Matt Arsenault	a56f0b51dd	AMDGPU: Use umin in set.rounding expansion Addresses comment from #88587	2024-05-03 19:22:19 +02:00
Jay Foad	cd4287bc44	[AMDGPU] Convert PrologEpilogSGPRSpills from DenseMap to sorted vector (#90957 ) In practice PrologEpilogSGPRSpills never has more than 3 entries so DenseMap is overkill. In addition this means that iteration happens in register number order, instead of DenseMap's hashed order, so it will not be affected by future patches that define new physical registers. This should reduce future test case churn.	2024-05-03 13:42:40 +01:00
Matt Arsenault	4e67b5058e	AMDGPU: Add more tests for atomicrmw handling Add agent scope copies of atomicrmw atomics tests. Expand testing for the undo identity atomicrmw case. Test 16-bit atomic expansions.	2024-05-03 11:50:59 +02:00
Matt Arsenault	9f9856d623	AMDGPU: Update name for amdgpu.no.remote.memory metadata	2024-05-03 11:50:59 +02:00
Matt Arsenault	b4e751e2ab	AMDGPU: Optimize set_rounding if input is known to fit in 2 bits (#88588 ) We don't need to figure out the weird extended rounding modes or handle offsets to keep the lookup table in 64-bits. https://reviews.llvm.org/D153258 Depends #88587	2024-05-03 11:17:18 +02:00
Carl Ritson	44648ccb8b	[AMDGPU] Always emit lds_size in PAL ELF Metadata 3.0 (#87222 ) Emit lds_size for all shader types in PAL metadata.	2024-05-03 17:01:03 +09:00
Matt Arsenault	9731b77e80	AMDGPU: Implement llvm.set.rounding (#88587 ) Use a shift of a magic constant and some offsetting to convert from flt_rounds values. I don't know why the enum defines Dynamic = 7. The standard suggests -1 is the cannot determine value. If we could start the extended values at 4 we wouldn't need the extra compare sub and select. https://reviews.llvm.org/D153257	2024-05-03 09:41:27 +02:00
Stanislav Mekhanoshin	57216f7bd6	[AMDGPU] Support byte_sel modifier for v_cvt_f32_fp8 and v_cvt_f32_bf8 (#90887 )	2024-05-02 12:03:51 -07:00
Scott Egerton	eb8236381b	[AMDGPU] Group multiple single use producers under one single use instruction. (#90713 ) Previously each single use producer would be marked with a "S_SINGLEUSE_VDST 1" instruction. This patch adds support for larger immediates that encode multiple single use producers into one S_SINGLEUSE_VDST instruction.	2024-05-02 17:30:11 +01:00
paperchalice	3fe282a83d	[Pass] Add `pre-isel-intrinsic-lowering` to pass registry (#90851 ) This was removed to avoid a circular dependency between `CodeGen` and `Passes`, but now I realized this is no longer a problem since `CodeGenPassBuilder` is moved into `Passes`.	2024-05-02 21:57:46 +08:00
Valery Pykhtin	981aa6fcf6	[AMDGPU] Fix incorrect stepping in gdb for amdgcn.end.cf intrinsic. (#83010 ) After #73958 gdb.rocm/lane-execution.exp test started to fail due to incorrect debug location. This is kind of a revert patch.	2024-05-02 12:59:31 +02:00
Gang Chen	167427f5db	[AMDGPU] change order of fp and sp in kernel prologue (#90626 ) change order of fp and sp in kernel prologue also related codegen tests to make it easier to merge code into our downstream branches Signed-off-by: gangc <gangc@amd.com>	2024-05-01 08:16:55 -07:00
Matt Arsenault	39e24bdd8e	MachineLICM: Allow hoisting REG_SEQUENCE (#90638 )	2024-05-01 16:52:04 +02:00
David Stuttard	f898161bfa	[AMDGPU] Fix image_msaa_load waitcnt insertion for pre-gfx12 (#90710 ) https://github.com/llvm/llvm-project/pull/90201 made some fixes for gfx12 image_msaa_load waitcnt insertion. That fix might break in some situations for pre-gfx12 - this fixes that by explicitly checking for VSAMPLE which always requires a s_wait_samplecnt and leaves the previous logic intact for non-gfx12.	2024-05-01 11:37:57 +01:00
David Stuttard	5fb1e2825f	[AMDGPU] Enhance s_waitcnt insertion before barrier for gfx12 (#90595 ) Code to determine if a waitcnt is required before a barrier instruction only considered S_BARRIER. gfx12 adds barrier_signal/wait so need to enhance the existing code to look for a barrier start (which is just an S_BARRIER for earlier architectures).	2024-05-01 11:37:13 +01:00
Jay Foad	0b21b25eac	[AMDGPU] Do not optimize away pre-existing waitcnt instructions at -O0 (#90716 ) The autogenerated memory legalizer tests use -O0 so this allows us to see the exact waitcnts that were inserted by the memory legalizer without them being optimized away.	2024-05-01 11:29:11 +01:00
Scott Egerton	d97f25b948	[AMPGPU] Emit s_singleuse_vdst instructions when a register is used multiple times in the same instruction. (#89601 ) Previously, multiple uses of a register within the same instruction were being counted as multiple uses. This has been corrected to only count as a single use as per the specification allowing for more optimisation candidates.	2024-04-30 17:14:01 +01:00
David Stuttard	62dea99a7d	[AMDGPU] Fix gfx12 waitcnt type for image_msaa_load (#90201 ) image_msaa_load is actually encoded as a VSAMPLE instruction and requires the appropriate waitcnt variant.	2024-04-30 10:41:51 +01:00
Shilei Tian	d47c4984e9	[AMDGPU][ISel] Add more trunc store actions regarding bf16 (#90493 )	2024-04-29 18:27:52 -04:00
Jannik Silvanus	5f9ae61dee	[Support][YamlTraits] Add quoting for keys in textual YAML representation (#88763 ) The support library contains helpers to parse and emit YAML documents. In the textual YAML representation, some strings need to be quoted, e.g. when containing unprintable characters. We already have such quoting implemented for YAML values. This patch applies the same quoting to YAML keys. One affected case is output of control registers in AMDGPU Msgpack metadata, which are printed in a format like this: ``` 0x2cca (SPI_SHADER_PGM_RSRC1_ES): 42 ``` With this patch, the key is quoted: ``` '0x2cca (SPI_SHADER_PGM_RSRC1_ES)': 42 ``` Most test changes come from this pattern.	2024-04-29 15:37:42 +02:00
Shilei Tian	8e17c84836	[AMDGPU][ISel] Set trunc store action to expand for v4f32->v4bf16 (#90427 )	2024-04-29 09:08:54 -04:00
Bjorn Pettersson	55c6bda01e	Revert "Revert "[SelectionDAG] Handle more opcodes in canCreateUndefOrPoison (#84921 )" and more..." This reverts commit 16bd10a38730fed27a3bf111076b8ef7a7e7b3ee. Re-applies: b3c55b707110084a9f50a16aade34c3be6fa18da - "[SelectionDAG] Handle more opcodes in canCreateUndefOrPoison (#84921)" 8e2f6495c0bac1dd6ee32b6a0d24152c9c343624 - "[DAGCombiner] Do not always fold FREEZE over BUILD_VECTOR (#85932)" 73472c5996716cda0dbb3ddb788304e0e7e6a323 - "[SelectionDAG] Treat CopyFromReg as freezing the value (#85932)" with a fix in DAGCombiner::visitFREEZE.	2024-04-29 13:08:52 +02:00
David Stuttard	2914a11e3f	[AMDGPU] Fix hard clausing for image instructions on gfx12 (#90221 ) Also updated hard-clauses.mir to have separate versions for gfx11 and gfx12 since the MIR instructions are different for each of them.	2024-04-29 11:42:36 +01:00
David Spickett	16bd10a387	Revert "[SelectionDAG] Handle more opcodes in canCreateUndefOrPoison (#84921 )" and more... This reverts: b3c55b707110084a9f50a16aade34c3be6fa18da - "[SelectionDAG] Handle more opcodes in canCreateUndefOrPoison (#84921)" (because it updates a test case that I don't know how to resolve the conflict for) 8e2f6495c0bac1dd6ee32b6a0d24152c9c343624 - "[DAGCombiner] Do not always fold FREEZE over BUILD_VECTOR (#85932)" 73472c5996716cda0dbb3ddb788304e0e7e6a323 - "[SelectionDAG] Treat CopyFromReg as freezing the value (#85932)" Due to a test suite failure on AArch64 when compiling for SVE. https://lab.llvm.org/buildbot/#/builders/197/builds/13955 clang: ../llvm/llvm/include/llvm/CodeGen/ValueTypes.h:307: MVT llvm::EVT::getSimpleVT() const: Assertion `isSimple() && "Expected a SimpleValueType!"' failed.	2024-04-29 09:47:41 +01:00
Björn Pettersson	b3c55b7071	[SelectionDAG] Handle more opcodes in canCreateUndefOrPoison (#84921 ) [SelectionDAG] Handle more opcodes in canCreateUndefOrPoison Handle SELECT_CC similarly as SETCC. Handle these operations that only propagate poison/undef based on the input operands: SADDSAT, UADDSAT, SSUBSAT, USUBSAT, MULHU, MULHS, SMIN, SMAX, UMIN, UMAX These operations may create poison based on shift amount and exact flag being violated: SRL, SRA One goal here is to allow pushing freeze through these operations when allowed, as well as letting analyses such as isGuaranteedNotToBeUndefOrPoison to not break on such operations. Since some problems have been observed with pushing freeze through SRA/SRL we block that explicitly in DAGCombiner::visitFreeze now. That way we can still model SRA/SRL properly in SelectionDAG::canCreateUndefOrPoison, e.g. when used by isGuaranteedNotToBeUndefOrPoison, even if we do not want to push freeze through those instructions.	2024-04-29 07:56:49 +02:00
Stanislav Mekhanoshin	6e722bbe30	[AMDGPU] Support byte_sel modifier on v_cvt_sr_fp8_f32 and v_cvt_sr_bf8_f32 (#90244 )	2024-04-26 13:02:57 -07:00
Matt Arsenault	2a95022cff	AMDGPU: Add atomic bfloat load/store codegen tests	2024-04-25 16:08:11 +02:00
Emma Pilkington	2c50f8ffbb	[AMDGPU] Include missing FeatureMADIntraFwdBug in gfx11-generic (#89936 ) It seems like this happened because #79460 moved this from `FeatureISAVersion11_Common` to `FeatureISAVersion11_0_Common` while #76955 was being reviewed.	2024-04-25 09:24:07 -04:00
Abhinav Garg	007e859258	AMDGPU: Pre-commit test to verify mode change in fp constrained operations (#88858 ) This test will check the mode register in case of constrained floating point operations. --------- Co-authored-by: Matt Arsenault <Matthew.Arsenault@amd.com>	2024-04-24 20:36:05 +02:00


7376 Commits