llvm-project

Author	SHA1	Message	Date
vangthao95	d354ea6add	AMDGPU/GlobalISel: RegBankLegalize rules for buffer atomic cmpswap (#180666 )	2026-02-10 11:11:38 -08:00
Mirko Brkušanin	4280f0d241	[AMDGPU] Add dot4 fp8/bf8 instructions for gfx1170 (#180516 )	2026-02-10 12:14:49 +01:00
Anshil Gandhi	bd6dd94584	[AMDGPU] Add legalization rules for atomicrmw max/min ops (#180502 ) Adds rules for G_ATOMICRMW_{MAX, MIN, UMAX, UMIN, UINC_WRAP, UDEC_WRAP}. Each of these generic opcodes is supported for S32 and S64 types on flat, global and local address spaces.	2026-02-10 16:04:05 +05:30
Matt Arsenault	302ff8fd00	InstCombine: Use SimplifyDemandedFPClass on fmul (#177490 ) Start trying to use SimplifyDemandedFPClass on instructions, starting with fmul. This subsumes the old transform on multiply of 0. The main change is the introduction of nnan/ninf. I do not think anywhere was systematically trying to introduce fast math flags before, though a few odd transforms would set them. Previously we only called SimplifyDemandedFPClass on function returns with nofpclass annotations. Start following the pattern of SimplifyDemandedBits, where this will be called from relevant root instructions. I was wondering if this should go into InstCombineAggressive, but that apparently does not make use of InstCombineInternal's worklist.	2026-02-10 09:49:31 +00:00
Diana Picus	24405f070f	[AMDGPU] Add intrinsic exposing s_alloc_vgpr (#163951 ) Make it possible to use `s_alloc_vgpr` at the IR level. This is a huge footgun and use for anything other than compiler internal purposes is heavily discouraged. The calling code must make sure that it does not allocate fewer VGPRs than necessary - the intrinsic is NOT a request to the backend to limit the number of VGPRs it uses (in essence it's not so different from what we do with the dynamic VGPR flags of the `amdgcn.cs.chain` intrinsic, it just makes it possible to use this functionality in other scenarios).	2026-02-10 09:28:31 +01:00
vangthao95	8d8864237b	AMDGPU/GlobalISel: Regbanklegalize rules for G_FSQRT (#179817 ) Add S16 rules for G_FSQRT. S32 and S64 are expanded by the legalizer.	2026-02-09 18:24:28 -08:00
Gheorghe-Teodor Bercea	d1dc843c18	[AMDGPU] Enable sinking of free vector ops that will be folded into their uses (#162580 ) Sinking ShuffleVectors / ExtractElement / InsertElement into user blocks can help enable SDAG combines by providing visibility to the values instead of emitting CopyTo/FromRegs. The sink IR pass disables sinking into loops, so this PR extends the CodeGenPrepare target hook shouldSinkOperands. Co-authored-by: Jeffrey Byrnes <Jeffrey.Byrnes@amd.com> --------- Co-authored-by: Matt Arsenault <arsenm2@gmail.com>	2026-02-09 14:14:31 -05:00
vangthao95	404f9e6c99	AMDGPU/GlobalISel: RegBankLegalize rules for amdgcn_sffbh (#180099 ) Change test to use update_llc_test_checks.py and make `v_flbit` test actually divergent.	2026-02-09 09:18:03 -08:00
vangthao95	0040bdf532	AMDGPU/GlobalISel: Regbanklegalize rules for buffer atomic swap (#180265 )	2026-02-09 09:04:17 -08:00
Anshil Gandhi	ab2e10d80f	[AMDGPU] Add legalization rules for G_ATOMICRMW_FADD (#175257 ) G_ATOMICRMW_FADD is supported on flat, global and local address spaces for S32, S64 and V2S16 values.	2026-02-09 15:37:27 +00:00
Shilei Tian	65b4099219	[AMDGPU] Fix instruction size for 64-bit literal constant operands (#180387 ) `getLit64Encoding` uses a different approach to determine whether 64-bit literal encoding is used, which caused a size mismatch between the `MachineInstr` and the `MCInst`. For `!isValid32BitLiteral`, it is effectively `!(isInt<32>(Val) || isUInt<32>(Val))`, which is `!isInt<32>(Val) && !isUInt<32>(Val)`, but in `getLit64Encoding`, it is `!isInt<32>(Val) || !isUInt<32>(Val)`.	2026-02-09 14:31:52 +00:00
Shilei Tian	392f0c9767	[NFC][AMDGPU] Add a test to show the impact of wrong `s_mov_b64` instruction size (#180386 )	2026-02-09 08:56:28 -05:00
Mirko Brkušanin	45b037cf7a	[AMDGPU] Add fp8/bf8 conversion instructions for gfx1170 (#180191 )	2026-02-09 13:56:43 +01:00
Petr Kurapov	27a8ab09fa	[AMDGPU] Fix V_INDIRECT_REG_READ_GPR_IDX expansion with immediate index (#179699 ) The definition for V_INDIRECT_REG_READ_GPR_IDX_B32_V*'s SSrc_b32 operand allows immediates, but the expansion logic handles only register cases now. This can result in expansion failures when e.g. llvm.amdgcn.wave.reduce.umin.i32 is folded into a constant and then used as an insertelement idx.	2026-02-09 11:33:30 +01:00
Matt Arsenault	2ffb54364f	AMDGPU: Add a test for libcall simplify pow handling (#180491 ) This case could be turned into powr or pown, so track which case ends up preferred.	2026-02-09 10:01:26 +00:00
Pierre van Houtryve	b79ba02479	[AMDGPU][GFX12.5] Reimplement monitor load as an atomic operation (#177343 ) Load monitor operations make more sense as atomic operations, as non-atomic operations cannot be used for inter-thread communication w/o additional synchronization. The previous built-in made it work because one could just override the CPol bits, but that bypasses the memory model and forces the user to learn about ISA bits encoding. Making load monitor an atomic operation has a couple of advantages. First, the memory model foundation for it is stronger. We just lean on the existing rules for atomic operations. Second, the CPol bits are abstracted away from the user, which avoids leaking ISA details into the API. This patch also adds supporting memory model and intrinsics documentation to AMDGPUUsage. Solves SWDEV-516398.	2026-02-09 09:57:27 +01:00
Matt Arsenault	8554ed738f	AMDGPU: Add syntax for s_wait_event values (#180272 ) Previously this would just print hex values. Print names for the recognized values, matching the sp3 syntax.	2026-02-09 08:29:55 +00:00
Matt Arsenault	0c583e784e	AMDGPU: Add llvm.amdgcn.s.wait.event intrinsic (#180170 ) Exactly match the s_wait_event instruction. For some reason we already had this instruction used through llvm.amdgcn.s.wait.event.export.ready, but that hardcodes a specific value. This should really be a bitmask that can combine multiple wait types. gfx11 -> gfx12 broke compatibility in a weird way, by inverting the interpretation of the bit but also shifting the used bit by 1. Simplify the selection of the old intrinsic by just using the magic number 2, which should satisfy both cases.	2026-02-09 08:45:13 +01:00
paperchalice	c53acf0443	[SelectionDAGBuilder] Remove NoNaNsFPMath uses (#169904 ) Replaced by checking fast-math flags or value tracking results.	2026-02-09 09:48:07 +08:00
paperchalice	5c5677d7b8	[llvm] Remove "no-infs-fp-math" attribute support (#180083 ) This was one of the global options in `TargetMachine::resetTargetOptions`. No backend supports it any longer, so remove it.	2026-02-09 08:43:33 +08:00
Alex Wang	a947599991	[AMDGPU][GlobalISel] Add lowering for G_FMODF (#180152 ) Add generic expansion for G_FMODF matching the SelectionDAG implementation. Enable G_FMODF lowering for AMDGPU with tests. Related: #179434	2026-02-07 18:43:55 +00:00
Jay Foad	269fda118a	[AMDGPU] Fix pattern selecting fmul to v_fma_mix_f32 (#180210 ) This needs to use an addend of -0.0 to get the correct result when the result should be -0.0.	2026-02-07 09:31:07 +00:00
Iasonaskrpr	6c6fb00c94	[AMDGPU] Optimize S_OR_B32 to S_ADDK_I32 where possible (#177949 ) This PR fixes #177753 by converting disjoint S_OR_B32 to S_ADDK_I32 whenever possible. It avoids this transformation when S_OR_B32 can be converted to bitset. Note on Test Failures (Draft Status): This change causes significant register reshuffling across the test suite due to the new allocation hints and the swaps performed in case src0 is not a register and src1, along with the change from or to addk. To avoid a massive, noisy diff during the initial logic review, this Draft PR only includes a representative sample of updated tests: CodeGen/AMDGPU/combine-reg-or-const.ll showcases the change from S_OR to S_ADDK; CodeGen/AMDGPU/s-barrier.ll showcases the swap between Src0 and Src1 if src0 is not a register. The rest of the tests show the result of the register allocation hint we give; I have checked every test I updated and they seem ok to me. Once the core logic is approved, I will run the update script across the remaining ~70 failing tests and mark the PR as "Ready for Review."	2026-02-07 09:10:12 +00:00
vangthao95	e67bfe85d9	[AMDGPU][GlobalISel] Fix D16 buffer load RegBankLegalize rules (#179982 ) Use fast StandardB rule and add uniform rules and uniform tests.	2026-02-06 08:01:30 -08:00
Frederik Harwath	272d6dd445	[AMDGPU] Support v_lshl_add_u64 with non-constant shift amount (#179904 ) This commit also adds GlobalISel testing to llvm/test/CodeGen/AMDGPU/lshl-add-u64.ll.	2026-02-06 16:03:58 +01:00
Jay Foad	4a6697f393	[AMDGPU] Fix and simplify patterns selecting fsub to v_fma_mix_f32 (#180169 ) Select (fsub x, y) -> (fma y, -1.0, x). Using -1.0 as the constant avoids the need for ComplexPatterns to negate x or y. This also fixes the bad pattern (fsub x, y) -> (fma -x, 1.0, y).	2026-02-06 14:39:13 +00:00
Mirko Brkušanin	20b5849e17	[AMDGPU] Define new target gfx1170 (#180185 )	2026-02-06 14:38:50 +01:00
Nikita Popov	0287d789e0	[ExpandIRInsts] Freeze input in itofp expansion (#180157 ) We are introducing branches on the value, and branch on undef/poison is UB, so the value needs to be frozen.	2026-02-06 12:52:31 +01:00
Pierre van Houtryve	6824db46c6	[AMDGPU] Set MOThreadPrivate on memory accesses for spills (#179414 ) Mark the memory operand of spill load/stores as MOThreadPrivate, so that these loads and stores are emitted with `nv` set. The reason is that scratch memory used by spills will never be shared by another thread. It's purely thread local and thus a good fit for the `nv` bit, which is controlled by the MOThreadPrivate flag.	2026-02-06 11:14:14 +00:00
Pierre van Houtryve	b738491d2f	[AMDGPU][GFX12.5] Add support for emitting memory operations with nv bit set (#179413 ) - Add `MONonVolatile` MachineMemOperand flag. - Set nv=1 on memory operations on GFX12.5 if the operation accesses a constant address space, is an invariant load, or has the `MONonVolatile` flag set.	2026-02-06 11:35:46 +01:00
Abhinav Garg	8cc06421a5	Adding support for G_STRICT_FMA in new reg bank select (#170330 ) This patch adds legalization rules for G_STRICT_FMA opcode. --------- Co-authored-by: Abhinav Garg <abhigarg@amd.com>	2026-02-06 15:43:37 +05:30
Petar Avramovic	9d11a6670e	AMDGPU/GlobalISel: Regbanklegalize rules for G_FREEZE (#179796 ) Move G_FREEZE handling to AMDGPURegBankLegalizeRules.cpp. Added support for uniform S1.	2026-02-06 11:05:47 +01:00
Steffen Larsen	5654ecd5dd	[DAGCombiner] Fix exact power-of-two signed division for large integers (#177340 ) Previously, the DAG combiner did not optimize exact signed division by a power-of-two constant divisor for integer types exceeding the size of division supported by the target architecture (e.g., i128 on x86-64). However, such an optimization was expected by the division expansion logic, leading to unsupported division operations making it to instruction selection. This commit addresses this issue by making an exception to the existing exclusion of signed division with the exact flag for the aforementioned operations. That is, the DAG combiner will now optimize exact signed division if the divisor is a power-of-two constant and the integer type exceeds the size of division supported by the target architecture. --------- Signed-off-by: Steffen Holst Larsen <HolstLarsen.Steffen@amd.com>	2026-02-06 09:40:32 +01:00
vangthao95	1ef499b1fc	AMDGPU/GlobalISel: Fix buffer store RegBankLegalize rules (#179994 ) Enable commented out D16 v3f16 tests.	2026-02-05 16:20:09 -08:00
vangthao95	376dc83d7a	[AMDGPU][GlobalISel] Add RegBankLegalize rules for TFE buffer loads (#179529 )	2026-02-05 13:42:11 -08:00
Matt Arsenault	82799a448e	Reapply "AMDGPU: Use real copysign in fast pow (#97152 )" (#178036 ) This reverts commit bff619f91015a633df659d7f60f842d5c49351df. This was reverted due to regressions caused by poor copysign optimization, which have been fixed.	2026-02-05 20:40:38 +00:00
Alexander Weinrauch	3b16468814	[AMDGPU] Global and Buffer loads to LDS should not increase `lgkmcnt` (#179305 ) `global_load_lds` and `buffer_load to lds` only increment `vmcnt` and do not touch `lgkmcnt`. This causes invalid `waitcnts` for some Triton kernels, similar to the added lit tests. Note that the change for buffer ops is not necessary, i.e. the lit test passes even before this PR, because it seems like `SIInsertWaitcnts` does not use `LGKM_CNT` for buffer ops. But this change might prevent a bug in the future.	2026-02-05 09:36:00 -08:00
anjenner	903a5ab93d	[AMDGPU] [GlobalISel] Add register bank legalize rules for G_FEXP2 (#179954 ) Also G_INTRINSIC_TRUNC, G_INTRINSIC_ROUNDEVEN, G_FFLOOR, G_FCEIL, and G_FLOG2.	2026-02-05 16:35:31 +00:00
Vigneshwar Jayakumar	2dcd75eb44	[AMDGPU] Fix missing waitcnt after buffer_wbl2 (#178316 ) On GFX9, BUFFER_WBL2 is used to write back dirty cache lines and requires an s_waitcnt vmcnt(0) afterwards to ensure completion. This patch fixes the issue by incrementing vmcnt for the buffer_wbl2 instruction. --------- Co-authored-by: Jay Foad <jay.foad@gmail.com>	2026-02-05 10:13:51 -06:00
Nikita Popov	d3fb3c5d36	[GISel][CallLowering] Keep IR types longer (#179946 ) GISel CallLowering currently does a Type -> EVT -> Type roundtrip early on when populating ArgInfo in splitToValueType(). This is a bit odd as this structure operates at the IR Type level. Keep the original type there and only convert to EVT when performing assignments.	2026-02-05 16:37:08 +01:00
vangthao95	e0c2cc7ed0	[AMDGPU][GlobalISel] Add buffer store byte/short RegBankLegalize rules (#179367 )	2026-02-05 07:18:39 -08:00
Matt Arsenault	2502e3b7ba	IR: Promote "denormal-fp-math" to a first class attribute (#174293 ) Convert "denormal-fp-math" and "denormal-fp-math-f32" into a first class denormal_fpenv attribute. Previously the query for the effective denormal mode involved two string attribute queries with parsing. I'm introducing more uses of this, so it makes sense to convert this to a more efficient encoding. The old representation was also awkward since it was split across two separate attributes. The new encoding just stores the default and float modes as bitfields, largely avoiding the need to consider if the other mode is set. The syntax in the common cases looks like this: `denormal_fpenv(preservesign,preservesign)` `denormal_fpenv(float: preservesign,preservesign)` `denormal_fpenv(dynamic,dynamic float: preservesign,preservesign)` I wasn't sure about reusing the float type name instead of adding a new keyword. It's parsed as a type but only accepts float. I'm also debating switching the name to subnormal to match the current preferred IEEE terminology (also used by nofpclass and other contexts). This has a behavior change when using the command flag debug options to set the denormal mode. The behavior of the flag ignored functions with an explicit attribute set, per the default and f32 version. Now that these are one attribute, the flag logic can't distinguish which of the two components were explicitly set on the function. Only one test appeared to rely on this behavior, so I just avoided using the flags in it. This also does not perform all the code cleanups this enables. In particular the attributor handling could be cleaned up. I also guessed at how to support this in MLIR. I followed MemoryEffects as a reference; it appears bitfields are expanded into arguments to attributes, so the representation there is a bit uglier with the 2 2-element fields flattened into 4 arguments.	2026-02-05 13:31:26 +00:00
Acim Maravic	b0827f3b36	[LLVM] Select fma_mix for v_cvt_f32_f16 and v_add_f32/v_mul_f32 (#160151 )	2026-02-05 11:51:25 +01:00
Matt Arsenault	8461579298	AMDGPU: Add nofpclass when expanding pow (#177933 ) The codegen regression is tracked in #177913	2026-02-05 07:40:21 +01:00
Nicolai Hähnle	3e1e86ef1f	[AMDGPU] Return two MMOs for load-to-lds and store-from-lds intrinsics (#175845 ) Accurately represent both the load and the store part of those intrinsics. The test changes seem to be mostly fairly insignificant changes caused by subtly different scheduler behavior.	2026-02-04 12:29:49 -08:00
Alex Wang	b33a0e6101	[SelectionDAG] Add expansion for llvm.modf intrinsic (#179434 ) Targets without a `modf` libcall lower the intrinsic directly, matching the existing `llvm.frexp` expansion. Targets with an existing libcall are unchanged. Fixes #173021	2026-02-04 21:25:47 +01:00
vangthao95	273ee97738	[AMDGPU][GlobalISel] Add G_SADDE/SSUBE RegBankLegalize rule (#179603 )	2026-02-04 09:41:27 -08:00
vangthao95	b0aea0539f	[AMDGPU][GlobalISel] Add buffer load format D16 RegBankLegalize rules (#179566 )	2026-02-04 09:20:41 -08:00
Brox Chen	2e58f6024a	[AMDGPU][True16] t16 pseudo for mubuffer d16 load/store (#178822 ) Create t16 pseudos for mubuffer d16 load/store with vgpr16 in vdst/vdata and use these t16 pseudos for isel patterns. Lower them back to d16 machine instructions at the MC level.	2026-02-04 10:54:11 -05:00
Carl Ritson	be9ba44256	[AMDGPU] Add machineFunctionInfo to recent MIR tests (#179602 ) Initialize machineFunctionInfo in recently added MIR tests to assist in downstream testing.	2026-02-04 22:12:01 +09:00
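
The predicate mismatch described in commit 65b4099219 (#180387) comes down to De Morgan's law: `!(A || B)` equals `!A && !B`, not `!A || !B`. The sketch below is only an illustration of that difference, not code from the LLVM tree. It assumes stand-in helpers `fitsInt32` and `fitsUInt32` in place of `isInt<32>` and `isUInt<32>`, and the names `neitherFits` and `eitherFails` are hypothetical labels for the two forms quoted in the commit message.

```cpp
#include <cstdint>
#include <limits>

// Assumption: stand-ins for isInt<32> / isUInt<32>, written out by hand so
// that the sketch compiles without any LLVM headers.
constexpr bool fitsInt32(int64_t V) {
  return V >= std::numeric_limits<int32_t>::min() &&
         V <= std::numeric_limits<int32_t>::max();
}

constexpr bool fitsUInt32(int64_t V) {
  return V >= 0 &&
         V <= static_cast<int64_t>(std::numeric_limits<uint32_t>::max());
}

// Hypothetical name. The `!A && !B` form: true only when the value fits
// neither as a signed nor as an unsigned 32-bit integer.
constexpr bool neitherFits(int64_t V) {
  return !fitsInt32(V) && !fitsUInt32(V);
}

// Hypothetical name. The `!A || !B` form: true as soon as one of the two
// 32-bit interpretations fails, even if the other one fits.
constexpr bool eitherFails(int64_t V) {
  return !fitsInt32(V) || !fitsUInt32(V);
}

// -1 fits as a signed 32-bit value only, so the two forms disagree.
static_assert(!neitherFits(-1) && eitherFails(-1),
              "forms disagree for -1");

// 0xFFFFFFFF fits as an unsigned 32-bit value only, so they disagree again.
static_assert(!neitherFits(0xFFFFFFFFLL) && eitherFails(0xFFFFFFFFLL),
              "forms disagree for 0xFFFFFFFF");

// 42 fits both ways and 2^32 fits neither way, so the forms agree there.
static_assert(!neitherFits(42) && !eitherFails(42),
              "forms agree for 42");
static_assert(neitherFits(0x100000000LL) && eitherFails(0x100000000LL),
              "forms agree for 2^32");
```

All checks are `static_assert`s, so compiling the file is enough to run them. The two forms only diverge for values that fit exactly one of the two 32-bit interpretations, which is the range where two size computations based on different forms would disagree.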

9987 Commits