llvm-project

Author	SHA1	Message	Date
Brox Chen	9d7e1d92db	[AMDGPU][True16] added Pre-RA hint to improve copy elimination (#103366) The allocation order of 16-bit registers is vgpr0lo16, vgpr0hi16, vgpr1lo16, vgpr1hi16, vgpr2lo16, and so on. We prefer (essentially require) that allocation order because it uses the minimum number of registers. But when 16-bit data passes between 16-bit and 32-bit instructions, you get lots of COPYs. This patch teaches the compiler that a COPY of a 16-bit value from a 32-bit register to a lo-half 16-bit register is free, while a COPY to a hi-half 16-bit register is not. This may later be extended to coalescing for additional cases, possibly as an alternative to the RA hints. For now, this solution is upstreamed first.	2025-03-12 16:12:58 -04:00
Brox Chen	c8b40867d1	[AMDGPU][True16][CodeGen] test fix for uaddsat/usubsat true16 selection (#128784) This is an NFC change. Update the test file and fix the build: https://github.com/llvm/llvm-project/pull/128233 is causing a build issue because PR https://github.com/llvm/llvm-project/pull/127945 was merged while 128233 was pending review.	2025-02-25 20:04:57 -05:00
Brox Chen	61c6e0061c	[AMDGPU][True16][CodeGen] flat/global/scratch load/store pseudo for true16 (#127945) The T16D16 table is implemented in https://github.com/llvm/llvm-project/pull/127673. This is a follow-up patch that adds load/store pseudos for flat_store, global_load/global_store, and scratch_load/scratch_store in true16 mode, and updates the codegen test file.	2025-02-21 17:06:48 -05:00
Brox Chen	6515fdf73d	[AMDGPU][True16][CodeGen] true16 codegen for FPMinMax pat (#125107) true16 codegen for the FPMinMax pattern.	2025-02-04 11:20:17 -05:00
Stanislav Mekhanoshin	3277c7cd28	[AMDGPU] Skip VGPR deallocation for waveslot limited kernels (#112765) MSG_DEALLOC_VGPRS slows down very small waveslot-limited kernels. It has been determined that this message is only really needed for VGPR-limited kernels. A kernel becomes VGPR limited if the total number of VGPRs per SIMD divided by the number of used VGPRs is less than the number of wave slots.	2024-10-21 09:39:52 -07:00
Fangrui Song	9e9907f1cf	[AMDGPU,test] Change llc -march= to -mtriple= (#75982) Similar to 806761a7629df268c8aed49657aeccffa6bca449. For IR files without a target triple, -mtriple= specifies the full target triple while -march= merely sets the architecture part of the default target triple, leaving a target triple which may not make sense, e.g. amdgpu-apple-darwin. Therefore, -march= is error-prone and not recommended for tests without a target triple. The issue has been benign as we recognize $unknown-apple-darwin as ELF instead of rejecting it outright. This patch changes AMDGPU tests to not rely on the default OS/environment components. Tests that need fixes are not changed: ``` LLVM :: CodeGen/AMDGPU/fabs.f64.ll LLVM :: CodeGen/AMDGPU/fabs.ll LLVM :: CodeGen/AMDGPU/floor.ll LLVM :: CodeGen/AMDGPU/fneg-fabs.f64.ll LLVM :: CodeGen/AMDGPU/fneg-fabs.ll LLVM :: CodeGen/AMDGPU/r600-infinite-loop-bug-while-reorganizing-vector.ll LLVM :: CodeGen/AMDGPU/schedule-if-2.ll ```	2024-01-16 21:54:58 -08:00
Piotr Sobczak	6eec80133b	[AMDGPU] Min/max changes for GFX12 (#75214) Co-authored-by: Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>	2023-12-13 14:18:10 +01:00

7 Commits