llvm-project

Author	SHA1	Message	Date
Mirko Brkušanin	fe5f49942e	[AMDGPU][GlobalISel] Lower G_FMINIMUM and G_FMAXIMUM (#151122) Add GlobalISel lowering of G_FMINIMUM and G_FMAXIMUM following the same logic as in SDag's expandFMINIMUM_FMAXIMUM. Update AMDGPU legalization rules: Pre GFX12 now uses new lowering method and make G_FMINNUM_IEEE and G_FMAXNUM_IEEE legal to match SDag.	2025-10-24 14:48:27 +02:00
paperchalice	14a42e64cf	[AMDGPU] Remove NoInfsFPMath uses (#163028) Only `ninf` should be used.	2025-10-13 19:15:49 +08:00
Shilei Tian	2195fe7e01	[AMDGPU] Add the support for 45-bit buffer resource (#159702) On new targets like `gfx1250`, the buffer resource (V#) now uses this format: ``` base (57-bit): resource[56:0] num_records (45-bit): resource[101:57] reserved (6-bit): resource[107:102] stride (14-bit): resource[121:108] ``` This PR changes the type of `num_records` from `i32` to `i64` in both builtin and intrinsic, and also adds the support for lowering the new format. Fixes SWDEV-554034. --------- Co-authored-by: Krzysztof Drewniak <Krzysztof.Drewniak@amd.com>	2025-09-24 11:12:02 -04:00
Stanislav Mekhanoshin	76efbc068a	[AMDGPU] Fix codegen to emit COPY instead of S_MOV_B64 for aperture regs (#158754)	2025-09-16 02:26:32 -07:00
Shilei Tian	1180c2ced0	[AMDGPU] Support lowering of cluster related instrinsics (#157978) Since many code are connected, this also changes how workgroup id is lowered. Co-authored-by: Jay Foad <jay.foad@amd.com> Co-authored-by: Ivan Kosarev <ivan.kosarev@amd.com>	2025-09-12 21:11:17 -04:00
Anshil Gandhi	c6899193ed	[AMDGPU][Legalizer] Avoid pack/unpack for G_FSHR (#156796) Scalarize G_FSHR only if the subtarget does not support V2S16 type.	2025-09-04 17:12:57 -06:00
Pierre van Houtryve	e2bd10cf16	[AMDGPU][gfx1250] Add 128B cooperative atomics (#156418) - Add clang built-ins + sema/codegen - Add IR Intrinsic + verifier - Add DAG/GlobalISel codegen for the intrinsics - Add lowering in SIMemoryLegalizer using a MMO flag.	2025-09-04 09:19:25 +00:00
paperchalice	595573d1ed	[AMDGPU] Remove `ApproxFuncFPMath` uses (#155578) One of options in `resetTargetOptions`, this removes `ApproxFuncFPMath` in AMDGPU part.	2025-08-28 11:09:01 +08:00
Tiger Ding	4ab14685a0	[AMDGPU] Narrow only on store to pow of 2 mem location (#150093) Lowering in GlobalISel for AMDGPU previously always narrows to i32 on truncating store regardless of mem size or scalar size, causing issues with types like i65 which is first extended to i128 then stored as i64 + i8 to i128 locations. Narrowing only on store to pow of 2 mem location ensures only narrowing to mem size near end of legalization. This LLVM defect was identified via the AMD Fuzzing project.	2025-08-19 00:04:27 +09:00
Stanislav Mekhanoshin	ea14834966	[AMDGPU] Per-subtarget DPP instruction classification (#153096) This is NFCI at this point.	2025-08-11 15:41:02 -07:00
Stanislav Mekhanoshin	abc22f771e	[AMDGPU] Fix buffer addressing mode matching (#152584) Starting in gfx1250, voffset and immoffset are zero-extended from 32 bits to 45 bits before being added together.	2025-08-07 14:23:41 -07:00
Stanislav Mekhanoshin	b8eb61adc9	[AMDGPU] Implement addrspacecast from flat <-> private on gfx1250 (#152218)	2025-08-05 16:25:23 -07:00
paperchalice	8bacfb2538	[AMDGPU] Remove `UnsafeFPMath` uses (#151079) Remove `UnsafeFPMath` in AMDGPU part, it blocks some bugfixes related to clang and the ultimate goal is to remove `resetTargetOptions` method in `TargetMachine`, see FIXME in `resetTargetOptions`. See also https://discourse.llvm.org/t/rfc-honor-pragmas-with-ffp-contract-fast https://discourse.llvm.org/t/allowfpopfusion-vs-sdnodeflags-hasallowcontract --------- Co-authored-by: Matt Arsenault <arsenm2@gmail.com>	2025-07-31 17:36:57 +08:00
Fabian Ritter	957ae8ad46	[AMDGPU][GISel] Use buildObjectPtrOffset instead of buildPtrAdd (#150899) This concerns offset computations for kernargs and RegBankLegalizeHelper::splitLoad, which should all be within the bounds of a memory object. See #150392 for the motivation for introducing the buildObjectPtrOffset function. For SWDEV-516125.	2025-07-30 08:30:27 +02:00
Stanislav Mekhanoshin	3dfd939a16	[AMDGPU] gfx1250 V_{MIN\|MAX}_{I\|U}64 opcodes (#151256)	2025-07-29 19:13:51 -07:00
Changpeng Fang	6184ef1c2f	[AMDGPU] Support f64 atomics on gfx1250 (#151172) - BUF/FLAT/GLOBAL_ADD/MIN/MAX_F64 - DS_ADD_F64 Co-authored-by: Konstantin Zhuravlyov <Konstantin Zhuravlyov@amd.com>	2025-07-29 09:41:00 -07:00
Stanislav Mekhanoshin	2346968807	[AMDGPU] Add V_ADD\|SUB\|MUL_U64 gfx1250 opcodes (#150291)	2025-07-23 13:17:56 -07:00
Stanislav Mekhanoshin	2d6534b7da	[AMDGPU] gfx1250 64-bit relocations and fixups (#148951)	2025-07-15 17:13:42 -07:00
Changpeng Fang	868793fa8e	AMDGPU: Support intrinsic selection for gfx1250 wmma instructions (#148957) Co-authored-by: Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com> Co-authored-by: Shilei Tian <Shilei.Tian@amd.com>	2025-07-15 15:25:05 -07:00
Stanislav Mekhanoshin	d0a4af725e	[AMDGPU] Add FeatureIEEEMinimumMaximumInsts. NFCI. (#147594) Co-authored-by: Mirko Brkušanin <Mirko.Brkusanin@amd.com>	2025-07-08 14:32:44 -07:00
Matt Arsenault	c80282d333	AMDGPU: Directly select minimumnum/maximumnum with ieee_mode=0 (#141903) The hardware min/max follow the IR rules with IEEE mode disabled, so we can avoid the canonicalizes of the input. We lose the quieting of a signaling nan if both inputs are nans, but we only require that with strictfp.	2025-06-18 00:27:41 +09:00
Kazu Hirata	03f616eb3a	[llvm] Compare std::optional<T> to values directly (NFC) (#143340) This patch transforms: X && *X == Y to: X == Y where X is of std::optional<T>, and Y is of T or similar.	2025-06-08 22:37:59 -07:00
Changpeng Fang	70e78be7dc	AMDGPU: Custom lower fptrunc vectors for f32 -> f16 (#141883) The latest asics support v_cvt_pk_f16_f32 instruction. However current implementation of vector fptrunc lowering fully scalarizes the vectors, and the scalar conversions may not always be combined to generate the packed one. We made v2f32 -> v2f16 legal in https://github.com/llvm/llvm-project/pull/139956. This work is an extension to handle wider vectors. Instead of fully scalarization, we split the vector to packs (v2f32 -> v2f16) to ensure the packed conversion can always been generated.	2025-06-06 15:15:24 -07:00
Justin Bogner	b7bb256703	Warn on misuse of DiagnosticInfo classes that hold Twines (#137397) This annotates the `Twine` passed to the constructors of the various DiagnosticInfo subclasses with `[[clang::lifetimebound]]`, which causes us to warn when we would try to print the twine after it had already been destructed. We also update `DiagnosticInfoUnsupported` to hold a `const Twine &` like all of the other DiagnosticInfo classes, since this warning allows us to clean up all of the places where it was being used incorrectly.	2025-05-28 12:26:39 -07:00
zGoldthorpe	bb7e559740	[AMDGPU] Correct bitshift legality transformation for small vectors (#140940) Fix for a bug found by the AMD fuzzing project. The legaliser would originally try to widen a small vector such as `<4 x i1>` to a single `i16` during the legalisation of bitshifts, as it was not originally written with consideration for vector operands. This patch simply adds a guard to prohibit this transformation and allow other legalisation transformations to step in.	2025-05-23 10:56:21 +02:00
Matt Arsenault	2e2bbcacf8	AMDGPU/GlobalISel: Start legalizing minimumnum and maximumnum (#140900) This is the bare minimum to get the intrinsic to compile for AMDGPU, and it's not optimal. We need to follow along closer with the existing G_FMINNUM/G_FMAXNUM with custom lowering to handle the IEEE=0 case better. Just re-use the existing lowering for the old semantics for G_FMINNUM/G_FMAXNUM. This does not change G_FMINNUM/G_FMAXNUM's treatment, nor try to handle the general expansion without an underlying min/max variant (or with G_FMINIMUM/G_FMAXIMUM).	2025-05-21 17:00:45 +02:00
Chinmay Deshpande	3a5af231fd	[GlobalISel][AMDGPU] Fix handling of v2i128 type for AND, OR, XOR (#138574) Current behavior crashes the compiler. This bug was found using the AMDGPU Fuzzing project. Fixes SWDEV-508816.	2025-05-08 19:31:28 +02:00
Pierre van Houtryve	0d0eed419f	[AMDGPU][Legalizer] Widen i16 G_SEXT_INREG (#131308) It's better to widen them to avoid it being lowered into a G_ASHR + G_SHL. With this change we just extend to i32 then trunc the result.	2025-05-07 10:22:15 +02:00
Diana Picus	45d96df797	[AMDGPU] Support arbitrary types in amdgcn.dead (#134841) Legalize the amdgcn.dead intrinsic to work with types other than i32. It still generates IMPLICIT_DEFs. Remove some of the previous code for selecting/reg bank mapping it for 32-bit types, since everything is done in the legalizer now.	2025-05-05 14:08:00 +02:00
Changpeng Fang	8b46b98b91	AMDGPU: Fix the double rounding issue in v2f64 -> v2f16 conversion (#135659) On targets that support v_cvt_pk_f16_f32 instruction, if we make v2f64 -> v2f16 Legal, we will generate the following sequence of instructions: v_cvt_f32_f64_e32 v1, s[6:7] v_cvt_f32_f64_e32 v2, s[4:5] v_cvt_pk_f16_f32 v1, v2, v1 It possibly returns imprecise results due to double rounding. This patch fixes the issue by not setting the conversion Legal. While we may still expect the above sequence of code when unsafe fpmath is set, I hope https://github.com/llvm/llvm-project/pull/134738 can address that performance concern. Fixes: SWDEV-523856	2025-04-17 11:15:49 -07:00
Vikram Hegde	123b0e2a1e	Reapply "[AMDGPU][GlobalISel] Properly handle lane op lowering for larger vector types (#132358)" (#135758) reapply https://github.com/llvm/llvm-project/pull/132358, tests updated.	2025-04-16 11:28:28 +05:30
Kazu Hirata	f46cea5b42	Revert "[AMDGPU][GlobalISel] Properly handle lane op lowering for larger vector types (#132358)" This reverts commit 62ef10a0f62c668e1fa7e357f56052f3364544c5. Multiple buildbot failures have been reported: https://github.com/llvm/llvm-project/pull/132358	2025-04-14 23:03:55 -07:00
Vikram Hegde	62ef10a0f6	[AMDGPU][GlobalISel] Properly handle lane op lowering for larger vector types (#132358) Fixes https://github.com/llvm/llvm-project/issues/128650 Also adds few previously existing permlane64 tests which somehow got removed in between.	2025-04-15 10:51:58 +05:30
Tim Gymnich	1d0005a69a	[GlobalISel][NFC] Rename GISelKnownBits to GISelValueTracking (#133466) - rename `GISelKnownBits` to `GISelValueTracking` to analyze more than just `KnownBits` in the future	2025-03-29 11:51:29 +01:00
Mariusz Sikora	4f5ccf22fa	[AMDGPU] Support image_bvh8_intersect_ray instruction and intrinsic. (#130041) Co-authored-by: Ivan Kosarev <ivan.kosarev@amd.com>	2025-03-19 16:08:08 +01:00
Mariusz Sikora	575fde0995	[AMDGPU] Add intrinsic and MI for image_bvh_dual_intersect_ray (#130038) - Add llvm.amdgcn.image.bvh.dual.intersect.ray intrinsic and image_bvh_dual_intersect_ray machine instruction. - Add llvm_v10i32_ty and llvm_v10f32_ty --------- Co-authored-by: Mateja Marjanovic <mateja.marjanovic@amd.com>	2025-03-19 07:35:09 +01:00
Tim Gymnich	a5107be031	[NFC][AMDGPU][GlobalISel] Make LLTs constexpr (#131673) - static const -> constexpr	2025-03-18 08:30:17 +07:00
Tim Gymnich	887cf1f8ce	[AMDGPU][GlobalISel] Enable vector reductions (#131413) - Enable llvm vector reductions for AMDGPU. fixes https://github.com/llvm/llvm-project/issues/114816	2025-03-17 14:25:30 -07:00
Mariusz Sikora	bbabf4e2b8	[AMDGPU][NFC] Update name for BVH Intersect Ray (#130036) Co-authored-by: Ivan Kosarev <ivan.kosarev@amd.com>	2025-03-06 14:26:11 +01:00
Craig Topper	77cf6ecf78	[AMDGPU] Don't store an immediate in a Register. NFC	2025-03-04 22:17:17 -08:00
Brox Chen	e6f6a1e863	[AMDGPU][True16][CodeGen] uaddsat/usubsat true16 selection in gisel (#128233) Enable gisel selection for uaddsat and usubsat in true16 flow This patch includes: 1. Added VGPR_16_Lo128/VGPR_16 to register bank and update register info for recognizing 16bit regclass id and bit width 2. uaddsat/usubsat test update	2025-02-25 17:09:34 -05:00
Matt Arsenault	1affadb7c6	AMDGPU: Drop legacy r600.read.global.size intrinsics from amdgcn (#128700) These ancient intrinsics were still consumed by the backend for libclc, which no longer uses them.	2025-02-25 22:21:03 +07:00
Matt Arsenault	37c341df28	Revert "AMDGPU: Don't canonicalize fminnum/fmaxnum if targets support IEEE fminimum(maximum)_num (#127711)" This reverts commit 36eaf0daf5d6dd665d7c7a9ec38ea22f27709fed. This is not a sound approach to dealing with this instruction change. The new behavior is a different opcode pair, not a modifier on the existing opcode.	2025-02-20 10:19:14 +07:00
Changpeng Fang	36eaf0daf5	AMDGPU: Don't canonicalize fminnum/fmaxnum if targets support IEEE fminimum(maximum)_num (#127711) For targets that support IEEE fminimum_num/fmaximum_num, the corresponding _min_num_fXY/_max_num_fXY instructions themselves already did the canonicalization for the inputs. As a result, we do not need to explicitly canonicalize the inputs for fminnum/fmaxnum.	2025-02-19 11:16:43 -08:00
Matt Arsenault	18ea6c9280	AMDGPU: Stop emitting an error on illegal addrspacecasts (#127487) These cannot be static compile errors, and should be treated as poison. Invalid casts may be introduced which are dynamically dead. For example: ``` void foo(volatile generic int* x) { __builtin_assume(is_shared(x)); *x = 4; } void bar() { private int y; foo(&y); // violation, wrong address space } ``` This could produce a compile time backend error or not depending on the optimization level. Similarly, the new test demonstrates a failure on a lowered atomicrmw which required inserting runtime address space checks. The invalid cases are dynamically dead, we should not error, and the AtomicExpand pass shouldn't have to consider the details of the incoming pointer to produce valid IR. This should go to the release branch. This fixes broken -O0 compiles with 64-bit atomics which would have started failing in 1d0370872f28ec9965448f33db1b105addaf64ae.	2025-02-17 21:03:50 +07:00
Ivan Kosarev	b7188f6313	[AMDGPU][NFC] Remove an unneeded return value. (#126739) And rename the function to disassociate it from the one where generating loading of the input value may actually fail.	2025-02-11 16:10:49 +00:00
Craig Topper	4a486e773e	[CodeGen] Use Register/MCRegister::isPhysical. NFC	2025-01-18 23:37:03 -08:00
Tim Gymnich	2db2dc8ab9	[GlobalISel][NFC] Fix LLT Propagation (#119587) Retain LLT type information by creating new LLTs from the original LLT instead of only using the original scalar size. This PR prepares for the [LLT FPInfo RFC](https://discourse.llvm.org/t/rfc-globalisel-adding-fp-type-information-to-llt/83349/24) where LLTs will carry additional floating point type information in addition to the scalar size.	2024-12-12 09:47:46 -08:00
Jon Chesterfield	4e0ba801ea	Revert "[amdgpu][lds] Simplify error diag path - lds variable names are no longer special" Test case didn't run locally, investigating This reverts commit 7bad469182ff2f6423ea209d5a1e81acca600568.	2024-12-08 12:00:13 +00:00
Jon Chesterfield	7bad469182	[amdgpu][lds] Simplify error diag path - lds variable names are no longer special	2024-12-08 11:26:33 +00:00
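
Commit 2195fe7e01 above describes the gfx1250 buffer resource (V#) layout, in which `num_records` grows to 45 bits and therefore no longer fits the old `i32` parameter. A minimal sketch, assuming only the bit ranges quoted in that commit message, of packing and unpacking those fields follows. `Resource128`, `insertField`, `extractField` and `packGfx1250Resource` are hypothetical illustration names, not LLVM APIs, and bits 127:122 are left zero because the message does not describe them.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical 128-bit V# held as two 64-bit words (not an LLVM type).
struct Resource128 {
  uint64_t lo = 0;  // bits 63:0
  uint64_t hi = 0;  // bits 127:64
};

// Mask with the low `width` bits set (width <= 64).
constexpr uint64_t lowMask(unsigned width) {
  return width >= 64 ? ~0ULL : ((1ULL << width) - 1);
}

// Write the low `width` bits of `value` starting at absolute bit `pos`.
void insertField(Resource128 &r, unsigned pos, unsigned width, uint64_t value) {
  value &= lowMask(width);
  for (unsigned i = 0; i < width; ++i) {
    uint64_t bit = (value >> i) & 1;
    unsigned p = pos + i;
    if (p < 64)
      r.lo |= bit << p;
    else
      r.hi |= bit << (p - 64);
  }
}

// Read `width` bits starting at absolute bit `pos`.
uint64_t extractField(const Resource128 &r, unsigned pos, unsigned width) {
  uint64_t value = 0;
  for (unsigned i = 0; i < width; ++i) {
    unsigned p = pos + i;
    uint64_t bit = p < 64 ? (r.lo >> p) & 1 : (r.hi >> (p - 64)) & 1;
    value |= bit << i;
  }
  return value;
}

// Layout as quoted in the commit message: base[56:0], num_records[101:57],
// reserved[107:102], stride[121:108]. Bits 127:122 stay zero (assumption).
Resource128 packGfx1250Resource(uint64_t base, uint64_t numRecords,
                                uint64_t stride) {
  Resource128 r;
  insertField(r, 0, 57, base);
  insertField(r, 57, 45, numRecords);  // 45 bits: needs a 64-bit parameter
  insertField(r, 102, 6, 0);           // reserved
  insertField(r, 108, 14, stride);
  return r;
}
```

Splitting the descriptor into two `uint64_t` words avoids relying on the non-standard `__int128` extension. Note that `num_records` straddles the word boundary: its low 7 bits land in bits 63:57 of `lo` and the remaining 38 bits in bits 37:0 of `hi`.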


758 Commits