llvm-project

Author	SHA1	Message	Date
Tiger Ding	4ab14685a0	[AMDGPU] Narrow only on store to pow of 2 mem location (#150093 ) Lowering in GlobalISel for AMDGPU previously always narrows to i32 on truncating store regardless of mem size or scalar size, causing issues with types like i65 which is first extended to i128 then stored as i64 + i8 to i128 locations. Narrowing only on store to pow of 2 mem location ensures only narrowing to mem size near end of legalization. This LLVM defect was identified via the AMD Fuzzing project.	2025-08-19 00:04:27 +09:00
Stanislav Mekhanoshin	ea14834966	[AMDGPU] Per-subtarget DPP instruction classification (#153096 ) This is NFCI at this point.	2025-08-11 15:41:02 -07:00
Stanislav Mekhanoshin	abc22f771e	[AMDGPU] Fix buffer addressing mode matching (#152584 ) Starting in gfx1250, voffset and immoffset are zero-extended from 32 bits to 45 bits before being added together.	2025-08-07 14:23:41 -07:00
Stanislav Mekhanoshin	b8eb61adc9	[AMDGPU] Implement addrspacecast from flat <-> private on gfx1250 (#152218 )	2025-08-05 16:25:23 -07:00
paperchalice	8bacfb2538	[AMDGPU] Remove `UnsafeFPMath` uses (#151079 ) Remove `UnsafeFPMath` in AMDGPU part, it blocks some bugfixes related to clang and the ultimate goal is to remove `resetTargetOptions` method in `TargetMachine`, see FIXME in `resetTargetOptions`. See also https://discourse.llvm.org/t/rfc-honor-pragmas-with-ffp-contract-fast https://discourse.llvm.org/t/allowfpopfusion-vs-sdnodeflags-hasallowcontract --------- Co-authored-by: Matt Arsenault <arsenm2@gmail.com>	2025-07-31 17:36:57 +08:00
Fabian Ritter	957ae8ad46	[AMDGPU][GISel] Use buildObjectPtrOffset instead of buildPtrAdd (#150899 ) This concerns offset computations for kernargs and RegBankLegalizeHelper::splitLoad, which should all be within the bounds of a memory object. See #150392 for the motivation for introducing the buildObjectPtrOffset function. For SWDEV-516125.	2025-07-30 08:30:27 +02:00
Stanislav Mekhanoshin	3dfd939a16	[AMDGPU] gfx1250 V_{MIN\|MAX}_{I\|U}64 opcodes (#151256 )	2025-07-29 19:13:51 -07:00
Changpeng Fang	6184ef1c2f	[AMDGPU] Support f64 atomics on gfx1250 (#151172 ) - BUF/FLAT/GLOBAL_ADD/MIN/MAX_F64 - DS_ADD_F64 Co-authored-by: Konstantin Zhuravlyov <Konstantin Zhuravlyov@amd.com>	2025-07-29 09:41:00 -07:00
Stanislav Mekhanoshin	2346968807	[AMDGPU] Add V_ADD\|SUB\|MUL_U64 gfx1250 opcodes (#150291 )	2025-07-23 13:17:56 -07:00
Stanislav Mekhanoshin	2d6534b7da	[AMDGPU] gfx1250 64-bit relocations and fixups (#148951 )	2025-07-15 17:13:42 -07:00
Changpeng Fang	868793fa8e	AMDGPU: Support intrinsic selection for gfx1250 wmma instructions (#148957 ) Co-authored-by: Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com> Co-authored-by: Shilei Tian <Shilei.Tian@amd.com>	2025-07-15 15:25:05 -07:00
Stanislav Mekhanoshin	d0a4af725e	[AMDGPU] Add FeatureIEEEMinimumMaximumInsts. NFCI. (#147594 ) Co-authored-by: Mirko Brkušanin <Mirko.Brkusanin@amd.com>	2025-07-08 14:32:44 -07:00
Matt Arsenault	c80282d333	AMDGPU: Directly select minimumnum/maximumnum with ieee_mode=0 (#141903 ) The hardware min/max follow the IR rules with IEEE mode disabled, so we can avoid the canonicalizes of the input. We lose the quieting of a signaling nan if both inputs are nans, but we only require that with strictfp.	2025-06-18 00:27:41 +09:00
Kazu Hirata	03f616eb3a	[llvm] Compare std::optional<T> to values directly (NFC) (#143340 ) This patch transforms: X && *X == Y to: X == Y where X is of std::optional<T>, and Y is of T or similar.	2025-06-08 22:37:59 -07:00
Changpeng Fang	70e78be7dc	AMDGPU: Custom lower fptrunc vectors for f32 -> f16 (#141883 ) The latest ASICs support the v_cvt_pk_f16_f32 instruction. However, the current implementation of vector fptrunc lowering fully scalarizes the vectors, and the scalar conversions may not always be combined to generate the packed one. We made v2f32 -> v2f16 legal in https://github.com/llvm/llvm-project/pull/139956. This work is an extension to handle wider vectors. Instead of full scalarization, we split the vector into packs (v2f32 -> v2f16) to ensure the packed conversion can always be generated.	2025-06-06 15:15:24 -07:00
Justin Bogner	b7bb256703	Warn on misuse of DiagnosticInfo classes that hold Twines (#137397 ) This annotates the `Twine` passed to the constructors of the various DiagnosticInfo subclasses with `[[clang::lifetimebound]]`, which causes us to warn when we would try to print the twine after it had already been destructed. We also update `DiagnosticInfoUnsupported` to hold a `const Twine &` like all of the other DiagnosticInfo classes, since this warning allows us to clean up all of the places where it was being used incorrectly.	2025-05-28 12:26:39 -07:00
zGoldthorpe	bb7e559740	[AMDGPU] Correct bitshift legality transformation for small vectors (#140940 ) Fix for a bug found by the AMD fuzzing project. The legaliser would originally try to widen a small vector such as `<4 x i1>` to a single `i16` during the legalisation of bitshifts, as it was not originally written with consideration for vector operands. This patch simply adds a guard to prohibit this transformation and allow other legalisation transformations to step in.	2025-05-23 10:56:21 +02:00
Matt Arsenault	2e2bbcacf8	AMDGPU/GlobalISel: Start legalizing minimumnum and maximumnum (#140900 ) This is the bare minimum to get the intrinsic to compile for AMDGPU, and it's not optimal. We need to follow along closer with the existing G_FMINNUM/G_FMAXNUM with custom lowering to handle the IEEE=0 case better. Just re-use the existing lowering for the old semantics for G_FMINNUM/G_FMAXNUM. This does not change G_FMINNUM/G_FMAXNUM's treatment, nor try to handle the general expansion without an underlying min/max variant (or with G_FMINIMUM/G_FMAXIMUM).	2025-05-21 17:00:45 +02:00
Chinmay Deshpande	3a5af231fd	[GlobalISel][AMDGPU] Fix handling of v2i128 type for AND, OR, XOR (#138574 ) Current behavior crashes the compiler. This bug was found using the AMDGPU Fuzzing project. Fixes SWDEV-508816.	2025-05-08 19:31:28 +02:00
Pierre van Houtryve	0d0eed419f	[AMDGPU][Legalizer] Widen i16 G_SEXT_INREG (#131308 ) It's better to widen them to avoid it being lowered into a G_ASHR + G_SHL. With this change we just extend to i32 then trunc the result.	2025-05-07 10:22:15 +02:00
Diana Picus	45d96df797	[AMDGPU] Support arbitrary types in amdgcn.dead (#134841 ) Legalize the amdgcn.dead intrinsic to work with types other than i32. It still generates IMPLICIT_DEFs. Remove some of the previous code for selecting/reg bank mapping it for 32-bit types, since everything is done in the legalizer now.	2025-05-05 14:08:00 +02:00
Changpeng Fang	8b46b98b91	AMDGPU: Fix the double rounding issue in v2f64 -> v2f16 conversion (#135659 ) On targets that support v_cvt_pk_f16_f32 instruction, if we make v2f64 -> v2f16 Legal, we will generate the following sequence of instructions: v_cvt_f32_f64_e32 v1, s[6:7] v_cvt_f32_f64_e32 v2, s[4:5] v_cvt_pk_f16_f32 v1, v2, v1 It possibly returns imprecise results due to double rounding. This patch fixes the issue by not setting the conversion Legal. While we may still expect the above sequence of code when unsafe fpmath is set, I hope https://github.com/llvm/llvm-project/pull/134738 can address that performance concern. Fixes: SWDEV-523856	2025-04-17 11:15:49 -07:00
Vikram Hegde	123b0e2a1e	Reapply "[AMDGPU][GlobalISel] Properly handle lane op lowering for larger vector types (#132358 )" (#135758 ) reapply https://github.com/llvm/llvm-project/pull/132358, tests updated.	2025-04-16 11:28:28 +05:30
Kazu Hirata	f46cea5b42	Revert "[AMDGPU][GlobalISel] Properly handle lane op lowering for larger vector types (#132358 )" This reverts commit 62ef10a0f62c668e1fa7e357f56052f3364544c5. Multiple buildbot failures have been reported: https://github.com/llvm/llvm-project/pull/132358	2025-04-14 23:03:55 -07:00
Vikram Hegde	62ef10a0f6	[AMDGPU][GlobalISel] Properly handle lane op lowering for larger vector types (#132358 ) Fixes https://github.com/llvm/llvm-project/issues/128650 Also adds few previously existing permlane64 tests which somehow got removed in between.	2025-04-15 10:51:58 +05:30
Tim Gymnich	1d0005a69a	[GlobalISel][NFC] Rename GISelKnownBits to GISelValueTracking (#133466 ) - rename `GISelKnownBits` to `GISelValueTracking` to analyze more than just `KnownBits` in the future	2025-03-29 11:51:29 +01:00
Mariusz Sikora	4f5ccf22fa	[AMDGPU] Support image_bvh8_intersect_ray instruction and intrinsic. (#130041 ) Co-authored-by: Ivan Kosarev <ivan.kosarev@amd.com>	2025-03-19 16:08:08 +01:00
Mariusz Sikora	575fde0995	[AMDGPU] Add intrinsic and MI for image_bvh_dual_intersect_ray (#130038 ) - Add llvm.amdgcn.image.bvh.dual.intersect.ray intrinsic and image_bvh_dual_intersect_ray machine instruction. - Add llvm_v10i32_ty and llvm_v10f32_ty --------- Co-authored-by: Mateja Marjanovic <mateja.marjanovic@amd.com>	2025-03-19 07:35:09 +01:00
Tim Gymnich	a5107be031	[NFC][AMDGPU][GlobalISel] Make LLTs constexpr (#131673 ) - static const -> constexpr	2025-03-18 08:30:17 +07:00
Tim Gymnich	887cf1f8ce	[AMDGPU][GlobalISel] Enable vector reductions (#131413 ) - Enable llvm vector reductions for AMDGPU. fixes https://github.com/llvm/llvm-project/issues/114816	2025-03-17 14:25:30 -07:00
Mariusz Sikora	bbabf4e2b8	[AMDGPU][NFC] Update name for BVH Intersect Ray (#130036 ) Co-authored-by: Ivan Kosarev <ivan.kosarev@amd.com>	2025-03-06 14:26:11 +01:00
Craig Topper	77cf6ecf78	[AMDGPU] Don't store an immediate in a Register. NFC	2025-03-04 22:17:17 -08:00
Brox Chen	e6f6a1e863	[AMDGPU][True16][CodeGen] uaddsat/usubsat true16 selection in gisel (#128233 ) Enable gisel selection for uaddsat and usubsat in true16 flow This patch includes: 1. Added VGPR_16_Lo128/VGPR_16 to register bank and update register info for recognizing 16bit regclass id and bit width 2. uaddsat/usubsat test update	2025-02-25 17:09:34 -05:00
Matt Arsenault	1affadb7c6	AMDGPU: Drop legacy r600.read.global.size intrinsics from amdgcn (#128700 ) These ancient intrinsics were still consumed by the backend for libclc, which no longer uses them.	2025-02-25 22:21:03 +07:00
Matt Arsenault	37c341df28	Revert "AMDGPU: Don't canonicalize fminnum/fmaxnum if targets support IEEE fminimum(maximum)_num (#127711 )" This reverts commit 36eaf0daf5d6dd665d7c7a9ec38ea22f27709fed. This is not a sound approach to dealing with this instruction change. The new behavior is a different opcode pair, not a modifier on the existing opcode.	2025-02-20 10:19:14 +07:00
Changpeng Fang	36eaf0daf5	AMDGPU: Don't canonicalize fminnum/fmaxnum if targets support IEEE fminimum(maximum)_num (#127711 ) For targets that support IEEE fminimum_num/fmaximum_num, the corresponding _min_num_fXY/_max_num_fXY instructions themselves already did the canonicalization for the inputs. As a result, we do not need to explicitly canonicalize the inputs for fminnum/fmaxnum.	2025-02-19 11:16:43 -08:00
Matt Arsenault	18ea6c9280	AMDGPU: Stop emitting an error on illegal addrspacecasts (#127487 ) These cannot be static compile errors, and should be treated as poison. Invalid casts may be introduced which are dynamically dead. For example: ``` void foo(volatile generic int* x) { __builtin_assume(is_shared(x)); *x = 4; } void bar() { private int y; foo(&y); // violation, wrong address space } ``` This could produce a compile time backend error or not depending on the optimization level. Similarly, the new test demonstrates a failure on a lowered atomicrmw which required inserting runtime address space checks. The invalid cases are dynamically dead, we should not error, and the AtomicExpand pass shouldn't have to consider the details of the incoming pointer to produce valid IR. This should go to the release branch. This fixes broken -O0 compiles with 64-bit atomics which would have started failing in 1d0370872f28ec9965448f33db1b105addaf64ae.	2025-02-17 21:03:50 +07:00
Ivan Kosarev	b7188f6313	[AMDGPU][NFC] Remove an unneeded return value. (#126739 ) And rename the function to disassociate it from the one where generating loading of the input value may actually fail.	2025-02-11 16:10:49 +00:00
Craig Topper	4a486e773e	[CodeGen] Use Register/MCRegister::isPhysical. NFC	2025-01-18 23:37:03 -08:00
Tim Gymnich	2db2dc8ab9	[GlobalISel][NFC] Fix LLT Propagation (#119587 ) Retain LLT type information by creating new LLTs from the original LLT instead of only using the original scalar size. This PR prepares for the [LLT FPInfo RFC](https://discourse.llvm.org/t/rfc-globalisel-adding-fp-type-information-to-llt/83349/24) where LLTs will carry additional floating point type information in addition to the scalar size.	2024-12-12 09:47:46 -08:00
Jon Chesterfield	4e0ba801ea	Revert "[amdgpu][lds] Simplify error diag path - lds variable names are no longer special" Test case didn't run locally, investigating This reverts commit 7bad469182ff2f6423ea209d5a1e81acca600568.	2024-12-08 12:00:13 +00:00
Jon Chesterfield	7bad469182	[amdgpu][lds] Simplify error diag path - lds variable names are no longer special	2024-12-08 11:26:33 +00:00
Matt Arsenault	15676ec552	AMDGPU: Add support for V_CVT_PK_F16_F32 instruction for gfx950 (#118300 ) Co-authored-by: Shilei Tian <shilei.tian@amd.com>	2024-12-02 16:04:24 -05:00
Kazu Hirata	be187369a0	[AMDGPU] Remove unused includes (NFC) (#116154 ) Identified with misc-include-cleaner.	2024-11-13 21:10:03 -08:00
Gang Chen	8c752900dd	[AMDGPU] modify named barrier builtins and intrinsics (#114550 ) Use a local pointer type to represent the named barrier in builtin and intrinsic. This makes the definitions more user friendly because users do not need to worry about the hardware ID assignment. Also this approach is more like other popular GPU programming languages. Named barriers should be represented as global variables of addrspace(3) in LLVM-IR. The compiler assigns the special LDS offsets for those variables during the AMDGPULowerModuleLDS pass. Those addresses are converted to hw barrier IDs during instruction selection. The rest of the instruction-selection changes are primarily due to the intrinsic-definition changes.	2024-11-06 10:37:22 -08:00
Stanislav Mekhanoshin	6d7e51de5e	[AMDGPU] Extend type support for update_dpp intrinsic (#114597 ) We can split 64-bit DPP as a post-RA pseudo if control values are supported, but cannot handle other types.	2024-11-05 13:59:14 -08:00
Krzysztof Drewniak	ea33af63de	Reapply "[AMDGPU][GlobalISel] Fix load/store of pointer vectors, buffer.*.pN (#110714 )" v3 (#114443 ) This reverts commit 8a849a2a567d4e519b246a16936b6e7519936d4b. It seems I missed a spot when trying to ensure the code in the instruction selection tests were actually legalized MIR.	2024-11-01 11:13:29 -05:00
Stanislav Mekhanoshin	7cd29741fa	[AMDGPU] Extend mov_dpp8 intrinsic lowering for generic types (#114296 ) The int_amdgcn_mov_dpp8 is overloaded, but we can only select i32. To allow a corresponding builtin to be overloaded the same way as int_amdgcn_mov_dpp we need it to be able to split unsupported values.	2024-10-31 01:15:25 -07:00
Mikhail Goncharov	8a849a2a56	Revert "Reapply "[AMDGPU][GlobalISel] Fix load/store of pointer vectors, buffer.*.pN (#110714 )" v2 (#111708 )" This reverts commit 4b4a0d419c81b8b12a7dbb33dae1f7e9be91a88f. New test fails on buildbots https://lab.llvm.org/buildbot/#/builders/63/builds/2039 https://lab.llvm.org/buildbot/#/builders/127/builds/1055	2024-10-10 13:37:44 +02:00
Krzysztof Drewniak	4b4a0d419c	Reapply "[AMDGPU][GlobalISel] Fix load/store of pointer vectors, buffer.*.pN (#110714 )" v2 (#111708 ) This adds `-disable-gisel-legality-check` to some gfx6 and gfx7 test lines to prevent behavior mismatches between debug and release builds The first attempted reapply was #111059 This reverts commit e075dcf7d270fd52dc837163ff24e8c872dfeb49.	2024-10-09 17:11:41 -05:00

750 Commits