Support tail calls to whole wave functions (trivial) and from whole wave
functions (slightly more involved, because we need a new pseudo for the
tail call return that patches up the EXEC mask).
Move the expansion of whole wave function return pseudos (regular and
tail call returns) to prolog/epilog insertion, since that's where we
patch up the EXEC mask.
Unnecessary register spills will be dealt with in a future patch.
Compare sinking is selectable based on the result of
hasMultipleConditionRegisters. This function is too coarse-grained
because it does not take into account the differences between scalar
and vector compares. This PR extends the interface to take an EVT, to
allow finer control.
The new interface is used by AArch64 to disable sinking of scalable
vector compares, with isProfitableToSinkOperands updated to maintain
the cases that are specifically tested.
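A minimal sketch of what such a target override might look like with the extended hook (hypothetical target class; the in-tree AArch64 logic may differ):
```
// Returning true tells CodeGenPrepare not to sink compares of this type into
// the blocks that use them; here sinking is disabled only for scalable vectors.
bool MyTargetLowering::hasMultipleConditionRegisters(EVT VT) const {
  return VT.isScalableVector();
}
```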
We already had corresponding f32 and i32 vector types for these sizes.
Also add the VTs v[567]i8 and v[567]i16: these are needed by the Hexagon
backend, which, for each i1 vector type, wants to query information about
the corresponding i8 and i16 types in
HexagonTargetLowering::getPreferredHvxVectorAction.
These float operations were expanded for scalar f32/f64/f128, but not
for f16 and, more problematically, not for vectors. A small subset of
them was separately set to expand for vectors.
Change these to always expand by default, and adjust targets to mark
these as legal where necessary instead.
This is a much safer default, and avoids unnecessary legalization
failures when a target forgets to manually mark them as expand.
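As an illustration (hypothetical opcode and types, not taken from this patch), a target with native support would now opt back in explicitly in its TargetLowering constructor:
```
// Operations are Expand by default; only the natively supported cases are
// marked Legal, everything else falls back to the generic expansion.
setOperationAction(ISD::FSIN, MVT::f32, Legal);
setOperationAction(ISD::FSIN, MVT::v2f32, Legal);
```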
Fixes https://github.com/llvm/llvm-project/issues/110753.
Fixes https://github.com/llvm/llvm-project/issues/121390.
Whole wave functions are functions that will run with a full EXEC mask.
They will not be invoked directly, but instead will be launched by way
of a new intrinsic, `llvm.amdgcn.call.whole.wave` (to be added in
a future patch). These functions are meant as an alternative to the
`llvm.amdgcn.init.whole.wave` or `llvm.amdgcn.strict.wwm` intrinsics.
Whole wave functions will set EXEC to -1 in the prologue and restore the
original value of EXEC in the epilogue. They must have a special first
argument, `i1 %active`, which is going to be mapped to EXEC. They may
have either the default calling convention or amdgpu_gfx. The inactive
lanes need to be preserved for all registers used; the active lanes only
for the CSRs.
At the IR level, arguments to a whole wave function (other than
`%active`) contain poison in their inactive lanes. Likewise, the return
value for the inactive lanes is poison.
This patch contains the following work:
* 2 new pseudos, SI_SETUP_WHOLE_WAVE_FUNC and SI_WHOLE_WAVE_FUNC_RETURN
used for managing the EXEC mask. SI_SETUP_WHOLE_WAVE_FUNC will return
a SReg_1 representing `%active`, which needs to be passed into
SI_WHOLE_WAVE_FUNC_RETURN.
* SelectionDAG support for generating these 2 new pseudos and the
special handling of %active. Since the return may be in a different
basic block, it's difficult to add the virtual reg for %active to
SI_WHOLE_WAVE_FUNC_RETURN, so we initially generate an IMPLICIT_DEF
which is later replaced via a custom inserter.
* Expansion of the 2 pseudos during prolog/epilog insertion. PEI also
marks any used VGPRs as WWM registers, which are then spilled and
restored with the usual logic.
Future patches will include the `llvm.amdgcn.call.whole.wave` intrinsic
and a lot of optimization work (especially in order to reduce spills
around function calls).
---------
Co-authored-by: Matt Arsenault <Matthew.Arsenault@amd.com>
Co-authored-by: Shilei Tian <i@tianshilei.me>
For some targets, the optimization `X == Const ? X : Y` -> `X == Const ?
Const : Y` can cause extra register usage or redundant immediate encoding
for the constant in the cndmask generated from the ternary operation.
This patch detects such cases and reuses the register from the compare
instruction that already holds the constant, instead of materializing it
again for the cndmask.
The optimization skips immediates that can be encoded into the cndmask
instruction (including +-0.0), as well as !isNormal() constants.
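For illustration only (not a test from this patch), a C++ source pattern where the reuse applies might be:
```
// 2.5f cannot be encoded as an inline immediate, so the compare already needs
// it in a register; the select result reuses that register instead of
// materializing the literal a second time for the cndmask.
float pick(float x, float y) {
  return x == 2.5f ? x : y; // generic combine: x == 2.5f ? 2.5f : y
}
```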
The change is reworked on the base of #131146
---------
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
When performing a 64-bit sra of a negative value with a shift amount in
the range [32, 63], create the hi-half with a move of -1.
Alive verification: https://alive2.llvm.org/ce/z/kXd7Ac
Also, preserve exact flag. Alive verification:
https://alive2.llvm.org/ce/z/L86tXf.
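A small standalone C++ check of the identity being exploited (not part of the patch), for negative values and shift amounts in [32, 63]:
```
#include <cassert>
#include <cstdint>

int main() {
  for (int64_t x : {INT64_MIN, -1234567890123LL, -1LL}) {
    for (unsigned s = 32; s <= 63; ++s) {
      int32_t hi = int32_t(uint64_t(x) >> 32);  // high half of x
      int32_t lo = hi >> (s - 32);              // 32-bit sra of the high half
      // hi-half of the result is simply -1 because x is negative.
      int64_t expected = int64_t((uint64_t(0xFFFFFFFFu) << 32) | uint32_t(lo));
      assert((x >> s) == expected);
    }
  }
  return 0;
}
```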
---------
Signed-off-by: John Lu <John.Lu@amd.com>
This PR takes the work previously done by @pawan-nirpal-031 on X86 in
#106370, and makes it available in common code. This should enable all
targets to use `__builtin_canonicalize` for all `f(16|32|64|128)` data
types.
Canonicalization is implemented here as multiplication by `1.0`, as
suggested in [the
docs](https://llvm.org/docs/LangRef.html#llvm-canonicalize-intrinsic).
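Conceptually, the common-code expansion boils down to something like the following SelectionDAG sketch (illustrative, not the exact in-tree code):
```
// Expand fcanonicalize(x) as x * 1.0 in the same floating-point type.
SDValue expandCanonicalize(SDValue Op, SelectionDAG &DAG, const SDLoc &DL) {
  EVT VT = Op.getValueType();
  SDValue One = DAG.getConstantFP(1.0, DL, VT);
  return DAG.getNode(ISD::FMUL, DL, VT, Op, One);
}
```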
When reducing a 64-bit lshr to 32-bit, preserve the exact flag.
Alive2 verification: https://alive2.llvm.org/ce/z/LcnX7V
---------
Signed-off-by: John Lu <John.Lu@amd.com>
Use KnownBits to convert 64-bit sra to 32-bit sra.
Scaled-down alive2 verification with 16/8-bit types:
https://alive2.llvm.org/ce/z/LamASk
---------
Signed-off-by: John Lu <John.Lu@amd.com>
Convert a vector 64-bit lshr to 32-bit if the shift amount is known to
be >= 32. Also convert a scalar 64-bit lshr to 32-bit if the shift
amount is variable but known to be >= 32.
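A quick standalone check of the underlying identity (not from the patch), for shift amounts of at least 32:
```
#include <cassert>
#include <cstdint>

int main() {
  uint64_t x = 0x123456789abcdef0ULL;
  for (unsigned s = 32; s <= 63; ++s) {
    uint32_t hi = uint32_t(x >> 32);
    // The 64-bit lshr becomes a 32-bit lshr of the high half, zero-extended.
    assert((x >> s) == uint64_t(hi >> (s - 32)));
  }
  return 0;
}
```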
---------
Signed-off-by: John Lu <John.Lu@amd.com>
Extend the sra i64 simplification to shift constants in the range
[33, 62]. Shift amounts 32 and 63 were already handled.
New tests for shift amounts 33 and 62 were added in sra.ll. Changes to
other test files adapt previous test results to this extension.
---------
Signed-off-by: John Lu <John.Lu@amd.com>
This annotates the `Twine` passed to the constructors of the various
DiagnosticInfo subclasses with `[[clang::lifetimebound]]`, which causes
us to warn when we would try to print the Twine after it has already
been destroyed.
We also update `DiagnosticInfoUnsupported` to hold a `const Twine &`
like all of the other DiagnosticInfo classes, since this warning allows
us to clean up all of the places where it was being used incorrectly.
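A hedged illustration of the kind of misuse the annotation now flags (hypothetical helper, but the dangling-temporary pattern is the real issue):
```
#include "llvm/ADT/Twine.h"
#include "llvm/IR/DiagnosticInfo.h"
#include "llvm/IR/Function.h"
#include "llvm/IR/LLVMContext.h"
using namespace llvm;

void reportUnsupported(LLVMContext &Ctx, const Function &Fn, StringRef Name) {
  // The Twine built from "..." + Name is a temporary that dies at the end of
  // this declaration, yet the diagnostic keeps a reference to it. With
  // [[clang::lifetimebound]] on the constructor parameter, clang warns here.
  DiagnosticInfoUnsupported Diag(Fn, "unsupported feature: " + Name);
  Ctx.diagnose(Diag); // printed after the temporary Twine has been destroyed
}
```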
Update the f64 to f16 lowering for targets which support f16 types.
In unsafe mode, the conversion is lowered to two FP_ROUND nodes
(https://reviews.llvm.org/D154528 stops these two FP_ROUNDs from being
combined back). In safe mode, LowerF64ToF16 is selected, which uses
round-to-nearest-even rounding.
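In DAG terms, the unsafe-mode path is roughly the following pair of rounds (a sketch of the shape, not the exact code):
```
// Unsafe mode: f64 -> f32 -> f16 via two FP_ROUND nodes.
SDValue lowerViaTwoRounds(SDValue Src, SelectionDAG &DAG, const SDLoc &DL) {
  SDValue Flag = DAG.getIntPtrConstant(0, DL, /*isTarget=*/true); // may lose precision
  SDValue F32 = DAG.getNode(ISD::FP_ROUND, DL, MVT::f32, Src, Flag);
  return DAG.getNode(ISD::FP_ROUND, DL, MVT::f16, F32, Flag);
}
```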
Based on feedback for #129695 - we need to be able to determine the
load offset of smaller loads when trying to determine whether a multiple-use
load should be split (in particular for AVX subvector extractions).
This patch adds a std::optional<unsigned> ByteOffset argument to
shouldReduceLoadWidth calls where we know the constant offset, to allow
targets to make use of it in future patches.
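A hypothetical target override showing the shape of the extended hook (the exact in-tree signature may differ slightly):
```
bool MyTargetLowering::shouldReduceLoadWidth(
    SDNode *Load, ISD::LoadExtType ExtTy, EVT NewVT,
    std::optional<unsigned> ByteOffset) const {
  // Only narrow multi-use loads that start at the base address.
  if (ByteOffset && *ByteOffset != 0)
    return false;
  return TargetLowering::shouldReduceLoadWidth(Load, ExtTy, NewVT, ByteOffset);
}
```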
The llvm.amdgcn.cs.chain intrinsic has a 'flags' operand which may
indicate that we want to reallocate the VGPRs before performing the
call.
A call with the following arguments:
```
llvm.amdgcn.cs.chain %callee, %exec, %sgpr_args, %vgpr_args,
/*flags*/0x1, %num_vgprs, %fallback_exec, %fallback_callee
```
is supposed to do the following:
- copy the SGPR and VGPR args into their respective registers
- try to change the VGPR allocation
- if the allocation has succeeded, set EXEC to %exec and jump to
%callee, otherwise set EXEC to %fallback_exec and jump to
%fallback_callee
This patch implements the dynamic VGPR behaviour by generating an
S_ALLOC_VGPR followed by S_CSELECT_B32/64 instructions for the EXEC and
callee. The rest of the call sequence is left undisturbed (i.e.
identical to the case where the flags are 0 and we don't use dynamic
VGPRs). We achieve this by introducing some new pseudos
(SI_CS_CHAIN_TC_Wn_DVGPR) which are expanded in the SILateBranchLowering
pass, just like the simpler SI_CS_CHAIN_TC_Wn pseudos. The main reason
is so that we don't risk other passes (particularly the PostRA
scheduler) introducing instructions between the S_ALLOC_VGPR and the
jump. Such instructions might end up using VGPRs that have been
deallocated, or the wrong EXEC mask. Once the whole backend treats
S_ALLOC_VGPR and changes to EXEC as barriers for instructions that use
VGPRs, we could in principle move the expansion earlier (but in the
absence of a good reason for that my personal preference is to keep it
later in order to make debugging easier).
Since the expansion happens after register allocation, we're careful to
select constants to immediate operands instead of letting ISel generate
S_MOVs which could interfere with register allocation (i.e. make it look
like we need more registers than we actually do).
For GFX12, S_ALLOC_VGPR only works in wave32 mode, so we bail out during
ISel in wave64 mode. However, we can define the pseudos for wave64 too
so it's easy to handle if future generations support it.
---------
Co-authored-by: Ana Mihajlovic <Ana.Mihajlovic@amd.com>
Co-authored-by: Matt Arsenault <Matthew.Arsenault@amd.com>
From #106446, this adds a variant of getVectorIdxTy that returns an LLT.
Many uses only look at the width, so a getVectorIdxWidth was added as
the common base.
Reduce:
DST = shl i64 X, Y
where Y is known to be in the range [32, 63], to:
DST = [0, shl i32 X, (Y & 31)]
Alive2 analysis:
https://alive2.llvm.org/ce/z/w_u5je
---------
Signed-off-by: John Lu <John.Lu@amd.com>
Previously we only handled cases that looked like the high element
extract of a 64-bit shift. Generalize this to handle any multiple
indexing. I was hoping this would help avoid some regressions,
but it did not. It does, however, reduce the number of steps the DAG
takes to process these cases.
NFC-ish; I have yet to find an example where this changes the
final output.
With this change, targets are no longer required to put memory / strict-fp opcodes after special
`ISD::FIRST_TARGET_MEMORY_OPCODE`/`ISD::FIRST_TARGET_STRICTFP_OPCODE` markers.
This will also allow autogenerating `isTargetMemoryOpcode`/`isTargetStrictFPOpcode` (#119709).
Pull Request: https://github.com/llvm/llvm-project/pull/119969
Most of these are just places that want the first user and aren't
iterating over the whole list.
While there, I changed some `use_size() == 1` checks to `hasOneUse()`,
which is more efficient.
This is part of an effort to rename use_iterator to user_iterator
and provide a use_iterator that dereferences to SDUse&. This patch
helps reduce the diff on later patches.
This function is most often used in range-based loops or algorithms
where the iterator is implicitly dereferenced. The dereference returns
an SDNode * for the user rather than an SDUse *, so users() is a better name.
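For example, counting a node's users now reads naturally (small illustrative snippet):
```
// users() dereferences to SDNode *, so the loop variable is the using node.
unsigned NumCopyUsers = 0;
for (SDNode *User : N->users())
  if (User->getOpcode() == ISD::CopyToReg)
    ++NumCopyUsers;
```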
I've long been annoyed that we can't write a range-based loop over
SDUse when we need getOperandNo. I plan to rename use_iterator to
user_iterator and add a use_iterator that returns SDUse& on dereference.
This will make it more like IR.
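The intended end state would allow something like the following (a sketch of the planned API shape, not what exists today):
```
// Planned: iterate over SDUse& so the operand number is directly available.
for (SDUse &U : N->uses()) {
  unsigned OpNo = U.getOperandNo();
  SDNode *User = U.getUser();
  // ... inspect how operand OpNo of User refers to N ...
}
```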
Create signed constants using getSignedConstant(), to avoid future
assertion failures when we disable implicit truncation in getConstant().
This also touches some generic legalization code, which apparently only
AMDGPU tests.
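For example, a combine that builds an all-ones value would now spell it as (illustrative):
```
// Before: DAG.getConstant(-1, DL, VT) relies on implicit truncation of the
// sign-extended value. After: explicitly signed construction.
SDValue AllOnes = DAG.getSignedConstant(-1, DL, VT);
```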
Use a local pointer type to represent named barriers in the builtin and
intrinsic. This makes the definitions more user-friendly because users
do not need to worry about the hardware ID assignment. This approach is
also more in line with other popular GPU programming languages.
Named barriers should be represented as global variables in addrspace(3)
in LLVM IR. The compiler assigns the special LDS offsets for those
variables during the AMDGPULowerModuleLDS pass. Those addresses are
converted to hardware barrier IDs during instruction selection. The rest
of the instruction-selection changes are primarily due to the
intrinsic-definition changes.
Porting to TTI provides direct access to the instruction cost model,
which can enable instruction-cost-based sinking without introducing code
duplication.