The correct behaviour is to insert a readfirstlane. SelectionDAG was
already doing this in some cases, but not in the general case for chain
calls. GlobalISel was already doing this for return values but not for
arguments.
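As a minimal sketch at the IR level (using an ordinary `amdgpu_gfx` call
rather than an actual chain call), this is the kind of uniformization that
has to be inserted when a divergent value feeds an `inreg` (SGPR) argument:

```llvm
; Hypothetical example. %div lives in a VGPR, but the inreg parameter must
; be passed in an SGPR, so a readfirstlane (or its MIR equivalent) is needed.
declare amdgpu_gfx void @callee(i32 inreg)
declare i32 @llvm.amdgcn.readfirstlane.i32(i32)

define amdgpu_gfx void @caller(i32 %div) {
  %uni = call i32 @llvm.amdgcn.readfirstlane.i32(i32 %div)
  call amdgpu_gfx void @callee(i32 inreg %uni)
  ret void
}
```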
This only changes `llvm/lib/Target/AMDGPU/SIISelLowering.cpp`.
There are five uses of `std::tie` remaining because they can't be
replaced with C++17 structured bindings.
This allows us to emit wide generic and scratch memory accesses when we
do not have alignment information. In cases where accesses happen to be
properly aligned or where generic accesses do not go to scratch memory,
this improves performance of the generated code by a factor of up to 16x
and reduces code size, especially when lowering memcpy and memmove
intrinsics.
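For illustration, a hypothetical unaligned copy that benefits from this
(the exact access widths chosen depend on the subtarget features):

```llvm
; With unaligned scratch/generic access enabled, this align-1 memcpy can be
; lowered to a few wide flat loads/stores instead of 16 byte-sized accesses.
declare void @llvm.memcpy.p0.p0.i64(ptr, ptr, i64, i1)

define void @copy16(ptr %dst, ptr %src) {
  call void @llvm.memcpy.p0.p0.i64(ptr align 1 %dst, ptr align 1 %src, i64 16, i1 false)
  ret void
}
```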
Also, make the use of the FeatureUnalignedScratchAccess feature more
consistent: FeatureUnalignedScratchAccess and EnableFlatScratch are now
orthogonal, whereas before, some places in the code assumed that the
latter implies the former.
Part of SWDEV-455845.
Adds hidden kernel arguments to the function signature and marks them
inreg if they should be preloaded into user SGPRs. The normal kernarg
preloading logic then takes over with some additional checks for the
correct implicitarg_ptr alignment.
Special care is needed so that metadata for the hidden arguments is not
added twice when generating the code object.
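For illustration, a hypothetical kernel that reads a hidden argument
through the implicit argument pointer; with this change such loads can be
served from preloaded user SGPRs instead of the kernarg segment:

```llvm
declare ptr addrspace(4) @llvm.amdgcn.implicitarg.ptr()

define amdgpu_kernel void @kernel(ptr addrspace(1) %out) {
  ; Load one of the hidden kernel arguments (offset 0 from implicitarg_ptr).
  %implicitarg = call ptr addrspace(4) @llvm.amdgcn.implicitarg.ptr()
  %hidden0 = load i32, ptr addrspace(4) %implicitarg, align 4
  store i32 %hidden0, ptr addrspace(1) %out
  ret void
}
```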
If we have a divergent value passed to an outgoing inreg argument,
the call needs to be executed in a waterfall loop and thus cannot
be tail called.
The waterfall handling of arbitrary calls is broken on the SelectionDAG
path, so some of these cases still hit an error later.
I also noticed that the argument evaluation code in
isEligibleForTailCallOptimization does not correctly account for
implicit argument assignments. It also seems that inreg codegen is
generally broken; we are assigning arguments to the reserved private
resource descriptor.
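A hypothetical IR-level example of the situation:

```llvm
; %div is divergent but feeds an inreg (SGPR) parameter, so the call must be
; executed in a waterfall loop and the 'tail' marker cannot be honoured as a
; real tail call.
declare amdgpu_gfx void @ext(i32 inreg)

define amdgpu_gfx void @caller(i32 %div) {
  tail call amdgpu_gfx void @ext(i32 inreg %div)
  ret void
}
```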
We allow tail calls of known uniform function pointers. This
would produce a verifier error if the uniform value is in VGPRs.
Insert readfirstlanes just in case this occurs; they will fold out
later if unnecessary.
GlobalISel likely needs a similar fix, but it currently does not
attempt tail calls of indirect calls.
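A hypothetical shape of the case being handled:

```llvm
; The callee pointer is loaded from constant memory via a uniform address, so
; it is known uniform, but it may still have been assigned to VGPRs. A
; readfirstlane is inserted defensively and folds away if it is not needed.
define amdgpu_gfx void @caller(ptr addrspace(4) inreg %table) {
  %fptr = load ptr, ptr addrspace(4) %table, align 8
  tail call amdgpu_gfx void %fptr()
  ret void
}
```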
Fixes #107447
Fixes a subissue of #110930
For register constraints that require specific register ranges, the
width of the range should match the type of the associated
parameter/return value. With this PR, we error out when that is not the
case. Previously, these cases would hit assertions or llvm_unreachables.
The handling of register constraints that require only a single register
remains more lenient to allow narrower non-vector types for the
associated IR values. For example, constraining an i16 or i8 value to a
32-bit register is still allowed.
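For example (hypothetical snippets; `{s[0:1]}` names a 64-bit pair of SGPRs):

```llvm
; Accepted: the two-register range matches the 64-bit result type.
define i64 @ok() {
  %v = call i64 asm "s_mov_b64 $0, 0", "={s[0:1]}"()
  ret i64 %v
}

; Now rejected with a proper error instead of an assertion: a two-register
; range constraining a 32-bit value.
define i32 @bad() {
  %v = call i32 asm "s_mov_b64 $0, 0", "={s[0:1]}"()
  ret i32 %v
}
```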
Fixes #101190.
---------
Co-authored-by: Matt Arsenault <arsenm2@gmail.com>
Previously we would fail an assertion in RemoveNodeFromCSEMaps after
lowering:
  t3: ch = llvm.amdgcn.s.barrier.join t0, TargetConstant:i64<2973>, Constant:i32<0>
to:
  t6: ch = S_BARRIER_JOIN_IMM TargetConstant:i32<0>
Fix a bug which resulted in selection of s_load_b96 on GFX11, which only
exists in GFX12.
The root cause was a mismatch between legalization and selection. The
condition used to check that the load was uniform in legalization
(SITargetLowering::LowerLOAD) was "!Op->isDivergent()". The condition
used to detect a non-uniform load during selection
(AMDGPUDAGToDAGISel::isUniformLoad()) was
"N->isDivergent() && !AMDGPUInstrInfo::isUniformMMO(MMO)". This makes a
difference when IR uniformity analysis has more information than SDAG's
built-in analysis. In the test case this is because IR UA reports that
everything is uniform if isSingleLaneExecution() returns true, e.g. if
the specified max flat workgroup size is 1, but SDAG does not have this
optimization.
The immediate fix is to use the same condition to detect uniform loads
in legalization and selection. In the future, SDAG should learn about
isSingleLaneExecution(), and then it could probably stop relying on IR
metadata to detect uniform loads.
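A sketch of the kind of reduced case described above (hypothetical, not
the actual test):

```llvm
; With a maximum flat workgroup size of 1, IR UA marks everything uniform
; (isSingleLaneExecution), so the load's MMO says uniform, while SDAG's own
; divergence analysis still considers the workitem-id-based address divergent.
; Legalization and selection then disagreed about this 96-bit load.
define amdgpu_kernel void @single_lane(ptr addrspace(1) %out, ptr addrspace(1) %in) #0 {
  %id = call i32 @llvm.amdgcn.workitem.id.x()
  %gep = getelementptr <3 x i32>, ptr addrspace(1) %in, i32 %id
  %v = load <3 x i32>, ptr addrspace(1) %gep, align 4
  store <3 x i32> %v, ptr addrspace(1) %out
  ret void
}

declare i32 @llvm.amdgcn.workitem.id.x()

attributes #0 = { "amdgpu-flat-work-group-size"="1,1" }
```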
Always generate v_cndmask_b32 instead of modifying exec around
v_mov_b32. This is expected to be faster because
modifying exec generally causes pipeline stalls.
Use GCNPat instead of custom lowering to select instructions for the
intrinsic llvm.fptrunc.round. "SupportedRoundMode : TImmLeaf" is used as
a predicate to select only when the rounding mode is supported.
"as_hw_round_mode : SDNodeXForm" is introduced to translate the rounding
modes to the corresponding ones that the hardware recognizes.
The generic subtarget has neither of these features. Rather than forcing
HasMovrel on, it is simpler to expand dynamic vector indexing to a
sequence of compare/select instructions.
NFC for real subtargets.
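For example, on the generic subtarget a dynamically indexed extract like
the following (hypothetical) is now expanded to compares and selects:

```llvm
define float @extract(<4 x float> %v, i32 %idx) {
  %elt = extractelement <4 x float> %v, i32 %idx
  ret float %elt
}
```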
Simplify by using KnownBits::add.
Fix the GlobalISel path, which was ignoring the known bits of src1.
Improve the analysis of mbcnt.hi, which adds at most 31 even in wave64.
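A hypothetical example of the kind of fold this enables:

```llvm
; mbcnt.lo yields at most 32 here and mbcnt.hi adds at most 31, so %id is
; known to be < 64 and the mask below can be folded away even in wave64.
declare i32 @llvm.amdgcn.mbcnt.lo(i32, i32)
declare i32 @llvm.amdgcn.mbcnt.hi(i32, i32)

define i32 @lane_id_masked() {
  %lo = call i32 @llvm.amdgcn.mbcnt.lo(i32 -1, i32 0)
  %id = call i32 @llvm.amdgcn.mbcnt.hi(i32 -1, i32 %lo)
  %masked = and i32 %id, 63
  ret i32 %masked
}
```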
This work simplifies and generalizes the instruction definition for
intrinsic llvm.fptrunc.round. We no longer name the instruction with the
rounding mode. Instead, we introduce an immediate operand for the
rounding mode for the pseudo instruction. This immediate will be used to
set up the hardware mode register at the time the real instruction is
generated. We name the pseudo instruction as FPTRUNC_ROUND_F16_F32 (for
f32 -> f16), which is easy to generalize for other types.
"round.towardzero" and "round.tonearest" are added for f32 -> f16
truncating, in addition to the existing "round.upward" and
"round.downward". Other rounding modes are not supported by hardware at
this moment.
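Example usage of the newly supported modes (hypothetical snippet):

```llvm
declare half @llvm.fptrunc.round.f16.f32(float, metadata)

define half @trunc_towardzero(float %x) {
  %r = call half @llvm.fptrunc.round.f16.f32(float %x, metadata !"round.towardzero")
  ret half %r
}

define half @trunc_tonearest(float %x) {
  %r = call half @llvm.fptrunc.round.f16.f32(float %x, metadata !"round.tonearest")
  ret half %r
}
```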
This is the most complex atomicrmw support case. Note we don't have
accurate remarks for all of the cases, which I'm planning on fixing
in a later change with more precise wording.
Continue respecting amdgpu-unsafe-fp-atomics until its eventual removal.
This also seems to fix a few cases where amdgpu-unsafe-fp-atomics was not
being interpreted as aggressively as it should be.
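For reference, a hypothetical example of the kind of FP atomic this
covers, with the metadata the expansion can consult (alongside
amdgpu-unsafe-fp-atomics for now):

```llvm
define float @global_fadd(ptr addrspace(1) %ptr, float %val) {
  %old = atomicrmw fadd ptr addrspace(1) %ptr, float %val syncscope("agent") monotonic, !amdgpu.no.fine.grained.memory !0, !amdgpu.no.remote.memory !0
  ret float %old
}

!0 = !{}
```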