llvm-project

Author	SHA1	Message	Date
Shao-Ce SUN	662b9fa02c	[NFC][CodeGen] Add a setTargetDAGCombine use ArrayRef Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D122557	2022-03-29 09:53:24 +08:00
Stanislav Mekhanoshin	6e3e14f600	[AMDGPU] Support gfx940 smfmac instructions Differential Revision: https://reviews.llvm.org/D122191	2022-03-24 12:40:42 -07:00
Jon Chesterfield	bcbd4cf1f2	Revert "[amdgpu][nfc] Pass function instead of module to allocateModuleLDSGlobal" Reconsidered, better to handle per-function state in the constructor as before. This reverts commit 98e474c1b3210d90e313457bf6a6e39a7edb4d2b.	2022-03-20 00:58:26 +00:00
Jon Chesterfield	98e474c1b3	[amdgpu][nfc] Pass function instead of module to allocateModuleLDSGlobal	2022-03-19 16:42:17 +00:00
Changpeng Fang	dd5895cc39	AMDGPU: Use the implicit kernargs for code object version 5 Summary: Specifically, for trap handling, for targets that do not support getDoorbellID, we load the queue_ptr from the implicit kernarg, and move queue_ptr to s[0:1]. To get aperture bases when targets do not have aperture registers, we load private_base or shared_base directly from the implicit kernarg. In clang, we use implicitarg_ptr + offsets to implement __builtin_amdgcn_workgroup_size_{xyz}. Reviewers: arsenm, sameerds, yaxunl Differential Revision: https://reviews.llvm.org/D120265	2022-03-17 14:12:36 -07:00
Jay Foad	313f306b26	[AMDGPU] Stop using getMinimalPhysRegClass in LowerFormalArguments NFCI. The motivation for this is avoid problems in future if we add new classes containing only a subset of all VGPRs, or a subset of all SGPRs. getMinimalPhysRegClass would favour these smaller classes, which is not what we want here. Differential Revision: https://reviews.llvm.org/D121914	2022-03-17 15:19:17 +00:00
Shengchen Kan	37b378386e	[NFC][CodeGen] Rename some functions in MachineInstr.h and remove duplicated comments	2022-03-16 20:25:42 +08:00
serge-sans-paille	989f1c72e0	Cleanup codegen includes This is a (fixed) recommit of https://reviews.llvm.org/D121169 after: 1061034926 before: 1063332844 Discourse thread: https://discourse.llvm.org/t/include-what-you-use-include-cleanup Differential Revision: https://reviews.llvm.org/D121681	2022-03-16 08:43:00 +01:00
Stanislav Mekhanoshin	1f53f20fc1	[AMDGPU] Support gfx940 v_lshl_add_u64 instruction Differential Revision: https://reviews.llvm.org/D121401	2022-03-14 15:45:42 -07:00
Nico Weber	a278250b0f	Revert "Cleanup codegen includes" This reverts commit 7f230feeeac8a67b335f52bd2e900a05c6098f20. Breaks CodeGenCUDA/link-device-bitcode.cu in check-clang, and many LLVM tests, see comments on https://reviews.llvm.org/D121169	2022-03-10 07:59:22 -05:00
serge-sans-paille	7f230feeea	Cleanup codegen includes after: 1061034926 before: 1063332844 Differential Revision: https://reviews.llvm.org/D121169	2022-03-10 10:00:30 +01:00
Changpeng Fang	0f20a35b9e	AMDGPU: Set up User SGPRs for queue_ptr only when necessary Summary: In general, we need queue_ptr for aperture bases and trap handling, and user SGPRs have to be set up to hold queue_ptr. In current implementation, user SGPRs are set up unnecessarily for some cases. If the target has aperture registers, queue_ptr is not needed to reference aperture bases. For trap handling, if target supports getDoorbellID, queue_ptr is also not necessary. Further, code object version 5 introduces new kernel ABI which passes queue_ptr as an implicit kernel argument, so user SGPRs are no longer necessary for queue_ptr. Based on the trap handling document: https://llvm.org/docs/AMDGPUUsage.html#amdgpu-trap-handler-for-amdhsa-os-v4-onwards-table, llvm.debugtrap does not need queue_ptr, we remove queue_ptr support for llvm.debugtrap in the backend. Reviewers: sameerds, arsenm Fixes: SWDEV-307189 Differential Revision: https://reviews.llvm.org/D119762	2022-03-09 10:14:05 -08:00
Venkata Ramanaiah Nalamothu	04fff547e2	[AMDGPU] Move call clobbered return address registers s[30:31] to callee saved range Currently the return address ABI registers s[30:31], which fall in the call clobbered register range, are added as a live-in on the function entry to preserve its value when we have calls so that it gets saved and restored around the calls. But the DWARF unwind information (CFI) needs to track where the return address resides in a frame and the above approach makes it difficult to track the return address when the CFI information is emitted during the frame lowering, due to the involvement of understanding the control flow. This patch moves the return address ABI registers s[30:31] into callee saved registers range and stops adding live-in for return address registers, so that the CFI machinery will know where the return address resides when CSR save/restore happen during the frame lowering. And doing the above poses an issue that now the return instruction uses undefined register `sgpr30_sgpr31`. This is resolved by hiding the return address register use by the return instruction through the `SI_RETURN` pseudo instruction, which doesn't take any input operands, until the `SI_RETURN` pseudo gets lowered to the `S_SETPC_B64_return` during the `expandPostRAPseudo()`. As an added benefit, this patch simplifies overall return instruction handling. Note: The AMDGPU CFI changes are there only in the downstream code and another version of this patch will be posted for review for the downstream code. Reviewed By: arsenm, ronlieb Differential Revision: https://reviews.llvm.org/D114652	2022-03-09 12:18:02 +05:30
Stanislav Mekhanoshin	932f628121	[AMDGPU] new gfx940 fp atomics Differential Revision: https://reviews.llvm.org/D121028	2022-03-07 12:32:02 -08:00
Sebastian Neubauer	6527b2a4d5	[AMDGPU][NFC] Fix typos Fix some typos in the amdgpu backend. Differential Revision: https://reviews.llvm.org/D119235	2022-02-18 15:05:21 +01:00
Jay Foad	f72d8897ac	[AMDGPU] Honor !invariant.load metadata on load-like intrinsics Differential Revision: https://reviews.llvm.org/D119739	2022-02-15 09:16:57 +00:00
Julien Pages	dcb2da13f1	[AMDGPU] Add a new intrinsic to control fp_trunc rounding mode Add a new llvm.fptrunc.round intrinsic to precisely control the rounding mode when converting from f32 to f16. Differential Revision: https://reviews.llvm.org/D110579	2022-02-11 12:08:23 -05:00
Jay Foad	ddd3807e69	[AMDGPU] Use new target MMO flag MONoClobber This allows us to set the noclobber flag on (the MMO of) a load instruction instead of on the pointer. This fixes a bug where noclobber was being applied to all loads from the same pointer, even if some of them were clobbered. Differential Revision: https://reviews.llvm.org/D118775	2022-02-02 17:12:36 +00:00
Changpeng Fang	1194b9cdda	AMDGPU {NFC}: Add code object v5 support and generate metadata for implicit kernel args Summary: Add code object v5 support (default is still v4) Generate metadata for implicit kernel args for the new ABI Set the metadata version to be 1.2 Reviewers: t-tye, b-sumner, arsenm, and bcahoon Fixes: SWDEV-307188, SWDEV-307189 Differential Revision: https://reviews.llvm.org/D118272	2022-01-31 18:07:47 -08:00
Matt Arsenault	33b45ee44b	AMDGPU: Handle addrspacecast of constant 32-bit to flat I accidentally made this work on the GlobalISel path, and there's no real reason not to handle this.	2022-01-27 11:01:44 -05:00
alex-t	5157f984ae	[AMDGPU] Enable divergence-driven XNOR selection Currently not (xor_one_use) pattern is always selected to S_XNOR irrespective of the node divergence. This relies on further custom selection pass which converts to VALU if necessary and replaces with V_NOT_B32 ( V_XOR_B32) on those targets which have no V_XNOR. Current change enables the patterns which explicitly select the not (xor_one_use) to appropriate form. We assume that xor (not) is already turned into the not (xor) by the combiner. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D116270	2022-01-26 15:33:10 +03:00
Stanislav Mekhanoshin	bb1fe36977	[AMDGPU] Make v8i16/v8f16 legal Differential Revision: https://reviews.llvm.org/D117721	2022-01-24 11:51:08 -08:00
Sebastian Neubauer	ae2f9c8be8	[AMDGPU] Remove lz and nomip combine from codegen These combines have been moved into the IR combiner in D116042. Differential Revision: https://reviews.llvm.org/D116116	2022-01-21 12:09:08 +01:00
Sebastian Neubauer	0530fdbbbb	[AMDGPU] Fix LOD bias in A16 combine As the codegen fix in D111754, the LOD bias needs to be converted to 16 bits. Fix this in the combine. Differential Revision: https://reviews.llvm.org/D116038	2022-01-21 12:09:06 +01:00
Matt Arsenault	82de129ab8	AMDGPU: Remove llvm.amdgcn.alignbit and handle bitcode upgrade to fshr	2022-01-18 14:08:36 -05:00
Matt Arsenault	dc2457c8cf	AMDGPU: Fix crashing on calls to C functions from graphics contexts If we had one of the shader calling conventions calling a default calling convention callee, this would crash when the caller did not have anything to pass to the workitem ID. This is illegal, but we still need to produce something sensible. llvm-reduce likes to replace calls to intrinsics with calls to null or undef, so this does appear and is helpful to avoid hard erroring. Pass undef in this case, as already happened for the other implicit arguments. It might make sense to define the behavior here and pass null for the pointers, and -1 for the workitem ID. We do have extra bits in the workitem ID, so this wouldn't conflict with a valid value.	2022-01-17 10:13:25 -05:00
Stanislav Mekhanoshin	fc6af7e188	[AMDGPU] Fix error handling in asm constraint syntax I believe this is unexploitable because in either case the result will be 'couldn't allocate register for constraint' error message, but error code checking is clearly wrong. Differential Revision: https://reviews.llvm.org/D117189	2022-01-13 09:33:50 -08:00
Matt Arsenault	59994c25f9	AMDGPU: Select workitem ID intrinsics to 0 with req_work_group_size Shockingly we weren't doing this already. We should probably have this be done earlier in the IR too, but it's still helpful to have the lowering guarantee it so that we can modify the ABI implicit inputs based on it.	2022-01-13 12:08:18 -05:00
Matt Arsenault	a6f49423c1	AMDGPU: Optimize outgoing workitem ID based on reqd_work_group_size If we know we aren't using a component from the kernel, we can save a few bit packing instructions. We're still enabling the VGPR input to the kernel though.	2022-01-13 12:08:18 -05:00
Stanislav Mekhanoshin	d043822daa	[AMDGPU] Fixed physreg asm constraint parsing We are always failing parsing of the physreg constraint because we do not drop trailing brace, thus getAsInteger() returns a non-empty string and we delegate reparsing to the TargetLowering. In addition it did not parse register tuples. Fixed which has allowed to remove w/a in two places we call it. Differential Revision: https://reviews.llvm.org/D117055	2022-01-12 16:37:08 -08:00
Austin Kerbow	8470bf2b08	[AMDGPU] Do not reserve any VGPR for SGPR spills After the split register allocation changes in eebe841a47cb it is no longer necessary to reserve a VGPR before RA. This can also create bugs when IPRA is enabled since we cannot predict that a called function may not reserve any register if it does not have any SGPR spills. If that happens those functions may override reserved registers that are normally callee saved. Added a test to show this. Fixes: SWDEV-309900 Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D115551	2022-01-11 22:14:59 -08:00
David Salinas	c0581f7df6	Revert D109159 : Revert "[amdgpu] Enable selection of `s_cselect_b64`." This reverts commit 640beb38e7710b939b3cfb3f4c54accc694b1d30. That commit caused performance degradation in Quicksilver test QS:sGPU and a functional test failure in (rocPRIM rocprim.device_segmented_radix_sort). Reverting until we have a better solution to s_cselect_b64 codegen cleanup Change-Id: Ifc167b3c2dae7a65920676f22a97ba76485f3456 Reviewed By: kzhuravl Differential Revision: https://reviews.llvm.org/D116686 Change-Id: I1abf49b74a7e2ba0e0205f747a4154a468b9d7f2	2022-01-11 21:14:09 +00:00
Matt Arsenault	68468bbe15	AMDGPU: Avoid null check during addrspacecast lowering If we know the source is a valid object, we do not need to insert a null check. This misses a lot of opportunities from metadata/attributes not tracked in codegen.	2022-01-10 13:27:39 -05:00
Kazu Hirata	2aed08131d	[llvm] Use true/false instead of 1/0 (NFC) Identified with modernize-use-bool-literals.	2022-01-07 00:39:14 -08:00
Nico Weber	085f078307	Revert "Revert D109159 "[amdgpu] Enable selection of `s_cselect_b64`."" This reverts commit 859ebca744e634dcc89a2294ffa41574f947bd62. The change contained many unrelated changes and e.g. restored unit test failures for the old lld port.	2022-01-05 13:10:25 -05:00
David Salinas	859ebca744	Revert D109159 "[amdgpu] Enable selection of `s_cselect_b64`." This reverts commit 640beb38e7710b939b3cfb3f4c54accc694b1d30. That commit caused performance degradation in Quicksilver test QS:sGPU and a functional test failure in (rocPRIM rocprim.device_segmented_radix_sort). Reverting until we have a better solution to s_cselect_b64 codegen cleanup Change-Id: Ibf8e397df94001f248fba609f072088a46abae08 Reviewed By: kzhuravl Differential Revision: https://reviews.llvm.org/D115960 Change-Id: Id169459ce4dfffa857d5645a0af50b0063ce1105	2022-01-05 17:57:32 +00:00
Ron Lieberman	09b53296cf	Revert "[AMDGPU] Move call clobbered return address registers s[30:31] to callee saved range" This reverts commit 9075009d1fd5f2bf9aa6c2f362d2993691a316b3. Failed amdgpu runtime buildbot # 3514	2021-12-22 11:39:28 -05:00
RamNalamothu	9075009d1f	[AMDGPU] Move call clobbered return address registers s[30:31] to callee saved range Currently the return address ABI registers s[30:31], which fall in the call clobbered register range, are added as a live-in on the function entry to preserve its value when we have calls so that it gets saved and restored around the calls. But the DWARF unwind information (CFI) needs to track where the return address resides in a frame and the above approach makes it difficult to track the return address when the CFI information is emitted during the frame lowering, due to the involvement of understanding the control flow. This patch moves the return address ABI registers s[30:31] into callee saved registers range and stops adding live-in for return address registers, so that the CFI machinery will know where the return address resides when CSR save/restore happen during the frame lowering. And doing the above poses an issue that now the return instruction uses undefined register `sgpr30_sgpr31`. This is resolved by hiding the return address register use by the return instruction through the `SI_RETURN` pseudo instruction, which doesn't take any input operands, until the `SI_RETURN` pseudo gets lowered to the `S_SETPC_B64_return` during the `expandPostRAPseudo()`. As an added benefit, this patch simplifies overall return instruction handling. Note: The AMDGPU CFI changes are there only in the downstream code and another version of this patch will be posted for review for the downstream code. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D114652	2021-12-22 20:51:12 +05:30
rkorsa	c680fb69d6	[AMDGPU] Fixes in ISelDAG path and GlobalISel path for 'bias' operand with A16 bit on The LOD bias operand is of type 'half' when the A16-bit is ON for MIMG instructions. 'bias' is only 16-bit but occupies 32-bits with upper 16-bits containing junk. The patch fixes both the paths (ISelDAG and GlobalISel) for proper encoding of LOD bias operand. Differential Revision: https://reviews.llvm.org/D111754	2021-12-17 16:11:51 +05:30
Neubauer, Sebastian	26924b57e8	[AMDGPU] Ignore special ABI registers for graphics Fixed ABI arguments are compute specific and should not be added to graphics shaders or functions, so do not try to add them. Differential Revision: https://reviews.llvm.org/D115344	2021-12-13 16:44:37 +01:00
Matt Arsenault	06b90175e7	AMDGPU: Remove fixed function ABI option	2021-12-10 19:41:19 -05:00
Michael Liao	53fc971a4b	Fix `-Wunused-variable` warning. NFC.	2021-12-04 23:34:55 -05:00
Jay Foad	2774bad112	[AMDGPU] Change llvm.amdgcn.image.bvh.intersect.ray to take vec3 args The ray_origin, ray_dir and ray_inv_dir arguments should all be vec3 to match how the hardware instruction works. Don't change the API of the corresponding OpenCL builtins. Differential Revision: https://reviews.llvm.org/D115032	2021-12-04 10:32:11 +00:00
Petar Avramovic	ab01f4d264	AMDGPU/GlobalISel: Do not fcanonicalize const splat padded with undef Recognize constant splat padded with undef in isCanonicalized. Fcanonicalize will be removed by RemoveFcanonicalize in post-legalizer combiner. We will treat undef as value that will result in a splat in clamp combine after regbankselect. Differential Revision: https://reviews.llvm.org/D104408	2021-12-03 12:49:38 +01:00
Daniil Fukalov	ab05ab59a7	[CostModel][AMDGPU] Fix instructions costs estimation for vector types. 1. Fixed vector instructions costs estimations inconsistency - removed different logic for "not simple types" since it biases costs for these types. 2. Fixed legalization penalty for vectors too big for the target: changed from overwrite default legalization cost value estimation to added penalty. 3. Fixed few typos in tests. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D114893	2021-12-03 03:08:08 +03:00
Mirko Brkusanin	8951136216	[AMDGPU][GlobalISel] Transform (fadd (fpext (fmul x, y)), z) -> (fma (fpext x), (fpext y), z) Patch by: Mateja Marjanovic Differential Revision: https://reviews.llvm.org/D97937	2021-11-29 16:27:21 +01:00
Mirko Brkusanin	881840fc26	[AMDGPU][GlobalISel] Transform (fadd (fmul x, y), z) -> (fma x, y, z) Patch by: Mateja Marjanovic Differential Revision: https://reviews.llvm.org/D93305	2021-11-29 16:27:21 +01:00
Kazu Hirata	387927bbaf	[Target] Use range-based for loops (NFC)	2021-11-26 21:21:17 -08:00
Christudasan Devadasan	654c89d85a	[AMDGPU] Make vector superclasses allocatable The combined vector register classes with both VGPRs and AGPRs are currently unallocatable. This patch turns them into allocatable as a prerequisite to enable copy between VGPR and AGPR registers during regalloc. Also, added the missing AV register classes from 192b to 1024b. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D109300	2021-11-26 00:42:12 -05:00
Jay Foad	d7e03df719	[AMDGPU] Implement widening multiplies with v_mad_i64_i32/v_mad_u64_u32 Select SelectionDAG ops smul_lohi/umul_lohi to v_mad_i64_i32/v_mad_u64_u32 respectively, with an addend of 0. v_mul_lo, v_mul_hi and v_mad_i64/u64 are all quarter-rate instructions so it is better to use one instruction than two. Further improvements are possible to make better use of the addend operand, but this is already a strict improvement over what we have now. Differential Revision: https://reviews.llvm.org/D113986	2021-11-24 11:25:02 +00:00

1022 Commits