llvm-project

Author	SHA1	Message	Date
Matt Arsenault	b5bc205d75	AMDGPU: Convert some bit operation tests to opaque pointers	2022-11-29 18:36:53 -05:00
Jay Foad	342642dc75	[AMDGPU][GISel] Smaller code for scalar 32 to 64-bit extensions Differential Revision: https://reviews.llvm.org/D107639	2022-11-16 06:57:21 +00:00
Carl Ritson	4c4db81630	[AMDGPU] Extend SILoadStoreOptimizer to s_load instructions Apply merging to s_load as is done for s_buffer_load. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D130742	2022-07-30 11:38:39 +09:00
Jay Foad	3eb2281bc0	[AMDGPU] Aggressively fold immediates in SIFoldOperands Previously SIFoldOperands::foldInstOperand would only fold a non-inlinable immediate into a single user, so as not to increase code size by adding the same 32-bit literal operand to many instructions. This patch removes that restriction, so that a non-inlinable immediate will be folded into any number of users. The rationale is: - It reduces the number of registers used for holding constant values, which might increase occupancy. (On the other hand, many of these registers are SGPRs which no longer affect occupancy on GFX10+.) - It reduces ALU stalls between the instruction that loads a constant into a register, and the instruction that uses it. - The above benefits are expected to outweigh any increase in code size. Differential Revision: https://reviews.llvm.org/D114643	2022-05-18 10:19:35 +01:00
Austin Kerbow	da067ed569	[AMDGPU] Set most sched model resource's BufferSize to one Using a BufferSize of one for memory ProcResources will result in better ILP since it more accurately models the dependencies between memory ops and their consumers on an in-order processor. After this change, the scheduler will treat the data edges from loads as blocking so that stalls are guaranteed when waiting for data to be retrieved from memory. Since we don't actually track waitcnt here, this should do a better job at modeling their behavior. Practically, this means that the scheduler will trigger the 'STALL' heuristic more often. This type of change needs to be evaluated experimentally. Preliminary results are positive. Fixes: SWDEV-282962 Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D114777	2021-12-01 22:31:28 -08:00
Jay Foad	57b9107e3f	[GlobalISel] Improve widening of cttz/cttz_zero_undef Differential Revision: https://reviews.llvm.org/D107631	2021-08-06 14:25:56 +01:00
Jay Foad	83610d4eb0	[AMDGPU][GlobalISel] Better legalization of 32-bit ctlz/cttz Differential Revision: https://reviews.llvm.org/D107474	2021-08-06 09:40:48 +01:00
Jay Foad	24b67a9024	[AMDGPU][GlobalISel] Improve regbankselect for 64-bit VGPR ctlz_zero_undef/cttz_zero_undef We can improve on the generic splitting by using ffbh/ffbl, which have a defined result when the input is zero. Differential Revision: https://reviews.llvm.org/D107442	2021-08-06 09:40:48 +01:00
Jay Foad	2b63933115	[AMDGPU][SDag] Better lowering for 32-bit ctlz/cttz Differential Revision: https://reviews.llvm.org/D107566	2021-08-05 15:57:40 +01:00
Jay Foad	e6c364a624	[AMDGPU][SDag] Better lowering for 64-bit ctlz/cttz Differential Revision: https://reviews.llvm.org/D107546	2021-08-05 15:57:40 +01:00
Jay Foad	ba5c4ac600	[AMDGPU] Add cttz tests and globalisel checks for ctlz	2021-08-04 15:57:14 +01:00

11 Commits