llvm-project

Author	SHA1	Message	Date
Matt Arsenault	f7c3627338	DAG: Implement promotion for strict_fpextend (#74310 ) Test is a placeholder, will be merged into the existing test after additional bugs for illegal f16 targets are fixed.	2023-12-22 17:15:52 +07:00
Matt Arsenault	c7952d8860	AMDGPU: Add a few more bfloat codegen tests	2023-12-22 12:31:42 +07:00
Matt Arsenault	50ed3b1ecc	AMDGPU: Workaround a divergent return value bug in test	2023-12-22 12:31:42 +07:00
Jay Foad	8fdfd34cd2	[AMDGPU] Remove GDS and GWS for GFX12 (#76148 )	2023-12-21 15:27:08 +00:00
Matt Arsenault	b01adc6bed	AMDGPU: Strengthen some bfloat tests Fix bitcast test, which was splitting apart phis intended to force bitcasts that survive all the way to selection. Disable the amdgpu-codegenprepare phi splitting, which defeats the technique of using a phi to ensure a bitcast reaches all the way to selection. Also add a variety of bfloat tests. These probably need revisiting to avoid the cast folding into argument loads. Also round out set of bfloat bitcast and ABI tests. Add codegen tests for more bf16 operations The promotion of these works contrary to the comment.	2023-12-20 19:33:45 +07:00
Matt Arsenault	9e574a3936	DAG: Fix expansion of bf16 sourced extloads Also fix assorted vector extload failures for AMDGPU.	2023-12-20 19:24:27 +07:00
Mariusz Sikora	9a41a80e76	[AMDGPU] Handle object size and bail if assume-like intrinsic is used in PromoteAllocaToVector (#68744 ) The attached test crashes without this change. We should not remove an isAssumeLikeIntrinsic instruction if it is used by another instruction.	2023-12-20 07:47:49 +01:00
Jeffrey Byrnes	f1156fb622	[AMDGPU][IGLP]: Add SchedGroupMask::TRANS (#75416 ) Makes constructing SchedGroups of this type easier, and provides ability to create them with __builtin_amdgcn_sched_group_barrier	2023-12-19 16:54:18 -08:00
Matt Arsenault	1196975286	AMDGPU: Add gfx11 run line to bf16 test	2023-12-19 17:12:52 +07:00
Mariusz Sikora	a018c8cdbb	GFX12: Add LoopDataPrefetchPass (#75625 ) It is currently disabled by default. It will need experiments on a real HW to tune and decide on the profitability. --------- Co-authored-by: Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>	2023-12-19 08:32:16 +01:00
James Y Knight	137f785fa6	[AMDGPU] Set MaxAtomicSizeInBitsSupported. (#75185 ) This will result in larger atomic operations getting expanded to `__atomic_*` libcalls via AtomicExpandPass, which matches what Clang already does in the frontend. While AMDGPU currently disables the use of all libcalls, I've changed it to instead disable all of them _except_ the atomic ones. Those are already emitted by the Clang frontend, and enabling them in the backend allows the same behavior there.	2023-12-18 16:51:06 -05:00
Simon Pilgrim	7b1e4239b3	[DAG] Fold (vt trunc (extload (vt x))) -> (vt load x) (#75229 ) We were only folding cases which remained extloads, but DAG.getExtLoad can also handle the cases which don't need to extend at all (we just can't do truncloads). reduceLoadWidth can handle this for scalar loads, but not for vectors. Noticed while triaging D152928	2023-12-18 16:21:11 +00:00
Carl Ritson	5139299618	[AMDGPU] Track physical VGPRs used for SGPR spills (#75573 ) Physical VGPRs used for SGPR spills need to be tracked independent of WWM reserved registers. The WWM reserved set contains extra registers allocated during WWM pre-allocation pass. This causes SGPR spills allocated after WWM pre-allocation to overlap with WWM register usage, e.g. if frame pointer is spilt during prologue/epilog insertion.	2023-12-17 16:44:16 +09:00
Mariusz Sikora	414d27419f	[AMDGPU] GFX12: select @llvm.prefetch intrinsic (#74576 ) Co-authored-by: Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>	2023-12-15 17:15:55 +01:00
Jessica Del	32f9983c06	[AMDGPU] - Add address space for strided buffers (#74471 ) This is an experimental address space for strided buffers. These buffers can have structs as elements and a stride > 1. These pointers allow the indexed access in units of stride, i.e., they point at `buffer[index * stride]`. Thus, we can use the `idxen` modifier for buffer loads. We assign address space 9 to 192-bit buffer pointers which contain a 128-bit descriptor, a 32-bit offset and a 32-bit index. Essentially, they are fat buffer pointers with an additional 32-bit index.	2023-12-15 15:49:25 +01:00
Mirko Brkušanin	07a6d73664	[AMDGPU] CodeGen for GFX12 VFLAT, VSCRATCH and VGLOBAL instructions (#75493 )	2023-12-15 15:01:40 +01:00
Mirko Brkušanin	5879162f7f	[AMDGPU] CodeGen for GFX12 VBUFFER instructions (#75492 )	2023-12-15 13:45:03 +01:00
Mirko Brkušanin	26b14aedb7	[AMDGPU] CodeGen for GFX12 VIMAGE and VSAMPLE instructions (#75488 )	2023-12-15 12:40:23 +01:00
Pierre van Houtryve	ef067f5204	[AMDGPU][SIInsertWaitcnts] Do not add s_waitcnt when the counters are known to be 0 already (#72830 ) Co-authored-by: Juan Manuel MARTINEZ CAAMAÑO <juamarti@amd.com>	2023-12-15 12:33:32 +01:00
Mirko Brkušanin	a278ac577e	[AMDGPU] CodeGen for SMEM instructions (#75579 )	2023-12-15 12:10:33 +01:00
Mariusz Sikora	229273f538	[AMDGPU] Update permlane test for GFX12 (#75572 )	2023-12-15 11:18:23 +01:00
Mirko Brkušanin	569ef8ddd9	[AMDGPU] Add pseudo scalar trans instructions for GFX12 (#75204 )	2023-12-15 10:41:05 +01:00
Carl Ritson	0ed0b7458a	[AMDGPU] Pre-commit test for #75573 . NFC Shows spill allocation overlapping with WWM register use.	2023-12-15 18:29:08 +09:00
Mariusz Sikora	966416b9e8	[AMDGPU][GFX12] Add new v_permlane16 variants (#75475 )	2023-12-15 10:14:38 +01:00
Pierre van Houtryve	f1ea77f7be	[AMDGPU][SIInsertWaitcnts] Set initial state for VS_CNT in non-kernel functions (#75436 ) Split from #72830	2023-12-15 08:31:14 +01:00
Saiyedul Islam	e21b7e2143	[AMDGPU][NFC] Check more autogenerated llc tests for COV5 (#75219 ) Regenerate a few more llc tests to check for COV5 instead of the default ABI version.	2023-12-15 10:27:49 +05:30
Jay Foad	3e6da3252f	[AMDGPU] Add GFX12 s_sleep_var instruction and intrinsic (#75499 )	2023-12-14 21:11:39 +00:00
Mirko Brkusanin	c6351b4cc9	[AMDGPU][NFC] Regenerate .mir test	2023-12-14 18:58:43 +01:00
Valery Pykhtin	dd051295bc	[AMDGPU] Enable GCNRewritePartialRegUses pass by default. (#72975 ) Let's try once again after #69957 has landed.	2023-12-14 14:10:27 +01:00
Stanislav Mekhanoshin	c6ecbcb48b	[AMDGPU] Fix no waitcnt produced between LDS DMA and ds_read on gfx10 (#75245 ) BUFFER_LOAD_DWORD_LDS was incorrectly touching vscnt instead of the vmcnt. This is VMEM load and DS store, so it shall use vmcnt.	2023-12-13 10:49:36 -08:00
Petar Avramovic	6892c175c5	AMDGPU/GlobalISel: add AMDGPUGlobalISelDivergenceLowering pass (#75340 ) Add empty AMDGPUGlobalISelDivergenceLowering pass. This pass will implement - selection of divergent i1 phis as lane mask phis, requires lane mask merging in some cases - lower uses of divergent i1 values outside of the cycle using lane mask merging - lowering of all cases of temporal divergence: - lower uses of uniform i1 values outside of the cycle using lane mask merging - lower uses of uniform non-i1 values outside of the cycle using a copy to vgpr inside of the cycle Add very detailed set of regression tests for cases mentioned above. patch 1 from: https://github.com/llvm/llvm-project/pull/73337	2023-12-13 16:42:56 +01:00
Mariusz Sikora	7f55d7de1a	[AMDGPU] GFX12: Add Split Workgroup Barrier (#74836 ) Co-authored-by: Vang Thao <Vang.Thao@amd.com>	2023-12-13 15:01:13 +01:00
Piotr Sobczak	6eec80133b	[AMDGPU] Min/max changes for GFX12 (#75214 ) Co-authored-by: Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>	2023-12-13 14:18:10 +01:00
Piotr Sobczak	fac093dd08	[AMDGPU] Update IEEE and DX10_CLAMP for GFX12 (#75030 ) Co-authored-by: Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>	2023-12-13 13:52:40 +01:00
Stanislav Mekhanoshin	7f54070194	[AMDGPU] Precommit test for LDS DMA waitcounts. NFC. (#75240 )	2023-12-12 12:29:09 -08:00
Jay Foad	8005ee6da3	[AMDGPU] CodeGen for GFX12 64-bit scalar add/sub (#75070 )	2023-12-12 17:41:40 +00:00
Mariusz Sikora	a97028ac51	[AMDGPU] Update VOP instructions for GFX12 (#74853 ) Co-authored-by: Mirko Brkusanin <Mirko.Brkusanin@amd.com>	2023-12-12 11:38:24 +01:00
Saiyedul Islam	777b6de7a4	[AMDGPU][NFC] Test autogenerated llc tests for COV5 (#74339 ) Regenerate a few llc tests to test for COV5 instead of the default ABI version.	2023-12-12 14:35:13 +05:30
Jay Foad	35ebd92d3d	[GlobalISel] Add G_PREFETCH (#74863 )	2023-12-11 11:06:50 +00:00
Pierre van Houtryve	dd32d26a37	[AMDGPU] Form V_MAD_U64_U32 from mul24 (#72393 ) Fixes SWDEV-421067	2023-12-11 11:38:27 +01:00
Jay Foad	e38c29c2b7	[AMDGPU] Add GFX11 test coverage to integer-mad-patterns.ll	2023-12-08 13:06:03 +00:00
Saiyedul Islam	5c4c199fe3	[AMDGPU][NFC] Improve testing for AMDHSA ABI Version (#74300 ) Add tests for COV4 as well as COV5 instead of only testing for the default version.	2023-12-08 18:09:45 +05:30
Valery Pykhtin	901c5be524	[AMDGPU] Fix GCNUpwardRPTracker: max register pressure on defs. (#74422 ) Treat a defined register as fully live "at" the instruction and update maximum pressure accordingly. Fixes #3786.	2023-12-08 11:27:08 +01:00
Pierre van Houtryve	ecd2f56a80	[AMDGPU] Warn if 'amdgpu-waves-per-eu' target occupancy was not met (#74055 ) This should make it a bit harder to miss this type of issue. The warning only shows if amdgpu-waves-per-eu is used. See SWDEV-434482	2023-12-06 10:46:46 +01:00
Matt Arsenault	08e63dd8fe	AMDGPU: Add a MIR test to catch infinite loop This is derived from one of the regressions reported after aed1a2217a1da0c9fb7d2c0856302dee25b1d4a1	2023-12-06 15:58:32 +07:00
Pranav Taneja	41507fe595	[GISel] Combine (Scalarize) vector load followed by an element extract.	2023-12-06 11:23:23 +05:30
Nikita Popov	eecb99c5f6	[Tests] Add disjoint flag to some tests (NFC) These tests rely on SCEV recognizing an "or" with no common bits as an "add". Add the disjoint flag to relevant or instructions in preparation for switching SCEV to use the flag instead of the ValueTracking query. The IR with disjoint flag matches what InstCombine would produce.	2023-12-05 14:09:36 +01:00
Valery Pykhtin	ecf8818380	[AMDGPU] Presubmit test: max register pressure on defs. (#74424 ) Upcoming patch #74422.	2023-12-05 13:23:49 +01:00
Ruiling, Song	90681d3a41	AMDGPU: Return legal addressmode correctly for flat scratch (#71494 )	2023-12-05 11:22:39 +08:00
Ruiling Song	e31a7581a5	AMDGPU: Pre-commit test to show diff	2023-12-05 11:19:50 +08:00

7032 Commits