llvm-project

Author	SHA1	Message	Date
Matt Arsenault	248fba0cd8	AMDGPU: Remove pointless setOperationAction for xint_to_fp The legalize action for uint_to_fp/sint_to_fp uses the source integer type, not the result FP type so setting an action on an FP type does nothing.	2023-12-22 11:24:35 +07:00
Jay Foad	8fdfd34cd2	[AMDGPU] Remove GDS and GWS for GFX12 (#76148 )	2023-12-21 15:27:08 +00:00
Matt Arsenault	9e574a3936	DAG: Fix expansion of bf16 sourced extloads Also fix assorted vector extload failures for AMDGPU.	2023-12-20 19:24:27 +07:00
Nikita Popov	9d60e95bcd	[AMDGPU] Use poison instead of undef for non-demanded elements (#75914 ) Return poison instead of undef for non-demanded lanes in the AMDGPU demanded element simplification hook. Also bail out if dmask is 0, as this case has special semantics: > If DMASK==0, the TA overrides DMASK=1 and puts zeros in VGPR followed by > LWE status if exists. TFE status is not generated since the fetch is dropped.	2023-12-20 11:01:59 +01:00
Mariusz Sikora	9a41a80e76	[AMDGPU] Handle object size and bail if assume-like intrinsic is used in PromoteAllocaToVector (#68744 ) The attached test crashes without this change. We should not remove an isAssumeLikeIntrinsic instruction if it is used by another instruction.	2023-12-20 07:47:49 +01:00
Jeffrey Byrnes	f1156fb622	[AMDGPU][IGLP]: Add SchedGroupMask::TRANS (#75416 ) Makes constructing SchedGroups of this type easier, and provides ability to create them with __builtin_amdgcn_sched_group_barrier	2023-12-19 16:54:18 -08:00
Mariusz Sikora	a018c8cdbb	GFX12: Add LoopDataPrefetchPass (#75625 ) It is currently disabled by default. It will need experiments on real hardware to tune and decide on the profitability. Co-authored-by: Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>	2023-12-19 08:32:16 +01:00
James Y Knight	137f785fa6	[AMDGPU] Set MaxAtomicSizeInBitsSupported. (#75185 ) This will result in larger atomic operations getting expanded to `__atomic_*` libcalls via AtomicExpandPass, which matches what Clang already does in the frontend. While AMDGPU currently disables the use of all libcalls, I've changed it to instead disable all of them _except_ the atomic ones. Those are already emitted by the Clang frontend, and enabling them in the backend allows the same behavior there.	2023-12-18 16:51:06 -05:00
Stanislav Mekhanoshin	e5c523e861	[AMDGPU] Produce better memoperand for LDS DMA (#75247 ) 1) It was marked as volatile. This is not needed and the only reason it was done is because it is both load and store and handled together with atomics. Global load to LDS was marked as volatile just because buffer load was done that way. 2) Preserve at least LDS (store) pointer which we always have with the intrinsics. 3) Use PoisonValue instead of nullptr for load memop as a Value.	2023-12-18 11:01:12 -08:00
Stanislav Mekhanoshin	94230ce548	[AMDGPU] Fix lack of LDS DMA check in the AA handling (#75249 ) SIInstrInfo::areMemAccessesTriviallyDisjoint does DS offset checks, but does not account for LDS DMA instructions. Added these checks. Without them the code falls through and returns true, which is wrong. As a result mayAlias would always return false for LDS DMA and a regular LDS instruction or 2 LDS DMA instructions. At the moment this is NFCI because we do not use this AA in a context which may touch LDS DMA instructions. This is also unreachable now because of the ordered memory ref checks just above in the function and LDS DMA is marked as volatile. This volatile marking is removed in PR #75247, therefore I'd submit this check before #75247.	2023-12-18 10:58:50 -08:00
Jay Foad	7e5019e82b	[AMDGPU] Simplify WaitcntBrackets::getRegInterval with getPhysRegBaseClass (#74087 ) This means that getRegInterval no longer depends on the MCInstrDesc, so it could be simplified further to take just a MachineOperand or just a physical register. NFCI.	2023-12-18 14:16:02 +00:00
Jakub Chlanda	a34db9bdef	[AMDGPU][NFC] Simplify needcopysign logic (#75176 ) This was caught by coverity, reported as: `dead_error_condition`. Since the conditional revolves around `CF`, it is guaranteed to be null in the else clause, hence making the second part of the statement redundant.	2023-12-18 12:07:22 +01:00
Carl Ritson	5139299618	[AMDGPU] Track physical VGPRs used for SGPR spills (#75573 ) Physical VGPRs used for SGPR spills need to be tracked independent of WWM reserved registers. The WWM reserved set contains extra registers allocated during WWM pre-allocation pass. This causes SGPR spills allocated after WWM pre-allocation to overlap with WWM register usage, e.g. if frame pointer is spilt during prologue/epilog insertion.	2023-12-17 16:44:16 +09:00
Youngsuk Kim	67aec2f58b	[llvm] Remove no-op ptr-to-ptr casts (NFC) Remove calls to CreatePointerCast which are just doing no-op ptr-to-ptr bitcasts. Opaque ptr cleanup effort (NFC).	2023-12-15 11:04:48 -06:00
Mariusz Sikora	414d27419f	[AMDGPU] GFX12: select @llvm.prefetch intrinsic (#74576 ) Co-authored-by: Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>	2023-12-15 17:15:55 +01:00
Jessica Del	32f9983c06	[AMDGPU] - Add address space for strided buffers (#74471 ) This is an experimental address space for strided buffers. These buffers can have structs as elements and a stride > 1. These pointers allow the indexed access in units of stride, i.e., they point at `buffer[index * stride]`. Thus, we can use the `idxen` modifier for buffer loads. We assign address space 9 to 192-bit buffer pointers which contain a 128-bit descriptor, a 32-bit offset and a 32-bit index. Essentially, they are fat buffer pointers with an additional 32-bit index.	2023-12-15 15:49:25 +01:00
Mirko Brkušanin	07a6d73664	[AMDGPU] CodeGen for GFX12 VFLAT, VSCRATCH and VGLOBAL instructions (#75493 )	2023-12-15 15:01:40 +01:00
Mirko Brkušanin	5879162f7f	[AMDGPU] CodeGen for GFX12 VBUFFER instructions (#75492 )	2023-12-15 13:45:03 +01:00
Jie Fu	f0b44ce28e	[AMDGPU] Fix -Wunused-variable in SIInsertWaitcnts.cpp (NFC) llvm-project/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp:1322:10: error: unused variable 'SWaitInst' [-Werror,-Wunused-variable] auto SWaitInst = ^ llvm-project/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp:1334:10: error: unused variable 'SWaitInst' [-Werror,-Wunused-variable] auto SWaitInst = BuildMI(Block, It, DL, TII->get(AMDGPU::S_WAITCNT_VSCNT)) ^ 2 errors generated.	2023-12-15 20:00:18 +08:00
Mirko Brkušanin	26b14aedb7	[AMDGPU] CodeGen for GFX12 VIMAGE and VSAMPLE instructions (#75488 )	2023-12-15 12:40:23 +01:00
Pierre van Houtryve	ef067f5204	[AMDGPU][SIInsertWaitcnts] Do not add s_waitcnt when the counters are known to be 0 already (#72830 ) Co-authored-by: Juan Manuel MARTINEZ CAAMAÑO <juamarti@amd.com>	2023-12-15 12:33:32 +01:00
Petar Avramovic	d37ced8880	AMDGPU: refactor phi lowering from SILowerI1Copies (NFCI) (#75349 ) Make abstract class PhiLoweringHelper and expose it for use in GlobalISel path. SILowerI1Copies implements PhiLoweringHelper as Vreg1LoweringHelper and it is equivalent to SILowerI1Copies. A notable change is that createLaneMaskReg now clones attributes from a register that has lane mask attributes instead of creating a register with the lane mask register class. This is because lane masks have different (more) attributes in GlobalISel. Patch 2 from: https://github.com/llvm/llvm-project/pull/73337	2023-12-15 12:20:07 +01:00
Mirko Brkušanin	a278ac577e	[AMDGPU] CodeGen for SMEM instructions (#75579 )	2023-12-15 12:10:33 +01:00
Mirko Brkušanin	569ef8ddd9	[AMDGPU] Add pseudo scalar trans instructions for GFX12 (#75204 )	2023-12-15 10:41:05 +01:00
Mariusz Sikora	966416b9e8	[AMDGPU][GFX12] Add new v_permlane16 variants (#75475 )	2023-12-15 10:14:38 +01:00
Mirko Brkušanin	c1a6974d6b	[AMDGPU][MC] Add GFX12 SMEM encoding (#75215 )	2023-12-15 09:00:54 +01:00
Pierre van Houtryve	f1ea77f7be	[AMDGPU][SIInsertWaitcnts] Set initial state for VS_CNT in non-kernel functions (#75436 ) Split from #72830	2023-12-15 08:31:14 +01:00
Jay Foad	3e6da3252f	[AMDGPU] Add GFX12 s_sleep_var instruction and intrinsic (#75499 )	2023-12-14 21:11:39 +00:00
Jay Foad	c5a068a196	[AMDGPU] Remove s_cmpk_* for GFX12 (#75497 ) No GFX12 encoding was added for these. This patch adds tests that they are not recognized by the assembler and defends against generating them in codegen.	2023-12-14 21:10:53 +00:00
Mirko Brkušanin	47615ddc84	[AMDGPU][MC] Add GFX12 VFLAT, VSCRATCH and VGLOBAL encodings (#75193 )	2023-12-14 14:22:04 +01:00
Valery Pykhtin	dd051295bc	[AMDGPU] Enable GCNRewritePartialRegUses pass by default. (#72975 ) Let's try once again after #69957 has landed.	2023-12-14 14:10:27 +01:00
Mirko Brkušanin	ac406b4817	[AMDGPU][MC] Add GFX12 VBUFFER encoding (#75195 )	2023-12-14 12:58:18 +01:00
Mirko Brkušanin	16c27bcdde	[AMDGPU][MC] Add GFX12 VDS encoding (#75316 )	2023-12-14 11:04:21 +01:00
Stanislav Mekhanoshin	c6ecbcb48b	[AMDGPU] Fix no waitcnt produced between LDS DMA and ds_read on gfx10 (#75245 ) BUFFER_LOAD_DWORD_LDS was incorrectly touching vscnt instead of the vmcnt. This is VMEM load and DS store, so it shall use vmcnt.	2023-12-13 10:49:36 -08:00
Petar Avramovic	6892c175c5	AMDGPU/GlobalISel: add AMDGPUGlobalISelDivergenceLowering pass (#75340 ) Add an empty AMDGPUGlobalISelDivergenceLowering pass. This pass will implement: - selection of divergent i1 phis as lane mask phis, requires lane mask merging in some cases - lower uses of divergent i1 values outside of the cycle using lane mask merging - lowering of all cases of temporal divergence: - lower uses of uniform i1 values outside of the cycle using lane mask merging - lower uses of uniform non-i1 values outside of the cycle using a copy to vgpr inside of the cycle Add a very detailed set of regression tests for the cases mentioned above. Patch 1 from: https://github.com/llvm/llvm-project/pull/73337	2023-12-13 16:42:56 +01:00
Mariusz Sikora	7f55d7de1a	[AMDGPU] GFX12: Add Split Workgroup Barrier (#74836 ) Co-authored-by: Vang Thao <Vang.Thao@amd.com>	2023-12-13 15:01:13 +01:00
Piotr Sobczak	6eec80133b	[AMDGPU] Min/max changes for GFX12 (#75214 ) Co-authored-by: Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>	2023-12-13 14:18:10 +01:00
Piotr Sobczak	fac093dd08	[AMDGPU] Update IEEE and DX10_CLAMP for GFX12 (#75030 ) Co-authored-by: Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>	2023-12-13 13:52:40 +01:00
Piotr Sobczak	62653573cb	[AMDGPU] Extend clang-format directive in SIDefines.h	2023-12-13 11:18:16 +01:00
Jay Foad	8005ee6da3	[AMDGPU] CodeGen for GFX12 64-bit scalar add/sub (#75070 )	2023-12-12 17:41:40 +00:00
Jay Foad	220e095a2c	[AMDGPU] Remove unused function splitScalar64BitAddSub	2023-12-12 15:01:57 +00:00
Piotr Sobczak	cd138fddf1	[AMDGPU] Turn off clang-format in moveToVALU (#75188 )	2023-12-12 15:38:06 +01:00
Jay Foad	1c6b336395	[AMDGPU] Miscellaneous clang-format changes (#75186 ) Reformat one table and protect a couple of tables that we don't want to reformat.	2023-12-12 14:26:04 +00:00
Mariusz Sikora	a97028ac51	[AMDGPU] Update VOP instructions for GFX12 (#74853 ) Co-authored-by: Mirko Brkusanin <Mirko.Brkusanin@amd.com>	2023-12-12 11:38:24 +01:00
Saiyedul Islam	777b6de7a4	[AMDGPU][NFC] Test autogenerated llc tests for COV5 (#74339 ) Regenerate a few llc tests to test for COV5 instead of the default ABI version.	2023-12-12 14:35:13 +05:30
Kazu Hirata	586ecdf205	[llvm] Use StringRef::{starts,ends}_with (NFC) (#74956 ) This patch replaces uses of StringRef::{starts,ends}with with StringRef::{starts,ends}_with for consistency with std::{string,string_view}::{starts,ends}_with in C++20. I'm planning to deprecate and eventually remove StringRef::{starts,ends}with.	2023-12-11 21:01:36 -08:00
Wang Pengcheng	d11e54f3fe	[MacroFusion] Support multiple predicators (#72219 ) The user can provide multiple predicators to MacroFusion and the DAG mutation will be applied if one of them evaluates to true. `ShouldSchedulePredTy` is renamed to `MacroFusionPredTy`.	2023-12-12 12:11:13 +08:00
wangpc	59f3661bd2	Revert "[MacroFusion] Support multiple predicators (#72219 )" This reverts commit d3f6e82a6a562e3288a6fc0970d324073996c16d. Some code can't be compiled.	2023-12-12 11:21:39 +08:00
Wang Pengcheng	d3f6e82a6a	[MacroFusion] Support multiple predicators (#72219 ) The user can provide multiple predicators to MacroFusion and the DAG mutation will be applied if one of them evaluates to true. `ShouldSchedulePredTy` is renamed to `MacroFusionPredTy`.	2023-12-12 11:18:31 +08:00
Pierre van Houtryve	dd32d26a37	[AMDGPU] Form V_MAD_U64_U32 from mul24 (#72393 ) Fixes SWDEV-421067	2023-12-11 11:38:27 +01:00

8582 Commits