llvm-project

Author	SHA1	Message	Date
Fangrui Song	9e9907f1cf	[AMDGPU,test] Change llc -march= to -mtriple= (#75982 ) Similar to 806761a7629df268c8aed49657aeccffa6bca449. For IR files without a target triple, -mtriple= specifies the full target triple while -march= merely sets the architecture part of the default target triple, leaving a target triple which may not make sense, e.g. amdgpu-apple-darwin. Therefore, -march= is error-prone and not recommended for tests without a target triple. The issue has been benign as we recognize $unknown-apple-darwin as ELF instead of rejecting it outrightly. This patch changes AMDGPU tests to not rely on the default OS/environment components. Tests that need fixes are not changed: ``` LLVM :: CodeGen/AMDGPU/fabs.f64.ll LLVM :: CodeGen/AMDGPU/fabs.ll LLVM :: CodeGen/AMDGPU/floor.ll LLVM :: CodeGen/AMDGPU/fneg-fabs.f64.ll LLVM :: CodeGen/AMDGPU/fneg-fabs.ll LLVM :: CodeGen/AMDGPU/r600-infinite-loop-bug-while-reorganizing-vector.ll LLVM :: CodeGen/AMDGPU/schedule-if-2.ll ```	2024-01-16 21:54:58 -08:00
Matt Arsenault	460ffcddd9	AMDGPU: Make bf16/v2bf16 legal types (#76215 ) There are some intrinsics are using i16 vectors in place of bfloat vectors. Move towards making bf16 vectors legal so these can migrate. Leave the larger vectors for a later change. Depends #76213 #76214	2024-01-04 22:31:18 +07:00
Jay Foad	a4196666ac	[AMDGPU] Revert "Preliminary patch for divergence driven instruction selection. Operands Folding 1." (#71710 ) This reverts commit 201f892b3b597f24287ab6a712a286e25a45a7d9.	2023-11-13 13:53:10 +00:00
Amara Emerson	6b69584660	[GlobalISel] Fall back for bf16 conversions. (#71470 ) We don't support these correctly since we don't yet have FP types. AMDGPU tests were silently miscompiling bf16 as if they were fp16.	2023-11-06 21:18:57 -08:00
Jay Foad	7b3bbd83c0	Revert "[CodeGen] Really renumber slot indexes before register allocation (#67038 )" This reverts commit 2501ae58e3bb9a70d279a56d7b3a0ed70a8a852c. Reverted due to various buildbot failures.	2023-10-09 12:31:32 +01:00
Jay Foad	2501ae58e3	[CodeGen] Really renumber slot indexes before register allocation (#67038 ) PR #66334 tried to renumber slot indexes before register allocation, but the numbering was still affected by list entries for instructions which had been erased. Fix this to make the register allocator's live range length heuristics even less dependent on the history of how instructions have been added to and removed from SlotIndexes's maps.	2023-10-09 11:44:41 +01:00
Ivan Kosarev	f04aa1f814	[AMDGPU][CodeGen] Fold immediates in src1 operands of V_MAD/MAC/FMA/FMAC. (#68002 )	2023-10-05 14:22:29 +03:00
Amara Emerson	1cc9f626cb	[GlobalISel] Add constant-folding of FP binops to combiner. (#65230 )	2023-09-07 19:33:35 +03:00
Amara Emerson	6c31f20fee	[GlobalISel] Fold fmul x, 1.0 -> x (#65379 )	2023-09-06 03:14:16 +08:00
Matt Arsenault	4b7b4b9458	AMDGPU: Fix fast f32 log/log10 OpenCL conformance didn't like interpreting afn as ignore the denormal handling. https://reviews.llvm.org/D157940	2023-08-15 10:48:46 -04:00
Jay Foad	7fa7a08f21	[AMDGPU] Insert s_nop before s_sendmsg sendmsg(MSG_DEALLOC_VGPRS) Differential Revision: https://reviews.llvm.org/D155681	2023-07-19 10:33:11 +01:00
Matt Arsenault	fbe4ff8149	AMDGPU: Partially fix not respecting dynamic denormal mode The most notable issue was producing v_mad_f32 in functions with the dynamic mode, since it just ignores the mode. fdiv lowering is still somewhat broken because it involves a mode switch and we need to query the original mode.	2023-07-11 15:14:52 -04:00
Matt Arsenault	4e15f378ee	AMDGPU: Correctly lower llvm.log.f32 and llvm.log10.f32 Previously we expanded these in a fast-math way and the device libraries were relying on this behavior. The libraries have a pending change to switch to the new target intrinsic. Unlike the library version, this takes advantage of no-infinities on the result overflow check.	2023-07-05 15:30:35 -04:00
Jay Foad	f2c164c815	[AMDGPU] Do not wait for vscnt on function entry and return SIInsertWaitcnts inserts waitcnt instructions to resolve data dependencies. The GFX10+ vscnt (VMEM store count) counter is never used in this way. It is only used to resolve memory dependencies, and that is handled by SIMemoryLegalizer. Hence there is no need to conservatively wait for vscnt to be 0 on function entry and before returns. Differential Revision: https://reviews.llvm.org/D153537	2023-07-04 12:22:38 +01:00
Matt Arsenault	89ccfa1b39	AMDGPU: Use correct lowering for llvm.log2.f32 We previously directly codegened to v_log_f32, which is broken for denormals. The lowering isn't complicated, you simply need to scale denormal inputs and adjust the result. Note log and log10 are still not accurate enough, and will be fixed separately.	2023-06-23 08:37:37 -04:00
Matt Arsenault	089f652f17	AMDGPU: Add more log vector tests	2023-06-23 08:28:42 -04:00
Ivan Kosarev	c2887096f3	[AMDGPU][GFX11] Add test coverage for 16-bit conversions, part 6. Reviewed By: Joe_Nash Differential Revision: https://reviews.llvm.org/D152807	2023-06-15 10:39:31 +01:00
Matt Arsenault	1f615b502c	AMDGPU: Modernize log codegen tests	2023-06-12 21:11:36 -04:00
Matt Arsenault	fb1d166e1d	AMDGPU: Bulk update some generic intrinsic tests to opaque pointers Done purely with the script.	2022-11-29 18:09:59 -05:00
Jay Foad	e2926501d8	[AMDGPU] Aggressively fold immediates in SIShrinkInstructions Fold immediates regardless of how many uses they have. This is expected to increase overall code size, but decrease register usage. Differential Revision: https://reviews.llvm.org/D114644	2022-05-18 11:04:33 +01:00
Mircea Trofin	b470630913	[NFC] Removed unused prefixes from CodeGen/AMDGPU All the 'l'-starting tests. Differential Revision: https://reviews.llvm.org/D94151	2021-01-06 09:34:11 -08:00
Alexander Timofeev	db7ee7660a	[AMDGPU] Preliminary patch for divergence driven instruction selection. Immediate selection predicate changed Differential revision: https://reviews.llvm.org/D51734 Reviewers: rampitec llvm-svn: 341928	2018-09-11 11:56:50 +00:00
Joel E. Denny	9fa9c9368d	[FileCheck] Add -allow-deprecated-dag-overlap to failing llvm tests See https://reviews.llvm.org/D47106 for details. Reviewed By: probinson Differential Revision: https://reviews.llvm.org/D47171 This commit drops that patch's changes to: llvm/test/CodeGen/NVPTX/f16x2-instructions.ll llvm/test/CodeGen/NVPTX/param-load-store.ll For some reason, the dos line endings there prevent me from commiting via the monorepo. A follow-up commit (not via the monorepo) will finish the patch. llvm-svn: 336843	2018-07-11 20:25:49 +00:00
Vedran Miletic	ad21f2687d	[AMDGPU] Add custom lowering for llvm.log{,10}.{f16,f32} intrinsics AMDGPU backend errors with "unsupported call to function" upon encountering a call to llvm.log{,10}.{f16,f32} intrinsics. This patch adds custom lowering to avoid that error on both R600 and SI. Reviewers: arsenm, jvesely Subscribers: tstellar Differential Revision: https://reviews.llvm.org/D29942 llvm-svn: 319025	2017-11-27 13:26:38 +00:00

24 Commits