11 Commits

Author SHA1 Message Date
Fangrui Song
9e9907f1cf [AMDGPU,test] Change llc -march= to -mtriple= (#75982)
Similar to 806761a7629df268c8aed49657aeccffa6bca449.

For IR files without a target triple, -mtriple= specifies the full
target triple while -march= merely sets the architecture part of the
default target triple, leaving a target triple which may not make sense,
e.g. amdgpu-apple-darwin.
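
The difference described above can be sketched in a few lines of Python. This is only an illustrative model, not LLVM's actual triple handling: the helper names `apply_march` and `apply_mtriple`, the assumed host default `x86_64-apple-darwin`, and the example values `amdgcn` and `amdgcn-amd-amdhsa` are assumptions introduced for the sketch.

```python
def apply_march(default_triple: str, arch: str) -> str:
    """Model of -march=: swap only the architecture component (hypothetical helper)."""
    # Everything after the first '-' (vendor, OS, environment) is inherited
    # from the default triple, which is where nonsensical combinations come from.
    _, sep, rest = default_triple.partition("-")
    return arch + sep + rest


def apply_mtriple(default_triple: str, triple: str) -> str:
    """Model of -mtriple=: the given triple replaces the default entirely."""
    return triple


host_default = "x86_64-apple-darwin"  # assumed default triple on a macOS host

# Only the arch part changes; the Apple/Darwin parts leak through.
print(apply_march(host_default, "amdgcn"))

# The full triple is stated explicitly, so the host default is irrelevant.
print(apply_mtriple(host_default, "amdgcn-amd-amdhsa"))
```

Under this model the first call yields a triple whose OS still comes from the host, so the same test could behave differently depending on where it runs, while the second call gives the same triple everywhere.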

Therefore, -march= is error-prone and not recommended for tests without
a target triple. The issue has been benign as we recognize
$unknown-apple-darwin as ELF instead of rejecting it outright.

This patch changes AMDGPU tests to not rely on the default
OS/environment components. Tests that need fixes are not changed:

```
  LLVM :: CodeGen/AMDGPU/fabs.f64.ll
  LLVM :: CodeGen/AMDGPU/fabs.ll
  LLVM :: CodeGen/AMDGPU/floor.ll
  LLVM :: CodeGen/AMDGPU/fneg-fabs.f64.ll
  LLVM :: CodeGen/AMDGPU/fneg-fabs.ll
  LLVM :: CodeGen/AMDGPU/r600-infinite-loop-bug-while-reorganizing-vector.ll
  LLVM :: CodeGen/AMDGPU/schedule-if-2.ll
```
2024-01-16 21:54:58 -08:00
Jay Foad
7fa7a08f21 [AMDGPU] Insert s_nop before s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
Differential Revision: https://reviews.llvm.org/D155681
2023-07-19 10:33:11 +01:00
Jay Foad
f7684d8510 [DAG] Use legal shift amount type in DAGTypeLegalizer::JoinIntegers
Documentation for TargetLowering::getShiftAmountTy says that LegalTypes
should generally be true during type legalization, so this patch does
that.

On AMDGPU the effect is that we use i32 (a sane type) instead of i64
(pointer sized type) for more shift amounts, which in turn allows more
formation of rotates and funnel shifts pre-legalization.
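
For readers unfamiliar with the patterns mentioned here, the following Python sketch models a 32-bit rotate written as two shifts and an OR, and shows that it matches a funnel shift with both inputs equal. It illustrates only the arithmetic identity; it does not model the DAG combiner or the shift amount type change itself, and the function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def fshl32(hi: int, lo: int, amt: int) -> int:
    """Funnel shift left: concatenate hi:lo, shift left, keep the high 32 bits."""
    amt %= 32
    concat = ((hi & MASK32) << 32) | (lo & MASK32)
    return ((concat << amt) >> 32) & MASK32


def rotl32_from_shifts(x: int, amt: int) -> int:
    """Rotate left expressed as the shl/lshr/or idiom a combiner can recognize."""
    amt %= 32
    x &= MASK32
    if amt == 0:
        # Avoid a shift by 32, which is not a valid 32-bit shift amount.
        return x
    return ((x << amt) | (x >> (32 - amt))) & MASK32


# A rotate is a funnel shift whose two inputs are the same value.
for value in (0x80000001, 0x12345678, 0xFFFFFFFF, 0):
    for amount in range(32):
        assert rotl32_from_shifts(value, amount) == fshl32(value, value, amount)
```

The loop at the end checks the identity for every shift amount from 0 to 31 on a few sample values.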

Differential Revision: https://reviews.llvm.org/D154960
2023-07-12 08:12:09 +01:00
Jay Foad
f2c164c815 [AMDGPU] Do not wait for vscnt on function entry and return
SIInsertWaitcnts inserts waitcnt instructions to resolve data
dependencies. The GFX10+ vscnt (VMEM store count) counter is never used
in this way. It is only used to resolve memory dependencies, and that is
handled by SIMemoryLegalizer. Hence there is no need to conservatively
wait for vscnt to be 0 on function entry and before returns.

Differential Revision: https://reviews.llvm.org/D153537
2023-07-04 12:22:38 +01:00
Ivan Kosarev
e9d77cd9b2 [AMDGPU][GFX11] Add test coverage for 16-bit conversions, part 3.
Reviewed By: Joe_Nash

Differential Revision: https://reviews.llvm.org/D152716
2023-06-15 09:55:25 +01:00
Nikita Popov
bdf2fbba9c [AMDGPU] Convert some tests to opaque pointers (NFC)
2022-12-19 12:41:13 +01:00
jeff
f4e6149d82 [AMDGPU] Use V_PERM to match buildvectors when inputs are not canonicalized (i.e. can't use V_PACK)
If we cannot prove that the f16 operands of a buildvector are canonicalized, then we cannot lower it into a V_PACK. In this scenario, we would previously lower into some combination of and(sdwa), shr, or. This patch allows for matching into V_PERM instead.

Change-Id: Ifa4a74fdb81ef44f22ba490c7fdf81ec8aebc945
2022-10-03 12:58:29 -07:00
jeff
33ab74ac46 [AMDGPU] Precommit switching test to generated checks for D134463
Change-Id: I0d90f86ab759347a2f20448d28cc09ddaea3a4d4
2022-09-23 15:12:53 -07:00
alex-t
a0ea7ec90f [AMDGPU] divergence patterns for the BUILD_VECTOR i16, undef expansion.
BUILD_VECTOR of i16 and undef gets expanded to COPY_TO_REGCLASS.
The latter is further lowered to copy instructions.
We need to provide the correct register class for the uniform and divergent BUILD_VECTOR nodes
to avoid VGPR to SGPR copies.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D122068
2022-03-21 21:11:20 +01:00
Jay Foad
aa1e5fbc9b [AMDGPU] Fix permissions on test files
2022-02-24 12:17:54 +00:00
alex-t
e4103c91f8 [AMDGPU] Select build_vector DAG nodes according to the divergence
This change enables divergence-driven instruction selection for the build_vector DAG nodes.
It also enables packed i16 instructions for GFX9.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D116187
2021-12-23 02:27:12 +03:00