20 Commits

Author SHA1 Message Date
Fangrui Song
9e9907f1cf
[AMDGPU,test] Change llc -march= to -mtriple= (#75982)
Similar to 806761a7629df268c8aed49657aeccffa6bca449.

For IR files without a target triple, -mtriple= specifies the full
target triple while -march= merely sets the architecture part of the
default target triple, leaving a target triple which may not make sense,
e.g. amdgpu-apple-darwin.

Therefore, -march= is error-prone and not recommended for tests without
a target triple. The issue has been benign as we recognize
$unknown-apple-darwin as ELF instead of rejecting it outrightly.

This patch changes AMDGPU tests to not rely on the default
OS/environment components. Tests that need fixes are not changed:

```
  LLVM :: CodeGen/AMDGPU/fabs.f64.ll
  LLVM :: CodeGen/AMDGPU/fabs.ll
  LLVM :: CodeGen/AMDGPU/floor.ll
  LLVM :: CodeGen/AMDGPU/fneg-fabs.f64.ll
  LLVM :: CodeGen/AMDGPU/fneg-fabs.ll
  LLVM :: CodeGen/AMDGPU/r600-infinite-loop-bug-while-reorganizing-vector.ll
  LLVM :: CodeGen/AMDGPU/schedule-if-2.ll
```
2024-01-16 21:54:58 -08:00
Krzysztof Drewniak
faa2c678aa [AMDGPU] Add buffer intrinsics that take resources as pointers
In order to enable the LLVM frontend to better analyze buffer
operations (and to potentially enable more precise analyses on the
backend), define versions of the raw and structured buffer intrinsics
that use `ptr addrspace(8)` instead of `<4 x i32>` to represent their
rsrc arguments.

The new intrinsics are named by replacing `buffer.` with `buffer.ptr`.

One advantage to these intrinsic definitions is that, instead of
specifying that a buffer load/store will read/write some memory, we
can indicate that the memory read or written will be based on the
pointer argument. This means that, for example, a read from a
`noalias` buffer can be pulled out of a loop that is modifying a
distinct buffer.

In the future, we will define custom PseudoSourceValues that will
allow us to package up the (buffer, index, offset) triples that buffer
intrinsics contain and allow for more precise backend analysis.

This work also enables creating address space 7, which represents
manipulation of raw buffers using native LLVM load and store
instructions.

Where tests simply used a buffer intrinsic while testing some other
code path (such as the tests for VGPR spills), they have been updated
to use the new intrinsic form. Tests that are "about" buffer
intrinsics (for instance, those that ensure that they codegen as
expected) have been duplicated, either within existing files or into
new ones.

Depends on D145441

Reviewed By: arsenm, #amdgpu

Differential Revision: https://reviews.llvm.org/D147547
2023-06-05 16:59:07 +00:00
Matt Arsenault
4998de4dcc AMDGPU: Update some wait tests to opaque pointers
The script mangled the constantexprs in waitcnt-looptest.ll, so fixed
those manually.
2022-12-01 21:01:58 -05:00
Carl Ritson
4c4db81630 [AMDGPU] Extend SILoadStoreOptimizer to s_load instructions
Apply merging to s_load as is done for s_buffer_load.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D130742
2022-07-30 11:38:39 +09:00
Carl Ritson
58f1417ebc [AMDGPU] Order pos exports before param exports
Summary:
Modify export clustering DAG mutation to move position exports
before other exports types.

Reviewers: foad, arsenm, rampitec, nhaehnle

Reviewed By: foad

Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D79670
2020-05-12 23:02:23 +09:00
Jay Foad
5f7ea85e78 [AMDGPU] Remove unnecessary s_waitcnt between VMEM loads
VMEM loads of the same type (sampler vs no sampler) are guaranteed to
write their result registers in order, so there is no need for an
s_waitcnt even if they write to overlapping vgprs.

Differential Revision: https://reviews.llvm.org/D79176
2020-05-01 10:10:23 +01:00
Matt Arsenault
20ca49b646 AMDGPU: Update tests to use modern buffer intrinsics 2020-01-16 13:49:43 -05:00
Jay Foad
0412f518dc [AMDGPU] Fix typo in SIInstrInfo::memOpsHaveSameBasePtr
Summary:
The typo has been present since memOpsHaveSameBasePtr was introduced in
r313208.

It caused SIInstrInfo::shouldClusterMemOps to cluster more mem ops than
it was supposed to.

Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71616
2019-12-17 18:54:27 +00:00
Matt Arsenault
1022c0dfde AMDGPU: Decompose all values to 32-bit pieces for calling conventions
This is the more natural lowering, and presents more opportunities to
reduce 64-bit ops to 32-bit.

This should also help avoid issues graphics shaders have had with
64-bit values, and simplify argument lowering in globalisel.

llvm-svn: 366578
2019-07-19 13:57:44 +00:00
Matt Arsenault
51a05d72ae AMDGPU: Drop remnants of byval support for shaders
Before 2018, mesa used to use byval interchangably with inreg, which
didn't really make sense. Fix tests still using it to avoid breaking
in a future commit.

llvm-svn: 365953
2019-07-12 20:12:17 +00:00
Yaxun Liu
0124b5484c [AMDGPU] Change constant addr space to 4
Differential Revision: https://reviews.llvm.org/D43170

llvm-svn: 325030
2018-02-13 18:00:25 +00:00
Matt Arsenault
b600e138cc AMDGPU: Remove llvm.SI.vs.load.input
llvm-svn: 299391
2017-04-03 21:45:13 +00:00
Matt Arsenault
3ea06336fc AMDGPU: Remove some uses of llvm.SI.export in tests
Merge some of the old, smaller tests into more complete versions.

llvm-svn: 295792
2017-02-22 00:02:21 +00:00
Matt Arsenault
7aad8fd8f4 Enable FeatureFlatForGlobal on Volcanic Islands
This switches to the workaround that HSA defaults to
for the mesa path.

This should be applied to the 4.0 branch.

Patch by Vedran Miletić <vedran@miletic.net>

llvm-svn: 292982
2017-01-24 22:02:15 +00:00
Sanjoy Das
3336f681e3 [Verifier] Add verification for TBAA metadata
Summary:
This change adds some verification in the IR verifier around struct path
TBAA metadata.

Other than some basic sanity checks (e.g. we get constant integers where
we expect constant integers), this checks:

 - That by the time an struct access tuple `(base-type, offset)` is
   "reduced" to a scalar base type, the offset is `0`.  For instance, in
   C++ you can't start from, say `("struct-a", 16)`, and end up with
   `("int", 4)` -- by the time the base type is `"int"`, the offset
   better be zero.  In particular, a variant of this invariant is needed
   for `llvm::getMostGenericTBAA` to be correct.

 - That there are no cycles in a struct path.

 - That struct type nodes have their offsets listed in an ascending
   order.

 - That when generating the struct access path, you eventually reach the
   access type listed in the tbaa tag node.

Reviewers: dexonsmith, chandlerc, reames, mehdi_amini, manmanren

Subscribers: mcrosier, llvm-commits

Differential Revision: https://reviews.llvm.org/D26438

llvm-svn: 289402
2016-12-11 20:07:15 +00:00
Nicolai Haehnle
df3a20cd80 AMDGPU: Add a shader calling convention
This makes it possible to distinguish between mesa shaders
and other kernels even in the presence of compute shaders.

Patch By: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>

Differential Revision: http://reviews.llvm.org/D18559

llvm-svn: 265589
2016-04-06 19:40:20 +00:00
Matt Arsenault
9c47dd583a AMDGPU: Remove some old intrinsic uses from tests
llvm-svn: 260493
2016-02-11 06:02:01 +00:00
Matt Arsenault
2aed6ca1d3 AMDGPU: Switch barrier intrinsics to using convergent
noduplicate prevents unrolling of small loops that happen to have
barriers in them. If a loop has a barrier in it, it is OK to duplicate
it for the unroll.

llvm-svn: 256075
2015-12-19 01:46:41 +00:00
Tom Stellard
bd8a0856e2 AMDGPU/SI: Better handle s_wait insertion
We can wait on either VM, EXP or LGKM.
The waits are independent.

Without this patch, a wait inserted because of one of them
would also wait for all the previous others.
This patch makes s_wait only wait for the ones we need for the next
instruction.

Here's an example of subtle perf reduction this patch solves:

This is without the patch:

buffer_load_format_xyzw v[8:11], v0, s[44:47], 0 idxen
buffer_load_format_xyzw v[12:15], v0, s[48:51], 0 idxen
s_load_dwordx4 s[44:47], s[8:9], 0xc
s_waitcnt lgkmcnt(0)
buffer_load_format_xyzw v[16:19], v0, s[52:55], 0 idxen
s_load_dwordx4 s[48:51], s[8:9], 0x10
s_waitcnt vmcnt(1)
buffer_load_format_xyzw v[20:23], v0, s[44:47], 0 idxen

The s_waitcnt vmcnt(1) is useless.
The reason it is added is because the last
buffer_load_format_xyzw needs s[44:47], which was issued
by the first s_load_dwordx4. It waits for all VM
before that call to have finished.

Internally after every instruction, 3 counters (for VM, EXP and LGTM)
are updated after every instruction. For example buffer_load_format_xyzw
will
increase the VM counter, and s_load_dwordx4 the LGKM one.

Without the patch, for every defined register,
the current 3 counters are stored, and are used to know
how long to wait when an instruction needs the register.

Because of that, the s[44:47] counter includes that to use the register
you need to wait for the previous buffer_load_format_xyzw.

Instead this patch stores only the counters that matter for the
register,
and puts zero for the other ones, since we don't need any wait for them.

Patch by: Axel Davy

Differential Revision: http://reviews.llvm.org/D11883

llvm-svn: 245755
2015-08-21 22:47:27 +00:00
Tom Stellard
45bb48ea19 R600 -> AMDGPU rename
llvm-svn: 239657
2015-06-13 03:28:10 +00:00