27 Commits

Author SHA1 Message Date
Noah Goldstein
17162b61c2 [KnownBits] Make nuw and nsw support in computeForAddSub optimal
Just some improvements that should hopefully strengthen analysis.

Closes #83580
2024-03-05 12:59:58 -06:00
Jay Foad
7fa7a08f21 [AMDGPU] Insert s_nop before s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
Differential Revision: https://reviews.llvm.org/D155681
2023-07-19 10:33:11 +01:00
Nikita Popov
bdf2fbba9c [AMDGPU] Convert some tests to opaque pointers (NFC) 2022-12-19 12:41:13 +01:00
Sanjay Patel
8b75671314 [SDAG] try to replace subtract-from-constant with xor
This is almost the same as the abandoned D48529, but it
allows splat vector constants too.

This replaces the x86-specific code that was added with
the alternate patch D48557 with the original generic
combine.

This transform is a less restricted form of an existing
InstCombine and the proposed SDAG equivalent for that
in D128080:
https://alive2.llvm.org/ce/z/OUm6N_

Differential Revision: https://reviews.llvm.org/D128123
2022-07-08 08:14:24 -04:00
Jay Foad
5cae88164e [AMDGPU] Add GFX11 test coverage
Add GFX11 test coverage to a bunch of tests where it was easy to do so,
mostly because the checks are autogenerated and/or GFX11 can share the
same checks as GFX10.

Differential Revision: https://reviews.llvm.org/D129295
2022-07-08 09:13:59 +01:00
Sanjay Patel
e44dcfb06e [AMDGPU] add alternate tests for max-offset codegen; NFC
As discussed in D128123, the existing test shows a possible
regression when converting sub to xor. This adds tests that
avoid that pattern but still has a offset near 65535. Also,
add a test with the canonical IR for the existing test to show
if the transform is happening with the expected pattern in IR.
2022-06-30 15:51:39 -04:00
Stanislav Mekhanoshin
16cf9e6dad [AMDGPU] Fix handling of gfx10 LDS misaligned access bug
It was only handled for FLAT initially because we did not have
unaligned DS instructions lowering. Now it is implemented but
the bug is not handled.

Differential Revision: https://reviews.llvm.org/D123338
2022-04-07 15:08:29 -07:00
Matt Arsenault
89c447e4e6 AMDGPU: Stop reserving 36-bytes before kernel arguments for amdpal
This was inheriting the mesa behavior, and as far as I know nobody is
using opencl kernels with amdpal. The isMesaKernel check was
irrelevant because this property needs to be held for all functions.
2022-01-20 12:12:05 -05:00
Jay Foad
3f34f75a68 [AMDGPU] Fix latency for implicit vcc_lo operands on GFX10 wave32
As described in the comment, the way we change vcc to vcc_lo in these
operands confuses addPhysRegDataDeps into treating them as implicit
pseudo operands. Fix this by setting the correct latency from the
SchedModel after addPhysRegDataDeps wrongly set it to 0.

Differential Revision: https://reviews.llvm.org/D112317
2021-10-22 20:03:29 +01:00
Joe Nash
3ce1b9631a [AMDGPU] Switch PostRA sched to MachineSched
Use GCNHazardRecognizer in postra sched.
Updated tests for the new schedules.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D109536

Change-Id: Ia86ba2ae168f12fb34b4d8efdab491f84d936cde
2021-09-14 15:11:27 -04:00
Stanislav Mekhanoshin
4a3b055653 [AMDGPU] Fix flags of V_MOV_B64_PSEUDO
In particular it was not rematerializable.

Differential Revision: https://reviews.llvm.org/D105724
2021-07-09 12:49:28 -07:00
Baptiste Saleil
caf1294d95 [AMDGPU] Experiments show that the GCNRegBankReassign pass significantly impacts
the compilation time and there is no case for which we see any improvement in
performance. This patch removes this pass and its associated test cases from
the tree.

Differential Revision: https://reviews.llvm.org/D101313

Change-Id: I0599169a7609c19a887f8d847a71e664030cc141
2021-04-26 17:21:49 -04:00
Petar Avramovic
b082e6f88a [AMDGPU] Extend gfx10 test coverage. NFC.
Differential Revision: https://reviews.llvm.org/D99267
2021-03-29 11:13:55 +02:00
Tony
2f499b9aff [AMDGPU] Add volatile support to SIMemoryLegalizer
Treat a non-atomic volatile load and store as a relaxed atomic at
system scope for the address spaces accessed. This will ensure all
relevant caches will be bypassed.

A volatile atomic is not changed and still only bypasses caches upto
the level specified by the SyncScope operand.

Differential Revision: https://reviews.llvm.org/D94214
2021-01-09 00:52:33 +00:00
Mircea Trofin
1ebe86adf5 [NFC] Removed unused prefixes in test/CodeGen/AMDGPU
More patches to follow.

Differential Revision: https://reviews.llvm.org/D94121
2021-01-05 14:16:52 -08:00
Jay Foad
040c50278c [AMDGPU] Fix ds_read2/write2 with unaligned offsets
These instructions use a scaled offset. We were wrongly selecting them
even when the required offset was not a multiple of the scale factor.

Differential Revision: https://reviews.llvm.org/D90607
2020-11-03 15:16:10 +00:00
Jay Foad
32897c05ab [AMDGPU] Specify a triple to avoid codegen changes depending on host OS 2020-11-03 13:33:44 +00:00
Jay Foad
0892d2a311 Revert "Fix ds_read2/write2 unaligned offsets"
This reverts commit 2e7e898c8f0b38dc11fbce2553fc715067aaf42f.

It was committed by mistake.
2020-11-02 14:01:33 +00:00
Jay Foad
2e7e898c8f Fix ds_read2/write2 unaligned offsets 2020-11-02 13:57:13 +00:00
Jay Foad
f3881d6517 [AMDGPU] Generate test checks. NFC. 2020-11-02 13:56:46 +00:00
Nicolai Haehnle
2710171a15 AMDGPU: Write LDS objects out as global symbols in code generation
Summary:
The symbols use the processor-specific SHN_AMDGPU_LDS section index
introduced with a previous change. The linker is then expected to resolve
relocations, which are also emitted.

Initially disabled for HSA and PAL environments until they have caught up
in terms of linker and runtime loader.

Some notes:

- The llvm.amdgcn.groupstaticsize intrinsics can no longer be lowered
  to a constant at compile times, which means some tests can no longer
  be applied.

  The current "solution" is a terrible hack, but the intrinsic isn't
  used by Mesa, so we can keep it for now.

- We no longer know the full LDS size per kernel at compile time, which
  means that we can no longer generate a relevant error message at
  compile time. It would be possible to add a check for the size of
  individual variables, but ultimately the linker will have to perform
  the final check.

Change-Id: If66dbf33fccfbf3609aefefa2558ac0850d42275

Reviewers: arsenm, rampitec, t-tye, b-sumner, jsjodin

Subscribers: qcolombet, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61494

llvm-svn: 364297
2019-06-25 11:52:30 +00:00
Michael Liao
9bef688bc2 [AMDGPU] Add more test cases of D59608.
Summary: - Add more test cases.

Reviewers: arsenm

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60071

llvm-svn: 357442
2019-04-02 00:36:37 +00:00
Matt Arsenault
84445dd13c AMDGPU: Use gfx9 carry-less add/sub instructions
llvm-svn: 319491
2017-11-30 22:51:26 +00:00
Matt Arsenault
3dbeefa978 AMDGPU: Mark all unspecified CC functions in tests as amdgpu_kernel
Currently the default C calling convention functions are treated
the same as compute kernels. Make this explicit so the default
calling convention can be changed to a non-kernel.

Converted with perl -pi -e 's/define void/define amdgpu_kernel void/'
on the relevant test directories (and undoing in one place that actually
wanted a non-kernel).

llvm-svn: 298444
2017-03-21 21:39:51 +00:00
Matt Arsenault
9c47dd583a AMDGPU: Remove some old intrinsic uses from tests
llvm-svn: 260493
2016-02-11 06:02:01 +00:00
Matt Arsenault
2aed6ca1d3 AMDGPU: Switch barrier intrinsics to using convergent
noduplicate prevents unrolling of small loops that happen to have
barriers in them. If a loop has a barrier in it, it is OK to duplicate
it for the unroll.

llvm-svn: 256075
2015-12-19 01:46:41 +00:00
Matt Arsenault
966a94f861 AMDGPU: Handle sub of constant for DS offset folding
sub C, x - > add (sub 0, x), C for DS offsets.

This is mostly to fix regressions that show up when
SeparateConstOffsetFromGEP is enabled.

llvm-svn: 247054
2015-09-08 19:34:22 +00:00