16 Commits

Author SHA1 Message Date
Fangrui Song
9e9907f1cf
[AMDGPU,test] Change llc -march= to -mtriple= (#75982)
Similar to 806761a7629df268c8aed49657aeccffa6bca449.

For IR files without a target triple, -mtriple= specifies the full
target triple while -march= merely sets the architecture part of the
default target triple, leaving a target triple which may not make sense,
e.g. amdgpu-apple-darwin.

Therefore, -march= is error-prone and not recommended for tests without
a target triple. The issue has been benign as we recognize
$unknown-apple-darwin as ELF instead of rejecting it outrightly.

This patch changes AMDGPU tests to not rely on the default
OS/environment components. Tests that need fixes are not changed:

```
  LLVM :: CodeGen/AMDGPU/fabs.f64.ll
  LLVM :: CodeGen/AMDGPU/fabs.ll
  LLVM :: CodeGen/AMDGPU/floor.ll
  LLVM :: CodeGen/AMDGPU/fneg-fabs.f64.ll
  LLVM :: CodeGen/AMDGPU/fneg-fabs.ll
  LLVM :: CodeGen/AMDGPU/r600-infinite-loop-bug-while-reorganizing-vector.ll
  LLVM :: CodeGen/AMDGPU/schedule-if-2.ll
```
2024-01-16 21:54:58 -08:00
Nikita Popov
bdf2fbba9c [AMDGPU] Convert some tests to opaque pointers (NFC) 2022-12-19 12:41:13 +01:00
Alexander Timofeev
fbdea5a2e9 [AMDGPU] Always select s_cselect_b32 for uniform 'select' SDNode
This patch contains changes necessary to carry physical condition register (SCC) dependencies through the SDNode scheduler.  It adds the edge in the SDNodeScheduler dependency graph instead of inserting the SCC copy between each definition and use. This approach lets the scheduler place instructions in an optimal way placing the copy only when the dependency cannot be resolved.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D133593
2022-09-15 22:03:56 +02:00
David Salinas
c0581f7df6 Revert D109159 : Revert "[amdgpu] Enable selection of s_cselect_b64."
This reverts commit 640beb38e7710b939b3cfb3f4c54accc694b1d30.

That commit caused performance degradtion in Quicksilver test QS:sGPU and a functional test failure in (rocPRIM rocprim.device_segmented_radix_sort).
Reverting until we have a better solution to s_cselect_b64 codegen cleanup

Change-Id: Ifc167b3c2dae7a65920676f22a97ba76485f3456

Reviewed By: kzhuravl

Differential Revision: https://reviews.llvm.org/D116686

Change-Id: I1abf49b74a7e2ba0e0205f747a4154a468b9d7f2
2022-01-11 21:14:09 +00:00
Nico Weber
085f078307 Revert "Revert D109159 "[amdgpu] Enable selection of s_cselect_b64.""
This reverts commit 859ebca744e634dcc89a2294ffa41574f947bd62.
The change contained many unrelated changes and e.g. restored
unit test failes for the old lld port.
2022-01-05 13:10:25 -05:00
David Salinas
859ebca744 Revert D109159 "[amdgpu] Enable selection of s_cselect_b64."
This reverts commit 640beb38e7710b939b3cfb3f4c54accc694b1d30.

That commit caused performance degradtion in Quicksilver test QS:sGPU and a functional test failure in (rocPRIM rocprim.device_segmented_radix_sort).
Reverting until we have a better solution to s_cselect_b64 codegen cleanup

Change-Id: Ibf8e397df94001f248fba609f072088a46abae08

Reviewed By: kzhuravl

Differential Revision: https://reviews.llvm.org/D115960

Change-Id: Id169459ce4dfffa857d5645a0af50b0063ce1105
2022-01-05 17:57:32 +00:00
Michael Liao
640beb38e7 [amdgpu] Enable selection of s_cselect_b64.
Differential Revision: https://reviews.llvm.org/D109159
2021-09-07 10:45:07 -04:00
Matt Arsenault
d2e52eec51 AMDGPU: Select global saddr mode from SGPR pointer
Use the 64-bit SGPR base with a 0 offset, since it's 1 fewer
instruction to materialize the 0 vs. the 64-bit copy.
2020-11-16 11:51:06 -05:00
Piotr Sobczak
0045786f14 [AMDGPU] Select s_cselect
Summary:
Add patterns to select s_cselect in the isel.

Handle more cases of implicit SCC accesses in si-fix-sgpr-copies
to allow new patterns to work.

Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, asbirlea, kerbowa, llvm-commits

Tags: #llvm

Re-commit D81925 with a bugfix D82370.

Differential Revision: https://reviews.llvm.org/D81925
Differential Revision: https://reviews.llvm.org/D82370
2020-06-25 10:38:23 +02:00
Piotr Sobczak
6d9565d6d5 Revert "[AMDGPU] Select s_cselect"
This caused some failures detected by the buildbot with
expensive checks enabled.

This reverts commit 4067de569f119a81419fbf2e79d5f3307dfdda5b.
2020-06-19 16:41:04 +02:00
Piotr Sobczak
4067de569f [AMDGPU] Select s_cselect
Summary:
Add patterns to select s_cselect in the isel.

Handle more cases of implicit SCC accesses in si-fix-sgpr-copies
to allow new patterns to work.

Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, asbirlea, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D81925
2020-06-19 16:17:46 +02:00
Matt Arsenault
2fe500ab5b AMDGPU: Look through casted selects to constant fold bin ops
The promotion of the uniform select to i32 interfered with this fold.
2020-01-22 10:16:39 -05:00
Matt Arsenault
bcd91778fe AMDGPU: Do binop of select of constant fold in AMDGPUCodeGenPrepare
DAGCombiner does this, but divisions expanded here miss this
optimization. Since 67aa18f165640374cf0e0a6226dc793bbda6e74f,
divisions have been expanded here and missed out on this
optimization. Avoids test regressions in a future patch.
2020-01-22 10:16:39 -05:00
Stanislav Mekhanoshin
67aa18f165 [AMDGPU] Early expansion of 32 bit udiv/urem
This allows hoisting of a common code, for instance if denominator
is loop invariant. Current change is expansion only, adding licm to
the target pass list going to be a separate patch. Given this patch
changes to codegen are minor as the expansion is similar to that on
DAG. DAG expansion still must remain for R600.

Differential Revision: https://reviews.llvm.org/D48586

llvm-svn: 335868
2018-06-28 15:59:18 +00:00
Stanislav Mekhanoshin
22ee191c3e DAG combine "and|or (select c, -1, 0), x" -> "select c, x, 0|-1"
Allowed folding for "and/or" binops with non-constant operand if
arguments of select are 0/-1 values.

Normally this code with "and" opcode does not get to a DAG combiner
and simplified yet in the InstCombine. However AMDGPU produces it
during lowering and InstCombine has no chance to optimize it out.

In turn the same pattern with "or" opcode can reach DAG.

Differential Revision: https://reviews.llvm.org/D48301

llvm-svn: 335250
2018-06-21 16:02:05 +00:00
Stanislav Mekhanoshin
20279dc025 Allow binop C1, (select cc, CF, CT) -> select folding
Previously this folding was done only if select is a first operand.
However, for non-commutative operations constant may go before
select.

Differential Revision: https://reviews.llvm.org/D48223

llvm-svn: 335167
2018-06-20 20:24:20 +00:00