21 Commits

Author SHA1 Message Date
Fangrui Song
9e9907f1cf
[AMDGPU,test] Change llc -march= to -mtriple= (#75982)
Similar to 806761a7629df268c8aed49657aeccffa6bca449.

For IR files without a target triple, -mtriple= specifies the full
target triple while -march= merely sets the architecture part of the
default target triple, leaving a target triple which may not make sense,
e.g. amdgpu-apple-darwin.

Therefore, -march= is error-prone and not recommended for tests without
a target triple. The issue has been benign as we recognize
$unknown-apple-darwin as ELF instead of rejecting it outrightly.

This patch changes AMDGPU tests to not rely on the default
OS/environment components. Tests that need fixes are not changed:

```
  LLVM :: CodeGen/AMDGPU/fabs.f64.ll
  LLVM :: CodeGen/AMDGPU/fabs.ll
  LLVM :: CodeGen/AMDGPU/floor.ll
  LLVM :: CodeGen/AMDGPU/fneg-fabs.f64.ll
  LLVM :: CodeGen/AMDGPU/fneg-fabs.ll
  LLVM :: CodeGen/AMDGPU/r600-infinite-loop-bug-while-reorganizing-vector.ll
  LLVM :: CodeGen/AMDGPU/schedule-if-2.ll
```
2024-01-16 21:54:58 -08:00
pvanhout
edca49cfb7 [AMDGPU] Match med3 for (max (min ..))
We previously only matched (min (max ...))

Depends on D144728

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D145159
2023-03-07 11:14:31 +01:00
Nikita Popov
bdf2fbba9c [AMDGPU] Convert some tests to opaque pointers (NFC) 2022-12-19 12:41:13 +01:00
Alexander Timofeev
fbdea5a2e9 [AMDGPU] Always select s_cselect_b32 for uniform 'select' SDNode
This patch contains changes necessary to carry physical condition register (SCC) dependencies through the SDNode scheduler.  It adds the edge in the SDNodeScheduler dependency graph instead of inserting the SCC copy between each definition and use. This approach lets the scheduler place instructions in an optimal way placing the copy only when the dependency cannot be resolved.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D133593
2022-09-15 22:03:56 +02:00
Nicolai Hähnle
52bc2e7577 [AMDGPU][SelectionDAG] Don't combine uniform multiplies to MUL_[UI]24
Prefer to keep uniform (non-divergent) multiplies on the scalar ALU when
possible. This significantly improves some game cases by eliminating
v_readfirstlane instructions when the result feeds into a scalar
operation, like the address calculation for a scalar load or store.

Since isDivergent is only an approximation of whether a value is in
SGPRs, it can potentially regress some situations where a uniform value
ends up in a VGPR. These should be rare in real code, although the test
changes do contain a number of examples.

Most of the test changes are just using s_mul instead of v_mul/mad which
is generally better for both register pressure and latency (at least on
GFX10 where sgpr pressure doesn't affect occupancy and vector ALU
instructions have significantly longer latency than scalar ALU). Some
R600 tests now use MULLO_INT instead of MUL_UINT24.

GlobalISel appears to handle more scenarios in the desirable way,
although it can also be thrown off and fails to select the 24-bit
multiplies in some cases.

Alternative solution considered and rejected was to allow selecting
MUL_[UI]24 to S_MUL_I32. I've rejected this because the definition of
those SD operations works is don't-care on the most significant 8 bits,
and this fact is used in some combines via SimplifyDemandedBits.

Based on a patch by Nicolai Hähnle.

Differential Revision: https://reviews.llvm.org/D97063
2021-02-23 15:39:19 +00:00
Piotr Sobczak
62d8b8a225 Fix 64-bit copy to SCC
Fix 64-bit copy to SCC by restricting the pattern resulting
in such a copy to subtargets supporting 64-bit scalar compare,
and mapping the copy to S_CMP_LG_U64.

Before introducing the S_CSELECT pattern with explicit SCC
(0045786f146e78afee49eee053dc29ebc842fee1), there was no need
for handling 64-bit copy to SCC ($scc = COPY sreg_64).

The proposed handling to read only the low bits was however
based on a false premise that it is only one bit that matters,
while in fact the copy source might be a vector of booleans and
all bits need to be considered.

The practical problem of mapping the 64-bit copy to SCC is that
the natural instruction to use (S_CMP_LG_U64) is not available
on old hardware. Fix it by restricting the problematic pattern
to subtargets supporting the instruction (hasScalarCompareEq64).

Differential Revision: https://reviews.llvm.org/D85207
2020-08-09 20:50:30 +02:00
Piotr Sobczak
0045786f14 [AMDGPU] Select s_cselect
Summary:
Add patterns to select s_cselect in the isel.

Handle more cases of implicit SCC accesses in si-fix-sgpr-copies
to allow new patterns to work.

Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, asbirlea, kerbowa, llvm-commits

Tags: #llvm

Re-commit D81925 with a bugfix D82370.

Differential Revision: https://reviews.llvm.org/D81925
Differential Revision: https://reviews.llvm.org/D82370
2020-06-25 10:38:23 +02:00
Piotr Sobczak
6d9565d6d5 Revert "[AMDGPU] Select s_cselect"
This caused some failures detected by the buildbot with
expensive checks enabled.

This reverts commit 4067de569f119a81419fbf2e79d5f3307dfdda5b.
2020-06-19 16:41:04 +02:00
Piotr Sobczak
4067de569f [AMDGPU] Select s_cselect
Summary:
Add patterns to select s_cselect in the isel.

Handle more cases of implicit SCC accesses in si-fix-sgpr-copies
to allow new patterns to work.

Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, asbirlea, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D81925
2020-06-19 16:17:46 +02:00
Matt Arsenault
7af7b96a9b AMDGPU: Move R600 test compatability hack
Instead of handling the r600 intrinsics on amdgcn, handle the amdgcn
intrinsics on r600.
2020-02-10 10:02:06 -08:00
Matt Arsenault
49169a963e AMDGPU: Add 24-bit mul intrinsics
Insert these during codegenprepare.

This works around a DAG issue where generic combines eliminate the and
asserting the high bits are zero, which then exposes an unknown read
source to the mul combine. It doesn't worth the hassle of trying to
insert an AssertZext or something to try to deal with it.

llvm-svn: 366094
2019-07-15 17:50:31 +00:00
Nirav Dave
fe59e14031 [DAGCombine] Prune unnused nodes.
Summary:
Nodes that have no uses are eventually pruned when they are selected
from the worklist. Record nodes newly added to the worklist or DAG and
perform pruning after every combine attempt.

Reviewers: efriedma, RKSimon, craig.topper, spatel, jyknight

Reviewed By: jyknight

Subscribers: jdoerfert, jyknight, nemanjai, jvesely, nhaehnle, javed.absar, hiraditya, jsji, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D58070

llvm-svn: 357283
2019-03-29 17:35:56 +00:00
Jan Vesely
7b2c98ab59 AMDGPU: Remove broken i16 ternary patterns
Fixup test to check for GCN prefix
These patterns always zero extend the result even though it might need sign extension.
This has been broken since the addition of i16 support.
It has popped up in mad_sat(char) test since min(max()) combination is turned into v_med3, resulting in the following (incorrect) sequence:
        v_mad_i16 v2, v10, v9, v11
        v_med3_i32 v2, v2, v8, v7

Fixes mad_sat(char) piglit on VI.

Differential Revision: https://reviews.llvm.org/D49836

llvm-svn: 339190
2018-08-07 21:54:37 +00:00
Matt Arsenault
3dbeefa978 AMDGPU: Mark all unspecified CC functions in tests as amdgpu_kernel
Currently the default C calling convention functions are treated
the same as compute kernels. Make this explicit so the default
calling convention can be changed to a non-kernel.

Converted with perl -pi -e 's/define void/define amdgpu_kernel void/'
on the relevant test directories (and undoing in one place that actually
wanted a non-kernel).

llvm-svn: 298444
2017-03-21 21:39:51 +00:00
Matt Arsenault
7aad8fd8f4 Enable FeatureFlatForGlobal on Volcanic Islands
This switches to the workaround that HSA defaults to
for the mesa path.

This should be applied to the 4.0 branch.

Patch by Vedran Miletić <vedran@miletic.net>

llvm-svn: 292982
2017-01-24 22:02:15 +00:00
Tom Stellard
115a61560e AMDGPU: Add VI i16 support
Patch By: Wei Ding

Differential Revision: https://reviews.llvm.org/D18049

llvm-svn: 286464
2016-11-10 16:02:37 +00:00
Tom Stellard
2d2d33f1dc Revert "AMDGPU: Add VI i16 support"
This reverts commit r285939 and r285948.  These broke some conformance tests.

llvm-svn: 285995
2016-11-04 13:06:34 +00:00
Tom Stellard
2b3379cdff AMDGPU: Add VI i16 support
Patch By: Wei Ding

Differential Revision: https://reviews.llvm.org/D18049

llvm-svn: 285939
2016-11-03 17:13:50 +00:00
Tom Stellard
09c2bd6bd4 AMDGPU/SI: Use new SimplifyDemandedBits helper for multi-use operations
Summary:
We are using this helper for our 24-bit arithmetic combines, so we are now able to eliminate multi-use operations that mask the high-bits of 24-bit inputs (e.g. and x, 0xffffff)

Reviewers: arsenm, nhaehnle

Subscribers: tony-tye, arsenm, kzhuravl, wdng, nhaehnle, llvm-commits, yaxunl

Differential Revision: https://reviews.llvm.org/D24672

llvm-svn: 284267
2016-10-14 19:14:29 +00:00
Matt Arsenault
6689abe632 AMDGPU: Run r600 tests last
llvm-svn: 268672
2016-05-05 20:07:37 +00:00
Tom Stellard
45bb48ea19 R600 -> AMDGPU rename
llvm-svn: 239657
2015-06-13 03:28:10 +00:00