36 Commits

Author SHA1 Message Date
Daniil Fukalov
5b3fad4966 [AMDGPU][CostModel] Update shuffle instruction tests. NFC.
New tests ported over from test/Analysis/CostModel/AArch64/shuffle-other.ll.
2021-08-30 19:17:27 +03:00
Simon Pilgrim
872a950033 [CostModel] Treat 'widen subvector' patterns as zero cost
As discussed on D107228, widening a subvector by inserting the whole subvector into the bottom a larger undef vector should always be cheap enough that we can treat it as zero cost.

NOTE: If this proves to cause issues we have the option of introducing a "SK_WidenSubvector" shuffle kind enum that targets could override the zero cost, but that doesn't seem necessary atm.

Differential Revision: https://reviews.llvm.org/D107228
2021-08-02 11:43:10 +01:00
alex-t
e585b332e4 [AMDGPU] PHI node cost should not be counted for the size and latency.
Details: https://reviews.llvm.org/D96805 changed the GCNTTIImpl::getCFInstrCost to return 1 for the PHI nodes
  for the TTI::TCK_CodeSize and TTI::TCK_SizeAndLatency. This is incorrect because the value moves that are the
  result of the PHI lowering are inserted into the basic block predecessors - not into the block itself.
  As a result of this change LoopRotate and LoopUnroll were broken because of the incorrect Loop header and loop
  body size/cost estimation.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D105104
2021-06-30 16:11:17 +03:00
dfukalov
8f4b7e94a2 [AMDGPU][CostModel] Refine cost model for control-flow instructions.
Added cost estimation for switch instruction, updated costs of branches, fixed
phi cost.
Had to increase `-amdgpu-unroll-threshold-if` default value since conditional
branch cost (size) was corrected to higher value.
Test renamed to "control-flow.ll".

Removed redundant code in `X86TTIImpl::getCFInstrCost()` and
`PPCTTIImpl::getCFInstrCost()`.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D96805
2021-04-10 09:20:24 +03:00
Sander de Smalen
4ca860742d [InstructionCost] Don't conflate Invalid costs with Unknown costs.
We previously made a change to getUserCost to return a Invalid cost
when one of the TTI costs returned '-1' (meaning 'unknown' or
'infinitely expensive'). It makes no sense to say that:

  shufflevector <2 x i8> %x, <2 x i8> %y, <4 x i32> <i32 0, i32 1, i32 2, i32 3>

has an invalid cost. Perhaps the cost is not known, but the IR is valid
and can be code-generated. Invalid should only be used for IR that
cannot possibly be code-generated and where a cost is nonsensical.

With more passes now asserting that the cost must be valid, it is possible
that those assertions will fail for perfectly valid IR. An incomplete
cost-model probably shouldn't be a reason for the compiler to break.

It's better to consider these costs as 'very expensive' and ignore them
for other reasons. At some point, we should consider replacing -1 with
some other mechanism.

Reviewed By: paulwalker-arm, dmgreen

Differential Revision: https://reviews.llvm.org/D99502
2021-03-30 09:29:42 +01:00
Alexey Bataev
14ae0cf0f5 [Cost]Canonicalize the cost for logical or/and reductions.
The generic cost of logical or/and reductions should be cost of bitcast
<ReduxWidth x i1> to iReduxWidth + cmp eq|ne iReduxWidth.

Differential Revision: https://reviews.llvm.org/D97961
2021-03-19 11:01:58 -07:00
Alexey Bataev
60470ac7ff [Cost]Add tests for boolean and/or reductions, NFC.
Tests with the default costs for boolean and/or reductions.

Differential Revision: https://reviews.llvm.org/D97793
2021-03-03 12:34:30 -08:00
Juneyoung Lee
c89d9d8a48 [TTI] Consider select form of and/or i1 as having arithmetic cost
This is a patch that updates the cost of `select i1 a, b, false` to be equivalent to that of `and i1 a, b`
as well as the cost of `select i1 a, true, b` equivalent to `or i1 a, b`.

Until now, these selects were folded into and/or i1 by InstCombine, but the transformation is poison-unsafe.
This is a step towards removing the unsafe transformation. D93065 has relevant transformations linked.
These selects should be translated into the assemblies as and/or i1 do in the same manner. The cost should be equivalent.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D97360
2021-03-02 02:18:19 +09:00
Stanislav Mekhanoshin
a8d9d50762 [AMDGPU] gfx90a support
Differential Revision: https://reviews.llvm.org/D96906
2021-02-17 16:01:32 -08:00
dfukalov
9068c20965 [AMDGPU][CostModel] Refine cost model for half- and quarter-rate instructions.
1. Throughput and codesize costs estimations was separated and updated.
2. Updated fdiv cost estimation for different cases.
3. Added scalarization processing for types that are treated as !isSimple() to
improve codesize estimation in getArithmeticInstrCost() and
getArithmeticInstrCost(). The code was borrowed from TCK_RecipThroughput path
of base implementation.

Next step is unify scalarization part in base class that is currently works for
TCK_RecipThroughput path only.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D89973
2020-10-24 19:53:08 +03:00
dfukalov
4ccc38813e [AMDGPU][CostModel] Add f16, f64 and contract cases to fused costs estimation.
Add cases of fused fmul+fadd/fsub with f16 and f64 operands to cost model.
Also added operations with contract attribute.

Fixed line endings in test.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D84995
2020-08-06 21:43:27 +03:00
dfukalov
76a0c0ee6f [AMDGPU][CostModel] Improve cost estimation for fused {fadd|fsub}(a,fmul(b,c))
Summary:
If result of fmul(b,c) has one use, in almost all cases (except denormals are
IEEE) the pair of operations will be fused in one fma/mad/mac/etc.

Reviewers: rampitec

Reviewed By: rampitec

Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits, kerbowa

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D83919
2020-07-16 03:06:38 +03:00
Stanislav Mekhanoshin
f7a7efbf88 [AMDGPU] Tweak getTypeLegalizationCost()
Even though wide vectors are legal they still cost more as we
will have to eventually split them. Not all operations can
be uniformly done on vector types.

Conservatively add the cost of splitting at least to 8 dwords,
which is our widest possible load.

We are more or less lying to cost mode with this change but
this can prevent vectorizer from creation of wide vectors which
results in RA problems for us.

Differential Revision: https://reviews.llvm.org/D83078
2020-07-06 14:07:48 -07:00
dfukalov
129388ddc4 [AMDGPU][CostModel] Add fneg cost estimation
Summary: The estimation uses AMDGPUTargetLowering::isFNegFree()

Reviewers: rampitec

Reviewed By: rampitec

Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D82065
2020-06-19 17:31:35 +03:00
Jonathan Roelofs
7c5d2bec76 [llvm] Fix missing FileCheck directive colons
https://reviews.llvm.org/D77352
2020-04-06 09:59:08 -06:00
Matt Arsenault
5660bb6bc9 AMDGPU: Remove denormal subtarget features
Switch to using the denormal-fp-math/denormal-fp-math-f32 attributes.
2020-04-02 17:17:12 -04:00
Stanislav Mekhanoshin
58578f7056 [AMDGPU] Implemented fma cost analysis
Differential Revision: https://reviews.llvm.org/D71676
2019-12-18 23:54:20 -08:00
Stanislav Mekhanoshin
b8ac5894a1 [AMDGPU] Fixed cost model for packed 16 bit ops
Differential Revision: https://reviews.llvm.org/D71622
2019-12-17 15:14:17 -08:00
Matt Arsenault
b337bce871 AMDGPU: Split test functions to avoid dependency on subtarget
Prepare this test for moving tthe denormal setting out of the
subtarget features.
2019-11-19 11:12:13 +05:30
dfukalov
6e8251046b [AMDGPU] Fix bug introduced in 47a5c36b37f0
Summary: [AMDGPU] Fix bug introduced in 47a5c36b37f0

Reviewers: foad, arsenm

Reviewed By: arsenm

Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D69915
2019-11-07 11:50:14 +03:00
dfukalov
47a5c36b37 [AMDGPU] Improve code size cost model (part 2)
Summary: Added estimations for ShuffleVector, some cast and arithmetic instructions

Reviewers: rampitec

Reviewed By: rampitec

Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, zzheng, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D69629
2019-11-06 13:55:48 +03:00
Daniil Fukalov
3972057511 [AMDGPU] Improve code size cost model
Summary:
Added estimation for zero size insertelement, extractelement
and llvm.fabs operators.
Updated inline/unroll parameters default values.

Reviewers: rampitec, arsenm

Reviewed By: arsenm

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68881

llvm-svn: 375109
2019-10-17 12:15:35 +00:00
Matt Arsenault
8dbeb9256c TTI: Improve default costs for addrspacecast
For some reason multiple places need to do this, and the variant the
loop unroller and inliner use was not handling it.

Also, introduce a new wrapper to be slightly more precise, since on
AMDGPU some addrspacecasts are free, but not no-ops.

llvm-svn: 362436
2019-06-03 18:41:34 +00:00
Matt Arsenault
e0c1f9e76d AMDGPU: Partially fix default device for HSA
There are a few different issues, mostly stemming from using
generation based checks for anything instead of subtarget
features. Stop adding flat-address-space as a feature for HSA, as it
should only be a device property. This was incorrectly allowing flat
instructions to select for SI.

Increase the default generation for HSA to avoid the encoding error
when emitting objects. This has some other side effects from various
checks which probably should be separate subtarget features (in the
cost model and for dealing with the DS offset folding issue).

Partial fix for bug 41070. It should probably be an error to try using
amdhsa without flat support.

llvm-svn: 356347
2019-03-17 21:31:35 +00:00
Tim Renouf
e30aa6a136 [AMDGPU] Prepare for introduction of v3 and v5 MVTs
AMDGPU would like to have MVTs for v3i32, v3f32, v5i32, v5f32. This
commit does not add them, but makes preparatory changes:

* Fixed assumptions of power-of-2 vector type in kernel arg handling,
  and added v5 kernel arg tests and v3/v5 shader arg tests.

* Added v5 tests for cost analysis.

* Added vec3/vec5 arg test cases.

Some of this patch is from Matt Arsenault, also of AMD.

Differential Revision: https://reviews.llvm.org/D58928

Change-Id: I7279d6b4841464d2080eb255ef3c589e268eabcd
llvm-svn: 356342
2019-03-17 21:04:16 +00:00
Yaxun Liu
2a22c5deff [AMDGPU] Switch to the new addr space mapping by default
This requires corresponding clang change.

Differential Revision: https://reviews.llvm.org/D40955

llvm-svn: 324101
2018-02-02 16:07:16 +00:00
Matt Arsenault
376f1bd73c AMDGPU: Don't assert in TTI with fp32 denorms enabled
Also refine for f16 and rcp cases.

llvm-svn: 312213
2017-08-31 05:47:00 +00:00
Matt Arsenault
3c5e4237c6 AMDGPU: Make some packed shuffles free
VOP3P instructions can encode access to either
half of the register.

llvm-svn: 302730
2017-05-10 21:29:33 +00:00
Matt Arsenault
3dbeefa978 AMDGPU: Mark all unspecified CC functions in tests as amdgpu_kernel
Currently the default C calling convention functions are treated
the same as compute kernels. Make this explicit so the default
calling convention can be changed to a non-kernel.

Converted with perl -pi -e 's/define void/define amdgpu_kernel void/'
on the relevant test directories (and undoing in one place that actually
wanted a non-kernel).

llvm-svn: 298444
2017-03-21 21:39:51 +00:00
Matt Arsenault
8c8fcb2585 AMDGPU: Cost model for basic integer operations
This resolves bug 21148 by preventing promotion to
i64 induction variables.

llvm-svn: 264376
2016-03-25 01:16:40 +00:00
Matt Arsenault
9651813ee0 AMDGPU: Partially implement getArithmeticInstrCost for FP ops
llvm-svn: 264374
2016-03-25 01:00:32 +00:00
Matt Arsenault
51d702812d TTI: Report 0 cost for free addrspacecasts
llvm-svn: 264369
2016-03-25 00:26:29 +00:00
Matt Arsenault
8e9aa0acc8 TTI: Use 0 for cost of fabs if free
Ideally this would also happen for fneg, but that
isn't a distinct operation in the IR.

llvm-svn: 264368
2016-03-25 00:26:22 +00:00
Matt Arsenault
59767cea79 AMDGPU: TTI: Make insertelement free.
We don't want to have a cost to scalarizing operations.

llvm-svn: 264364
2016-03-25 00:14:11 +00:00
Matt Arsenault
e05ff15186 AMDGPU: Override getCFInstrCost
The default cost was 0 with the assumption that it is predictable.

llvm-svn: 255796
2015-12-16 18:37:19 +00:00
Matt Arsenault
e830f5427b AMDGPU: Report extractelement as free in cost model
The cost for scalarized operations is computed as N * (scalar operation
cost + 1 extractelement + 1 insertelement). This partially fixes
inflating the cost of scalarized operations since every operation is
scalarized and free. I don't think we want any cost asociated with
scalarization, but for now insertelement is still counted. I'm not sure
if we should pretend that insertelement is also free, or add a way
to compute a custom scalarization cost.

llvm-svn: 254438
2015-12-01 19:08:39 +00:00