llvm-project

Author	SHA1	Message	Date
Simon Pilgrim	55304d0d90	[CostModel] getInstructionCost - improve estimation of costs for length changing shuffles (#84156) Fix gap in the cost estimation for length changing shuffles, by adjusting the shuffle mask and either widening the shuffle inputs or extracting the lower elements of the result. A small step towards moving some of this implementation inside improveShuffleKindFromMask and/or target getShuffleCost handlers (and reduce the diffs in cost estimation depending on whether coming from a ShuffleVectorInst or the raw operands / mask components)	2024-03-07 10:46:27 +00:00
Acim-Maravic	f3138524db	[AMDGPU] Generic lowering for rint and nearbyint (#69596) There are three different rounding intrinsics that are brought down to the same instruction. Co-authored-by: Acim Maravic <acim.maravic@amd.com>	2023-11-14 18:49:21 +01:00
Changpeng Fang	8ceb72ffe5	[AMDGPU] make v32i16/v32f16 legal (#70484) Some upcoming intrinsics will be using these new types	2023-10-27 15:28:31 -07:00
Alex Richardson	e39f6c1844	[opt] Infer DataLayout from triple if not specified There are many tests that specify a target triple/CPU flags but no DataLayout, which can lead to IR being generated that has unusual behaviour. This commit attempts to use the default DataLayout based on the relevant flags if there is no explicit override on the command line or in the IR file. One case that cannot currently be differentiated from a missing datalayout is `target datalayout = ""` in the IR file, since the current APIs don't allow detecting this case. If it is considered useful to support this case (instead of passing "-data-layout=" on the command line), I can change IR parsers to track whether they have seen such a directive and change the callback type. Differential Revision: https://reviews.llvm.org/D141060	2023-10-26 12:07:37 -07:00
Jay Foad	eca2fcbdeb	[AMDGPU] Fix cost of fast unsafe f32 fdiv (#68988)	2023-10-15 12:25:36 +01:00
Matt Arsenault	72a7024add	AMDGPU: Correctly lower llvm.sqrt.f32 Make codegen emit correctly rounded sqrt by default. Emit the fast but only kind of fast expansion in AMDGPUCodeGenPrepare based on !fpmath, like the fdiv case. Hack around visitation ordering problems from AMDGPUCodeGenPrepare using forward iteration instead of a well behaved combiner. https://reviews.llvm.org/D158129	2023-09-12 23:22:54 +03:00
Matt Arsenault	e3fd8f83a8	AMDGPU: Correctly expand f64 sqrt intrinsic rocm-device-libs and llpc were avoiding using f64 sqrt intrinsics in favor of their own expansions. Port the expansion into the backend. Both of these users should be updated to call the intrinsic instead. The library and llpc expansions are slightly different. llpc uses an ldexp to do the scale; the library uses a multiply. Use ldexp to do the scale instead of the multiply. I believe v_ldexp_f64 and v_mul_f64 are always the same number of cycles, but it's cheaper to materialize the 32-bit integer constant than the 64-bit double constant. The libraries have another fast version of sqrt which will be handled separately. I am tempted to do this in an IR expansion instead. In the IR we could take advantage of computeKnownFPClass to avoid the 0-or-inf argument check.	2023-07-25 07:54:11 -04:00
Tobias Hieta	f84bac329b	[NFC][Py Reformat] Reformat lit.local.cfg python files in llvm This is a follow-up to b71edfaa4ec3c998aadb35255ce2f60bba2940b0 since I forgot the lit.local.cfg files in that one. Reformatting is done with `black`. If you end up having problems merging this commit because you have made changes to a python file, the best way to handle that is to run git checkout --ours <yourfile> and then reformat it with black. If you run into any problems, post to discourse about it and we will try to help. RFC Thread below: https://discourse.llvm.org/t/rfc-document-and-standardize-python-code-style Reviewed By: barannikov88, kwk Differential Revision: https://reviews.llvm.org/D150762	2023-05-17 17:03:15 +02:00
Nikita Popov	68c50b111d	[CostModel] Convert some tests to opaque pointers (NFC)	2022-12-15 09:50:34 +01:00
Mateja Marjanovic	595a08847a	[AMDGPU] Add support for new LLVM vector types Add VReg, AReg and SReg on AMDGPU for bit widths: 288, 320, 352 and 384. Differential Revision: https://reviews.llvm.org/D138205	2022-11-29 17:02:04 +01:00
Simon Pilgrim	4178e33470	[CostModel] Update RUN -passes=* to double quotes to appease update scripts on windows DOS really doesn't like `` quotes to be used in command lines Some prep work as I'm intending to resurrect D79483 soon	2022-08-10 17:54:06 +01:00
Piotr Sobczak	bd675af2a2	[AMDGPU] Make v16i16/v16f16 legal There are upcoming intrinsics to use the new types. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D128865	2022-06-30 23:08:40 +02:00
Arthur Eubanks	15ba588d6d	[test] Migrate '-analyze -cost-model' to '-passes=print<cost-model>'	2022-02-09 15:42:16 -08:00
Stanislav Mekhanoshin	bb1fe36977	[AMDGPU] Make v8i16/v8f16 legal Differential Revision: https://reviews.llvm.org/D117721	2022-01-24 11:51:08 -08:00
Andrew Litteken	4ff4e7ea30	[CostModel] Use cost of target trunc type when it is the only use of a non-register sized load The code size cost model for most targets uses the legalization cost for the type of the pointer of a load. If this load is followed directly by a trunc instruction, and is the only use of the result of the load, only one instruction is generated in the target assembly language. This adds a check for this case, and uses the target type of the trunc instruction if so. This did not show any changes in CTMark code size benchmarks. Reviewers: paquette, samparker, dmgreen Differential Revision: https://reviews.llvm.org/D109388	2022-01-12 18:03:50 -06:00
Daniil Fukalov	a2120f6b44	[NFC][AMDGPU][CostModel] Add tests for AMDGPU cost model, part 2.	2021-12-22 22:33:57 +03:00
Daniil Fukalov	deaedab14a	[NFC][AMDGPU][CostModel] Add tests for AMDGPU cost model.	2021-12-22 22:32:09 +03:00
Daniil Fukalov	e5c64b45be	[CostModel][AMDGPU] Fix intrinsics costs estimations. 1. Fixed costs inconsistency for llvm.fma.vXf16 intrinsics. 2. Added tests for llvm.sadd.sat, llvm.ssub.sat, llvm.uadd.sat, llvm.usub.sat intrinsics since they have special processing in cost model. 3. Minor intrinsics' costs tests update and refinement. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D115385	2021-12-13 17:17:34 +03:00
Daniil Fukalov	ab05ab59a7	[CostModel][AMDGPU] Fix instructions costs estimation for vector types. 1. Fixed vector instructions costs estimations inconsistency - removed different logic for "not simple types" since it biases costs for these types. 2. Fixed legalization penalty for vectors too big for the target: changed from overwrite default legalization cost value estimation to added penalty. 3. Fixed a few typos in tests. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D114893	2021-12-03 03:08:08 +03:00
Daniil Fukalov	cf362ff4ca	[NFC][AMDGPU] Improve cost model tests coverage.	2021-09-30 18:13:17 +03:00
Daniil Fukalov	6a187f9a57	[NFC][AMDGPU] Add missing gfx90a test cases to fsub.ll.	2021-09-29 21:55:54 +03:00
Daniil Fukalov	1f73f0c19d	[NFC][AMDGPU] Update cost model tests: 1. Convert to generated tests. 2. Added code-size case in few places.	2021-09-27 19:26:02 +03:00
Daniil Fukalov	4f28a2eb03	[NFC] Refactor tests to improve readability.	2021-09-24 01:57:30 +03:00
Daniil Fukalov	5b3fad4966	[AMDGPU][CostModel] Update shuffle instruction tests. NFC. New tests ported over from test/Analysis/CostModel/AArch64/shuffle-other.ll.	2021-08-30 19:17:27 +03:00
Simon Pilgrim	872a950033	[CostModel] Treat 'widen subvector' patterns as zero cost As discussed on D107228, widening a subvector by inserting the whole subvector into the bottom of a larger undef vector should always be cheap enough that we can treat it as zero cost. NOTE: If this proves to cause issues we have the option of introducing a "SK_WidenSubvector" shuffle kind enum that targets could override the zero cost, but that doesn't seem necessary atm. Differential Revision: https://reviews.llvm.org/D107228	2021-08-02 11:43:10 +01:00
alex-t	e585b332e4	[AMDGPU] PHI node cost should not be counted for the size and latency. Details: https://reviews.llvm.org/D96805 changed the GCNTTIImpl::getCFInstrCost to return 1 for the PHI nodes for the TTI::TCK_CodeSize and TTI::TCK_SizeAndLatency. This is incorrect because the value moves that are the result of the PHI lowering are inserted into the basic block predecessors - not into the block itself. As a result of this change LoopRotate and LoopUnroll were broken because of the incorrect Loop header and loop body size/cost estimation. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D105104	2021-06-30 16:11:17 +03:00
dfukalov	8f4b7e94a2	[AMDGPU][CostModel] Refine cost model for control-flow instructions. Added cost estimation for switch instruction, updated costs of branches, fixed phi cost. Had to increase `-amdgpu-unroll-threshold-if` default value since conditional branch cost (size) was corrected to higher value. Test renamed to "control-flow.ll". Removed redundant code in `X86TTIImpl::getCFInstrCost()` and `PPCTTIImpl::getCFInstrCost()`. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D96805	2021-04-10 09:20:24 +03:00
Sander de Smalen	4ca860742d	[InstructionCost] Don't conflate Invalid costs with Unknown costs. We previously made a change to getUserCost to return a Invalid cost when one of the TTI costs returned '-1' (meaning 'unknown' or 'infinitely expensive'). It makes no sense to say that: shufflevector <2 x i8> %x, <2 x i8> %y, <4 x i32> <i32 0, i32 1, i32 2, i32 3> has an invalid cost. Perhaps the cost is not known, but the IR is valid and can be code-generated. Invalid should only be used for IR that cannot possibly be code-generated and where a cost is nonsensical. With more passes now asserting that the cost must be valid, it is possible that those assertions will fail for perfectly valid IR. An incomplete cost-model probably shouldn't be a reason for the compiler to break. It's better to consider these costs as 'very expensive' and ignore them for other reasons. At some point, we should consider replacing -1 with some other mechanism. Reviewed By: paulwalker-arm, dmgreen Differential Revision: https://reviews.llvm.org/D99502	2021-03-30 09:29:42 +01:00
Alexey Bataev	14ae0cf0f5	[Cost]Canonicalize the cost for logical or/and reductions. The generic cost of logical or/and reductions should be cost of bitcast <ReduxWidth x i1> to iReduxWidth + cmp eq|ne iReduxWidth. Differential Revision: https://reviews.llvm.org/D97961	2021-03-19 11:01:58 -07:00
Alexey Bataev	60470ac7ff	[Cost]Add tests for boolean and/or reductions, NFC. Tests with the default costs for boolean and/or reductions. Differential Revision: https://reviews.llvm.org/D97793	2021-03-03 12:34:30 -08:00
Juneyoung Lee	c89d9d8a48	[TTI] Consider select form of and/or i1 as having arithmetic cost This is a patch that updates the cost of `select i1 a, b, false` to be equivalent to that of `and i1 a, b` as well as the cost of `select i1 a, true, b` equivalent to `or i1 a, b`. Until now, these selects were folded into and/or i1 by InstCombine, but the transformation is poison-unsafe. This is a step towards removing the unsafe transformation. D93065 has relevant transformations linked. These selects should be translated into the assemblies as and/or i1 do in the same manner. The cost should be equivalent. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D97360	2021-03-02 02:18:19 +09:00
Stanislav Mekhanoshin	a8d9d50762	[AMDGPU] gfx90a support Differential Revision: https://reviews.llvm.org/D96906	2021-02-17 16:01:32 -08:00
dfukalov	9068c20965	[AMDGPU][CostModel] Refine cost model for half- and quarter-rate instructions. 1. Throughput and codesize cost estimations were separated and updated. 2. Updated fdiv cost estimation for different cases. 3. Added scalarization processing for types that are treated as !isSimple() to improve codesize estimation in getArithmeticInstrCost() and getArithmeticInstrCost(). The code was borrowed from the TCK_RecipThroughput path of the base implementation. The next step is to unify the scalarization part in the base class, which currently works for the TCK_RecipThroughput path only. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D89973	2020-10-24 19:53:08 +03:00
dfukalov	4ccc38813e	[AMDGPU][CostModel] Add f16, f64 and contract cases to fused costs estimation. Add cases of fused fmul+fadd/fsub with f16 and f64 operands to cost model. Also added operations with contract attribute. Fixed line endings in test. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D84995	2020-08-06 21:43:27 +03:00
dfukalov	76a0c0ee6f	[AMDGPU][CostModel] Improve cost estimation for fused {fadd|fsub}(a,fmul(b,c)) Summary: If the result of fmul(b,c) has one use, in almost all cases (except when denormals are IEEE) the pair of operations will be fused into one fma/mad/mac/etc. Reviewers: rampitec Reviewed By: rampitec Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits, kerbowa Tags: #llvm Differential Revision: https://reviews.llvm.org/D83919	2020-07-16 03:06:38 +03:00
Stanislav Mekhanoshin	f7a7efbf88	[AMDGPU] Tweak getTypeLegalizationCost() Even though wide vectors are legal they still cost more as we will have to eventually split them. Not all operations can be uniformly done on vector types. Conservatively add the cost of splitting at least to 8 dwords, which is our widest possible load. We are more or less lying to the cost model with this change, but this can prevent the vectorizer from creating wide vectors, which results in RA problems for us. Differential Revision: https://reviews.llvm.org/D83078	2020-07-06 14:07:48 -07:00
dfukalov	129388ddc4	[AMDGPU][CostModel] Add fneg cost estimation Summary: The estimation uses AMDGPUTargetLowering::isFNegFree() Reviewers: rampitec Reviewed By: rampitec Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D82065	2020-06-19 17:31:35 +03:00
Jonathan Roelofs	7c5d2bec76	[llvm] Fix missing FileCheck directive colons https://reviews.llvm.org/D77352	2020-04-06 09:59:08 -06:00
Matt Arsenault	5660bb6bc9	AMDGPU: Remove denormal subtarget features Switch to using the denormal-fp-math/denormal-fp-math-f32 attributes.	2020-04-02 17:17:12 -04:00
Stanislav Mekhanoshin	58578f7056	[AMDGPU] Implemented fma cost analysis Differential Revision: https://reviews.llvm.org/D71676	2019-12-18 23:54:20 -08:00
Stanislav Mekhanoshin	b8ac5894a1	[AMDGPU] Fixed cost model for packed 16 bit ops Differential Revision: https://reviews.llvm.org/D71622	2019-12-17 15:14:17 -08:00
Matt Arsenault	b337bce871	AMDGPU: Split test functions to avoid dependency on subtarget Prepare this test for moving the denormal setting out of the subtarget features.	2019-11-19 11:12:13 +05:30
dfukalov	6e8251046b	[AMDGPU] Fix bug introduced in 47a5c36b37f0 Summary: [AMDGPU] Fix bug introduced in 47a5c36b37f0 Reviewers: foad, arsenm Reviewed By: arsenm Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69915	2019-11-07 11:50:14 +03:00
dfukalov	47a5c36b37	[AMDGPU] Improve code size cost model (part 2) Summary: Added estimations for ShuffleVector, some cast and arithmetic instructions Reviewers: rampitec Reviewed By: rampitec Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, zzheng, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69629	2019-11-06 13:55:48 +03:00
Daniil Fukalov	3972057511	[AMDGPU] Improve code size cost model Summary: Added estimation for zero size insertelement, extractelement and llvm.fabs operators. Updated inline/unroll parameters default values. Reviewers: rampitec, arsenm Reviewed By: arsenm Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68881 llvm-svn: 375109	2019-10-17 12:15:35 +00:00
Matt Arsenault	8dbeb9256c	TTI: Improve default costs for addrspacecast For some reason multiple places need to do this, and the variant the loop unroller and inliner use was not handling it. Also, introduce a new wrapper to be slightly more precise, since on AMDGPU some addrspacecasts are free, but not no-ops. llvm-svn: 362436	2019-06-03 18:41:34 +00:00
Matt Arsenault	e0c1f9e76d	AMDGPU: Partially fix default device for HSA There are a few different issues, mostly stemming from using generation based checks for anything instead of subtarget features. Stop adding flat-address-space as a feature for HSA, as it should only be a device property. This was incorrectly allowing flat instructions to select for SI. Increase the default generation for HSA to avoid the encoding error when emitting objects. This has some other side effects from various checks which probably should be separate subtarget features (in the cost model and for dealing with the DS offset folding issue). Partial fix for bug 41070. It should probably be an error to try using amdhsa without flat support. llvm-svn: 356347	2019-03-17 21:31:35 +00:00
Tim Renouf	e30aa6a136	[AMDGPU] Prepare for introduction of v3 and v5 MVTs AMDGPU would like to have MVTs for v3i32, v3f32, v5i32, v5f32. This commit does not add them, but makes preparatory changes: * Fixed assumptions of power-of-2 vector type in kernel arg handling, and added v5 kernel arg tests and v3/v5 shader arg tests. * Added v5 tests for cost analysis. * Added vec3/vec5 arg test cases. Some of this patch is from Matt Arsenault, also of AMD. Differential Revision: https://reviews.llvm.org/D58928 Change-Id: I7279d6b4841464d2080eb255ef3c589e268eabcd llvm-svn: 356342	2019-03-17 21:04:16 +00:00
Yaxun Liu	2a22c5deff	[AMDGPU] Switch to the new addr space mapping by default This requires corresponding clang change. Differential Revision: https://reviews.llvm.org/D40955 llvm-svn: 324101	2018-02-02 16:07:16 +00:00
Matt Arsenault	376f1bd73c	AMDGPU: Don't assert in TTI with fp32 denorms enabled Also refine for f16 and rcp cases. llvm-svn: 312213	2017-08-31 05:47:00 +00:00

59 Commits