llvm-project

Author	SHA1	Message	Date
Simon Pilgrim	aa8dffb69b	[X86][SSE] Account for cost of extract/insert of v32i8 vector shifts llvm-svn: 303012	2017-05-14 17:36:07 +00:00
Simon Pilgrim	4599eaa09a	[X86][XOP] Account for cost of extract/insert of 256-bit vector shifts llvm-svn: 303010	2017-05-14 13:38:53 +00:00
Simon Pilgrim	2d1c6d6e8d	[X86][AVX1] Improve 256-bit vector costs for integer unary intrinsics. Account for subvector extraction/insertion, helps prevent the vectorizers from selecting 256-bit vectors that will have to be split anyhow on AVX1 targets. llvm-svn: 302378	2017-05-07 20:58:55 +00:00
Jonas Paulsson	fccc7d66c3	[SystemZ] TargetTransformInfo cost functions implemented. getArithmeticInstrCost(), getShuffleCost(), getCastInstrCost(), getCmpSelInstrCost(), getVectorInstrCost(), getMemoryOpCost(), getInterleavedMemoryOpCost() implemented. Interleaved access vectorization enabled. BasicTTIImpl::getCastInstrCost() improved to check for legal extending loads, in which case the cost of the z/sext instruction becomes 0. Review: Ulrich Weigand, Renato Golin. https://reviews.llvm.org/D29631 llvm-svn: 300052	2017-04-12 11:49:08 +00:00
Keno Fischer	1ec5dd85a2	[X86 TTI] Implement LSV hook Summary: LSV wants to know the maximum size that can be loaded to a vector register. On X86, this always matches the maximum register width. Implement this accordingly and add a test to make sure that LSV can vectorize up to the maximum permissible width on X86. Reviewers: delena, arsenm Reviewed By: arsenm Subscribers: wdng, llvm-commits Differential Revision: https://reviews.llvm.org/D31504 llvm-svn: 299589	2017-04-05 20:51:38 +00:00
Simon Pilgrim	06c70adcf0	[X86] Add missing BITREVERSE costs for SSE2 vectors and i8/i16/i32/i64 scalars Prep work for PR31810 llvm-svn: 297876	2017-03-15 19:34:55 +00:00
Simon Pilgrim	a0b0b74b9a	Align cost model columns. NFCI. llvm-svn: 297824	2017-03-15 11:57:42 +00:00
Jonas Paulsson	a48ea231c0	[TargetTransformInfo] getIntrinsicInstrCost() scalarization estimation improved getIntrinsicInstrCost() used to only compute scalarization cost based on types. This patch improves this so that the actual arguments are checked when they are available, in order to handle only unique non-constant operands. Tests updates: Analysis/CostModel/X86/arith-fp.ll Transforms/LoopVectorize/AArch64/interleaved_cost.ll Transforms/LoopVectorize/ARM/interleaved_cost.ll The improvement in getOperandsScalarizationOverhead() to differentiate on constants made it necessary to update the interleaved_cost.ll tests even though they do not relate to intrinsics. Review: Hal Finkel https://reviews.llvm.org/D29540 llvm-svn: 297705	2017-03-14 06:35:36 +00:00
Michael Kuperstein	e6d59fdca5	[X86] Add costs for non-AVX512 single-source permutation integer shuffles Differential Revision: https://reviews.llvm.org/D29416 llvm-svn: 293932	2017-02-02 20:27:13 +00:00
Jonas Paulsson	8e2f948ef0	[TargetTransformInfo] Refactor and improve getScalarizationOverhead() Refactoring to remove duplications of this method. New method getOperandsScalarizationOverhead() that looks at the present unique operands and adds extract costs for them. Old behaviour was to just add extract costs for one operand of the type always, which still happens in getArithmeticInstrCost() if no operands are provided by the caller. This is a good start on improving this, but there are more places that can be improved by using getOperandsScalarizationOverhead(). Review: Hal Finkel https://reviews.llvm.org/D29017 llvm-svn: 293155	2017-01-26 07:03:25 +00:00
Mohammed Agabaria	20caee95e1	[X86] enable memory interleaving for X86\SLM arch. Differential Revision: https://reviews.llvm.org/D28547 llvm-svn: 293040	2017-01-25 09:14:48 +00:00
Simon Pilgrim	3e5b525699	Remove trailing whitespace. NFCI. llvm-svn: 292613	2017-01-20 15:15:59 +00:00
Simon Pilgrim	0da4d2bc03	[CostModel][X86] Removed unused cost. NFCI. SHL v8i32 is already handled in the SSE41 cost table llvm-svn: 292612	2017-01-20 15:14:38 +00:00
Simon Pilgrim	6ed996cdf0	[CostModel][X86] Fix AVX512BW vector shift costs for vXi16 types We already have patterns in place to support 128/256-bit shifts without AVX512VL llvm-svn: 292077	2017-01-15 20:44:00 +00:00
Simon Pilgrim	d419b73a42	[CostModel][X86] Updated vXi64 ASHR costs on AVX512 targets now that D28604 has landed llvm-svn: 292023	2017-01-14 19:24:23 +00:00
Simon Pilgrim	5a81fefad3	[X86][AVX512BW] Vectorize v64i8 vector shifts Differential Revision: https://reviews.llvm.org/D28447 llvm-svn: 291665	2017-01-11 10:36:51 +00:00
Mohammed Agabaria	2c96c43388	[X86] updating TTI costs for arithmetic instructions on X86\SLM arch. Updated instructions: pmulld, pmullw, pmulhw, mulsd, mulps, mulpd, divss, divps, divsd, divpd, addpd and subpd. Special optimization case: replaces pmulld with a pmullw\pmulhw\pshuf sequence when the real operands' bitwidth is <= 16. Differential Revision: https://reviews.llvm.org/D28104 llvm-svn: 291657	2017-01-11 08:23:37 +00:00
Simon Pilgrim	9c58950eeb	[CostModel][X86] Fixed vXi8 uniform shift costs. The 'fast' costs should only work for shifts by uniform constants (uniform non-constant are lowered using the slow default implementation). Logical shifts were not taking into account that we must mask the psrlw result, so the costs needed to be doubled. Added missing AVX2/AVX512BW costs as well. llvm-svn: 291391	2017-01-08 14:14:36 +00:00
Simon Pilgrim	1fa5487c05	[CostModel][X86] Moved legal uniform shift costs earlier. XOP was prematurely matching, doubling the cost of ashr/lshr uniform shifts. llvm-svn: 291390	2017-01-08 13:12:03 +00:00
Simon Pilgrim	9681c407b4	[CostModel][X86] Update SSE41/AVX1 vXi32 SHL costs SSE41 provides pmulld which allows the simpler pslld/paddd/cvttps2dq/pmulld pattern than SSE2's use of pmuludq. llvm-svn: 291372	2017-01-07 22:27:43 +00:00
Simon Pilgrim	a470296367	[CostModel][X86] Fix AVX2 v16i16 shift 'splat' costs. llvm-svn: 291366	2017-01-07 22:08:09 +00:00
Simon Pilgrim	82e3e05fe2	[CostModel][X86] Match 256-bit vector shift 'splat' costs for AVX2 and above We were matching against general vector shift costs before the uniform splat costs llvm-svn: 291365	2017-01-07 21:47:10 +00:00
Simon Pilgrim	e70644dab7	[CostModel][X86] Generalized cost calculation of SHL by constant -> MUL conversion. llvm-svn: 291364	2017-01-07 21:33:00 +00:00
Simon Pilgrim	725997154d	[CostModel][X86] Merge separate AVX1 cost LUTs. NFCI. llvm-svn: 291355	2017-01-07 18:19:25 +00:00
Simon Pilgrim	a4109d6433	[CostModel][AVX512BW] Add v32i16 vector shift costs for avx512bw targets. llvm-svn: 291354	2017-01-07 17:54:10 +00:00
Simon Pilgrim	df7de7a87e	[CostModel][X86] Added missing AVX2 arithmetic costs. Allows us to correctly fall through to the lower AVX1 costs if look up failed. llvm-svn: 291353	2017-01-07 17:27:39 +00:00
Simon Pilgrim	100eae1ee0	[CostModel][X86] Reordered AVX1 arithmetic cost LUT into descending target order. NFCI. llvm-svn: 291352	2017-01-07 17:03:51 +00:00
Simon Pilgrim	a1b8e2c725	[X86][AVX512] Use lowerShuffleAsRepeatedMaskAndLanePermute for non-VBMI v64i8 shuffles (PR31470) llvm-svn: 291347	2017-01-07 15:37:50 +00:00
Simon Pilgrim	d8333372bc	[CostModel][X86] Fix 512-bit SDIV/UDIV 'big' costs. Set the costs on the lowest target that supports the type. llvm-svn: 291229	2017-01-06 11:12:53 +00:00
Simon Pilgrim	aa186c632d	[CostModel][X86] Tidy up arithmetic costs code. NFCI. Remove unnecessary braces, remove one-use variables and keep LUTs to a similar naming convention. llvm-svn: 291187	2017-01-05 22:48:02 +00:00
Simon Pilgrim	4c050c2190	[CostModel][X86] Move vXi32 MUL costs into existing tables. NFCI. llvm-svn: 291165	2017-01-05 19:42:43 +00:00
Simon Pilgrim	6f72eba606	Remove trailing whitespace. NFCI. llvm-svn: 291163	2017-01-05 19:24:25 +00:00
Simon Pilgrim	5b06e4d319	[CostModel][X86] Reordered SSE42 arithmetic cost LUT into descending order. NFCI. llvm-svn: 291162	2017-01-05 19:19:39 +00:00
Simon Pilgrim	a8bf97569a	[CostModel][X86] Move vXi64 MUL costs into existing tables. NFCI. Removes need for yet another LUT. llvm-svn: 291158	2017-01-05 19:01:50 +00:00
Simon Pilgrim	430d34fc14	[CostModel][X86] Strip unused 256-bit vector shift costs. NFCI. Remove SSE2 256-bit entries - AVX targets will have used the SSE42 costs instead. llvm-svn: 291152	2017-01-05 18:36:48 +00:00
Simon Pilgrim	b01e844241	[CostModel][X86] Include the cost of 256-bit upper subvector extract/insertion in AVX1 v4i64 MUL Matches other MUL/ADD/SUB 256-bit case on AVX1 llvm-svn: 291149	2017-01-05 18:20:25 +00:00
Simon Pilgrim	f74700aa8c	[CostModel][X86] Merged SK_PermuteSingleSrc/SK_PermuteTwoSrc into common shuffle cost LUTs. NFCI. llvm-svn: 291146	2017-01-05 17:56:19 +00:00
Simon Pilgrim	bca02f9e20	[CostModel][X86] Add support for broadcast shuffle costs Currently only for broadcasts with input and output of the same width. Differential Revision: https://reviews.llvm.org/D27811 llvm-svn: 291122	2017-01-05 15:56:08 +00:00
Simon Pilgrim	a62395a4bd	[CostModel][X86] Pulled out common type legalization code llvm-svn: 291109	2017-01-05 14:33:32 +00:00
Mohammed Agabaria	23599ba794	Currently isLikelyComplexAddressComputation tries to figure out if the given stride seems to be 'complex' and needs some extra cost for address computation handling. This code seems to be target dependent and may not be the same for all targets. Passed the decision whether the given stride is complex or not to the target by sending stride information via SCEV to getAddressComputationCost instead of 'IsComplex'. Specifically, on X86 targets we don't see any significant address computation cost for strided access in general. Differential Revision: https://reviews.llvm.org/D27518 llvm-svn: 291106	2017-01-05 14:03:41 +00:00
Mohammed Agabaria	189e2d29ba	[Test Commit] fixing some format issue in X86TTI to match clang-format output. llvm-svn: 291095	2017-01-05 09:51:02 +00:00
Simon Pilgrim	bb895f3e9c	[CostModel][X86] Updated vXi8 and vXi16 Reverse/Alternate shuffle costs Actual codegen is much better than the extract+insert patterns that were assumed. llvm-svn: 290962	2017-01-04 14:01:33 +00:00
Simon Pilgrim	939b8cd708	[X86] Merged Reverse/Alternate shuffle cost tables. NFCI. As discussed on D27811, merged the shuffle cost LUTs and use the shuffle kind to perform the lookup instead of the ISD opcode. llvm-svn: 290956	2017-01-04 12:08:41 +00:00
Elena Demikhovsky	d96200d60a	Fixed shuffle-reverse cost on AVX-512. (This change was approved in https://reviews.llvm.org/D28118, but Simon asked to submit it separately). llvm-svn: 290812	2017-01-02 11:44:10 +00:00
Elena Demikhovsky	21706cbd24	AVX-512 Loop Vectorizer: Cost calculation for interleave load/store patterns. X86 target does not provide any target specific cost calculation for interleave patterns. It uses the common target-independent calculation, which gives very high numbers. As a result, the scalar version is chosen in many cases. The situation on AVX-512 is even worse, since we have 3-src shuffles that significantly reduce the cost. In this patch I calculate the cost on AVX-512. It will allow comparing the interleave pattern with gather/scatter and choosing the better solution (PR31426). * Shuffle-broadcast cost will be changed in Simon's upcoming patch. Differential Revision: https://reviews.llvm.org/D28118 llvm-svn: 290810	2017-01-02 10:37:52 +00:00
Simon Pilgrim	081abbb164	[X86][SSE] Improve lowering of vXi64 multiplies As mentioned on PR30845, we were performing our vXi64 multiplication as: AloBlo = pmuludq(a, b); AloBhi = pmuludq(a, psrlqi(b, 32)); AhiBlo = pmuludq(psrlqi(a, 32), b); return AloBlo + psllqi(AloBhi, 32)+ psllqi(AhiBlo, 32); when we could avoid one of the upper shifts with: AloBlo = pmuludq(a, b); AloBhi = pmuludq(a, psrlqi(b, 32)); AhiBlo = pmuludq(psrlqi(a, 32), b); return AloBlo + psllqi(AloBhi + AhiBlo, 32); This matches the lowering on gcc/icc. Differential Revision: https://reviews.llvm.org/D27756 llvm-svn: 290267	2016-12-21 20:00:10 +00:00
Simon Pilgrim	2f7f0e7a48	[CostModel][X86] Updated reverse shuffle costs llvm-svn: 289819	2016-12-15 14:24:07 +00:00
Simon Pilgrim	841d7ca463	[X86][AVX512] Add support for v2i64 fptosi/fptoui/sitofp/uitofp on AVX512DQ-only targets Use 512-bit instructions with subvector insertion/extraction like we do in a number of similar circumstances llvm-svn: 287882	2016-11-24 14:46:55 +00:00
Simon Pilgrim	4e9b9cbee9	[X86][AVX512] Add support for v4i64 fptosi/fptoui/sitofp/uitofp on AVX512DQ-only targets Use 512-bit instructions with subvector insertion/extraction like we do in a number of similar circumstances llvm-svn: 287762	2016-11-23 14:01:18 +00:00
Simon Pilgrim	03cd8f887c	[CostModel][X86] Add missing AVX512DQ v8i64 fptosi/sitofp costs llvm-svn: 287760	2016-11-23 13:42:09 +00:00
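
The vXi64 multiply lowering described in commit 081abbb164 above can be checked with a small scalar model. This is only a sketch, not the LLVM implementation: `pmuludq` here is a hypothetical one-lane stand-in for the SSE2 instruction (it multiplies the low 32 bits of each operand into a 64-bit result), and `mul64_old`, `mul64_new` and `check_mul64` are illustrative names that do not appear in the source tree. Both forms compute the low 64 bits of `a * b`; the second needs one fewer shift.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical scalar model of one 64-bit lane of pmuludq:
// multiply the low 32 bits of each operand, producing a 64-bit product.
static inline uint64_t pmuludq(uint64_t a, uint64_t b) {
  return (a & 0xFFFFFFFFull) * (b & 0xFFFFFFFFull);
}

// Earlier lowering from the commit message: three multiplies, two left shifts.
static inline uint64_t mul64_old(uint64_t a, uint64_t b) {
  uint64_t AloBlo = pmuludq(a, b);
  uint64_t AloBhi = pmuludq(a, b >> 32);
  uint64_t AhiBlo = pmuludq(a >> 32, b);
  return AloBlo + (AloBhi << 32) + (AhiBlo << 32);
}

// Improved lowering: add the cross terms first, then shift once.
static inline uint64_t mul64_new(uint64_t a, uint64_t b) {
  uint64_t AloBlo = pmuludq(a, b);
  uint64_t AloBhi = pmuludq(a, b >> 32);
  uint64_t AhiBlo = pmuludq(a >> 32, b);
  return AloBlo + ((AloBhi + AhiBlo) << 32);
}

// Both forms should agree with the wrapping 64-bit product a * b.
static inline void check_mul64(uint64_t a, uint64_t b) {
  assert(mul64_old(a, b) == a * b);
  assert(mul64_new(a, b) == a * b);
}
```

The two forms agree because unsigned arithmetic wraps modulo 2^64, so shifting the sum of the cross terms equals summing the shifted cross terms; the Ahi*Bhi term is dropped in both since it only contributes to bits 64 and above.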


199 Commits