4253 Commits

Author SHA1 Message Date
Simon Pilgrim
edccc1254b Avoid repeated calls to MVT getSizeInBits and getScalarSizeInBits(). NFCI.
llvm-svn: 288170
2016-11-29 17:57:48 +00:00
Simon Pilgrim
001368abc8 [X86] Moved getTargetConstantFromNode function so a future patch is more understandable. NFCI.
llvm-svn: 288147
2016-11-29 15:32:58 +00:00
Simon Pilgrim
35c47c494d [X86][SSE] Add initial support for combining target shuffles to (V)PMOVZX.
We can only handle 128-bit vectors until we support target shuffle inputs of different size to the output.

llvm-svn: 288140
2016-11-29 14:18:51 +00:00
Simon Pilgrim
923020a652 Avoid repeated calls to MVT::getScalarSizeInBits(). NFCI.
llvm-svn: 288138
2016-11-29 13:43:08 +00:00
Simon Pilgrim
2228f70a85 [X86][SSE] Add initial support for combining (V)PMOVZX with shuffles.
llvm-svn: 288049
2016-11-28 17:58:19 +00:00
Sanjay Patel
100bc01a72 [x86] fix formatting; NFC
llvm-svn: 288045
2016-11-28 17:39:21 +00:00
Simon Pilgrim
3f10e66981 [X86][SSE] Added support for combining bit-shifts with shuffles.
Bit-shifts by a whole number of bytes can be represented as a shuffle mask suitable for combining.

Added a 'getFauxShuffleMask' function to allow us to create shuffle masks from other suitable operations.

llvm-svn: 288040
2016-11-28 16:25:01 +00:00
Simon Pilgrim
91d6f5fbc1 [X86][SSE] Add support for combining target shuffles to 128/256-bit PSLL/PSRL bit shifts
llvm-svn: 288006
2016-11-27 21:08:19 +00:00
Simon Pilgrim
cdb2ce661d [X86][SSE] Split lowerVectorShuffleAsShift ready for combines. NFCI.
Moved most of matching code into matchVectorShuffleAsShift to share with target shuffle combines (in a future commit).

llvm-svn: 288003
2016-11-27 19:28:39 +00:00
Craig Topper
6677bb4e50 [AVX-512] Teach LowerFormalArguments to use the extended register class when available. Fix the avx512vl stack folding tests to clobber more registers or otherwise they use xmm16 after this change.
llvm-svn: 287971
2016-11-26 07:20:57 +00:00
Simon Pilgrim
8e8ae7219f Use SDValue helper instead of explicitly going via SDValue::getNode(). NFCI
llvm-svn: 287940
2016-11-25 17:19:53 +00:00
Craig Topper
88071b37ab [AVX-512] Add support for changing VSHUFF64x2 to VSHUFF32x4 when its feeding a vselect with 32-bit element size.
Summary:
Shuffle lowering may have widened the element size of a i32 shuffle to i64 before selecting X86ISD::SHUF128. If this shuffle was used by a vselect this can prevent us from selecting masked operations.

This patch detects this and changes the element size to match the vselect.

I don't handle changing integer to floating point or vice versa as its not clear if its better to push such a bitcast to the inputs of the shuffle or to the user of the vselect. So I'm ignoring that case for now.

Reviewers: delena, zvi, RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D27087

llvm-svn: 287939
2016-11-25 16:48:05 +00:00
Simon Pilgrim
f1ee930db0 Fix unused variable warning
llvm-svn: 287889
2016-11-24 15:24:47 +00:00
Simon Pilgrim
9c71e07276 [X86][SSE] Improve UINT_TO_FP v2i32 -> v2f64
Vectorize UINT_TO_FP v2i32 -> v2f64 instead of scalarization (albeit still on the SIMD unit).

The codegen matches that generated by legalization (and is in fact used by AVX for UINT_TO_FP v4i32 -> v4f64), but has to be done in the x86 backend to account for legalization via 4i32.

Differential Revision: https://reviews.llvm.org/D26938

llvm-svn: 287886
2016-11-24 15:12:56 +00:00
Simon Pilgrim
841d7ca463 [X86][AVX512] Add support for v2i64 fptosi/fptoui/sitofp/uitofp on AVX512DQ-only targets
Use 512-bit instructions with subvector insertion/extraction like we do in a number of similar circumstances

llvm-svn: 287882
2016-11-24 14:46:55 +00:00
Simon Pilgrim
ab323ec411 [X86][AVX512DQVL] Add support for v2i64 -> v2f32 SINT_TO_FP/UINT_TO_FP lowering
llvm-svn: 287877
2016-11-24 13:38:59 +00:00
Nikolai Bozhenov
3a8d108b2b [x86] Fixing PR28755 by precomputing the address used in CMPXCHG8B
The bug arises during register allocation on i686 for
CMPXCHG8B instruction when base pointer is needed. CMPXCHG8B
needs 4 implicit registers (EAX, EBX, ECX, EDX) and a memory address,
plus ESI is reserved as the base pointer. With such constraints the only
way register allocator would do its job successfully is when the addressing
mode of the instruction requires only one register. If that is not the case
- we are emitting additional LEA instruction to compute the address.

It fixes PR28755.

Patch by Alexander Ivchenko <alexander.ivchenko@intel.com>

Differential Revision: https://reviews.llvm.org/D25088

llvm-svn: 287875
2016-11-24 13:23:35 +00:00
Nikolai Bozhenov
bb64aa14a3 [x86] Minor refactoring of X86TargetLowering::EmitInstrWithCustomInserter
Move the definitions of three variables out of the switch.

Patch by Alexander Ivchenko <alexander.ivchenko@intel.com>

Differential Revision: https://reviews.llvm.org/D25192

llvm-svn: 287874
2016-11-24 13:15:49 +00:00
Simon Pilgrim
a3af79678e [X86] Generalize CVTTPD2DQ/CVTTPD2UDQ and CVTDQ2PD/CVTUDQ2PD opcodes. NFCI
Replace the CVTTPD2DQ/CVTTPD2UDQ and CVTDQ2PD/CVTUDQ2PD opcodes with general versions.

This is an initial step towards similar FP_TO_SINT/FP_TO_UINT and SINT_TO_FP/UINT_TO_FP lowering to AVX512 CVTTPS2QQ/CVTTPS2UQQ and CVTQQ2PS/CVTUQQ2PS with illegal types.

Differential Revision: https://reviews.llvm.org/D27072

llvm-svn: 287870
2016-11-24 12:13:46 +00:00
Simon Pilgrim
4e9b9cbee9 [X86][AVX512] Add support for v4i64 fptosi/fptoui/sitofp/uitofp on AVX512DQ-only targets
Use 512-bit instructions with subvector insertion/extraction like we do in a number of similar circumstances

llvm-svn: 287762
2016-11-23 14:01:18 +00:00
Zvi Rackover
14aba43ea9 [X86] Simplify lowerVectorShuffleAsBitMask to handle only integer VT's
Summary: This function is only called with integer VT arguments, so remove code that handles FP vectors.

Reviewers: RKSimon, craig.topper, delena, andreadb

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D26985

llvm-svn: 287743
2016-11-23 06:45:25 +00:00
Simon Pilgrim
4aa876ca7c [X86][SSE] Combine UNPCKL(FHADD,FHADD) -> FHADD for v2f64 shuffles.
This occurs during UINT_TO_FP v2f64 lowering. 

We can easily generalize this to other horizontal ops (FHSUB, PACKSS, PACKUS) as required - we are doing something similar with PACKUS in lowerV2I64VectorShuffle

llvm-svn: 287676
2016-11-22 17:50:06 +00:00
Zvi Rackover
9a355219d1 [X86] Change lowerBuildVectorToBitOp() to take a BuildVectorSDNode. NFC.
llvm-svn: 287644
2016-11-22 15:33:28 +00:00
Zvi Rackover
0aa1c32d14 [X86] Remove dead code from LowerVectorBroadcast
Summary: Splat vectors are canonicalized to BUILD_VECTOR's so the code can be simplified. NFC-ish.

Reviewers: craig.topper, delena, RKSimon, andreadb

Subscribers: RKSimon, llvm-commits

Differential Revision: https://reviews.llvm.org/D26678

llvm-svn: 287643
2016-11-22 15:17:52 +00:00
Craig Topper
da22267055 [AVX-512] Add support for changing the element size of PALIGNR/VALIGND/VALIGNQ shuffles if they feed a vselect with a different type
Summary:
Shuffle lowering widens the element size of a shuffle if elements are contiguous. This is sometimes help because wider element types have more shuffle options. If the shuffle is one of the arguments to a vselect this shuffle widening can introduce a bitcast between the vselect and the shuffle. This will prevent isel from selecting a masked operation. If the shuffle can be written equally efficiently with a different element size to match the vselect type we should change the shuffle type to allow masking.

This patch does this conversion for all VALIGND/VALIGNQ sizes. It also supports turning 128-bit PALIGNR into VALIGND/VALIGNQ. This fixes the case shown in PR31018.

I plan to add support for more operations in future patches.

Reviewers: RKSimon, zvi, delena

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D26902

llvm-svn: 287612
2016-11-22 03:51:53 +00:00
Simon Pilgrim
b7bbaa669b [X86][SSE] Allow PACKSS to be used to truncate any type of all/none sign bits input
At the moment we only use truncateVectorCompareWithPACKSS with direct vector comparison results (just one example of a known all/none signbits input).

This change relaxes the direct matching of a SETCC opcode by moving the logic up into SelectionDAG::ComputeNumSignBits and accepting any input with a known splatted signbit.

llvm-svn: 287535
2016-11-21 12:05:49 +00:00
Simon Pilgrim
5fadce4a3f [X86][AVX512] Combine unary + zero target shuffles to VPERMV3 with a zero vector where possible
llvm-svn: 287497
2016-11-20 16:11:36 +00:00
Simon Pilgrim
5401bae523 [X86][AVX512] Add support for VBMI VPERMV3 target shuffle combines
llvm-svn: 287496
2016-11-20 15:24:38 +00:00
Simon Pilgrim
3f40412e0f [X86][AVX512] Add support for VBMI VPERMV target shuffle combines
llvm-svn: 287495
2016-11-20 15:05:45 +00:00
Simon Pilgrim
c17e1b74b8 [X86][AVX512VL] Removed duplicate operation action
Basic AVX512F already declared uint_to_fp v4i32 as legal

llvm-svn: 287493
2016-11-20 14:19:29 +00:00
Simon Pilgrim
096b6d4f81 [X86][AVX512F] Add support for uint_to_fp v2i32 to v2f64 on AVX512F-only targets
Use 512-bit instructions (we already do something similar for uint_to_fp v4i32 to v4f64)

llvm-svn: 287491
2016-11-20 14:03:23 +00:00
Oren Ben Simhon
c0f073b67f [X86] RegCall - Handling long double arguments
The change is part of RegCall calling convention support for LLVM.
Long double (f80) requires special treatment as the first f80 parameter is saved in FP0 (floating point stack).
This review present the change and the corresponding tests.

Differential Revision: https://reviews.llvm.org/D26151

llvm-svn: 287485
2016-11-20 11:06:07 +00:00
Simon Pilgrim
a14e0cb852 [X86][SSE] Improve PSHUFB lowering from either input
Canonicalization may leave the zeroable vector in the first input.

llvm-svn: 287461
2016-11-19 20:41:48 +00:00
Simon Pilgrim
623a7c57b5 [X86][AVX512] Add VPERMV/VPERMV3 v64i8 byte shuffles on avx512vbmi targets
llvm-svn: 287459
2016-11-19 20:12:34 +00:00
Craig Topper
893ea9fb2c [X86] Simplify some code a little by removing a dulicate variable and combinining two if statements. NFCI
llvm-svn: 287443
2016-11-19 17:33:17 +00:00
Simon Pilgrim
7938bd666e Cleanup function with clang-format. NFCI.
llvm-svn: 287340
2016-11-18 12:16:18 +00:00
Craig Topper
07f1c15995 [AVX-512] Support FCOPYSIGN for v16f32 and v8f64
Summary:
This extends FCOPYSIGN support to 512-bit vectors.

I've also added tests to show what the 128-bit and 256-bit cases look like with broadcast loads.

Reviewers: delena, zvi, RKSimon, spatel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D26791

llvm-svn: 287298
2016-11-18 02:25:34 +00:00
Simon Pilgrim
9d15fb3c10 Fix spelling mistakes in X86 target comments. NFC.
Identified by Pedro Giffuni in PR27636.

llvm-svn: 287247
2016-11-17 19:03:05 +00:00
Simon Pilgrim
67ef3b984a Wdocumentation fix
llvm-svn: 287224
2016-11-17 12:21:45 +00:00
Simon Pilgrim
8eca5520dc [X86][SSE] Improve lowering of vXi64 multiply with known zero 32-bit halves
vXi64 multiplication is lowered into 3 calls of vpmuludq with the upper/lower 32-bit halves.

If any of these halves are zero then we can remove individual calls. Although there was isBuildVectorAllZeros code to do this I don't think it ever worked (maybe just for constant folded cases that don't seem to be tested for any longer).

This requires additional X86ISD support for computeKnownBitsForTargetNode, so far I've just added support for X86ISD::VZEXT (VPMOVZX* - helping the AVX2+ cases).

Partial fix for PR30845

Differential Revision: https://reviews.llvm.org/D26590

llvm-svn: 287223
2016-11-17 12:14:49 +00:00
Oren Ben Simhon
489d6eff4f [X86] RegCall - Handling v64i1 in 32/64 bit target
Register Calling Convention defines a new behavior for v64i1 types.
This type should be saved in GPR.
However for 32 bit machine we need to split the value into 2 GPRs (because each is 32 bit).

Differential Revision: https://reviews.llvm.org/D26181

llvm-svn: 287217
2016-11-17 09:59:40 +00:00
Craig Topper
05b0fcd168 [X86] Fix formatting. NFC
llvm-svn: 287211
2016-11-17 05:59:55 +00:00
Sanjay Patel
066139a3ec [x86] allow FP-logic ops when one operand is FP and result is FP
We save an inter-register file move this way. If there's any CPU where
the FP logic is slower, we could transform this back to int-logic in 
MachineCombiner.

This helps, but doesn't solve, PR6137:
https://llvm.org/bugs/show_bug.cgi?id=6137

The 'andn' test shows that we're missing a pattern match to
recognize the xor with -1 constant as a 'not' op.

llvm-svn: 287171
2016-11-16 22:34:05 +00:00
Simon Pilgrim
b57dd17142 [X86][AVX512] Autoupgrade lossless i32/u32 to f64 conversion intrinsics with generic IR
Both the (V)CVTDQ2PD (i32 to f64) and (V)CVTUDQ2PD (u32 to f64) conversion instructions are lossless and can be safely represented as generic SINT_TO_FP/UINT_TO_FP calls instead of x86 intrinsics without affecting final codegen.

LLVM counterpart to D26686

Differential Revision: https://reviews.llvm.org/D26736

llvm-svn: 287108
2016-11-16 14:48:32 +00:00
Sanjay Patel
73d1d35d21 fix formatting; NFC
llvm-svn: 286989
2016-11-15 17:47:13 +00:00
Simon Pilgrim
ceffb43b1b [X86][SSE] Improve SINT_TO_FP of boolean vector results (signum)
This patch helps avoids poor legalization of boolean vector results (e.g. 8f32 -> 8i1 -> 8i16) that feed into SINT_TO_FP by inserting an early SIGN_EXTEND and so help improve the truncation logic.

This is not necessary for AVX512 targets where boolean vectors are legal - AVX512 manages to lower ( sint_to_fp vXi1 ) into some form of ( select mask, 1.0f , 0.0f ) in most cases.

Fix for PR13248

Differential Revision: https://reviews.llvm.org/D26583

llvm-svn: 286979
2016-11-15 16:24:40 +00:00
Craig Topper
5cb13062d2 [AVX-512] Add support for lowering shuffles to VALIGND/VALIGNQ
Summary: VALIGND and VALIGNQ are similar to PALIGNR but instead of working on a 128-bit lane they work on the entire vector register. This change leverages the shuffle rotate detection code used for PALIGNR to detect these cases.

Reviewers: delena, RKSimon

Subscribers: Farhana, llvm-commits

Differential Revision: https://reviews.llvm.org/D26297

llvm-svn: 286709
2016-11-12 05:05:27 +00:00
Evandro Menezes
21f9ce1a0d [DAG Combiner] Fix the native computation of the Newton series for reciprocals
The generic infrastructure to compute the Newton series for reciprocal and
reciprocal square root was conceived to allow a target to compute the series
itself.  However, the original code did not properly consider this condition
if returned by a target.  This patch addresses the issues to allow a target
to compute the series on its own.

Differential revision: https://reviews.llvm.org/D22975

llvm-svn: 286523
2016-11-10 23:31:06 +00:00
Craig Topper
f334ac19ad [AVX-512] Add lowering to cvttpd2udq/cvttps2udq for fptoui v2f64/2f32 to 2i32
This patch adds support for fptoui to 2i32 from both 2f64 and 2f32, building on Simon's change for the signed version in r284459 and using AVX-512 instructions.

If we don't have VLX support we need to use a 512-bit operation for v2f64->v2i32 and extract the result.

It also recognises that cvttpd2udq zeroes the upper 64-bits of the xmm result.

Differential Revision: https://reviews.llvm.org/D26331

llvm-svn: 286345
2016-11-09 07:48:51 +00:00
Simon Pilgrim
b3ad5f7ebf [X86][SSE] Reuse zeroable element mask in lowerVectorShuffleAsElementInsertion. NFCI
Don't regenerate a zeroable element mask with computeZeroableShuffleElements when its already available.

llvm-svn: 286067
2016-11-06 14:20:29 +00:00