llvm-project

Author	SHA1	Message	Date
Simon Pilgrim	d32ca2c0b7	[X86][SSE] Prefer BLEND(SHL(v,c1),SHL(v,c2)) over MUL(v, c3) Now that rL336250 has landed, we should prefer 2 immediate shifts + a shuffle blend over performing a multiply. Despite the increase in instructions, this is quicker (especially for slow v4i32 multiplies), avoid loads and constant pool usage. It does mean however that we increase register pressure. The code size will go up a little but by less than what we save on the constant pool data. This patch also adds support for v16i16 to the BLEND(SHIFT(v,c1),SHIFT(v,c2)) combine, and also prevents blending on pre-SSE41 shifts if it would introduce extra blend masks/constant pool usage. Differential Revision: https://reviews.llvm.org/D48936 llvm-svn: 336642	2018-07-10 07:58:33 +00:00
Simon Pilgrim	97e6afec2a	[X86][SSE] Add extra v16i16 shl x,c -> pmullw test We want to compare shifts with repeated vs non-repeated v8i16 shuffle masks (for PBLENDW ymm) llvm-svn: 336333	2018-07-05 09:54:53 +00:00
Simon Pilgrim	0c3b421b1a	[X86][SSE] Add v16i16 shl x,c -> pmullw test llvm-svn: 336277	2018-07-04 14:20:58 +00:00
Simon Pilgrim	c3e1617bf9	[X86][SSE] Blend any v8i16/v4i32 shift with 2 shift unique values (REAPPLIED) We were only doing this for basic blends, despite shuffle lowering now being good enough to handle more complex blends. This means that the two v8i16 splat shifts are performed in parallel instead of serially as the general shift case. Reapplied with a fixed (extra null tests) version of rL336113 after reversion in rL336189 - extra test case added at rL336247. llvm-svn: 336250	2018-07-04 09:12:48 +00:00
Simon Pilgrim	61fdf3b33c	[X86][SSE] Add reduced crash test case for r336113 - [X86][SSE] Blend any v8i16/v4i32 shift with 2 shift unique values The patch was reverted at r336189 due to crashes llvm-svn: 336247	2018-07-04 08:55:23 +00:00
Benjamin Kramer	fd171f2f89	Revert "[X86][SSE] Blend any v8i16/v4i32 shift with 2 shift unique values" This reverts commit r336113. It causes crashes. llvm-svn: 336189	2018-07-03 11:15:17 +00:00
Simon Pilgrim	2bc8e079f2	[X86][SSE] Blend any v8i16/v4i32 shift with 2 shift unique values We were only doing this for basic blends, despite shuffle lowering now being good enough to handle more complex blends. This means that the two v8i16 splat shifts are performed in parallel instead of serially as the general shift case. llvm-svn: 336113	2018-07-02 15:14:07 +00:00
Simon Pilgrim	a6be2437e7	[X86][SSE] Add v8i16 shift test for 2 shift values that doesn't match basic blend We have special case support for 2 shift values for basic blends, but irregular shift patterns end up using the generic lowering, despite shuffle lowering being good enough to handle more complex blends. llvm-svn: 336112	2018-07-02 14:53:41 +00:00
Francis Visoiu Mistrih	25528d6de7	[CodeGen] Unify MBB reference format in both MIR and debug output As part of the unification of the debug format and the MIR format, print MBB references as '%bb.5'. The MIR printer prints the IR name of a MBB only for block definitions. * find . \( -name ".mir" -o -name ".cpp" -o -name ".h" -o -name ".ll" \) -type f -print0 \| xargs -0 sed -i '' -E 's/BB#" << ([a-zA-Z0-9_]+)->getNumber\(\)/" << printMBBReference(\1)/g' find . \( -name ".mir" -o -name ".cpp" -o -name ".h" -o -name ".ll" \) -type f -print0 \| xargs -0 sed -i '' -E 's/BB#" << ([a-zA-Z0-9_]+)\.getNumber\(\)/" << printMBBReference(\1)/g' * find . \( -name ".txt" -o -name ".s" -o -name ".mir" -o -name ".cpp" -o -name ".h" -o -name ".ll" \) -type f -print0 \| xargs -0 sed -i '' -E 's/BB#([0-9]+)/%bb.\1/g' * grep -nr 'BB#' and fix Differential Revision: https://reviews.llvm.org/D40422 llvm-svn: 319665	2017-12-04 17:18:51 +00:00
Craig Topper	6fb55716e9	[X86] Redefine MOVSS/MOVSD instructions to take VR128 regclass as input instead of FR32/FR64 This patch redefines the MOVSS/MOVSD instructions to take VR128 as its second input. This allows the MOVSS/SD->BLEND commute to work without requiring a COPY to be inserted. This should fix PR33079 Overall this looks to be an improvement in the generated code. I haven't checked the EXPENSIVE_CHECKS build but I'll do that and update with results. Differential Revision: https://reviews.llvm.org/D38449 llvm-svn: 314914	2017-10-04 17:20:12 +00:00
Simon Pilgrim	0afe97f480	[X86][SSE] Dropped -mcpu from vector shift tests Use triple and attribute only for consistency llvm-svn: 306662	2017-06-29 11:09:53 +00:00
Simon Pilgrim	ba325e3a73	[X86][SSE] Don't blend vector shifts with MOVSS/MOVSD directly, lower from generic shuffle Shuffle lowering will correctly lower to MOVSS/MOVSD/PBLEND, improving commutation opportunities llvm-svn: 281471	2016-09-14 14:08:18 +00:00
Simon Pilgrim	f1f55198c1	[X86][SSE] Regenerate vector shift lowering tests llvm-svn: 278232	2016-08-10 15:13:49 +00:00
Andrea Di Biagio	aac2eac4c2	[X86] Improve the lowering of packed shifts by constant build_vector. This patch teaches the backend how to efficiently lower logical and arithmetic packed shifts on both SSE and AVX/AVX2 machines. When possible, instead of scalarizing a vector shift, the backend should try to expand the shift into a sequence of two packed shifts by immedate count followed by a MOVSS/MOVSD. Example (v4i32 (srl A, (build_vector < X, Y, Y, Y>))) Can be rewritten as: (v4i32 (MOVSS (srl A, <Y,Y,Y,Y>), (srl A, <X,X,X,X>))) [with X and Y ConstantInt] The advantage is that the two new shifts from the example would be lowered into X86ISD::VSRLI nodes. This is always cheaper than scalarizing the vector into four scalar shifts plus four pairs of vector insert/extract. llvm-svn: 206316	2014-04-15 19:30:48 +00:00

14 Commits