llvm-project

Author	SHA1	Message	Date
Craig Topper	4f1c7256f9	Fix suffix handling for parsing and printing of cvtsi2ss, cvtsi2sd, cvtss2si, cvttss2si, cvtsd2si, and cvttsd2si to match gas behavior. cvtsi2* should parse with an 'l' or 'q' suffix or no suffix at all. No suffix should be treated the same as 'l' suffix. Printing should always print a suffix. Previously we didn't parse or print an 'l' suffix. cvtt2si/cvt2si should parse with an 'l' or 'q' suffix or not suffix at all. No suffix should use the destination register size to choose encoding. Printing should not print a suffix. Original 'l' suffix issue with cvtsi2* pointed out by Michael Kuperstein. llvm-svn: 171668	2013-01-06 20:39:29 +00:00
Evan Cheng	3fb03e23a4	Fix for PR14739. It's not safe to fold a load into a call across a store. Thanks to Nick Lewycky for the initial patch. llvm-svn: 171665	2013-01-06 19:00:15 +00:00
Craig Topper	92a70b1e65	Recommit r171461 which was incorrectly reverted. Mark DIV/IDIV instructions hasSideEffects=1 because they can trap when dividing by 0. This is needed to keep early if conversion from moving them across basic blocks. llvm-svn: 171608	2013-01-05 07:39:25 +00:00
Nadav Rotem	478b6a47ec	Revert revision 171524. Original message: URL: http://llvm.org/viewvc/llvm-project?rev=171524&view=rev Log: The current Intel Atom microarchitecture has a feature whereby when a function returns early then it is slightly faster to execute a sequence of NOP instructions to wait until the return address is ready, as opposed to simply stalling on the ret instruction until the return address is ready. When compiling for X86 Atom only, this patch will run a pass, called "X86PadShortFunction" which will add NOP instructions where less than four cycles elapse between function entry and return. It includes tests. Patch by Andy Zhang. llvm-svn: 171603	2013-01-05 05:42:48 +00:00
Preston Gurd	e36b685a94	The current Intel Atom microarchitecture has a feature whereby when a function returns early then it is slightly faster to execute a sequence of NOP instructions to wait until the return address is ready, as opposed to simply stalling on the ret instruction until the return address is ready. When compiling for X86 Atom only, this patch will run a pass, called "X86PadShortFunction" which will add NOP instructions where less than four cycles elapse between function entry and return. It includes tests. Patch by Andy Zhang. llvm-svn: 171524	2013-01-04 20:54:54 +00:00
Nadav Rotem	c616a5408a	Revert revision: 171467. This transformation is incorrect and makes some tests fail. Original message: Simplified TRUNCATE operation that comes after SETCC. It is possible since SETCC result is 0 or -1. Added a test. llvm-svn: 171468	2013-01-04 17:35:21 +00:00
Elena Demikhovsky	5f2f06d2d9	Simplified TRUNCATE operation that comes after SETCC. It is possible since SETCC result is 0 or -1. Added a test. llvm-svn: 171467	2013-01-03 08:48:33 +00:00
Michael Gottesman	820aac1c78	Revert "Mark DIV/IDIV instructions hasSideEffects=1 because they can trap when dividing by 0. This is needed to keep early if conversion from moving them across basic blocks." This reverts commit r171461 since it breaks the following tests: Clang :: Analysis/outofbound-notwork.c Clang :: Analysis/string-fail.c Clang :: CXX/basic/basic.lookup/basic.lookup.qual/p6-0x.cpp Clang :: CXX/basic/basic.lookup/basic.lookup.unqual/p15.cpp Clang :: CXX/dcl.dcl/dcl.spec/dcl.fct.spec/p4.cpp Clang :: CXX/dcl.dcl/dcl.spec/dcl.stc/p10.cpp Clang :: CXX/temp/temp.param/p14.cpp Clang :: CXX/temp/temp.res/temp.dep.res/temp.point/p1.cpp Clang :: CodeGen/2009-02-13-zerosize-union-field-ppc.c Clang :: CodeGen/blocks-2.c Clang :: CodeGen/libcalls-d.c Clang :: CodeGen/libcalls-ld.c Clang :: CodeGenCXX/conversion-function.cpp Clang :: CodeGenCXX/debug-info-limit-type.cpp Clang :: CodeGenCXX/inheriting-constructor.cpp Clang :: FixIt/fixit-errors.c Clang :: FixIt/fixit-pmem.cpp Clang :: Modules/namespaces.cpp Clang :: PCH/changed-files.c Clang :: PCH/pr4489.c Clang :: PCH/source-manager-stack.c Clang :: Parser/cxx-ambig-decl-expr-xfail.cpp Clang :: SemaCXX/switch-implicit-fallthrough-cxx98.cpp Clang :: SemaTemplate/instantiate-function-1.mm llvm-svn: 171466	2013-01-03 08:18:30 +00:00
Craig Topper	7c27cc9fd0	Mark DIV/IDIV instructions hasSideEffects=1 because they can trap when dividing by 0. This is needed to keep early if conversion from moving them across basic blocks. llvm-svn: 171461	2013-01-03 06:40:20 +00:00
Jakob Stoklund Olesen	725d57682b	Fix PR14732 by handling all kinds of IMPLICIT_DEF live ranges. Most IMPLICIT_DEF instructions are removed by the ProcessImplicitDefs pass, and a few are reinserted by PHIElimination when a PHI argument is <undef>. RegisterCoalescer was assuming that all IMPLICIT_DEF live ranges look like those created by PHIElimination, and that their live range never leaves the basic block. The PR14732 test case does tricks with PHI nodes that causes a longer IMPLICIT_DEF live range to appear. This happens very rarely, but RegisterCoalescer should be able to handle it. llvm-svn: 171435	2013-01-03 00:47:51 +00:00
Tom Stellard	567f886eb0	DAGCombiner: Avoid generating illegal vector INT_TO_FP nodes DAGCombiner::reduceBuildVecConvertToConvertBuildVec() was making two mistakes: 1. It was checking the legality of scalar INT_TO_FP nodes and then generating vector nodes. 2. It was passing the result value type to TargetLoweringInfo::getOperationAction() when it should have been passing the value type of the first operand. llvm-svn: 171420	2013-01-02 22:13:01 +00:00
Nadav Rotem	c8d7047fa9	AVX: Fix a bug in WidenMaskArithmetic. llvm-svn: 171397	2013-01-02 17:40:39 +00:00
Dmitri Gribenko	56bf2e1830	Tests: rewrite 'opt ... %s' to 'opt ... < %s' so that opt does not emit a ModuleID This is done to avoid odd test failures, like the one fixed in r171243. llvm-svn: 171250	2012-12-30 02:33:22 +00:00
Nadav Rotem	3da9ac72fa	AVX: Move the ZEXT/ANYEXT DAGCo optimizations to the lowering of these optimizations. The old test cases still cover all of these lowering/optimizations. The single change that we have is that now anyext does not need to zero a register, because it does not use the exact code path as the zero_extend. llvm-svn: 171178	2012-12-28 05:45:24 +00:00
Nadav Rotem	2a054b4475	On AVX/AVX2 the type v8i1 is legalized to v8i16, which is an XMM sized register. In most cases we actually compare or select YMM-sized registers and mixing the two types creates horrible code. This commit optimizes some of the transition sequences. PR14657. llvm-svn: 171148	2012-12-27 08:15:45 +00:00
NAKAMURA Takumi	40aa3285f4	llvm/test/CodeGen/X86: FileCheck-ize two tests in r171083. llvm-svn: 171084	2012-12-26 03:19:30 +00:00
NAKAMURA Takumi	334f685328	llvm/test/CodeGen/X86: Disable avx in two tests corresponding to r171082. llvm-svn: 171083	2012-12-26 03:08:55 +00:00
Benjamin Kramer	a9f265ee98	Harden test so it's not affected by changes to compare lowering. This only failed on hosts that don't have SSE41. llvm-svn: 171066	2012-12-25 13:23:23 +00:00
Benjamin Kramer	81b5a8fd2e	X86: Shave off one shuffle from the pcmpeqq sequence for SSE2 by making use of and commutativity. llvm-svn: 171064	2012-12-25 13:09:08 +00:00
Benjamin Kramer	df4af41b9b	X86: Custom lower <2 x i64> eq and ne when SSE41 is not available. pcmpeqd, pshufd, pshufd, pand is cheaper than unpack + cmpq, sbbq, cmpq, sbbq + pack. Small speedup on loop-vectorized viterbi (-march=core2). llvm-svn: 171063	2012-12-25 12:54:19 +00:00
NAKAMURA Takumi	1b18db7ea3	llvm/test/CodeGen/X86/fold-vex.ll: Add explicit triple. llvm-svn: 171029	2012-12-24 11:14:06 +00:00
Nadav Rotem	dc0ad92b64	Some x86 instructions can load/store one of the operands to memory. On SSE, this memory needs to be aligned. When these instructions are encoded in VEX (on AVX) there is no such requirement. This changes the folding tables and removes the alignment restrictions from VEX-encoded instructions. llvm-svn: 171024	2012-12-24 09:40:33 +00:00
Benjamin Kramer	76268ac682	X86: Turn mul of <4 x i32> into pmuludq when no SSE4.1 is available. pmuludq is slow, but it turns out that all the unpacking and packing of the scalarized mul is even slower. 10% speedup on loop-vectorized paq8p. llvm-svn: 170985	2012-12-22 16:07:56 +00:00
Benjamin Kramer	b2f0a2bd4b	X86: Emit vector sext as shuffle + sra if vpmovsx is not available. Also loosen the SSSE3 dependency a bit, expanded pshufb + psra is still better than scalarized loads. Fixes PR14590. llvm-svn: 170984	2012-12-22 11:34:28 +00:00
Nadav Rotem	d5aae980cb	In some cases, due to scheduling constraints we copy the EFLAGS. The only way to read the eflags is using push and pop. If we don't adjust the stack then we run over the first frame index. This is not something that we want to do, so we have to make sure that our machine function does not copy the flags. If it does then we have to emit the prolog that adjusts the stack. rdar://12896831 llvm-svn: 170961	2012-12-21 23:48:49 +00:00
Benjamin Kramer	b4688f84bd	try to unbreak ppc buildbots. llvm-svn: 170913	2012-12-21 18:11:45 +00:00
Benjamin Kramer	82d1c371e2	X86: Match pmin/pmax as a target specific dag combine. This occurs during vectorization. Part of PR14667. llvm-svn: 170908	2012-12-21 17:46:58 +00:00
Eric Christopher	6e47b725ff	Move these files over to the debug info directory. llvm-svn: 170810	2012-12-21 00:03:42 +00:00
Bob Wilson	3365b80290	Do not introduce vector operations in functions marked with noimplicitfloat. <rdar://problem/12879313> llvm-svn: 170630	2012-12-20 01:36:20 +00:00
Elena Demikhovsky	14a4af0e66	Optimized load + SIGN_EXTEND patterns in the X86 backend. llvm-svn: 170506	2012-12-19 07:50:20 +00:00
Craig Topper	63f5921776	Teach SimplifySetCC that comparing AssertZext i1 against a constant 1 can be rewritten as a compare against a constant 0 with the opposite condition. llvm-svn: 170495	2012-12-19 06:12:28 +00:00
Craig Topper	f924a58af1	Add rest of BMI/BMI2 instructions to the folding tables as well as popcnt and lzcnt. llvm-svn: 170304	2012-12-17 05:02:29 +00:00
Benjamin Kramer	b16ccde7a4	X86: Add a couple of target-specific dag combines that turn VSELECTS into psubus if possible. We match the pattern "x >= y ? x-y : 0" into "subus x, y" and two special cases if y is a constant. DAGCombiner canonicalizes those so we first have to undo the canonicalization for those cases. The pattern occurs in gzip when the loop vectorizer is enabled. Part of PR14613. llvm-svn: 170273	2012-12-15 16:47:44 +00:00
Nadav Rotem	8487537bdb	TypeLegalizer: Do not generate target specific nodes with illegal types, because we cant type-legalize them. llvm-svn: 170245	2012-12-14 21:20:37 +00:00
Evan Cheng	bf0baa9de7	Fix a bug in DAGCombiner::MatchBSwapHWord. Make sure the node has operands before referencing them. rdar://12868039 llvm-svn: 170078	2012-12-13 01:34:32 +00:00
NAKAMURA Takumi	be230b8fdb	llvm/test/CodeGen/X86/atom-bypass-slow-division.ll: Fix possible typo(s) in CHECK-NOT lines. Found by Alexander Zinenko, thanks! llvm-svn: 169978	2012-12-12 13:34:20 +00:00
NAKAMURA Takumi	cae5321a3b	llvm/test/CodeGen/X86/atom-bypass-slow-division.ll: Rename symbols, s/test_/Test/g, not to mismatch "CHECK(-NOT): test". llvm-svn: 169977	2012-12-12 13:34:14 +00:00
NAKAMURA Takumi	69d1405e48	llvm/test/CodeGen/X86/store_op_load_fold.ll: Fix typo, s/CHECK_NEXT/CHECK-NEXT/ llvm-svn: 169957	2012-12-12 01:41:01 +00:00
NAKAMURA Takumi	01ac65af00	llvm/test/CodeGen/X86/store_op_load_fold.ll: Add explicit triple. llvm-svn: 169956	2012-12-12 01:40:56 +00:00
Manman Ren	82751a105c	DAGCombine: clamp hi bit in APInt::getBitsSet to avoid assertion rdar://12838504 llvm-svn: 169951	2012-12-12 01:13:50 +00:00
Evan Cheng	04e5518783	Avoid using lossy load / stores for memcpy / memset expansion. e.g. f64 load / store on non-SSE2 x86 targets. llvm-svn: 169944	2012-12-12 00:42:09 +00:00
Chad Rosier	d4c0c6cb22	Add a triple to this test. llvm-svn: 169803	2012-12-11 00:51:36 +00:00
Chandler Carruth	b27041c50b	Fix a miscompile in the DAG combiner. Previously, we would incorrectly try to reduce the width of this load, and would end up transforming: (truncate (lshr (sextload i48 <ptr> as i64), 32) to i32) to (truncate (zextload i32 <ptr+4> as i64) to i32) We lost the sext attached to the load while building the narrower i32 load, and replaced it with a zext because lshr always zext's the results. Instead, bail out of this combine when there is a conflict between a sextload and a zext narrowing. The rest of the DAG combiner still optimize the code down to the proper single instruction: movswl 6(...),%eax Which is exactly what we wanted. Previously we read past the end and missed the sign extension: movl 6(...), %eax llvm-svn: 169802	2012-12-11 00:36:57 +00:00
Paul Redmond	c4550d4967	move X86-specific test This test case uses -mcpu=corei7 so it belongs in CodeGen/X86 Reviewed by: Nadav llvm-svn: 169801	2012-12-11 00:36:43 +00:00
Chad Rosier	df42cf39ab	Fall back to the selection dag isel to select tail calls. This shouldn't affect codegen for -O0 compiles as tail call markers are not emitted in unoptimized compiles. Testing with the external/internal nightly test suite reveals no change in compile time performance. Testing with -O1, -O2 and -O3 with fast-isel enabled did not cause any compile-time or execution-time failures. All tests were performed on my x86 machine. I'll monitor our arm testers to ensure no regressions occur there. In an upcoming clang patch I will be marking the objc_autoreleaseReturnValue and objc_retainAutoreleaseReturnValue as tail calls unconditionally. While it's theoretically true that this is just an optimization, it's an optimization that we very much want to happen even at -O0, or else ARC applications become substantially harder to debug. Part of rdar://12553082 llvm-svn: 169796	2012-12-11 00:18:02 +00:00
Evan Cheng	79e2ca90bc	Some enhancements for memcpy / memset inline expansion. 1. Teach it to use overlapping unaligned load / store to copy / set the trailing bytes. e.g. On 86, use two pairs of movups / movaps for 17 - 31 byte copies. 2. Use f64 for memcpy / memset on targets where i64 is not legal but f64 is. e.g. x86 and ARM. 3. When memcpy from a constant string, do not replace the load with a constant if it's not possible to materialize an integer immediate with a single instruction (required a new target hook: TLI.isIntImmLegal()). 4. Use unaligned load / stores more aggressively if target hooks indicates they are "fast". 5. Update ARM target hooks to use unaligned load / stores. e.g. vld1.8 / vst1.8. Also increase the threshold to something reasonable (8 for memset, 4 pairs for memcpy). This significantly improves Dhrystone, up to 50% on ARM iOS devices. rdar://12760078 llvm-svn: 169791	2012-12-10 23:21:26 +00:00
Craig Topper	d8005db486	Teach DAG combine to handle vector add/sub with vectors of all 0s. llvm-svn: 169727	2012-12-10 08:12:29 +00:00
Craig Topper	a183ddb0fe	Teach DAG combine to handle vector logical operations with vectors of all 1s or all 0s. These cases can show up when vectors are split for legalizing. Fix some tests that were dependent on these cases not being combined. llvm-svn: 169684	2012-12-08 22:49:19 +00:00
Nadav Rotem	ad0b5fbe8c	When we use the BLEND instruction that uses the MSB as a mask, we can remove the VSRI instruction before it since it does not affect the MSB. Thanks Craig Topper for suggesting this. llvm-svn: 169638	2012-12-07 21:43:11 +00:00
Nadav Rotem	481e50efe0	X86: Prefer using VPSHUFD over VPERMIL because it has better throughput. llvm-svn: 169624	2012-12-07 19:01:13 +00:00

1 2 3 4 5 ...

3753 Commits