llvm-project

Author	SHA1	Message	Date
Silviu Baranga	a055aab506	Make the MergeGlobals pass correctly handle the address space qualifiers of the global variables. We partition the set of globals by their address space, and apply the same transformation as before to merge them. llvm-svn: 171730	2013-01-07 12:31:25 +00:00
Craig Topper	4f1c7256f9	Fix suffix handling for parsing and printing of cvtsi2ss, cvtsi2sd, cvtss2si, cvttss2si, cvtsd2si, and cvttsd2si to match gas behavior. cvtsi2* should parse with an 'l' or 'q' suffix or no suffix at all. No suffix should be treated the same as 'l' suffix. Printing should always print a suffix. Previously we didn't parse or print an 'l' suffix. cvtt2si/cvt2si should parse with an 'l' or 'q' suffix or no suffix at all. No suffix should use the destination register size to choose encoding. Printing should not print a suffix. Original 'l' suffix issue with cvtsi2* pointed out by Michael Kuperstein. llvm-svn: 171668	2013-01-06 20:39:29 +00:00
Evan Cheng	3fb03e23a4	Fix for PR14739. It's not safe to fold a load into a call across a store. Thanks to Nick Lewycky for the initial patch. llvm-svn: 171665	2013-01-06 19:00:15 +00:00
Craig Topper	92a70b1e65	Recommit r171461 which was incorrectly reverted. Mark DIV/IDIV instructions hasSideEffects=1 because they can trap when dividing by 0. This is needed to keep early if conversion from moving them across basic blocks. llvm-svn: 171608	2013-01-05 07:39:25 +00:00
Nadav Rotem	478b6a47ec	Revert revision 171524. Original message: URL: http://llvm.org/viewvc/llvm-project?rev=171524&view=rev Log: The current Intel Atom microarchitecture has a feature whereby when a function returns early then it is slightly faster to execute a sequence of NOP instructions to wait until the return address is ready, as opposed to simply stalling on the ret instruction until the return address is ready. When compiling for X86 Atom only, this patch will run a pass, called "X86PadShortFunction" which will add NOP instructions where less than four cycles elapse between function entry and return. It includes tests. Patch by Andy Zhang. llvm-svn: 171603	2013-01-05 05:42:48 +00:00
Preston Gurd	e36b685a94	The current Intel Atom microarchitecture has a feature whereby when a function returns early then it is slightly faster to execute a sequence of NOP instructions to wait until the return address is ready, as opposed to simply stalling on the ret instruction until the return address is ready. When compiling for X86 Atom only, this patch will run a pass, called "X86PadShortFunction" which will add NOP instructions where less than four cycles elapse between function entry and return. It includes tests. Patch by Andy Zhang. llvm-svn: 171524	2013-01-04 20:54:54 +00:00
Akira Hatanaka	b13b33359b	[mips] MipsTargetLowering::getSetCCResultType should return a vector type if vectors are being compared. llvm-svn: 171517	2013-01-04 20:06:01 +00:00
Nadav Rotem	c616a5408a	Revert revision: 171467. This transformation is incorrect and makes some tests fail. Original message: Simplified TRUNCATE operation that comes after SETCC. It is possible since SETCC result is 0 or -1. Added a test. llvm-svn: 171468	2013-01-04 17:35:21 +00:00
Elena Demikhovsky	5f2f06d2d9	Simplified TRUNCATE operation that comes after SETCC. It is possible since SETCC result is 0 or -1. Added a test. llvm-svn: 171467	2013-01-03 08:48:33 +00:00
Michael Gottesman	820aac1c78	Revert "Mark DIV/IDIV instructions hasSideEffects=1 because they can trap when dividing by 0. This is needed to keep early if conversion from moving them across basic blocks." This reverts commit r171461 since it breaks the following tests: Clang :: Analysis/outofbound-notwork.c Clang :: Analysis/string-fail.c Clang :: CXX/basic/basic.lookup/basic.lookup.qual/p6-0x.cpp Clang :: CXX/basic/basic.lookup/basic.lookup.unqual/p15.cpp Clang :: CXX/dcl.dcl/dcl.spec/dcl.fct.spec/p4.cpp Clang :: CXX/dcl.dcl/dcl.spec/dcl.stc/p10.cpp Clang :: CXX/temp/temp.param/p14.cpp Clang :: CXX/temp/temp.res/temp.dep.res/temp.point/p1.cpp Clang :: CodeGen/2009-02-13-zerosize-union-field-ppc.c Clang :: CodeGen/blocks-2.c Clang :: CodeGen/libcalls-d.c Clang :: CodeGen/libcalls-ld.c Clang :: CodeGenCXX/conversion-function.cpp Clang :: CodeGenCXX/debug-info-limit-type.cpp Clang :: CodeGenCXX/inheriting-constructor.cpp Clang :: FixIt/fixit-errors.c Clang :: FixIt/fixit-pmem.cpp Clang :: Modules/namespaces.cpp Clang :: PCH/changed-files.c Clang :: PCH/pr4489.c Clang :: PCH/source-manager-stack.c Clang :: Parser/cxx-ambig-decl-expr-xfail.cpp Clang :: SemaCXX/switch-implicit-fallthrough-cxx98.cpp Clang :: SemaTemplate/instantiate-function-1.mm llvm-svn: 171466	2013-01-03 08:18:30 +00:00
Craig Topper	7c27cc9fd0	Mark DIV/IDIV instructions hasSideEffects=1 because they can trap when dividing by 0. This is needed to keep early if conversion from moving them across basic blocks. llvm-svn: 171461	2013-01-03 06:40:20 +00:00
Jakob Stoklund Olesen	725d57682b	Fix PR14732 by handling all kinds of IMPLICIT_DEF live ranges. Most IMPLICIT_DEF instructions are removed by the ProcessImplicitDefs pass, and a few are reinserted by PHIElimination when a PHI argument is <undef>. RegisterCoalescer was assuming that all IMPLICIT_DEF live ranges look like those created by PHIElimination, and that their live range never leaves the basic block. The PR14732 test case does tricks with PHI nodes that causes a longer IMPLICIT_DEF live range to appear. This happens very rarely, but RegisterCoalescer should be able to handle it. llvm-svn: 171435	2013-01-03 00:47:51 +00:00
Tom Stellard	567f886eb0	DAGCombiner: Avoid generating illegal vector INT_TO_FP nodes DAGCombiner::reduceBuildVecConvertToConvertBuildVec() was making two mistakes: 1. It was checking the legality of scalar INT_TO_FP nodes and then generating vector nodes. 2. It was passing the result value type to TargetLoweringInfo::getOperationAction() when it should have been passing the value type of the first operand. llvm-svn: 171420	2013-01-02 22:13:01 +00:00
Nadav Rotem	c8d7047fa9	AVX: Fix a bug in WidenMaskArithmetic. llvm-svn: 171397	2013-01-02 17:40:39 +00:00
Hal Finkel	6dbdd4307b	Support ppcf128 in SelectionDAG::getConstantFP Fixes pr14751. Patch by Kai; Thanks! llvm-svn: 171261	2012-12-30 19:03:32 +00:00
Dmitri Gribenko	56bf2e1830	Tests: rewrite 'opt ... %s' to 'opt ... < %s' so that opt does not emit a ModuleID This is done to avoid odd test failures, like the one fixed in r171243. llvm-svn: 171250	2012-12-30 02:33:22 +00:00
Nadav Rotem	3da9ac72fa	AVX: Move the ZEXT/ANYEXT DAGCo optimizations to the lowering of these optimizations. The old test cases still cover all of these lowering/optimizations. The single change that we have is that now anyext does not need to zero a register, because it does not use the exact code path as the zero_extend. llvm-svn: 171178	2012-12-28 05:45:24 +00:00
Nadav Rotem	2a054b4475	On AVX/AVX2 the type v8i1 is legalized to v8i16, which is an XMM sized register. In most cases we actually compare or select YMM-sized registers and mixing the two types creates horrible code. This commit optimizes some of the transition sequences. PR14657. llvm-svn: 171148	2012-12-27 08:15:45 +00:00
NAKAMURA Takumi	40aa3285f4	llvm/test/CodeGen/X86: FileCheck-ize two tests in r171083. llvm-svn: 171084	2012-12-26 03:19:30 +00:00
NAKAMURA Takumi	334f685328	llvm/test/CodeGen/X86: Disable avx in two tests corresponding to r171082. llvm-svn: 171083	2012-12-26 03:08:55 +00:00
Hal Finkel	2ebe6d08cd	Loosen scheduling restrictions on the PPC dcbt intrinsic As with the prefetch intrinsic to which it maps, simply have dcbt marked as reading from and writing to its arguments instead of having unmodeled side effects. While this might cause unwanted code motion (because aliasing checks don't really capture cache-line sharing), it is more important that prefetches in unrolled loops don't block the scheduler from rearranging the unrolled loop body. llvm-svn: 171073	2012-12-25 18:51:18 +00:00
Hal Finkel	1b5ff08d43	Expand PPC64 atomic load and store Use of store or load with the atomic specifier on 64-bit types would cause instruction-selection failures. As with the 32-bit case, these can use the default expansion in terms of cmp-and-swap. llvm-svn: 171072	2012-12-25 17:22:53 +00:00
Benjamin Kramer	a9f265ee98	Harden test so it's not affected by changes to compare lowering. This only failed on hosts that don't have SSE41. llvm-svn: 171066	2012-12-25 13:23:23 +00:00
Benjamin Kramer	81b5a8fd2e	X86: Shave off one shuffle from the pcmpeqq sequence for SSE2 by making use of and commutativity. llvm-svn: 171064	2012-12-25 13:09:08 +00:00
Benjamin Kramer	df4af41b9b	X86: Custom lower <2 x i64> eq and ne when SSE41 is not available. pcmpeqd, pshufd, pshufd, pand is cheaper than unpack + cmpq, sbbq, cmpq, sbbq + pack. Small speedup on loop-vectorized viterbi (-march=core2). llvm-svn: 171063	2012-12-25 12:54:19 +00:00
NAKAMURA Takumi	1b18db7ea3	llvm/test/CodeGen/X86/fold-vex.ll: Add explicit triple. llvm-svn: 171029	2012-12-24 11:14:06 +00:00
Nadav Rotem	dc0ad92b64	Some x86 instructions can load/store one of the operands to memory. On SSE, this memory needs to be aligned. When these instructions are encoded in VEX (on AVX) there is no such requirement. This changes the folding tables and removes the alignment restrictions from VEX-encoded instructions. llvm-svn: 171024	2012-12-24 09:40:33 +00:00
Benjamin Kramer	76268ac682	X86: Turn mul of <4 x i32> into pmuludq when no SSE4.1 is available. pmuludq is slow, but it turns out that all the unpacking and packing of the scalarized mul is even slower. 10% speedup on loop-vectorized paq8p. llvm-svn: 170985	2012-12-22 16:07:56 +00:00
Benjamin Kramer	b2f0a2bd4b	X86: Emit vector sext as shuffle + sra if vpmovsx is not available. Also loosen the SSSE3 dependency a bit, expanded pshufb + psra is still better than scalarized loads. Fixes PR14590. llvm-svn: 170984	2012-12-22 11:34:28 +00:00
Nadav Rotem	d5aae980cb	In some cases, due to scheduling constraints we copy the EFLAGS. The only way to read the eflags is using push and pop. If we don't adjust the stack then we run over the first frame index. This is not something that we want to do, so we have to make sure that our machine function does not copy the flags. If it does then we have to emit the prolog that adjusts the stack. rdar://12896831 llvm-svn: 170961	2012-12-21 23:48:49 +00:00
Benjamin Kramer	b4688f84bd	try to unbreak ppc buildbots. llvm-svn: 170913	2012-12-21 18:11:45 +00:00
Benjamin Kramer	82d1c371e2	X86: Match pmin/pmax as a target specific dag combine. This occurs during vectorization. Part of PR14667. llvm-svn: 170908	2012-12-21 17:46:58 +00:00
Tom Stellard	a8b0351720	R600: Expand vec4 INT <-> FP conversions llvm-svn: 170901	2012-12-21 16:33:24 +00:00
Reed Kotler	93f778d2bd	Add test case for r170674 llvm-svn: 170823	2012-12-21 00:55:10 +00:00
Eric Christopher	6e47b725ff	Move these files over to the debug info directory. llvm-svn: 170810	2012-12-21 00:03:42 +00:00
Bob Wilson	7bba4f8957	Revert "Adding support for llvm.arm.neon.vaddl[su].* and" This reverts r170694. The operations can be represented in IR without adding any new intrinsics. llvm-svn: 170765	2012-12-20 21:09:38 +00:00
Evan Cheng	ddc0cb6dc5	On some ARM cpus, flags setting movs with shifter operand, i.e. lsl, lsr, asr, are more expensive than the non-flag setting variant. Teach thumb2 size reduction pass to avoid generating them unless we are optimizing for size. rdar://12892707 llvm-svn: 170728	2012-12-20 19:59:30 +00:00
Rafael Espindola	642c7cd56e	Simplify the testcase a bit. I checked that it would still crash llc before the corresponding fix. llvm-svn: 170709	2012-12-20 17:47:27 +00:00
Renato Golin	6b2ea4a48f	Adding support for llvm.arm.neon.vaddl[su].* and llvm.arm.neon.vsub[su].* intrinsics. Patch by Pete Couperus <pjcoup@gmail.com> llvm-svn: 170694	2012-12-20 13:52:11 +00:00
Reed Kotler	d019dbf75e	fix most of remaining issues with large frames. these patches are tested a lot by test-suite but make check tests are forthcoming once the next few patches that complete this are committed. with the next few patches the pass rate for mips16 is near 100% llvm-svn: 170656	2012-12-20 04:07:42 +00:00
Akira Hatanaka	f423672117	[mips] Use "or $r0, $r1, $zero" instead of "addu $r0, $zero, $r1" to copy physical register $r1 to $r0. GNU disassembler recognizes an "or" instruction as a "move", and this change makes the disassembled code easier to read. Original patch by Reed Kotler. llvm-svn: 170655	2012-12-20 04:06:06 +00:00
Bob Wilson	3365b80290	Do not introduce vector operations in functions marked with noimplicitfloat. <rdar://problem/12879313> llvm-svn: 170630	2012-12-20 01:36:20 +00:00
Evan Cheng	eae6d2ccea	LLVM sdisel normalizes bit extraction of the form: ((x & 0xff00) >> 8) << 2 to (x >> 6) & 0x3fc This is general goodness since it folds a left shift into the mask. However, the trailing zeros in the mask prevent the ARM backend from using the bit extraction instructions. Worse, the mask materialization may require an additional instruction. This comes up fairly frequently when the result of the bit twiddling is used as a memory address. e.g. = ptr[(x & 0xFF0000) >> 16] We want to generate: ubfx r3, r1, #16, #8 ldr.w r3, [r0, r3, lsl #2] vs. mov.w r9, #1020 and.w r2, r9, r1, lsr #14 ldr r2, [r0, r2] Add a late ARM specific isel optimization to ARMDAGToDAGISel::PreprocessISelDAG(). It folds the left shift to the 'base + offset' address computation; changes the mask to one which doesn't have trailing zeros and enables the use of ubfx. Note the optimization has to be done late since it's target specific and we don't want to change the DAG normalization. It's also fairly restrictive as shifter operands are not always free. It's only done for lsh 1 / 2. It's known to be free on some cpus and they are most common for address computation. This is a slight win for blowfish, rijndael, etc. rdar://12870177 llvm-svn: 170581	2012-12-19 20:16:09 +00:00
Benjamin Kramer	c5071466d4	PowerPC: Expand VSELECT nodes. There's probably a better expansion for those nodes than the default for altivec, but this is better than crashing. VSELECTs occur in loop vectorizer output. llvm-svn: 170551	2012-12-19 15:49:14 +00:00
Elena Demikhovsky	14a4af0e66	Optimized load + SIGN_EXTEND patterns in the X86 backend. llvm-svn: 170506	2012-12-19 07:50:20 +00:00
Nadav Rotem	33360d8ae9	After reducing the size of an operation in the DAG we zero-extend the reduced bitwidth op back to the original size. If we reduce ANDs then this can cause an endless loop. This patch changes the ZEXT to ANY_EXTEND if the demanded bits are equal or smaller than the size of the reduced operation. llvm-svn: 170505	2012-12-19 07:39:08 +00:00
Craig Topper	63f5921776	Teach SimplifySetCC that comparing AssertZext i1 against a constant 1 can be rewritten as a compare against a constant 0 with the opposite condition. llvm-svn: 170495	2012-12-19 06:12:28 +00:00
Quentin Colombet	23b404d5ad	Disable ARM partial flag dependency optimization at -Oz To not over constrain the scheduler for ARM in thumb mode, some optimizations for code size reduction, specific to ARM thumb, are blocked when they add a dependency (like write after read dependency). Disables this check when code size is the priority, i.e., code is compiled with -Oz. llvm-svn: 170462	2012-12-18 22:47:16 +00:00
Andrew Trick	ec2564818c	MISched: add dependence to ExitSU to model live-out latency. llvm-svn: 170454	2012-12-18 20:53:01 +00:00
Hal Finkel	943f76d1b3	Check multiple register classes for inline asm tied registers A register can be associated with several distinct register classes. For example, on PPC, the floating point registers are each associated with both F4RC (which holds f32) and F8RC (which holds f64). As a result, this code would fail when provided with a floating point register and an f64 operand because it would happen to find the register in the F4RC class first and return that. From the F4RC class, SDAG would extract f32 as the register type and then assert because of the invalid implied conversion between the f64 value and the f32 register. Instead, search all register classes. If a register class containing the requested register has the requested type, then return that register class. Otherwise, as before, return the first register class found that contains the requested register. llvm-svn: 170436	2012-12-18 17:50:58 +00:00
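
Note on commit eae6d2ccea (r170581): the bit-extraction rewrite described in that message can be checked with plain integer arithmetic. The sketch below is illustrative only and is not code from the LLVM tree. The function names are hypothetical, `ubfx` is a simplified Python model of the ARM instruction's effect on an unsigned value, and the array element size is assumed to be 4 bytes (which is what the `lsl #2` in the message implies).

```python
def shift_then_scale(x):
    # Source form quoted in the commit message: ((x & 0xff00) >> 8) << 2
    return ((x & 0xFF00) >> 8) << 2


def normalized(x):
    # Normalized DAG form quoted in the commit message: (x >> 6) & 0x3fc
    return (x >> 6) & 0x3FC


def ubfx(x, lsb, width):
    # Simplified model of an unsigned bit-field extract (assumption:
    # x is a non-negative integer standing in for a 32-bit register).
    return (x >> lsb) & ((1 << width) - 1)


def byte_offset_ubfx(x):
    # Preferred sequence: ubfx r3, r1, #16, #8, then scale by 4 (lsl #2).
    return ubfx(x, 16, 8) << 2


def byte_offset_masked(x):
    # Sequence to avoid: mask 1020 (0x3fc) applied to x shifted right by 14.
    return (x >> 14) & 1020


def forms_agree(samples):
    # True if both pairs of forms compute the same value for every sample.
    return all(
        shift_then_scale(x) == normalized(x)
        and byte_offset_ubfx(x) == byte_offset_masked(x)
        for x in samples
    )
```

Calling `forms_agree(range(0, 1 << 24, 4099))` exercises both equivalences over a spread of inputs. The two forms compute the same offset either way; what the commit changes is which one the ARM backend ends up selecting.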

6767 Commits