3896 Commits

Author SHA1 Message Date
Bob Wilson
00871c71e9 Fix result type of Neon floating-point comparisons against zero.
The result vector elements are always integers.  Radar 8782191.

llvm-svn: 122112
2010-12-18 00:04:33 +00:00
Bill Wendling
3fff1fd49b During local stack slot allocation, the materializeFrameBaseRegister function
may be called. If the entry block is empty, the insertion point iterator will be
the "end()" value. Calling ->getParent() on it (among others) causes problems.

Modify materializeFrameBaseRegister to take the machine basic block and insert
the frame base register at the beginning of that block. (It's very similar to
what the code does all ready. The only difference is that it will always insert
at the beginning of the entry block instead of after a previous materialization
of the frame base register. I doubt that that matters here.)

<rdar://problem/8782198>

llvm-svn: 122104
2010-12-17 23:09:14 +00:00
Bob Wilson
5408144add Fix a DAGCombiner crash when folding binary vector operations with constant
BUILD_VECTOR operands where the element type is not legal.  I had previously
changed this code to insert TRUNCATE operations, but that was just wrong.

llvm-svn: 122102
2010-12-17 23:06:49 +00:00
Bob Wilson
494d7f0367 Combine several vector-related DAGCombiner tests.
llvm-svn: 122101
2010-12-17 23:06:46 +00:00
Nate Begeman
97b72c99d2 Add support for matching psign & plendvb to the x86 target
Remove unnecessary pandn patterns, 'vnot' patfrag looks through bitcasts

llvm-svn: 122098
2010-12-17 22:55:37 +00:00
Dale Johannesen
cd538afa52 Add a transform to DAG Combiner. This improves the
code for the case where 32-bit divide by constant is
turned into 64-bit multiply by constant.  8771012.

llvm-svn: 122090
2010-12-17 21:45:49 +00:00
Kalle Raiskila
affe15fd67 Don't feed 19 bit immediates to ILA.
Patch (slightly modified) by Visa Putkinen.

llvm-svn: 122052
2010-12-17 09:36:09 +00:00
Bob Wilson
bfc6904fc6 Fix crash compiling a QQQQ REG_SEQUENCE for a Neon vld3_lane operation.
Radar 8776599

llvm-svn: 122018
2010-12-17 01:21:12 +00:00
Jason W Kim
4c1386add9 1. ARM/MC/ELF: A few more ELF relocs for .o
2. Fixed EmitLocalCommonSymbol for ELF (Yes, they exist. :)
   Test added.

llvm-svn: 121951
2010-12-16 03:12:17 +00:00
Jim Grosbach
bfef309d11 Thumb1 had two patterns for the same load-from-constant-pool instruction.
Canonicalize on tLDRpci and remove tLDRcp.

llvm-svn: 121920
2010-12-15 23:52:36 +00:00
Eric Christopher
347f4c32e8 Don't handle -arm-long-calls in fast isel for now.
llvm-svn: 121919
2010-12-15 23:47:29 +00:00
Evan Cheng
b7ff5a0f20 Teach machine cse to commute instructions.
llvm-svn: 121903
2010-12-15 22:16:21 +00:00
Bob Wilson
fa27a8621c Add Neon VCVT instructions for f32 <-> f16 conversions.
Clang is now providing intrinsics for these and so we need to support them
in the backend.  Radar 8068427.

llvm-svn: 121902
2010-12-15 22:14:12 +00:00
Wesley Peck
0c558b2080 Lower the MBlaze target specific calling conventions for "interrupt_handler"
and "save_volatiles" correctly. This completes the custom calling convention
functionality changes for the MBlaze backend that were started in 121888.

llvm-svn: 121891
2010-12-15 20:27:28 +00:00
Chris Lattner
15090e1eb0 take care of some todos, transforming [us]mul_lohi into
a wider mul if the wider mul is legal.

llvm-svn: 121848
2010-12-15 06:04:19 +00:00
Chris Lattner
c3301e970c merge two tests
llvm-svn: 121847
2010-12-15 05:58:59 +00:00
Evan Cheng
19dc77cec6 Fix a minor bug in two-address pass. It was missing a commute opportunity.
regB = move RCX
regA = op regB, regC
RAX  = move regA
where both regB and regC are killed. If regB is constrainted to non-compatible
physical registers but regC is not constrainted at all, then it's better to
commute the instruction.
       movl    %edi, %eax
       shlq    $32, %rcx
       leaq    (%rcx,%rax), %rax
=>
       movl    %edi, %eax
       shlq    $32, %rcx
       orq     %rcx, %rax
rdar://8762995

llvm-svn: 121793
2010-12-14 21:34:53 +00:00
Evan Cheng
c177813755 bfi A, (and B, C1), C2) -> bfi A, B, C2 iff C1 & C2 == C1. rdar://8458663
llvm-svn: 121746
2010-12-14 03:22:07 +00:00
Jason W Kim
1296055841 fix fixme case typo :-)
llvm-svn: 121743
2010-12-14 01:42:38 +00:00
Jason W Kim
0e909c5f9c First cut of ARM/MC/ELF PIC relocations.
Test has fixme, to move to .s -> .o test when AsmParser works better.

llvm-svn: 121732
2010-12-13 23:16:07 +00:00
Bob Wilson
651eaa02b8 Remove the rest of the *_sfp Neon instruction patterns.
Use the same COPY_TO_REGCLASS approach as for the 2-register *_sfp instructions.
This change made a big difference in the code generated for the
CodeGen/Thumb2/cross-rc-coalescing-2.ll test: The coalescer is still doing
a fine job, but some instructions that were previously moved outside the loop
are not moved now.  It's using fewer VFP registers now, which is generally
a good thing, so I think the estimates for register pressure changed and that
affected the LICM behavior.  Since that isn't obviously wrong, I've just
changed the test file.  This completes the work for Radar 8711675.

llvm-svn: 121730
2010-12-13 23:02:37 +00:00
Chris Lattner
8e21a02c19 rename test
llvm-svn: 121697
2010-12-13 08:39:40 +00:00
Chris Lattner
10bd29f1d4 Add a couple dag combines to transform mulhi/mullo into a wider multiply
when the wider type is legal.  This allows us to compile:

define zeroext i16 @test1(i16 zeroext %x) nounwind {
entry:
	%div = udiv i16 %x, 33
	ret i16 %div
}

into:

test1:                                  # @test1
	movzwl	4(%esp), %eax
	imull	$63551, %eax, %eax      # imm = 0xF83F
	shrl	$21, %eax
	ret

instead of:

test1:                                  # @test1
        movw    $-1985, %ax             # imm = 0xFFFFFFFFFFFFF83F
        mulw    4(%esp)
        andl    $65504, %edx            # imm = 0xFFE0
        movl    %edx, %eax
        shrl    $5, %eax
        ret

Implementing rdar://8760399 and example #4 from:
http://blog.regehr.org/archives/320

We should implement the same thing for [su]mul_hilo, but I don't
have immediate plans to do this.

llvm-svn: 121696
2010-12-13 08:39:01 +00:00
Wesley Peck
b4f896ce90 Missed some ADDI <-> ADDIK conversions in 121649.
llvm-svn: 121652
2010-12-12 22:53:14 +00:00
Evan Cheng
3434575704 (or (and (shl A, #shamt), mask), B) => ARMbfi B, A, ~mask where lsb(mask) == #shamt. rdar://8752056
llvm-svn: 121606
2010-12-11 04:11:38 +00:00
Bob Wilson
9375d27460 Add float patterns for Neon vld1-lane/dup and vst1-lane operations.
llvm-svn: 121583
2010-12-10 22:13:32 +00:00
Bob Wilson
d29b38c893 Fix some invalid alignments for Neon vld-dup and vld/st-lane instructions.
Alignments smaller than the total size of the memory being loaded or stored,
unless the alignment is 8 bytes, are not allowed.  Add tests for this, too.

llvm-svn: 121506
2010-12-10 19:37:42 +00:00
Nate Begeman
8b08f5232b Formalize the notion that AVX and SSE are non-overlapping extensions from the compiler's point of view. Per email discussion, we either want to always use VEX-prefixed instructions or never use them, and are taking "HasAVX" to mean "Always use VEX". Passing -mattr=-avx,+sse42 should serve to restore legacy SSE support when desirable.
llvm-svn: 121439
2010-12-10 00:26:57 +00:00
Jim Grosbach
5fccad84a3 ARM stm/ldm instructions require more than one register in the register list.
Otherwise, a plain str/ldr should be used instead. Make sure we account for
that in prologue/epilogue code generation.
rdar://8745460

llvm-svn: 121391
2010-12-09 18:31:13 +00:00
Bruno Cardoso Lopes
d47180e45e Add ROTR and ROTRV mips32 instructions. Patch by Akira Hatanaka
llvm-svn: 121377
2010-12-09 17:32:30 +00:00
Eric Christopher
a8aaaee379 Rewrite the darwin tlv support to use a chain and return to copying
the output to the correct register. Fixes a hidden problem uncovered
by the last patch where we'd try to DAG combine our MVT::Other node
oddly.

llvm-svn: 121358
2010-12-09 06:25:53 +00:00
Eric Christopher
d84970ae8b Remove extraneous copy from DAG conversion for darwin tls. This was
popping up at O0 when it wasn't folded and the fast allocator would
complain.

llvm-svn: 121330
2010-12-09 00:27:58 +00:00
Eric Christopher
6a21b40bd6 Move this test to tlv* to make it easier to notice versus linux tls
support.

llvm-svn: 121316
2010-12-08 23:33:23 +00:00
Jason W Kim
c79c5f6e8c ARM/MC/ELF TPsoft is now a proper pseudo inst.
Added test to check bl __aeabi_read_tp gets emitted properly for ELF/ASM
as well as ELF/OBJ (including fixup)

Also added support for ELF::R_ARM_TLS_IE32

llvm-svn: 121312
2010-12-08 23:14:44 +00:00
Evan Cheng
775ead3293 Fix a bad prologue / epilogue codegen bug where the compiler would emit illegal
vpush instructions to save / restore VFP / NEON registers like this:
vpush {d8,d10,d11}
vpop {d8,d10,d11}

vpush and vpop do not allow gaps in the register list.
rdar://8728956

llvm-svn: 121197
2010-12-07 23:08:38 +00:00
Bruno Cardoso Lopes
f0c6e3780d Match a pattern generated by a dag combiner opt where:
(select (load (load tga0)) (load tga1)) => (load (select (load tga0) tga1))

Thanks to Akira for pointing that.

llvm-svn: 121163
2010-12-07 19:00:20 +00:00
Devang Patel
c24048a718 If dbg_declare() or dbg_value() is not lowered by isel then emit DEBUG message instead of creating DBG_VALUE for undefined value in reg0.
llvm-svn: 121059
2010-12-06 22:39:26 +00:00
Wesley Peck
8da34b6c35 Fixed reversed operands for IDIV and CMP instructions in MBlaze backend.
Use BRAD instead of BRD for indirect branches in MBlaze backend.

patch contributed by Jack Whitham!

llvm-svn: 121044
2010-12-06 22:06:49 +00:00
Wesley Peck
6ce9b60811 Fix a 16-bit immediate value detection bug in the MBlaze delay slot filler.
Address more hazards in the MBlaze delay slot filler.

patch contributed by Jack Whitham!

llvm-svn: 121037
2010-12-06 21:11:01 +00:00
Rafael Espindola
dee3062373 Revert previous two patches while I try to find out how to make both
linux and darwin assemblers happy :-(

llvm-svn: 121004
2010-12-06 15:35:15 +00:00
Rafael Espindola
884d58a798 Update test for the extra =.
llvm-svn: 121001
2010-12-06 15:05:36 +00:00
Che-Liang Chiou
9f2af628a6 ptx: add shift instructions
llvm-svn: 120982
2010-12-06 04:00:03 +00:00
Evan Cheng
62c7b5bf76 Making use of VFP / NEON floating point multiply-accumulate / subtraction is
difficult on current ARM implementations for a few reasons.
1. Even though a single vmla has latency that is one cycle shorter than a pair
   of vmul + vadd, a RAW hazard during the first (4? on Cortex-a8) can cause
   additional pipeline stall. So it's frequently better to single codegen
   vmul + vadd.
2. A vmla folowed by a vmul, vmadd, or vsub causes the second fp instruction to
   stall for 4 cycles. We need to schedule them apart.
3. A vmla followed vmla is a special case. Obvious issuing back to back RAW
   vmla + vmla is very bad. But this isn't ideal either:
     vmul
     vadd
     vmla
   Instead, we want to expand the second vmla:
     vmla
     vmul
     vadd
   Even with the 4 cycle vmul stall, the second sequence is still 2 cycles
   faster.

Up to now, isel simply avoid codegen'ing fp vmla / vmls. This works well enough
but it isn't the optimial solution. This patch attempts to make it possible to
use vmla / vmls in cases where it is profitable.

A. Add missing isel predicates which cause vmla to be codegen'ed.
B. Make sure the fmul in (fadd (fmul)) has a single use. We don't want to
   compute a fmul and a fmla.
C. Add additional isel checks for vmla, avoid cases where vmla is feeding into
   fp instructions (except for the #3 exceptional case).
D. Add ARM hazard recognizer to model the vmla / vmls hazards.
E. Add a special pre-regalloc case to expand vmla / vmls when it's likely the
   vmla / vmls will trigger one of the special hazards.

Work in progress, only A+B are enabled.

llvm-svn: 120960
2010-12-05 22:04:16 +00:00
Chris Lattner
6886171792 Teach X86ISelLowering that the second result of X86ISD::UMUL is a flags
result.  This allows us to compile:

void *test12(long count) {
      return new int[count];
}

into:

test12:
	movl	$4, %ecx
	movq	%rdi, %rax
	mulq	%rcx
	movq	$-1, %rdi
	cmovnoq	%rax, %rdi
	jmp	__Znam                  ## TAILCALL

instead of:

test12:
	movl	$4, %ecx
	movq	%rdi, %rax
	mulq	%rcx
	seto	%cl
	testb	%cl, %cl
	movq	$-1, %rdi
	cmoveq	%rax, %rdi
	jmp	__Znam

Of course it would be even better if the regalloc inverted the cmov to 'cmovoq',
which would eliminate the need for the 'movq %rdi, %rax'.

llvm-svn: 120936
2010-12-05 07:49:54 +00:00
Chris Lattner
364bb0a081 it turns out that when ".with.overflow" intrinsics were added to the X86
backend that they were all implemented except umul.  This one fell back
to the default implementation that did a hi/lo multiply and compared the
top.  Fix this to check the overflow flag that the 'mul' instruction
sets, so we can avoid an explicit test.  Now we compile:

void *func(long count) {
      return new int[count];
}

into:

__Z4funcl:                              ## @_Z4funcl
	movl	$4, %ecx                ## encoding: [0xb9,0x04,0x00,0x00,0x00]
	movq	%rdi, %rax              ## encoding: [0x48,0x89,0xf8]
	mulq	%rcx                    ## encoding: [0x48,0xf7,0xe1]
	seto	%cl                     ## encoding: [0x0f,0x90,0xc1]
	testb	%cl, %cl                ## encoding: [0x84,0xc9]
	movq	$-1, %rdi               ## encoding: [0x48,0xc7,0xc7,0xff,0xff,0xff,0xff]
	cmoveq	%rax, %rdi              ## encoding: [0x48,0x0f,0x44,0xf8]
	jmp	__Znam                  ## TAILCALL

instead of:

__Z4funcl:                              ## @_Z4funcl
	movl	$4, %ecx                ## encoding: [0xb9,0x04,0x00,0x00,0x00]
	movq	%rdi, %rax              ## encoding: [0x48,0x89,0xf8]
	mulq	%rcx                    ## encoding: [0x48,0xf7,0xe1]
	testq	%rdx, %rdx              ## encoding: [0x48,0x85,0xd2]
	movq	$-1, %rdi               ## encoding: [0x48,0xc7,0xc7,0xff,0xff,0xff,0xff]
	cmoveq	%rax, %rdi              ## encoding: [0x48,0x0f,0x44,0xf8]
	jmp	__Znam                  ## TAILCALL

Other than the silly seto+test, this is using the o bit directly, so it's going in the right
direction.

llvm-svn: 120935
2010-12-05 07:30:36 +00:00
Chris Lattner
183ddd8ed3 fix the rest of the linux miscompares :)
llvm-svn: 120933
2010-12-05 02:08:07 +00:00
Chris Lattner
116580a11c generalize the previous check to handle -1 on either side of the
select, inserting a not to compensate.  Add a missing isZero check
that I lost somehow.

This improves codegen of:

void *func(long count) {
      return new int[count];
}

from:

__Z4funcl:                              ## @_Z4funcl
	movl	$4, %ecx                ## encoding: [0xb9,0x04,0x00,0x00,0x00]
	movq	%rdi, %rax              ## encoding: [0x48,0x89,0xf8]
	mulq	%rcx                    ## encoding: [0x48,0xf7,0xe1]
	testq	%rdx, %rdx              ## encoding: [0x48,0x85,0xd2]
	movq	$-1, %rdi               ## encoding: [0x48,0xc7,0xc7,0xff,0xff,0xff,0xff]
	cmoveq	%rax, %rdi              ## encoding: [0x48,0x0f,0x44,0xf8]
	jmp	__Znam                  ## TAILCALL
                                        ## encoding: [0xeb,A]

to:

__Z4funcl:                              ## @_Z4funcl
	movl	$4, %ecx                ## encoding: [0xb9,0x04,0x00,0x00,0x00]
	movq	%rdi, %rax              ## encoding: [0x48,0x89,0xf8]
	mulq	%rcx                    ## encoding: [0x48,0xf7,0xe1]
	cmpq	$1, %rdx                ## encoding: [0x48,0x83,0xfa,0x01]
	sbbq	%rdi, %rdi              ## encoding: [0x48,0x19,0xff]
	notq	%rdi                    ## encoding: [0x48,0xf7,0xd7]
	orq	%rax, %rdi              ## encoding: [0x48,0x09,0xc7]
	jmp	__Znam                  ## TAILCALL
                                        ## encoding: [0xeb,A]

llvm-svn: 120932
2010-12-05 02:00:51 +00:00
Chris Lattner
77a11c6174 relax this to handle linux defaulting to -static.
llvm-svn: 120930
2010-12-05 01:31:13 +00:00
Chris Lattner
342e6ea5f9 Improve an integer select optimization in two ways:
1. generalize 
    (select (x == 0), -1, 0) -> (sign_bit (x - 1))
to:
    (select (x == 0), -1, y) -> (sign_bit (x - 1)) | y

2. Handle the identical pattern that happens with !=:
   (select (x != 0), y, -1) -> (sign_bit (x - 1)) | y

cmov is often high latency and can't fold immediates or
memory operands.  For example for (x == 0) ? -1 : 1, before 
we got:

< 	testb	%sil, %sil
< 	movl	$-1, %ecx
< 	movl	$1, %eax
< 	cmovel	%ecx, %eax

now we get:

> 	cmpb	$1, %sil
> 	sbbl	%eax, %eax
> 	orl	$1, %eax

llvm-svn: 120929
2010-12-05 01:23:24 +00:00
Chris Lattner
0523388d60 merge some tests into select.ll and make them more specific.
llvm-svn: 120928
2010-12-05 01:13:58 +00:00