24929 Commits

Author SHA1 Message Date
David Blaikie
f55abeaf4c Revert "Revert "PR14606: debug info imported_module support""
This reverts commit r179840 with a fix to test/DebugInfo/two-cus-from-same-file.ll

I'm not sure why that test only failed on ARM & MIPS and not X86 Linux, even
though the debug info was clearly invalid on all of them, but this ought to fix
it.

llvm-svn: 179996
2013-04-22 06:12:31 +00:00
Jim Grosbach
563983c8a3 Legalize vector truncates by parts rather than just splitting.
Rather than just splitting the input type and hoping for the best, apply
a bit more cleverness. Just splitting the types until the source is
legal often leads to an illegal result time, which is then widened and a
scalarization step is introduced which leads to truly horrible code
generation. With the loop vectorizer, these sorts of operations are much
more common, and so it's worth extra effort to do them well.

Add a legalization hook for the operands of a TRUNCATE node, which will
be encountered after the result type has been legalized, but if the
operand type is still illegal. If simple splitting of both types
ends up with the result type of each half still being legal, just
do that (v16i16 -> v16i8 on ARM, for example). If, however, that would
result in an illegal result type (v8i32 -> v8i8 on ARM, for example),
we can get more clever with power-two vectors. Specifically,
split the input type, but also widen the result element size, then
concatenate the halves and truncate again.  For example on ARM,
To perform a "%res = v8i8 trunc v8i32 %in" we transform to:
  %inlo = v4i32 extract_subvector %in, 0
  %inhi = v4i32 extract_subvector %in, 4
  %lo16 = v4i16 trunc v4i32 %inlo
  %hi16 = v4i16 trunc v4i32 %inhi
  %in16 = v8i16 concat_vectors v4i16 %lo16, v4i16 %hi16
  %res = v8i8 trunc v8i16 %in16

This allows instruction selection to generate three VMOVN instructions
instead of a sequences of moves, stores and loads.

Update the ARMTargetTransformInfo to take this improved legalization
into account.

Consider the simplified IR:

define <16 x i8> @test1(<16 x i32>* %ap) {
  %a = load <16 x i32>* %ap
  %tmp = trunc <16 x i32> %a to <16 x i8>
  ret <16 x i8> %tmp
}

define <8 x i8> @test2(<8 x i32>* %ap) {
  %a = load <8 x i32>* %ap
  %tmp = trunc <8 x i32> %a to <8 x i8>
  ret <8 x i8> %tmp
}

Previously, we would generate the truly hideous:
	.syntax unified
	.section	__TEXT,__text,regular,pure_instructions
	.globl	_test1
	.align	2
_test1:                                 @ @test1
@ BB#0:
	push	{r7}
	mov	r7, sp
	sub	sp, sp, #20
	bic	sp, sp, #7
	add	r1, r0, #48
	add	r2, r0, #32
	vld1.64	{d24, d25}, [r0:128]
	vld1.64	{d16, d17}, [r1:128]
	vld1.64	{d18, d19}, [r2:128]
	add	r1, r0, #16
	vmovn.i32	d22, q8
	vld1.64	{d16, d17}, [r1:128]
	vmovn.i32	d20, q9
	vmovn.i32	d18, q12
	vmov.u16	r0, d22[3]
	strb	r0, [sp, #15]
	vmov.u16	r0, d22[2]
	strb	r0, [sp, #14]
	vmov.u16	r0, d22[1]
	strb	r0, [sp, #13]
	vmov.u16	r0, d22[0]
	vmovn.i32	d16, q8
	strb	r0, [sp, #12]
	vmov.u16	r0, d20[3]
	strb	r0, [sp, #11]
	vmov.u16	r0, d20[2]
	strb	r0, [sp, #10]
	vmov.u16	r0, d20[1]
	strb	r0, [sp, #9]
	vmov.u16	r0, d20[0]
	strb	r0, [sp, #8]
	vmov.u16	r0, d18[3]
	strb	r0, [sp, #3]
	vmov.u16	r0, d18[2]
	strb	r0, [sp, #2]
	vmov.u16	r0, d18[1]
	strb	r0, [sp, #1]
	vmov.u16	r0, d18[0]
	strb	r0, [sp]
	vmov.u16	r0, d16[3]
	strb	r0, [sp, #7]
	vmov.u16	r0, d16[2]
	strb	r0, [sp, #6]
	vmov.u16	r0, d16[1]
	strb	r0, [sp, #5]
	vmov.u16	r0, d16[0]
	strb	r0, [sp, #4]
	vldmia	sp, {d16, d17}
	vmov	r0, r1, d16
	vmov	r2, r3, d17
	mov	sp, r7
	pop	{r7}
	bx	lr

	.globl	_test2
	.align	2
_test2:                                 @ @test2
@ BB#0:
	push	{r7}
	mov	r7, sp
	sub	sp, sp, #12
	bic	sp, sp, #7
	vld1.64	{d16, d17}, [r0:128]
	add	r0, r0, #16
	vld1.64	{d20, d21}, [r0:128]
	vmovn.i32	d18, q8
	vmov.u16	r0, d18[3]
	vmovn.i32	d16, q10
	strb	r0, [sp, #3]
	vmov.u16	r0, d18[2]
	strb	r0, [sp, #2]
	vmov.u16	r0, d18[1]
	strb	r0, [sp, #1]
	vmov.u16	r0, d18[0]
	strb	r0, [sp]
	vmov.u16	r0, d16[3]
	strb	r0, [sp, #7]
	vmov.u16	r0, d16[2]
	strb	r0, [sp, #6]
	vmov.u16	r0, d16[1]
	strb	r0, [sp, #5]
	vmov.u16	r0, d16[0]
	strb	r0, [sp, #4]
	ldm	sp, {r0, r1}
	mov	sp, r7
	pop	{r7}
	bx	lr

Now, however, we generate the much more straightforward:
	.syntax unified
	.section	__TEXT,__text,regular,pure_instructions
	.globl	_test1
	.align	2
_test1:                                 @ @test1
@ BB#0:
	add	r1, r0, #48
	add	r2, r0, #32
	vld1.64	{d20, d21}, [r0:128]
	vld1.64	{d16, d17}, [r1:128]
	add	r1, r0, #16
	vld1.64	{d18, d19}, [r2:128]
	vld1.64	{d22, d23}, [r1:128]
	vmovn.i32	d17, q8
	vmovn.i32	d16, q9
	vmovn.i32	d18, q10
	vmovn.i32	d19, q11
	vmovn.i16	d17, q8
	vmovn.i16	d16, q9
	vmov	r0, r1, d16
	vmov	r2, r3, d17
	bx	lr

	.globl	_test2
	.align	2
_test2:                                 @ @test2
@ BB#0:
	vld1.64	{d16, d17}, [r0:128]
	add	r0, r0, #16
	vld1.64	{d18, d19}, [r0:128]
	vmovn.i32	d16, q8
	vmovn.i32	d17, q9
	vmovn.i16	d16, q8
	vmov	r0, r1, d16
	bx	lr

llvm-svn: 179989
2013-04-21 23:47:41 +00:00
Jim Grosbach
d4db72db61 Tidy up comment grammar.
llvm-svn: 179986
2013-04-21 21:23:01 +00:00
Tim Northover
16aba17024 Remove unused ShouldFoldAtomicFences flag.
I think it's almost impossible to fold atomic fences profitably under
LLVM/C++11 semantics. As a result, this is now unused and just
cluttering up the target interface.

llvm-svn: 179940
2013-04-20 12:32:43 +00:00
Tim Northover
a2b533906a Remove unused MEMBARRIER DAG node; it's been replaced by ATOMIC_FENCE.
llvm-svn: 179939
2013-04-20 12:32:17 +00:00
Stephen Lin
b8bd232a3d Add CodeGen support for functions that always return arguments via a new parameter attribute 'returned', which is taken advantage of in target-independent tail call opportunity detection and in ARM call lowering (when placed on an integral first parameter).
llvm-svn: 179925
2013-04-20 05:14:40 +00:00
Stephen Lin
ffc445492c Allow tail call opportunity detection through nested and/or multiple iterations of extractelement/insertelement indirection
llvm-svn: 179924
2013-04-20 04:27:51 +00:00
Eli Bendersky
e80691dc0a Simplify the code in FastISel::tryToFoldLoad, add an assertion and fix a comment.
llvm-svn: 179908
2013-04-19 23:26:18 +00:00
Eli Bendersky
90dd3e7dfd Move TryToFoldFastISelLoad to FastISel, where it belongs. In general, I'm
trying to move as much FastISel logic as possible out of the main path in
SelectionDAGISel - intermixing them just adds confusion.

llvm-svn: 179902
2013-04-19 22:29:18 +00:00
Michael Liao
b53d8963ce ArrayRefize getMachineNode(). No functionality change.
llvm-svn: 179901
2013-04-19 22:22:57 +00:00
Jakob Stoklund Olesen
e17c3fde6b Add an MRI::verifyUseLists() function.
This checks the sanity of the register use lists in the MI intermediate
representation.

llvm-svn: 179895
2013-04-19 21:40:57 +00:00
Eli Bendersky
dbeefaa86a Use dbgs() consistently for -debug printouts
llvm-svn: 179894
2013-04-19 21:37:07 +00:00
Eric Christopher
0e89ade8ff Revert "PR14606: debug info imported_module support"
This reverts commit r179836 as it seems to have caused test failures.

llvm-svn: 179840
2013-04-19 07:47:16 +00:00
David Blaikie
88564f3cf7 PR14606: debug info imported_module support
Adding another CU-wide list, in this case of imported_modules (since they
should be relatively rare, it seemed better to add a list where each element
had a "context" value, rather than add a (usually empty) list to every scope).
This takes care of DW_TAG_imported_module, but to fully address PR14606 we'll
need to expand this to cover DW_TAG_imported_declaration too.

llvm-svn: 179836
2013-04-19 06:57:04 +00:00
Eli Bendersky
6084f45f38 Add some more stats for fast isel vs. SelectionDAG, w.r.t lowering function
arguments in entry BBs.

llvm-svn: 179824
2013-04-19 01:04:40 +00:00
Peter Collingbourne
2f495b93ee Add support for subsections to the ELF assembler. Fixes PR8717.
Differential Revision: http://llvm-reviews.chandlerc.com/D598

llvm-svn: 179725
2013-04-17 21:18:16 +00:00
Andy Gibbs
b23ea72e48 Replace uses of the deprecated std::auto_ptr with OwningPtr.
This is a rework of the broken parts in r179373 which were subsequently reverted in r179374 due to incompatibility with C++98 compilers.  This version should be ok under C++98.

llvm-svn: 179520
2013-04-15 12:06:32 +00:00
Nadav Rotem
0db0690a70 Document the decision to assume that the cost of floats is twice as much as integers.
llvm-svn: 179478
2013-04-14 05:55:18 +00:00
Andrew Trick
1f0bb69b6c MI-Sched: DEBUG formatting.
llvm-svn: 179452
2013-04-13 06:07:49 +00:00
Andrew Trick
be2bccbce9 MI-Sched cleanup. If an instruction has no valid sched class, do not attempt to check for a variant.
llvm-svn: 179451
2013-04-13 06:07:45 +00:00
Andrew Trick
e833e1cd6e MI-Sched: schedule physreg copies.
The register allocator expects minimal physreg live ranges. Schedule
physreg copies accordingly. This is slightly tricky when they occur in
the middle of the scheduling region. For now, this is handled by
rescheduling the copy when its associated instruction is
scheduled. Eventually we may instead bundle them, but only if we can
preserve the bundles as parallel copies during regalloc.

llvm-svn: 179449
2013-04-13 06:07:40 +00:00
Nadav Rotem
87a0af6e1b CostModel: increase the default cost of supported floating point operations from 1 to two. Fixed a few tests that changes because now the cost of one insert + a vector operation on two doubles is lower than two scalar operations on doubles.
llvm-svn: 179413
2013-04-12 21:15:03 +00:00
Benjamin Kramer
dae0851237 Revert broken pieces of r179373.
You can't copy an OwningPtr, and move semantics aren't available in C++98.

llvm-svn: 179374
2013-04-12 12:13:51 +00:00
Andy Gibbs
95777550a9 Replace uses of the deprecated std::auto_ptr with OwningPtr.
llvm-svn: 179373
2013-04-12 10:56:28 +00:00
Nadav Rotem
c0adc9fd91 Don't disable block layout when forcing block alignment.
llvm-svn: 179355
2013-04-12 01:24:16 +00:00
Nadav Rotem
c3b0f50ac2 Add a flag to align all basic blocks in the function.
When debugging performance regressions we often ask ourselves if the regression
that we see is due to poor isel/sched/ra or due to some micro-architetural
problem.  When comparing two code sequences one good way to rule out front-end
bottlenecks (and other the issues) is to force code alignment. This pass adds
a flag that forces the alignment of all of the basic blocks in the program.

llvm-svn: 179353
2013-04-12 00:48:32 +00:00
Benjamin Kramer
e7c45bc670 Add braces around || in && to pacify GCC.
llvm-svn: 179275
2013-04-11 11:57:01 +00:00
Hal Finkel
95081bff72 Manually remove successors in if conversion when CopyAndPredicateBlock is used
In the simple and triangle if-conversion cases, when CopyAndPredicateBlock is
used because the to-be-predicated block has other predecessors, we need to
explicitly remove the old copied block from the successors list. Normally if
conversion relies on TII->AnalyzeBranch combined with BB->CorrectExtraCFGEdges
to cleanup the successors list, but if the predicated block contained an
un-analyzable branch (such as a now-predicated return), then this will fail.

These extra successors were causing a problem on PPC because it was causing
later passes (such as PPCEarlyReturm) to leave dead return-only basic blocks in
the code.

llvm-svn: 179227
2013-04-10 22:05:25 +00:00
Andrew Trick
e220323c7f Generalize the PassConfig API and remove addFinalizeRegAlloc().
The target hooks are getting out of hand. What does it mean to run
before or after regalloc anyway? Allowing either Pass* or AnalysisID
pass identification should make it much easier for targets to use the
substitutePass and insertPass APIs, and create less need for badly
named target hooks.

llvm-svn: 179140
2013-04-10 01:06:56 +00:00
Eric Christopher
52ce7189c1 The .dwo section shouldn't contain the unrelocated values (and
therefore not at all) of the pc or statement list. We also don't
need to emit the compilation dir so save so space and time
and don't bother.

Fix up the testcase accordingly and verify that we don't emit
the attributes or the items that they use.

llvm-svn: 179114
2013-04-09 19:23:15 +00:00
Benjamin Kramer
bbae991db6 DAGCombiner: Fold a shuffle on CONCAT_VECTORS into a new CONCAT_VECTORS if possible.
This pattern occurs in SROA output due to the way vector arguments are lowered
on ARM.

The testcase from PR15525 now compiles into this, which is better than the code
we got with the old scalarrepl:
_Store:
	ldr.w	r9, [sp]
	vmov	d17, r3, r9
	vmov	d16, r1, r2
	vst1.8	{d16, d17}, [r0]
	bx	lr

Differential Revision: http://llvm-reviews.chandlerc.com/D647

llvm-svn: 179106
2013-04-09 17:41:43 +00:00
Eric Christopher
55863befd1 DW_FORM_sec_offset should be a relocation on platforms that use
a relocation across sections. Do this for DW_AT_stmt list in the
skeleton CU and check the relocations in the debug_info section.

Add a FIXME for multiple CUs.

llvm-svn: 178969
2013-04-07 03:43:09 +00:00
Nadav Rotem
c4bd84c1d5 typo
llvm-svn: 178949
2013-04-06 04:24:12 +00:00
Manman Ren
5b22f9fe18 Dwarf: use utostr on CUID to append to SmallString.
We used to do "SmallString += CUID", which is incorrect, since CUID will
be truncated to a char.

rdar://problem/13573833

llvm-svn: 178941
2013-04-06 01:02:38 +00:00
Hal Finkel
3005c299b5 Reapply r178845 with fix - Fix bug in PEI's virtual-register scavenging
This fixes PEI as previously described, but correctly handles the case where
the instruction defining the virtual register to be scavenged is the first in
the block. Arnold provided me with a bugpoint-reduced test case, but even that
seems too large to use as a regression test. If I'm successful in cleaning it
up then I'll commit that as well.

Original commit message:

    This change fixes a bug that I introduced in r178058. After a register is
    scavenged using one of the available spills slots the instruction defining the
    virtual register needs to be moved to after the spill code. The scavenger has
    already processed the defining instruction so that registers killed by that
    instruction are available for definition in that same instruction. Unfortunately,
    after this, the scavenger needs to iterate through the spill code and then
    visit, again, the instruction that defines the now-scavenged register. In order
    to avoid confusion, the register scavenger needs the ability to 'back up'
    through the spill code so that it can again process the instructions in the
    appropriate order. Prior to this fix, once the scavenger reached the
    just-moved instruction, it would assert if it killed any registers because,
    having already processed the instruction, it believed they were undefined.

    Unfortunately, I don't yet have a small test case. Thanks to Pranav Bhandarkar
    for diagnosing the problem and testing this fix.

llvm-svn: 178919
2013-04-05 22:31:56 +00:00
Bill Wendling
eb108bad50 Use the target options specified on a function to reset the back-end.
During LTO, the target options on functions within the same Module may
change. This would necessitate resetting some of the back-end. Do this for X86,
because it's a Friday afternoon.

llvm-svn: 178917
2013-04-05 21:52:40 +00:00
Hal Finkel
81c46d0809 Revert r178845 - Fix bug in PEI's virtual-register scavenging
Reverting because this breaks one of the LTO builders. Original commit message:

    This change fixes a bug that I introduced in r178058. After a register is
    scavenged using one of the available spills slots the instruction defining the
    virtual register needs to be moved to after the spill code. The scavenger has
    already processed the defining instruction so that registers killed by that
    instruction are available for definition in that same instruction. Unfortunately,
    after this, the scavenger needs to iterate through the spill code and then
    visit, again, the instruction that defines the now-scavenged register. In order
    to avoid confusion, the register scavenger needs the ability to 'back up'
    through the spill code so that it can again process the instructions in the
    appropriate order. Prior to this fix, once the scavenger reached the
    just-moved instruction, it would assert if it killed any registers because,
    having already processed the instruction, it believed they were undefined.

    Unfortunately, I don't yet have a small test case. Thanks to Pranav Bhandarkar
    for diagnosing the problem and testing this fix.

llvm-svn: 178916
2013-04-05 21:30:40 +00:00
Hal Finkel
e6f48e4e2f Fix bug in PEI's virtual-register scavenging
This change fixes a bug that I introduced in r178058. After a register is
scavenged using one of the available spills slots the instruction defining the
virtual register needs to be moved to after the spill code. The scavenger has
already processed the defining instruction so that registers killed by that
instruction are available for definition in that same instruction. Unfortunately,
after this, the scavenger needs to iterate through the spill code and then
visit, again, the instruction that defines the now-scavenged register. In order
to avoid confusion, the register scavenger needs the ability to 'back up'
through the spill code so that it can again process the instructions in the
appropriate order. Prior to this fix, once the scavenger reached the
just-moved instruction, it would assert if it killed any registers because,
having already processed the instruction, it believed they were undefined.

Unfortunately, I don't yet have a small test case. Thanks to Pranav Bhandarkar
for diagnosing the problem and testing this fix.

llvm-svn: 178845
2013-04-05 05:01:13 +00:00
Andrew Trick
80e66ce0b4 RegisterPressure heuristics currently require signed comparisons.
llvm-svn: 178823
2013-04-05 00:31:34 +00:00
Andrew Trick
96ce3848d6 Disable DFSResult for ConvergingScheduler.
For now, just save the compile time since the ConvergingScheduler
heuristics don't use this analysis. We'll probably enable it later
after compile-time investigation.

llvm-svn: 178822
2013-04-05 00:31:31 +00:00
Andrew Trick
419d491747 MachineScheduler: format DEBUG output.
I'm getting more serious about tuning and enabling on x86/ARM. Start
by making the trace readable.

llvm-svn: 178821
2013-04-05 00:31:29 +00:00
Arnold Schwaighofer
b977387112 CostModel: Add parameter to instruction cost to further classify operand values
On certain architectures we can support efficient vectorized version of
instructions if the operand value is uniform (splat) or a constant scalar.
An example of this is a vector shift on x86.

We can efficiently support

for (i = 0 ; i < ; i += 4)
  w[0:3] = v[0:3] << <2, 2, 2, 2>

but not

for (i = 0; i < ; i += 4)
  w[0:3] = v[0:3] << x[0:3]

This patch adds a parameter to getArithmeticInstrCost to further qualify operand
values as uniform or uniform constant.

Targets can then choose to return a different cost for instructions with such
operand values.

A follow-up commit will test this feature on x86.

radar://13576547

llvm-svn: 178807
2013-04-04 23:26:21 +00:00
Manman Ren
bdcb4464e2 Debug Info: revert 178722 for now.
There is a difference for FORM_ref_addr between DWARF 2 and DWARF 3+.
Since Eric is against guarding DWARF 2 ref_addr with DarwinGDBCompat, we are
still in discussion on how to handle this.

The correct solution is to update our header to say version 4 instead of version
2 and update tool chains as well.

rdar://problem/13559431

llvm-svn: 178806
2013-04-04 23:13:11 +00:00
Adrian Prantl
322f41d095 typo
llvm-svn: 178804
2013-04-04 22:56:49 +00:00
Eli Bendersky
fc186358f2 Formatting
llvm-svn: 178771
2013-04-04 18:03:41 +00:00
Manman Ren
5a15c9ed9f Debug Info: according to DWARF 2, FORM_ref_addr the same size as an address on
the target system.

It was hard-coded to 4 bytes before. I can't get llvm to generate a
ref_addr on a reasonably sized testing case.

rdar://problem/13559431

llvm-svn: 178722
2013-04-04 00:22:54 +00:00
Bill Schmidt
92e26646bc Fix PR15632: No support for ppcf128 floating-point remainder on PowerPC.
For this we need to use a libcall.  Previously LLVM didn't implement
libcall support for frem, so I've added it in the usual
straightforward manner.  A test case from the bug report is included.

llvm-svn: 178639
2013-04-03 13:05:44 +00:00
Eric Christopher
14c2067ca1 Fix grammar.
llvm-svn: 178624
2013-04-03 05:29:58 +00:00
Eric Christopher
5590949f29 Remove ZeroOrMore from the option description. We don't need it here.
llvm-svn: 178623
2013-04-03 05:26:07 +00:00
Jakob Stoklund Olesen
aeb69a5481 Allow MachineTraceMetrics to be used when the model has no resources.
It it still possible to extract information from itineraries, for
example.

llvm-svn: 178582
2013-04-02 22:27:45 +00:00