25772 Commits

Author SHA1 Message Date
Jeremy Morse
66ac86b58d [DebugInfo][DAG] Process FrameIndex dbg.values unconditionally
A FrameIndex should be valid throughout a block regardless of what instructions
get selected in that block -- therefore we shouldn't harness dbg.values that
refer to FrameIndexes to an SDNode. There are numerous codegen reasons why
an SDNode never appears or doesn't become a location that a DBG_VALUE can
refer to. None of them actually affect the variable location.

Therefore, before any other tests to encode dbg_values in a SelectionDAG,
identify FrameIndex operands and encode them unattached to any SDNode.

Differential Revision: https://reviews.llvm.org/D57328

llvm-svn: 352467
2019-01-29 09:40:05 +00:00
Jonas Paulsson
5ed4d4638f [CodeGenPrepare] Handle all debug calls in dupRetToEnableTailCallOpts()
This patch makes sure that a debug value that is after the bitcast in
dupRetToEnableTailCallOpts() is also skipped.

The reduced test case is from SPEC-2006 on SystemZ.

Review: Vedant Kumar, Wolfgang Pieb
https://reviews.llvm.org/D57050

llvm-svn: 352462
2019-01-29 09:03:35 +00:00
Craig Topper
390ac61b93 Recommit r352255 "[SelectionDAG][X86] Don't use SEXTLOAD for promoting masked loads in the type legalizer"
This did not cause the buildbot failure it was previously reverted for.

Original commit message:

I'm not sure why we were using SEXTLOAD. EXTLOAD seems more appropriate since we don't care about the upper bits.

This patch changes this and then modifies the X86 post legalization combine to emit an extending shuffle instead of a sign_extend_vector_inreg. Could maybe use an any_extend_vector_inre

On AVX512 targets I think we might be able to use a masked vpmovzx and not have to expand this at all.

llvm-svn: 352433
2019-01-28 21:38:47 +00:00
Jessica Paquette
2d73ecd0a3 [GlobalISel][AArch64] Add legalization for G_FLOG
This adds support for legalizing G_FLOG into a RTLib call.

It adds a legalizer test, and updates the existing floating point tests.

https://reviews.llvm.org/D57347

llvm-svn: 352429
2019-01-28 21:27:23 +00:00
Jessica Paquette
c49428a97d [GlobalISel][AArch64] Add instruction selection support for @llvm.log10
This adds instruction selection support for @llvm.log10 in AArch64. It teaches
GISel to lower it to a library call, updates the relevant tests, and adds a
legalizer test for log10.

https://reviews.llvm.org/D57341

llvm-svn: 352418
2019-01-28 19:53:14 +00:00
Jessica Paquette
2e35dc5185 [GlobalISel] Add ISel support for @llvm.lifetime.start and @llvm.lifetime.end
This adds ISel support for lifetime markers in opt levels above O0.

It also updates the arm64-irtranslator test, and updates some AArch64 tests that
use them for added coverage.

It also adds a testcase taken from the X86 codegen tests which verified a bug
caused by lifetime markers + stack colouring in the past. This is intended to
make sure that GISel doesn't re-introduce the bug.

(This is basically a straight copy from what SelectionDAG does in
SelectionDAGBuilder.cpp)

https://reviews.llvm.org/D57187

llvm-svn: 352410
2019-01-28 19:22:29 +00:00
Nikita Popov
8e1a464e6a [CodeGen][X86] Expand UADDSAT to NOT+UMIN+ADD
Followup to D56636, this time handling the UADDSAT case by expanding
uadd.sat(a, b) to umin(a, ~b) + b.
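
The identity behind this expansion can be checked exhaustively at a small bit width. The following is a minimal Python sketch, not the LLVM lowering itself; the 8-bit width and the helper names `uadd_sat_reference` and `uadd_sat_expanded` are assumptions chosen for illustration.

```python
BITS = 8                       # assumed width, small enough to test exhaustively
MASK = (1 << BITS) - 1


def uadd_sat_reference(a: int, b: int) -> int:
    """Unsigned saturating add: clamp the sum to the maximum value."""
    return min(a + b, MASK)


def uadd_sat_expanded(a: int, b: int) -> int:
    """NOT + UMIN + ADD form: umin(a, ~b) + b."""
    not_b = ~b & MASK          # bitwise NOT within BITS bits, equal to MASK - b
    return (min(a, not_b) + b) & MASK


def check_all() -> bool:
    """Compare both forms for every pair of BITS-wide inputs."""
    return all(
        uadd_sat_reference(a, b) == uadd_sat_expanded(a, b)
        for a in range(MASK + 1)
        for b in range(MASK + 1)
    )
```

The expansion works because `~b` is the largest value that can be added to `b` without wrapping, so clamping `a` to it first makes the final add overflow-free.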

Differential Revision: https://reviews.llvm.org/D56869

llvm-svn: 352409
2019-01-28 19:19:09 +00:00
Jessica Paquette
7db82d7257 [GlobalISel][AArch64] Add instruction selection support for G_FCOS and G_FSIN
This contains all of the legalizer changes from D57197 necessary to select
G_FCOS and G_FSIN. It also updates several existing IR tests in
test/CodeGen/AArch64 that verify that we correctly lower the G_FCOS and G_FSIN
instructions.

https://reviews.llvm.org/D57197
3/3

llvm-svn: 352402
2019-01-28 18:34:18 +00:00
Jessica Paquette
296f19b3d9 [GlobalISel][AArch64] Add IRTranslator support for G_FCOS and G_FSIN
This adds IRTranslator support for the G_FCOS and G_FSIN generic instructions.

https://reviews.llvm.org/D57197
2/3

llvm-svn: 352401
2019-01-28 18:34:17 +00:00
Michael Berg
685d5f675e [NFC] TLI query with default(on) behavior wrt DAG combines for fmin/fmax target control
llvm-svn: 352396
2019-01-28 18:03:08 +00:00
Petar Avramovic
7cecadb9af [MIPS GlobalISel] Select sub
Lower G_USUBO and G_USUBE. Add narrowScalar for G_SUB.
Legalize and select G_SUB for MIPS 32.
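
As a rough illustration of how a wide subtraction can be narrowed with borrow-producing and borrow-consuming operations, here is a Python sketch. It assumes a 64-bit value split into two 32-bit halves; the function names are hypothetical models of G_USUBO and G_USUBE, not code from the patch.

```python
HALF_BITS = 32                 # assumed narrow width
HALF_MASK = (1 << HALF_BITS) - 1


def usubo(a, b):
    """Unsigned subtract with borrow-out, modelled on G_USUBO."""
    return (a - b) & HALF_MASK, int(a < b)


def usube(a, b, borrow_in):
    """Unsigned subtract with borrow-in and borrow-out, modelled on G_USUBE."""
    diff = a - b - borrow_in
    return diff & HALF_MASK, int(diff < 0)


def narrow_sub64(x, y):
    """Compute (x - y) mod 2**64 using only 32-bit pieces."""
    x_lo, x_hi = x & HALF_MASK, x >> HALF_BITS
    y_lo, y_hi = y & HALF_MASK, y >> HALF_BITS
    lo, borrow = usubo(x_lo, y_lo)        # low half produces the borrow
    hi, _ = usube(x_hi, y_hi, borrow)     # high half consumes it
    return (hi << HALF_BITS) | lo
```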

Differential Revision: https://reviews.llvm.org/D53416

llvm-svn: 352351
2019-01-28 12:10:17 +00:00
Jeremy Morse
8ebffb4b82 [DebugInfo][DAG] Avoid re-ordering of DBG_VALUEs
This patch improves the placement of DBG_VALUEs by SelectionDAG, which
as documented in PR40427 can go very wrong. At the core of this is
ProcessSourceNode, which assumes the last instruction in a BB is the start
of the last processed IR instruction, which isn't always true.

Instead, use a helper function to call InstrEmitter::EmitNode, that records
before-and-after iterators and determines the first of any new instruction
created during emission. This is passed to ProcessSourceNode, which can
then make more enlightened decisions about ordering for DBG_VALUE placement.
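
The before-and-after bookkeeping can be pictured with a simplified model. This Python sketch uses list indices in place of iterators; `emit_and_find_first_new`, `emit_two` and `emit_nothing` are hypothetical names, and the real helper wraps InstrEmitter::EmitNode rather than a callback.

```python
def emit_and_find_first_new(block, insert_pos, emit):
    """Run `emit`, then return the first instruction it created, or None.

    `block` is a list of instruction names and `emit` inserts zero or more
    instructions before index `insert_pos`.
    """
    size_before = len(block)
    emit(block, insert_pos)
    added = len(block) - size_before
    if added == 0:
        return None            # emission produced no instructions
    return block[insert_pos]   # new instructions start at the old position


def emit_two(block, pos):
    block[pos:pos] = ["ADD", "MUL"]


def emit_nothing(block, pos):
    pass
```

The `None` case is the one that matters: when a node emits nothing, the last instruction in the block belongs to an earlier node, so it must not be treated as the start of the current one.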

Differential revision: https://reviews.llvm.org/D57163

llvm-svn: 352350
2019-01-28 12:08:31 +00:00
Matt Arsenault
cfca2a7adf GlobalISel: Don't reduce elements for atomic load/store
This is invalid for the same reason as in the narrowScalar handling
for load.

llvm-svn: 352334
2019-01-27 22:36:24 +00:00
Matt Arsenault
816c9b3e25 GlobalISel: Factor fewerElementVectors into separate functions
llvm-svn: 352332
2019-01-27 21:53:09 +00:00
Matt Arsenault
fdfb7d78f1 GlobalISel: Verify load/store has a pointer input
I expected this to be automatically verified, but it seems
nothing uses the fact that the type index was declared as a "ptype".

llvm-svn: 352319
2019-01-27 15:57:23 +00:00
Amara Emerson
711bbdc894 Re-apply "r351584: "GlobalISel: Verify g_zextload and g_sextload""
I reverted it originally due to a bot failing. The underlying bug has been fixed
as of r352311.

llvm-svn: 352312
2019-01-27 11:34:41 +00:00
Amara Emerson
bf43004ff1 [AArch64][GlobalISel] Fix the G_EXTLOAD combiner creating non-extending illegal instructions.
This fixes loads like 's1 = load %p (load 1 from %p)' being combined with an
extend into an illegal 's8 = g_extload %p (load 1 from %p)' which doesn't do any
extension, by avoiding touching those < s8 size loads.

This bug was uncovered by a verifier update r351584, which I reverted to keep
the bots green.

llvm-svn: 352311
2019-01-27 10:56:20 +00:00
Matt Arsenault
590c67507a GlobalISel: Fix typo in assert messages
llvm-svn: 352301
2019-01-27 00:53:54 +00:00
Matt Arsenault
211e89d4dd GlobalISel: Implement narrowScalar for mul
llvm-svn: 352300
2019-01-27 00:52:51 +00:00
Matt Arsenault
2e5f900849 GlobalISel: fewerElementsVector for intrinsic_trunc/intrinsic_round
llvm-svn: 352298
2019-01-27 00:12:21 +00:00
Amara Emerson
203760ab9c [GlobalISel][IRTranslator] Fix crash on translation of fneg.
When the fneg IR instruction was added the code to do translation wasn't
tested, and tried to get an invalid operand.

llvm-svn: 352296
2019-01-26 23:47:09 +00:00
Matt Arsenault
26a6c74fbe AMDGPU/GlobalISel: Legalize more bit ops
llvm-svn: 352295
2019-01-26 23:47:07 +00:00
Craig Topper
58e6b37e62 Revert r352255 "[SelectionDAG][X86] Don't use SEXTLOAD for promoting masked loads in the type legalizer"
This might be breaking an lldb windows buildbot.

llvm-svn: 352268
2019-01-26 02:44:58 +00:00
Craig Topper
b1d3457c03 [SelectionDAG][X86] Don't use SEXTLOAD for promoting masked loads in the type legalizer
Summary:
I'm not sure why we were using SEXTLOAD. EXTLOAD seems more appropriate since we don't care about the upper bits.

This patch changes this and then modifies the X86 post legalization combine to emit an extending shuffle instead of a sign_extend_vector_inreg. Could maybe use an any_extend_vector_inreg, but I just did what we already do in LowerLoad. I think we can actually get rid of this code entirely if we switch to -x86-experimental-vector-widening-legalization.

On AVX512 targets I think we might be able to use a masked vpmovzx and not have to expand this at all.

Reviewers: RKSimon, spatel

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D57186

llvm-svn: 352255
2019-01-26 00:26:37 +00:00
Alexey Lapshin
31f47b8194 [NFC] Test commit : fix typo.
llvm-svn: 352248
2019-01-25 21:59:53 +00:00
Guozhi Wei
81f3fd4bf8 [MBP] Don't move bottom block before header if it can't reduce taken branches
If the bottom block BB has only one successor, OldTop, it is in most cases profitable to move it before OldTop, except in the following case:

-->OldTop<-
|    .    |
|    .    |
|    .    |
---Pred   |
     |    |
    BB-----

Moving BB before OldTop can't reduce the number of taken branches; this patch detects this case and prevents the move.

Differential Revision: https://reviews.llvm.org/D57067

llvm-svn: 352236
2019-01-25 19:45:13 +00:00
Vedant Kumar
13ef84fced [MC] Teach the MachO object writer about N_FUNC_COLD
N_FUNC_COLD is a new MachO symbol attribute. It's a hint to the linker
to order a symbol towards the end of its section, to improve locality.

Example:

```
void a1() {}
__attribute__((cold)) void a2() {}
void a3() {}
int main() {
  a1();
  a2();
  a3();
  return 0;
}
```

A linker that supports N_FUNC_COLD will order _a2 to the end of the text
section. From `nm -njU` output, we see:

```
_a1
_a3
_main
_a2
```

Differential Revision: https://reviews.llvm.org/D57190

llvm-svn: 352227
2019-01-25 18:30:22 +00:00
Tom Weaver
4db70d9695 [TEST][COMMIT] - fix comment typo in AsmPrinter/DwarfDebug.cpp - NFC
llvm-svn: 352214
2019-01-25 16:29:35 +00:00
Simon Pilgrim
cdf58092e4 Fix gcc -Wparentheses warning. NFCI.
llvm-svn: 352191
2019-01-25 11:34:58 +00:00
Matt Arsenault
3e08b772b3 AMDGPU/GlobalISel: Scalarize add/sub
llvm-svn: 352167
2019-01-25 04:53:57 +00:00
Matt Arsenault
e6cebd0d69 GlobalISel: fewerElementsVector for more cast types
llvm-svn: 352166
2019-01-25 04:37:33 +00:00
Matt Arsenault
95fd95cfe0 GlobalISel: fewerElementsVector for a few more trivial ops
llvm-svn: 352165
2019-01-25 04:03:38 +00:00
Matt Arsenault
5d622fbcc1 AMDGPU/GlobalISel: Legalize smulh/umulh and scalarize mul
llvm-svn: 352162
2019-01-25 03:23:04 +00:00
Matt Arsenault
1b1e685f10 GlobalISel: Support fewerElementsVector for icmp/fcmp
Also legalize 64-bit compares for AMDGPU

llvm-svn: 352157
2019-01-25 02:59:34 +00:00
Matt Arsenault
ca676343a9 GlobalISel: Implement fewerElementsVector for extensions
llvm-svn: 352155
2019-01-25 02:36:32 +00:00
Matt Arsenault
990f507704 GlobalISel: Add convenience mutations to scalarize
llvm-svn: 352143
2019-01-25 00:51:00 +00:00
Matt Arsenault
6bab7ab11e RegBankSelect: Fix use after free in r352123
llvm-svn: 352130
2019-01-24 23:42:01 +00:00
Aditya Nandakumar
3ba0d94bce [GISel]: Change how CSE is enabled by default for each pass
https://reviews.llvm.org/D57178

This adds a hook in TargetPassConfig to query whether CSE should be
enabled. By default the hook returns false only at the O0 opt level, but
a target can override it.
Because CSE is now enabled by default above O0, a few tests needed
their run lines updated to pass -O0 so that they do not use CSE.

reviewed by: arsenm

llvm-svn: 352126
2019-01-24 23:11:25 +00:00
Matt Arsenault
baa5d2e69c RegBankSelect: Support some more complex part mappings
llvm-svn: 352123
2019-01-24 22:47:04 +00:00
Jessica Paquette
245047dfe8 [GlobalISel][AArch64] Add isel support for FP16 vector @llvm.ceil
This patch adds support for vector @llvm.ceil intrinsics when full 16 bit
floating point support isn't available.

To do this, this patch...

- Implements basic isel for G_UNMERGE_VALUES
- Teaches the legalizer about 16 bit floats
- Teaches AArch64RegisterBankInfo to respect floating point registers on
  G_BUILD_VECTOR and G_UNMERGE_VALUES
- Teaches selectCopy about 16-bit floating point vectors

It also adds

- A legalizer test for the 16-bit vector ceil which verifies that we create a
  G_UNMERGE_VALUES and G_BUILD_VECTOR when full fp16 isn't supported
- An instruction selection test which makes sure we lower to G_FCEIL when
  full fp16 is supported
- A test for selecting G_UNMERGE_VALUES

And also updates arm64-vfloatintrinsics.ll to show that the new ceiling types
work as expected.

https://reviews.llvm.org/D56682

llvm-svn: 352113
2019-01-24 22:00:41 +00:00
James Y Knight
2c36240a82 Fix emission of _fltused for MSVC.
It should be emitted when any floating-point operations (including
calls) are present in the object, not just when calls to printf/scanf
with floating point args are made.

The difference caused by this is very subtle: in static (/MT) builds,
on x86-32, in a program that uses floating point but doesn't print it,
the default x87 rounding mode may not be set properly upon
initialization.

This commit also removes the walk of the types pointed to by pointer
arguments in calls. (To assist in opaque pointer types migration --
eventually the pointee type won't be available.)

The latter implies that it will no longer consider a call like
`scanf("%f", &floatvar)` as sufficient to emit _fltused on its
own. And without _fltused, `scanf("%f")` will abort with error R6002. This
new behavior is unlikely to bite anyone in practice (you'd have to
read a float, and do nothing with it!), and also, is consistent with
MSVC.

Differential Revision: https://reviews.llvm.org/D56548

llvm-svn: 352076
2019-01-24 18:34:00 +00:00
Nirav Dave
58e9833e98 [SelectionDAGBuilder] Simplify HasSideEffect calculation. NFC.
llvm-svn: 352067
2019-01-24 17:56:03 +00:00
Nirav Dave
b41a198472 [InlineAsm] Don't calculate registers for inline asm memory operands. NFCI.
llvm-svn: 352066
2019-01-24 17:47:18 +00:00
Simon Pilgrim
2f018de6a3 [TargetLowering] Rename getExpandedFixedPointMultiplication to expandFixedPointMul. NFCI.
Match the (much shorter) name used in various legalization methods.

llvm-svn: 352056
2019-01-24 15:46:54 +00:00
Nirav Dave
bd069f424f [SelectionDAGBuilder] Fuse inline asm input operand loops passes. NFCI.
llvm-svn: 352053
2019-01-24 15:15:32 +00:00
Craig Topper
1e718429c1 [X86] Update SelectionDAGDumper to print the extension type and expanding flag for masked loads. Add truncating and compressing for masked stores.
llvm-svn: 352029
2019-01-24 07:51:34 +00:00
David Blaikie
7b585673d1 DebugInfo: Use assembly label arithmetic for address pool size for easier reading/editing
Recommits 350048 and 350050, which broke buildbots because of some typos in
the test case.

llvm-svn: 352019
2019-01-24 03:27:57 +00:00
Reid Kleckner
e80799e6af [ADT] Notify ilist traits about in-list transfers
Summary:
Previously no client of ilist traits has needed to know about transfers
of nodes within the same list, so as an optimization, ilist doesn't call
transferNodesFromList in that case. However, now there are clients that
want to use ilist traits to cache instruction ordering information to
optimize dominance queries of instructions in the same basic block.
This change updates the existing ilist traits users to detect in-list
transfers and do nothing in that case.

After this change, we can start caching instruction ordering information
in LLVM IR data structures. There are two main ways to do that:
- by putting an order integer into the Instruction class
- by maintaining order integers in a hash table on BasicBlock

I plan to implement and measure both, but I wanted to commit this change
first to enable other out of tree ilist clients to implement this
optimization as well.
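
To make the idea concrete, here is a toy Python model of a list whose transfer hook fires even for moves within the same list. It is a sketch only: `OrderCachingList`, `transfer_hook` and `splice` are hypothetical names, not the ilist API, and the single hook combines two behaviours described above (ignoring ownership updates for in-list transfers, and invalidating cached ordering).

```python
class OrderCachingList:
    """Toy list whose transfer hook is called even for in-list moves."""

    def __init__(self, items=()):
        self.items = list(items)
        self.order_valid = False   # whether cached ordering can be trusted
        self.reparented = 0        # transfers that arrived from another list

    def transfer_hook(self, source):
        # Any transfer can change positions, so drop the cached ordering.
        self.order_valid = False
        if source is self:
            return                 # in-list transfer: no ownership update
        self.reparented += 1

    def splice(self, pos, source, first, last):
        """Move source.items[first:last] so it sits before index `pos`."""
        moved = source.items[first:last]
        del source.items[first:last]
        if source is self and pos > first:
            pos -= len(moved)      # account for the range just removed
        self.items[pos:pos] = moved
        self.transfer_hook(source)
```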

Reviewers: lattner, hfinkel, chandlerc

Subscribers: hiraditya, dexonsmith, llvm-commits

Differential Revision: https://reviews.llvm.org/D57120

llvm-svn: 351992
2019-01-23 22:59:52 +00:00
Andrea Di Biagio
d768d35515 [MC][X86] Correctly model additional operand latency caused by transfer delays from the integer to the floating point unit.
This patch adds a new ReadAdvance definition named ReadInt2Fpu.
ReadInt2Fpu allows x86 scheduling models to accurately describe delays caused by
data transfers from the integer unit to the floating point unit.
ReadInt2Fpu currently defaults to a delay of zero cycles (i.e. no delay) for all
x86 models excluding BtVer2. That means this patch is a functional change
for the Jaguar cpu model only.

Tablegen definitions for instructions (V)PINSR* have been updated to account for
the new ReadInt2Fpu. That read is mapped to the GPR input operand.
On Jaguar, int-to-fpu transfers are modeled as a +6cy delay. Before this patch,
that extra delay was added to the opcode latency. In practice, the insert opcode
only executes for 1cy. Most of the actual latency is contributed by the
so-called operand-latency. According to the AMD SOG for family 16h, (V)PINSR*
latency is defined by expression f+1, where f is defined as a forwarding delay
from the integer unit to the fpu.

When printing instruction latency from MCA (see InstructionInfoView.cpp) and LLC
(only when flag -print-schedule is specified), we now need to account for any
extra forwarding delays. We do this by checking if scheduling classes declare
any negative ReadAdvance entries. Quoting a code comment in TargetSchedule.td:
"A negative advance effectively increases latency, which may be used for
cross-domain stalls". When computing the instruction latency for the purpose of
our scheduling tests, we now add any extra delay to the formula. This avoids
regressing existing codegen and mca schedule tests. It comes with the cost of an
extra (but very simple) hook in MCSchedModel.
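
The latency arithmetic described above can be written out as a small Python sketch. The function name `effective_latency` is hypothetical, and taking the largest delay when several negative entries exist is an assumption of this sketch rather than something stated in the message.

```python
def effective_latency(opcode_latency, read_advances):
    """Opcode latency plus the delay implied by negative ReadAdvance entries.

    A negative advance of -N cycles is treated as N extra cycles of latency;
    zero or positive advances add nothing here.
    """
    extra = max((-adv for adv in read_advances if adv < 0), default=0)
    return opcode_latency + extra
```

With the Jaguar numbers quoted above (a 1cy insert opcode and a 6cy int-to-fpu forwarding delay), `effective_latency(1, [-6])` gives 7, matching the f+1 expression with f = 6. On models where ReadInt2Fpu is zero, the reported latency is unchanged.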

Differential Revision: https://reviews.llvm.org/D57056

llvm-svn: 351965
2019-01-23 16:35:07 +00:00
Sam Parker
9a2a89d58f [DAGCombine] Enable more pre-indexed stores
The current check in CombineToPreIndexedLoadStore is too
conservative, preventing a pre-indexed store when the base pointer
is a predecessor of the value being stored. Instead, we should check
the pointer operand of the store.

Differential Revision: https://reviews.llvm.org/D56719

llvm-svn: 351933
2019-01-23 09:11:49 +00:00