27609 Commits

Author SHA1 Message Date
Matt Arsenault
ca9583a70a AMDGPU/GlobalISel: Fix broken tests
llvm-svn: 353559
2019-02-08 19:59:39 +00:00
Nemanja Ivanovic
92a8c36735 [DAGCombine] Optimize pow(X, 0.75) to sqrt(X) * sqrt(sqrt(X))
The sqrt case is faster and we already do this for the case where
the exponent is 0.25. This adds the 0.75 case which is also not
sensitive to signed zeros.

Patch by Whitney Tsang (Whitney)

Differential revision: https://reviews.llvm.org/D57434

llvm-svn: 353557
2019-02-08 19:50:58 +00:00
Aditya Nandakumar
01e818a97d [GISel][NFC]: Add missing call to record CSE hits in the CSEMIRBuilder
https://reviews.llvm.org/D57932

Add some logging + tests to make sure CSEInfo prints debug output.

reviewed by: arsenm

llvm-svn: 353553
2019-02-08 19:41:13 +00:00
Matt Arsenault
d7047276ec AMDGPU: Remove GCN features and predicates
These are no longer necessary since the R600 tablegen files are split
out now.

llvm-svn: 353548
2019-02-08 19:18:01 +00:00
Simon Pilgrim
eb6a47a462 [TargetLowering] Use ISD::FSHR in expandFixedPointMul
Replace OR(SHL,SRL) pattern with ISD::FSHR (legalization expands this later if necessary) - this helps with the scale == 0 'undefined' drop-through case that was discussed on D55720.

llvm-svn: 353546
2019-02-08 18:57:38 +00:00
Simon Pilgrim
478bb90779 [TargetLowering] Add SimplifyDemandedBits funnel shift support
llvm-svn: 353539
2019-02-08 17:19:01 +00:00
Simon Pilgrim
68457c1e52 [X86] Add basic funnel shift demanded bits tests
llvm-svn: 353534
2019-02-08 16:51:16 +00:00
Carl Ritson
494b8ac95a [AMDGPU] Fix CS scratch setup on pre-GCN3 ASICs
Summary:
Prior to GCN3 s_load_dword offsets are in dwords rather than bytes.
Thus the scratch buffer descriptor offset must be adjusted for pre-GCN3 ASICs.

Reviewers: nhaehnle, tpr

Reviewed By: nhaehnle

Subscribers: sheredom, arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, t-tye, jfb, llvm-commits

Differential Revision: https://reviews.llvm.org/D56496

llvm-svn: 353530
2019-02-08 15:41:11 +00:00
Matt Arsenault
b0a227049f AMDGPU/GlobalISel: Fix shift legalization for non-power-of-2
clampScalar doesn't do anything for non-power-of-2 in range.
There should probably be a combination rule to reduce the number
of matching rules.

llvm-svn: 353526
2019-02-08 15:06:24 +00:00
Matt Arsenault
0f2debb1c2 AMDGPU/GlobalISel: Fix non-power-of-2 implicit_def
llvm-svn: 353522
2019-02-08 14:46:27 +00:00
Petar Avramovic
c98b26d326 [MIPS GlobalISel] Select any extending load and truncating store
Make behavior of G_LOAD in widenScalar same as for G_ZEXTLOAD and
G_SEXTLOAD. That is perform widenScalarDst to size given by the target
and avoid additional checks in common code. Targets can reorder or add
additional rules in LegalizeRuleSet for the opcode to achieve desired
behavior.

Select extending load that does not have specified type of extension
into zero extending load.

Select truncating store that stores number of bytes indicated by size
in MachineMemoperand.

Differential Revision: https://reviews.llvm.org/D57454

llvm-svn: 353520
2019-02-08 14:27:23 +00:00
Matt Arsenault
dc88a2ce35 AMDGPU/GlobalISel: Don't use a copy in addrspacecast lowering
llvm-svn: 353516
2019-02-08 14:16:11 +00:00
Valery Pykhtin
7fe97f8c7c [AMDGPU] Fix DPP combiner
Differential revision: https://reviews.llvm.org/D55444

dpp move with uses and old reg initializer should be in the same BB.
bound_ctrl:0 is only considered when bank_mask and row_mask are fully enabled (0xF). Otherwise the old register value is checked for identity.
Added add, subrev, and, or instructions to the old folding function.
Kill flag is cleared for the src0 (DPP register) as it may be copied into more than one user.

The pass is still disabled by default.

llvm-svn: 353513
2019-02-08 11:59:48 +00:00
Carlos Alberto Enciso
08dc50f2fb [DWARF] LLVM ERROR: Broken function found, while removing Debug Intrinsics.
Check that when SimplifyCFG is flattening a 'br', all their debug intrinsic instructions are removed, including any dbg.label referencing a label associated with the basic blocks being removed.

Differential Revision: https://reviews.llvm.org/D57444

llvm-svn: 353511
2019-02-08 10:57:26 +00:00
Petar Avramovic
56dc218dc1 [MIPS GlobalISel] Select mul
Legalize and select G_MUL for s32 and smaller types for MIPS32.

Differential Revision: https://reviews.llvm.org/D57816

llvm-svn: 353506
2019-02-08 10:11:33 +00:00
Matt Arsenault
a8b4339c2f AMDGPU/GlobalISel: Legalize addrspacecast
Use a placeholder constant for now on targets
that need the load from the queue ptr.

llvm-svn: 353497
2019-02-08 02:40:47 +00:00
Craig Topper
738180cc7f Fix the lowering issue of intrinsics llvm.localaddress on X86
Patch by Yuanke Luo

Reviewers: craig.topper, annita.zhang, smaslov, rnk, wxiao3

Reviewed By: rnk

Subscribers: efriedma, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D57501

llvm-svn: 353492
2019-02-08 01:14:12 +00:00
Craig Topper
c782f18835 [X86] Add FPCW as a register and start using it as an implicit use on floating point instructions.
Summary:
FPCW contains the rounding mode control which we manipulate to implement fp to integer conversion by changing the roudning mode, storing the value to the stack, and then changing the rounding mode back. Because we didn't model FPCW and its dependency chain, other instructions could be scheduled into the middle of the sequence.

This patch introduces the register and adds it as an implciit def of FLDCW and implicit use of the FP binary arithmetic instructions and store instructions. There are more instructions that need to be updated, but this is a good start. I believe this fixes at least the reduced test case from PR40529.

Reviewers: RKSimon, spatel, rnk, efriedma, andrew.w.kaylor

Subscribers: dim, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D57735

llvm-svn: 353489
2019-02-08 00:44:39 +00:00
Eli Friedman
29c0609301 [AArch64] Fix condition for "high-vector" DUP optimizations.
AArch64 NEON has a bunch of instructions with a "2" suffix that extract
the top half of the source vectors, instead of the bottom half.  We have
some DAGCombines to try to take advantage of that.  However, they
assumed that any EXTRACT_VECTOR was extracting the high half of the
vector in question.

This issue has apparently existed since the AArch64 backend was merged.

Fixes https://bugs.llvm.org/show_bug.cgi?id=40632 .

Differential Revision: https://reviews.llvm.org/D57862

llvm-svn: 353486
2019-02-08 00:23:35 +00:00
Petar Jovanovic
3cfcd75453 [mips][micromips] Fix how values in .gcc_except_table are calculated
When a landing pad is calculated in a program that is compiled for micromips
with -fPIC flag, it will point to an even address.
Such an error will cause a segmentation fault, as the instructions in
micromips are aligned on odd addresses. This patch sets the last bit of the
offset where a landing pad is, to 1, which will effectively be an odd
address and point to the instruction exactly.

r344591 fixed this issue for -static compilation.

Patch by Aleksandar Beserminji.

Differential Revision: https://reviews.llvm.org/D57677

llvm-svn: 353480
2019-02-07 22:57:33 +00:00
Nikita Popov
9d7e86a978 [CodeGen] Handle vector UADDO, SADDO, USUBO, SSUBO
This is part of https://bugs.llvm.org/show_bug.cgi?id=40442.

Vector legalization is implemented for the add/sub overflow opcodes.
UMULO/SMULO are also handled as far as legalization is concerned, but
they don't support vector expansion yet (so no tests for them).

The vector result widening implementation is suboptimal, because it
could result in a legalization loop.

Differential Revision: https://reviews.llvm.org/D57639

llvm-svn: 353464
2019-02-07 21:02:22 +00:00
Matt Arsenault
fbec8fe93b GlobalISel: Implement narrowScalar for shift main type
This is pretty much directly ported from SelectionDAG. Doesn't include
the shift by non-constant but known bits version, since there isn't a
globalisel version of computeKnownBits yet.

This shows a disadvantage of targets not specifically which type
should be used for the shift amount. If type 0 is legalized before
type 1, the operations on the shift amount type use the wider type
(which are also less likely to legalize). This can be avoided by
targets specifying legalization actions on type 1 earlier than for
type 0.

llvm-svn: 353455
2019-02-07 19:37:44 +00:00
Matt Arsenault
d914189a2e AMDGPU/GlobalISel: Restrict g_implicit_def legality
llvm-svn: 353452
2019-02-07 19:10:15 +00:00
Matt Arsenault
d6212f9f1b GlobalISel: Fix artifact combiner constant legality checks for vectors
Since G_CONSTANT is illegal for vectors, this needs to check
what buildConstant will produce for a splat vector.

llvm-svn: 353449
2019-02-07 18:58:28 +00:00
Matt Arsenault
60b33fb6fc AMDGPU/GlobalISel: Don't use g_implicit_def in a few tests
llvm-svn: 353443
2019-02-07 18:33:22 +00:00
Matt Arsenault
c0f7569aab AMDGPU/GlobalISel: Legalize fsqrt
llvm-svn: 353438
2019-02-07 18:14:39 +00:00
Matt Arsenault
93fdec739b AMDGPU/GlobalISel: Legalize some f16 operations
llvm-svn: 353436
2019-02-07 18:03:11 +00:00
Sanjay Patel
2d4b186844 [DAGCombiner] fold add/sub with bool operand based on target's boolean contents
I noticed that we are missing this canonicalization in IR:
rL352515
...and then realized that we don't get this right in SDAG either,
so this has to be fixed first regardless of what we choose to do in IR.

The existing fold was limited to scalars and using the wrong predicate
to guard the transform. We have a boolean contents TLI query that can
be used to decide which direction to fold.

This may eventually lead back to the problems/question in:
https://bugs.llvm.org/show_bug.cgi?id=40486
...but it makes no difference to that yet.

Differential Revision: https://reviews.llvm.org/D57401

llvm-svn: 353433
2019-02-07 17:43:34 +00:00
Matt Arsenault
c83b82363c GlobalISel: Implement fewerElementsVector for shifts
Introduce a new function which handles instructions with multiple type
indices, but have the same number of vector elements.

Also legalize v2s16 shifts when applicable.

llvm-svn: 353432
2019-02-07 17:38:00 +00:00
Sanjay Patel
a5c4a5e958 [x86] split more 256/512-bit shuffles in lowering
This is intentionally a small step because it's hard to know exactly 
where we might introduce a conflicting transform with the code that 
tries to form wider shuffles. But I think this is safe - if we have 
a wide shuffle with 2 operands, then we should do better with an 
extract + narrow shuffle.

Differential Revision: https://reviews.llvm.org/D57867

llvm-svn: 353427
2019-02-07 17:10:49 +00:00
Sam Parker
67756c09f2 [LSR] Generate cross iteration indexes
Modify GenerateConstantOffsetsImpl to create offsets that can be used
by indexed addressing modes. If formulae can be generated which
result in the constant offset being the same size as the recurrence,
we can generate a pre-indexed access. This allows the pointer to be
updated via the single pre-indexed access so that (hopefully) no
add/subs are required to update it for the next iteration. For small
cores, this can significantly improve performance DSP-like loops.

Differential Revision: https://reviews.llvm.org/D55373

llvm-svn: 353403
2019-02-07 13:32:54 +00:00
Diana Picus
75a04e2a77 [ARM GlobalISel] Support G_ICMP for Thumb2
Mark as legal and use the t2* equivalents of the arm mode instructions,
e.g. t2CMPrr instead of plain CMPrr.

llvm-svn: 353392
2019-02-07 11:05:33 +00:00
Tim Northover
638110a208 AArch64: implement copy for paired GPR registers.
When doing 128-bit atomics using CASP we might need to copy a GPRPair to a
different register, but that was unimplemented up to now.

llvm-svn: 353383
2019-02-07 10:35:34 +00:00
Roland Froese
42f58498c5 [PowerPC] Add vector truncate test to prep for D56507 NFC
llvm-svn: 353344
2019-02-06 21:34:44 +00:00
Alina Sbirlea
6cba96ed52 [LICM/MSSA] Add promotion to scalars by building an AliasSetTracker with MemorySSA.
Summary:
Experimentally we found that promotion to scalars carries less benefits
than sinking and hoisting in LICM. When using MemorySSA, we build an
AliasSetTracker on demand in order to reuse the current infrastructure.
We only build it if less than AccessCapForMSSAPromotion exist in the
loop, a cap that is by default set to 250. This value ensures there are
no runtime regressions, and there are small compile time gains for
pathological cases. A much lower value (20) was found to yield a single
regression in the llvm-test-suite and much higher benefits for compile
times. Conservatively we set the current cap to a high value, but we will
explore lowering it when MemorySSA is enabled by default.

Reviewers: sanjoy, chandlerc

Subscribers: nemanjai, jlebar, Prazek, george.burgess.iv, jfb, jsji, llvm-commits

Differential Revision: https://reviews.llvm.org/D56625

llvm-svn: 353339
2019-02-06 20:25:17 +00:00
Craig Topper
1c7ee20819 [X86] Change the CPU on the test case for pr40529.ll to really show the bug. NFC
llvm-svn: 353334
2019-02-06 19:50:59 +00:00
Sanjay Patel
29a710be6a [x86] add tests for horizontal ops (PR38971, PR33758); NFC
llvm-svn: 353332
2019-02-06 19:40:11 +00:00
Jonas Paulsson
b21dde0530 [SystemZ] Improved handling of the @llvm.ctlz intrinsic.
Since SystemZ supports counting of leading zeros with the FLOGR instruction,
isCheapToSpeculateCtlz() should return true, which it now does.

ISD::CTLZ_ZERO_UNDEF i32 is now handled the same way as ISD::CTLZ is, which
is needed since promotion to i64 is required and CTLZ_ZERO_UNDEF is only
expanded to CTLZ if it is Legal or Custom.

Review: Ulrich Weigand
https://reviews.llvm.org/D57710

llvm-svn: 353330
2019-02-06 19:23:31 +00:00
Jonas Paulsson
8cda83a5db [SystemZ] Wait with VGBM selection until after DAGCombine2.
Don't lower BUILD_VECTORs to BYTE_MASK, but instead expose the BUILD_VECTORs
to the DAGCombiner and select them to VGBM in Select(). This allows the
DAGCombiner to understand the constant vector values.

For floating point, only all-zeros vectors are now generated with VGBM, as it
turned out to be somewhat complicated to handle any arbitrary constants,
while in practice this is very rare and hardly needed.

The SystemZ ISD opcodes z_byte_mask, z_vzero and z_vones have been removed.

Review: Ulrich Weigand
https://reviews.llvm.org/D57152

llvm-svn: 353325
2019-02-06 18:59:19 +00:00
Tim Northover
474f5d9b55 AArch64: enforce even/odd register pairs for CASP instructions.
ARMv8.1a CASP instructions need the first of the pair to be an even register
(otherwise the encoding is unallocated). We enforced this during assembly, but
not CodeGen before.

llvm-svn: 353308
2019-02-06 15:26:35 +00:00
Nirav Dave
e5c37958f9 [InlineAsm][X86] Add backend support for X86 flag output parameters.
Allow custom handling of inline assembly output parameters and add X86
flag parameter support.

llvm-svn: 353307
2019-02-06 15:26:29 +00:00
Ulrich Weigand
17a0012687 [SystemZ] Do not return INT_MIN from strcmp/memcmp
The IPM sequence currently generated to compute the strcmp/memcmp
result will return INT_MIN for the "less than zero" case.  While
this is in compliance with the standard, strictly speaking, it
turns out that common applications cannot handle this, e.g. because
they negate a comparison result in order to implement reverse
compares.

This patch changes code to use a different sequence that will result
in -2 for the "less than zero" case (same as GCC).  However, this
requires that the two source operands of the compare instructions
are inverted, which breaks the optimization in removeIPMBasedCompare.
Therefore, I've removed this (and all of optimizeCompareInstr), and
replaced it with a mostly equivalent optimization in combineCCMask
at the DAGcombine level.

llvm-svn: 353304
2019-02-06 15:10:13 +00:00
Sanjay Patel
e84fbb67a1 [x86] vectorize cast ops in lowering to avoid register file transfers
The proposal in D56796 may cross the line because we're trying to avoid vectorization 
transforms in generic DAG combining. So this is an alternate, later, x86-specific 
translation of that patch.

There are several potential follow-ups to enhance this:
1. Allow extraction from non-zero element index.
2. Peek through extends of smaller width integers.
3. Support x86-specific conversion opcodes like X86ISD::CVTSI2P

Differential Revision: https://reviews.llvm.org/D56864

llvm-svn: 353302
2019-02-06 14:59:39 +00:00
Philip Reames
b5bb4a4ec6 [Test] Add codegen tests for unordered and monotonic integer operations
llvm-svn: 353266
2019-02-06 03:19:04 +00:00
Sanjay Patel
997b2aba58 [x86] add tests for extract+sitofp; NFC
llvm-svn: 353249
2019-02-06 00:19:56 +00:00
Craig Topper
4562f420cd [X86] Regenerate tests missed in r353061. NFC
We now print the implicit %st register on these instruction, but since they occur at the end of the line, FileCheck didn't see they were missing.

llvm-svn: 353222
2019-02-05 21:47:42 +00:00
Thomas Lively
315056692d [WebAssembly] Lower memmove to memory.copy
Summary: The lowering is identical to the memcpy lowering.

Reviewers: aheejin

Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D57727

llvm-svn: 353216
2019-02-05 20:57:40 +00:00
Scott Linder
e2c5847414 [AMDGPU] Consider XOR in waterfall loop as a terminator
Ensure the XOR in the waterfall loop for indirect addressing is considered a terminator.

Differential Revision: https://reviews.llvm.org/D57703

llvm-svn: 353207
2019-02-05 19:50:32 +00:00
Matt Arsenault
a3f9b71c09 AMDGPU: Fix assert on trunc from bitcast of build_vector
The v2i64 argument is lowered to a bitcast of v4i32 build_vector.
This would then attempt to use the i32-element as the source of the
vector truncate. This really would need to collect 2 elements from the
build_vector to produce the intended truncate.

llvm-svn: 353202
2019-02-05 19:23:57 +00:00
Simon Pilgrim
b0afc69435 [X86][SSE] Disable ZERO_EXTEND shuffle combining
rL352997 enabled ZERO_EXTEND from non-shuffle-able value types. I've disabled it for now to fix a regression identified by @asbirlea until I can fix this properly.

llvm-svn: 353198
2019-02-05 19:15:48 +00:00