26750 Commits

Author SHA1 Message Date
Jessica Paquette
962b3ae659 [MachineOutliner] Outline functions by order of benefit
Mostly NFC, only change is the order of outlined function names.

Loop over the outlined functions instead of walking the candidate list.

This is a bit easier to understand. It's far more natural to create a function,
then replace all of its occurrences with calls than the other way around.

The functions outlined after this do not change, but their names will be
decided by their benefit. E.g, OUTLINED_FUNCTION_0 will now always be the
most beneficial function, rather than the first one seen.

This makes it easier to enforce an ordering on the outlined functions. So,
this also adds a test to make sure that the ordering works as expected.

llvm-svn: 348414
2018-12-05 21:36:04 +00:00
Krzysztof Parzyszek
8eb394d764 [Hexagon] Add intrinsics for Hexagon V66
llvm-svn: 348413
2018-12-05 21:14:51 +00:00
Krzysztof Parzyszek
545a68ca4b [Hexagon] Add instruction definitions for Hexagon V66
llvm-svn: 348411
2018-12-05 21:01:07 +00:00
Simon Pilgrim
c10590f6f9 [X86][SSE] Fix a copy+paste typo that was folding the sext/zext of partial vectors
llvm-svn: 348403
2018-12-05 19:32:19 +00:00
Matt Arsenault
b3e14de487 AMDGPU: Fix using old address spaces in some tests
llvm-svn: 348385
2018-12-05 17:34:59 +00:00
Sanjay Patel
33a448f935 [DAGCombiner] don't try to extract a fraction of a vector binop and crash (PR39893)
Because we're potentially peeking through a bitcast in this transform,
we need to use overall bitwidths rather than number of elements to
determine when it's safe to proceed.

Should fix:
https://bugs.llvm.org/show_bug.cgi?id=39893

llvm-svn: 348383
2018-12-05 17:10:30 +00:00
Andrea Di Biagio
3998bebc12 [X86] Add test case to show missed opportunity to combine a concat_vector into a scalar_to_vector. NFC
This is a test for D55274.

llvm-svn: 348380
2018-12-05 16:23:27 +00:00
Chandler Carruth
71c14a36a2 [SLH] Fix a nasty bug in SLH.
Whenever we effectively take the address of a basic block we need to
manually update that basic block to reflect that fact or later passes
such as tail duplication and tail merging can break the invariants of
the code. =/ Sadly, there doesn't appear to be any good way of
automating this or even writing a reasonable assert to catch it early.

The change seems trivially and obviously correct, but sadly the only
really good test case I have is 1000s of basic blocks. I've tried
directly writing a test case that happens to make tail duplication do
something that crashes later on, but this appears to require an
*amazingly* complex set of conditions that I've not yet reproduced.

The change is technically covered by the tests because we mark the
blocks as having their address taken, but that doesn't really count as
properly testing the functionality.

llvm-svn: 348374
2018-12-05 15:42:11 +00:00
Chandler Carruth
e3ea164659 [SLH] Regenerate tests with --no_x86_scrub_rip to restore the higher
fidelity checking of RIP-based references to basic blocks and other
labels.

These labels are super important for SLH tests so we should keep them
readable in the test cases.

llvm-svn: 348373
2018-12-05 15:41:13 +00:00
Valery Pykhtin
5b4db77b13 [AMDGPU]: Turn on the DPP combiner by default
Differential revision: https://reviews.llvm.org/D55314

llvm-svn: 348371
2018-12-05 15:21:17 +00:00
Simon Pilgrim
180639afe5 [SelectionDAG] Initial support for FSHL/FSHR funnel shift opcodes (PR39467)
This is an initial patch to add a minimum level of support for funnel shifts to the SelectionDAG and to begin wiring it up to the X86 SHLD/SHRD instructions.

Some partial legalization code has been added to handle the case for 'SlowSHLD' where we want to expand instead and I've added a few DAG combines so we don't get regressions from the existing DAG builder expansion code.

Differential Revision: https://reviews.llvm.org/D54698

llvm-svn: 348353
2018-12-05 11:12:12 +00:00
Diana Picus
8a1b4f57c9 [ARM GlobalISel] Implement call lowering for Thumb2
The only things that are different from arm are:
* different opcodes for calls and returns
* Thumb calls take predicate operands

llvm-svn: 348347
2018-12-05 10:35:28 +00:00
Saleem Abdulrasool
efd2cb8a0d AArch64: support funclets in fastcall and swift_call
Functions annotated with `__fastcall` or `__attribute__((__fastcall__))`
or `__attribute__((__swiftcall__))` may contain SEH handlers even on
Win64.  This matches the behaviour of cl which allows for
`__try`/`__except` inside a `__fastcall` function.  This was detected
while trying to self-host clang on Windows ARM64.

llvm-svn: 348337
2018-12-05 07:09:20 +00:00
Craig Topper
3991089816 [X86] Add narrow vector test cases to vector-reduce* tests. Add copies of the tests with -x86-experimental-vector-widening-legalization
llvm-svn: 348334
2018-12-05 06:29:44 +00:00
Craig Topper
6934202dc0 [MachineLICM][X86][AMDGPU] Fix subtle bug in the updating of PhysRegClobbers in post-RA LICM
It looks like MCRegAliasIterator can visit the same physical register twice. When this happens in this code in LICM we end up setting the PhysRegDef and then later in the same loop visit the register again. Now we see that PhysRegDef is set from the earlier iteration so now set PhysRegClobber.

This patch splits the loop so we have one that uses the previous value of PhysRegDef to update PhysRegClobber and second loop that updates PhysRegDef.

The X86 atomic test is an improvement. I had to add sideeffect to the two shrink wrapping tests to prevent hoisting from occurring. I'm not sure about the AMDGPU tests. It looks like the branch instruction changed at end the of the loops. And in the branch-relaxation test I think there is now "and vcc, exec, -1" instruction that wasn't there before.

Differential Revision: https://reviews.llvm.org/D55102

llvm-svn: 348330
2018-12-05 03:41:26 +00:00
Amara Emerson
8547f4fb7f [AArch64][GlobalISel] Re-enable selection of volatile loads.
We previously disabled this in r323371 because of a bug where we selected an
extending load, but didn't delete the old G_LOAD, resulting in two loads being
generated for volatile loads.

Since we now have dedicated G_SEXTLOAD/G_ZEXTLOAD operations, and that the
tablegen patterns should no longer be able to select (ext(load x)) patterns, it
should be safe to re-enable it.

The old test case should still work as expected.

llvm-svn: 348320
2018-12-05 00:03:09 +00:00
Stefan Pintilie
46f840f286 [PowerPC] Make no-PIC default to match GCC - LLVM
Change the default for PowerPC LE to -fno-PIC.

Differential Revision: https://reviews.llvm.org/D53383

llvm-svn: 348298
2018-12-04 20:14:57 +00:00
Matt Arsenault
b17241b12d Move llc-start-stop-instance to x86
Avoid bot failures where the host pass
setup might not have 2 dead-mi-elimination runs

llvm-svn: 348290
2018-12-04 18:19:08 +00:00
Matt Arsenault
43153024ab MIR: Add method to stop after specific runs of passes
Currently if you use -{start,stop}-{before,after}, it picks
the first instance with the matching pass name. If you run
the same pass multiple times, there's no way to distinguish them.

Allow specifying a run index wih ,N to specify which you mean.

llvm-svn: 348285
2018-12-04 17:45:12 +00:00
Simon Pilgrim
07843640d5 [X86][SSE] Add SimplifyDemandedBitsForTargetNode handling for MOVMSK
Moves existing SimplifyDemandedBits call out of combineMOVMSK and add SimplifyDemandedVectorElts call based on the sign bits we need.

llvm-svn: 348282
2018-12-04 16:52:32 +00:00
Ilya Biryukov
449a7f0dbb Revert "Adapt gcov to changes in CFE."
This reverts commit r348203.
Reason: this produces absolute paths in .gcno files, breaking us
internally as we rely on them being consistent with the filenames passed
in the command line.

Also reverts r348157 and r348155 to account for revert of r348154 in
clang repository.

llvm-svn: 348279
2018-12-04 16:30:31 +00:00
Simon Pilgrim
e82c3dab12 [X86][SSE] Add MOVMSK demandedbits/elts tests
llvm-svn: 348277
2018-12-04 16:01:25 +00:00
Clement Courbet
7925d58eae [X86][NFC] Add more constant-size memcmp tests.
llvm-svn: 348257
2018-12-04 12:35:51 +00:00
Simon Pilgrim
0add090e24 [TargetLowering] expandFP_TO_UINT - avoid FPE due to out of range conversion (PR17686)
PR17686 demonstrates that for some targets FP exceptions can fire in cases where the FP_TO_UINT is expanded using a FP_TO_SINT instruction.

The existing code converts both the inrange and outofrange cases using FP_TO_SINT and then selects the result, this patch changes this for 'strict' cases to pre-select the FP_TO_SINT input and the offset adjustment.

The X87 cases don't need the strict flag but generates much nicer code with it....

Differential Revision: https://reviews.llvm.org/D53794

llvm-svn: 348251
2018-12-04 11:21:30 +00:00
Simon Pilgrim
666261cdc8 [TargetLowering] Add SimplifyDemandedVectorElts support to EXTEND opcodes
Add support for ISD::*_EXTEND and ISD::*_EXTEND_VECTOR_INREG opcodes.

The extra broadcast in trunc-subvector.ll will be fixed in an upcoming patch.

llvm-svn: 348246
2018-12-04 10:41:06 +00:00
Sanjin Sijaric
dc6403d133 [ARM64][Windows] Fix local stack size for funclets
The comment was misplaced, and the code didn't do what the comment indicated,
namely ignoring the varargs portion when computing the local stack size of a
funclet in emitEpilogue.  This results in incorrect offset computations within
funclets that are contained in vararg functions.

Differential Revision: https://reviews.llvm.org/D55096

llvm-svn: 348222
2018-12-04 00:54:52 +00:00
Jessica Paquette
bce2086ad1 [MachineOutliner] Move stack instr check logic to getOutliningCandidateInfo
This moves the stack check logic into a lambda within getOutliningCandidateInfo.

This allows us to be less conservative with stack checks. Whether or not a
stack instruction is safe to outline is dependent on the frame variant and call
variant of the outlined function; only in cases where we modify the stack can
these be unsafe.

So, if we move that logic later, when we're looking at an individual candidate,
we can make better decisions here.

This gives some code size savings as a result.

llvm-svn: 348220
2018-12-04 00:31:55 +00:00
Krzysztof Parzyszek
44c1f81b27 [Hexagon] Switch to auto-generated intrinsic definitions and patterns
llvm-svn: 348206
2018-12-03 22:40:36 +00:00
Sanjay Patel
d24f63477d [DAGCombiner] narrow truncated vector binops when legal
This is the smallest vector enhancement I could find to D54640.
Here, we're allowing narrowing to only legal vector ops because we'll see
regressions without that. All of the test diffs are wins from what I can tell.
With AVX/AVX512, we can shrink ymm/zmm ops to xmm.

x86 vector multiplies are the problem case that we're avoiding due to the
patchwork ISA, and it's not clear to me if we can dance around those
regressions using TLI hooks or if we need preliminary patches to plug those
holes.

Differential Revision: https://reviews.llvm.org/D55126

llvm-svn: 348195
2018-12-03 21:57:35 +00:00
Jessica Paquette
2accb31690 [MachineOutliner] Drop candidates that require fixups if it's beneficial
If it's a bigger code size win to drop candidates that require stack fixups
than to demote every candidate to that variant, the outliner should do that.

This happens if the number of bytes taken by calls to functions that don't
require fixups, plus the number of bytes that'd be left is less than the
number of bytes that it'd take to emit a save + restore for all candidates.

Also add tests for each possible new behaviour.

- machine-outliner-compatible-candidates shows that when we have candidates
that don't use the stack, we can use the default call variant along with the
no save/regsave variant.

- machine-outliner-all-stack shows that when it's better to fix up the stack,
we still will demote all candidates to that case

- machine-outliner-drop-stack shows that we can discard candidates that
require stack fixups when it would be beneficial to do so.

llvm-svn: 348168
2018-12-03 19:11:27 +00:00
Craig Topper
5440b63fa8 [X86] Teach LowerMUL/LowerMULH for vXi8 to unpack constant RHS.
Summary:
We need to unpackl and unpackh the operands to use two vXi16 multiplies. Previously it looks like the low unpack would get constant folded at least in the 128-bit case after shuffle lowering turned the unpackl into ZERO_EXTEND_VECTOR_INREG and X86 custom DAG combined it. The same doesn't happen for the high half. So we'd load a constant and then shuffle it. But the low half would just be loaded and used by the multiply directly.

After this patch we now end up with a constant pool entry for the low and high unpacks separately with no shuffle operations.

This is a step towards removing custom constant folding for ZERO_EXTEND_VECTOR_INREG/SIGN_EXTEND_VECTOR_INREG in the X86 backend.

Reviewers: RKSimon, spatel

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D55165

llvm-svn: 348159
2018-12-03 18:26:27 +00:00
Craig Topper
e35b01f8ea [X86] Add DAG combine to combine a v8i32->v8i16 truncate with a packuswb that truncates v8i16->v8i8.
Summary:
Under -x86-experimental-vector-widening-legalization, fp_to_uint/fp_to_sint with a smaller than 128 bit vector type results are custom type legalized by promoting the result to a 128 bit vector by promoting the elements, inserting an assertzext/assertsext, then truncating back to original type. The truncate will be further legalizdd to a pack shuffle. In the case of a v8i8 result type, we'll end up with a v8i16 fp_to_sint. This will need to be further legalized during vector op legalization by promoting to v8i32 and then truncating again. Under avx2 this produces good code with two pack instructions, but Under avx512 this will result in a truncate instruction and a packuswb instruction. But we should be able to get away with a single truncate instruction.

The other option is to promote all the way to vXi32 result type during the first type legalization. But in some experimentation that seemed to require more work to produce good code for other configurations.

Reviewers: RKSimon, spatel

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D54836

llvm-svn: 348158
2018-12-03 18:26:24 +00:00
Adrian Prantl
0f873eb80a Update Diagnostic handling for changes in CFE.
The clang frontend no longer emits the current working directory for
DIFiles containing an absolute path in the filename: and will move the
common prefix between current working directory and the file into the
directory: component.

https://reviews.llvm.org/D55085

llvm-svn: 348155
2018-12-03 17:55:29 +00:00
Simon Pilgrim
fb39916048 Fix line endings. NFCI.
llvm-svn: 348146
2018-12-03 14:55:09 +00:00
Ron Lieberman
16de4fd2eb [AMDGPU] Add sdwa support for ADD|SUB U64 decomposed Pseudos
The introduction of S_{ADD|SUB}_U64_PSEUDO instructions which are decomposed
into VOP3 instruction pairs for S_ADD_U64_PSEUDO:
  V_ADD_I32_e64
  V_ADDC_U32_e64
and for S_SUB_U64_PSEUDO
  V_SUB_I32_e64
  V_SUBB_U32_e64
preclude the use of SDWA to encode a constant.
SDWA: Sub-Dword addressing is supported on VOP1 and VOP2 instructions,
but not on VOP3 instructions.

We desire to fold the bit-and operand into the instruction encoding
for the V_ADD_I32 instruction. This requires that we transform the
VOP3 into a VOP2 form of the instruction (_e32).
  %19:vgpr_32 = V_AND_B32_e32 255,
      killed %16:vgpr_32, implicit $exec
  %47:vgpr_32, %49:sreg_64_xexec = V_ADD_I32_e64
      %26.sub0:vreg_64, %19:vgpr_32, implicit $exec
 %48:vgpr_32, dead %50:sreg_64_xexec = V_ADDC_U32_e64
      %26.sub1:vreg_64, %54:vgpr_32, killed %49:sreg_64_xexec, implicit $exec

which then allows the SDWA encoding and becomes
  %47:vgpr_32 = V_ADD_I32_sdwa
      0, %26.sub0:vreg_64, 0, killed %16:vgpr_32, 0, 6, 0, 6, 0,
      implicit-def $vcc, implicit $exec
  %48:vgpr_32 = V_ADDC_U32_e32
      0, %26.sub1:vreg_64, implicit-def $vcc, implicit $vcc, implicit $exec


Differential Revision: https://reviews.llvm.org/D54882

llvm-svn: 348132
2018-12-03 13:04:54 +00:00
Tim Northover
5745b6ac3b ARM: use target-specific SUBS node when combining cmp with cmov.
This has two positive effects. First, using a custom node prevents
recombination leading to an infinite loop since the output DAG is notionally a
little more complex than the input one. Using a flag-setting instruction also
allows the subtraction to be folded with the related comparison more easily.

https://reviews.llvm.org/D53190

llvm-svn: 348122
2018-12-03 11:16:21 +00:00
Petr Pavlu
d336c4eb61 [GlobalISel] Fix test irtranslator-stackprotect-check.ll
Fix for commit r347862. Use correct AArch64 triple in test
CodeGen/AArch64/GlobalISel/irtranslator-stackprotect-check.ll.

llvm-svn: 348111
2018-12-03 09:28:28 +00:00
Sjoerd Meijer
5afc957eba [ARM] FP16: support vld1.16 for vector loads with post-increment
Differential Revision: https://reviews.llvm.org/D55112

llvm-svn: 348110
2018-12-03 08:26:34 +00:00
Kang Zhang
51986417f9 [PowerPC] Fix inconsistent ImmMustBeMultipleOf for same instruction
Summary:
There are 4 instructions which have Inconsistent ImmMustBeMultipleOf in the
function PPCInstrInfo::instrHasImmForm, they are LFS, LFD, STFS, STFD.
These four instructions should set the ImmMustBeMultipleOf to 1 instead of 4.

Reviewed By: steven.zhang

Differential Revision: https://reviews.llvm.org/D54738

llvm-svn: 348109
2018-12-03 03:32:57 +00:00
Craig Topper
959b415e2f [X86] Add a DAG combine to turn stores of vXi1 on pre-avx512 targets into a bitcast and a store of a iX scalar.
llvm-svn: 348104
2018-12-02 19:47:14 +00:00
Sanjay Patel
b205606d3e [SelectionDAG] fold constant with undef vector per element
This makes the SDAG behavior consistent with the way we do this in IR.
It's possible that we were getting the wrong answer before. For example,
'xor undef, undef --> 0' but 'xor undef, C' --> undef. 

But the most practical improvement is likely as shown in the tests here - 
for FP, we were overconstraining undef lanes to NaN, and that can prevent 
vector simplifications/narrowing (see D51553).

llvm-svn: 348090
2018-12-02 13:48:42 +00:00
Craig Topper
4bb077910a [X86] Add custom type legalization for v2i32/v4i16/v8i8->mmx bitcasts to avoid a store/load to/from the stack.
Widen the input to a 128 bit vector by padding with undef elements. Then use a movdq2q to convert from xmm register to mmx register.

llvm-svn: 348086
2018-12-02 05:46:50 +00:00
Craig Topper
ec096a1dae [X86] Custom type legalize v2i32/v4i16/v8i8->i64 bitcasts in 64-bit mode similar to what's done when the destination is f64.
The generic legalizer will fall back to a stack spill that uses a truncating store. That store will get expanded into a shuffle and non-truncating store on pre-avx512 targets. Once that happens the stack store/load pair will be combined away leaving behind the shuffle and bitcasts. On avx512 targets the truncating store is legal so doesn't get folded away.

By custom legalizing it we can avoid this churn and maybe produce better code.

llvm-svn: 348085
2018-12-02 05:46:48 +00:00
Craig Topper
eff43f6ae3 [X86] Add vXi8 division/remainder by non-splat constant test cases to prepare for an upcoming patch.
llvm-svn: 348082
2018-12-01 21:53:08 +00:00
Jessica Paquette
9a7103b0f8 [MachineOutliner][AArch64] Improve checks for stack instructions
If we know that we'll definitely save LR to a register, there's no reason to
pre-check whether or not a stack instruction is unsafe to fix up.

This makes it so that we check for that condition before mapping instructions.

This allows us to outline more, since we don't pessimise as many instructions.

Also update some tests, since we outline more.

llvm-svn: 348081
2018-12-01 21:24:06 +00:00
Jessica Paquette
adcc410f65 Replace w16/w17 in machine-outliner.mir with w11/w12
These registers should not be used here, since they are interprocedural
scratch registers in AArch64.

llvm-svn: 348080
2018-12-01 21:23:58 +00:00
Craig Topper
f4b13927e7 [X86] Don't use zero_extend_vector_inreg for mulhu lowering with sse 4.1
Summary: With sse4.1 we use two zero_extend_vector_inreg and a pshufd to expand the v16i8 input into two v8i16 vectors for the multiply. That's 3 shuffles to extend one operand. The other operand is usually constant as this is mostly used by division by constant optimization. Pre sse4.1 we use a punpckhbw and a punpcklbw with a zero vector. That's two shuffles and an xor and a copy due to tied register constraints. That seems maybe better than the 3 shuffles. With AVX we avoid the copy so that's obviously better.

Reviewers: spatel, RKSimon

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D55138

llvm-svn: 348079
2018-12-01 19:26:31 +00:00
Graham Sellers
ba559ac058 [AMDGPU] Split 64-Bit XNOR to 64-Bit NOT/XOR
The identity ~(x ^ y) == (~x ^ y) == (x ^ ~y) allows XNOR (XOR/NOT) to turn into NOT/XOR. Handling this case with its own split means we can make the NOT remain in the scalar unit. Previously, we split 64-bit XNOR into two 32-bit XNOR, then lowered. Now, we get three instructions (s_not, v_xor, v_xor) rather than four in the case where either of the sources is a scalar 64-bit.

Add test cases to xnor.ll to attempt XNOR Vx, Sy and XNOR Sx, Vy. Also adding test that uses the opposite identity such that (~x ^ y) on the scalar unit (or vector for gfx906) can generate XNOR. This already worked, but I didn't see a test for it.

Differential: https://reviews.llvm.org/D55071
llvm-svn: 348075
2018-12-01 12:27:53 +00:00
Simon Pilgrim
e017ed3245 [SelectionDAG] Improve SimplifyDemandedBits to SimplifyDemandedVectorElts simplification
D52935 introduced the ability for SimplifyDemandedBits to call SimplifyDemandedVectorElts through BITCASTs if the demanded bit mask entirely covered the sub element.

This patch relaxes this to demanding an element if we need any bit from it.

Differential Revision: https://reviews.llvm.org/D54761

llvm-svn: 348073
2018-12-01 12:08:55 +00:00
Craig Topper
2d6324c3cb [X86] Remove stale FIXME from test case. NFC
This was fixed in r346581. I just forgot to remove it.

llvm-svn: 348069
2018-12-01 07:45:36 +00:00