24447 Commits

Author SHA1 Message Date
Craig Topper
85906cf041 [X86] Remove and autoupgrade masked vpermd/vpermps intrinsics.
llvm-svn: 332198
2018-05-13 18:03:59 +00:00
Dimitry Andric
a39c409619 Follow-up to rL332176 by adding a test case for PR37264.
Noticed by Simon Pilgrim.

llvm-svn: 332197
2018-05-13 14:32:23 +00:00
Matt Arsenault
dfb88dfe30 AMDGPU: Make undef legal for v2i16/v2f16
This is apparently necessary to stop undef from being
turned into a build_vector of 0s.

llvm-svn: 332195
2018-05-13 10:04:38 +00:00
Puyan Lotfi
380a6f55ff [NFC] MIR-Canon: switching to a stable string sorting of instructions.
llvm-svn: 332191
2018-05-13 06:07:20 +00:00
Craig Topper
38b713d4a7 [X86] Add some load folding patterns for cvtsi2ss/sd into intrinsic instructions.
llvm-svn: 332189
2018-05-13 01:54:33 +00:00
Craig Topper
28b85caea8 [X86] Remove some unused CHECK lines from tests.
llvm-svn: 332188
2018-05-13 00:58:23 +00:00
Craig Topper
df3a9cedff [X86] Remove an autoupgrade legacy cvtss2sd intrinsics.
llvm-svn: 332187
2018-05-13 00:29:40 +00:00
Craig Topper
38ad7ddabc [X86] Remove and autoupgrade cvtsi2ss/cvtsi2sd intrinsics to match what clang has used for a very long time.
llvm-svn: 332186
2018-05-12 23:14:39 +00:00
Craig Topper
a288f241cd [X86] Remove some unused masked conversion intrinsics that can be replaced with an older intrinsic and a select.
This is what clang already uses.

llvm-svn: 332170
2018-05-12 02:34:28 +00:00
Stanislav Mekhanoshin
7012c246c1 [AMDGPU] Fix amdgpu-waves-per-eu accounting in scheduler
We cannot query this attribute from a subtarget given a machine function.
At this point attribute itself is already unavailable and can only be
obtained through MFI.

Differential Revision: https://reviews.llvm.org/D46781

llvm-svn: 332166
2018-05-12 01:41:56 +00:00
Tom Stellard
655fdd3f82 AMDGPU/GlobalISel: Implement select() for >32-bit G_STORE
Reviewers: arsenm, nhaehnle

Subscribers: kzhuravl, wdng, yaxunl, rovka, kristof.beyls, dstuttard, tpr, llvm-commits, t-tye

Differential Revision: https://reviews.llvm.org/D46153

llvm-svn: 332154
2018-05-11 23:12:49 +00:00
Changpeng Fang
f094885a9e AMDGPU/SI: Don't promote alloca to vector for AddrSpaceCast instruction.
Summary:
  We have no logic to promote alloca to vector for an AddrSpaceCast instruction.

Reviewer:
  arsenm

Differential Revision:
  https://reviews.llvm.org/D45993

llvm-svn: 332147
2018-05-11 22:17:57 +00:00
Craig Topper
a17d627abb [X86] Remove and autoupgrade a bunch of FMA instrinsics that are no longer used by clang.
llvm-svn: 332146
2018-05-11 21:59:34 +00:00
Yaxun Liu
deba150c27 [AMDGPU] Fix compilation failure when IR contains comdat
Remove a useless SwitchSection which also causes compilation failure
when IR contains comdat.

The SwitchSection is useless because the current section is already
correct text section for the function therefore no need to switch.

It causes compilation failure for comdat because functions with comdat
has specific text section, not the default .text section.

Since HIP uses comdat, this bug caused failures for HIP.

Differential Revision: https://reviews.llvm.org/D46770

llvm-svn: 332137
2018-05-11 20:40:14 +00:00
Vedant Kumar
99d5c072f0 [DAGCombiner] Set the right SDLoc on extended SETCC uses (7/N)
ExtendSetCCUses updates SETCC nodes which use a load (OriginalLoad) to
reflect a simplification to the load (ExtLoad).

Based on my reading, ExtendSetCCUses may create new nodes to extend a
constant attached to a SETCC. It also creates fresh SETCC nodes which
refer to any updated operands.

ISTM that the location applied to the new constant and SETCC nodes
should be the same as the location of the ExtLoad.

This was suggested by Adrian in https://reviews.llvm.org/D45995.

Part of: llvm.org/PR37262

Differential Revision: https://reviews.llvm.org/D46216

llvm-svn: 332119
2018-05-11 18:40:10 +00:00
Vedant Kumar
fd340a4047 [DAGCombiner] Set the right SDLoc on a newly-created sextload (6/N)
This teaches tryToFoldExtOfLoad to set the right location on a
newly-created extload. With that in place, the logic for performing a
certain ([s|z]ext (load ...)) combine becomes identical for sexts and
zexts, and we can get rid of one copy of the logic.

The test case churn is due to dependencies on IROrders inherited from
the wrong SDLoc.

Part of: llvm.org/PR37262

Differential Revision: https://reviews.llvm.org/D46158

llvm-svn: 332118
2018-05-11 18:40:08 +00:00
Simon Pilgrim
661ae7778d [X86][BtVer2] Model ymm move as double pumped instructions
We still need to handle mmx/xmm moves as 'decode-only' no-pipe instructions

llvm-svn: 332109
2018-05-11 17:38:36 +00:00
Alex Bradbury
bca0c3cdb6 [RISCV] Support .option rvc and norvc assembler directives
These directives allow the 'C' (compressed) extension to be enabled/disabled 
within a single file.

Differential Revision: https://reviews.llvm.org/D45864
Patch by Kito Cheng

llvm-svn: 332107
2018-05-11 17:30:28 +00:00
Geoff Berry
60460268c0 [AArch64] Fix performPostLD1Combine to check for constant lane index.
Summary:
performPostLD1Combine in AArch64ISelLowering looks for vector
insert_vector_elt of a loaded value which it can optimize into a single
LD1LANE instruction.  The code checking for the pattern was not checking
if the lane index was a constant which could cause two problems:

- an assert when lowering the LD1LANE ISD node since it assumes an
  constant operand

- an assert in isel if the lane index value depends on the
  post-incremented base register

Both of these issues are avoided by simply checking that the lane index
is a constant.

Fixes bug 35822.

Reviewers: t.p.northover, javed.absar

Subscribers: rengolin, kristof.beyls, mcrosier, llvm-commits

Differential Revision: https://reviews.llvm.org/D46591

llvm-svn: 332103
2018-05-11 16:25:06 +00:00
Sanjoy Das
82105e2a7d Use iteration instead of recursion in CFIInserter
Summary: This recursive step can overflow the stack.

Reviewers: djokov, petarj

Subscribers: mcrosier, jlebar, bixia, llvm-commits

Differential Revision: https://reviews.llvm.org/D46671

llvm-svn: 332101
2018-05-11 15:54:46 +00:00
Simon Pilgrim
22dd72b995 [X86] Split WriteF/WriteVec Move/Load/Store scheduler classes by vector width
Fixes a SNB issue that was missing vlddqu/vmovntdqa ymm instructions

llvm-svn: 332094
2018-05-11 14:30:54 +00:00
Tom Stellard
dcc95e9385 AMDGPU/GlobalISel: Implement select() for 32-bit G_FPTOUI
Reviewers: arsenm, nhaehnle

Subscribers: kzhuravl, wdng, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D45883

llvm-svn: 332082
2018-05-11 05:44:16 +00:00
Craig Topper
1ee19ae126 [X86] Add new patterns for masked scalar load/store to match clang's codegen from r331958.
Clang's codegen now uses 128-bit masked load/store intrinsics in IR. The backend will widen to 512-bits on AVX512F targets.

So this patch adds patterns to detect codegen's widening and patterns for AVX512VL that don't get widened.

We may be able to drop some of the old patterns, but I leave that for a future patch.

llvm-svn: 332049
2018-05-10 21:49:16 +00:00
Tom Stellard
1e0edad4bb AMDGPU/GlobalISel: Implement select() for G_BITCAST s32 <--> <2 x s16>
Reviewers: arsenm, nhaehnle

Subscribers: kzhuravl, wdng, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D45881

llvm-svn: 332042
2018-05-10 21:20:10 +00:00
Tom Stellard
1dc90204bf AMDGPU/GlobalISel: Enable TableGen'd instruction selector
Reviewers: arsenm, nhaehnle

Reviewed By: arsenm

Subscribers: kzhuravl, wdng, mgorny, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D45994

llvm-svn: 332039
2018-05-10 20:53:06 +00:00
Sam Clegg
a5908009cd [WebAsembly] Update default triple in test files to wasm32-unknown-unkown.
Summary: The final -wasm component has been the default for some time now.

Subscribers: jfb, dschuff, jgravelle-google, eraman, aheejin, JDevlieghere, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D46342

llvm-svn: 332007
2018-05-10 17:49:11 +00:00
Simon Pilgrim
38ac0e9c6b [X86] Split WriteVecALU/WriteVecLogic/WriteShuffle/WriteVarShuffle/WritePSADBW/WritePHAdd scheduler classes
Split off XMM classes from the default (MMX) classes.

llvm-svn: 331999
2018-05-10 17:06:09 +00:00
Sanjay Patel
b4e7893ba8 [x86] fix fmaxnum/fminnum with nnan
With nnan, there's no need for the masked merge / blend
sequence (that probably costs much more than the min/max
instruction).

Somewhere between clang 5.0 and 6.0, we started producing
these intrinsics for fmax()/fmin() in C source instead of
libcalls or fcmp/select. The backend wasn't prepared for
that, so we regressed perf in those cases.

Note: it's possible that other targets have similar problems
as seen here. 

Noticed while investigating PR37403 and related bugs:
https://bugs.llvm.org/show_bug.cgi?id=37403

The IR FMF propagation cases still don't work. There's
a proposal that might fix those cases in D46563.

llvm-svn: 331992
2018-05-10 15:40:49 +00:00
Sanjay Patel
ec9b6be26f [x86] fix test names; NFC
llvm-svn: 331989
2018-05-10 14:58:47 +00:00
Sanjay Patel
be97134955 [x86] add tests for maxnum/minnum intrinsics with nnan; NFC
Clang 6.0 was updated to create these intrinsics rather than 
libcalls or fcmp/select, but the backend wasn't prepared to 
handle that optimally. 

This bug is not the primary reason for PR37403:
https://bugs.llvm.org/show_bug.cgi?id=37403
...but it's probably more important for x86 perf.

llvm-svn: 331988
2018-05-10 14:48:42 +00:00
Nirav Dave
a5ad417589 [DAG] Avoid using deleted node in rebuildSetCC
Summary:
The combine in rebuildSetCC may be combined to another
node leaving our references stale. Keep a handle on
it to avoid stale references.

Fixes PR36602.

Reviewers: dbabokin, RKSimon, eli.friedman, davide

Subscribers: hiraditya, uabelho, JesperAntonsson, qcolombet, llvm-commits

Differential Revision: https://reviews.llvm.org/D46404

llvm-svn: 331985
2018-05-10 14:28:54 +00:00
Gabor Buella
a832b22bae [X86] ptwrite intrinsic
Reviewers: craig.topper, RKSimon

Reviewed By: craig.topper, RKSimon

Differential Revision: https://reviews.llvm.org/D46539

llvm-svn: 331961
2018-05-10 07:26:05 +00:00
Artem Belevich
2f348ea1c7 [NVPTX] Added a feature to use short pointers for const/local/shared AS.
Const/local/shared address spaces are all < 4GB and we can always use
32-bit pointers to access them. This has substantial performance impact
on kernels that uses shared memory for intermediary results.

The feature is disabled by default.

Differential Revision: https://reviews.llvm.org/D46147

llvm-svn: 331941
2018-05-09 23:46:19 +00:00
Roman Tereshin
6d26638c90 [GlobalISel][Legalizer] Widening the second src op of shifts bug fix
The second source operand of G_SHL, G_ASHR, and G_LSHR must preserve its
value as a (small) unsigned integer, therefore its incorrect to widen it
in any way but by zero extending it.

G_SHL was using G_ANYEXT and G_ASHR - G_SEXT (which is correct for their
destination and first source operands, but not the "number of bits to
shift" operand).

Generally, shifts aren't as similar to regular binary operations as it
might seem, for instance, they aren't commutative nor associative and
the second source operand usually requires a special treatment.

Reviewers: bogner, javed.absar, aivchenk, rovka

Reviewed By: bogner

Subscribers: igorb, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D46413

llvm-svn: 331926
2018-05-09 21:43:30 +00:00
Farhana Aleen
e24f3ff8de [AMDGPU] Support horizontal vectorization of min/max.
Author: FarhanaAleen

Reviewed By: rampitec

Subscribers: AMDGPU

Differential Revision: https://reviews.llvm.org/D46604

llvm-svn: 331920
2018-05-09 21:18:34 +00:00
Matt Arsenault
eac81b2448 AMDGPU: Ignore any_extend in mul24 combine
If a multiply is truncated, SimplifyDemandedBits
sometimes turns a zero_extend of the inputs into an
any_extend, which makes the known bits computation unhelpful.
Ignore these and compute known bits for the underlying value,
since we insert the correct extend type after.

llvm-svn: 331919
2018-05-09 21:11:35 +00:00
Krzysztof Parzyszek
cff73a2118 [Hexagon] Add patterns for vector shift-and-accumulate
llvm-svn: 331918
2018-05-09 21:10:41 +00:00
Matt Arsenault
74fd7600d2 AMDGPU: Handle partial shift reduction for variable shifts
If the variable shift amount has known bits, we can still reduce
the shift.

llvm-svn: 331917
2018-05-09 20:52:54 +00:00
Matt Arsenault
b143d9a5ea AMDGPU: Partially shrink 64-bit shifts if reduced to 16-bit
This is an extension of an existing combine to reduce wider
shls if the result fits in the final result type. This
introduces the same combine, but reduces the shift to a middle
sized type to avoid the slow 64-bit shift.

llvm-svn: 331916
2018-05-09 20:52:43 +00:00
Matt Arsenault
762d498808 AMDGPU: Add combine for trunc of bitcast from build_vector
If the truncate is only accessing the first element of the vector,
we can use the original source value.

This helps with some combine ordering issues after operations are
lowered to integer operations between bitcasts of build_vector.
In particular it stops unnecessarily materializing the unused
top half of a vector in some cases.

llvm-svn: 331909
2018-05-09 18:37:39 +00:00
Roman Tereshin
d5fa9fde58 Reapplying r331819 [GlobalISel][Legalizer] More concise and faster widenScalar, NFC
The commit was a suspect for clang-cmake-aarch64-global-isel and
    clang-cmake-aarch64-quick bot failures, proved to be innocent.

llvm-svn: 331898
2018-05-09 17:28:18 +00:00
Craig Topper
176ec8506f [DAGCombiner] In visitBITCAST when trying to constant fold the bitcast, only call getBitcast if its an fp->int or int->fp conversion even when before legalize ops.
Previously if !LegalOperations we would blindly call getBitcast and hope that getNode would constant fold it. But if the conversion is between a vector and a scalar, getNode has no simplification.

This means we would just get back the original N. We would then return that N which would make the caller of visitBITCAST think that we used CombineTo and did our own worklist management. This prevents target specific optimizations from being called for vector/scalar bitcasts until after legal operations.

llvm-svn: 331896
2018-05-09 17:14:27 +00:00
Simon Pilgrim
ab34aa8294 [X86] Cleanup WriteFStore/WriteVecStore schedules
MOVNTPD/MOVNTPS should be WriteFStore

Standardized BDW/HSW/SKL/SKX WriteFStore/WriteVecStore - fixes some missed instregex patterns. (V)MASKMOVDQU was already using the default, its costs gets increased but is still nowhere near the real cost of that nasty instruction....

llvm-svn: 331864
2018-05-09 11:01:16 +00:00
Craig Topper
b9a473d186 [X86] Combine (vXi1 (bitcast (-1)))) and (vXi1 (bitcast (0))) to all ones or all zeros vXi1 vector.
llvm-svn: 331847
2018-05-09 06:07:20 +00:00
Daniel Sanders
618437459c Revert r331816 and r331820 - [globalisel] Add a combiner helpers for extending loads and use them in a pre-legalize combiner for AArch64
Reverting this to see if the clang-cmake-aarch64-global-isel and
clang-cmake-aarch64-quick bots are failing because of this commit.
We know it wasn't r331819.

llvm-svn: 331846
2018-05-09 05:00:17 +00:00
Shiva Chen
2c864551df [DebugInfo] Add DILabel metadata and intrinsic llvm.dbg.label.
In order to set breakpoints on labels and list source code around
labels, we need collect debug information for labels, i.e., label
name, the function label belong, line number in the file, and the
address label located. In order to keep these information in LLVM
IR and to allow backend to generate debug information correctly.
We create a new kind of metadata for labels, DILabel. The format
of DILabel is

!DILabel(scope: !1, name: "foo", file: !2, line: 3)

We hope to keep debug information as much as possible even the
code is optimized. So, we create a new kind of intrinsic for label
metadata to avoid the metadata is eliminated with basic block.
The intrinsic will keep existing if we keep it from optimized out.
The format of the intrinsic is

llvm.dbg.label(metadata !1)

It has only one argument, that is the DILabel metadata. The
intrinsic will follow the label immediately. Backend could get the
label metadata through the intrinsic's parameter.

We also create DIBuilder API for labels to be used by Frontend.
Frontend could use createLabel() to allocate DILabel objects, and use
insertLabel() to insert llvm.dbg.label intrinsic in LLVM IR.

Differential Revision: https://reviews.llvm.org/D45024

Patch by Hsiangkai Wang.

llvm-svn: 331841
2018-05-09 02:40:45 +00:00
Roman Tereshin
27bba4495a Revert r331819 [GlobalISel][Legalizer] More concise and faster widenScalar, NFC
Reverting this to see if the clang-cmake-aarch64-global-isel and
clang-cmake-aarch64-quick bots are failing because of this commit

llvm-svn: 331839
2018-05-09 01:43:12 +00:00
Roman Tereshin
25cbfe680e [GlobalISel][Legalizer] More concise and faster widenScalar, NFC
Refactoring LegalizerHelper::widenScalar member function reducing its
size by approximately a factor of 2 and (hopefuly) making it more
straightforward and regular by introducing widenScalarSrc and
widenScalarDst helper methods.

The new widenScalar* methods mutate the instructions in place instead
of recreating them from scratch and removing the originals. The
compile time implications of this were measured on sqlite3
amalgamation, targeting AArch64 in -O0:

LegalizerHelper::widenScalar: > 25% faster
Legalizer::runOnMachineFunction: ~ 4.0 - 4.5% faster

Also adding MachineOperand::setCImm and refactoring out
MachineIRBuilder::recordInsertion methods to make the change possible.

Reviewers: aditya_nandakumar, bogner, javed.absar, t.p.northover, ab, dsanders, arsenm

Reviewed By: aditya_nandakumar

Subscribers: wdng, rovka, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D46414

llvm-svn: 331819
2018-05-08 22:53:09 +00:00
Daniel Sanders
d24dcdd1f7 [globalisel] Add a combiner helpers for extending loads and use them in a pre-legalize combiner for AArch64
Summary: Depends on D45541

Reviewers: ab, aditya_nandakumar, bogner, rtereshin, volkan, rovka, javed.absar, aemerson

Reviewed By: aemerson

Subscribers: aemerson, rengolin, mgorny, javed.absar, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D45543

llvm-svn: 331816
2018-05-08 22:26:39 +00:00
Jessica Paquette
ec37c640dd Revert "[X86][CET] Shadow stack fix for setjmp/longjmp"
This reverts commit 30962eca38ef02666ebcdded72a94f2cd0292d68.

This commit has been causing test asan failures on a build bot.

http://green.lab.llvm.org/green/job/clang-stage1-configure-RA/45108/

Original commit: https://reviews.llvm.org/D46181

llvm-svn: 331813
2018-05-08 22:00:57 +00:00