57083 Commits

Author SHA1 Message Date
QingShan Zhang
4bd186c0ff [PowerPC] Exploit the rldicl + rldicl when and with mask
If we are and the constant like 0xFFFFFFC00000, for now, we are using several
instructions to generate this 48bit constant and final an "and". However, we
could exploit it with two rotate instructions.

       MB          ME               MB+63-ME
+----------------------+     +----------------------+
|0000001111111111111000| ->  |0000000001111111111111|
+----------------------+     +----------------------+
 0                    63      0                    63
Rotate left ME + 1 bit first, and then, mask it with (MB + 63 - ME, 63),
finally, rotate back. Notice that, we need to round it with 64 bit for the
wrapping case.

Reviewed by: ChenZheng, Nemanjai

Differential Revision: https://reviews.llvm.org/D71831
2020-04-17 05:24:00 +00:00
Wouter van Oortmerssen
48139ebc3a [WebAssembly] Add int32 DW_OP_WASM_location variant
This to allow us to add reloctable global indices as a symbol.
Also adds R_WASM_GLOBAL_INDEX_I32 relocation type to support it.

See discussion in https://github.com/WebAssembly/debugging/issues/12
2020-04-16 16:32:17 -07:00
Cameron McInally
1223255c2d [AArch64][SVE] Add DestructiveBinaryImm SQSHLU patterns.
Add DestructiveBinaryImm SQSHLU patterns and tests. These patterns allow the SQSHLU instruction to match with a MOVPRFX.

Differential Revision: https://reviews.llvm.org/D76728
2020-04-16 13:48:08 -05:00
Stefan Pintilie
18b6050324 [PowerPC][Future] Initial support for PC Relative addressing for global values
This patch adds PC Relative support for global values that are known at link
time. If a global value requires access through the global offset table (GOT)
it is not covered in this patch.

Differential Revision: https://reviews.llvm.org/D75280
2020-04-16 12:45:22 -05:00
Stanislav Mekhanoshin
2e94a64b57 [AMDGPU] Define 16 bit SGPR subregs
These are needed as a counterpart for VGPR subregs even though
there are no scalar instructions which can operate 16 bit values.
When we are materializing a constant that is done into an SGPR
and that SGPR may/will be copied into a 16 bit VGPR subreg. Such
copy is illegal. There are also similar problems if a source
operand of a 16 bit VALU instruction is an SGPR. In addition
we need to get a register with a lo16 subregister of an SGPR
RC during selection and this fails as well.

All of that makes me believe we need these subregisters as a
syntactic glue.

Differential Revision: https://reviews.llvm.org/D78250
2020-04-16 10:31:39 -07:00
Anna Welker
d736571538 [ARM][MVE] Fix location of optimized gather addresses
Fix for the address optimization for gathers and scatters which would in
some complex cases push out instructions not to the vector loop preheader,
but to other locations as well which lead to a scrambled order and the
compilation failing.
This patch ensures that said instructions are always pushed to the end
of the vector loop preheader.

Differential Revision: https://reviews.llvm.org/D78293
2020-04-16 18:15:28 +01:00
Kang Zhang
513976df2e [PowerPC] Ignore implicit register operands for MCInst
Summary:
When doing the conversion: MachineInst -> MCInst, we should ignore the
implicit operands, it will expose more opportunity for InstiAlias.

Reviewed By: steven.zhang

Differential Revision: https://reviews.llvm.org/D77118
2020-04-16 16:22:43 +00:00
Kazushi (Jam) Marukawa
48d64f5654 [VE] Update logical operation instructions
Summary:
Changing all mnemonic to match assembly instructions to simplify mnemonic
naming rules. This time update all fixed-point arithmetic instructions.
This also corrects bswp operand type.

Reviewed By: simoll

Differential Revision: https://reviews.llvm.org/D78177
2020-04-16 14:35:53 +02:00
Konstantin Schwarz
1a3e89aa2b [MIR] Add comments to INLINEASM immediate flag MachineOperands
Summary:
The INLINEASM MIR instructions use immediate operands to encode the values of some operands.
The MachineInstr pretty printer function already handles those operands and prints human readable annotations instead of the immediates. This patch adds similar annotations to the output of the MIRPrinter, however uses the new MIROperandComment feature.

Reviewers: SjoerdMeijer, arsenm, efriedma

Reviewed By: arsenm

Subscribers: qcolombet, sdardis, jvesely, wdng, nhaehnle, hiraditya, jrtc27, atanasyan, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D78088
2020-04-16 13:46:14 +02:00
Shengchen Kan
71303b753c [X86] Add interface X86II::isPseudo
Avoid duplicate code in X86MCCodeEmitter, NFCI.
2020-04-16 12:40:17 +08:00
Shengchen Kan
7aaaea5acd [X86][MC][NFC] Code cleanup in X86MCCodeEmitter
Make some function static, move the definitions of functions to a better
place and use C++ style cast, etc.
2020-04-16 11:30:49 +08:00
Shengchen Kan
6c66bb393e [X86][MC][NFC] Refine code in X86MCCodeEmitter
As we mentioned in D78180, merge some if clauses and use CamelCase for
variables, etc.
2020-04-16 10:43:42 +08:00
Shengchen Kan
322ac2e917 [X86][MC][NFC] Reduce the parameters of functions in X86MCCodeEmitter(Part I)
Summary:
The function in X86MCCodeEmitter has too many parameters to make it look
messy, and some parameters are unnecessary. This is the first patch to
reduce their parameters.

The follwing operations are cheap
```
unsigned Opcode = MI.getOpcode();
const MCInstrDesc &Desc = MCII.get(Opcode);
uint64_t TSFlags = Desc.TSFlags;
```
So if we pass a `MCInst`, we don't need to pass `MCInstrDesc`;
if we pass a `MCInstrDesc`, we don't need to pass `TSFlags`.

Reviewers: craig.topper, MaskRay, pengfei

Reviewed By: craig.topper

Subscribers: annita.zhang, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D78180
2020-04-16 09:53:45 +08:00
Fangrui Song
7d1ff446b6 [MC] Rename MCSection*::getSectionName() to getName(). NFC
A pending change will merge MCSection*::getName() to MCSection::getName().
2020-04-15 16:48:14 -07:00
Christopher Tetreault
85247c1e89 [SVE] Remove calls to getBitWidth from x86
Reviewers: efriedma, RKSimon, sdesmalen

Reviewed By: RKSimon

Subscribers: tschuett, hiraditya, rkruppe, psnobl, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77901
2020-04-15 15:48:48 -07:00
Chris Bowler
bee6c234ed [AIX][PowerPC] Implement caller byval arguments in stack memory
Differential Revision: https://reviews.llvm.org/D77578
2020-04-15 17:57:31 -04:00
Francesco Petrogalli
89680f25e8 [llvm][CodeGen] Rename SVE gather prefetch intrinsics. [NFC]
Summary:
The renaming is necessary to make the naming scheme uniform with other
gather/scatter load/stores SVE intrinsics.

The naming of variables and functions have been adapted to make it
explicit whether we are dealing with a scalar offset (which is
unscaled) or an index (which is scaled according to the data type of
the lanes of the vector).

Reviewers: andwar, sdesmalen, rengolin

Reviewed By: andwar

Subscribers: tschuett, hiraditya, arphaman, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77839
2020-04-15 21:49:16 +01:00
Nemanja Ivanovic
c196e2ca48 [PowerPC] Clear the set of symbols that need to be updated in MCTargetStreamer
We have added code to correct the .localentry values on assignments. However, we
never clear the set so presumably it will still contain the (now dangling)
MCSymbol pointers across a call to finish() and reset() in the streamer.

This is based on my speculation that it is the reason we are getting
segmentation faults mentioned in https://bugs.llvm.org/show_bug.cgi?id=45366

Fixes: https://bugs.llvm.org/show_bug.cgi?id=45366

Differential revision: https://reviews.llvm.org/D78196
2020-04-15 15:42:02 -05:00
Craig Topper
8dfb9627b7 [X86] Make v32i16/v64i8 legal types without avx512bw. Use custom splitting instead.
This moves v32i16/v64i8 to a model consistent with how we
treat integer types with avx1.

This does change the ABI for types vXi16/vXi8 vectors larger than
512 bits to pass in multiple zmms instead of multiple ymms. We'd
already hacked some code to make v64i8/v32i16 pass in zmm.

Cost model is still a bit of a mess. In some place I tried to
match existing behavior. But really we need to account for
splitting and concating costs. Cost model for shuffles is
especially pessimistic.

Differential Revision: https://reviews.llvm.org/D76212
2020-04-15 12:17:18 -07:00
Matt Arsenault
588bd7be36 AMDGPU/GlobalISel: Work around a selector crash
Ideally types without a corresponding register class wouldn't reach
here, but we're currently missing some (in particular a 192-bit class
is missing).
2020-04-15 14:38:50 -04:00
Craig Topper
a916e81927 [X86] Various improvements to our vector splitting helpers for lowering. NFC
-Consistently name the functions as split*
-Add a helper for doing the two extractSubvector calls and determining the size of the split
-Use getSplitDestVTs to get the result type for the split node.
-Move the binary and unary helper to one place in the file near the extractSubvector functions. Left the VSETCC one near LowerVSETCC since that's its only caller.
-Remove the 256/512 wrappers that just had asserts. I don't think they provided a lot of value and now with the routines called split* the call sites are more obvious what they do.
-Make the unary routine support different source and dest types to support D76212.
-Add some weaker asserts into the helpers to make up for losing the very specific asserts from the 256/512 wrappers.

Differential Revision: https://reviews.llvm.org/D78176
2020-04-15 10:57:53 -07:00
Benjamin Kramer
316b49d373 Pass shufflevector indices as int instead of unsigned.
No functionality change intended.
2020-04-15 15:52:49 +02:00
Victor Campos
d85b3877dc [CodeGen][ARM] Error when writing to specific reserved registers in inline asm
Summary:
No error or warning is emitted when specific reserved registers are
written to in inline assembly. Therefore, writes to the program counter
or to the frame pointer, for instance, were permitted, which could have
led to undesirable behaviour.

Example:
  int foo() {
    register int a __asm__("r7"); // r7 = frame-pointer in M-class ARM
    __asm__ __volatile__("mov %0, r1" : "=r"(a) : : );
    return a;
  }

In contrast, GCC issues an error in the same scenario.

This patch detects writes to specific reserved registers in inline
assembly for ARM and emits an error in such case. The detection works
for output and input operands. Clobber operands are not handled here:
they are already covered at a later point in
AsmPrinter::emitInlineAsm(const MachineInstr *MI). The registers
covered are: program counter, frame pointer and base pointer.

This is ARM only. Therefore the implementation of other targets'
counterparts remain open to do.

Reviewers: efriedma

Reviewed By: efriedma

Subscribers: kristof.beyls, hiraditya, danielkiss, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76848
2020-04-15 14:40:42 +01:00
Benjamin Kramer
cc035d475f Upgrade users of 'new ShuffleVectorInst' to pass indices as an int array
No functionality change intended.
2020-04-15 14:29:43 +02:00
Jonas Paulsson
036242b868 [SystemZ] Bugfix in adjustSubwordCmp()
adjustSubwordCmp() should not optimize a load of an i1 value. This is
achieved by checking that the size and store-size of the MemoryVT are the
same.

Fixes https://bugs.llvm.org/show_bug.cgi?id=45511.

Review: Ulrich Weigand

Differential Revision: https://reviews.llvm.org/D78187
2020-04-15 12:58:39 +02:00
Benjamin Kramer
6f64daca8f Upgrade calls to CreateShuffleVector to use the preferred form of passing an array of ints
No functionality change intended.
2020-04-15 12:51:38 +02:00
Sam Parker
dd8153b757 [ARM][MVE] Tail predicate VML[A|S]LDAV
Make the non-exchanging versions of the multiply add/sub instructions
validForTailPredication.

Differential Revision: https://reviews.llvm.org/D77648
2020-04-15 11:34:39 +01:00
Sameer Sahasrabuddhe
8c11bc0cd0 Introduce fix-irreducible pass
An irreducible SCC is one which has multiple "header" blocks, i.e., blocks
with control-flow edges incident from outside the SCC. This pass converts an
irreducible SCC into a natural loop by introducing a single new header
block and redirecting all the edges on the original headers to this
new block.

This is a useful workaround for a limitation in the structurizer
which, which produces incorrect control flow in the presence of
irreducible regions. The AMDGPU backend provides an option to
enable this pass before the structurizer, which may eventually be
enabled by default.

Reviewed By: nhaehnle

Differential Revision: https://reviews.llvm.org/D77198

This restores commit 2ada8e2525dd2653f30c8696a27162a3b1647d66.

Originally reverted with commit 44e09b59b869a91bf47d76e8bc569d9ee91ad145.
2020-04-15 15:05:51 +05:30
Kazushi (Jam) Marukawa
7a7f223042 [VE] Update integer arithmetic instructions
Summary:
Changing all mnemonic to match assembly instructions to simplify mnemonic
naming rules.  This time update all fixed-point arithmetic instructions.
This also corrects smax/smin code generations.

Reviewed By: simoll

Differential Revision: https://reviews.llvm.org/D77856
2020-04-15 09:47:51 +02:00
Sameer Sahasrabuddhe
44e09b59b8 Revert "Introduce fix-irreducible pass"
This reverts commit 2ada8e2525dd2653f30c8696a27162a3b1647d66.

Buildbots produced compilation errors which I was not able to quickly
reproduce locally. Need more time to investigate.
2020-04-15 12:19:50 +05:30
Sameer Sahasrabuddhe
2ada8e2525 Introduce fix-irreducible pass
An irreducible SCC is one which has multiple "header" blocks, i.e., blocks
with control-flow edges incident from outside the SCC. This pass converts an
irreducible SCC into a natural loop by introducing a single new header
block and redirecting all the edges on the original headers to this
new block.

This is a useful workaround for a limitation in the structurizer
which, which produces incorrect control flow in the presence of
irreducible regions. The AMDGPU backend provides an option to
enable this pass before the structurizer, which may eventually be
enabled by default.

Reviewed By: nhaehnle

Differential Revision: https://reviews.llvm.org/D77198
2020-04-15 11:29:19 +05:30
Matt Arsenault
cc149172da AMDGPU/GlobalISel: Fix selection of scalar f64 G_FABS
This wasn't covered by existing tablegen patterns, but also suffers
the same issues as G_FNEG. Workaround them by manually selecting, like
G_FNEG.
2020-04-14 22:05:22 -04:00
Mircea Trofin
447e2c3067 [llvm][NFC][CallSite] Remove Implementation uses of CallSite
Reviewers: dblaikie, davidxl, craig.topper

Subscribers: arsenm, dschuff, nemanjai, jvesely, nhaehnle, sbc100, jgravelle-google, hiraditya, aheejin, kbarton, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D78142
2020-04-14 14:49:47 -07:00
Christopher Tetreault
e68f1f2d43 [SVE] Remove calls to getBitWidth from Hexagon
Reviewers: efriedma, sdesmalen, kparzysz

Reviewed By: kparzysz

Subscribers: tschuett, hiraditya, rkruppe, psnobl, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77899
2020-04-14 11:09:49 -07:00
Christopher Tetreault
0badd8f613 [SVE] Remove calls to getBitWidth from ARM
Reviewers: efriedma

Reviewed By: efriedma

Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77904
2020-04-14 10:56:38 -07:00
Christopher Tetreault
05a079895c [SVE] Remove calls to getBitWidth from AArch64
Reviewers: efriedma

Reviewed By: efriedma

Subscribers: danielkiss, tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77905
2020-04-14 10:26:37 -07:00
Andrzej Warzynski
8fc7e6dcd8 [AArch64][SVE] Refine node definitions for ff & nf loads/stores (NFC)
Summary:
Only first-faulting and non-faulting loads read/update the FFR register
and hence only the corresponding SDNodes should be decorated with
`SDNPOptInGlue` and `SDNPOutGlue`. This patch:
  * removes SDNPOptInGlue from regular loads stores (FFR is not read)
  * adds SDNPOutGlue to first-faulting and non-faulting loads (FFR is
    both read and updated)

Differential Revision: https://reviews.llvm.org/D77724
2020-04-14 18:17:15 +01:00
Pierre-vh
13eb890139 [Target][ARM] Fix VPT Block Pass miscompilation
The pass was incorrectly reverting back to a "T" when something wrote
to VPR inside a "E" block. This is not the correct behaviour, the
predicate should stay the same.

Differential Revision: https://reviews.llvm.org/D77798
2020-04-14 15:16:27 +01:00
Pierre-vh
4563024356 [Target][ARM] Adding MVE VPT Optimisation Pass
Differential Revision: https://reviews.llvm.org/D76709
2020-04-14 15:16:27 +01:00
Simon Pilgrim
426f37584e [TTI][X86] Add X86TTIImpl::getScalarizationOverhead implementation.
This is a currently just a wrapper to the base type, I'll be adding ISD::BUILD_VECTOR costs in a future patch.
2020-04-14 12:58:19 +01:00
Georgii Rymar
1647ff6e27 [ADT/STLExtras.h] - Add llvm::is_sorted wrapper and update callers.
It can be used to avoid passing the begin and end of a range.
This makes the code shorter and it is consistent with another
wrappers we already have.

Differential revision: https://reviews.llvm.org/D78016
2020-04-14 14:11:02 +03:00
Kerry McLaughlin
36c76de678 [AArch64][SVE] Add a pass for SVE intrinsic optimisations
Summary:
Creates the SVEIntrinsicOpts pass. In this patch, the pass tries
to remove unnecessary reinterpret intrinsics which convert to
and from svbool_t (llvm.aarch64.sve.convert.[to|from].svbool)

For example, the reinterprets below are redundant:

  %1 = call <vscale x 16 x i1> @llvm.aarch64.sve.convert.to.svbool.nxv4i1(<vscale x 4 x i1> %a)
  %2 = call <vscale x 4 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv4i1(<vscale x 16 x i1> %1)

The pass also looks for ptest intrinsics and phi instructions where
the operands are being needlessly converted to and from svbool_t.

Reviewers: sdesmalen, andwar, efriedma, cameron.mcinally, c-rhodes, rengolin

Reviewed By: efriedma

Subscribers: mgorny, tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, danielkiss, cfe-commits, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76078
2020-04-14 10:41:49 +01:00
Peter Smith
31c8e11896 [MC][ARM] Emit R_ARM_BASE_PREL for _GLOBAL_OFFSET_TABLE_ expressions
The _GLOBAL_OFFSET_TABLE_ in SysVr4 ELF is conventionally the base of the
.got or .got.prel sections. Expressions such as _GLOBAL_OFFSET_TABLE_
- (.L1 +8) are used in assembler code to calculate offsets into the .got.
At present MC outputs a R_ARM_REL32 with respect to the
_GLOBAL_OFFSET_TABLE_ symbol, whereas gas outputs a R_ARM_BASE_PREL
relocation with respect to the _GLOBAL_OFFSET_TABLE_ symbol. While both are
correct the R_ARM_REL32 depends on the value of the _GLOBAL_OFFSET_TABLE_
symbol, wheras te R_ARM_BASE_PREL relocation is idependent of the symbol.
The R_ARM_BASE_PREL is therefore slightly more robust to linker's that may
not follow the conventional placement of _GLOBAL_OFFSET_TABLE_; for example
LLD for some time defined _GLOBAL_OFFSET_TABLE_ to 0.

Differential Revision: https://reviews.llvm.org/D46319
2020-04-14 10:13:21 +01:00
Kazushi (Jam) Marukawa
37db04dda6 [VE] Remove unnecessary iz pattern
Summary:
This iz pattern is a special pattern of im pattern.  This im pattern
has been supported by https://reviews.llvm.org/D77769, so removing
iz pattern as a continuous patch.

Reviewed By: simoll

Differential Revision: https://reviews.llvm.org/D77770
2020-04-14 08:56:55 +02:00
Mircea Trofin
4aae4e3f48 [llvm][NFC] CallSite removal from inliner-related files
Summary: This removes CallSite from inliner files. Some dependencies where thus affected.

Reviewers: dblaikie, davidxl, craig.topper

Subscribers: arsenm, jvesely, nhaehnle, eraman, hiraditya, aheejin, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77991
2020-04-13 21:28:58 -07:00
Craig Topper
2f60fbce6c [X86] Use a more realisitic cost for truncate v16i64->v16i8 with avx512f.
Still not great and we could probably codegen this better, but
11 was clearly ridiculous.
2020-04-13 21:09:43 -07:00
Craig Topper
535a566a01 [X86] Split AVX512 getCastInstrCost into tables that require useAVX512Regs() and those that just operate on 256 or smaller vectors.
Use useAVX512Regs() to skip lookups instead of using type legalization
action.
2020-04-13 21:09:42 -07:00
Craig Topper
071c64d68d [X86] Add a more accurate truncate cost for v8i64->v8i8 2020-04-13 21:09:41 -07:00
Fangrui Song
d1a677cd33 [VE] Adapt D77995 CallSite removal 2020-04-13 19:54:49 -07:00
Matt Arsenault
f48fe2c36e GlobalISel: Fix casted unmerge of G_CONCAT_VECTORS
This was assuming a scalarizing unmerge, and would fail assert if the
unmerge was to smaller vector types.
2020-04-13 22:03:05 -04:00