54189 Commits

Author SHA1 Message Date
David Zarzycki
7653ff398d [X86] Enable AVX512BW for memcmp()
llvm-svn: 373845
2019-10-06 10:25:52 +00:00
Matt Arsenault
e59296a051 AMDGPU/GlobalISel: Fall back on weird G_EXTRACT offsets
llvm-svn: 373842
2019-10-06 01:41:22 +00:00
Matt Arsenault
786a3953ba AMDGPU/GlobalISel: RegBankSelect mul24 intrinsics
llvm-svn: 373841
2019-10-06 01:37:39 +00:00
Matt Arsenault
c0ec72d4f8 AMDGPU/GlobalISel: RegBankSelect DS GWS intrinsics
llvm-svn: 373840
2019-10-06 01:37:38 +00:00
Matt Arsenault
bcd6b1d209 AMDGPU/GlobalISel: Lower G_ATOMIC_CMPXCHG_WITH_SUCCESS
llvm-svn: 373839
2019-10-06 01:37:37 +00:00
Matt Arsenault
a5b9c75674 GlobalISel: Partially implement lower for G_EXTRACT
Turn into shift and truncate. Doesn't yet handle pointers.

llvm-svn: 373838
2019-10-06 01:37:35 +00:00
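For a scalar source, the shift-and-truncate lowering of G_EXTRACT can be modeled on plain integers. This is a hypothetical C++ sketch, not the LegalizerHelper code. `extractBits` is an illustrative name, and the truncate is modeled as a mask.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical scalar model of lowering G_EXTRACT(Src, Offset) to a
// Width-bit result: a logical shift right by Offset, then a truncate.
uint64_t extractBits(uint64_t Src, unsigned Offset, unsigned Width) {
  assert(Width > 0 && Width <= 64 && Offset + Width <= 64);
  uint64_t Shifted = Src >> Offset; // corresponds to the shift
  if (Width == 64)
    return Shifted;
  // Truncation to Width bits, modeled here by masking off the high bits.
  return Shifted & ((uint64_t(1) << Width) - 1);
}
```

Pointer sources are outside this model, which matches the commit's note that pointers are not handled yet.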
Matt Arsenault
69c65a8609 AMDGPU/GlobalISel: Fix RegBankSelect for sendmsg intrinsics
This wasn't updated for the immarg handling change.

llvm-svn: 373837
2019-10-06 01:37:34 +00:00
Simon Pilgrim
8815be04ec [X86][AVX] Push sign extensions of comparison bool results through bitops (PR42025)
As discussed on PR42025, with more complex boolean math we can end up with many truncations/extensions of the comparison results through each bitop.

This patch handles the cases introduced in combineBitcastvxi1 by pushing the sign extension through the AND/OR/XOR ops so it's just the original SETCC ops that get extended.

Differential Revision: https://reviews.llvm.org/D68226

llvm-svn: 373834
2019-10-05 20:49:34 +00:00
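The transform relies on the sign extension of a boolean distributing over the bitwise ops. The following is a small, exhaustive C++ check of that identity on scalars. It is an illustration only: `sext` and `sextDistributesOverBitops` are hypothetical helpers, not X86ISelLowering code.

```cpp
#include <cassert>
#include <cstdint>

// Sign-extend an i1 (modeled as bool) to i8: true -> -1 (all ones), false -> 0.
int8_t sext(bool B) { return B ? int8_t(-1) : int8_t(0); }

// Exhaustively checks that sext distributes over AND/OR/XOR for i1 inputs,
// so extending the SETCC results first gives the same vector as extending
// the combined boolean afterwards.
bool sextDistributesOverBitops() {
  for (int A = 0; A < 2; ++A)
    for (int B = 0; B < 2; ++B) {
      bool X = A, Y = B;
      if (sext(X && Y) != int8_t(sext(X) & sext(Y))) return false;
      if (sext(X || Y) != int8_t(sext(X) | sext(Y))) return false;
      if (sext(X != Y) != int8_t(sext(X) ^ sext(Y))) return false;
    }
  return true;
}
```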
Simon Pilgrim
9ecacb0d54 [X86] lowerShuffleAsLanePermuteAndRepeatedMask - variable renames. NFCI.
Rename some variables to match lowerShuffleAsRepeatedMaskAndLanePermute - prep work toward adding some equivalent sublane functionality.

llvm-svn: 373832
2019-10-05 16:08:30 +00:00
Ana Pazos
ea835f5ce8 [RISCV] Added missing ImmLeaf predicates
simm9_lsb0 and simm12_lsb0 operand types were missing predicates.

llvm-svn: 373812
2019-10-04 23:42:07 +00:00
Huihui Zhang
da9e252491 [NFC] Add { } to silence compiler warning [-Wmissing-braces].
../llvm-project/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp:355:48: warning: suggest braces around initialization of subobject [-Wmissing-braces]
      return addMappingFromTable<1>(MI, MRI, { 0 }, Table);
                                               ^
                                               {}

llvm-svn: 373784
2019-10-04 20:04:34 +00:00
Craig Topper
87aa59a0c7 [X86] Remove isel patterns for mask vpcmpgt/vpcmpeq. Switch vpcmp to these based on the immediate in MCInstLower
The immediate form of VPCMP can represent these completely. The
vpcmpgt/eq are just shorter encodings.

This patch removes the isel patterns and just swaps the opcodes
and removes the immediate in MCInstLower. This matches where we do
some other encoding tricks.

Removes over 10K bytes from the isel table.

Differential Revision: https://reviews.llvm.org/D68446

llvm-svn: 373766
2019-10-04 18:02:46 +00:00
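Only two predicates of the immediate form have a shorter encoding: 0 (EQ) and 6 (NLE, which is signed greater-than). The following is a hypothetical C++ sketch of that mapping for the signed VPCMP form. Mnemonics are plain strings here rather than X86 opcode enums, and the unsigned VPCMPU form is deliberately left out of the sketch.

```cpp
#include <cassert>
#include <optional>
#include <string>

// AVX-512 VPCMP predicate immediates (values of _MM_CMPINT_EQ / _MM_CMPINT_NLE).
constexpr unsigned kCmpEQ = 0;
constexpr unsigned kCmpNLE = 6;

// Hypothetical helper: for a signed VPCMP, return the shorter-encoding
// mnemonic that computes the same mask, or nullopt if there is none.
std::optional<std::string> shorterEncoding(unsigned Imm) {
  switch (Imm) {
  case kCmpEQ:
    return std::string("vpcmpeq"); // a == b
  case kCmpNLE:
    return std::string("vpcmpgt"); // !(a <= b), i.e. a > b (signed)
  default:
    return std::nullopt; // keep the immediate form
  }
}
```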
Craig Topper
074fa390d2 [X86] Add DAG combine to form saturating VTRUNCUS/VTRUNCS from VTRUNC
We already do this for ISD::TRUNCATE, but we can do the same for X86ISD::VTRUNC

Differential Revision: https://reviews.llvm.org/D68432

llvm-svn: 373765
2019-10-04 17:53:18 +00:00
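The combine matches a clamp followed by a plain truncation. The following is a C++ sketch of the scalar equivalence it depends on, for i32 to i8. The helper names are hypothetical, and the real combine works on DAG nodes and vector types.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Unsigned saturating truncation i32 -> i8 (VTRUNCUS semantics).
uint8_t truncUSat(uint32_t X) { return X > 0xFF ? 0xFF : static_cast<uint8_t>(X); }

// Plain truncation (VTRUNC) of a value already clamped with umin.
uint8_t truncOfUMin(uint32_t X) { return static_cast<uint8_t>(std::min<uint32_t>(X, 0xFF)); }

// Signed saturating truncation i32 -> i8 (VTRUNCS semantics).
int8_t truncSSat(int32_t X) {
  if (X > 127) return 127;
  if (X < -128) return -128;
  return static_cast<int8_t>(X);
}

// Plain truncation of a value clamped with smin/smax.
int8_t truncOfSMinSMax(int32_t X) {
  return static_cast<int8_t>(std::max<int32_t>(std::min<int32_t>(X, 127), -128));
}
```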
Dmitry Preobrazhensky
434d59250e [AMDGPU][MC][GFX10][WS32] Corrected decoding of dst operand for v_cmp_*_sdwa opcodes
See bug 43484: https://bugs.llvm.org/show_bug.cgi?id=43484

Reviewers: arsenm, rampitec

Differential Revision: https://reviews.llvm.org/D68349

llvm-svn: 373745
2019-10-04 13:04:17 +00:00
Dmitry Preobrazhensky
9bd763679f [AMDGPU][MC][GFX10] Enabled decoding of 'null' operand
See bug 43485: https://bugs.llvm.org/show_bug.cgi?id=43485

Reviewers: arsenm, rampitec

Differential Revision: https://reviews.llvm.org/D68348

llvm-svn: 373740
2019-10-04 12:38:36 +00:00
Tim Northover
a7d90af1be ARM-Darwin: keep the frame register reserved even if not updated.
Darwin platforms need the frame register to always point at a valid record even
if it's not updated in a leaf function. Backtraces are more important than one
extra GPR.

llvm-svn: 373738
2019-10-04 12:29:32 +00:00
Dmitry Preobrazhensky
94d040706d [AMDGPU][MC][GFX10] Corrected definition of FLAT GLOBAL/SCRATCH instructions
See bug 43483: https://bugs.llvm.org/show_bug.cgi?id=43483

Reviewers: arsenm, rampitec

Differential Revision: https://reviews.llvm.org/D68347

llvm-svn: 373736
2019-10-04 12:10:22 +00:00
Matt Arsenault
d7cad4fb41 AMDGPU/GlobalISel: Fix using wrong addrspace for aperture
This was always passing the destination flat address space, when it
should be picking between the two valid source options.

llvm-svn: 373716
2019-10-04 08:35:38 +00:00
Matt Arsenault
412e0bf8f3 AMDGPU/GlobalISel: Select G_PTRTOINT
llvm-svn: 373715
2019-10-04 08:35:37 +00:00
Matt Arsenault
be9521acaa AMDGPU/GlobalISel: Support wave32 waterfall loops
llvm-svn: 373714
2019-10-04 08:35:35 +00:00
David Zarzycki
03b216d854 [X86] Enable inline memcmp() to use AVX512
llvm-svn: 373706
2019-10-04 07:42:34 +00:00
Piotr Sobczak
165e469145 [AMDGPU][SILoadStoreOptimizer] NFC: Refactor code
Summary:
This patch fixes a potential aliasing problem in InstClassEnum,
where local values were mixed with machine opcodes.

Introducing InstSubclass will keep them separate and help extend
InstClassEnum with other instruction types (e.g. MIMG) in the future.

This patch also makes getSubRegIdxs() more concise.

Reviewers: nhaehnle, arsenm, tstellar

Reviewed By: arsenm

Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68384

llvm-svn: 373699
2019-10-04 07:09:40 +00:00
Shiva Chen
ff55e2e047 [RISCV] Split SP adjustment to reduce the offset of callee saved register spill and restore
We would like to split the SP adjustment to reduce the instructions in the
prologue and epilogue, as in the following case. This way, the offsets of
the callee-saved registers fit in the immediate field of a single store.

    add     sp,sp,-2032
    sw      ra,2028(sp)
    sw      s0,2024(sp)
    sw      s1,2020(sp)
    sw      s3,2012(sp)
    sw      s4,2008(sp)
    add     sp,sp,-64

Differential Revision: https://reviews.llvm.org/D68011

llvm-svn: 373688
2019-10-04 02:00:57 +00:00
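Using the numbers from the example (2032 + 64 = 2096 bytes), the split can be sketched in C++. This is hypothetical: the 16-byte stack alignment and the rule "first adjustment = 2048 - alignment" are assumptions chosen to reproduce the example, not a transcription of RISCVFrameLowering.

```cpp
#include <cassert>
#include <cstdint>

// RISC-V I-type (addi) and S-type (sw) instructions take a signed 12-bit immediate.
bool fitsSImm12(int64_t V) { return V >= -2048 && V <= 2047; }

struct SPAdjust {
  int64_t First;  // applied before the callee-saved stores
  int64_t Second; // applied after them (0 if no split is needed)
};

// Hypothetical sketch: split a frame of FrameSize bytes so the first
// adjustment is a legal addi immediate and keeps SP StackAlign-aligned.
SPAdjust splitSPAdjustment(int64_t FrameSize, int64_t StackAlign = 16) {
  if (fitsSImm12(-FrameSize))
    return {FrameSize, 0};
  int64_t First = 2048 - StackAlign; // 2032 with 16-byte alignment
  return {First, FrameSize - First};
}
```

After the first adjustment, the callee-saved offsets (2028 down to 2008 in the example) all fit the store immediate.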
Nick Desaulniers
ede784ff5a [AArch64InstPrinter] prefer bfi to bfc for < armv8.2-a
Summary:
Fixes pr/42576.

Link: https://github.com/ClangBuiltLinux/linux/issues/697

Reviewers: t.p.northover

Reviewed By: t.p.northover

Subscribers: kristof.beyls, hiraditya, llvm-commits, srhines

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68356

llvm-svn: 373655
2019-10-03 20:10:02 +00:00
Jinsong Ji
4a6881eabc [PowerPC] Adjust the naming and operand order of fnmsub patterns
Summary:
This is a follow-up patch to https://reviews.llvm.org/D67595.
Adjust naming and the Commutable operands for additional patterns
to make it easier to read.

The testcase update also shows that we can save some unnecessary fmr
instructions as well.

Reviewers: #powerpc, steven.zhang, hfinkel, nemanjai

Reviewed By: #powerpc, nemanjai

Subscribers: wuzish, hiraditya, kbarton, MaskRay, shchenz, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68112

llvm-svn: 373652
2019-10-03 19:36:42 +00:00
Jordan Rupprecht
b2b43c8576 [NFC] Fix unused variable in release builds
llvm-svn: 373646
2019-10-03 18:35:44 +00:00
Craig Topper
185ee6ec7c [X86] Add v32i8 shuffle lowering strategy to recognize two v4i64 vectors truncated to v4i8 and concatenated into the lower 8 bytes with undef/zero upper bytes.
This patch recognizes the shuffle pattern we get from a
v8i64->v8i8 truncate when v8i64 isn't a legal type.

With VLX we can use two VTRUNCs, unpckldq, and an insert_subvector.

Differential Revision: https://reviews.llvm.org/D68374

llvm-svn: 373645
2019-10-03 18:34:42 +00:00
Simon Pilgrim
eb8d85e5db [X86] matchShuffleWithSHUFPD - use Zeroable element mask directly. NFCI.
We can make use of the Zeroable mask to indicate which elements we can safely set to zero instead of creating a target shuffle mask on the fly.

This only leaves one user of createTargetShuffleMask which we can hopefully get rid of in a similar manner.

This is part of the work to fix PR43024 and allow us to use SimplifyDemandedElts to simplify shuffle chains - we need to get to a point where the target shuffle mask isn't adjusted by its source inputs in setTargetShuffleZeroElements, but instead we cache them in a parallel Zeroable mask.

llvm-svn: 373641
2019-10-03 18:13:50 +00:00
Matt Arsenault
ed77b27441 AMDGPU/GlobalISel: Handle RegBankSelect of G_INSERT_VECTOR_ELT
llvm-svn: 373639
2019-10-03 17:59:03 +00:00
Matt Arsenault
233ff982c7 AMDGPU/GlobalISel: Split 64-bit vector extracts during RegBankSelect
Register indexing 64-bit elements is possible on the SALU, but not the
VALU. Handle splitting this into two 32-bit indexes. Extend waterfall
loop handling to allow moving a range of instructions.

llvm-svn: 373638
2019-10-03 17:55:27 +00:00
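Splitting one 64-bit indexed read into two 32-bit indexed reads comes down to index arithmetic. The following is a hypothetical C++ model on a plain array of 32-bit words. The little-endian word order (low half first) is an assumption of the sketch, and `extract64Via32` is an illustrative name.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical model: extract 64-bit element Idx from a vector viewed as
// 32-bit words, using two 32-bit dynamic indexes (2*Idx and 2*Idx+1).
uint64_t extract64Via32(const std::vector<uint32_t> &Words, unsigned Idx) {
  uint32_t Lo = Words.at(2 * Idx);     // first 32-bit indexed read
  uint32_t Hi = Words.at(2 * Idx + 1); // second 32-bit indexed read
  return (uint64_t(Hi) << 32) | Lo;    // recombine: low word first
}
```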
Matt Arsenault
56271fe180 AMDGPU/GlobalISel: Allow VGPR to index SGPR register
We can still do a waterfall loop over the index if using a VGPR to
index an SGPR. The result will still be a VGPR, but we can avoid the
wide copy of the source register to a VGPR.

llvm-svn: 373637
2019-10-03 17:50:32 +00:00
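A waterfall loop turns a divergent (VGPR) index into a series of uniform values, so each iteration can index the SGPR source with a scalar index. The following is a hypothetical C++ model: plain vectors stand in for registers and the exec mask, and the lane count is just the input size (32 for wave32, 64 for wave64).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model of a waterfall loop. Returns the number of iterations
// (one per distinct index value) and writes each lane's result.
unsigned waterfall(const std::vector<uint32_t> &LaneIdx,
                   const std::vector<uint32_t> &ScalarTable,
                   std::vector<uint32_t> &Result) {
  const std::size_t NumLanes = LaneIdx.size();
  std::vector<bool> Active(NumLanes, true);
  Result.assign(NumLanes, 0);
  unsigned Iterations = 0;
  for (;;) {
    // "readfirstlane": take the index from the first still-active lane.
    std::size_t First = 0;
    while (First < NumLanes && !Active[First]) ++First;
    if (First == NumLanes) break;
    uint32_t Uniform = LaneIdx[First];
    ++Iterations;
    // Every active lane with a matching index uses the uniform value,
    // then drops out of the exec mask.
    for (std::size_t L = 0; L < NumLanes; ++L)
      if (Active[L] && LaneIdx[L] == Uniform) {
        Result[L] = ScalarTable[Uniform];
        Active[L] = false;
      }
  }
  return Iterations;
}
```

The result still lives in per-lane storage (a VGPR), while the source table is only ever read with a uniform index, which is what avoids the wide copy.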
Matt Arsenault
3d23e58dbe AMDGPU/GlobalISel: Fix mutationIsSane assert v8s8 and
This would try to do FewerElements to v9s8

llvm-svn: 373635
2019-10-03 17:50:29 +00:00
Tom Stellard
e6f5171305 AMDGPU/SILoadStoreOptimizer: Optimize scanning for mergeable instructions
Summary:
This adds a pre-pass to this optimization that scans through the basic
block and generates lists of mergeable instructions with one list per unique
address.

In the optimization phase, instead of scanning through the basic block for mergeable
instructions, we now iterate over the lists generated by the pre-pass.

The decision to re-optimize a block is now made per list, so if we fail to merge any
instructions with the same address, then we do not attempt to optimize them in
future passes over the block. This will help to reduce the time this pass
spends re-optimizing instructions.

In one pathological test case, this change reduces the time spent in the
SILoadStoreOptimizer from 0.2s to 0.03s.

This restructuring will also make it possible to implement further solutions in
this pass, because we can now add less expensive checks to the pre-pass and
filter instructions out early which will avoid the need to do the expensive
scanning during the optimization pass. For example, checking for adjacent
offsets is an inexpensive test we can move to the pre-pass.

Reviewers: arsenm, pendingchaos, rampitec, nhaehnle, vpykhtin

Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65961

llvm-svn: 373630
2019-10-03 17:11:47 +00:00
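The following is a hypothetical C++ sketch of the pre-pass structure: one scan of the block builds per-address lists, and the optimization phase then walks only those lists. `MemInst`, the string base address, and the singleton filter (an example of a cheap check that can run early) are illustrative assumptions, not SILoadStoreOptimizer types.

```cpp
#include <cassert>
#include <iterator>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for a memory instruction in a basic block.
struct MemInst {
  unsigned Id;
  std::string Base; // base address operand
  int Offset;
};

using MergeLists = std::map<std::string, std::vector<MemInst>>;

// Pre-pass: a single linear scan that buckets instructions into one list
// per unique base address, keeping program order within each list.
MergeLists collectMergeableLists(const std::vector<MemInst> &Block) {
  MergeLists Lists;
  for (const MemInst &MI : Block)
    Lists[MI.Base].push_back(MI);
  return Lists;
}

// Cheap early filter: a list with fewer than two entries has nothing to
// merge, so it never needs to be (re-)optimized.
void dropSingletons(MergeLists &Lists) {
  for (auto It = Lists.begin(); It != Lists.end();)
    It = It->second.size() < 2 ? Lists.erase(It) : std::next(It);
}
```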
Yonghong Song
02ac75092d [BPF] Handle offset reloc endpoint ending in the middle of chain properly
While studying support for bitfields, I found an issue with
an example like the one in test offset-reloc-middle-chain.ll.
  struct t1 { int c; };
  struct s1 { struct t1 b; };
  struct r1 { struct s1 a; };
  #define _(x) __builtin_preserve_access_index(x)
  void test1(void *p1, void *p2, void *p3);
  void test(struct r1 *arg) {
    struct s1 *ps = _(&arg->a);
    struct t1 *pt = _(&arg->a.b);
    int *pi = _(&arg->a.b.c);
    test1(ps, pt, pi);
  }

The IR looks like:
  %0 = llvm.preserve.struct.access(base, ...)
  %1 = llvm.preserve.struct.access(%0, ...)
  %2 = llvm.preserve.struct.access(%1, ...)
  using %0, %1 and %2

In this case, we need to generate three relocations
corresponding to chains: (%0), (%0, %1) and (%0, %1, %2).
After collecting all the chains, the current implementation
processes each chain (in a map) with code generation sequentially.
For example, after (%0) is processed, the code may look like:
  %0 = base + special_global_variable
  // llvm.preserve.struct.access(base, ...) is delisted
  // from the instruction stream.
  %1 = llvm.preserve.struct.access(%0, ...)
  %2 = llvm.preserve.struct.access(%1, ...)
  using %0, %1 and %2

When processing chain (%0, %1), the current implementation
tries to visit the intrinsic llvm.preserve.struct.access(base, ...),
which has already been removed, to get some of its properties,
and this causes a segfault.

This patch fixes the issue by recording all necessary
information (kind, metadata, access_index, base) during
the analysis phase, so the code generation phase no longer
needs to examine the intrinsic call instructions.
This also simplifies the code.

Differential Revision: https://reviews.llvm.org/D68389

llvm-svn: 373621
2019-10-03 16:30:29 +00:00
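The fix amounts to snapshotting per-call information during analysis. The following is a hypothetical C++ sketch of that idea. `CallInfo` stands in for the recorded fields (kind, metadata, access_index, base), and clearing the input vector plays the role of the erased intrinsic calls. The "i:j:k" string is illustrative only and is not necessarily the access-string format the BPF backend emits.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Information recorded per preserve-access call during analysis, copied by
// value so code generation never has to look at the original call again.
struct CallInfo {
  std::string Kind;     // hypothetical tag, e.g. "struct"
  std::string Metadata; // hypothetical type name
  unsigned AccessIndex;
  std::string Base;
};

// Analysis phase: every prefix of the access chain becomes its own chain,
// e.g. (%0), (%0, %1), (%0, %1, %2).
std::vector<std::vector<CallInfo>> buildChains(const std::vector<CallInfo> &Calls) {
  std::vector<std::vector<CallInfo>> Chains;
  for (std::size_t N = 1; N <= Calls.size(); ++N)
    Chains.emplace_back(Calls.begin(), Calls.begin() + N);
  return Chains;
}

// Code generation: uses only the recorded copies; here it just renders the
// access indices of a chain as "i:j:k".
std::string accessPath(const std::vector<CallInfo> &Chain) {
  std::string S;
  for (const CallInfo &CI : Chain) {
    if (!S.empty()) S += ':';
    S += std::to_string(CI.AccessIndex);
  }
  return S;
}
```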
Guillaume Chatelet
d1f23bd225 Revert "[Alignment][NFC] Allow constexpr Align"
This reverts commit b3af236fb5fc6e50fcc1b54d868f0bff557f3fb1.

llvm-svn: 373619
2019-10-03 15:53:50 +00:00
Edward Jones
f5177a7db4 [RISCV] Add obsolete aliases of fscsr, frcsr (fssr, frsr)
These old aliases were renamed, but are still used by some projects (e.g. newlib).

Differential Revision: https://reviews.llvm.org/D68392

llvm-svn: 373618
2019-10-03 15:47:28 +00:00
Ehsan Amiri
f21dbcca90 [AArch64][SVE] Adding patterns for floating point SVE add instructions.
llvm-svn: 373600
2019-10-03 14:19:55 +00:00
Simon Atanasyan
f6551ddfce [mips] Push fixup_Mips_LO16 fixup for jialc and jic instructions
llvm-svn: 373591
2019-10-03 12:08:26 +00:00
Sander de Smalen
4f99b6f0fe [AArch64] Static (de)allocation of SVE stack objects.
Adds support to AArch64FrameLowering to allocate fixed-stack SVE objects.

The focus of this patch is purely to allow the stack frame to
allocate/deallocate space for scalable SVE objects. More dynamic
allocation (at compile-time, i.e. determining placement of SVE objects
on the stack), or resolving frame-index references that include
scalable-sized offsets, are left for subsequent patches.

SVE objects are allocated in the stack frame as a separate region below
the callee-save area, and above the alignment gap. This is done so that
the SVE objects can be accessed directly from the FP at (runtime)
VL-based offsets to benefit from using the VL-scaled addressing modes.

The layout looks as follows:

     +-------------+
     | stack arg   |   
     +-------------+
     | Callee Saves|
     |   X29, X30  |       (if available)
     |-------------| <- FP (if available)
     |     :       |   
     |  SVE area   |   
     |     :       |   
     +-------------+
     |/////////////| alignment gap.
     |     :       |   
     | Stack objs  |
     |     :       |   
     +-------------+ <- SP after call and frame-setup

SVE and non-SVE stack objects are distinguished using different
StackIDs. The offsets for objects with TargetStackID::SVEVector should be
interpreted as purely scalable offsets within their respective SVE region.

Reviewers: thegameg, rovka, t.p.northover, efriedma, rengolin, greened

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D61437

llvm-svn: 373585
2019-10-03 11:33:50 +00:00
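Because the SVE area sits directly below FP, each SVE object can be addressed as FP minus a multiple of the runtime vector length. The following is a hypothetical C++ sketch of that offset arithmetic. It ignores alignment padding and the fixed-size parts of the frame, and it uses the usual SVE convention vscale = VL / 128.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical model: SVE objects get purely scalable offsets below FP,
// allocated downward in the order given.
std::vector<int64_t> assignSVEOffsets(const std::vector<int64_t> &ScalableSizes) {
  std::vector<int64_t> Offsets;
  int64_t Cur = 0;
  for (int64_t Size : ScalableSizes) {
    Cur -= Size;            // grow the SVE area downward from FP
    Offsets.push_back(Cur); // object starts at FP + Cur * vscale
  }
  return Offsets;
}

// Byte offset from FP once the vector length (in bits) is known at run time.
int64_t resolve(int64_t ScalableOffset, unsigned VLBits) {
  return ScalableOffset * int64_t(VLBits / 128);
}
```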
Simon Pilgrim
44bc1186e4 Fix uninitialized variable warning. NFCI
llvm-svn: 373583
2019-10-03 11:22:00 +00:00
Simon Pilgrim
b327dc1966 Fix uninitialized variable warning. NFCI
llvm-svn: 373582
2019-10-03 11:21:46 +00:00
Guillaume Chatelet
b3af236fb5 [Alignment][NFC] Allow constexpr Align
Summary:
This patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68329

llvm-svn: 373580
2019-10-03 10:53:10 +00:00
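This change was later reverted (d1f23bd225 in this log). The following is a hypothetical C++14 sketch of what "constexpr Align" enables: a power-of-two alignment type, stored as a log2 shift, that can be built and queried in constant expressions. It is not the actual llvm/Support/Alignment.h code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical minimal alignment type storing log2 of the value; all
// members are constexpr so it can be used in constant expressions.
class Align {
  uint8_t ShiftValue = 0; // alignment == 1 << ShiftValue

  static constexpr uint8_t log2Exact(uint64_t V) {
    uint8_t S = 0;
    while (V > 1) { V >>= 1; ++S; }
    return S;
  }

public:
  constexpr Align() = default; // alignment of 1
  constexpr explicit Align(uint64_t Value) : ShiftValue(log2Exact(Value)) {
    // Only powers of two are valid alignments.
    assert(Value > 0 && (Value & (Value - 1)) == 0);
  }
  constexpr uint64_t value() const { return uint64_t(1) << ShiftValue; }
};

static_assert(Align(16).value() == 16, "usable at compile time");
```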
Matt Arsenault
efb5a24ab0 AMDGPU/GlobalISel: Don't re-get subtarget
It's already available in the class.

llvm-svn: 373568
2019-10-03 05:46:10 +00:00
Matt Arsenault
1c135a39aa AMDGPU/GlobalISel: Expand G_BITCAST legality
llvm-svn: 373567
2019-10-03 05:46:08 +00:00
Craig Topper
eb420aa379 [X86] Add DAG combine to turn (bitcast (vbroadcast_load)) into just a vbroadcast_load if the scalar size is the same.
This improves broadcast load folding of i64 elements on 32-bit
targets where i64 isn't legal.

Previously we had to represent these as vXf64 vbroadcast_loads and
a bitcast to vXi64. But we didn't have any isel patterns
looking for that.

This also allows us to remove or simplify some isel patterns that
were looking for bitcasted vbroadcast_loads.

llvm-svn: 373566
2019-10-03 05:30:02 +00:00
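The combine only fires when the element size is unchanged, because then the broadcast loads the same bytes and just reinterprets them. The following is a hypothetical C++ sketch on a toy node model. `Node`, `VT`, and the string opcodes are illustrative stand-ins, not SelectionDAG types.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Toy vector type: element kind ("i"/"f"), element size in bits, element count.
struct VT { std::string Kind; unsigned EltBits; unsigned NumElts; };

struct Node {
  std::string Op;                // "vbroadcast_load" or "bitcast"
  VT Type;
  std::shared_ptr<Node> Operand; // only used by bitcast
};

// (bitcast (vbroadcast_load p)) -> (vbroadcast_load p) retyped to the
// bitcast's type, provided the scalar element size is the same.
std::shared_ptr<Node> combineBitcast(const std::shared_ptr<Node> &N) {
  if (N->Op != "bitcast" || !N->Operand || N->Operand->Op != "vbroadcast_load")
    return N;
  if (N->Type.EltBits != N->Operand->Type.EltBits)
    return N; // different scalar size: not the same broadcast
  auto Load = std::make_shared<Node>(*N->Operand);
  Load->Type = N->Type;
  return Load;
}
```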
Craig Topper
f849f41469 [X86] Add broadcast load folding patterns to NoVLX VPMULLQ/VPMAXSQ/VPMAXUQ/VPMINSQ/VPMINUQ patterns.
More fixes for PR36191.

llvm-svn: 373560
2019-10-03 03:16:27 +00:00
Craig Topper
241c72ddd9 [X86] Remove a couple of redundant isel patterns that look to have been copy/pasted from right above them. NFC
llvm-svn: 373559
2019-10-03 03:16:21 +00:00
Daniel Sanders
603e98948e [gicombiner] Fix Windows issue where single quotes in the command are passed through to tablegen
llvm-svn: 373545
2019-10-02 23:38:06 +00:00
Stanislav Mekhanoshin
1384c3a5b8 [AMDGPU] Fix illegal agpr use by VALU
When SIFixSGPRCopies attempts to fix an illegal copy from vector to
scalar register it calls moveToVALU(). A copy from an agpr to sgpr
becomes a copy from agpr to agpr, which may result in the illegal
register class at a use of this copy.

The solution is to always copy it into a vgpr. This may result in a
subsequent copy into an agpr if that is what is really needed; however,
this should not happen too often and will likely be folded later.

The opposite situation cannot happen because an sgpr is always
illegal where an agpr is legal, so such user instructions cannot
exist.

Differential Revision: https://reviews.llvm.org/D68358

llvm-svn: 373544
2019-10-02 23:23:46 +00:00
Daniel Sanders
505d7f3105 [gicombiner] Add the boring boilerplate for the declarative combiner
Summary:
This is the first of a series of patches extracted from a much bigger WIP
patch. It merely establishes the tblgen pass and the way empty combiner
helpers are declared and integrated into a combiner info.

The tablegen pass takes a -combiners option to select the combiner helper
that will be generated. This can be given multiple values to generate
multiple combiner helpers at once. Doing so helps to minimize parsing
overhead.

The reason for creating a GlobalISel subdirectory in utils/TableGen is that
there will be quite a lot of non-pass files (~15) by the time the patch
series is done.

Reviewers: volkan

Subscribers: mgorny, hiraditya, simoncook, Petar.Avramovic, s.egerton, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68286

llvm-svn: 373527
2019-10-02 21:13:07 +00:00