30920 Commits

Author SHA1 Message Date
David Zarzycki
7653ff398d [X86] Enable AVX512BW for memcmp()
llvm-svn: 373845
2019-10-06 10:25:52 +00:00
Matt Arsenault
c0ec72d4f8 AMDGPU/GlobalISel: RegBankSelect DS GWS intrinsics
llvm-svn: 373840
2019-10-06 01:37:38 +00:00
Matt Arsenault
bcd6b1d209 AMDGPU/GlobalISel: Lower G_ATOMIC_CMPXCHG_WITH_SUCCESS
llvm-svn: 373839
2019-10-06 01:37:37 +00:00
Matt Arsenault
a5b9c75674 GlobalISel: Partially implement lower for G_EXTRACT
Turn into shift and truncate. Doesn't yet handle pointers.
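
A minimal sketch of the shift-and-truncate idea on plain integers. It is not the GlobalISel implementation; the function name and parameters are hypothetical and only model scalar (non-pointer) values.

```python
def extract_bits(src: int, src_bits: int, offset: int, dst_bits: int) -> int:
    """Model an extract of dst_bits at bit `offset` as shift right + truncate."""
    assert offset >= 0 and offset + dst_bits <= src_bits
    src &= (1 << src_bits) - 1               # treat src as an unsigned src_bits-wide scalar
    shifted = src >> offset                  # logical shift right by the bit offset
    return shifted & ((1 << dst_bits) - 1)   # truncate to the destination width


# Pull the 16-bit field that starts at bit 16 out of a 64-bit value.
print(hex(extract_bits(0x1122334455667788, 64, 16, 16)))
```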

llvm-svn: 373838
2019-10-06 01:37:35 +00:00
Matt Arsenault
69c65a8609 AMDGPU/GlobalISel: Fix RegBankSelect for sendmsg intrinsics
This wasn't updated for the immarg handling change.

llvm-svn: 373837
2019-10-06 01:37:34 +00:00
Craig Topper
2decdf42b9 [FastISel] Copy the inline assembly dialect to the INLINEASM instruction.
Fixes PR43575.

llvm-svn: 373836
2019-10-05 23:21:17 +00:00
Simon Pilgrim
8815be04ec [X86][AVX] Push sign extensions of comparison bool results through bitops (PR42025)
As discussed on PR42025, with more complex boolean math we can end up with many truncations/extensions of the comparison results through each bitop.

This patch handles the cases introduced in combineBitcastvxi1 by pushing the sign extension through the AND/OR/XOR ops so it's just the original SETCC ops that get extended.
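
The transform relies on sign extension of a 1-bit boolean distributing over AND/OR/XOR. A small sketch that checks this identity exhaustively follows. The helper names are hypothetical and the model uses plain integers, not the DAG nodes the patch touches.

```python
import operator
from itertools import product


def sext_i1(bit: int, width: int = 8) -> int:
    """Sign-extend a 1-bit value to `width` bits (0 -> all zeros, 1 -> all ones)."""
    return (1 << width) - 1 if bit & 1 else 0


def check_sext_distributes(width: int = 8) -> bool:
    """Check sext(a op b) == sext(a) op sext(b) for op in AND, OR, XOR."""
    for op in (operator.and_, operator.or_, operator.xor):
        for a, b in product((0, 1), repeat=2):
            if sext_i1(op(a, b), width) != op(sext_i1(a, width), sext_i1(b, width)):
                return False
    return True


print(check_sext_distributes())
```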

Differential Revision: https://reviews.llvm.org/D68226

llvm-svn: 373834
2019-10-05 20:49:34 +00:00
David Bolvansky
41c934acaf [SelectionDAG] Add tests for LKK algorithm
Added some tests testing urem and srem operations with a constant divisor.
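
For context, a sketch of remainder by direct computation, assuming "LKK" refers to the Lemire-Kaser-Kurz technique for unsigned remainder by a constant divisor. The commit only adds tests, so this illustrates the arithmetic and is not the SelectionDAG code. Function names are hypothetical.

```python
def make_fastmod_u32(d: int):
    """Return a function computing a % d for 32-bit unsigned a without dividing by d."""
    assert 1 <= d < 2**32
    magic = (2**64 - 1) // d + 1              # ceil(2**64 / d), precomputed once per divisor

    def fastmod(a: int) -> int:
        assert 0 <= a < 2**32
        lowbits = (magic * a) & (2**64 - 1)   # keep only the low 64 bits of the product
        return (lowbits * d) >> 64            # high 64 bits of the 128-bit product

    return fastmod


mod10 = make_fastmod_u32(10)
print(mod10(12345), 12345 % 10)
```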

Patch by TG908 (Tim Gymnich)

Differential Revision: https://reviews.llvm.org/D68421

llvm-svn: 373830
2019-10-05 14:29:25 +00:00
Philip Reames
d5a4dad206 Fix a *nasty* miscompile in experimental unordered atomic lowering
This is an omission in rL371441. Loads which happened to be unordered weren't being added to the PendingLoad set, and thus weren't being ordered with respect to side effects which followed before the end of the block.

Included test case is how I spotted this.  We had an atomic load being folded into a using instruction after a fence that load was supposed to be ordered with.  I'm sure it showed up a bunch of other ways as well.

Spotted via manual inspection of assembly differences in a corpus with and without the new experimental mode. Finding this with testing would have been "unpleasant".

llvm-svn: 373814
2019-10-05 00:32:10 +00:00
Philip Reames
9fe5d730c7 [Test] Add a test case for a missed opportunity in implicit null checking
llvm-svn: 373813
2019-10-04 23:46:26 +00:00
Reid Kleckner
67cfa79c01 Revert [CodeGen] Do the Simple Early Return in block-placement pass to optimize the blocks
This reverts r371177 (git commit f879c6875563c0a8cd838f1e13b14dd33558f1f8)

It caused PR43566 by removing empty, address-taken MachineBasicBlocks.
Such blocks may have references from blockaddress or other operands, and
need more consideration to be removed.

See the PR for a test case to use when relanding.

llvm-svn: 373805
2019-10-04 22:24:21 +00:00
Jessica Paquette
784892c964 [MachineOutliner] Disable outlining from noreturn functions
Outlining from noreturn functions doesn't do the correct thing right now. The
outliner should respect that the caller is marked noreturn. In the event that
we have a noreturn function, and the outlined code is in tail position, the
outliner will not see that the outlined function should be tail called. As a
result, you end up with a regular call containing a return.

Fixing this requires that we check that all candidates live inside noreturn
functions. So, for the sake of correctness, don't outline from noreturn
functions right now.

Add machine-outliner-noreturn.mir to test this.

llvm-svn: 373791
2019-10-04 21:24:12 +00:00
Eli Friedman
23ae13d51f [ScheduleDAG] When a node is cloned, add an edge between the nodes.
InstrEmitter's virtual register handling assumes that clones are emitted
after the cloned node.  Make sure this assumption actually holds.

Fixes a "Node emitted out of order - early" assertion on the testcase.

This is probably a very rare case to actually hit in practice; even
without the explicit edge, the scheduler will usually end up scheduling
the nodes in the expected order due to other constraints.

Differential Revision: https://reviews.llvm.org/D68068

llvm-svn: 373782
2019-10-04 19:51:40 +00:00
Craig Topper
87aa59a0c7 [X86] Remove isel patterns for mask vpcmpgt/vpcmpeq. Switch vpcmp to these based on the immediate in MCInstLower
The immediate form of VPCMP can represent these completely. The
vpcmpgt/eq are just shorter encodings.

This patch removes the isel patterns and just swaps the opcodes
and removes the immediate in MCInstLower. This matches where we do
some other encodings tricks.

Removes over 10K bytes from the isel table.

Differential Revision: https://reviews.llvm.org/D68446

llvm-svn: 373766
2019-10-04 18:02:46 +00:00
Craig Topper
074fa390d2 [X86] Add DAG combine to form saturating VTRUNCUS/VTRUNCS from VTRUNC
We already do this for ISD::TRUNCATE, but we can do the same for X86ISD::VTRUNC

Differential Revision: https://reviews.llvm.org/D68432

llvm-svn: 373765
2019-10-04 17:53:18 +00:00
Kevin P. Neal
68b8052121 [FPEnv] Strict FP tests should use the requisite function attributes.
A set of function attributes is required in any function that uses constrained
floating point intrinsics. None of our tests use these attributes.

This patch fixes this.

These tests have been tested against the IR verifier changes in D68233.

Reviewed by:	andrew.w.kaylor, cameron.mcinally, uweigand
Approved by:	andrew.w.kaylor
Differential Revision:	https://reviews.llvm.org/D67925

llvm-svn: 373761
2019-10-04 17:03:46 +00:00
Tim Northover
a7d90af1be ARM-Darwin: keep the frame register reserved even if not updated.
Darwin platforms need the frame register to always point at a valid record even
if it's not updated in a leaf function. Backtraces are more important than one
extra GPR.

llvm-svn: 373738
2019-10-04 12:29:32 +00:00
Matt Arsenault
d7cad4fb41 AMDGPU/GlobalISel: Fix using wrong addrspace for aperture
This was always passing the destination flat address space, when it
should be picking between the two valid source options.

llvm-svn: 373716
2019-10-04 08:35:38 +00:00
Matt Arsenault
412e0bf8f3 AMDGPU/GlobalISel: Select G_PTRTOINT
llvm-svn: 373715
2019-10-04 08:35:37 +00:00
Matt Arsenault
be9521acaa AMDGPU/GlobalISel: Support wave32 waterfall loops
llvm-svn: 373714
2019-10-04 08:35:35 +00:00
David Zarzycki
03b216d854 [X86] Enable inline memcmp() to use AVX512
llvm-svn: 373706
2019-10-04 07:42:34 +00:00
Shiva Chen
ff55e2e047 [RISCV] Split SP adjustment to reduce the offset of callee saved register spill and restore
We would like to split the SP adjustment to reduce the number of instructions
in the prologue and epilogue, as in the following case. This way, the offset of
each callee saved register fits in a single store.

    add     sp,sp,-2032
    sw      ra,2028(sp)
    sw      s0,2024(sp)
    sw      s1,2020(sp)
    sw      s3,2012(sp)
    sw      s4,2008(sp)
    add     sp,sp,-64
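
The arithmetic behind the split can be sketched as follows. The only facts taken from the example above are the 2032/64 split and the store offsets. The 12-bit signed immediate range is the standard RISC-V I/S-type limit. The rule `2048 - stack_align` for the first adjustment and the function names are assumptions for illustration.

```python
IMM12_MIN, IMM12_MAX = -2048, 2047   # range of a RISC-V 12-bit signed immediate


def fits_imm12(value: int) -> bool:
    return IMM12_MIN <= value <= IMM12_MAX


def split_sp_adjust(frame_size: int, stack_align: int = 16):
    """Split one large SP decrement into (first, second) amounts.

    Assumed rule: if the whole adjustment fits in one immediate, do not split;
    otherwise move SP by 2048 - stack_align first so that callee-saved spill
    offsets relative to the new SP stay within the immediate range.
    """
    if fits_imm12(-frame_size):
        return frame_size, 0
    first = 2048 - stack_align
    return first, frame_size - first


first, second = split_sp_adjust(2096)
print(first, second)        # the two SP adjustments
print(fits_imm12(2028))     # largest spill offset used after the first adjustment
```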

Differential Revision: https://reviews.llvm.org/D68011

llvm-svn: 373688
2019-10-04 02:00:57 +00:00
Sanjay Patel
288079aafd [DAGCombiner] add operation legality checks before creating shift ops (PR43542)
As discussed on llvm-dev and:
https://bugs.llvm.org/show_bug.cgi?id=43542
...we have transforms that assume shift operations are legal and transforms to
use them are profitable, but that may not hold for simple targets.

In this case, the MSP430 target custom lowers shifts by repeating (many)
simpler/fixed ops. That can be avoided by keeping this code as setcc/select.

Differential Revision: https://reviews.llvm.org/D68397

llvm-svn: 373666
2019-10-03 21:34:04 +00:00
Philip Reames
82cb5bc302 [Tests] Add a unordered atomic load combine test
llvm-svn: 373659
2019-10-03 20:28:59 +00:00
Philip Reames
65d63ac05a [Test] Fix inconsistency in alignment in test case
The IR was using a fixed 8 byte alignment, but the MIR portion was using native alignment.  Since the test doesn't appear to be deliberately testing overalignment, just make the IR match the MIR.

llvm-svn: 373658
2019-10-03 20:24:18 +00:00
Jinsong Ji
230cf9a360 [AArch64][SVE] Move the testcase into CodeGen dir
https://reviews.llvm.org/rL373600 added an AArch64 testcase in top dir
which should be moved to Codegen dir.

llvm-svn: 373657
2019-10-03 20:21:23 +00:00
Jinsong Ji
4a6881eabc [PowerPC] Adjust the naming and operand order of fnmsub patterns
Summary:
This is a follow-up patch to https://reviews.llvm.org/D67595.
Adjust the naming and the commutable operands for additional patterns
to make them easier to read.

The testcase update also shows that we can save some unnecessary fmr as
well.

Reviewers: #powerpc, steven.zhang, hfinkel, nemanjai

Reviewed By: #powerpc, nemanjai

Subscribers: wuzish, hiraditya, kbarton, MaskRay, shchenz, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68112

llvm-svn: 373652
2019-10-03 19:36:42 +00:00
Craig Topper
185ee6ec7c [X86] Add v32i8 shuffle lowering strategy to recognize two v4i64 vectors truncated to v4i8 and concatenated into the lower 8 bytes with undef/zero upper bytes.
This patch recognizes the shuffle pattern we get from a
v8i64->v8i8 truncate when v8i64 isn't a legal type.

With VLX we can use two VTRUNCs, unpckldq, and an insert_subvector.

Differential Revision: https://reviews.llvm.org/D68374

llvm-svn: 373645
2019-10-03 18:34:42 +00:00
Matt Arsenault
ed77b27441 AMDGPU/GlobalISel: Handle RegBankSelect of G_INSERT_VECTOR_ELT
llvm-svn: 373639
2019-10-03 17:59:03 +00:00
Matt Arsenault
233ff982c7 AMDGPU/GlobalISel: Split 64-bit vector extracts during RegBankSelect
Register indexing 64-bit elements is possible on the SALU, but not the
VALU. Handle splitting this into two 32-bit indexes. Extend waterfall
loop handling to allow moving a range of instructions.
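
A sketch of the index split on plain lists: a vector of 64-bit elements is viewed as twice as many 32-bit elements, and element `i` is rebuilt from 32-bit lanes `2*i` and `2*i + 1`. The low-half-first lane order and the function name are assumptions for illustration, not the RegBankSelect code.

```python
def extract_elt64_via_32(vec32: list, idx: int) -> int:
    """Extract 64-bit element `idx` using two 32-bit indexed reads."""
    lo = vec32[2 * idx] & 0xFFFFFFFF       # low half lives in the even 32-bit lane
    hi = vec32[2 * idx + 1] & 0xFFFFFFFF   # high half lives in the following odd lane
    return (hi << 32) | lo


# Two 64-bit elements stored as four 32-bit lanes.
lanes = [0x55667788, 0x11223344, 0xDDEEFF00, 0x00AABBCC]
print(hex(extract_elt64_via_32(lanes, 0)), hex(extract_elt64_via_32(lanes, 1)))
```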

llvm-svn: 373638
2019-10-03 17:55:27 +00:00
Matt Arsenault
56271fe180 AMDGPU/GlobalISel: Allow VGPR to index SGPR register
We can still do a waterfall loop over the index if using a VGPR to
index an SGPR. The result will still be a VGPR, but we can avoid the
wide copy of the source register to a VGPR.

llvm-svn: 373637
2019-10-03 17:50:32 +00:00
Matt Arsenault
9256183994 AMDGPU/GlobalISel: Add some more tests for G_INSERT legalization
llvm-svn: 373636
2019-10-03 17:50:31 +00:00
Matt Arsenault
3d23e58dbe AMDGPU/GlobalISel: Fix mutationIsSane assert v8s8 and
This would try to do FewerElements to v9s8

llvm-svn: 373635
2019-10-03 17:50:29 +00:00
James Molloy
9972c992eb [ModuloSchedule] removeBranch() *before* creating the trip count condition
The Hexagon code assumes there's no existing terminator when inserting its
trip count condition check.

This causes swp-stages5.ll to break. The generated code looks good to me,
it is likely a permutation. I have disabled the new codegen path to keep
everything green and will investigate along with the other 3-4 tests
that have different codegen.

Fixes expensive-checks build.

llvm-svn: 373629
2019-10-03 17:10:32 +00:00
Yonghong Song
02ac75092d [BPF] Handle offset reloc endpoint ending in the middle of chain properly
While studying bitfield support, I found an issue with
an example like the one in test offset-reloc-middle-chain.ll.
  struct t1 { int c; };
  struct s1 { struct t1 b; };
  struct r1 { struct s1 a; };
  #define _(x) __builtin_preserve_access_index(x)
  void test1(void *p1, void *p2, void *p3);
  void test(struct r1 *arg) {
    struct s1 *ps = _(&arg->a);
    struct t1 *pt = _(&arg->a.b);
    int *pi = _(&arg->a.b.c);
    test1(ps, pt, pi);
  }

The IR looks like:
  %0 = llvm.preserve.struct.access(base, ...)
  %1 = llvm.preserve.struct.access(%0, ...)
  %2 = llvm.preserve.struct.access(%1, ...)
  using %0, %1 and %2

In this case, we need to generate three relocations
corresponding to chains: (%0), (%0, %1) and (%0, %1, %2).
After collecting all the chains, the current implementation
processes each chain (in a map) with code generation sequentially.
For example, after (%0) is processed, the code may look like:
  %0 = base + special_global_variable
  // llvm.preserve.struct.access(base, ...) is delisted
  // from the instruction stream.
  %1 = llvm.preserve.struct.access(%0, ...)
  %2 = llvm.preserve.struct.access(%1, ...)
  using %0, %1 and %2

When processing chain (%0, %1), the current implementation
tries to visit intrinsic llvm.preserve.struct.access(base, ...)
to get some of its properties, and this caused a segfault.

This patch fixes the issue by remembering all necessary
information (kind, metadata, access_index, base) during
the analysis phase, so in the code generation phase there is
no need to examine the intrinsic call instructions.
This also simplifies the code.

Differential Revision: https://reviews.llvm.org/D68389

llvm-svn: 373621
2019-10-03 16:30:29 +00:00
Sanjay Patel
38c265fe26 [MSP430] add tests for unwanted shift codegen; NFC (PR43542)
llvm-svn: 373607
2019-10-03 14:54:03 +00:00
Simon Atanasyan
afe7197f13 [mips] Use llvm-readobj -A flag in test cases. NFC
llvm-svn: 373589
2019-10-03 12:08:04 +00:00
Sander de Smalen
4f99b6f0fe [AArch64] Static (de)allocation of SVE stack objects.
Adds support to AArch64FrameLowering to allocate fixed-stack SVE objects.

The focus of this patch is purely to allow the stack frame to
allocate/deallocate space for scalable SVE objects. More dynamic
allocation (at compile-time, i.e. determining placement of SVE objects
on the stack), or resolving frame-index references that include
scalable-sized offsets, are left for subsequent patches.

SVE objects are allocated in the stack frame as a separate region below
the callee-save area, and above the alignment gap. This is done so that
the SVE objects can be accessed directly from the FP at (runtime)
VL-based offsets to benefit from using the VL-scaled addressing modes.

The layout looks as follows:

     +-------------+
     | stack arg   |   
     +-------------+
     | Callee Saves|
     |   X29, X30  |       (if available)
     |-------------| <- FP (if available)
     |     :       |   
     |  SVE area   |   
     |     :       |   
     +-------------+
     |/////////////| alignment gap.
     |     :       |   
     | Stack objs  |
     |     :       |   
     +-------------+ <- SP after call and frame-setup

SVE and non-SVE stack objects are distinguished using different
StackIDs. The offsets for objects with TargetStackID::SVEVector should be
interpreted as purely scalable offsets within their respective SVE region.

Reviewers: thegameg, rovka, t.p.northover, efriedma, rengolin, greened

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D61437

llvm-svn: 373585
2019-10-03 11:33:50 +00:00
Craig Topper
3a6950d3f0 [X86] Add test case for v8i64->v8i8 truncate with avx512 and prefer-vector-width/min-legal-vector-width=256. NFC
With vpmovqb, we should be able to do better here until we get
AVX512VBMI on Cannonlake/Icelake.

llvm-svn: 373569
2019-10-03 06:18:45 +00:00
Matt Arsenault
1c135a39aa AMDGPU/GlobalISel: Expand G_BITCAST legality
llvm-svn: 373567
2019-10-03 05:46:08 +00:00
Craig Topper
eb420aa379 [X86] Add DAG combine to turn (bitcast (vbroadcast_load)) into just a vbroadcast_load if the scalar size is the same.
This improves broadcast load folding of i64 elements on 32-bit
targets where i64 isn't legal.

Previously we had to represent these as vXf64 vbroadcast_loads and
a bitcast to vXi64. But we didn't have any isel patterns
looking for that.

This also allows us to remove or simplify some isel patterns that
were looking for bitcasted vbroadcast_loads.

llvm-svn: 373566
2019-10-03 05:30:02 +00:00
Craig Topper
f849f41469 [X86] Add broadcast load folding patterns to NoVLX VPMULLQ/VPMAXSQ/VPMAXUQ/VPMINSQ/VPMINUQ patterns.
More fixes for PR36191.

llvm-svn: 373560
2019-10-03 03:16:27 +00:00
Stanislav Mekhanoshin
1384c3a5b8 [AMDGPU] Fix illegal agpr use by VALU
When SIFixSGPRCopies attempts to fix an illegal copy from vector to
scalar register it calls moveToVALU(). A copy from an agpr to sgpr
becomes a copy from agpr to agpr, which may result in the illegal
register class at a use of this copy.

The solution is to always copy it into a vgpr. This may result in a
subsequent copy into an agpr if that is what is really needed, but that
should not happen too often and will likely be folded later.

The opposite situation may not happen because an sgpr is always
illegal where agpr is legal, so such user instructions may not
exist.

Differential Revision: https://reviews.llvm.org/D68358

llvm-svn: 373544
2019-10-02 23:23:46 +00:00
Craig Topper
f5bda7fe24 [X86] Add test cases for suboptimal vselect+setcc splitting.
If the vselect result type needs to be split, it will
also try to split the condition if it happens to be a setcc.

With avx512 where k-registers are legal, it's probably better
to just use a kshift to split the mask register.

llvm-svn: 373536
2019-10-02 22:35:03 +00:00
Yi-Hong Lyu
c7be067974 [PowerPC] Fix SH field overflow issue
Store rlwinm Rx, Ry, 32, 0, 31 as rlwinm Rx, Ry, 0, 0, 31 and store
rldicl Rx, Ry, 64, 0 as rldicl Rx, Ry, 0, 0. Otherwise the SH field overflows
and fails an assertion in the assembly printing stage.
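
The rewrite is safe because rotating by the full register width is the same as rotating by zero. A sketch of that equivalence follows. It models only the rotate, not the mask operands of rlwinm/rldicl, and the helper names are hypothetical.

```python
def rotl(value: int, shift: int, width: int) -> int:
    """Rotate a width-bit value left by `shift` bits."""
    mask = (1 << width) - 1
    value &= mask
    shift %= width                       # a rotate by `width` wraps to a rotate by 0
    return ((value << shift) | (value >> (width - shift))) & mask


def encode_sh(shift: int, width: int) -> int:
    """Reduce a shift amount so it fits the SH field (0..width-1)."""
    return shift % width


x = 0x12345678
print(rotl(x, 32, 32) == rotl(x, 0, 32))   # rotating by 32 equals rotating by 0
print(encode_sh(32, 32), encode_sh(64, 64))
```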

Differential Revision: https://reviews.llvm.org/D66991

llvm-svn: 373519
2019-10-02 20:25:16 +00:00
Craig Topper
74c7d6be28 [X86] Rewrite to the vXi1 subvector insertion code to not rely on the value of bits that might be undef
The previous code tried to do a trick where we would extract the subvector from the location we were inserting. Then xor that with the new value. Take the xored value and clear out the bits above the subvector size. Then shift that xored subvector to the insert location. And finally xor that with the original vector. Since the old subvector was used in both xors, this would leave just the new subvector at the inserted location. Since the surrounding bits had been zeroed no other bits of the original vector would be modified.

Unfortunately, if the old subvector came from undef we might aggressively propagate the undef. Then we end up with the XORs not cancelling because they aren't using the same value for the two uses of the old subvector. @bkramer gave me a case that demonstrated this, but we haven't reduced it enough to make it easily readable to see what's happening.

This patch uses a safer, but more costly approach. It isolates the bits above the insertion and the bits below the insert point and ORs those together, leaving 0 for the insertion location. Then it widens the subvector with 0s in the upper bits and shifts it into position with 0s in the lower bits. Then we do another OR.
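
Both approaches can be modeled on integer bitmasks, as sketched below with hypothetical function names. On fully defined inputs they agree. The difference described above only shows up when the old subvector bits are undef, which plain integers cannot represent, so this sketch illustrates the bit manipulation and not the miscompile.

```python
def insert_subvector_xor(vec: int, sub: int, idx: int, vec_len: int, sub_len: int) -> int:
    """Old approach: XOR trick that reads the old subvector bits twice."""
    sub_mask = (1 << sub_len) - 1
    old = (vec >> idx) & sub_mask                  # extract the old subvector
    diff = (old ^ sub) & sub_mask                  # xor with the new value, clear upper bits
    return (vec ^ (diff << idx)) & ((1 << vec_len) - 1)


def insert_subvector_or(vec: int, sub: int, idx: int, vec_len: int, sub_len: int) -> int:
    """New approach: clear the insertion window, then OR in the widened subvector."""
    full = (1 << vec_len) - 1
    end = idx + sub_len
    high = ((vec & full) >> end) << end            # bits above the insertion
    low = vec & ((1 << idx) - 1)                   # bits below the insert point
    widened = sub & ((1 << sub_len) - 1)           # subvector with 0s in the upper bits
    return (high | low | (widened << idx)) & full


print(bin(insert_subvector_or(0b11111111, 0b01, 2, 8, 2)))
```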

Differential Revision: https://reviews.llvm.org/D68311

llvm-svn: 373495
2019-10-02 17:47:09 +00:00
Thomas Lively
5b74c39d72 [WebAssembly] Error when using wasm64 for ISel
Summary:
64-bit WebAssembly (wasm64) is not specified and not supported in the
WebAssembly backend. We do have support for it in clang, however, and
we would like to keep that support because we expect wasm64 to be
specified and supported in the future. For now add an error when
trying to use wasm64 from the backend to minimize user confusion from
unexplained crashes.

Reviewers: aheejin, dschuff, sunfish

Subscribers: sbc100, jgravelle-google, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68254

llvm-svn: 373493
2019-10-02 17:34:44 +00:00
Piotr Sobczak
265e94e657 [AMDGPU] Extend buffer intrinsics with swizzling
Summary:
Extend the cachepolicy operand in the new VMEM buffer intrinsics
to indicate whether the buffer data is swizzled.
Also, propagate this information to MIR.

Intrinsics updated:
int_amdgcn_raw_buffer_load
int_amdgcn_raw_buffer_load_format
int_amdgcn_raw_buffer_store
int_amdgcn_raw_buffer_store_format
int_amdgcn_raw_tbuffer_load
int_amdgcn_raw_tbuffer_store
int_amdgcn_struct_buffer_load
int_amdgcn_struct_buffer_load_format
int_amdgcn_struct_buffer_store
int_amdgcn_struct_buffer_store_format
int_amdgcn_struct_tbuffer_load
int_amdgcn_struct_tbuffer_store

Furthermore, disable merging of VMEM buffer instructions
in SI Load/Store optimizer, if the "swizzled" bit on the instruction
is on.

The default value of the bit is 0, meaning that data in buffer
is linear and buffer instructions can be merged.

There is no difference in the generated code with this commit.
However, in the future it will be expected that front-ends
use buffer intrinsics with correct "swizzled" bit set.

Reviewers: arsenm, nhaehnle, tpr

Reviewed By: nhaehnle

Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, arphaman, jfb, Petar.Avramovic, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68200

llvm-svn: 373491
2019-10-02 17:22:36 +00:00
Hans Wennborg
9330005a54 Reapply r373431 "Switch lowering: omit range check for bit tests when default is unreachable (PR43129)"
This was reverted in r373454 due to breaking the expensive-checks bot.
This version addresses that by omitting the addSuccessorWithProb() call
when omitting the range check.

> Switch lowering: omit range check for bit tests when default is unreachable (PR43129)
>
> This is modeled after the same functionality for jump tables, which was
> added in r357067.
>
> Differential revision: https://reviews.llvm.org/D68131

llvm-svn: 373477
2019-10-02 14:35:06 +00:00
Kerry McLaughlin
822b298958 [AArch64][SVE] Implement int_aarch64_sve_cnt intrinsic
Summary: This patch includes tests for the VecOfBitcastsToInt type added by D68021

Reviewers: c-rhodes, sdesmalen, rovka

Reviewed By: c-rhodes

Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits, cfe-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68023

llvm-svn: 373468
2019-10-02 13:09:54 +00:00