63504 Commits

Author SHA1 Message Date
Tianqing Wang
bec4a8157d [X86] Update MachineLoopInfo in CMOV conversion.
If a CMOV is in a loop and is converted to branches, CMOV conversion wouldn't
add newly created basic blocks to loop info. Since the candidates is collected
based on loops, instructions in these basic blocks will be ignored.

Reviewed By: pengfei

Differential Revision: https://reviews.llvm.org/D104623
2021-07-21 10:53:46 +08:00
Albion Fung
2fd1520247 [PowerPC] Implemented mtmsr, mfspr, mtspr Builtins
Implemented builtins for mtmsr, mfspr, mtspr on PowerPC;
the patch is intended for XL Compatibility.

Differential revision: https://reviews.llvm.org/D106130
2021-07-20 17:51:00 -05:00
Jon Roelofs
75187aa352 [AArch64][GlobalISel] Legalize ctpop for v2s64, v2s32, v4s32, v4s16, v8s16
https://llvm.godbolt.org/z/nTTK6M5qe

Differential revision: https://reviews.llvm.org/D106388
2021-07-20 15:37:56 -07:00
Albion Fung
3434ac9e39 [PowerPC] Store, load, move from and to registers related builtins
This patch implements store, load, move from and to registers related
builtins, as well as the builtin for stfiw. The patch aims to provide
feature parady with xlC on AIX.

Differential revision: https://reviews.llvm.org/D105946
2021-07-20 15:46:14 -05:00
Jessica Paquette
8f54ebd51d [AArch64][GlobalISel] Select llvm.aarch64.neon.st2 intrinsics
Add manual selection code similar to the code in AArch64ISelDAGToDAG, and add
`createTuple` helpers similar to the code there as well.

This accounted for around 111 fallbacks while building clang for AArch64 with
GlobalISel.

This also should make it easy to add selection code for other store
intrinsics.

As a minor cleanup, this uses `createQTuple` in the other place where we use
REG_SEQUENCE.

Differential Revision: https://reviews.llvm.org/D106332
2021-07-20 13:23:46 -07:00
Eli Friedman
664a1fd9f0 [AArch64] Use the CMP_SWAP_128 variants added in 843c6140.
Accidentally forgot to flip the opcode... and I didn't notice because it
was working fine for the GlobalISel.
2021-07-20 13:23:27 -07:00
Fangrui Song
0c0549fbb3 [AArch64] Delete unused Opcode after D106039 2021-07-20 12:51:44 -07:00
Eli Friedman
843c614058 [AArch64] Fix i128 cmpxchg using ldxp/stxp.
Basically two parts to this fix:

1. Stop using AtomicExpand to expand cmpxchg i128
2. Fix AArch64ExpandPseudoInsts to use a correct expansion.

From ARM architecture reference:

To atomically load two 64-bit quantities, perform a Load-Exclusive
pair/Store-Exclusive pair sequence of reading and writing the same value
for which the Store-Exclusive pair succeeds, and use the read values
from the Load-Exclusive pair.

Fixes https://bugs.llvm.org/show_bug.cgi?id=51102

Differential Revision: https://reviews.llvm.org/D106039
2021-07-20 12:38:12 -07:00
Victor Huang
1a762f93f8 [PowerPC] Add PowerPC cmpb builtin and emit target indepedent code for XL compatibility
This patch is in a series of patches to provide builtins for compatibility
with the XL compiler. This patch add the builtin and emit target independent
code for __cmpb.

Reviewed By: nemanjai, #powerpc

Differential revision: https://reviews.llvm.org/D105194
2021-07-20 13:06:22 -05:00
Craig Topper
81efb82570 [RISCV] Teach RISCVMatInt about cases where it can use LUI+SLLI to replace LUI+ADDI+SLLI for large constants.
If we need to shift left anyway we might be able to take advantage
of LUI implicitly shifting its immediate left by 12 to cover part
of the shift. This allows us to use more bits of the LUI immediate
to avoid an ADDI.

isDesirableToCommuteWithShift now considers compressed instruction
opportunities when deciding if commuting should be allowed.

I believe this is the same or similar to one of the optimizations
from D79492.

Reviewed By: luismarques, arcbbb

Differential Revision: https://reviews.llvm.org/D105417
2021-07-20 09:22:06 -07:00
Craig Topper
98d4adc2d1 [RISCV] Add custom isel to select (and (srl X, C1), C2) and (and (shl X, C1), C2)
Replace some existing isel patterns that are covered by the new
code. SLLIUWPat has been removed in favor of folding its root case
into the new code. The other uses in isel patterns for shXadd.uw
have been switched to using hardcoded AND masks.

This is based on the original version of D49585 from ARM. The final
version of that was made a DAG combine, but I've chosen to keep it
as custom isel. I'm not convinced DAG combine is as good with
shift pairs as it is with and+shift. I saw some issues optimizing
the shifts created by vscale lowering if an and isn't created for
from a shift pair.

Reviewed By: luismarques

Differential Revision: https://reviews.llvm.org/D106230
2021-07-20 08:53:55 -07:00
Stefan Pintilie
1a6dc92be7 [PowerPC] Inefficient register allocation of ACC registers results in many copies.
ACC registers are a combination of four consecutive vector registers.
If the vector registers are assigned first this often forces a number
of copies to appear just before the ACC register is created. If the ACC
register is assigned first then fewer copies are generated when the vector
registers are assigned.

This patch tries to force the register allocator to assign the ACC registers first
and then the UACC registers and then the vector pair registers. It does this
by changing the priority of the register classes.

This patch also adds hints to help the register allocator assign UACC registers from
known ACC registers and vector pair registers from known UACC registers.

Reviewed By: nemanjai

Differential Revision: https://reviews.llvm.org/D105854
2021-07-20 10:53:40 -05:00
Craig Topper
84877a098a [RISCV] Use unordered indexed loads for MGATHER.
I don't think the semantics of the llvm masked gather intrinsic care
about the order the elements are loaded. For example, type legalization
by splitting will chain them in parallel. This is different than
scatter which we do chain in order.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D106025
2021-07-20 08:46:02 -07:00
Bradley Smith
191f9fa5d2 [AArch64][SVE] Move instcombine like transforms out of SVEIntrinsicOpts
Instead move them to the instcombine that happens in AArch64TargetTransformInfo.

Differential Revision: https://reviews.llvm.org/D106144
2021-07-20 14:17:30 +00:00
Simon Pilgrim
c188f0b876 [X86] X86InstCombineIntrinsic.cpp - silence clang-tidy warnings about incorrect uses of auto. NFCI.
We were using auto instead of auto* in a number of places which failed the llvm-qualified-auto check.

Additionally we were using auto in some places where the type wasn't immediately obvious - the style guide rule of thumb is only to use auto from casts etc. where the type is already explicitly stated.
2021-07-20 13:37:45 +01:00
Sebastian Neubauer
2b08f6af62 [AMDGPU] Improve register computation for indirect calls
First, collect the register usage in each function, then apply the
maximum register usage of all functions to functions with indirect
calls.

This is more accurate than guessing the maximum register usage without
looking at the actual usage.

As before, assume that indirect calls will hit a function in the
current module.

Differential Revision: https://reviews.llvm.org/D105839
2021-07-20 13:48:50 +02:00
Stanislav Mekhanoshin
9dc2636623 [AMDGPU] Disable LDS lowering for GFX shaders
Apparently these need external LDS symbols to remain.

Fixes: SC1-3279

Differential Revision: https://reviews.llvm.org/D106288
2021-07-20 02:55:25 -07:00
Sander de Smalen
eb1a5120b8 [AArch64][SVE][InstCombine] last{a,b} of a splat vector
Replace last{a,b}(splat(X)) with X, irrespective of the predicate.

Patch by/Committing on behalf of: Usman Nadeem (mnadeem)

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D105520
2021-07-20 09:44:43 +01:00
Cullen Rhodes
15af3aaa2e [AArch64][SME] Add system registers and related instructions
This patch adds the new system registers introduced in SME:

  - ID_AA64SMFR0_EL1 (ro) SME feature identifier.
  - SMCR_ELx (r/w) streaming mode control register for configuring
    effective SVE Streaming SVE Vector length when the PE is in
    Streaming SVE mode.
  - SVCR (r/w) streaming vector control register, visible at all
    exception levels. Provides access to PSTATE.SM and PSTATE.ZA
    using MSR and MRS instructions.
  - SMPRI_EL1 (r/w) streaming mode execution priority register.
  - SMPRIMAP_EL2 (r/w) streaming mode priority mapping register.
  - SMIDR_EL1 (ro) streaming mode identification register.
  - TPIDR2_EL0 (r/w) for use by SME software to manage per-thread
    SME context.
  - MPAMSM_EL1 (r/w) MPAM (v8.4) streaming mode register, for
    labelling memory accesses performed in streaming mode.

Also added in this patch are the SME mode change instructions.
Three MSR immediate instructions are implemented to set or clear
PSTATE.SM, PSTATE.ZA, or both respectively:

  - MSR SVCRSM, #<imm1>
  - MSR SVCRZA, #<imm1>
  - MSR SVCRSMZA, #<imm1>

The following smstart/smstop aliases are also implemented for
convenience:

  smstart    -> MSR SVCRSMZA, #1
  smstart sm -> MSR SVCRSM,   #1
  smstart za -> MSR SVCRZA,   #1

  smstop     -> MSR SVCRSMZA, #0
  smstop sm  -> MSR SVCRSM,   #0
  smstop za  -> MSR SVCRZA,   #0

The reference can be found here:
https://developer.arm.com/documentation/ddi0602/2021-06

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D105576
2021-07-20 08:06:26 +00:00
Amara Emerson
56a6686e0c [AArch64][GlobalISel] Don't form truncstores in postlegalizer-lowering for s128.
We don't support truncating s128 stores, so don't form them.
2021-07-20 00:04:34 -07:00
Kai Luo
e2ee27b20b [PowerPC] Fallback to base's implementation of shouldExpandAtomicCmpXchgInIR and shouldExpandAtomicCmpXchgInIR
If we can't decide `shouldExpandAtomicCmpXchgInIR` or `shouldExpandAtomicCmpXchgInIR` in PPC's implementation after https://reviews.llvm.org/rGb9c3941cd61de1e1b9e4f3311ddfa92394475f4b, resort to base's implementation.

This fixes internal build of OpenMP which uses atomic operations on float.

Reviewed By: jsji

Differential Revision: https://reviews.llvm.org/D106234
2021-07-20 06:14:24 +00:00
Matt Arsenault
30fa074c0a AArch64/GlobalISel: Preserve memory types 2021-07-19 20:21:05 -04:00
Derek Schuff
ad1f5457d2 [WebAssembly] Generate R_WASM_FUNCTION_OFFSET relocs in debuginfo sections
Debug info sections need R_WASM_FUNCTION_OFFSET_I32 relocs (with FK_Data_4 fixup
kinds) to refer to functions (instead of R_WASM_TABLE_INDEX as is used in data
sections). Usually this is done in a convoluted way, with unnamed temp data
symbols which target the start of the function, in which case
WasmObjectWriter::recordRelocation converts it to use the section symbol
instead. However in some cases the function can actually be undefined; in this
case the dwarf generator uses the function symbol (a named undefined function
symbol) instead. In that case the section-symbol transform doesn't work and we
need to generate the correct reloc type a different way. In this change
WebAssemblyWasmObjectWriter::getRelocType takes the fixup section type into
account to choose the correct reloc type.

Fixes PR50408
Differential Revision: https://reviews.llvm.org/D103557
2021-07-19 14:02:33 -07:00
Jonas Paulsson
6c0e6895d0 [SystemZ] Handle NoRegister in SystemZTargetLowering::emitMemMemWrapper().
Bugfix: The compiler should be able to generate a memset to nullptr.

Review: Ulrich Weigand
2021-07-19 20:04:44 +02:00
Amy Huang
fd972bb9fd Revert "[llvm][sve] Lowering for VLS truncating stores" because it
causes a seg fault (see https://reviews.llvm.org/D104471).

This reverts commit c305557acdaad453e32309d575fe9c6c7090c099.
2021-07-19 11:03:33 -07:00
Wouter van Oortmerssen
670944fb20 [WebAssembly] Support R_WASM_MEMORY_ADDR_TLS_SLEB64 for wasm64
Also fixed TLS tests swapping addr & value in store op
Differential Revision: https://reviews.llvm.org/D106096
2021-07-19 10:22:43 -07:00
Craig Topper
50302feb1d [SelectionDAG][RISCV] Use isSExtCheaperThanZExt to control whether sext or zext is used for constant folding any_extend.
RISCV would prefer a sign extended constant since that works better
with our constant materialization. We have an existing TLI hook we
use to control sign extension of setcc operands in type legalization.
That hook happens to do the right check we need here, but might be
straying from its original purpose. With only RISCV defining this
hook in tree, I wasn't sure if it was worth adding another hook
with identical behavior.

This is an alternative to D105785 where I tried to handle this in
the RISCV backend by not creating ANY_EXTENDs in some places.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D105918
2021-07-19 09:25:28 -07:00
Simon Pilgrim
142e60f40b [X86] Fix case of IsAfterLegalize argument. NFC.
Pulled out of D106280
2021-07-19 17:15:28 +01:00
David Green
5561ad8b36 [ARM] Remove PromotedBitwiseVT for NEON types
This removes the promotion of NEON AND, OR and XOR nodes to v2i32/v4i32,
treating them the same as the AArch64 and MVE backends where we just add
the relevant patterns for each legal type. This prevents a lot of
bitcasts from being added to the DAG, which have the potential to make
optimizations more difficult. It does mean adding extra patterns, and
some codegen can change due to the types now being legal, not promoted.

Differential Revision: https://reviews.llvm.org/D105588
2021-07-19 16:36:33 +01:00
Matt Arsenault
e574fd9d52 AArch64/GlobalISel: Cleanup unnecessary size checks in call lowering
The CCValAssign types should now be accurate, so these are no longer
necessary.
2021-07-19 11:01:30 -04:00
Jeremy Morse
f46321207f [InstrRef][X86] Drop debug instruction numbers from x87 instructions
Avoid a crash when using instruction referencing if x87 floating point
instructions are used. These instructions are significantly mutated when
they're rewritten from referring to registers, to referring to
floating-point-stack positions. As a result, their operands are re-ordered,
and (InstrRef) LiveDebugValues asserts when it sees a DBG_INSTR_REF
referring to a non-reg non-def register operand.

To fix this, drop the instruction numbers, and thus variable locations.
This patch adds a helper utility do do that.

Dropping the variable locations is sub-optimal, but applying DBG_VALUEs to
the $fp0 and similar registers is dropped on emission too. It seems we've
never done well at describing variables that live in x87 registers, at all.

Differential Revision: https://reviews.llvm.org/D105657
2021-07-19 15:08:27 +01:00
Jay Foad
96d8f2a1e0 [AMDGPU] Fix typo in comments idexen -> idxen 2021-07-19 13:39:30 +01:00
Kazushi (Jam) Marukawa
4ee28b4fec [VE] Set getExtendForAtomicOps to ISD::ANY_EXTEND
The implementation of subword atomics does not actually
guarantee the result is zero-extended, which now caused
failures after https://reviews.llvm.org/D101342 was landed.

Reviewed By: simoll

Differential Revision: https://reviews.llvm.org/D106225
2021-07-19 19:58:44 +09:00
Kazushi (Jam) Marukawa
b28e5b7910 [VE] Disable relative lookup table converter pass for VE
VE's linker, /opt/nec/ve/bin/nld, doesn't implement relative lookup table.
The relative lookup table is introduced by https://reviews.llvm.org/D94355,
but we need to disable it at the moment.

Reviewed By: simoll

Differential Revision: https://reviews.llvm.org/D106224
2021-07-19 19:25:33 +09:00
Florian Mayer
d23f26f0af [NFC] [MTE] helper for stack tagging lifetimes.
Reviewed By: eugenis, vitalybuka

Differential Revision: https://reviews.llvm.org/D106135
2021-07-19 11:09:16 +01:00
Cullen Rhodes
f91eaa7007 [AArch64][SME] Add SVE2 instructions added in SME
This patch adds support for the following instructions:

    SCLAMP, UCLAMP, REV, DUP (predicate)

The reference can be found here:
https://developer.arm.com/documentation/ddi0602/2021-06

Reviewed By: kmclaughlin

Differential Revision: https://reviews.llvm.org/D105577
2021-07-19 08:03:05 +00:00
David Green
eb1e95dbdf [ARM] Extend more reductions during lowering
This relaxes the VMLAV and VADDV reduction recognition code to handle
smaller than legal types, extending them as needed. That was already
handled for some reductions, this extends it to more types in a more
generic way. If a smaller than legal value is found it is extended to
the legal type as needed.

Differential Revision: https://reviews.llvm.org/D106051
2021-07-19 08:58:03 +01:00
Sander de Smalen
0ed0573527 [AArch64][SVE] Optimize bitcasts between unpacked half/i16 vectors.
The case for nxv2f32/nxv2i32 was already covered by D104573.
This patch builds on top of that by making the mechanism work for
nxv2[b]f16/nxv2i16, nxv4[b]f16/nxv4i16 as well.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D106138
2021-07-19 08:29:28 +01:00
Eli Friedman
6601be4419 [X86] Remove incorrect use of known bits in shuffle simplification.
This reverts commit 2a419a0b9957ebac9e11e4b43bc9fbe42a9207df.

The result of a shufflevector must not propagate poison from any element
other than the one noted in the shuffle mask.

The regressions outside of fptoui-may-overflow.ll can probably be
recovered some other way; for example, using isGuaranteedNotToBePoison.

See discussion on https://reviews.llvm.org/D106053 for more background.

Differential Revision: https://reviews.llvm.org/D106222
2021-07-18 18:13:11 -07:00
Simon Pilgrim
51a12d2ff0 [X86][SSE] matchShuffleWithPACK - avoid poison pollution from bitcasting multiple elements together.
D106053 exposed that we've not been taking into account that by bitcasting smaller elements together and then performing a ComputeKnownBits on the result we'd be allowing a poison element to influence other neighbouring elements being used in the pack. Instead we now peek through any existing bitcast to ensure that the source type already matches the width source of the pack node we're trying to match.

This has also been a chance to stop matchShuffleWithPACK creating unused nodes on the fly which could affect oneuse tests during shuffle lowering/combining.

The only regression we're seeing is due to being unable to peek through a bitcast as its on the other side of a extract_subvector - which should go away once we finally allow shuffle combining across different vector widths (by making matchShuffleWithPACK using const SelectionDAG& we've gotten closer to this - see PR45974).
2021-07-18 14:25:28 +01:00
Jon Roelofs
5cd63e9ec2 [AArch64][GlobalISel] Legalize bswap <2 x i16>
Differential revision: https://reviews.llvm.org/D105935
2021-07-17 15:31:15 -07:00
David Green
5acddf5b09 [ARM] Lower non-extended small gathers via truncated gathers.
Corollary to 1113e06821e6baffc84b8caf96a28bf62e6d28dc this allows us to
match gather that dont produce a full vector width results. They use an
extended gather which is truncated back to the original type.
2021-07-17 22:38:31 +01:00
Eli Friedman
e41e865b15 [AArch64] Prepare for changes to STEP_VECTOR.
Rewrite patterns to assume that the operand of STEP_VECTOR is a
constant. The old patterns will stop working when the operand is changed
from a Constant to a TargetConstant. (See D105673.)

Add test coverage for certain patterns that weren't exercised by
existing regression tests.

Differential Revision: https://reviews.llvm.org/D105847
2021-07-17 14:13:41 -07:00
Nikita Popov
2c68ecccc9 [OpaquePtr] Remove uses of CreateGEP() without element type
Remove uses of to-be-deprecated API. In cases where the correct
element type was not immediately obvious to me, fall back to
explicit getPointerElementType().
2021-07-17 22:56:27 +02:00
Craig Topper
d0f8047d37 [RISCV] Teach computeKnownBitsForTargetNode that VLENB will never be more than 65536/8. 2021-07-17 11:24:20 -07:00
Nikita Popov
6d3e7c783b [OpaquePtr] Remove uses of CreateConstGEP1_32() without element type
Remove uses of to-be-deprecated API. I've fallen back to calling
getPointerElementType() in some cases where the correct type wasn't
immediately obvious to me.
2021-07-17 18:32:36 +02:00
Nikita Popov
357756ecf6 [OpaquePtr] Remove uses of CreateConstGEP1_64() without element type
Remove uses of to-be-deprecated API.
2021-07-17 16:43:20 +02:00
Nikita Popov
be5af50e7d [BPF] Use elementtype attribute for preserve.array/struct.index intrinsics
Use the elementtype attribute introduced in D105407 for the
llvm.preserve.array/struct.index intrinsics. It carries the
element type of the GEP these intrinsics effectively encode.

This patch:

 * Adds a verifier check that the attribute is required.
 * Adds it in the IRBuilder methods for these intrinsics.
 * Autoupgrades old bitcode without the attribute.
 * Updates the lowering code to use the attribute rather than
   the pointer element type.
 * Updates lots of tests to specify the attribute.
 * Adds -force-opaque-pointers to the intrinsic-array.ll test
   to demonstrate they work now.

https://reviews.llvm.org/D106184
2021-07-17 11:09:18 +02:00
Craig Topper
173332d175 [RISCV] Manually emit the best shift for VSCALE lowering to improve codegen.
We assume VLENB is a multiple of 8 and previously relied on shift
pairs being optimized to an AND+SHL/SHR and computeKnownBits
removing the AND. This doesn't happen if (vlenb >> 3) gets CSEd
to have multiple uses. This patch manually emits the best shift
to workaround this.
2021-07-17 00:52:07 -07:00
jacquesguan
f4ec30d808 [RISCV] Make VLEN no greater than 65536
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D106134
2021-07-17 12:47:46 +08:00