5629 Commits

Author SHA1 Message Date
Ahmed Bougacha
c68b469e07 [AArch64][SVE] Don't crash on pre-legalizer types in extload combine.
This was assuming the vector types were MVTs, but they don't have to be.

Note that the concrete output of the test isn't very useful, since it's
dominated by nonsensical calling convention lowering for the weird types.

Differential Revision: https://reviews.llvm.org/D126505
2022-06-09 10:33:21 -07:00
Guillaume Chatelet
dc3367970e [SelectionDAG] Handle bzero/memset libcalls globally instead of per target
Differential Revision: https://reviews.llvm.org/D127279
2022-06-09 08:34:55 +00:00
Florian Mayer
0593ce5f0b [MC] Add 'G' to augmentation string for MTE instrumented functions
This was agreed on in
https://lists.llvm.org/pipermail/llvm-dev/2020-May/141345.html

The thread proposed two options
* add a character to augmentation string and handle in libuwind
* use a separate personality function.

It was determined that this is the simpler and better option.

This is part of ARM's Aarch64 ABI:
https://github.com/ARM-software/abi-aa/blob/main/aadwarf64/aadwarf64.rst#id22

The next step after this is teaching libunwind to untag when this
augmentation character is set.

Reviewed By: MaskRay, eugenis

Differential Revision: https://reviews.llvm.org/D127007
2022-06-08 12:36:32 -07:00
David Green
a1aef4f374 [AArch64] Remove ToBeRemoved from AArch64MIPeepholeOpt
The ToBeRemoved is used to remove any MachineInstructions that are no
longer needed, making sure we don't invalidate the iterator that is
currently in use by erasing the instruction straight away. This makes
issues for keeping the code in SSA from though, where subsequent
transforms that require SSA form may have been broken by previous
peepholes.

If, instead, we use make_early_inc_range the iteration issue shouldn't
be present, so long as we do not remove the subsequent instruction in
the peephole optimizations. That way the code between transforms is kept
in SSA form, meaning hopefully less things that can go wrong.

Differential Revision: https://reviews.llvm.org/D127296
2022-06-08 17:26:07 +01:00
David Green
33ead6e444 [AArch64] Add tests for bitcast high register extracts. NFC 2022-06-08 15:26:31 +01:00
Paul Walker
d88354213c [SelectionDAG] Remove invalid TypeSize conversion from PromoteIntRes_BITCAST.
Extend the TypeWidenVector case of PromoteIntRes_BITCAST to work
with TypeSize directly rather than silently casting to unsigned.

To accomplish this I've extended TypeSize with an interface that
essentially allows TypeSize division when both operands have the
same number of dimensions.

There still exists combinations of scalable vector bitcasts that
cause compiler crashes. I call these out by adding "is missing"
entries to sve-bitcast.

Depends on D126957.
Fixes: #55114

Differential Revision: https://reviews.llvm.org/D127126
2022-06-08 10:30:07 +01:00
Paul Walker
a1121c31d8 [SVE] Fix incorrect code generation for bitcasts of unpacked vector types.
Bitcasting between unpacked scalable vector types of different
element counts is not a NOP because the live elements are laid out
differently.
               01234567
e.g. nxv2i32 = XX??XX??
     nxv4f16 = X?X?X?X?

Differential Revision: https://reviews.llvm.org/D126957
2022-06-08 10:30:07 +01:00
David Green
bccbf5276e [AArch64] Remove isDef32
isDef32 would attempt to make a guess at which SelectionDag nodes were
32bit sources, and use the nature of 32bit AArch64 instructions
implicitly zeroing the upper register half to not emit zext that were
expected to already be zero. This was a bit fragile though, needing to
guess at the correct opcodes that do not become 32bit defs later in
ISel.

This patch removed isDef32, relying on the AArch64MIPeephole optimizer
to remove redundant SUBREG_TO_REG nodes. A part of
SelectArithExtendedRegister was left with the same logic as a heuristic
to prevent some regressions from it picking less optimal sequences.
The AArch64MIPeepholeOpt pass also needs to be taught that a COPY from a
FPR will become a FMOVSWr, which it lowers immediately to make sure that
remains true through register allocation.

Fixes #55833

Differential Revision: https://reviews.llvm.org/D127154
2022-06-07 18:57:59 +01:00
Matt Arsenault
56303223ac llvm-reduce: Don't assert on functions which don't track liveness
Use the query that doesn't assert if TracksLiveness isn't set, which
needs to always be available. We also need to start printing liveins
regardless of TracksLiveness.
2022-06-07 10:00:25 -04:00
David Green
6468feaeac [AArch64] Regenerate arm64-shifted-sext.ll and add a test from #55833. NFC 2022-06-07 13:55:53 +01:00
Michael Kitzan
b7fcf6632f [GISel] Add new combines for G_ADD
Patch adds new GICombineRules for G_ADD:

G_ADD(x, G_SUB(y, x)) -> y
G_ADD(G_SUB(y, x), x) -> y

Patch additionally adds new combine tests for AArch64 target for
these new rules.

Reviewed by: paquette

Differential Revision: https://reviews.llvm.org/D87936
2022-06-06 11:19:45 -07:00
David Green
4ea1b43527 [AArch64] Generate ADDP from shuffled add
This adds a fold of add(x, shuffle(x, <1,0,3,2,5,4,...>), into
shuffle(addp(x), <0,0,1,1,2,2,..>. The ADDP instruction takes two
vectors and returns one, adding adjacent pairs. So we match x in a
custom combine as it is lowered from a v8i32. The original code
would be 2 rev64 and 2 add, with the new code being a single addp
with a zip1;zip2 shuffle, producing smaller code.

Differential Revision: https://reviews.llvm.org/D126686
2022-06-06 11:39:51 +01:00
Paul Walker
2dde272db7 [SVE] Refactor sve-bitcast.ll to include all combinations for legal types.
Patch enables custom lowering for MVT::nxv4bf16 because otherwise
the refactored test file triggers a selection failure.

The reason for the refactoring it to highlight cases where the
generated code is wrong.
2022-06-03 12:09:19 +01:00
David Green
79e3b043e5 [AArch64] Add extra addp codegen tests. NFC 2022-06-03 11:36:40 +01:00
Serguei Katkov
24e16e4af2 [SSAUpdaterImpl] Do not generate phi node with all the same incoming values
If all available vals to basic block are the same - do not build new phi node and
just use this value.

Reviewed By: sameerds
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D126525
2022-06-03 12:24:33 +07:00
Serguei Katkov
c4d955dd7f [MachineSSAUpdate] Add a test for redundant phi generation. 2022-06-03 11:27:14 +07:00
Paul Walker
48ea26a387 [SVE] Fixed custom lowering of ISD::INSERT_SUBVECTOR.
LowerINSERT_SUBVECTOR emits AArch64ISD::UUNPK## when lowering
scalable vector floating point INSERT_SUBVECTOR. However, these
nodes only make sense for integer types and thus isel patterns do
not exist for floating point, which leads to isel failures.

This patch ensures floating point operands are cast to integer
before the core lowering takes place.

Fixes: #55037

Differential Revision: https://reviews.llvm.org/D126487
2022-06-02 14:51:04 +01:00
Nikita Popov
41d5033eb1 [IR] Enable opaque pointers by default
This enabled opaque pointers by default in LLVM. The effect of this
is twofold:

* If IR that contains *neither* explicit ptr nor %T* types is passed
  to tools, we will now use opaque pointer mode, unless
  -opaque-pointers=0 has been explicitly passed.
* Users of LLVM as a library will now default to opaque pointers.
  It is possible to opt-out by calling setOpaquePointers(false) on
  LLVMContext.

A cmake option to toggle this default will not be provided. Frontends
or other tools that want to (temporarily) keep using typed pointers
should disable opaque pointers via LLVMContext.

Differential Revision: https://reviews.llvm.org/D126689
2022-06-02 09:40:56 +02:00
Hendrik Greving
a92ed167f2 [ValueTypes] Define MVTs for v128i2/v64i4 as well as i2 and i4.
Adds MVT::v128i2, MVT::v64i4, and implied MVT::i2, MVT::i4.

Keeps MVT::i2, MVT::i4 lowering actions as expand, which should be
removed once targets set this explicitly.

Adjusts 11 lit tests to reflect slightly different behavior during
DAG combine.

Differential Revision: https://reviews.llvm.org/D125247
2022-06-02 00:49:11 +00:00
Hendrik Greving
e9d05cc7d8 Revert "[ValueTypes] Define MVTs for v128i2/v64i4 as well as i2 and i4."
This reverts commit 430ac5c3029c52e391e584c6d4447e6e361fae99.

Due to failures in Clang tests.

Differential Revision: https://reviews.llvm.org/D125247
2022-06-01 13:27:49 -07:00
Hendrik Greving
430ac5c302 [ValueTypes] Define MVTs for v128i2/v64i4 as well as i2 and i4.
Adds MVT::v128i2, MVT::v64i4, and implied MVT::i2, MVT::i4.

Keeps MVT::i2, MVT::i4 lowering actions as `expand`, which should be
removed once targets set this explicitly.

Adjusts 11 lit tests to reflect slightly different behavior during
DAG combine.

Differential Revision: https://reviews.llvm.org/D125247
2022-06-01 12:48:01 -07:00
Fangrui Song
873d2aff42 [AArch64][test] Replace -march with -mtriple for llc RUN lines
-march is error-prone: -march inherits the OS and environment from the default
target triple. Use -mtriple which is more common.
2022-05-31 22:39:43 -07:00
Alexander Shaposhnikov
a72cc958a3 [CodeGen][AArch64] Add support for LDAPR
This diff adds support for LDAPR (RCPC extension)
(https://github.com/llvm/llvm-project/issues/55561).

Differential revision: https://reviews.llvm.org/D126250

Test plan: ninja check-all
2022-05-31 21:40:50 +00:00
Sander de Smalen
9c38fc111b [AArch64] Remove references to Streaming SVE from target features.
Following discussion on D120261 and D121208 it seems better to remove the
concept of Streaming SVE from the subtarget/assembler predicates and
instead reason about 'SVE' and 'SME' as its higher level features, rather
than trying to model this runtime mode through explicit feature flags.

This patch is largely NFC.

Reviewed By: paulwalker-arm, david-arm

Differential Revision: https://reviews.llvm.org/D125977
2022-05-31 16:25:01 +02:00
David Green
5cb14dc5a3 [AArch64] Look through copy in MachineCombiner FMUL patterns.
This is a small addition to D99662, which added machine combiner
patterns for FMUL(DUP(..)). Due to the way these are generated from
ISel, they may also be FMUL(COPY(DUP(..))), which this patch now
ignores the no-op COPY in.

Differential Revision: https://reviews.llvm.org/D126632
2022-05-31 09:28:00 +01:00
Edd Barrett
d245974e1a Test stackmap support for floating point types.
It appears that float support is complete, or at least, the stackmap records
emitted are not inconceivable (I must admit that I don't know about many of the
architectures under test here).

One curiosity, the SystemZ tests highlight an undocumented (or maybe incorrect)
quirk of the stackmap format: in the case of a Register record, the Offset or
SmallConstant field can encode a sub-register index! I've only ever seen this
field zero for Register entries up until now.
2022-05-30 10:49:32 +01:00
David Green
99b0078064 [AArch64] Tests for showing MachineCombiner COPY patterns. NFC 2022-05-30 10:47:44 +01:00
David Green
9a3144d078 [AArch64] Reuse larger DUP if available
If both a v2i32 DUP(x) and a v4i32 DUP(x) node exists, we can re-use the
larger node using a vector extract to obtain the smaller. This comes up
in the smull/smlal code, but needs a small fixup to allow the smull2
code in tryExtendDUPToExtractHigh/performAddSubLongCombine to still
match smull2 extracts.

Differential Revision: https://reviews.llvm.org/D126449
2022-05-29 19:42:13 +01:00
Serge Pavlov
bdd0093f4d [GlobalISel] Add G_IS_FPCLASS
Add a generic opcode to represent `llvm.is_fpclass` intrinsic.

Differential Revision: https://reviews.llvm.org/D121454
2022-05-27 13:49:47 +07:00
Rahman Lavaee
3aa249329f Revert "[Propeller] Promote functions with propeller profiles to .text.hot."
This reverts commit 4d8d2580c53e130c3c3dd3877384301e3c495554.
2022-05-26 18:45:40 -07:00
Rahman Lavaee
4d8d2580c5 [Propeller] Promote functions with propeller profiles to .text.hot.
Today, text section prefixes (none, .unlikely, .hot, and .unkown) are determined based on PGO profile. However, Propeller may deem a function hot when PGO doesn't. Besides, when `-Wl,-keep-text-section-prefix=true` Propeller cannot enforce a global section ordering as the linker can only reorder sections within each output section (.text, .text.hot, .text.unlikely).

This patch promotes all functions with Propeller profiles (functions listed in the basic-block-sections profile) to .text.hot. The feature is hidden behind the flag `--bbsections-guided-section-prefix` which defaults to `true`.

The new implementation refactors the parsing of basic block sections profile into a new `BasicBlockSectionsProfileReader` analysis pass. This allows us to use the information earlier in `CodeGenPrepare` in order to set the functions text prefix. `BasicBlockSectionsProfileReader` will be used both by `BasicBlockSections` pass and `CodeGenPrepare`.

Differential Revision: https://reviews.llvm.org/D122930
2022-05-26 16:23:21 -07:00
Adrian Tong
7c13ae6490 Give option to use isCopyInstr to determine which MI is
treated as Copy instruction in MCP.

This is then used in AArch64 to remove copy instructions after taildup
ran in machine block placement

Differential Revision: https://reviews.llvm.org/D125335
2022-05-26 18:43:16 +00:00
Chen Zheng
d79275238f [MachineSink] replace MachineLoop with MachineCycle
reapply 62a9b36fcf728b104ea87e6eb84c0be69b779df7 and fix module build
failue:
1: remove MachineCycleInfoWrapperPass in MachinePassRegistry.def
   MachineCycleInfoWrapperPass is a anylysis pass, should not be there.
2: move the definition for MachineCycleInfoPrinterPass to cpp file.

Otherwise, there are module conflicit for MachineCycleInfoWrapperPass
in MachinePassRegistry.def and MachineCycleAnalysis.h after
62a9b36fcf728b104ea87e6eb84c0be69b779df7.

MachineCycle can handle irreducible loop. Natural loop
analysis (MachineLoop) can not return correct loop depth if
the loop is irreducible loop. And MachineSink is sensitive
to the loop depth, see MachineSinking::isProfitableToSinkTo().

This patch tries to use MachineCycle so that we can handle
irreducible loop better.

Reviewed By: sameerds, MatzeB

Differential Revision: https://reviews.llvm.org/D123995
2022-05-26 06:45:23 -04:00
Chen Zheng
80c4910f3d Revert "[MachineSink] replace MachineLoop with MachineCycle"
This reverts commit 62a9b36fcf728b104ea87e6eb84c0be69b779df7.
Cause build failure on lldb incremental buildbot:
https://green.lab.llvm.org/green/view/LLDB/job/lldb-cmake/43994/changes
2022-05-24 22:43:37 -04:00
Paul Walker
6f215ca680 [SelectionDAG] Add support to widen ISD::STEP_VECTOR operations.
Fixes: #55165

Differential Revision: https://reviews.llvm.org/D126168
2022-05-24 22:42:37 +01:00
Chen Zheng
62a9b36fcf [MachineSink] replace MachineLoop with MachineCycle
MachineCycle can handle irreducible loop. Natural loop
analysis (MachineLoop) can not return correct loop depth if
the loop is irreducible loop. And MachineSink is sensitive
to the loop depth, see MachineSinking::isProfitableToSinkTo().

This patch tries to use MachineCycle so that we can handle
irreducible loop better.

Reviewed By: sameerds, MatzeB

Differential Revision: https://reviews.llvm.org/D123995
2022-05-24 01:16:19 -04:00
Craig Topper
569d8945f3 [DAGCombiner][AArch64] Don't fold (smulo x, 2) -> (saddo x, x) if VT is i2.
If the VT is i2, then 2 is really -2.

Test has not been commited yet, but diff shows the change.

Fixes PR55644.

Differential Revision: https://reviews.llvm.org/D126213
2022-05-23 11:13:57 -07:00
Craig Topper
75eb0576de [AArch64] Add test case for pr55644. NFC 2022-05-23 11:13:57 -07:00
Edd Barrett
c5e5cf1258 Test stackmap support for i128
This diff adds tests that check the currently-working stackmap cases for i128.
This will help ensure no regressions are later introduced by D125680 (when
ready).

Note that i128 stackmap support is currently incomplete, so we cant test all
i128 functionality:

    i128 constants >= 2^{63} crash LLVM
    non-constant i128s crash LLVM

So this change tests only constant i128 operands of value < 2^{63}.

A couple of incorrect comments are also fixed.
2022-05-23 11:56:24 +01:00
Simon Pilgrim
dd231f02a3 [AArch64] Regenerate andandshift.ll test checks 2022-05-23 11:48:24 +01:00
Andre Vieira
572fc7d2fd [AArch64] Order STP Q's by ascending address
This patch adds an AArch64 specific PostRA MachineScheduler to try to schedule
STP Q's to the same base-address in ascending order of offsets. We have found
this to improve performance on Neoverse N1 and should not hurt other AArch64
cores.

Differential Revision: https://reviews.llvm.org/D125377
2022-05-23 09:50:44 +01:00
Florian Hahn
0cc981e021
[AArch64] implement isReassocProfitable, disable for (u|s)mlal.
Currently reassociating add expressions can lead to failing to select
(u|s)mlal. Implement isReassocProfitable to skip reassociating
expressions that can be lowered to (u|s)mlal.

The same issue exists for the *mlsl variants as well, but the DAG
combiner doesn't use the isReassocProfitable hook before reassociating.
To be fixed in a follow-up commit as this requires DAGCombiner changes
as well.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D125895
2022-05-23 09:39:00 +01:00
David Green
6ef5e242f2 [AArch64] Fix assumptions on input type of tryCombineFixedPointConvert
It is possible for the input type to not be v2i64 or v4i32, so weaken
the assertion to a return, fixing the crash in the new test.

Fixes #55606
2022-05-23 08:55:54 +01:00
Paul Walker
258dac43d6 [SVE] Enable use of 32bit gather/scatter indices for fixed length vectors
Differential Revision: https://reviews.llvm.org/D125193
2022-05-22 12:32:30 +01:00
Bill Wendling
d497129f9b [AArch64] Use proper instruction mnemonics for FPRs
The FPR128 regs need MOVIv2d_ns and SVE regs need DUP_ZI_D.

Differential Revision: https://reviews.llvm.org/D126083
2022-05-20 12:02:26 -07:00
Rahul Anand R
534ea8bca5 [AArch64] Generate AND in place of CSEL for predicated CTTZ
This patch implements a for a target specific optimization that replaces
the cmp and csel from cttz with an and mask.

Recommitted with a fix for truncated value sizes.

Differential Revision: https://reviews.llvm.org/D123782
2022-05-20 13:41:32 +01:00
Bill Wendling
6e00a34cdb [AArch64] Add support for -fzero-call-used-regs
Support the "-fzero-call-used-regs" option on AArch64. This involves much less
specialized code than the X86 version. Most of the checks can be done with
TableGen.

Reviewed By: nickdesaulniers, MaskRay

Differential Revision: https://reviews.llvm.org/D124836
2022-05-19 16:58:28 -07:00
David Green
602f81ec33 [AArch64] Fix zero element TBL indices
A TBL instruction will fill out-of-range values with 0's, something used
in D121139 to turn tbl2 with a zero input into tbl1s. This works OK for
v16i8, but for v8i8 the input is still treated as a v16i8, so
out-of-range values (like a lane index of 8) would end up loading values
from the top half of the input register. Clean this up by detecting the
out of range values and making sure they really use out of range values.
There is a fix for swapped indices of 64bit input vectors too, which
could be incorrectly adjusted if the zerovector was the first operand.

Fixes #55545

Differential Revision: https://reviews.llvm.org/D125865
2022-05-19 13:54:35 +01:00
David Green
dd644ddf85 [AArch64] Extend zero vector TBL codegen tests. NFC 2022-05-19 13:01:55 +01:00
Jon Roelofs
d699e54ca2 Fix an or+and miscompile w/ GlobalISel
Fixes #55284
2022-05-18 19:09:47 -07:00