This was assuming the vector types were MVTs, but they don't have to be.
Note that the concrete output of the test isn't very useful, since it's
dominated by nonsensical calling convention lowering for the weird types.
Differential Revision: https://reviews.llvm.org/D126505
The ToBeRemoved is used to remove any MachineInstructions that are no
longer needed, making sure we don't invalidate the iterator that is
currently in use by erasing the instruction straight away. This makes
issues for keeping the code in SSA from though, where subsequent
transforms that require SSA form may have been broken by previous
peepholes.
If, instead, we use make_early_inc_range the iteration issue shouldn't
be present, so long as we do not remove the subsequent instruction in
the peephole optimizations. That way the code between transforms is kept
in SSA form, meaning hopefully less things that can go wrong.
Differential Revision: https://reviews.llvm.org/D127296
Extend the TypeWidenVector case of PromoteIntRes_BITCAST to work
with TypeSize directly rather than silently casting to unsigned.
To accomplish this I've extended TypeSize with an interface that
essentially allows TypeSize division when both operands have the
same number of dimensions.
There still exists combinations of scalable vector bitcasts that
cause compiler crashes. I call these out by adding "is missing"
entries to sve-bitcast.
Depends on D126957.
Fixes: #55114
Differential Revision: https://reviews.llvm.org/D127126
Bitcasting between unpacked scalable vector types of different
element counts is not a NOP because the live elements are laid out
differently.
01234567
e.g. nxv2i32 = XX??XX??
nxv4f16 = X?X?X?X?
Differential Revision: https://reviews.llvm.org/D126957
isDef32 would attempt to make a guess at which SelectionDag nodes were
32bit sources, and use the nature of 32bit AArch64 instructions
implicitly zeroing the upper register half to not emit zext that were
expected to already be zero. This was a bit fragile though, needing to
guess at the correct opcodes that do not become 32bit defs later in
ISel.
This patch removed isDef32, relying on the AArch64MIPeephole optimizer
to remove redundant SUBREG_TO_REG nodes. A part of
SelectArithExtendedRegister was left with the same logic as a heuristic
to prevent some regressions from it picking less optimal sequences.
The AArch64MIPeepholeOpt pass also needs to be taught that a COPY from a
FPR will become a FMOVSWr, which it lowers immediately to make sure that
remains true through register allocation.
Fixes#55833
Differential Revision: https://reviews.llvm.org/D127154
Use the query that doesn't assert if TracksLiveness isn't set, which
needs to always be available. We also need to start printing liveins
regardless of TracksLiveness.
Patch adds new GICombineRules for G_ADD:
G_ADD(x, G_SUB(y, x)) -> y
G_ADD(G_SUB(y, x), x) -> y
Patch additionally adds new combine tests for AArch64 target for
these new rules.
Reviewed by: paquette
Differential Revision: https://reviews.llvm.org/D87936
This adds a fold of add(x, shuffle(x, <1,0,3,2,5,4,...>), into
shuffle(addp(x), <0,0,1,1,2,2,..>. The ADDP instruction takes two
vectors and returns one, adding adjacent pairs. So we match x in a
custom combine as it is lowered from a v8i32. The original code
would be 2 rev64 and 2 add, with the new code being a single addp
with a zip1;zip2 shuffle, producing smaller code.
Differential Revision: https://reviews.llvm.org/D126686
Patch enables custom lowering for MVT::nxv4bf16 because otherwise
the refactored test file triggers a selection failure.
The reason for the refactoring it to highlight cases where the
generated code is wrong.
If all available vals to basic block are the same - do not build new phi node and
just use this value.
Reviewed By: sameerds
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D126525
LowerINSERT_SUBVECTOR emits AArch64ISD::UUNPK## when lowering
scalable vector floating point INSERT_SUBVECTOR. However, these
nodes only make sense for integer types and thus isel patterns do
not exist for floating point, which leads to isel failures.
This patch ensures floating point operands are cast to integer
before the core lowering takes place.
Fixes: #55037
Differential Revision: https://reviews.llvm.org/D126487
This enabled opaque pointers by default in LLVM. The effect of this
is twofold:
* If IR that contains *neither* explicit ptr nor %T* types is passed
to tools, we will now use opaque pointer mode, unless
-opaque-pointers=0 has been explicitly passed.
* Users of LLVM as a library will now default to opaque pointers.
It is possible to opt-out by calling setOpaquePointers(false) on
LLVMContext.
A cmake option to toggle this default will not be provided. Frontends
or other tools that want to (temporarily) keep using typed pointers
should disable opaque pointers via LLVMContext.
Differential Revision: https://reviews.llvm.org/D126689
Adds MVT::v128i2, MVT::v64i4, and implied MVT::i2, MVT::i4.
Keeps MVT::i2, MVT::i4 lowering actions as expand, which should be
removed once targets set this explicitly.
Adjusts 11 lit tests to reflect slightly different behavior during
DAG combine.
Differential Revision: https://reviews.llvm.org/D125247
Adds MVT::v128i2, MVT::v64i4, and implied MVT::i2, MVT::i4.
Keeps MVT::i2, MVT::i4 lowering actions as `expand`, which should be
removed once targets set this explicitly.
Adjusts 11 lit tests to reflect slightly different behavior during
DAG combine.
Differential Revision: https://reviews.llvm.org/D125247
Following discussion on D120261 and D121208 it seems better to remove the
concept of Streaming SVE from the subtarget/assembler predicates and
instead reason about 'SVE' and 'SME' as its higher level features, rather
than trying to model this runtime mode through explicit feature flags.
This patch is largely NFC.
Reviewed By: paulwalker-arm, david-arm
Differential Revision: https://reviews.llvm.org/D125977
This is a small addition to D99662, which added machine combiner
patterns for FMUL(DUP(..)). Due to the way these are generated from
ISel, they may also be FMUL(COPY(DUP(..))), which this patch now
ignores the no-op COPY in.
Differential Revision: https://reviews.llvm.org/D126632
It appears that float support is complete, or at least, the stackmap records
emitted are not inconceivable (I must admit that I don't know about many of the
architectures under test here).
One curiosity, the SystemZ tests highlight an undocumented (or maybe incorrect)
quirk of the stackmap format: in the case of a Register record, the Offset or
SmallConstant field can encode a sub-register index! I've only ever seen this
field zero for Register entries up until now.
If both a v2i32 DUP(x) and a v4i32 DUP(x) node exists, we can re-use the
larger node using a vector extract to obtain the smaller. This comes up
in the smull/smlal code, but needs a small fixup to allow the smull2
code in tryExtendDUPToExtractHigh/performAddSubLongCombine to still
match smull2 extracts.
Differential Revision: https://reviews.llvm.org/D126449
Today, text section prefixes (none, .unlikely, .hot, and .unkown) are determined based on PGO profile. However, Propeller may deem a function hot when PGO doesn't. Besides, when `-Wl,-keep-text-section-prefix=true` Propeller cannot enforce a global section ordering as the linker can only reorder sections within each output section (.text, .text.hot, .text.unlikely).
This patch promotes all functions with Propeller profiles (functions listed in the basic-block-sections profile) to .text.hot. The feature is hidden behind the flag `--bbsections-guided-section-prefix` which defaults to `true`.
The new implementation refactors the parsing of basic block sections profile into a new `BasicBlockSectionsProfileReader` analysis pass. This allows us to use the information earlier in `CodeGenPrepare` in order to set the functions text prefix. `BasicBlockSectionsProfileReader` will be used both by `BasicBlockSections` pass and `CodeGenPrepare`.
Differential Revision: https://reviews.llvm.org/D122930
treated as Copy instruction in MCP.
This is then used in AArch64 to remove copy instructions after taildup
ran in machine block placement
Differential Revision: https://reviews.llvm.org/D125335
reapply 62a9b36fcf728b104ea87e6eb84c0be69b779df7 and fix module build
failue:
1: remove MachineCycleInfoWrapperPass in MachinePassRegistry.def
MachineCycleInfoWrapperPass is a anylysis pass, should not be there.
2: move the definition for MachineCycleInfoPrinterPass to cpp file.
Otherwise, there are module conflicit for MachineCycleInfoWrapperPass
in MachinePassRegistry.def and MachineCycleAnalysis.h after
62a9b36fcf728b104ea87e6eb84c0be69b779df7.
MachineCycle can handle irreducible loop. Natural loop
analysis (MachineLoop) can not return correct loop depth if
the loop is irreducible loop. And MachineSink is sensitive
to the loop depth, see MachineSinking::isProfitableToSinkTo().
This patch tries to use MachineCycle so that we can handle
irreducible loop better.
Reviewed By: sameerds, MatzeB
Differential Revision: https://reviews.llvm.org/D123995
MachineCycle can handle irreducible loop. Natural loop
analysis (MachineLoop) can not return correct loop depth if
the loop is irreducible loop. And MachineSink is sensitive
to the loop depth, see MachineSinking::isProfitableToSinkTo().
This patch tries to use MachineCycle so that we can handle
irreducible loop better.
Reviewed By: sameerds, MatzeB
Differential Revision: https://reviews.llvm.org/D123995
If the VT is i2, then 2 is really -2.
Test has not been commited yet, but diff shows the change.
Fixes PR55644.
Differential Revision: https://reviews.llvm.org/D126213
This diff adds tests that check the currently-working stackmap cases for i128.
This will help ensure no regressions are later introduced by D125680 (when
ready).
Note that i128 stackmap support is currently incomplete, so we cant test all
i128 functionality:
i128 constants >= 2^{63} crash LLVM
non-constant i128s crash LLVM
So this change tests only constant i128 operands of value < 2^{63}.
A couple of incorrect comments are also fixed.
This patch adds an AArch64 specific PostRA MachineScheduler to try to schedule
STP Q's to the same base-address in ascending order of offsets. We have found
this to improve performance on Neoverse N1 and should not hurt other AArch64
cores.
Differential Revision: https://reviews.llvm.org/D125377
Currently reassociating add expressions can lead to failing to select
(u|s)mlal. Implement isReassocProfitable to skip reassociating
expressions that can be lowered to (u|s)mlal.
The same issue exists for the *mlsl variants as well, but the DAG
combiner doesn't use the isReassocProfitable hook before reassociating.
To be fixed in a follow-up commit as this requires DAGCombiner changes
as well.
Reviewed By: dmgreen
Differential Revision: https://reviews.llvm.org/D125895
This patch implements a for a target specific optimization that replaces
the cmp and csel from cttz with an and mask.
Recommitted with a fix for truncated value sizes.
Differential Revision: https://reviews.llvm.org/D123782
Support the "-fzero-call-used-regs" option on AArch64. This involves much less
specialized code than the X86 version. Most of the checks can be done with
TableGen.
Reviewed By: nickdesaulniers, MaskRay
Differential Revision: https://reviews.llvm.org/D124836
A TBL instruction will fill out-of-range values with 0's, something used
in D121139 to turn tbl2 with a zero input into tbl1s. This works OK for
v16i8, but for v8i8 the input is still treated as a v16i8, so
out-of-range values (like a lane index of 8) would end up loading values
from the top half of the input register. Clean this up by detecting the
out of range values and making sure they really use out of range values.
There is a fix for swapped indices of 64bit input vectors too, which
could be incorrectly adjusted if the zerovector was the first operand.
Fixes#55545
Differential Revision: https://reviews.llvm.org/D125865