llvm-project

Author	SHA1	Message	Date
Ahmed Bougacha	c68b469e07	[AArch64][SVE] Don't crash on pre-legalizer types in extload combine. This was assuming the vector types were MVTs, but they don't have to be. Note that the concrete output of the test isn't very useful, since it's dominated by nonsensical calling convention lowering for the weird types. Differential Revision: https://reviews.llvm.org/D126505	2022-06-09 10:33:21 -07:00
Guillaume Chatelet	dc3367970e	[SelectionDAG] Handle bzero/memset libcalls globally instead of per target Differential Revision: https://reviews.llvm.org/D127279	2022-06-09 08:34:55 +00:00
Florian Mayer	0593ce5f0b	[MC] Add 'G' to augmentation string for MTE instrumented functions This was agreed on in https://lists.llvm.org/pipermail/llvm-dev/2020-May/141345.html The thread proposed two options * add a character to augmentation string and handle in libunwind * use a separate personality function. It was determined that this is the simpler and better option. This is part of ARM's AArch64 ABI: https://github.com/ARM-software/abi-aa/blob/main/aadwarf64/aadwarf64.rst#id22 The next step after this is teaching libunwind to untag when this augmentation character is set. Reviewed By: MaskRay, eugenis Differential Revision: https://reviews.llvm.org/D127007	2022-06-08 12:36:32 -07:00
David Green	a1aef4f374	[AArch64] Remove ToBeRemoved from AArch64MIPeepholeOpt The ToBeRemoved is used to remove any MachineInstructions that are no longer needed, making sure we don't invalidate the iterator that is currently in use by erasing the instruction straight away. This causes problems for keeping the code in SSA form though, where subsequent transforms that require SSA form may have been broken by previous peepholes. If, instead, we use make_early_inc_range the iteration issue shouldn't be present, so long as we do not remove the subsequent instruction in the peephole optimizations. That way the code between transforms is kept in SSA form, meaning hopefully fewer things that can go wrong. Differential Revision: https://reviews.llvm.org/D127296	2022-06-08 17:26:07 +01:00
David Green	33ead6e444	[AArch64] Add tests for bitcast high register extracts. NFC	2022-06-08 15:26:31 +01:00
Paul Walker	d88354213c	[SelectionDAG] Remove invalid TypeSize conversion from PromoteIntRes_BITCAST. Extend the TypeWidenVector case of PromoteIntRes_BITCAST to work with TypeSize directly rather than silently casting to unsigned. To accomplish this I've extended TypeSize with an interface that essentially allows TypeSize division when both operands have the same number of dimensions. There still exist combinations of scalable vector bitcasts that cause compiler crashes. I call these out by adding "is missing" entries to sve-bitcast. Depends on D126957. Fixes: #55114 Differential Revision: https://reviews.llvm.org/D127126	2022-06-08 10:30:07 +01:00
Paul Walker	a1121c31d8	[SVE] Fix incorrect code generation for bitcasts of unpacked vector types. Bitcasting between unpacked scalable vector types of different element counts is not a NOP because the live elements are laid out differently. 01234567 e.g. nxv2i32 = XX??XX?? nxv4f16 = X?X?X?X? Differential Revision: https://reviews.llvm.org/D126957	2022-06-08 10:30:07 +01:00
David Green	bccbf5276e	[AArch64] Remove isDef32 isDef32 would attempt to make a guess at which SelectionDag nodes were 32bit sources, and use the nature of 32bit AArch64 instructions implicitly zeroing the upper register half to not emit zext that were expected to already be zero. This was a bit fragile though, needing to guess at the correct opcodes that do not become 32bit defs later in ISel. This patch removed isDef32, relying on the AArch64MIPeephole optimizer to remove redundant SUBREG_TO_REG nodes. A part of SelectArithExtendedRegister was left with the same logic as a heuristic to prevent some regressions from it picking less optimal sequences. The AArch64MIPeepholeOpt pass also needs to be taught that a COPY from a FPR will become a FMOVSWr, which it lowers immediately to make sure that remains true through register allocation. Fixes #55833 Differential Revision: https://reviews.llvm.org/D127154	2022-06-07 18:57:59 +01:00
Matt Arsenault	56303223ac	llvm-reduce: Don't assert on functions which don't track liveness Use the query that doesn't assert if TracksLiveness isn't set, which needs to always be available. We also need to start printing liveins regardless of TracksLiveness.	2022-06-07 10:00:25 -04:00
David Green	6468feaeac	[AArch64] Regenerate arm64-shifted-sext.ll and add a test from #55833 . NFC	2022-06-07 13:55:53 +01:00
Michael Kitzan	b7fcf6632f	[GISel] Add new combines for G_ADD Patch adds new GICombineRules for G_ADD: G_ADD(x, G_SUB(y, x)) -> y G_ADD(G_SUB(y, x), x) -> y Patch additionally adds new combine tests for AArch64 target for these new rules. Reviewed by: paquette Differential Revision: https://reviews.llvm.org/D87936	2022-06-06 11:19:45 -07:00
David Green	4ea1b43527	[AArch64] Generate ADDP from shuffled add This adds a fold of add(x, shuffle(x, <1,0,3,2,5,4,...>), into shuffle(addp(x), <0,0,1,1,2,2,..>. The ADDP instruction takes two vectors and returns one, adding adjacent pairs. So we match x in a custom combine as it is lowered from a v8i32. The original code would be 2 rev64 and 2 add, with the new code being a single addp with a zip1;zip2 shuffle, producing smaller code. Differential Revision: https://reviews.llvm.org/D126686	2022-06-06 11:39:51 +01:00
Paul Walker	2dde272db7	[SVE] Refactor sve-bitcast.ll to include all combinations for legal types. Patch enables custom lowering for MVT::nxv4bf16 because otherwise the refactored test file triggers a selection failure. The reason for the refactoring is to highlight cases where the generated code is wrong.	2022-06-03 12:09:19 +01:00
David Green	79e3b043e5	[AArch64] Add extra addp codegen tests. NFC	2022-06-03 11:36:40 +01:00
Serguei Katkov	24e16e4af2	[SSAUpdaterImpl] Do not generate phi node with all the same incoming values If all available vals to basic block are the same - do not build new phi node and just use this value. Reviewed By: sameerds Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D126525	2022-06-03 12:24:33 +07:00
Serguei Katkov	c4d955dd7f	[MachineSSAUpdate] Add a test for redundant phi generation.	2022-06-03 11:27:14 +07:00
Paul Walker	48ea26a387	[SVE] Fixed custom lowering of ISD::INSERT_SUBVECTOR. LowerINSERT_SUBVECTOR emits AArch64ISD::UUNPK## when lowering scalable vector floating point INSERT_SUBVECTOR. However, these nodes only make sense for integer types and thus isel patterns do not exist for floating point, which leads to isel failures. This patch ensures floating point operands are cast to integer before the core lowering takes place. Fixes: #55037 Differential Revision: https://reviews.llvm.org/D126487	2022-06-02 14:51:04 +01:00
Nikita Popov	41d5033eb1	[IR] Enable opaque pointers by default This enabled opaque pointers by default in LLVM. The effect of this is twofold: * If IR that contains neither explicit ptr nor %T* types is passed to tools, we will now use opaque pointer mode, unless -opaque-pointers=0 has been explicitly passed. * Users of LLVM as a library will now default to opaque pointers. It is possible to opt-out by calling setOpaquePointers(false) on LLVMContext. A cmake option to toggle this default will not be provided. Frontends or other tools that want to (temporarily) keep using typed pointers should disable opaque pointers via LLVMContext. Differential Revision: https://reviews.llvm.org/D126689	2022-06-02 09:40:56 +02:00
Hendrik Greving	a92ed167f2	[ValueTypes] Define MVTs for v128i2/v64i4 as well as i2 and i4. Adds MVT::v128i2, MVT::v64i4, and implied MVT::i2, MVT::i4. Keeps MVT::i2, MVT::i4 lowering actions as expand, which should be removed once targets set this explicitly. Adjusts 11 lit tests to reflect slightly different behavior during DAG combine. Differential Revision: https://reviews.llvm.org/D125247	2022-06-02 00:49:11 +00:00
Hendrik Greving	e9d05cc7d8	Revert "[ValueTypes] Define MVTs for v128i2/v64i4 as well as i2 and i4." This reverts commit 430ac5c3029c52e391e584c6d4447e6e361fae99. Due to failures in Clang tests. Differential Revision: https://reviews.llvm.org/D125247	2022-06-01 13:27:49 -07:00
Hendrik Greving	430ac5c302	[ValueTypes] Define MVTs for v128i2/v64i4 as well as i2 and i4. Adds MVT::v128i2, MVT::v64i4, and implied MVT::i2, MVT::i4. Keeps MVT::i2, MVT::i4 lowering actions as `expand`, which should be removed once targets set this explicitly. Adjusts 11 lit tests to reflect slightly different behavior during DAG combine. Differential Revision: https://reviews.llvm.org/D125247	2022-06-01 12:48:01 -07:00
Fangrui Song	873d2aff42	[AArch64][test] Replace -march with -mtriple for llc RUN lines -march is error-prone: -march inherits the OS and environment from the default target triple. Use -mtriple which is more common.	2022-05-31 22:39:43 -07:00
Alexander Shaposhnikov	a72cc958a3	[CodeGen][AArch64] Add support for LDAPR This diff adds support for LDAPR (RCPC extension) (https://github.com/llvm/llvm-project/issues/55561). Differential revision: https://reviews.llvm.org/D126250 Test plan: ninja check-all	2022-05-31 21:40:50 +00:00
Sander de Smalen	9c38fc111b	[AArch64] Remove references to Streaming SVE from target features. Following discussion on D120261 and D121208 it seems better to remove the concept of Streaming SVE from the subtarget/assembler predicates and instead reason about 'SVE' and 'SME' as its higher level features, rather than trying to model this runtime mode through explicit feature flags. This patch is largely NFC. Reviewed By: paulwalker-arm, david-arm Differential Revision: https://reviews.llvm.org/D125977	2022-05-31 16:25:01 +02:00
David Green	5cb14dc5a3	[AArch64] Look through copy in MachineCombiner FMUL patterns. This is a small addition to D99662, which added machine combiner patterns for FMUL(DUP(..)). Due to the way these are generated from ISel, they may also be FMUL(COPY(DUP(..))), which this patch now ignores the no-op COPY in. Differential Revision: https://reviews.llvm.org/D126632	2022-05-31 09:28:00 +01:00
Edd Barrett	d245974e1a	Test stackmap support for floating point types. It appears that float support is complete, or at least, the stackmap records emitted are not inconceivable (I must admit that I don't know about many of the architectures under test here). One curiosity, the SystemZ tests highlight an undocumented (or maybe incorrect) quirk of the stackmap format: in the case of a Register record, the Offset or SmallConstant field can encode a sub-register index! I've only ever seen this field zero for Register entries up until now.	2022-05-30 10:49:32 +01:00
David Green	99b0078064	[AArch64] Tests for showing MachineCombiner COPY patterns. NFC	2022-05-30 10:47:44 +01:00
David Green	9a3144d078	[AArch64] Reuse larger DUP if available If both a v2i32 DUP(x) and a v4i32 DUP(x) node exists, we can re-use the larger node using a vector extract to obtain the smaller. This comes up in the smull/smlal code, but needs a small fixup to allow the smull2 code in tryExtendDUPToExtractHigh/performAddSubLongCombine to still match smull2 extracts. Differential Revision: https://reviews.llvm.org/D126449	2022-05-29 19:42:13 +01:00
Serge Pavlov	bdd0093f4d	[GlobalISel] Add G_IS_FPCLASS Add a generic opcode to represent `llvm.is_fpclass` intrinsic. Differential Revision: https://reviews.llvm.org/D121454	2022-05-27 13:49:47 +07:00
Rahman Lavaee	3aa249329f	Revert "[Propeller] Promote functions with propeller profiles to .text.hot." This reverts commit 4d8d2580c53e130c3c3dd3877384301e3c495554.	2022-05-26 18:45:40 -07:00
Rahman Lavaee	4d8d2580c5	[Propeller] Promote functions with propeller profiles to .text.hot. Today, text section prefixes (none, .unlikely, .hot, and .unknown) are determined based on PGO profile. However, Propeller may deem a function hot when PGO doesn't. Besides, when `-Wl,-keep-text-section-prefix=true` Propeller cannot enforce a global section ordering as the linker can only reorder sections within each output section (.text, .text.hot, .text.unlikely). This patch promotes all functions with Propeller profiles (functions listed in the basic-block-sections profile) to .text.hot. The feature is hidden behind the flag `--bbsections-guided-section-prefix` which defaults to `true`. The new implementation refactors the parsing of basic block sections profile into a new `BasicBlockSectionsProfileReader` analysis pass. This allows us to use the information earlier in `CodeGenPrepare` in order to set the functions text prefix. `BasicBlockSectionsProfileReader` will be used both by `BasicBlockSections` pass and `CodeGenPrepare`. Differential Revision: https://reviews.llvm.org/D122930	2022-05-26 16:23:21 -07:00
Adrian Tong	7c13ae6490	Give option to use isCopyInstr to determine which MI is treated as Copy instruction in MCP. This is then used in AArch64 to remove copy instructions after taildup ran in machine block placement Differential Revision: https://reviews.llvm.org/D125335	2022-05-26 18:43:16 +00:00
Chen Zheng	d79275238f	[MachineSink] replace MachineLoop with MachineCycle reapply 62a9b36fcf728b104ea87e6eb84c0be69b779df7 and fix module build failure: 1: remove MachineCycleInfoWrapperPass in MachinePassRegistry.def MachineCycleInfoWrapperPass is an analysis pass, should not be there. 2: move the definition for MachineCycleInfoPrinterPass to cpp file. Otherwise, there are module conflicts for MachineCycleInfoWrapperPass in MachinePassRegistry.def and MachineCycleAnalysis.h after 62a9b36fcf728b104ea87e6eb84c0be69b779df7. MachineCycle can handle irreducible loop. Natural loop analysis (MachineLoop) can not return correct loop depth if the loop is irreducible loop. And MachineSink is sensitive to the loop depth, see MachineSinking::isProfitableToSinkTo(). This patch tries to use MachineCycle so that we can handle irreducible loop better. Reviewed By: sameerds, MatzeB Differential Revision: https://reviews.llvm.org/D123995	2022-05-26 06:45:23 -04:00
Chen Zheng	80c4910f3d	Revert "[MachineSink] replace MachineLoop with MachineCycle" This reverts commit 62a9b36fcf728b104ea87e6eb84c0be69b779df7. Cause build failure on lldb incremental buildbot: https://green.lab.llvm.org/green/view/LLDB/job/lldb-cmake/43994/changes	2022-05-24 22:43:37 -04:00
Paul Walker	6f215ca680	[SelectionDAG] Add support to widen ISD::STEP_VECTOR operations. Fixes: #55165 Differential Revision: https://reviews.llvm.org/D126168	2022-05-24 22:42:37 +01:00
Chen Zheng	62a9b36fcf	[MachineSink] replace MachineLoop with MachineCycle MachineCycle can handle irreducible loop. Natural loop analysis (MachineLoop) can not return correct loop depth if the loop is irreducible loop. And MachineSink is sensitive to the loop depth, see MachineSinking::isProfitableToSinkTo(). This patch tries to use MachineCycle so that we can handle irreducible loop better. Reviewed By: sameerds, MatzeB Differential Revision: https://reviews.llvm.org/D123995	2022-05-24 01:16:19 -04:00
Craig Topper	569d8945f3	[DAGCombiner][AArch64] Don't fold (smulo x, 2) -> (saddo x, x) if VT is i2. If the VT is i2, then 2 is really -2. Test has not been committed yet, but diff shows the change. Fixes PR55644. Differential Revision: https://reviews.llvm.org/D126213	2022-05-23 11:13:57 -07:00
Craig Topper	75eb0576de	[AArch64] Add test case for pr55644. NFC	2022-05-23 11:13:57 -07:00
Edd Barrett	c5e5cf1258	Test stackmap support for i128 This diff adds tests that check the currently-working stackmap cases for i128. This will help ensure no regressions are later introduced by D125680 (when ready). Note that i128 stackmap support is currently incomplete, so we can't test all i128 functionality: i128 constants >= 2^{63} crash LLVM; non-constant i128s crash LLVM. So this change tests only constant i128 operands of value < 2^{63}. A couple of incorrect comments are also fixed.	2022-05-23 11:56:24 +01:00
Simon Pilgrim	dd231f02a3	[AArch64] Regenerate andandshift.ll test checks	2022-05-23 11:48:24 +01:00
Andre Vieira	572fc7d2fd	[AArch64] Order STP Q's by ascending address This patch adds an AArch64 specific PostRA MachineScheduler to try to schedule STP Q's to the same base-address in ascending order of offsets. We have found this to improve performance on Neoverse N1 and should not hurt other AArch64 cores. Differential Revision: https://reviews.llvm.org/D125377	2022-05-23 09:50:44 +01:00
Florian Hahn	0cc981e021	[AArch64] implement isReassocProfitable, disable for (u|s)mlal. Currently reassociating add expressions can lead to failing to select (u|s)mlal. Implement isReassocProfitable to skip reassociating expressions that can be lowered to (u|s)mlal. The same issue exists for the *mlsl variants as well, but the DAG combiner doesn't use the isReassocProfitable hook before reassociating. To be fixed in a follow-up commit as this requires DAGCombiner changes as well. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D125895	2022-05-23 09:39:00 +01:00
David Green	6ef5e242f2	[AArch64] Fix assumptions on input type of tryCombineFixedPointConvert It is possible for the input type to not be v2i64 or v4i32, so weaken the assertion to a return, fixing the crash in the new test. Fixes #55606	2022-05-23 08:55:54 +01:00
Paul Walker	258dac43d6	[SVE] Enable use of 32bit gather/scatter indices for fixed length vectors Differential Revision: https://reviews.llvm.org/D125193	2022-05-22 12:32:30 +01:00
Bill Wendling	d497129f9b	[AArch64] Use proper instruction mnemonics for FPRs The FPR128 regs need MOVIv2d_ns and SVE regs need DUP_ZI_D. Differential Revision: https://reviews.llvm.org/D126083	2022-05-20 12:02:26 -07:00
Rahul Anand R	534ea8bca5	[AArch64] Generate AND in place of CSEL for predicated CTTZ This patch implements a target-specific optimization that replaces the cmp and csel from cttz with an and mask. Recommitted with a fix for truncated value sizes. Differential Revision: https://reviews.llvm.org/D123782	2022-05-20 13:41:32 +01:00
Bill Wendling	6e00a34cdb	[AArch64] Add support for -fzero-call-used-regs Support the "-fzero-call-used-regs" option on AArch64. This involves much less specialized code than the X86 version. Most of the checks can be done with TableGen. Reviewed By: nickdesaulniers, MaskRay Differential Revision: https://reviews.llvm.org/D124836	2022-05-19 16:58:28 -07:00
David Green	602f81ec33	[AArch64] Fix zero element TBL indices A TBL instruction will fill out-of-range values with 0's, something used in D121139 to turn tbl2 with a zero input into tbl1s. This works OK for v16i8, but for v8i8 the input is still treated as a v16i8, so out-of-range values (like a lane index of 8) would end up loading values from the top half of the input register. Clean this up by detecting the out of range values and making sure they really use out of range values. There is a fix for swapped indices of 64bit input vectors too, which could be incorrectly adjusted if the zerovector was the first operand. Fixes #55545 Differential Revision: https://reviews.llvm.org/D125865	2022-05-19 13:54:35 +01:00
David Green	dd644ddf85	[AArch64] Extend zero vector TBL codegen tests. NFC	2022-05-19 13:01:55 +01:00
Jon Roelofs	d699e54ca2	Fix an or+and miscompile w/ GlobalISel Fixes #55284	2022-05-18 19:09:47 -07:00
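
Note on b7fcf6632f ([GISel] Add new combines for G_ADD): the two rewrites listed in that message, G_ADD(x, G_SUB(y, x)) -> y and G_ADD(G_SUB(y, x), x) -> y, should hold even when the intermediate subtraction wraps. The following is only a minimal Python sketch of that arithmetic, assuming 32-bit wrapping integers; the helper names g_add, g_sub and combine_holds are hypothetical and are not LLVM APIs.

```python
MASK32 = 0xFFFFFFFF  # assumption: model a 32-bit scalar type


def g_add(a, b):
    """Wrapping 32-bit addition, standing in for G_ADD."""
    return (a + b) & MASK32


def g_sub(a, b):
    """Wrapping 32-bit subtraction, standing in for G_SUB."""
    return (a - b) & MASK32


def combine_holds(x, y):
    """Check both operand orders of the combine against plain y."""
    expected = y & MASK32
    return (g_add(x, g_sub(y, x)) == expected
            and g_add(g_sub(y, x), x) == expected)


if __name__ == "__main__":
    samples = [0, 1, 2, 0x7FFFFFFF, 0x80000000, 0xFFFFFFFF]
    print(all(combine_holds(x, y) for x in samples for y in samples))
```

The sample values deliberately include the signed and unsigned boundaries, where the G_SUB result wraps before the G_ADD undoes it.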
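
Note on 569d8945f3 ([DAGCombiner][AArch64] Don't fold (smulo x, 2) -> (saddo x, x) if VT is i2): the message says that for i2 the constant 2 "is really -2". A rough Python model of signed multiply and add with overflow can show why the fold then changes the overflow bit. This is an illustrative sketch, not the DAGCombiner code; wrap, smulo, saddo and fold_mismatches are hypothetical helpers.

```python
def wrap(value, bits):
    """Reinterpret value as a signed two's-complement integer of `bits` width."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def smulo(a, b, bits):
    """Signed multiply returning (wrapped result, overflow flag)."""
    exact = a * b
    result = wrap(exact, bits)
    return result, result != exact


def saddo(a, b, bits):
    """Signed add returning (wrapped result, overflow flag)."""
    exact = a + b
    result = wrap(exact, bits)
    return result, result != exact


def fold_mismatches(bits):
    """List every x where smulo(x, 2) and saddo(x, x) disagree at this width."""
    two = wrap(2, bits)  # the constant 2 as the type actually represents it
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return [x for x in range(lo, hi + 1)
            if smulo(x, two, bits) != saddo(x, x, bits)]


if __name__ == "__main__":
    print(wrap(2, 2))          # value of the constant 2 in i2
    print(fold_mismatches(2))  # inputs where the fold would be wrong for i2
    print(fold_mismatches(8))  # same check for i8
```

In this model the wrapped result values agree for i2 and only the overflow flag differs, which is consistent with the commit restricting the fold rather than removing it.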
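
Note on 4ea1b43527 ([AArch64] Generate ADDP from shuffled add): the message describes folding add(x, shuffle(x, <1,0,3,2,5,4,...>)) into shuffle(addp(x), <0,0,1,1,2,2,..>). The sketch below models vectors as Python lists to check that both forms produce the same lanes. It assumes x is an 8-lane vector split into two halves that feed a pairwise add; the function names are hypothetical, and lane wraparound is ignored.

```python
def shuffle(vec, mask):
    """Pick lanes of vec according to mask."""
    return [vec[i] for i in mask]


def addp(lo, hi):
    """Pairwise add: sum adjacent lanes of the concatenation lo ++ hi."""
    joined = lo + hi
    return [joined[i] + joined[i + 1] for i in range(0, len(joined), 2)]


def original(x):
    """add(x, shuffle(x, <1,0,3,2,...>)): each lane plus its neighbour."""
    swapped = shuffle(x, [i ^ 1 for i in range(len(x))])
    return [a + b for a, b in zip(x, swapped)]


def folded(x):
    """shuffle(addp(x), <0,0,1,1,...>): one pairwise add, then duplicate lanes."""
    half = len(x) // 2
    pairs = addp(x[:half], x[half:])
    return shuffle(pairs, [i // 2 for i in range(len(x))])


if __name__ == "__main__":
    x = [1, 2, 3, 4, 5, 6, 7, 8]
    print(original(x))
    print(folded(x))
```

The lane-duplicating mask corresponds to what the message calls a zip1;zip2 shuffle of the pairwise sums.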


5629 Commits