llvm-project

Author	SHA1	Message	Date
David Green	4704da1374	[ARM] Fix Thumb2 compare being emitted ExpandCMP_SWAP Given a patch like D129506, using instructions not valid for the current target feature set becomes an error. This fixes an issue in ARMExpandPseudo::ExpandCMP_SWAP where Thumb2 compares were used in Thumb1Only code, such as thumbv8m.baseline targets. Differential Revision: https://reviews.llvm.org/D129695	2022-07-20 12:04:22 +01:00
Simon Pilgrim	9fc347aa4e	[DAG] PromoteIntRes_BUILD_VECTOR - extend constant boolean vectors according to target BooleanContents PromoteIntRes_BUILD_VECTOR currently always ANY_EXTENDs build vector operands, but if this is a constant boolean vector we're losing the useful ability to keep the vector matching the BooleanContents mode used by the target. This patch extends constant boolean vectors according to target BooleanContents, allowing a number of additional all-bits folds (notable XOR -> NOT conversions) to occur. Differential Revision: https://reviews.llvm.org/D129641	2022-07-20 10:49:31 +01:00
David Green	e22576455f	[ARM] Update atomic tests for D129695. NFC	2022-07-19 19:36:08 +01:00
Simon Pilgrim	d8888e14a0	Revert rG14364200821f7b2d97edf6e78160c514800d3ec6 "[ARM] Regenerate reg_sequence.ll test checks" Breaks on some apple machines	2022-07-16 17:32:58 +01:00
Simon Pilgrim	1436420082	[ARM] Regenerate reg_sequence.ll test checks	2022-07-16 17:10:35 +01:00
Simon Pilgrim	a5d0122f75	[DAG] Canonicalize non-inlane shuffle -> AND if all non-inlane referenced elements are known zero As mentioned on D127115, this patch that attempts to recognise shuffle masks that could be simplified to a AND mask - we already have a similar transform that will fold AND -> 'clear mask' shuffle, but this patch handles cases where the referenced elements are not from the same lane indices but are known to be zero. Differential Revision: https://reviews.llvm.org/D129150	2022-07-16 11:38:24 +01:00
Simon Pilgrim	8dbf9e8d02	[ARM] Regenerate pr36577.ll test checks	2022-07-15 13:54:17 +01:00
Simon Pilgrim	0a9752d937	[ARM] Regenerate hoist-and-by-const-from-shl-in-eqcmp-zero.ll cleanup some common prefixes	2022-07-15 11:28:41 +01:00
Nikita Popov	2a721374ae	[IR] Don't use blockaddresses as callbr arguments Following some recent discussions, this changes the representation of callbrs in IR. The current blockaddress arguments are replaced with `!` label constraints that refer directly to callbr indirect destinations: ; Before: %res = callbr i8* asm "", "=r,r,i"(i8* %x, i8* blockaddress(@test8, %foo)) to label %asm.fallthrough [label %foo] ; After: %res = callbr i8* asm "", "=r,r,!i"(i8* %x) to label %asm.fallthrough [label %foo] The benefit of this is that we can easily update the successors of a callbr, without having to worry about also updating blockaddress references. This should allow us to remove some limitations: * Allow unrolling/peeling/rotation of callbr, or any other clone-based optimizations (https://github.com/llvm/llvm-project/issues/41834) * Allow duplicate successors (https://github.com/llvm/llvm-project/issues/45248) This is just the IR representation change though, I will follow up with patches to remove limtations in various transformation passes that are no longer needed. Differential Revision: https://reviews.llvm.org/D129288	2022-07-15 10:18:17 +02:00
Simon Pilgrim	5271fcd8fc	[ARM] Regenerate select_xform.ll test checks	2022-07-13 13:52:15 +01:00
Sanjay Patel	8b75671314	[SDAG] try to replace subtract-from-constant with xor This is almost the same as the abandoned D48529, but it allows splat vector constants too. This replaces the x86-specific code that was added with the alternate patch D48557 with the original generic combine. This transform is a less restricted form of an existing InstCombine and the proposed SDAG equivalent for that in D128080: https://alive2.llvm.org/ce/z/OUm6N_ Differential Revision: https://reviews.llvm.org/D128123	2022-07-08 08:14:24 -04:00
Archibald Elliott	1666f09933	[ARM] Add Support for Cortex-M85 This patch adds support for Arm's Cortex-M85 CPU. The Cortex-M85 CPU is an Arm v8.1m Mainline CPU, with optional support for MVE and PACBTI, both of which are enabled by default. Parts have been coauthored by by Mark Murray, Alexandros Lamprineas and David Green. Differential Revision: https://reviews.llvm.org/D128415	2022-07-05 10:43:31 +01:00
David Green	dee59f7a9e	[ARM] Add Thumb-1 CTTZ codegen tests. NFC	2022-06-30 16:45:00 +01:00
Tim Northover	4aafebce52	SelectionDAG: allow FP extensions when folding extract/insert. Before, we were trying to sign extend half -> float, and asserted in getNode.	2022-06-28 12:08:35 +01:00
Lucas Prates	70a5c52534	[ARM][Thumb] Command-line option to ensure AAPCS compliant Frame Records Currently the a AAPCS compliant frame record is not always created for functions when it should. Although a consistent frame record might not be required in some cases, there are still scenarios where applications may want to make use of the call hierarchy made available trough it. In order to enable the use of AAPCS compliant frame records whilst keep backwards compatibility, this patch introduces a new command-line option (`-mframe-chain=[none\|aapcs\|aapcs+leaf]`) for Aarch32 and Thumb backends. The option allows users to explicitly select when to use it, and is also useful to ensure the extra overhead introduced by the frame records is only introduced when necessary, in particular for Thumb targets. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D125094	2022-06-27 14:08:48 +01:00
Tim Northover	69ae441e4c	ARM: don't try to load function pointer before long call. Deciding to load an arbitrary global based on whether the entire module is being built for long calls is pretty clearly spurious, and in fact the existing indirect logic is sufficient.	2022-06-27 13:59:35 +01:00
Martin Sebor	b19194c032	[InstCombine] handle subobjects of constant aggregates Remove the known limitation of the library function call folders to only work with top-level arrays of characters (as per the TODO comment in the code) and allows them to also fold calls involving subobjects of constant aggregates such as member arrays.	2022-06-21 11:55:14 -06:00
Simon Pilgrim	e4a124dda5	[DAG] Fold (srl (shl x, c1), c2) -> and(shl/srl(x, c3), m) Similar to the existing (shl (srl x, c1), c2) fold Part of the work to fix the regressions in D77804 Differential Revision: https://reviews.llvm.org/D125836	2022-06-20 08:37:38 +01:00
Craig Topper	314dbde12c	[DAGCombiner][ARM][RISCV] Teach ShrinkLoadReplaceStoreWithStore to use truncstore. The VT we want to shrink to may not be legal especially after type legalization. Fixes PR56110. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D128135	2022-06-19 15:50:15 -07:00
Krasimir Georgiev	8f2ba36336	Revert "[ARM][Thumb] Command-line option to ensure AAPCS compliant Frame Records AND [NFC][Thumb] Update frame-chain codegen test to use thumbv6m" This reverts commit 7625e01d661644a560884057755d48a0da8b77b4 and dependent cbcce82ef6b512d97e92a319a75a03e997c844e1. Commit 7625e01d661644a560884057755d48a0da8b77b4 causes some new codegen test failures under asan, e.g., CodeGen/ARM/execute-only.ll: https://lab.llvm.org/buildbot/#/builders/5/builds/24659/steps/15/logs/stdio.	2022-06-15 16:10:02 +02:00
Lucas Prates	7625e01d66	[ARM][Thumb] Command-line option to ensure AAPCS compliant Frame Records Currently the a AAPCS compliant frame record is not always created for functions when it should. Although a consistent frame record might not be required in some cases, there are still scenarios where applications may want to make use of the call hierarchy made available trough it. In order to enable the use of AAPCS compliant frame records whilst keep backwards compatibility, this patch introduces a new command-line option (`-mframe-chain=[none\|aapcs\|aapcs+leaf]`) for Aarch32 and Thumb backends. The option allows users to explicitly select when to use it, and is also useful to ensure the extra overhead introduced by the frame records is only introduced when necessary, in particular for Thumb targets. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D125094	2022-06-14 13:37:51 +01:00
Simon Pilgrim	7d8fd4f5db	[DAG] visitINSERT_VECTOR_ELT - attempt to reconstruct BUILD_VECTOR before other fold interfere Another issue unearthed by D127115 We take a long time to canonicalize an insert_vector_elt chain before being able to convert it into a build_vector - even if they are already in ascending insertion order, we fold the nodes one at a time into the build_vector 'seed', leaving plenty of time for other folds to alter it (in particular recognising when they come from extract_vector_elt resulting in a shuffle_vector that is much harder to fold with). D127115 makes this particularly difficult as we're almost guaranteed to have the lost the sequence before all possible insertions have been folded. This patch proposes to begin at the last insertion and attempt to collect all the (oneuse) insertions right away and create the build_vector before its too late. Differential Revision: https://reviews.llvm.org/D127595	2022-06-13 11:48:18 +01:00
Lucas Prates	33b9ad647e	Revert "[ARM][Thumb] Command-line option to ensure AAPCS compliant Frame Records" Reverting change due to test failure. This reverts commit 6119053dab67129eb1700dbf36db3524dd3e421f.	2022-06-13 11:00:49 +01:00
Lucas Prates	6119053dab	[ARM][Thumb] Command-line option to ensure AAPCS compliant Frame Records Currently the a AAPCS compliant frame record is not always created for functions when it should. Although a consistent frame record might not be required in some cases, there are still scenarios where applications may want to make use of the call hierarchy made available trough it. In order to enable the use of AAPCS compliant frame records whilst keep backwards compatibility, this patch introduces a new command-line option (`-mframe-chain=[none\|aapcs\|aapcs+leaf]`) for Aarch32 and Thumb backends. The option allows users to explicitly select when to use it, and is also useful to ensure the extra overhead introduced by the frame records is only introduced when necessary, in particular for Thumb targets. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D125094	2022-06-13 10:21:06 +01:00
Simon Pilgrim	44a0cd25df	[DAG] visitINSERT_VECTOR_ELT - add <1 x ???> insert_vector_elt(v0,extract_vector_elt(v1,0),0) special case handling Check if we're just replacing one v1x?? vector with another	2022-06-11 19:30:00 +01:00
Sam Parker	447c411fef	[ARM][ParallelDSP] Fix self reference bug Ensure we don't generate a smlad intrinsic that takes itself as an argument. Differential Revision: https://reviews.llvm.org/D127213	2022-06-09 09:10:57 +00:00
Martin Storsjö	98dc3e86fd	[ARM] [MinGW] Default to WinEH exception handling instead of Dwarf Switching this target to WinEH also seems to affect the `-windows-itanium` target. Differential Revision: https://reviews.llvm.org/D126870	2022-06-06 23:27:19 +03:00
Martin Storsjö	485432f3c8	[ARM] Make a narrow tMOVi8 where possible in SEH prologues We intentionally disable Thumb2SizeReduction for SEH prologues/epilogues, to avoid needing to guess what will happen with the instructions in a potential future pass in frame lowering. But for this specific case, where we know we can express the intent with a narrow instruction, change to that instruction form directly in frame lowering. Differential Revision: https://reviews.llvm.org/D126949	2022-06-03 22:33:55 +03:00
Martin Storsjö	bd52506d24	[ARM] Make narrow push/pop in SEH prologues/epilogues where applicable We intentionally disable Thumb2SizeReduction for SEH prologues/epilogues, to avoid needing to guess what will happen with the instructions in a potential future pass in frame lowering. But for this specific case, where we know we can express the intent with a narrow instruction, change to that instruction form directly in frame lowering. Differential Revision: https://reviews.llvm.org/D126948	2022-06-03 22:33:55 +03:00
Martin Storsjö	40c937cba2	[ARM] Fix restoring stack for varargs with SEH split frame pointer push Previously, the "add sp, #12" ended up inserted after "bx lr". Differential Revision: https://reviews.llvm.org/D126872	2022-06-03 09:32:00 +03:00
Martin Storsjö	9245c4930f	[ARM] Fix a test case typo. NFC. The test looked for the wrong string, but it happened to match as it was a substring of the actual output. This fixes a typo from d8e67c1cccd8fcb62230166caea744592288da17.	2022-06-02 13:04:50 +03:00
Martin Storsjö	668bb96379	[ARM] Implement lowering of the sponentry intrinsic This is needed for SEH based setjmp on Windows. Differential Revision: https://reviews.llvm.org/D126763	2022-06-02 12:29:59 +03:00
Martin Storsjö	2ab19bfa41	[ARM] Adjust the frame pointer when it's needed for SEH unwinding For functions that require restoring SP from FP (e.g. that need to align the stack, or that have variable sized allocations), the prologue and epilogue previously used to look like this: push {r4-r5, r11, lr} add r11, sp, #8 ... sub r4, r11, #8 mov sp, r4 pop {r4-r5, r11, pc} This is problematic, because this unwinding operation (restoring sp from r11 - offset) can't be expressed with the SEH unwind opcodes (probably because this unwind procedure doesn't map exactly to individual instructions; note the detour via r4 in the epilogue too). To make unwinding work, the GPR push is split into two; the first one pushing all other registers, and the second one pushing r11+lr, so that r11 can be set pointing at this spot on the stack: push {r4-r5} push {r11, lr} mov r11, sp ... mov sp, r11 pop {r11, lr} pop {r4-r5} bx lr For the same setup, MSVC generates code that uses two registers; r11 still pointing at the {r11,lr} pair, but a separate register used for restoring the stack at the end: push {r4-r5, r7, r11, lr} add r11, sp, #12 mov r7, sp ... mov sp, r7 pop {r4-r5, r7, r11, pc} For cases with clobbered float/vector registers, they are pushed after the GPRs, before the {r11,lr} pair. Differential Revision: https://reviews.llvm.org/D125649	2022-06-02 12:28:46 +03:00
Martin Storsjö	d8e67c1ccc	[ARM] Add SEH opcodes in frame lowering Skip inserting regular CFI instructions if using WinCFI. This is based a fair amount on the corresponding ARM64 implementation, but instead of trying to insert the SEH opcodes one by one where we generate other prolog/epilog instructions, we try to walk over the whole prolog/epilog range and insert them. This is done because in many cases, the exact number of instructions inserted is abstracted away deeper. For some cases, we manually insert specific SEH opcodes directly where instructions are generated, where the automatic mapping of instructions to SEH opcodes doesn't hold up (e.g. for __chkstk stack probes). Skip Thumb2SizeReduction for SEH prologs/epilogs, and force tail calls to wide instructions (just like on MachO), to make sure that the unwind info actually matches the width of the final instructions, without heuristics about what later passes will do. Mark SEH instructions as scheduling boundaries, to make sure that they aren't reordered away from the instruction they describe by PostRAScheduler. Mark the SEH instructions with the NoMerge flag, to avoid doing tail merging of functions that have multiple epilogs that all end with the same sequence of "b <other>; .seh_nop_w, .seh_endepilogue". Differential Revision: https://reviews.llvm.org/D125648	2022-06-02 12:28:46 +03:00
Nikita Popov	41d5033eb1	[IR] Enable opaque pointers by default This enabled opaque pointers by default in LLVM. The effect of this is twofold: * If IR that contains neither explicit ptr nor %T* types is passed to tools, we will now use opaque pointer mode, unless -opaque-pointers=0 has been explicitly passed. * Users of LLVM as a library will now default to opaque pointers. It is possible to opt-out by calling setOpaquePointers(false) on LLVMContext. A cmake option to toggle this default will not be provided. Frontends or other tools that want to (temporarily) keep using typed pointers should disable opaque pointers via LLVMContext. Differential Revision: https://reviews.llvm.org/D126689	2022-06-02 09:40:56 +02:00
Hendrik Greving	a92ed167f2	[ValueTypes] Define MVTs for v128i2/v64i4 as well as i2 and i4. Adds MVT::v128i2, MVT::v64i4, and implied MVT::i2, MVT::i4. Keeps MVT::i2, MVT::i4 lowering actions as expand, which should be removed once targets set this explicitly. Adjusts 11 lit tests to reflect slightly different behavior during DAG combine. Differential Revision: https://reviews.llvm.org/D125247	2022-06-02 00:49:11 +00:00
Hendrik Greving	e9d05cc7d8	Revert "[ValueTypes] Define MVTs for v128i2/v64i4 as well as i2 and i4." This reverts commit 430ac5c3029c52e391e584c6d4447e6e361fae99. Due to failures in Clang tests. Differential Revision: https://reviews.llvm.org/D125247	2022-06-01 13:27:49 -07:00
Hendrik Greving	430ac5c302	[ValueTypes] Define MVTs for v128i2/v64i4 as well as i2 and i4. Adds MVT::v128i2, MVT::v64i4, and implied MVT::i2, MVT::i4. Keeps MVT::i2, MVT::i4 lowering actions as `expand`, which should be removed once targets set this explicitly. Adjusts 11 lit tests to reflect slightly different behavior during DAG combine. Differential Revision: https://reviews.llvm.org/D125247	2022-06-01 12:48:01 -07:00
Simon Pilgrim	0a96885940	[ARM] uxtb.ll - adjust armv6 triple so the update_llc_test_checks.py script can be used to regenerate the tests No need to specify armv6-apple-darwin in these UXTB codegen tests	2022-06-01 15:28:19 +01:00
Simon Pilgrim	e1d02f6c37	[ARM][Thumb2] Refresh UXTB16 tests to match optimized IR from instcombine As discussed on D77804, instcombine will have already performed a similar SimplifyMultipleUseDemandedBits call which will break the UXTB16 pattern that was being match in these DAG tests I've updated the existing tests so that it match the instcombine IR (with a suitable FIXME) and added an equivalent test pattern suggested by @dmgreen	2022-06-01 15:28:19 +01:00
Nuno Lopes	80b3dcc045	[Support] Make report_fatal_error respect its GenCrashDiag argument so it doesn't generate a backtrace There are a few places where we use report_fatal_error when the input is broken. Currently, this function always crashes LLVM with an abort signal, which then triggers the backtrace printing code. I think this is excessive, as wrong input shouldn't give a link to LLVM's github issue URL and tell users to file a bug report. We shouldn't print a stack trace either. This patch changes report_fatal_error so it uses exit() rather than abort() when its argument GenCrashDiag=false. Reviewed by: nikic, MaskRay, RKSimon Differential Revision: https://reviews.llvm.org/D126550	2022-05-30 19:19:23 +01:00
Chen Zheng	d79275238f	[MachineSink] replace MachineLoop with MachineCycle reapply 62a9b36fcf728b104ea87e6eb84c0be69b779df7 and fix module build failue: 1: remove MachineCycleInfoWrapperPass in MachinePassRegistry.def MachineCycleInfoWrapperPass is a anylysis pass, should not be there. 2: move the definition for MachineCycleInfoPrinterPass to cpp file. Otherwise, there are module conflicit for MachineCycleInfoWrapperPass in MachinePassRegistry.def and MachineCycleAnalysis.h after 62a9b36fcf728b104ea87e6eb84c0be69b779df7. MachineCycle can handle irreducible loop. Natural loop analysis (MachineLoop) can not return correct loop depth if the loop is irreducible loop. And MachineSink is sensitive to the loop depth, see MachineSinking::isProfitableToSinkTo(). This patch tries to use MachineCycle so that we can handle irreducible loop better. Reviewed By: sameerds, MatzeB Differential Revision: https://reviews.llvm.org/D123995	2022-05-26 06:45:23 -04:00
Ivan Kosarev	ad1d60c3be	[FileCheck] Catch missspelled directives. Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D125604	2022-05-26 11:37:19 +01:00
David Green	18cb3b3506	[ARM] Fix vcvtb/t.f16 input liveness The `vcvtb.f16.f32 Sd, Sn` (and vcvtt.f16.f32) instruction convert a f32 into a f16, writing either the top or bottom halves of the register. That means that half of the input register Sd is used in the output. This wasn't being modelled in the instructions, leading later analyses to believe that the registers were dead where they were not, generating invalid scheduling Fix that be specifying the input Sda register for the instructions too, allowing them to be set for cases like vector inserts. Most of the changes are plumbing through the constraint string, cstr. Differential Revision: https://reviews.llvm.org/D126118	2022-05-25 12:16:26 +01:00
Chen Zheng	80c4910f3d	Revert "[MachineSink] replace MachineLoop with MachineCycle" This reverts commit 62a9b36fcf728b104ea87e6eb84c0be69b779df7. Cause build failure on lldb incremental buildbot: https://green.lab.llvm.org/green/view/LLDB/job/lldb-cmake/43994/changes	2022-05-24 22:43:37 -04:00
Chen Zheng	62a9b36fcf	[MachineSink] replace MachineLoop with MachineCycle MachineCycle can handle irreducible loop. Natural loop analysis (MachineLoop) can not return correct loop depth if the loop is irreducible loop. And MachineSink is sensitive to the loop depth, see MachineSinking::isProfitableToSinkTo(). This patch tries to use MachineCycle so that we can handle irreducible loop better. Reviewed By: sameerds, MatzeB Differential Revision: https://reviews.llvm.org/D123995	2022-05-24 01:16:19 -04:00
David Green	a86cfaea54	[ARM] Add register-mask for tail returns The TC_RETURN/TCRETURNdi under Arm does not currently add the register-mask operand when tail folding, which leads to the register (like LR) not being 'used' by the return. This changes the code to unconditionally set the register mask on the call, as opposed to skipping it for tail calls. I don't believe this will currently alter any codegen, but should glue things together better post-frame lowering. It matches the AArch64 code better. Differential Revision: https://reviews.llvm.org/D125906	2022-05-21 15:28:24 +01:00
Craig Topper	46eef76876	[DAGCombiner] Fix bug in MatchBSwapHWordLow. This function tries to match (a >> 8) \| (a << 8) as (bswap a) >> 16. If the SRL isn't masked and the high bits aren't demanded, we still need to ensure that bits 23:16 are zero. After the right shift they will be in bits 15:8 which is where the important bits from the SHL end up. It's only a bswap if the OR on bits 15:8 only takes the bits from the SHL. Fixes PR55484. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D125641	2022-05-18 09:23:18 -07:00
Simon Pilgrim	d40b7f0d5a	[DAG] Fold (shl (srl x, c), c) -> and(x, m) even if srl has other uses If we're using shift pairs to mask, then relax the one use limit if the shift amounts are equal - we'll only be generating a single AND node. AArch64 has a couple of regressions due to this, so I've enforced the existing one use limit inside a AArch64TargetLowering::shouldFoldConstantShiftPairToMask callback. Part of the work to fix the regressions in D77804 Differential Revision: https://reviews.llvm.org/D125607	2022-05-17 13:40:11 +01:00
Craig Topper	74f6ded49d	[AArch64][ARM][RISCV][X86] Add test cases for PR55484. NFC This bug is in generic DAG combine and easily reproducible on many targets. Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D125640	2022-05-16 09:28:11 -07:00

1 2 3 4 5 ...

4575 Commits