llvm-project

Author	SHA1	Message	Date
Yingwei Zheng	38a44bdc93	[CodeGenPrepare] Reverse the canonicalization of isInf/isNanOrInf (#81572 ) In commit `2b582440c1`, we canonicalize the isInf/isNanOrInf idiom into fabs+fcmp for better analysis/codegen (See also the discussion in https://github.com/llvm/llvm-project/pull/76338). This patch reverses the fabs+fcmp to `is.fpclass`. If the `is.fpclass` is not supported by the target, it will be expanded by TLI. Fixes the regression introduced by `2b582440c1` and https://github.com/llvm/llvm-project/pull/80414#issuecomment-1936374206.	2024-03-18 18:27:45 +08:00
Nikita Popov	2d69827c5c	[Transforms] Convert tests to opaque pointers (NFC)	2024-02-05 11:57:34 +01:00
Nick Anderson	f1ec0d12bb	Port CodeGenPrepare to new pass manager (and BasicBlockSectionsProfil… (#77182 ) Port CodeGenPrepare to new pass manager and dependency BasicBlockSectionsProfileReader Fixes: #75380 Co-authored-by: Krishna-13-cyber <84722531+Krishna-13-cyber@users.noreply.github.com>	2024-01-09 13:32:59 +07:00
Simon Pilgrim	7648371c25	Revert 4d7c5ad58467502fcbc433591edff40d8a4d697d "[NewPM] Update CodeGenPreparePass reference in CodeGenPassBuilder (#77054 )" Revert e0c554ad87d18dcbfcb9b6485d0da800ae1338d1 "Port CodeGenPrepare to new pass manager (and BasicBlockSectionsProfil… (#75380)" Revert #75380 and #77054 as they were breaking EXPENSIVE_CHECKS buildbots: https://lab.llvm.org/buildbot/#/builders/104	2024-01-05 12:28:10 +00:00
Nick Anderson	e0c554ad87	Port CodeGenPrepare to new pass manager (and BasicBlockSectionsProfil… (#75380 ) Port CodeGenPrepare to new pass manager and dependency BasicBlockSectionsProfileReader Fixes: #64560 Co-authored-by: Krishna-13-cyber <84722531+Krishna-13-cyber@users.noreply.github.com>	2024-01-05 13:47:56 +07:00
zhongyunde 00443407	d6f4d5209f	[CGP][AArch64] Rebase the common base offset for better ISel When all the large const offsets masked with the same value from bit-12 to bit-23. Fold add x8, x0, #2031, lsl #12 add x8, x8, #960 ldr x9, [x8, x8] ldr x8, [x8, #2056] into add x8, x0, #2031, lsl #12 ldr x9, [x8, #960] ldr x8, [x8, #3016]	2023-12-05 09:01:41 +08:00
Paul Walker	6383785bad	[SVE][CodeGenPrepare] Sink address calculations that match SVE gather/scatter addressing modes. (#66996 ) SVE supports scalar+vector and scalar+extw(vector) addressing modes. However, the masked gather/scatter intrinsics take a vector of addresses, which means address computations can be hoisted out of loops. The is especially true for things like offsets where the true size of offsets is lost by the time you get to code generation. This is problematic because it forces the code generator to legalise towards `<vscale x 2 x ty>` vectors that will not maximise bandwidth if the main block datatypes is in fact i32 or smaller. This patch sinks GEPs and extends for cases where one of the above addressing modes can be used. NOTE: There are cases where it would be better to split the extend in two with one half hoisted out of a loop and the other within the loop. Whilst true I think this switch of default is still better than before because the extra extends are an improvement over being forced to split a gather/scatter.	2023-10-11 13:20:08 +01:00
Jay Foad	7b3bbd83c0	Revert "[CodeGen] Really renumber slot indexes before register allocation (#67038 )" This reverts commit 2501ae58e3bb9a70d279a56d7b3a0ed70a8a852c. Reverted due to various buildbot failures.	2023-10-09 12:31:32 +01:00
Jay Foad	2501ae58e3	[CodeGen] Really renumber slot indexes before register allocation (#67038 ) PR #66334 tried to renumber slot indexes before register allocation, but the numbering was still affected by list entries for instructions which had been erased. Fix this to make the register allocator's live range length heuristics even less dependent on the history of how instructions have been added to and removed from SlotIndexes's maps.	2023-10-09 11:44:41 +01:00
Jay Foad	e0919b189b	[CodeGen] Renumber slot indexes before register allocation (#66334 ) RegAllocGreedy uses SlotIndexes::getApproxInstrDistance to approximate the length of a live range for its heuristics. Renumbering all slot indexes with the default instruction distance ensures that this estimate will be as accurate as possible, and will not depend on the history of how instructions have been added to and removed from SlotIndexes's maps. This also means that enabling -early-live-intervals, which runs the SlotIndexes analysis earlier, will not cause large amounts of churn due to different register allocator decisions.	2023-09-19 11:18:12 +01:00
Harvin Iriawan	db158c7c83	[AArch64] Update generic sched model to A510 Refresh of the generic scheduling model to use A510 instead of A55. Main benefits are to the little core, and introducing SVE scheduling information. Changes tested on various OoO cores, no performance degradation is seen. Differential Revision: https://reviews.llvm.org/D156799	2023-08-21 12:25:15 +01:00
Matthias Braun	02ba5b8c6b	Ignore load/store until stack address computation No longer conservatively assume a load/store accesses the stack when we can prove that we did not compute any stack-relative address up to this point in the program. We do this in a cheap not-quite-a-dataflow-analysis: Assume `NoStackAddressUsed` when all predecessors of a block already guarantee it. Process blocks in reverse post order to guarantee that except for loop headers we have processed all predecessors of a block before processing the block itself. For loops we accept the conservative answer as they are unlikely to be shrink-wrappable anyway. Differential Revision: https://reviews.llvm.org/D152213	2023-06-26 13:50:36 -07:00
Tobias Hieta	f84bac329b	[NFC][Py Reformat] Reformat lit.local.cfg python files in llvm This is a follow-up to b71edfaa4ec3c998aadb35255ce2f60bba2940b0 since I forgot the lit.local.cfg files in that one. Reformatting is done with `black`. If you end up having problems merging this commit because you have made changes to a python file, the best way to handle that is to run git checkout --ours <yourfile> and then reformat it with black. If you run into any problems, post to discourse about it and we will try to help. RFC Thread below: https://discourse.llvm.org/t/rfc-document-and-standardize-python-code-style Reviewed By: barannikov88, kwk Differential Revision: https://reviews.llvm.org/D150762	2023-05-17 17:03:15 +02:00
Fangrui Song	ca65969c3e	[AArch64] Unconditionally use DW_EH_PE_indirect\|DW_EH_PE_pcrel personality/lsda/ttype encodings For -fno-pic, without DW_EH_PE_indirect, the personality routine pointer in a CIE needs an R_AARCH64_ABS64 relocation. In common configurations that `__gcc_personality_v0` is defined in a shared object, this will lead to a discouraged canonical PLT entry, or, if `ld.lld -z notext` (betwen D122459 and D143136), a dynamic R_AARCH64_ABS64 relocation with an incorrect offset: https://github.com/llvm/llvm-project/issues/60392 Since GCC uses DW_EH_PE_indirect for -fno-pic code (the behavior hasn't changed since the initial port in 2012), let's follow suit by simplifying the code. ( For tiny and small code models, we use DW_EH_PE_sdata8 instead of GCC's DW_EH_PE_sdata4. This is a deliberate choice to support personality-.eh_frame offset > 2GiB. This is unneeded for small code model since "Max text segment size < 2GiB" but making `-fno-pic -mcmodel={tiny,small}` different seems unnecessary: the scenarios that uses both -fno-pic and C++ exceptions have been increasingly rare now, so there is little advantage optimizing for the little size saving with code complexity. ) --- Two clang/test/Interpreter tests would fail without 6747fc07d1aa94e22622e278e5a02ba70675ac9b ([ORC] Use JITLink as the default linker for LLJIT on Linux/arm64.) Reviewed By: MatzeB Differential Revision: https://reviews.llvm.org/D143039	2023-02-05 10:46:43 -08:00
NAKAMURA Takumi	5b2549b0d2	Revert "[AArch64] Unconditionally use DW_EH_PE_indirect\|DW_EH_PE_pcrel personality/lsda/ttype encodings" It causes failurs in clang-interpreter. This reverts commit 565a1fb1334b8cf510af1338cae3f50815a99f90, aka llvmorg-17-init-1048-g565a1fb1334b	2023-02-04 20:49:39 +09:00
Fangrui Song	565a1fb133	[AArch64] Unconditionally use DW_EH_PE_indirect\|DW_EH_PE_pcrel personality/lsda/ttype encodings For -fno-pic, without DW_EH_PE_indirect, the personality routine pointer in a CIE needs an R_AARCH64_ABS64 relocation. In common configurations that `__gcc_personality_v0` is defined in a shared object, this will lead to a discouraged canonical PLT entry, or, if `ld.lld -z notext` (betwen D122459 and D143136), a dynamic R_AARCH64_ABS64 relocation with an incorrect offset: https://github.com/llvm/llvm-project/issues/60392 Since GCC uses DW_EH_PE_indirect for -fno-pic code (the behavior hasn't changed since the initial port in 2012), let's follow suit by simplifying the code. ( For tiny and small code models, we use DW_EH_PE_sdata8 instead of GCC's DW_EH_PE_sdata4. This is a deliberate choice to support personality-.eh_frame offset > 2GiB. This is necessary for small code model since "Max text segment size < 2GiB" but it is unnecessary to make `-fno-pic -mcmodel={tiny,small}` different: The scenarios that uses both -fno-pic and C++ exceptions have been increasingly rare now, so there is little advantage optimizing for the little size saving with code complexity. ) Reviewed By: MatzeB Differential Revision: https://reviews.llvm.org/D143039	2023-02-03 10:41:04 -08:00
David Green	28de5f99bb	[AArch64] Sink to umull if we know tops bits are zero. This is an extension to the code for sinking splats to multiplies, where if we can detect that the top bits are known-zero we can treat the instruction like a zext. The existing code was also adjusted in the process to make it more precise about only sinking if both operands are zext or sext. Differential Revision: https://reviews.llvm.org/D141275	2023-01-16 10:44:38 +00:00
Paul Walker	eae26b6640	[IRBuilder] Use canonical i64 type for insertelement index used by vector splats. Instcombine prefers this canonical form (see getPreferredVectorIndex), as does IRBuilder when passing the index as an integer so we may as well use the prefered form from creation. NOTE: All test changes are mechanical with nothing else expected beyond a change of index type from i32 to i64. Differential Revision: https://reviews.llvm.org/D140983	2023-01-11 14:08:06 +00:00
Nikita Popov	33a3139f80	[CGP] Avoid branch on poison UB in test (NFC)	2023-01-03 14:52:48 +01:00
Nikita Popov	02b02cd050	[CodeGenPrepare] Avoid branch on undef UB in tests (NFC)	2023-01-03 13:51:00 +01:00
Matt Arsenault	d9e51e7552	CodeGenPrepare: Convert most tests to opaque pointers NVPTX/dont-introduce-addrspacecast.ll required manually removing a check for a bitcast. AArch64/combine-address-mode.ll required rerunning update_test_checks Mips required some manual updates due to a CHECK-NEXT coming after a deleted bitcast. ARM/sink-addrmode.ll needed one small manual fix. Excludes one X86 function which needs more attention.	2022-11-28 09:21:59 -05:00
Matt Arsenault	e446d1d93f	CodeGenPrepare: Don't use anonymous values some in tests These are always an obstacle to test updates, and often break after running opaquify scripts on them.	2022-11-27 10:30:37 -05:00
Florian Hahn	b2c195da6d	[CGP] Update failing test missed in 81a11da762577.	2022-09-15 19:35:25 +01:00
Florian Hahn	81a11da762	[CGP,AArch64] Replace zexts with shuffle that can be lowered using tbl. This patch extends CodeGenPrepare to lower zext v16i8 -> v16i32 in loops using a wide shuffle creating a v64i8 vector, selecting groups of 3 zero elements and an element from the input. This is profitable on AArch64 where such shuffles can be lowered to tbl instructions, but only in loops, because it requires materializing 4 masks, which can be done in the loop preheader. This is the only reason the transform is part of CGP. If there's a better alternative I missed, please let me know. The same goes for the shouldReplaceZExtWithShuffle hook which guards this. I am not sure if this transform will be beneficial on other targets, but it seems like there is no way other convenient way. This improves the generated code for loops like the one below in combination with D96522. int foo(uint8_t p, int N) { unsigned long long sum = 0; for (int i = 0; i < N ; i++, p++) { unsigned int v = p; sum += (v < 127) ? v : 256 - v; } return sum; } https://clang.godbolt.org/z/Wco866MjY Reviewed By: t.p.northover Differential Revision: https://reviews.llvm.org/D120571	2022-09-15 19:18:13 +01:00
Florian Hahn	786c687810	[AArch64] Add support for FMA intrinsics to shouldSinkOperands. If the fma operates on a legal vector type, the indexed variants can be used, if the second operand is a splat of a valid index. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D126234	2022-05-27 10:37:03 +01:00
Florian Hahn	a9a012086a	[AArch64] Add additional tests for sinking free shuffles for FMAs.	2022-05-26 10:35:38 +01:00
Florian Hahn	8661725686	[AArch64] Add tests with free shuffles for indexed fma variants. The new tests contain examples where shuffles are free, because indexed fma instructions can be used.	2022-05-23 20:27:42 +01:00
Momchil Velikov	d0ea42a7c1	[AArch64] Async unwind - function epilogues Reviewed By: MaskRay, chill Differential Revision: https://reviews.llvm.org/D112330	2022-04-12 16:50:50 +01:00
Momchil Velikov	50a97aacac	[AArch64] Async unwind - function prologues Re-commit of 32e8b550e5439c7e4aafa73894faffd5f25d0d05 This patch rearranges emission of CFI instructions, so the resulting DWARF and `.eh_frame` information is precise at every instruction. The current state is that the unwind info is emitted only after the function prologue. This is fine for synchronous (e.g. C++) exceptions, but the information is generally incorrect when the program counter is at an instruction in the prologue or the epilogue, for example: ``` stp x29, x30, [sp, #-16]! // 16-byte Folded Spill mov x29, sp .cfi_def_cfa w29, 16 ... ``` after the `stp` is executed the (initial) rule for the CFA still says the CFA is in the `sp`, even though it's already offset by 16 bytes A correct unwind info could look like: ``` stp x29, x30, [sp, #-16]! // 16-byte Folded Spill .cfi_def_cfa_offset 16 mov x29, sp .cfi_def_cfa w29, 16 ... ``` Having this information precise up to an instruction is useful for sampling profilers that would like to get a stack backtrace. The end goal (towards this patch is just a step) is to have fully working `-fasynchronous-unwind-tables`. Reviewed By: danielkiss, MaskRay Differential Revision: https://reviews.llvm.org/D111411	2022-03-24 16:16:44 +00:00
Hans Wennborg	85c53c7092	Revert "[AArch64] Async unwind - function prologues" It caused builds to assert with: (StackSize == 0 && "We already have the CFA offset!"), function generateCompactUnwindEncoding, file AArch64AsmBackend.cpp, line 624. when targeting iOS. See comment on the code review for reproducer. > This patch rearranges emission of CFI instructions, so the resulting > DWARF and `.eh_frame` information is precise at every instruction. > > The current state is that the unwind info is emitted only after the > function prologue. This is fine for synchronous (e.g. C++) exceptions, > but the information is generally incorrect when the program counter is > at an instruction in the prologue or the epilogue, for example: > > ``` > stp x29, x30, [sp, #-16]! // 16-byte Folded Spill > mov x29, sp > .cfi_def_cfa w29, 16 > ... > ``` > > after the `stp` is executed the (initial) rule for the CFA still says > the CFA is in the `sp`, even though it's already offset by 16 bytes > > A correct unwind info could look like: > ``` > stp x29, x30, [sp, #-16]! // 16-byte Folded Spill > .cfi_def_cfa_offset 16 > mov x29, sp > .cfi_def_cfa w29, 16 > ... > ``` > > Having this information precise up to an instruction is useful for > sampling profilers that would like to get a stack backtrace. The end > goal (towards this patch is just a step) is to have fully working > `-fasynchronous-unwind-tables`. > > Reviewed By: danielkiss, MaskRay > > Differential Revision: https://reviews.llvm.org/D111411 This reverts commit 32e8b550e5439c7e4aafa73894faffd5f25d0d05.	2022-03-04 17:36:26 +01:00
Momchil Velikov	63c9aca12a	Revert "[AArch64] Async unwind - function epilogues" This reverts commit 74319d67943a4fbef36e81f54273549ce4962f84. It causes test failures that look like infinite loop in asan/hwasan unwinding.	2022-03-02 15:01:57 +00:00
Momchil Velikov	74319d6794	[AArch64] Async unwind - function epilogues Counterpart of https://reviews.llvm.org/D111411 this change makes the unwind information instruction precise in function epilogues. Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D112330	2022-03-02 13:15:11 +00:00
Momchil Velikov	32e8b550e5	[AArch64] Async unwind - function prologues This patch rearranges emission of CFI instructions, so the resulting DWARF and `.eh_frame` information is precise at every instruction. The current state is that the unwind info is emitted only after the function prologue. This is fine for synchronous (e.g. C++) exceptions, but the information is generally incorrect when the program counter is at an instruction in the prologue or the epilogue, for example: ``` stp x29, x30, [sp, #-16]! // 16-byte Folded Spill mov x29, sp .cfi_def_cfa w29, 16 ... ``` after the `stp` is executed the (initial) rule for the CFA still says the CFA is in the `sp`, even though it's already offset by 16 bytes A correct unwind info could look like: ``` stp x29, x30, [sp, #-16]! // 16-byte Folded Spill .cfi_def_cfa_offset 16 mov x29, sp .cfi_def_cfa w29, 16 ... ``` Having this information precise up to an instruction is useful for sampling profilers that would like to get a stack backtrace. The end goal (towards this patch is just a step) is to have fully working `-fasynchronous-unwind-tables`. Reviewed By: danielkiss, MaskRay Differential Revision: https://reviews.llvm.org/D111411	2022-02-28 13:37:57 +00:00
Florian Hahn	166968a892	[AArch64] Add test cases where zext can be lowered to series of tbl. Add a set of tests for upcoming patches that allow lowering vector zext using AArch64 tbl instructions instead of shifts.	2022-02-25 15:36:32 +00:00
Sunho Kim	44601f4956	[AARCH64][NEON] Allow to sink operands for aarch64_neon_pmull This teaches AArch64TargetLowering::shouldSinkOperands to sink the operands of aarch64_neon_pmull intrinsic. Differential Revision: https://reviews.llvm.org/D117944	2022-02-03 16:46:49 +00:00
Micah Weston	93deac2e2b	[AArch64] Optimize add/sub with immediate through MIPeepholeOpt Fixes the build issue with D111034, whose goal was to optimize add/sub with long immediates. Optimize ([add\|sub] r, imm) -> ([ADD\|SUB] ([ADD\|SUB] r, #imm0, lsl #12), #imm1), if imm == (imm0<<12)+imm1. and both imm0 and imm1 are non-zero 12-bit unsigned integers. Optimize ([add\|sub] r, imm) -> ([SUB\|ADD] ([SUB\|ADD] r, #imm0, lsl #12), #imm1), if imm == -(imm0<<12)-imm1, and both imm0 and imm1 are non-zero 12-bit unsigned integers. The change which fixed the build issue in D111034 was the use of new virtual registers so that SSA form is maintained until deleting MI. Differential Revision: https://reviews.llvm.org/D117429	2022-01-22 12:39:22 +00:00
Florian Hahn	62476c7c14	Revert "[AArch64] Revive optimize add/sub with immediate through MIPeepholeOpt" This reverts commit e6698f09929a134bf0f46d9347142b86d8f636a2. This commit appears to introduce new machine verifier failures when building the llvm-test-suite with `-mllvm -verify-machineinstrs` enabled: https://green.lab.llvm.org/green/job/test-suite-verify-machineinstrs-aarch64-O3/11061/ FAILED: MultiSource/Benchmarks/Olden/health/CMakeFiles/health.dir/health.c.o /Users/buildslave/jenkins/workspace/test-suite-verify-machineinstrs-aarch64-O3/test-suite-build/tools/timeit --summary MultiSource/Benchmarks/Olden/health/CMakeFiles/health.dir/health.c.o.time /Users/buildslave/jenkins/workspace/test-suite-verify-machineinstrs-aarch64-O3/compiler/bin/clang -DNDEBUG -B /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin -Wno-unused-command-line-argument -mllvm -verify-machineinstrs -O3 -arch arm64 -isysroot /Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS13.5.sdk -w -Werror=date-time -DTORONTO -MD -MT MultiSource/Benchmarks/Olden/health/CMakeFiles/health.dir/health.c.o -MF MultiSource/Benchmarks/Olden/health/CMakeFiles/health.dir/health.c.o.d -o MultiSource/Benchmarks/Olden/health/CMakeFiles/health.dir/health.c.o -c /Users/buildslave/jenkins/workspace/test-suite-verify-machineinstrs-aarch64-O3/test-suite/MultiSource/Benchmarks/Olden/health/health.c * Bad machine code: Illegal virtual register for instruction * - function: alloc_tree - basic block: %bb.1 if.else (0x7fc0db8f8bb0) - instruction: %31:gpr64 = nsw MADDXrrr killed %39:gpr64sp, killed %25:gpr64, $xzr - operand 1: killed %39:gpr64sp Expected a GPR64 register, but got a GPR64sp register fatal error: error in backend: Found 1 machine code errors. PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace, preprocessed source, and associated run script. Stack dump: 0. Program arguments: /Users/buildslave/jenkins/workspace/test-suite-verify-machineinstrs-aarch64-O3/compiler/bin/clang -DNDEBUG -B /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin -Wno-unused-command-line-argument -mllvm -verify-machineinstrs -O3 -arch arm64 -isysroot /Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS13.5.sdk -w -Werror=date-time -DTORONTO -MD -MT MultiSource/Benchmarks/Olden/health/CMakeFiles/health.dir/health.c.o -MF MultiSource/Benchmarks/Olden/health/CMakeFiles/health.dir/health.c.o.d -o MultiSource/Benchmarks/Olden/health/CMakeFiles/health.dir/health.c.o -c /Users/buildslave/jenkins/workspace/test-suite-verify-machineinstrs-aarch64-O3/test-suite/MultiSource/Benchmarks/Olden/health/health.c 1. <eof> parser at end of file 2. Code generation 3. Running pass 'Function Pass Manager' on module '/Users/buildslave/jenkins/workspace/test-suite-verify-machineinstrs-aarch64-O3/test-suite/MultiSource/Benchmarks/Olden/health/health.c'. 4. Running pass 'Verify generated machine code' on function '@alloc_tree' Stack dump without symbol names (ensure you have llvm-symbolizer in your PATH or set the environment var `LLVM_SYMBOLIZER_PATH` to point to it): 0 clang 0x000000011191896b llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) + 43 1 clang 0x00000001119179b5 llvm::sys::RunSignalHandlers() + 85 2 clang 0x00000001119180e2 llvm::sys::CleanupOnSignal(unsigned long) + 210 3 clang 0x0000000111849f6a (anonymous namespace)::CrashRecoveryContextImpl::HandleCrash(int, unsigned long) + 106 4 clang 0x0000000111849ee8 llvm::CrashRecoveryContext::HandleExit(int) + 24 5 clang 0x0000000111914acc llvm::sys::Process::Exit(int, bool) + 44 6 clang 0x000000010f4e9be9 LLVMErrorHandler(void, char const, bool) + 89 7 clang 0x0000000114eba333 llvm::report_fatal_error(llvm::Twine const&, bool) + 323 8 clang 0x0000000110d8c620 (anonymous namespace)::MachineVerifier::BBInfo::~BBInfo() + 0 9 clang 0x0000000110cdddca llvm::MachineFunctionPass::runOnFunction(llvm::Function&) + 378 10 clang 0x00000001110b0154 llvm::FPPassManager::runOnFunction(llvm::Function&) + 1092 11 clang 0x00000001110b6268 llvm::FPPassManager::runOnModule(llvm::Module&) + 72 12 clang 0x00000001110b074a llvm::legacy::PassManagerImpl::run(llvm::Module&) + 986 13 clang 0x0000000111c20ad4 clang::EmitBackendOutput(clang::DiagnosticsEngine&, clang::HeaderSearchOptions const&, clang::CodeGenOptions const&, clang::TargetOptions const&, clang::LangOptions const&, llvm::StringRef, llvm::Module, clang::BackendAction, std::__1::unique_ptr<llvm::raw_pwrite_stream, std::__1::default_delete<llvm::raw_pwrite_stream> >) + 3764 14 clang 0x0000000111f6dd31 clang::BackendConsumer::HandleTranslationUnit(clang::ASTContext&) + 1905 15 clang 0x00000001131a28b3 clang::ParseAST(clang::Sema&, bool, bool) + 643 16 clang 0x00000001122b02a4 clang::FrontendAction::Execute() + 84 17 clang 0x000000011222d6a9 clang::CompilerInstance::ExecuteAction(clang::FrontendAction&) + 873 18 clang 0x000000011232faf5 clang::ExecuteCompilerInvocation(clang::CompilerInstance) + 661 19 clang 0x000000010f4e9860 cc1_main(llvm::ArrayRef<char const>, char const, void) + 2544 20 clang 0x000000010f4e7168 ExecuteCC1Tool(llvm::SmallVectorImpl<char const>&) + 312 21 clang 0x00000001120ab187 void llvm::function_ref<void ()>::callback_fn<clang::driver::CC1Command::Execute(llvm::ArrayRef<llvm::Optional<llvm::StringRef> >, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, bool) const::$_1>(long) + 23 22 clang 0x0000000111849eb4 llvm::CrashRecoveryContext::RunSafely(llvm::function_ref<void ()>) + 228 23 clang 0x00000001120aac24 clang::driver::CC1Command::Execute(llvm::ArrayRef<llvm::Optional<llvm::StringRef> >, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, bool) const + 324 24 clang 0x000000011207b85d clang::driver::Compilation::ExecuteCommand(clang::driver::Command const&, clang::driver::Command const&) const + 221 25 clang 0x000000011207bdad clang::driver::Compilation::ExecuteJobs(clang::driver::JobList const&, llvm::SmallVectorImpl<std::__1::pair<int, clang::driver::Command const> >&) const + 125 26 clang 0x0000000112092f7c clang::driver::Driver::ExecuteCompilation(clang::driver::Compilation&, llvm::SmallVectorImpl<std::__1::pair<int, clang::driver::Command const> >&) + 204 27 clang 0x000000010f4e6977 main + 10375 28 libdyld.dylib 0x00007fff6be90cc9 start + 1 29 libdyld.dylib 0x0000000000000018 start + 18446603338705728336 clang-14: error: clang frontend command failed with exit code 70 (use -v to see invocation) clang version 14.0.0 (https://github.com/llvm/llvm-project.git c90d136be4e055f1b409f38706d0fe3e2211af08) Target: arm64-apple-darwin19.5.0 Thread model: posix InstalledDir: /Users/buildslave/jenkins/workspace/test-suite-verify-machineinstrs-aarch64-O3/compiler/bin clang-14: note: diagnostic msg: *******************	2022-01-18 13:17:02 +00:00
Micah Weston	e6698f0992	[AArch64] Revive optimize add/sub with immediate through MIPeepholeOpt Fixes the build issue with D111034, whose goal was to optimize add/sub with long immediates. Optimize ([add\|sub] r, imm) -> ([ADD\|SUB] ([ADD\|SUB] r, #imm0, lsl #12), #imm1), if imm == (imm0<<12)+imm1. and both imm0 and imm1 are non-zero 12-bit unsigned integers. Optimize ([add\|sub] r, imm) -> ([SUB\|ADD] ([SUB\|ADD] r, #imm0, lsl #12), #imm1), if imm == -(imm0<<12)-imm1, and both imm0 and imm1 are non-zero 12-bit unsigned integers. The change which fixed the build issue in D111034 was the use of new virtual registers so that SSA form is maintained until deleting MI. Differential Revision: https://reviews.llvm.org/D117429	2022-01-17 17:17:15 +00:00
David Green	760d4d03d5	[AArch64] Sink splat shuffles to lane index intrinsics This teaches AArch64TargetLowering::shouldSinkOperands to sink splat shuffles to certain neon intrinsics, so that they can make use of the lane variants of the instructions that are available. Differential Revision: https://reviews.llvm.org/D112994	2021-11-22 08:11:35 +00:00
Ben Shi	59c3b48d99	Revert "[AArch64] Optimize add/sub with immediate" This reverts commit 3de3ca3137bec5115cd10c53f4059f9bf1054e96.	2021-11-03 14:15:21 +08:00
Ben Shi	3de3ca3137	[AArch64] Optimize add/sub with immediate Optimize ([add\|sub] r, imm) -> ([ADD\|SUB] ([ADD\|SUB] r, #imm0, lsl #12), #imm1), if imm == (imm0<<12)+imm1. and both imm0 and imm1 are non-zero 12-bit unsigned integers. Optimize ([add\|sub] r, imm) -> ([SUB\|ADD] ([SUB\|ADD] r, #imm0, lsl #12), #imm1), if imm == -(imm0<<12)-imm1, and both imm0 and imm1 are non-zero 12-bit unsigned integers. Reviewed By: jaykang10, dmgreen Differential Revision: https://reviews.llvm.org/D111034	2021-11-03 03:06:43 +00:00
Ben Shi	d0dbc991c0	Revert "[AArch64] Optimize add/sub with immediate" This reverts commit 9bf6bef9951a1c230796ccad2c5c0195ce4c4dff.	2021-10-16 22:17:18 +00:00
Ben Shi	9bf6bef995	[AArch64] Optimize add/sub with immediate Optimize ([add\|sub] r, imm) -> ([ADD\|SUB] ([ADD\|SUB] r, #imm0, lsl #12), #imm1), if imm == (imm0<<12)+imm1. and both imm0 and imm1 are non-zero 12-bit unsigned integers. Optimize ([add\|sub] r, imm) -> ([SUB\|ADD] ([SUB\|ADD] r, #imm0, lsl #12), #imm1), if imm == -(imm0<<12)-imm1, and both imm0 and imm1 are non-zero 12-bit unsigned integers. Reviewed By: jaykang10, dmgreen Differential Revision: https://reviews.llvm.org/D111034	2021-10-16 08:50:39 +00:00
David Green	adec922361	[AArch64] Make -mcpu=generic schedule for an in-order core We would like to start pushing -mcpu=generic towards enabling the set of features that improves performance for some CPUs, without hurting any others. A blend of the performance options hopefully beneficial to all CPUs. The largest part of that is enabling in-order scheduling using the Cortex-A55 schedule model. This is similar to the Arm backend change from eecb353d0e25ba which made -mcpu=generic perform in-order scheduling using the cortex-a8 schedule model. The idea is that in-order cpu's require the most help in instruction scheduling, whereas out-of-order cpus can for the most part out-of-order schedule around different codegen. Our benchmarking suggests that hypothesis holds. When running on an in-order core this improved performance by 3.8% geomean on a set of DSP workloads, 2% geomean on some other embedded benchmark and between 1% and 1.8% on a set of singlecore and multicore workloads, all running on a Cortex-A55 cluster. On an out-of-order cpu the results are a lot more noisy but show flat performance or an improvement. On the set of DSP and embedded benchmarks, run on a Cortex-A78 there was a very noisy 1% speed improvement. Using the most detailed results I could find, SPEC2006 runs on a Neoverse N1 show a small increase in instruction count (+0.127%), but a decrease in cycle counts (-0.155%, on average). The instruction count is very low noise, the cycle count is more noisy with a 0.15% decrease not being significant. SPEC2k17 shows a small decrease (-0.2%) in instruction count leading to a -0.296% decrease in cycle count. These results are within noise margins but tend to show a small improvement in general. When specifying an Apple target, clang will set "-target-cpu apple-a7" on the command line, so should not be affected by this change when running from clang. This also doesn't enable more runtime unrolling like -mcpu=cortex-a55 does, only changing the schedule used. A lot of existing tests have updated. This is a summary of the important differences: - Most changes are the same instructions in a different order. - Sometimes this leads to very minor inefficiencies, such as requiring an extra mov to move variables into r0/v0 for the return value of a test function. - misched-fusion.ll was no longer fusing the pairs of instructions it should, as per D110561. I've changed the schedule used in the test for now. - neon-mla-mls.ll now uses "mul; sub" as opposed to "neg; mla" due to the different latencies. This seems fine to me. - Some SVE tests do not always remove movprfx where they did before due to different register allocation giving different destructive forms. - The tests argument-blocks-array-of-struct.ll and arm64-windows-calls.ll produce two LDR where they previously produced an LDP due to store-pair-suppress kicking in. - arm64-ldp.ll and arm64-neon-copy.ll are missing pre/postinc on LPD. - Some tests such as arm64-neon-mul-div.ll and ragreedy-local-interval-cost.ll have more, less or just different spilling. - In aarch64_generated_funcs.ll.generated.expected one part of the function is no longer outlined. Interestingly if I switch this to use any other scheduled even less is outlined. Some of these are expected to happen, such as differences in outlining or register spilling. There will be places where these result in worse codegen, places where they are better, with the SPEC instruction counts suggesting it is not a decrease overall, on average. Differential Revision: https://reviews.llvm.org/D110830	2021-10-09 15:58:31 +01:00
David Green	92128b7801	[AArch64] Regenerate even more tests This updates a few more check lines, in some mte tests that were close to auto generated already and some CodeGenPrepare/consthoist tests where being able to see the entire code sequence is useful for determining whether code differences are improvements or not.	2021-10-06 14:32:01 +01:00
Andrew Wei	c9066c5d37	[CGP] Fix the crash for combining address mode when having cyclic dependency In the combination of addressing modes, when replacing the matched phi nodes, sometimes the phi node to be replaced has been modified. For example, there’s matcher set [A, B] and [C, A], which will have cyclic dependency: A is replaced by B and C will be replaced by A. Because we tried to match new phi node to another new phi node, we should ignore new phi nodes when mapping new phi node to old one. Reviewed By: skatkov Differential Revision: https://reviews.llvm.org/D108635	2021-08-26 22:52:42 +08:00
Tiehu Zhang	9cfa9b44a5	[CodeGenPrepare] The instruction to be sunk should be inserted before its user in a block In current implementation, the instruction to be sunk will be inserted before the target instruction without considering the def-use tree, which may case Instruction does not dominate all uses error. We need to choose a suitable location to insert according to the use chain Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D107262	2021-08-17 18:58:15 +08:00
David Green	013030a0b2	[AArch64] Correct sinking of shuffles to adds/subs This was checking extends as shuffles, where as we should be checking the operands. This helps sink the shuffles, creating more addl/subl instructions. Differential Revision: https://reviews.llvm.org/D107623	2021-08-10 13:25:42 +01:00
David Green	3f74a68c35	[AArch64] Regenerate sink-free-instructions.ll. NFC	2021-08-10 13:25:42 +01:00
David Sherwood	8439415333	[IR] Let ConstantVector::getSplat use poison instead of undef This patch updates ConstantVector::getSplat to use poison instead of undef when using insertelement/shufflevector to splat. This follows on from D93793. Differential Revision: https://reviews.llvm.org/D107751	2021-08-10 08:27:43 +01:00

1 2

75 Commits