75 Commits

Author SHA1 Message Date
Yingwei Zheng
38a44bdc93
[CodeGenPrepare] Reverse the canonicalization of isInf/isNanOrInf (#81572)
In commit
2b582440c1,
we canonicalize the isInf/isNanOrInf idiom into fabs+fcmp for better
analysis/codegen (See also the discussion in
https://github.com/llvm/llvm-project/pull/76338).

This patch reverses the fabs+fcmp to `is.fpclass`. If the `is.fpclass`
is not supported by the target, it will be expanded by TLI.

Fixes the regression introduced by
2b582440c1
and
https://github.com/llvm/llvm-project/pull/80414#issuecomment-1936374206.
2024-03-18 18:27:45 +08:00
Nikita Popov
2d69827c5c [Transforms] Convert tests to opaque pointers (NFC) 2024-02-05 11:57:34 +01:00
Nick Anderson
f1ec0d12bb
Port CodeGenPrepare to new pass manager (and BasicBlockSectionsProfil… (#77182)
Port CodeGenPrepare to new pass manager and dependency
BasicBlockSectionsProfileReader
Fixes: #75380

Co-authored-by: Krishna-13-cyber <84722531+Krishna-13-cyber@users.noreply.github.com>
2024-01-09 13:32:59 +07:00
Simon Pilgrim
7648371c25 Revert 4d7c5ad58467502fcbc433591edff40d8a4d697d "[NewPM] Update CodeGenPreparePass reference in CodeGenPassBuilder (#77054)"
Revert e0c554ad87d18dcbfcb9b6485d0da800ae1338d1 "Port CodeGenPrepare to new pass manager (and BasicBlockSectionsProfil… (#75380)"

Revert #75380 and #77054 as they were breaking EXPENSIVE_CHECKS buildbots: https://lab.llvm.org/buildbot/#/builders/104
2024-01-05 12:28:10 +00:00
Nick Anderson
e0c554ad87
Port CodeGenPrepare to new pass manager (and BasicBlockSectionsProfil… (#75380)
Port CodeGenPrepare to new pass manager and dependency
BasicBlockSectionsProfileReader
Fixes: #64560

Co-authored-by: Krishna-13-cyber <84722531+Krishna-13-cyber@users.noreply.github.com>
2024-01-05 13:47:56 +07:00
zhongyunde 00443407
d6f4d5209f [CGP][AArch64] Rebase the common base offset for better ISel
When all the large const offsets masked with the same value from bit-12 to bit-23.
Fold
  add     x8, x0, #2031, lsl #12
  add     x8, x8, #960
  ldr     x9, [x8, x8]
  ldr     x8, [x8, #2056]

into
  add     x8, x0, #2031, lsl #12
  ldr     x9, [x8, #960]
  ldr     x8, [x8, #3016]
2023-12-05 09:01:41 +08:00
Paul Walker
6383785bad
[SVE][CodeGenPrepare] Sink address calculations that match SVE gather/scatter addressing modes. (#66996)
SVE supports scalar+vector and scalar+extw(vector) addressing modes.
However, the masked gather/scatter intrinsics take a vector of
addresses, which means address computations can be hoisted out of
loops.  The is especially true for things like offsets where the
true size of offsets is lost by the time you get to code generation.

This is problematic because it forces the code generator to legalise
towards `<vscale x 2 x ty>` vectors that will not maximise bandwidth
if the main block datatypes is in fact i32 or smaller.

This patch sinks GEPs and extends for cases where one of the above
addressing modes can be used.

NOTE: There are cases where it would be better to split the extend
in two with one half hoisted out of a loop and the other within the
loop.  Whilst true I think this switch of default is still better
than before because the extra extends are an improvement over being
forced to split a gather/scatter.
2023-10-11 13:20:08 +01:00
Jay Foad
7b3bbd83c0 Revert "[CodeGen] Really renumber slot indexes before register allocation (#67038)"
This reverts commit 2501ae58e3bb9a70d279a56d7b3a0ed70a8a852c.

Reverted due to various buildbot failures.
2023-10-09 12:31:32 +01:00
Jay Foad
2501ae58e3
[CodeGen] Really renumber slot indexes before register allocation (#67038)
PR #66334 tried to renumber slot indexes before register allocation, but
the numbering was still affected by list entries for instructions which
had been erased. Fix this to make the register allocator's live range
length heuristics even less dependent on the history of how instructions
have been added to and removed from SlotIndexes's maps.
2023-10-09 11:44:41 +01:00
Jay Foad
e0919b189b [CodeGen] Renumber slot indexes before register allocation (#66334)
RegAllocGreedy uses SlotIndexes::getApproxInstrDistance to approximate
the length of a live range for its heuristics. Renumbering all slot
indexes with the default instruction distance ensures that this estimate
will be as accurate as possible, and will not depend on the history of
how instructions have been added to and removed from SlotIndexes's maps.

This also means that enabling -early-live-intervals, which runs the
SlotIndexes analysis earlier, will not cause large amounts of churn due
to different register allocator decisions.
2023-09-19 11:18:12 +01:00
Harvin Iriawan
db158c7c83 [AArch64] Update generic sched model to A510
Refresh of the generic scheduling model to use A510 instead of A55.
  Main benefits are to the little core, and introducing SVE scheduling information.
  Changes tested on various OoO cores, no performance degradation is seen.

  Differential Revision: https://reviews.llvm.org/D156799
2023-08-21 12:25:15 +01:00
Matthias Braun
02ba5b8c6b Ignore load/store until stack address computation
No longer conservatively assume a load/store accesses the stack when we
can prove that we did not compute any stack-relative address up to this
point in the program.

We do this in a cheap not-quite-a-dataflow-analysis: Assume
`NoStackAddressUsed` when all predecessors of a block already guarantee
it. Process blocks in reverse post order to guarantee that except for
loop headers we have processed all predecessors of a block before
processing the block itself. For loops we accept the conservative answer
as they are unlikely to be shrink-wrappable anyway.

Differential Revision: https://reviews.llvm.org/D152213
2023-06-26 13:50:36 -07:00
Tobias Hieta
f84bac329b
[NFC][Py Reformat] Reformat lit.local.cfg python files in llvm
This is a follow-up to b71edfaa4ec3c998aadb35255ce2f60bba2940b0
since I forgot the lit.local.cfg files in that one.

Reformatting is done with `black`.

If you end up having problems merging this commit because you
have made changes to a python file, the best way to handle that
is to run git checkout --ours <yourfile> and then reformat it
with black.

If you run into any problems, post to discourse about it and
we will try to help.

RFC Thread below:

https://discourse.llvm.org/t/rfc-document-and-standardize-python-code-style

Reviewed By: barannikov88, kwk

Differential Revision: https://reviews.llvm.org/D150762
2023-05-17 17:03:15 +02:00
Fangrui Song
ca65969c3e [AArch64] Unconditionally use DW_EH_PE_indirect|DW_EH_PE_pcrel personality/lsda/ttype encodings
For -fno-pic, without DW_EH_PE_indirect, the personality routine pointer in a
CIE needs an R_AARCH64_ABS64 relocation. In common configurations that
`__gcc_personality_v0` is defined in a shared object, this will lead to a
discouraged canonical PLT entry, or, if `ld.lld -z notext` (betwen D122459 and
D143136), a dynamic R_AARCH64_ABS64 relocation with an incorrect offset:
https://github.com/llvm/llvm-project/issues/60392

Since GCC uses DW_EH_PE_indirect for -fno-pic code (the behavior hasn't changed
since the initial port in 2012), let's follow suit by simplifying the code.
(
For tiny and small code models, we use DW_EH_PE_sdata8 instead of GCC's
DW_EH_PE_sdata4. This is a deliberate choice to support personality-.eh_frame
offset > 2GiB. This is unneeded for small code model since "Max text segment
size < 2GiB" but making `-fno-pic -mcmodel={tiny,small}` different seems
unnecessary: the scenarios that uses both -fno-pic and C++ exceptions have been
increasingly rare now, so there is little advantage optimizing for the little
size saving with code complexity.
)

---

Two clang/test/Interpreter tests would fail without 6747fc07d1aa94e22622e278e5a02ba70675ac9b
([ORC] Use JITLink as the default linker for LLJIT on Linux/arm64.)

Reviewed By: MatzeB

Differential Revision: https://reviews.llvm.org/D143039
2023-02-05 10:46:43 -08:00
NAKAMURA Takumi
5b2549b0d2 Revert "[AArch64] Unconditionally use DW_EH_PE_indirect|DW_EH_PE_pcrel personality/lsda/ttype encodings"
It causes failurs in clang-interpreter.

This reverts commit 565a1fb1334b8cf510af1338cae3f50815a99f90, aka llvmorg-17-init-1048-g565a1fb1334b
2023-02-04 20:49:39 +09:00
Fangrui Song
565a1fb133 [AArch64] Unconditionally use DW_EH_PE_indirect|DW_EH_PE_pcrel personality/lsda/ttype encodings
For -fno-pic, without DW_EH_PE_indirect, the personality routine pointer in a
CIE needs an R_AARCH64_ABS64 relocation. In common configurations that
`__gcc_personality_v0` is defined in a shared object, this will lead to a
discouraged canonical PLT entry, or, if `ld.lld -z notext` (betwen D122459 and
D143136), a dynamic R_AARCH64_ABS64 relocation with an incorrect offset:
https://github.com/llvm/llvm-project/issues/60392

Since GCC uses DW_EH_PE_indirect for -fno-pic code (the behavior hasn't changed
since the initial port in 2012), let's follow suit by simplifying the code.
(
For tiny and small code models, we use DW_EH_PE_sdata8 instead of GCC's
DW_EH_PE_sdata4. This is a deliberate choice to support personality-.eh_frame
offset > 2GiB. This is necessary for small code model since "Max text segment
size < 2GiB" but it is unnecessary to make `-fno-pic -mcmodel={tiny,small}`
different: The scenarios that uses both -fno-pic and C++ exceptions have been
increasingly rare now, so there is little advantage optimizing for the little
size saving with code complexity.
)

Reviewed By: MatzeB

Differential Revision: https://reviews.llvm.org/D143039
2023-02-03 10:41:04 -08:00
David Green
28de5f99bb [AArch64] Sink to umull if we know tops bits are zero.
This is an extension to the code for sinking splats to multiplies, where
if we can detect that the top bits are known-zero we can treat the
instruction like a zext. The existing code was also adjusted in the
process to make it more precise about only sinking if both operands are
zext or sext.

Differential Revision: https://reviews.llvm.org/D141275
2023-01-16 10:44:38 +00:00
Paul Walker
eae26b6640 [IRBuilder] Use canonical i64 type for insertelement index used by vector splats.
Instcombine prefers this canonical form (see getPreferredVectorIndex),
as does IRBuilder when passing the index as an integer so we may as
well use the prefered form from creation.

NOTE: All test changes are mechanical with nothing else expected
beyond a change of index type from i32 to i64.

Differential Revision: https://reviews.llvm.org/D140983
2023-01-11 14:08:06 +00:00
Nikita Popov
33a3139f80 [CGP] Avoid branch on poison UB in test (NFC) 2023-01-03 14:52:48 +01:00
Nikita Popov
02b02cd050 [CodeGenPrepare] Avoid branch on undef UB in tests (NFC) 2023-01-03 13:51:00 +01:00
Matt Arsenault
d9e51e7552 CodeGenPrepare: Convert most tests to opaque pointers
NVPTX/dont-introduce-addrspacecast.ll required manually removing a check for
a bitcast.

AArch64/combine-address-mode.ll required rerunning update_test_checks

Mips required some manual updates due to a CHECK-NEXT coming after a
deleted bitcast.

ARM/sink-addrmode.ll needed one small manual fix.

Excludes one X86 function which needs more attention.
2022-11-28 09:21:59 -05:00
Matt Arsenault
e446d1d93f CodeGenPrepare: Don't use anonymous values some in tests
These are always an obstacle to test updates, and often break after
running opaquify scripts on them.
2022-11-27 10:30:37 -05:00
Florian Hahn
b2c195da6d
[CGP] Update failing test missed in 81a11da762577. 2022-09-15 19:35:25 +01:00
Florian Hahn
81a11da762
[CGP,AArch64] Replace zexts with shuffle that can be lowered using tbl.
This patch extends CodeGenPrepare to lower zext v16i8 -> v16i32 in loops
using a wide shuffle  creating a v64i8 vector, selecting groups of 3
zero elements and an element from the input.

This is profitable on AArch64 where such shuffles can be lowered to tbl
instructions, but only in loops, because it requires materializing 4
masks, which can be done in the loop preheader.

This is the only reason the transform is part of CGP. If there's a
better alternative I missed, please let me know. The same goes for the
shouldReplaceZExtWithShuffle hook which guards this. I am not sure if
this transform will be beneficial on other targets, but it seems like
there is no way other convenient way.

This improves the generated code for loops like the one below in
combination with D96522.

    int foo(uint8_t *p, int N) {
      unsigned long long sum = 0;
      for (int i = 0; i < N ; i++, p++) {
	unsigned int v = *p;
	sum += (v < 127) ? v : 256 - v;
      }
      return sum;
    }

https://clang.godbolt.org/z/Wco866MjY

Reviewed By: t.p.northover

Differential Revision: https://reviews.llvm.org/D120571
2022-09-15 19:18:13 +01:00
Florian Hahn
786c687810
[AArch64] Add support for FMA intrinsics to shouldSinkOperands.
If the fma operates on a legal vector type, the indexed variants can be
used, if the second operand is a splat of a valid index.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D126234
2022-05-27 10:37:03 +01:00
Florian Hahn
a9a012086a
[AArch64] Add additional tests for sinking free shuffles for FMAs. 2022-05-26 10:35:38 +01:00
Florian Hahn
8661725686
[AArch64] Add tests with free shuffles for indexed fma variants.
The new tests contain examples where shuffles are free, because indexed
fma instructions can be used.
2022-05-23 20:27:42 +01:00
Momchil Velikov
d0ea42a7c1 [AArch64] Async unwind - function epilogues
Reviewed By: MaskRay, chill

Differential Revision: https://reviews.llvm.org/D112330
2022-04-12 16:50:50 +01:00
Momchil Velikov
50a97aacac [AArch64] Async unwind - function prologues
Re-commit of 32e8b550e5439c7e4aafa73894faffd5f25d0d05

This patch rearranges emission of CFI instructions, so the resulting
DWARF and `.eh_frame` information is precise at every instruction.

The current state is that the unwind info is emitted only after the
function prologue. This is fine for synchronous (e.g. C++) exceptions,
but the information is generally incorrect when the program counter is
at an instruction in the prologue or the epilogue, for example:

```
stp	x29, x30, [sp, #-16]!           // 16-byte Folded Spill
mov	x29, sp
.cfi_def_cfa w29, 16
...
```

after the `stp` is executed the (initial) rule for the CFA still says
the CFA is in the `sp`, even though it's already offset by 16 bytes

A correct unwind info could look like:
```
stp	x29, x30, [sp, #-16]!           // 16-byte Folded Spill
.cfi_def_cfa_offset 16
mov	x29, sp
.cfi_def_cfa w29, 16
...
```

Having this information precise up to an instruction is useful for
sampling profilers that would like to get a stack backtrace. The end
goal (towards this patch is just a step) is to have fully working
`-fasynchronous-unwind-tables`.

Reviewed By: danielkiss, MaskRay

Differential Revision: https://reviews.llvm.org/D111411
2022-03-24 16:16:44 +00:00
Hans Wennborg
85c53c7092 Revert "[AArch64] Async unwind - function prologues"
It caused builds to assert with:

  (StackSize == 0 && "We already have the CFA offset!"),
  function generateCompactUnwindEncoding, file AArch64AsmBackend.cpp, line 624.

when targeting iOS. See comment on the code review for reproducer.

> This patch rearranges emission of CFI instructions, so the resulting
> DWARF and `.eh_frame` information is precise at every instruction.
>
> The current state is that the unwind info is emitted only after the
> function prologue. This is fine for synchronous (e.g. C++) exceptions,
> but the information is generally incorrect when the program counter is
> at an instruction in the prologue or the epilogue, for example:
>
> ```
> stp     x29, x30, [sp, #-16]!           // 16-byte Folded Spill
> mov     x29, sp
> .cfi_def_cfa w29, 16
> ...
> ```
>
> after the `stp` is executed the (initial) rule for the CFA still says
> the CFA is in the `sp`, even though it's already offset by 16 bytes
>
> A correct unwind info could look like:
> ```
> stp     x29, x30, [sp, #-16]!           // 16-byte Folded Spill
> .cfi_def_cfa_offset 16
> mov     x29, sp
> .cfi_def_cfa w29, 16
> ...
> ```
>
> Having this information precise up to an instruction is useful for
> sampling profilers that would like to get a stack backtrace. The end
> goal (towards this patch is just a step) is to have fully working
> `-fasynchronous-unwind-tables`.
>
> Reviewed By: danielkiss, MaskRay
>
> Differential Revision: https://reviews.llvm.org/D111411

This reverts commit 32e8b550e5439c7e4aafa73894faffd5f25d0d05.
2022-03-04 17:36:26 +01:00
Momchil Velikov
63c9aca12a Revert "[AArch64] Async unwind - function epilogues"
This reverts commit 74319d67943a4fbef36e81f54273549ce4962f84.

It causes test failures that look like infinite loop in asan/hwasan
unwinding.
2022-03-02 15:01:57 +00:00
Momchil Velikov
74319d6794 [AArch64] Async unwind - function epilogues
Counterpart of https://reviews.llvm.org/D111411 this change makes the
unwind information instruction precise in function epilogues.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D112330
2022-03-02 13:15:11 +00:00
Momchil Velikov
32e8b550e5 [AArch64] Async unwind - function prologues
This patch rearranges emission of CFI instructions, so the resulting
DWARF and `.eh_frame` information is precise at every instruction.

The current state is that the unwind info is emitted only after the
function prologue. This is fine for synchronous (e.g. C++) exceptions,
but the information is generally incorrect when the program counter is
at an instruction in the prologue or the epilogue, for example:

```
stp	x29, x30, [sp, #-16]!           // 16-byte Folded Spill
mov	x29, sp
.cfi_def_cfa w29, 16
...
```

after the `stp` is executed the (initial) rule for the CFA still says
the CFA is in the `sp`, even though it's already offset by 16 bytes

A correct unwind info could look like:
```
stp	x29, x30, [sp, #-16]!           // 16-byte Folded Spill
.cfi_def_cfa_offset 16
mov	x29, sp
.cfi_def_cfa w29, 16
...
```

Having this information precise up to an instruction is useful for
sampling profilers that would like to get a stack backtrace. The end
goal (towards this patch is just a step) is to have fully working
`-fasynchronous-unwind-tables`.

Reviewed By: danielkiss, MaskRay

Differential Revision: https://reviews.llvm.org/D111411
2022-02-28 13:37:57 +00:00
Florian Hahn
166968a892
[AArch64] Add test cases where zext can be lowered to series of tbl.
Add a set of tests for upcoming patches that allow lowering vector zext
using AArch64 tbl instructions instead of shifts.
2022-02-25 15:36:32 +00:00
Sunho Kim
44601f4956 [AARCH64][NEON] Allow to sink operands for aarch64_neon_pmull
This teaches AArch64TargetLowering::shouldSinkOperands to sink the
operands of aarch64_neon_pmull intrinsic.

Differential Revision: https://reviews.llvm.org/D117944
2022-02-03 16:46:49 +00:00
Micah Weston
93deac2e2b [AArch64] Optimize add/sub with immediate through MIPeepholeOpt
Fixes the build issue with D111034, whose goal was to optimize
add/sub with long immediates.

Optimize ([add|sub] r, imm) -> ([ADD|SUB] ([ADD|SUB] r, #imm0, lsl #12), #imm1),
if imm == (imm0<<12)+imm1. and both imm0 and imm1 are non-zero 12-bit unsigned
integers.

Optimize ([add|sub] r, imm) -> ([SUB|ADD] ([SUB|ADD] r, #imm0, lsl #12), #imm1),
if imm == -(imm0<<12)-imm1, and both imm0 and imm1 are non-zero 12-bit unsigned
integers.

The change which fixed the build issue in D111034 was the use of new virtual
registers so that SSA form is maintained until deleting MI.

Differential Revision: https://reviews.llvm.org/D117429
2022-01-22 12:39:22 +00:00
Florian Hahn
62476c7c14
Revert "[AArch64] Revive optimize add/sub with immediate through MIPeepholeOpt"
This reverts commit e6698f09929a134bf0f46d9347142b86d8f636a2.

This commit appears to introduce new machine verifier failures when
building the llvm-test-suite with `-mllvm -verify-machineinstrs` enabled:

https://green.lab.llvm.org/green/job/test-suite-verify-machineinstrs-aarch64-O3/11061/

FAILED: MultiSource/Benchmarks/Olden/health/CMakeFiles/health.dir/health.c.o
/Users/buildslave/jenkins/workspace/test-suite-verify-machineinstrs-aarch64-O3/test-suite-build/tools/timeit --summary MultiSource/Benchmarks/Olden/health/CMakeFiles/health.dir/health.c.o.time /Users/buildslave/jenkins/workspace/test-suite-verify-machineinstrs-aarch64-O3/compiler/bin/clang -DNDEBUG  -B /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin    -Wno-unused-command-line-argument -mllvm -verify-machineinstrs -O3 -arch arm64 -isysroot /Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS13.5.sdk   -w -Werror=date-time -DTORONTO -MD -MT MultiSource/Benchmarks/Olden/health/CMakeFiles/health.dir/health.c.o -MF MultiSource/Benchmarks/Olden/health/CMakeFiles/health.dir/health.c.o.d -o MultiSource/Benchmarks/Olden/health/CMakeFiles/health.dir/health.c.o   -c /Users/buildslave/jenkins/workspace/test-suite-verify-machineinstrs-aarch64-O3/test-suite/MultiSource/Benchmarks/Olden/health/health.c
*** Bad machine code: Illegal virtual register for instruction ***
- function:    alloc_tree
- basic block: %bb.1 if.else (0x7fc0db8f8bb0)
- instruction: %31:gpr64 = nsw MADDXrrr killed %39:gpr64sp, killed %25:gpr64, $xzr
- operand 1:   killed %39:gpr64sp
Expected a GPR64 register, but got a GPR64sp register
fatal error: error in backend: Found 1 machine code errors.
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace, preprocessed source, and associated run script.
Stack dump:
0.	Program arguments: /Users/buildslave/jenkins/workspace/test-suite-verify-machineinstrs-aarch64-O3/compiler/bin/clang -DNDEBUG -B /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin -Wno-unused-command-line-argument -mllvm -verify-machineinstrs -O3 -arch arm64 -isysroot /Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS13.5.sdk -w -Werror=date-time -DTORONTO -MD -MT MultiSource/Benchmarks/Olden/health/CMakeFiles/health.dir/health.c.o -MF MultiSource/Benchmarks/Olden/health/CMakeFiles/health.dir/health.c.o.d -o MultiSource/Benchmarks/Olden/health/CMakeFiles/health.dir/health.c.o -c /Users/buildslave/jenkins/workspace/test-suite-verify-machineinstrs-aarch64-O3/test-suite/MultiSource/Benchmarks/Olden/health/health.c
1.	<eof> parser at end of file
2.	Code generation
3.	Running pass 'Function Pass Manager' on module '/Users/buildslave/jenkins/workspace/test-suite-verify-machineinstrs-aarch64-O3/test-suite/MultiSource/Benchmarks/Olden/health/health.c'.
4.	Running pass 'Verify generated machine code' on function '@alloc_tree'
Stack dump without symbol names (ensure you have llvm-symbolizer in your PATH or set the environment var `LLVM_SYMBOLIZER_PATH` to point to it):
0  clang         0x000000011191896b llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) + 43
1  clang         0x00000001119179b5 llvm::sys::RunSignalHandlers() + 85
2  clang         0x00000001119180e2 llvm::sys::CleanupOnSignal(unsigned long) + 210
3  clang         0x0000000111849f6a (anonymous namespace)::CrashRecoveryContextImpl::HandleCrash(int, unsigned long) + 106
4  clang         0x0000000111849ee8 llvm::CrashRecoveryContext::HandleExit(int) + 24
5  clang         0x0000000111914acc llvm::sys::Process::Exit(int, bool) + 44
6  clang         0x000000010f4e9be9 LLVMErrorHandler(void*, char const*, bool) + 89
7  clang         0x0000000114eba333 llvm::report_fatal_error(llvm::Twine const&, bool) + 323
8  clang         0x0000000110d8c620 (anonymous namespace)::MachineVerifier::BBInfo::~BBInfo() + 0
9  clang         0x0000000110cdddca llvm::MachineFunctionPass::runOnFunction(llvm::Function&) + 378
10 clang         0x00000001110b0154 llvm::FPPassManager::runOnFunction(llvm::Function&) + 1092
11 clang         0x00000001110b6268 llvm::FPPassManager::runOnModule(llvm::Module&) + 72
12 clang         0x00000001110b074a llvm::legacy::PassManagerImpl::run(llvm::Module&) + 986
13 clang         0x0000000111c20ad4 clang::EmitBackendOutput(clang::DiagnosticsEngine&, clang::HeaderSearchOptions const&, clang::CodeGenOptions const&, clang::TargetOptions const&, clang::LangOptions const&, llvm::StringRef, llvm::Module*, clang::BackendAction, std::__1::unique_ptr<llvm::raw_pwrite_stream, std::__1::default_delete<llvm::raw_pwrite_stream> >) + 3764
14 clang         0x0000000111f6dd31 clang::BackendConsumer::HandleTranslationUnit(clang::ASTContext&) + 1905
15 clang         0x00000001131a28b3 clang::ParseAST(clang::Sema&, bool, bool) + 643
16 clang         0x00000001122b02a4 clang::FrontendAction::Execute() + 84
17 clang         0x000000011222d6a9 clang::CompilerInstance::ExecuteAction(clang::FrontendAction&) + 873
18 clang         0x000000011232faf5 clang::ExecuteCompilerInvocation(clang::CompilerInstance*) + 661
19 clang         0x000000010f4e9860 cc1_main(llvm::ArrayRef<char const*>, char const*, void*) + 2544
20 clang         0x000000010f4e7168 ExecuteCC1Tool(llvm::SmallVectorImpl<char const*>&) + 312
21 clang         0x00000001120ab187 void llvm::function_ref<void ()>::callback_fn<clang::driver::CC1Command::Execute(llvm::ArrayRef<llvm::Optional<llvm::StringRef> >, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >*, bool*) const::$_1>(long) + 23
22 clang         0x0000000111849eb4 llvm::CrashRecoveryContext::RunSafely(llvm::function_ref<void ()>) + 228
23 clang         0x00000001120aac24 clang::driver::CC1Command::Execute(llvm::ArrayRef<llvm::Optional<llvm::StringRef> >, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >*, bool*) const + 324
24 clang         0x000000011207b85d clang::driver::Compilation::ExecuteCommand(clang::driver::Command const&, clang::driver::Command const*&) const + 221
25 clang         0x000000011207bdad clang::driver::Compilation::ExecuteJobs(clang::driver::JobList const&, llvm::SmallVectorImpl<std::__1::pair<int, clang::driver::Command const*> >&) const + 125
26 clang         0x0000000112092f7c clang::driver::Driver::ExecuteCompilation(clang::driver::Compilation&, llvm::SmallVectorImpl<std::__1::pair<int, clang::driver::Command const*> >&) + 204
27 clang         0x000000010f4e6977 main + 10375
28 libdyld.dylib 0x00007fff6be90cc9 start + 1
29 libdyld.dylib 0x0000000000000018 start + 18446603338705728336
clang-14: error: clang frontend command failed with exit code 70 (use -v to see invocation)
clang version 14.0.0 (https://github.com/llvm/llvm-project.git c90d136be4e055f1b409f38706d0fe3e2211af08)
Target: arm64-apple-darwin19.5.0
Thread model: posix
InstalledDir: /Users/buildslave/jenkins/workspace/test-suite-verify-machineinstrs-aarch64-O3/compiler/bin
clang-14: note: diagnostic msg:
********************
2022-01-18 13:17:02 +00:00
Micah Weston
e6698f0992 [AArch64] Revive optimize add/sub with immediate through MIPeepholeOpt
Fixes the build issue with D111034, whose goal was to optimize
add/sub with long immediates.

Optimize ([add|sub] r, imm) -> ([ADD|SUB] ([ADD|SUB] r, #imm0, lsl #12), #imm1),
if imm == (imm0<<12)+imm1. and both imm0 and imm1 are non-zero 12-bit unsigned
integers.

Optimize ([add|sub] r, imm) -> ([SUB|ADD] ([SUB|ADD] r, #imm0, lsl #12), #imm1),
if imm == -(imm0<<12)-imm1, and both imm0 and imm1 are non-zero 12-bit unsigned
integers.

The change which fixed the build issue in D111034 was the use of new virtual
registers so that SSA form is maintained until deleting MI.

Differential Revision: https://reviews.llvm.org/D117429
2022-01-17 17:17:15 +00:00
David Green
760d4d03d5 [AArch64] Sink splat shuffles to lane index intrinsics
This teaches AArch64TargetLowering::shouldSinkOperands to sink splat
shuffles to certain neon intrinsics, so that they can make use of the
lane variants of the instructions that are available.

Differential Revision: https://reviews.llvm.org/D112994
2021-11-22 08:11:35 +00:00
Ben Shi
59c3b48d99 Revert "[AArch64] Optimize add/sub with immediate"
This reverts commit 3de3ca3137bec5115cd10c53f4059f9bf1054e96.
2021-11-03 14:15:21 +08:00
Ben Shi
3de3ca3137 [AArch64] Optimize add/sub with immediate
Optimize ([add|sub] r, imm) -> ([ADD|SUB] ([ADD|SUB] r, #imm0, lsl #12), #imm1),
if imm == (imm0<<12)+imm1. and both imm0 and imm1 are non-zero 12-bit unsigned
integers.

Optimize ([add|sub] r, imm) -> ([SUB|ADD] ([SUB|ADD] r, #imm0, lsl #12), #imm1),
if imm == -(imm0<<12)-imm1, and both imm0 and imm1 are non-zero 12-bit unsigned
integers.

Reviewed By: jaykang10, dmgreen

Differential Revision: https://reviews.llvm.org/D111034
2021-11-03 03:06:43 +00:00
Ben Shi
d0dbc991c0 Revert "[AArch64] Optimize add/sub with immediate"
This reverts commit 9bf6bef9951a1c230796ccad2c5c0195ce4c4dff.
2021-10-16 22:17:18 +00:00
Ben Shi
9bf6bef995 [AArch64] Optimize add/sub with immediate
Optimize ([add|sub] r, imm) -> ([ADD|SUB] ([ADD|SUB] r, #imm0, lsl #12), #imm1),
if imm == (imm0<<12)+imm1. and both imm0 and imm1 are non-zero 12-bit unsigned
integers.

Optimize ([add|sub] r, imm) -> ([SUB|ADD] ([SUB|ADD] r, #imm0, lsl #12), #imm1),
if imm == -(imm0<<12)-imm1, and both imm0 and imm1 are non-zero 12-bit unsigned
integers.

Reviewed By: jaykang10, dmgreen

Differential Revision: https://reviews.llvm.org/D111034
2021-10-16 08:50:39 +00:00
David Green
adec922361 [AArch64] Make -mcpu=generic schedule for an in-order core
We would like to start pushing -mcpu=generic towards enabling the set of
features that improves performance for some CPUs, without hurting any
others. A blend of the performance options hopefully beneficial to all
CPUs. The largest part of that is enabling in-order scheduling using the
Cortex-A55 schedule model. This is similar to the Arm backend change
from eecb353d0e25ba which made -mcpu=generic perform in-order scheduling
using the cortex-a8 schedule model.

The idea is that in-order cpu's require the most help in instruction
scheduling, whereas out-of-order cpus can for the most part out-of-order
schedule around different codegen. Our benchmarking suggests that
hypothesis holds. When running on an in-order core this improved
performance by 3.8% geomean on a set of DSP workloads, 2% geomean on
some other embedded benchmark and between 1% and 1.8% on a set of
singlecore and multicore workloads, all running on a Cortex-A55 cluster.

On an out-of-order cpu the results are a lot more noisy but show flat
performance or an improvement. On the set of DSP and embedded
benchmarks, run on a Cortex-A78 there was a very noisy 1% speed
improvement. Using the most detailed results I could find, SPEC2006 runs
on a Neoverse N1 show a small increase in instruction count (+0.127%),
but a decrease in cycle counts (-0.155%, on average). The instruction
count is very low noise, the cycle count is more noisy with a 0.15%
decrease not being significant. SPEC2k17 shows a small decrease (-0.2%)
in instruction count leading to a -0.296% decrease in cycle count. These
results are within noise margins but tend to show a small improvement in
general.

When specifying an Apple target, clang will set "-target-cpu apple-a7"
on the command line, so should not be affected by this change when
running from clang. This also doesn't enable more runtime unrolling like
-mcpu=cortex-a55 does, only changing the schedule used.

A lot of existing tests have updated. This is a summary of the important
differences:
 - Most changes are the same instructions in a different order.
 - Sometimes this leads to very minor inefficiencies, such as requiring
   an extra mov to move variables into r0/v0 for the return value of a test
   function.
 - misched-fusion.ll was no longer fusing the pairs of instructions it
   should, as per D110561. I've changed the schedule used in the test
   for now.
 - neon-mla-mls.ll now uses "mul; sub" as opposed to "neg; mla" due to
   the different latencies. This seems fine to me.
 - Some SVE tests do not always remove movprfx where they did before due
   to different register allocation giving different destructive forms.
 - The tests argument-blocks-array-of-struct.ll and arm64-windows-calls.ll
   produce two LDR where they previously produced an LDP due to
   store-pair-suppress kicking in.
 - arm64-ldp.ll and arm64-neon-copy.ll are missing pre/postinc on LPD.
 - Some tests such as arm64-neon-mul-div.ll and
   ragreedy-local-interval-cost.ll have more, less or just different
   spilling.
 - In aarch64_generated_funcs.ll.generated.expected one part of the
   function is no longer outlined. Interestingly if I switch this to use
   any other scheduled even less is outlined.

Some of these are expected to happen, such as differences in outlining
or register spilling. There will be places where these result in worse
codegen, places where they are better, with the SPEC instruction counts
suggesting it is not a decrease overall, on average.

Differential Revision: https://reviews.llvm.org/D110830
2021-10-09 15:58:31 +01:00
David Green
92128b7801 [AArch64] Regenerate even more tests
This updates a few more check lines, in some mte tests that were close
to auto generated already and some CodeGenPrepare/consthoist tests where
being able to see the entire code sequence is useful for determining
whether code differences are improvements or not.
2021-10-06 14:32:01 +01:00
Andrew Wei
c9066c5d37 [CGP] Fix the crash for combining address mode when having cyclic dependency
In the combination of addressing modes, when replacing the matched phi nodes,
sometimes the phi node to be replaced has been modified. For example,
there’s matcher set [A, B] and [C, A], which will have cyclic dependency:
A is replaced by B and C will be replaced by A. Because we tried to match new phi node
to another new phi node, we should ignore new phi nodes when mapping new phi node to old one.

Reviewed By: skatkov

Differential Revision: https://reviews.llvm.org/D108635
2021-08-26 22:52:42 +08:00
Tiehu Zhang
9cfa9b44a5 [CodeGenPrepare] The instruction to be sunk should be inserted before its user in a block
In current implementation, the instruction to be sunk will be inserted before the target instruction without considering the def-use tree,
which may case Instruction does not dominate all uses error. We need to choose a suitable location to insert according to the use chain

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D107262
2021-08-17 18:58:15 +08:00
David Green
013030a0b2 [AArch64] Correct sinking of shuffles to adds/subs
This was checking extends as shuffles, where as we should be checking
the operands. This helps sink the shuffles, creating more addl/subl
instructions.

Differential Revision: https://reviews.llvm.org/D107623
2021-08-10 13:25:42 +01:00
David Green
3f74a68c35 [AArch64] Regenerate sink-free-instructions.ll. NFC 2021-08-10 13:25:42 +01:00
David Sherwood
8439415333 [IR] Let ConstantVector::getSplat use poison instead of undef
This patch updates ConstantVector::getSplat to use poison instead
of undef when using insertelement/shufflevector to splat.

This follows on from D93793.

Differential Revision: https://reviews.llvm.org/D107751
2021-08-10 08:27:43 +01:00