216 Commits

Author SHA1 Message Date
Csanád Hajdú
72901fe19e
[AArch64] Fold UBFMXri to UBFMWri when it's an LSR or LSL alias (#106968)
Using the LSR or LSL aliases of UBFM can be faster on some CPUs, so it
is worth changing 64-bit UBFM instructions that are equivalent to 32-bit
LSR/LSL operations into their 32-bit variants.

This change folds the following patterns:
* If `Imms == 31` and `Immr <= Imms`:
   `UBFMXri %0, Immr, Imms`  ->  `UBFMWri %0.sub_32, Immr, Imms`
* If `Immr == Imms + 33`:
   `UBFMXri %0, Immr, Imms`  ->  `UBFMWri %0.sub_32, Immr - 32, Imms`
2024-09-17 11:21:23 +01:00
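The two fold patterns in the UBFMXri commit above can be sanity-checked with a small bit-level model. This is only a sketch: `ubfm`, `fold_ubfmx` and `check_fold` are hypothetical helper names that do not come from the patch, and the model assumes the usual UBFM semantics (bitfield extract when `Imms >= Immr`, insert-in-zero otherwise) and that a 32-bit result is zero-extended into the 64-bit register.

```python
def ubfm(src, immr, imms, regsize):
    """Model UBFM (unsigned bitfield move) on a regsize-bit register."""
    mask = (1 << regsize) - 1
    src &= mask
    if imms >= immr:
        # Extract bits immr..imms and place them at bit 0 (UBFX / LSR form).
        width = imms - immr + 1
        return (src >> immr) & ((1 << width) - 1)
    # Take the low imms+1 bits and place them at bit regsize-immr (UBFIZ / LSL form).
    width = imms + 1
    return ((src & ((1 << width) - 1)) << (regsize - immr)) & mask


def fold_ubfmx(immr, imms):
    """Return the (immr, imms) pair for UBFMWri, or None if neither pattern applies."""
    if imms == 31 and immr <= imms:
        return immr, imms          # 32-bit LSR alias
    if immr == imms + 33:
        return immr - 32, imms     # 32-bit LSL alias
    return None


def check_fold(value, immr, imms):
    """Compare the 64-bit result with the folded 32-bit result, if a fold applies."""
    folded = fold_ubfmx(immr, imms)
    if folded is None:
        return None
    wide = ubfm(value, immr, imms, 64)
    narrow = ubfm(value & 0xFFFFFFFF, folded[0], folded[1], 32)
    return wide == narrow
```

Calling `check_fold` over all immediate pairs for a handful of 64-bit values is a cheap way to convince oneself that both patterns only read the low 32 bits of the source and produce a result that fits in 32 bits.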
Momchil Velikov
b0ffaa7905
[AArch64] Prevent the AArch64LoadStoreOptimizer from reordering CFI instructions (#101317)
When the AArch64LoadStoreOptimizer pass merges an SP update with a
load/store instruction and needs to adjust unwind information, it either:
* creates the merged instruction at the location of the SP update
  (so no CFI instructions are moved), or
* only moves a CFI instruction if the move would not reorder it across
  other CFI instructions

If neither of the above is possible, don't perform the optimisation.
2024-09-10 13:07:06 +01:00
zhongyunde 00443407
e5a5ac0c23 [AArch64] Fold more load.x into load.i with large offset
The list of load.x instructions follows canFoldIntoAddrMode from D152828.
Also support LDRSroX, which was missed in canFoldIntoAddrMode.
2024-08-28 14:15:09 +08:00
zhongyunde 00443407
8067b88f83 [AArch64] Fix buildbot breakage of ubsan
Fix the ERROR: UndefinedBehaviorSanitizer, reproduced by
  BUILDBOT_REVISION=43ffe2eed llvm-zorg/zorg/buildbot/builders/sanitizers/buildbot_bootstrap_ubsan.sh
It might also be related to #76202
2024-08-28 14:15:08 +08:00
Vitaly Buka
7f7f4feaf0
Revert "[AArch64] Optimize when storing symmetry constants" (#105474)
Reverts llvm/llvm-project#93717

The reverted change introduced a stack-use-after-return:
https://lab.llvm.org/buildbot/#/builders/24/builds/1003
2024-08-20 23:37:19 -07:00
hanbeom
ee572ed4ac
[AArch64] Optimize when storing symmetry constants (#93717)
This change looks for instruction sequences that store a 64-bit constant
whose two 32-bit halves are identical. Such a sequence usually consists
of several 'MOV' instructions and at most one 'ORR'.

If one is found, only the lower 32-bit constant is materialized, and the
'STP' instruction stores it to both the lower and the upper 32 bits.

For example:
  renamable $x8 = MOVZXi 49370, 0
  renamable $x8 = MOVKXi $x8, 320, 16
  renamable $x8 = ORRXrs $x8, $x8, 32
  STRXui killed renamable $x8, killed renamable $x0, 0
becomes
  $w8 = MOVZWi 49370, 0
  $w8 = MOVKWi $w8, 320, 16
  STPWi killed renamable $w8, killed renamable $w8, killed renamable $x0, 0


Related issue: https://github.com/llvm/llvm-project/issues/51483
2024-08-20 14:29:26 +01:00
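The precondition behind the symmetric-constant change above (which was later reverted, as the log shows) can be sketched as a check that both 32-bit halves of the stored value match. `split_symmetric` is a hypothetical name, and the sketch assumes the `32` operand of `ORRXrs` in the example denotes a left shift by 32.

```python
def split_symmetric(value):
    """Return the shared 32-bit half if both halves of a 64-bit value match, else None."""
    value &= 0xFFFFFFFFFFFFFFFF
    low = value & 0xFFFFFFFF
    high = value >> 32
    return low if low == high else None


# Constant from the commit message: MOVZ 49370, MOVK 320 << 16,
# then ORR of the register with itself shifted left by 32 (assumed).
low_half = 49370 | (320 << 16)
constant = low_half | (low_half << 32)
```

When `split_symmetric(constant)` returns a value, only that 32-bit half needs to be materialized, and a paired 32-bit store can write it twice.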
Thurston Dang
324b676a3d Revert "[AArch64] Fold more load.x into load.i with large offset"
This reverts commit 43ffe2eed0d9f73789dbe213023733d164999306.

Reason: buildbot breakage starting at https://lab.llvm.org/buildbot/#/builders/85/builds/1102

I manually bisected and found that clang crashed with 43ffe2eed0d9f73789dbe213023733d164999306 but not the immediately preceding commit (33190490c667aaf8b08d5af8b8ce84524f856e80)
2024-08-16 22:32:12 +00:00
zhongyunde 00443407
43ffe2eed0 [AArch64] Fold more load.x into load.i with large offset
The list of load.x instructions follows canFoldIntoAddrMode from D152828.
Also support LDRSroX, which was missed in canFoldIntoAddrMode.
2024-08-15 18:22:52 +08:00
zhongyunde 00443407
33190490c6 [AArch64] merge index address with large offset into base address
An example of this transformation: https://gcc.godbolt.org/z/nhYcWq1WE
Fold
  mov     w8, #56952
  movk    w8, #15, lsl #16
  ldrb    w0, [x0, x8]
into
  add     x0, x0, 1036288
  ldrb    w0, [x0, 3704]

Only LDRBBroX is supported in this first patch.
Fixes https://github.com/llvm/llvm-project/issues/71917

Note: This PR relands commit 32878c2065 with a fix for the crash in PR79756.
The crash is exposed when a block starts with a MOVKWi instruction that has
no preceding MOVZWi.
2024-08-15 18:22:52 +08:00
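The arithmetic behind the example in the commit above can be sketched as splitting the large offset into a high part for the `add` and a low part for the load's immediate. `split_large_offset` is a hypothetical helper, not a function from the patch. It assumes a byte load (scale 1), an `add` immediate that is a 12-bit value shifted left by 12, and a 12-bit unsigned load offset; the actual conditions in the pass may differ.

```python
def split_large_offset(offset):
    """Split offset into (high, low), or return None if it does not fit.

    high is a multiple of 4096 that fits a 12-bit immediate shifted left by 12;
    low fits an unsigned 12-bit immediate.
    """
    if offset < 0:
        return None
    low = offset & 0xFFF
    high = offset - low
    if (high >> 12) > 0xFFF:
        return None
    return high, low


# Offset built by the mov/movk pair in the commit message.
offset = 56952 | (15 << 16)
parts = split_large_offset(offset)
```

For the offset in the message, `parts` is `(1036288, 3704)`, which matches the `add x0, x0, 1036288` and `ldrb w0, [x0, 3704]` pair shown there.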
Momchil Velikov
461126c29c
[AArch64] Fix incorrectly getting the destination reg of an insn (#101205)
This popped up while investigating
https://github.com/llvm/llvm-project/issues/96950
In a few places where we need the destination reg of an instruction we
were using a call that worked only by accident.
2024-08-02 15:43:28 +01:00
David Green
601e102bdb
[CodeGen] Use LocationSize for MMO getSize (#84751)
This is part of #70452 that changes the type used for the external
interface of MMO to LocationSize as opposed to uint64_t. This means the
constructors take LocationSize, and convert ~UINT64_C(0) to
LocationSize::beforeOrAfter(). The getSize methods return a
LocationSize.

This allows us to be more precise with unknown sizes, not accidentally
treating them as unsigned values, and in the future should allow us to
add proper scalable vector support but none of that is included in this
patch. It should mostly be an NFC.

Global ISel is still expected to use the underlying LLT as it needs, and
is not expected to see unknown sizes for generic operations. Most of
the changes are hopefully fairly mechanical, adding a lot of getValue()
calls and protecting them with hasValue() where needed.
2024-03-17 18:15:56 +00:00
Yuta Mukai
ea23761429
[AArch64] Verify ldp/stp alignment stricter (#84124)
When ldp-aligned-only/stp-aligned-only is specified, the ldp/stp
transformation is now cancelled if the MachineMemOperand is not present
or the access size is unknown.
In the previous implementation, the alignment check passed when there was
no MachineMemOperand. Also, if the size was unknown, an incorrect value was
used or an assertion failed. (In practice, an instruction without a
MachineMemOperand is already excluded as a candidate by
isCandidateToMergeOrPair() before reaching this check.)

A NumFailedAlignmentCheck statistic is added. NumPairCreated now only
counts pairs whose creation is not cancelled.
2024-03-06 20:19:56 +09:00
Florian Mayer
6f11c95d06
Revert "[AArch64] Verify ldp/stp alignment stricter" (#84096)
Reverts llvm/llvm-project#83948

This broke the ASan buildbot:
https://lab.llvm.org/buildbot/#/builders/168/builds/19054/steps/10/logs/stdio
2024-03-05 15:52:09 -08:00
Yuta Mukai
6b5888c27f
[AArch64] Verify ldp/stp alignment stricter (#83948)
When ldp-aligned-only/stp-aligned-only is specified, the ldp/stp
transformation is now cancelled if the MachineMemOperand is not present
or the access size is unknown.
In the previous implementation, the alignment check passed when there was
no MachineMemOperand. Also, if the size was unknown, an incorrect value was
used or an assertion failed. (In practice, an instruction without a
MachineMemOperand is already excluded as a candidate by
isCandidateToMergeOrPair() before reaching this check.)

A NumFailedAlignmentCheck statistic is added. NumPairCreated now only
counts pairs whose creation is not cancelled.
2024-03-06 01:47:28 +09:00
David Green
915c3d9e5a Revert "[AArch64] merge index address with large offset into base address"
This reverts commit 32878c2065c8005b3ea30c79e16dfd7eed55d645 due to #79756 and #76202.
2024-01-28 17:01:21 +00:00
Sjoerd Meijer
e034f209f5
[AArch64LoadStoreOptimizer] Debug messages to track decision making. NFC (#77593)
With these debug messages it's possible to see why some pairs get
rejected for combining.
2024-01-11 09:26:48 +00:00
Vitaly Buka
0ccc1e7acd Revert "[AArch64] Fold more load.x into load.i with large offset"
Issue #76202

This reverts commit f5687636415969e6d945659a0b78734abdfb0f06.
2023-12-21 21:12:40 -08:00
zhongyunde 00443407
f568763641 [AArch64] Fold more load.x into load.i with large offset
The list of load.x instructions follows canFoldIntoAddrMode from D152828.
Also support LDRSroX, which was missed in canFoldIntoAddrMode.
2023-12-21 18:54:15 +08:00
zhongyunde 00443407
32878c2065 [AArch64] merge index address with large offset into base address
An example of this transformation: https://gcc.godbolt.org/z/nhYcWq1WE
Fold
  mov     w8, #56952
  movk    w8, #15, lsl #16
  ldrb    w0, [x0, x8]
into
  add     x0, x0, 1036288
  ldrb    w0, [x0, 3704]

Only LDRBBroX is supported in this first patch.
Fixes https://github.com/llvm/llvm-project/issues/71917
2023-12-21 18:54:14 +08:00
David Green
b6ee831b59 [AArch64] Load/store optimizer fixes and cleanup.
This includes a couple of fixes after #71908 for bundles and some cleanup for
the debug output. One was an iterator type that asserted on bundles, the second
a rather subtle issue where forAllMIsUntilDef would hit the LdStLimit when
renaming registers, meaning the last instruction was not updated, leaving an
invalid `ldp x6, x6` instruction.
2023-11-29 07:41:15 +00:00
Zhaoxuan Jiang
147c5d6686
[AArch64] Allow LDR merge with same destination register by renaming (#71908)
The patch is based on a reverted patch:
https://reviews.llvm.org/D103597. That patch tried to rename registers
before the alias check, which is not safe and causes miscompiles. This patch
does 2 things:

1. Do the renaming only after the necessary checks, including the alias
check, have passed.
2. Rename the register for the instructions between the pairs and
combine the second load into the first. By doing so we can just check
the renamability between the pairs and avoid scanning an unknown number of
instructions before/after the pairs.

The necessary refactoring has been done in order to reuse as much code as
possible with STR renaming.
2023-11-23 08:21:27 +00:00
Kazu Hirata
8842d59c9f [llvm] Stop including llvm/ADT/BitVector.h (NFC)
Identified with clangd.
2023-11-11 13:24:01 -08:00
Zhaoxuan Jiang
1f54ef78d5
[AArch64] Only clear kill flags if necessary when merging str (#69680)
Previously the kill flags of the source register were unconditionally
cleared when a `str` pair was merged, which resulted in suboptimal
register allocation and inhibited some renaming opportunities that could
allow further `str` merging.
2023-11-02 17:03:21 -07:00
Cullen Rhodes
54732a3e0b [AArch64] Use TargetRegisterClass::hasSubClassEq in tryToFindRegisterToRename
When renaming store operands for pairing in the load/store optimizer it
tries to find an available register from the minimal physical register
class of the original register. For each register it compares the
equality of minimal physical register class of all sub/super registers
with the minimal physical register class of the original register.

Simply checking for register class equality can break once additional
register classes are added, as was the case when adding:

    def foo : RegisterClass<"AArch64", [i32], 32, (sequence "W%u", 12, 15)>

which broke:

    CodeGen/AArch64/stp-opt-with-renaming-reserved-regs.mir
    CodeGen/AArch64/stp-opt-with-renaming.mir

Since the introduction of the register class above, the rename register
in test1 of the reserved regs test changed from x12 to x18. The reason
for this is the minimal physical register class of x12 (as well as
x13-x15) and its sub/super registers no longer matches that of x9
(GPR64noip_and_tcGPR64).

Rather than selecting a matching register based on a comparison of the minimal
physical register classes of the original and rename registers, this patch
selects based on `MachineInstr::getRegClassConstraint` for the original
register.

It's worth mentioning that the parameter passing registers (r0-r7) could now be
used as rename registers, since the GPR32arg and GPR64arg register classes are
subclasses of the minimal physical register class for x8, for example. I'm not
entirely sure if we want to exclude those registers; if so, maybe we could
explicitly exclude those register classes.

Reviewed By: efriedma, paulwalker-arm

Differential Revision: https://reviews.llvm.org/D88663
2023-10-30 08:47:39 +00:00
Zhuojia Shen
bcc5b48b0f Reapply "[AArch64] Merge LDRSWpre-LD[U]RSW pair into LDPSWpre"
This reverts commit 0def4e6b0f638b97a73bd4674365961d8fabda28, applies a
quick fix that disallows merging two pre-indexed loads, and adds MIR
regression tests.

Differential Revision: https://reviews.llvm.org/D152407
2023-09-22 21:08:07 -07:00
Manos Anagnostakis
008f26b12e
[AArch64] New subtarget features to control ldp and stp formation (#66098)
On some AArch64 cores, including Ampere's ampere1 and ampere1a
architectures, load and store pair instructions are faster compared to
simple loads/stores only when the alignment of the pair is at least
twice that of the individual element being loaded.

Based on that, this patch introduces four new subtarget features, two
for controlling ldp and two for controlling stp, to cover the ampere1
and ampere1a alignment needs and to enable optional fine-grained control
over ldp and stp generation in general. The latter can be used by
other CPUs that benefit from a policy different from the compiler's
default.

More specifically, for each of ldp and stp we have:

- disable-ldp/disable-stp: Do not emit ldp/stp.
- ldp-aligned-only/stp-aligned-only: Emit ldp/stp only if the source
pointer is aligned to at least double the alignment of the type.

Therefore, for -mcpu=ampere1 and -mcpu=ampere1a
ldp-aligned-only/stp-aligned-only become the defaults, because of the
benefit from the alignment, whereas for the rest of the cpus the default
behaviour of the compiler is maintained.
2023-09-14 16:58:39 +02:00
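The policy described in the commit above can be summarized as a small decision function. This is an illustrative sketch only: `PairPolicy` and `may_form_pair` are hypothetical names, and the sketch models just the alignment rule stated in the message (pointer alignment at least double the type's alignment), not the pass's other pairing checks.

```python
from enum import Enum


class PairPolicy(Enum):
    """Hypothetical names mirroring the feature descriptions in the message."""
    DEFAULT = "default"            # compiler's default behaviour
    DISABLE = "disable"            # disable-ldp / disable-stp
    ALIGNED_ONLY = "aligned-only"  # ldp-aligned-only / stp-aligned-only


def may_form_pair(policy, pointer_align, type_align):
    """Decide whether a pair may be formed under the given policy."""
    if policy is PairPolicy.DISABLE:
        return False
    if policy is PairPolicy.ALIGNED_ONLY:
        # The pointer must be aligned to at least twice the type's alignment.
        return pointer_align >= 2 * type_align
    return True
```

For example, with 8-byte elements, `may_form_pair(PairPolicy.ALIGNED_ONLY, 16, 8)` allows the pair, while a pointer that is only 8-byte aligned does not.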
Alexander Kornienko
0def4e6b0f Revert "[AArch64] Merge LDRSWpre-LD[U]RSW pair into LDPSWpre"
This reverts commit b0093e13fcfdd4eea5bbd7ae57d3d1b82f4135c3 due to a miscompile
under MSan. See https://reviews.llvm.org/D152407#4533478 for more details.

Reviewed By: asmok-g

Differential Revision: https://reviews.llvm.org/D156328
2023-07-26 16:22:24 +02:00
Zhuojia Shen
b0093e13fc [AArch64] Merge LDRSWpre-LD[U]RSW pair into LDPSWpre
This patch optimizes a pair of LDRSWpre and LDRSWui (or LDURSWi)
instructions into a single LDPSWpre instruction.  This is a missing case
in D99272.

MIR test cases in D152564 are updated to verify the optimization.

Differential Revision: https://reviews.llvm.org/D152407
2023-07-18 09:46:47 -07:00
Zain Jaffal
0c93879d96 [AArch64] merge scaled and unscaled zero narrow stores.
This patch fixes a crash when a scaled and an unscaled zero store are merged.

Differential Revision: https://reviews.llvm.org/D150963
2023-05-26 15:07:24 +01:00
Hsiangkai Wang
0847cc06a6 [NFC][AArch64] Use 'i' to encode the offset form of load/store.
STG, STZG, ST2G, STZ2G are exceptions that append 'Offset' to name the
offset form of load/store instructions. All other load/store
instructions use 'i' as the suffix. Unless there is a special reason to
do otherwise, we should make the naming consistent.

Differential Revision: https://reviews.llvm.org/D141819
2023-03-06 12:34:19 +00:00
Kazu Hirata
c08fad8193 [llvm] Remove redundant initialization of std::optional (NFC)
2022-12-20 15:53:38 -08:00
Fangrui Song
b0df70403d [Target] llvm::Optional => std::optional
The updated functions are mostly internal with a few exceptions (virtual functions in
TargetInstrInfo.h, TargetRegisterInfo.h).
To minimize changes to LLVMCodeGen, GlobalISel files are skipped.

https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
2022-12-04 22:43:14 +00:00
Kazu Hirata
20cde15415 [Target] Use std::nullopt instead of None (NFC)
This patch mechanically replaces None with std::nullopt where the
compiler would warn if None were deprecated.  The intent is to reduce
the amount of manual work required in migrating from Optional to
std::optional.

This is part of an effort to migrate from llvm::Optional to
std::optional:

https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
2022-12-02 20:36:06 -08:00
Kazu Hirata
298cb551fb [AArch64] Use std::optional in AArch64LoadStoreOptimizer.cpp (NFC)
This is part of an effort to migrate from llvm::Optional to
std::optional:

https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
2022-11-25 22:08:30 -08:00
chenglin.bi
ec4db1d0dc [AAArch64][Windows] Fix the crash when running ninja check-asan
The crash comes from a mismatch between the load count in the epilogue and the SEH instruction count.
It is again caused by the AArch64LoadStoreOpt pass, which removes some loads in the epilogue but does not remove the corresponding SEH instructions.
This patch fixes the issue by not optimizing loads in the epilogue.

Fix: #58516

Reviewed By: mstorsjo

Differential Revision: https://reviews.llvm.org/D136430
2022-10-21 22:11:54 +08:00
Eli Friedman
76ccd1db73 [AArch64] Don't form paired loads from epilogue operations on Windows
AArch64LoadStoreOptimizer has a bunch of different guards to avoid
corrupting Windows SEH prologues/epilogues, but apparently we missed the
case of merging two instructions where the first instruction isn't part
of the epilogue, but the second instruction is.

Fixes issue discovered at https://reviews.llvm.org/D130049#3704064

Differential Revision: https://reviews.llvm.org/D134992
2022-10-04 11:41:59 -07:00
Kazu Hirata
258531b7ac Remove redundant initialization of Optional (NFC)
2022-08-20 21:18:28 -07:00
zhongyunde
c42a225545 [MachineScheduler] Order more stores by ascending address
According to D125377, we order STP Q's by ascending address. On some
targets, however, paired 128-bit loads and stores are slow, so the STP is
split into STRQ and STUR; these stores should also be ordered.
Also add the subtarget feature ascend-store-address to control this more
aggressive ordering.

Reviewed By: dmgreen, fhahn

Differential Revision: https://reviews.llvm.org/D126700
2022-06-13 17:33:50 +08:00
Zongwei Lan
ad73ce318e [Target] use getSubtarget<> instead of static_cast<>(getSubtarget())
Differential Revision: https://reviews.llvm.org/D125391
2022-05-26 11:22:41 -07:00
Momchil Velikov
e0ff354b83 [AArch64] Async unwind - Adjust unwind info in AArch64LoadStoreOptimizer
[Re-commit after fixing a dereference of "end" iterator]

The AArch64LoadStoreOptimizer pass may merge a register
increment/decrement with a following memory operation. In doing so, it
may break CFI by moving a stack pointer adjustment past the CFI
instruction that described *that* adjustment.

This patch fixes this issue by moving said CFI instruction after the
merged instruction, where the SP increment/decrement actually takes
place.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D114547
2022-04-18 12:09:44 +01:00
Momchil Velikov
62d4686be3 Revert "[AArch64] Async unwind - Adjust unwind info in AArch64LoadStoreOptimizer"
This reverts commit ecbf32dd88fc91b4fe709dc14bb3493dda6e8854.

It's possible this patch is the reason for an assertion failure
`!NodePtr->isKnownSentinel()` in `AArch64LoadStoreOpt::mergeUpdateInsn`
(https://lab.llvm.org/buildbot/#/builders/185/builds/1555); reverting while I
investigate.
2022-04-14 09:33:40 +01:00
Momchil Velikov
ecbf32dd88 [AArch64] Async unwind - Adjust unwind info in AArch64LoadStoreOptimizer
The AArch64LoadStoreOptimizer pass may merge a register
increment/decrement with a following memory operation. In doing so, it
may break CFI by moving a stack pointer adjustment past the CFI
instruction that described *that* adjustment.

This patch fixes this issue by moving said CFI instruction after the
merged instruction, where the SP increment/decrement actually takes
place.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D114547
2022-04-13 17:04:53 +01:00
Florian Hahn
d2c8aa0bf4
[AArch64] Pass Reg instead of MI to tryToFindRenameRegister (NFC).
FirstMI is only used to get the load/store operand and the machine
function. Pass the MF and register explicitly, so the helper can be used
to find rename registers for other instructions in the future.
2022-03-01 14:02:02 +00:00
Florian Hahn
45c969defa
[AArch64] Remove unused argument from tryToFindRegisterToRename (NFC).
The MI argument is not used by the function. Remove it.
2022-03-01 12:47:37 +00:00
Huihui Zhang
1d74b53172 [AArch64][LoadStoreOptimizer] Ignore undef registers when checking rename register used between paired instructions.
The contents of undef registers are not used in meaningful ways, so when
checking whether a rename register is used between paired instructions we
should ignore undef registers.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D119305
2022-02-10 10:21:37 -08:00
Jim Lin
d6b0734837 [NFC] Use Register instead of unsigned
2022-01-19 20:17:04 +08:00
Tim Northover
3d41ef68e7 AArch64: don't form indexed paired ops if base reg overlaps operands.
The registers involved might not be identical, but can still overlap (e.g.
"str w0, [x0, #4]!").
2021-08-20 11:39:38 +01:00
Martin Storsjö
1cb7849a55 Revert "[AArch64LoadStoreOptimizer] Recommit: Generate more STPs by renaming registers earlier"
This reverts commit ea011ec5ed53599305de62ca5fcfd31f4b3448c3.

This still causes some miscompiles; I'll follow up in the phabricator
review with a sample of that issue (which is part of the sample of
the previous issue).
2021-06-23 09:54:16 +03:00
Meera Nakrani
ea011ec5ed [AArch64LoadStoreOptimizer] Recommit: Generate more STPs by renaming registers earlier
This is a recommit that fixes unwanted STP generation by checking that
the base register has not been modified or used elsewhere.

Our initial motivating case was memcpy's with alignments > 16. The
loads/stores, to which small memcpy's expand, are kept together in
several places so that we get a sequence like this for a 64 bit copy:
LD w0
LD w1
ST w0
ST w1
The load/store optimiser can generate a LDP/STP w0, w1 from this because
the registers read/written are consecutive. In our case however, the
sequence is optimised during ISel, resulting in:
LD w0
ST w0
LD w0
ST w0
This instruction reordering allows reuse of registers. Since the registers
are no longer consecutive (i.e. they are the same), it inhibits LDP/STP
creation. The approach here is to perform renaming:
LD w0
ST w0
LD w1
ST w1
to enable the folding of the stores into a STP. We do not yet generate
the LDP due to a limitation in the renaming implementation, but plan to
look at that in a follow-up so that we fully support this case. While
this was initially motivated by certain memcpy's, this is a general
approach and thus is beneficial for other cases too, as can be seen
in some test changes.

Differential Revision: https://reviews.llvm.org/D103597
2021-06-22 15:29:13 +00:00
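The renaming step described in the commit above can be illustrated with a toy model of the `LD`/`ST` sequence. This is a deliberately simplified sketch: `rename_second_chain` is a hypothetical helper, instructions are modelled as `(opcode, register)` tuples, and the liveness, base-register and alias checks that the real pass needs are not modelled at all (the log shows the change was reverted because of miscompiles).

```python
def rename_second_chain(insts, old, new):
    """Rename `old` to `new` starting from the second load that defines `old`."""
    seen_def = False
    renaming = False
    out = []
    for op, reg in insts:
        if reg == old and op == "LD":
            if seen_def:
                renaming = True   # second definition: start a fresh register
            seen_def = True
        out.append((op, new if renaming and reg == old else reg))
    return out


# Sequence produced after ISel in the commit message: both copies reuse w0.
before = [("LD", "w0"), ("ST", "w0"), ("LD", "w0"), ("ST", "w0")]
after = rename_second_chain(before, "w0", "w1")
```

After renaming, the two stores use different registers (`w0` and `w1`), which is the shape the message says enables folding them into an STP.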
Martin Storsjö
99653702fd Revert "[AArch64LoadStoreOptimizer] Generate more STPs by renaming registers earlier"
This reverts commit d96ea46629803641038ebe46d8cd512f8cf7e20f, as it
caused various misoptimizations, see https://reviews.llvm.org/D103597
for discussion on the issues.
2021-06-10 10:30:13 +03:00