636 Commits

Author SHA1 Message Date
Maurice Heumann
607c525110
[ARM64] [Windows] Mark block address as taken when expanding catchrets (#109252)
This fixes issue #109250

The issue happens during the `MachineBlockPlacement` pass. The block,
whose address was previously not taken, is deemed redundant by the pass
and subsequently replaced using
`MachineBasicBlock::ReplaceUsesOfBlockWith` in `BranchFolding`.

ReplaceUsesOfBlockWith only replaces uses in the terminator. However,
`expandPostRAPseudo` introduces new block uses when expanding catchrets.
These uses do not get replaced, which results in undefined label errors
later on.

Marking the block address as taken prevents the replacement of the
block, without also replacing non-terminator uses.
2024-09-30 11:14:38 -07:00
Sander de Smalen
91a3c6f3d6
[AArch64] Remove redundant COPY from loadRegFromStackSlot (#107396)
This removes a redundant 'COPY' instruction that #81716 probably forgot
to remove.

This redundant COPY led to an issue because code in
LiveRangeSplitting expects that the instruction emitted by
`loadRegFromStackSlot` is an instruction that accesses memory, which
isn't the case for the COPY instruction.
2024-09-05 17:54:57 +01:00
Kyungwoo Lee
140381d4bf
[MachineOutliner][NFC] Remove unnecessary RepeatedSequenceLocs.clear() (#106171)
- When `getOutliningCandidateInfo()` returns `std::nullopt` (meaning no
`OutlinedFunction` is created), there is no need to clear the input
argument, `RepeatedSequenceLocs`, as it's already being cleared in the
main loop of `findCandidates()`.
- Replaced `2` by `MinRepeats`, which I missed from
https://github.com/llvm/llvm-project/pull/105398
2024-08-28 07:09:54 -07:00
zhongyunde 00443407
e5a5ac0c23 [AArch64] Fold more load.x into load.i with large offset
The list of load.x opcodes follows canFoldIntoAddrMode from D152828.
Also supports LDRSroX, which canFoldIntoAddrMode missed.
2024-08-28 14:15:09 +08:00
Kyungwoo Lee
93b8d07a75
[MachineOutliner][NFC] Refactor (#105398)
This patch prepares the NFC groundwork for global outlining using
CGData, which will follow
https://github.com/llvm/llvm-project/pull/90074.

- The `MinRepeats` parameter is now explicitly passed to the
`getOutliningCandidateInfo` function, rather than relying on a default
value of 2. For local outlining, the minimum number of repetitions is
typically 2, but for the global outlining (mentioned above), we will
optimistically create a single `Candidate` for each `OutlinedFunction`
if stable hashes match a specific code sequence. This parameter is
adjusted accordingly in global outlining scenarios.
- I have also implemented `unique_ptr` for `OutlinedFunction` to ensure
safe and efficient memory management within `FunctionList`, avoiding
unnecessary implicit copies.

This depends on https://github.com/llvm/llvm-project/pull/101461.
This is a patch for
https://discourse.llvm.org/t/rfc-enhanced-machine-outliner-part-2-thinlto-nolto/78753.
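
The two bullet points in this message can be illustrated with a small sketch. It is not the MachineOutliner code: `Candidate`, `OutlinedFunction`, and `tryCreateOutlinedFunction` are simplified, hypothetical stand-ins, and the only ideas taken from the message are an explicit `MinRepeats` argument and `unique_ptr` ownership of each `OutlinedFunction`.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-ins for the outliner's data structures.
struct Candidate {
  unsigned StartIdx;
};

struct OutlinedFunction {
  std::vector<Candidate> Candidates;
};

// Hypothetical helper: the caller states how many repeats it requires instead
// of relying on a hard-coded 2. Returns nullptr when there are too few.
std::unique_ptr<OutlinedFunction>
tryCreateOutlinedFunction(std::vector<Candidate> Locs, unsigned MinRepeats) {
  assert(MinRepeats > 0 && "at least one occurrence is required");
  if (Locs.size() < MinRepeats)
    return nullptr;
  auto OF = std::make_unique<OutlinedFunction>();
  OF->Candidates = std::move(Locs);
  return OF; // ownership moves to the caller, no OutlinedFunction copy is made
}
```

A caller modelling local outlining would pass `MinRepeats = 2`, while one modelling the global case described above would pass `1`; in both cases the result can be moved into a `std::vector<std::unique_ptr<OutlinedFunction>>` without copying the candidate list.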
2024-08-27 14:38:36 -07:00
Piyou Chen
b01c006f73
[TII][RISCV] Add renamable bit to copyPhysReg (#91179)
The renamable flag is useful during MachineCopyPropagation, but it is
dropped after lowerCopy in some cases.

This patch introduces extra arguments to pass the renamable flag to
copyPhysReg.
2024-08-27 10:08:43 +08:00
Craig Topper
7e6b1504c7 [AArch64] Pass DebugLoc by reference to AArch64InstrInfo::copyGPRRegTuple. NFC 2024-08-25 22:20:58 -07:00
Craig Topper
b12d338c17 [AArch64] Use MCRegister in AArch64InstrInfo::copyGPRRegTuple interface. NFC
This matches copyPhysReg.
2024-08-25 22:11:31 -07:00
Thurston Dang
324b676a3d Revert "[AArch64] Fold more load.x into load.i with large offset"
This reverts commit 43ffe2eed0d9f73789dbe213023733d164999306.

Reason: buildbot breakage starting at https://lab.llvm.org/buildbot/#/builders/85/builds/1102

I manually bisected and found that clang crashed with 43ffe2eed0d9f73789dbe213023733d164999306 but not the immediately preceding commit (33190490c667aaf8b08d5af8b8ce84524f856e80)
2024-08-16 22:32:12 +00:00
Kazu Hirata
dca820951c
[llvm] Use llvm::any_of (NFC) (#104443) 2024-08-15 17:59:10 -07:00
zhongyunde 00443407
43ffe2eed0 [AArch64] Fold more load.x into load.i with large offset
The list of load.x opcodes follows canFoldIntoAddrMode from D152828.
Also supports LDRSroX, which canFoldIntoAddrMode missed.
2024-08-15 18:22:52 +08:00
zhongyunde 00443407
33190490c6 [AArch64] merge index address with large offset into base address
A case for this transformation, https://gcc.godbolt.org/z/nhYcWq1WE
Fold
  mov     w8, #56952
  movk    w8, #15, lsl #16
  ldrb    w0, [x0, x8]
into
  add     x0, x0, 1036288
  ldrb    w0, [x0, 3704]

Only LDRBBroX is supported initially.
Fix https://github.com/llvm/llvm-project/issues/71917

Note: This PR relands commit 32878c2065 with a fix for the crash in PR79756.
  The crash is exposed when a block starts with a MOVKWi instruction that has
no preceding MOVZWi.
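
The arithmetic behind the example in this message can be sketched as follows. This is only an illustration for a byte-sized load, not the code from the patch: `SplitOffset` and `splitLargeOffset` are hypothetical names, and the sketch assumes the high part is carried by an `add` with a 12-bit immediate shifted left by 12 while the low part stays in the load's unsigned 12-bit immediate.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical result type: the two immediates after the split.
struct SplitOffset {
  uint64_t High; // multiple of 4096, materialised by the ADD
  uint64_t Low;  // 0..4095, kept as the load's immediate
};

// Hypothetical helper for a byte load (scale 1). Returns std::nullopt when the
// high part does not fit the shifted 12-bit ADD immediate.
std::optional<SplitOffset> splitLargeOffset(uint64_t Offset) {
  const uint64_t Low = Offset & 0xFFF;
  const uint64_t High = Offset - Low;
  assert((High & 0xFFF) == 0 && "high part must be 4 KiB aligned");
  if ((High >> 12) > 0xFFF)
    return std::nullopt;
  return SplitOffset{High, Low};
}
```

For the sequence above, `mov w8, #56952` followed by `movk w8, #15, lsl #16` builds 56952 + 15 * 65536 = 1039992, which this sketch splits into 1036288 and 3704, the same immediates that appear in the `add` and `ldrb` of the folded form.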
2024-08-15 18:22:52 +08:00
David Green
36231a5b55
[AArch64] Add verification for MemOp immediate ranges (#97561)
This adds an implementation of AArch64InstrInfo::verifyInstruction for
AArch64, and adds some basic verification of the immediate ranges of
memory operations using the information from getMemOpInfo.

Some extra memory operations have been added to getMemOpInfo, along with
the equivalent opcodes to getLoadStoreImmIdx to ensure we use the
correct index.

Please let us know if this starts reporting verification failures, Thanks.
2024-08-15 11:20:20 +01:00
David Green
a3cf8642bf
[AArch64] Cleanup existing values in getMemOpInfo (#98196)
This patch tries to clean up some of the existing values in
getMemOpInfo. All values should now be in bytes (not bits), and the
MinOffset/MaxOffset are now always represented unscaled (the immediate
that will be present in the final instruction).

Although I could not find a place where it altered codegen, the offset
of a post-index instruction will be 0, not scale*imm. An
IsPostIndexLdStOpcode method has been added to try and make sure that
case is handled properly.
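
A minimal sketch of the convention described in this message, with sizes in bytes and `MinOffset`/`MaxOffset` expressed as the immediate that appears in the instruction. The `MemOpInfo` struct and `isEncodableByteOffset` are hypothetical illustrations, not the getMemOpInfo interface, and the values used in the usage note are an assumed example of an 8-byte load with a 12-bit unsigned scaled immediate.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical record mirroring the fields named in the message.
struct MemOpInfo {
  int64_t Scale;     // bytes per unit of the immediate
  int64_t Width;     // bytes accessed
  int64_t MinOffset; // smallest encodable immediate (not multiplied by Scale)
  int64_t MaxOffset; // largest encodable immediate (not multiplied by Scale)
};

// Hypothetical check: a byte offset is encodable when it is a multiple of
// Scale and the resulting immediate lies in [MinOffset, MaxOffset].
bool isEncodableByteOffset(const MemOpInfo &Info, int64_t ByteOffset) {
  assert(Info.Scale > 0 && "scale must be positive");
  if (ByteOffset % Info.Scale != 0)
    return false;
  const int64_t Imm = ByteOffset / Info.Scale;
  return Imm >= Info.MinOffset && Imm <= Info.MaxOffset;
}
```

With `MemOpInfo{8, 8, 0, 4095}`, a byte offset of 32760 maps to the immediate 4095 and is accepted, while 32768 (immediate 4096) and 12 (not a multiple of 8) are rejected.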
2024-08-03 12:31:10 +01:00
Momchil Velikov
461126c29c
[AArch64] Fix incorrectly getting the destination reg of an insn (#101205)
This popped up while investigating
https://github.com/llvm/llvm-project/issues/96950
In a few places where we need the destination reg of an instruction we
were using a call that worked only by accident.
2024-08-02 15:43:28 +01:00
Daniil Kovalev
56fd2472d8
[PAC] Sign LR with B key for non-leaf functions with ptrauth-returns attr (#100552)
For pauthtest ABI, there is a bunch of ptrauth-* options, including
ptrauth-returns. Use "ptrauth-returns" function attribute to indicate
need for LR signing with B key for non-leaf function to avoid using
"sign-return-address" and "sign-return-address-key" which were
originally designed for pac-ret.

Co-authored-by: Ahmed Bougacha <ahmed@bougacha.org>
Co-authored-by: Anatoly Trosinenko <atrosinenko@accesssoftek.com>
2024-07-25 22:21:03 +03:00
Matt Arsenault
3cb5604d2c
MachineOutliner: Use PM to query MachineModuleInfo (#99688)
Avoid getting this from the MachineFunction
2024-07-24 13:22:56 +04:00
David Green
0d7403184d [AArch64] Add an AArch64InstrInfo::isFpOrNEON method for checking physical register call. NFC 2024-07-15 08:13:52 +01:00
David Green
d3cb277ea3 [AArch64] Rearrange Opcodes in getMemOpInfo. NFC
This just changes the order of the opcodes and fields in getMemOpInfo, none of
the values are altered.
2024-07-08 23:05:50 +01:00
Nikita Popov
4169338e75
[IR] Don't include Module.h in Analysis.h (NFC) (#97023)
Replace it with a forward declaration instead. Analysis.h is pulled in
by all passes, but not all passes need to access the module.
2024-06-28 14:30:47 +02:00
Sander de Smalen
c436649313
[AArch64] Remove all instances of the 'hasSVEorSME' interfaces. (#96543)
I've not added any new tests for these, because the original conditions
were wrong (they did not consider streaming mode) and we have tests for
the positive cases.
2024-06-25 13:27:06 +01:00
Sander de Smalen
62baf21daa
[AArch64] Check for streaming mode in HasSME* features. (#96302)
This also fixes up some asserts in copyPhysReg, loadRegFromStackSlot and
storeRegToStackSlot.
2024-06-24 20:12:31 +01:00
Momchil Velikov
6ec02f7316
[AArch64] Refactor redundant PTEST optimisations (NFC) (#87802)
This patch refactors `AArch64InstrInfo::optimizePTestInstr` to simplify
the convoluted conditions and control flow
and make it easier to add the optimisation in
https://github.com/llvm/llvm-project/pull/81141
2024-06-18 08:00:59 +01:00
Nikita Popov
db08b0999d
[ARM][AArch64] Bail out if CandidatesWithoutStackFixups is empty (#95410)
The following code assumes that RepeatedSequenceLocs is non-empty. Bail
out if there are fewer than 2 candidates left, as no outlining is
possible in that case. The same check is already present in all the
other places where elements from RepeatedSequenceLocs may be dropped.

This fixes the issue reported at:
https://github.com/llvm/llvm-project/pull/93965#issuecomment-2151989716
2024-06-14 09:29:21 +02:00
Kerry McLaughlin
ea6577a74b
[AArch64][SME] Disable outlining for functions with streaming-mode changes (#95132) 2024-06-12 10:35:29 +01:00
Yuta Mukai
0c5319e546
[ModuloSchedule][AArch64] Implement modulo variable expansion for pipelining (#65609)
Modulo variable expansion is a technique that resolves overlap of
variable lifetimes by unrolling. The existing implementation resolves the
overlap by inserting move-instruction copies on processors with ordinary
registers such as Arm and x86. This method may result in a very large
number of move instructions, which can cause performance problems.

Modulo variable expansion is enabled by specifying -pipeliner-mve-cg. A
backend must implement some newly defined interfaces in
PipelinerLoopInfo. They were implemented for AArch64.

Discourse thread:
https://discourse.llvm.org/t/implementing-modulo-variable-expansion-for-machinepipeliner
2024-06-12 10:27:35 +09:00
Nikita Popov
1c9f4d4b6f
[ARM] Avoid reference into modified vector (#93965)
FirstCand is a reference to RepeatedSequenceLocs[0]. However, that
vector is being modified a lot throughout the function, including one
place that reassigns the whole vector. I'm not sure whether this can
really happen in practice, but it doesn't seem unlikely that this could
lead to a use-after-free.

Avoid this by directly using RepeatedSequenceLocs[0] at the start of the
function (as a lot of other places already do) and only creating
FirstCand at the end where no more modifications take place.
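
The hazard and the fix described here can be sketched outside LLVM. `Candidate` and `firstLenAfterPruning` are hypothetical; the sketch only shows why a reference taken before the vector is reassigned would dangle, and where it is safe to bind one.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical, simplified candidate record.
struct Candidate {
  unsigned Len;
};

// Prunes short candidates, then reads the first survivor (0 if none remain).
unsigned firstLenAfterPruning(std::vector<Candidate> &Locs, unsigned MinLen) {
  // Binding `Candidate &First = Locs[0];` here would be the risky pattern:
  // the assignment below replaces the vector's storage and leaves it dangling.
  std::vector<Candidate> Kept;
  for (const Candidate &C : Locs)
    if (C.Len >= MinLen)
      Kept.push_back(C);
  Locs = std::move(Kept); // reassigns the whole vector

  if (Locs.empty())
    return 0;
  // Safe: the reference is created only after the last modification.
  const Candidate &First = Locs[0];
  assert(First.Len >= MinLen);
  return First.Len;
}
```

Indexing `Locs[0]` directly wherever a value is needed before the final modification, as the message suggests, avoids holding any reference across a reallocation.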
2024-06-03 17:10:35 +02:00
Sander de Smalen
b71434f8b3
[AArch64] Avoid NEON ORR when NEON and SVE are unavailable (#93940)
For streaming-compatible functions with only +sme, we can't use
a NEON ORR (aliased as 'mov') for copies of Q-registers, so
we need to use a spill/fill instead.

This also fixes the fill, which should use the post-incrementing
addressing mode.
2024-06-03 09:22:21 +01:00
Ahmed Bougacha
cc548ec47c
[AArch64][PAC] Lower authenticated calls with ptrauth bundles. (#85736)
This adds codegen support for the "ptrauth" operand bundles, which can
be used to augment indirect calls with the equivalent of an
`@llvm.ptrauth.auth` intrinsic call on the call target (possibly
preceded by an `@llvm.ptrauth.blend` on the auth discriminator if
applicable.)

This allows the generation of combined authenticating calls
on AArch64 (in the BLRA* PAuth instructions), while avoiding
the raw just-authenticated function pointer from being
exposed to attackers.

This is done by threading a PtrAuthInfo descriptor through
the call lowering infrastructure, eventually selecting a BLRA
pseudo.  The pseudo encapsulates the safe discriminator
computation, which together with the real BLRA* call get emitted
in late pseudo expansion in AsmPrinter.

Note that this also applies to the other forms of indirect calls,
notably invokes, rvmarker, and tail calls.  Tail-calls in particular
bring some additional complexity, with the intersecting register
constraints of BTI and PAC discriminator computation.
However this doesn't currently support PAuth_LR tail-call variants.

This also adopts an x8+ allocation order for GPR64noip, matching
GPR64.
2024-05-31 14:08:10 -07:00
Paul Walker
37c6b9ff72 [NFC][LLVM] Mainly whitespace changes.
Also marks AliasSetTracker::size() as const.
2024-05-17 10:37:26 +00:00
Antonio Frighetto
23b6709c72 [AArch64] Drop poison-generating flags in genSubAdd2SubSub combiner
A miscompilation issue has been addressed with improved handling.

Fixes: https://github.com/llvm/llvm-project/issues/88950.
2024-04-26 11:33:56 +02:00
Xu Zhang
f6d431f208
[CodeGen] Make the parameter TRI required in some functions. (#85968)
Fixes #82659

There are some functions, such as `findRegisterDefOperandIdx` and  `findRegisterDefOperand`, that have too many default parameters. As a result, we have encountered some issues due to the lack of TRI  parameters, as shown in issue #82411.

Following @RKSimon 's suggestion, this patch refactors 9 functions, including `{reads, kills, defines, modifies}Register`,  `registerDefIsDead`, and `findRegister{UseOperandIdx, UseOperand, DefOperandIdx, DefOperand}`, adjusting the order of the TRI parameter and making it required. In addition, all the places that call these functions have also been updated correctly to ensure no additional impact.

After this, the caller of these functions should explicitly know whether to pass the `TargetRegisterInfo` or just a `nullptr`.
2024-04-24 14:24:14 +01:00
Kai Nacke
21d177096f
[NFC] Refactor looping over recomputeLiveIns into function (#88040)
https://github.com/llvm/llvm-project/pull/79940 put calls to
recomputeLiveIns into a loop, to repeatedly call the function until the
computation converges. However, this repeats a lot of code. This change
moves the loop into a function to simplify the handling.

Note that this changes the order in which recomputeLiveIns is called.
For example,

```
  bool anyChange = false;
  do {
    anyChange = recomputeLiveIns(*ExitMBB) || recomputeLiveIns(*LoopMBB);
  } while (anyChange);
```

only begins to recompute the live-ins for LoopMBB after the computation
for ExitMBB has converged. With this change, all basic blocks have a
recomputation of the live-ins for each loop iteration. This can result in
fewer or more calls, depending on the situation.
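
The ordering difference described in this message can be reproduced with a small model. `Step` stands in for a call such as `recomputeLiveIns(MBB)` that returns true when it changed something; `runShortCircuit` and `runAll` are hypothetical names, and the sketch only models the control flow, not liveness itself.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Stand-in for one recomputation that reports whether anything changed.
using Step = std::function<bool()>;

// Old shape: `A() || B()` short-circuits, so B is skipped in every iteration
// where A still reports a change. Returns the number of loop iterations.
unsigned runShortCircuit(const Step &A, const Step &B) {
  unsigned Iterations = 0;
  bool AnyChange = false;
  do {
    ++Iterations;
    AnyChange = A() || B();
  } while (AnyChange);
  return Iterations;
}

// New shape: every step runs in every iteration until none reports a change.
unsigned runAll(const std::vector<Step> &Steps) {
  unsigned Iterations = 0;
  bool AnyChange = false;
  do {
    ++Iterations;
    AnyChange = false;
    for (const Step &S : Steps)
      if (S())
        AnyChange = true;
  } while (AnyChange);
  return Iterations;
}
```

If the first step reports a change twice and the second once, the short-circuit loop runs four iterations and only reaches the second step in the third, while the run-all loop finishes in three; with other change patterns the comparison can go the other way, which matches the "fewer or more calls" remark.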
2024-04-15 17:12:25 -04:00
Pengcheng Wang
b564036933
[MachineCombiner][NFC] Split target-dependent patterns
We split target-dependent MachineCombiner patterns into their target
folder.

This makes MachineCombiner much more target-independent.

Reviewers:
davemgreen, asavonic, rotateright, RKSimon, lukel97, LuoYuanke, topperc, mshockwave, asi-sc

Reviewed By: topperc, mshockwave

Pull Request: https://github.com/llvm/llvm-project/pull/87991
2024-04-11 12:20:27 +08:00
Sam Tebbs
fb8dbd1fb6
[AArch64] Remove copy in SVE/SME predicate spill and fill (#81716)
7dc20ab introduced an extra COPY when spilling and filling a PNR
register, which can't be elided as the input (PNR predicate) and output
(PPR predicate) register classes differ. The patch adds a new register
class that covers both PPR and PNR so that STR_PXI and LDR_PXI can
take either of them, removing the need for the copy.
2024-04-09 16:17:27 +01:00
Eli Friedman
c83f23d6ab
[AArch64] Fix heuristics for folding "lsl" into load/store ops. (#86894)
The existing heuristics were assuming that every core behaves like an
Apple A7, where any extend/shift costs an extra micro-op... but in
reality, nothing else behaves like that.

On some older Cortex designs, shifts by 1 or 4 cost extra, but all other
shifts/extensions are free. On all other cores, as far as I can tell,
all shifts/extensions for integer loads are free (i.e. the same cost as
an unshifted load).

To reflect this, this patch:

- Enables aggressive folding of shifts into loads by default.

- Removes the old AddrLSLFast feature, since it applies to everything
except A7 (and even if you are explicitly targeting A7, we want to
assume extensions are free because the code will almost always run on a
newer core).

- Adds a new feature AddrLSLSlow14 that applies specifically to the
Cortex cores where shifts by 1 or 4 cost extra.

I didn't add support for AddrLSLSlow14 on the GlobalISel side because it
would require a bunch of refactoring to work correctly. Someone can pick
this up as a followup.
2024-04-04 11:25:44 -07:00
Harvin Iriawan
57146daeaa
[CodeGen] Update for scalable MemoryType in MMO (#70452)
Remove getSizeOrUnknown call when MachineMemOperand is created.  For Scalable
TypeSize, the MemoryType created becomes a scalable_vector.

2 MMOs that have scalable memory access can then use the updated BasicAA that
understands scalable LocationSize.

Original Patch by Harvin Iriawan
Co-authored-by: David Green <david.green@arm.com>
2024-03-23 12:56:25 +00:00
zhongyunde 00443407
a110a1c0ed [AArch64] MachineCombiner msub matching for i64 2024-03-08 18:14:26 +08:00
zhongyunde 00443407
3a62edcf52 [AArch64] MachineCombiner msub matching
Patterns should be sorted in priority order since the pattern evaluator
stops checking as soon as it finds a faster sequence.
So for a * b - c * d, we prefer to match the second operand of the sub,
which can be folded into an msub.

Refer to https://www.slideshare.net/chimerawang/instruction-combine-in-llvm

Fix https://github.com/llvm/llvm-project/issues/84152
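
As a rough model of why matching the second operand of the sub is preferable: MSUB computes `Ra - Rn * Rm`, so `a * b - c * d` needs one `mul` plus one `msub` instead of two `mul`s and a `sub`. The functions below are hypothetical stand-ins that model the arithmetic with wrap-around 64-bit integers; they are not the MachineCombiner patterns.

```cpp
#include <cassert>
#include <cstdint>

// Models MUL: Rn * Rm (wraps modulo 2^64).
constexpr uint64_t mul(uint64_t Rn, uint64_t Rm) { return Rn * Rm; }

// Models MSUB: Ra - Rn * Rm (wraps modulo 2^64).
constexpr uint64_t msub(uint64_t Rn, uint64_t Rm, uint64_t Ra) {
  return Ra - Rn * Rm;
}

// Three operations: mul, mul, sub.
constexpr uint64_t viaMulMulSub(uint64_t A, uint64_t B, uint64_t C, uint64_t D) {
  return mul(A, B) - mul(C, D);
}

// Two operations: the second product is folded into the subtraction.
constexpr uint64_t viaMulMsub(uint64_t A, uint64_t B, uint64_t C, uint64_t D) {
  return msub(C, D, mul(A, B));
}

static_assert(viaMulMulSub(3, 4, 2, 5) == viaMulMsub(3, 4, 2, 5),
              "both forms compute a * b - c * d");
```

The first product cannot be folded the same way, because MSUB subtracts the product rather than subtracting from it.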
2024-03-08 18:14:25 +08:00
David Green
44be5a7fdc
[Codegen] Make Width in getMemOperandsWithOffsetWidth a LocationSize. (#83875)
This is another part of #70452 which makes getMemOperandsWithOffsetWidth
use a LocationSize for Width, as opposed to the unsigned it currently
uses. The advantages on its own are not super high if
getMemOperandsWithOffsetWidth usually uses known sizes, but if the
values can come from an MMO it can help be more accurate in case they
are Unknown (and in the future, scalable).
2024-03-06 17:40:13 +00:00
Sander de Smalen
5bd01ac822
[AArch64] Re-enable rematerialization for streaming-mode-changing functions. (#83235)
We can add implicit defs/uses of the 'VG' register to the instructions
to prevent the register allocator from rematerializing values in between
streaming-mode changes, as the def/use of VG will further nail down the
ordering that comes out of ISel. This avoids the heavy-handed approach
to prevent any kind of rematerialization.

While we could add 'VG' as a Use to all SVE instructions, we only really
need to do this for instructions that are rematerializable, as the
smstart/smstop instructions and pseudos act as scheduling barriers which
is sufficient to prevent other instructions from being scheduled in
between the streaming-mode-changing call sequence. However, we may
revisit this in the future.
2024-02-29 15:35:46 +00:00
ostannard
5452cbc4a6
[AArch64] Indirect tail-calls cannot use x16 with pac-ret+pc (#81020)
When using -mbranch-protection=pac-ret+pc, x16 is used in the function
epilogue to hold the address of the signing instruction. This is used by
a HINT instruction which can only use x16, so we can't change this. This
means that we can't use it to hold the function pointer for an indirect
tail-call.

There is existing code to force indirect tail-calls to use x16 or x17
when BTI is enabled, so there are now 4 combinations:

bti  pac-ret+pc  Valid function pointer registers
off  off         Any non callee-saved register
on   off         x16 or x17
off  on          Any non callee-saved register except x16
on   on          x17
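
The table above can be read as two independent constraints that are intersected. The predicate below is a hypothetical summary of that reading, not the register-class selection in the backend; `XReg` is the numeric index of an x-register and `IsCalleeSaved` is supplied by the caller rather than derived from an ABI table.

```cpp
#include <cassert>

// Hypothetical predicate: may register x<XReg> hold the target of an indirect
// tail call under the given branch-protection settings?
bool mayHoldTailCallTarget(unsigned XReg, bool IsCalleeSaved, bool BTI,
                           bool PacRetPC) {
  assert(XReg <= 30 && "expected an index in x0..x30");
  if (IsCalleeSaved)
    return false; // all four rows exclude callee-saved registers
  if (BTI && XReg != 16 && XReg != 17)
    return false; // BTI restricts the choice to x16 or x17
  if (PacRetPC && XReg == 16)
    return false; // x16 holds the signing instruction's address in the epilogue
  return true;
}
```

With both options on, only x17 passes; with only BTI, x16 and x17 pass; with only pac-ret+pc, every non-callee-saved register except x16 passes.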
2024-02-08 15:31:54 +00:00
Sjoerd Meijer
35904ec4e1
[AArch64] MI Scheduler STP combine (#80188)
Add opcodes for different store instructions to the target hook that can
enable more STP pairs. This is split off from the patch that does the
same for some load instructions (#79003).

Patch co-authored by Cameron McInally.
2024-02-06 10:29:42 +00:00
Philip Reames
3ff7caea33
[TTI] Use Register in isLoadFromStackSlot and isStoreToStackSlot [nfc] (#80339) 2024-02-01 17:52:35 -08:00
Yuta Mukai
70eab122bc
[AArch64][MachinePipeliner] Add pipeliner support for AArch64 (#79589)
Add AArch64 implementations for the interfaces of MachinePipeliner pass.
The pass is disabled by default for AArch64. It is enabled by specifying
--aarch64-enable-pipeliner.

5 tests in llvm-test-suites show performance improvement by more than 5%
on a Neoverse V1 processor.

| test | improvement |
| ---------------------------------------------------------------- | -----------:|
| MultiSource/Benchmarks/TSVC/Recurrences-dbl/Recurrences-dbl.test | 16% |
| MultiSource/Benchmarks/TSVC/Recurrences-dbl/Recurrences-flt.test | 16% |
| SingleSource/Benchmarks/Adobe-C++/loop_unroll.test | 14% |
| SingleSource/Benchmarks/Misc/flops-5.test | 13% |
| SingleSource/Benchmarks/BenchmarkGame/spectral-norm.test | 6% |

(base flags: -mcpu=neoverse-v1 -O3 -mrecip, flags for pipelining: -mllvm
-aarch64-enable-pipeliner -mllvm
-pipeliner-max-stages=100 -mllvm -pipeliner-max-mii=100 -mllvm
-pipeliner-enable-copytophi=0)

On the other hand, there are cases of significant performance
degradation. Algorithm improvements and adding the option/pragma will be
needed in the future.
2024-02-02 10:33:44 +09:00
Sjoerd Meijer
8841846050
[AArch64] MI Scheduler LDP combine follow up (#79003)
This is a follow up of 75d820dcdd86, adding more opcodes to the combine
target hook enabling more LDP creation.

Patch co-authored by Cameron McInally.
2024-01-31 15:41:32 +00:00
Oskar Wirga
ff4636a4ab
Refactor recomputeLiveIns to converge on added MachineBasicBlocks (#79940)
This is a fix for the regression seen in
https://github.com/llvm/llvm-project/pull/79498

> Currently, the way that recomputeLiveIns works is that it will
recompute the livein registers for that MachineBasicBlock but it matters
what order you call recomputeLiveIn which can result in incorrect
register allocations down the line.

Now we do not recompute the entire CFG but we do ensure that the newly
added MBBs reach convergence.
2024-01-30 19:33:04 -08:00
David Green
915c3d9e5a Revert "[AArch64] merge index address with large offset into base address"
This reverts commit 32878c2065c8005b3ea30c79e16dfd7eed55d645 due to #79756 and #76202.
2024-01-28 17:01:21 +00:00
Nikita Popov
07a1925b8b Revert "Refactor recomputeLiveIns to operate on whole CFG (#79498)"
This reverts commit 59bf60519fc30d9d36c86abd83093b068f6b1e4b.

Introduces a major compile-time regression.
2024-01-26 22:33:17 +01:00
Oskar Wirga
59bf60519f
Refactor recomputeLiveIns to operate on whole CFG (#79498)
Currently, the way that recomputeLiveIns works is that it will recompute
the livein registers for that MachineBasicBlock but it matters what
order you call recomputeLiveIn which can result in incorrect register
allocations down the line.

This PR fixes that by simply recomputing the liveins for the entire CFG
until convergence is achieved. This makes it harder to introduce subtle
bugs which alter liveness.
2024-01-26 11:25:36 -08:00