602 Commits

Author SHA1 Message Date
Sam Tebbs
fb8dbd1fb6
[AArch64] Remove copy in SVE/SME predicate spill and fill (#81716)
7dc20ab introduced an extra COPY when spilling and filling a PNR
register, which can't be elided as the input (PNR predicate) and output
(PPR predicate) register classes differ. The patch adds a new register
class that covers both PPR and PNR so that STR_PXI and LDR_PXI can
take either of them, removing the need for the copy.
2024-04-09 16:17:27 +01:00
Eli Friedman
c83f23d6ab
[AArch64] Fix heuristics for folding "lsl" into load/store ops. (#86894)
The existing heuristics were assuming that every core behaves like an
Apple A7, where any extend/shift costs an extra micro-op... but in
reality, nothing else behaves like that.

On some older Cortex designs, shifts by 1 or 4 cost extra, but all other
shifts/extensions are free. On all other cores, as far as I can tell,
all shifts/extensions for integer loads are free (i.e. the same cost as
an unshifted load).

To reflect this, this patch:

- Enables aggressive folding of shifts into loads by default.

- Removes the old AddrLSLFast feature, since it applies to everything
except A7 (and even if you are explicitly targeting A7, we want to
assume extensions are free because the code will almost always run on a
newer core).

- Adds a new feature AddrLSLSlow14 that applies specifically to the
Cortex cores where shifts by 1 or 4 cost extra.

I didn't add support for AddrLSLSlow14 on the GlobalISel side because it
would require a bunch of refactoring to work correctly. Someone can pick
this up as a followup.
2024-04-04 11:25:44 -07:00
Harvin Iriawan
57146daeaa
[CodeGen] Update for scalable MemoryType in MMO (#70452)
Remove getSizeOrUnknown call when MachineMemOperand is created.  For Scalable
TypeSize, the MemoryType created becomes a scalable_vector.

2 MMOs that have scalable memory access can then use the updated BasicAA that
understands scalable LocationSize.

Original Patch by Harvin Iriawan
Co-authored-by: David Green <david.green@arm.com>
2024-03-23 12:56:25 +00:00
zhongyunde 00443407
a110a1c0ed [AArch64] MachineCombiner msub matching for i64 2024-03-08 18:14:26 +08:00
zhongyunde 00443407
3a62edcf52 [AArch64] MachineCombiner msub matching
Pattern should be sorted in priority order since the pattern evalutor
stops checking as soon as it finds a faster sequence.
so for a * b - c * d, we prefer to match the 2nd operands of sub,
which can be use msub to fold them.

Refer to https://www.slideshare.net/chimerawang/instruction-combine-in-llvm

Fix https://github.com/llvm/llvm-project/issues/84152
2024-03-08 18:14:25 +08:00
David Green
44be5a7fdc
[Codegen] Make Width in getMemOperandsWithOffsetWidth a LocationSize. (#83875)
This is another part of #70452 which makes getMemOperandsWithOffsetWidth
use a LocationSize for Width, as opposed to the unsigned it currently
uses. The advantages on it's own are not super high if
getMemOperandsWithOffsetWidth usually uses known sizes, but if the
values can come from an MMO it can help be more accurate in case they
are Unknown (and in the future, scalable).
2024-03-06 17:40:13 +00:00
Sander de Smalen
5bd01ac822
[AArch64] Re-enable rematerialization for streaming-mode-changing functions. (#83235)
We can add implicit defs/uses of the 'VG' register to the instructions
to prevent the register allocator from rematerializing values in between
streaming-mode changes, as the def/use of VG will further nail down the
ordering that comes out of ISel. This avoids the heavy-handed approach
to prevent any kind of rematerialization.

While we could add 'VG' as a Use to all SVE instructions, we only really
need to do this for instructions that are rematerializable, as the
smstart/smstop instructions and pseudos act as scheduling barriers which
is sufficient to prevent other instructions from being scheduled in
between the streaming-mode-changing call sequence. However, we may
revisit this in the future.
2024-02-29 15:35:46 +00:00
ostannard
5452cbc4a6
[AArch64] Indirect tail-calls cannot use x16 with pac-ret+pc (#81020)
When using -mbranch-protection=pac-ret+pc, x16 is used in the function
epilogue to hold the address of the signing instruction. This is used by
a HINT instruction which can only use x16, so we can't change this. This
means that we can't use it to hold the function pointer for an indirect
tail-call.

There is existing code to force indirect tail-calls to use x16 or x17
when BTI is enabled, so there are now 4 combinations:

bti  pac-ret+pc  Valid function pointer registers
off  off         Any non callee-saved register
on   off         x16 or x17
off  on          Any non callee-saved register except x16
on   on          x17
2024-02-08 15:31:54 +00:00
Sjoerd Meijer
35904ec4e1
[AArch64] MI Scheduler STP combine (#80188)
Add opcodes for different store instructions to the target hook that can
enable more STP pairs. This is split off from the patch that does the
same for some load instructions (#79003).

Patch co-authored by Cameron McInally.
2024-02-06 10:29:42 +00:00
Philip Reames
3ff7caea33
[TTI] Use Register in isLoadFromStackSlot and isStoreToStackSlot [nfc] (#80339) 2024-02-01 17:52:35 -08:00
Yuta Mukai
70eab122bc
[AArch64][MachinePipeliner] Add pipeliner support for AArch64 (#79589)
Add AArch64 implementations for the interfaces of MachinePipeliner pass.
The pass is disabled by default for AArch64. It is enabled by specifying
--aarch64-enable-pipeliner.

5 tests in llvm-test-suites show performance improvement by more than 5%
on a Neoverse V1 processor.

| test | improvement |
| ---------------------------------------------------------------- |
-----------:|
| MultiSource/Benchmarks/TSVC/Recurrences-dbl/Recurrences-dbl.test | 16%
|
| MultiSource/Benchmarks/TSVC/Recurrences-dbl/Recurrences-flt.test | 16%
|
| SingleSource/Benchmarks/Adobe-C++/loop_unroll.test | 14% |
| SingleSource/Benchmarks/Misc/flops-5.test | 13% |
| SingleSource/Benchmarks/BenchmarkGame/spectral-norm.test | 6% |

(base flags: -mcpu=neoverse-v1 -O3 -mrecip, flags for pipelining: -mllvm
-aarch64-enable-pipeliner -mllvm
-pipeliner-max-stages=100 -mllvm -pipeliner-max-mii=100 -mllvm
-pipeliner-enable-copytophi=0)

On the other hand, there are cases of significant performance
degradation. Algorithm improvements and adding the option/pragma will be
needed in the future.
2024-02-02 10:33:44 +09:00
Sjoerd Meijer
8841846050
[AArch64] MI Scheduler LDP combine follow up (#79003)
This is a follow up of 75d820dcdd86, adding more opcodes to the combine
target hook enabling more LDP creation.

Patch co-authored by Cameron McInally.
2024-01-31 15:41:32 +00:00
Oskar Wirga
ff4636a4ab
Refactor recomputeLiveIns to converge on added MachineBasicBlocks (#79940)
This is a fix for the regression seen in
https://github.com/llvm/llvm-project/pull/79498

> Currently, the way that recomputeLiveIns works is that it will
recompute the livein registers for that MachineBasicBlock but it matters
what order you call recomputeLiveIn which can result in incorrect
register allocations down the line.

Now we do not recompute the entire CFG but we do ensure that the newly
added MBB do reach convergence.
2024-01-30 19:33:04 -08:00
David Green
915c3d9e5a Revert "[AArch64] merge index address with large offset into base address"
This reverts commit 32878c2065c8005b3ea30c79e16dfd7eed55d645 due to #79756 and #76202.
2024-01-28 17:01:21 +00:00
Nikita Popov
07a1925b8b Revert "Refactor recomputeLiveIns to operate on whole CFG (#79498)"
This reverts commit 59bf60519fc30d9d36c86abd83093b068f6b1e4b.

Introduces a major compile-time regression.
2024-01-26 22:33:17 +01:00
Oskar Wirga
59bf60519f
Refactor recomputeLiveIns to operate on whole CFG (#79498)
Currently, the way that recomputeLiveIns works is that it will recompute
the livein registers for that MachineBasicBlock but it matters what
order you call recomputeLiveIn which can result in incorrect register
allocations down the line.

This PR fixes that by simply recomputing the liveins for the entire CFG
until convergence is achieved. This makes it harder to introduce subtle
bugs which alter liveness.
2024-01-26 11:25:36 -08:00
Anatoly Trosinenko
10bd69a4f7
[MachineOutliner] Refactor iterating over Candidate's instructions (#78972)
Make Candidate's front() and back() functions return references to
MachineInstr and introduce begin() and end() returning iterators, the
same way it is usually done in other container-like classes.

This makes possible to iterate over the instructions contained in
Candidate the same way one can iterate over MachineBasicBlock (note that
begin() and end() return bundled iterators, just like MachineBasicBlock
does, but no instr_begin() and instr_end() are defined yet).
2024-01-23 17:21:40 +03:00
Eli Friedman
a6065f0fa5
Arm64EC entry/exit thunks, consolidated. (#79067)
This combines the previously posted patches with some additional work
I've done to more closely match MSVC output.

Most of the important logic here is implemented in
AArch64Arm64ECCallLowering. The purpose of the
AArch64Arm64ECCallLowering is to take "normal" IR we'd generate for
other targets, and generate most of the Arm64EC-specific bits:
generating thunks, mangling symbols, generating aliases, and generating
the .hybmp$x table. This is all done late for a few reasons: to
consolidate the logic as much as possible, and to ensure the IR exposed
to optimization passes doesn't contain complex arm64ec-specific
constructs.

The other changes are supporting changes, to handle the new constructs
generated by that pass.

There's a global llvm.arm64ec.symbolmap representing the .hybmp$x
entries for the thunks. This gets handled directly by the AsmPrinter
because it needs symbol indexes that aren't available before that.

There are two new calling conventions used to represent calls to and
from thunks: ARM64EC_Thunk_X64 and ARM64EC_Thunk_Native. There are a few
changes to handle the associated exception-handling info,
SEH_SaveAnyRegQP and SEH_SaveAnyRegQPX.

I've intentionally left out handling for structs with small
non-power-of-two sizes, because that's easily separated out. The rest of
my current work is here. I squashed my current patches because they were
split in ways that didn't really make sense. Maybe I could split out
some bits, but it's hard to meaningfully test most of the parts
independently.

Thanks to @dpaoliello for extensive testing and suggestions.

(Originally posted as https://reviews.llvm.org/D157547 .)
2024-01-22 21:28:07 -08:00
Sjoerd Meijer
75d820dcdd
[AArch64] MI Scheduler: create more LDP/STP pairs (#77565)
Target hook `canPairLdStOpc` is missing quite a few opcodes for which
LDPs/STPs can created. I was hoping that it would not be necessary to
add these missing opcodes here and that the attached motivating test
case would be handled by the LoadStoreOptimiser (especially after
#71908), but it's not. The problem is that after register allocation
some things are a lot harder to do. Consider this for the motivating
example

```
[1] renamable $q1 = LDURQi renamable $x9, -16 :: (load (s128) from %ir.r51, align 8, !tbaa !0)
[2] renamable $q2 = LDURQi renamable $x0, -16 :: (load (s128) from %ir.r53, align 8, !tbaa !4)
[3] renamable $q1 = nnan ninf nsz arcp contract afn reassoc nofpexcept FMLSv2f64 killed renamable $q1(tied-def 0), killed renamable $q2, renamable $q0, implicit $fpcr
[4] STURQi killed renamable $q1, renamable $x9, -16 :: (store (s128) into %ir.r51, align 1, !tbaa !0)
[5] renamable $q1 = LDRQui renamable $x9, 0 :: (load (s128) from %ir.r.G0001_609.0, align 8, !tbaa !0)
```
We can't combine the the load in line [5] into the load on [1]:
regisister q1 is used in between. And we can can't combine [1] into 
[5]: it is aliasing with the STR on line [4].

So, adding some missing opcodes here seems the best/easiest approach.
I will follow up to add some more missing cases here.
2024-01-11 09:46:47 +00:00
Momchil Velikov
4b6968952e
[AArch64] Implement spill/fill of predicate pair register classes (#76068)
We are getting ICE with, e.g.
```
#include <arm_sve.h>

 void g();
 svboolx2_t f0(int64_t i, int64_t n) {
     svboolx2_t r = svwhilelt_b16_x2(i, n);
     g();
     return r;
 }
```
2023-12-22 15:54:12 +00:00
Vitaly Buka
0ccc1e7acd Revert "[AArch64] Fold more load.x into load.i with large offset"
Issue #76202

This reverts commit f5687636415969e6d945659a0b78734abdfb0f06.
2023-12-21 21:12:40 -08:00
Tomas Matheson
7bd17212ef Re-land "[AArch64] Codegen support for FEAT_PAuthLR" (#75947)
This reverts commit 9f0f5587426a4ff24b240018cf8bf3acc3c566ae.

Fix expensive checks failure by properly marking register def for ADR.
2023-12-21 18:32:55 +00:00
Tomas Matheson
9f0f558742 Revert "[AArch64] Codegen support for FEAT_PAuthLR"
This reverts commit 5992ce90b8c0fac06436c3c86621fbf6d5398ee5.

Builtbot failures with expensive checks enabled.
2023-12-21 16:25:55 +00:00
Tomas Matheson
5992ce90b8 [AArch64] Codegen support for FEAT_PAuthLR
- Adds a new +pc option to -mbranch-protection that will enable
  the use of PC as a diversifier in PAC branch protection code.

- When +pauth-lr is enabled (-march=armv9.5a+pauth-lr) in combination
  with -mbranch-protection=pac-ret+pc, the new 9.5-a instructions
  (pacibsppc, retaasppc, etc) are used.

Documentation for the relevant instructions can be found here:
https://developer.arm.com/documentation/ddi0602/2023-09/Base-Instructions/

Co-authored-by: Lucas Prates <lucas.prates@arm.com>
2023-12-21 14:18:33 +00:00
zhongyunde 00443407
f568763641 [AArch64] Fold more load.x into load.i with large offset
The list of load.x is refer to canFoldIntoAddrMode on D152828.
Also support LDRSroX missed in canFoldIntoAddrMode
2023-12-21 18:54:15 +08:00
zhongyunde 00443407
32878c2065 [AArch64] merge index address with large offset into base address
A case for this transformation, https://gcc.godbolt.org/z/nhYcWq1WE
Fold
  mov     w8, #56952
  movk    w8, #15, lsl #16
  ldrb    w0, [x0, x8]
into
  add     x0, x0, 1036288
  ldrb    w0, [x0, 3704]

Only LDRBBroX is supported for the first time.
Fix https://github.com/llvm/llvm-project/issues/71917
2023-12-21 18:54:14 +08:00
DianQK
7649d22306
[AArch64] ORRWrs is copy instruction when there's no implicit def of the X register (#75184)
Follows
https://github.com/llvm/llvm-project/pull/74682#issuecomment-1850268782.
Fixes #74680.
2023-12-14 19:19:55 +08:00
Oskar Wirga
9930f3e298
[AArch64] Fix case of 0 dynamic alloc when stack probing (#74877)
I accidentally closed
https://github.com/llvm/llvm-project/pull/74806

If the dynamic allocation size is 0, then we will still probe the
current sp value despite not decrementing sp! This results in
overwriting stack data, in my case the stack canary.

The fix here is just to load the value of [sp] into xzr which is
essentially a no-op but still performs a read/probe of the new page.
2023-12-10 08:01:29 -05:00
Alex Bradbury
b717365216
[MachineScheduler][NFCI] Add Offset and OffsetIsScalable args to shouldClusterMemOps (#73778)
These are picked up from getMemOperandsWithOffsetWidth but weren't then
being passed through to shouldClusterMemOps, which forces backends to
collect the information again if they want to use the kind of heuristics
typically used for the similar shouldScheduleLoadsNear function (e.g.
checking the offset is within 1 cache line).

This patch just adds the parameters, but doesn't attempt to use them.
There is potential to use them in the current PPC and AArch64
shouldClusterMemOps implementation, and I intend to use the offset in
the heuristic for RISC-V. I've left these for future patches in the
interest of being as incremental as possible.

As noted in the review and in an inline FIXME, an ElementCount-style abstraction may later be used to condense these two parameters to one argument. ElementCount isn't quite suitable as it doesn't support negative offsets.
2023-12-06 15:30:48 +00:00
Momchil Velikov
cc944f502f
[AArch64] Stack probing for function prologues (#66524)
This adds code to AArch64 function prologues to protect against stack
clash attacks by probing (writing to) the stack at regular enough
intervals to ensure that the guard page cannot be skipped over.

The patch depends on and maintains the following invariants:

Upon function entry the caller guarantees that it has probed the stack
(e.g. performed a store) at some address [sp, #N], where`0 <= N <=
1024`. This invariant comes from a requirement for compatibility with
GCC. Any address range in the allocated stack, no smaller than
stack-probe-size bytes contains at least one probe At any time the stack
pointer is above or in the guard page Probes are performed in
descreasing address order
The stack-probe-size is a function attribute that can be set by a
platform to correspond to the guard page size.

By default, the stack probe size is 4KiB, which is a safe default as
this is the smallest possible page size for AArch64. Linux uses a 64KiB
guard for AArch64, so this can be overridden by the stack-probe-size
function attribute.

For small frames without a frame pointer (<= 240 bytes), no probes are
needed.

For larger frame sizes, LLVM always stores x29 to the stack. This serves
as an implicit stack probe. Thus, while allocating stack objects the
compiler assumes that the stack has been probed at [sp].

There are multiple probing sequences that can be emitted, depending on
the size of the stack allocation:

A straight-line sequence of subtracts and stores, used when the
allocation size is smaller than 5 guard pages. A loop allocating and
probing one page size per iteration, plus at most a single probe to deal
with the remainder, used when the allocation size is larger but still
known at compile time. A loop which moves the SP down to the target
value held in a register (or a loop, moving a scratch register to the
target value help in SP), used when the allocation size is not known at
compile-time, such as when allocating space for SVE values, or when
over-aligning the stack. This is emitted in AArch64InstrInfo because it
will also be used for dynamic allocas in a future patch. A single probe
where the amount of stack adjustment is unknown, but is known to be less
than or equal to a page size.

---------

Co-authored-by: Oliver Stannard <oliver.stannard@linaro.org>
2023-11-30 17:41:51 +00:00
David Green
4d80122598
[AArch64] Teach areMemAccessesTriviallyDisjoint about scalable widths. (#73655)
The base change here is to change getMemOperandWithOffsetWidth to return
a TypeSize Width, which in turn allows areMemAccessesTriviallyDisjoint
to reason about trivially disjoint widths.
2023-11-30 16:54:28 +00:00
Alex Bradbury
6cf3566850
[NFC][MachineScheduler] Rename NumLoads parameter of shouldClusterMemOps to ClusterSize (#73757)
As the same hook is called for both load and store clustering, NumLoads
is a misleading name. Use ClusterSize instead.
2023-11-29 09:47:03 +00:00
Sander de Smalen
81b7f115fb
[llvm][TypeSize] Fix addition/subtraction in TypeSize. (#72979)
It seems TypeSize is currently broken in the sense that:

  TypeSize::Fixed(4) + TypeSize::Scalable(4) => TypeSize::Fixed(8)

without failing its assert that explicitly tests for this case:

  assert(LHS.Scalable == RHS.Scalable && ...);

The reason this fails is that `Scalable` is a static method of class
TypeSize,
and LHS and RHS are both objects of class TypeSize. So this is
evaluating
if the pointer to the function Scalable == the pointer to the function
Scalable,
which is always true because LHS and RHS have the same class.

This patch fixes the issue by renaming `TypeSize::Scalable` ->
`TypeSize::getScalable`, as well as `TypeSize::Fixed` to
`TypeSize::getFixed`,
so that it no longer clashes with the variable in
FixedOrScalableQuantity.

The new methods now also better match the coding standard, which
specifies that:
* Variable names should be nouns (as they represent state)
* Function names should be verb phrases (as they represent actions)
2023-11-22 08:52:53 +00:00
Antonio Frighetto
c16b94a3bf [AArch64] Fix missing opcode when calling isAArch64FrameOffsetLegal
`LDAPURi` was accidentally left unhandled in `isAArch64FrameOffsetLegal`.

Reported-by: fhahn
2023-11-06 18:06:44 +01:00
Sander de Smalen
7dc20abed0 [AArch64] Fix spillfill-sve.mir with expensive checks.
This fixes an issue introduced by PR #70679.

Using constrainRegClass() is not strong enough to actually force
the use of a register to be a PPR register class. It will need an
actual COPY to do the conversion.

The downside is that this introduces an extra register, which is an
issue we may want to fix at a later point using a custom copy operation
where the register allocator uses the same register when it can.
2023-11-01 16:29:44 +00:00
Sander de Smalen
2efea512c2
[AArch64] Fix spilling/filling of virtual registers in PNR regclass. (#70679)
We made the assumption that the registers were always physical
registers, which doesn't have to be true.
2023-11-01 10:57:12 +00:00
Antonio Frighetto
8ce4b7bcd5 [AArch64] Handle newly-added atomic instructions in getMemOpInfo
2-stage AArch64 buildbot was previously failing.

Fixes: https://lab.llvm.org/buildbot/#/builders/198/builds/5636.
2023-10-31 18:58:20 +01:00
Sander de Smalen
73498d2608
[AArch64] Also implement PNR -> PNR copies. (#70682)
Previously we only implemented PNR -> PPR and PPR -> PNR copies.
2023-10-31 16:52:42 +00:00
Paul Walker
7c90be2857
[SVE] Fix incorrect offset calculation when rewriting an instruction's frame index. (#70315)
When partially packing an offset into an SVE load/store instruction we
are incorrectly calculating the remainder.
2023-10-27 16:53:30 +01:00
Bill Wendling
389958a9f6 [CodeGen][NFC] Fix formatting
This fixes the formatting introduced by
fbf0a77e80f18a6d0fd8a28833b0bc87a99b1b2f.
2023-10-17 12:35:30 -07:00
Bill Wendling
fbf0a77e80
[CodeGen] Avoid potential sideeffects from XOR (#67193)
XOR may change flag values (e.g. for X86 gprs). In the case where that's
not desirable, specify that buildClearRegister() should use MOV instead.
2023-10-17 12:03:26 -07:00
Vladislav Dzhidzhoev
abd0d5d262 Reland: [AArch64][GlobalISel] Adopt dup(load) -> LD1R patterns from SelectionDAG
This relands the fb8f59156f0f208f6192ed808fc223eda6c0e7ec and makes
isAArch64FrameOffsetLegal function recognize LD1R instructions.

Original PR: https://github.com/llvm/llvm-project/pull/66914
PR of the fix: https://github.com/llvm/llvm-project/pull/69003
2023-10-17 17:40:05 +02:00
Momchil Velikov
bea3684944
[AArch64] Allow only LSL to be folded into addressing mode (#69235)
There was an error in decoding shift type, which permitted shift types
other than LSL to be (incorrectly) folded into the addressing mode of a
load/store instruction.
2023-10-17 11:30:14 +01:00
john-brawn-arm
a574ef6176
[AArch64] Fix incorrect big-endian spill in foldMemoryOperandImpl (#65601)
When an sreg sub-register of a q register was spilled,
AArch64InstrInfo::foldMemoryOperandImpl would emit a spill of a d
register, which gives the wrong result when the target is big-endian as
the following q register fill will put the value in the top half.

Fix this by greatly simplifying the existing code for widening the spill
to only handle wzr to xzr widening, as the default result we get if the
function returns nullptr is already that a widened spill will be
emitted.
2023-10-12 16:10:28 +01:00
Anatoly Trosinenko
1d2b558265 [AArch64][PAC] Check authenticated LR value during tail call
When performing a tail call, check the value of LR register after
authentication to prevent the callee from signing and spilling an
untrusted value. This commit implements a few variants of check,
more can be added later.

If it is safe to assume that executable pages are always readable,
LR can be checked just by dereferencing the LR value via LDR.

As an alternative, LR can be checked as follows:

    ; lowered AUT* instruction
    ; <some variant of check that LR contains a valid address>
    b.cond break_block
  ret_block:
    ; lowered TCRETURN
  break_block:
    brk 0xc471

As the existing methods either break the compatibility with execute-only
memory mappings or can degrade the performance, they are disabled by
default and can be explicitly enabled with a command line option.

Individual subtargets can opt-in to use one of the available methods
by updating AArch64FrameLowering::getAuthenticatedLRCheckMethod().

Reviewed By: kristof.beyls

Differential Revision: https://reviews.llvm.org/D156716
2023-10-11 17:38:17 +03:00
Anatoly Trosinenko
f1b2dd2a11
[AArch64][BTI] Prevent Machine Scheduler from moving branch targets (#68313)
Moving instructions that are recognized as branch targets by BTI can
result in runtime crash.

In outliner tests, replaced "BRK 1" with "HINT 0" (a.k.a. NOP) as a
generic outlinable instruction.
2023-10-06 18:05:03 +03:00
Matt Arsenault
f79379398d Revert "CodeGen: Disable isCopyInstrImpl if there are implicit operands"
This reverts commit bc7d88faf1a595ab59952a2054418cdd0d9eeee8.

This is broken with 414ff812d6241b728754ce562081419e7fc091eb reverted.
2023-10-02 22:43:24 +03:00
Matt Arsenault
bc7d88faf1 CodeGen: Disable isCopyInstrImpl if there are implicit operands
This is a conservative workaround for broken liveness tracking of
SUBREG_TO_REG to speculatively fix all targets. The current reported
failures are on X86 only, but this issue should appear for all targets
that use SUBREG_TO_REG. The next minimally correct refinement would be
to disallow only implicit defs.

The coalescer now introduces implicit-defs of the super register to
track the dependency on other subregisters. If we see such an implicit
operand, we cannot simply treat the subregister def as the result
operand in case downstream users depend on the implicitly defined
parts. Really target implementations should be considering the
implicit defs and trying to interpret them appropriately (maybe with
some generic helpers). The full implicit def could possibly be
reported as the move result, rather than the subregister def but that
requires additional work.

Hopefully fixes #64060 as well.

This needs to be applied to the release branch.

https://reviews.llvm.org/D156346
2023-10-02 15:16:40 +03:00
Momchil Velikov
fe763d8ad4
[AArch64] Limit immediate offsets when folding instructions into addressing modes (#67345)
Don't increase/decrease immediate offsets in folded instructions beyond
the limits of `LDP`.
2023-09-26 14:21:32 +01:00
Simon Pilgrim
98e8f04c9f Fix MSVC "32-bit shift implicitly converted to 64 bits" warning. NFC. 2023-09-25 17:19:21 +01:00