2315 Commits

Author SHA1 Message Date
Martin Storsjö
2a5e1da57a
Revert "[ARM] Stop gluing ALU nodes to branches / selects" (#118232)
Reverts llvm/llvm-project#116970.

This change broke Wine compiled for armv7, causing segfaults when
starting Wine. See llvm/llvm-project#116970 for more detailed discussion
about the issue.
2024-12-02 00:02:25 +02:00
Sergei Barannikov
a348f223ca
[ARM] Stop gluing ALU nodes to branches / selects (#116970)
Following #116547 and #116676, this PR changes the type of results and
operands of some nodes to accept / return a normal type instead of Glue.

Unfortunately, changing the result type of one node requires changing
the operand types of all potential consumer nodes, which in turn
requires changing the result types of all other possible producer nodes.
So this is a bulk change.

Pull Request: https://github.com/llvm/llvm-project/pull/116970
2024-11-30 08:14:24 +03:00
Sergei Barannikov
ad9dcd96dc
Reland "[ARM] Stop gluing FP comparisons to FMSTAT" (#117248)
Following #116547, this changes the result of `ARMISD::CMPFP*` and the
operand of `ARMISD::FMSTAT` from a special `Glue` type to a normal type.

This change allows comparisons to be CSEd and scheduled around as can be
seen in the test changes.

Note that `ARMISD::FMSTAT` is still glued to its consumer nodes; this is
going to be changed in a separate patch.

This patch also sets `CopyCost` of `cl_FPSCR_NZCV` register class to a
negative value. The reason is the same as for CCR register class: it
makes DAG scheduler and InstrEmitter try to avoid copies of `FPCSR_NZCV`
register to / from virtual registers. Previously, this was not
necessary, since no attempt was made to create copies in the first
place.

`TRI::getCrossCopyRegClass` is modified in a way that prevents DAG
scheduler from copying FPSCR into a virtual register. The register
allocator might need to spill the virtual register, but that only seem
to work in Thumb mode.
2024-11-22 22:29:58 +03:00
Sergei Barannikov
5d32a1409d
Revert "[ARM] Stop gluing FP comparisons to FMSTAT" (#117175)
Reverts llvm/llvm-project#116676

Reverting per post-commit feedback (causes miscompilation errors and/or
assertion failures).
2024-11-21 18:26:53 +03:00
Sergei Barannikov
8c56dd3040
[ARM] Stop gluing FP comparisons to FMSTAT (#116676)
Following #116547, this changes the result of `ARMISD::CMPFP*` and the
operand of `ARMISD::FMSTAT` from a special `Glue` type to a normal type.

This change allows comparisons to be CSEd and scheduled around as can be
seen in the test changes.

Note that `ARMISD::FMSTAT` is still glued to its consumer nodes; this is
going to be changed in a separate patch.

This patch also sets `CopyCost` of `cl_FPSCR_NZCV` register class to a
negative value. The reason is the same as for CCR register class: it
makes DAG scheduler and InstrEmitter try to avoid copies of `FPCSR_NZCV`
register to / from virtual registers. Previously, this was not
necessary, since no attempt was made to create copies in the first
place.

There might be a case when a copy can't be avoided (although not found
in existing tests). If a copy is necessary, the virtual register will be
created with `cl_FPSCR_NZCV` register class. If this register class is
inappropriate, `TRI::getCrossCopyRegClass` should be modified to return
the correct class.

Pull Request: https://github.com/llvm/llvm-project/pull/116676
2024-11-20 16:07:05 +03:00
Sergei Barannikov
aff98e4be0
[ARM] Stop gluing 1-bit shifts (#116547)
1. When two (or more) nodes are glued, DAG scheduler will always
schedule them as one piece, i.e. it will not allow any instructions to
be scheduled between them. It does so because if nodes are glued this
usually means that there is an implicit register dependency between
them, and an intervening node could clobber this physical register. When
emitting such nodes into machine IR, they will also be stuck together,
e.g.:
```
    %9:gpr = MOVsrl_glue killed %8, implicit-def $cpsr
    %10:gpr = RRX %3, implicit $cpsr
```

2. If a node has Glue result, SelectionDAG will not try to CSE this
node. If it did, it would break the implicit physical register
dependency. In practice this means that if a node with Glue result has
multiple uses, it has to be duplicated before each use. This the reason
for `ARMTargetLowering::duplicateCmp` to exist.

When using normal data dependency, dependent nodes can freely be
scheduled around. If there is a physical register dependency between
nodes, the physical register will be copied to/from a virtual register,
allowing other nodes to intervene between them. The resulting machine IR
might look like this:
```
    %9:gpr = LSRs1 killed %8, implicit-def $cpsr
    %10:gpr = COPY $cpsr
    %11:gpr = ORRrsi killed %9, %3, 242, 14 /* CC::al */, $noreg, $noreg
    %12:gpr = BICri killed %11, -2147483648, 14 /* CC::al */, $noreg, $noreg
    $cpsr = COPY %10
    %13:gpr = RRX %3, implicit $cpsr
```

The two copies are likely to be eliminated by register coalescer, given
that there are no instructions between them that clobber this physical
register. If the copies are unwanted in the first place (they could be
expensive or impossible), DAG scheduler will try to avoid inserting them
wherever possible, and the resulting machine IR will look like this:
```
    %9:gpr = LSRs1 killed %8, implicit-def $cpsr
    %10:gpr = ORRrsi killed %9, %3, 242, 14 /* CC::al */, $noreg, $noreg
    %11:gpr = BICri killed %10, -2147483648, 14 /* CC::al */, $noreg, $noreg
    %12:gpr = RRX %3, implicit $cpsr
```

On ARM, arithmetic operations and LSLS already use the new data flow
approach. This patch extends it to include 1-bit shifts.

Pull Request: https://github.com/llvm/llvm-project/pull/116547
2024-11-19 17:46:48 +03:00
Craig Topper
cde4ae789e [ARM] Use getSignedTargetConstant. NFC 2024-11-18 13:12:23 -08:00
Sergei Barannikov
baf59be89b
[SelectionDAG] Fix return types of TC_RETURN for several targets (#116504)
TC_RETURN nodes do not have a glue result.
2024-11-17 02:14:05 +03:00
Kazu Hirata
9571cc2b28
[ARM] Remove unused includes (NFC) (#115995)
Identified with misc-include-cleaner.
2024-11-12 23:15:21 -08:00
David Green
750247bc1c [ARM] Fix APInt assert for CSNEG known bits.
Use APInt::getAllOnes(32), to avoid the assert when constructing an APInt
from -1.
2024-11-11 19:39:02 +00:00
dnsampaio
28d0718033
[DAGCombiner] Add combine avg from shifts (#113909)
This teaches dagcombiner to fold:
`(asr (add nsw x, y), 1) -> (avgfloors x, y)`
`(lsr (add nuw x, y), 1) -> (avgflooru x, y)`

as well the combine them to a ceil variant:
`(avgfloors (add nsw x, y), 1) -> (avgceils x, y)` 
`(avgflooru (add nuw x, y), 1) -> (avgceilu x, y)`

iff valid for the target.

Removes some of the ARM MVE patterns that are now dead code.
It adds the avg opcodes to `IsQRMVEInstruction` as to preserve the
immediate splatting as before.
2024-10-31 10:57:27 +01:00
Craig Topper
50896e7ef5 [ARM] Use getSignedConstant. NFC 2024-10-30 21:43:16 -07:00
Oliver Stannard
376d7b27fa [ARM] Optimise byval arguments in tail-calls
We don't need to copy byval arguments to tail calls via a temporary, if
we can prove that we are not copying from the outgoing argument area.
This patch does this when the source if the argument is one of:
* Memory in the local stack frame, which can't be used for tail-call
  arguments.
* A global variable.

We can also avoid doing the copy completely if the source and
destination are the same memory location, which is the case when the
caller and callee have the same signature, and pass some arguments
through unmodified.
2024-10-25 09:34:09 +01:00
Oliver Stannard
914a3990d1 [ARM] Avoid clobbering byval arguments when passing to tail-calls
When passing byval arguments to tail-calls, we need to store them into
the stack memory in which this the caller received it's arguments. If
any of the outgoing arguments are forwarded from incoming byval
arguments, then the source of the copy is from the same stack memory.
This can result in the copy corrupting a value which is still to be
read.

The fix is to first make a copy of the outgoing byval arguments in local
stack space, and then copy them to their final location. This fixes the
correctness issue, but results in extra copying, which could be
optimised.
2024-10-25 09:34:09 +01:00
Oliver Stannard
78ec2e2ed5 [ARM] Allow tail calls with byval args
Byval arguments which are passed partially in registers get stored into
the local stack frame, but it is valid to tail-call them because the
part which gets spilled is always re-loaded into registers before doing
the tail-call, so it's OK for the spill area to be deallocated.
2024-10-25 09:34:08 +01:00
Oliver Stannard
82e6472197 [ARM] Allow functions with sret returns to be tail-called
It is valid to tail-call a function which returns through an sret
argument, as long as we have an incoming sret pointer to pass on.
2024-10-25 09:34:08 +01:00
Oliver Stannard
c1eb790cd2 [ARM] Tail-calls do not require caller and callee arguments to match
The ARM backend was checking that the outgoing values for a tail-call
matched the incoming argument values of the caller. This isn't
necessary, because the caller can change the values in both registers
and the stack before doing the tail-call. The actual limitation is that
the callee can't need more stack space for it's arguments than the
caller does.

This is needed for code using the musttail attribute, as well as
enabling tail calls as an optimisation in more cases.
2024-10-25 09:34:08 +01:00
Oliver Stannard
246baeb5fe [ARM] Add debug trace for tail-call optimisation
There are lots of reasons a call might not be eligible for tail-call
optimisation, this adds debug trace to help understand the compiler's
decisions here.
2024-10-25 09:34:08 +01:00
Oliver Stannard
8e289e4fa6 [ARM] Fix comment typo 2024-10-25 09:34:07 +01:00
Alex Rønne Petersen
5785cbb405
[llvm] Ensure that soft float targets don't emit fma() libcalls. (#106615)
The previous behavior could be harmful in some edge cases, such as
emitting a call to `fma()` in the `fma()` implementation itself.

Do this by just being more accurate in `isFMAFasterThanFMulAndFAdd()`.
This was already done for PowerPC; this commit just extends that to Arm,
z/Arch, and x86. MIPS and SPARC already got it right, but I added tests
for them too, for good measure.

Note: I don't have commit access.
2024-10-19 06:13:15 -07:00
Keith Packard
44b020a381
[PowerPC][ISelLowering] Support -mstack-protector-guard=tls (#110928)
Add support for using a thread-local variable with a specified offset
for holding the stack guard canary value. This supports both 32- and 64-
bit PowerPC targets.

This mirrors changes from #108942 but targeting PowerPC instead of
RISCV. Because both of these PRs modify the same driver functions, this
series is stack on top of the RISC-V one.

---------

Signed-off-by: Keith Packard <keithp@keithp.com>
2024-10-17 19:06:47 -07:00
Jay Foad
85c17e4092
[LLVM] Make more use of IRBuilder::CreateIntrinsic. NFC. (#112706)
Convert many instances of:
  Fn = Intrinsic::getOrInsertDeclaration(...);
  CreateCall(Fn, ...)
to the equivalent CreateIntrinsic call.
2024-10-17 16:20:43 +01:00
Jay Foad
9255850e89 [LLVM] Remove unused variables after #112546 2024-10-16 16:15:34 +01:00
Jay Foad
d9c95efb6c
[LLVM] Make more use of IRBuilder::CreateIntrinsic. NFC. (#112546)
Convert almost every instance of:
  CreateCall(Intrinsic::getOrInsertDeclaration(...), ...)
to the equivalent CreateIntrinsic call.
2024-10-16 15:43:30 +01:00
Rahul Joshi
fa789dffb1
[NFC] Rename Intrinsic::getDeclaration to getOrInsertDeclaration (#111752)
Rename the function to reflect its correct behavior and to be consistent
with `Module::getOrInsertFunction`. This is also in preparation of
adding a new `Intrinsic::getDeclaration` that will have behavior similar
to `Module::getFunction` (i.e, just lookup, no creation).
2024-10-11 05:26:03 -07:00
Jeffrey Byrnes
853c43d04a
[TTI] NFC: Port TLI.shouldSinkOperands to TTI (#110564)
Porting to TTI provides direct access to the instruction cost model,
which can enable instruction cost based sinking without introducing code
duplication.
2024-10-09 14:30:09 -07:00
Nashe Mncube
439dcfafc5
[llvm][ARM][NFC] Renaming FeaturePrefLoopAlignment (#109932)
The feature 'FeaturePrefLoopAlignment' was misleading as it was used to
set the alignment of branch targets such as functions. Renamed to
FeaturePreferfBranchAlignment.
2024-09-26 13:36:12 +01:00
David Green
f1bbabd628
[ARM] Lower arm_neon_vbsl to ARMISD::VBSP and fold (vbsl x, y, y) to y (#109761)
This helps clean up the patterns a little and will help share combines
on both the intrinsic and VBSP. A combine is then added to fold away the
VBSP if both the selected operands are the same.
2024-09-25 10:03:39 +01:00
David Green
78ff736db2
[ARM] Fix VMOVRRD combine with non-canonical inserts. (#109639)
In some situations, in the test case here with the multiple calls being
late legalized, we can see inserts of the form:
```
     b = insert a, x, 0
    c = insert b, y, 1
   d = insert c, z, 0
  bc = bitcast d
 e = extract bc, 0
r = vmovrrd e
```
The redundant insert will usually be removed, but in some cases are not
prior to PerformVMOVRRDCombine. The code was finding the first insert
from each lane (x and y), as opposed to the last (z and y).
2024-09-24 08:10:50 +01:00
Kazu Hirata
1e4e1ceeeb
[Target] Avoid repeated hash lookups (NFC) (#108677) 2024-09-14 07:39:09 -07:00
David Green
637aa61732
[ARM] Fix VBICimm and VORRimm generation under Big endian. (#107813)
This is a smaller follow on to #105519 that fixes VBICimm and VORRimm
too. The logic behind lowering vector immediates under big endian
Neon/MVE is to treat them in natural lane ordering (same as little
endian), and VECTOR_REG_CAST them to the correct type (as opposed to
creating the constants in big endian form and bitcasting them). This
makes sure that is done when creating VORRIMM and VBICIMM.
2024-09-13 10:59:57 +01:00
Austin
3242e77841
[ARM][Codegen] Fix vector data miscompilation in arm32be (#105519)
Fix #102418, resolved the issue of generating incorrect vrev during
vectorization in big-endian scenarios
2024-09-07 14:09:29 +08:00
Nikita Popov
a7697c8655
[ARM] Do not assume alignment in vld1xN and vst1xN intrinsics (#106984)
These intrinsics currently assume natural alignment. Instead, respect
the alignment attribute on the intrinsic. Teach InstCombine to improve
that alignment.

If desired I could also adjust the clang frontend to add alignment
annotations equivalent to the previous behavior, but I don't see any
indication that such an assumption is correct in the ARM intrinsics
docs.

Fixes https://github.com/llvm/llvm-project/issues/59081.
2024-09-05 09:26:53 +02:00
Oliver Stannard
9cf68679c4
[ARM] Fix failure to register-allocate CMP_SWAP_64 pseudo-inst (#106721)
This test case was failing to compile with a "ran out of registers
during register allocation" error at -O0. This was because CMP_SWAP_64
has 3 operands which must be an even-odd register pair, and two other
GPR operands. All of the def operands are also early-clobber, so
registers can't be shared between uses and defs. Because the function
has an over-aligned alloca it needs frame and base pointers, so r6 and
r11 are both reserved. That leaves r0/r1, r2/r3, r4/r5 and r8/r9 as the
only valid register pairs, and if the two individual GPR operands happen
to get allocated to registers in different pairs then only 2 pairs will
be available for the three GPRPair operands.

To fix this, I've merged the two GPR operands into a single GPRPair
operand. This means that the instruction now has 4 GPRPair operands,
which can always be allocated without relying on luck. This does
constrain register allocation a bit more, but this pseudo instruction is
only used at -O0, so I don't think that's a problem.
2024-09-02 08:54:10 +01:00
Sergei Barannikov
4d7a0abae8
[DataLayout] Change return type of getStackAlignment to MaybeAlign (#105478)
Currently, `getStackAlignment` asserts if the stack alignment wasn't
specified. This makes it inconvenient to use and complicates testing.

This change also makes `exceedsNaturalStackAlignment` method redundant.
2024-08-27 22:59:33 +03:00
Kiran
c50d11e6d9 Revert "[ARM] musttail fixes"
committed by accident, see #104795

This reverts commit a2088a24dad31ebe44c93751db17307fdbe1f0e2.
2024-08-27 11:17:17 +01:00
Kiran
a2088a24da [ARM] musttail fixes
Backend:
- Caller and callee arguments no longer have to match, just to take up the same space, as they can be changed before the call
- Allowed tail calls if callee and callee both (or neither) use sret, wheras before it would be dissalowed if either used sret
- Allowed tail calls if byval args are used
- Added debug trace for IsEligibleForTailCallOptimisation

Frontend (clang):
- Do not generate extra alloca if sret is used with musttail, as the space for the sret is allocated already

Change-Id: Ic7f246a7eca43c06874922d642d7dc44bdfc98ec
2024-08-27 10:44:06 +01:00
Craig Topper
4b0c0ec6b8
[CodeGen] Use MCRegister for CCState::AllocateReg and CCValAssign::getReg. NFC (#106032) 2024-08-26 11:40:25 -07:00
David Green
b9a0276550 [ARM] Add VECTOR_REG_CAST identity fold.
v16i8 VECTOR_REG_CAST (v16i8 Op) can use v16i8 Op directly, as the
VECTOR_REG_CAST is a noop.
2024-08-24 21:21:27 +01:00
Craig Topper
efa859cd3e [ARM] Use SelectonDAG::getSignedConstant. 2024-08-17 18:02:41 -07:00
Simon Pilgrim
11ba72e651
[KnownBits] Add KnownBits::add and KnownBits::sub helper wrappers. (#99468) 2024-08-12 10:21:28 +01:00
Kazu Hirata
f4fb735840
[llvm] Construct SmallVector<SDValue> with ArrayRef (NFC) (#102578) 2024-08-09 09:15:42 -07:00
Oliver Stannard
50a2b31800
[ARM] Be more precise about conditions for indirect tail-calls (#102451)
This code was trying to predict the conditions in which an indirect
tail call will have a free register to hold the target address, and
falling back to a non-tail call if all non-callee-saved registers are
used for arguments or return address authentication.

However, it was only taking the number of arguments into account, not
which registers they are allocated to, so floating-point arguments could
cause this to give the wrong result, causing either a later error due to
the lack of a free register, or a missed optimisation of not doing the
tail call.

The assignments of arguments to registers is available at this point in
the code, so we can calculate exactly which registers will be available
for the tail-call.
2024-08-09 08:50:21 +01:00
Sergei Barannikov
411d31ad69
Partially revert 92e18ffd803365c64910760ba20278f875d93681 (#101673)
It is likely to cause stage2 build failures:

https://lab.llvm.org/buildbot/#/builders/122/builds/389
https://lab.llvm.org/buildbot/#/builders/79/builds/552

I don't have an ARM machine to investigate, so I'm just reverting ARM
changes to see if it helps make the bots green again.
2024-08-02 16:38:31 +03:00
Sergei Barannikov
92e18ffd80
[SDag][ARM][RISCV] Allow lowering CTPOP into a libcall (#99752)
The main change is adding CTPOP to `RuntimeLibcalls.def` to allow
targets to use LibCall action for CTPOP. DAG legalizers are changed
accordingly.
2024-08-02 12:29:39 +03:00
Joseph Huber
615b7eeaa9 Reapply "[LLVM][LTO] Factor out RTLib calls and allow them to be dropped (#98512)"
This reverts commit 740161a9b98c9920dedf1852b5f1c94d0a683af5.

I moved the `ISD` dependencies into the CodeGen portion of the handling,
it's a little awkward but it's the easiest solution I can think of for
now.
2024-07-20 09:29:31 -05:00
NAKAMURA Takumi
740161a9b9 Revert "[LLVM][LTO] Factor out RTLib calls and allow them to be dropped (#98512)"
This reverts commit c05126bdfc3b02daa37d11056fa43db1a6cdef69.
(llvmorg-19-init-17714-gc05126bdfc3b)
See #99610
2024-07-20 12:36:57 +09:00
Joseph Huber
c05126bdfc
[LLVM][LTO] Factor out RTLib calls and allow them to be dropped (#98512)
Summary:
The LTO pass and LLD linker have logic in them that forces extraction
and prevent internalization of needed runtime calls. However, these
currently take all RTLibcalls into account, even if the target does not
support them. The target opts-out of a libcall if it sets its name to
nullptr. This patch pulls this logic out into a class in the header so
that LTO / lld can use it to determine if a symbol actually needs to be
kept.

This is important for targets like AMDGPU that want to be able to use
`lld` to perform the final link step, but does not want the overhead of
uncalled functions. (This adds like a second to the link time trivially)
2024-07-16 06:22:09 -05:00
Kazu Hirata
5e22a53698
[Target] Use range-based for loops (NFC) (#98705) 2024-07-13 17:40:51 -07:00
Joseph Huber
3f1a767572
[LLVM] Factor disabled Libcalls into the initializer (#98421)
Summary:
These Libcalls represent which functions are available to the backend.
If a runtime call is not available, the target sets the the name to
`nullptr`. Currently, this logic is spread around the various targets.
This patch pulls all of the locations that disable libcalls into the
intializer. This patch is effectively NFC.

The motivation behind this patch is that currently the LTO handling uses
the list of all runtime calls to determine which functions cannot be
internalized and must be extracted from static libraries. We do not want
this to happen for libcalls that are not emitted by the backend. A
follow-up patch will move out this logic so the LTO pass can know which
rtlib calls are actually used by the backend.
2024-07-11 12:59:25 -05:00