1872 Commits

Author SHA1 Message Date
Craig Topper
9240061800
[RegAllocFast] Don't align stack slots if the stack can't be realigned (#153682)
This is the fast regalloc equivalent of
773771ba382b1fbcf6acccc0046bfe731541a599.
2025-08-19 08:17:26 -07:00
Shoreshen
db96363c0a
[AMDGPU] Avoid putting implicit_def into a bundle that breaks the reg's liveness (#142563)
Cause:
1. An `implicit_def` inside a bundle does not count as a definition of the
register in the machine instruction verifier.
2. Including the `implicit_def` in the bundle therefore leaves the register
undefined, resulting in `Bad machine code: Using an undefined physical
register` from the machine instruction verifier.

Fixes https://github.com/llvm/llvm-project/issues/139102

---------

Co-authored-by: Matt Arsenault <Matthew.Arsenault@amd.com>
2025-08-13 10:41:44 +08:00
Philip Reames
4d629f9744
[MIR] Remove std::variant from multiple save/restore point handling [nfc] (#153226)
In review of bbde6b, I had originally proposed that we support the
legacy text format. As review evolved, it became clear this had been a
bad idea (too much complexity), but in order to let that patch finally
move forward, I approved the change with the variant. This change undoes
the variant, and updates all the tests to just use the array form.
2025-08-12 11:23:05 -07:00
Nikita Popov
92c55a315e
[IR] Only allow lifetime.start/end on allocas (#149310)
lifetime.start and lifetime.end are primarily intended for use on
allocas, to enable stack coloring and other liveness optimizations. This
is necessary because all (static) allocas are hoisted into the entry
block, so lifetime markers are the only way to convey the actual
lifetimes.

However, lifetime.start and lifetime.end are currently *allowed* to be
used on non-alloca pointers. We don't actually do this in practice, but
the mere fact that this is possible breaks the core purpose of the
lifetime markers, which is stack coloring of allocas. Stack coloring can
only work correctly if all lifetime markers for an alloca are
analyzable.

* If a lifetime marker may operate on multiple allocas via a select/phi,
we don't know which lifetime actually starts/ends and handle it
incorrectly (https://github.com/llvm/llvm-project/issues/104776).
* Stack coloring operates on the assumption that all lifetime markers
are visible, and not, for example, hidden behind a function call or
escaped pointer. It's not possible to change this, as part of the
purpose of lifetime markers is that they work even in the presence of
escaped pointers, where simple use analysis is insufficient.

I don't think there is any way to have coherent semantics for lifetime
markers on allocas, while also permitting them on arbitrary pointer
values.

This PR restricts lifetimes to operate on allocas only. As a followup, I
will also drop the size argument, which is superfluous if we always
operate on an alloca. (This change also renders various code handling
lifetime markers on non-allocas dead. I plan to clean up that kind of
code after dropping the size argument as well.)

In practice, I've only found a few places that currently produce
lifetimes on non-allocas:

* CoroEarly replaces the promise alloca with the result of an intrinsic,
which will later be replaced back with an alloca. I think this is the
only place where there is some legitimate loss of functionality, but I
don't think this is particularly important (I don't think we'd expect
the promise in a coroutine to admit useful lifetime optimization.)
* SafeStack moves unsafe allocas onto a separate frame. We can safely
drop lifetimes here, as SafeStack performs its own stack coloring.
* Similar for AddressSanitizer, it also moves allocas into separate
memory.
* LSR sometimes replaces the lifetime argument with a GEP chain of the
alloca (where the offsets ultimately cancel out). This is just
unnecessary. (Fixed separately in
https://github.com/llvm/llvm-project/pull/149492.)
* InferAddrSpaces sometimes makes lifetimes operate on an addrspacecast
of an alloca. I don't think this is necessary.
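
A minimal IR sketch of the restriction (function names are illustrative):
the first function shows the intended alloca-only use; the second shows the
select case from the first bullet above, which is now rejected.

```llvm
declare void @llvm.lifetime.start.p0(i64, ptr)
declare void @llvm.lifetime.end.p0(i64, ptr)

define void @alloca_only() {
  %a = alloca [32 x i8]
  ; Fine: the markers operate directly on an alloca, so stack coloring
  ; can see the full lifetime.
  call void @llvm.lifetime.start.p0(i64 32, ptr %a)
  call void @llvm.lifetime.end.p0(i64 32, ptr %a)
  ret void
}

define void @ambiguous(i1 %c) {
  %a = alloca [32 x i8]
  %b = alloca [32 x i8]
  %p = select i1 %c, ptr %a, ptr %b
  ; Previously verifier-legal, now rejected: it is unknowable which
  ; alloca's lifetime starts and ends here.
  call void @llvm.lifetime.start.p0(i64 32, ptr %p)
  call void @llvm.lifetime.end.p0(i64 32, ptr %p)
  ret void
}
```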
2025-07-21 15:04:50 +02:00
Brad Smith
0d2e11f3e8
Remove Native Client support (#133661)
Remove the Native Client support now that it has finally reached end of life.
2025-07-15 13:22:33 -04:00
David Green
0967957d7a
[CostModel] Handle all cost kinds in getCmpSelInstrCost (#148233)
Currently we always produce a cost of 1 for all CostKinds that are not
RecipThroughput, which can underestimate the cost if the type has a
higher legalization cost (like larger vectors). This change extends the
handling to cover all cost kinds.
2025-07-15 18:08:52 +01:00
John Brawn
f8c2c4f161
[LSR] Account for hardware loop instructions (#147958)
A hardware loop instruction combines a subtract, compare with zero, and
branch. We currently account for the compare and branch being combined
into one in Cost::RateFormula, as part of more general handling for
compare-branch-zero, but don't account for the subtract, leading to
suboptimal decisions in some cases.

Fix this in Cost::RateRegister by noticing when we have such a subtract
and discounting the AddRecCost in such a case.
2025-07-14 16:48:54 +01:00
AZero13
0edc98cd6d
[ARM] Copy SMAX(lhs, 0) and SMIN(lhs, 0) patterns from AArch64 to ARM (#146565)
They work on ARM too.
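
A sketch of the patterns involved (illustrative; the exact selected sequence
depends on the subtarget):

```llvm
declare i32 @llvm.smax.i32(i32, i32)
declare i32 @llvm.smin.i32(i32, i32)

; smax(x, 0) can be selected as x & ~(x >>s 31) (BIC with an ASR-shifted
; operand), and smin(x, 0) as x & (x >>s 31), avoiding a compare and
; conditional move.
define i32 @smax_zero(i32 %x) {
  %r = call i32 @llvm.smax.i32(i32 %x, i32 0)
  ret i32 %r
}

define i32 @smin_zero(i32 %x) {
  %r = call i32 @llvm.smin.i32(i32 %x, i32 0)
  ret i32 %r
}
```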
2025-07-10 21:06:52 +01:00
Guy David
76274eb2b3
[PHIElimination] Revert #131837 #146320 #146337 (#146850)
Reverting because of mis-compiles:
- https://github.com/llvm/llvm-project/pull/131837
- https://github.com/llvm/llvm-project/pull/146320
- https://github.com/llvm/llvm-project/pull/146337
2025-07-03 07:48:08 -04:00
Guy David
f5c62ee0fa
[PHIElimination] Reuse existing COPY in predecessor basic block (#131837)
The insertion point of COPY isn't always optimal and could eventually
lead to a worse block layout, see the regression test in the first
commit.

This change affects many architectures, but the total number of
instructions in the test cases seems to be slightly lower.
2025-06-29 21:28:42 +03:00
AZero13
8bd6d36a44
[ARM] Override hasAndNotCompare (#145441)
bics is available on ARM.

USAT regressions are to be fixed after this because that is an issue
with ARMISelLowering and should be addressed in a separate PR.

Note that opt optimizes those testcases to min/max intrinsics anyway so
this should have no real effect on codegen.

Proof: https://alive2.llvm.org/ce/z/kPVQ3_
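
A sketch of the shape of IR this affects (illustrative): with
`hasAndNotCompare` returning true, DAGCombine prefers an and-not compared
against zero, which maps onto the flag-setting BICS.

```llvm
; (x & m) == m is rewritten to (~x & m) == 0, which a single
; BICS (bit-clear and set flags) can implement.
define i1 @all_mask_bits_set(i32 %x, i32 %m) {
  %and = and i32 %x, %m
  %cmp = icmp eq i32 %and, %m
  ret i1 %cmp
}
```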
2025-06-29 11:15:56 +01:00
David Green
825ad86aea
[DAG] Fold nested add(add(reduce(a), b), add(reduce(c), d)) (#115150)
This patch reassociates `add(add(vecreduce(a), b), add(vecreduce(c),
d))` into `add(vecreduce(add(a, c)), add(b, d))`, to combine the
reductions into a single node. This comes up after unrolling vectorized
loops.

There is another small change to move the reassociateReduction call for
fadd out of an AllowNewConst block, as new constants will not be created
and it should be OK to perform the combine later, after legalization.
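
In IR terms the reassociation looks roughly like this (a sketch; the actual
combine runs on SelectionDAG nodes):

```llvm
declare i32 @llvm.vector.reduce.add.v4i32(<4 x i32>)

; Before: two separate reductions.
define i32 @before(<4 x i32> %a, <4 x i32> %c, i32 %b, i32 %d) {
  %ra  = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> %a)
  %rc  = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> %c)
  %s1  = add i32 %ra, %b
  %s2  = add i32 %rc, %d
  %sum = add i32 %s1, %s2
  ret i32 %sum
}

; After: one reduction over the combined vector.
define i32 @after(<4 x i32> %a, <4 x i32> %c, i32 %b, i32 %d) {
  %ac  = add <4 x i32> %a, %c
  %r   = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> %ac)
  %bd  = add i32 %b, %d
  %sum = add i32 %r, %bd
  ret i32 %sum
}
```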
2025-06-24 13:08:59 +01:00
Bjorn Pettersson
dac0820b27 [Thumb2] Regenerate some test checks. NFC 2025-06-18 11:54:51 +02:00
Matt Arsenault
505c550e4c
DAG: Assert fcmp uno runtime calls are boolean values (#142898)
This saves 2 instructions in the ARM soft float case for fcmp ueq.

This code is written in a confusingly overly general way. The point
of getCmpLibcallCC is to express that the compiler-rt implementations
of the FP compares are different aliases around functions which may
return -1 in some cases. This does not apply to the call for unordered,
which returns a normal boolean.

Also stop overriding the default value for the unordered compare for ARM.
This was setting it to the same value as the default, which is now assumed.
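
A sketch of the lowering in question (the libcall names are the standard
compiler-rt ones; this is illustrative, not taken from the patch):

```llvm
; Under ARM soft float, fcmp ueq is lowered to two libcalls:
;   __eqsf2(a, b) == 0     ; ordered and equal
;   __unordsf2(a, b) != 0  ; either operand is NaN
; __unordsf2 returns a plain boolean, unlike the aliased ordered
; comparisons that may return -1, so its result needs no normalization.
define i1 @ueq(float %a, float %b) {
  %r = fcmp ueq float %a, %b
  ret i1 %r
}
```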
2025-06-10 10:46:29 +09:00
David Green
6baaa0afc3
[ARM] Handle roundeven for MVE. (#142557)
Now that #141786 handles scalar and neon types, this adds MVE
definitions and legalization for llvm.roundeven intrinsics. The existing
llvm.arm.mve.vrintn intrinsics are auto-upgraded to llvm.roundeven like
other vrint intrinsics, so they should continue to work.
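
A minimal example of the intrinsic now legalized for MVE vector types
(illustrative):

```llvm
declare <4 x float> @llvm.roundeven.v4f32(<4 x float>)

; With MVE, llvm.roundeven on v4f32/v8f16 can select the vector VRINTN
; instruction directly instead of being scalarized or expanded.
define <4 x float> @roundeven_v4f32(<4 x float> %x) {
  %r = call <4 x float> @llvm.roundeven.v4f32(<4 x float> %x)
  ret <4 x float> %r
}
```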
2025-06-08 18:23:50 +01:00
Folkert de Vries
46b389218b
[ARM]: codegen llvm.roundeven.v* (#141786)
fixes https://github.com/llvm/llvm-project/issues/73588

The aarch64 version of `frintn.ll` notes the intention to auto-upgrade
`frintn` to `roundeven`. I haven't been able to figure out how to make
that happen though (either for arm or aarch64).

The original issue came up in
https://github.com/rust-lang/stdarch/pull/1807
2025-05-30 19:36:29 +01:00
David Green
11953c647b
[ARM] Remove kill flags in ReplaceConstByVPNOTs. (#140082)
This is similar to #86300. The vpr register on this branch might be
killed before we reuse it.
2025-05-22 08:24:01 +01:00
Mohammad Bashir
bcdce987c0
Fix regression tests with bad FileCheck checks (#140373)
Fixes https://github.com/llvm/llvm-project/issues/140149
2025-05-22 07:59:57 +03:00
John Brawn
dd87127f4e
[DAGCombiner] Eliminate fp casts if we have the right fast math flags (#131345)
When floating-point operations are legalized to operations of a higher
precision (e.g. f16 fadd being legalized to f32 fadd) then we get
narrowing then widening operations between each operation. With the
appropriate fast math flags (nnan ninf contract) we can eliminate these
casts.
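
A sketch of the situation (illustrative): each f16 fadd below is legalized
by widening to f32, which inserts an fptrunc/fpext pair between the two
adds; with these flags the intermediate round-trip can be removed.

```llvm
define half @chain(half %a, half %b, half %c) {
  ; Legalized as: fpext -> fadd f32 -> fptrunc -> fpext -> fadd f32 -> fptrunc.
  ; With nnan ninf contract, the inner fptrunc/fpext pair is eliminated,
  ; keeping the intermediate value in f32 across both additions.
  %t = fadd nnan ninf contract half %a, %b
  %r = fadd nnan ninf contract half %t, %c
  ret half %r
}
```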
2025-04-28 11:21:51 +01:00
Sergei Barannikov
7af555e524
[ARM][RISCV] Partially revert #101786 (#137120)
The change as is breaks the Linux kernel build as pointed out in the
comments.
2025-04-24 10:13:05 +03:00
Sergei Barannikov
11a3de7e98
[SDag][ARM][RISCV] Allow lowering CTPOP into a libcall (#101786)
This is a reland of #99752 with the bug fixed (see test diff in the
third commit in this PR).
All `popcount` libcalls return `int`, but `ISD::CTPOP` returns the type
of the argument, which can be wider than `int`. The fix is to make the
DAG legalizer pass the correct return type to `makeLibCall` and sign-extend
the result afterwards.

Original commit message:
The main change is adding CTPOP to `RuntimeLibcalls.def` to allow
targets to use LibCall action for CTPOP. DAG legalizers are changed
accordingly.

Pull Request: https://github.com/llvm/llvm-project/pull/101786
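
A sketch of the type mismatch being fixed (the libcall name is the usual
compiler-rt one):

```llvm
declare i64 @llvm.ctpop.i64(i64)

; ISD::CTPOP here has type i64, but the libcall is
;   int __popcountdi2(int64_t);
; so the legalizer must request an i32 result from makeLibCall and then
; sign-extend it back to i64.
define i64 @popcount64(i64 %x) {
  %c = call i64 @llvm.ctpop.i64(i64 %x)
  ret i64 %c
}
```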
2025-04-23 12:43:05 +03:00
Pengxuan Zheng
36acaa0be5 Revert "[ARM][ConstantIslands] Correct MinNoSplitDisp calculation (#114590)"
This reverts commit e48916f615e0ad2b994b2b785d4fe1b8a98bc322.
2025-04-10 13:56:52 -07:00
David Green
6c27817294
[SelectionDAG] Use SimplifyDemandedBits from SimplifyDemandedVectorElts Bitcast. (#133717)
This adds a call to SimplifyDemandedBits from bitcasts with scalar input
types in SimplifyDemandedVectorElts, which can help simplify the input
scalar.
2025-04-03 11:14:08 +01:00
Fangrui Song
04a67528d3
[MC] Simplify MCBinaryExpr/MCUnaryExpr printing by reducing parentheses (#133674)
The existing pretty printer generates excessive parentheses for
MCBinaryExpr expressions. This update removes unnecessary parentheses
around MCBinaryExpr with +/- operators and around MCUnaryExpr.
Since relocatable expressions only use + and -, this change improves
readability in most cases.

Examples:

- (SymA - SymB) + C now prints as SymA - SymB + C.
  This updates the output of -fexperimental-relative-c++-abi-vtables for
  AArch64 and x86 to `.long _ZN1B3fooEv@PLT-_ZTV1B-8`
- expr + (MCTargetExpr) now prints as expr + MCTargetExpr, with this
  change primarily affecting AMDGPUMCExpr.
2025-03-30 22:03:14 -07:00
Rifet-c
4a0212b9fc
[CodeGen] Combine two loops in SlotIndexes.cpp (#127631)
Merged two loops that were iterating over the same machine basic block
into one, and made some minor readability improvements (variable
renaming, commenting, and absorbing an if condition into a variable).
2025-03-07 08:17:23 +07:00
Daniel Paoliello
16e051f0b9
[win] NFC: Rename EHCatchret to EHCont to allow for EH Continuation targets that aren't catchret instructions (#129953)
This change splits out the renaming and comment updates from #129612 as a non-functional change.
2025-03-06 09:28:44 -08:00
Matt Arsenault
c53eb93dd7
PeepholeOpt: Immediately check if a reg_sequence compose supports a subregister (#128279)
This is a quick fix for EXPENSIVE_CHECKS bot failures. I still think we
could defer looking for a compatible subregister further up the use-def
chain, and should be able to check compatibility with the ultimately
found source.
2025-02-26 10:15:17 +07:00
Matt Arsenault
1bb43068f1
PeepholeOpt: Allow introducing subregister uses on reg_sequence (#127052)
This reverts d246cc618adc52fdbd69d44a2a375c8af97b6106. We now handle
composing subregister extracts through reg_sequence.
2025-02-22 09:16:14 +07:00
Oliver Stannard
308ce8d524
[ARM] Fix calling convention for __fp16 with big-endian (#126741)
AAPCS32 defines the fp16 and bf16 types as being passed as if they were
extended to 32 bits, with the high 16 bits being unspecified. The
extension is specified as happening as-if it was done in a register,
which means that for big endian targets, the actual value gets passed in
the higher addressed half of the stack slot, instead of the lower
addressed half as for little endian. Previously, for targets with the
fp16 extension, we were passing these types as a 16-bit stack slot,
which worked for little endian because every later stack slot would be
4-byte aligned, leaving a 2-byte gap, but was incorrect for big endian.
2025-02-13 09:09:29 +00:00
Matt Arsenault
58a88001f3
PeepholeOpt: Fix looking for def of current copy to coalesce (#125533)
This fixes the handling of subregister extract copies. This
will allow AMDGPU to remove its implementation of
shouldRewriteCopySrc, which exists as a 10 year old workaround
to this bug. peephole-opt-fold-reg-sequence-subreg.mir will
show the expected improvement once the custom implementation
is removed.

The copy coalescing processing here is overly abstracted
from what's actually happening. Previously when visiting
coalescable copy-like instructions, we would parse the
sources one at a time and then pass the def of the root
instruction into findNextSource. This means that the
first thing the new ValueTracker constructed would do
is getVRegDef to find the instruction we are currently
processing. This added an unnecessary step, placed
a useless entry in the RewriteMap, and required skipping
the no-op case where getNewSource would return the original
source operand. This was a problem since in the case
of a subregister extract, shouldRewriteCopySource would always
say that it is useful to rewrite and the use-def chain walk
would abort, returning the original operand. Move the process to
start looking at the source operand instead.

This does not fix the confused handling in the uncoalescable
copy case which is proving to be more difficult. Some currently
handled cases have multiple defs from a single source, and other
handled cases have 0 input operands. It would be simpler if
this was implemented with isCopyLikeInstr, rather than guessing
at the operand structure as it does now.

There are some improvements and some regressions. The
regressions appear to be downstream issues for the most part. One
of the uglier regressions is in PPC, where a sequence of insert_subregs
is used to build registers. I opened #125502 to use reg_sequence instead,
which may help.

The worst regression is an absurd SPARC testcase using a <251 x fp128>,
which uses a very long chain of insert_subregs.

We need improved subregister handling locally in PeepholeOptimizer,
and in other passes like MachineCSE, to fix some of the other regressions.
We should handle subregister composes and folding more indexes
into insert_subreg and reg_sequence.
2025-02-05 23:29:02 +07:00
Albert Huang
409fa785d9
[ARM] Add "avoidmuls" to STAR-MC1 also (#123706)
See PR #112540 for reference.
2025-02-05 07:44:53 +00:00
Sergei Barannikov
ff9c041d96
[MachineScheduler] Fix physreg dependencies of ExitSU (#123541)
Providing the correct operand index allows addPhysRegDataDeps to compute
the correct latency.

Pull Request: https://github.com/llvm/llvm-project/pull/123541
2025-02-01 20:40:50 +03:00
Nikita Popov
eeac0ffaf4 Revert "[MachineLICM] Use RegisterClassInfo::getRegPressureSetLimit (#119826)"
This reverts commit b4e17d4a314ed87ff6b40b4b05397d4b25b6636a.

This causes a large compile-time regression.
2025-01-10 09:05:06 +01:00
Pengcheng Wang
b4e17d4a31
[MachineLICM] Use RegisterClassInfo::getRegPressureSetLimit (#119826)
`RegisterClassInfo::getRegPressureSetLimit` is a wrapper of
`TargetRegisterInfo::getRegPressureSetLimit` with some logic to
adjust the limit by removing reserved registers.

It seems that we shouldn't use
`TargetRegisterInfo::getRegPressureSetLimit`
directly, just as the comment "This limit must be adjusted
dynamically for reserved registers" says.

Separate from https://github.com/llvm/llvm-project/pull/118787
2025-01-09 21:05:52 +08:00
David Green
f87a9db832 [ARM] Expand fp64 bf16 converts similarly to f32
This helps with +fp64 targets, where f64 is legal and these conversions were
not previously lowered. An fpext can be treated as a shift + cvt, and an
fptrunc can use a libcall.
2025-01-03 11:28:31 +00:00
David Green
ccbbacf0fa [ARM] Fix MVE incrementing gather offset calculation
The code was checking the gep ptr type as opposed to the gep source element
type in calculating the offset scale.

Fixes #120993
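
A sketch of the distinction (illustrative): the byte scale of the offsets
comes from the GEP's source element type, not from the pointer operand's
type.

```llvm
declare <4 x i32> @llvm.masked.gather.v4i32.v4p0(<4 x ptr>, i32, <4 x i1>, <4 x i32>)

define <4 x i32> @gather(ptr %base, <4 x i32> %offsets) {
  ; The offsets are scaled by the GEP source element type (i32, so a
  ; scale of 4 bytes). The bug was reading the pointer operand's type
  ; instead, which no longer carries an element type.
  %ptrs = getelementptr i32, ptr %base, <4 x i32> %offsets
  %g = call <4 x i32> @llvm.masked.gather.v4i32.v4p0(<4 x ptr> %ptrs, i32 4,
          <4 x i1> <i1 true, i1 true, i1 true, i1 true>, <4 x i32> poison)
  ret <4 x i32> %g
}
```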
2024-12-24 08:48:01 +00:00
David Green
4472648998 [ARM] Expand bf16 expanding/rounding fp loads/stores
As with other fp types, these should be expanded to prevent nodes that are
illegal for Arm.
2024-12-20 09:03:28 +00:00
David Green
e020f46027 [ARM] Fix BF16 lowering with FullFP16
This adds test coverage for bf16 instructions, making sure that lowering bf16
works with and without +fullfp16.
2024-12-19 10:20:35 +00:00
Fangrui Song
133352feb3 [test] Remove redundant -march= when target triple is specified in IR 2024-12-15 12:42:17 -08:00
pzhengqc
e48916f615
[ARM][ConstantIslands] Correct MinNoSplitDisp calculation (#114590)
MinNoSplitDisp was first introduced in D16890 to handle cases where the
ConstantIslands pass fails to converge in the presence of big basic
blocks. However, the computation of the variable seems to be wrong as it
currently computes the offset immediately following UserBB. In other
words, it represents the distance from the beginning of the function to
the end of UserBB. The distance from the beginning of the function does
not seem to be a good indicator of how big the basic block is unless the
basic block is close to the beginning of the function. I think
MinNoSplitDisp should compute the distance between UserOffset and the
end of UserBB instead.
2024-12-14 10:14:50 -08:00
Sergei Barannikov
e0ed0333f0
Reland "[ARM] Stop gluing ALU nodes to branches / selects" (#118887)
Re-landing #116970 after fixing miscompilation error.

The original change made it possible for CMPZ to have multiple uses;
`ARMDAGToDAGISel::SelectCMPZ` was not prepared for this.

Pull Request: https://github.com/llvm/llvm-project/pull/118887


Original commit message:

Following #116547 and #116676, this PR changes the type of results and
operands of some nodes to accept / return a normal type instead of Glue.

Unfortunately, changing the result type of one node requires changing
the operand types of all potential consumer nodes, which in turn
requires changing the result types of all other possible producer nodes.
So this is a bulk change.
2024-12-07 10:14:36 +03:00
Oliver Stannard
f893b47500
[ARM] Fix instruction selection for MVE vsbciq intrinsic (#118284)
There were two bugs in the implementation of the MVE vsbciq (subtract
with carry across vector, with initial carry value) intrinsics:
* The VSBCI instruction behaves as if the carry-in is always set, but we
were selecting it when the carry-in is clear.
* The vsbciq intrinsics should generate IR with the carry-in set, but
they were leaving it clear.

These two bugs almost cancelled each other out, but resulted in
incorrect code when the vsbcq intrinsics (with a carry-in) were used
and the carry-in was a compile-time constant.
2024-12-06 08:46:56 +00:00
Martin Storsjö
2a5e1da57a
Revert "[ARM] Stop gluing ALU nodes to branches / selects" (#118232)
Reverts llvm/llvm-project#116970.

This change broke Wine compiled for armv7, causing segfaults when
starting Wine. See llvm/llvm-project#116970 for more detailed discussion
about the issue.
2024-12-02 00:02:25 +02:00
Sergei Barannikov
a348f223ca
[ARM] Stop gluing ALU nodes to branches / selects (#116970)
Following #116547 and #116676, this PR changes the type of results and
operands of some nodes to accept / return a normal type instead of Glue.

Unfortunately, changing the result type of one node requires changing
the operand types of all potential consumer nodes, which in turn
requires changing the result types of all other possible producer nodes.
So this is a bulk change.

Pull Request: https://github.com/llvm/llvm-project/pull/116970
2024-11-30 08:14:24 +03:00
Sergei Barannikov
ad9dcd96dc
Reland "[ARM] Stop gluing FP comparisons to FMSTAT" (#117248)
Following #116547, this changes the result of `ARMISD::CMPFP*` and the
operand of `ARMISD::FMSTAT` from a special `Glue` type to a normal type.

This change allows comparisons to be CSEd and scheduled around as can be
seen in the test changes.

Note that `ARMISD::FMSTAT` is still glued to its consumer nodes; this is
going to be changed in a separate patch.

This patch also sets `CopyCost` of `cl_FPSCR_NZCV` register class to a
negative value. The reason is the same as for CCR register class: it
makes DAG scheduler and InstrEmitter try to avoid copies of `FPCSR_NZCV`
register to / from virtual registers. Previously, this was not
necessary, since no attempt was made to create copies in the first
place.

`TRI::getCrossCopyRegClass` is modified in a way that prevents DAG
scheduler from copying FPSCR into a virtual register. The register
allocator might need to spill the virtual register, but that only seems
to work in Thumb mode.
2024-11-22 22:29:58 +03:00
Sergei Barannikov
5d32a1409d
Revert "[ARM] Stop gluing FP comparisons to FMSTAT" (#117175)
Reverts llvm/llvm-project#116676

Reverting per post-commit feedback (causes miscompilation errors and/or
assertion failures).
2024-11-21 18:26:53 +03:00
Sergei Barannikov
8c56dd3040
[ARM] Stop gluing FP comparisons to FMSTAT (#116676)
Following #116547, this changes the result of `ARMISD::CMPFP*` and the
operand of `ARMISD::FMSTAT` from a special `Glue` type to a normal type.

This change allows comparisons to be CSEd and scheduled around as can be
seen in the test changes.

Note that `ARMISD::FMSTAT` is still glued to its consumer nodes; this is
going to be changed in a separate patch.

This patch also sets `CopyCost` of `cl_FPSCR_NZCV` register class to a
negative value. The reason is the same as for CCR register class: it
makes DAG scheduler and InstrEmitter try to avoid copies of `FPCSR_NZCV`
register to / from virtual registers. Previously, this was not
necessary, since no attempt was made to create copies in the first
place.

There might be a case when a copy can't be avoided (although not found
in existing tests). If a copy is necessary, the virtual register will be
created with `cl_FPSCR_NZCV` register class. If this register class is
inappropriate, `TRI::getCrossCopyRegClass` should be modified to return
the correct class.

Pull Request: https://github.com/llvm/llvm-project/pull/116676
2024-11-20 16:07:05 +03:00
Oliver Stannard
aba55809e9
[ARM] Fix operand order for MVE predicated VFMAS (#115908)
For most MVE predicated FMA instructions, disabled lanes will contain
the value in the addend operand. However, the VFMAS instruction takes
the addend in a GPR, and the output register is shared with the first
multiply operand, so disabled lanes will get that value instead. This
means that we can't use the same intrinsic as for the other VFMA
instructions. Instead, we can codegen the vfmas intrinsic to a regular
FMA and select in clang, which the backend already has the patterns to
select VFMAS from.
2024-11-13 12:16:28 +00:00
David Green
750247bc1c [ARM] Fix APInt assert for CSNEG known bits.
Use APInt::getAllOnes(32) to avoid the assert when constructing an APInt
from -1.
2024-11-11 19:39:02 +00:00
Oliver Stannard
9f02950a15
[ARM] Allow spilling FPSCR for MVE adc/sbc intrinsics (#115174)
The MVE VADC and VSBC instructions read and write a carry bit in FPSCR,
which is exposed through the intrinsics. This makes it possible to write
code which has the FPSCR live across a function call, or which uses the
same value twice, so it needs to be possible to spill and reload it.

There is a missed optimisation in one of the test cases, where we reload
the FPSCR from the stack despite it still being live; I've not found a
simple way to prevent the register allocator from doing this.
2024-11-07 11:23:49 +00:00