2351 Commits

Author SHA1 Message Date
Matt Arsenault
4455d9d3e3
RuntimeLibcalls: Use iterable enum for libcall types (#143075) 2025-06-06 20:58:18 +09:00
Matt Arsenault
18e94550e1
ARM: Remove unused CondCode field from libcall table (#142616) 2025-06-03 23:02:01 +02:00
David Green
7a688c080f [ARM] Add vector vrint tests and fix FP16 to expand. 2025-05-31 12:21:46 +01:00
Folkert de Vries
46b389218b
[ARM]: codegen llvm.roundeven.v* (#141786)
fixes https://github.com/llvm/llvm-project/issues/73588

The aarch64 version of `frintn.ll` notes the intention to auto-upgrade
`frintn` to `roundeven`. I haven't been able to figure out how to make
that happen though (either for arm or aarch64).

The original issue came up in
https://github.com/rust-lang/stdarch/pull/1807
2025-05-30 19:36:29 +01:00
Marius Kamp
10647685ca
[SDAG] Make Select-with-Identity-Fold More Flexible; NFC (#136554)
This change adds new parameters to the method
`shouldFoldSelectWithIdentityConstant()`. The method now takes the
opcode of the select node and the non-identity operand of the select
node. To gain access to the appropriate arguments, the call of
`shouldFoldSelectWithIdentityConstant()` is moved after all other checks
have been performed. Moreover, this change adjusts the precondition of
the fold so that it would work for `SELECT` nodes in addition to
`VSELECT` nodes.
    
No functional change is intended because all implementations of
`shouldFoldSelectWithIdentityConstant()` are adjusted such that they
restrict the fold to a `VSELECT` node; the same restriction as before.
    
The rationale of this change is to make more fine grained decisions
possible when to revert the InstCombine canonicalization of
`(select c (binop x y) y)` to `(binop (select c x idc) y)` in the
backends.
2025-05-29 09:46:39 +02:00
Justin Bogner
b7bb256703
Warn on misuse of DiagnosticInfo classes that hold Twines (#137397)
This annotates the `Twine` passed to the constructors of the various
DiagnosticInfo subclasses with `[[clang::lifetimebound]]`, which causes
us to warn when we would try to print the twine after it had already
been destructed.

We also update `DiagnosticInfoUnsupported` to hold a `const Twine &`
like all of the other DiagnosticInfo classes, since this warning allows
us to clean up all of the places where it was being used incorrectly.
2025-05-28 12:26:39 -07:00
Rahul Joshi
52c2e45c11
[NFC][CodeGen] Adopt MachineFunctionProperties convenience accessors (#141101) 2025-05-23 08:30:29 -07:00
Craig Topper
dcd62f3674
[SelectionDAG] Rename MemSDNode::getOriginalAlign to getBaseAlign. NFC (#139930)
This matches the underlying function in MachineMemOperand and how it is
printed when BaseAlign differs from Align.
2025-05-16 09:37:02 -07:00
Kazu Hirata
18ecff4f65
[llvm] Use llvm::stable_sort (NFC) (#140067) 2025-05-15 12:18:18 -07:00
Kazu Hirata
58014a506d
[IR] Teach getConstraintString to return StringRef (NFC) (#139401)
With this change, some callers get to use StringRef::starts_with.

I'm planning to teach getAsmString to return StringRef also, but
I'ld like to keep that separate from this patch.
2025-05-10 13:00:47 -07:00
Kazu Hirata
aa15596b5f
[llvm] Remove unused local variables (NFC) (#138478) 2025-05-04 21:33:54 -07:00
Sergei Barannikov
7af555e524
[ARM][RISCV] Partially revert #101786 (#137120)
The change as is breaks the Linux kernel build as pointed out in the
comments.
2025-04-24 10:13:05 +03:00
Sergei Barannikov
11a3de7e98
[SDag][ARM][RISCV] Allow lowering CTPOP into a libcall (#101786)
This is a reland of #99752 with the bug fixed (see test diff in the
third commit in this PR).
All `popcount` libcalls return `int`, but `ISD::CTPOP` returns the type
of the argument, which can be wider than `int`. The fix is to make DAG
legalizer pass the correct return type to `makeLibCall` and sign-extend
the result afterwards.

Original commit message:
The main change is adding CTPOP to `RuntimeLibcalls.def` to allow
targets to use LibCall action for CTPOP. DAG legalizers are changed
accordingly.

Pull Request: https://github.com/llvm/llvm-project/pull/101786
2025-04-23 12:43:05 +03:00
Rahul Joshi
74b7abf154
[IRBuilder] Add new overload for CreateIntrinsic (#131942)
Add a new `CreateIntrinsic` overload with no `Types`, useful for
creating calls to non-overloaded intrinsics that don't need additional
mangling.
2025-03-31 08:10:34 -07:00
David Green
2e54b4f9ea [ARM] Silence signed comparison warning. NFC
After f4ec179bf5295f92aa0346392a58fad54f9b458e, AbsImm is no longer signed and cannot be < 0.
2025-03-31 13:59:58 +01:00
Kazu Hirata
fac8fe9cf9
[Target] Use *Set::insert_range (NFC) (#132879)
We can use *Set::insert_range to collapse:

  for (auto Elem : Range)
    Set.insert(E);

down to:

  Set.insert_range(Range);

In some cases, we can further fold that into the set declaration.
2025-03-24 22:42:04 -07:00
Kazu Hirata
1b189cab5e
[llvm] Use *Set::insert_range (NFC) (#132509)
DenseSet, SmallPtrSet, SmallSet, SetVector, and StringSet recently
gained C++23-style insert_range.  This patch uses insert_range in
conjunction with llvm::{predecessors,successors} and
MachineBasicBlock::{predecessors,successors}.
2025-03-22 08:07:33 -07:00
Simon Tatham
f4ec179bf5
[ARM] Fix undefined behavior in isLegalAddImmediate (#132219)
Building clang under UBsan, it reported an integer overflow in this
function when the input value was -2^63, because it's UB to pass the
maximum negative value of an integer type to `std::abs`.

Fixed by adding a new absolute-value function in `MathExtras.h` whose
return type is the unsigned version of the argument type, and using that
instead.

(This seems like the kind of thing C++ should have already had, but
apparently it doesn't, and I couldn't find an existing one in LLVM
Support either.)
2025-03-21 09:03:50 +00:00
Craig Topper
d7879e524f
[ARM] Use DenseSet instead of DenseMap. NFC (#131978)
The value in the map is set to "true" when something is added to the
map.

Techncally this:

  if (!DefRegs.contains(Reg))

will set the value in the map to false if it didn't already exist, and
this is used to indicate the value wasn't in the map. This only occurs
after all the "true" values have already been added to the map.
2025-03-19 08:32:09 -07:00
David Green
86cf4ed7e9
[ARM] Speedups for CombineBaseUpdate. (#129725)
This attempts to put limits onto CombineBaseUpdate for degenerate cases
like #127477. The biggest change is to add a limit to the number of base
updates to check in CombineBaseUpdate. 64 is hopefully plenty high
enough for most runtime unrolled loops to generate postinc where they
are beneficial.

It also moves the check for isValidBaseUpdate later so that it only
happens if we will generate a valid instruction. The 1024 limit to
hasPredecessorHelper comes from the X86 backend, which uses the same
limit.

I haven't added a test case as it would need to be very big and my
attempts at generating a smaller version did not show anything useful.

Fixes #127477.
2025-03-06 09:35:12 +00:00
Kazu Hirata
f842a00b92
[ARM] Avoid repeated hash lookups (NFC) (#128994) 2025-02-27 01:44:58 -08:00
Nikita Popov
cc539138ac
[CodeGen] Use __extendhfsf2 and __truncsfhf2 by default (#126880)
The standard libcalls for half to float and float to half conversion are
__extendhfsf2 and __truncsfhf2. However, LLVM currently uses
__gnu_h2f_ieee and __gnu_f2h_ieee instead. As far as I can tell, these
libcalls are an ARM-ism and only provided by libgcc on that platform.
compiler-rt always provides both libcalls.

Use the standard libcalls by default, and only use the __gnu libcalls on
ARM.
2025-02-19 10:16:57 +01:00
Nikita Popov
03cb46d248
[CodeGen] Use getSignedConstant() in more places (#127501)
Use getSignedConstant() in a few more places, based on a search of
`\bgetConstant(-`. Most of these were fine as-is (e.g. because they work
on 64-bits), but I think it's better to use getSignedConstant()
consistently for negative numbers.
2025-02-18 09:29:25 +01:00
Oliver Stannard
308ce8d524
[ARM] Fix calling convention for __fp16 with big-endian (#126741)
AAPCS32 defines the fp16 and bf16 types as being passed as if they were
extended to 32 bits, with the high 16 bits being unspecified. The
extension is specified as happening as-if it was done in a register,
which means that for big endian targets, the actual value gets passed in
the higher addressed half of the stack slot, instead of the lower
addressed half as for little endian. Previously, for targets with the
fp16 extension, we were passing these types as a 16 bit stack slot,
which worked for little endian because every later stack slot would be
4-byte aligned leaving the 2 byte gap, but was incorrect for big endian.
2025-02-13 09:09:29 +00:00
Philip Reames
e4016bf5c3 [DAG] Use ArrayRef to simplify ShuffleVectorSDNode::isSplatMask 2025-02-11 12:47:10 -08:00
yingopq
754ed95b66
[Mips] Fix compiler crash when returning fp128 after calling a functi… (#117525)
…on returning { i8, i128 }

Fixes https://github.com/llvm/llvm-project/issues/96432.
2025-01-20 16:47:40 +08:00
David Green
f87a9db832 [ARM] Expand fp64 bf16 converts similarly to f32
This helps with +fp64 targets where the f64s are legal and not previously
lowered. It can treat fpextends as a shift + cvt and fptrunc can use a libcall.
2025-01-03 11:28:31 +00:00
David Green
4472648998 [ARM] Expand bf16 expanding/rounding fp loads/stores
As with other fp types, these should be expanded to prevent nodes that are
illegal for Arm.
2024-12-20 09:03:28 +00:00
Craig Topper
f139bde8d8
[SelectionDAG] Move SDNode::use_iterator::getOperandNo to SDUse. (#120536)
This allows us to write more range based for loops because we no
longer need the iterator. It also matches IR's Use class.
2024-12-19 09:07:42 -08:00
Craig Topper
e6b2495545
[SelectionDAG] Split SDNode::use_iterator into user_iterator and use_iterator. (#120531)
SDNode::use_iterator now returns an SDUse& when dereferenced.
SDNode::user_iterator returns SDNode*. SDNode::use_begin/use_end/uses
work on use_iterator. SDNode::user_begin/user_end/users work on
user_iterator.

We can now write range based for loops using SDUse& and SDNode::uses().
I've converted many of these in this patch. I didn't update loops that
have additional variables updated in their for statement.

Some loops use SDNode::use_iterator::getOperandNo() which also prevents
using range based for loops. I plan to move this into SDUse in a follow
up patch.
2024-12-19 08:35:32 -08:00
David Green
e020f46027 [ARM] Fix BF16 lowering with FullFP16
This adds test coverage for bf16 instructions, making sure that lowering bf16
works with and without +fullfp16.
2024-12-19 10:20:35 +00:00
Craig Topper
bd261ecc5a
[SelectionDAG] Add SDNode::user_begin() and use it in some places (#120509)
Most of these are just places that want the first user and aren't
iterating over the whole list.

While there I changed some use_size() == 1 to hasOneUse() which
is more efficient.

This is part of an effort to rename use_iterator to user_iterator
and provide a use_iterator that dereferences to SDUse&. This patch
helps reduce the diff on later patches.
2024-12-18 22:13:04 -08:00
Craig Topper
104ad9258a
[SelectionDAG] Rename SDNode::uses() to users(). (#120499)
This function is most often used in range based loops or algorithms
where the iterator is implicitly dereferenced. The dereference returns
an SDNode * of the user rather than SDUse * so users() is a better name.

I've long beeen annoyed that we can't write a range based loop over
SDUse when we need getOperandNo. I plan to rename use_iterator to
user_iterator and add a use_iterator that returns SDUse& on dereference.
This will make it more like IR.
2024-12-18 20:09:33 -08:00
David Green
a35db2880a [NFC] Remove some unnecessary semicolons
All inside LLVM_DEBUG, some of which have been cleaned up by adding block
scopes to allow them to format more nicely.
2024-12-16 08:48:57 +00:00
LiqinWeng
3083acc215
[DAGCombine] Remove oneuse restrictions for RISCV in folding (shl (add_nsw x, c1)), c2) and folding (shl(sext(add x, c1)), c2) in some scenarios (#101294)
This patch remove the restriction for folding (shl (add_nsw x, c1)), c2)
and folding (shl(sext(add x, c1)), c2), and test case from dhrystone ,
see this link:
riscv32: https://godbolt.org/z/o8GdMKrae
riscv64: https://godbolt.org/z/Yh5bPz56z
2024-12-10 11:17:54 +08:00
Sergei Barannikov
e0ed0333f0
Reland "[ARM] Stop gluing ALU nodes to branches / selects" (#118887)
Re-landing #116970 after fixing miscompilation error.

The original change made it possible for CMPZ to have multiple uses;
`ARMDAGToDAGISel::SelectCMPZ` was not prepared for this.

Pull Request: https://github.com/llvm/llvm-project/pull/118887


Original commit message:

Following #116547 and #116676, this PR changes the type of results and
operands of some nodes to accept / return a normal type instead of Glue.

Unfortunately, changing the result type of one node requires changing
the operand types of all potential consumer nodes, which in turn
requires changing the result types of all other possible producer nodes.
So this is a bulk change.
2024-12-07 10:14:36 +03:00
Martin Storsjö
2a5e1da57a
Revert "[ARM] Stop gluing ALU nodes to branches / selects" (#118232)
Reverts llvm/llvm-project#116970.

This change broke Wine compiled for armv7, causing segfaults when
starting Wine. See llvm/llvm-project#116970 for more detailed discussion
about the issue.
2024-12-02 00:02:25 +02:00
Sergei Barannikov
a348f223ca
[ARM] Stop gluing ALU nodes to branches / selects (#116970)
Following #116547 and #116676, this PR changes the type of results and
operands of some nodes to accept / return a normal type instead of Glue.

Unfortunately, changing the result type of one node requires changing
the operand types of all potential consumer nodes, which in turn
requires changing the result types of all other possible producer nodes.
So this is a bulk change.

Pull Request: https://github.com/llvm/llvm-project/pull/116970
2024-11-30 08:14:24 +03:00
Sergei Barannikov
ad9dcd96dc
Reland "[ARM] Stop gluing FP comparisons to FMSTAT" (#117248)
Following #116547, this changes the result of `ARMISD::CMPFP*` and the
operand of `ARMISD::FMSTAT` from a special `Glue` type to a normal type.

This change allows comparisons to be CSEd and scheduled around as can be
seen in the test changes.

Note that `ARMISD::FMSTAT` is still glued to its consumer nodes; this is
going to be changed in a separate patch.

This patch also sets `CopyCost` of `cl_FPSCR_NZCV` register class to a
negative value. The reason is the same as for CCR register class: it
makes DAG scheduler and InstrEmitter try to avoid copies of `FPCSR_NZCV`
register to / from virtual registers. Previously, this was not
necessary, since no attempt was made to create copies in the first
place.

`TRI::getCrossCopyRegClass` is modified in a way that prevents DAG
scheduler from copying FPSCR into a virtual register. The register
allocator might need to spill the virtual register, but that only seem
to work in Thumb mode.
2024-11-22 22:29:58 +03:00
Sergei Barannikov
5d32a1409d
Revert "[ARM] Stop gluing FP comparisons to FMSTAT" (#117175)
Reverts llvm/llvm-project#116676

Reverting per post-commit feedback (causes miscompilation errors and/or
assertion failures).
2024-11-21 18:26:53 +03:00
Sergei Barannikov
8c56dd3040
[ARM] Stop gluing FP comparisons to FMSTAT (#116676)
Following #116547, this changes the result of `ARMISD::CMPFP*` and the
operand of `ARMISD::FMSTAT` from a special `Glue` type to a normal type.

This change allows comparisons to be CSEd and scheduled around as can be
seen in the test changes.

Note that `ARMISD::FMSTAT` is still glued to its consumer nodes; this is
going to be changed in a separate patch.

This patch also sets `CopyCost` of `cl_FPSCR_NZCV` register class to a
negative value. The reason is the same as for CCR register class: it
makes DAG scheduler and InstrEmitter try to avoid copies of `FPCSR_NZCV`
register to / from virtual registers. Previously, this was not
necessary, since no attempt was made to create copies in the first
place.

There might be a case when a copy can't be avoided (although not found
in existing tests). If a copy is necessary, the virtual register will be
created with `cl_FPSCR_NZCV` register class. If this register class is
inappropriate, `TRI::getCrossCopyRegClass` should be modified to return
the correct class.

Pull Request: https://github.com/llvm/llvm-project/pull/116676
2024-11-20 16:07:05 +03:00
Sergei Barannikov
aff98e4be0
[ARM] Stop gluing 1-bit shifts (#116547)
1. When two (or more) nodes are glued, DAG scheduler will always
schedule them as one piece, i.e. it will not allow any instructions to
be scheduled between them. It does so because if nodes are glued this
usually means that there is an implicit register dependency between
them, and an intervening node could clobber this physical register. When
emitting such nodes into machine IR, they will also be stuck together,
e.g.:
```
    %9:gpr = MOVsrl_glue killed %8, implicit-def $cpsr
    %10:gpr = RRX %3, implicit $cpsr
```

2. If a node has Glue result, SelectionDAG will not try to CSE this
node. If it did, it would break the implicit physical register
dependency. In practice this means that if a node with Glue result has
multiple uses, it has to be duplicated before each use. This the reason
for `ARMTargetLowering::duplicateCmp` to exist.

When using normal data dependency, dependent nodes can freely be
scheduled around. If there is a physical register dependency between
nodes, the physical register will be copied to/from a virtual register,
allowing other nodes to intervene between them. The resulting machine IR
might look like this:
```
    %9:gpr = LSRs1 killed %8, implicit-def $cpsr
    %10:gpr = COPY $cpsr
    %11:gpr = ORRrsi killed %9, %3, 242, 14 /* CC::al */, $noreg, $noreg
    %12:gpr = BICri killed %11, -2147483648, 14 /* CC::al */, $noreg, $noreg
    $cpsr = COPY %10
    %13:gpr = RRX %3, implicit $cpsr
```

The two copies are likely to be eliminated by register coalescer, given
that there are no instructions between them that clobber this physical
register. If the copies are unwanted in the first place (they could be
expensive or impossible), DAG scheduler will try to avoid inserting them
wherever possible, and the resulting machine IR will look like this:
```
    %9:gpr = LSRs1 killed %8, implicit-def $cpsr
    %10:gpr = ORRrsi killed %9, %3, 242, 14 /* CC::al */, $noreg, $noreg
    %11:gpr = BICri killed %10, -2147483648, 14 /* CC::al */, $noreg, $noreg
    %12:gpr = RRX %3, implicit $cpsr
```

On ARM, arithmetic operations and LSLS already use the new data flow
approach. This patch extends it to include 1-bit shifts.

Pull Request: https://github.com/llvm/llvm-project/pull/116547
2024-11-19 17:46:48 +03:00
Craig Topper
cde4ae789e [ARM] Use getSignedTargetConstant. NFC 2024-11-18 13:12:23 -08:00
Sergei Barannikov
baf59be89b
[SelectionDAG] Fix return types of TC_RETURN for several targets (#116504)
TC_RETURN nodes do not have a glue result.
2024-11-17 02:14:05 +03:00
Kazu Hirata
9571cc2b28
[ARM] Remove unused includes (NFC) (#115995)
Identified with misc-include-cleaner.
2024-11-12 23:15:21 -08:00
David Green
750247bc1c [ARM] Fix APInt assert for CSNEG known bits.
Use APInt::getAllOnes(32), to avoid the assert when constructing an APInt
from -1.
2024-11-11 19:39:02 +00:00
dnsampaio
28d0718033
[DAGCombiner] Add combine avg from shifts (#113909)
This teaches dagcombiner to fold:
`(asr (add nsw x, y), 1) -> (avgfloors x, y)`
`(lsr (add nuw x, y), 1) -> (avgflooru x, y)`

as well the combine them to a ceil variant:
`(avgfloors (add nsw x, y), 1) -> (avgceils x, y)` 
`(avgflooru (add nuw x, y), 1) -> (avgceilu x, y)`

iff valid for the target.

Removes some of the ARM MVE patterns that are now dead code.
It adds the avg opcodes to `IsQRMVEInstruction` as to preserve the
immediate splatting as before.
2024-10-31 10:57:27 +01:00
Craig Topper
50896e7ef5 [ARM] Use getSignedConstant. NFC 2024-10-30 21:43:16 -07:00
Oliver Stannard
376d7b27fa [ARM] Optimise byval arguments in tail-calls
We don't need to copy byval arguments to tail calls via a temporary, if
we can prove that we are not copying from the outgoing argument area.
This patch does this when the source if the argument is one of:
* Memory in the local stack frame, which can't be used for tail-call
  arguments.
* A global variable.

We can also avoid doing the copy completely if the source and
destination are the same memory location, which is the case when the
caller and callee have the same signature, and pass some arguments
through unmodified.
2024-10-25 09:34:09 +01:00
Oliver Stannard
914a3990d1 [ARM] Avoid clobbering byval arguments when passing to tail-calls
When passing byval arguments to tail-calls, we need to store them into
the stack memory in which this the caller received it's arguments. If
any of the outgoing arguments are forwarded from incoming byval
arguments, then the source of the copy is from the same stack memory.
This can result in the copy corrupting a value which is still to be
read.

The fix is to first make a copy of the outgoing byval arguments in local
stack space, and then copy them to their final location. This fixes the
correctness issue, but results in extra copying, which could be
optimised.
2024-10-25 09:34:09 +01:00