2417 Commits

Author SHA1 Message Date
AZero13
733c1aded1
[ARM] Replace ABS and tABS machine nodes with custom lowering (#156717)
Just do a custom lowering instead.

Also copy paste the cmov-neg fold to prevent regressions in nabs.
2025-09-19 19:43:36 +01:00
Nikita Popov
1723f80b08
[ARM] Allow s constraints on half (#157860)
Fix a regression from https://github.com/llvm/llvm-project/pull/147559.
2025-09-11 08:50:32 +02:00
paperchalice
667f919214
[SelectionDAG][ARM] Propagate fast math flags in visitBRCOND (#156647)
Factor out from #151275.
2025-09-06 20:44:25 +08:00
woruyu
22fb21a64e
[DAG][ARM] canCreateUndefOrPoisonForTargetNode - ARMISD VORRIMM\VBICIMM nodes can't create poison/undef (#156831)
### Summary
This PR resolves https://github.com/llvm/llvm-project/issues/156640
2025-09-05 16:40:02 +08:00
woruyu
010f1ea3b3
[DAG][ARM] ComputeKnownBitsForTargetNode - add handling for ARMISD VORRIMM\VBICIMM nodes (#149494)
### Summary
This PR resolves https://github.com/llvm/llvm-project/issues/147179
2025-09-04 15:56:31 +08:00
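For #149494 above, a rough illustration of the known-bits reasoning involved. This is a sketch, not the LLVM code: the `KnownBits8` struct and both function names are hypothetical, lanes are assumed to be 8 bits wide, and the immediate is assumed to be already decoded to a plain bit pattern. OR with an immediate forces the immediate's set bits to one; BIC (bit clear) with an immediate forces them to zero.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical known-bits record for a single 8-bit lane.
struct KnownBits8 {
  std::uint8_t Zero = 0; // bits known to be 0
  std::uint8_t One = 0;  // bits known to be 1
};

// x | Imm: every bit set in Imm becomes known-one; other bits keep what
// was already known about x.
KnownBits8 knownBitsOrImm(KnownBits8 In, std::uint8_t Imm) {
  KnownBits8 Out;
  Out.One = static_cast<std::uint8_t>(In.One | Imm);
  Out.Zero = static_cast<std::uint8_t>(In.Zero & ~Imm);
  assert((Out.Zero & Out.One) == 0 && "a bit cannot be known 0 and 1");
  return Out;
}

// x & ~Imm (bit clear): every bit set in Imm becomes known-zero.
KnownBits8 knownBitsBicImm(KnownBits8 In, std::uint8_t Imm) {
  KnownBits8 Out;
  Out.Zero = static_cast<std::uint8_t>(In.Zero | Imm);
  Out.One = static_cast<std::uint8_t>(In.One & ~Imm);
  assert((Out.Zero & Out.One) == 0 && "a bit cannot be known 0 and 1");
  return Out;
}
```

Even with nothing known about the input lane, the immediate alone yields known bits, which is why these nodes are worth handling.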
Nikita Popov
3f757a39f2
[CodeGen] Remove ExpandInlineAsm hook (#156617)
This hook replaces inline asm with LLVM intrinsics. It was intended to
match inline assembly implementations of bswap in libc headers and
replace them with more optimizable implementations.

At this point, it has outlived its usefulness (see
https://github.com/llvm/llvm-project/issues/156571#issuecomment-3247638412),
as libc implementations no longer use inline assembly for this purpose.

Additionally, it breaks the "black box" property of inline assembly,
which some languages like Rust would like to guarantee.

Fixes https://github.com/llvm/llvm-project/issues/156571.
2025-09-04 09:28:11 +02:00
Daniel Paoliello
f99b0f3de4
[NFC] RuntimeLibcalls: Prefix the impls with 'Impl_' (#153850)
As noted in #153256, TableGen is generating reserved names for
RuntimeLibcalls, which resulted in a build failure for Arm64EC since
`vcruntime.h` defines `__security_check_cookie` as a macro.

To avoid using reserved names, all impl names will now be prefixed with
`Impl_`.

`NumLibcallImpls` was lifted out as a `constexpr size_t` instead of
being an enum field.

While I was churning the dependent code, I also removed the TODO to move
the impl enum into its own namespace and use an `enum class`: I
experimented with using an `enum class` and adding a namespace, but we
decided it was too verbose so it was dropped.
2025-09-02 09:57:33 -07:00
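For #153850 above, a minimal sketch of why a prefix sidesteps the macro clash. Everything here is hypothetical: the `cookie_check` macro stands in for a system header that defines a function name as a macro, and the enum is a toy version of the generated one. The preprocessor only rewrites whole identifier tokens, so `Impl_cookie_check` is left alone. The count is kept as a `constexpr` outside the enum, as the commit describes.

```cpp
#include <cassert>
#include <cstddef>

// Stand-in for a system header that defines a function name as a macro.
// Both names are made up for this sketch.
#define cookie_check cookie_check_arm64ec

namespace sketch {
enum LibcallImpl : unsigned short {
  Unsupported = 0,
  Impl_cookie_check, // different token from `cookie_check`: not rewritten
  Impl_memcpy,
  LastImpl = Impl_memcpy
};

// The count lives outside the enum instead of being an enumerator.
constexpr std::size_t NumLibcallImpls = static_cast<std::size_t>(LastImpl) + 1;

// String literals are not macro-expanded, so the real symbol name survives.
constexpr const char *ImplNames[NumLibcallImpls] = {nullptr, "cookie_check",
                                                    "memcpy"};

inline const char *implName(LibcallImpl Impl) {
  assert(Impl < NumLibcallImpls && "impl index out of range");
  return ImplNames[Impl];
}
} // namespace sketch
```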
AZero13
2259a80c7d
[ARM] Simplify LowerCMP (NFC) (#156198)
Pass the opcode directly.
2025-08-31 15:45:12 +01:00
Min-Yih Hsu
acaa925cb2
[IA][RISCV] Recognize interleaving stores that could lower to strided segmented stores (#154647)
This is a sibling patch to #151612: passing gap masks to the renewed TLI
hooks for lowering interleaved stores that use shufflevector to do the
interleaving.
2025-08-26 13:22:42 -07:00
AZero13
79dfe48865
[ARM] Set isCheapToSpeculateCtlz as true for hasV5TOps and no Thumb 1 (#154848)
This is so that we don't expand to include unneeded 0 checks.

Also fix the logic error in LegalizerInfo so it is NOT legal on Thumb1
in Fast-ISEL.

Finally, remove the README entry regarding this issue.
2025-08-25 12:43:48 -07:00
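For #154848 above, a small model of the "unneeded 0 checks". The function names are hypothetical and the loop only models a count-leading-zeros operation that is defined for a zero input and returns 32 there, which is how the ARM CLZ instruction behaves. When that holds, the guarded form and the speculated form agree on every input, so the guard is pure overhead.

```cpp
#include <cstdint>

// Models a CLZ that is defined at zero: returns 32 when X == 0.
unsigned clzDefinedAtZero(std::uint32_t X) {
  unsigned N = 0;
  for (std::uint32_t Bit = 0x80000000u; Bit != 0 && (X & Bit) == 0; Bit >>= 1)
    ++N;
  return N;
}

// Guarded form: what gets emitted when speculating ctlz is not "cheap".
unsigned ctlzGuarded(std::uint32_t X) {
  return X == 0 ? 32u : clzDefinedAtZero(X);
}

// Speculated form: the zero check is dropped and CLZ runs unconditionally.
unsigned ctlzSpeculated(std::uint32_t X) { return clzDefinedAtZero(X); }
```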
Kazu Hirata
e9045b3cea
[ARM] Remove an unnecessary cast (NFC) (#155206)
getType() already returns Type *.
2025-08-25 07:33:34 -07:00
Matt Arsenault
65d12622fa
RuntimeLibcalls: Add entries for stackprotector globals (#154930)
Add entries for __stack_chk_guard, __ssp_canary_word, __security_cookie,
and __guard_local. As far as I can tell these are all just different
names for the same shaped functionality on different systems.

These aren't really functions, but special global variable names. They
should probably be treated the same way; all the same contexts that
need to know about emittable function names also need to know about
this. This avoids a special case check in IRSymtab.

This isn't a complete change; there's a lot more cleanup which
should be done. The stack protector configuration system is a
complete mess. There are multiple overlapping controls, used in
3 different places. Some of the target control implementations overlap
with conditions used in the emission points, and some use correlated
but not identical conditions in different contexts.

i.e. useLoadStackGuardNode, getIRStackGuard, getSSPStackGuardCheck and
insertSSPDeclarations are all used in inconsistent ways so I don't know
if I've tracked the intention of the system correctly.

The PowerPC test change is a bug fix on Linux. Previously the manual
conditions were based around !isOSOpenBSD, which is not the condition
where __stack_chk_guard is used. Now getSDagStackGuard returns the
proper global reference, resulting in LOAD_STACK_GUARD getting a
MachineMemOperand which allows scheduling.
2025-08-23 10:21:00 +09:00
Nikita Popov
01bc742185
[CodeGen] Give ArgListEntry a proper constructor (NFC) (#153817)
This ensures that the required fields are set, and also makes the
construction more convenient.
2025-08-15 18:06:07 +02:00
Matt Arsenault
4aae7bc625
ARM: Move half convert libcall config to tablegen (#153389)
2025-08-14 17:35:58 +09:00
Matt Arsenault
bbcac029db
ARM: Move more aeabi libcall config into tablegen (#152109)
2025-08-14 15:43:15 +09:00
Matt Arsenault
32f1fe3770
ARM: Move calling conv config to RuntimeLibcalls (#152065)
Consolidate module level ABI into RuntimeLibcalls
2025-08-14 08:36:03 +09:00
David Green
06d2d1e156
[ARM] Protect against odd sized vectors in isVTRNMask and friends (#153413)
Fixes the issue reported on #153138, where odd-sized vectors would cause
the checks to iterate off the end of the mask.
2025-08-13 20:57:46 +01:00
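For #153413 above, a simplified sketch of the failure mode. This is not the LLVM `isVTRNMask`: the function name is hypothetical and it only handles a mask that is exactly one vector wide. The check reads the mask two elements at a time, so with an odd element count the `M[J + 1]` access would run past the end unless odd sizes are rejected first.

```cpp
#include <cstddef>
#include <vector>

// Checks for a transpose-style shuffle mask of the form
//   WhichResult == 0: [0, N, 2, N+2, ...]
//   WhichResult == 1: [1, N+1, 3, N+3, ...]
// where N is the element count. Negative entries mean "undef" and match
// anything.
bool isTransposeMask(const std::vector<int> &M, unsigned WhichResult) {
  const std::size_t NumElts = M.size();
  // Guard: the loop below reads pairs, so an odd count would read M[NumElts].
  if (NumElts == 0 || NumElts % 2 != 0)
    return false;
  for (std::size_t J = 0; J < NumElts; J += 2) {
    if (M[J] >= 0 && static_cast<std::size_t>(M[J]) != J + WhichResult)
      return false;
    if (M[J + 1] >= 0 &&
        static_cast<std::size_t>(M[J + 1]) != J + NumElts + WhichResult)
      return false;
  }
  return true;
}
```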
Min-Yih Hsu
ca05058b49
[IA][RISCV] Recognize deinterleaved loads that could lower to strided segmented loads (#151612)
Turn the following deinterleaved load patterns
```
%l = masked.load(%ptr, /*mask=*/110110110110, /*passthru=*/poison)
%f0 = shufflevector %l, [0, 3, 6, 9]
%f1 = shufflevector %l, [1, 4, 7, 10]
%f2 = shufflevector %l, [2, 5, 8, 11]
```
into
```
%s = riscv.vlsseg2(/*passthru=*/poison, %ptr, /*mask=*/1111)
%f0 = extractvalue %s, 0
%f1 = extractvalue %s, 1
%f2 = poison
```
The mask `110110110110` is regarded as a 'gap mask' since it effectively skips the entire third field / component.

Similarly, turn the following snippet
```
%l = masked.load(%ptr, /*mask=*/110000110000, /*passthru=*/poison)
%f0 = shufflevector %l, [0, 3, 6, 9]
%f1 = shufflevector %l, [1, 4, 7, 10]
```
into
```
%s = riscv.vlsseg2(/*passthru=*/poison, %ptr, /*mask=*/1010)
%f0 = extractvalue %s, 0
%f1 = extractvalue %s, 1
```

Right now this patch only tries to detect a gap mask from a constant mask supplied to a masked.load/vp.load.
2025-08-12 14:08:18 -07:00
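For #151612 above, a sketch of how a constant mask might be split into a gap mask and a per-segment mask. This is an illustration under stated assumptions, not the pass's actual logic: `GapMaskResult` and `analyzeGapMask` are hypothetical names, the mask is given as a string of '0'/'1' in element order, a field counts as a gap when it is inactive in every segment, and the remaining fields of each segment are required to agree.

```cpp
#include <cstddef>
#include <string>
#include <vector>

struct GapMaskResult {
  bool Valid = false;
  std::vector<bool> FieldActive; // one entry per field; false marks a gap
  std::string SegmentMask;       // one '0'/'1' per segment
};

GapMaskResult analyzeGapMask(const std::string &Mask, std::size_t Factor) {
  if (Factor == 0 || Mask.empty() || Mask.size() % Factor != 0)
    return GapMaskResult{};

  GapMaskResult R;
  R.FieldActive.assign(Factor, false);
  // A field is active if at least one segment enables it.
  for (std::size_t I = 0; I < Mask.size(); ++I) {
    if (Mask[I] != '0' && Mask[I] != '1')
      return GapMaskResult{};
    if (Mask[I] == '1')
      R.FieldActive[I % Factor] = true;
  }

  // Within each segment, all non-gap fields must carry the same bit.
  const std::size_t NumSegments = Mask.size() / Factor;
  for (std::size_t S = 0; S < NumSegments; ++S) {
    bool Seen = false;
    char Lane = '0';
    for (std::size_t F = 0; F < Factor; ++F) {
      if (!R.FieldActive[F])
        continue;
      const char Bit = Mask[S * Factor + F];
      if (Seen && Bit != Lane)
        return GapMaskResult{}; // not expressible as gap mask + segment mask
      Lane = Bit;
      Seen = true;
    }
    R.SegmentMask.push_back(Lane);
  }
  R.Valid = true;
  return R;
}
```

With factor 3, the two masks from the commit message give segment masks `1111` and `1010` respectively, with the third field marked as a gap in both.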
AZero13
6a425f1e54
[ARM] Have custom lowering for ucmp and scmp (#149315)
Limited to non-thumb1 for scmp at the moment, since there is no good way
to do it.
2025-08-08 06:51:18 +01:00
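For #149315 above, a reference model of what a three-way compare returns: -1, 0 or 1 depending on whether the first operand is less than, equal to or greater than the second, using a signed comparison for scmp and an unsigned one for ucmp. The function names are hypothetical and this says nothing about the instruction sequence the ARM lowering actually emits.

```cpp
#include <cstdint>

// Signed three-way compare: -1 if A < B, 0 if equal, 1 if A > B.
int scmpModel(std::int32_t A, std::int32_t B) { return (A > B) - (A < B); }

// Unsigned three-way compare with the same result convention.
int ucmpModel(std::uint32_t A, std::uint32_t B) { return (A > B) - (A < B); }
```

The same bit pattern can order differently in the two: `0xFFFFFFFF` is -1 for the signed model and the largest value for the unsigned one.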
Kazu Hirata
62fc0028bf
[Target] Remove unnecessary casts (NFC) (#152262)
value() already returns uint64_t.
2025-08-06 07:11:07 -07:00
eleviant
907b7d0f07
[ARM] Fix inline asm register validation for vector types (#152175)
This patch allows the following piece of code to compile successfully:
```
#include <arm_neon.h>
register uint8x8_t V asm("d3") = vdup_n_u8(0xff);
```
2025-08-06 10:30:49 +02:00
Matt Arsenault
342bf58f93
RuntimeLibcalls: Add entries for __security_check_cookie (#151843)
Avoids hardcoding string name based on target, and gets
the entry in the centralized list of emitted calls.
2025-08-06 10:26:36 +09:00
Matt Arsenault
d44754c344
ARM: Remove redundant or buggy config of __aeabi_d2h (#152126)
This was set if `TT.isTargetAEABI()`. This was previously set above
if `TM.isAAPCS_ABI() && (TT.isTargetAEABI() || TT.isTargetGNUAEABI() ||
                         TT.isTargetMuslAEABI() || TT.isAndroid())`.

So this could differ based on a manually specified -target-abi flag due
to the `isAAPCS_ABI` part of the original condition. I'm guessing
these should be consistent, so either this second group of
setLibcallImpl calls should have been guarded by the `isAAPCS_ABI`
check, or the first condition should remove it.

There doesn't appear to be any meaningful test coverage using the
manually specified ABI option, so #152108 tries to remove it.
2025-08-06 08:48:01 +09:00
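For #152126 above, the two conditions from the commit message written out as predicates. The `TargetFacts` struct and the function names are hypothetical stand-ins for the `TT`/`TM` queries. The second group only adds anything when the target is AEABI but the ABI is not AAPCS, which is the manually-specified-ABI case the message describes.

```cpp
// Hypothetical snapshot of the triple / target-machine queries involved.
struct TargetFacts {
  bool IsAAPCSABI;
  bool IsTargetAEABI;
  bool IsTargetGNUAEABI;
  bool IsTargetMuslAEABI;
  bool IsAndroid;
};

// First (earlier) condition under which __aeabi_d2h was configured.
bool firstGroupFires(const TargetFacts &T) {
  return T.IsAAPCSABI && (T.IsTargetAEABI || T.IsTargetGNUAEABI ||
                          T.IsTargetMuslAEABI || T.IsAndroid);
}

// Second (later) condition.
bool secondGroupFires(const TargetFacts &T) { return T.IsTargetAEABI; }

// True when the second group is not merely redundant with the first.
bool onlySecondGroupFires(const TargetFacts &T) {
  return secondGroupFires(T) && !firstGroupFires(T);
}
```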
Matt Arsenault
1392edcc07
ARM: Remove idiv runtime call aliases (#152098)
Really only the i32 variants exist. We don't need synthetic
aliases for illegal types which will be promoted.
2025-08-05 17:49:22 +09:00
AZero13
23022a4683
[SelectionDAG] Move sign pattern check from AArch64 and ARM to general SelectionDAG (#151736)
This works on all cases much like the XOR case above it in SelectionDAG.
2025-08-01 14:46:51 -07:00
Prabhu Rajasekaran
17ccb849f3
[llvm] Extract and propagate callee_type metadata
Update MachineFunction::CallSiteInfo to extract numeric CalleeTypeIds
from callee_type metadata attached to indirect call instructions.

Reviewers: nikic, ilovepi

Reviewed By: ilovepi

Pull Request: https://github.com/llvm/llvm-project/pull/87575
2025-07-30 14:56:39 -07:00
Nikita Popov
fe0dbe0f29
[CodeGen] More consistently expand float ops by default (#150597)
These float operations were expanded for scalar f32/f64/f128, but not
for f16 and more problematically, not for vectors. A small subset of
them was separately set to expand for vectors.

Change these to always expand by default, and adjust targets to mark
these as legal where necessary instead.

This is a much safer default, and avoids unnecessary legalization
failures because a target failed to manually mark them as expand.

Fixes https://github.com/llvm/llvm-project/issues/110753.
Fixes https://github.com/llvm/llvm-project/issues/121390.
2025-07-28 09:46:00 +02:00
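For #150597 above, a toy model of the "expand by default, opt in to legal" policy. All names are hypothetical (`ActionTable`, the two placeholder operations, the type strings) and this is not how the real action tables are stored. The point is only that a lookup with no target override falls back to Expand, so a target that forgets an entry gets a working expansion rather than a legalization failure.

```cpp
#include <map>
#include <string>
#include <utility>

enum class LegalizeAction { Legal, Expand };
enum class FloatOp { OpA, OpB }; // placeholders, not real opcode names

class ActionTable {
  std::map<std::pair<FloatOp, std::string>, LegalizeAction> Overrides;

public:
  // A target opts in where it has a native instruction.
  void setLegal(FloatOp Op, const std::string &Ty) {
    Overrides[std::make_pair(Op, Ty)] = LegalizeAction::Legal;
  }

  // Anything without an override is expanded, for every type.
  LegalizeAction get(FloatOp Op, const std::string &Ty) const {
    auto It = Overrides.find(std::make_pair(Op, Ty));
    return It == Overrides.end() ? LegalizeAction::Expand : It->second;
  }
};

// A hypothetical target that supports OpA on v4f32 only.
ActionTable makeSketchTarget() {
  ActionTable T;
  T.setLegal(FloatOp::OpA, "v4f32");
  return T;
}
```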
eleviant
a4796b14fc
[ARM] Emit error message when incompatible reg is specified (#147559)
At the moment the following piece of code causes undefined behavior:
```
int a;
void b() {
   register float d2 asm("d2") = a;
   asm("" ::"r"(d2));
}
```
This happens because variable and register types are incompatible.
2025-07-24 19:19:25 +02:00
Philip Reames
dbd9eae95a
[IA] Support vp.store in lowerInterleavedStore (#149605)
Follow up to 28417e64, and the whole line of work started with 4b81dc7.

This change merges the handling for VPStore - currently in
lowerInterleavedVPStore - into the existing dedicated routine used in
the shuffle lowering path. This removes the last use of the dedicated
lowerInterleavedVPStore and thus we can remove it.

This contains two changes which are functional.

First, like in 28417e64, merging support for vp.store exposes the
strided store optimization for code using vp.store.

Second, it seems the strided store case had a significant missed
optimization. We were performing the strided store at the full unit
strided store type width (i.e. LMUL) rather than reducing it to match
the input width. This became obvious when I tried to use the mask
created by the helper routine as it caused a type incompatibility.

Normally, I'd try not to include an optimization in an API rework, but
structuring the code to both be correct for vp.store and not optimize
the existing case turned out to be more involved than seemed worthwhile. I
could pull this part out as a pre-change, but it's a bit awkward on its
own as it turns out to be somewhat of a half step on the possible
optimization; the full optimization is complex with the old code
structure.

---------

Co-authored-by: Craig Topper <craig.topper@sifive.com>
2025-07-22 15:50:17 -07:00
Philip Reames
28417e6459
[IA] Support vp.load in lowerInterleavedLoad [nfc-ish] (#149174)
This continues in the direction started by commit 4b81dc7. We
essentially merge the handling for VPLoad - currently in
lowerInterleavedVPLoad - into the existing dedicated routine. This
removes the last use of the dedicated lowerInterleavedVPLoad and thus we
can remove it.

This isn't quite NFC as the main callback has support for the strided
load optimization whereas the VPLoad specific version didn't. So this
adds the ability to form a strided load for a vp.load deinterleave with
one shuffle used.
2025-07-17 17:29:28 -07:00
Kazu Hirata
2da59287aa
[Target] Remove unnecessary casts (NFC) (#149342)
getFunction().getParent() already returns Module *.
2025-07-17 15:24:25 -07:00
Brad Smith
0d2e11f3e8
Remove Native Client support (#133661)
Remove the Native Client support now that it has finally reached end of life.
2025-07-15 13:22:33 -04:00
Simon Pilgrim
82a276e610
[ARM][WebAssembly] Remove unused PatternMatch namespace. NFC. (#147984)
Avoid file-level "using namespace llvm::PatternMatch" to make it easier to potentially use SDPatternMatch in the future.
2025-07-11 10:24:43 +01:00
AZero13
0edc98cd6d
[ARM] Copy SMAX(lhs, 0) and SMIN(lhs, 0) patterns from AArch64 to ARM (#146565)
They work on ARM too.
2025-07-10 21:06:52 +01:00
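For #146565 above, the bit identities that make a branch-free max/min against zero possible. It is an assumption that these are the forms the copied patterns use; the commit message does not spell them out. With `s` being the sign of `x` smeared across all bits (0 or -1), `smax(x, 0)` is `x & ~s` and `smin(x, 0)` is `x & s`. Function names are hypothetical.

```cpp
#include <cstdint>

// All-ones if X is negative, zero otherwise. Built from an unsigned shift
// so it does not rely on how >> treats negative signed values.
std::int32_t signMask(std::int32_t X) {
  return -static_cast<std::int32_t>(static_cast<std::uint32_t>(X) >> 31);
}

// max(X, 0): clear X entirely when it is negative.
std::int32_t smaxZero(std::int32_t X) { return X & ~signMask(X); }

// min(X, 0): keep X only when it is negative.
std::int32_t sminZero(std::int32_t X) { return X & signMask(X); }
```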
Matt Arsenault
d801b54bcd
ARM: Fix calling convention for gnu half conversion functions (#147951)
I'm surprised at how bad the test coverage is here. There is some
overlap with existing tests, but they aren't comprehensive and do
not cover all the ABIs, or all the different types.

Fixes #147935
2025-07-10 22:47:44 +09:00
Boyao Wang
697beb3f17
[TargetLowering] Change getOptimalMemOpType and findOptimalMemOpLowering to take LLVM Context (#147664)
Add LLVM Context to getOptimalMemOpType and findOptimalMemOpLowering so
that we can use EVT::getVectorVT to generate an EVT in
getOptimalMemOpType.

Related to [#146673](https://github.com/llvm/llvm-project/pull/146673).
2025-07-10 11:11:09 +08:00
Matt Arsenault
deade03910
ARM: Unconditionally set eabi libcall calling convs in RuntimeLibcalls (#146083)
This fully consolidates all the calling convention configuration into
RuntimeLibcallInfo. I'm assuming that __aeabi functions have a universal
calling convention, and on other ABIs just don't use them. This will
enable splitting of RuntimeLibcallInfo into the ABI and lowering component.
2025-07-09 17:16:48 +09:00
Matt Arsenault
dc69b00b0a
RuntimeLibcalls: Remove table of soft float compare cond codes (#146082)
Previously we had a table of entries for every Libcall for
the comparison to use against an integer 0 if it was a soft
float compare function. This was only relevant to a handful of
opcodes, so it was wasteful. Now that we can distinguish the
abstract libcall for the compare with the concrete implementation,
we can just directly hardcode the comparison against the libcall
impl without this configuration system.
2025-07-09 17:13:58 +09:00
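For #146082 above, a sketch of what "directly hardcode the comparison against the libcall impl" could look like. The enum values and function names are hypothetical. The predicate choices follow the usual soft-float comparison convention, in which the helper returns an integer that is then compared against 0 (for example, a less-than helper returns a negative value when the relation holds); treat the exact table as an assumption.

```cpp
#include <cassert>

enum class Pred { EQ, NE, LT, LE, GT, GE };

// Hypothetical subset of libcall implementations.
enum LibcallImpl {
  Impl_eqsf2,
  Impl_nesf2,
  Impl_ltsf2,
  Impl_lesf2,
  Impl_gtsf2,
  Impl_gesf2,
  Impl_unordsf2,
  Impl_memcpy
};

// Returns true and sets P if Impl is a soft-float compare; P is the
// predicate to apply between the call's integer result and 0.
bool softFloatComparePred(LibcallImpl Impl, Pred &P) {
  switch (Impl) {
  case Impl_eqsf2:    P = Pred::EQ; return true;
  case Impl_nesf2:    P = Pred::NE; return true;
  case Impl_ltsf2:    P = Pred::LT; return true;
  case Impl_lesf2:    P = Pred::LE; return true;
  case Impl_gtsf2:    P = Pred::GT; return true;
  case Impl_gesf2:    P = Pred::GE; return true;
  case Impl_unordsf2: P = Pred::NE; return true;
  default:            return false; // not a compare: no table entry needed
  }
}

// Interprets a helper's integer result as the boolean answer.
bool isTrueResult(LibcallImpl Impl, int Result) {
  Pred P = Pred::EQ;
  const bool IsCompare = softFloatComparePred(Impl, P);
  assert(IsCompare && "not a soft-float compare");
  if (!IsCompare)
    return false;
  switch (P) {
  case Pred::EQ: return Result == 0;
  case Pred::NE: return Result != 0;
  case Pred::LT: return Result < 0;
  case Pred::LE: return Result <= 0;
  case Pred::GT: return Result > 0;
  case Pred::GE: return Result >= 0;
  }
  return false;
}
```

Only the handful of compare impls appear in the switch, which mirrors the commit's point that a table entry for every libcall was wasteful.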
Matt Arsenault
dd9646565e
ARM: Move sjlj libcall configuration to RuntimeLibcalls
Manually submitting, closes #147227
2025-07-08 13:52:32 +09:00
Matt Arsenault
591b0d0fdf
RuntimeLibcalls: Associate calling convention with libcall impls (#144979)
Instead of associating the libcall with the RTLIB::Libcall, put it
into a table indexed by the RTLIB::LibcallImpl. The LibcallImpls
should contain all ABI details for a particular implementation, not
the abstract Libcall. In the future the wrappers in terms of the
RTLIB::Libcall should be removed.
2025-07-08 10:20:52 +09:00
Matt Arsenault
2bd31edd57
ARM: Remove subtarget field tracking SjLj
This is a module level property that needs to be globally
consistent, so it does not belong in the subtarget.

Now that the Triple knows the default exception handling type,
consolidate the interpretation of None as select target default
exception handling in TargetMachine and use that. This enables
moving the configuration of UNWIND_RESUME to RuntimeLibcalls.

Manually submitting, closes #147226
2025-07-08 09:56:34 +09:00
Jay Foad
17d6aa01ec
[ARM] Fix expansion of ABS in a call sequence (#147270)
Fixes #147162
2025-07-07 15:52:37 +01:00
Matt Arsenault
d8ef156379
DAG: Remove verifyReturnAddressArgumentIsConstant (#147240)
The intrinsic argument is already marked with immarg so non-constant
values are rejected by the IR verifier.
2025-07-07 16:28:47 +09:00
AZero13
7d65cb1952
[ARM] Copy (SELECT_CC setgt, iN lhs, -1, 1, -1) -> (OR (ASR lhs, N-1), 1) from AArch64 to ARM (#146561)
It works perfectly for ARM too.
2025-07-05 18:17:33 +01:00
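For #146561 above, a quick check of the identity in the title for N = 32: selecting 1 when `lhs > -1` and -1 otherwise gives the same value as OR-ing 1 into the arithmetic right shift of `lhs` by 31. The function names are hypothetical, and the shift is modelled with an unsigned shift plus negation so the sketch does not depend on how `>>` treats negative values.

```cpp
#include <cstdint>

// Reference form: (lhs > -1) ? 1 : -1.
std::int32_t selectForm(std::int32_t X) { return X > -1 ? 1 : -1; }

// Folded form: (lhs >>arith 31) | 1. The shift result is 0 for
// non-negative inputs and -1 for negative ones.
std::int32_t orAsrForm(std::int32_t X) {
  const std::int32_t Asr31 =
      -static_cast<std::int32_t>(static_cast<std::uint32_t>(X) >> 31);
  return Asr31 | 1;
}
```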
David Green
9fcea2e465
[ARM] Add neon vector support for roundeven
As per #142559, this marks froundeven as legal for Neon and upgrades the
existing arm.neon.vrintn intrinsics.
2025-07-04 15:27:33 +01:00
David Green
ec35065789
[ARM] Add neon vector support for rint
As per #142559, this marks frint as legal for Neon and upgrades the existing
arm.neon.vrintx intrinsics.
2025-07-03 21:27:48 +01:00
David Green
1f8f477bd0
[ARM] Add neon vector support for trunc
As per #142559, this marks ftrunc as legal for Neon and upgrades the existing
arm.neon.vrintz intrinsics.
2025-07-03 07:41:13 +01:00
David Green
5332534b9c
[ARM] Add neon vector support for ceil
As per #142559, this marks fceil as legal for Neon and upgrades the existing
arm.neon.vrintp intrinsics.
2025-07-01 15:41:10 +01:00
David Green
6bd9ff04af
[ARM] Add neon vector support for round
As per #142559, this marks fround as legal for Neon and upgrades the existing
arm.neon.vrinta intrinsics.
2025-06-30 17:15:26 +01:00
David Green
dcc9e36b18
[ARM] Add neon vector support for floor (#142559)
This marks ffloor as legal provided that armv8 and neon are present (or
fullfp16 for the fp16 instructions). The existing arm_neon_vrintm
intrinsics are auto-upgraded to llvm.floor.

If this is OK I will update the other vrint intrinsics.
2025-06-29 11:37:16 +01:00
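Taken together, #142559 and its follow-ups above pair each vrint variant with a generic rounding operation: vrintm with floor, vrintp with ceil, vrintz with trunc, vrinta with round, vrintx with rint and vrintn with roundeven. A scalar sketch of that pairing using `<cmath>`; the `applyVrint` helper is hypothetical, and `std::nearbyint` stands in for roundeven on the assumption that the default round-to-nearest mode is in effect.

```cpp
#include <cmath>
#include <limits>

// Mode is the vrint suffix letter: m, p, z, a, x or n.
double applyVrint(char Mode, double X) {
  switch (Mode) {
  case 'm': return std::floor(X);     // toward minus infinity
  case 'p': return std::ceil(X);      // toward plus infinity
  case 'z': return std::trunc(X);     // toward zero
  case 'a': return std::round(X);     // to nearest, ties away from zero
  case 'x': return std::rint(X);      // current rounding mode
  case 'n': return std::nearbyint(X); // stand-in for roundeven (see above)
  default:  return std::numeric_limits<double>::quiet_NaN();
  }
}
```

The tie case shows the difference between the two "to nearest" variants: 2.5 becomes 3 under round and 2 under roundeven.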