563 Commits

Author SHA1 Message Date
JaydeepChauhan14
5df173263b
[NFC] Initialize AtomicLoadExtActions array (#180752) 2026-02-10 22:52:35 +05:30
Matt Arsenault
a7d48bd305
DAG: Remove TypePromoteFloat (#177427)
Remove the now unimplemented target hook and associated DAG machinery
for the old half legalization path.

Really fixes #97975
2026-01-26 22:02:24 +00:00
Chuanqi Xu
524589119a
[LLVM] Update the default value for MaxLargeFPConvertBitWidthSupported to 128 (#176851)
Previously, we can't compile the program which convert 256 bits to
floating points and vice versa(we'll crash). After this, we're able to
compile them.
2026-01-22 16:37:06 +08:00
Matt Arsenault
aa57ee958d
CodeGen: Use LibcallLoweringInfo for stack protector insertion (#176829)
Thread LibcallLoweringInfo into the TargetLowering hooks used
by the stack protector passes.
2026-01-20 12:37:31 +01:00
Matt Arsenault
85b6d43bf7
TargetLowering: Avoid getLibcallName in getSafeStackPointerLocation (#176362) 2026-01-16 13:37:02 +00:00
moorabbit
a5fa246435
[Clang] Add __builtin_stack_address (#148281)
Add support for `__builtin_stack_address` builtin. The semantics match
those of GCC's builtin with the same name.

`__builtin_stack_address` returns the starting address of the stack
region that may be used by called functions. It may or may not include
the space used for on-stack arguments passed to a callee (See [GCC
Bug/121013](https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121013)).

Fixes #82632.
2026-01-12 10:01:57 +01:00
Ties Stuij
b28eeb28be
[CodeGen] Generalise Hexagon flags for memop inline thresholds (#172829)
Generalise the Hexagon cmdline options to control if memset, memcpy or memmove intrinsics should be inlined versus calling library functions, so they can be used by all backends:

	•	-max-store-memset
	•	-max-store-memcpy
	•	-max-store-memmove

These flags override the target-specific defaults set in TargetLowering (e.g., MaxStoresPerMemcpy) and allow fine-tuning of the inlining threshold for performance analysis and optimization.

The optsize variants (-max-store-memset-Os, -max-store-memcpy-Os, max-store-memmove-Os) from the Hexagon backend were removed, and now the above options control both.

The threshold is specified as a number of store operations, which is backend-specific. Operations requiring more stores than the threshold will call the corresponding library function instead of being inlined.
2026-01-09 12:08:35 +00:00
Luke Lau
ad4bfac732
[IR] Split vector.splice into vector.splice.left and vector.splice.right (#170796)
This PR implements the first change outlined in
https://discourse.llvm.org/t/rfc-allow-non-constant-offsets-in-llvm-vector-splice/88974?u=lukel

In order to allow non-immediate offsets in the llvm.vector.splice
intrinsic, we need to separate out the "shift left" and "shift right"
modes into two separate intrinsics, which were previously determined by
whether or not the offset is positive or negative.

The description in the LangRef has also been reworded in terms of
sliding elements left or right and extracting either the upper or lower
half as opposed to extracting from a certain index, which brings it
inline with the definition of `llvm.fshr.*`/`llvm.fshl.*`.

This patch teaches AutoUpgrade.cpp to upgrade the old intrinsics into
their new equivalent one based on their offset, so existing uses of
vector.splice should still work.

Uses of llvm.vector.splice in `llvm/test/CodeGen` haven't been replaced
in this PR to keep the diff small and kick the tyres on the AutoUpgrader
a bit. I planned to do this in a follow up NFC but can include it in
this PR if reviewers prefer.

Similarly the shuffle costing kind `SK_Splice` has just been kept the
same for now, to be split into `SK_SpliceLeft` and `SK_SpliceRight`
later.
2026-01-06 15:41:26 +08:00
Ramkumar Ramachandra
9e5e267a03
[ISel] Introduce llvm.clmul intrinsic (#168731)
In line with a std proposal to introduce the llvm.clmul family of
intrinsics corresponding to carry-less multiply operations. This work
builds upon 727ee7e ([APInt] Introduce carry-less multiply primitives),
and follow-up patches will introduce custom-lowering on supported
targets, replacing target-specific clmul intrinsics.

Testing is done on the RISC-V target, which should be sufficient to
prove that the intrinsics work, since no RISC-V specific lowering has
been added.

Ref: https://isocpp.org/files/papers/P3642R3.html

Co-authored-by: Craig Topper <craig.topper@sifive.com>
2026-01-05 20:24:06 +00:00
Craig Topper
1b43f5cec6
[RISCV][SelectionDAG] Add a ISD::CTLS node for count leading redundant sign bits. Use it to select CLS(W). (#173417)
The RISC-V P extension adds an instruction equivalent to
__builtin_clrsb. AArch64 has a similar instruction that we currently fail to
select when using the builtin.

This patch adds a combine based on the canonical version of the pattern
emitted by clang for the builtin, (add (ctlz (xor x, (sra x, bw-1)))),
-1). I'm starting the combine at the ctlz because the outer add can
easily be combined into other nodes obscuring the full pattern. So we
generate (add (ctls x), 1) and hope the add will be combined away.

I've also added a combine for the pattern AArch64 recognizes
(ctlz_zero_undef (or (shl (xor x, (sra x, bw-1)), 1), 1)).

I've only enabled the combines when the target has a Legal or Custom
action for the operation, taking into account type promotion. We
can relax this in the future by adding a default expansion to
LegalizeDAG and adding more type legalization rules.
2026-01-04 18:00:00 -08:00
Folkert de Vries
a587ccd87d
fix llvm.fma.f16 double rounding issue when there is no native support (#171904)
fixes https://github.com/llvm/llvm-project/issues/98389

As the issue describes, promoting `llvm.fma.f16` to `llvm.fma.f32` does
not work, because there is not enough precision to handle the repeated
rounding. `f64` does have sufficient space. So this PR explicitly
promotes the 16-bit fma to a 64-bit fma.

I could not find examples of a libcall being used for fma, but that's
something that could be looked in separately to work around code size
issues.
2025-12-17 22:03:01 +01:00
Matt Arsenault
a3aaa1a391
DAG: Use RuntimeLibcalls to legalize vector frem calls (#170719)
This continues the replacement of TargetLibraryInfo uses in codegen
with RuntimeLibcallsInfo started in
821d2825a4f782da3da3c03b8a002802bff4b95c.
The series there handled all of the multiple result calls. This
extends for the other handled case, which happened to be frem.

For some reason the Libcall for these are prefixed with "REM_", for
the instruction "frem", which maps to the libcall "fmod".
2025-12-11 13:33:27 +00:00
Matt Arsenault
62832593b7
DAG: Use more RTLIB helper functions for getting libcall from type (#170563)
We had a set of utilities which was only used for some set of
floating point libcalls. Add more, most of which are for integer
operations.

Ideally we would generate these functions from tablegen.
2025-12-04 17:49:27 +01:00
Matt Arsenault
1c5b1501ca
CodeGen: Move libcall lowering configuration to subtarget (#168621)
Previously libcall lowering decisions were made directly
in the TargetLowering constructor. Pull these into the subtarget
to facilitate turning LibcallLoweringInfo into a separate analysis
in the future.
2025-11-25 11:59:56 -05:00
Matt Arsenault
1d73b68463
TargetLowering: Avoid hardcoding OpenBSD + __guard_local name (#167744)
Query RuntimeLibcalls for the support and the name. The check
that the implementation is exactly __guard_local instead of
unsupported feels a bit strange.
2025-11-20 20:44:25 -05:00
Matt Arsenault
a757c4e74e
CodeGen: Add subtarget to TargetLoweringBase constructor (#168620)
Currently LibcallLoweringInfo is defined inside of TargetLowering,
which is owned by the subtarget. Pass in the subtarget so we can
construct LibcallLoweringInfo with the subtarget. This is a temporary
step that should be revertable in the future, after LibcallLoweringInfo
is moved out of TargetLowering.
2025-11-19 19:18:13 +00:00
Matt Arsenault
862d34666f
opt: Fix bad merge of #167996 (#168110)
After the base branch was moved to main, this somehow ended up
adding a second definition of RTLCI, instead of modifying the
existing one.

Also fix other build error with gcc bots.
2025-11-14 12:03:26 -08:00
Matt Arsenault
590ab43e8a
RuntimeLibcalls: Move VectorLibrary handling into TargetOptions (#167996)
This fixes the -fveclib flag getting lost on its way to the backend.

Previously this was its own cl::opt with a random boolean. Move the
flag handling into CommandFlags with other backend ABI-ish options,
and have clang directly set it, rather than forcing it to go through
command line parsing.

Prior to de68181d7f, codegen used TargetLibraryInfo to find the vector
function. Clang has special handling for TargetLibraryInfo, where it
would
directly construct one with the vector library in the pass pipeline.
RuntimeLibcallsInfo currently is not used as an analysis in codegen, and
needs to know the vector library when constructed.

RuntimeLibraryAnalysis could follow the same trick that
TargetLibraryInfo is using in the future, but a lot more boilerplate changes 
are needed to thread that analysis through codegen. Ideally this would come 
from an IR module flag, and nothing would be in TargetOptions. For now, it's 
better for all of these sorts of controls to be consistent.
2025-11-14 11:19:21 -08:00
Matt Arsenault
4b9771e41a
DAG: Use modf vector libcalls through RuntimeLibcalls (#166986)
Copy new process from sincos/sincospi
2025-11-11 18:05:35 -08:00
Matt Arsenault
de68181d7f
DAG: Use sincos vector libcalls through RuntimeLibcalls (#166984)
Copy new process from sincospi.
2025-11-11 10:51:23 -08:00
serge-sans-paille
af146462f9
Remove unused <iterator> inclusion
Per https://llvm.org/docs/CodingStandards.html#include-as-little-as-possible
this improves compilation time, while not being too intrusive on the
codebase.
2025-11-11 13:33:38 +01:00
Matt Arsenault
b7423af8da
RuntimeLibcalls: Add entries for vector sincospi functions (#166981)
Add libcall entries for sleef and armpl sincospi implementations.
This is the start of adding the vector library functions; eventually
they should all be tracked here.

I'm starting with this case because this is a prerequisite to fix
reporting sincospi calls which do not exist on any common targets
without
regressing vector codegen when these libraries are available.
2025-11-10 10:56:33 -08:00
Matt Arsenault
056d2c12f7
RuntimeLibcalls: Split lowering decisions into LibcallLoweringInfo (#164987)
Introduce a new class for the TargetLowering usage. This tracks the
subtarget specific lowering decisions for which libcall to use.
RuntimeLibcallsInfo is a module level property, which may have multiple
implementations of a particular libcall available. This attempts to be
a minimum boilerplate patch to introduce the new concept.

In the future we should have a tablegen way of selecting which
implementations should be used for a subtarget. Currently we
do have some conflicting implementations added, it just happens
to work out that the default cases to prefer is alphabetically
first (plus some of these still are using manual overrides
in TargetLowering constructors).
2025-11-05 17:10:36 +00:00
Matt Arsenault
28e9a2832f
DAG: Consider __sincos_stret when deciding to form fsincos (#165169) 2025-10-28 08:28:09 -07:00
Shimin Cui
531fd45e92
[PPC] Set minimum of largest number of comparisons to use bit test for switch lowering (#155910)
Currently it is considered suitable to lower to a bit test for a set of
switch case clusters when the the number of unique destinations
(`NumDests`) and the number of total comparisons (`NumCmps`) satisfy:
`(NumDests == 1 && NumCmps >= 3) || (NumDests == 2 && NumCmps >= 5) ||
(NumDests == 3 && NumCmps >= 6)`

However it is found for some cases on powerpc, for example, when
NumDests is 3, and the number of comparisons for each destination is all
2, it's not profitable to lower the switch to bit test. This is to add
an option to set the minimum of largest number of comparisons to use bit
test for switch lowering.

---------

Co-authored-by: Shimin Cui <scui@xlperflep9.rtp.raleigh.ibm.com>
2025-10-28 10:24:32 -04:00
Matt Arsenault
f5a2e6bb8f
CodeGen: Remove overrides of getSSPStackGuardCheck (NFC) (#164044)
All 3 implementations are just checking if this has the
windows check function, so merge that as the only implementation.
2025-10-24 21:17:34 +09:00
Sam Parker
1820102167
Wasm fmuladd relaxed (#163177)
Reland #161355, after fixing up the cross-projects-tests for the wasm
simd intrinsics.

Original commit message:
Lower v4f32 and v2f64 fmuladd calls to relaxed_madd instructions.
If we have FP16, then lower v8f16 fmuladds to FMA.

I've introduced an ISD node for fmuladd to maintain the rounding
ambiguity through legalization / combine / isel.
2025-10-13 16:50:53 +01:00
Sam Parker
30d3441cf0
Revert "[WebAssembly] Lower fmuladd to madd and nmadd" (#163171)
Reverts llvm/llvm-project#161355

Looks like I've broken some intrinsic code generation.
2025-10-13 11:53:40 +01:00
Sam Parker
a4eb7ea225
[WebAssembly] Lower fmuladd to madd and nmadd (#161355)
Lower v4f32 and v2f64 fmuladd calls to relaxed_madd instructions.
If we have FP16, then lower v8f16 fmuladds to FMA.

I've introduced an ISD node for fmuladd to maintain the rounding
ambiguity through legalization / combine / isel.
2025-10-13 10:36:08 +01:00
Daniel Paoliello
f99b0f3de4
[NFC] RuntimeLibcalls: Prefix the impls with 'Impl_' (#153850)
As noted in #153256, TableGen is generating reserved names for
RuntimeLibcalls, which resulted in a build failure for Arm64EC since
`vcruntime.h` defines `__security_check_cookie` as a macro.

To avoid using reserved names, all impl names will now be prefixed with
`Impl_`.

`NumLibcallImpls` was lifted out as a `constexpr size_t` instead of
being an enum field.

While I was churning the dependent code, I also removed the TODO to move
the impl enum into its own namespace and use an `enum class`: I
experimented with using an `enum class` and adding a namespace, but we
decided it was too verbose so it was dropped.
2025-09-02 09:57:33 -07:00
Sam Tebbs
569d738d4e
[Intrinsics][AArch64] Add intrinsics for masking off aliasing vector lanes (#117007)
It can be unsafe to load a vector from an address and write a vector to
an address if those two addresses have overlapping lanes within a
vectorised loop iteration.

This PR adds intrinsics designed to create a mask with lanes disabled if
they overlap between the two pointer arguments, so that only safe lanes
are loaded, operated on and stored. The `loop.dependence.war.mask`
intrinsic represents cases where the store occurs after the load, and
the opposite for `loop.dependence.raw.mask`. The distinction between
write-after-read and read-after-write is important, since the ordering
of the read and write operations affects if the chain of those
instructions can be done safely.

Along with the two pointer parameters, the intrinsics also take an
immediate that represents the size in bytes of the vector element types.

This will be used by #100579.
2025-09-02 15:35:15 +01:00
daniel-trujillo-bsc
658a931c5b
[CodeGen][RISCV] Add support of RISCV nontemporal to vector predication instructions. (#153033)
This PR adds support for VP intrinsics to be aware of the nontemporal
metadata information.
2025-08-27 15:48:33 -07:00
Matt Arsenault
65d12622fa
RuntimeLibcalls: Add entries for stackprotector globals (#154930)
Add entries for_stack_chk_guard, __ssp_canary_word, __security_cookie,
and __guard_local. As far as I can tell these are all just different
names for the same shaped functionality on different systems.

These aren't really functions, but special global variable names. They
should probably be treated the same way; all the same contexts that
need to know about emittable function names also need to know about
this. This avoids a special case check in IRSymtab.

This isn't a complete change, there's a lot more cleanup which
should be done. The stack protector configuration system is a
complete mess. There are multiple overlapping controls, used in
3 different places. Some of the target control implementations overlap
with conditions used in the emission points, and some use correlated
but not identical conditions in different contexts.

i.e. useLoadStackGuardNode, getIRStackGuard, getSSPStackGuardCheck and
insertSSPDeclarations are all used in inconsistent ways so I don't know
if I've tracked the intention of the system correctly.

The PowerPC test change is a bug fix on linux. Previously the manual
conditions were based around !isOSOpenBSD, which is not the condition
where __stack_chk_guard are used. Now getSDagStackGuard returns the
proper global reference, resulting in LOAD_STACK_GUARD getting a
MachineMemOperand which allows scheduling.
2025-08-23 10:21:00 +09:00
Nikita Popov
498ef361fe
[CodeGen] Make OrigTy in CC lowering the non-aggregate type (#153414)
https://github.com/llvm/llvm-project/pull/152709 exposed the original IR
argument type to the CC lowering logic. However, in SDAG, this used the
raw type, prior to aggregate splitting. This PR changes it to use the
non-aggregate type instead. (This matches what happened in the
GlobalISel case already.)

I've also added some more detailed documentation on the
InputArg/OutputArg fields, to explain how they differ. In most cases
ArgVT is going to be the EVT of OrigTy, so they encode very similar
information (OrigTy just preserves some additional information lost in
EVTs, like pointer types). One case where they do differ is in
post-legalization lowering of libcalls, where ArgVT is going to be a
legalized type, while OrigTy is going to be the original non-legalized
type.
2025-08-13 18:42:26 +02:00
Stephen Long
19ada02086
PreISelIntrinsicLowering: Lower llvm.log to a loop if scalable vec arg (#129744)
Similar to ab976a1, but for llvm.log.
2025-08-12 01:04:28 +09:00
Nikita Popov
e92b7e9641
[CodeGen] Provide original IR type to CC lowering (NFC) (#152709)
It is common to have ABI requirements for illegal types: For example,
two i64 argument parts that originally came from an fp128 argument may
have a different call ABI than ones that came from a i128 argument.

The current calling convention lowering does not provide access to this
information, so backends come up with various hacks to support it (like
additional pre-analysis cached in CCState, or bypassing the default
logic entirely).

This PR adds the original IR type to InputArg/OutputArg and passes it
down to CCAssignFn. It is not actually used anywhere yet, this just does
the mechanical changes to thread through the new argument.
2025-08-11 08:57:53 +02:00
Alexander Richardson
3a4b351ba1
[IR] Introduce the ptrtoaddr instruction
This introduces a new `ptrtoaddr` instruction which is similar to
`ptrtoint` but has two differences:

1) Unlike `ptrtoint`, `ptrtoaddr` does not capture provenance
2) `ptrtoaddr` only extracts (and then extends/truncates) the low
   index-width bits of the pointer

For most architectures, difference 2) does not matter since index (address)
width and pointer representation width are the same, but this does make a
difference for architectures that have pointers that aren't just plain
integer addresses such as AMDGPU fat pointers or CHERI capabilities.

This commit introduces textual and bitcode IR support as well as basic code
generation, but optimization passes do not handle the new instruction yet
so it may result in worse code than using ptrtoint. Follow-up changes will
update capture tracking, etc. for the new instruction.

RFC: https://discourse.llvm.org/t/clarifiying-the-semantics-of-ptrtoint/83987/54

Reviewed By: nikic

Pull Request: https://github.com/llvm/llvm-project/pull/139357
2025-08-08 10:12:39 -07:00
Kazu Hirata
4be22dabc5
[CodeGen] Remove an unnecessary cast (NFC) (#152441)
getActiveBits() already returns unsigned.
2025-08-07 07:22:42 -07:00
Nikita Popov
406d9b1dd6
[CodeGen] Move IsFixed into ArgFlags (NFCI) (#152319)
The information whether a specific argument is vararg or fixed is
currently stored separately from all the other argument information in
ArgFlags. This means that it is not accessible from CCAssign, and
backends have developed all kinds of workarounds for how they can access
it after all.

Move this information to ArgFlags to make it directly available in all
relevant places.

I've opted to invert this and store it as IsVarArg, as I think that both
makes the meaning more obvious and provides for a better default (which
is IsVarArg=false).
2025-08-07 09:12:40 +02:00
Paul Walker
94d374ab6c
[LLVM][CGP] Allow finer control for sinking compares. (#151366)
Compare sinking is selectable based on the result of
hasMultipleConditionRegisters. This function is too coarse grained by
not taking into account the differences between scalar and vector
compares. This PR extends the interface to take an EVT to allow finer
control.
    
The new interface is used by AArch64 to disable sinking of scalable
vector compares, but with isProfitableToSinkOperands updated to maintain
the cases that are specifically tested.
2025-08-05 11:43:41 +01:00
Abhishek Kaushik
1c0ac80d4a
[DAG] Combine store + vselect to masked_store (#145176)
Add a new combine to replace
```
(store ch (vselect cond truevec (load ch ptr offset)) ptr offset)
```
to
```
(mstore ch truevec ptr offset cond)
```

This saves a blend operation on targets that support conditional stores.
2025-08-04 19:05:36 +05:30
jeremyd2019
28b3190053
[LLVM][Cygwin] Enable conditions that are shared with MinGW (#149638)
Cygwin and MinGW share the auto import behavior that could result in
__stack_check_guard being non-dso-local. Allow windres to assume a
Cygwin target as well as a MinGW one, so defines like _WIN32 would not
be present on Cygwin.
2025-07-29 10:01:04 -07:00
Nikita Popov
fe0dbe0f29
[CodeGen] More consistently expand float ops by default (#150597)
These float operations were expanded for scalar f32/f64/f128, but not
for f16 and more problematically, not for vectors. A small subset of
them was separately set to expand for vectors.

Change these to always expand by default, and adjust targets to mark
these as legal where necessary instead.

This is a much safer default, and avoids unnecessary legalization
failures because a target failed to manually mark them as expand.

Fixes https://github.com/llvm/llvm-project/issues/110753.
Fixes https://github.com/llvm/llvm-project/issues/121390.
2025-07-28 09:46:00 +02:00
Matt Arsenault
f4a394fc0c
SafeStack: Check if __safestack_pointer_address is available (#147917)
Start using RuntimeLibcalls in the base implementation of
getSafeStackPointerLocation instead of hardcoding the function
names.
2025-07-15 23:26:52 +09:00
Matt Arsenault
a446300d1b
TargetLowering: Avoid a use of PointerType::getUnqual (#147884)
Use the default globals address space
2025-07-10 19:00:59 +09:00
Matt Arsenault
dc69b00b0a
RuntimeLibcalls: Remove table of soft float compare cond codes (#146082)
Previously we had a table of entries for every Libcall for
the comparison to use against an integer 0 if it was a soft
float compare function. This was only relevant to a handful of
opcodes, so it was wasteful. Now that we can distinguish the
abstract libcall for the compare with the concrete implementation,
we can just directly hardcode the comparison against the libcall
impl without this configuration system.
2025-07-09 17:13:58 +09:00
Matt Arsenault
3697d6dd98
DAG: Fall back to separate sin and cos when softening sincos (#147468)
Fix asserting in the error case.
2025-07-09 01:52:46 +09:00
Dominik Steenken
acdf1c7526
[DAG] Add generic expansion for ISD::FCANONICALIZE nodes (#142105)
This PR takes the work previously done by @pawan-nirpal-031 on X86 in
#106370, and makes it available in common code. This should enable all
targets to use `__builtin_canonicalize` for all `f(16|32|64|128)` data
types.

Canonicalization is implemented here as multiplication by `1.0`, as
suggested in [the
docs](https://llvm.org/docs/LangRef.html#llvm-canonicalize-intrinsic).
2025-07-08 16:12:17 +01:00
Matt Arsenault
b5401624e1
DAG: Add RTLIB::getPOW helper (#147274)
Co-authored-by: Paul Walker <paul.walker@arm.com>
2025-07-07 21:31:49 +09:00
Austin
a550fef906
[llvm] Use llvm::fill instead of std::fill(NFC) (#146911)
Use llvm::fill instead of std::fill
2025-07-04 14:10:28 +08:00