2246 Commits

Author SHA1 Message Date
paperchalice
c53acf0443
[SelectionDAGBuilder] Remove NoNaNsFPMath uses (#169904)
Replaced by checking fast-math flags or value tracking results.
2026-02-09 09:48:07 +08:00
Qinkun Bao
2a74e02a90
Revert "[SelectionDAG] Fix null pointer dereference in resolveDanglingDebugInfo" (#180352)
Reverts llvm/llvm-project#174341

Breaks https://lab.llvm.org/buildbot/#/builders/24/builds/17324
2026-02-07 16:47:17 +00:00
Haoren Wang
9e8caa7834
[SelectionDAG] Fix null pointer dereference in resolveDanglingDebugInfo (#174341)
## Summary
Fix null pointer dereference in
`SelectionDAGBuilder::resolveDanglingDebugInfo`.

## Problem
`Val.getNode()->getIROrder()` is called before checking if
`Val.getNode()` is null, causing crashes when compiling code with debug
info that contains aggregate constants with nested empty structs.

## Solution
Move the `ValSDNodeOrder` declaration inside the `if (Val.getNode())`
block.
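
A minimal sketch of the pattern described above, using hypothetical stand-in types (`Node`, `Value`, `orderIfResolved`) rather than the real `SDNode`/`SDValue` classes. It only illustrates reading the order inside the null check; it is not the actual patch.

```cpp
#include <optional>

// Hypothetical stand-ins for SDNode / SDValue; not the real LLVM classes.
struct Node {
  unsigned IROrder;
  unsigned getIROrder() const { return IROrder; }
};

struct Value {
  Node *N;
  Node *getNode() const { return N; }
};

// Returns the node's IR order, or nothing when the value has no node.
// The order is only read inside the null check, mirroring the fix.
std::optional<unsigned> orderIfResolved(const Value &Val) {
  if (Node *N = Val.getNode()) {
    unsigned ValSDNodeOrder = N->getIROrder();
    return ValSDNodeOrder;
  }
  return std::nullopt;
}
```

Reading the order before the check would dereference a null pointer for the empty-struct case described in the Problem section.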

## Test Case
Reproduces with aggregate types containing nested empty structs:
```llvm
%3 = insertvalue { { i1, {} }, ptr, { { {} }, { {} } }, i64 } 
     { { i1, {} } zeroinitializer, ptr null, { { {} }, { {} } } zeroinitializer, i64 2 }, 
     ptr %2, 1, !dbg !893
```

## Crash stack
```
0.      Program arguments: llc-20 -O3 -mcpu=native -relocation-model=pic -filetype=obj /cloudide/workspace/temp/sf.ll -o /dev/null
1.      Running pass 'Function Pass Manager' on module '/cloudide/workspace/temp/sf.ll'.
2.      Running pass 'X86 DAG->DAG Instruction Selection' on function '@filter_create'
Stack dump without symbol names (ensure you have llvm-symbolizer in your PATH or set the environment var `LLVM_SYMBOLIZER_PATH` to point to it):
0  libLLVM.so.20.1 0x00007ff87ebbdf86 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) + 54
1  libLLVM.so.20.1 0x00007ff87ebbbb90 llvm::sys::RunSignalHandlers() + 80
2  libLLVM.so.20.1 0x00007ff87ebbe640
3  libpthread.so.0 0x00007ff87db79140
4  libLLVM.so.20.1 0x00007ff87f3fd2ff llvm::SelectionDAGBuilder::resolveDanglingDebugInfo(llvm::Value const*, llvm::SDValue) + 303
5  libLLVM.so.20.1 0x00007ff87f3fda5e llvm::SelectionDAGBuilder::getValue(llvm::Value const*) + 142
6  libLLVM.so.20.1 0x00007ff87f3fe79f llvm::SelectionDAGBuilder::getValueImpl(llvm::Value const*) + 3343
7  libLLVM.so.20.1 0x00007ff87f3fda34 llvm::SelectionDAGBuilder::getValue(llvm::Value const*) + 100
8  libLLVM.so.20.1 0x00007ff87f3fc1ab llvm::SelectionDAGBuilder::visitInsertValue(llvm::InsertValueInst const&) + 603
9  libLLVM.so.20.1 0x00007ff87f3eeaf7 llvm::SelectionDAGBuilder::visit(llvm::Instruction const&) + 327
10 libLLVM.so.20.1 0x00007ff87f4904b8 llvm::SelectionDAGISel::SelectBasicBlock(llvm::ilist_iterator_w_bits<llvm::ilist_detail::node_options<llvm::Instruction, false, false, void, true, llvm::BasicBlock>, false, true>, llvm::ilist_iterator_w_bits<llvm::ilist_detail::node_options<llvm::Instruction, false, false, void, true, llvm::BasicBlock>, false, true>, bool&) + 72
11 libLLVM.so.20.1 0x00007ff87f490304 llvm::SelectionDAGISel::SelectAllBasicBlocks(llvm::Function const&) + 5956
12 libLLVM.so.20.1 0x00007ff87f48e2b4 llvm::SelectionDAGISel::runOnMachineFunction(llvm::MachineFunction&) + 372
13 libLLVM.so.20.1 0x00007ff87f48c689 llvm::SelectionDAGISelLegacy::runOnMachineFunction(llvm::MachineFunction&) + 169
14 libLLVM.so.20.1 0x00007ff87efb8e32 llvm::MachineFunctionPass::runOnFunction(llvm::Function&) + 610
15 libLLVM.so.20.1 0x00007ff87ed104be llvm::FPPassManager::runOnFunction(llvm::Function&) + 638
16 libLLVM.so.20.1 0x00007ff87ed15ff3 llvm::FPPassManager::runOnModule(llvm::Module&) + 51
17 libLLVM.so.20.1 0x00007ff87ed10c11 llvm::legacy::PassManagerImpl::run(llvm::Module&) + 1105
18 llc-20          0x000055972ce77dc1 main + 9649
19 libc.so.6       0x00007ff87d68ad7a __libc_start_main + 234
20 llc-20          0x000055972ce7247a _start + 42
```

## Testing

Added regression tests in:
- `CodeGen/X86/selectiondag-dbgvalue-null-crash.ll`
- `CodeGen/AArch64/selectiondag-dbgvalue-null-crash.ll`

**Note:** Tests appear to expose deeper issues in DWARF generation on
certain targets (Darwin targets for example) that require further
investigation.

## Related PRs

This supersedes:
- #173500 - Initial fix, reverted due to test failures on Darwin and
other platforms
- #173836 - Second attempt with `UNSUPPORTED: system-darwin`, still
failed on some targets
2026-02-07 13:00:30 +01:00
Peter Collingbourne
191af6c254
Add llvm.cond.loop intrinsic.
The llvm.cond.loop intrinsic is semantically equivalent to a conditional
branch conditioned on ``pred`` to a basic block consisting only of an
unconditional branch to itself. Unlike such a branch, it is guaranteed
to use specific instructions. This allows an interrupt handler or
other introspection mechanism to straightforwardly detect whether
the program is currently spinning in the infinite loop and possibly
terminate the program if so. The intent is that this intrinsic may
be used as a more efficient alternative to a conditional branch to
a call to ``llvm.trap`` in circumstances where the loop detection
is guaranteed to be present. This construct has been experimentally
determined to be executed more efficiently (when the branch is not taken)
than a conditional branch to a trap instruction on AMD and older Intel
microarchitectures, and is also more code size efficient by avoiding the
need to emit a trap instruction and possibly a long branch instruction.

On i386 and x86_64, the infinite loop is guaranteed to consist of a short
conditional branch instruction that branches to itself. Specifically,
the first byte of the instruction will be between 0x70 and 0x7F, and
the second byte will be 0xFE.
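
As an illustration of how an introspection mechanism might apply the byte pattern above, here is a small sketch. The function name `isSelfBranchingJcc` is hypothetical, and the caller is assumed to pass a pointer to at least two readable bytes at the interrupted program counter.

```cpp
#include <cstdint>

// Returns true if the two bytes at `pc` encode a short conditional branch
// (first byte 0x70..0x7F) whose second byte is 0xFE. As an 8-bit signed
// displacement 0xFE is -2, so the two-byte instruction targets itself.
bool isSelfBranchingJcc(const std::uint8_t *pc) {
  const std::uint8_t opcode = pc[0];
  const std::uint8_t rel8 = pc[1];
  return opcode >= 0x70 && opcode <= 0x7F && rel8 == 0xFE;
}
```

An unconditional short jump to itself (`0xEB 0xFE`) is deliberately not matched, since the text above only guarantees the conditional form.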

Part of this RFC:
https://discourse.llvm.org/t/rfc-optimizing-conditional-traps/89456

Reviewers: arsenm, RKSimon, fmayer, vitalybuka

Pull Request: https://github.com/llvm/llvm-project/pull/177686
2026-02-06 17:11:15 -08:00
keremsahn
f6e130682f
[SelectionDAG] Mark LowerTypeTests as required and remove intrinsic handling from #142939 (#179249)
Fixes #179125
2026-02-05 11:16:48 +01:00
Nicolai Hähnle
af836ff60c
[CodeGen] Add getTgtMemIntrinsic overload for multiple memory operands (NFC) (#175843)
There are target intrinsics that logically require two MMOs, such as
llvm.amdgcn.global.load.lds, which is a copy from global memory to LDS,
so there's both a load and a store to different addresses.

Add an overload of getTgtMemIntrinsic that produces intrinsic info in a
vector, and implement it in terms of the existing (now protected)
overload.

GlobalISel and SelectionDAG paths are updated to support multiple MMOs.
The main part of this change is supporting multiple MMOs in
MemIntrinsicNodes.

Converting the backends to using the new overload is a fairly mechanical step
that is done in a separate change in the hope that that allows reducing merging
pains during review and for downstreams. A later change will then enable
using multiple MMOs in AMDGPU.
2026-02-02 21:58:42 +00:00
zhijian lin
dc520ea4af
[PowerPC] using milicode call for strcmp instead of lib call (#177009)
AIX has "millicode" routines, which are functions loaded at boot time
into fixed addresses in kernel memory. This allows them to be customized
for the processor. The __strcmp routine is a millicode implementation;
we use millicode for the strcmp function instead of a library call to
improve performance.
2026-02-02 09:34:53 -05:00
Wei Xiao
ea251669ba
[CodeGen] Fix MachineMemOperand Size of MaskedLoad (#156398)
Fix MIR printing unknown-size issue of MaskedLoad.
2026-01-29 18:37:49 +00:00
Jameson Nash
b7c1a6f8b4
[CodeGen] Only use actual alloca alignment (#178361)
Remove getPrefTypeAlign calls and use only the alloca's explicit
alignment. Since the type may not be semantically useful, there is no
reason to change the alignment to support it.

The alloca's explicit alignment (from getAlign()) is already optimally
correct; we don't need to derive alignment from the allocated type.

Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-28 22:49:19 -05:00
Nikita Popov
1bad00adc4
[SDAG] Remove non-canonical fabs libcall handling (#177967)
This is a followup to https://github.com/llvm/llvm-project/pull/171288,
which removed lowering of libcalls to SDAG nodes for most libcalls that
get unconditionally canonicalized to intrinsics. This handles the
remaining fabs case, which I originally skipped due to larger test
impact.
2026-01-26 15:11:17 +00:00
Luke Lau
cee36b23cc
[IR] Allow non-constant offsets in @llvm.vector.splice.{left,right} (#174693)
Following on from #170796, this PR implements the second part of
https://discourse.llvm.org/t/rfc-allow-non-constant-offsets-in-llvm-vector-splice/88974
by allowing non-constant offsets in the vector splice intrinsics.

Previously @llvm.vector.splice had a restriction enforced by the
verifier that the offset had to be known to be within the range of the
vector at compile time. Because we can't enforce this with non-constant
offsets, it's been relaxed so that offsets that would slide the vector
out of bounds return a poison value, similar to
insertelement/extractelement.

@llvm.vector.splice.left also previously only allowed offsets within the
range 0 <= Offset < N, but this has been relaxed to 0 <= Offset <= N so
that it's consistent with @llvm.vector.splice.right.
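
A rough reference model of the relaxed semantics, for fixed-length vectors only. It assumes the indexing carries over from the old @llvm.vector.splice (left reads `concat(a, b)` starting at `offset`, right starts at `N - offset`), and it uses `std::nullopt` as a stand-in for poison. The names `spliceLeft` and `spliceRight` are illustrative.

```cpp
#include <array>
#include <cstddef>
#include <optional>

// Model of splice.left: result[i] = concat(a, b)[offset + i].
// Offsets outside 0 <= offset <= N yield nullopt (standing in for poison).
template <typename T, std::size_t N>
std::optional<std::array<T, N>> spliceLeft(const std::array<T, N> &a,
                                           const std::array<T, N> &b,
                                           std::size_t offset) {
  if (offset > N)
    return std::nullopt;
  std::array<T, N> out{};
  for (std::size_t i = 0; i < N; ++i) {
    const std::size_t idx = offset + i; // index into concat(a, b)
    out[i] = idx < N ? a[idx] : b[idx - N];
  }
  return out;
}

// Model of splice.right: result[i] = concat(a, b)[N - offset + i].
template <typename T, std::size_t N>
std::optional<std::array<T, N>> spliceRight(const std::array<T, N> &a,
                                            const std::array<T, N> &b,
                                            std::size_t offset) {
  if (offset > N)
    return std::nullopt;
  std::array<T, N> out{};
  for (std::size_t i = 0; i < N; ++i) {
    const std::size_t idx = N - offset + i; // cannot underflow: offset <= N
    out[i] = idx < N ? a[idx] : b[idx - N];
  }
  return out;
}
```

With `offset == N` allowed on both sides, the left form returns `b` unchanged and the right form returns `a` unchanged.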

In lieu of the verifier checks that were removed, InstSimplify has been
taught to fold splices to poison when the offset is out of bounds.

The cost model isn't implemented in this PR, and just returns invalid
for any non-constant offsets for now. I think the correct way to cost
these non-constant offsets isn't through getShuffleCost, because it
can't handle variable masks, but instead through
getIntrinsicInstCost.
2026-01-21 10:58:40 +00:00
Matt Arsenault
0d4a35d560
IR: Remove llvm.convert.to.fp16 and llvm.convert.from.fp16 intrinsics (#174484)
These are long overdue for removal. These were originally a hack
to support loading half values before there was any / decent support
for the half type through the backend. There's no reason to continue
supporting these; they're equivalent to fpext/fptrunc with a bitcast.

SelectionDAG stopped translating these directly, and used the
bitcast + fp cast since f7a02c17628e825, so there's been no reason
to use these since 2014.
2026-01-21 09:50:28 +00:00
Matt Arsenault
aa57ee958d
CodeGen: Use LibcallLoweringInfo for stack protector insertion (#176829)
Thread LibcallLoweringInfo into the TargetLowering hooks used
by the stack protector passes.
2026-01-20 12:37:31 +01:00
Jameson Nash
ba2bd3fbba
Use AllocaInst::getAllocationSize instead of manual size calculations (#176486)
Replace patterns that manually compute allocation sizes by multiplying
getTypeAllocSize(getAllocatedType()) by the array size with calls to the
getAllocationSize(DL) API, which handles this correctly and concisely,
returning nullopt for VLAs.
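
To illustrate the difference, here is a simplified model of the computation. `AllocaModel` and `allocationSize` are hypothetical stand-ins (the real API works on `AllocaInst` with a `DataLayout`); the sketch ignores scalable sizes and multiplication overflow.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical model of an alloca: the element's allocation size in bytes
// plus an array count that is only known when it is a constant.
struct AllocaModel {
  std::uint64_t ElemAllocSize;
  std::optional<std::uint64_t> ConstArrayCount; // nullopt models a VLA
};

// Total size in bytes, or nullopt when the array count is not constant.
std::optional<std::uint64_t> allocationSize(const AllocaModel &A) {
  if (!A.ConstArrayCount)
    return std::nullopt;
  return A.ElemAllocSize * *A.ConstArrayCount;
}
```

Code that used only the element size would report 4 bytes for an array of ten 4-byte elements, which is the kind of mistake the commit message refers to.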

This fixes several places that were not accounting for array allocations
when computing sizes, simplifies code that was doing this manually, and
adds some explicit isFixed checks where implicit conversion was being used.

This PR is because now that we have opaque pointers, I hate that some
AllocaInst still has type information being consumed by some passes
instead of just using the size, since passes rarely handle that type
information well or correctly. I hope this will grow into a sequence of
commits to slowly eliminate uses of getAllocatedType from AllocaInst.
And similarly later to remove type information from GlobalValue too (it
can be replaced with just dereferenceable bytes, similar to arguments).

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-19 09:55:52 -05:00
Nikita Popov
792670a400
[X86][WinEH] Insert nop after unwinding inline assembly (#176393)
As discussed on https://github.com/llvm/llvm-project/pull/144745, insert
a nop after unwinding inline assembly, as it may end on a call.

While the change itself is trivial, I ended up having to do two
infrastructure changes:
* The unwind flag needs to be propagated to ExtraInfo of the
MachineInstr.
* The MachineInstr needs to be passed through to emitInlineAsmEnd(), and
the method needs to be non-const.

Fixes https://github.com/llvm/llvm-project/issues/157073.
2026-01-19 09:09:04 +01:00
zhijian lin
7b90f426a6
[PowerPC] using milicode call for strstr instead of lib call (#176002)
AIX has "millicode" routines, which are functions loaded at boot time
into fixed addresses in kernel memory. This allows them to be customized
for the processor. The __strstr routine is a millicode implementation;
we use millicode for the strstr function instead of a library call to
improve performance.

This patch adds a helper function, `getRuntimeCallSDValueHelper`. I
will refactor the functions `SelectionDAG::getStrlen`,
`SelectionDAG::getStrcpy`, etc. later in another patch.
2026-01-15 14:58:17 -05:00
Ramkumar Ramachandra
d69335bac9
[LLVM] Clean up code using [not_]equal_to (NFC) (#175824)
Use llvm::[not_]equal_to, which landed in d2a521750 ([ADT] Introduce
bind_{front,back}, [not_]equal_to, #175056), across LLVM for cleaner
code.
2026-01-13 21:19:39 +00:00
zhijian lin
b983b0e92a
[PowerPC] using milicode call for strcpy instead of lib call (#174782)
AIX has "millicode" routines, which are functions loaded at boot time
into fixed addresses in kernel memory. This allows them to be customized
for the processor. The __strcpy routine is a millicode implementation;
we use millicode for the strcpy function instead of a library call to
improve performance.

---------

Co-authored-by: Matt Arsenault <arsenm2@gmail.com>
2026-01-12 08:58:45 -05:00
moorabbit
a5fa246435
[Clang] Add __builtin_stack_address (#148281)
Add support for `__builtin_stack_address` builtin. The semantics match
those of GCC's builtin with the same name.

`__builtin_stack_address` returns the starting address of the stack
region that may be used by called functions. It may or may not include
the space used for on-stack arguments passed to a callee (See [GCC
Bug/121013](https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121013)).

Fixes #82632.
2026-01-12 10:01:57 +01:00
Luke Lau
ad4bfac732
[IR] Split vector.splice into vector.splice.left and vector.splice.right (#170796)
This PR implements the first change outlined in
https://discourse.llvm.org/t/rfc-allow-non-constant-offsets-in-llvm-vector-splice/88974?u=lukel

In order to allow non-immediate offsets in the llvm.vector.splice
intrinsic, we need to separate out the "shift left" and "shift right"
modes into two separate intrinsics, which were previously determined by
whether or not the offset is positive or negative.

The description in the LangRef has also been reworded in terms of
sliding elements left or right and extracting either the upper or lower
half as opposed to extracting from a certain index, which brings it
in line with the definition of `llvm.fshr.*`/`llvm.fshl.*`.

This patch teaches AutoUpgrade.cpp to upgrade the old intrinsics into
their new equivalent one based on their offset, so existing uses of
vector.splice should still work.
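
The upgrade rule can be pictured roughly as follows. This is a sketch under the assumption that a non-negative immediate maps to the left form with the same offset and a negative immediate maps to the right form with its magnitude; `SpliceKind`, `UpgradedSplice` and `upgradeSplice` are illustrative names, not code from AutoUpgrade.cpp.

```cpp
#include <cstdint>

enum class SpliceKind { Left, Right };

struct UpgradedSplice {
  SpliceKind Kind;
  std::uint64_t Offset;
};

// Assumed mapping from the old signed immediate to the new intrinsics.
UpgradedSplice upgradeSplice(std::int64_t Imm) {
  if (Imm >= 0)
    return {SpliceKind::Left, static_cast<std::uint64_t>(Imm)};
  // Negate in unsigned arithmetic so the most negative value is handled.
  return {SpliceKind::Right, 0 - static_cast<std::uint64_t>(Imm)};
}
```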

Uses of llvm.vector.splice in `llvm/test/CodeGen` haven't been replaced
in this PR to keep the diff small and kick the tyres on the AutoUpgrader
a bit. I planned to do this in a follow up NFC but can include it in
this PR if reviewers prefer.

Similarly the shuffle costing kind `SK_Splice` has just been kept the
same for now, to be split into `SK_SpliceLeft` and `SK_SpliceRight`
later.
2026-01-06 15:41:26 +08:00
Ramkumar Ramachandra
9e5e267a03
[ISel] Introduce llvm.clmul intrinsic (#168731)
In line with a std proposal, introduce the llvm.clmul family of
intrinsics corresponding to carry-less multiply operations. This work
builds upon 727ee7e ([APInt] Introduce carry-less multiply primitives),
and follow-up patches will introduce custom-lowering on supported
targets, replacing target-specific clmul intrinsics.
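
For readers unfamiliar with the operation, a scalar reference for carry-less multiplication might look like the sketch below. It assumes a 64-bit operand width with the result truncated to that width; `clmul64` is an illustrative name and not part of the patch.

```cpp
#include <cstdint>

// Carry-less multiply, truncated to 64 bits: like schoolbook binary
// multiplication, but partial products are combined with XOR, so no
// carries propagate between bit positions.
std::uint64_t clmul64(std::uint64_t a, std::uint64_t b) {
  std::uint64_t result = 0;
  for (unsigned i = 0; i < 64; ++i)
    if ((b >> i) & 1)
      result ^= a << i;
  return result;
}
```

For example, `clmul64(3, 3)` is 5 rather than 9, because the two middle partial-product bits cancel under XOR instead of carrying.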

Testing is done on the RISC-V target, which should be sufficient to
prove that the intrinsics work, since no RISC-V specific lowering has
been added.

Ref: https://isocpp.org/files/papers/P3642R3.html

Co-authored-by: Craig Topper <craig.topper@sifive.com>
2026-01-05 20:24:06 +00:00
Benjamin Maxwell
fe3b4f0e0d
[SDAG] Use reference type in loop (NFC) (#174379)
Fixes a -Wrange-loop-construct warning.
2026-01-05 10:42:41 +00:00
Benjamin Maxwell
a9fee3127a
[SDAG] Avoid crash when creating debug fragments for scalable vectors (#165233)
Previously, we would crash in the SelectionDAGBuilder when attempting to
create debug fragments for scalable vectors split across multiple
registers.

It does not seem like DW_OP_LLVM_fragment supports any notion of
scalable type sizes. It takes both an offset and typesize as literals,
with no indication of scalability (and it also does not seem to be
considered in any of the places that handle DW_OP_LLVM_fragment). So the
workaround here is to drop the debug info.

Note: This is not usually an issue for IR that comes from the SVE ACLE,
as we generally stick to using legal types there (that don't end up
getting split).

Workaround for: #161289
2026-01-04 09:53:58 +00:00
Sergei Barannikov
4534edb3f7
[SelectionDAG] Fix operand of BRCOND in visitSPDescriptorParent (#174230)
The first operand should be a chain, but `GuardVal.getOperand(0)` isn't
always a chain (i.e. if `TLI.emitStackGuardXorFP()` is called). Use
`getControlRoot()` instead like in other places when creating terminator
nodes.

Extracted from #168421.
2026-01-02 19:08:28 +00:00
Leandro Lupori
25acd42fcc
Revert "[aarch64] Mix the frame pointer with the stack cookie when protecting the stack (#161114)" (#173987)
This reverts commit b6bfa856860bb4304e635102872a4c994af101b4.

This commit broke Windows on Arm bots.
2025-12-30 10:58:01 -03:00
Nikita Popov
8ea8f682f7
Revert "[SelectionDAG] Fix null pointer dereference in resolveDanglingDebugInfo" (#173925)
Reverts llvm/llvm-project#173500.

Test fails depending on the host system.
2025-12-29 22:05:17 +00:00
Mikołaj Piróg
25d2a5b51f
[NFC] Rename variables to FPOp (#173792)
In my earlier PR (https://github.com/llvm/llvm-project/pull/167574),
I named a variable in the fpext function incorrectly. This changes the
name in both functions to the generic FPOp.
2025-12-28 22:00:01 +01:00
Islam Imad
7ceecfad40
[CodeGen] Fix EVT::changeVectorElementType assertion on simple-to-extended fallback (#173413)
Fixes #171608
2025-12-28 18:51:18 +00:00
MetalOxideSemi
7a3bbf724d
[SelectionDAG] Fix null pointer dereference in resolveDanglingDebugInfo (#173500)
## Summary
Fix null pointer dereference in
`SelectionDAGBuilder::resolveDanglingDebugInfo`.

## Problem
`Val.getNode()->getIROrder()` is called before checking if
`Val.getNode()` is null, causing crashes when compiling code with debug
info that contains aggregate constants with nested empty structs.

## Solution
Move the `ValSDNodeOrder` declaration inside the `if (Val.getNode())`
block.

## Test Case
Reproduces with aggregate types containing nested empty structs:
```llvm
%3 = insertvalue { { i1, {} }, ptr, { { {} }, { {} } }, i64 } 
     { { i1, {} } zeroinitializer, ptr null, { { {} }, { {} } } zeroinitializer, i64 2 }, 
     ptr %2, 1, !dbg !893
```

## Crash stack
```
0.      Program arguments: llc-20 -O3 -mcpu=native -relocation-model=pic -filetype=obj /cloudide/workspace/temp/sf.ll -o /dev/null
1.      Running pass 'Function Pass Manager' on module '/cloudide/workspace/temp/sf.ll'.
2.      Running pass 'X86 DAG->DAG Instruction Selection' on function '@filter_create'
Stack dump without symbol names (ensure you have llvm-symbolizer in your PATH or set the environment var `LLVM_SYMBOLIZER_PATH` to point to it):
0  libLLVM.so.20.1 0x00007ff87ebbdf86 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) + 54
1  libLLVM.so.20.1 0x00007ff87ebbbb90 llvm::sys::RunSignalHandlers() + 80
2  libLLVM.so.20.1 0x00007ff87ebbe640
3  libpthread.so.0 0x00007ff87db79140
4  libLLVM.so.20.1 0x00007ff87f3fd2ff llvm::SelectionDAGBuilder::resolveDanglingDebugInfo(llvm::Value const*, llvm::SDValue) + 303
5  libLLVM.so.20.1 0x00007ff87f3fda5e llvm::SelectionDAGBuilder::getValue(llvm::Value const*) + 142
6  libLLVM.so.20.1 0x00007ff87f3fe79f llvm::SelectionDAGBuilder::getValueImpl(llvm::Value const*) + 3343
7  libLLVM.so.20.1 0x00007ff87f3fda34 llvm::SelectionDAGBuilder::getValue(llvm::Value const*) + 100
8  libLLVM.so.20.1 0x00007ff87f3fc1ab llvm::SelectionDAGBuilder::visitInsertValue(llvm::InsertValueInst const&) + 603
9  libLLVM.so.20.1 0x00007ff87f3eeaf7 llvm::SelectionDAGBuilder::visit(llvm::Instruction const&) + 327
10 libLLVM.so.20.1 0x00007ff87f4904b8 llvm::SelectionDAGISel::SelectBasicBlock(llvm::ilist_iterator_w_bits<llvm::ilist_detail::node_options<llvm::Instruction, false, false, void, true, llvm::BasicBlock>, false, true>, llvm::ilist_iterator_w_bits<llvm::ilist_detail::node_options<llvm::Instruction, false, false, void, true, llvm::BasicBlock>, false, true>, bool&) + 72
11 libLLVM.so.20.1 0x00007ff87f490304 llvm::SelectionDAGISel::SelectAllBasicBlocks(llvm::Function const&) + 5956
12 libLLVM.so.20.1 0x00007ff87f48e2b4 llvm::SelectionDAGISel::runOnMachineFunction(llvm::MachineFunction&) + 372
13 libLLVM.so.20.1 0x00007ff87f48c689 llvm::SelectionDAGISelLegacy::runOnMachineFunction(llvm::MachineFunction&) + 169
14 libLLVM.so.20.1 0x00007ff87efb8e32 llvm::MachineFunctionPass::runOnFunction(llvm::Function&) + 610
15 libLLVM.so.20.1 0x00007ff87ed104be llvm::FPPassManager::runOnFunction(llvm::Function&) + 638
16 libLLVM.so.20.1 0x00007ff87ed15ff3 llvm::FPPassManager::runOnModule(llvm::Module&) + 51
17 libLLVM.so.20.1 0x00007ff87ed10c11 llvm::legacy::PassManagerImpl::run(llvm::Module&) + 1105
18 llc-20          0x000055972ce77dc1 main + 9649
19 libc.so.6       0x00007ff87d68ad7a __libc_start_main + 234
20 llc-20          0x000055972ce7247a _start + 42
```
2025-12-28 18:00:46 +00:00
Craig Topper
877df9e4b9
[SelectionDAG] Make SSHLSAT/USHLSAT obey getShiftAmountTy(). (#173216)
Treat these like other shift operations by allowing the shift amount to
be a different type than the result.

The PromoteIntOp_Shift and LegalizeDAG code are not tested due to lack
of target support.

I'm looking at adding SSHLSAT for the RISC-V P extension. I don't need
this support for that since RISC-V only has one legal type. I just thought it
was odd that they weren't like other shifts.
2025-12-22 10:28:04 -08:00
Jonas Paulsson
100077dbff
[SelectionDAGBuilder] Don't add base offset in LowerFormalArguments(). (#170732)
LowerCallTo() and LowerArguments() are both providing the PartOffset field for
each split argument part. As these two methods are intended to work together,
they should both provide the same offsets. However, LowerArguments() has been
providing the offset from the beginning of the struct while LowerCallTo() sets it
relative to the first split part.

This patch removes the PartBase variable in LowerArguments() so that the behavior
matches LowerCallTo(): offsets to split parts of an argument are relative to the first
part of the argument.
2025-12-19 11:27:07 -06:00
Pan Tao
b6bfa85686
[aarch64] Mix the frame pointer with the stack cookie when protecting the stack (#161114)
This strengthens the guard and matches MSVC.

Fixes #156573.
2025-12-17 12:52:28 -08:00
Sam Tebbs
19e1011df5
[SelectionDAG] Fix unsafe cases for loop.dependence.{war/raw}.mask (#168565)
Both `LOOP_DEPENDENCE_WAR_MASK` and `LOOP_DEPENDENCE_RAW_MASK` are
currently hard to split correctly, and there are a number of incorrect
cases.

The difficulty comes from how the intrinsics are defined. For example,
take `LOOP_DEPENDENCE_WAR_MASK`.

It is defined as the OR of:

* `(ptrB - ptrA) <= 0`
* `elementSize * lane < (ptrB - ptrA)`

Now, if we want to split a loop dependence mask for the high half of the
mask we want to compute:

* `(ptrB - ptrA) <= 0`
* `elementSize * (lane + LoVT.getElementCount()) < (ptrB - ptrA)`

However, with the current opcode definitions, we can only modify ptrA or
ptrB, which may change the result of the first case, which should be
invariant to the lane.

This patch resolves these cases by adding a "lane offset" to the ISD
opcodes. The lane offset is always a constant. For scalable masks, it is
implicitly multiplied by vscale.

This makes splitting trivial as we increment the lane offset by
`LoVT.getElementCount()` now.
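
The effect of the lane offset can be checked with a small fixed-length model of the WAR definition given above. This is only a sketch: `warMask` and its parameters are illustrative names, pointers are modelled as plain integers, and scalable vectors (where the offset is scaled by vscale) are not covered.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Lane i of the mask is true when (ptrB - ptrA) <= 0, or when
// elementSize * (i + laneOffset) < (ptrB - ptrA).
std::vector<bool> warMask(std::int64_t ptrA, std::int64_t ptrB,
                          std::int64_t elementSize, std::size_t numLanes,
                          std::size_t laneOffset) {
  const std::int64_t diff = ptrB - ptrA;
  std::vector<bool> mask(numLanes);
  for (std::size_t lane = 0; lane < numLanes; ++lane) {
    const auto idx = static_cast<std::int64_t>(lane + laneOffset);
    mask[lane] = diff <= 0 || elementSize * idx < diff;
  }
  return mask;
}
```

Splitting an 8-lane mask then amounts to computing a 4-lane mask with offset 0 and another with offset 4; the pointer operands, and therefore the lane-invariant first condition, stay untouched.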

Note: In the AArch64 backend, we only support zero lane offsets (as
other cases are tricky to lower to whilewr/rw).

---------

Co-authored-by: Benjamin Maxwell <benjamin.maxwell@arm.com>
2025-12-12 08:44:33 +00:00
Nikita Popov
5a24dfa339
[SDAG] Remove most non-canonical libcall handing (#171288)
This is a followup to https://github.com/llvm/llvm-project/pull/171114,
removing the handling for most libcalls that are already canonicalized
to intrinsics in the middle-end. The only remaining one is fabs, which
has more test coverage than the others.
2025-12-10 11:45:26 +01:00
Nikita Popov
d5b3ba6596
[SDAG] Don't handle non-canonical libcalls in SDAG lowering (#171114)
SDAG currently tries to lower certain libcalls to ISD opcodes. However,
many of these are already canonicalized from libcalls to intrinsics in
the middle-end (and often already emitted as intrinsics in the
front-end).

I believe that SDAG should not be doing anything for such libcalls. This
PR just drops a single libcall to get consensus on the direction, as
these changes need a non-trivial amount of test updates.

A lot of the remaining libcalls *should* probably also be canonicalized
to intrinsics in the middle-end when annotated with `memory(none)`, but
that would require additional work in SimplifyLibCalls.
2025-12-09 08:07:33 +01:00
Robert Imschweiler
e84fdbe1ef
[IR] Add CallBr intrinsics support (#133907)
This commit adds support for using intrinsics with callbr. The uses of
this will most of the time look like this example:
```llvm
  callbr void @llvm.amdgcn.kill(i1 %c) to label %cont [label %kill]
kill:
  unreachable
cont:
  ...
```
2025-12-04 10:21:00 +01:00
Luke Lau
d1500d12be
[SelectionDAG] Add SelectionDAG::getTypeSize. NFC (#169764)
Similar to how getElementCount avoids the need to reason about fixed and
scalable ElementCounts separately, this patch adds getTypeSize to do the
same for TypeSize.

It also goes through and replaces some of the manual uses of getVScale
with getTypeSize/getElementCount where possible.
2025-12-01 10:33:50 +00:00
Peter Collingbourne
6227eb90da
Add IR and codegen support for deactivation symbols.
Deactivation symbols are a mechanism for allowing object files to disable
specific instructions in other object files at link time. The initial use
case is for pointer field protection.

For more information, see the RFC:
https://discourse.llvm.org/t/rfc-deactivation-symbols/85556

Reviewers: ojhunt, nikic, fmayer, arsenm, ahmedbougacha

Reviewed By: fmayer

Pull Request: https://github.com/llvm/llvm-project/pull/133536
2025-11-26 12:37:09 -08:00
Drew Kersnar
17852deda7
[NVPTX] Lower LLVM masked vector loads and stores to PTX (#159387)
This backend support will allow the LoadStoreVectorizer, in certain
cases, to fill in gaps when creating load/store vectors and generate
LLVM masked load/stores
(https://llvm.org/docs/LangRef.html#llvm-masked-store-intrinsics). To
accomplish this, changes are separated into two parts. This first part
has the backend lowering and TTI changes, and a follow up PR will have
the LSV generate these intrinsics:
https://github.com/llvm/llvm-project/pull/159388.

In this backend change, Masked Loads get lowered to PTX with `#pragma
"used_bytes_mask" [mask];`
(https://docs.nvidia.com/cuda/parallel-thread-execution/#pragma-strings-used-bytes-mask).
And Masked Stores get lowered to PTX using the new sink symbol syntax
(https://docs.nvidia.com/cuda/parallel-thread-execution/#data-movement-and-conversion-instructions-st).

# TTI Changes
TTI changes are needed because NVPTX only supports masked loads/stores
with _constant_ masks. `ScalarizeMaskedMemIntrin.cpp` is adjusted to
check that the mask is constant and pass that result into the TTI check.
Behavior shouldn't change for non-NVPTX targets, which do not care
whether the mask is variable or constant when determining legality, but
all TTI files that implement these APIs need to be updated.

# Masked store lowering implementation details
If the masked stores make it to the NVPTX backend without being
scalarized, they are handled by the following:
* `NVPTXISelLowering.cpp` - Sets up a custom operation action and
handles it in lowerMSTORE. Similar handling to normal store vectors,
except we read the mask and place a sentinel register `$noreg` in each
position where the mask reads as false.

For example, 
```
t10: v8i1 = BUILD_VECTOR Constant:i1<-1>, Constant:i1<0>, Constant:i1<0>, Constant:i1<-1>, Constant:i1<-1>, Constant:i1<0>, Constant:i1<0>, Constant:i1<-1>
t11: ch = masked_store<(store unknown-size into %ir.lsr.iv28, align 32, addrspace 1)> t5:1, t5, t7, undef:i64, t10

->

STV_i32_v8 killed %13:int32regs, $noreg, $noreg, killed %16:int32regs, killed %17:int32regs, $noreg, $noreg, killed %20:int32regs, 0, 0, 1, 8, 0, 32, %4:int64regs, 0, debug-location !18 :: (store unknown-size into %ir.lsr.iv28, align 32, addrspace 1);

```

* `NVPTXInstrInfo.td` - changes the definition of store vectors to allow
for a mix of sink symbols and registers.
* `NVPTXInstPrinter.h/.cpp` - Handles the `$noreg` case by printing "_".
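
The store-side idea (masked-off lanes become sink symbols) can be sketched independently of the backend. The helper below is purely illustrative: `formatStoreOperands` is a hypothetical name, it works on strings rather than machine operands, and it does not reproduce the actual instruction printer.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Builds a comma-separated source operand list for a vector store,
// printing "_" for every lane whose mask bit is false. Lanes beyond the
// end of the mask are treated as inactive.
std::string formatStoreOperands(const std::vector<std::string> &regs,
                                const std::vector<bool> &mask) {
  std::string out;
  for (std::size_t i = 0; i < regs.size(); ++i) {
    if (i != 0)
      out += ", ";
    const bool active = i < mask.size() && mask[i];
    out += active ? regs[i] : std::string("_");
  }
  return out;
}
```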

# Masked load lowering implementation details
Masked loads are routed to normal PTX loads, with one difference: a
`#pragma "used_bytes_mask"` is emitted before the load instruction
(https://docs.nvidia.com/cuda/parallel-thread-execution/#pragma-strings-used-bytes-mask).
To accomplish this, a new operand is added to every NVPTXISD Load type
representing this mask.
* `NVPTXISelLowering.h/.cpp` - Masked loads are converted into normal
NVPTXISD loads with a mask operand in two ways. 1) In type legalization
through replaceLoadVector, which is the normal path, and 2) through
LowerMLOAD, to handle the legal vector types
(v2f16/v2bf16/v2i16/v4i8/v2f32) that will not be type legalized. Both
share the same convertMLOADToLoadWithUsedBytesMask helper. Both default
this operand to UINT32_MAX, representing all bytes on. For the latter,
we need a new `NVPTXISD::MLoadV1` type to represent that edge case
because we cannot put the used bytes mask operand on a generic
LoadSDNode.
* `NVPTXISelDAGToDAG.cpp` - Extract used bytes mask from loads, add them
to created machine instructions.
* `NVPTXInstPrinter.h/.cpp` - Print the pragma when the used bytes mask
isn't all ones.
* `NVPTXForwardParams.cpp`, `NVPTXReplaceImageHandles.cpp` - Update
manual indexing of load operands to account for new operand.
* `NVPTXInstrInfo.td`, `NVPTXIntrinsics.td` - Add the used bytes mask to
the MI definitions.
* `NVPTXTagInvariantLoads.cpp` - Ensure that masked loads also get
tagged as invariant.

Some generic changes that are needed:
* `LegalizeVectorTypes.cpp` - Ensure flags are preserved when splitting
masked loads.
* `SelectionDAGBuilder.cpp` - Preserve `MD_invariant_load` on masked
load SDNode creation
2025-11-25 10:26:15 -06:00
Matt Arsenault
db20a7f2bc
DAG: Fix constructing a temporary TargetTransformInfo instance (#168480)
2025-11-20 01:19:23 -05:00
Mikołaj Piróg
e7b41df10e
[SelectionDAGBuilder] Propagate fast-math flags to fpext (#167574)
As in title. Without this, SelectionDAG treats fpext as never having
fast-math flags.
2025-11-14 20:50:59 -08:00
Matt Arsenault
24be0ba39b
DAG: Fix assert on nofpclass call with aggregate return (#167725)
2025-11-12 18:12:20 +00:00
zhijian lin
85d2b10838
[DAG] Make strictfp attribute only restricts for libm and make non-math optimizations possible (#165464)
The patch [Add strictfp attribute to prevent unwanted optimizations of libm
calls](https://reviews.llvm.org/D34163) added `I.isStrictFP()` to the
following condition:
```
  if (!I.isNoBuiltin() && !I.isStrictFP() && !F->hasLocalLinkage() &&
        F->hasName() && LibInfo->getLibFunc(*F, Func) &&
        LibInfo->hasOptimizedCodeGen(Func)) 
```

This prevents the backend from optimizing even non-math libcalls such as
`strlen` and `memcmp` if a call has the strict floating-point attribute.
For example, it prevents converting strlen and memcmp into the millicode
calls __strlen and __memcmp.
2025-11-11 13:34:14 -05:00
Matt Arsenault
b4f1994280
DAG: Add AssertNoFPClass from call return attributes (#167264)
This defends against regressions in future patches. This excludes
the target intrinsic case for now; I'm worried introducing an
intermediate AssertNoFPClass is likely to break combines.
2025-11-10 16:42:48 +00:00
Damian Heaton
70f4b596cf
Add llvm.vector.partial.reduce.fadd intrinsic (#159776)
With this intrinsic, and supporting SelectionDAG nodes, we can better
make use of instructions such as AArch64's `FDOT`.
2025-11-07 15:36:54 +00:00
Daniel Thornburgh
5f08fb4d72
[IR] llvm.reloc.none intrinsic for no-op symbol references (#147427)
This intrinsic emits a BFD_RELOC_NONE relocation at the point of call,
which allows optimizations and languages to explicitly pull in symbols
from static libraries without there being any code or data that has an
effectual relocation against such a symbol.

See issue #146159 for context.
2025-11-06 08:52:46 -08:00
Sergei Barannikov
71927ddb63
[CodeGen] Delete two ComputeValueVTs overloads (NFC) (#166758)
Those have only a few uses.
2025-11-06 19:45:29 +03:00
Robert Imschweiler
cad96ad703
[NFC] Refactor target intrinsic call lowering (#153204)
Refactor intrinsic call handling in SelectionDAGBuilder and IRTranslator
to prepare the addition of intrinsic support to the callbr instruction,
which should then share code with the handling of the normal call
instruction.
2025-11-06 10:51:44 +01:00
Matt Arsenault
3c2c9d5bc1
DAG: Cleanup string bool attribute check for disable-tail-calls (#166237)
2025-11-03 14:18:04 -08:00
Luo Yuanke
9a0a1fadef
[ISel] Use CallBase instead of CallInst (#164769)
This follows the discussion in
https://github.com/llvm/llvm-project/pull/164565.
CallBase can cover more call-like instructions that carry a calling
convention flag.

Co-authored-by: Yuanke Luo <ykluo@birentech.com>
2025-10-25 20:37:20 +08:00