39151 Commits

Author SHA1 Message Date
Stanislav Mekhanoshin
058cad9f82
Add "noconvergent" flag to MachineInstr::print() (#180818) 2026-02-10 12:37:40 -08:00
JaydeepChauhan14
5df173263b
[NFC] Initialize AtomicLoadExtActions array (#180752) 2026-02-10 22:52:35 +05:30
Benjamin Maxwell
b91eb9b4e5
[SDAG] Implement missing legalization for ISD::VECTOR_FIND_LAST_ACTIVE (#180290)
This lowers the splitting as:
```
any_active(hi_mask)
  ? (find_last_active(hi_mask) + lo_mask.getVectorElementCount())
  : find_last_active(lo_mask)
```

It also trivially lowers `<1 x i1>` scalarization to returning zero, which
is a natural result of the splitting (and of the lack of a sentinel
"none-active" result value).

The lowerings likely can be improved. This patch is for completeness.
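
As a rough scalar model of the split shown above, the following sketch treats each mask half as a `std::vector<bool>`. The names `Mask`, `anyActive`, `findLastActive`, and `findLastActiveSplit` are hypothetical and only mirror the pseudocode; this is not the SelectionDAG implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Mask = std::vector<bool>;

// True if any lane of the mask is set.
bool anyActive(const Mask &M) {
  for (bool Lane : M)
    if (Lane)
      return true;
  return false;
}

// Index of the last set lane. Returns 0 when no lane is set, because there
// is no sentinel "none-active" result value.
std::size_t findLastActive(const Mask &M) {
  assert(!M.empty() && "mask needs at least one lane");
  for (std::size_t I = M.size(); I-- > 0;)
    if (M[I])
      return I;
  return 0;
}

// Split form: prefer the high half, offset by the lane count of the low half.
std::size_t findLastActiveSplit(const Mask &Lo, const Mask &Hi) {
  return anyActive(Hi) ? findLastActive(Hi) + Lo.size() : findLastActive(Lo);
}
```

With a single-lane mask, `findLastActive` can only return zero, which matches the `<1 x i1>` scalarization described above.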

Should fix:
https://github.com/llvm/llvm-project/pull/178862#issuecomment-3862310334
Fixes #180212
2026-02-10 09:01:13 +00:00
Craig Topper
1d1a34ff3e
[TargetLowering] Avoid creating a VTList until we know we need it. NFC (#180599)
Since I was in the area, also use SDValue::getValue() to shorten getting
result 1.
2026-02-09 20:16:08 +00:00
Ryan Mitchell
8bbdac9e52
[MIParser] - Add support for MMRAs (#180320)
Probably just forgotten in #78569
2026-02-09 18:01:02 +01:00
Eliz Habiboullah
3862a4f733
[GlobalISel] Use named constant for impossible repair cost (#180490)
Replace the magic value `std::numeric_limits<unsigned>::max()` with a named
constant `ImpossibleRepairCost` to improve readability.
2026-02-09 10:18:46 +00:00
Gergo Stomfai
2298b8606d
[GISel] computeKnownBits - add CTLS handling (#178063)
Closes llvm/llvm-project#174370
2026-02-09 09:30:45 +00:00
paperchalice
c53acf0443
[SelectionDAGBuilder] Remove NoNaNsFPMath uses (#169904)
Replaced by checking fast-math flags or value tracking results.
2026-02-09 09:48:07 +08:00
paperchalice
5c5677d7b8
[llvm] Remove "no-infs-fp-math" attribute support (#180083)
This was one of the global options in `TargetMachine::resetTargetOptions`.
No backend supports it any longer, so remove it.
2026-02-09 08:43:33 +08:00
Aiden Grossman
4d5d2ffd3e
[ProfCheck] Add prof data for lowering of @llvm.cond.loop
When there is no target-specific lowering of @llvm.cond.loop, it is
lowered into a simple loop by PreISelIntrinsicLowering. Mark the branch
weights into the no-return loop as unknown given we do not have value
metadata to fix the profcheck test for this feature.

Reviewers: mtrofin, alanzhao1, snehasish, pcc

Pull Request: https://github.com/llvm/llvm-project/pull/180390
2026-02-08 10:16:58 -08:00
Qinkun Bao
1b0f139f8e
Revert "[NFC][LiveStacks] Use vectors instead of map and unordered_map" (#180421)
Reverts llvm/llvm-project#165477

Breaks https://lab.llvm.org/buildbot/#/builders/52/builds/14874
2026-02-08 16:54:51 +00:00
Alex Wang
a947599991
[AMDGPU][GlobalISel] Add lowering for G_FMODF (#180152)
Add generic expansion for G_FMODF matching the SelectionDAG
implementation.

Enable G_FMODF lowering for AMDGPU with tests.

Related: #179434
2026-02-07 18:43:55 +00:00
Qinkun Bao
2a74e02a90
Revert "[SelectionDAG] Fix null pointer dereference in resolveDanglingDebugInfo" (#180352)
Reverts llvm/llvm-project#174341

Breaks https://lab.llvm.org/buildbot/#/builders/24/builds/17324
2026-02-07 16:47:17 +00:00
Moritz Zielke
b0cc73d00c
[GlobalISel] add G_ROTL, G_ROTR to computeKnownBits (#166365)
Addresses one of the subtasks of #150515.

The code is ported from `SelectionDAG::computeKnownBits` and tests are
loosely based on `AArch64/GlobalISel/knownbits-shl.mir`.
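
For a constant rotate amount, the known-bits transfer is just a rotation of both masks. The sketch below uses a simplified 32-bit stand-in for `llvm::KnownBits`; the type `KnownBits32` and the helper names are hypothetical, and the handling of non-constant amounts is not modelled here.

```cpp
#include <cassert>
#include <cstdint>

// Simplified 32-bit stand-in for llvm::KnownBits (hypothetical).
struct KnownBits32 {
  uint32_t Zero; // bits known to be 0
  uint32_t One;  // bits known to be 1
};

uint32_t rotl32(uint32_t V, unsigned Amt) {
  Amt %= 32;
  return Amt == 0 ? V : (V << Amt) | (V >> (32 - Amt));
}

uint32_t rotr32(uint32_t V, unsigned Amt) {
  Amt %= 32;
  return Amt == 0 ? V : (V >> Amt) | (V << (32 - Amt));
}

// With a constant amount, every known bit simply moves to its new position.
KnownBits32 knownBitsRotl(KnownBits32 Src, unsigned Amt) {
  assert((Src.Zero & Src.One) == 0 && "a bit cannot be known 0 and known 1");
  return {rotl32(Src.Zero, Amt), rotl32(Src.One, Amt)};
}

KnownBits32 knownBitsRotr(KnownBits32 Src, unsigned Amt) {
  assert((Src.Zero & Src.One) == 0 && "a bit cannot be known 0 and known 1");
  return {rotr32(Src.Zero, Amt), rotr32(Src.One, Amt)};
}
```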
2026-02-07 15:32:09 +00:00
Ralender
1acc200d88
[NFC][LiveStacks] Use vectors instead of map and unordered_map (#165477) 2026-02-07 15:31:43 +00:00
Haoren Wang
9e8caa7834
[SelectionDAG] Fix null pointer dereference in resolveDanglingDebugInfo (#174341)
## Summary
Fix null pointer dereference in
`SelectionDAGBuilder::resolveDanglingDebugInfo`.

## Problem
`Val.getNode()->getIROrder()` is called before checking if
`Val.getNode()` is null, causing crashes when compiling code with debug
info that contains aggregate constants with nested empty structs.

## Solution
Move the `ValSDNodeOrder` declaration inside the `if (Val.getNode())`
block.
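
The shape of the fix can be sketched in isolation as follows. `Node`, `Value`, and `tryGetOrder` are hypothetical stand-ins that keep only the members the description mentions; they are not the real `SDNode`/`SDValue` classes.

```cpp
#include <cassert>

// Hypothetical stand-in for SDNode, reduced to the IR order query.
struct Node {
  unsigned IROrder;
  unsigned getIROrder() const { return IROrder; }
};

// Hypothetical stand-in for SDValue; the node pointer may be null.
struct Value {
  Node *N;
  Node *getNode() const { return N; }
};

// Reads the IR order only inside the null check. Returns false and leaves
// Order untouched when the value has no node.
bool tryGetOrder(const Value &Val, unsigned &Order) {
  if (Node *N = Val.getNode()) {
    Order = N->getIROrder(); // dereferenced only after the check
    return true;
  }
  return false;
}
```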

## Test Case
Reproduces with aggregate types containing nested empty structs:
```llvm
%3 = insertvalue { { i1, {} }, ptr, { { {} }, { {} } }, i64 } 
     { { i1, {} } zeroinitializer, ptr null, { { {} }, { {} } } zeroinitializer, i64 2 }, 
     ptr %2, 1, !dbg !893
```

## Crash stack
```
0.      Program arguments: llc-20 -O3 -mcpu=native -relocation-model=pic -filetype=obj /cloudide/workspace/temp/sf.ll -o /dev/null
1.      Running pass 'Function Pass Manager' on module '/cloudide/workspace/temp/sf.ll'.
2.      Running pass 'X86 DAG->DAG Instruction Selection' on function '@filter_create'
Stack dump without symbol names (ensure you have llvm-symbolizer in your PATH or set the environment var `LLVM_SYMBOLIZER_PATH` to point to it):
0  libLLVM.so.20.1 0x00007ff87ebbdf86 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) + 54
1  libLLVM.so.20.1 0x00007ff87ebbbb90 llvm::sys::RunSignalHandlers() + 80
2  libLLVM.so.20.1 0x00007ff87ebbe640
3  libpthread.so.0 0x00007ff87db79140
4  libLLVM.so.20.1 0x00007ff87f3fd2ff llvm::SelectionDAGBuilder::resolveDanglingDebugInfo(llvm::Value const*, llvm::SDValue) + 303
5  libLLVM.so.20.1 0x00007ff87f3fda5e llvm::SelectionDAGBuilder::getValue(llvm::Value const*) + 142
6  libLLVM.so.20.1 0x00007ff87f3fe79f llvm::SelectionDAGBuilder::getValueImpl(llvm::Value const*) + 3343
7  libLLVM.so.20.1 0x00007ff87f3fda34 llvm::SelectionDAGBuilder::getValue(llvm::Value const*) + 100
8  libLLVM.so.20.1 0x00007ff87f3fc1ab llvm::SelectionDAGBuilder::visitInsertValue(llvm::InsertValueInst const&) + 603
9  libLLVM.so.20.1 0x00007ff87f3eeaf7 llvm::SelectionDAGBuilder::visit(llvm::Instruction const&) + 327
10 libLLVM.so.20.1 0x00007ff87f4904b8 llvm::SelectionDAGISel::SelectBasicBlock(llvm::ilist_iterator_w_bits<llvm::ilist_detail::node_options<llvm::Instruction, false, false, void, true, llvm::BasicBlock>, false, true>, llvm::ilist_iterator_w_bits<llvm::ilist_detail::node_options<llvm::Instruction, false, false, void, true, llvm::BasicBlock>, false, true>, bool&) + 72
11 libLLVM.so.20.1 0x00007ff87f490304 llvm::SelectionDAGISel::SelectAllBasicBlocks(llvm::Function const&) + 5956
12 libLLVM.so.20.1 0x00007ff87f48e2b4 llvm::SelectionDAGISel::runOnMachineFunction(llvm::MachineFunction&) + 372
13 libLLVM.so.20.1 0x00007ff87f48c689 llvm::SelectionDAGISelLegacy::runOnMachineFunction(llvm::MachineFunction&) + 169
14 libLLVM.so.20.1 0x00007ff87efb8e32 llvm::MachineFunctionPass::runOnFunction(llvm::Function&) + 610
15 libLLVM.so.20.1 0x00007ff87ed104be llvm::FPPassManager::runOnFunction(llvm::Function&) + 638
16 libLLVM.so.20.1 0x00007ff87ed15ff3 llvm::FPPassManager::runOnModule(llvm::Module&) + 51
17 libLLVM.so.20.1 0x00007ff87ed10c11 llvm::legacy::PassManagerImpl::run(llvm::Module&) + 1105
18 llc-20          0x000055972ce77dc1 main + 9649
19 libc.so.6       0x00007ff87d68ad7a __libc_start_main + 234
20 llc-20          0x000055972ce7247a _start + 42
```

## Testing

Added regression tests in:
- `CodeGen/X86/selectiondag-dbgvalue-null-crash.ll`
- `CodeGen/AArch64/selectiondag-dbgvalue-null-crash.ll`

**Note:** Tests appear to expose deeper issues in DWARF generation on
certain targets (Darwin targets for example) that require further
investigation.

## Related PRs

This supersedes:
- #173500 - Initial fix, reverted due to test failures on Darwin and
other platforms
- #173836 - Second attempt with `UNSUPPORTED: system-darwin`, still
failed on some targets
2026-02-07 13:00:30 +01:00
Vladimir Vereschaka
19d681177f
Revert "[MC][TableGen] Expand Opcode field of MCInstrDesc" (#180321)
Reverts llvm/llvm-project#179652

This PR causes out-of-memory build failures on many Windows
builders.
2026-02-06 21:58:50 -08:00
Peter Collingbourne
191af6c254
Add llvm.cond.loop intrinsic.
The llvm.cond.loop intrinsic is semantically equivalent to a conditional
branch conditioned on ``pred`` to a basic block consisting only of an
unconditional branch to itself. Unlike such a branch, it is guaranteed
to use specific instructions. This allows an interrupt handler or
other introspection mechanism to straightforwardly detect whether
the program is currently spinning in the infinite loop and possibly
terminate the program if so. The intent is that this intrinsic may
be used as a more efficient alternative to a conditional branch to
a call to ``llvm.trap`` in circumstances where the loop detection
is guaranteed to be present. This construct has been experimentally
determined to be executed more efficiently (when the branch is not taken)
than a conditional branch to a trap instruction on AMD and older Intel
microarchitectures, and is also more code size efficient by avoiding the
need to emit a trap instruction and possibly a long branch instruction.

On i386 and x86_64, the infinite loop is guaranteed to consist of a short
conditional branch instruction that branches to itself. Specifically,
the first byte of the instruction will be between 0x70 and 0x7F, and
the second byte will be 0xFE.
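
An introspection mechanism could recognise the x86 form with a check along these lines. This is a minimal sketch: the function name `isCondLoopSpin` and the idea of passing a pointer to the interrupted instruction are assumptions, and only the two byte values come from the description above.

```cpp
#include <cassert>
#include <cstdint>

// Returns true if the two bytes at PC look like a short conditional branch
// to itself: an opcode byte in 0x70..0x7F followed by the byte 0xFE
// (a displacement of -2, i.e. back to the start of the 2-byte instruction).
bool isCondLoopSpin(const uint8_t *PC) {
  assert(PC && "need a valid instruction pointer");
  return PC[0] >= 0x70 && PC[0] <= 0x7F && PC[1] == 0xFE;
}
```

A handler would call this on the saved instruction pointer of the interrupted thread; how that pointer is obtained is platform-specific and outside this sketch.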

Part of this RFC:
https://discourse.llvm.org/t/rfc-optimizing-conditional-traps/89456

Reviewers: arsenm, RKSimon, fmayer, vitalybuka

Pull Request: https://github.com/llvm/llvm-project/pull/177686
2026-02-06 17:11:15 -08:00
sstipano
13d8870d45
[MC][TableGen] Expand Opcode field of MCInstrDesc (#179652)
Increase width of Opcode to `int` from `short` to allow more capacity.
2026-02-06 20:21:48 +01:00
Kyungwoo Lee
8e17489026
[CGData][GMF] Preserve Profile Data (#180126)
Profile data for instructions (e.g., branch weights) is automatically
preserved via `splice()` which moves the basic blocks along with their
instruction metadata. However, entry count is stored as function
metadata, which was dropped when creating merged function and thunks.

The fix is to explicitly set entry count for both merged function (.Tgm)
and thunks via `setEntryCount()`.
2026-02-06 10:03:39 -08:00
Rahul Joshi
b12e3122c8
[NFC][Core][CodeGen] Remove pass initialization from pass constructors (#180153) 2026-02-06 09:05:47 -08:00
David Sherwood
e958bcdd17
[DAGCombiner] Look through freeze for ext(freeze(extload(x))) (#178669)
This patch fixes a regression introduced by PR #175022, where
a freeze was introduced with the following transformation:

  ext(freeze(load(x))) -> freeze(extload(x))

If a new extend is introduced afterwards we then have

  ext(freeze(extload(x)))

which doesn't get picked up by existing DAG combines due to
the freeze getting in the way.
2026-02-06 15:50:17 +00:00
Nikita Popov
0287d789e0
[ExpandIRInsts] Freeze input in itofp expansion (#180157)
We are introducing branches on the value, and branch on undef/poison is
UB, so the value needs to be frozen.
2026-02-06 12:52:31 +01:00
Steffen Larsen
5654ecd5dd
[DAGCombiner] Fix exact power-of-two signed division for large integers (#177340)
Previously, the DAG combiner did not optimize exact signed division by a
power-of-two constant divisor for integer types exceeding the size of
division supported by the target architecture (e.g., i128 on x86-64).
However, such an optimization was expected by the division expansion
logic, leading to unsupported division operations making it to
instruction selection.
This commit addresses this issue by making an exception to the existing
exclusion of signed division with the exact flag for the aforementioned
operations. That is, the DAG combiner will now optimize exact signed
division if the divisor is a power-of-two constant and the integer type
exceeds the size of division supported by the target architecture.
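
The arithmetic behind the optimization: when the division is exact, signed division by 2^K equals an arithmetic right shift by K. The sketch below shows this on `int64_t` for portability, whereas the commit concerns wider types such as i128; the function name is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Exact signed division by 2^K, assuming X is a multiple of 2^K.
// Relies on >> being an arithmetic shift for signed values, which is
// guaranteed since C++20 and is what mainstream compilers did before that.
int64_t exactSDivPow2(int64_t X, unsigned K) {
  assert(K < 63 && "shift amount out of range");
  assert(X % (int64_t{1} << K) == 0 && "division must be exact");
  return X >> K;
}
```

Without the exactness guarantee the shift would round negative values toward negative infinity, while division rounds toward zero, so the two would differ.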

---------

Signed-off-by: Steffen Holst Larsen <HolstLarsen.Steffen@amd.com>
2026-02-06 09:40:32 +01:00
Folkert de Vries
9639e9669e
[AArch64] fix copy from GPR32 to FPR16 (#176594)
fixes https://github.com/llvm/llvm-project/issues/79822
cc https://github.com/rust-lang/rust/issues/120374

The example fails on nightly https://godbolt.org/z/zEojPzqWc.
2026-02-05 21:13:03 +01:00
Jameson Nash
d762cc2f03
[GlobalISel] Add SVE support for alloca (#178976)
Complementary to the same handling code in SelectionDAG:

f3d81d4110/llvm/lib/CodeGen/SelectionDAG/FunctionLoweringInfo.cpp (L160-L165)

f3d81d4110/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp (L4613-L4623)

Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-05 14:00:34 -05:00
Jay Foad
77034cd325
[CodeGen] Make use of TargetRegisterInfo::findCommonRegClass. NFC. (#179981) 2026-02-05 17:22:46 +00:00
Nikita Popov
722c2f0221
[ExpandIRInsts] Support int bw < float bw in itofp expansion (#179963)
Handle this case by extending the integer to a wider type. This can
probably be handled more optimally, but this is conservatively correct.

Proof: https://alive2.llvm.org/ce/z/0RwDO1
2026-02-05 17:26:12 +01:00
Matt Arsenault
a9adf7d1e3
GlobalISel: Remove unused argument from CSEInfo (#179962)
Nothing uses this force recomputation.
2026-02-05 16:03:08 +00:00
Nikita Popov
d3fb3c5d36
[GISel][CallLowering] Keep IR types longer (#179946)
GISel CallLowering currently does a Type -> EVT -> Type roundtrip early
on when populating ArgInfo in splitToValueType(). This is a bit odd as
this structure operates at the IR Type level. Keep the original type
there and only convert to EVT when performing assignments.
2026-02-05 16:37:08 +01:00
Nikita Popov
d737229efd
[ExpandIRInsts] Allow int bw == float bw in itofp (#179943)
I don't think anything here requires the integer bit width to be
strictly larger. It's fine if it's the same (in which case some zexts
just go away).

Add tests on half + i32 that can be verified by alive2. Note that half
is handled via float, so the minimum supported type is i32 rather than
i16.

Proof (uitofp): https://alive2.llvm.org/ce/z/CsMfkU
Proof (sitofp): https://alive2.llvm.org/ce/z/jzuxyt
2026-02-05 16:21:19 +01:00
Matt Arsenault
2502e3b7ba
IR: Promote "denormal-fp-math" to a first class attribute (#174293)
Convert "denormal-fp-math" and "denormal-fp-math-f32" into a first
class denormal_fpenv attribute. Previously the query for the effective
denormal mode involved two string attribute queries with parsing. I'm
introducing more uses of this, so it makes sense to convert this
to a more efficient encoding. The old representation was also awkward
since it was split across two separate attributes. The new encoding
just stores the default and float modes as bitfields, largely avoiding
the need to consider if the other mode is set.
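
To illustrate the idea of one bitfield-encoded attribute, here is a minimal sketch that packs a default mode and a float mode into a single byte. The layout, the enumerator numbering, and all names (`DenormKind`, `DenormMode`, `encodeDenormalEnv`, and the decoders) are assumptions for illustration, not the encoding LLVM uses.

```cpp
#include <cassert>
#include <cstdint>

// Four kinds fit in two bits (hypothetical numbering).
enum class DenormKind : uint8_t { IEEE = 0, PreserveSign = 1, PositiveZero = 2, Dynamic = 3 };

// One mode has two components, as in `denormal_fpenv(a,b)`.
struct DenormMode {
  DenormKind Output;
  DenormKind Input;
};

// Hypothetical layout: bits 0-3 hold the default mode, bits 4-7 the float mode.
uint8_t encodeDenormalEnv(DenormMode Default, DenormMode Float) {
  auto Pack = [](DenormMode M) {
    return static_cast<uint8_t>(static_cast<uint8_t>(M.Output) |
                                (static_cast<uint8_t>(M.Input) << 2));
  };
  return static_cast<uint8_t>(Pack(Default) | (Pack(Float) << 4));
}

static DenormMode unpackNibble(uint8_t Nibble) {
  assert(Nibble <= 0xF && "expected a 4-bit field");
  return {static_cast<DenormKind>(Nibble & 0x3),
          static_cast<DenormKind>((Nibble >> 2) & 0x3)};
}

DenormMode decodeDefault(uint8_t Bits) { return unpackNibble(Bits & 0xF); }
DenormMode decodeFloat(uint8_t Bits) { return unpackNibble(Bits >> 4); }
```

Each query is a mask and a shift on one value, so neither lookup depends on whether the other mode was set.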

The syntax in the common cases looks like this:
  `denormal_fpenv(preservesign,preservesign)`
  `denormal_fpenv(float: preservesign,preservesign)`
  `denormal_fpenv(dynamic,dynamic float: preservesign,preservesign)`

I wasn't sure about reusing the float type name instead of adding a
new keyword. It's parsed as a type but only accepts float. I'm also
debating switching the name to subnormal to match the current
preferred IEEE terminology (also used by nofpclass and other
contexts).

This has a behavior change when using the command flag debug
options to set the denormal mode. The behavior of the flag
ignored functions with an explicit attribute set, per
the default and f32 version. Now that these are one attribute,
the flag logic can't distinguish which of the two components
were explicitly set on the function. Only one test appeared to
rely on this behavior, so I just avoided using the flags in it.

This also does not perform all the code cleanups this enables.
In particular the attributor handling could be cleaned up.

I also guessed at how to support this in MLIR. I followed
MemoryEffects as a reference; it appears bitfields are expanded
into arguments to attributes, so the representation there is
a bit uglier with the 2 2-element fields flattened into 4 arguments.
2026-02-05 13:31:26 +00:00
Kai Nacke
f3bd1b9526
[SystemZ][z/OS] Use the text section for jump tables (#179793)
Jump tables are read-only data, and the text section is the best choice
for them.
2026-02-05 08:18:17 -05:00
keremsahn
f6e130682f
[SelectionDAG] Mark LowerTypeTests as required and remove intrinsic handling from #142939 (#179249)
Fixes #179125
2026-02-05 11:16:48 +01:00
Ryotaro Kasuga
2ca54b41a4
[MachinePipeliner] Remove isLoopCarriedDep calls in computeStart (#174393)
When computing the viable cycles for scheduling an instruction,
`computeStart` used to include special-case logic to handle loop-carried
dependencies. This special handling was necessary because loop-carried
dependencies were represented by reversed forward-direction edges in the
DAG. Now that we have the DDG, which explicitly models loop-carried
dependencies, this special handling is no longer required. As a first
step towards completely removing `isLoopCarriedDep`, this patch
eliminates the special-case logic from `computeStart` and some related
functions.

Split off from https://github.com/llvm/llvm-project/pull/135148
2026-02-05 06:05:48 +00:00
Ryotaro Kasuga
82c0607ffd
[MachinePipeliner] Add loop-carried dependences for FPExceptions (#174392)
As with loads and stores, instructions that may trigger floating‑point
exceptions must not be reordered across a barrier instruction. This
patch adds the missing loop‑carried dependencies between such
instructions and the barrier, preventing reordering that could
previously occur. Same as #174391, the implementation is based on that
of `ScheduleDAGInstrs::buildSchedGraph`.

Split off from #135148
2026-02-05 05:32:10 +00:00
Ryotaro Kasuga
dfdc3b72d2
[MachinePipeliner] Add loop-carried dependencies for global barriers (#174391)
Loads and stores must not be reordered across barrier instructions.
In MachinePipeliner, however, this could happen because
loop-carried dependencies from loads/stores to a barrier instruction
were not considered. The same problem exists for barrier-to-barrier
dependencies. This patch adds handling for those cases. The
implementation is based on that of `ScheduleDAGInstrs::buildSchedGraph`.

Split off from https://github.com/llvm/llvm-project/pull/135148
2026-02-05 04:17:26 +00:00
Akshay Deodhar
fab5b1858d
Reland "[NVPTX][AtomicExpandPass] Complete support for AtomicRMW in NVPTX (#176015)" (#179553)
This PR adds full support for atomicrmw in NVPTX. This includes:

- Memory order and syncscope support (changes in AtomicExpandPass.cpp,
NVPTXIntrinsics.td)

- Script-generated tests for integer and atomic operations
(atomicrmw.py, atomicrmw-sm*.ll in tests/CodeGen/NVPTX). Existing
atomics tests which are subsumed by these have been removed
(atomics-sm*.ll, atomics.ll, atomicrmw-expand.ll).

- ~~Changes shouldExpandAtomicRMWInIR to take a constant argument: This
is to allow some other TargetLowering constant-argument functions to
call it. This change touches several backends. An alternative solution
exists, but to me, this seems the "right" way.~~ Has been split out into
https://github.com/llvm/llvm-project/pull/176073. Rebased.

- NOTE: The initial load issued for atomicrmw emulation loops (and
cmpxchg emulation loops) must be a strong load. Currently,
AtomicExpandPass issues a weak load. Fixing this breaks several
backends. I'm planning to follow up with a separate PR.

Initially failed due to error: ptxas fatal   : Value 'sm_60' is not
defined for option 'gpu-name'. Updated RUN lines in atomicrmw-sm*.py to
skip the ptxas-verify check if ptxas does not support that SM version.
2026-02-04 16:15:49 -08:00
Sam Elliott
0cac3e381d
[CodeGen][TII] Delete analyzeSelect hook (#175828)
The only caller of this function (`PeepholeOptimizer::optimizeSelect`)
did not use most of the parameters, was broadly equivalent to
`MI->isSelect()`, and the `optimizeSelect` hook can return `nullptr`
anyway.

Update `optimizeSelect` to return `nullptr` by default rather than
asserting when not implemented.
2026-02-04 14:14:45 -08:00
Alex Wang
b33a0e6101
[SelectionDAG] Add expansion for llvm.modf intrinsic (#179434)
Targets without a `modf` libcall lower the intrinsic directly, matching
the existing `llvm.frexp` expansion. Targets with an existing libcall
are unchanged.

Fixes #173021
2026-02-04 21:25:47 +01:00
Stanislav Mekhanoshin
ba8df39898
Add SDNodeFlag::NoConvergent (#179323) 2026-02-04 10:21:45 -08:00
weiguozhi
9a47c3bcba
[RegAlloc] Change the computation of CSRCost (#177226)
This patch fixes https://github.com/llvm/llvm-project/issues/150737.

The originally computed CSRCost is too small, so the optimization of
spilling instead of using a CSR is rarely triggered.

Also, the original cost model is too difficult to understand and too
hard for backend developers and users to tune.

So this patch changes the CSRCost to be

        CSRCost = TRI->getCSRFirstUseCost() * EntryFreq * Scale

TRI->getCSRFirstUseCost() is the raw cost of saving/restoring a CSR. Usually
we don't need to tune this number.

EntryFreq is the BlockFrequency of the entry block.

Scale is used to scale down the CSRCost, because we usually prefer a CSR
register over spilling if the CSRCost and spill cost are similar,
so it should be less than 100%. We usually tune this number.
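
The formula can be sketched with plain integers as below. Expressing Scale as a percentage and the function name `computeCSRCost` are assumptions made for illustration; the real code works with `BlockFrequency` values, and overflow is ignored here.

```cpp
#include <cassert>
#include <cstdint>

// CSRCost = FirstUseCost * EntryFreq * Scale, with Scale given in percent.
// FirstUseCost stands in for TRI->getCSRFirstUseCost() and EntryFreq for the
// frequency of the entry block (both plain integers in this sketch).
uint64_t computeCSRCost(uint64_t FirstUseCost, uint64_t EntryFreq,
                        unsigned ScalePercent) {
  assert(ScalePercent <= 100 && "scale is expected to stay at or below 100%");
  return FirstUseCost * EntryFreq * ScalePercent / 100;
}
```

For example, a first-use cost of 5, an entry frequency of 1000, and a scale of 80% give a cost of 4000, which would then be compared against the spill cost.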
    
Another problem is that the original RAGreedy::calcSpillCost() function
actually computes the cost of a block split, so this patch also implements
a correct RAGreedy::calcSpillCost() function.

This new behavior is not enabled by default. This optimization is used
by 3 targets (AArch64 / AMDGPU / RISCV); I will change them one by one
in follow-up patches.
2026-02-04 10:08:57 -08:00
Jay Foad
7ea33e6848
[CodeGen] Remove unused first operand of SUBREG_TO_REG (#179690)
The first input operand of SUBREG_TO_REG was an immediate that most
targets set to 0. In practice it had no effect on codegen. Remove it.
2026-02-04 17:35:21 +00:00
Nikita Popov
516eb3820d
[ExpandIRInsts] Freeze value before fptoi expansion (#179659)
We're going to introduce new branches, and branch on undef/poison
is immediate UB.
2026-02-04 14:49:34 +01:00
Fabian Ritter
d24a6754ce
[LowerMemIntrinsics] Optimize memset lowering (#169040)
This patch changes the memset lowering to match the optimized memcpy lowering.
The memset lowering now queries TTI.getMemcpyLoopLoweringType for a preferred
memory access type. If that type is larger than a byte, the memset is lowered
into two loops: a main loop that stores a sufficiently wide vector splat of the
SetValue with the preferred memory access type and a residual loop that covers
the remaining bytes individually. If the memset size is statically known, the
residual loop is replaced by a sequence of stores.
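
The two-loop structure can be modelled in ordinary C++ as follows. This is a sketch only: the fixed 8-byte access width stands in for whatever type the target query would return, and `wideMemset` is a hypothetical name. The actual lowering emits IR loops rather than calling library functions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

void wideMemset(unsigned char *Dst, unsigned char Value, std::size_t Len) {
  assert((Dst || Len == 0) && "null destination with non-zero length");
  // Splat the byte across the wider access type (8 bytes in this sketch).
  const uint64_t Splat = 0x0101010101010101ULL * Value;
  std::size_t I = 0;
  // Main loop: one wide store per iteration. memcpy is used for the store
  // so the sketch has no alignment or aliasing requirements.
  for (; I + sizeof(Splat) <= Len; I += sizeof(Splat))
    std::memcpy(Dst + I, &Splat, sizeof(Splat));
  // Residual loop: the remaining bytes, one at a time.
  for (; I < Len; ++I)
    Dst[I] = Value;
}
```

When `Len` is a compile-time constant, the residual loop has a known trip count below the access width, which is why it can be replaced by a fixed sequence of stores.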

This improves memset performance on gfx1030 (AMDGPU) in microbenchmarks by
around 7-20x.

I'm planning similar treatment for memset.pattern as a follow-up PR.

For SWDEV-543208.
2026-02-04 13:35:13 +01:00
Jay Foad
a13c6ea80d
[CodeGen] Simplify ExpandPostRA::LowerSubregToReg. NFC. (#179634)
SUBREG_TO_REG always has a non-zero subreg index so DstSubReg can never
be the same as DstReg.
2026-02-04 12:06:41 +00:00
Juan Manuel Martinez Caamaño
04c56505f8
[NFC][LLVM] Make constrainSelectedInstRegOperands return void (#179501)
`constrainSelectedInstRegOperands` always returns `true`; so it can be
safely transformed to return `void` instead.

A follow-up patch should update `MachineInstrBuilder::constrainAllUses`.
2026-02-04 08:59:16 +01:00
Luke Lau
653b336e66
[LegalizeVectorTypes] Don't emit VP_SELECT when widening MLOAD to VP_LOAD (#179478)
This is part of the work to remove trivial VP intrinsics.

When widening an MLOAD we may use a VP_LOAD if it's supported. We use a
VP_SELECT to merge in the passthru, but we don't check if it's supported
by the target. This changes it to just emit a regular VSELECT instead to
prevent crashing in that case, and a VP_MERGE to keep the lanes past EVL
poison.
2026-02-04 07:11:30 +00:00
serge-sans-paille
dcf853df8f
[perf] Replace extra copy-assign by move-assign in llvm/lib/ (#179465)
Co-authored-by: Nikita Popov <github@npopov.com>
2026-02-04 06:36:30 +00:00
Vladislav Dzhidzhoev
b9cecee3fb
Reland "[DebugMetadata][DwarfDebug] Support function-local types in lexical block scopes (4/7)" (#165032)
This is an attempt to merge https://reviews.llvm.org/D144006 with an LTO
fix.

The last merge attempt was
https://github.com/llvm/llvm-project/pull/75385.
The issue with it was investigated in
https://github.com/llvm/llvm-project/pull/75385#issuecomment-2386684121.
The problem happens when:
1. Several modules are being linked.
2. There are several DISubprograms that initially belong to different
modules but represent the same source code function (for example, a
function included from the same source code file).
3. Some of such DISubprograms survive IR linking. It may happen if one
of them is inlined somewhere or if the functions that have these
DISubprograms attached have internal linkage.
4. Each of these DISubprograms has a local type that corresponds to the
same source code type. These types are initially from different modules,
but have the same ODR identifier.

If the same (in the sense of ODR identifier/ODR uniquing rules) local
type is present in two modules, and these modules are linked together,
the type gets uniqued. The DIType that happens to be loaded first
survives linking, and references to other types with the same ODR
identifier from modules loaded later are replaced with references
to the DIType loaded first. Since the definition subprograms in whose
scope these types are located are not deduplicated, the linker
output may contain multiple DISubprograms having the same (uniqued)
type in their retainedNodes lists.
Further compilation of such modules causes crashes.

To tackle that,
* previous solution to handle LTO linking with local types in
retainedNodes is removed (cloneLocalTypes() function),
* for each loaded distinct (definition) DISubprogram, its retainedNodes
list is scanned after loading, and DITypes with a scope of another
subprogram are removed. If something from a Function corresponding to
the DISubprogram references uniqued type, we rely on cross-CU links.
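
The retainedNodes scan described in the second bullet can be pictured with the sketch below. The structs are hypothetical and much simpler than the real metadata classes: here a local type's scope is always a subprogram directly, whereas real scopes can also be lexical blocks nested inside one.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct Subprogram;

// Hypothetical stand-in for a function-local DIType.
struct LocalType {
  const Subprogram *Scope;
};

// Hypothetical stand-in for a distinct (definition) DISubprogram.
struct Subprogram {
  std::vector<const LocalType *> RetainedNodes;
};

// Drops every retained type whose scope is a different subprogram and
// returns how many entries were removed.
std::size_t pruneForeignTypes(Subprogram &SP) {
  auto IsForeign = [&SP](const LocalType *T) {
    assert(T && "retained node must not be null");
    return T->Scope != &SP;
  };
  auto NewEnd = std::remove_if(SP.RetainedNodes.begin(),
                               SP.RetainedNodes.end(), IsForeign);
  std::size_t Removed =
      static_cast<std::size_t>(SP.RetainedNodes.end() - NewEnd);
  SP.RetainedNodes.erase(NewEnd, SP.RetainedNodes.end());
  return Removed;
}
```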

Additionally:
* a check is added to Verifier to report local types located in the
wrong retainedNodes list.

Original commit message follows.
---------

RFC https://discourse.llvm.org/t/rfc-dwarfdebug-fix-and-improve-handling-imported-entities-types-and-static-local-in-subprogram-and-lexical-block-scopes/68544

Similar to imported declarations, the patch tracks function-local types in
DISubprogram's 'retainedNodes' field. DwarfDebug is adjusted in accordance with
the aforementioned metadata change and gains support for function-local
types scoped within a lexical block.

The patch assumes that DICompileUnit's 'enums' field no longer tracks local
types, and DwarfDebug would assert if any locally-scoped types get placed there.

Authored-by: Kristina Bessonova <kbessonova@accesssoftek.com>
Co-authored-by: Jeremy Morse <jeremy.morse@sony.com>
2026-02-04 00:34:52 +01:00