llvm-project

Author	SHA1	Message	Date
Stanislav Mekhanoshin	058cad9f82	Add "noconvergent" flag to MachineInstr::print() (#180818 )	2026-02-10 12:37:40 -08:00
JaydeepChauhan14	5df173263b	[NFC] Initialize AtomicLoadExtActions array (#180752 )	2026-02-10 22:52:35 +05:30
Benjamin Maxwell	b91eb9b4e5	[SDAG] Implement missing legalization for `ISD::VECTOR_FIND_LAST_ACTIVE` (#180290 ) This lowers the splitting as: ``` any_active(hi_mask) ? (find_last_active(hi_mask) + lo_mask.getVectorElementCount()) : find_last_active(lo_mask) ``` And trivially lowers `<1 x i1>` scalarization to returning zero. Which is a natural result of the splitting (and the lack of a sentinel "none-active" result value). The lowerings likely can be improved. This patch is for completeness. Should fix: https://github.com/llvm/llvm-project/pull/178862#issuecomment-3862310334 Fixes #180212	2026-02-10 09:01:13 +00:00
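The splitting rule quoted in b91eb9b4e5 can be sketched on plain Python lists. This is a minimal model, not the SelectionDAG code: `find_last_active` and `find_last_active_split` are hypothetical helper names, and returning 0 when no lane is active is an assumption chosen to match the `<1 x i1>` case described in the message.

```python
def find_last_active(mask):
    """Index of the last active (True) lane.

    Assumption for this sketch: returns 0 when no lane is active,
    since the node has no sentinel "none-active" result.
    """
    for i in range(len(mask) - 1, -1, -1):
        if mask[i]:
            return i
    return 0


def find_last_active_split(mask):
    """Same result, computed from the two halves as in the commit message."""
    half = len(mask) // 2
    lo_mask, hi_mask = mask[:half], mask[half:]
    if any(hi_mask):
        # The last active lane is in the high half: offset by the low half's size.
        return find_last_active(hi_mask) + len(lo_mask)
    return find_last_active(lo_mask)
```

For example, `find_last_active_split([True, False, True, False])` takes the high-half path and yields 2.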
Craig Topper	1d1a34ff3e	[TargetLowering] Avoid creating a VTList until we know we need it. NFC (#180599 ) Since I was in the area, also use SDValue::getValue() to shorten getting result 1.	2026-02-09 20:16:08 +00:00
Ryan Mitchell	8bbdac9e52	[MIParser] - Add support for MMRAs (#180320 ) Probably just forgotten in #78569	2026-02-09 18:01:02 +01:00
Eliz Habiboullah	3862a4f733	[GlobalISel] Use named constant for impossible repair cost (#180490 ) replace magic value `std::numeric_limits<unsigned>::max()` with a named constant `ImpossibleRepairCost` to improve readability	2026-02-09 10:18:46 +00:00
Gergo Stomfai	2298b8606d	[GISel] computeKnownBits - add CTLS handling (#178063 ) Closes llvm/llvm-project#174370	2026-02-09 09:30:45 +00:00
paperchalice	c53acf0443	[SelectionDAGBuilder] Remove NoNaNsFPMath uses (#169904 ) Replaced by checking fast-math flags or value tracking results.	2026-02-09 09:48:07 +08:00
paperchalice	5c5677d7b8	[llvm] Remove "no-infs-fp-math" attribute support (#180083 ) This was one of the global options in `TargetMachine::resetTargetOptions`. No backend supports it any longer, so remove it.	2026-02-09 08:43:33 +08:00
Aiden Grossman	4d5d2ffd3e	[ProfCheck] Add prof data for lowering of @llvm.cond.loop When there is no target-specific lowering of @llvm.cond.loop, it is lowered into a simple loop by PreISelIntrinsicLowering. Mark the branch weights into the no-return loop as unknown given we do not have value metadata to fix the profcheck test for this feature. Reviewers: mtrofin, alanzhao1, snehasish, pcc Pull Request: https://github.com/llvm/llvm-project/pull/180390	2026-02-08 10:16:58 -08:00
Qinkun Bao	1b0f139f8e	Revert "[NFC][LiveStacks] Use vectors instead of map and unordred_map" (#180421 ) Reverts llvm/llvm-project#165477 Breaks https://lab.llvm.org/buildbot/#/builders/52/builds/14874	2026-02-08 16:54:51 +00:00
Alex Wang	a947599991	[AMDGPU][GlobalISel] Add lowering for G_FMODF (#180152 ) Add generic expansion for G_FMODF matching the SelectionDAG implementation. Enable G_FMODF lowering for AMDGPU with tests. Related: #179434	2026-02-07 18:43:55 +00:00
Qinkun Bao	2a74e02a90	Revert "[SelectionDAG] Fix null pointer dereference in resolveDanglingDebugInfo" (#180352 ) Reverts llvm/llvm-project#174341 Breaks https://lab.llvm.org/buildbot/#/builders/24/builds/17324	2026-02-07 16:47:17 +00:00
Moritz Zielke	b0cc73d00c	[GlobalISel] add G_ROTL, G_ROTR to computeKnownBits (#166365 ) Addresses one of the subtasks of #150515. The code is ported from `SelectionDAG::computeKnownBits` and tests are loosely based on `AArch64/GlobalISel/knownbits-shl.mir`.	2026-02-07 15:32:09 +00:00
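As a rough illustration of what known-bits tracking through a rotate means, the sketch below handles only a constant rotate amount: the known-zero and known-one masks are rotated the same way as the value. It is not the ported LLVM code, and `rotl` and `known_bits_rotl` are hypothetical names.

```python
def rotl(x, amt, width):
    """Rotate the width-bit value x left by amt bits."""
    amt %= width
    mask = (1 << width) - 1
    return ((x << amt) | (x >> (width - amt))) & mask


def known_bits_rotl(known_zero, known_one, amt, width):
    """Known bits of rotl(v, amt), given the known bits of v.

    Assumption for this sketch: amt is a known constant, so every
    known bit simply moves to its rotated position.
    """
    return rotl(known_zero, amt, width), rotl(known_one, amt, width)
```

With an 8-bit value whose low nibble is known zero (`0x0F`) and whose top bit is known one (`0x80`), `known_bits_rotl(0x0F, 0x80, 4, 8)` gives `(0xF0, 0x08)`.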
Ralender	1acc200d88	[NFC][LiveStacks] Use vectors instead of map and unordred_map (#165477 )	2026-02-07 15:31:43 +00:00
Haoren Wang	9e8caa7834	[SelectionDAG] Fix null pointer dereference in resolveDanglingDebugInfo (#174341 ) ## Summary Fix null pointer dereference in `SelectionDAGBuilder::resolveDanglingDebugInfo`. ## Problem `Val.getNode()->getIROrder()` is called before checking if `Val.getNode()` is null, causing crashes when compiling code with debug info that contains aggregate constants with nested empty structs. ## Solution Move the `ValSDNodeOrder` declaration inside the `if (Val.getNode())` block. ## Test Case Reproduces with aggregate types containing nested empty structs: ```llvm %3 = insertvalue { { i1, {} }, ptr, { { {} }, { {} } }, i64 } { { i1, {} } zeroinitializer, ptr null, { { {} }, { {} } } zeroinitializer, i64 2 }, ptr %2, 1, !dbg !893 ## Crash stack 0. Program arguments: llc-20 -O3 -mcpu=native -relocation-model=pic -filetype=obj /cloudide/workspace/temp/sf.ll -o /dev/null 1. Running pass 'Function Pass Manager' on module '/cloudide/workspace/temp/sf.ll'. 2. Running pass 'X86 DAG->DAG Instruction Selection' on function '@filter_create' Stack dump without symbol names (ensure you have llvm-symbolizer in your PATH or set the environment var `LLVM_SYMBOLIZER_PATH` to point to it): 0 libLLVM.so.20.1 0x00007ff87ebbdf86 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) + 54 1 libLLVM.so.20.1 0x00007ff87ebbbb90 llvm::sys::RunSignalHandlers() + 80 2 libLLVM.so.20.1 0x00007ff87ebbe640 3 libpthread.so.0 0x00007ff87db79140 4 libLLVM.so.20.1 0x00007ff87f3fd2ff llvm::SelectionDAGBuilder::resolveDanglingDebugInfo(llvm::Value const, llvm::SDValue) + 303 5 libLLVM.so.20.1 0x00007ff87f3fda5e llvm::SelectionDAGBuilder::getValue(llvm::Value const) + 142 6 libLLVM.so.20.1 0x00007ff87f3fe79f llvm::SelectionDAGBuilder::getValueImpl(llvm::Value const) + 3343 7 libLLVM.so.20.1 0x00007ff87f3fda34 llvm::SelectionDAGBuilder::getValue(llvm::Value const) + 100 8 libLLVM.so.20.1 0x00007ff87f3fc1ab llvm::SelectionDAGBuilder::visitInsertValue(llvm::InsertValueInst const&) + 603 9 
libLLVM.so.20.1 0x00007ff87f3eeaf7 llvm::SelectionDAGBuilder::visit(llvm::Instruction const&) + 327 10 libLLVM.so.20.1 0x00007ff87f4904b8 llvm::SelectionDAGISel::SelectBasicBlock(llvm::ilist_iterator_w_bits<llvm::ilist_detail::node_options<llvm::Instruction, false, false, void, true, llvm::BasicBlock>, false, true>, llvm::ilist_iterator_w_bits<llvm::ilist_detail::node_options<llvm::Instruction, false, false, void, true, llvm::BasicBlock>, false, true>, bool&) + 72 11 libLLVM.so.20.1 0x00007ff87f490304 llvm::SelectionDAGISel::SelectAllBasicBlocks(llvm::Function const&) + 5956 12 libLLVM.so.20.1 0x00007ff87f48e2b4 llvm::SelectionDAGISel::runOnMachineFunction(llvm::MachineFunction&) + 372 13 libLLVM.so.20.1 0x00007ff87f48c689 llvm::SelectionDAGISelLegacy::runOnMachineFunction(llvm::MachineFunction&) + 169 14 libLLVM.so.20.1 0x00007ff87efb8e32 llvm::MachineFunctionPass::runOnFunction(llvm::Function&) + 610 15 libLLVM.so.20.1 0x00007ff87ed104be llvm::FPPassManager::runOnFunction(llvm::Function&) + 638 16 libLLVM.so.20.1 0x00007ff87ed15ff3 llvm::FPPassManager::runOnModule(llvm::Module&) + 51 17 libLLVM.so.20.1 0x00007ff87ed10c11 llvm::legacy::PassManagerImpl::run(llvm::Module&) + 1105 18 llc-20 0x000055972ce77dc1 main + 9649 19 libc.so.6 0x00007ff87d68ad7a __libc_start_main + 234 20 llc-20 0x000055972ce7247a _start + 42 ``` ## Testing Added regression tests in: - `CodeGen/X86/selectiondag-dbgvalue-null-crash.ll` - `CodeGen/AArch64/selectiondag-dbgvalue-null-crash.ll` Note: Tests appear to expose deeper issues in DWARF generation on certain targets (Darwin targets for example) that require further investigation. ## Related PRs This supersedes: - #173500 - Initial fix, reverted due to test failures on Darwin and other platforms - #173836 - Second attempt with `UNSUPPORTED: system-darwin`, still failed on some targets	2026-02-07 13:00:30 +01:00
Vladimir Vereschaka	19d681177f	Revert "[MC][TableGen] Expand Opcode field of MCInstrDesc" (#180321 ) Reverts llvm/llvm-project#179652 This PR causes the out-of-memory build failures on many Windows builders.	2026-02-06 21:58:50 -08:00
Peter Collingbourne	191af6c254	Add llvm.cond.loop intrinsic. The llvm.cond.loop intrinsic is semantically equivalent to a conditional branch conditioned on ``pred`` to a basic block consisting only of an unconditional branch to itself. Unlike such a branch, it is guaranteed to use specific instructions. This allows an interrupt handler or other introspection mechanism to straightforwardly detect whether the program is currently spinning in the infinite loop and possibly terminate the program if so. The intent is that this intrinsic may be used as a more efficient alternative to a conditional branch to a call to ``llvm.trap`` in circumstances where the loop detection is guaranteed to be present. This construct has been experimentally determined to be executed more efficiently (when the branch is not taken) than a conditional branch to a trap instruction on AMD and older Intel microarchitectures, and is also more code size efficient by avoiding the need to emit a trap instruction and possibly a long branch instruction. On i386 and x86_64, the infinite loop is guaranteed to consist of a short conditional branch instruction that branches to itself. Specifically, the first byte of the instruction will be between 0x70 and 0x7F, and the second byte will be 0xFE. Part of this RFC: https://discourse.llvm.org/t/rfc-optimizing-conditional-traps/89456 Reviewers: arsenm, RKSimon, fmayer, vitalybuka Pull Request: https://github.com/llvm/llvm-project/pull/177686	2026-02-06 17:11:15 -08:00
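The x86 byte pattern stated in 191af6c254 (first byte between 0x70 and 0x7F, second byte 0xFE) is what an introspection mechanism would look for at the interrupted program counter. A minimal sketch of that check follows; `is_cond_loop_spin`, `code`, and `pc` are hypothetical names, and a real handler would read the bytes from the interrupted context rather than from a Python `bytes` object.

```python
def is_cond_loop_spin(code, pc):
    """True if the two bytes at offset pc are a short conditional branch to itself.

    0x70..0x7F are the short Jcc opcodes; a rel8 displacement of 0xFE (-2)
    makes the two-byte instruction branch back to its own first byte.
    """
    if pc < 0 or pc + 1 >= len(code):
        return False
    return 0x70 <= code[pc] <= 0x7F and code[pc + 1] == 0xFE
```

For instance, `is_cond_loop_spin(bytes([0x75, 0xFE]), 0)` is true, while an unconditional `jmp` to itself (`0xEB 0xFE`) is not matched.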
sstipano	13d8870d45	[MC][TableGen] Expand Opcode field of MCInstrDesc (#179652 ) Increase width of Opcode to `int` from `short` to allow more capacity.	2026-02-06 20:21:48 +01:00
Kyungwoo Lee	8e17489026	[CGData][GMF] Preserve Profile Data (#180126 ) Profile data for instructions (e.g., branch weights) is automatically preserved via `splice()` which moves the basic blocks along with their instruction metadata. However, entry count is stored as function metadata, which was dropped when creating merged function and thunks. The fix is to explicitly set entry count for both merged function (.Tgm) and thunks via `setEntryCount()`.	2026-02-06 10:03:39 -08:00
Rahul Joshi	b12e3122c8	[NFC][Core][CodeGen] Remove pass initialization from pass constructors (#180153 )	2026-02-06 09:05:47 -08:00
David Sherwood	e958bcdd17	[DAGCombiner] Look through freeze for ext(freeze(extload(x))) (#178669 ) This patch fixes a regression introduced by PR #175022, where a freeze was introduced with the following transformation: ext(freeze(load(x))) -> freeze(extload(x)) If a new extend is introduced afterwards we then have ext(freeze(extload(x))) which doesn't get picked up by existing DAG combines due to the freeze getting in the way.	2026-02-06 15:50:17 +00:00
Nikita Popov	0287d789e0	[ExpandIRInsts] Freeze input in itofp expansion (#180157 ) We are introducing branches on the value, and branch on undef/poison is UB, so the value needs to be frozen.	2026-02-06 12:52:31 +01:00
Steffen Larsen	5654ecd5dd	[DAGCombiner] Fix exact power-of-two signed division for large integers (#177340 ) Previously, the DAG combiner did not optimize exact signed division by a power-of-two constant divisor for integer types exceeding the size of division supported by the target architecture (e.g., i128 on x86-64). However, such an optimization was expected by the division expansion logic, leading to unsupported division operations making it to instruction selection. This commit addresses this issue by making an exception to the existing exclusion of signed division with the exact flag for the aforementioned operations. That is, the DAG combiner will now optimize exact signed division if the divisor is a power-of-two constant and the integer type exceeds the size of division supported by the target architecture. --------- Signed-off-by: Steffen Holst Larsen <HolstLarsen.Steffen@amd.com>	2026-02-06 09:40:32 +01:00
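The identity behind this kind of combine is that an exact signed division by 2^k equals an arithmetic shift right by k, at any width. The sketch below only checks that arithmetic with Python integers standing in for i128 values; it is not the DAG combiner logic, and `sdiv_trunc` and `sdiv_exact_pow2` are hypothetical names.

```python
def sdiv_trunc(x, d):
    """Signed division rounding toward zero, like the IR `sdiv` instruction."""
    q = abs(x) // abs(d)
    return q if (x < 0) == (d < 0) else -q


def sdiv_exact_pow2(x, k):
    """x sdiv exact (1 << k), computed as an arithmetic shift right.

    Assumption for this sketch: the divisor is a positive power of two and
    the division is exact (no remainder), which is what the `exact` flag asserts.
    """
    assert x % (1 << k) == 0, "the exact flag requires a zero remainder"
    return x >> k  # Python's >> on ints is an arithmetic (sign-preserving) shift
```

Without the exactness guarantee the two differ for negative values: `-7 >> 1` is -4, whereas `sdiv_trunc(-7, 2)` is -3.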
Folkert de Vries	9639e9669e	[AArch64] fix copy from GPR32 to FPR16 (#176594 ) fixes https://github.com/llvm/llvm-project/issues/79822 cc https://github.com/rust-lang/rust/issues/120374 The example fails on nightly https://godbolt.org/z/zEojPzqWc.	2026-02-05 21:13:03 +01:00
Jameson Nash	d762cc2f03	[GlobalISel] Add SVE support for alloca (#178976 ) Complementary to the same handling code in SelectionDAG: `f3d81d4110/llvm/lib/CodeGen/SelectionDAG/FunctionLoweringInfo.cpp (L160-L165)` `f3d81d4110/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp (L4613-L4623)` Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>	2026-02-05 14:00:34 -05:00
Jay Foad	77034cd325	[CodeGen] Make use of TargetRegisterInfo::findCommonRegClass. NFC. (#179981 )	2026-02-05 17:22:46 +00:00
Nikita Popov	722c2f0221	[ExpandIRInsts] Support int bw < float bw in itofp expansion (#179963 ) Handle this case by extending the integer to a wider type. This can probably be handled more optimally, but this is conservatively correct. Proof: https://alive2.llvm.org/ce/z/0RwDO1	2026-02-05 17:26:12 +01:00
Matt Arsenault	a9adf7d1e3	GlobalISel: Remove unused argument from CSEInfo (#179962 ) Nothing uses this force recomputation.	2026-02-05 16:03:08 +00:00
Nikita Popov	d3fb3c5d36	[GISel][CallLowering] Keep IR types longer (#179946 ) GISel CallLowering currently does a Type -> EVT -> Type roundtrip early on when populating ArgInfo in splitToValueType(). This is a bit odd as this structure operates at the IR Type level. Keep the original type there and only convert to EVT when performing assignments.	2026-02-05 16:37:08 +01:00
Nikita Popov	d737229efd	[ExpandIRInsts] Allow int bw == float bw in itofp (#179943 ) I don't think anything here requires the integer bit width to be strictly larger. It's fine if it's the same (in which case some zexts just go away). Add tests on half + i32 that can be verified by alive2. Note that half is handled via float, so the minimum supported type is i32 rather than i16. Proof (uitofp): https://alive2.llvm.org/ce/z/CsMfkU Proof (sitofp): https://alive2.llvm.org/ce/z/jzuxyt	2026-02-05 16:21:19 +01:00
Matt Arsenault	2502e3b7ba	IR: Promote "denormal-fp-math" to a first class attribute (#174293 ) Convert "denormal-fp-math" and "denormal-fp-math-f32" into a first class denormal_fpenv attribute. Previously the query for the effective denormal mode involved two string attribute queries with parsing. I'm introducing more uses of this, so it makes sense to convert this to a more efficient encoding. The old representation was also awkward since it was split across two separate attributes. The new encoding just stores the default and float modes as bitfields, largely avoiding the need to consider if the other mode is set. The syntax in the common cases looks like this: `denormal_fpenv(preservesign,preservesign)` `denormal_fpenv(float: preservesign,preservesign)` `denormal_fpenv(dynamic,dynamic float: preservesign,preservesign)` I wasn't sure about reusing the float type name instead of adding a new keyword. It's parsed as a type but only accepts float. I'm also debating switching the name to subnormal to match the current preferred IEEE terminology (also used by nofpclass and other contexts). This has a behavior change when using the command flag debug options to set the denormal mode. The behavior of the flag ignored functions with an explicit attribute set, per the default and f32 version. Now that these are one attribute, the flag logic can't distinguish which of the two components were explicitly set on the function. Only one test appeared to rely on this behavior, so I just avoided using the flags in it. This also does not perform all the code cleanups this enables. In particular the attributor handling could be cleaned up. I also guessed at how to support this in MLIR. I followed MemoryEffects as a reference; it appears bitfields are expanded into arguments to attributes, so the representation there is a bit uglier with the 2 2-element fields flattened into 4 arguments.	2026-02-05 13:31:26 +00:00
Kai Nacke	f3bd1b9526	[SystemZ][z/OS] Use the text section for jump tables (#179793 ) Jump tables are read only data, and the text section is the best choice for them.	2026-02-05 08:18:17 -05:00
keremsahn	f6e130682f	[SelectionDAG] Mark LowerTypeTests as required and remove intrinsic handling from #142939 (#179249 ) Fixes #179125	2026-02-05 11:16:48 +01:00
Ryotaro Kasuga	2ca54b41a4	[MachinePipeliner] Remove isLoopCarriedDep calls in computeStart (#174393 ) When computing the viable cycles for scheduling an instruction, `computeStart` used to include special-case logic to handle loop-carried dependencies. This special handling was necessary because loop-carried dependencies were represented by reversed forward-direction edges in the DAG. Now that we have the DDG, which explicitly models loop-carried dependencies, this special handling is no longer required. As a first step towards completely removing `isLoopCarriedDep`, this patch eliminates the special-case logic from `computeStart` and some related functions. Split off from https://github.com/llvm/llvm-project/pull/135148	2026-02-05 06:05:48 +00:00
Ryotaro Kasuga	82c0607ffd	[MachinePipeliner] Add loop-carried dependences for FPExceptions (#174392 ) As with loads and stores, instructions that may trigger floating‑point exceptions must not be reordered across a barrier instruction. This patch adds the missing loop‑carried dependencies between such instructions and the barrier, preventing reordering that could previously occur. Same as #174391, the implementation is based on that of `ScheduleDAGInstrs::buildSchedGraph`. Split off from #135148	2026-02-05 05:32:10 +00:00
Ryotaro Kasuga	dfdc3b72d2	[MachinePipeliner] Add loop-carried dependencies for global barriers (#174391 ) Loads and stores must not be reordered across barrier instructions. However, in MachinePipeliner, this could happen because loop-carried dependencies from loads/stores to a barrier instruction were not considered. The same problem exists for barrier-to-barrier dependencies. This patch adds the handling for those cases. The implementation is based on that of `ScheduleDAGInstrs::buildSchedGraph`. Split off from https://github.com/llvm/llvm-project/pull/135148	2026-02-05 04:17:26 +00:00
Akshay Deodhar	fab5b1858d	Reland "[NVPTX][AtomicExpandPass] Complete support for AtomicRMW in NVPTX (#176015 )" (#179553 ) This PR adds full support for atomicrmw in NVPTX. This includes: - Memory order and syncscope support (changes in AtomicExpandPass.cpp, NVPTXIntrinsics.td) - Script-generated tests for integer and atomic operations (atomicrmw.py, atomicrmw-sm.ll in tests/CodeGen/NVPTX). Existing atomics tests which are subsumed by these have been removed (atomics-sm.ll, atomics.ll, atomicrmw-expand.ll). - ~~Changes shouldExpandAtomicRMWInIR to take a constant argument: This is to allow some other TargetLowering constant-argument functions to call it. This change touches several backends. An alternative solution exists, but to me, this seems the "right" way.~~ Has been split out into https://github.com/llvm/llvm-project/pull/176073. Rebased. - NOTE: The initial load issued for atomicrmw emulation loops (and cmpxchg emulation loops) must be a strong load. Currently, AtomicExpandPass issues a weak load. Fixing this breaks several backends. I'm planning to follow up with a separate PR. Initially failed due to error: ptxas fatal : Value 'sm_60' is not defined for option 'gpu-name'. Updated RUN lines in atomicrmw-sm*.py to skip the ptxas-verify check if ptxas does not support that SM version.	2026-02-04 16:15:49 -08:00
Sam Elliott	0cac3e381d	[CodeGen][TII] Delete analyzeSelect hook (#175828 ) The only caller of this function (`PeepholeOptimizer::optimizeSelect`) did not use most of the parameters, was broadly equivalent to `MI->isSelect()`, and the `optimizeSelect` hook can return `nullptr` anyway. Update `optimizeSelect` to return `nullptr` by default rather than asserting when not implemented.	2026-02-04 14:14:45 -08:00
Alex Wang	b33a0e6101	[SelectionDAG] Add expansion for llvm.modf intrinsic (#179434 ) Targets without a `modf` libcall lower the intrinsic directly, matching the existing `llvm.frexp` expansion. Targets with an existing libcall are unchanged. Fixes #173021	2026-02-04 21:25:47 +01:00
Stanislav Mekhanoshin	ba8df39898	Add SDNodeFlag::NoConvergent (#179323 )	2026-02-04 10:21:45 -08:00
weiguozhi	9a47c3bcba	[RegAlloc] Change the computation of CSRCost (#177226 ) This patch fixes https://github.com/llvm/llvm-project/issues/150737. The originally computed CSRCost is too small, so the optimization of spilling instead of using a CSR is rarely triggered. The original cost model is also difficult for backend developers and users to understand and tune. This patch therefore changes the CSRCost to `CSRCost = TRI->getCSRFirstUseCost() * EntryFreq * Scale`. `TRI->getCSRFirstUseCost()` is the raw cost of saving and restoring a CSR; usually this number does not need tuning. `EntryFreq` is the BlockFrequency of the entry block. `Scale` scales down the CSRCost: because a CSR is usually preferred over spilling when the CSRCost and spill cost are similar, it should be less than 100%. This is the number that is usually tuned. Another problem is that the original function RAGreedy::calcSpillCost() actually computes a cost for block splitting, so this patch also implements a correct RAGreedy::calcSpillCost() function. This new behavior is not enabled by default. This optimization is used by 3 targets (AArch64 / AMDGPU / RISCV); I will change them one by one in following patches.	2026-02-04 10:08:57 -08:00
Jay Foad	7ea33e6848	[CodeGen] Remove unused first operand of SUBREG_TO_REG (#179690 ) The first input operand of SUBREG_TO_REG was an immediate that most targets set to 0. In practice it had no effect on codegen. Remove it.	2026-02-04 17:35:21 +00:00
Nikita Popov	516eb3820d	[ExpandIRInsts] Freeze value before fptoi expansion (#179659 ) We're going to introduce new branches, and branch on undef/poison is immediate UB.	2026-02-04 14:49:34 +01:00
Fabian Ritter	d24a6754ce	[LowerMemIntrinsics] Optimize memset lowering (#169040 ) This patch changes the memset lowering to match the optimized memcpy lowering. The memset lowering now queries TTI.getMemcpyLoopLoweringType for a preferred memory access type. If that type is larger than a byte, the memset is lowered into two loops: a main loop that stores a sufficiently wide vector splat of the SetValue with the preferred memory access type and a residual loop that covers the remaining bytes individually. If the memset size is statically known, the residual loop is replaced by a sequence of stores. This improves memset performance on gfx1030 (AMDGPU) in microbenchmarks by around 7-20x. I'm planning similar treatment for memset.pattern as a follow-up PR. For SWDEV-543208.	2026-02-04 13:35:13 +01:00
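The two-loop shape described in d24a6754ce (a main loop of wide splat stores followed by a byte-wise residual loop) can be modelled on a `bytearray`. This is a sketch of the control structure only, not the IR that the pass emits; `memset_lowered` and `access_bytes` are hypothetical names, with `access_bytes` standing in for the size of the preferred memory access type.

```python
def memset_lowered(buf, value, size, access_bytes):
    """Fill buf[0:size] with value using wide stores plus a byte-wise residual."""
    splat = bytes([value]) * access_bytes      # SetValue replicated across the wide type
    main_end = size - size % access_bytes      # bytes covered by whole wide stores

    # Main loop: one wide store per iteration.
    for off in range(0, main_end, access_bytes):
        buf[off:off + access_bytes] = splat

    # Residual loop: remaining bytes are stored individually.
    for off in range(main_end, size):
        buf[off] = value
```

With `size=19` and `access_bytes=16`, the main loop runs once and the residual loop stores the last three bytes.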
Jay Foad	a13c6ea80d	[CodeGen] Simplify ExpandPostRA::LowerSubregToReg. NFC. (#179634 ) SUBREG_TO_REG always has a non-zero subreg index so DstSubReg can never be the same as DstReg.	2026-02-04 12:06:41 +00:00
Juan Manuel Martinez Caamaño	04c56505f8	[NFC][LLVM] Make `constrainSelectedInstRegOperands` return `void` (#179501 ) `constrainSelectedInstRegOperands` always returns `true`; so it can be safely transformed to return `void` instead. A follow-up patch should update `MachineInstrBuilder::constrainAllUses`.	2026-02-04 08:59:16 +01:00
Luke Lau	653b336e66	[LegalizeVectorTypes] Don't emit VP_SELECT when widening MLOAD to VP_LOAD (#179478 ) This is part of the work to remove trivial VP intrinsics. When widening an MLOAD we may use a VP_LOAD if it's supported. We use a VP_SELECT to merge in the passthru, but we don't check if it's supported by the target. This changes it to just emit a regular VSELECT instead to prevent crashing in that case, and a VP_MERGE to keep the lanes past EVL poison.	2026-02-04 07:11:30 +00:00
serge-sans-paille	dcf853df8f	[perf] Replace extra copy-assign by move-assign in llvm/lib/ (#179465 ) Co-authored-by: Nikita Popov <github@npopov.com>	2026-02-04 06:36:30 +00:00
Vladislav Dzhidzhoev	b9cecee3fb	Reland "[DebugMetadata][DwarfDebug] Support function-local types in lexical block scopes (4/7)" (#165032 ) This is an attempt to merge https://reviews.llvm.org/D144006 with LTO fix. The last merge attempt was https://github.com/llvm/llvm-project/pull/75385. The issue with it was investigated in https://github.com/llvm/llvm-project/pull/75385#issuecomment-2386684121. The problem happens when 1. Several modules are being linked. 2. There are several DISubprograms that initially belong to different modules but represent the same source code function (for example, a function included from the same source code file). 3. Some of such DISubprograms survive IR linking. It may happen if one of them is inlined somewhere or if the functions that have these DISubprograms attached have internal linkage. 4. Each of these DISubprograms has a local type that corresponds to the same source code type. These types are initially from different modules, but have the same ODR identifier. If the same (in the sense of ODR identifier/ODR uniquing rules) local type is present in two modules, and these modules are linked together, the type gets uniqued. A DIType that happens to be loaded first survives linking, and the references to other types with the same ODR identifier from the modules loaded later are replaced with references to the DIType loaded first. Since definition subprograms, in scope of which these types are located, are not deduplicated, the linker output may contain multiple DISubprograms having the same (uniqued) type in their retainedNodes lists. Further compilation of such modules causes crashes. To tackle that, * previous solution to handle LTO linking with local types in retainedNodes is removed (cloneLocalTypes() function), * for each loaded distinct (definition) DISubprogram, its retainedNodes list is scanned after loading, and DITypes with a scope of another subprogram are removed.
If something from a Function corresponding to the DISubprogram references a uniqued type, we rely on cross-CU links. Additionally: * a check is added to Verifier to report local types located in a wrong retainedNodes list. Original commit message follows. --------- RFC https://discourse.llvm.org/t/rfc-dwarfdebug-fix-and-improve-handling-imported-entities-types-and-static-local-in-subprogram-and-lexical-block-scopes/68544 Similar to imported declarations, the patch tracks function-local types in DISubprogram's 'retainedNodes' field. DwarfDebug is adjusted in accordance with the aforementioned metadata change and given support for function-local types scoped within a lexical block. The patch assumes that DICompileUnit's 'enums' field no longer tracks local types and DwarfDebug would assert if any locally-scoped types get placed there. Authored-by: Kristina Bessonova <kbessonova@accesssoftek.com> Co-authored-by: Jeremy Morse <jeremy.morse@sony.com>	2026-02-04 00:34:52 +01:00


39151 Commits