llvm-project

Author	SHA1	Message	Date
paperchalice	c53acf0443	[SelectionDAGBuilder] Remove NoNaNsFPMath uses (#169904 ) Replaced by checking fast-math flags or value tracking results.	2026-02-09 09:48:07 +08:00
Qinkun Bao	2a74e02a90	Revert "[SelectionDAG] Fix null pointer dereference in resolveDanglingDebugInfo" (#180352 ) Reverts llvm/llvm-project#174341 Break https://lab.llvm.org/buildbot/#/builders/24/builds/17324	2026-02-07 16:47:17 +00:00
Haoren Wang	9e8caa7834	[SelectionDAG] Fix null pointer dereference in resolveDanglingDebugInfo (#174341 ) ## Summary Fix null pointer dereference in `SelectionDAGBuilder::resolveDanglingDebugInfo`. ## Problem `Val.getNode()->getIROrder()` is called before checking if `Val.getNode()` is null, causing crashes when compiling code with debug info that contains aggregate constants with nested empty structs. ## Solution Move the `ValSDNodeOrder` declaration inside the `if (Val.getNode())` block. ## Test Case Reproduces with aggregate types containing nested empty structs: ```llvm %3 = insertvalue { { i1, {} }, ptr, { { {} }, { {} } }, i64 } { { i1, {} } zeroinitializer, ptr null, { { {} }, { {} } } zeroinitializer, i64 2 }, ptr %2, 1, !dbg !893 ## Crash stack 0. Program arguments: llc-20 -O3 -mcpu=native -relocation-model=pic -filetype=obj /cloudide/workspace/temp/sf.ll -o /dev/null 1. Running pass 'Function Pass Manager' on module '/cloudide/workspace/temp/sf.ll'. 2. Running pass 'X86 DAG->DAG Instruction Selection' on function '@filter_create' Stack dump without symbol names (ensure you have llvm-symbolizer in your PATH or set the environment var `LLVM_SYMBOLIZER_PATH` to point to it): 0 libLLVM.so.20.1 0x00007ff87ebbdf86 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) + 54 1 libLLVM.so.20.1 0x00007ff87ebbbb90 llvm::sys::RunSignalHandlers() + 80 2 libLLVM.so.20.1 0x00007ff87ebbe640 3 libpthread.so.0 0x00007ff87db79140 4 libLLVM.so.20.1 0x00007ff87f3fd2ff llvm::SelectionDAGBuilder::resolveDanglingDebugInfo(llvm::Value const*, llvm::SDValue) + 303 5 libLLVM.so.20.1 0x00007ff87f3fda5e llvm::SelectionDAGBuilder::getValue(llvm::Value const*) + 142 6 libLLVM.so.20.1 0x00007ff87f3fe79f llvm::SelectionDAGBuilder::getValueImpl(llvm::Value const*) + 3343 7 libLLVM.so.20.1 0x00007ff87f3fda34 llvm::SelectionDAGBuilder::getValue(llvm::Value const*) + 100 8 libLLVM.so.20.1 0x00007ff87f3fc1ab llvm::SelectionDAGBuilder::visitInsertValue(llvm::InsertValueInst const&) + 603 9 libLLVM.so.20.1 0x00007ff87f3eeaf7 llvm::SelectionDAGBuilder::visit(llvm::Instruction const&) + 327 10 libLLVM.so.20.1 0x00007ff87f4904b8 llvm::SelectionDAGISel::SelectBasicBlock(llvm::ilist_iterator_w_bits<llvm::ilist_detail::node_options<llvm::Instruction, false, false, void, true, llvm::BasicBlock>, false, true>, llvm::ilist_iterator_w_bits<llvm::ilist_detail::node_options<llvm::Instruction, false, false, void, true, llvm::BasicBlock>, false, true>, bool&) + 72 11 libLLVM.so.20.1 0x00007ff87f490304 llvm::SelectionDAGISel::SelectAllBasicBlocks(llvm::Function const&) + 5956 12 libLLVM.so.20.1 0x00007ff87f48e2b4 llvm::SelectionDAGISel::runOnMachineFunction(llvm::MachineFunction&) + 372 13 libLLVM.so.20.1 0x00007ff87f48c689 llvm::SelectionDAGISelLegacy::runOnMachineFunction(llvm::MachineFunction&) + 169 14 libLLVM.so.20.1 0x00007ff87efb8e32 llvm::MachineFunctionPass::runOnFunction(llvm::Function&) + 610 15 libLLVM.so.20.1 0x00007ff87ed104be llvm::FPPassManager::runOnFunction(llvm::Function&) + 638 16 libLLVM.so.20.1 0x00007ff87ed15ff3 llvm::FPPassManager::runOnModule(llvm::Module&) + 51 17 libLLVM.so.20.1 0x00007ff87ed10c11 llvm::legacy::PassManagerImpl::run(llvm::Module&) + 1105 18 llc-20 0x000055972ce77dc1 main + 9649 19 libc.so.6 0x00007ff87d68ad7a __libc_start_main + 234 20 llc-20 0x000055972ce7247a _start + 42 ``` ## Testing Added regression tests in: - `CodeGen/X86/selectiondag-dbgvalue-null-crash.ll` - `CodeGen/AArch64/selectiondag-dbgvalue-null-crash.ll` Note: Tests appear to expose deeper issues in DWARF generation on certain targets (Darwin targets for example) that require further investigation. ## Related PRs This supersedes: - #173500 - Initial fix, reverted due to test failures on Darwin and other platforms - #173836 - Second attempt with `UNSUPPORTED: system-darwin`, still failed on some targets	2026-02-07 13:00:30 +01:00
Peter Collingbourne	191af6c254	Add llvm.cond.loop intrinsic. The llvm.cond.loop intrinsic is semantically equivalent to a conditional branch conditioned on ``pred`` to a basic block consisting only of an unconditional branch to itself. Unlike such a branch, it is guaranteed to use specific instructions. This allows an interrupt handler or other introspection mechanism to straightforwardly detect whether the program is currently spinning in the infinite loop and possibly terminate the program if so. The intent is that this intrinsic may be used as a more efficient alternative to a conditional branch to a call to ``llvm.trap`` in circumstances where the loop detection is guaranteed to be present. This construct has been experimentally determined to be executed more efficiently (when the branch is not taken) than a conditional branch to a trap instruction on AMD and older Intel microarchitectures, and is also more code size efficient by avoiding the need to emit a trap instruction and possibly a long branch instruction. On i386 and x86_64, the infinite loop is guaranteed to consist of a short conditional branch instruction that branches to itself. Specifically, the first byte of the instruction will be between 0x70 and 0x7F, and the second byte will be 0xFE. Part of this RFC: https://discourse.llvm.org/t/rfc-optimizing-conditional-traps/89456 Reviewers: arsenm, RKSimon, fmayer, vitalybuka Pull Request: https://github.com/llvm/llvm-project/pull/177686	2026-02-06 17:11:15 -08:00
keremsahn	f6e130682f	[SelectionDAG] Mark LowerTypeTests as required and remove intrinsic handling from #142939 (#179249 ) Fixes #179125	2026-02-05 11:16:48 +01:00
Nicolai Hähnle	af836ff60c	[CodeGen] Add getTgtMemIntrinsic overload for multiple memory operands (NFC) (#175843 ) There are target intrinsics that logically require two MMOs, such as llvm.amdgcn.global.load.lds, which is a copy from global memory to LDS, so there's both a load and a store to different addresses. Add an overload of getTgtMemIntrinsic that produces intrinsic info in a vector, and implement it in terms of the existing (now protected) overload. GlobalISel and SelectionDAG paths are updated to support multiple MMOs. The main part of this change is supporting multiple MMOs in MemIntrinsicNodes. Converting the backends to using the new overload is a fairly mechanical step that is done in a separate change in the hope that that allows reducing merging pains during review and for downstreams. A later change will then enable using multiple MMOs in AMDGPU.	2026-02-02 21:58:42 +00:00
zhijian lin	dc520ea4af	[PowerPC] using millicode call for strcmp instead of lib call (#177009 ) AIX has "millicode" routines, which are functions loaded at boot time into fixed addresses in kernel memory. This allows them to be customized for the processor. The __strcmp routine is a millicode implementation; we use millicode for the strcmp function instead of a library call to improve performance.	2026-02-02 09:34:53 -05:00
Wei Xiao	ea251669ba	[CodeGen] Fix MachineMemOperand Size of MaskedLoad (#156398 ) Fix MIR printing unknown-size issue of MaskedLoad.	2026-01-29 18:37:49 +00:00
Jameson Nash	b7c1a6f8b4	[CodeGen] Only use actual alloca alignment (#178361 ) Remove getPrefTypeAlign calls and use only the alloca's explicit alignment, since the type may not be semantically useful, there is no useful reason to change alignment to support it. The alloca's explicit alignment (from getAlign()) is already optimally correct; we don't need to derive alignment from the allocated type. Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>	2026-01-28 22:49:19 -05:00
Nikita Popov	1bad00adc4	[SDAG] Remove non-canonical fabs libcall handling (#177967 ) This is a followup to https://github.com/llvm/llvm-project/pull/171288, which removed lowering of libcalls to SDAG nodes for most libcalls that get unconditionally canonicalized to intrinsics. This handles the remaining fabs case, which I originally skipped due to larger test impact.	2026-01-26 15:11:17 +00:00
Luke Lau	cee36b23cc	[IR] Allow non-constant offsets in @llvm.vector.splice.{left,right} (#174693 ) Following on from #170796, this PR implements the second part of https://discourse.llvm.org/t/rfc-allow-non-constant-offsets-in-llvm-vector-splice/88974 by allowing non-constant offsets in the vector splice intrinsics. Previously @llvm.vector.splice had a restriction enforced by the verifier that the offset had to be known to be within the range of the vector at compile time. Because we can't enforce this with non-constant offsets, it's been relaxed so that offsets that would slide the vector out of bounds return a poison value, similar to insertelement/extractelement. @llvm.vector.splice.left also previously only allowed offsets within the range 0 <= Offset < N, but this has been relaxed to 0 <= Offset <= N so that it's consistent with @llvm.vector.splice.right. In lieu of the verifier checks that were removed, InstSimplify has been taught to fold splices to poison when the offset is out of bounds. The cost model isn't implemented in this PR, and just returns invalid for any non-constant offsets for now. I think the correct way to cost these non-constant offsets isn't through getShuffleCost because they can't handle variable masks, but instead just through getIntrinsicInstCost.	2026-01-21 10:58:40 +00:00
Matt Arsenault	0d4a35d560	IR: Remove llvm.convert.to.fp16 and llvm.convert.from.fp16 intrinsics (#174484 ) These are long overdue for removal. These were originally a hack to support loading half values before there was any / decent support for the half type through the backend. There's no reason to continue supporting these, they're equivalent to fpext/fptrunc with a bitcast. SelectionDAG stopped translating these directly, and used the bitcast + fp cast since f7a02c17628e825, so there's been no reason to use these since 2014.	2026-01-21 09:50:28 +00:00
Matt Arsenault	aa57ee958d	CodeGen: Use LibcallLoweringInfo for stack protector insertion (#176829 ) Thread LibcallLoweringInfo into the TargetLowering hooks used by the stack protector passes.	2026-01-20 12:37:31 +01:00
Jameson Nash	ba2bd3fbba	Use AllocaInst::getAllocationSize instead of manual size calculations (#176486 ) Replace patterns that manually compute allocation sizes by multiplying getTypeAllocSize(getAllocatedType()) by the array size with calls to the getAllocationSize(DL) API, which handles this correctly and concisely, returning nullopt for VLAs. This fixes several places that were not accounting for array allocations when computing sizes, simplifies code that was doing this manually, and adds some explicit isFixed checks where an implicit conversion was being used. This PR is because now that we have opaque pointers, I hate that some AllocaInst still has type information being consumed by some passes instead of just using the size, since passes rarely handle that type information well or correctly. I hope this will grow into a sequence of commits to slowly eliminate uses of getAllocatedType from AllocaInst. And similarly later to remove type information from GlobalValue too (it can be replaced with just dereferenceable bytes, similar to arguments). Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-19 09:55:52 -05:00
Nikita Popov	792670a400	[X86][WinEH] Insert nop after unwinding inline assembly (#176393 ) As discussed on https://github.com/llvm/llvm-project/pull/144745, insert a nop after unwinding inline assembly, as it may end on a call. While the change itself is trivial, I ended up having to do two infrastructure changes: * The unwind flag needs to be propagated to ExtraInfo of the MachineInstr. * The MachineInstr needs to be passed through to emitInlineAsmEnd(), and the method needs to be non-const. Fixes https://github.com/llvm/llvm-project/issues/157073.	2026-01-19 09:09:04 +01:00
zhijian lin	7b90f426a6	[PowerPC] using millicode call for strstr instead of lib call (#176002 ) AIX has "millicode" routines, which are functions loaded at boot time into fixed addresses in kernel memory. This allows them to be customized for the processor. The __strstr routine is a millicode implementation; we use millicode for the strstr function instead of a library call to improve performance. I added a helper function `getRuntimeCallSDValueHelper` in this patch. I will refactor the functions `SelectionDAG::getStrlen`, `SelectionDAG::getStrcpy`, etc. later in another patch.	2026-01-15 14:58:17 -05:00
Ramkumar Ramachandra	d69335bac9	[LLVM] Clean up code using [not_]equal_to (NFC) (#175824 ) Use llvm::[not_]equal_to landed in d2a521750 ([ADT] Introduce bind_{front,back}, [not_]equal_to, #175056) across LLVM for cleaner code.	2026-01-13 21:19:39 +00:00
zhijian lin	b983b0e92a	[PowerPC] using millicode call for strcpy instead of lib call (#174782 ) AIX has "millicode" routines, which are functions loaded at boot time into fixed addresses in kernel memory. This allows them to be customized for the processor. The __strcpy routine is a millicode implementation; we use millicode for the strcpy function instead of a library call to improve performance. --------- Co-authored-by: Matt Arsenault <arsenm2@gmail.com>	2026-01-12 08:58:45 -05:00
moorabbit	a5fa246435	[Clang] Add `__builtin_stack_address` (#148281 ) Add support for `__builtin_stack_address` builtin. The semantics match those of GCC's builtin with the same name. `__builtin_stack_address` returns the starting address of the stack region that may be used by called functions. It may or may not include the space used for on-stack arguments passed to a callee (See [GCC Bug/121013](https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121013)). Fixes #82632.	2026-01-12 10:01:57 +01:00
Luke Lau	ad4bfac732	[IR] Split vector.splice into vector.splice.left and vector.splice.right (#170796 ) This PR implements the first change outlined in https://discourse.llvm.org/t/rfc-allow-non-constant-offsets-in-llvm-vector-splice/88974?u=lukel. In order to allow non-immediate offsets in the llvm.vector.splice intrinsic, we need to separate out the "shift left" and "shift right" modes into two separate intrinsics, which were previously determined by whether the offset is positive or negative. The description in the LangRef has also been reworded in terms of sliding elements left or right and extracting either the upper or lower half as opposed to extracting from a certain index, which brings it in line with the definition of `llvm.fshr.`/`llvm.fshl.`. This patch teaches AutoUpgrade.cpp to upgrade the old intrinsics into their new equivalents based on their offset, so existing uses of vector.splice should still work. Uses of llvm.vector.splice in `llvm/test/CodeGen` haven't been replaced in this PR to keep the diff small and kick the tyres on the AutoUpgrader a bit. I planned to do this in a follow up NFC but can include it in this PR if reviewers prefer. Similarly the shuffle costing kind `SK_Splice` has just been kept the same for now, to be split into `SK_SpliceLeft` and `SK_SpliceRight` later.	2026-01-06 15:41:26 +08:00
Ramkumar Ramachandra	9e5e267a03	[ISel] Introduce llvm.clmul intrinsic (#168731 ) In line with a std proposal to introduce the llvm.clmul family of intrinsics corresponding to carry-less multiply operations. This work builds upon 727ee7e ([APInt] Introduce carry-less multiply primitives), and follow-up patches will introduce custom-lowering on supported targets, replacing target-specific clmul intrinsics. Testing is done on the RISC-V target, which should be sufficient to prove that the intrinsics work, since no RISC-V specific lowering has been added. Ref: https://isocpp.org/files/papers/P3642R3.html Co-authored-by: Craig Topper <craig.topper@sifive.com>	2026-01-05 20:24:06 +00:00
Benjamin Maxwell	fe3b4f0e0d	[SDAG] Use reference type in loop (NFC) (#174379 ) Fixes a -Wrange-loop-construct warning.	2026-01-05 10:42:41 +00:00
Benjamin Maxwell	a9fee3127a	[SDAG] Avoid crash when creating debug fragments for scalable vectors (#165233 ) Previously, we would crash in the SelectionDAGBuilder when attempting to create debug fragments for scalable vectors split across multiple registers. It does not seem like DW_OP_LLVM_fragment supports any notion of scalable type sizes. It takes both an offset and typesize as literals, with no indication of scalability (and it also does not seem to be considered in any of the places that handle DW_OP_LLVM_fragment). So the workaround here is to drop the debug info. Note: This is not usually an issue for IR that comes from the SVE ACLE, as we generally stick to using legal types there (that don't end up getting split). Workaround for: #161289	2026-01-04 09:53:58 +00:00
Sergei Barannikov	4534edb3f7	[SelectionDAG] Fix operand of BRCOND in visitSPDescriptorParent (#174230 ) The first operand should be a chain, but `GuardVal.getOperand(0)` isn't always a chain (i.e. if `TLI.emitStackGuardXorFP()` is called). Use `getControlRoot()` instead like in other places when creating terminator nodes. Extracted from #168421.	2026-01-02 19:08:28 +00:00
Leandro Lupori	25acd42fcc	Revert "[aarch64] Mix the frame pointer with the stack cookie when protecting the stack (#161114 )" (#173987 ) This reverts commit b6bfa856860bb4304e635102872a4c994af101b4. This commit broke Windows on Arm bots.	2025-12-30 10:58:01 -03:00
Nikita Popov	8ea8f682f7	Revert "[SelectionDAG] Fix null pointer dereference in resolveDanglingDebugInfo" (#173925 ) Reverts llvm/llvm-project#173500. Test fails depending on the host system.	2025-12-29 22:05:17 +00:00
Mikołaj Piróg	25d2a5b51f	[NFC] Rename variables to FPOp (#173792 ) In my earlier PR (https://github.com/llvm/llvm-project/pull/167574), I named a variable in the fpext function incorrectly. I've changed the name in both functions to the generic FPOp.	2025-12-28 22:00:01 +01:00
Islam Imad	7ceecfad40	[CodeGen] Fix EVT::changeVectorElementType assertion on simple-to-extended fallback (#173413 ) Fixes #171608	2025-12-28 18:51:18 +00:00
MetalOxideSemi	7a3bbf724d	[SelectionDAG] Fix null pointer dereference in resolveDanglingDebugInfo (#173500 ) ## Summary Fix null pointer dereference in `SelectionDAGBuilder::resolveDanglingDebugInfo`. ## Problem `Val.getNode()->getIROrder()` is called before checking if `Val.getNode()` is null, causing crashes when compiling code with debug info that contains aggregate constants with nested empty structs. ## Solution Move the `ValSDNodeOrder` declaration inside the `if (Val.getNode())` block. ## Test Case Reproduces with aggregate types containing nested empty structs: ```llvm %3 = insertvalue { { i1, {} }, ptr, { { {} }, { {} } }, i64 } { { i1, {} } zeroinitializer, ptr null, { { {} }, { {} } } zeroinitializer, i64 2 }, ptr %2, 1, !dbg !893 ## Crash stack 0. Program arguments: llc-20 -O3 -mcpu=native -relocation-model=pic -filetype=obj /cloudide/workspace/temp/sf.ll -o /dev/null 1. Running pass 'Function Pass Manager' on module '/cloudide/workspace/temp/sf.ll'. 2. Running pass 'X86 DAG->DAG Instruction Selection' on function '@filter_create' Stack dump without symbol names (ensure you have llvm-symbolizer in your PATH or set the environment var `LLVM_SYMBOLIZER_PATH` to point to it): 0 libLLVM.so.20.1 0x00007ff87ebbdf86 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) + 54 1 libLLVM.so.20.1 0x00007ff87ebbbb90 llvm::sys::RunSignalHandlers() + 80 2 libLLVM.so.20.1 0x00007ff87ebbe640 3 libpthread.so.0 0x00007ff87db79140 4 libLLVM.so.20.1 0x00007ff87f3fd2ff llvm::SelectionDAGBuilder::resolveDanglingDebugInfo(llvm::Value const*, llvm::SDValue) + 303 5 libLLVM.so.20.1 0x00007ff87f3fda5e llvm::SelectionDAGBuilder::getValue(llvm::Value const*) + 142 6 libLLVM.so.20.1 0x00007ff87f3fe79f llvm::SelectionDAGBuilder::getValueImpl(llvm::Value const*) + 3343 7 libLLVM.so.20.1 0x00007ff87f3fda34 llvm::SelectionDAGBuilder::getValue(llvm::Value const*) + 100 8 libLLVM.so.20.1 0x00007ff87f3fc1ab llvm::SelectionDAGBuilder::visitInsertValue(llvm::InsertValueInst const&) + 603 9 libLLVM.so.20.1 0x00007ff87f3eeaf7 llvm::SelectionDAGBuilder::visit(llvm::Instruction const&) + 327 10 libLLVM.so.20.1 0x00007ff87f4904b8 llvm::SelectionDAGISel::SelectBasicBlock(llvm::ilist_iterator_w_bits<llvm::ilist_detail::node_options<llvm::Instruction, false, false, void, true, llvm::BasicBlock>, false, true>, llvm::ilist_iterator_w_bits<llvm::ilist_detail::node_options<llvm::Instruction, false, false, void, true, llvm::BasicBlock>, false, true>, bool&) + 72 11 libLLVM.so.20.1 0x00007ff87f490304 llvm::SelectionDAGISel::SelectAllBasicBlocks(llvm::Function const&) + 5956 12 libLLVM.so.20.1 0x00007ff87f48e2b4 llvm::SelectionDAGISel::runOnMachineFunction(llvm::MachineFunction&) + 372 13 libLLVM.so.20.1 0x00007ff87f48c689 llvm::SelectionDAGISelLegacy::runOnMachineFunction(llvm::MachineFunction&) + 169 14 libLLVM.so.20.1 0x00007ff87efb8e32 llvm::MachineFunctionPass::runOnFunction(llvm::Function&) + 610 15 libLLVM.so.20.1 0x00007ff87ed104be llvm::FPPassManager::runOnFunction(llvm::Function&) + 638 16 libLLVM.so.20.1 0x00007ff87ed15ff3 llvm::FPPassManager::runOnModule(llvm::Module&) + 51 17 libLLVM.so.20.1 0x00007ff87ed10c11 llvm::legacy::PassManagerImpl::run(llvm::Module&) + 1105 18 llc-20 0x000055972ce77dc1 main + 9649 19 libc.so.6 0x00007ff87d68ad7a __libc_start_main + 234 20 llc-20 0x000055972ce7247a _start + 42	2025-12-28 18:00:46 +00:00
Craig Topper	877df9e4b9	[SelectionDAG] Make SSHLSAT/USHLSAT obey getShiftAmountTy(). (#173216 ) Treat these like other shift operations by allowing the shift amount to be a different type than the result. The PromoteIntOp_Shift and LegalizeDAG code are not tested due to lack of target support. I'm looking at adding SSHLSAT for the RISC-V P extension. I don't need this support for that since RISC-V only has one legal type. I just thought it was odd that they weren't like other shifts.	2025-12-22 10:28:04 -08:00
Jonas Paulsson	100077dbff	[SelectionDAGBuilder] Don't add base offset in LowerFormalArguments(). (#170732 ) LowerCallTo() and LowerArguments() are both providing the PartOffset field for each split argument part. As these two methods are intended to work together, they should both provide the same offsets. However, LowerArguments() has been providing the offset from the beginning of the struct while LowerCallTo() sets it relative to the first split part. This patch removes the PartBase variable in LowerArguments() so that the behavior matches LowerCallTo(): offsets to split parts of an argument are relative to the first part of the argument.	2025-12-19 11:27:07 -06:00
Pan Tao	b6bfa85686	[aarch64] Mix the frame pointer with the stack cookie when protecting the stack (#161114 ) This strengthens the guard and matches MSVC. Fixes #156573 .	2025-12-17 12:52:28 -08:00
Sam Tebbs	19e1011df5	[SelectionDAG] Fix unsafe cases for loop.dependence.{war/raw}.mask (#168565 ) Both `LOOP_DEPENDENCE_WAR_MASK` and `LOOP_DEPENDENCE_RAW_MASK` are currently hard to split correctly, and there are a number of incorrect cases. The difficulty comes from how the intrinsics are defined. For example, take `LOOP_DEPENDENCE_WAR_MASK`. It is defined as the OR of: * `(ptrB - ptrA) <= 0` * `elementSize * lane < (ptrB - ptrA)` Now, if we want to split a loop dependence mask for the high half of the mask we want to compute: * `(ptrB - ptrA) <= 0` * `elementSize * (lane + LoVT.getElementCount()) < (ptrB - ptrA)` However, with the current opcode definitions, we can only modify ptrA or ptrB, which may change the result of the first case, which should be invariant to the lane. This patch resolves these cases by adding a "lane offset" to the ISD opcodes. The lane offset is always a constant. For scalable masks, it is implicitly multiplied by vscale. This makes splitting trivial as we increment the lane offset by `LoVT.getElementCount()` now. Note: In the AArch64 backend, we only support zero lane offsets (as other cases are tricky to lower to whilewr/rw). --------- Co-authored-by: Benjamin Maxwell <benjamin.maxwell@arm.com>	2025-12-12 08:44:33 +00:00
Nikita Popov	5a24dfa339	[SDAG] Remove most non-canonical libcall handling (#171288 ) This is a followup to https://github.com/llvm/llvm-project/pull/171114, removing the handling for most libcalls that are already canonicalized to intrinsics in the middle-end. The only remaining one is fabs, which has more test coverage than the others.	2025-12-10 11:45:26 +01:00
Nikita Popov	d5b3ba6596	[SDAG] Don't handle non-canonical libcalls in SDAG lowering (#171114 ) SDAG currently tries to lower certain libcalls to ISD opcodes. However, many of these are already canonicalized from libcalls to intrinsics in the middle-end (and often already emitted as intrinsics in the front-end). I believe that SDAG should not be doing anything for such libcalls. This PR just drops a single libcall to get consensus on the direction, as these changes need a non-trivial amount of test updates. A lot of the remaining libcalls should probably also be canonicalized to intrinsics in the middle-end when annotated with `memory(none)`, but that would require additional work in SimplifyLibCalls.	2025-12-09 08:07:33 +01:00
Robert Imschweiler	e84fdbe1ef	[IR] Add CallBr intrinsics support (#133907 ) This commit adds support for using intrinsics with callbr. The uses of this will most of the time look like this example: ```llvm callbr void @llvm.amdgcn.kill(i1 %c) to label %cont [label %kill] kill: unreachable cont: ... ```	2025-12-04 10:21:00 +01:00
Luke Lau	d1500d12be	[SelectionDAG] Add SelectionDAG::getTypeSize. NFC (#169764 ) Similar to how getElementCount avoids the need to reason about fixed and scalable ElementCounts separately, this patch adds getTypeSize to do the same for TypeSize. It also goes through and replaces some of the manual uses of getVScale with getTypeSize/getElementCount where possible.	2025-12-01 10:33:50 +00:00
Peter Collingbourne	6227eb90da	Add IR and codegen support for deactivation symbols. Deactivation symbols are a mechanism for allowing object files to disable specific instructions in other object files at link time. The initial use case is for pointer field protection. For more information, see the RFC: https://discourse.llvm.org/t/rfc-deactivation-symbols/85556 Reviewers: ojhunt, nikic, fmayer, arsenm, ahmedbougacha Reviewed By: fmayer Pull Request: https://github.com/llvm/llvm-project/pull/133536	2025-11-26 12:37:09 -08:00
Drew Kersnar	17852deda7	[NVPTX] Lower LLVM masked vector loads and stores to PTX (#159387 ) This backend support will allow the LoadStoreVectorizer, in certain cases, to fill in gaps when creating load/store vectors and generate LLVM masked load/stores (https://llvm.org/docs/LangRef.html#llvm-masked-store-intrinsics). To accomplish this, changes are separated into two parts. This first part has the backend lowering and TTI changes, and a follow up PR will have the LSV generate these intrinsics: https://github.com/llvm/llvm-project/pull/159388. In this backend change, Masked Loads get lowered to PTX with `#pragma "used_bytes_mask" [mask];` (https://docs.nvidia.com/cuda/parallel-thread-execution/#pragma-strings-used-bytes-mask). And Masked Stores get lowered to PTX using the new sink symbol syntax (https://docs.nvidia.com/cuda/parallel-thread-execution/#data-movement-and-conversion-instructions-st). # TTI Changes TTI changes are needed because NVPTX only supports masked loads/stores with _constant_ masks. `ScalarizeMaskedMemIntrin.cpp` is adjusted to check that the mask is constant and pass that result into the TTI check. Behavior shouldn't change for non-NVPTX targets, which do not care whether the mask is variable or constant when determining legality, but all TTI files that implement these APIs need to be updated. # Masked store lowering implementation details If the masked stores make it to the NVPTX backend without being scalarized, they are handled by the following: * `NVPTXISelLowering.cpp` - Sets up a custom operation action and handles it in lowerMSTORE. Similar handling to normal store vectors, except we read the mask and place a sentinel register `$noreg` in each position where the mask reads as false. For example, ``` t10: v8i1 = BUILD_VECTOR Constant:i1<-1>, Constant:i1<0>, Constant:i1<0>, Constant:i1<-1>, Constant:i1<-1>, Constant:i1<0>, Constant:i1<0>, Constant:i1<-1> t11: ch = masked_store<(store unknown-size into %ir.lsr.iv28, align 32, addrspace 1)> t5:1, t5, t7, undef:i64, t10 -> STV_i32_v8 killed %13:int32regs, $noreg, $noreg, killed %16:int32regs, killed %17:int32regs, $noreg, $noreg, killed %20:int32regs, 0, 0, 1, 8, 0, 32, %4:int64regs, 0, debug-location !18 :: (store unknown-size into %ir.lsr.iv28, align 32, addrspace 1); ``` * `NVPTXInstrInfo.td` - changes the definition of store vectors to allow for a mix of sink symbols and registers. * `NVPTXInstPrinter.h/.cpp` - Handles the `$noreg` case by printing "_". # Masked load lowering implementation details Masked loads are routed to normal PTX loads, with one difference: a `#pragma "used_bytes_mask"` is emitted before the load instruction (https://docs.nvidia.com/cuda/parallel-thread-execution/#pragma-strings-used-bytes-mask). To accomplish this, a new operand is added to every NVPTXISD Load type representing this mask. * `NVPTXISelLowering.h/.cpp` - Masked loads are converted into normal NVPTXISD loads with a mask operand in two ways. 1) In type legalization through replaceLoadVector, which is the normal path, and 2) through LowerMLOAD, to handle the legal vector types (v2f16/v2bf16/v2i16/v4i8/v2f32) that will not be type legalized. Both share the same convertMLOADToLoadWithUsedBytesMask helper. Both default this operand to UINT32_MAX, representing all bytes on. For the latter, we need a new `NVPTXISD::MLoadV1` type to represent that edge case because we cannot put the used bytes mask operand on a generic LoadSDNode. * `NVPTXISelDAGToDAG.cpp` - Extract used bytes mask from loads, add them to created machine instructions. * `NVPTXInstPrinter.h/.cpp` - Print the pragma when the used bytes mask isn't all ones. * `NVPTXForwardParams.cpp`, `NVPTXReplaceImageHandles.cpp` - Update manual indexing of load operands to account for new operand. * `NVPTXInstrInfo.td`, `NVPTXIntrinsics.td` - Add the used bytes mask to the MI definitions. * `NVPTXTagInvariantLoads.cpp` - Ensure that masked loads also get tagged as invariant. Some generic changes that are needed: * `LegalizeVectorTypes.cpp` - Ensure flags are preserved when splitting masked loads. * `SelectionDAGBuilder.cpp` - Preserve `MD_invariant_load` on masked load SDNode creation	2025-11-25 10:26:15 -06:00
Matt Arsenault	db20a7f2bc	DAG: Fix constructing a temporary TargetTransformInfo instance (#168480 )	2025-11-20 01:19:23 -05:00
Mikołaj Piróg	e7b41df10e	[SelectionDAGBuilder] Propagate fast-math flags to fpext (#167574 ) As in title. Without this, fpext behaves in selectionDAG as always having no fast-math flags.	2025-11-14 20:50:59 -08:00
Matt Arsenault	24be0ba39b	DAG: Fix assert on nofpclass call with aggregate return (#167725 )	2025-11-12 18:12:20 +00:00
zhijian lin	85d2b10838	[DAG] Make strictfp attribute restrict only libm calls and make non-math optimizations possible (#165464 ) The patch [Add strictfp attribute to prevent unwanted optimizations of libm calls](https://reviews.llvm.org/D34163) added `I.isStrictFP()` into ``` if (!I.isNoBuiltin() && !I.isStrictFP() && !F->hasLocalLinkage() && F->hasName() && LibInfo->getLibFunc(*F, Func) && LibInfo->hasOptimizedCodeGen(Func)) ``` which prevents the backend from optimizing even non-math libcalls such as `strlen` and `memcmp` if a call has the strict floating-point attribute. For example, it prevents converting strlen and memcmp to the millicode calls __strlen and __memcmp.	2025-11-11 13:34:14 -05:00
Matt Arsenault	b4f1994280	DAG: Add AssertNoFPClass from call return attributes (#167264 ) This defends against regressions in future patches. This excludes the target intrinsic case for now; I'm worried introducing an intermediate AssertNoFPClass is likely to break combines.	2025-11-10 16:42:48 +00:00
Damian Heaton	70f4b596cf	Add `llvm.vector.partial.reduce.fadd` intrinsic (#159776 ) With this intrinsic, and supporting SelectionDAG nodes, we can better make use of instructions such as AArch64's `FDOT`.	2025-11-07 15:36:54 +00:00
Daniel Thornburgh	5f08fb4d72	[IR] llvm.reloc.none intrinsic for no-op symbol references (#147427 ) This intrinsic emits a BFD_RELOC_NONE relocation at the point of call, which allows optimizations and languages to explicitly pull in symbols from static libraries without there being any code or data that has an effectual relocation against such a symbol. See issue #146159 for context.	2025-11-06 08:52:46 -08:00
Sergei Barannikov	71927ddb63	[CodeGen] Delete two ComputeValueVTs overloads (NFC) (#166758 ) Those have only a few uses.	2025-11-06 19:45:29 +03:00
Robert Imschweiler	cad96ad703	[NFC] Refactor target intrinsic call lowering (#153204 ) Refactor intrinsic call handling in SelectionDAGBuilder and IRTranslator to prepare the addition of intrinsic support to the callbr instruction, which should then share code with the handling of the normal call instruction.	2025-11-06 10:51:44 +01:00
Matt Arsenault	3c2c9d5bc1	DAG: Cleanup string bool attribute check for disable-tail-calls (#166237 )	2025-11-03 14:18:04 -08:00
Luo Yuanke	9a0a1fadef	[ISel] Use CallBase instead of CallInst (#164769 ) This is to follow the discussion in https://github.com/llvm/llvm-project/pull/164565. CallBase can cover more call-like instructions that carry a calling convention flag. Co-authored-by: Yuanke Luo <ykluo@birentech.com>	2025-10-25 20:37:20 +08:00
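
The `llvm.cond.loop` entry (191af6c254) states that on i386 and x86_64 the spin loop is a short conditional branch whose first byte is in 0x70..0x7F and whose second byte is 0xFE. A minimal sketch of the check an interrupt handler might run on the bytes at the interrupted program counter is shown here; the helper name `is_cond_self_loop` is hypothetical and not an LLVM API. The 0xFE byte is the rel8 displacement -2, which is added to the address of the following instruction (two bytes on), so the branch target is the branch itself.

```python
def is_cond_self_loop(code: bytes, offset: int) -> bool:
    """Return True if code[offset:] begins with a short Jcc that branches to itself.

    Hypothetical helper: checks the two-byte pattern described in the
    llvm.cond.loop commit message (opcode 0x70..0x7F, rel8 byte 0xFE).
    """
    # Need two bytes: the Jcc opcode and its 8-bit displacement.
    if offset < 0 or offset + 2 > len(code):
        return False
    opcode, rel8 = code[offset], code[offset + 1]
    # 0x70..0x7F are the short (rel8) conditional jumps JO..JG.
    # rel8 == 0xFE is -2: target = (offset + 2) - 2 = offset, i.e. itself.
    return 0x70 <= opcode <= 0x7F and rel8 == 0xFE


# JNE to itself matches; an unconditional JMP short to itself (0xEB 0xFE)
# does not, and neither does a JE with a forward displacement.
print(is_cond_self_loop(bytes([0x75, 0xFE]), 0))  # True
print(is_cond_self_loop(bytes([0xEB, 0xFE]), 0))  # False
print(is_cond_self_loop(bytes([0x74, 0x05]), 0))  # False
```

Checking only two bytes is what makes this cheap for a signal or interrupt handler; the guarantee in the commit message is what makes such a fixed-pattern check sufficient.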

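The vector splice entries (ad4bfac732, cee36b23cc) describe `llvm.vector.splice.left` and `llvm.vector.splice.right` as sliding the concatenation of two vectors left or right and keeping the lower or upper half, with out-of-bounds offsets producing poison and both intrinsics accepting 0 <= Offset <= N. The sketch below is one reading of that description, not the LangRef text: vectors are Python lists, poison is modeled as `None`, and the function names are hypothetical. Under this reading, an old `llvm.vector.splice` offset of 1 corresponds to `splice_left(..., 1)` and an old offset of -3 to `splice_right(..., 3)`.

```python
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def splice_left(a: Sequence[T], b: Sequence[T], offset: int) -> Optional[List[T]]:
    """Concatenate a:b, slide left by `offset` elements, keep the lower half."""
    n = len(a)
    assert len(b) == n, "both operands must have the same vector type"
    if not 0 <= offset <= n:
        return None  # out-of-bounds offset: modeled as poison
    concat = list(a) + list(b)
    return concat[offset:offset + n]


def splice_right(a: Sequence[T], b: Sequence[T], offset: int) -> Optional[List[T]]:
    """Concatenate a:b, slide right by `offset` elements, keep the upper half."""
    n = len(a)
    assert len(b) == n, "both operands must have the same vector type"
    if not 0 <= offset <= n:
        return None  # out-of-bounds offset: modeled as poison
    concat = list(a) + list(b)
    return concat[n - offset:2 * n - offset]


a, b = list("ABCD"), list("EFGH")
print(splice_left(a, b, 1))   # ['B', 'C', 'D', 'E']
print(splice_right(a, b, 3))  # ['B', 'C', 'D', 'E']
print(splice_left(a, b, 5))   # None (poison)
```

The bounds check is where the non-constant-offset change shows up: with a runtime offset the verifier can no longer reject bad values, so they fall through to poison instead.
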

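The `llvm.clmul` entry (9e5e267a03) introduces carry-less multiplication. As a rough illustration of the operation itself, the sketch below XORs shifted copies of one operand for each set bit of the other, so no carries propagate. It assumes the result keeps only the low `width` bits, the same width as the operands; that assumption and the function name `clmul` are illustrative, not taken from the commit.

```python
def clmul(a: int, b: int, width: int) -> int:
    """Carry-less product of two `width`-bit unsigned values, truncated to `width` bits."""
    mask = (1 << width) - 1
    a &= mask
    b &= mask
    result = 0
    for i in range(width):
        if (b >> i) & 1:
            # Partial products are combined with XOR rather than addition,
            # so no carries propagate between bit positions.
            result ^= a << i
    return result & mask  # keep the low `width` bits (assumed result width)


print(clmul(0b11, 0b11, 8))  # 5 (0b101), whereas 3 * 3 == 9
print(clmul(0x80, 0x02, 8))  # 0: the only set bit is shifted out of 8 bits
```

This bit-serial loop is only a reference model; the point of the intrinsic is that targets with carry-less multiply instructions can lower it to a single instruction.
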
2246 Commits
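
The loop dependence mask entry (19e1011df5) defines a `LOOP_DEPENDENCE_WAR_MASK` lane as active when `(ptrB - ptrA) <= 0` or `elementSize * lane < (ptrB - ptrA)`, and explains why adding a lane offset makes splitting trivial. The sketch below models only that WAR definition for fixed-length masks (no vscale scaling). The function name `war_mask` is hypothetical. It compares splitting via a lane offset against the naive approach of bumping `ptrA`.

```python
from typing import List


def war_mask(ptr_a: int, ptr_b: int, elem_size: int, num_lanes: int,
             lane_offset: int = 0) -> List[bool]:
    """Lane i is active if (ptrB - ptrA) <= 0 or elem_size * (i + lane_offset) < (ptrB - ptrA)."""
    diff = ptr_b - ptr_a
    return [diff <= 0 or elem_size * (lane + lane_offset) < diff
            for lane in range(num_lanes)]


# 8-lane mask for 4-byte elements where ptrB is 8 bytes past ptrA.
full = war_mask(0, 8, 4, 8)
lo = war_mask(0, 8, 4, 4)                 # low half: lanes 0..3
hi = war_mask(0, 8, 4, 4, lane_offset=4)  # high half: lanes 4..7 via lane offset
print(full)             # [True, True, False, False, False, False, False, False]
print(lo + hi == full)  # True

# Splitting by advancing ptrA instead flips the lane-invariant first condition.
naive_hi = war_mask(0 + 4 * 4, 8, 4, 4)
print(naive_hi)         # [True, True, True, True] -- wrong for lanes 4..7
```

Advancing `ptrA` makes `diff` negative, so the `diff <= 0` term turns every lane on. The lane offset leaves `diff` untouched, so each half evaluates exactly the lanes of the original mask.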