llvm-project

Author	SHA1	Message	Date
Sam Clegg	cac54a8ad0	[WebAssembly] Require tags for Wasm EH and Wasm SJLJ to be defined externally (#159143 ) Rather than defining these tags in each object file that requires them we can declare them as undefined and require that they be defined externally in, for example, compiler-rt or libcxxabi.	2025-09-19 10:11:15 -07:00
zhijian lin	be6c4d933d	[PowerPC] using millicode call for strlen instead of lib call (#153600 ) AIX has "millicode" routines, which are functions loaded at boot time into fixed addresses in kernel memory. This allows them to be customized for the processor. The __strlen routine is a millicode implementation; we use millicode for the strlen function instead of a library call to improve performance.	2025-09-19 10:02:21 -04:00
Mikhail Gudim	562146499c	[CodeGen][NewPM] Port `ReachingDefAnalysis` to new pass manager. (#159572 ) In this commit: (1) Added new pass manager support for `ReachingDefAnalysis`. (2) Added printer pass. (3) Make old pass manager use `ReachingDefInfoWrapperPass`	2025-09-19 09:38:34 -04:00
Matt Arsenault	6b54c92be0	CodeGen: Add RegisterClass by HwMode (#158269 ) This is a generalization of the LookupPtrRegClass mechanism. AMDGPU has several use cases for swapping the register class of instruction operands based on the subtarget, but none of them really fit into the box of being pointer-like. The current system requires manual management of an arbitrary integer ID. For the AMDGPU use case, this would end up being around 40 new entries to manage. This just introduces the base infrastructure. I have ports of all the target specific usage of PointerLikeRegClass ready.	2025-09-19 20:08:51 +09:00
Fabian Ritter	d5607694e1	[AMDGPU][SDAG] DAGCombine PTRADD -> disjoint OR (#146075 ) If we can't fold a PTRADD's offset into its users, lowering them to disjoint ORs is preferable: Often, a 32-bit OR instruction suffices where we'd otherwise use a pair of 32-bit additions with carry. This needs to be a DAGCombine (and not a selection rule) because its main purpose is to enable subsequent DAGCombines for bitwise operations. We don't want to just turn PTRADDs into disjoint ORs whenever that's sound because this transform loses the information that the operation implements pointer arithmetic, which AMDGPU for instance needs when folding constant offsets. For SWDEV-516125.	2025-09-19 11:58:41 +02:00
Fabian Ritter	771c94c8db	[SDAG][AMDGPU] Allow opting in to OOB-generating PTRADD transforms (#146074 ) This PR adds a TargetLowering hook, canTransformPtrArithOutOfBounds, that targets can use to allow transformations to introduce out-of-bounds pointer arithmetic. It also moves two such transformations from the AMDGPU-specific DAG combines to the generic DAGCombiner. This is motivated by target features like AArch64's checked pointer arithmetic, CPA, which does not tolerate the introduction of out-of-bounds pointer arithmetic.	2025-09-19 11:07:59 +02:00
Fabian Ritter	a2dcc88f39	[AMDGPU][SDAG] Handle ISD::PTRADD in various special cases (#145330 ) There are more places in SIISelLowering.cpp and AMDGPUISelDAGToDAG.cpp that check for ISD::ADD in a pointer context, but as far as I can tell those are only relevant for 32-bit pointer arithmetic (like frame indices/scratch addresses and LDS), for which we don't enable PTRADD generation yet. For SWDEV-516125.	2025-09-19 10:19:38 +02:00
Matt Arsenault	116ca9522e	Greedy: Take copy hints involving subregisters (#159570 ) Previously this would only accept full copy hints. This relaxes this to accept some subregister copies. Specifically, this now accepts: - Copies to/from physical registers if there is a compatible super register - Subreg-to-subreg copies This has the potential to repeatedly add the same hint to the hint vector, but not sure if that's a real problem.	2025-09-19 09:37:36 +09:00
Qiu Chaofan	e8311f8ebc	[DebugInfo] Emit skeleton to avoid mismatching inlining flags (#153568 ) This actually reverts 418120556398c01550d42500d56e6d328290185b. The original commit omits units whose symbols are all inlined into the current one, which leads to a crash when a module using split-dwarf inlines a function from another module with a mismatched split-dwarf-inlining option. This revert guarantees that DIEs are created in both DWO and the skeleton sections whenever split-dwarf is active.	2025-09-18 12:46:10 -07:00
Scott Linder	ad68e5d56c	[LiveDebugVariables] Use bundle-aware iterators consistently (#159471 ) Most of the pass works in terms of MachineBasicBlock::iterator (MachineInstrBundleIterator), but here one is constructed from an arbitrary instruction which may be within a bundle, causing an assertion.	2025-09-18 10:47:07 -04:00
Abhishek Kaushik	98ebb64a16	[NFC][MIRPrinter] Use `std::move` to avoid copy (#157832 )	2025-09-18 14:40:41 +05:30
woruyu	1a172b9924	[RISCV][GISel] Lower G_SSUBE (#157855 ) Summary: Try to implement Lower G_SSUBE in LegalizerHelper::lower	2025-09-18 10:08:56 +08:00
hev	7ca448e479	[LoongArch] Fix MergeBaseOffset for constant pool index operand (#159336 ) Fixes #159200	2025-09-18 10:06:33 +08:00
Björn Pettersson	1c4c7bd808	[SelectionDAG] Deal with POISON for INSERT_VECTOR_ELT/INSERT_SUBVECTOR (#143102 ) As reported in https://github.com/llvm/llvm-project/issues/141034 SelectionDAG::getNode had some unexpected behaviors when trying to create vectors with UNDEF elements. Since we treat both UNDEF and POISON as undefined (when using isUndef()) we can't just fold away INSERT_VECTOR_ELT/INSERT_SUBVECTOR based on isUndef(), as that could make the resulting vector more poisonous. Same kind of bug existed in DAGCombiner::visitINSERT_SUBVECTOR. Here are some examples: This fold was done even if vec[idx] was POISON: INSERT_VECTOR_ELT vec, UNDEF, idx -> vec This fold was done even if any of vec[idx..idx+size] was POISON: INSERT_SUBVECTOR vec, UNDEF, idx -> vec This fold was done even if the elements not extracted from vec could be POISON: sub = EXTRACT_SUBVECTOR vec, idx INSERT_SUBVECTOR UNDEF, sub, idx -> vec With this patch we avoid such folds unless we can prove that the result isn't more poisonous when eliminating the insert. Fixes https://github.com/llvm/llvm-project/issues/141034	2025-09-17 21:04:00 +00:00
Ramkumar Ramachandra	7fb3a91418	[PatternMatch] Introduce match functor (NFC) (#159386 ) A common idiom is the usage of the PatternMatch match function within a functional algorithm like all_of. Introduce a match functor to shorten this idiom. Co-authored-by: Luke Lau <luke@igalia.com>	2025-09-17 21:04:33 +01:00
Vladislav Dzhidzhoev	432b58915a	[DebugInfo][DwarfDebug] Separate creation and population of abstract subprogram DIEs (#159104 ) With this change, construction of abstract subprogram DIEs is split in two stages/functions: creation of DIE (in DwarfCompileUnit::getOrCreateAbstractSubprogramDIE) and its population with children (in DwarfCompileUnit::constructAbstractSubprogramScopeDIE). With that, abstract subprograms can be created/referenced from DwarfDebug::beginModule, which should solve the issue with static local variables DIE creation of inlined functions with optimized-out definitions. It fixes https://github.com/llvm/llvm-project/issues/29985. LexicalScopes class now stores mapping from DISubprograms to their corresponding llvm::Function's. It is supposed to be built before processing of each function (so, now LexicalScopes class has a method for "module initialization" alongside the method for "function initialization"). It is used by DwarfCompileUnit to determine whether a DISubprogram needs an abstract DIE before DwarfDebug::beginFunction is invoked. DwarfCompileUnit::getOrCreateSubprogramDIE method is added, which can create an abstract or a concrete DIE for a subprogram. It accepts llvm::Function* argument to determine whether a concrete DIE must be created. This is a temporary fix for https://github.com/llvm/llvm-project/issues/29985. Ideally, it will be fixed by moving global variables and types emission to DwarfDebug::endModule (https://reviews.llvm.org/D144007, https://reviews.llvm.org/D144005). Some code proposed by Ellis Hoag <ellis.sparky.hoag@gmail.com> in https://github.com/llvm/llvm-project/pull/90523 was taken for this commit.	2025-09-17 20:06:49 +02:00
Simon Pilgrim	57d67bec6d	[DAG] getNode() - reuse result type instead of calling getValueType again. NFC. (#159381 ) We have assertions above confirming VT == N1.getValueType() for INSERT_VECTOR_ELT nodes.	2025-09-17 15:52:09 +00:00
Sander de Smalen	17e008db17	[IR] NFC: Remove 'experimental' from partial.reduce.add intrinsic (#158637 ) The partial reduction intrinsics are no longer experimental, because they've been used in production for a while and are unlikely to change.	2025-09-17 11:44:47 +01:00
Matt Arsenault	1dbb932fd8	GlobalISel: Relax verifier between physreg and typed vreg (#159281 ) Accept mismatched register size and type size if the type is legal for the register class. For AMDGPU boolean registers have 2 possible interpretations depending on the use context type. e.g., these are both equally valid: %0:_(s1) = COPY $vcc %1:_(s64) = COPY $vcc vcc is a 64-bit register, which can be interpreted as a 1-bit or 64-bit value depending on the use context. SelectionDAG has never required exact match between the register size and the used value type. You can assign a type with a smaller size to a larger register class. Relax the verifier to match. There are several hacks holding together these copies in various places, and this is preparation to remove one of them. The x86 test change is from what I would consider an X86 usage bug. X86 defines an FR32 register class and F16 register class, but the F16 register class is functionally an alias of F32 with the same members and size. There's no need to have the F16 class.	2025-09-17 19:43:50 +09:00
Mingming Liu	8b3c91c4fb	Re-apply "[NFCI][Globals] In GlobalObjects::setSectionPrefix, do conditional update if existing prefix is not equivalent to the new one. Returns whether prefix changed." (#159161 ) This is a reland of https://github.com/llvm/llvm-project/pull/158460 Test failures are gone once I undo the changes in codegenprepare.	2025-09-16 20:33:29 +00:00
Mingming Liu	9277bcd1ab	Revert "[NFCI][Globals] In GlobalObjects::setSectionPrefix, do conditional update if existing prefix is not equivalent to the new one. Returns whether prefix changed." (#159159 ) Reverts llvm/llvm-project#158460 due to buildbot failures	2025-09-16 12:51:54 -07:00
Mingming Liu	027bccc469	[NFCI][Globals] In GlobalObjects::setSectionPrefix, do conditional update if existing prefix is not equivalent to the new one. Returns whether prefix changed. (#158460 ) Before this change, `setSectionPrefix` overwrites existing section prefix with new one unconditionally. After this change, `setSectionPrefix` checks for equivalences, updates conditionally and returns whether an update happens. Update the existing callers to make use of the return value. [PR 155337](https://github.com/llvm/llvm-project/pull/155337/files#diff-cc0c67ac89807f4453f0cfea9164944a4650cd6873a468a0f907e7158818eae9) is a motivating use case where the 'update' semantic is needed.	2025-09-16 12:01:21 -07:00
Craig Topper	f209d63b04	[SelectionDAGBuilder][PPC] Use getShiftAmountConstant. (#158400 ) The PowerPC changes are caused by shifts created by different IR operations being CSEd now. This allows consecutive loads to be turned into vectors earlier. This has effects on the ordering of other combines and legalizations. This leads to some improvements and some regressions.	2025-09-16 10:26:49 -07:00
guan jian	6aab826e23	[DAGCombiner] add fold (xor (smin(x, C), C)) and fold (xor (smax(x, C), C)) (#155141 ) Hi, I compared the following LLVM IR with GCC and Clang, and there is a small difference between the two. The LLVM IR is: ``` define i64 @test_smin_neg_one(i64 %a) { %1 = tail call i64 @llvm.smin.i64(i64 %a, i64 -1) %retval.0 = xor i64 %1, -1 ret i64 %retval.0 } ``` GCC generates: ``` cmp x0, 0 csinv x0, xzr, x0, ge ret ``` Clang generates: ``` cmn x0, #1 csinv x8, x0, xzr, lt mvn x0, x8 ret ``` Clang keeps flipping x0 through x8 unnecessarily. So I added the following folds to DAGCombiner: fold (xor (smax(x, C), C)) -> select (x > C), xor(x, C), 0 fold (xor (smin(x, C), C)) -> select (x < C), xor(x, C), 0 alive2: https://alive2.llvm.org/ce/z/gffoir --------- Co-authored-by: Yui5427 <785369607@qq.com> Co-authored-by: Matt Arsenault <arsenm2@gmail.com> Co-authored-by: Simon Pilgrim <llvm-dev@redking.me.uk>	2025-09-16 15:30:57 +00:00
zhijian lin	2771d35e2a	[NFC] Add a helper function isTailCall for getting libcall in SelectionDAG (#155256 ) Based on the comment at https://github.com/llvm/llvm-project/pull/153600#discussion_r2285729269, add a helper function isTailCall for getting libcall in SelectionDAG.	2025-09-16 10:17:29 -04:00
Paul Walker	aa1a694846	[LLVM][GlobalISel] Make CSEMIRBuilder::buildConstant scalable vector aware. (#158299 )	2025-09-16 11:44:26 +01:00
Matt Arsenault	ea9acc97f1	CodeGen: Surface shouldRewriteCopySrc utility function (#158524 ) Change shouldRewriteCopySrc to return the common register class and expose it as a utility function. I've found myself reproducing essentially the same logic in multiple places. The purpose of this function is to just work through the API constraints of which combination of register class and subreg indexes you have. i.e. you need to use a different function if you have 0, 1, or 2 subregister indexes involved in a pair of copy-like operations.	2025-09-16 14:53:49 +09:00
Craig Topper	9bedece621	[LegalizeTypes] Use correct type for constant in PromoteIntRes_FunnelShift. This is a typo from #158553. We should use AmtVT instead of VT. I guess VT and AmtVT are always the same at this point for tested targets.	2025-09-15 15:54:26 -07:00
Craig Topper	bc745dcd78	[LegalizeTypes] Use getShiftAmountConstant in PromoteIntRes_FunnelShift. (#158553 )	2025-09-15 10:29:19 -07:00
David Green	1c21d5cb9b	[GlobalISel] Remove GI known bits cache (#157352 ) There is a cache on the known-bit computed by global-isel. It only works inside a single query to computeKnownBits, which limits its usefulness, and according to the tests can sometimes limit the effectiveness of known-bits queries. (Although some AMD tests look longer). Keeping the cache valid and clearing it at the correct times can also require being careful about the functions called inside known-bits queries. I measured compile-time of removing it and came up with: ``` 7zip 2.06405E+11 2.06436E+11 0.015018992 Bullet 1.01298E+11 1.01186E+11 -0.110236169 ClamAV 57942466667 57848066667 -0.16292023 SPASS 45444466667 45402966667 -0.091320249 consumer 35432466667 35381233333 -0.144594317 kimwitu++ 40858833333 40927933333 0.169118877 lencod 70022366667 69950633333 -0.102443457 mafft 38439900000 38413233333 -0.069372362 sqlite3 35822266667 35770033333 -0.145812474 tramp3d 82083133333 82045600000 -0.045726 Average -0.068828739 ``` The last column is % difference between with / without the cache. So in total it seems to be costing slightly more to keep the current known-bits cache than if it was removed. (Measured in instruction count, similar to llvm-compile-time-tracker). The hit rate wasn't terrible - higher than I expected. In the llvm-test-suite+external projects it was hit 4791030 times out of 91107008 queries, slightly more than 5%. Note that as globalisel increases in complexity, more known bits calls might be made and the numbers might shift. If that is the case it might be better to have a cache that works across calls, providing it doesn't make effectiveness worse.	2025-09-15 07:32:00 +01:00
Craig Topper	4cbf4408e7	[SelectionDAG] Use getShiftAmountConstant. (#158395 ) Many of the shifts in LegalizeIntegerTypes.cpp were using getPointerTy.	2025-09-12 19:49:48 -07:00
Craig Topper	4ebd202329	[LegalizeTypes][X86] Use getShiftAmountConstant in ExpandIntRes_SIGN_EXTEND. (#158388 ) This ensures we don't need to fixup the shift amount later. Unfortunately, this enabled the (SRA (SHL X, ShlConst), SraConst) -> (SRA (sext_in_reg X), SraConst - ShlConst) combine in combineShiftRightArithmetic for some cases in is_fpclass-fp80.ll. So we need to also update checkSignTestSetCCCombine to look through sign_extend_inreg to prevent a regression.	2025-09-12 19:49:29 -07:00
Craig Topper	0ca54d7738	[LegalizeTypes] Use getShiftAmountConstant in SplitInteger. (#158392 ) This function contained old code for handling the case that the type returned by getScalarShiftAmountTy can't hold the shift amount. These days this is handled by getShiftAmountTy which is used by getShiftAmountConstant.	2025-09-12 18:54:48 -07:00
Afanasyev Ivan	ffcaeca90a	[CodeGen] Fix partial phi input removal in TailDuplicator. (#158265 ) Tail duplicator removes only the first PHI incoming value from the predecessor basic block, while it should remove all operands for this block. PHI instructions can have duplicated values for the same predecessor block: * `UnreachableMachineBlockElim` assumes that a PHI instruction might have duplicates: `7289f2cd0c/llvm/lib/CodeGen/UnreachableBlockElim.cpp (L160)` * `AArch64` directly states that a PHI instruction might have duplicates: `7289f2cd0c/llvm/lib/Target/AArch64/AArch64ConditionalCompares.cpp (L244)` * And `Hexagon`: `7289f2cd0c/llvm/lib/Target/Hexagon/HexagonConstPropagation.cpp (L844)` We caught the bug on a custom out-of-tree backend. `TailDuplicator` should remove all operands corresponding to the block being removed. Please note that the bug likely does not affect in-tree backends, because: * It happens only in the scenario of partial tail duplication (i.e. the tail block is duplicated in some predecessors, but not in all of them) * It happens in Pre-RA tail duplication only (Post-RA does not contain PHIs, obviously) * The only backend (that I know of) that uses Pre-RA tail duplication is X86. It uses tail duplication via the `early-tailduplication` pass, which declines partial tail duplication via the `canCompletelyDuplicateBB` check, because it uses the `TailDuplicator::tailDuplicateBlocks` public API. So, the bug happens only in the case of pre-RA partial tail duplication if the backend uses the `TailDuplicator::tailDuplicate` public API directly. That's why I cannot add a reproducer test for in-tree backends.	2025-09-13 10:45:54 +09:00
Craig Topper	f32874f77b	[LegalizeIntegerTypes] Use getShiftAmountConstant.	2025-09-12 16:10:01 -07:00
Matt Arsenault	7289f2cd0c	CodeGen: Remove MachineFunction argument from getRegClass (#158188 ) This is a low level utility to parse the MCInstrInfo and should not depend on the state of the function.	2025-09-12 19:22:02 +09:00
Matt Arsenault	2331fbb019	CodeGen: Remove MachineFunction argument from getPointerRegClass (#158185 ) getPointerRegClass is a layering violation. Its primary purpose is to determine how to interpret an MCInstrDesc's operands RegClass fields. This should be context free, and only depend on the subtarget. The model of this is also wrong, since this should be an instruction / operand specific property, not a global pointer class. Remove the function argument to help stage removal of this hook and avoid introducing any new obstacles to replacing it. The remaining uses of the function were to get the subtarget, which TargetRegisterInfo already belongs to. A few targets needed new subtarget derived properties copied there.	2025-09-12 09:18:50 +00:00
Owen Anderson	0f13cae7ff	[CodeGen, CHERI] Add capability types to MVT. (#156616 ) This adds value types for representing capability types, enabling their use in instruction selection and other parts of the backend. These types are distinguished from each other only by size. This is sufficient, at least today, because no existing CHERI configuration supports multiple capability sizes simultaneously. Hybrid configurations supporting intermixed integral pointers and capabilities do exist, and are one of the reasons why these value types are needed beyond existing integral types. Co-authored-by: David Chisnall <theraven@theravensnest.org> Co-authored-by: Jessica Clarke <jrtc27@jrtc27.com>	2025-09-11 17:44:30 +08:00
Yi-Chi Lee	0c6141a07a	[GlobalISel] Add computeNumSignBits for SHL (#152067 ) This patch ports the `ISD::SHL` handling from SelectionDAG’s `ComputeNumSignBits` to GlobalISel. Related to #150515.	2025-09-11 16:00:30 +09:00
Abhishek Kaushik	1278ac71d3	[NFC][GlobalISel] Pass `APInt` by const reference (#157827 ) Change `SpecificConstantMatch` constructor and `isBuildVectorConstantSplat` overloads to take `const APInt&` instead of by value to avoid unnecessary copies, especially for wide integers.	2025-09-11 11:11:14 +05:30
Shaoce SUN	41d7ae84e5	[RISCV][GlobalIsel] Lower G_FMINIMUMNUM, G_FMAXIMUMNUM (#157295 ) Similar to the implementation in https://github.com/llvm/llvm-project/pull/104411 , the `fmin.s`/`fmax.s` instructions follow IEEE 754-2019 semantics, and `G_FMINIMUMNUM`/`G_FMAXIMUMNUM` are legal.	2025-09-11 10:16:42 +08:00
woruyu	c69172637e	[RISCV][GISel] Lower G_SADDE (#156865 ) Summary: Try to implement Lower G_SADDE in LegalizerHelper::lower	2025-09-11 09:32:56 +08:00
Craig Topper	8f8429540e	[ExpandVectorPredication] Keep the original value name when expanding predicated instructions. (#157943 ) This makes it easier to follow a value through the pass. If we pass the original name to the create function, a number will be added as a suffix since the original name is still used until it is replaced.	2025-09-10 16:18:11 -07:00
Arthur Eubanks	984251acad	Revert "[DAGCombiner] Relax condition for extract_vector_elt combine" (#157953 ) Reverts llvm/llvm-project#157658 Causes hangs, see https://github.com/llvm/llvm-project/pull/157658#issuecomment-3276441812	2025-09-10 21:33:44 +00:00
Craig Topper	262c7b7b5a	[RISCV][GISel] Widen G_ABDS/G_ABDU before lowering when Zbb is enabled. (#157766 ) This allows us to use G_SMIN/SMAX/UMIN/UMAX in the lowering.	2025-09-10 12:17:30 -07:00
Craig Topper	397e5a457a	[ExpandVectorPredication] Expand vp_merge and vp_select in expandPredication. (#157777 )	2025-09-10 08:50:30 -07:00
jyli0116	619d36ff4f	[GISel] Combine shift + trunc + shift pattern (#155583 ) Folds shift(trunc(shift(...))) pattern into trunc(shift(...)) by combining the two shift instructions	2025-09-10 15:01:55 +01:00
Jay Foad	349544d7ab	[CodeGen] Fix handling dead redefs in finalizeBundle (#157427 ) A dead redefinition should override any earlier non-dead definition inside a bundle. Also remove KilledDefSet since it can be folded into DeadDefSet.	2025-09-10 12:48:12 +01:00
Frederik Harwath	ffcf82c4a8	[AMDGPU] Change expand-fp opt level argument syntax (#157408 ) Align the syntax used for the optimization level argument of the expand-fp pass in textual descriptions of pass pipelines with the syntax used by other passes taking a similar argument. That is, use e.g. `expand-fp<O1>` instead of `expand-fp<opt-level=1>`.	2025-09-10 10:44:28 +02:00
ZhaoQi	4621e17dee	[DAGCombiner] Relax condition for extract_vector_elt combine (#157658 ) Checking `isOperationLegalOrCustom` instead of `isOperationLegal` allows more optimization opportunities. In particular, if a target wants to mark `extract_vector_elt` as `Custom` rather than `Legal` in order to optimize some certain cases, this combiner would otherwise miss some improvements. Previously, using `isOperationLegalOrCustom` was avoided due to the risk of getting stuck in infinite loops (as noted in `61ec738b60`). After testing, the issue no longer reproduces, but the coverage is limited to the regression/unit tests and the test-suite.	2025-09-10 15:51:52 +08:00
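
The two folds described in commit 6aab826e23 above, `(xor (smin(x, C), C)) -> select (x < C), xor(x, C), 0` and `(xor (smax(x, C), C)) -> select (x > C), xor(x, C), 0`, can be sanity-checked outside LLVM. The following is a minimal sketch, not the DAGCombiner code: it models the operations on plain 8-bit signed integers, and all function names in it are hypothetical helpers chosen for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Left-hand side of the first fold: xor (smin(x, C), C).
int8_t xorOfSMin(int8_t X, int8_t C) {
  return static_cast<int8_t>(std::min(X, C) ^ C);
}

// Right-hand side of the first fold: select (x < C), xor(x, C), 0.
int8_t selectFormSMin(int8_t X, int8_t C) {
  return X < C ? static_cast<int8_t>(X ^ C) : static_cast<int8_t>(0);
}

// Left-hand side of the second fold: xor (smax(x, C), C).
int8_t xorOfSMax(int8_t X, int8_t C) {
  return static_cast<int8_t>(std::max(X, C) ^ C);
}

// Right-hand side of the second fold: select (x > C), xor(x, C), 0.
int8_t selectFormSMax(int8_t X, int8_t C) {
  return X > C ? static_cast<int8_t>(X ^ C) : static_cast<int8_t>(0);
}

// Exhaustively compare both sides of both folds over every i8 pair.
bool foldsHoldForAllI8() {
  for (int XI = INT8_MIN; XI <= INT8_MAX; ++XI) {
    for (int CI = INT8_MIN; CI <= INT8_MAX; ++CI) {
      const auto X = static_cast<int8_t>(XI);
      const auto C = static_cast<int8_t>(CI);
      if (xorOfSMin(X, C) != selectFormSMin(X, C))
        return false;
      if (xorOfSMax(X, C) != selectFormSMax(X, C))
        return false;
    }
  }
  return true;
}
```

Calling `foldsHoldForAllI8()` from a test driver checks every 8-bit input pair. This only illustrates why the rewrite is value-preserving for small integers; the alive2 link in the commit message is the authoritative proof for the IR-level transform.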

38377 Commits