llvm-project

Author	SHA1	Message	Date
Nikita Popov	a22d1b5d43	[ConstantInt] Add ImplicitTrunc parameter to getSigned() (NFC) (#172875 ) For consistency with `ConstantInt::get()`, add an ImplicitTrunc parameter to `ConstantInt::getSigned()` as well. It currently defaults to true and will be flipped to false in the future (by #171456).	2025-12-19 09:48:26 +01:00
Alex MacLean	a40f444265	[NVPTX] Add support for barrier.cta.red.* instructions (#172541 ) This change adds full support for the ptx `barrier.cta.red` instruction, following the same conventions as are already used for `barrier.cta.sync` and `barrier.cta.arrive`. In addition this MR removes the following intrinsics which are no longer needed: * llvm.nvvm.barrier0.popc --> llvm.nvvm.barrier.cta.red.popc.aligned.all(0, c) * llvm.nvvm.barrier0.and --> llvm.nvvm.barrier.cta.red.and.aligned.all(0, z) * llvm.nvvm.barrier0.or --> llvm.nvvm.barrier.cta.red.or.aligned.all(0, z)	2025-12-18 18:06:27 -08:00
Stefan Schmidt	759fb0a224	[llvm][LLD][COFF] Add fat-lto-object support for COFF targets (#165529 ) This adds support for FatLTO to COFF targets in clang and lld. The changes are adapted from `610fc5cbcc` and `14e3bec8fc` but much smaller because it just needed the COFF-specific parts wired in, and I tried my best to adapt the pre-existing ELF tests for the COFF version. My main goal is to be able to use this for shipping pre-built https://github.com/XboxDev/nxdk container images someday, which uses the `i386-pc-win32` target.	2025-12-18 22:53:25 +02:00
Mel Chen	f196b1d66f	[VPlan] Extract reverse operation for reverse accesses (#146525 ) This patch introduces VPInstruction::Reverse and extracts the reverse operations of loaded/stored values from reverse memory accesses. This extraction facilitates future support for permutation elimination within VPlan.	2025-12-18 14:57:48 +00:00
Nikita Popov	e957c81750	[InstCombine] Use getSigned() for negative number in shift transform Fixes the issue reported at: https://github.com/llvm/llvm-project/pull/171456#issuecomment-3668263635	2025-12-18 11:00:39 +01:00
Simon Pilgrim	24d9550b27	[VectorCombine] foldShuffleOfBinops - if both operands are the same don't duplicate the total new cost (#172719 ) If we're shuffling/concatenating the same operands then ensure we don't duplicate the total cost, ensure we reuse the final shuffle and recognise that we reduce the total instruction count (so fold even when NewCost == OldCost, not just NewCost < OldCost).	2025-12-18 07:03:06 +00:00
Florian Hahn	9cc1585b13	[VPlan] Add VPBlockUtils::transferSuccessors (NFCI). Add a new helper to transfer successors to a new, unconnected VPBB. Helps to simplify existing code, and prepare for upcoming changes.	2025-12-17 22:48:22 +00:00
Florian Hahn	bab0dc4d48	Reapply "[LV] Mark checks as never succeeding for high cost cutoff." Reapply 8a115b6934a90441 with an update to tests handling remarks. The patch now directly emits a clear remark when we bail out due to the memory check threshold. Original message: When GeneratedRTChecks::create bails out due to exceeding the cost threshold, no runtime checks are generated and we must not proceed assuming checks have been generated. Mark the checks as never succeeding, to make sure we don't try to vectorize assuming the runtime checks hold. This fixes a case where we previously incorrectly vectorized assuming runtime checks had been generated when forcing vectorization via metadata. Fixes the mis-compile mentioned in https://github.com/llvm/llvm-project/pull/166247#issuecomment-3631471588	2025-12-17 20:21:49 +00:00
Miloš Poletanović	44a52ea8be	[InstCombine] Fix unsafe PHINode cast and simplify logic in PointerReplacer (#172332 ) Fixes #171883. Basically, if the operand of the phi is an Instruction but it's not available, the [condition](`1847a4efae/llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp (L300)`) would just break, and when we reach the [deferral check](`1847a4efae/llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp (L313)`), execution would continue even though there is a non-Instruction operand, leading to a crash in the [subsequent processing loop](`1847a4efae/llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp (L320)`).	2025-12-17 12:07:40 +00:00
Nikita Popov	dea9ec84a4	[SLSR] Allow implicit truncation for element size Ideally we'd reject too large types in the IR verifier, but for now we should follow the usual sext-or-trunc GEP semantics here.	2025-12-17 12:56:23 +01:00
Florian Hahn	eb0c7e752f	[VPlan] Replace BranchOnCount with Compare + BranchOnCond (NFC). (#172181 ) Expand BranchOnCount to BranchOnCond + ICmp in convertToConcreteRecipes to simplify codegen. PR: https://github.com/llvm/llvm-project/pull/172181	2025-12-16 19:19:31 +00:00
Ramkumar Ramachandra	1c6e5b2d04	[LV] Improve code using VPlan::get{ConstantInt,True} (NFC) (#172471 )	2025-12-16 13:03:43 +00:00
Nikita Popov	447c96363a	[SimplifyLibCalls] Avoid implicit truncation in convertStrToInt() This addresses two implicit truncation issues: * For the signed case, pass AsSigned. * For the negated unsigned case, truncate explicitly for clarity.	2025-12-16 09:33:47 +01:00
Luke Lau	67d0e21a62	Reapply "[VPlan] Remove legacy costing inside VPBlendRecipe::computeCost (#171846 )" (#172261 ) This reapplies #171846 with a test case and fix for a legacy cost-model mismatch assertion. In the previous version of the patch, we only considered the plan to contain simplifications when it had a VPBlendRecipe and VF.isScalar() was true. However for some VPlans we may have a blend with only the first lane used: BLEND ir<%phi> = ir<%foo.res> ir<%bar.res>/ir<%c> CLONE ir<%gep> = getelementptr ir<%p>, ir<%phi> vp<%5> = vector-pointer ir<%gep> And in the legacy cost model we cost a blend as a phi if it's uniform: // If we know that this instruction will remain uniform, check the cost of // the scalar version. if (isUniformAfterVectorization(I, VF)) VF = ElementCount::getFixed(1); So this replaces the VF.isScalar() check with vputils::onlyFirstLaneUsed, which matches how the VPlan cost model mirrored the legacy model beforehand. A VPInstruction::Select will also emit a scalar select for a vector VF if only the first lane is used, so this also updates VPBlendRecipe::computeCost to reflect that too.	2025-12-16 06:30:54 +00:00
Elvis Wang	1eba2cbe72	[LV] Convert uniform-address unmasked scatters to scalar store. (#166114 ) This patch optimizes vector scatters that have a uniform (single-scalar) address by replacing them with "extract-last-lane + scalar store" when the scatter is unmasked. Notes: - The legacy cost model can scalarize a store if both the address and the value are uniform. In VPlan we materialize the stored value via ExtractLastLane, so only the address must be uniform. - Some of the loops won't be vectorized anymore since no vector instructions will be generated.	2025-12-16 12:24:22 +08:00
Florian Hahn	83eea87a36	[VPlan] Create header phis once, after constructing VPlan0 (NFC). (#168291 ) Together with https://github.com/llvm/llvm-project/pull/168289 & https://github.com/llvm/llvm-project/pull/166099 we can construct header phis once up front, after creating VPlan0, as the induction/reduction/first-order-recurrence classification applies across all VFs. Depends on https://github.com/llvm/llvm-project/pull/168289 & https://github.com/llvm/llvm-project/pull/166099 PR: https://github.com/llvm/llvm-project/pull/168291	2025-12-15 22:12:10 +00:00
Florian Hahn	dbb4f5c2dd	[VPlan] Set VF scale factor in tryToCreatePartialReduction (NFCI). Split off unrelated change from approved https://github.com/llvm/llvm-project/pull/168291/ to land separately as suggested.	2025-12-15 21:18:07 +00:00
Teresa Johnson	4b78647754	[MemProf] Add CalleeGUIDs from profile to existing VP metadata (#171495 ) Previously, we only synthesized VP metadata with the callee GUIDs from the memprof profile if no VP metadata already existed (i.e. from PGO). With this change we will add in any that are not already in the VP metadata, also with count 1.	2025-12-15 12:19:56 -08:00
Matt Arsenault	cbb2aa9b2d	InstCombine: Replace some isa<FPMathOperator> with dyn_cast (#172356 ) This isa and get flag pattern is essentially an abstracted isa and dyn_cast, so make this more direct.	2025-12-15 20:10:29 +00:00
Nicolai Hähnle	88bd56597c	VectorCombine: Improve the insert/extract fold in the narrowing case (#168820 ) Keeping the extracted element in a natural position in the narrowed vector has two beneficial effects: 1. It makes the narrowing shuffles cheaper (at least on AMDGPU), which allows the insert/extract fold to trigger. 2. It makes the narrowing shuffles in a chain of extract/insert compatible, which allows foldLengthChangingShuffles to successfully recognize a chain that can be folded. There are minor X86 test changes that look reasonable to me. The IR change for AVX2 in llvm/test/Transforms/VectorCombine/X86/extract-insert-poison.ll doesn't change the assembly generated by `llc -mtriple=x86_64-- -mattr=AVX2` at all.	2025-12-15 11:25:51 -08:00
Teresa Johnson	e3c621c50b	[ThinLTO][MemProf] Add option to override max ICP with larger number (#171652 ) Adds an option -module-summary-max-indirect-edges, and wiring into the ICP logic that collects promotion candidates from VP metadata, to support a larger number of promotion candidates for use in building the ThinLTO summary. Also use this in the MemProf ThinLTO backend handling where we perform memprof ICP during cloning. The new option, essentially off by default, can be used to override the value of -icp-max-prom, which is checked internally in ICP, with a larger max value when collecting candidates from the VP metadata. For MemProf in particular, where we synthesize new VP metadata targets from allocation contexts, which may not be all that frequent, we need to be able to include a larger set of these targets in the summary in order to correctly handle indirect calls in the contexts. Otherwise we will not set up the callsite graph edges correctly.	2025-12-15 10:16:06 -08:00
Alexey Bataev	b988555812	[SLP]Check if the extractelement is part of other buildvector node before marking for erasing Need to check if the extractelement instruction is part of other buildvector node, before trying to mark it for the deletion, otherwise the compiler may reuse the deleted instruction. Fixes #172221	2025-12-15 09:54:05 -08:00
Matt Arsenault	463c9f08be	InstCombine: Stop using m_c_BinOp for non-commutative ops (#172327 ) The previous flow tried both m_BinOp and m_c_BinOp for noncommutative ops. Seems to have worked out OK though, since there are no test changes.	2025-12-15 17:57:53 +01:00
Nikita Popov	015ab4e2e4	[Reassociate] Allow implicit truncation when converting adds to mul It's okay if the number of adds overflows. Explicitly allow implicit truncation.	2025-12-15 15:44:03 +01:00
Nikita Popov	42a47bf18a	[WPD] Avoid implicit truncation when creating full set Use the bit mask for the type instead of `~0`, so that we don't rely on implicit truncation of the top bits.	2025-12-15 15:44:03 +01:00
Nikita Popov	818c9138f9	[SimplifyCFG] Use getSigned() for signed value Base is a signed quantity derived via getSExtValue(), so we should use getSigned().	2025-12-15 15:44:03 +01:00
Bala_Bhuvan_Varma	0b2fe07e6b	[VectorCombine] Prevent redundant cost computation for repeated operand pairs in foldShuffleOfIntrinsics (#171965 ) This pr resolves [#170867](https://github.com/llvm/llvm-project/issues/170867) Existing code recomputes the cost for creating a shuffle instruction even for the repeating Intrinsic operand pairs. This will result in higher newCost. Hence the runtime will decide not to fold. The change proposed in this pr will address this issue. When calculating the newCost we are skipping the cost calculation of an operand pair if it was already considered. And when creating the transformed code, we are reusing the already created shuffle instruction for repeated operand pair.	2025-12-15 14:42:41 +00:00
int-zjt	72f3995363	[CodeExtractor] Optimize PHI incoming value removal using removeIncomingValueIf() (NFC) (#171956 )	2025-12-15 20:00:54 +08:00
int-zjt	c9c46a0820	[CloneFunction] Optimize PHI incoming value removal using reverse iteration (NFC) (#171955 )	2025-12-15 20:00:25 +08:00
Ramkumar Ramachandra	0636225b93	[VPlan] Directly unroll VectorPointerRecipe (#168886 ) In an effort to get rid of VPUnrollPartAccessor and directly unroll recipes, start by directly unrolling VectorPointerRecipe, allowing for VPlan-based simplifications and simplification of the corresponding execute.	2025-12-15 10:54:06 +00:00
Florian Hahn	bcbbe2c2bc	[VPlan] Pass backedge value directly to FOR and reduction phis (NFC). Pass backedge values directly to VPFirstOrderRecurrencePHIRecipe and VPReductionPHIRecipe, as they must be provided and available. Split off from https://github.com/llvm/llvm-project/pull/168291.	2025-12-14 20:59:22 +00:00
Florian Hahn	53cf22f3a1	[VPlan] Simplify live-ins early using SCEV. (#155304 ) Use SCEV to simplify all live-ins during VPlan0 construction. This enables us to remove special SCEV queries when constructing VPWidenRecipes and improves results in some cases. This leads to simplifications in a number of cases in real-world applications (~250 files changed across LLVM, SPEC, ffmpeg) PR: https://github.com/llvm/llvm-project/pull/155304	2025-12-14 20:15:05 +00:00
int-zjt	fd95803a35	[LoopRotate] Simplify PHINode::removeIncomingValue usage (NFC) (#171958 )	2025-12-14 09:43:52 +08:00
Luke Lau	4ea8157773	Revert "[VPlan] Remove legacy costing inside VPBlendRecipe::computeCost (#171846 )" This reverts commit fd5f53aa9b21060063484fc6c346316a34a6464c. It's triggering legacy cost model assertions reported in https://github.com/llvm/llvm-project/pull/171846#issuecomment-3647640019	2025-12-13 20:05:34 +08:00
Nicolai Hähnle	54ae1222ef	VectorCombine: Fold chains of shuffles fed by length-changing shuffles (#168819 ) Such chains can arise from folding insert/extract chains.	2025-12-12 13:53:03 -08:00
Florian Hahn	e6e3f94b5c	[VPlan] Re-add clarifying comment regarding part to extract. (NFC) Re-add and emphasize comment regarding extracting from the last part, as suggested post-commit in https://github.com/llvm/llvm-project/pull/171145.	2025-12-12 21:51:33 +00:00
Florian Hahn	333ee931df	[LV] Update stale comment after 4e05d702f02a. (NFC) Address post-commit suggestion, update stale comment after 4e05d702f.	2025-12-12 21:36:56 +00:00
Florian Hahn	0171e881b5	[VPlan] Strip stray whitespace when printing VPWidenIntOrFpInduction. printFlags takes care of inserting the needed spaces, remove unneeded extra stray whitespace	2025-12-12 21:28:50 +00:00
Alireza Torabian	9bc38df587	[LoopFusion] Simplifying the legality checks (#171889 ) Considering that the current loop fusion only supports adjacent loops, we are able to simplify the checks in this pass. By removing `isControlFlowEquivalent` check, this patch fixes multiple issues including #166560, #166535, #165031, #80301 and #168263. Now only the sequential/adjacent candidates are collected in the same list. This patch is the implementation of approach 2 discussed in post #171207.	2025-12-12 15:09:34 -05:00
Seraphimt	112a6126ef	Fixes non-functional changes found by static analyzer (#171197 ) As per @arsenm 's instructions, I've separated the non-functional changes from https://github.com/llvm/llvm-project/pull/169958. Afterwards I'll tackle the functional ones one by one. I hope I did everything right this time. Full descriptions in the article: https://pvs-studio.com/en/blog/posts/cpp/1318/ 3. Array overrun is possible. The PVS-Studio warning: V557 Array overrun is possible. The value of 'regIdx' index could reach 31. VEAsmParser.cpp 696 10. Excessive check. The PVS-Studio warning: V547 Expression 'IsLeaf' is always false. PPCInstrInfo.cpp 419 11. Doubling the same check. The PVS-Studio warning: V581 The conditional expressions of the 'if' statements situated alongside each other are identical. Check lines: 5820, 5823. PPCInstrInfo.cpp 5823 15. Excessive check. The PVS-Studio warning: V547 Expression 'i != e' is always true. MachineFunction.cpp 1444 17. Excessive assignment. The PVS-Studio warning: V1048 The 'FirstOp' variable was assigned the same value. MachineInstr.cpp 1995 18. Excessive check. The PVS-Studio warning: V547 Expression 'AllSame' is always true. SimplifyCFG.cpp 1914 19. Excessive check. The PVS-Studio warning: V547 Expression 'AbbrevDecl' is always true. LVDWARFReader.cpp 398	2025-12-12 20:03:02 +01:00
Craig Topper	ef21740781	[LoopPeel] Check for onlyAccessesInaccessibleMemory instead of llvm.assume in peelToTurnInvariantLoadsDereferenceable. (#171910 ) onlyAccessesInaccessibleMemory can't alias with a load. This allows us to ignore more intrinsics than llvm.assume. Follow up from #171547	2025-12-12 10:45:41 -08:00
Mircea Trofin	ff3dcd06a9	[GlobalOpt][profcheck] Mark as `unknown` the branch weights of global shrunk to boolean (#171530 )	2025-12-12 08:34:11 -08:00
Matt Arsenault	6e47d4ef45	Reapply "InstCombine: Fold ldexp with constant exponent to fmul" (#171895 ) (#171977 )	2025-12-12 12:55:55 +01:00
Nikita Popov	89c37fee25	[WPD] Use getSigned() for offset This offset is a signed int64_t which can take negative values.	2025-12-12 11:15:44 +01:00
Peter Collingbourne	b0d3405578	SROA: Recognize llvm.protected.field.ptr intrinsics. When an alloc slice's users include llvm.protected.field.ptr intrinsics and their discriminators are consistent, drop the intrinsics in order to avoid unnecessary pointer sign and auth operations. Reviewers: nikic Reviewed By: nikic Pull Request: https://github.com/llvm/llvm-project/pull/151650	2025-12-11 18:22:05 -08:00
Florian Hahn	65deac0872	[VPlan] Remove vector type checking in inferScalarType (NFC). inferScalarTypeForRecipe always infers a scalar type, so BaseTy must be a scalar type. Remove unneeded cast.	2025-12-11 22:10:31 +00:00
Florian Hahn	4e05d702f0	[LV] Always include middle block cost in isOutsideLoopWorkProfitable. (#171102 ) Always include the cost of the middle block in isOutsideLoopWorkProfitable. This addresses the TODO from https://github.com/llvm/llvm-project/pull/168949 and removes the temporary restriction. isOutsideLoopWorkProfitable already scales the cost outside loops according to the expected trip counts. In practice this increases the minimum iteration threshold in a few cases. On a large IR corpus based on C/C++ workloads, ~50 out of 179450 vector loops have their thresholds increased slightly. PR: https://github.com/llvm/llvm-project/pull/171102	2025-12-11 21:41:47 +00:00
Matt Arsenault	757c5b3bc7	Revert "InstCombine: Fold ldexp with constant exponent to fmul" (#171895 ) Reverts llvm/llvm-project#171731 Fails on a libc test	2025-12-11 21:12:59 +00:00
Teresa Johnson	75cd29b6d6	[MemProf] Add option to emit full call context for matched allocations (#170516 ) Add the -memprof-print-matched-alloc-stack option to enable emitting the full allocation call context (of stack ids) for each matched allocation reported by -memprof-print-match-info. Noop when the latter is not enabled.	2025-12-11 10:43:53 -08:00
Matt Arsenault	5eb2ec2179	InstCombine: Fold ldexp with constant exponent to fmul (#171731 ) If we can represent this with an fmul, prefer it as a canonical form. More optimizations will understand fmul, and allows contract to fma.	2025-12-11 19:20:45 +01:00

41823 Commits