llvm-project

Author	SHA1	Message	Date
Mircea Trofin	52cb6e9d49	[ProfCheck][NFC] Make Function argument from branch weight setter optional (#166032) This picks up from #166028, making the `Function` argument optional: most cases don't need to provide it, but in e.g. InstCombine's case, where the instruction (select, branch) is not attached to a function yet, the function needs to be passed explicitly. Co-authored-by: Florian Hahn <flo@fhahn.com>	2025-11-05 07:40:37 -08:00
Ramkumar Ramachandra	1de55c9693	[VPlan] Avoid sinking allocas in sinkScalarOperands (#166135) Use cannotHoistOrSinkRecipe to forbid sinking allocas.	2025-11-05 13:06:24 +00:00
Gábor Spaits	628d53aba5	[InstCombine] Enable FoldOpIntoSelect and foldOpIntoPhi when the Op's other parameter is non-const (#166102) This patch enables `FoldOpIntoSelect` and `foldOpIntoPhi` for cases where Op's second parameter is non-constant. It doesn't seem to bring significant improvements, but the compile-time impact is negligible.	2025-11-05 10:04:32 +01:00
Thurston Dang	cdf52a1325	[msan][NFCI] Generalize handleVectorPmaddIntrinsic() (#166282) This generalizes `handleVectorPmaddIntrinsic()`: - potentially handle floating-point type intrinsics (e.g., `llvm.x86.avx512bf16.dpbf16ps.512`). This usage is not enabled yet. - "multiplication with an initialized zero guarantees that the corresponding output becomes initialized" is now gated by a parameter	2025-11-04 19:52:25 -08:00
Mircea Trofin	1458d313a1	[SLU][profcheck] Propagate profile for branches on injected conditions. (#164476) This patch addresses the profile of 2 branches: - one that compares the 2 limits, for which we have no information (the C1, C2, see https://reviews.llvm.org/D136233) - one that is conditioned on a condition for which we have a profile, so we reuse it Issue #147390	2025-11-04 17:23:55 -08:00
Mircea Trofin	d547931137	[SLU][profcheck] create likely branch weights for guard->branch (#164271) The `llvm.experimental.guard` intrinsic is a `call`, so its metadata - if present - would be one value (as per `Verifier::visitProfMetadata`). That wouldn't be a correct `branch_weights` metadata. Likely, `GI->getMetadata(LLVMContext::MD_prof)` was always `nullptr`. We can bias away from deopt instead. Issue #147390	2025-11-04 16:39:12 -08:00
Alireza Torabian	025e431e74	[LoopFusion] Forget loop and block dispositions after latch merge (#166233) Merging the latches of loops may affect the dispositions, so they should be forgotten after the merge. This patch fixes the crash in loop fusion [#164082](https://github.com/llvm/llvm-project/issues/164082).	2025-11-04 16:48:39 -05:00
Florian Hahn	290ff955f0	[VPlan] Verify incoming values of VPIRPhi matches before checking (NFC) Update the verifier to first check if the number of incoming values matches the number of predecessors, before using incoming_values_and_blocks. Unfortunately, we also need to check here, as this may be called before verifyPhiRecipes runs. Also update the verifier unit tests to actually fail for the expected recipes.	2025-11-04 18:34:14 +00:00
Matt Arsenault	fb21f16fe6	RuntimeLibcalls: Add stub API for getting function signatures (#166290) Eventually this should be generated by tablegen for all functions. For now, add a manual implementation for sincos_stret, which I have an immediate use for. This will allow pulling repeated code across targets into shared call sequence code. Also add sqrt just to make sure we can handle adding return attributes on the declaration.	2025-11-04 10:06:29 -08:00
Joel E. Denny	1aa86ca521	[LoopUnroll] Fix division by zero (#166258) PR #159163's probability computation for epilogue loops does not handle the possibility of an original loop probability of one. Runtime loop unrolling does not make sense for such an infinite loop, and a division by zero results. This patch works around that case. Issue #165998.	2025-11-04 12:49:33 -05:00
Yingwei Zheng	4ce58833d3	[SimplifyCFG] Fix value enumeration of a full range (#166379) ConstantRange uses `[-1, -1)` as the canonical form of a full set. Therefore, the `for (APInt I = Lower; I != Upper; ++I)` idiom doesn't work for full ranges. This patch fixes the value enumeration in `ConstantComparesGatherer` to prevent missing values for full sets. Closes https://github.com/llvm/llvm-project/issues/166369.	2025-11-05 01:43:05 +08:00
Ivan Kelarev	37825ad4f6	[LoopUnroll] Prevent LoopFullUnrollPass from performing partial unrolling when trip counts are unknown (#165013) Currently, `LoopFullUnrollPass` incorrectly performs partial unrolling when `#pragma unroll` is specified and both `TripCount` and `MaxTripCount` are unknown. This patch adds a check to prevent partial unrolling when the `OnlyFullUnroll` parameter is true and both trip count values are zero.	2025-11-04 09:20:01 -08:00
Florian Hahn	af9a4263a1	[LAA] Only use inbounds/nusw in isNoWrap if the GEP is dereferenced. (#161445) Update isNoWrap to only use the inbounds/nusw flags from GEPs that are guaranteed to be dereferenced on every iteration. This fixes a case where we incorrectly determine no dependence. I think the issue is isolated to code that evaluates the resulting AddRec at BTC. Just using it to compute the distance between accesses should still be fine; if the access does not execute in a given iteration, there's no dependence in that iteration. But isolating the code is not straightforward, so be conservative for now. The practical impact should be very minor (only one loop changed across a corpus with 27k modules from large C/C++ workloads). Fixes https://github.com/llvm/llvm-project/issues/160912. PR: https://github.com/llvm/llvm-project/pull/161445	2025-11-04 17:08:12 +00:00
Yingwei Zheng	8a84b285f6	[SimplifyCFG] Eliminate dead edges of switches according to the domain of conditions (#165748) In simplifycfg/cvp/sccp, we eliminate dead edges of switches according to the knownbits/range info of conditions. However, these approximations may not meet real-world needs when the domain of condition values is sparse. For example, if the condition can only be either -3 or 3, we cannot prove that the condition never evaluates to 1 (knownbits: ???????1, range: [-3, 4)). This patch adds a helper function `collectPossibleValues` to enumerate all the possible values of V. To fix the motivating issue, `eliminateDeadSwitchCases` will use the result to remove dead edges. Note: In https://discourse.llvm.org/t/missed-optimization-due-to-overflow-check/88700 I proposed a new value lattice kind to represent such values, but I found it hard to apply because the transitions become much more complicated. Compile-time impact looks neutral: https://llvm-compile-time-tracker.com/compare.php?from=32d6b2139a6c8f79e074e8c6cfe0cc9e79c4c0c8&to=e47c26e3f1bf9eb062684dda4fafce58438e994b&stat=instructions:u This patch removes a lot of dead error-handling code: https://github.com/dtcxzyw/llvm-opt-benchmark/pull/3012 Closes https://github.com/llvm/llvm-project/issues/165179.	2025-11-04 20:55:33 +08:00
Julian Nagele	28a20b4af9	[VectorCombine] Avoid inserting freeze when scalarizing extend-extract if all extracts would lead to UB on poison. (#164683) This change aims to avoid inserting a freeze instruction between the load and bitcast when scalarizing extend-extract. This is particularly useful in combination with https://github.com/llvm/llvm-project/pull/164682, which can then potentially further scalarize, provided there is no freeze. alive2 proof: https://alive2.llvm.org/ce/z/W-GD88	2025-11-04 12:39:04 +00:00
Ramkumar Ramachandra	0a95a86634	[VPlan] Fix first-lane comment in sinkScalarOperands (NFC) (#166347) To follow up on a post-commit review.	2025-11-04 12:02:58 +00:00
Jay Foad	f037f41350	[IR] Add new function attribute nocreateundeforpoison (#164809) Also add a corresponding intrinsic property that can be used to mark intrinsics that do not introduce poison, for example simple arithmetic intrinsics that propagate poison just like a simple arithmetic instruction. As a smoke test this patch adds the new property to llvm.amdgcn.fmul.legacy.	2025-11-04 12:00:44 +00:00
kper	5b2f9b53bd	[SimplifyCFG]: Switch on umin replaces default (#164097) A switch on `umin` can eliminate the default case by making the `umin`'s constant the default case. Proof: https://alive2.llvm.org/ce/z/_N6nfs Fixes: https://github.com/llvm/llvm-project/issues/162111	2025-11-04 18:35:40 +08:00
Ramkumar Ramachandra	0cae0af520	[VPlan] Shorten insert-idiom in sinkScalarOperands (NFC) (#166343) To follow up on a post-commit review.	2025-11-04 10:04:57 +00:00
Shoreshen	00ee53cc7b	[Attributor] Propagate alignment through ptrmask (#150158) Propagate alignment through ptrmask based on potential constant values of mask and align of ptr. --------- Co-authored-by: Shilei Tian <i@tianshilei.me>	2025-11-04 12:26:17 +08:00
Mircea Trofin	01221874e4	[SLU][profcheck] Use the original branch weights in `buildPartialInvariantUnswitchConditionalBranch` (#164270) A new branch is created on the same condition as a branch for which we have a profile. We can reuse that profile in this case. Issue #147390	2025-11-03 14:37:41 -08:00
Laxman Sole	6fe3eccdf4	[llvm][DebugInfo] Emit 0/1 for constant boolean values (#151225) Previously, sign-extending a 1-bit boolean operand in `#DBG_VALUE` would convert `true` to -1 (i.e., 0xffffffffffffffff). However, DWARF treats booleans as unsigned values, so this resulted in the attribute `DW_AT_const_value(0xffffffffffffffff)` being emitted. As a result, the debugger would display the value as `255` instead of `true`. This change modifies the behavior to use zero-extension for 1-bit values instead, ensuring that `true` is represented as 1. Consequently, the DWARF attribute emitted is now `DW_AT_const_value(1)`, which allows the debugger to correctly display the boolean as `true`.	2025-11-03 13:34:44 -08:00
Robert Imschweiler	a8ea7f4580	Reapply: [AMDGPU][UnifyDivergentExitNodes][StructurizeCFG] Add support for callbr instruction with inline-asm (#152161) (#166195) Reapply #152161 with fixed 'changed' flags.	2025-11-03 20:59:48 +01:00
Alan Zhao	c4763e2b90	[profcheck][InstCombine] Preserve branch weights in logical identities (#165810) For the simplification ``` (C && A) || (!C && B) --> sel C, A, B ``` (and related), if `C` (or `!C`) is the condition in the select instruction representing the logical and, we can preserve that logical and's branch weights when emitting the new instruction. Otherwise, the profile data is unknown. If `C` is the condition of both logical ands, then we just take the branch weights of the first logical and (though in practice they should be equal). Furthermore, `select-safe-transforms.ll` now passes under the profcheck configuration, so we remove it from the failing tests. Tracking issue: #147390	2025-11-03 09:32:42 -08:00
Robert Imschweiler	af68efc9c4	Revert "[AMDGPU][UnifyDivergentExitNodes][StructurizeCFG] Add support for callbr instruction with inline-asm" (#166186) Reverts llvm/llvm-project#152161 Need to revert to fix changed logic for the expensive checks.	2025-11-03 16:33:20 +00:00
Florian Hahn	ce925820d8	[VPlan] Use operands() directly in VPInstruction::clone() (NFC). There's no need to create temporary SmallVectors.	2025-11-03 16:28:27 +00:00
Alexey Bataev	7d5659083c	[SLP]Do not create copyable node, if parent node is non-schedulable and has a use in binop. If the parent node is non-schedulable (only externally used instructions), and at least one instruction has multiple uses and is used in the binop, such a copyable node should not be created. Otherwise, it may contain a wrong def-use chain model, which cannot be effectively detected. Fixes #166035	2025-11-03 08:00:22 -08:00
Joel E. Denny	bb9bd5f263	[LoopUnroll] Fix assert fail on zeroed branch weights (#165938) BranchProbability fails an assert when its denominator is zero. Reported at <https://github.com/llvm/llvm-project/pull/159163#pullrequestreview-3406318423>.	2025-11-03 10:19:12 -05:00
Robert Imschweiler	332f9b5eee	[AMDGPU][UnifyDivergentExitNodes][StructurizeCFG] Add support for callbr instruction with inline-asm (#152161) Finishes adding inline-asm callbr support for AMDGPU, started by https://github.com/llvm/llvm-project/pull/149308.	2025-11-03 16:09:12 +01:00
Hassnaa Hamdi	8998df2097	[DropUnnecessaryAssumes] Don't drop public_type_test intrinsic (#166034) Don't drop the `assume` intrinsic when it uses the `public_type_test` intrinsic, as it could be used by devirtualization.	2025-11-03 10:34:44 +00:00
Mel Chen	40a042e49c	[VPlanTransform] Specialize simplifyRecipe for VPSingleDefRecipe pointer. nfc (#165568) The function simplifyRecipe now takes a VPSingleDefRecipe pointer since it only simplifies single-def recipes for now.	2025-11-03 09:00:54 +00:00
Luke Lau	97d4e96cc5	[VPlan] Perform optimizeMaskToEVL in terms of pattern matching (#155394) Currently in optimizeMaskToEVL we convert every widened load, store or reduction to a VP predicated recipe with EVL, regardless of whether or not it uses the header mask. So currently we have to be careful when working on other parts of VPlan to make sure that the EVL transform doesn't break or transform something incorrectly, because it's not a semantics-preserving transform. Forgetting to do so has caused miscompiles before, like the case that was fixed in #113667. This PR rewrites it to work in terms of pattern matching, so it now only converts a recipe to a VP predicated recipe if it is exactly masked with the header mask. After this the transform should be a true optimisation and not change any semantics, so it shouldn't miscompile things if other parts of VPlan change. This fixes #152541, and allows us to move addExplicitVectorLength into tryToBuildVPlanWithVPRecipes in #153144. It also splits out the load/store transforms into separate patterns for reversed and non-reversed, which should make #146525 easier to implement and reason about.	2025-11-03 16:53:18 +08:00
Ramkumar Ramachandra	912cc5f098	[VPlan] Improve getOrCreateVPValueForSCEVExpr (NFC) (#165699) Use early exit in getOrCreateVPValueForSCEVExpr.	2025-11-03 06:44:30 +00:00
Ramkumar Ramachandra	03eb3cdaaa	[VPlan] Rewrite sinkScalarOperands (NFC) (#151696) Rewrite sinkScalarOperands in VPlanTransforms for clarity, in preparation for follow-up work to extend it to handle more recipes.	2025-11-03 06:43:42 +00:00
Kazu Hirata	902b0bd04a	[llvm] Remove "const" in the presence of "constexpr" (NFC) (#166109) "const" is extraneous in the presence of "constexpr" for simple variables and arrays.	2025-11-02 15:52:44 -08:00
Wenju He	79bf8c0331	[InstCombine] Fold select(X >s 0, 0, -X) | smax(X, 0) to abs(X) (#165200) The IR pattern is compiled from OpenCL code: __builtin_astype(x > (uchar2)(0) ? x : -x, uchar2); where smax is created by foldSelectInstWithICmp + canonicalizeSPF. smax could also come from a direct elementwise max call: int c = b > (int)(0) ? (int)(0) : -b; int d = __builtin_elementwise_max(b, (int)(0)); *a = c | d; https://alive2.llvm.org/ce/z/2-brvr https://alive2.llvm.org/ce/z/Dowjzk https://alive2.llvm.org/ce/z/kathwZ --------- Co-authored-by: Yingwei Zheng <dtcxzyw@qq.com> Co-authored-by: Matt Arsenault <arsenm2@gmail.com>	2025-11-03 07:38:21 +08:00
Kazu Hirata	707bab651f	[llvm] Remove redundant typename (NFC) (#166087) Identified with readability-redundant-typename.	2025-11-02 13:15:16 -08:00
Florian Hahn	1c727baf69	[VPlan] Mark BranchOnCount and BranchOnCond as having side effects (NFC) BranchOnCount and BranchOnCond do not read memory, but cannot be moved. Mark them as having side-effects, but not reading/writing memory, which more accurately models the above. This allows removing some special checking for branches both in the current code and future patches.	2025-11-02 21:14:37 +00:00
Kazu Hirata	c9ef3d8eb8	[Transforms] Use "= default" (NFC) (#166043) Identified with modernize-use-equals-default.	2025-11-02 08:59:24 -08:00
Nathan Corbyn	138e0ff87c	[Matrix] (NFC) Refactor sharing of shape information (#164774)	2025-11-02 09:44:34 +00:00
Florian Hahn	b7e922a3da	[VPlan] Convert BuildVector with all-equal values to Broadcast. (#165826) Fold BuildVector where all operands are equal to Broadcast of the first operand. This will subsequently make it easier to remove additional buildvectors/broadcasts, e.g. via https://github.com/llvm/llvm-project/pull/165506. PR: https://github.com/llvm/llvm-project/pull/165826	2025-11-01 17:28:42 -07:00
Florian Hahn	f773efcffb	[VPlan] Add VPIRMetadata parameter to VPInstruction constructor. (NFC) Update VPInstruction constructor to accept VPIRMetadata between the Flags and DebugLoc parameters. This allows metadata to be passed during construction rather than assigned afterward.	2025-11-01 21:57:52 +00:00
Florian Hahn	6e83937f39	[VPlan] Add getConstantInt helpers for constant int creation (NFC). Add getConstantInt helper methods to VPlan to simplify the common pattern of creating constant integer live-ins. Suggested as follow-up in https://github.com/llvm/llvm-project/pull/164127.	2025-11-01 04:13:01 +00:00
Florian Hahn	a943132761	[VPlan] Add VPRegionBlock::getCanonicalIVType (NFC). (#164127) Split off from https://github.com/llvm/llvm-project/pull/156262. Similar to VPRegionBlock::getCanonicalIV, add a helper to get the type of the canonical IV, in preparation for removing VPCanonicalIVPHIRecipe. PR: https://github.com/llvm/llvm-project/pull/164127	2025-10-31 20:05:02 -07:00
Mircea Trofin	6adef40e75	[SimplifyCFG] Don't propagate weights to unconditional branches in `turnSwitchRangeIntoICmp` (#165931) PR #161000 introduced a bug whereby the IR would become invalid by having an unconditional branch have `!prof` attached to it. This only became evident in PR #165744, because the IR of `test/Transforms/SimplifyCFG/pr165301.ll` was simple enough to both (1) introduce the unconditional branch, and (2) survive in that fashion until the end of the pass (simplifycfg) and thus trip the verifier.	2025-10-31 23:10:59 +00:00
Alexey Bataev	964c7711f4	[SLP]Fix the minbitwidth analysis for alternate opcodes If the alternate operation is stricter than the main operation, we cannot rely on the analysis of the main operation. In such a case, it is better to avoid doing the analysis at all, since it may affect the overall result and lead to incorrect optimization. Fixes #165878	2025-10-31 15:25:13 -07:00
Mircea Trofin	fe8ab75b40	[SimplifyCFG] Propagate profile in `simplifySwitchOfPowersOfTwo` (#165804) `simplifySwitchOfPowersOfTwo` converts (when applicable, see `00f5a1e30b`) a switch to a conditional branch. Its false case goes to the `default` target of the former switch, and the true case goes to a BB performing a `cttz`. We can calculate the branch weights from the branch weights of the old switch. Issue #147390	2025-10-31 13:05:18 -07:00
gbMattN	7a957bd2c8	[TySan] Add option to outline instrumentation (#120582) Added a command line option to use function calls rather than inline checks for TySan instrumentation.	2025-10-31 16:51:55 +00:00
Joel E. Denny	cc8ff73fba	[LoopUnroll] Fix block frequencies for epilogue (#159163) As another step in issue #135812, this patch fixes block frequencies for partial loop unrolling with an epilogue remainder loop. It does not fully handle the case when the epilogue loop itself is unrolled. That will be handled in the next patch. For the guard and latch of each of the unrolled loop and epilogue loop, this patch sets branch weights derived directly from the original loop latch branch weights. The total frequency of the original loop body, summed across all its occurrences in the unrolled loop and epilogue loop, is the same as in the original loop. This patch also sets `llvm.loop.estimated_trip_count` for the epilogue loop instead of relying on the epilogue's latch branch weights to imply it. This patch fixes branch weights in tests that PR #157754 adversely affected.	2025-10-31 11:01:42 -04:00
Joel E. Denny	24557cce40	[LoopUnroll] Fix block frequencies when no runtime (#157754) This patch implements the LoopUnroll changes discussed in [[RFC] Fix Loop Transformations to Preserve Block Frequencies](https://discourse.llvm.org/t/rfc-fix-loop-transformations-to-preserve-block-frequencies/85785) and is thus another step in addressing issue #135812. In summary, for the case of partial loop unrolling without a remainder loop, this patch changes LoopUnroll to: - Maintain branch weights consistently with the original loop for the sake of preserving the total frequency of the original loop body. - Store the new estimated trip count in the `llvm.loop.estimated_trip_count` metadata, introduced by PR #148758. - Correct the new estimated trip count (e.g., 3 instead of 2) when the original estimated trip count (e.g., 10) divided by the unroll count (e.g., 4) leaves a remainder (e.g., 2). There are loop unrolling cases this patch does not fully fix, such as partial unrolling with a remainder loop and complete unrolling, and there are two associated tests whose branch weights this patch adversely affects. They will be addressed in future patches that should land with this patch.	2025-10-31 10:44:27 -04:00
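The shadow rule gated by a parameter in cdf52a1325 (`handleVectorPmaddIntrinsic()`) can be modelled in a heavily simplified form. This is a hypothetical sketch, not MemorySanitizer's code: it keeps one boolean "uninitialized" flag per lane instead of MSan's per-bit shadow, and `Lane`, `productUninit`, and `pmaddLaneUninit` are illustrative names. A pmadd-style output lane sums two products; each product counts as uninitialized if either operand is, unless the zero rule is enabled and one operand is an initialized zero.

```cpp
#include <cassert>

// Hypothetical, simplified model: one value plus one "uninitialized" flag per
// lane (real MSan tracks a shadow bit for every value bit).
struct Lane {
  int Value;
  bool Uninit;
};

// Shadow of A * B. With ZeroRule enabled, an initialized zero operand makes
// the product initialized regardless of the other operand.
bool productUninit(Lane A, Lane B, bool ZeroRule) {
  bool AIsInitZero = !A.Uninit && A.Value == 0;
  bool BIsInitZero = !B.Uninit && B.Value == 0;
  if (ZeroRule && (AIsInitZero || BIsInitZero))
    return false;
  return A.Uninit || B.Uninit;
}

// Shadow of one pmadd-style output lane, A0 * B0 + A1 * B1: the sum is
// uninitialized if either product is.
bool pmaddLaneUninit(Lane A0, Lane B0, Lane A1, Lane B1, bool ZeroRule) {
  return productUninit(A0, B0, ZeroRule) || productUninit(A1, B1, ZeroRule);
}
```

For example, `pmaddLaneUninit({0, false}, {7, true}, {3, false}, {4, false}, true)` is `false`, but becomes `true` when the rule is disabled. The zero rule is only sound for integer multiplication: in IEEE floating point, 0 × inf and 0 × NaN produce NaN, so an initialized zero does not pin the result. That is plausibly one reason the rule is now optional, though the commit message does not say so.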
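The full-range pitfall fixed in 4ce58833d3 is easy to reproduce with a toy type. This sketch assumes a hypothetical 8-bit `ToyRange8` standing in for `ConstantRange`/`APInt`; it is not LLVM's API. Like `ConstantRange`, it encodes the full set as `[-1, -1)` (i.e. `[255, 255)` unsigned) and the empty set as `[0, 0)`. The naive loop visits nothing for the full set, while the fixed version special-cases it first.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical 8-bit stand-in for llvm::ConstantRange: the half-open interval
// [Lower, Upper) wraps modulo 256. The full set is encoded as [255, 255)
// (i.e. [-1, -1)) and the empty set as [0, 0).
struct ToyRange8 {
  uint8_t Lower;
  uint8_t Upper;
  bool isFullSet() const { return Lower == Upper && Lower == UINT8_MAX; }
};

// The naive idiom: when Lower == Upper the loop body never runs, so a full
// set is silently enumerated as if it were empty.
std::vector<uint8_t> enumerateNaive(const ToyRange8 &R) {
  std::vector<uint8_t> Out;
  for (uint8_t I = R.Lower; I != R.Upper; ++I)
    Out.push_back(I);
  return Out;
}

// The fixed idiom: handle the full set explicitly, then walk the (possibly
// wrapping) interval; incrementing an uint8_t wraps from 255 to 0.
std::vector<uint8_t> enumerateFixed(const ToyRange8 &R) {
  std::vector<uint8_t> Out;
  if (R.isFullSet()) {
    for (unsigned V = 0; V <= UINT8_MAX; ++V)
      Out.push_back(static_cast<uint8_t>(V));
    return Out;
  }
  for (uint8_t I = R.Lower; I != R.Upper; ++I)
    Out.push_back(I);
  return Out;
}
```

With this encoding, `enumerateNaive({255, 255})` returns no values, `enumerateFixed({255, 255})` returns all 256, and a wrapped range such as `{250, 3}` yields 250 through 255 followed by 0, 1, 2.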
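To see why known bits and ranges cannot rule out the case value 1 in the example from 8a84b285f6, this sketch summarizes the possible-value set {-3, 3} as 8-bit known bits and a signed range, then compares each summary against exact enumeration. It is illustrative only: `Summary`, `summarize`, and the `mayBeBy*` helpers are hypothetical and do not reflect the signature of `collectPossibleValues`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <set>

// Hypothetical summary of a value set, in the spirit of known-bits and
// signed-range analyses.
struct Summary {
  uint8_t KnownZero = 0xFF; // bits that are 0 in every value
  uint8_t KnownOne = 0xFF;  // bits that are 1 in every value
  int Min = INT8_MAX;       // inclusive signed range [Min, Max]
  int Max = INT8_MIN;
};

Summary summarize(const std::set<int8_t> &Vals) {
  Summary S;
  for (int8_t V : Vals) {
    uint8_t U = static_cast<uint8_t>(V);
    S.KnownOne &= U;
    S.KnownZero &= static_cast<uint8_t>(~U);
    S.Min = std::min<int>(S.Min, V);
    S.Max = std::max<int>(S.Max, V);
  }
  return S;
}

// C is possible per known bits if it sets no known-zero bit and has every
// known-one bit set.
bool mayBeByKnownBits(const Summary &S, int8_t C) {
  uint8_t U = static_cast<uint8_t>(C);
  return (U & S.KnownZero) == 0 && (U & S.KnownOne) == S.KnownOne;
}

bool mayBeByRange(const Summary &S, int8_t C) {
  return S.Min <= C && C <= S.Max;
}

bool mayBeByEnumeration(const std::set<int8_t> &Vals, int8_t C) {
  return Vals.count(C) != 0;
}
```

For `{-3, 3}` the known-one mask is `0x01` (`???????1`) and the range is `[-3, 3]`, i.e. the half-open `[-3, 4)` quoted above. The value 1 passes both approximate checks, and only enumeration excludes it.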
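The alignment reasoning behind 00ee53cc7b (propagating alignment through `ptrmask`) can be stated with plain integer arithmetic. This is a hedged sketch of the reasoning, not the Attributor's implementation. `alignLog2ForMask` and `alignLog2ForMasks` are hypothetical helpers that work on log2 alignments, and taking the minimum over all potential mask constants is an assumption about how multiple potential values would be combined.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Count trailing zero bits; by convention, 64 for a zero value.
unsigned trailingZeros(uint64_t V) {
  if (V == 0)
    return 64;
  unsigned N = 0;
  while ((V & 1) == 0) {
    V >>= 1;
    ++N;
  }
  return N;
}

// ptrmask(P, Mask) yields P & Mask, and AND can only clear bits. If P is known
// to be 2^PtrAlignLog2-aligned, the result keeps those low zero bits, and it
// also has every low zero bit of Mask. Returns log2 of the provable alignment.
unsigned alignLog2ForMask(unsigned PtrAlignLog2, uint64_t Mask) {
  unsigned MaskTZ = trailingZeros(Mask);
  return PtrAlignLog2 > MaskTZ ? PtrAlignLog2 : MaskTZ;
}

// With several potential constant masks, only the weakest guarantee is safe.
unsigned alignLog2ForMasks(unsigned PtrAlignLog2,
                           const std::vector<uint64_t> &PotentialMasks) {
  assert(!PotentialMasks.empty() && "need at least one potential mask");
  unsigned Result = 64;
  for (uint64_t Mask : PotentialMasks) {
    unsigned A = alignLog2ForMask(PtrAlignLog2, Mask);
    if (A < Result)
      Result = A;
  }
  return Result;
}
```

For instance, an 8-byte-aligned pointer (log2 = 3) masked with `~0xF` is provably 16-byte aligned (log2 = 4). If the mask may be either `~0xF` or `~0x3F`, only the weaker 16-byte guarantee holds.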
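The debug-info change in 6fe3eccdf4 comes down to how a 1-bit value is widened to the 64-bit constant placed in `DW_AT_const_value`. This standalone sketch uses hypothetical helpers (`lowBits`, `zeroExtend`, `signExtend`), not the AsmPrinter or DwarfUnit code. It shows that sign-extending `true` yields `0xffffffffffffffff`, while zero-extending yields `1`.

```cpp
#include <cassert>
#include <cstdint>

// Keep only the low `Bits` bits of V.
uint64_t lowBits(uint64_t V, unsigned Bits) {
  assert(Bits >= 1 && Bits <= 64);
  return Bits == 64 ? V : (V & ((uint64_t(1) << Bits) - 1));
}

// Zero-extend a Bits-wide value to 64 bits (the new behavior for i1).
uint64_t zeroExtend(uint64_t V, unsigned Bits) { return lowBits(V, Bits); }

// Sign-extend a Bits-wide value to 64 bits with the xor/subtract trick; the
// arithmetic is on uint64_t, so wraparound is well defined.
uint64_t signExtend(uint64_t V, unsigned Bits) {
  assert(Bits >= 1 && Bits <= 64);
  uint64_t SignBit = uint64_t(1) << (Bits - 1);
  return (lowBits(V, Bits) ^ SignBit) - SignBit;
}
```

For a 1-bit value the only set bit is also the sign bit, so `signExtend(1, 1)` is all ones, whereas `zeroExtend(1, 1)` is `1`, which matches DWARF's unsigned view of booleans.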
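The identity in c4763e2b90 can be checked exhaustively on plain booleans. The sketch ignores poison and undef, which the real transform must also respect. It also illustrates why the weights can carry over: in IR, the logical and `C && A` is the instruction `select C, A, false`, whose condition `C` is exactly the condition of the resulting `select C, A, B`. A profile describing how often `C` is true therefore still applies. The function names are hypothetical.

```cpp
#include <cassert>

// In IR, the logical and `C && A` is `select C, A, false` and the logical or
// `X || Y` is `select X, true, Y`; plain bool operators model the same truth
// tables (but not poison propagation).
bool logicalForm(bool C, bool A, bool B) { return (C && A) || (!C && B); }

// The folded form: `select C, A, B`.
bool selectForm(bool C, bool A, bool B) { return C ? A : B; }

// Compare both forms over all eight input combinations.
bool identityHolds() {
  for (int Mask = 0; Mask < 8; ++Mask) {
    bool C = Mask & 1, A = Mask & 2, B = Mask & 4;
    if (logicalForm(C, A, B) != selectForm(C, A, B))
      return false;
  }
  return true;
}
```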
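The fold in 79bf8c0331 can be checked exhaustively for `i8`, in the spirit of the alive2 proofs linked in that commit. This sketch assumes wrapping negation (IR `sub i8 0, X` without `nsw`) and an `abs` whose INT_MIN result is not poison, so `abs(-128) == -128`. Whether the patch emits `llvm.abs` with `is_int_min_poison` set is not stated in the message. The helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Two's-complement negation on 8 bits, wrapping like IR `sub i8 0, X`
// without nsw: negating -128 yields -128.
int8_t negWrap(int8_t X) {
  return static_cast<int8_t>(static_cast<uint8_t>(0u - static_cast<uint8_t>(X)));
}

// The matched pattern: select(X >s 0, 0, -X) | smax(X, 0).
int8_t foldInput(int8_t X) {
  int8_t Sel = X > 0 ? int8_t(0) : negWrap(X);
  int8_t SMax = X > 0 ? X : int8_t(0);
  return static_cast<int8_t>(static_cast<uint8_t>(Sel) |
                             static_cast<uint8_t>(SMax));
}

// abs(X) with a non-poison INT_MIN result: abs(-128) == -128.
int8_t absWrap(int8_t X) { return X < 0 ? negWrap(X) : X; }

// Check the fold for all 256 i8 values.
bool foldHoldsForAllI8() {
  for (int V = INT8_MIN; V <= INT8_MAX; ++V) {
    int8_t X = static_cast<int8_t>(V);
    if (foldInput(X) != absWrap(X))
      return false;
  }
  return true;
}
```

The check works because exactly one operand of the `or` is nonzero for any nonzero `X`: the select contributes `-X` when `X` is not positive, and the `smax` contributes `X` when it is.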
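The trip-count correction in 24557cce40 is ceiling division. Without a remainder loop, the unrolled body keeps its exit tests, so 10 original iterations unrolled by 4 run as 4 + 4 + 2 body copies over 3 iterations of the unrolled loop, not 10 / 4 = 2. A minimal sketch, using a hypothetical function name that is not part of LoopUnroll's API:

```cpp
#include <cassert>

// Hypothetical helper: estimated trip count of a loop partially unrolled by
// UnrollCount with no remainder loop. Each unrolled iteration except possibly
// the last runs UnrollCount copies of the body, so the count rounds up.
// Written without (a + b - 1) / b to avoid overflow for large counts.
unsigned unrolledEstimatedTripCount(unsigned OrigTripCount,
                                    unsigned UnrollCount) {
  assert(UnrollCount > 0 && "unroll count must be positive");
  return OrigTripCount / UnrollCount +
         (OrigTripCount % UnrollCount != 0 ? 1 : 0);
}
```

With this helper, `unrolledEstimatedTripCount(10, 4)` gives the corrected 3 from the commit's example, and an exact multiple such as `(8, 4)` still gives 2.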


41460 Commits