This implements the first half of #151459 by changing the AVL so that it is
no longer computed as `trip-count - EVL-based IV`, but is instead a
separate scalar phi that is decremented by the EVL each iteration.
This shortens the dependency chain for computing the AVL and should
eventually allow us to convert the branch condition to `branch-count
avl-next, 0`.
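A rough sketch of the intended shape, in simplified IR (names, types, and the VF are illustrative, not the exact VPlan output):

```llvm
loop:
  ; AVL as its own phi, decremented by the EVL, instead of being recomputed
  ; each iteration as trip-count minus the EVL-based IV.
  %avl      = phi i64 [ %trip.count, %entry ], [ %avl.next, %loop ]
  %evl      = call i32 @llvm.experimental.get.vector.length.i64(i64 %avl, i32 4, i1 true)
  ; ... vector work predicated on %evl ...
  %evl.zext = zext i32 %evl to i64
  %avl.next = sub i64 %avl, %evl.zext
  ; Eventually the exit test could become a direct compare of %avl.next with 0.
  %done     = icmp eq i64 %avl.next, 0
  br i1 %done, label %exit, label %loop
```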
`simplifyBranchConditionForVFAndUF` had to be updated to prevent a
regression because this introduces a VPPhi in the header block.
hwasan-globals does not instrument globals with custom sections, because
existing code may use `__start_`/`__stop_` symbols to iterate over
globals in a way that would trigger hwasan assertions.
Introduce a new hwasan-all-globals option, which instruments all
user-defined globals (but not the globals generated by the hwasan
instrumentation itself), including those with custom sections.
Fixes #142442
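For illustration only (hypothetical section and symbol names), this is the pattern that makes section-placed globals unsafe to instrument by default:

```llvm
; A global placed in a custom section. Other code may walk the section via the
; linker-generated __start_/__stop_ symbols and expects plain, contiguous,
; untagged objects, which hwasan instrumentation would break.
@entry = constant i32 1, section "my_registry"
@__start_my_registry = external global i32
@__stop_my_registry = external global i32
```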
This patch implements the `llvm.loop.estimated_trip_count` metadata
discussed in [[RFC] Fix Loop Transformations to Preserve Block
Frequencies](https://discourse.llvm.org/t/rfc-fix-loop-transformations-to-preserve-block-frequencies/85785).
As [suggested in the RFC
comments](https://discourse.llvm.org/t/rfc-fix-loop-transformations-to-preserve-block-frequencies/85785/4),
it adds the new metadata to all loops at the time of profile ingestion
and estimates each trip count from the loop's `branch_weights` metadata.
As [suggested in the PR #128785
review](https://github.com/llvm/llvm-project/pull/128785#discussion_r2151091036),
it does so via a new `PGOEstimateTripCountsPass` pass, which creates the
new metadata for each loop but omits the value if it cannot estimate a
trip count due to the loop's form.
An important observation not previously discussed is that
`PGOEstimateTripCountsPass` *often* cannot estimate a loop's trip count,
but later passes can sometimes transform the loop in a way that makes it
possible. Currently, such passes do not necessarily update the metadata,
but eventually that should be fixed. Until then, if the new metadata has
no value, `llvm::getLoopEstimatedTripCount` disregards it and tries
again to estimate the trip count from the loop's current
`branch_weights` metadata.
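A hypothetical sketch of the metadata shape (the exact operand encoding is defined by the RFC/patch and may differ):

```llvm
; Loop metadata carrying an estimated trip count derived from the latch's
; branch_weights (99 backedges for every exit -> roughly 100 iterations).
latch:
  br i1 %exit.cond, label %exit, label %header, !prof !0, !llvm.loop !1

!0 = !{!"branch_weights", i32 1, i32 99}
!1 = distinct !{!1, !2}
!2 = !{!"llvm.loop.estimated_trip_count", i32 100}
```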
Similar to #150639, this fixes the AggressiveInstCombine fold that converts
lookup tables to cttz instructions when the gep source types are not array
types, i.e. `gep i16 @glob, i64 %idx` instead of `gep [64 x i16] @glob, i64 0, i64 %idx`.
Noticed this when checking the invariant that all phis in the header
block must be header phis. I think there's a missing set of parentheses
here, since otherwise the cast<VPInstruction> is only performed when
RecipeI isn't a VPInstruction.
Extend jump-threading to allow local defs that are live outside of the
threaded block. Allow threading to destinations where the local defs are
not live.
---------
Signed-off-by: John Lu <John.Lu@amd.com>
https://github.com/llvm/llvm-project/pull/147026 will enable sub
reductions, which require that the phi value is the first operand since
they aren't commutative. This re-orders the operands when executing
reductions, which actually matches other existing code in
VPReductionRecipe::execute.
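As a minimal sketch of why the order matters (illustrative scalar IR for the reduction update, not the actual generated code):

```llvm
loop:
  ; A sub reduction is not commutative: the accumulator (the phi) must be the
  ; first operand of the sub, otherwise a different value is computed.
  %acc      = phi i32 [ 0, %entry ], [ %acc.next, %loop ]
  ; ...
  %acc.next = sub i32 %acc, %x      ; correct: acc - x
  ; %wrong  = sub i32 %x, %acc      ; swapped operands compute x - acc
```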
Using GEP to index into a vector is not disallowed, but not recommended.
The SPIR-V backend needs to generate structured access into types, which
is impossible with an untyped GEP instruction unless we add more info to
the IR. Finding a solution is a work-in-progress, but in the meantime,
we'd like to reduce the number of failures.
Preventing this optimization from rewriting extract/insert
instructions into a GEP helps us lower more code to SPIR-V. This change
should be OK as it's only active when targeting SPIR-V and only disables a
non-recommended transformation.
Related to #145002
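An illustrative example (assumed, simplified) of the kind of rewrite that is now skipped when targeting SPIR-V:

```llvm
; Typed vector access that SPIR-V can lower structurally:
%vec = load <4 x float>, ptr %p
%elt = extractelement <4 x float> %vec, i64 %idx

; The scalarized form the optimization would otherwise produce, which loses
; the structure the SPIR-V backend needs:
%gep  = getelementptr float, ptr %p, i64 %idx
%elt2 = load float, ptr %gep
```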
With the current aliases metadata we lose information about which groups
of aliases survive symbol resolution. This causes various problems such
as #150075 where symbol resolution breaks the link between alias groups.
In this redesign of the aliases metadata, we stop representing the
individual aliases in !aliases. Instead, the individual aliases are
represented in !cfi.functions in the same way as functions, and the
alias groups (i.e. groups of symbols with the same address) are stored
in !aliases. At symbol resolution time, we filter out all non-prevailing
members of !aliases; the resulting set is used by LowerTypeTests to
recreate the aliases.
With this change it is now possible for a jump table entry to refer
to an alias in one of the ThinLTO object files (e.g. if a function is
non-prevailing but its alias is prevailing), so instead of deleting them,
rename them with the ".cfi" suffix.
Fixes #150070.
Fixes #150075.
Reviewers: teresajohnson, vitalybuka
Reviewed By: vitalybuka
Pull Request: https://github.com/llvm/llvm-project/pull/150690
e.g.,
<16 x i8> @llvm.x86.vgf2p8affineqb.128(<16 x i8>, <16 x i8>, i8)
<32 x i8> @llvm.x86.vgf2p8affineqb.256(<32 x i8>, <32 x i8>, i8)
<64 x i8> @llvm.x86.vgf2p8affineqb.512(<64 x i8>, <64 x i8>, i8)
where, labeling the return value and operands in order as Out, A, x, and b:
A and x are packed matrices, b is a vector, and Out = A * x + b in GF(2).
Multiplication in GF(2) is equivalent to bitwise AND. However, the
matrix computation also includes a parity calculation.
For the bitwise AND of bits V1 and V2, the exact shadow is:
Out_Shadow = (V1_Shadow & V2_Shadow) | (V1 & V2_Shadow) | (V1_Shadow & V2)
We approximate the shadow of gf2p8affine using:
Out_Shadow = _mm512_gf2p8affine_epi64_epi8(x_Shadow, A_shadow, 0)
| _mm512_gf2p8affine_epi64_epi8(x, A_shadow, 0)
| _mm512_gf2p8affine_epi64_epi8(x_Shadow, A, 0)
| _mm512_set1_epi8(b_Shadow)
This approximation has false negatives: if an intermediate dot-product
contains an even number of 1's, the parity is 0 and the uninitializedness
is missed. It has no false positives.
Updates the test from https://github.com/llvm/llvm-project/pull/149258
Split GEPs that have more than one variable index into two. This is in
preparation for the ptradd migration, which will not support multi-index
GEPs.
This also enables the split-off part to be CSE'd and LICM'd.
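A small sketch of the split (types and names are illustrative):

```llvm
; Before: one gep with two variable indices.
%addr  = getelementptr [64 x i32], ptr %base, i64 %i, i64 %j

; After: each gep has at most one variable index; the first gep is now a
; candidate for CSE/LICM if %i is loop-invariant.
%row   = getelementptr [64 x i32], ptr %base, i64 %i
%addr2 = getelementptr i32, ptr %row, i64 %j
```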
Loop regions require fixed-length steps and rounded-up trip counts, but
once dissolution has created explicit control flow, EVL loops can use
variable-length stepping with the original trip counts.
This patch adds a post-dissolution transform pass that converts EVL loops
from fixed-length to variable-length stepping.
With EVL tail folding, the EVL may not always be VF on the
second-to-last iteration.
Recipes that have been converted to VP intrinsics via optimizeMaskToEVL
account for this, but recipes that are left behind will still use the
old header mask which may end up having a different vector length.
This is effectively the same as #95368, and fixes the issue by converting
header masks from `icmp ule wide-canonical-iv, backedge-trip-count` to
`icmp ult step-vector, evl`. Without it, recipes that fall through
optimizeMaskToEVL may use the wrong vector length, e.g. in #150074 and
#149981.
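In simplified IR terms (illustrative names and a fixed VF), the conversion looks roughly like:

```llvm
; Before: header mask based on the wide canonical IV and backedge trip count.
%mask     = icmp ule <4 x i64> %wide.canonical.iv, %btc.splat

; After: header mask based on a step vector compared against the splatted EVL
; (spelled llvm.experimental.stepvector in older releases).
%step     = call <4 x i64> @llvm.stepvector.v4i64()
%mask.evl = icmp ult <4 x i64> %step, %evl.splat
```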
We really need to split off optimizeMaskToEVL into
VPlanTransforms::optimize and move transformRecipestoEVLRecipes into
tryToBuildVPlanWithVPRecipes, so we don't mix up what is needed for
correctness and what is needed to optimize away the mask computations.
We should still be able to generate a correct, albeit suboptimal, VPlan
without running optimizeMaskToEVL. I've added a TODO for this, which I
think we can do after #148274.

Fixes #150197
This partially reverts https://github.com/llvm/llvm-project/pull/140744,
restoring the original TheLoop->isLoopInvariant check instead of the more
powerful Legal->isInvariant, which uses SCEV.
The reverted change causes a mis-compile: SCEV can prove that the stored value
is loop-invariant, which in turn converts the store to a uniform store.
But in VPlan, we aren't yet able to determine that the stored value is
loop-invariant, so we extract the last lane, which is incorrect, because
it does not account for the mask of the store.
Restoring the original code is a safe fix and avoids this subtle
divergence.
Fixes https://github.com/llvm/llvm-project/issues/149347.
PR: https://github.com/llvm/llvm-project/pull/150828
I found the naming here confusing. This is not something generic
for intrinsics, it's specifically about predicates, and serves to
remember a previous swap choice.
There is no reason to use std::map for the call maps maintained for
function clones during function clone assignment, as we don't iterate
over them and don't need deterministic ordering, so use the more
efficient DenseMap.
Update getSmallConstantTripCount() to return scalable ElementCount
values, which are used to accurately determine the maximum value for UF,
namely:
TripCount / VF ==> X * VScale / Y * VScale ==> X / Y
This improves the chances of being able to remove the scalar loop and
also fixes an issue where UF=2 is chosen for a scalar loop with
exactly VF (= X * VScale) iterations.
When inferring attributes, we should not bail out early on unknown calls
(such as virtual calls), as we may still have call-site attributes that
can be used for inference.
Fixes https://github.com/llvm/llvm-project/issues/150817.
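For example (a hypothetical sketch), an indirect call whose callee is unknown but whose call site still carries attributes that can feed inference:

```llvm
define i32 @wrapper(ptr %obj) {
  ; Virtual-call style dispatch: the callee is unknown, but the call-site
  ; memory(read) attribute still bounds the call's effects.
  %vtable = load ptr, ptr %obj
  %slot   = load ptr, ptr %vtable
  %r      = call i32 %slot(ptr %obj) memory(read)
  ret i32 %r
}
```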
This PR adds a new interface to IRBuilder called CreateVectorInterleave,
which can be used to create vector.interleave intrinsics of factors 2-8.
For convenience I have also moved getInterleaveIntrinsicID and
getDeinterleaveIntrinsicID from VectorUtils.cpp to Intrinsics.cpp, where
they can be used by IRBuilder.
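For reference, a hedged sketch of the kind of call such a helper emits for factor 2 (values %a and %b are hypothetical; the exact name mangling follows the intrinsic's overload rules):

```llvm
; Interleave two <4 x i32> vectors into one <8 x i32>:
;   result = [a0, b0, a1, b1, a2, b2, a3, b3]
%il = call <8 x i32> @llvm.vector.interleave2.v8i32(<4 x i32> %a, <4 x i32> %b)
```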
This patch fixes:
llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp:4771:9:
error: non-void lambda does not return a value in all control paths
[-Werror,-Wreturn-type]
This reverts commit 314e22bcab2b0f3d208708431a14215058f0718f, reapplying
PR150735 with a fix for the unstable iteration order exposed by the new
tests (PR151039).
When compiling in `--hipstdpar` mode, the builtins corresponding to the
standard library might end up in code that is expected to execute on the
accelerator (e.g. by using the `std::` prefixed functions from
`<cmath>`). We do not have uniform handling for this in AMDGPU, and the
resulting errors are quite arcane. Furthermore, the user-space changes
required to work around this tend to be rather intrusive.
This patch adds an additional `--hipstdpar`-specific pass which forwards
to the run-time component of HIPSTDPAR the intrinsics / libcalls that
result from the use of the math builtins and that are not properly
handled. In the long run we will want to stop relying on this and handle
things in the compiler, but it is going to be a rather lengthy journey,
which makes this medium term escape hatch necessary.
The paired change in the run time component is here
<https://github.com/ROCm/rocThrust/pull/551>.
We iterate over a std::map indexed by FuncInfo, which is a pair of a
pointer and a clone number. In the ThinLTO case, this isn't an issue as
the function pointer always points to the same FunctionSummary object.
However, for regular LTO, this is a pointer to a Function object, which
is different for each clone. This will lead to unstable iteration order.
This was exposed in a test case added for PR150735, which added a new
instance of iteration over this map.
Since these function clones are added and numbered sequentially, change
this to a vector indexed by clone number, which points to a structure
containing the clone FuncInfo and the call map (the old map's key and
value, respectively).
shrinkSplatShuffle in InstCombine would only move truncs up through
shuffles if those shuffles' inputs had exactly the same type as their
output. This PR weakens that constraint to only require that the
scalar types of the input and output match.
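A sketch of the now-allowed case, where the shuffle's input and output vector lengths differ but the scalar type matches:

```llvm
; Before: trunc of a splat shuffle whose input (<2 x i64>) and output
; (<4 x i64>) vector types differ only in element count.
%splat = shufflevector <2 x i64> %v, <2 x i64> poison, <4 x i32> zeroinitializer
%t     = trunc <4 x i64> %splat to <4 x i32>

; After: the trunc is shrunk to operate on the narrower input, then splatted.
%t.nar = trunc <2 x i64> %v to <2 x i32>
%t2    = shufflevector <2 x i32> %t.nar, <2 x i32> poison, <4 x i32> zeroinitializer
```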
If we have instructions in the second loop's preheader which can be sunk, we
should also adjust PHI nodes to receive values from the fused loop's latch block.
Fixes #128600
Fix a bug in function assignment where we were not assigning all
callsite clones to a function clone. This led to incorrect call updates
because multiple callsite clones could look like they were assigned to
the same function clone.
Add a stat and a debug message to help identify and debug cases where
this is still happening.