llvm-project

Author	SHA1	Message	Date
Vitaly Buka	fc201d6133	Revert "[InstCombine] Support gep nuw in icmp folds" (#118698 ) Reverts llvm/llvm-project#118472 Breaks profile tests on i386 https://lab.llvm.org/buildbot/#/builders/66/builds/7009	2024-12-04 15:07:27 -08:00
Kazu Hirata	1b95e76d8f	[Instrumentation] Fix a warning This patch fixes: llvm/lib/Transforms/Instrumentation/MemorySanitizer.cpp:3840:14: error: unused variable 'NumArgOperands' [-Werror,-Wunused-variable]	2024-12-04 08:31:40 -08:00
Alexander Shaposhnikov	95e44d3670	[msan] Add handling for sse41_round_pd/sse41_round_ps (#118441 ) Add handling for sse41_round_pd/sse41_round_ps similarly to maybeHandleSimpleNomemIntrinsic. Test plan: ninja check-all	2024-12-04 08:27:08 -08:00
Nikita Popov	66ed8fb973	[InstCombine] Fix use after free Make sure we only access cached nowrap flags.	2024-12-04 17:20:04 +01:00
Nikita Popov	4a7abfe0a7	[InstCombine] Preserve nuw in OptimizePointerDifference If both the geps and the subs are nuw the new sub is also nuw. Proof: https://alive2.llvm.org/ce/z/mM8UvF	2024-12-04 16:58:35 +01:00
Nikita Popov	a608607fd7	[ConstraintElim] Add support for decomposing gep nuw (#118639 ) ConstraintElimination currently only supports decomposing gep nusw with non-negative indices (with "non-negative" possibly being enforced via pre-condition). Add support for gep nuw, which directly gives us the necessary guarantees for the decomposition.	2024-12-04 16:27:31 +01:00
Florian Hahn	7b6e0d9fc3	[Matrix] Use DenseMap for ShapeMap instead of ValueMap. (#118282 ) ValueMap automatically updates entries with the new value if they have been RAUW. This can lead to instructions that are expected to not have shape info to be added to the map (e.g. shufflevector as in the added test case). This leads to incorrect results. Originally it was used for transpose optimizations, but they now all use updateShapeAndReplaceAllUsesWith, which takes care of updating the shape info as needed. This fixes a crash in the newly added test cases. PR: https://github.com/llvm/llvm-project/pull/118282	2024-12-04 14:51:31 +00:00
Jan Ječmen	78db4e9f7b	[NFC][IRCE] Don't require LoopStructure to determine IRCE profitability (#116384 ) This refactoring hoists the profitability check earlier in the pipeline, so that for loops that are not profitable to transform there is no iteration over the basic blocks or LoopStructure computation. Motivated by PR #104659 that tweaks how the profitability of individual branches is evaluated.	2024-12-04 11:09:19 +01:00
Antonio Frighetto	f68b0e3699	[AggressiveInstCombine] Use APInt and avoid truncation when folding loads A miscompilation issue has been addressed with improved handling. Fixes: https://github.com/llvm/llvm-project/issues/118467.	2024-12-04 10:20:14 +01:00
ronryvchin	ff281f7d37	[PGO] Add option to always instrumenting loop entries (#116789 ) This patch extends the PGO infrastructure with an option to prefer the instrumentation of loop entry blocks. This option is a generalization of `19fb5b467b`, and helps to cover cases where the loop exit is never executed. An example where this can occur is an event handling loop. Note that this change does NOT change the default behavior.	2024-12-04 07:56:46 +01:00
Owen Anderson	14a259f85b	GlobalOpt: Use the correct address space when creating a "*.init" global. (#118562 )	2024-12-04 14:01:16 +13:00
k-kashapov	f2fa9ac616	[nfc][MSan] Change for-loop to ArgNo instead of drop_begin (#117553 ) As discussed in https://github.com/llvm/llvm-project/pull/109284#discussion_r1838830571 Changed for loop to use `ArgNo` instead of `drop_begin` to keep loop code consistent with other helpers. Co-authored-by: Kamil Kashapov <kashapov@ispras.ru>	2024-12-03 14:32:54 -08:00
Teresa Johnson	d6cd214dd6	[ThinLTO][LowerTypeTests] Don't compute address taken set unless CFI (NFC) (#118508 ) The AddressTaken set used for CFI with regular LTO was being computed on the ExportSummary regardless of whether any CFI metadata existed. In the case of ThinLTO, the ExportSummary is the global summary index for the target, and the lack of guard in this code meant this was being computed on the ThinLTO index even when there was an empty regular LTO module, since the backend is called on the combined module to generate the expected output file (normally this is trivial as there is no IR). Move the computation of the AddressTaken set into the condition checking for CFI to avoid this overhead. This change resulted in a 20% speedup in the thin link of a large target. It looks like the outer loop has existed here for several years, but likely became a larger overhead after the inner loop was added very recently in PR113987. I will send a separate patch to refactor the ThinLTO backend handling to avoid invoking the opt pipeline if the module is empty, in case there are other summary-based analyses in some of the passes now or in the future. This change is still desirable as by default regular LTO modules contain summaries, or we can have split thin and regular LTO modules, and if they don't involve CFI these would still unnecessarily compute the AddressTaken set.	2024-12-03 12:14:16 -08:00
Nikita Popov	10223c72a9	[ConstraintElim] Use nusw flag for GEP decomposition Check for nusw instead of inbounds when decomposing GEPs. In this particular case, we can also look through multiple nusw flags, because we will ultimately be working in the unsigned constraint system.	2024-12-03 15:56:29 +01:00
Florian Hahn	a7fda0e1e4	[VPlan] Introduce VPScalarPHIRecipe, use for can & EVL IV codegen (NFC). (#114305 ) Introduce a general recipe to generate a scalar phi. Lower VPCanonicalIVPHIRecipe and VPEVLBasedIVRecipe to VPScalarPHIRecipe before plan execution, avoiding the need for duplicated ::execute implementations. There are other cases that could benefit, including in-loop reduction phis and pointer induction phis. Builds on a similar idea as https://github.com/llvm/llvm-project/pull/82270. PR: https://github.com/llvm/llvm-project/pull/114305	2024-12-03 14:53:51 +00:00
Ramkumar Ramachandra	2a0ee090db	IVDesc: strip redundant arg in getOpcode call (NFC) (#118476 )	2024-12-03 13:40:51 +00:00
Ramkumar Ramachandra	51a895aded	IR: introduce struct with CmpInst::Predicate and samesign (#116867 ) Introduce llvm::CmpPredicate, an abstraction over a floating-point predicate, and a pack of an integer predicate with samesign information, in order to ease extending large portions of the codebase that take a CmpInst::Predicate to respect the samesign flag. We have chosen to demonstrate the utility of this new abstraction by migrating parts of ValueTracking, InstructionSimplify, and InstCombine from CmpInst::Predicate to llvm::CmpPredicate. There should be no functional changes, as we don't perform any extra optimizations with samesign in this patch, or use CmpPredicate::getMatching. The design approach taken by this patch allows for unaudited callers of APIs that take a llvm::CmpPredicate to silently drop the samesign information; it does not pose a correctness issue, and allows us to migrate the codebase piece-wise.	2024-12-03 13:31:04 +00:00
Nikita Popov	f33536468b	[InstCombine] Support gep nuw in icmp folds (#118472 ) Unsigned icmp of gep nuw folds to unsigned icmp of offsets. Unsigned icmp of gep nusw nuw folds to unsigned samesign icmp of offsets. Proofs: https://alive2.llvm.org/ce/z/VEwQY8	2024-12-03 14:28:56 +01:00
Nikita Popov	bdc6faf775	[InstCombine] Support nusw in icmp of two geps with same base Proof: https://alive2.llvm.org/ce/z/BYNQ7s	2024-12-03 11:51:14 +01:00
Nikita Popov	9c5a84b394	[InstCombine] Support nusw in icmp of gep with base Proof: https://alive2.llvm.org/ce/z/omnQXt	2024-12-03 11:51:14 +01:00
Antonio Frighetto	1d6ab189be	[MemCpyOpt] Drop dead `memmove` calls on `memset`'d source data When a memmove happens to clobber source data, and such data have been previously memset'd, the memmove may be redundant.	2024-12-03 09:50:57 +01:00
Yingwei Zheng	c1ad064dd3	[InstCombine] Fold `icmp spred (and X, highmask), C1` into `icmp spred X, C2` (#118197 ) Alive2: https://alive2.llvm.org/ce/z/Ffg64g Closes https://github.com/llvm/llvm-project/issues/104772.	2024-12-03 16:19:12 +08:00
Rajat Bajpai	de415fbb45	[InstCombine][FP] Fix nnan preservation for transform fcmp + sel => fmax/fmin (#117977 ) Preserve `nnan` constraint only if present on both `fcmp` and `select`. Alive2: https://alive2.llvm.org/ce/z/ZNDjzt	2024-12-03 14:01:36 +08:00
Yingwei Zheng	295d6b18f7	[InstCombine] Fold `(X * (Y << K)) u>> K -> X * Y` when highbits are not demanded (#111151 ) Alive2: https://alive2.llvm.org/ce/z/Z7QgjH	2024-12-03 12:04:04 +08:00
Han-Kuan Chen	f71ea4bc1b	[SLP][REVEC] reorderNodeWithReuses should not be called if all users of a TreeEntry are ShuffleVectorInst. (#118260 )	2024-12-03 09:04:04 +08:00
Mingming Liu	6faf17b762	[ThinLTO]Supports declaration import for global variables in distributed ThinLTO (#117616 ) When `-import-declaration` option is enabled, declaration import is supported for functions. https://github.com/llvm/llvm-project/pull/88024 has the context for this option. This patch supports declaration import for global variables in distributed ThinLTO. The motivating use case is to propagate `dso_local` attribute of global variables across modules, to optimize global variable access when a binary is built with `-fno-direct-access-external-data`. * With `-fdirect-access-external-data`, non thread-local global variables will [have `dso_local` attributes](`fe3c23b439/clang/lib/CodeGen/CodeGenModule.cpp (L1730-L1746)`). This optimizes the global variable access as shown by https://gcc.godbolt.org/z/vMzWcKdh3	2024-12-02 16:15:52 -08:00
Florian Hahn	4226e0a0c7	[TTI] Add SCEVExpansionBudget to loop unrolling options. (#118316 ) Add an extra knob to UnrollingPreferences to let backends control the maximum budget for SCEV expansions. This gives backends more fine-grained control on the cost of the runtime checks for runtime unrolling. PR: https://github.com/llvm/llvm-project/pull/118316	2024-12-02 21:35:00 +00:00
Florian Hahn	f8ce2e4bb3	[Matrix] Only retrieve analyses if there are any matrix intrinsics (NFC) Only request analyses if there are any matrix intrinsics to avoid computing them if there are no matrix intrinsics.	2024-12-02 11:22:24 +00:00
Nikita Popov	7bbc049688	[InstCombine] Consolidate another fold into select value equivalence (#117746 ) We had a separate fold that handled just the trivial case where we're replacing exactly the argument of the select. Handle this in select value equivalence by relaxing the infinite loop protection to allow a replacement of a non-constant with a constant. This also fixes https://github.com/llvm/llvm-project/issues/113301, as the separate fold did not handle undef values correctly.	2024-12-02 09:45:39 +01:00
Veera	979a0356d4	[InstCombine] Fold `X Pred C2 ? X BOp C1 : C2 BOp C1` to `min/max(X, C2) BOp C1` (#116888 ) Fixes #82414. General Proof: https://alive2.llvm.org/ce/z/ERjNs4 Proof for Tests: https://alive2.llvm.org/ce/z/K-934G This PR transforms `select` instructions of the form `select (Cmp X C1) (BOp X C2) C3` to `BOp (min/max X C1) C2` iff `C3 == BOp C1 C2`. This helps in eliminating a noop loop in https://github.com/rust-lang/rust/issues/123845 but does not improve optimizations.	2024-12-02 09:33:45 +01:00
Florian Hahn	77767986ed	[LV] Use IsaPred in a few more places (NFC). Simplifies the code slightly by removing explicit lambdas.	2024-12-01 18:47:53 +00:00
Yingwei Zheng	1a3eace82a	[InstCombine] Fold `umax(X, C) + -C` into `usub.sat(X, C)` (#118195 ) Alive2: https://alive2.llvm.org/ce/z/oSWe5S Closes https://github.com/llvm/llvm-project/issues/118155	2024-12-01 23:29:40 +08:00
Jonas Paulsson	0ad6be1927	[SLPVectorizer, TargetTransformInfo, SystemZ] Improve SLP getGatherCost(). (#112491 ) As vector element loads are free on SystemZ, this patch improves the cost computation in getGatherCost() to reflect this. getScalarizationOverhead() gets an optional parameter which can hold the actual Values so that they in turn can be passed (by BasicTTIImpl) to getVectorInstrCost(). SystemZTTIImpl::getVectorInstrCost() will now recognize a LoadInst and typically return a 0 cost for it, with some exceptions.	2024-11-29 21:19:45 +01:00
Tyler Nowicki	b40714b012	[Coroutines][NFC] Refactor CoroCloner (#116885 ) * Move CoroCloner to its own header. For now, the header is located in llvm/lib/Transforms/Coroutines * Change private to protected to allow inheritance * Create CoroSwitchCloner and move some of the switch specific code into this cloner. More code will follow in later commits.	2024-11-29 11:20:33 -05:00
Alexey Bataev	f4974e0931	[SLP] Add a check for poison value in AShrChecker Need to check if the value in AShrChecker is a poison before casting it to instruction to avoid compiler crash Fixes #118030	2024-11-29 06:51:19 -08:00
Luke Lau	d9c269577e	[VPlan] Remove manual constant fold in VPWidenIntOrFpInductionRecipe. NFC (#118028 ) This manual constant folding was added in 2017 in https://reviews.llvm.org/D29956, but since then it looks like IRBuilder has learnt to fold it away itself. I'm not sure at what point this happened, I just verified this by stepping through the call to CreateVectorSplat in the debugger.	2024-11-29 00:21:53 +01:00
Florian Hahn	12cefcc7ec	[Matrix] Skip already fused instructions before trying to fuse multiply. lowerDotProduct called above may already lower a matrix multiply and mark it as processed by adding it to FusedInsts. Don't try to process it again in LowerMatrixMultiplyFused by checking FusedInsts. Without this change, we trigger an assertion when trying to erase the same original matrix multiply twice.	2024-11-28 16:11:40 +00:00
Rafael Eckstein	2a6e5896a5	[MergeFunctions] Add support to run the pass over a set of function pointers (#111045 ) This modification will enable the usage of `MergeFunctions` as a standalone library. Currently, `MergeFunctions` can only be applied to an entire module. By adopting this change, developers will gain the flexibility to reuse the `MergeFunctions` code within their own projects, choosing which functions to merge; hence, promoting code reusability. Notice that this modification will not break backward compatibility, because `MergeFunctions` will still work as a pass after the modification.	2024-11-28 16:18:52 +01:00
Florian Hahn	82821254f5	[LV] Use IVUpdateMayOverflow to set HasNUW. (#111758 ) If IVUpdateMayOverflow is false, we proved that the induction increment cannot overflow in the vector loop. This allows setting NUW in some cases when folding the tail. PR: https://github.com/llvm/llvm-project/pull/111758	2024-11-28 10:12:41 +00:00
Elvis Wang	9ea5be639d	Recommit "[LV][VPlan] Remove any-of reduction from precomputeCost. NFC (#117109 )" (#117289 ) Update the test cases that contain `any-of` printings from the precomputeCost(). Origin message: The any-of reduction contains phi and select instructions. The select instruction might be optimized and removed in the vplan which may cause VF difference between legacy and VPlan-based model. But if the select instruction is removed, planContainsAdditionalSimplifications() will catch it and disable the assertion. Therefore, we can just remove the any-of reduction calculation in the precomputeCost(). Recommit "[LV][VPlan] Remove any-of reduction from precomputeCost. NFC (#117109)"	2024-11-28 15:07:36 +08:00
LiqinWeng	4a3f46de50	[LV][EVL] Support call instruction with EVL-vectorization (#110412 )	2024-11-28 10:05:08 +08:00
Joseph Huber	4cb4516ae9	[OpenMP] Fix RPC client not being optimized out after changes Summary: I forgot that this check deliberately looked through the indirection I removed. Fix it to just check if the symbol has no users.	2024-11-27 15:56:23 -06:00
Joseph Huber	89d8e70031	[libc] Export a pointer to the RPC client directly (#117913 ) Summary: We currently have an unnecessary level of indirection when initializing the RPC client. This is a holdover from when the RPC client was not trivially copyable and simply makes it more complicated. Here we use the `asm` syntax to give the C++ variable a valid name so that we can just copy to it directly. Another advantage to this is that if users want to piggy-back on the same RPC interface they need only declare theirs as extern with the same symbol name, or make it weak to optionally use it if LIBC isn't available.	2024-11-27 14:57:38 -06:00
Krzysztof Pszeniczny	991154d0fb	[LTO] Use .at instead of .lookup to avoid copies. (NFC) (#117888 ) `DenseMap::lookup` returns by value (because it default-creates the returned value if the key isn't present in the map), which means that we do a lot of copying here. Since we assert that something is present in the returned value two lines below this call, it's safe to use `.at` here instead. Copying and then destroying dense maps here is responsible for 60% of the time spent in LTO indexing in a large internal build.	2024-11-27 18:41:29 +01:00
Nikita Popov	43ee6f7a01	[AlwaysInline] Avoid unnecessary BFI fetches (#117750 ) AlwaysInliner doesn't use BFI itself, it only updates it. If BFI is not already computed, it will spend time to first compute it, and then update it. This is not necessary: If BFI is not available in the first place, there is no need to update it. This is mainly relevant in debug builds for IR that has a lot of alwaysinline functions.	2024-11-27 15:53:21 +01:00
Nikita Popov	fc5c89900f	[SimpleLoopUnswitch] Fix LCSSA phi node invalidation Fixes https://github.com/llvm/llvm-project/issues/117537.	2024-11-27 11:48:05 +01:00
Yingwei Zheng	0f0c0c36e3	[ConstraintElim] Extend `checkOrAndOpImpliedByOther` to handle and/or expr trees. (#117123 ) This patch extends `checkOrAndOpImpliedByOther` to handle and/or trees. Limitation: At least one of the operands of root and/or instruction should be an icmp. That is, this patch doesn't support expressions like `(cmp1 & cmp2) & (cmp3 & cmp4)`. Closes https://github.com/llvm/llvm-project/issues/117107. Compile-time impact: http://llvm-compile-time-tracker.com/compare.php?from=69cc3f096ccbdef526bbd5a065a25c95122e87ee&to=919416d2c4c71e3b9fe533af2c168a36c7893be5&stat=instructions%3Au	2024-11-27 09:04:52 +08:00
AdityaK	39601a6e54	Bail out jump threading on indirect branches only (#117778 ) Remove check for PHI in pred as pointed out in #103688 Reduced the testcase to remove redundant phi in pred Fixes: #102351	2024-11-26 14:57:28 -08:00
Florian Hahn	46a08579f2	[Local] Only intersect alias.scope,noalias & parallel_loop if inst moves (#117716 ) Preserve !alias.scope, !noalias and !mem.parallel_loop_access metadata on the replacement instruction, if it does not move. In that case, the program would be UB, if the aliasing property encoded in the metadata does not hold. This makes use of the clarification re aliasing metadata implying UB if the property does not hold: #116220 Same as #115868, but for !alias.scope, !noalias and !mem.parallel_loop_access. PR: https://github.com/llvm/llvm-project/pull/117716	2024-11-26 20:39:53 +00:00
Florian Hahn	ab6677e7d6	[LICM] Only set AA metadata on hoisted load if it executes. (#117204 ) https://github.com/llvm/llvm-project/pull/116220 clarified that violations of aliasing metadata are UB. Only set the AA metadata after hoisting a load, if it is guaranteed to execute in the original loop. PR: https://github.com/llvm/llvm-project/pull/117204	2024-11-26 14:16:16 +00:00
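
Two of the InstCombine folds listed above (1a3eace82a and 295d6b18f7) are small enough to sanity-check by brute force. The sketch below is not taken from either commit and is no substitute for the linked Alive2 proofs: it models the operations on 8-bit unsigned integers in plain Python and searches for counterexamples. The helper names (`usub_sat`, `check_umax_fold`, `check_mul_shift_fold`) and the choice of an 8-bit width are assumptions made for illustration.

```python
BITS = 8                    # assumed width; the real folds apply to any integer width
MASK = (1 << BITS) - 1      # 0xFF, used to model wrapping i8 arithmetic


def usub_sat(x, c):
    """Model of llvm.usub.sat: unsigned subtraction that clamps at zero."""
    return x - c if x >= c else 0


def check_umax_fold():
    """Search for a counterexample to: umax(X, C) + -C == usub.sat(X, C)."""
    for x in range(MASK + 1):
        for c in range(MASK + 1):
            # "-C" is the two's-complement negation of C in BITS bits.
            lhs = (max(x, c) + (-c & MASK)) & MASK
            if lhs != usub_sat(x, c):
                return (x, c)
    return None


def check_mul_shift_fold():
    """Search for a counterexample to: (X * (Y << K)) u>> K == X * Y,
    comparing only the low BITS - K bits (the high K bits are the ones
    assumed to be "not demanded")."""
    for k in range(BITS):
        low = (1 << (BITS - k)) - 1
        for x in range(MASK + 1):
            for y in range(MASK + 1):
                lhs = ((x * ((y << k) & MASK)) & MASK) >> k
                rhs = (x * y) & MASK
                if (lhs & low) != (rhs & low):
                    return (x, y, k)
    return None


if __name__ == "__main__":
    print("umax fold counterexample:", check_umax_fold())
    print("mul/shift fold counterexample:", check_mul_shift_fold())
```

Each function returns `None` when no counterexample exists at the modeled width, or the first failing tuple otherwise. Poison and undef semantics are not modeled here.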


38279 Commits