llvm-project

Author	SHA1	Message	Date
Vitaly Buka	fc201d6133	Revert "[InstCombine] Support gep nuw in icmp folds" (#118698 ) Reverts llvm/llvm-project#118472 Breaks profile tests on i386 https://lab.llvm.org/buildbot/#/builders/66/builds/7009	2024-12-04 15:07:27 -08:00
Pedro Lobo	0d1e762da7	[InstSimplify] Refine `abs(min/undef, true)` to `poison` (#118669 ) Calls to `@llvm.abs(undef, i1 true)` and `@llvm.abs(INT_MIN, i1 true)` can be optimized to `poison` instead of `undef`. [Alive2](https://alive2.llvm.org/ce/z/Hg-2ug)	2024-12-04 18:41:05 +00:00
Simon Pilgrim	85d15bd130	[TTI][X86] getMemoryOpCost - reduced costs when loading uniform values due to value reuse (#118642 ) Similar to what we do for broadcast shuffles, when legalising load costs, if the value is known to be uniform, then we will only load a single vector and reuse this across the split legalised registers. Fixes #111126	2024-12-04 16:36:00 +00:00
Nikita Popov	4a7abfe0a7	[InstCombine] Preserve nuw in OptimizePointerDifference If both the geps and the subs are nuw the new sub is also nuw. Proof: https://alive2.llvm.org/ce/z/mM8UvF	2024-12-04 16:58:35 +01:00
Nikita Popov	a608607fd7	[ConstraintElim] Add support for decomposing gep nuw (#118639 ) ConstraintElimination currently only supports decomposing gep nusw with non-negative indices (with "non-negative" possibly being enforced via pre-condition). Add support for gep nuw, which directly gives us the necessary guarantees for the decomposition.	2024-12-04 16:27:31 +01:00
Florian Hahn	7b6e0d9fc3	[Matrix] Use DenseMap for ShapeMap instead of ValueMap. (#118282 ) ValueMap automatically updates entries with the new value if they have been RAUW. This can lead to instructions that are expected to not have shape info to be added to the map (e.g. shufflevector as in the added test case). This leads to incorrect results. Originally it was used for transpose optimizations, but they now all use updateShapeAndReplaceAllUsesWith, which takes care of updating the shape info as needed. This fixes a crash in the newly added test cases. PR: https://github.com/llvm/llvm-project/pull/118282	2024-12-04 14:51:31 +00:00
Paul Walker	a88653a2cd	[LLVM][IR] When evaluating GEP offsets don't assume ConstantInt is a scalar. (#117162 )	2024-12-04 12:45:30 +00:00
Simon Pilgrim	140df02aa2	[SLP][X86] Update test coverage for #111126 I'd copied the test case from #118016 instead of the original #111126 test case	2024-12-04 12:28:55 +00:00
Nikita Popov	75af62839b	[ConstraintElim] Add tests for gep nuw (NFC)	2024-12-04 13:17:02 +01:00
John Brawn	ecbe4d1e36	[IR] Allow fast math flags on fptrunc and fpext (#115894 ) This consists of: * Make these instructions part of FPMathOperator. * Adjust bitcode/ir readers/writers to expect fast math flags on these instructions. * Make IRBuilder set the fast math flags on these instructions. * Update langref and release notes. * Update a bunch of tests. Some of these are due to InstCombineCasts incorrectly adding fast math flags to fptrunc, which will be fixed in a later patch.	2024-12-04 10:53:04 +00:00
Simon Pilgrim	2202f0e093	[SLP][X86] Add test coverage for #111126 This needs to be expanded to a wider range of tests but for now just focus on #111126	2024-12-04 10:03:43 +00:00
Antonio Frighetto	f68b0e3699	[AggressiveInstCombine] Use APInt and avoid truncation when folding loads A miscompilation issue has been addressed with improved handling. Fixes: https://github.com/llvm/llvm-project/issues/118467.	2024-12-04 10:20:14 +01:00
ronryvchin	ff281f7d37	[PGO] Add option to always instrumenting loop entries (#116789) This patch extends the PGO infrastructure with an option to prefer the instrumentation of loop entry blocks. This option is a generalization of `19fb5b467b`, and helps to cover cases where the loop exit is never executed. Event handling loops are one example where this can occur. Note that this change does NOT change the default behavior.	2024-12-04 07:56:46 +01:00
Owen Anderson	14a259f85b	GlobalOpt: Use the correct address space when creating a "*.init" global. (#118562 )	2024-12-04 14:01:16 +13:00
Florian Hahn	45ff28746f	[ConstraintSystem] Fix signed overflow in negate. Use AddOverflow for potentially overflowing addition to fix signed integer overflow. Compile-time impact is in the noise https://llvm-compile-time-tracker.com/compare.php?from=bfb26202e05ee2932b4368b5fca607df01e8247f&to=195b0707148b567c674235e59712458e7ce1bb0e&stat=instructions:u	2024-12-03 21:06:36 +00:00
Lee Wei	9bf6365237	[llvm] Remove `br i1 undef` from some regression tests [NFC] (#118419 ) This PR removes tests with `br i1 undef` under `llvm/tests/Transforms/ObjCARC, Reassociate, SCCP, SLPVectorizer...`. After this PR, I'll continue to fix tests under `llvm/tests/CodeGen`, which has more UB tests than `llvm/tests/Transforms`.	2024-12-03 20:54:36 +00:00
Igor Kirillov	af31aa4455	[LV] Pre-commit tests for fixed width VF fully unrolled loop cost model change	2024-12-03 16:47:52 +00:00
Dominik Steenken	866b9f43a0	[SystemZ] Add realistic cost estimates for vector reduction intrinsics (#118319) This PR adds more realistic cost estimates for these reduction intrinsics - `llvm.vector.reduce.umax` - `llvm.vector.reduce.umin` - `llvm.vector.reduce.smax` - `llvm.vector.reduce.smin` - `llvm.vector.reduce.fadd` - `llvm.vector.reduce.fmul` - `llvm.vector.reduce.fmax` - `llvm.vector.reduce.fmin` - `llvm.vector.reduce.fmaximum` - `llvm.vector.reduce.fminimum` - `llvm.vector.reduce.mul` The pre-existing cost estimates for `llvm.vector.reduce.add` are moved to `getArithmeticReductionCosts` to reduce complexity in `getVectorIntrinsicInstrCost` and enable other passes, like the SLP vectorizer, to benefit from these updated calculations. These are not expected to provide noticeable performance improvements and are rather provided for the sake of completeness and correctness. This PR is in draft mode pending benchmark confirmation of this. This also provides and/or updates cost tests for all of these intrinsics. This PR was co-authored by me and @JonPsson1.	2024-12-03 17:08:51 +01:00
Nikita Popov	10223c72a9	[ConstraintElim] Use nusw flag for GEP decomposition Check for nusw instead of inbounds when decomposing GEPs. In this particular case, we can also look through multiple nusw flags, because we will ultimately be working in the unsigned constraint system.	2024-12-03 15:56:29 +01:00
Nikita Popov	f33536468b	[InstCombine] Support gep nuw in icmp folds (#118472 ) Unsigned icmp of gep nuw folds to unsigned icmp of offsets. Unsigned icmp of gep nusw nuw folds to unsigned samesign icmp of offsets. Proofs: https://alive2.llvm.org/ce/z/VEwQY8	2024-12-03 14:28:56 +01:00
David Sherwood	8075445613	[LoopVectorize] Add tests for dereferenceable loads in more loops (#118470 ) * Adds tests for strided accesses. * Adds tests for reverse loops. As part of this I've moved one of the negative tests from load-deref-pred-align.ll into a new file (load-deref-pred-neg-off.ll) because the pointer type had a size of 16 bits and I realised it's probably not sensible for allocas that are >16 bits in size!	2024-12-03 12:41:30 +00:00
Nikita Popov	bdc6faf775	[InstCombine] Support nusw in icmp of two geps with same base Proof: https://alive2.llvm.org/ce/z/BYNQ7s	2024-12-03 11:51:14 +01:00
Nikita Popov	9c5a84b394	[InstCombine] Support nusw in icmp of gep with base Proof: https://alive2.llvm.org/ce/z/omnQXt	2024-12-03 11:51:14 +01:00
Ramkumar Ramachandra	bfb26202e0	LV/test: clean up a test and regen with UTC (#118394 )	2024-12-03 09:46:19 +00:00
Nikita Popov	5b0f4f2cb0	[BasicAA] Treat returns_twice functions as clobbering unescaped objects (#117902 ) Effectively this models all the accesses that occur between the first and second return as happening at the point of the call. Fixes https://github.com/llvm/llvm-project/issues/116668.	2024-12-03 09:55:12 +01:00
Antonio Frighetto	1d6ab189be	[MemCpyOpt] Drop dead `memmove` calls on `memset`'d source data When a memmove happens to clobber source data, and such data have been previously memset'd, the memmove may be redundant.	2024-12-03 09:50:57 +01:00
Antonio Frighetto	e30d304d72	[MemCpyOpt] Introduce test for PR101930 (NFC)	2024-12-03 09:50:56 +01:00
Yingwei Zheng	c1ad064dd3	[InstCombine] Fold `icmp spred (and X, highmask), C1` into `icmp spred X, C2` (#118197 ) Alive2: https://alive2.llvm.org/ce/z/Ffg64g Closes https://github.com/llvm/llvm-project/issues/104772.	2024-12-03 16:19:12 +08:00
Rajat Bajpai	de415fbb45	[InstCombine][FP] Fix nnan preservation for transform fcmp + sel => fmax/fmin (#117977 ) Preserve `nnan` constraint only if present on both `fcmp` and `select`. Alive2: https://alive2.llvm.org/ce/z/ZNDjzt	2024-12-03 14:01:36 +08:00
Yingwei Zheng	295d6b18f7	[InstCombine] Fold `(X * (Y << K)) u>> K -> X * Y` when highbits are not demanded (#111151 ) Alive2: https://alive2.llvm.org/ce/z/Z7QgjH	2024-12-03 12:04:04 +08:00
Han-Kuan Chen	f71ea4bc1b	[SLP][REVEC] reorderNodeWithReuses should not be called if all users of a TreeEntry are ShuffleVectorInst. (#118260 )	2024-12-03 09:04:04 +08:00
Matt Arsenault	681bd84563	AMDGPU: Add baseline test for lane index simplification (#117962 )	2024-12-02 14:50:02 -05:00
Florian Hahn	21d27b3aab	[LoopUnroll] Add tests for loop unrolling on Apple platforms. Add first set of tests where runtime unrolling can be highly beneficial on Apple Silicon CPUs.	2024-12-02 15:48:48 +00:00
Yingwei Zheng	16ec534989	[ValueTracking] Handle and/or of conditions in `computeKnownFPClassFromContext` (#118257 ) Fix a typo introduced by https://github.com/llvm/llvm-project/pull/83161. This patch also supports decomposition of and/or expressions in `computeKnownFPClassFromContext`. Compile-time improvement: http://llvm-compile-time-tracker.com/compare.php?from=688bb432c4b618de69a1d0e7807077a22f15762a&to=07493fc354b686f0aca79d6f817091a757bd7cd5&stat=instructions:u	2024-12-02 21:00:55 +08:00
Florian Hahn	f5bc6b47e8	[PhaseOrdering] Remove -enable-matrix flag from sub-xor.ll test. The test does not use matrix intrinsics, so does not need enable-matrix.	2024-12-02 11:43:43 +00:00
Nikita Popov	7a7a426188	[LVI] Fix insertelement of constexpr Bail out when evaluating an insertelement of a constant expression. Unlike other ValueLattice kinds, these don't have implicit splat semantics and we end up with type mismatches. If we actually wanted to handle these, we should actually evaluate the insertion via constant folding. I'm not bothering with that, as these should get constant folded on construction already.	2024-12-02 10:21:09 +01:00
Nikita Popov	8201926ec0	[InstSimplify] Generalize simplification of icmps with monotonic operands (#69471) InstSimplify currently folds patterns like `(x | y) uge x` and `(x & y) ule x` to true. However, it cannot handle combinations of such situations, such as `(x | y) uge (x & z)` etc. To support this, recursively collect operands of monotonic instructions (that preserve either a greater-or-equal or less-or-equal relationship) and then check whether any of them match. Fixes https://github.com/llvm/llvm-project/issues/69333.	2024-12-02 09:53:10 +01:00
Nikita Popov	7bbc049688	[InstCombine] Consolidate another fold into select value equivalence (#117746 ) We had a separate fold that handled just the trivial case where we're replacing exactly the argument of the select. Handle this in select value equivalence by relaxing the infinite loop protection to allow a replacement of a non-constant with a constant. This also fixes https://github.com/llvm/llvm-project/issues/113301, as the separate fold did not handle undef values correctly.	2024-12-02 09:45:39 +01:00
Veera	979a0356d4	[InstCombine] Fold `X Pred C2 ? X BOp C1 : C2 BOp C1` to `min/max(X, C2) BOp C1` (#116888 ) Fixes #82414. General Proof: https://alive2.llvm.org/ce/z/ERjNs4 Proof for Tests: https://alive2.llvm.org/ce/z/K-934G This PR transforms `select` instructions of the form `select (Cmp X C1) (BOp X C2) C3` to `BOp (min/max X C1) C2` iff `C3 == BOp C1 C2`. This helps in eliminating a noop loop in https://github.com/rust-lang/rust/issues/123845 but does not improve optimizations.	2024-12-02 09:33:45 +01:00
Yingwei Zheng	1a3eace82a	[InstCombine] Fold `umax(X, C) + -C` into `usub.sat(X, C)` (#118195 ) Alive2: https://alive2.llvm.org/ce/z/oSWe5S Closes https://github.com/llvm/llvm-project/issues/118155	2024-12-01 23:29:40 +08:00
Yingwei Zheng	f7ef0721d6	[SCEV] Do not allow refinement in the rewriting of BEValue (#117152 ) See the following case: ``` ; bin/opt -passes="print<scalar-evolution>" test.ll --disable-output define i32 @widget() { b: br label %b1 b1: ; preds = %b5, %b %phi = phi i32 [ 0, %b ], [ %udiv6, %b5 ] %phi2 = phi i32 [ 1, %b ], [ %add, %b5 ] %icmp = icmp eq i32 %phi, 0 br i1 %icmp, label %b3, label %b8 b3: ; preds = %b1 %udiv = udiv i32 10, %phi2 %urem = urem i32 %udiv, 10 %icmp4 = icmp eq i32 %urem, 0 br i1 %icmp4, label %b7, label %b5 b5: ; preds = %b3 %udiv6 = udiv i32 %phi2, 0 %add = add i32 %phi2, 1 br label %b1 b7: ; preds = %b3 ret i32 5 b8: ; preds = %b1 ret i32 7 } ``` ``` %phi2 = phi i32 [ 1, %b ], [ %add, %b5 ] --> {1,+,1}<nuw><nsw><%b1> %udiv6 = udiv i32 %phi2, 0 --> ({1,+,1}<nuw><nsw><%b1> /u 0) %phi = phi i32 [ 0, %b ], [ %udiv6, %b5 ] --> ({0,+,1}<nuw><nsw><%b1> /u 0) ``` `ScalarEvolution::createAddRecFromPHI` gives a wrong SCEV result for `%phi`: `d7d6fb1804/llvm/lib/Analysis/ScalarEvolution.cpp (L5926-L5950)` It converts `phi(0, ({1,+,1}<nuw><nsw><%b1> /u 0))` into `phi(0 / 0, ({1,+,1}<nuw><nsw><%b1> /u 0))`. Then it simplifies the expr into `{0,+,1}<nuw><nsw><%b1> /u 0`. As we did in `acd700a24b`, this patch disallows udiv simplification if we cannot prove that the denominator is a well-defined non-zero value. Fixes https://github.com/llvm/llvm-project/issues/117133.	2024-12-01 20:11:09 +08:00
Simon Pilgrim	94df95de6b	[TTI][X86] getShuffleCosts - for SK_PermuteTwoSrc, if the masks are known to be "inlane" no need to scale the costs by worst-case legalization (#117999 ) SK_PermuteTwoSrc legalization has to assume any of the legalised source registers could be referenced in split shuffles, but if we already know that each 128-bit lane only references elements from the same lane of the source operands, then this scaling won't occur. Hopefully this can help with #113356 without us having to get full processShuffleMasks canonicalization finished first.	2024-12-01 12:01:47 +00:00
Yingwei Zheng	6568ceb9fa	[CodeGenPrepare] Drop nsw flags in `optimizeLoadExt` (#118180 ) Alive2: https://alive2.llvm.org/ce/z/pMcD7q Closes https://github.com/llvm/llvm-project/issues/118172.	2024-12-01 11:25:31 +08:00
Jonas Paulsson	0ad6be1927	[SLPVectorizer, TargetTransformInfo, SystemZ] Improve SLP getGatherCost(). (#112491 ) As vector element loads are free on SystemZ, this patch improves the cost computation in getGatherCost() to reflect this. getScalarizationOverhead() gets an optional parameter which can hold the actual Values so that they in turn can be passed (by BasicTTIImpl) to getVectorInstrCost(). SystemZTTIImpl::getVectorInstrCost() will now recognize a LoadInst and typically return a 0 cost for it, with some exceptions.	2024-11-29 21:19:45 +01:00
Marina Taylor	8fb748b4a7	[Inliner] Don't count a call penalty for foldable __memcpy_chk and similar (#117876 ) When the size is an appropriate constant, __memcpy_chk will turn into a memcpy that gets folded away by InstCombine. Therefore this patch avoids counting these as calls for purposes of inlining costs. This is only really relevant on platforms whose headers redirect memcpy to __memcpy_chk (such as Darwin). On platforms that use intrinsics, memcpy and similar functions are already exempt from call penalties.	2024-11-29 18:28:39 +00:00
David Green	fe04290482	[AArch64] Change the default vscale-for-tuning to 1. (#117174 ) Most AArch64 cpus outside of Neoverse V1 (256) and A64FX (512) have an SVE vector length of 128, and in environments like Android (where no mcpu option is common) we would expect all cpus to match. This patch changes the default vector length to 128 with -mcpu=generic, to match the most common case.	2024-11-29 17:41:05 +00:00
Alexey Bataev	f4974e0931	[SLP] Add a check for poison value in AShrChecker Need to check if the value in AShrChecker is poison before casting it to an instruction, to avoid a compiler crash. Fixes #118030	2024-11-29 06:51:19 -08:00
David Green	6f4b4f41ca	[AArch64] Remove LoopVectorizer/AArch64/scatter-cost.ll test. NFC This test checks the costs, not vectorization, so is better placed in the existing gather/scatter cost modelling tests. An extra neoverse-v2 check line has been added for both gathers and scatters.	2024-11-29 14:38:36 +00:00
Ramkumar Ramachandra	4e8eabd93e	DSE: pre-commit tests for scalable vectors (#110669 ) As AliasAnalysis now has support for scalable sizes, add tests to DeadStoreElimination covering the scalable vectors case, in preparation to extend it.	2024-11-28 16:16:16 +00:00
Florian Hahn	12cefcc7ec	[Matrix] Skip already fused instructions before trying to fuse multiply. lowerDotProduct called above may already lower a matrix multiply and mark it as processed by adding it to FusedInsts. Don't try to process it again in LowerMatrixMultiplyFused; check whether it is already in FusedInsts first. Without this change, we trigger an assertion when trying to erase the same original matrix multiply twice.	2024-11-28 16:11:40 +00:00


30532 Commits
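
Two of the InstCombine folds listed above (1a3eace82a, `umax(X, C) + -C` into `usub.sat(X, C)`, and 295d6b18f7, `(X * (Y << K)) u>> K -> X * Y` when the high bits are not demanded) can be sanity-checked by brute force at a small bit width. The sketch below is only an illustration, not the proof used in the commits (those link to Alive2). It assumes 8-bit unsigned wraparound arithmetic, and every helper name in it is hypothetical rather than an LLVM API.

```python
# Exhaustive 8-bit check of two folds from the log above.
# Assumption: values are modelled as unsigned integers modulo 2**BITS.
BITS = 8
MASK = (1 << BITS) - 1


def usub_sat(a, b):
    """Unsigned saturating subtraction: clamps at zero instead of wrapping."""
    return max(a - b, 0)


def umax_plus_neg_c(x, c):
    """The original pattern: umax(x, c) + (-c), with wraparound."""
    return (max(x, c) + (-c & MASK)) & MASK


def check_umax_fold():
    """umax(X, C) + -C must equal usub.sat(X, C) for every X and C."""
    return all(
        umax_plus_neg_c(x, c) == usub_sat(x, c)
        for x in range(MASK + 1)
        for c in range(MASK + 1)
    )


def check_mul_shl_lshr_fold():
    """(X * (Y << K)) u>> K must agree with X * Y on the low BITS - K bits.

    The top K bits of the shifted result are zero, so the two sides only
    agree when those high bits are not demanded by any user.
    """
    for k in range(BITS):
        low_mask = MASK >> k  # the bits that are still demanded
        for x in range(MASK + 1):
            for y in range(MASK + 1):
                lhs = ((x * ((y << k) & MASK)) & MASK) >> k
                rhs = (x * y) & MASK
                if (lhs & low_mask) != (rhs & low_mask):
                    return False
    return True


if __name__ == "__main__":
    print("umax fold holds:", check_umax_fold())
    print("mul/shl/lshr fold holds on demanded bits:", check_mul_shl_lshr_fold())
```

An exhaustive check at one width does not establish the fold for all widths; it is a quick way to catch a mistaken rewrite before writing a formal proof.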