llvm-project

Author	SHA1	Message	Date
Yingwei Zheng	345d7b1618	[InstCombine] Fold minmax intrinsic using KnownBits information (#76242) This patch tries to fold minmax intrinsics by using `computeConstantRangeIncludingKnownBits`. Fixes a regression in [_karatsuba_rec:cpython/Modules/_decimal/libmpdec/mpdecimal.c](`c31943af16/Modules/_decimal/libmpdec/mpdecimal.c (L5460-L5462)`), which was introduced by #71396. See also https://github.com/dtcxzyw/llvm-opt-benchmark/issues/16#issuecomment-1865875756. Alive2 for splat vectors with undef: https://alive2.llvm.org/ce/z/J8hKWd	2023-12-23 04:41:32 +08:00
Simon Pilgrim	3736e1d1cd	[SCEV] Ensure shift amount is in range before calling getZExtValue() Fixes #76234	2023-12-22 14:16:54 +00:00
Nikita Popov	a134abf4be	[ValueTracking] Make isGuaranteedNotToBeUndef() more precise (#76160) Currently isGuaranteedNotToBeUndef() is the same as isGuaranteedNotToBeUndefOrPoison(). This function is used in places where we only care about undef (due to multi-use issues), not poison. Make it more precise by only considering instructions that can create undef (like loads or calls), and ignore those that can only create poison. In particular, we can ignore poison-generating flags. This means that inferring more flags has less chance to pessimize other transforms.	2023-12-21 16:49:37 +01:00
Nikita Popov	e414ba33b4	[ValueTracking] Shufflevector produces poison rather than undef Shufflevector semantics have changed so that poison mask elements return poison rather than undef. Reflect this in the canCreateUndefOrPoison() implementation.	2023-12-21 15:21:23 +01:00
Paschalis Mpeis	2e3d77d6ed	[TLI] Pass replace-with-veclib works with Scalable Vectors. (#73642) The pass is heavily refactored. It uses the Masked variant of a TLI method when the Intrinsic operates on Scalable Vectors. Improve tests for ArmPL and SLEEF Intrinsics: - Auto-generate test `armpl-intrinsics.ll`, and use active lane mask to have shorter `shufflevector` check lines. - The update scripts now add `@llvm.compiler.used` instead of using the regex: `@[[LLVM_COMPILER_USED:[a-zA-Z0-9_$"\\.-]+]]` - Add simplifycfg pass and noalias to ensure tail folding. The `noalias` attribute was added only to the `%in.ptr` parameter of the ArmPL Intrinsics.	2023-12-21 12:37:57 +00:00
Paschalis Mpeis	c4ff0a67d1	[TLI] Add getLibFunc that accepts an Opcode and scalar Type. (#75919) It sets a LibFunc similarly to the other two getLibFunc methods. Currently, it supports only the FRem Instruction. Add tests for FRem.	2023-12-21 11:02:54 +00:00
Nikita Popov	0df3200931	[ValueTracking] Fix KnownBits conflict for poison-only vector If all the demanded elements are poison, return unknown instead of conflict to avoid downstream assertions. Fixes https://github.com/llvm/llvm-project/issues/75505.	2023-12-21 09:23:47 +01:00
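A minimal sketch of why a poison-only vector can end up as a KnownBits conflict, assuming the per-lane results are intersected starting from an "everything known" state. `KnownBits8`, `knownBitsOfVector`, and the use of `std::nullopt` to model a poison lane are hypothetical stand-ins, not the ValueTracking implementation.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical 8-bit stand-in for a known-bits pair: a bit set in `zero`
// is known to be 0, a bit set in `one` is known to be 1.
struct KnownBits8 {
  std::uint8_t zero = 0;
  std::uint8_t one = 0;
  bool hasConflict() const { return (zero & one) != 0; }
  bool isUnknown() const { return zero == 0 && one == 0; }
};

// Intersect the known bits of all non-poison lanes.
// std::nullopt models a poison lane (an assumption of this sketch).
KnownBits8 knownBitsOfVector(
    const std::vector<std::optional<std::uint8_t>> &lanes) {
  KnownBits8 known;
  known.zero = 0xFF; // identity for intersection: every bit "known 0"...
  known.one = 0xFF;  // ...and "known 1", which is a conflict on its own
  bool sawValue = false;
  for (const auto &lane : lanes) {
    if (!lane)
      continue; // poison lane contributes nothing
    known.zero &= static_cast<std::uint8_t>(~*lane);
    known.one &= *lane;
    sawValue = true;
  }
  if (!sawValue)
    return KnownBits8{}; // all lanes poison: report unknown, not conflict
  assert(!known.hasConflict() && "a real lane always clears the conflict");
  return known;
}
```

Without the `sawValue` check, an all-poison input would return the initial state unchanged, which is exactly the conflicting value a downstream assertion would trip over.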
bipmis	64987c648f	[ValueTracking] isNonZero sub of ptr2int's with recursive GEP (#68680) When the sub arguments are ptr2int, computeKnownBits() cannot determine anything about them. In the scalar case, a sub of two ptr2int is generally converted to a sub of indexes. However, in a loop with a recursive GEP/PHI where the arguments to the sub are of type ptr2int, if it is possible to determine that the sub of this GEP and another pointer with the same base is KnownNonZero, we can return this. This helps subsequent passes to optimize the loop further.	2023-12-20 14:11:58 +00:00
Nikita Popov	0d3d445223	[LVI] Remove unnecessary TLI dependency Only used in ConstantFoldCompareInstOperands(), which does not actually use TLI.	2023-12-19 14:32:43 +01:00
Paschalis Mpeis	ddb6db4d09	[VFABI] Create FunctionType for vector functions (#75058) `createFunctionType` returns a FunctionType that may contain a mask, which is currently placed as the last parameter to the Function. The placement happens according to `VFParameters` of `VFInfo`, and it should be able to handle VFABI specification changes. Regarding the return type, it uses the scalar type of the input instruction, as the specification does not encode such information in the mangled name. If that ever happens, that information should be available from `VFInfo`.	2023-12-19 12:05:28 +00:00
Nikita Popov	6d91905f97	[ValueTracking] Short-circuit on unknown bits in isKnownNonEqual() (NFC) Don't bother computing known bits for the second operand if we know nothing about the first.	2023-12-18 15:36:38 +01:00
Paul Walker	dea16ebd26	[LLVM][IR] Replace ConstantInt's specialisation of getType() with getIntegerType(). (#75217) The specialisation will not be valid when ConstantInt gains native support for vector types. This is largely a mechanical change but with extra attention paid to constant folding, InstCombineVectorOps.cpp, LoopFlatten.cpp and Verifier.cpp to remove the need to call `getIntegerType()`. Co-authored-by: Nikita Popov <github@npopov.com>	2023-12-18 11:58:42 +00:00
Nikita Popov	337504683e	[ValueTracking] Use isKnownNonEqual() in isNonZeroSub() (x - y) != 0 is true iff x != y, so use the isKnownNonEqual() helper, which knows some additional tricks.	2023-12-18 12:26:40 +01:00
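The identity behind this change, `(x - y) != 0` iff `x != y`, also holds for wrapping integer arithmetic. A small sketch that checks it exhaustively for 8-bit values; `subIsNonZero` and `equivalenceHolds` are hypothetical names for illustration, not ValueTracking functions.

```cpp
#include <cassert>
#include <cstdint>

// True when the wrapping 8-bit difference x - y is non-zero.
bool subIsNonZero(std::uint8_t x, std::uint8_t y) {
  return static_cast<std::uint8_t>(x - y) != 0;
}

// Compare "x - y != 0" against "x != y" for every pair of 8-bit values.
bool equivalenceHolds() {
  for (unsigned x = 0; x < 256; ++x) {
    for (unsigned y = 0; y < 256; ++y) {
      const auto a = static_cast<std::uint8_t>(x);
      const auto b = static_cast<std::uint8_t>(y);
      if (subIsNonZero(a, b) != (a != b))
        return false;
    }
  }
  return true;
}
```

Because the two predicates agree on every input, any reasoning that proves `x != y` can be reused to prove the subtraction non-zero.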
David Sherwood	49b0e6dcc2	[LoopVectorize] Enable hoisting of runtime checks by default (#71538) With commit https://reviews.llvm.org/D152366 I introduced functionality that permitted the hoisting of runtime memory checks from a vectorised inner loop to the preheader of the next outer-most loop. This is useful for benchmarks like SPEC2017's x264 where the inner loop is vectorised and only has a small trip count. In such cases the runtime memory checks become expensive and since the checks never fail in the case of x264 it makes sense to do this. However, this behaviour was controlled by the flag -hoist-runtime-checks which was off by default. This patch enables this flag by default for all targets, since I believe this is a generally beneficial thing to do. I have tested this with SPEC2017 and I see 2.3% and 2.6% improvements with x264 on neoverse-v1 and neoverse-n1, respectively. Similarly, I saw slight improvements in the overall geomean on both machines. The only other notable changes were a 1% drop in the roms benchmark, which was compensated for by a 1% improvement in fotonik3d.	2023-12-18 09:41:54 +00:00
bipmis	6df6320374	[ValueTracking] isNonEqual Pointers with a recursive GEP (#70459) Handles canonical icmp eq(ptr1, ptr2) -> where ptr1/ptr2 is a recursive GEP. This can help in scenarios where InstCombineCompares folds icmp eq(sub(ptr2int, ptr2int), 0) -> icmp eq(ptr1, ptr2) and icmp eq(phi(sub(ptr2int, ptr2int), ...)) -> phi i1 (icmp eq(sub(ptr2int, ptr2int), 0), ....)	2023-12-15 10:02:57 +00:00
Yingwei Zheng	1fea712cd1	[ValueTracking] Infer `X u<= X +nuw Y` for any Y (#75524) Alive2: https://alive2.llvm.org/ce/z/kiGxCf Fixes #70374.	2023-12-15 16:33:39 +08:00
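A rough illustration of the inferred fact, assuming 8-bit values: whenever `X + Y` does not wrap in the unsigned sense (the situation `nuw` describes), `X u<= X + Y` holds for every `Y`. The helper names are hypothetical, and the linked Alive2 proof remains the authoritative check.

```cpp
#include <cassert>
#include <cstdint>

// Wrapping 8-bit add, i.e. what the add computes when nuw is absent.
std::uint8_t wrappingAdd(std::uint8_t x, std::uint8_t y) {
  return static_cast<std::uint8_t>(x + y);
}

// Check "x u<= x + y" for all 8-bit pairs whose sum does not wrap.
// Pairs that do wrap are skipped: with nuw the result would be poison,
// so the inference makes no claim about them.
bool foldHoldsForAllI8() {
  for (unsigned x = 0; x < 256; ++x) {
    for (unsigned y = 0; y < 256; ++y) {
      if (x + y > 0xFF)
        continue; // nuw violated
      if (!(x <= wrappingAdd(static_cast<std::uint8_t>(x),
                             static_cast<std::uint8_t>(y))))
        return false;
    }
  }
  return true;
}
```

Without the no-wrap condition the comparison can fail, for example `200 + 100` wraps to `44`, which is why the `nuw` flag is required.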
Nikita Popov	bf5d96c96c	[IR] Add dead_on_unwind attribute (#74289) Add the `dead_on_unwind` attribute, which states that the caller will not read from this argument if the call unwinds. This allows eliding stores that could otherwise be visible on the unwind path, for example: ``` declare void @may_unwind() define void @src(ptr noalias dead_on_unwind %out) { store i32 0, ptr %out call void @may_unwind() store i32 1, ptr %out ret void } define void @tgt(ptr noalias dead_on_unwind %out) { call void @may_unwind() store i32 1, ptr %out ret void } ``` The optimization is not valid without `dead_on_unwind`, because the `i32 0` value might be read if `@may_unwind` unwinds. This attribute is primarily intended to be used on sret arguments. In fact, I previously wanted to change the semantics of sret to include this "no read after unwind" property (see D116998), but based on the feedback there it is better to keep these attributes orthogonal (sret is an ABI attribute, dead_on_unwind is an optimization attribute). This is a reboot of that change with a separate attribute.	2023-12-14 09:58:14 +01:00
Paul Walker	930b5b52ff	[ConstantHoisting] Add a TTI hook to prevent hoisting. (#69004) Code generation can sometimes simplify expensive operations when an operand is constant. An example of this is divides on AArch64 where they can be rewritten using a cheaper sequence of multiplies and subtracts. Doing this is often better than hoisting expensive constants which are likely to be hoisted by MachineLICM anyway.	2023-12-13 17:20:36 +00:00
Bruno De Fraine	c030778979	[AST] Switch to MemoryLocation in add method (NFC) Pass MemoryLocation as one argument, instead of passing all its parts separately.	2023-12-13 12:15:09 +01:00
Nikita Popov	1f5ee80789	[LVI] Don't return optional from getEdgeValueLocal() (NFC) The general convention inside LVI is that std::nullopt means that a value has been pushed to the worklist. However, getEdgeValueLocal() used it as an additional spelling for getOverdefined() instead.	2023-12-12 15:32:26 +01:00
Nikita Popov	7de592b9f9	[LVI] Move bulk of getConstantRangeAtUse() implementation into Impl (NFC) Make the layering here similar to all the other methods: LazyValueInfoImpl implements the underlying API returning a ValueLatticeElement, and then LazyValueInfo exposes this as a ConstantRange publicly.	2023-12-12 15:27:00 +01:00
Nikita Popov	edcc7fe9aa	[LVI] Reuse LatticeValue to ConstantRange conversion more Use the helper in the getConstantRange() and getConstantRangeAtUse() APIs as well. For that purpose move the handling of isUnknown() into the helper as well.	2023-12-12 14:42:24 +01:00
Nikita Popov	bfebadc8c3	[LVI] Don't require DataLayout in getConstantRangeOrFull() (NFC) We're only working on integers here, so we don't need DataLayout to determine the width.	2023-12-12 14:29:08 +01:00
Nikita Popov	0c2b6a0225	[LVI] Drop bitcast handling (NFCI) The code only works on integer casts, and the only bitcasts involving integers are trivial. The code as previously written would try to handle things like float to integer bitcasts by fetching a ConstantRange of a float value, which is an ill-defined operation.	2023-12-12 13:09:51 +01:00
Nikita Popov	516e34d98a	[LVI] Switch getValueFromCondition() to use recursion The current implementation using a worklist and visited map adds a significant amount of additional complexity and compile-time overhead. All we really care about here is that we don't overflow the stack or cause exponential complexity in degenerate cases. We can achieve this with a simple depth limit.	2023-12-12 10:55:10 +01:00
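The trade-off described above, a plain recursive walk bounded by a depth limit in place of a worklist and visited map, can be sketched roughly as follows. `Cond`, `lowerBoundFrom`, and the `MaxDepth` value are hypothetical and only illustrate the shape of the approach; this is not LVI's code.

```cpp
#include <algorithm>
#include <cassert>
#include <memory>
#include <optional>
#include <utility>

// Hypothetical condition tree: a leaf states "x >= lower", an And node
// states that both children hold.
struct Cond {
  enum Kind { Leaf, And };
  Kind kind;
  int lower;
  std::shared_ptr<Cond> lhs, rhs;
};

std::shared_ptr<Cond> leaf(int lower) {
  return std::make_shared<Cond>(Cond{Cond::Leaf, lower, nullptr, nullptr});
}

std::shared_ptr<Cond> both(std::shared_ptr<Cond> l, std::shared_ptr<Cond> r) {
  return std::make_shared<Cond>(Cond{Cond::And, 0, std::move(l), std::move(r)});
}

// Wrap `c` in `n` And layers, each pairing it with the trivial leaf x >= 0.
std::shared_ptr<Cond> nest(std::shared_ptr<Cond> c, unsigned n) {
  for (unsigned i = 0; i < n; ++i)
    c = both(std::move(c), leaf(0));
  return c;
}

constexpr unsigned MaxDepth = 4; // assumed limit, chosen for illustration

// Best lower bound on x implied by `c`, or nullopt if nothing is known.
// Nodes deeper than MaxDepth are treated as unknown, which bounds both
// stack use and total work without any visited-set bookkeeping.
std::optional<int> lowerBoundFrom(const Cond &c, unsigned depth = 0) {
  if (depth > MaxDepth)
    return std::nullopt;
  if (c.kind == Cond::Leaf)
    return c.lower;
  assert(c.lhs && c.rhs && "And nodes need two children");
  std::optional<int> l = lowerBoundFrom(*c.lhs, depth + 1);
  std::optional<int> r = lowerBoundFrom(*c.rhs, depth + 1);
  if (l && r)
    return std::max(*l, *r); // both hold, so the tighter bound wins
  return l ? l : r;          // one side unknown: keep what the other gives
}
```

Giving up past the limit costs precision but not soundness: a deeply nested `x >= 100` is simply not seen, and the shallower bounds are still reported.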
David Sherwood	ceb02379a9	[LoopVectorize] Improve algorithm for hoisting runtime checks (#73515) When attempting to hoist runtime checks out of a loop we currently avoid creating pointer diff checks and prefer to do expanded range checks instead. This gives us the opportunity to hoist runtime checks out of a loop, since these checks are loop invariant. However, in some cases the pointer diff checks would also be loop invariant and so will naturally get hoisted. Therefore, since diff checks are cheaper, we should prefer to use those instead.	2023-12-12 09:10:39 +00:00
Nikita Popov	90d82412ea	[SCEV] Use loop guards when checking that RHS >= Start (#75039) Loop guards tend to provide better results when it comes to reasoning about ranges than isLoopEntryGuardedByCond(). See the test change for the motivating case. I have retained both the loop guard check and the implied cond based check for now, though the latter only seems to impact a single test and only via side effects (nowrap flag calculation) at that.	2023-12-12 09:41:54 +01:00
Kazu Hirata	586ecdf205	[llvm] Use StringRef::{starts,ends}_with (NFC) (#74956) This patch replaces uses of StringRef::{starts,ends}with with StringRef::{starts,ends}_with for consistency with std::{string,string_view}::{starts,ends}_with in C++20. I'm planning to deprecate and eventually remove StringRef::{starts,ends}with.	2023-12-11 21:01:36 -08:00
Nikita Popov	7686d49517	[ValueTracking] Handle returned attribute with mismatched type The returned attribute can be used when it is possible to "losslessly bitcast" between the argument and return type, including between two vector types. computeKnownBits() would crash in this case, isKnownNonZero() would potentially produce a miscompile. Fixes https://github.com/llvm/llvm-project/issues/74722.	2023-12-08 17:05:13 +01:00
Nikita Popov	52296e2527	[DomCondCache] Remove unused variable (NFC)	2023-12-08 10:25:04 +01:00
Nikita Popov	292256673c	[ValueTracking] Remove unused argument (NFC)	2023-12-08 10:25:04 +01:00
Kazu Hirata	6c87a0af95	[Analysis] Remove unnecessary includes (NFC)	2023-12-07 22:15:32 -08:00
Craig Topper	32ec5fbfed	[ValueTracking] Use BinaryOperator instead of Operator in matchSimpleRecurrence. (#74678) Operator allows the phi operand to be a ConstantExpr. A ConstantExpr is a valid operand to a phi, but is never going to be a recurrence. We can only match a BinaryOperator so use that instead.	2023-12-07 10:27:57 -08:00
Nikita Popov	6a1badfed2	[ValueTracking] Add missing check when computing known bits from pointer icmp I'm not sure whether it's possible to cause a miscompile due to the missing check right now, as the affected values mechanism effectively protects us against this. This becomes a problem for an upcoming patch though.	2023-12-07 14:10:59 +01:00
Nikita Popov	753c51bf88	[AST] Fix size merging for MustAlias sets (#73820) AST checks aliasing with MustAlias sets by only checking the representative pointer (getSomePointer). This is only correct if the Size and AATags information of that pointer also includes the Size/AATags of all other pointers in the set. When we add a new pointer to the AliasSet, we do perform this update (see the code in AliasSet::addPointer). However, if a pointer already in the MustAlias set is used with a new size, we currently do not update the representative pointer, resulting in miscompilations. Fix this by adding the missing update. This is a targeted fix using the current representation. There are a couple of alternatives: * For MustAlias sets, don't store per-pointer Size/AATags at all. This would make it clear that there is only one set of common Size/AATags for all pointers. * Check against all pointers in the set even for MustAlias. This is what https://github.com/llvm/llvm-project/pull/65731 proposes to do as part of a larger change to AST representation. Fixes https://github.com/llvm/llvm-project/issues/64897.	2023-12-07 10:45:48 +01:00
Paschalis Mpeis	7b83f69db4	[NFC] Replace CallInst with FunctionType in VFABI, VFShape API (#74569) Minor simplification applied to VFShape::getScalarShape, VFShape::get, and VFABI::tryDemangleForVFABI methods. Also, remove unnecessary `static_cast` in `SLPVectorizer.cpp`.	2023-12-06 17:14:58 +00:00
Teresa Johnson	88fbc4d3df	[ThinLTO] Add tail call flag to call edges in summary (#74043) This adds support for a HasTailCall flag on function call edges in the ThinLTO summary. It is intended for use in aiding discovery of missing frames from tail calls in profiled call stacks for MemProf of profiled binaries that did not disable tail call elimination. A follow-on change will add the use of this new flag during MemProf context disambiguation. The new flag is encoded in the bitcode along with either the hotness flag from the profile, or the relative block frequency under the -write-relbf-to-summary flag when there is no profile data. Because we now will always have some additional call edge information, I have removed the non-profile function summary record format, and we simply encode the tail call flag along with a hotness type of none when there is no profile information or relative block frequency. The change of record format and name caused most of the test case changes. I have added explicit testing of generation of the new tail call flag into the bitcode and IR assembly format as part of the changes to llvm/test/Bitcode/thinlto-function-summary-refgraph.ll. I have also added round trip testing through assembly and bitcode to llvm/test/Assembler/thinlto-summary.ll.	2023-12-06 08:41:44 -08:00
Nikita Popov	d77067d08a	[ValueTracking] Add dominating condition support in computeKnownBits() (#73662) This adds support for using dominating conditions in computeKnownBits() when called from InstCombine. The implementation uses a DomConditionCache, which stores which branches may provide information that is relevant for a given value. DomConditionCache is similar to AssumptionCache, but does not try to do any kind of automatic tracking. Relevant branches have to be explicitly registered and invalidated values explicitly removed. The necessary tracking is done inside InstCombine. The reason why this doesn't just do exactly the same thing as AssumptionCache is that a lot more transforms touch branches and branch conditions than assumptions. AssumptionCache is an immutable analysis and mostly gets away with this because only a handful of places have to register additional assumptions (mostly as a result of cloning). This is very much not the case for branches. This change regresses compile-time by about 0.2%. It also improves stage2-O0-g builds by about 0.2%, which indicates that this change results in additional optimizations inside clang itself. Fixes https://github.com/llvm/llvm-project/issues/74242.	2023-12-06 14:17:18 +01:00
Craig Topper	5c3496ff33	[InstCombine] Check isGuaranteedNotToBeUndef in haveNoCommonBitsSetSpecialCases. (#74390) It's not safe for InstCombine to add disjoint metadata when converting Add to Or otherwise. I've added noundef attribute to preserve existing test behavior.	2023-12-05 10:33:44 -08:00
Nikita Popov	ff0e4fb89a	[SCEV] Use or disjoint flag (#74467) Use the disjoint flag to convert or to add instead of calling the haveNoCommonBitsSet() ValueTracking query. This ensures that we can reliably undo add -> or canonicalization, even in cases where the necessary information has been lost or is too complex to reinfer in SCEV. I have updated the bulk of the test coverage to add the necessary disjoint flags in advance.	2023-12-05 17:01:46 +01:00
Alexandros Lamprineas	3ad6d1cbe5	[LAA] Fix incorrect dependency classification. (#70819) As shown in #70473, the following loop was not considered safe to vectorize: `void vectorizable_Read_Write(int *A) { for (unsigned i = 1022; i >= 0; i--) A[i+1] = A[i] + 1; }` When determining the memory access dependencies in a loop which has negative iteration step, we invert the source and sink of the dependence. Perhaps we should just invert the operands to getMinusSCEV(). This way the dependency is not regarded to be true, since the users of the `IsWrite` variables, which correspond to each of the memory accesses, rely on program order and therefore should not be swapped.	2023-12-05 15:27:30 +00:00
Mikhail Goncharov	0d0c229855	Revert "Reapply "ValueTracking: Identify implied fp classes by general fcmp (#66505)"" This reverts commit d55692d60d218f402ce107520daabed15f2d9ef6. See discussion in #66505: assertion fires in OSS build of TensorFlow.	2023-12-05 11:10:24 +01:00
Nikita Popov	383e35048e	[CaptureTracking] Treat vector GEPs as captures Because AA does not support vectors of pointers, we have to treat pointers that are inserted into a vector as captures. We mostly already do so, but missed the case where getelementptr is used to produce a vector.	2023-12-05 10:09:52 +01:00
Craig Topper	b73d79fda8	[RISCV] Fix typo in comment. NFC This should say "Assume that VL output is <= 65536".	2023-12-04 14:15:49 -08:00
Nikita Popov	4275da2278	[ValueTracking] Add isGuaranteedNotToBeUndef() variant (NFC) We have a bunch of places where we have to guard against undef to avoid multi-use issues, but would be fine with poison. Use a different function for these to make it clear, and to indicate that this check can be removed once we no longer support undef. I've replaced some of the obvious cases, but there's probably more. For now, the implementation is the same as UndefOrPoison, it just has a more precise name.	2023-12-04 12:04:41 +01:00
Yingwei Zheng	741975df92	[InstCombine][InstSimplify] Pass `SimplifyQuery` to `computeKnownBits` directly. NFC. (#74246) This patch passes `SimplifyQuery` to `computeKnownBits` directly in `InstSimplify` and `InstCombine`. As the `DomConditionCache` in #73662 is only used in `InstCombine`, it is inconvenient to introduce a new argument `DC` to `computeKnownBits`.	2023-12-04 02:26:39 +08:00
Mircea Trofin	bb6497ffa6	[BPI] Reuse the AsmWriter's BB naming scheme in BranchProbabilityPrinterPass (#73593) When using `BranchProbabilityPrinterPass`, if a BB has no name, we get pretty unusable information like `edge -> has probability...` (i.e. we have no idea what the vertices of that edge are). This patch uses `printAsOperand`, which uses the same naming scheme as `Function::dump`, so for example during debugging sessions, the IR obtained from a function and the names used by `BranchProbabilityPrinterPass` will match. A shortcoming is that `printAsOperand` will result in the numbering algorithm re-running for every edge and every vertex (when `BranchProbabilityPrinterPass` is run on a function). If, for the given scenario, this is a problem, we can revisit this subsequently. Another nuance is that the entry basic block will be numbered, which may be slightly confusing when it's anonymous, but it's easily identifiable: the first edge would have it as source (and the number should be easily recognizable).	2023-12-02 13:01:48 -08:00
Nikita Popov	da86d4a8c9	[ValueTracking] Reduce duplication in haveNoCommonBitsSet() (NFC) Extract a function and call it with both operand orders, so that we don't have to explicitly commute every single pattern.	2023-12-01 14:26:15 +01:00
Nikita Popov	460faa0c87	[InstSimplify] Check common operand with constant earlier If both icmps have the same operands and the RHS is constant, we would currently go into the isImpliedCondMatchingOperands() code path, instead of the isImpliedCondCommonOperandWithConstants() path. Both are correct, but the latter can produce more accurate results if the implication is dependent on the sign.	2023-12-01 12:18:59 +01:00
Nikita Popov	cd31cf5989	[InstSimplify] Fix or disjoint miscompile with op replacement Make sure %x does not get folded to "or disjoint %x, %x" without dropping the flag, as this would be a derefinement.	2023-12-01 11:45:09 +01:00

13000 Commits