- Changed `DXILTranslateMetadata::translateMetadata()` to consume DXIL
Metadata Analysis information. Subsumed the functionality of the
`DXILMetadata.*` files into `DXILTranslateMetadata.cpp`; those files are
hence deleted.
- Changed `DXILPrepare` pass to consume DXIL Metadata Analysis
information.
- Renamed `ModuleMetadataInfo::ShaderStage` to
`ModuleMetadataInfo::ShaderProfile` to better convey what it represents.
- Updated the `unknown` target shader stage specification in the triples
of a couple of tests.
- Added new tests for additional verification of `DXILTranslateMetadata`
pass functionality.
The `step` instrumentation shouldn't be treated, during use, like an `increment`. The latter is treated as a BB ID; the step isn't that, it's more of a type of value profiling. We need to distinguish between the two when really looking for BB IDs (== increments), and handle `step`s appropriately. In particular, we need to know when to elide them, because `select`s may get elided by function cloning if the condition of the select is statically known.
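A minimal sketch of the distinction, assuming LLVM's instrumentation intrinsic classes (`InstrProfIncrementInstStep` derives from `InstrProfIncrementInst`); the elision policy in the comments restates the description above:
```
#include "llvm/IR/InstIterator.h"
#include "llvm/IR/IntrinsicInst.h"
using namespace llvm;

void classifyInstrumentation(Function &F) {
  for (Instruction &I : instructions(F)) {
    auto *Inc = dyn_cast<InstrProfIncrementInst>(&I);
    if (!Inc)
      continue;
    if (isa<InstrProfIncrementInstStep>(Inc)) {
      // A step: value-profiling-like, not a BB ID. It may need to be
      // elided if the `select` guarding it was folded away by cloning.
    } else {
      // A plain increment: this is a BB ID.
    }
  }
}
```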
LoopAccessAnalysis currently does not check/track aliasing from the
output pointers, but assumes vectorizing library calls with a mapping is
safe.
This can result in incorrect codegen if something like the following is
vectorized:
```
for (int i = 0; i < N; i++) {
  // No aliasing between input and output pointers detected.
  sincos(cos_out[0], sin_out + i, cos_out + i);
}
```
where, for VF >= 2, `cos_out[1]` to `cos_out[VF-1]` would be the cosine
of the original value of `cos_out[0]`, not the updated value.
Currently, if a loop contains loads that we can prove at compile time
are dereferenceable when certain conditions are satisfied, the function
isDereferenceableAndAlignedInLoop will still return false, because
getSmallConstantMaxTripCount returns 0 whenever SCEV predicates are
required. This patch changes getSmallConstantMaxTripCount to take an
optional Predicates pointer argument, so that functions such as
isDereferenceableAndAlignedInLoop can consider more cases.
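A hedged sketch of the changed query, with the parameter shape assumed from the description (`SE` and `L` stand for a ScalarEvolution instance and the loop in question):
```
// With a Predicates out-list, the returned max trip count may be valid
// only under the collected SCEV predicates (checked at runtime).
SmallVector<const SCEVPredicate *, 4> Predicates;
unsigned MaxTC = SE.getSmallConstantMaxTripCount(L, &Predicates);
// MaxTC == 0 still means "unknown"; otherwise the bound holds provided
// every entry in Predicates is satisfied.
```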
If a dereferenceability fact is provided through `!dereferenceable` (or
similar), it may only hold on the given control flow path. When we use
`isSafeToSpeculativelyExecute()` to check multiple instructions, we
might make use of `!dereferenceable` information that does not hold at
the speculation target. This doesn't happen when speculating
instructions one by one, because `!dereferenceable` will be dropped
while speculating.
Fix this by checking whether the instruction with `!dereferenceable`
dominates the context instruction. If this is not the case, it means we
are speculating, and cannot guarantee that it holds at the speculation
target.
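A minimal sketch of that check (helper name hypothetical):
```
#include "llvm/IR/Dominators.h"
#include "llvm/IR/Instruction.h"
#include "llvm/IR/LLVMContext.h"
using namespace llvm;

// Only trust !dereferenceable if the instruction carrying it dominates
// the context instruction; otherwise we would be speculating it.
static bool derefMetadataHoldsAt(const Instruction *I,
                                 const Instruction *CtxI,
                                 const DominatorTree &DT) {
  if (!I->hasMetadata(LLVMContext::MD_dereferenceable))
    return true; // no control-flow-dependent fact involved
  return CtxI && DT.dominates(I, CtxI);
}
```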
Fixes https://github.com/llvm/llvm-project/issues/108854.
C++23 has stricter rules for forward declarations around
std::unique_ptr. This means that the inline declaration of the
constructor was failing under clang in C++23 mode; switching to an
out-of-line definition of the constructor fixes this.
This had a fairly major impact, as it blocked inclusion of a lot of
headers under clang in C++23 mode.
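A minimal sketch of the pattern with hypothetical names; `Impl` is only forward-declared where `Widget` is defined, so the constructor and destructor move to where `Impl` is complete:
```
#include <memory>

// Header: Impl stays incomplete here.
struct Impl;
class Widget {
  std::unique_ptr<Impl> PImpl;
public:
  Widget();  // declared only; an inline definition breaks in C++23 mode
  ~Widget();
};

// Source file: Impl is complete here, so the definitions are valid.
struct Impl { int Value = 0; };
Widget::Widget() : PImpl(std::make_unique<Impl>()) {}
Widget::~Widget() = default;
```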
Fixes #106597.
Don't call raw_string_ostream::flush(), which is essentially a no-op.
As specified in the docs, raw_string_ostream is always unbuffered.
(See 65b13610a5226b84889b923bae884ba395ad084d for further reference.)
It is almost always simpler to use {} instead of std::nullopt to
initialize an empty ArrayRef. This patch changes all occurrences I could
find in LLVM itself. In future the ArrayRef(std::nullopt_t) constructor
could be deprecated or removed.
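Illustrative only; both spellings produce an empty ArrayRef:
```
#include "llvm/ADT/ArrayRef.h"
#include <optional>
using namespace llvm;

void emptyArrayRefs() {
  ArrayRef<int> OldStyle = std::nullopt; // via ArrayRef(std::nullopt_t)
  ArrayRef<int> NewStyle = {};           // preferred: simpler, same result
  (void)OldStyle;
  (void)NewStyle;
}
```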
isValidAssumeForContext() handles a couple of trivial cases even if no
dominator tree is available. This adds one more for the case where there
is an assume in the entry block, and a use in some other block. The
entry block always dominates all blocks.
As having a context instruction but not having a DT is fairly rare,
there is not much impact. The only test change is in assume-builder.ll,
where fewer redundant assumes are generated. I've found this special
case useful for an upcoming change, though.
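A sketch of the new trivial case, with the shape assumed from the description:
```
#include "llvm/IR/Function.h"
#include "llvm/IR/Instruction.h"
using namespace llvm;

// An assume in the entry block dominates any instruction in a different
// block, because the entry block dominates all blocks; no DT is needed.
static bool entryBlockAssumeDominates(const Instruction *Inv,
                                      const Instruction *CtxI) {
  const BasicBlock *InvBB = Inv->getParent();
  return InvBB == &Inv->getFunction()->getEntryBlock() &&
         InvBB != CtxI->getParent();
}
```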
This replaces some uses of isSafeToSpeculativelyExecute() with
isSafeToSpeculativelyExecuteWithVariableReplaced(), in cases where we
are guarding against operand changes rather than plain speculation.
I believe that this is NFC with the current implementation of the
function (as it only does something different for loads), but this
makes us more defensive against future generalizations.
Since we are using the Scalarizer pass in the backend, we needed a way
to allow this pass to operate on target intrinsics. We achieved this by
adding `TargetTransformInfo` to the Scalarizer pass. This allows us to
call a function available to the DirectX backend to determine whether an
intrinsic is a target intrinsic that should be scalarized.
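A hedged sketch of the query this enables (the TTI hook name is assumed from the description; the DirectX backend supplies its implementation):
```
#include "llvm/Analysis/TargetTransformInfo.h"
#include "llvm/Analysis/VectorUtils.h"
using namespace llvm;

static bool shouldScalarizeIntrinsic(Intrinsic::ID ID,
                                     const TargetTransformInfo &TTI) {
  // Generic intrinsics keep the existing VectorUtils-based answer.
  if (isTriviallyVectorizable(ID))
    return true;
  // Target intrinsics are deferred to the target (hook name assumed).
  return TTI.isTargetIntrinsicTriviallyScalarizable(ID);
}
```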
There are a few places in ScalarEvolution.cpp where we copy predicates
from one list to another and they have a similar pattern:
```
for (const auto *P : ENT.Predicates)
  Predicates->push_back(P);
```
We can avoid the loop by writing them like this:
```
Predicates->append(ENT.Predicates.begin(), ENT.Predicates.end());
```
which may end up being more efficient since we only have to try
reserving more space once.
During the ThinLTO indexing step for one of our large applications, we
create 4 million instances of FunctionSummary.
Changing:
```
std::vector<EdgeTy> CallGraphEdgeList;
```
to:
```
SmallVector<EdgeTy, 0> CallGraphEdgeList;
```
in FunctionSummary reduces the size of each instance by 8 bytes: on
64-bit hosts, `SmallVector<T, 0>` uses 32-bit size and capacity fields,
making it 16 bytes to `std::vector`'s 24. The rest of the patch makes
the same change to other places so that the types stay compatible
across function boundaries.
The primary motivation is to remove `EntryCount` from `FunctionSummary`.
This frees 8 bytes out of `sizeof(FunctionSummary)` (136 bytes as of
64498c5483).
While I'm at it, this PR cleans up {SummaryBasedOptimizations,
SyntheticCountsPropagation}, since they were not used and there are no
plans to invest in them further.
With this patch, the bitcode writer writes a placeholder 0 at the byte
offset of `EntryCount`, and the bitcode reader can parse the function
entry count at the correct byte offset. Added a TODO to stop writing
`EntryCount` and bump the bitcode version.
During the ThinLTO indexing step for one of our large applications, we
create 7.5 million instances of GlobalValueSummary.
Changing:
```
std::vector<ValueInfo> RefEdgeList;
```
to:
```
SmallVector<ValueInfo, 0> RefEdgeList;
```
in GlobalValueSummary reduces the size of each instance by 8 bytes.
The rest of the patch makes the same change to other places so that
the types stay compatible across function boundaries.
This handles the edge case where BitWidth is 1 and doing the increment
produces a value that is not valid at that width, while we just want
wrap-around.
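One plausible illustration, assuming APInt-style arithmetic (not necessarily the exact code touched):
```
#include "llvm/ADT/APInt.h"
#include <cassert>
using namespace llvm;

void wrapAtWidthOne() {
  APInt X(/*numBits=*/1, /*val=*/1); // 1-bit values: only 0 and 1 exist
  ++X;                               // must wrap to 0, not become "2"
  assert(X.isZero());
}
```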
Split out of https://github.com/llvm/llvm-project/pull/80309.
This change merges the three different places (at the IR layer) for
finding the identity value of a reduction into a single copy. This
depends on several prior commits which fix omissions and bugs in
the distinct copies, but this patch itself should be fully
non-functional.
As the new comments and naming try to make clear, the identity value
is a property of the @llvm.vector.reduce.* intrinsic, not of e.g.
the recurrence descriptor. (We still provide an interface for
clients using recurrence descriptors, but the implementation simply
translates to the intrinsic which each corresponds to.)
As a note, the getIntrinsicIdentity API does not support fminnum/fmaxnum
or fminimum/fmaximum, which is why we still need manual logic (but at
least only one copy of it) for those cases.
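An illustrative sketch of the idea (shape assumed, not the actual API): the identity is keyed off the reduce intrinsic itself:
```
#include "llvm/IR/Constants.h"
#include "llvm/IR/Intrinsics.h"
#include "llvm/Support/ErrorHandling.h"
using namespace llvm;

static Constant *reductionIdentity(Intrinsic::ID ID, Type *Ty) {
  switch (ID) {
  case Intrinsic::vector_reduce_add:
    return ConstantInt::get(Ty, 0);       // x + 0 == x
  case Intrinsic::vector_reduce_mul:
    return ConstantInt::get(Ty, 1);       // x * 1 == x
  case Intrinsic::vector_reduce_and:
    return Constant::getAllOnesValue(Ty); // x & ~0 == x
  case Intrinsic::vector_reduce_or:
  case Intrinsic::vector_reduce_xor:
    return Constant::getNullValue(Ty);    // x | 0 == x, x ^ 0 == x
  case Intrinsic::vector_reduce_fadd:
    return ConstantFP::get(Ty, -0.0);     // x + (-0.0) == x
  default:
    llvm_unreachable("no identity, or unhandled reduction");
  }
}
```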
Do not emit a warning if there are two null noalias arguments,
as they cannot be dereferenced anyway.
This is a common pattern for `@.omp_outlined`, which has some
optional noalias arguments.
Add an overload of `InlineFunction` that updates the contextual profile (see the sketch after the list below). If there is no contextual profile, this overload is equivalent to the non-contextual profile variant.
Post-inlining, the update mainly consists of:
- making the PGO instrumentation of the callee "the caller's": the owner function (the "name" parameter of the instrumentation instructions) becomes the caller, and new index values are allocated for each of the callee's indices (this happens for both increment and callsite instrumentation instructions)
- in the contextual profile:
- each context corresponding to the caller has its counters updated to incorporate the counters inherited from the callee at the inlined callsite. Counter values are copied as-is because no scaling is required since the profile is contextual.
- the contexts of the callee (at the inlined callsite) are moved to the caller.
- the callee context at the inlined callsite is deleted.
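A hedged usage sketch, with the overload's shape assumed from the description (`CB` is the call site being inlined, `CtxProf` the contextual profile):
```
InlineFunctionInfo IFI;
InlineResult IR = InlineFunction(CB, IFI, CtxProf,
                                 /*MergeAttributes=*/true);
if (IR.isSuccess()) {
  // CtxProf now reflects the flattened callee: its counters are merged
  // into the caller's contexts and its contexts moved under the caller.
}
```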
Analogous to 2c7786e94a1058bd4f96794a1d4f70dcb86e5cc5, clean up a case
where the vectorizer emits a non-canonical identity value given the
available flags. We use the largest/smallest value during ISel and VP
expansion, but not during vectorization.
Since the fmin/fmax/fminimum/fmaximum intrinsics don't require a start
value, this difference is only visible when masking of inactive lanes is
required.
The primary motivation of this change is simply to remove a difference
between versions of the code which reason about the identity value of a
reduction, so I can kill all but one off.
In review, it was pointed out that this is actually a functional fix as
well. The old code used inf for a noinf reduction instruction, whose
result is poison! That wasn't the intent of the code.
These recurrence types don't have a meaningful identity, and the
routine was abused to return the start value instead. Out of the
three callers to this routine, only one actually wants this
behavior. This is a prep change for removing the routine entirely
and commoning it with other copies of the same logic.
Due to a reviewer request on PR #88385 I have created this patch
to add a getPredicatedExitCount function, which is similar to
getExitCount except that it uses the predicated backedge taken
information. With PR #88385 we will start to care about more
loops with multiple exits, and want the ability to query exit
counts for a particular exiting block. Such loops may require
predicates in order to be vectorised.
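A hedged sketch of the new query, by analogy with getExitCount() (the exact parameter shape is assumed; `SE`, `L`, and `ExitingBB` are placeholders):
```
SmallVector<const SCEVPredicate *, 4> Predicates;
const SCEV *Count = SE.getPredicatedExitCount(L, ExitingBB, &Predicates);
if (!isa<SCEVCouldNotCompute>(Count)) {
  // Count is valid only if all Predicates are checked at runtime.
}
```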
New tests added here:
Analysis/ScalarEvolution/predicated-exit-count.ll
When we decompose the GEP offset expression, and the arithmetic is not
performed using nuw operations, we cannot retain the nuw flag on the
decomposed GEP.
For example, if we have `gep nuw p, (a-1)`, this is not at all the same
as `gep nuw (gep nuw p, a), -1`.
Fix this by tracking NUW through linear expression decomposition,
similarly to what we already do for the NSW flag.
This fixes the miscompilation reported in
https://github.com/llvm/llvm-project/pull/105496#issuecomment-2315322220.
Basic infrastructure to collect Function properties in Metadata Analysis
- Add a `SmallVector` of entry properties to the metadata information.
- Add a structure to represent function properties (sketched below).
Currently `numthreads` and shader kind properties of shader entry
functions are represented.
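A sketch of the per-entry record, with field names assumed from the description:
```
#include "llvm/ADT/SmallVector.h"
#include "llvm/IR/Function.h"
#include "llvm/TargetParser/Triple.h"
using namespace llvm;

struct EntryProperties {
  const Function *Entry = nullptr;
  // numthreads(X, Y, Z) of a compute shader entry
  unsigned NumThreadsX = 0;
  unsigned NumThreadsY = 0;
  unsigned NumThreadsZ = 0;
  // Shader kind of the entry function (compute, pixel, ...)
  Triple::EnvironmentType ShaderStage = Triple::UnknownEnvironment;
};

// Held by the metadata information, one record per entry function.
SmallVector<EntryProperties> EntryPropertyVec;
```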
When we encounter a bitcast from an integer type we can use the
information from `KnownBits` to glean some information about the
fpclass:
- If the sign bit is known, we can transfer this information over.
- If the float is IEEE format and enough of the bits are known, we may
be able to prove or rule out some fpclasses such as NaN, Zero, or Inf.
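An illustrative sketch of the sign-bit transfer (function name hypothetical):
```
#include "llvm/Analysis/ValueTracking.h"
#include "llvm/Support/KnownBits.h"
using namespace llvm;

// A bitcast preserves the bit pattern, so a known integer sign bit
// directly determines the float's sign bit.
static void transferSignBit(const KnownBits &Known, KnownFPClass &Result) {
  if (Known.isNegative())
    Result.SignBit = true;   // top bit known set -> negative float
  else if (Known.isNonNegative())
    Result.SignBit = false;  // top bit known clear -> non-negative
}
```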
Reverts c992690179eb5de6efe47d5c8f3a23f2302723f2.
The problem is that if there is a sequence "{delete A->B} {delete A->B}
{insert A->B}" the net result is "{delete A->B}", which is not what we
want.
Duplicate successors may happen in cases like switch statements (as
shown in the unit test).
The second problem was that in `invoke` cases, some edges we speculated would get deleted are not actually deleted, but are also not reachable from the inlined call site's basic block. We just need to check which edges are actually no longer present.
The fix is to sanitize the list of deletes, just like we do for inserts.
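A sketch of that sanitization, with names assumed: keep a speculated {delete A->B} only if the edge really is gone from the CFG now:
```
#include "llvm/ADT/STLExtras.h"
#include "llvm/IR/CFG.h"
#include "llvm/IR/Dominators.h"
using namespace llvm;

void sanitizeDeletes(SmallVectorImpl<DominatorTree::UpdateType> &Deletes) {
  // Drop any speculated deletion whose edge still exists.
  erase_if(Deletes, [](const DominatorTree::UpdateType &U) {
    return is_contained(successors(U.getFrom()), U.getTo());
  });
}
```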