13575 Commits

Author SHA1 Message Date
Piotr Fusik
cc7b24a4d1
[NFC] Fix typos in comments (#109765) 2024-09-24 11:19:56 +02:00
S. Bharadwaj Yadavalli
3734fa8c72
[DXIL] Consume Metadata Analysis information in passes (#108034)
- Changed `DXILTranslateMetadata::translateMetadata()` to consume DXIL
Metadata Analysis information. Subsumed into `DXILTranslateMetedata.cpp`
the functionality in `DXILMetadata.*` files - that are hence deleted.
- Changed `DXILPrepare` pass to consume DXIL Metadata Analysis
information.
- Renamed `ModuleMetadataInfo::ShaderStage` to
`ModuleMetadataInfo::ShaderProfile` to better convey what it represents.
- Updated `unknown` target shader stage specification in triples of a
couple of tests.
- Added new tests for additional verification of `DXILTranslateMetadata`
pass functionality.
2024-09-23 19:00:20 -04:00
Mircea Trofin
783bac7ffb
[ctx_prof] Handle select and its step instrumentation (#109185)
The `step` instrumentation shouldn't be treated, during use, like an `increment`. The latter is treated as a BB ID. The step isn't that, it's more of a type of value profiling. We need to distinguish between the 2 when really looking for BB IDs (==increments), and handle appropriately `step`s. In particular, we need to know when to elide them because `select`s may get elided by function cloning, if the condition of the select is statically known.
2024-09-23 15:21:25 -07:00
Benjamin Maxwell
50a1ab12ab
[LAA] Don't assume libcalls with output/input pointers can be vectorized (#108980)
LoopAccessAnalysis currently does not check/track aliasing from the
output pointers, but assumes vectorizing library calls with a mapping is
safe.

This can result in incorrect codegen if something like the following is
vectorized:

```
for(int i=0; i<N; i++) {
  // No aliasing between input and output pointers detected.
  sincos(cos_out[0], sin_out+i, cos_out+i);
}
```

Where for VF >= 2 `cos_out[1]` to `cos_out[VF-1]` is the cosine of the
original value of `cos_out[0]` not the updated value.
2024-09-23 16:05:55 +01:00
David Sherwood
02ee96eca9
[Analysis] Teach isDereferenceableAndAlignedInLoop about SCEV predicates (#106562)
Currently if a loop contains loads that we can prove at compile time
are dereferenceable when certain conditions are satisfied the function
isDereferenceableAndAlignedInLoop will still return false because
getSmallConstantMaxTripCount will return 0 when SCEV predicates
are required. This patch changes getSmallConstantMaxTripCount to take
an optional Predicates pointer argument so that we can permit
functions such as isDereferenceableAndAlignedInLoop to consider more
cases.
2024-09-23 09:56:37 +01:00
Nikita Popov
5a4c6f9799
[Loads] Check context instruction for context-sensitive derefability (#109277)
If a dereferenceability fact is provided through `!dereferenceable` (or
similar), it may only hold on the given control flow path. When we use
`isSafeToSpeculativelyExecute()` to check multiple instructions, we
might make use of `!dereferenceable` information that does not hold at
the speculation target. This doesn't happen when speculating
instructions one by one, because `!dereferenceable` will be dropped
while speculating.

Fix this by checking whether the instruction with `!dereferenceable`
dominates the context instruction. If this is not the case, it means we
are speculating, and cannot guarantee that it holds at the speculation
target.

Fixes https://github.com/llvm/llvm-project/issues/108854.
2024-09-23 09:13:09 +02:00
Jonathan Tanner
76bc1eddb2
[AA] Take account of C++23's stricter rules for forward declarations (NFC) (#109416)
C++23 has stricter rules for forward declarations around
std::unique_ptr, this means that the inline declaration of the
constructor was failing under clang in C++23 mode, switching to an
out-of-line definition of the constructor fixes this.

This was fairly major impact as it blocked inclusion of a lot of headers
under clang in C++23 mode.

Fixes #106597.
2024-09-20 22:56:40 +02:00
Youngsuk Kim
d31e314131 [llvm] Don't call raw_string_ostream::flush() (NFC)
Don't call raw_string_ostream::flush(), which is essentially a no-op.
As specified in the docs, raw_string_ostream is always unbuffered.
( 65b13610a5226b84889b923bae884ba395ad084d for further reference )
2024-09-20 12:19:59 -05:00
braw-lee
173841cc56
[TLI] Add basic support for fdim libcall (#108702)
first PR to fix #108695

Signed-off-by: Kushal Pal <kushalpal109@gmail.com>
2024-09-20 10:22:33 +04:00
Jay Foad
e03f427196
[LLVM] Use {} instead of std::nullopt to initialize empty ArrayRef (#109133)
It is almost always simpler to use {} instead of std::nullopt to
initialize an empty ArrayRef. This patch changes all occurrences I could
find in LLVM itself. In future the ArrayRef(std::nullopt_t) constructor
could be deprecated or removed.
2024-09-19 16:16:38 +01:00
Florian Hahn
4e6ec0bf6d
[SCEV] Replace redundant !Preds.empty() check with assert. (NFCI)
If there are no predicates, the predicated counts should not be
different to the non-predicated ones.
2024-09-19 13:53:30 +01:00
Nikita Popov
dd599e92a6
[ValueTracking] Support assume in entry block without DT (#109264)
isValidAssumeForContext() handles a couple of trivial cases even if no
dominator tree is available. This adds one more for the case where there
is an assume in the entry block, and a use in some other block. The
entry block always dominates all blocks.

As having context instruction but not having DT is fairly rare, there is
not much impact. Only test change is in assume-builder.ll, where less
redundant assumes are generated. I've found having this special case is
useful for an upcoming change though.
2024-09-19 14:24:55 +02:00
Nikita Popov
dc6876fc98
[ValueTracking] Use isSafeToSpeculativelyExecuteWithVariableReplaced() in more places (#109149)
This replaces some uses of isSafeToSpeculativelyExecute() with
isSafeToSpeculativelyExecuteWithVariableReplaced(), in cases where we
are guarding against operand changes rather plain speculation.

I believe that this is NFC with the current implementation of the
function (as it only does something different from loads), but this
makes us more defensive against future generalizations.
2024-09-19 09:38:20 +02:00
Mircea Trofin
ee5709b3b4
[nfc][ctx_prof] Don't try finding callsite annotation for un-instrumentable callsites (#109184)
Reinforcing properties ensured at instrumentation time.
2024-09-18 21:13:48 -07:00
Nikita Popov
a4586bd2d4 [Loads] Extract some checks into a lambda (NFC)
This makes it easier to add additional checks.
2024-09-18 15:04:36 +02:00
Farzon Lotfi
0f97b4824a
[Scalarizer][DirectX] Add support for scalarization of Target intrinsics (#108776)
Since we are using the Scalarizer pass in the backend we needed a way to
allow this pass to operate on Target intrinsics.
We achieved this by adding `TargetTransformInfo ` to the Scalarizer
pass. This allowed us to call a function available to the DirectX
backend to know if an intrinsic is a target intrinsic that should be
scalarized.
2024-09-17 11:35:42 -04:00
David Sherwood
270ee6549c
[Analysis][NFC] Clean-up in ScalarEvolution when copying predicates (#108851)
There are a few places in ScalarEvolution.cpp where we copy predicates
from one list to another and they have a similar pattern:

  for (const auto *P : ENT.Predicates)
    Predicates->push_back(P);

We can avoid the loop by writing them like this:

  Predicates->append(ENT.Predicates.begin(), ENT.Predicates.end());

which may end up being more efficient since we only have to try
reserving more space once.
2024-09-17 10:28:24 +01:00
Mircea Trofin
82266d3a2b
[nfc][ctx_prof] Factor the callsite instrumentation exclusion criteria (#108471)
Reusing this in the logic fetching the instrumentation in `CtxProfAnalysis`.
2024-09-13 21:25:47 -07:00
Kazu Hirata
ab06a18b59
[IRSim] Avoid repeated hash lookups (NFC) (#108483) 2024-09-13 07:53:23 -07:00
Yingwei Zheng
2ca75df1d1
[ValueTracking] Infer is-power-of-2 from dominating conditions (#107994)
Addresses downstream rustc issue:
https://github.com/rust-lang/rust/issues/129795
2024-09-13 08:54:29 +08:00
Yingwei Zheng
ffcff4af59
[ValueTracking] Infer is-power-of-2 from assumptions. (#107745)
This patch tries to infer is-power-of-2 from assumptions. I don't see
that this kind of assumption exists in my dataset.
Related issue: https://github.com/rust-lang/rust/issues/129795

Close https://github.com/llvm/llvm-project/issues/58996.
2024-09-10 10:38:21 +08:00
Kazu Hirata
51d3829d8f
[ThinLTO] Shrink FunctionSummary by 8 bytes (#107706)
During the ThinLTO indexing step for one of our large applications, we
create 4 million instances of FunctionSummary.

Changing:

  std::vector<EdgeTy> CallGraphEdgeList;

to:

  SmallVector<EdgeTy, 0> CallGraphEdgeList;

in FunctionSummary reduces the size of each instance by 8 bytes.  The
rest of the patch makes the same change to other places so that the
types stay compatible across function boundaries.
2024-09-07 11:21:20 -07:00
Mingming Liu
d4ddf06b0c
[NFCI]Remove EntryCount from FunctionSummary and clean up surrounding synthetic count passes. (#107471)
The primary motivation is to remove `EntryCount` from `FunctionSummary`.
This frees 8 bytes out of `sizeof(FunctionSummary)` (136 bytes as of
64498c5483).

While I'm at it, this PR clean up {SummaryBasedOptimizations,
SyntheticCountsPropagation} since they were not used and there are no
plans to further invest on them.

With this patch, bitcode writer writes a placeholder 0 at the byte
offset of `EntryCount` and bitcode reader can parse the function entry
count at the correct byte offset. Added a TODO to stop writing
`EntryCount` and bump bitcode version
2024-09-06 16:38:17 -07:00
Mircea Trofin
6cb2d40387
[ctx_prof] Handle case when no root is in this Module. (#107463)
If none of the functions in this `Module` are roots in the contextual profile, we can't use it and should just return the `{}` case.
2024-09-06 13:44:05 -07:00
Kazu Hirata
0ffa377c6b
[ThinLTO] Shrink GlobalValueSummary by 8 bytes (#107342)
During the ThinLTO indexing step for one of our large applications, we
create 7.5 million instances of GlobalValueSummary.

Changing:

  std::vector<ValueInfo> RefEdgeList;

to:

  SmallVector<ValueInfo, 0> RefEdgeList;

in GlobalValueSummary reduces the size of each instance by 8 bytes.
The rest of the patch makes the same change to other places so that
the types stay compatible across function boundaries.
2024-09-06 10:25:08 -07:00
Kazu Hirata
9528bcd532
[IRSim] Avoid repeated hash lookups (NFC) (#107510) 2024-09-06 07:39:54 -07:00
Ivan Kosarev
222d3b031f
[TBAA] Fix the case where a subobject gets accessed at a non-zero offset. (#101485) 2024-09-06 12:13:19 +01:00
Mircea Trofin
f32e5bdcef
[NFC] Rename the Nr abbreviation to Num (#107151)
It's more clear. (This isn't exhaustive).
2024-09-05 12:34:47 -07:00
Antonio Frighetto
e80f48986c [SCEV] BECount to zero if ((-C + (C smax %x)) /u %x), C > 0 holds
The SCEV expression `((-C + (C smax %x)) /u %x)` can be folded
to zero for any positive constant C.

Proof: https://alive2.llvm.org/ce/z/_dLm8C.
2024-09-05 17:01:56 +02:00
Nikita Popov
9707b98e57 [ConstantRange] Perform increment on APInt (NFC)
This handles the edge case where BitWidth is 1 and doing the
increment gets a value that's not valid in that width, while we
just want wrap-around.

Split out of https://github.com/llvm/llvm-project/pull/80309.
2024-09-05 16:11:00 +02:00
Philip Reames
3d9abfc9f8 Consolidate all IR logic for getting the identity value of a reduction [nfc]
This change merges the three different places (at the IR layer) for
finding the identity value of a reduction into a single copy.  This
depends on several prior commits which fix ommissions and bugs in
the distinct copies, but this patch itself should be fully
non-functional.

As the new comments and naming try to make clear, the identity value
is a property of the @llvm.vector.reduce.* intrinsic, not of e.g.
the recurrence descriptor.  (We still provide an interface for
clients using recurrence descriptors, but the implementation simply
translates to the intrinsic which each corresponds to.)

As a note, the getIntrinsicIdentity API does not support fminnum/fmaxnum
or fminimum/fmaximum which is why we still need manual logic (but at
least only one copy of manual logic) for those cases.
2024-09-04 08:23:21 -07:00
Ramkumar Ramachandra
f119151537
IVDescriptors: improve readability of a function (NFC) (#106219)
Avoid dereferencing operand to llvm::isa.
2024-09-04 14:09:04 +01:00
Nikita Popov
205f7ee737 [Lint] Skip null args when checking noalias
Do not emit a warning if there are two null noalias arguments,
as they cannot be dereferenced anyway.

This is a common pattern for `@.omp_outlined`, which has some
optional noalias arguments.
2024-09-04 15:02:50 +02:00
Nikita Popov
8d4235d97e [Lint] Fix another scalable vector crash
We also need to check that the memory access LocationSize is not
scalable.
2024-09-04 13:05:09 +02:00
Nikita Popov
360f82f370 [Lint] Fix crash for insert/extract on scalable vector
Don't assume the vector is fixed size. For scalable vectors, do
not report an error, as indices outside the minimum range may be
valid.
2024-09-04 12:48:46 +02:00
Nikita Popov
29c076b859 [Lint] Fix crash with scalable alloca 2024-09-04 12:11:59 +02:00
Mircea Trofin
3209766608
[ctx_prof] Add Inlining support (#106154)
Add an overload of `InlineFunction` that updates the contextual profile. If there is no contextual profile, this overload is equivalent to the non-contextual profile variant.

Post-inlining, the update mainly consists of:
- making the PGO instrumentation of the callee "the caller's": the owner function (the "name" parameter of the instrumentation instructions) becomes the caller, and new index values are allocated for each of the callee's indices (this happens for both increment and callsite instrumentation instructions)
- in the contextual profile:
   - each context corresponding to the caller has its counters updated to incorporate the counters inherited from the callee at the inlined callsite. Counter values are copied as-is because no scaling is required since the profile is contextual.
   - the contexts of the callee (at the inlined callsite) are moved to the caller.
   - the callee context at the inlined callsite is deleted.
2024-09-03 16:14:05 -07:00
Philip Reames
1fbb6b4efc
[LV] Prefer FLT_MIN/MAX for fmin/fmax reductions with ninf (#107141)
Analogous to 2c7786e94a1058bd4f96794a1d4f70dcb86e5cc5, cleanup a case
where the vectorizer is emitting a non-canonical identity value given
the available flags. We use largest/smallest value during ISEL, and VP
expansion, but not during vectorization.

Since the fmin/fmax/fminimum/fmaximum intrinsics don't require a start
value, this difference is only visible when masking of inactive lanes is
required.

Primary motivation of this change is simply to remove a difference
between version of code which reason about the identity value of a
reduction so I can kill all but one off.

In review, it was pointed out that this is actually a functional fix as well. 
The old code used inf on a noinf reduction instruction - whose
result is poison!  That wasn't the intent of the code.
2024-09-03 12:21:54 -07:00
Philip Reames
0b2f2537a5 [LV] Separate AnyOf recurrence from getRecurrenceIdentity [NFC]
These recurrence types don't have a meaningful identity, and the
routine was abused to return the start value instead.  Out of the
three callers to this routine, only one actually wants this
behavior.  This is a prep change for removing the routine entirely
and commoning it with other copies of the same logic.
2024-09-03 09:46:30 -07:00
Simon Pilgrim
6c8746b6e3
[Analysis] getIntrinsicForCallSite - add vectorization support for acos/asin/atan and cosh/sinh/tanh libcalls (#106844)
Followup to #106584 - ensure acos/asin/atan and cosh/sinh/tanh libcalls correctly map to the llvm intrinsic equivalents
2024-09-03 10:05:56 +01:00
David Sherwood
df3d70b5a7
[Analysis] Add getPredicatedExitCount to ScalarEvolution (#105649)
Due to a reviewer request on PR #88385 I have created this patch
to add a getPredicatedExitCount function, which is similar to
getExitCount except that it uses the predicated backedge taken
information. With PR #88385 we will start to care about more
loops with multiple exits, and want the ability to query exit
counts for a particular exiting block. Such loops may require
predicates in order to be vectorised.

New tests added here:

Analysis/ScalarEvolution/predicated-exit-count.ll
2024-09-02 14:05:26 +01:00
Nikita Popov
b9bba6ca9f
[BasicAA] Track nuw through decomposed expressions (#106512)
When we decompose the GEP offset expression, and the arithmetic is not
performed using nuw operations, we cannot retain the nuw flag on the
decomposed GEP.

For example, if we have `gep nuw p, (a-1)`, this is not at all the same
as `gep nuw (gep nuw p, a), -1`.

Fix this by tracking NUW through linear expression decomposition,
similarly to what we already do for the NSW flag.

This fixes the miscompilation reported in
https://github.com/llvm/llvm-project/pull/105496#issuecomment-2315322220.
2024-09-02 12:11:03 +02:00
Yingwei Zheng
a156b5a47d
[SLP] Add vectorization support for [u|s]cmp (#106747)
This patch adds vectorization support for [u|s]cmp intrinsic calls.
2024-09-02 17:06:07 +08:00
S. Bharadwaj Yadavalli
8aa8c0590c
[DXIL][Analysis] Collect Function properties in Metadata Analysis (#105728)
Basic infrastructure to collect Function properties in Metadata Analysis
- Add a `SmallVector` of entry properties to the metadata information.
- Add a structure to represent function properties. Currently
`numthreads` and shader kind properties of shader entry functions are
represented.
2024-08-31 17:56:06 -04:00
Artem Belevich
688a27496d
[PtrUseVisitor] Allow using Argument as a starting point (#106308)
Argument is another possible starting point for the pointer traversal,
and PtrUseVisitor should be able to handle it.
2024-08-30 11:47:34 -07:00
Philip Reames
68805de902 [IVDesc] Reuse getBinOpIdentity in getRecurrenceIdentity [nfc]
Avoid duplication so that we can easily tell these lists are in sync.
2024-08-30 09:10:34 -07:00
Simon Pilgrim
d58d105cda
[Analysis] isTriviallyVectorizable - add vectorization support for acos/asin/atan and cosh/sinh/tanh intrinsics (#106584)
Show fallback cases in amdlibm tests where it doesn't have that specific op
2024-08-30 16:49:23 +01:00
Alex MacLean
369d8148e0
[ValueTracking] use KnownBits to compute fpclass from bitcast (#97762)
When we encounter a bitcast from an integer type we can use the
information from `KnownBits` to glean some information about the
fpclass:
- If the sign bit is known, we can transfer this information over. 
- If the float is IEEE format and enough of the bits are known, we may
  be able to prove or rule out some fpclasses such as NaN, Zero, or Inf.
2024-08-30 07:34:49 -07:00
Mircea Trofin
1991aa6b48
Reapply "[nfc][mlgo] Incrementally update DominatorTreeAnalysis in FunctionPropertiesAnalysis (#104867) (#106309)
Reverts c992690179eb5de6efe47d5c8f3a23f2302723f2.

The problem is that if there is a sequence "{delete A->B} {delete A->B}
{insert A->B}" the net result is "{delete A->B}", which is not what we
want.

Duplicate successors may happen in cases like switch statements (as
shown in the unit test).

The second problem was that in `invoke` cases, some edges we speculate may get deleted don't, but are also not reachable from the inlined call site's basic block. We just need to check which edges are actually not present anymore.

The fix is to sanitize the list of deletes, just like we do for inserts.
2024-08-29 18:28:09 -07:00
Thomas Preud'homme
eed135fea7 Revert "[Analysis] Guard logf128 cst folding"
This reverts commit 42d3cccffd203ff6dc967d4243588ca466c0faf7 which
caused a test failure.
2024-08-29 17:58:10 +01:00