Split out from #150248:
Since #150944 the size passed to lifetime.start/end is considered
meaningless. The lifetime always applies to the whole alloca.
This adjusts MemoryLocation to determine the MemoryLocation size from
the alloca size, instead of using the argument.
Split out from #150248:
Since #150944 the size passed to lifetime.start/end is considered
meaningless. The lifetime always applies to the whole alloca.
Accordingly remove handling for size mismatch in the StackLifetime
analysis.
There is a larger problem here in that we should not be performing
arbitrary pointer replacements for assumes. This is handled for
branches, but assume goes through a different code path.
Fixes https://github.com/llvm/llvm-project/issues/151785.
This slightly relaxes the invariant established in #149310, by also
allowing the lifetime argument to be poison. This is to support the
typical pattern of RAUWing with poison when removing an instruction.
It's worth noting that this does not require any conservative
assumptions: lifetimes with poison arguments can simply be skipped.
Fixes https://github.com/llvm/llvm-project/issues/151119.
The `nvvm_round` intrinsic should round to the nearest even number in
the case of ties. It lowers to PTX `cvt.rni`, which will "round to
nearest integer, choosing even integer if source is equidistant between
two integers", so it matches the semantics of `rint` (and not `round` as
the name suggests).
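
To make the difference between the two tie-breaking rules concrete, here
is a small Python sketch. It is not the constant-folding code; the
helpers `round_half_even` and `round_half_away` are hypothetical names
that model `rint`-style and `round`-style behavior for finite inputs.

```python
import math

def round_half_even(x: float) -> float:
    """Model of rint / PTX cvt.rni: ties go to the even integer."""
    floor = math.floor(x)
    diff = x - floor
    if diff < 0.5:
        result = floor
    elif diff > 0.5:
        result = floor + 1
    else:
        # Exact tie: pick whichever neighbour is even.
        result = floor if floor % 2 == 0 else floor + 1
    # Keep the sign of the input so that e.g. -0.5 rounds to -0.0.
    return math.copysign(float(result), x)

def round_half_away(x: float) -> float:
    """Model of C round(): ties go away from zero.

    Adequate for the small demo values used here; not a robust
    implementation for inputs just below a tie.
    """
    return math.copysign(float(math.floor(abs(x) + 0.5)), x)

for value in (0.5, 1.5, 2.5, -2.5):
    print(value, round_half_even(value), round_half_away(value))
```

The two helpers only disagree on exact ties such as 2.5, which is the
case the intrinsic name gets wrong.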
This patch implements the `llvm.loop.estimated_trip_count` metadata
discussed in [[RFC] Fix Loop Transformations to Preserve Block
Frequencies](https://discourse.llvm.org/t/rfc-fix-loop-transformations-to-preserve-block-frequencies/85785).
As [suggested in the RFC
comments](https://discourse.llvm.org/t/rfc-fix-loop-transformations-to-preserve-block-frequencies/85785/4),
it adds the new metadata to all loops at the time of profile ingestion
and estimates each trip count from the loop's `branch_weights` metadata.
As [suggested in the PR #128785
review](https://github.com/llvm/llvm-project/pull/128785#discussion_r2151091036),
it does so via a new `PGOEstimateTripCountsPass` pass, which creates the
new metadata for each loop but omits the value if it cannot estimate a
trip count due to the loop's form.
An important observation not previously discussed is that
`PGOEstimateTripCountsPass` *often* cannot estimate a loop's trip count,
but later passes can sometimes transform the loop in a way that makes it
possible. Currently, such passes do not necessarily update the metadata,
but eventually that should be fixed. Until then, if the new metadata has
no value, `llvm::getLoopEstimatedTripCount` disregards it and tries
again to estimate the trip count from the loop's current
`branch_weights` metadata.
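
The fallback described above can be sketched in Python as follows. This
is only an illustration: the function names are hypothetical, `None`
stands in for "metadata present but without a value", and the formula
that turns branch weights into a trip count (backedge weight divided by
exit weight, rounded to nearest, plus one) is an assumption made for the
sketch rather than something stated in this description.

```python
from typing import Optional

def estimate_from_branch_weights(backedge_weight: int,
                                 exit_weight: int) -> Optional[int]:
    """Assumed estimate: backedges taken per loop exit, plus one."""
    if exit_weight == 0:
        return None  # no exits recorded, so no estimate is possible
    # Round-to-nearest integer division, then add the exiting iteration.
    return (backedge_weight + exit_weight // 2) // exit_weight + 1

def get_loop_estimated_trip_count(metadata_value: Optional[int],
                                  backedge_weight: int,
                                  exit_weight: int) -> Optional[int]:
    """Prefer the stored estimate; otherwise retry from branch weights."""
    if metadata_value is not None:
        return metadata_value
    return estimate_from_branch_weights(backedge_weight, exit_weight)

# Metadata has a value: use it as-is.
print(get_loop_estimated_trip_count(10, 99, 1))
# Metadata has no value: fall back to the current branch weights.
print(get_loop_estimated_trip_count(None, 99, 1))
```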
We extract the binding logic out of the DXILResource analysis passes into the
FrontendHLSL library. This will allow us to use this logic for resource and
root signature bindings in both the DirectX backend and the HLSL frontend.
Using GEP to index into a vector is not disallowed, but it is not recommended.
The SPIR-V backend needs to generate structured access into types, which
is impossible with an untyped GEP instruction unless we add more info to
the IR. Finding a solution is a work-in-progress, but in the meantime,
we'd like to reduce the number of failures.
Preventing this optimization from rewriting extract/insert
instructions into a GEP helps us lower more code to SPIR-V. This change
should be OK as it's only active when targeting SPIR-V and disables a
non-recommended transformation.
Related to #145002
Try to push the constant operand into a ZExt:
A + zext (-A + B) -> zext (B), if trunc (A) + -A + B does not
unsigned-wrap.
The actual code supports ZExts with an arbitrary number of arguments,
hence the getAddExpr in the return.
This helps SCEV reasoning in some cases, commonly when adding an offset
to a zero-extended SCEV that subtracts the same offset.
Note that this is restricted to cases where we can fold away an operand
of the inner Add. This is needed to avoid bad interactions with patterns
when forming ZExts, which try to push the ZExt to add operands.
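
As a sanity check alongside the Alive2 proof, the fold can be verified
exhaustively for small bit widths. The Python sketch below uses a 4-bit
narrow type and an 8-bit wide type, and it assumes that A fits in the
narrow type (so that A equals zext(trunc(A))); that assumption and all
names are specific to this sketch.

```python
NARROW_BITS = 4
WIDE_BITS = 8
NARROW_MASK = (1 << NARROW_BITS) - 1
WIDE_MASK = (1 << WIDE_BITS) - 1

def check_zext_fold() -> int:
    """Check A + zext(-A + B) == zext(B) whenever the guard holds."""
    checked = 0
    for a in range(1 << NARROW_BITS):      # assumption: A fits in narrow type
        for b in range(1 << NARROW_BITS):
            inner = (-a + b) & NARROW_MASK  # -A + B, computed in narrow type
            if a + inner > NARROW_MASK:
                # trunc(A) + (-A + B) would unsigned-wrap: fold not applied.
                continue
            lhs = (a + inner) & WIDE_MASK   # A + zext(-A + B) in wide type
            rhs = b                         # zext(B) keeps the value of B
            assert lhs == rhs, (a, b)
            checked += 1
    return checked

print("pairs checked:", check_zext_fold())
```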
https://alive2.llvm.org/ce/z/q7d303
PR: https://github.com/llvm/llvm-project/pull/151227
This is a follow-up to
https://github.com/llvm/llvm-project/pull/115407, which introduced code
which bypasses the splat handling for scalable vectors. To maintain
existing tests I have moved the early return until after the splat
handling so all vector types are treated equally.
This PR adds a new interface to IRBuilder called CreateVectorInterleave,
which can be used to create vector.interleave intrinsics of factors 2-8.
For convenience I have also moved getInterleaveIntrinsicID and
getDeinterleaveIntrinsicID from VectorUtils.cpp to Intrinsics.cpp where
they can be used by IRBuilder.
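
For readers unfamiliar with the intrinsics, the following Python sketch
models what an interleave of N input vectors produces. It illustrates
the element ordering only; `vector_interleave` is a hypothetical helper
and does not reflect the signature of CreateVectorInterleave.

```python
from typing import List, Sequence

def vector_interleave(*vectors: Sequence[int]) -> List[int]:
    """Interleave N equally sized vectors lane by lane."""
    factor = len(vectors)
    if not 2 <= factor <= 8:
        raise ValueError("only interleave factors 2-8 are modelled")
    if len({len(v) for v in vectors}) != 1:
        raise ValueError("all input vectors must have the same length")
    # Take lane 0 of every input, then lane 1 of every input, and so on.
    return [elem for lane in zip(*vectors) for elem in lane]

print(vector_interleave([0, 1, 2], [10, 11, 12]))
```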
As specified in #53942, DA assumes base pointer invariance in its
process. Some cases were fixed by #116628. However, that PR only
addressed the parts related to AliasAnalysis, so the original issue
persists in later stages, especially when the AliasAnalysis results in
`MustAlias`.
This patch inserts an explicit loop-invariance check for the base pointer
and skips analysis when it is not loop-invariant.
Fix the cases added in #148240.
During compilation of large files with many branches, I observed that
the function `SortNonLocalDepInfoCache` in `MemoryDependenceAnalysis`
becomes a significant performance bottleneck. This is because
`Cache.size()` can be very large (around 20,000), but only a small
number of entries (approximately 5 to 8) actually need sorting. The
original implementation performs a full sort in all cases, which is
inefficient.
This patch introduces a lightweight heuristic to quickly estimate the
number of unsorted entries and choose a more efficient sorting method
accordingly.
As a result, the GVN pass runtime on a large file is reduced from
approximately 26.3 minutes to 16.5 minutes.
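
The general idea of choosing a sorting strategy based on how few entries
are out of place can be sketched in Python as follows. This is not the
code from the patch: the function name, the `small_tail_limit`
threshold, and the assumption that the caller already knows how long the
sorted prefix is are all made up for illustration.

```python
import bisect
from typing import List

def sort_mostly_sorted(cache: List[int], num_sorted: int,
                       small_tail_limit: int = 8) -> None:
    """Sort `cache` in place, given that cache[:num_sorted] is sorted."""
    tail = cache[num_sorted:]
    if len(tail) > small_tail_limit:
        # Many unsorted entries: a full sort is the simplest choice.
        cache.sort()
        return
    # Few unsorted entries: drop them, then place each one with a
    # binary search instead of re-sorting the whole cache.
    del cache[num_sorted:]
    for entry in tail:
        bisect.insort(cache, entry)

cache = list(range(0, 40, 2)) + [7, 3, 41]
sort_mostly_sorted(cache, num_sorted=20)
print(cache == sorted(cache))
```

With around 20,000 sorted entries and a handful of new ones, the
binary-search path does a few logarithmic searches plus element shifts
rather than a full comparison sort.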
[ICP] Add a few tunings to indirect-call-promotion
Indirect-call promotion (ICP) has been adjusted with the following
tunings:
(1) Candidate functions can now be ICP'd even if only a declaration is
present.
(2) All non-cold candidate functions are now considered by ICP.
Previously, only hot targets were considered.
(3) If one target cannot be ICP'd, proceed with the remaining targets
instead of exiting the callsite.
This update hides all tunings under internal options and disables them
by default. They'll be enabled in a later update. There'll also be
another update to address the "not found" issue with indirect targets.
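
A rough Python model of how the three tunings change target selection
is shown below. Everything in it is hypothetical: the `Candidate`
fields, the thresholds, the keyword flags (which stand in for the
internal options and default to off), and the assumption that
candidates arrive sorted by decreasing count.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Candidate:
    name: str
    count: int            # profiled call count for this target
    has_definition: bool  # False if only a declaration is visible

def select_icp_targets(candidates: List[Candidate],
                       hot_threshold: int, cold_threshold: int, *,
                       allow_declarations: bool = False,
                       consider_non_cold: bool = False,
                       keep_going: bool = False) -> List[str]:
    selected = []
    # Tuning (2): accept every non-cold target instead of only hot ones.
    min_count = cold_threshold if consider_non_cold else hot_threshold
    for cand in candidates:  # assumed sorted by decreasing count
        if cand.count < min_count:
            break  # the remaining targets are colder still
        # Tuning (1): a declaration alone may be enough.
        if not cand.has_definition and not allow_declarations:
            if keep_going:
                continue  # tuning (3): try the remaining targets
            break         # old behavior: give up on this callsite
        selected.append(cand.name)
    return selected

targets = [Candidate("a", 100, True), Candidate("b", 50, False),
           Candidate("c", 20, True), Candidate("d", 1, True)]
print(select_icp_targets(targets, 60, 5))
print(select_icp_targets(targets, 60, 5, allow_declarations=True,
                         consider_non_cold=True, keep_going=True))
```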
FMV priority is the returned value of a polymorphic function. On RISC-V
and X86 targets a 32-bit value is enough. On AArch64 we currently need
64 bits and we will soon exceed that. APInt seems to be a suitable
replacement for uint64_t, presumably with minimal compile time overhead.
It allows bit manipulation, comparison and variable bit width.
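
The Python sketch below illustrates why a fixed 64-bit integer stops
being sufficient. It assumes, purely for illustration, that a priority
is built by setting one bit per feature; the bit positions and the
helper name `fmv_priority` are invented. Python integers are
arbitrary-width, so they play the role APInt would play here.

```python
from typing import Iterable

UINT64_MASK = (1 << 64) - 1

def fmv_priority(feature_bits: Iterable[int]) -> int:
    """Combine per-feature bit positions into one priority value."""
    priority = 0
    for bit in feature_bits:
        priority |= 1 << bit
    return priority

low_features = fmv_priority(range(64))  # every bit a uint64_t can hold
new_feature = fmv_priority([64])        # first bit that no longer fits

# With variable bit width the comparison behaves as intended.
print(new_feature > low_features)
# Truncated to 64 bits, the new feature's priority collapses to zero.
print(new_feature & UINT64_MASK)
```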
Relax the NUW requirements for isKnownPredicateViaNoOverflow, if the
second operand (Y) is an ADD. The code only simplifies the condition if
C1 < C2, so if the second ADD is NUW, it doesn't matter whether the
first operand also has the NUW flag, as it cannot wrap if C1 < C2.
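
The argument can also be checked by brute force for a small bit width.
The Python sketch below assumes the predicate being proven is the
unsigned comparison X + C1 < X + C2 and uses 4-bit values; the function
name is hypothetical. Only the second add is required not to wrap, and
the first add is deliberately computed with wrapping arithmetic.

```python
BITS = 4
MOD = 1 << BITS

def check_relaxed_nuw() -> int:
    """Check (X + C1) <u (X + C2) given C1 < C2 and NUW on X + C2 only."""
    checked = 0
    for x in range(MOD):
        for c1 in range(MOD):
            for c2 in range(c1 + 1, MOD):  # C1 < C2
                if x + c2 >= MOD:
                    continue               # second add wraps: not NUW
                lhs = (x + c1) % MOD       # no NUW flag assumed here
                rhs = x + c2
                assert lhs < rhs, (x, c1, c2)
                checked += 1
    return checked

print("cases checked:", check_relaxed_nuw())
```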
https://alive2.llvm.org/ce/z/b3dM7N
PR: https://github.com/llvm/llvm-project/pull/149795
After #149310 the pointer argument of lifetime.start/lifetime.end is
guaranteed to be an alloca, so we don't need to go through
findAllocaForValue() anymore, and don't have to have special handling
for the case where it fails.
Fix a failing test for constant-folding the nvvm_round intrinsic. The
original implementation added in #141233 used a native libm call to the
"round" function, but on PPC this produces +0.0 if the input is -0.0,
which caused a test failure.
This patch updates it to use APFloat functions instead of native libm
calls to ensure cross-platform consistency.
When we rebuild the call site tries after inlining an allocation with
MD_memprof metadata, we don't want to reapply the discarding of small
non-cold contexts (under -memprof-callsite-cold-threshold=). Either we
have no context size info (without -memprof-report-hinted-sizes or
another option that causes us to keep that as metadata), or, even with
that information in the metadata, it is imperfect at that point because
we have already discarded some contexts during matching.
The first case was even worse because we didn't guard our check by
whether the number of cold bytes was 0, leading to very aggressive
pruning during post-inline metadata rebuilding without the context size
information.
Update LV to vectorize maxnum/minnum reductions without fast-math flags,
by adding an extra check in the loop for whether any inputs to
maxnum/minnum are NaN. The check is needed because of maxnum/minnum
behavior w.r.t. signaling NaNs. Signed zeros are already handled
consistently by maxnum/minnum.
If any input is NaN,
* exit the vector loop,
* compute the reduction result up to the vector iteration that contained
  NaN inputs and
* resume in the scalar loop
New recurrence kinds are added for reductions using maxnum/minnum
without fast-math flags.
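
The control flow above can be modelled in Python as follows. This is a
simplified sketch, not the vectorizer's output: a chunk of `vf` elements
stands in for one vector iteration, the function names are hypothetical,
`maxnum` models only the quiet-NaN behavior (return the non-NaN
operand), and the start value is assumed not to be NaN.

```python
import math
from functools import reduce
from typing import List

def maxnum(a: float, b: float) -> float:
    """Return the non-NaN operand if one is NaN, else the larger."""
    if math.isnan(a):
        return b
    if math.isnan(b):
        return a
    return max(a, b)

def scalar_max_reduce(values: List[float], start: float) -> float:
    return reduce(maxnum, values, start)

def vectorized_max_reduce(values: List[float], start: float,
                          vf: int = 4) -> float:
    acc = start
    i = 0
    full = len(values) - len(values) % vf
    while i < full:
        chunk = values[i:i + vf]
        if any(math.isnan(v) for v in chunk):
            break  # exit the vector loop; acc covers earlier iterations
        acc = maxnum(acc, max(chunk))  # NaN-free chunk: plain max is safe
        i += vf
    # Scalar loop: resumes at the iteration that contained NaN, or
    # handles the remainder when no NaN was seen.
    for v in values[i:]:
        acc = maxnum(acc, v)
    return acc

data = [1.0, 5.0, 2.0, 3.0, 4.0, float("nan"), 9.0, 0.0, 7.0]
print(vectorized_max_reduce(data, float("-inf")))
print(scalar_max_reduce(data, float("-inf")))
```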
PR: https://github.com/llvm/llvm-project/pull/148239
Add helper methods to IR2Vec's Vocabulary class for numeric ID mapping and vocabulary size calculation. These APIs will be useful in triplet generation for the `llvm-ir2vec` tool (see #149214).
(Tracking issue - #141817)
DependenceAnalysis checks whether the given addresses are divisible by
the element size of corresponding load/store instructions. However, this
check was only executed when the two instructions (Src and Dst) are
different. We must also perform the same check when Src and Dst are the
same instruction.
Fix the test added in #147715.
There are cases where InstCombine / InstSimplify might sink extractvalue
instructions that use a deinterleave intrinsic into successor blocks,
which prevents InterleavedAccess from kicking in because the current
pattern requires deinterleave intrinsic to be used by extractvalue.
However, this requirement is a bit too strict, since we could simply
replace the users of the deinterleave intrinsic with whatever is
generated by the target TLI hooks.
Fold calls to trig functions with poison arguments to poison.
This includes sin, cos, asin, acos, atan, atan2, sinh, cosh, sincos,
sincospi.
Test cases are fixed and also added to
llvm/test/Transforms/InstSimplify/fold-intrinsics.ll just like in
https://github.com/llvm/llvm-project/pull/146750
These are identified by misc-include-cleaner. I've filtered out those
that break builds. Also, I'm staying away from llvm-config.h,
config.h, and Compiler.h, which likely cause platform- or
compiler-specific build failures.