Add a default-off option to the inline cost calculation to always inline
all viable calls, regardless of the cost/benefit and cost/threshold
calculations.
For performance reasons, some users require that all calls be inlined.
Rather than forcing them to adjust the inlining threshold to an
arbitrarily high value, offer an option to inline all calls.
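A minimal sketch of what such an option could look like, assuming a
hypothetical flag name (not necessarily the patch's actual spelling):

static cl::opt<bool> InlineAllViableCalls(
    "inline-all-viable-calls", cl::init(false), cl::Hidden,
    cl::desc("Inline every viable call site, ignoring the cost/benefit "
             "and cost/threshold calculations"));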
This patch replaces SmallSet<T *, N> with SmallPtrSet<T *, N>. Note
that SmallSet.h "redirects" SmallSet to SmallPtrSet for pointer
element types:
template <typename PointeeType, unsigned N>
class SmallSet<PointeeType*, N> : public SmallPtrSet<PointeeType*, N>
{};
We only have 140 instances that rely on this "redirection", with the
vast majority of them under llvm/. Since relying on the redirection
doesn't improve readability, this patch replaces SmallSet with
SmallPtrSet for pointer element types.
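For example, a declaration that previously relied on the redirection is
now spelled directly:

SmallSet<Instruction *, 8> Visited;    // before: resolves to SmallPtrSet
SmallPtrSet<Instruction *, 8> Visited; // after: says what it is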
Similarly to https://github.com/llvm/llvm-project/pull/131538, we can
also try to check whether a predicate is known to be false (i.e. the
AddRec is known to wrap) given the backedge-taken count.
For now, this just checks directly when we try to create predicated
AddRecs. This both helps to avoid spending compile-time on optimizations
where we know the predicate is false, and can also help to allow
additional vectorization (e.g. by deciding to scalarize memory accesses
when otherwise we would try to create a predicated AddRec with a
predicate that's always false).
The initial version is quite restricted, but can be extended in
follow-ups to cover more cases.
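An illustrative shape of such a case (a sketch, not one of the patch's
tests): with a backedge-taken count of 999, a no-wrap predicate on the
truncated i8 offset below is provably false, so creating a predicated
AddRec for it can be rejected up front:

void store_loop(short *Dst) {
  for (int I = 0; I < 1000; ++I)
    Dst[(unsigned char)I] = 0; // the i8 offset wraps after 256 iterations
}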
PR: https://github.com/llvm/llvm-project/pull/151134
In #149619, for the test of `@dot_follow_modulo_spec_2`, constant
folding the addition of two i32 values of 1073741824 causes an overflow
from 2^31 to -2^31 = -2147483648, which triggers the UB sanitizer. This
PR reapplies the previous PR, explicitly casting the addition operands
to int64_t first before performing the addition and then producing an
int32 number via
`Constant *C = get(cast<IntegerType>(Ty->getScalarType()), V, isSigned)`
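A minimal sketch of the widened fold, assuming the quoted getter is
ConstantInt::get and that LHS and RHS hold the two i32 operand values:

// Widen to int64_t so the addition cannot overflow i32; the getter then
// truncates the result back to the scalar integer type.
int64_t V = static_cast<int64_t>(LHS) + static_cast<int64_t>(RHS);
Constant *C = ConstantInt::get(cast<IntegerType>(Ty->getScalarType()), V, isSigned);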
As part of the Root Signature Spec, we need to validate that Root
Signatures do not define overlapping ranges.
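A hedged sketch of the overlap test itself (the struct and field names
are illustrative, not the actual FrontendHLSL API): two inclusive
register ranges overlap exactly when neither ends before the other
begins:

struct RangeInfo {
  uint32_t LowerBound, UpperBound; // inclusive register bounds
};
bool rangesOverlap(const RangeInfo &A, const RangeInfo &B) {
  return A.LowerBound <= B.UpperBound && B.LowerBound <= A.UpperBound;
}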
Closes: https://github.com/llvm/llvm-project/issues/126645
---------
Co-authored-by: joaosaffran <joao.saffran@microsoft.com>
Co-authored-by: Joao Saffran <{ID}+{username}@users.noreply.github.com>
Co-authored-by: Joao Saffran <jderezende@microsoft.com>
Emit safety guards for pointer accesses when cross-partition loads exist
that have a corresponding store to the same address in a different
partition; this emits the necessary runtime pointer checks for these
accesses.
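An illustrative shape of the problem (a sketch, not the actual SuperTest
input): after distribution, the store and the load below land in
different partitions but may touch the same memory, so a runtime pointer
check must guard them:

void f(int *A, int *B, int *C, long Off, long N) {
  for (long I = 0; I < N; ++I) {
    A[I] = B[I] * 2;       // partition 1: store to A
    C[I] = A[I + Off] + 1; // partition 2: load from A at another offset
  }
}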
The test case was obtained from SuperTest, which SiFive runs regularly.
We enabled LoopDistribution by default in our downstream compiler; this
change was part of that enablement.
This patch adds a cost kind to `getAddressComputationCost()` for #149955.
Note that this patch also removes all the default values in `getAddressComputationCost()`.
This patch refactors `exactSIVtest` and `exactRDIVtest` by consolidating
duplicated logic into a single function. As in #152688, the main goal
is to improve code maintainability, since extra validation logic (as
noted in the TODO comments) may be necessary.
This patch refactors `gcdMIVtest` by consolidating duplicated logic into
a single function. The main goal of this change is to improve code
maintainability rather than readability, especially since we may need to
revise this logic for correctness (as noted in the added TODO comments).
I hope this patch is NFC, but I've also added several new assertions,
which may cause some previously passing cases to fail.
In some places we were passing the type of the value being accessed; in
other cases we were passing the type of the pointer for the access.
The most "involved" user is
LoopVectorizationCostModel::getMemInstScalarizationCost, which is the
only call site that passes in the SCEV, and it passes along the pointer
type.
This changes call sites to consistently pass the pointer type, and
renames the arguments to clarify this.
No target actually inspects the type passed beyond checking whether it
is a vector, so this change should have no functional effect.
For the attached test: before the loop-idiom pass, we have a store in
the inner loop that is considered simple and free of side effects on the
loop. After the loop-idiom pass, we get a memset in the outer loop that
is considered to introduce side effects on the loop. This changes the
backedge-taken count before and after the pass and hence causes the
crash with verify-scev.
To fix the issue, we treat non-volatile memory intrinsics as having no
side effects relevant to forward progress.
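An illustrative nest with this shape (a sketch, not the attached test):
loop-idiom turns the inner store loop into a memset in the outer loop:

void zero_rows(char *P, long N, long M) {
  for (long I = 0; I < N; ++I)   // outer loop
    for (long J = 0; J < M; ++J) // inner loop becomes memset(P + I*M, 0, M)
      P[I * M + J] = 0;
}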
Fixes #149377
A coroutine function may be split into a ramp function and a resume
function, which have different stack frames. A pointer to a stack object
may therefore have a different address depending on where it is used, so
it is not loop-invariant.
This temporarily fixes https://github.com/llvm/llvm-project/issues/149604.
Unlike ptrtoint, ptrtoaddr does not capture provenance, only the address.
Note: As defined by the LangRef, we always treat `ptrtoaddr` as a
location-independent address capture since it is a direct inspection of the
pointer address.
Reviewed By: nikic
Pull Request: https://github.com/llvm/llvm-project/pull/152221
This introduces a new `ptrtoaddr` instruction which is similar to
`ptrtoint` but has two differences:
1) Unlike `ptrtoint`, `ptrtoaddr` does not capture provenance
2) `ptrtoaddr` only extracts (and then extends/truncates) the low
index-width bits of the pointer
For most architectures, difference 2) does not matter since the index
(address) width and the pointer representation width are the same, but
it does make a difference for architectures whose pointers aren't just
plain integer addresses, such as AMDGPU fat pointers or CHERI
capabilities.
This commit introduces textual and bitcode IR support as well as basic code
generation, but optimization passes do not handle the new instruction yet
so it may result in worse code than using ptrtoint. Follow-up changes will
update capture tracking, etc. for the new instruction.
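A hedged illustration of difference 2) in plain C++ rather than IR: on a
target whose pointer representation is 128 bits but whose index
(address) width is 64 bits, `ptrtoaddr` yields only the low 64 address
bits, whereas `ptrtoint` exposes the full representation:

#include <cstdint>
// Not compiler code: models extracting the index-width address bits
// from a 128-bit pointer representation.
uint64_t addressBits(unsigned __int128 PtrRepr) {
  return static_cast<uint64_t>(PtrRepr); // keep only the low 64 bits
}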
RFC: https://discourse.llvm.org/t/clarifiying-the-semantics-of-ptrtoint/83987/54
Reviewed By: nikic
Pull Request: https://github.com/llvm/llvm-project/pull/139357
When InstTy is a type like IntrinsicInst, which can have a variable
number of arguments, we can encounter a case where Operation has fewer
than two arguments and errors at the getOperand() calls.
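A hedged sketch of the guard (the patch's exact predicate may differ):
bail out before the `getOperand()` calls when the matched operation does
not carry at least two operands:

// Variadic InstTys such as IntrinsicInst may supply fewer operands than
// the pattern expects, so check before calling getOperand(0)/(1).
if (Operation->getNumOperands() < 2)
  return false;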
Fixes: https://github.com/llvm/llvm-project/issues/152725.
The existing functions `getIndexExpressionsFromGEP` and
`tryDelinearizeFixedSizeImpl` provide functionality to delinearize
memory accesses for fixed-size arrays. They use the GEP source element
type in their optimization heuristics. However, driving optimization
heuristics based on GEP type information is not allowed.
This patch introduces new functions `findFixedSizeArrayDimensions` and
`delinearizeFixedSizeArray` to delinearize a fixed size array without
using the type information in GEP. The new function
`findFixedSizeArrayDimensions` infers the size of each dimension of the
array based on the value to be added to the address as induction
variables are incremented. `delinearizeFixedSizeArray` attempts to
restore the subscripts of each dimension based on the estimated array
size.
This is an initial implementation that may not cover all cases, but is
intended to replace the existing function in the future.
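An illustrative access pattern (a sketch, not one of the patch's tests):
the inner dimension size of 100 must be inferred from the 100-element
stride added per step of `I`, rather than read from the GEP source
element type:

void zero2d(int A[][100], int N) {
  for (int I = 0; I < N; ++I)
    for (int J = 0; J < 100; ++J)
      A[I][J] = 0; // linear offset I*100 + J; recover subscripts {I, J}
}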
Related:
- https://discourse.llvm.org/t/enabling-loop-interchange/82589/4
- https://github.com/llvm/llvm-project/pull/124911#issuecomment-2962499501
Now that #149310 has restricted lifetime intrinsics to only work on
allocas, we can also drop the explicit size argument. Instead, the size
is implied by the alloca.
This removes the ability to only mark a prefix of an alloca alive/dead.
We never used that capability, so we should remove the need to handle
that possibility everywhere (though many key places, including stack
coloring, did not actually respect this).
Delinearization provides two values: the size of the array, and the
subscript of the access. DA checks their validity (`0 <= subscript <
size`), with some special handling. In particular, to ensure `subscript
< size`, it calculates the maximum value of `subscript - size` and
checks that it is negative. There was an issue in this process: when
`subscript - size` is expressed in an affine form like `init + step *
i`, the value in the last iteration (`init + step * (num_iterations -
1)`) was assumed to be the maximum value. This assumption is incorrect
in the following cases:
- When `step` is negative (the expression then decreases, so its
  maximum is at the first iteration, not the last)
- When the AddRec wraps
This patch introduces extra checks on the sign of `step` and verifies
the existence of nsw/nuw flags.
Also, `isKnownNonNegative(S - smax(1, Size))` was used as a regular
check, which is incorrect when `Size` is negative. This patch replaces
it with `isKnownNonNegative(S - Size)`, although it's still unclear
whether using `isKnownNonNegative` is appropriate in the first place.
Fixes #150604
Changes: The original patch, landed as 1336675, was reverted due to a
bug in LoopVectorize resulting in a crash. The bug has now been fixed by
95c32bf ([VPlan] Return invalid cost if any skeleton block has invalid
costs), and this reland is identical to the original patch.
Split out from #150248:
Since #150944 the size passed to lifetime.start/end is considered
meaningless. The lifetime always applies to the whole alloca.
This adjusts MemoryLocation to determine the MemoryLocation size from
the alloca size, instead of using the argument.
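A minimal sketch of the new behavior, assuming the location's pointer
argument is the alloca itself (not the patch's exact code):

// For lifetime.start/end, derive the location size from the alloca's
// allocation size instead of the intrinsic's (now meaningless) size
// operand.
if (auto *AI = dyn_cast<AllocaInst>(Arg))
  if (std::optional<TypeSize> Size = AI->getAllocationSize(DL))
    return MemoryLocation(AI, LocationSize::precise(*Size));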
Split out from #150248:
Since #150944 the size passed to lifetime.start/end is considered
meaningless. The lifetime always applies to the whole alloca.
Accordingly remove handling for size mismatch in the StackLifetime
analysis.
There is a larger problem here in that we should not be performing
arbitrary pointer replacements for assumes. This is handled for
branches, but assume goes through a different code path.
Fixes https://github.com/llvm/llvm-project/issues/151785.
This slightly relaxes the invariant established in #149310, by also
allowing the lifetime argument to be poison. This is to support the
typical pattern of RAUWing with poison when removing an instruction.
It's worth noting that this does not require any conservative
assumptions; lifetimes with poison arguments can simply be skipped.
Fixes https://github.com/llvm/llvm-project/issues/151119.
The `nvvm_round` intrinsic should round to the nearest even number in
the case of ties. It lowers to PTX `cvt.rni`, which will "round to
nearest integer, choosing even integer if source is equidistant between
two integers", so it matches the semantics of `rint` (and not `round` as
the name suggests).
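A quick check of the tie-breaking difference (standard C++, assuming the
default round-to-nearest FP environment):

#include <cassert>
#include <cmath>
int main() {
  // rint: ties to even, matching PTX cvt.rni.
  assert(std::rint(2.5) == 2.0 && std::rint(3.5) == 4.0);
  // round: ties away from zero, as the intrinsic's name would suggest.
  assert(std::round(2.5) == 3.0 && std::round(3.5) == 4.0);
}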
This patch implements the `llvm.loop.estimated_trip_count` metadata
discussed in [[RFC] Fix Loop Transformations to Preserve Block
Frequencies](https://discourse.llvm.org/t/rfc-fix-loop-transformations-to-preserve-block-frequencies/85785).
As [suggested in the RFC
comments](https://discourse.llvm.org/t/rfc-fix-loop-transformations-to-preserve-block-frequencies/85785/4),
it adds the new metadata to all loops at the time of profile ingestion
and estimates each trip count from the loop's `branch_weights` metadata.
As [suggested in the PR #128785
review](https://github.com/llvm/llvm-project/pull/128785#discussion_r2151091036),
it does so via a new `PGOEstimateTripCountsPass` pass, which creates the
new metadata for each loop but omits the value if it cannot estimate a
trip count due to the loop's form.
An important observation not previously discussed is that
`PGOEstimateTripCountsPass` *often* cannot estimate a loop's trip count,
but later passes can sometimes transform the loop in a way that makes it
possible. Currently, such passes do not necessarily update the metadata,
but eventually that should be fixed. Until then, if the new metadata has
no value, `llvm::getLoopEstimatedTripCount` disregards it and tries
again to estimate the trip count from the loop's current
`branch_weights` metadata.
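For reference, the usual estimate derived from latch `branch_weights` (a
sketch of the standard formula, not necessarily the pass's exact code):

#include <cstdint>
// With latch weights (BackedgeWeight, ExitWeight), each loop entry runs
// about BackedgeWeight/ExitWeight backedges plus one exiting iteration.
unsigned estimateTripCount(uint64_t BackedgeWeight, uint64_t ExitWeight) {
  if (ExitWeight == 0)
    return 0; // no estimate possible
  return static_cast<unsigned>(BackedgeWeight / ExitWeight) + 1;
}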
We extract the binding logic out of the DXILResource analysis passes into the
FrontendHLSL library. This will allow us to use this logic for resource and
root signature bindings in both the DirectX backend and the HLSL frontend.
Using GEP to index into a vector is not disallowed, but not recommended.
The SPIR-V backend needs to generate structured access into types, which
is impossible with an untyped GEP instruction unless we add more info to
the IR. Finding a solution is a work in progress, but in the meantime,
we'd like to reduce the number of failures.
Preventing this optimization from rewriting extract/insert instructions
into a GEP helps us lower more code to SPIR-V. This change should be OK,
as it is only active when targeting SPIR-V and only disables a
non-recommended transformation.
Related to #145002
Try to push the constant operand into a ZExt:
A + zext (-A + B) -> zext (B), if trunc (A) + -A + B does not
unsigned-wrap.
For example, with i32 A = 8 and i8 B where B >= 8, `8 + zext(-8 + B)`
simplifies to `zext(B)`.
The actual code supports inner adds with an arbitrary number of
operands, hence the getAddExpr in the return.
This helps SCEV reasoning in some cases, commonly when adding an offset
to a zero-extended SCEV that subtracts the same offset.
Note that this is restricted to cases where we can fold away an operand
of the inner Add. This is needed to avoid bad interactions with the
patterns used when forming ZExts, which try to push the ZExt to the
add's operands.
https://alive2.llvm.org/ce/z/q7d303
PR: https://github.com/llvm/llvm-project/pull/151227
This is a follow-on to
https://github.com/llvm/llvm-project/pull/115407, which introduced code
that bypasses the splat handling for scalable vectors. To maintain
existing tests, I have moved the early return until after the splat
handling so that all vector types are treated equally.
This PR adds a new interface to IRBuilder called CreateVectorInterleave,
which can be used to create vector.interleave intrinsics of factors 2-8.
For convenience I have also moved getInterleaveIntrinsicID and
getDeinterleaveIntrinsicID from VectorUtils.cpp to Intrinsics.cpp, where
they can be used by IRBuilder.
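A hedged usage sketch (the exact signature may differ from the landed
interface):

// Interleave two vectors of the same type via the new helper; the
// interleave factor is implied by the number of operands (2-8).
IRBuilder<> Builder(InsertPt);
Value *Interleaved = Builder.CreateVectorInterleave({V0, V1});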