There is no reason to call combineMetadata directly with a list of MD_
nodes. The combineMetadataForCSE function handles all the metadata
correctly
Partially fixes: #30866
The C++ standard requires that symmetric transfer from one coroutine to
another is performed via a tail call. Failure to do so is a miscompile
and often breaks programs by quickly overflowing the stack.
Until now, the coro split pass tried to ensure this in the
`addMustTailToCoroResumes()` function by searching for
`llvm.coro.resume` calls to lower as tail calls if the conditions were
right: the right function arguments, attributes, calling convention
etc., and if a `ret void` was sure to be reached after traversal with
some ad-hoc constant folding following the call.
This was brittle, as the kind of implicit variants required for a tail
call to happen could easily be broken by other passes (e.g. if some
instruction got in between the `resume` and `ret`), see for example
9d1cb18d19862fc0627e4a56e1e491a498e84c71 and
284da049f5feb62b40f5abc41dda7895e3d81d72.
Also the logic seemed backwards: instead of searching for possible tail
call candidates and doing them if the circumstances are right, it seems
better to start with the intention of making the tail calls we need, and
forcing the circumstances to be right.
Now that we have the `llvm.coro.await.suspend.handle` intrinsic (since
f78688134026686288a8d310b493d9327753a022) which corresponds exactly to
symmetric transfer, change the lowering of that to also include the
`resume` part, always lowered as a tail call.
Update VPBlendRecipe::execute to support generating code for first-lane
only. This fixes a crash in the newly added test
@test_not_first_lane_only_wide_compare_incoming_order_swapped.
`disable_sanitizer_instrumetation` is attached to functions that shall
not be instrumented e.g. ifunc resolver because those run before
everything is initialised.
Some sanitizer already handles this attribute, this patch adds it to
DataFLow and Coverage too.
This was looking through an addrspacecast, and not finding a later
unfoldable cast to another address space. Fixes improperly deleting
a required alloca + memcpy and introducing an illegal addrspacecast.
This also required fixing some worklist management issues with
addrspacecast, and assuming that only memcpy sources could need
replacement.
Regresses one test function, but this looks like it optimized
before by accident. It never saw the pointer use by the call
to readonly_callee, which should require insertion of a new cast.
Fixes#68120
When flattening the loop, if the GEP was inbound, it should stay
inbound, because the only thing that changed is how the pointers are
calculated, not the elements being accessed.
Proof: https://alive2.llvm.org/ce/z/dApMpQ
There are cases where a vector value has some users that demand the
the single scalar value only (NeedsScalar), while other users demand the
vector value (see attached test cases). In those cases, the NeedsScalar
users should only demand the first lane.
Fixes https://github.com/llvm/llvm-project/issues/91883.
This change improves the matching algorithm by using the diff algorithm,
the current matching algorithm only processes the callsites grouped by
the same name functions, it doesn't consider the order relationships
between different name functions, this sometimes fails to handle this
ambiguous anchor case. For example. (`Foo:1` means a
calliste[callee_name: callsite_location])
```
IR : foo:1 bar:2 foo:4 bar:5
Profile : bar:3 foo:5 bar:6
```
The `foo:1` is matched to the 2nd `foo:5` and using the diff
algorithm(finding longest common subsequence ) can help on this issue.
One well-known diff algorithm is the Myers diff algorithm(paper "An
O(ND) Difference Algorithm and Its Variations∗" Eugene W. Myers), its
variations have been implemented and used in many famous tools, like the
GNU diff or git diff. It provides an efficient way to find the longest
common subsequence or the shortest edit script through graph searching.
There are several variations/refinements for the algorithm, but as in
our case, the num of function callsites is usually very small, so we
implemented the basic greedy version in this change which should be good
enough.
We observed better matchings and positive perf improvement on our
internal services.
Remove redundant debug instructions after blocks have been merged into
the predecessor, It can reduce some compile time in some cases.
This change only fixes the situation of loop unrolling, and other
situations are not considered. "RemoveRedundantDbgInstrs" seems to be
very time-consuming. Thus, we just add here after the "Dest" has been
merged into the "Fold", this may be a more targeted solution!!!
fixes: https://github.com/llvm/llvm-project/issues/89073
Fix#91814
When instructions are extracted into a new function the `DIAssignID` metadata
uses and attachments need to be remapped so that the stores and assignment
markers don't link to stores and assignment markers in the original function.
This matches existing inlining behaviour for DIAssignIDs.
Based on discussion from
https://discourse.llvm.org/t/rfc-vectorization-support-for-histogram-count-operations/74788
Current interface is:
llvm.experimental.histogram(<vecty> ptrs, <intty> inc_amount, <vecty> mask)
The integer type used by 'inc_amount' needs to match the type of the buckets in memory.
The intrinsic covers the following operations:
* Gather load
* histogram on the elements of 'ptrs'
* multiply the histogram results by 'inc_amount'
* add the result of the multiply to the values loaded by the gather
* scatter store the results of the add
Supports lowering to histcnt instructions for AArch64 targets, and scalarization for all others at present.
This patch fixes:
llvm/lib/Transforms/Scalar/GVNSink.cpp:270:33: error: lambda capture
'this' is not used [-Werror,-Wunused-lambda-capture]
While I am at it, this patch replaces llvm::for_each with a
range-based for loop.
GVNSink used to order instructions based on their pointer values and was
prone to non-determinism because of that.
This patch ensures all the values stored are using a deterministic
order. I have also added a verfier(`ModelledPHI::verifyModelledPHI`) to
assert when ordering isn't preserved.
Additionally, I have added a test case (mirror graph image of an
existing test) that would have failed before this patch.
Fixes: #77852
This is probably the most involved addition, as it tries to make use of
isTriviallyVectorizable with isVectorIntrinsicWithScalarOpAtArg to handle a
number of different intrinsics that are all lane-wise. Additional tests have
been added for some of the different intrinsics from
isVectorIntrinsicWithScalarOpAtArg / isVectorIntrinsicWithOverloadTypeAtArg.
We have flexibility in what constant to use when combining an `ashr
exact` with a slt or ult of a constant, and it's not possible to revisit
this decision later in the compilation pipeline after the `ashr exact`
is removed. Keeping a constant close to power-of-2 (pow2val + 1) should
be no worse than neutral, and in some cases may allow better codegen
later on for targets that can more cheaply generate power of 2 (which
may be selectable if converting back to setle/setge) or near power of 2
constants.
Alive2 proofs:
<https://alive2.llvm.org/ce/z/2BmPnq> and
<https://alive2.llvm.org/ce/z/DtuhnR>
Following up to 933f49248, also update the code reasoning about
backwards dependences to support non-constant distances.
Update the code to use the signed minimum distance instead of a constant
distance
This means e checked the lower bound of the dependence distance and the
distance may be larger at runtime (and safe for vectorization). Whether
to classify it as Unknown or Backwards depends on the vector width and
LAA was updated to take TTI to get the maximum vector register width.
If the minimum dependence distance is larger than the max vector width,
we consider it as backwards-vectorizable. Otherwise we classify them as
Unknown, so we re-try with runtime checks.
PR: https://github.com/llvm/llvm-project/pull/91525
Use ICmpInst::compare() where possible, ConstantFoldCompareInstOperands
in other places. This only changes places where the either the fold is
guaranteed to succeed, or the code doesn't use the resulting compare if
we fail to fold.
VPEVLBasedIVPHIRecipe inherits from VPSingleDefRecipe. Add
VPEVLBasedIVPHISC to VPSingleDefRecipe::classof to make isa/dyn_cast &
co work as expected.
Split off https://github.com/llvm/llvm-project/pull/67934.
This patch contains no functional changes.
The main goal of this patch is to get better clarity out of the code, to
make intentions and assumptions clear.
One major design problem I had in the past were `Lowerer`. It previously
inherited from `coro::LowererBase` but it doesn't use any of the fields
or methods from `LowererBase`. It might be an artifact leftover from
previous designs of this code.
Furthermore, we should clarify that although one such instance is bound
to the function, `Lowerer` was dedicated to one `CoroId` instruction at
a time. We rely on a sequence of fragile constructs like
`CoroBegins.clear(); DestroyAddr.clear()`. This doesn't help understand
the code.
What's worse is that we have confusing calls like
`elideHeapAllocations(CoroId->getFunction(), ...` and it might get
confused with `CoroId->getCoroutine()`.
The new structure intends to make it clear that we always operate on one
`CoroId` at a time, which may have multiple `CoroBegin`s. Such structure
doesn't rely on frequent `.clear()` that's prone to miss.
Need to look through the SExt/ZExt scalars to be gathered, when trying
to reduce their width after minbitwidth analysis to prevent permanent
attempts to revectorize such gathered instructions.
A related change is https://reviews.llvm.org/D133121, which correctly
preserves both branch weights and value profiles for invoke instruction.
* If the branch weight of the `invokeinst` specifies taken / not-taken branches, there is no scale.