36393 Commits

Author SHA1 Message Date
AdityaK
b42d245b77
[GVNHoist] Replace combineKnownMetadata with combineMetadataForCSE (#92197)
There is no reason to call combineMetadata directly with a list of MD_
nodes. The combineMetadataForCSE function handles all the metadata
correctly

Partially fixes: #30866
2024-05-15 07:44:34 -07:00
Jie Fu
7c8176ebd3 [Coroutines] Remove unused function (NFC)
llvm-project/llvm/lib/Transforms/Coroutines/CoroSplit.cpp:1223:1:
error: unused function 'scanPHIsAndUpdateValueMap' [-Werror,-Wunused-function]
scanPHIsAndUpdateValueMap(Instruction *Prev, BasicBlock *NewBlock,
^
1 error generated.
2024-05-15 22:08:17 +08:00
Hans
3bb39690d7
[coro] Lower llvm.coro.await.suspend.handle to resume with tail call (#89751)
The C++ standard requires that symmetric transfer from one coroutine to
another is performed via a tail call. Failure to do so is a miscompile
and often breaks programs by quickly overflowing the stack.

Until now, the coro split pass tried to ensure this in the
`addMustTailToCoroResumes()` function by searching for
`llvm.coro.resume` calls to lower as tail calls if the conditions were
right: the right function arguments, attributes, calling convention
etc., and if a `ret void` was sure to be reached after traversal with
some ad-hoc constant folding following the call.

This was brittle, as the kind of implicit variants required for a tail
call to happen could easily be broken by other passes (e.g. if some
instruction got in between the `resume` and `ret`), see for example
9d1cb18d19862fc0627e4a56e1e491a498e84c71 and
284da049f5feb62b40f5abc41dda7895e3d81d72.

Also the logic seemed backwards: instead of searching for possible tail
call candidates and doing them if the circumstances are right, it seems
better to start with the intention of making the tail calls we need, and
forcing the circumstances to be right.

Now that we have the `llvm.coro.await.suspend.handle` intrinsic (since
f78688134026686288a8d310b493d9327753a022) which corresponds exactly to
symmetric transfer, change the lowering of that to also include the
`resume` part, always lowered as a tail call.
2024-05-15 15:29:08 +02:00
Pietro Ghiglio
83d9aa2768
[VPlan] Add scalar inferencing support for addrspace cast (#92107)
Fixes https://github.com/llvm/llvm-project/issues/91434

PR: https://github.com/llvm/llvm-project/pull/92107
2024-05-15 14:03:21 +01:00
Jay Foad
1650f1b3d7
Fix typo "indicies" (#92232) 2024-05-15 13:10:16 +01:00
Florian Hahn
d187005cad
[VPlan] Update VPBlendRecipe codegen for for first-lane only.
Update VPBlendRecipe::execute to support generating code for first-lane
only. This fixes a crash in the newly added test
@test_not_first_lane_only_wide_compare_incoming_order_swapped.
2024-05-15 11:00:15 +01:00
Daniel Kiss
45726c1a3a
[LLVM] Make sanitizers respect the disable_santizer_instrumentation attribute. (#91732)
`disable_sanitizer_instrumetation` is attached to functions that shall
not be instrumented e.g. ifunc resolver because those run before
everything is initialised.
Some sanitizer already handles this attribute, this patch adds it to
DataFLow and Coverage too.
2024-05-15 08:40:16 +02:00
Matt Arsenault
d7bb0723fe
InstCombine: Emit ldexp intrinsic in exp2->ldexp combine (#92039)
Prefer to emit the intrinsic over a libcall in the
intrinsic or no-math-errno case.
2024-05-15 07:41:28 +02:00
Matt Arsenault
847c83f7cc
InstCombine: Process addrspacecast uses in PointerReplacer (#91953)
This was looking through an addrspacecast, and not finding a later
unfoldable cast to another address space. Fixes improperly deleting
a required alloca + memcpy and introducing an illegal addrspacecast.
    
This also required fixing some worklist management issues with
addrspacecast, and assuming that only memcpy sources could need
replacement.
    
Regresses one test function, but this looks like it optimized
before by accident. It never saw the pointer use by the call
to readonly_callee, which should require insertion of a new cast.
    
Fixes #68120
2024-05-15 07:02:31 +02:00
Nikita Popov
71fbbb69d6 [IR] Move GlobalValue::getGUID() out of line (NFC)
Avoid including MD5.h in a core IR header.
2024-05-15 10:49:25 +09:00
AtariDreams
4d1ecf1923
[Transforms] Preserve inbounds attribute of transformed GEPs when flattening loops (#86961)
When flattening the loop, if the GEP was inbound, it should stay
inbound, because the only thing that changed is how the pointers are
calculated, not the elements being accessed.

Proof: https://alive2.llvm.org/ce/z/dApMpQ
2024-05-15 10:26:23 +09:00
Philip Reames
baca93fc83 [LSR] Tweak debug output to always print initial cost 2024-05-14 13:34:20 -07:00
Florian Hahn
67d840b60f
[VPlan] Relax over-aggressive assertion in VPTransformState::get().
There are cases where a vector value has some users that demand the
the single scalar value only (NeedsScalar), while other users demand the
vector value (see attached test cases). In those cases, the NeedsScalar
users should only demand the first lane.

Fixes https://github.com/llvm/llvm-project/issues/91883.
2024-05-14 19:10:49 +01:00
Mingming Liu
6c8ebc0535
[NFC][CallPromotionUtils]Extract a helper function versionCallSiteWithCond from versionCallSite (#81181)
* This is to be used by https://github.com/llvm/llvm-project/pull/81378
to implement a variant of versionCallSite that compares vtables.
* The parent patch is https://github.com/llvm/llvm-project/pull/81051
2024-05-14 10:13:57 -07:00
Graham Hunter
2b15c4a62b
[AArch64] Postcommit fixes for histogram intrinsic (#92095)
A buildbot with expensive checks enabled flagged some problems with my patch. There was also a post-commit nit on the langref changes.
2024-05-14 15:16:42 +01:00
AdityaK
bf7a0f9958
Fix incorrect codegen with respect to GEPs #85333 (#92047)
As mentioned in #68882 and
https://discourse.llvm.org/t/rfc-replacing-getelementptr-with-ptradd/68699

Gep arithmetic isn't consistent with different types. GVNSink didn't
realize this and sank all geps as long as their operands can be wired
via PHIs
in a post-dominator.

Fixes: #85333
Reapply: #88440 after fixing the non-determinism issues in #90995
2024-05-14 06:13:11 -07:00
Florian Hahn
b1e99a699d
[LV] Drop redundant comment from createEdgeMask (NFC).
Follow-up to remove a redundant comment post-commit
https://github.com/llvm/llvm-project/pull/91897
2024-05-14 12:43:47 +01:00
Ramkumar Ramachandra
d7ef34bfe3
[LV] update comment following 63d8058 (NFC) (#91120)
Address a review comment post landing 63d8058 (LoopVectorize: guard
appending InstsToScalarize; fix bug) to update a comment.
2024-05-14 10:59:26 +01:00
Florian Hahn
632317e9ab
[VPlan] Add non-poison propagating LogicalAnd VPInstruction opcode. (#91897)
Add a new opcode to mode non-poison propagating logical AND operations
used when generating edge masks. This follows the similar decision to
model Not as dedicated opcode as well, to improve clarity.

This also helps to simplify the matchers for
https://github.com/llvm/llvm-project/pull/89386.


PR: https://github.com/llvm/llvm-project/pull/91897
2024-05-14 09:42:49 +01:00
Lei Wang
5b6f151104
[SampleFDO] Improve stale profile matching by diff algorithm (#87375)
This change improves the matching algorithm by using the diff algorithm,
the current matching algorithm only processes the callsites grouped by
the same name functions, it doesn't consider the order relationships
between different name functions, this sometimes fails to handle this
ambiguous anchor case. For example. (`Foo:1` means a
calliste[callee_name: callsite_location])
```
IR :      foo:1  bar:2  foo:4  bar:5 
Profile :        bar:3  foo:5  bar:6
```
The `foo:1` is matched to the 2nd `foo:5` and using the diff
algorithm(finding longest common subsequence ) can help on this issue.
One well-known diff algorithm is the Myers diff algorithm(paper "An
O(ND) Difference Algorithm and Its Variations∗" Eugene W. Myers), its
variations have been implemented and used in many famous tools, like the
GNU diff or git diff. It provides an efficient way to find the longest
common subsequence or the shortest edit script through graph searching.
There are several variations/refinements for the algorithm, but as in
our case, the num of function callsites is usually very small, so we
implemented the basic greedy version in this change which should be good
enough.
We observed better matchings and positive perf improvement on our
internal services.
2024-05-13 16:01:29 -07:00
Florian Hahn
e122380445
[LV] Use VPBuilder to create Select (NFCI). 2024-05-13 20:44:39 +01:00
chenlin
79643565a8
[LoopUnroll] Remove redundant debug instructions after blocks have been merged (#91246)
Remove redundant debug instructions after blocks have been merged into
the predecessor, It can reduce some compile time in some cases.

This change only fixes the situation of loop unrolling, and other
situations are not considered. "RemoveRedundantDbgInstrs" seems to be
very time-consuming. Thus, we just add here after the "Dest" has been
merged into the "Fold", this may be a more targeted solution!!!

fixes: https://github.com/llvm/llvm-project/issues/89073
2024-05-13 09:42:04 -07:00
Paul Kirth
89a080cb79 [llvm][NFC] Document cl::opt MisExpectTolerance and fix typo
Pull Request: https://github.com/llvm/llvm-project/pull/90670
2024-05-13 16:19:09 +00:00
Matt Arsenault
8823abea6f InstCombine: Simplify vector initialization 2024-05-13 13:59:45 +02:00
Orlando Cazalet-Hyams
91d7ca904c
[DebugInfo] Remap extracted DIAssignIDs in hotcoldsplit (#91940)
Fix #91814

When instructions are extracted into a new function the `DIAssignID` metadata
uses and attachments need to be remapped so that the stores and assignment
markers don't link to stores and assignment markers in the original function.

This matches existing inlining behaviour for DIAssignIDs.
2024-05-13 12:49:42 +01:00
Matt Arsenault
c5b0da9d83
InstCombine: Preserve inbounds in PointerReplacer (#91735)
This avoids spurious test changes in a future commit.
2024-05-13 13:49:09 +02:00
Graham Hunter
fbb37e9606
[AArch64] Add an all-in-one histogram intrinsic
Based on discussion from
https://discourse.llvm.org/t/rfc-vectorization-support-for-histogram-count-operations/74788

Current interface is:

llvm.experimental.histogram(<vecty> ptrs, <intty> inc_amount, <vecty> mask)

The integer type used by 'inc_amount' needs to match the type of the buckets in memory.

The intrinsic covers the following operations:
  * Gather load
  * histogram on the elements of 'ptrs'
  * multiply the histogram results by 'inc_amount'
  * add the result of the multiply to the values loaded by the gather
  * scatter store the results of the add

Supports lowering to histcnt instructions for AArch64 targets, and scalarization for all others at present.
2024-05-13 11:35:28 +01:00
Yingwei Zheng
b5f4210e9f
[InstCombine] Drop nuw flag when CtlzOp is a sub nuw (#91776)
See the following case:
```
define i32 @src1(i32 %x) {
  %dec = sub nuw i32 -2, %x
  %ctlz = tail call i32 @llvm.ctlz.i32(i32 %dec, i1 false)
  %sub = sub nsw i32 32, %ctlz
  %shl = shl i32 1, %sub
  %ugt = icmp ult i32 %x, -2
  %sel = select i1 %ugt, i32 %shl, i32 1
  ret i32 %sel
}

define i32 @tgt1(i32 %x) {
  %dec = sub nuw i32 -2, %x
  %ctlz = tail call i32 @llvm.ctlz.i32(i32 %dec, i1 false)
  %sub = sub nsw i32 32, %ctlz
  %and = and i32 %sub, 31
  %shl = shl nuw i32 1, %and
  ret i32 %shl
}
```
`nuw` in `%dec` should be dropped after the select instruction is
eliminated.

Alive2: https://alive2.llvm.org/ce/z/7S9529

Fixes https://github.com/llvm/llvm-project/issues/91691.
2024-05-13 14:27:59 +08:00
Kazu Hirata
e6785fd752 [Scalar] Fix a warning
This patch fixes:

  llvm/lib/Transforms/Scalar/GVNSink.cpp:270:33: error: lambda capture
  'this' is not used [-Werror,-Wunused-lambda-capture]

While I am at it, this patch replaces llvm::for_each with a
range-based for loop.
2024-05-12 23:02:37 -07:00
AdityaK
abe3c5ac19
[GVNSink] Fix non-determinisms by using a deterministic ordering (#90995)
GVNSink used to order instructions based on their pointer values and was
prone to non-determinism because of that.
This patch ensures all the values stored are using a deterministic
order. I have also added a verfier(`ModelledPHI::verifyModelledPHI`) to
assert when ordering isn't preserved.

Additionally, I have added a test case (mirror graph image of an
existing test) that would have failed before this patch.

Fixes: #77852
2024-05-12 19:41:54 -07:00
David Green
b7ed097f29
[VectorCombine] Add intrinsics handling to shuffleToIdentity (#91000)
This is probably the most involved addition, as it tries to make use of
isTriviallyVectorizable with isVectorIntrinsicWithScalarOpAtArg to handle a
number of different intrinsics that are all lane-wise. Additional tests have
been added for some of the different intrinsics from
isVectorIntrinsicWithScalarOpAtArg / isVectorIntrinsicWithOverloadTypeAtArg.
2024-05-12 20:31:11 +01:00
Shan Huang
cdd782183d
[DebugInfo][LICM] Fix missing debug location updates (#91729) 2024-05-11 16:26:04 +01:00
Shan Huang
3773191fc4
[DebugInfo][JumpThreading] Fix missing debug location updates (#91581) 2024-05-11 16:10:00 +01:00
Alex Bradbury
3be8e2c95d
[InstCombine] Prefer to keep power-of-2 constants when combining ashr exact and slt/ult of a constant (#86111)
We have flexibility in what constant to use when combining an `ashr
exact` with a slt or ult of a constant, and it's not possible to revisit
this decision later in the compilation pipeline after the `ashr exact`
is removed. Keeping a constant close to power-of-2 (pow2val + 1) should
be no worse than neutral, and in some cases may allow better codegen
later on for targets that can more cheaply generate power of 2 (which
may be selectable if converting back to setle/setge) or near power of 2
constants.

Alive2 proofs:
<https://alive2.llvm.org/ce/z/2BmPnq> and
<https://alive2.llvm.org/ce/z/DtuhnR>
2024-05-10 13:50:03 +01:00
Florian Hahn
28767afd53
[LAA] Support backward dependences with non-constant distance. (#91525)
Following up to 933f49248, also update the code reasoning about
backwards dependences to support non-constant distances.

Update the code to use the signed minimum distance instead of a constant
distance

This means e checked the lower bound of the dependence distance and the
distance may be larger at runtime (and safe for vectorization). Whether
to classify it as Unknown or Backwards depends on the vector width and
LAA was updated to take TTI to get the maximum vector register width.

If the minimum dependence distance is larger than the max vector width,
we consider it as backwards-vectorizable. Otherwise we classify them as
Unknown, so we re-try with runtime checks.

PR: https://github.com/llvm/llvm-project/pull/91525
2024-05-10 11:47:13 +01:00
Graham Hunter
2e8d815596
[TTI] Support scalable offsets in getScalingFactorCost (#88113)
Part of the work to support vscale-relative immediates in LSR.
2024-05-10 11:22:11 +01:00
Jeffrey Byrnes
f865dbff17
[SeparateConstOffsetFromGEP] Support GEP reordering for different types (#90802)
This doesn't show up in existing lit tests, but has an impact on real
code -- especially after the canonicalization of GEPs to i8.

Alive2 tests for the inbounds handling:

Case 1: https://alive2.llvm.org/ce/z/6bfFY3
Case 2: https://alive2.llvm.org/ce/z/DkLMLF
2024-05-09 16:57:36 -07:00
Eli Friedman
f893dccbba
Replace uses of ConstantExpr::getCompare. (#91558)
Use ICmpInst::compare() where possible, ConstantFoldCompareInstOperands
in other places. This only changes places where the either the fold is
guaranteed to succeed, or the code doesn't use the resulting compare if
we fail to fold.
2024-05-09 16:50:01 -07:00
Florian Hahn
c3d2af0f4e
[VPlan] VPEVLBasedIVPHI is a VPSingleDefRecipe.
VPEVLBasedIVPHIRecipe inherits from VPSingleDefRecipe. Add
VPEVLBasedIVPHISC to VPSingleDefRecipe::classof to make isa/dyn_cast &
co work as expected.

Split off https://github.com/llvm/llvm-project/pull/67934.
2024-05-09 19:18:37 +01:00
Yuxuan Chen
87235fa9af
[NFC] CoroElide: Refactor Lowerer into CoroIdElider (#91539)
This patch contains no functional changes. 

The main goal of this patch is to get better clarity out of the code, to
make intentions and assumptions clear.

One major design problem I had in the past were `Lowerer`. It previously
inherited from `coro::LowererBase` but it doesn't use any of the fields
or methods from `LowererBase`. It might be an artifact leftover from
previous designs of this code.

Furthermore, we should clarify that although one such instance is bound
to the function, `Lowerer` was dedicated to one `CoroId` instruction at
a time. We rely on a sequence of fragile constructs like
`CoroBegins.clear(); DestroyAddr.clear()`. This doesn't help understand
the code.

What's worse is that we have confusing calls like
`elideHeapAllocations(CoroId->getFunction(), ...` and it might get
confused with `CoroId->getCoroutine()`.

The new structure intends to make it clear that we always operate on one
`CoroId` at a time, which may have multiple `CoroBegin`s. Such structure
doesn't rely on frequent `.clear()` that's prone to miss.
2024-05-09 08:21:40 -07:00
Shan Huang
b452b34932
[DebugInfo][IndVarSimplify] Fix missing debug location updates (#91443)
Adds debug location updates for the newly created `phi`, `add`, `icmp` and `sitofp` instructions in `IndVarSimplify`.

Fixes #91436
2024-05-09 12:58:53 +01:00
Alexey Bataev
58a94b1d0a [SLP]Fix PR91467: Look through scalar cast, when trying to cast to another type.
Need to look through the SExt/ZExt scalars to be gathered, when trying
to reduce their width after minbitwidth analysis to prevent permanent
attempts to revectorize such gathered instructions.
2024-05-09 04:19:43 -07:00
Nikita Popov
534701d5f9 [InstCombine] Handle commuted variants in or of xor pattern
This pattern only handled commutation in the "or", while all
involved operations are commutative. Make sure we handle all
sixteen patterns.
2024-05-09 15:16:25 +09:00
Nikita Popov
0d335f78e4 [InstCombine] Handle more commuted cases in matchesSquareSum() 2024-05-09 12:35:16 +09:00
AtariDreams
409ff97aac
[InstCombine] Fix comment from #88193 (NFC) (#91427)
It is inaccurate and needs to be corrected.
2024-05-09 09:26:36 +09:00
Mircea Trofin
96568f3539
[llvm][ctx_profile] Add instrumentation lowering (#90821)
This adds the instrumentation lowering pass.

(Tracking Issue: #89287, RFC referenced there)
2024-05-08 16:49:08 -07:00
Arthur Eubanks
2fb3774321 Revert "[SLP]Fix PR91467: Look through scalar cast, when trying to cast to another type."
This reverts commit 2475efa91d8b4fa8f1a2d16052cb6d14be7d5dc6.

Causes crashes, see comments on 2475efa91d.
2024-05-08 23:01:47 +00:00
Mingming Liu
64f4ceb09e
[Inline][PGO] After inline, update InvokeInst profile counts in caller and cloned callee (#83809)
A related change is https://reviews.llvm.org/D133121, which correctly
preserves both branch weights and value profiles for invoke instruction.
* If the branch weight of the `invokeinst` specifies taken / not-taken branches, there is no scale.
2024-05-08 15:48:40 -07:00
Teresa Johnson
cec6665f2b
[MemProf] Optionally update hints on existing hot/cold new calls (#91047)
If directed by an option, update hints on calls to new that already
provide a hot/cold hint.
2024-05-08 13:41:29 -07:00
AdityaK
42d99013bd
NFC: Add a comment indicating UpdateAnalysisInformation invalidates DFS Numbering (#91252) 2024-05-08 11:03:46 -07:00