36407 Commits

Author SHA1 Message Date
Matt Arsenault
f5c8242042
SimplifyLibCalls: Prefer to emit intrinsic in pow(2, x) -> ldexp(1, x) (#92363) 2024-05-17 14:28:03 +02:00
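The identity behind this combine can be checked numerically. The following is a minimal sketch, not code from the patch: the helper name `pow2MatchesLdexp` is hypothetical, and exact equality assumes the C library returns exact powers of two from `pow`, which common implementations do for exponents in this range.

```cpp
#include <cassert>
#include <cmath>

// Returns true when pow(2.0, n) and ldexp(1.0, n) agree for every integer
// exponent n in [Lo, Hi]. Hypothetical helper, for illustration only.
bool pow2MatchesLdexp(int Lo, int Hi) {
  for (int N = Lo; N <= Hi; ++N) {
    // ldexp(1.0, N) scales 1.0 by 2^N by adjusting the exponent directly.
    if (std::pow(2.0, static_cast<double>(N)) != std::ldexp(1.0, N))
      return false;
  }
  return true;
}
```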
David Sherwood
0ad275c158
[InstCombine] Fold vector.reduce.op(vector.reverse(X)) -> vector.reduce.op(X) (#91743)
For all of the following reductions:

vector.reduce.or
vector.reduce.and
vector.reduce.xor
vector.reduce.add
vector.reduce.mul
vector.reduce.umin
vector.reduce.umax
vector.reduce.smin
vector.reduce.smax
vector.reduce.fmin
vector.reduce.fmax

if the input operand is the result of a vector.reverse then we can
perform a reduction on the vector.reverse input instead since the answer
is the same. If the reassociation is permitted we can also do the same
folds for these:

vector.reduce.fadd
vector.reduce.fmul
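
A scalar model of why the fold is sound, as a sketch only: `reduce` and `reverseDoesNotChangeReduction` are hypothetical helpers standing in for the intrinsics, and only add, xor and umax are exercised. Floating-point addition is not associative, which is why the fadd/fmul cases need reassociation to be permitted.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <functional>
#include <numeric>
#include <vector>

// Left-to-right reduction, standing in for vector.reduce.<op>.
template <typename Op>
uint32_t reduce(const std::vector<uint32_t> &V, uint32_t Init, Op Fn) {
  return std::accumulate(V.begin(), V.end(), Init, Fn);
}

// Returns true if reducing the reversed vector gives the same result as
// reducing the original vector for add, xor and umax.
bool reverseDoesNotChangeReduction(const std::vector<uint32_t> &V) {
  std::vector<uint32_t> R(V.rbegin(), V.rend()); // stand-in for vector.reverse
  auto UMax = [](uint32_t A, uint32_t B) { return std::max(A, B); };
  bool AddOk = reduce(V, 0u, std::plus<uint32_t>()) ==
               reduce(R, 0u, std::plus<uint32_t>());
  bool XorOk = reduce(V, 0u, std::bit_xor<uint32_t>()) ==
               reduce(R, 0u, std::bit_xor<uint32_t>());
  bool MaxOk = reduce(V, 0u, UMax) == reduce(R, 0u, UMax);
  return AddOk && XorOk && MaxOk;
}
```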
2024-05-17 12:58:14 +01:00
Florian Hahn
1e7d047c71
[VPlan] Mark LoopInfo preserved in native-path as well (NFC).
LoopInfo is updated during VPlan execution now, so it will also be
updated correctly in the native path.
2024-05-17 12:18:01 +01:00
Shan Huang
d0e2808f80
[DebugInfo][LoopLoadElim] Fix missing debug location updates (#91839) 2024-05-17 10:56:05 +01:00
DianQK
c79690040a
[GlobalOpt] Don't replace aliasee with alias that has weak linkage (#91483)
Fixes #91312.

Don't perform the transform if the alias may be replaced at link time.
2024-05-17 05:51:49 +08:00
Noah Goldstein
23f1047daa [InstCombine] Fold (icmp pred (trunc nuw/nsw X), C) -> (icmp pred X, (zext/sext C))
This is valid as long as the sign of the wrap flag doesn't differ from
the sign of the `pred`.

Proofs: https://alive2.llvm.org/ce/z/35NsrR

NB: The online Alive2 hasn't been updated with `trunc nuw/nsw`
support, so the proofs must be reproduced locally.

Closes #87935
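
As a small brute-force illustration of the unsigned case (a sketch with hypothetical helper names, using i16 -> i8 as an assumed example width, and not a substitute for the Alive2 proofs): when the truncation drops no set bits, comparing the narrow value against C agrees with comparing the wide value against the zero-extended C.

```cpp
#include <cassert>
#include <cstdint>

// For every 16-bit X whose high byte is zero (the `trunc nuw` precondition)
// and every 8-bit C, check that `icmp ult (trunc X), C` agrees with
// `icmp ult X, (zext C)`.
bool truncNuwUltFoldHolds() {
  for (uint32_t X = 0; X <= 0xFF; ++X) {
    for (uint32_t C = 0; C <= 0xFF; ++C) {
      uint8_t Trunc = static_cast<uint8_t>(X);              // trunc nuw
      bool Narrow = Trunc < static_cast<uint8_t>(C);        // icmp ult i8
      bool Wide = static_cast<uint16_t>(X) <
                  static_cast<uint16_t>(C);                 // icmp ult i16
      if (Narrow != Wide)
        return false;
    }
  }
  return true;
}

// Without the nuw guarantee the fold is wrong: 0x100 truncates to 0, so the
// narrow compare against 1 is true while the wide compare is false.
bool foldBreaksWithoutNuw() {
  uint16_t X = 0x100;
  uint8_t C = 1;
  return (static_cast<uint8_t>(X) < C) != (X < static_cast<uint16_t>(C));
}
```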
2024-05-16 13:03:32 -05:00
Matt Arsenault
cdb41e416a
PlaceSafepoints: Fix using default constructed TargetLibraryInfo (#92411) 2024-05-16 17:54:26 +02:00
Jie Fu
e948da1021 [Transforms] Fix -Wunused-variable in DemoteRegToStack.cpp (NFC)
llvm-project/llvm/lib/Transforms/Utils/DemoteRegToStack.cpp:58:21:
error: unused variable 'BB' [-Werror,-Wunused-variable]
        BasicBlock *BB = SplitCriticalEdge(II, i);
                    ^
1 error generated.
2024-05-16 20:52:56 +08:00
Jie Fu
03d8e61391 [Transforms] Fix -Wsign-compare in DemoteRegToStack.cpp (NFC)
llvm-project/llvm/lib/Transforms/Utils/DemoteRegToStack.cpp:54:23:
error: comparison of integers of different signs: 'int' and 'unsigned int' [-Werror,-Wsign-compare]
    for (int i = 0; i < CBI->getNumSuccessors(); i++) {
                    ~ ^ ~~~~~~~~~~~~~~~~~~~~~~~
1 error generated.
2024-05-16 20:29:54 +08:00
XChy
fdaad73875
[Reg2Mem] Handle CallBr instructions (#90953)
Fixes #90900
2024-05-16 20:13:39 +08:00
Matt Arsenault
0ea178b085
SimplifyLibCalls: Emit vector ldexp intrinsics in exp2->ldexp combine (#92219)
Co-authored-by: Nikita Popov <github@npopov.com>
2024-05-16 10:24:56 +02:00
Matt Arsenault
8389177710
SimplifyLibCalls: Use IRBuilder helpers for creating intrinsics (#92288) 2024-05-16 09:20:18 +02:00
Matt Arsenault
ce1ce5d30c
InstCombine: Try to use exp10 intrinsic instead of libcall (#92287)
Addresses old TODO about the exp10 intrinsic not existing.
2024-05-16 09:09:02 +02:00
Nikita Popov
b4d1a606c7 [SeparateConstOffsetFromGEP] Check correct index for non-negativity
We were checking the index of GEP twice, instead of checking both
GEP and PtrGEP.
2024-05-16 11:59:07 +09:00
AdityaK
b42d245b77
[GVNHoist] Replace combineKnownMetadata with combineMetadataForCSE (#92197)
There is no reason to call combineMetadata directly with a list of MD_
nodes. The combineMetadataForCSE function handles all the metadata
correctly.

Partially fixes: #30866
2024-05-15 07:44:34 -07:00
Jie Fu
7c8176ebd3 [Coroutines] Remove unused function (NFC)
llvm-project/llvm/lib/Transforms/Coroutines/CoroSplit.cpp:1223:1:
error: unused function 'scanPHIsAndUpdateValueMap' [-Werror,-Wunused-function]
scanPHIsAndUpdateValueMap(Instruction *Prev, BasicBlock *NewBlock,
^
1 error generated.
2024-05-15 22:08:17 +08:00
Hans
3bb39690d7
[coro] Lower llvm.coro.await.suspend.handle to resume with tail call (#89751)
The C++ standard requires that symmetric transfer from one coroutine to
another is performed via a tail call. Failure to do so is a miscompile
and often breaks programs by quickly overflowing the stack.

Until now, the coro split pass tried to ensure this in the
`addMustTailToCoroResumes()` function by searching for
`llvm.coro.resume` calls to lower as tail calls if the conditions were
right: the right function arguments, attributes, calling convention
etc., and if a `ret void` was sure to be reached after traversal with
some ad-hoc constant folding following the call.

This was brittle, as the kind of implicit invariants required for a tail
call to happen could easily be broken by other passes (e.g. if some
instruction got in between the `resume` and `ret`), see for example
9d1cb18d19862fc0627e4a56e1e491a498e84c71 and
284da049f5feb62b40f5abc41dda7895e3d81d72.

Also the logic seemed backwards: instead of searching for possible tail
call candidates and doing them if the circumstances are right, it seems
better to start with the intention of making the tail calls we need, and
forcing the circumstances to be right.

Now that we have the `llvm.coro.await.suspend.handle` intrinsic (since
f78688134026686288a8d310b493d9327753a022) which corresponds exactly to
symmetric transfer, change the lowering of that to also include the
`resume` part, always lowered as a tail call.
2024-05-15 15:29:08 +02:00
Pietro Ghiglio
83d9aa2768
[VPlan] Add scalar inferencing support for addrspace cast (#92107)
Fixes https://github.com/llvm/llvm-project/issues/91434

PR: https://github.com/llvm/llvm-project/pull/92107
2024-05-15 14:03:21 +01:00
Jay Foad
1650f1b3d7
Fix typo "indicies" (#92232) 2024-05-15 13:10:16 +01:00
Florian Hahn
d187005cad
[VPlan] Update VPBlendRecipe codegen for first-lane only.
Update VPBlendRecipe::execute to support generating code for first-lane
only. This fixes a crash in the newly added test
@test_not_first_lane_only_wide_compare_incoming_order_swapped.
2024-05-15 11:00:15 +01:00
Daniel Kiss
45726c1a3a
[LLVM] Make sanitizers respect the disable_sanitizer_instrumentation attribute. (#91732)
`disable_sanitizer_instrumentation` is attached to functions that shall
not be instrumented, e.g. ifunc resolvers, because those run before
everything is initialised.
Some sanitizers already handle this attribute; this patch adds it to
DataFlow and Coverage too.
2024-05-15 08:40:16 +02:00
Matt Arsenault
d7bb0723fe
InstCombine: Emit ldexp intrinsic in exp2->ldexp combine (#92039)
Prefer to emit the intrinsic over a libcall in the
intrinsic or no-math-errno case.
2024-05-15 07:41:28 +02:00
Matt Arsenault
847c83f7cc
InstCombine: Process addrspacecast uses in PointerReplacer (#91953)
This was looking through an addrspacecast, and not finding a later
unfoldable cast to another address space. Fixes improperly deleting
a required alloca + memcpy and introducing an illegal addrspacecast.
    
This also required fixing some worklist management issues with
addrspacecast, and assuming that only memcpy sources could need
replacement.
    
Regresses one test function, but this looks like it optimized
before by accident. It never saw the pointer use by the call
to readonly_callee, which should require insertion of a new cast.
    
Fixes #68120
2024-05-15 07:02:31 +02:00
Nikita Popov
71fbbb69d6 [IR] Move GlobalValue::getGUID() out of line (NFC)
Avoid including MD5.h in a core IR header.
2024-05-15 10:49:25 +09:00
AtariDreams
4d1ecf1923
[Transforms] Preserve inbounds attribute of transformed GEPs when flattening loops (#86961)
When flattening the loop, if the GEP was inbound, it should stay
inbound, because the only thing that changed is how the pointers are
calculated, not the elements being accessed.

Proof: https://alive2.llvm.org/ce/z/dApMpQ
2024-05-15 10:26:23 +09:00
Philip Reames
baca93fc83 [LSR] Tweak debug output to always print initial cost 2024-05-14 13:34:20 -07:00
Florian Hahn
67d840b60f
[VPlan] Relax over-aggressive assertion in VPTransformState::get().
There are cases where a vector value has some users that demand the
single scalar value only (NeedsScalar), while other users demand the
vector value (see attached test cases). In those cases, the NeedsScalar
users should only demand the first lane.

Fixes https://github.com/llvm/llvm-project/issues/91883.
2024-05-14 19:10:49 +01:00
Mingming Liu
6c8ebc0535
[NFC][CallPromotionUtils]Extract a helper function versionCallSiteWithCond from versionCallSite (#81181)
* This is to be used by https://github.com/llvm/llvm-project/pull/81378
to implement a variant of versionCallSite that compares vtables.
* The parent patch is https://github.com/llvm/llvm-project/pull/81051
2024-05-14 10:13:57 -07:00
Graham Hunter
2b15c4a62b
[AArch64] Postcommit fixes for histogram intrinsic (#92095)
A buildbot with expensive checks enabled flagged some problems with my patch. There was also a post-commit nit on the langref changes.
2024-05-14 15:16:42 +01:00
AdityaK
bf7a0f9958
Fix incorrect codegen with respect to GEPs #85333 (#92047)
As mentioned in #68882 and
https://discourse.llvm.org/t/rfc-replacing-getelementptr-with-ptradd/68699

GEP arithmetic isn't consistent across different types. GVNSink didn't
realize this and sank all GEPs as long as their operands could be wired
via PHIs in a post-dominator.

Fixes: #85333
Reapply: #88440 after fixing the non-determinism issues in #90995
2024-05-14 06:13:11 -07:00
Florian Hahn
b1e99a699d
[LV] Drop redundant comment from createEdgeMask (NFC).
Follow-up to remove a redundant comment post-commit
https://github.com/llvm/llvm-project/pull/91897
2024-05-14 12:43:47 +01:00
Ramkumar Ramachandra
d7ef34bfe3
[LV] update comment following 63d8058 (NFC) (#91120)
Address a review comment post landing 63d8058 (LoopVectorize: guard
appending InstsToScalarize; fix bug) to update a comment.
2024-05-14 10:59:26 +01:00
Florian Hahn
632317e9ab
[VPlan] Add non-poison propagating LogicalAnd VPInstruction opcode. (#91897)
Add a new opcode to model non-poison propagating logical AND operations
used when generating edge masks. This follows the similar decision to
model Not as a dedicated opcode as well, to improve clarity.

This also helps to simplify the matchers for
https://github.com/llvm/llvm-project/pull/89386.


PR: https://github.com/llvm/llvm-project/pull/91897
2024-05-14 09:42:49 +01:00
Lei Wang
5b6f151104
[SampleFDO] Improve stale profile matching by diff algorithm (#87375)
This change improves the matching algorithm by using a diff algorithm.
The current matching algorithm only processes the callsites grouped by
same-name functions; it doesn't consider the ordering between functions
with different names, so it sometimes fails to handle the ambiguous
anchor case. For example (`foo:1` means a callsite
[callee_name:callsite_location]):
```
IR :      foo:1  bar:2  foo:4  bar:5 
Profile :        bar:3  foo:5  bar:6
```
The `foo:1` is matched to the 2nd `foo:5`, and using the diff
algorithm (finding the longest common subsequence) can help with this issue.
One well-known diff algorithm is the Myers diff algorithm (paper "An
O(ND) Difference Algorithm and Its Variations", Eugene W. Myers). Its
variations have been implemented and used in many well-known tools, such as
GNU diff and git diff. It provides an efficient way to find the longest
common subsequence or the shortest edit script through graph searching.
There are several variations/refinements of the algorithm, but in our
case the number of function callsites is usually very small, so we
implemented the basic greedy version in this change, which should be good
enough.
We observed better matchings and positive perf improvement on our
internal services.
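
To make the anchor example concrete, here is a minimal sketch of matching callsites along a longest common subsequence of callee names. It uses the textbook O(N*M) dynamic-programming table rather than the greedy Myers variant the patch implements, and `Anchor` and `matchAnchors` are hypothetical names, not LLVM APIs.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// One callsite anchor: callee name plus its location.
struct Anchor {
  std::string Callee;
  int Loc;
};

// Returns (IR location, profile location) pairs along one longest common
// subsequence of callee names.
std::vector<std::pair<int, int>> matchAnchors(const std::vector<Anchor> &IR,
                                              const std::vector<Anchor> &Prof) {
  size_t N = IR.size(), M = Prof.size();
  // Len[I][J] is the LCS length of the suffixes IR[I..] and Prof[J..].
  std::vector<std::vector<int>> Len(N + 1, std::vector<int>(M + 1, 0));
  for (size_t I = N; I-- > 0;)
    for (size_t J = M; J-- > 0;)
      Len[I][J] = IR[I].Callee == Prof[J].Callee
                      ? Len[I + 1][J + 1] + 1
                      : std::max(Len[I + 1][J], Len[I][J + 1]);

  // Walk the table from the front, emitting a pair whenever names agree.
  std::vector<std::pair<int, int>> Matches;
  size_t I = 0, J = 0;
  while (I < N && J < M) {
    if (IR[I].Callee == Prof[J].Callee) {
      Matches.push_back({IR[I].Loc, Prof[J].Loc});
      ++I;
      ++J;
    } else if (Len[I + 1][J] >= Len[I][J + 1]) {
      ++I; // skipping the IR anchor keeps the LCS at least as long
    } else {
      ++J; // otherwise skip the profile anchor
    }
  }
  return Matches;
}
```

On the IR and profile rows above, this sketch leaves `foo:1` unmatched and pairs `bar:2` with `bar:3`, `foo:4` with `foo:5`, and `bar:5` with `bar:6`.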
2024-05-13 16:01:29 -07:00
Florian Hahn
e122380445
[LV] Use VPBuilder to create Select (NFCI). 2024-05-13 20:44:39 +01:00
chenlin
79643565a8
[LoopUnroll] Remove redundant debug instructions after blocks have been merged (#91246)
Remove redundant debug instructions after blocks have been merged into
the predecessor. This can reduce compile time in some cases.

This change only fixes the situation of loop unrolling; other
situations are not considered. "RemoveRedundantDbgInstrs" seems to be
very time-consuming, so we only call it here after the "Dest" has been
merged into the "Fold", which may be a more targeted solution.

fixes: https://github.com/llvm/llvm-project/issues/89073
2024-05-13 09:42:04 -07:00
Paul Kirth
89a080cb79 [llvm][NFC] Document cl::opt MisExpectTolerance and fix typo
Pull Request: https://github.com/llvm/llvm-project/pull/90670
2024-05-13 16:19:09 +00:00
Matt Arsenault
8823abea6f InstCombine: Simplify vector initialization 2024-05-13 13:59:45 +02:00
Orlando Cazalet-Hyams
91d7ca904c
[DebugInfo] Remap extracted DIAssignIDs in hotcoldsplit (#91940)
Fix #91814

When instructions are extracted into a new function the `DIAssignID` metadata
uses and attachments need to be remapped so that the stores and assignment
markers don't link to stores and assignment markers in the original function.

This matches existing inlining behaviour for DIAssignIDs.
2024-05-13 12:49:42 +01:00
Matt Arsenault
c5b0da9d83
InstCombine: Preserve inbounds in PointerReplacer (#91735)
This avoids spurious test changes in a future commit.
2024-05-13 13:49:09 +02:00
Graham Hunter
fbb37e9606
[AArch64] Add an all-in-one histogram intrinsic
Based on discussion from
https://discourse.llvm.org/t/rfc-vectorization-support-for-histogram-count-operations/74788

Current interface is:

llvm.experimental.histogram(<vecty> ptrs, <intty> inc_amount, <vecty> mask)

The integer type used by 'inc_amount' needs to match the type of the buckets in memory.

The intrinsic covers the following operations:
  * Gather load
  * histogram on the elements of 'ptrs'
  * multiply the histogram results by 'inc_amount'
  * add the result of the multiply to the values loaded by the gather
  * scatter store the results of the add

Supports lowering to histcnt instructions for AArch64 targets, and scalarization for all others at present.
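
The listed operations can be read as the following scalar model. This is a sketch under assumptions, not the scalarization the patch emits: `histogramScalar` is a hypothetical name, buckets are assumed to be 32-bit, and lanes are processed in order.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// For every active lane, add IncAmount to the bucket that lane points at.
// Lanes that alias the same bucket each contribute once, which matches
// "count occurrences, multiply by inc_amount, add to the loaded value".
void histogramScalar(const std::vector<uint32_t *> &Ptrs, uint32_t IncAmount,
                     const std::vector<bool> &Mask) {
  assert(Ptrs.size() == Mask.size() && "one mask bit per lane");
  for (size_t Lane = 0; Lane < Ptrs.size(); ++Lane)
    if (Mask[Lane])
      *Ptrs[Lane] += IncAmount;
}
```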
2024-05-13 11:35:28 +01:00
Yingwei Zheng
b5f4210e9f
[InstCombine] Drop nuw flag when CtlzOp is a sub nuw (#91776)
See the following case:
```
define i32 @src1(i32 %x) {
  %dec = sub nuw i32 -2, %x
  %ctlz = tail call i32 @llvm.ctlz.i32(i32 %dec, i1 false)
  %sub = sub nsw i32 32, %ctlz
  %shl = shl i32 1, %sub
  %ugt = icmp ult i32 %x, -2
  %sel = select i1 %ugt, i32 %shl, i32 1
  ret i32 %sel
}

define i32 @tgt1(i32 %x) {
  %dec = sub nuw i32 -2, %x
  %ctlz = tail call i32 @llvm.ctlz.i32(i32 %dec, i1 false)
  %sub = sub nsw i32 32, %ctlz
  %and = and i32 %sub, 31
  %shl = shl nuw i32 1, %and
  ret i32 %shl
}
```
`nuw` in `%dec` should be dropped after the select instruction is
eliminated.

Alive2: https://alive2.llvm.org/ce/z/7S9529

Fixes https://github.com/llvm/llvm-project/issues/91691.
2024-05-13 14:27:59 +08:00
Kazu Hirata
e6785fd752 [Scalar] Fix a warning
This patch fixes:

  llvm/lib/Transforms/Scalar/GVNSink.cpp:270:33: error: lambda capture
  'this' is not used [-Werror,-Wunused-lambda-capture]

While I am at it, this patch replaces llvm::for_each with a
range-based for loop.
2024-05-12 23:02:37 -07:00
AdityaK
abe3c5ac19
[GVNSink] Fix non-determinisms by using a deterministic ordering (#90995)
GVNSink used to order instructions based on their pointer values and was
prone to non-determinism because of that.
This patch ensures all the values stored are using a deterministic
order. I have also added a verifier (`ModelledPHI::verifyModelledPHI`) to
assert when ordering isn't preserved.
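
The general idea of replacing pointer-value ordering with a stable key can be sketched as below. This is illustrative only: `Inst`, its `Id` field and `sortDeterministically` are hypothetical and do not reflect the ordering key GVNSink actually uses.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical stand-in for an IR instruction with a stable creation index.
struct Inst {
  unsigned Id;
};

// Sort by the stable Id rather than by address: addresses can differ from
// run to run, while Ids assigned in creation order do not.
void sortDeterministically(std::vector<const Inst *> &Insts) {
  std::sort(Insts.begin(), Insts.end(),
            [](const Inst *A, const Inst *B) { return A->Id < B->Id; });
}
```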

Additionally, I have added a test case (mirror graph image of an
existing test) that would have failed before this patch.

Fixes: #77852
2024-05-12 19:41:54 -07:00
David Green
b7ed097f29
[VectorCombine] Add intrinsics handling to shuffleToIdentity (#91000)
This is probably the most involved addition, as it tries to make use of
isTriviallyVectorizable with isVectorIntrinsicWithScalarOpAtArg to handle a
number of different intrinsics that are all lane-wise. Additional tests have
been added for some of the different intrinsics from
isVectorIntrinsicWithScalarOpAtArg / isVectorIntrinsicWithOverloadTypeAtArg.
2024-05-12 20:31:11 +01:00
Shan Huang
cdd782183d
[DebugInfo][LICM] Fix missing debug location updates (#91729) 2024-05-11 16:26:04 +01:00
Shan Huang
3773191fc4
[DebugInfo][JumpThreading] Fix missing debug location updates (#91581) 2024-05-11 16:10:00 +01:00
Alex Bradbury
3be8e2c95d
[InstCombine] Prefer to keep power-of-2 constants when combining ashr exact and slt/ult of a constant (#86111)
We have flexibility in what constant to use when combining an `ashr
exact` with a slt or ult of a constant, and it's not possible to revisit
this decision later in the compilation pipeline after the `ashr exact`
is removed. Keeping a constant close to power-of-2 (pow2val + 1) should
be no worse than neutral, and in some cases may allow better codegen
later on for targets that can more cheaply generate power of 2 (which
may be selectable if converting back to setle/setge) or near power of 2
constants.

Alive2 proofs:
<https://alive2.llvm.org/ce/z/2BmPnq> and
<https://alive2.llvm.org/ce/z/DtuhnR>
2024-05-10 13:50:03 +01:00
Florian Hahn
28767afd53
[LAA] Support backward dependences with non-constant distance. (#91525)
Following up to 933f49248, also update the code reasoning about
backwards dependences to support non-constant distances.

Update the code to use the signed minimum distance instead of a constant
distance.

This means we checked the lower bound of the dependence distance and the
distance may be larger at runtime (and safe for vectorization). Whether
to classify it as Unknown or Backwards depends on the vector width and
LAA was updated to take TTI to get the maximum vector register width.

If the minimum dependence distance is larger than the max vector width,
we consider it as backwards-vectorizable. Otherwise we classify them as
Unknown, so we re-try with runtime checks.
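
The classification rule in the paragraph above can be sketched as follows. All names here are hypothetical, both quantities are assumed to be expressed in the same unit, and the real LAA logic involves further checks that this sketch omits.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical two-way result; LAA's real dependence kinds are richer.
enum class DepKind { BackwardVectorizable, Unknown };

// Classify a backward dependence from the signed minimum distance and the
// widest vector register the target reports (same unit for both, assumed).
DepKind classifyBackward(int64_t MinDistance, uint64_t MaxVectorWidth) {
  if (MinDistance > 0 && static_cast<uint64_t>(MinDistance) > MaxVectorWidth)
    return DepKind::BackwardVectorizable;
  return DepKind::Unknown; // retried later with runtime checks
}
```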

PR: https://github.com/llvm/llvm-project/pull/91525
2024-05-10 11:47:13 +01:00
Graham Hunter
2e8d815596
[TTI] Support scalable offsets in getScalingFactorCost (#88113)
Part of the work to support vscale-relative immediates in LSR.
2024-05-10 11:22:11 +01:00