13651 Commits

Author SHA1 Message Date
Kazu Hirata
890c4bece2
[memprof] Use SmallVector for InlinedCallStack (NFC) (#114599)
We can stay within 8 inlined elements more than 99% of the time while
building a large application.
2024-11-01 19:52:11 -07:00
Min-Yih Hsu
64314dedeb
[InlineCost] Print inline cost for invoke call sites as well (#114476)
Previously InlineCostAnnotationPrinter only prints inline cost for call
instructions. I don't think there is any reason not to analyze invoke
and its callee, and this patch adds such support.
2024-11-01 09:55:17 -07:00
Yingwei Zheng
a77dedcacb
[InstSimplify][InstCombine][ConstantFold] Move vector div/rem by zero fold to InstCombine (#114280)
Previously we fold `div/rem X, C` into `poison` if any element of the
constant divisor `C` is zero or undef. However, it is incorrect when
threading udiv over an vector select:
https://alive2.llvm.org/ce/z/3Ninx5
```
define <2 x i32> @vec_select_udiv_poison(<2 x i1> %x) {
  %sel = select <2 x i1> %x, <2 x i32> <i32 -1, i32 -1>, <2 x i32> <i32 0, i32 1>
  %div = udiv <2 x i32> <i32 42, i32 -7>, %sel
  ret <2 x i32> %div
}
```
In this case, `threadBinOpOverSelect` folds `udiv <i32 42, i32 -7>, <i32
-1, i32 -1>` and `udiv <i32 42, i32 -7>, <i32 0, i32 1>` into
`zeroinitializer` and `poison`, respectively. One solution is to
introduce a new flag indicating that we are threading over a vector
select. But it requires to modify both `InstSimplify` and
`ConstantFold`.

However, this optimization doesn't provide benefits to real-world
programs:

https://dtcxzyw.github.io/llvm-opt-benchmark/coverage/data/zyw/opt-ci/actions-runner/_work/llvm-opt-benchmark/llvm-opt-benchmark/llvm/llvm-project/llvm/lib/IR/ConstantFold.cpp.html#L908

https://dtcxzyw.github.io/llvm-opt-benchmark/coverage/data/zyw/opt-ci/actions-runner/_work/llvm-opt-benchmark/llvm-opt-benchmark/llvm/llvm-project/llvm/lib/Analysis/InstructionSimplify.cpp.html#L1107

This patch moves the fold into InstCombine to avoid breaking numerous
existing tests.

Fixes #114191 and #113866 (only poison-safety issue).
2024-11-01 22:56:22 +08:00
David Green
0f919444ad
[ValueTracking] Handle recursive phis in knownFPClass (#114008)
As a follow-on to 113686, this breaks the recursion between phi nodes
that have p1 = phi(x, p2) and p2 = phi(y, p1). The knownFPClass can be
calculated from the classes of p1 and p2.
2024-11-01 13:38:29 +00:00
c8ef
cf0b6cc711
Revert "[ConstantFold] Fold tgamma and tgammaf when the input parameter is a constant value." (#114496)
Reverts llvm/llvm-project#114065
2024-11-01 09:26:11 +08:00
c8ef
1f07f995cc
[ConstantFold] Fold tgamma and tgammaf when the input parameter is a constant value. (#114065)
This patch adds support for constant folding for the `tgamma` and
`tgammaf` libc functions.
2024-11-01 09:07:55 +08:00
Manish Kausik H
0856592f6f
Ensure collectTransitivePredecessors returns Pred only from the Loop. (#113831)
It's possible that we encounter Irreducible control flow, due to which,
we may find that a few predecessors of BB are not a part of the CurLoop.
Currently we crash in the function for such cases. This patch ensures
that we only return Predecessors that are a part of CurLoop and
gracefully ignore other Predecessors.

For example, consider Irreducible IR of this form:
```
define i64 @baz() {
bb:
  br label %bb1

bb1:                                              ; preds = %bb3, %bb
  br label %bb3

bb2:                                              ; No predecessors!
  br label %bb3

bb3:                                              ; preds = %bb2, %bb1
  %load = load ptr addrspace(1), ptr addrspace(1) null, align 8
  br label %bb1
}
```

This crashes when `collectTransitivePredecessors` is called on the
`%bb1<Header>, %bb3<latch>` loop, because the loop body has a
predecessor `%bb2` which is not a part of the loop.

See https://godbolt.org/z/E9fM1q3cT for the crash
2024-10-31 11:08:15 -07:00
Kenji Mouri / 毛利 研二
7e877fc0ac
[Reland][TLI] Add support for hypot libcall. (#114343)
This patch adds basic support for `hypot`. Constant folding support will
be submitted in a subsequent patch.

Related issue: https://github.com/llvm/llvm-project/issues/113711

Note: It's my first time contributing to the LLVM with encouragement
from one of my friends, @fawdlstty. I learned a lot from
https://github.com/llvm/llvm-project/pull/99611, and thanks for that.

Note: I had created the same PR and merged
(https://github.com/llvm/llvm-project/pull/113724), but reverted caused
by the merging issue. (The CI issue happened in 3 A.M. at my timezone.
So, I need to fall asleep again after I replied about why issue
happened.) So, I rebased to the latest main branch and recreate the PR
and hope I won't have the third time to create the same PR.

I hope @arsenm can help me review the code again. I’m sorry for that.

Kenji Mouri
2024-10-31 07:50:29 -07:00
David Green
9735c05186
[ValueTracking] Compute KnownFP state from recursive select/phi. (#113686)
Given a recursive phi with select:
 %p = phi [ 0, entry ], [ %sel, loop]
 %sel = select %c, %other, %p

The fp state can be calculated using the knowledge that the select/phi
pair can only be the initial state (0 here) or from %other. This adds a
short-cut into computeKnownFPClass for PHI to detect that the select is
recursive back to the phi, and if so use the state from the other
operand.

This helps to address a regression from #83200.
2024-10-31 07:50:44 +00:00
gulfemsavrun
36d5692570
Revert "[TLI] Add support for hypot libcall." (#114312)
Reverts llvm/llvm-project#113724
2024-10-30 15:10:29 -07:00
Kenji Mouri / 毛利 研二
feb2d867fa
[TLI] Add support for hypot libcall. (#113724)
This patch adds basic support for `hypot`. Constant folding support will
be submitted in a subsequent patch.

Related issue: https://github.com/llvm/llvm-project/issues/113711

Note: It's my first time contributing to the LLVM with encouragement
from one of my friends, @fawdlstty. I learned a lot from
https://github.com/llvm/llvm-project/pull/99611, and thanks for that.

Kenji Mouri
2024-10-30 10:34:32 -07:00
Fangrui Song
318bdd0aeb
[StackSafetyAnalysis] Bail out when calling ifunc
An assertion failure arises when a call instruction calls a GlobalIFunc.
Since we cannot reason about the underlying function, just bail out.

Fix #87923

Pull Request: https://github.com/llvm/llvm-project/pull/113841
2024-10-29 09:26:47 -07:00
Rohit Aggarwal
dfb60bb919
Adding more vector calls for -fveclib=AMDLIBM (#109662)
AMD has it's own implementation of vector calls.
New vector calls are introduced in the library for exp10, log10, sincos and finite asin/acos
Please refer [https://github.com/amd/aocl-libm-ose]

---------

Co-authored-by: Rohit Aggarwal <Rohit.Aggarwal@amd.com>
2024-10-29 10:09:55 +00:00
Kyungwoo Lee
0dd9fdcf83
[StructuralHash] Support Differences (#112638)
This computes a structural hash while allowing for selective ignoring of
certain operands based on a custom function that is provided. Instead of
a single hash value, it now returns FunctionHashInfo which includes a
hash value, an instruction mapping, and a map to track the operand
location and its corresponding hash value that is ignored.

Depends on https://github.com/llvm/llvm-project/pull/112621.
This is a patch for
https://discourse.llvm.org/t/rfc-global-function-merging/82608.
2024-10-26 20:02:05 -07:00
Nashe Mncube
e37d736def
Recommit: [llvm][ARM][GlobalOpt]Add widen global arrays pass (#113289)
This is a recommit of #107120 . The original PR was approved but failed
buildbot. The newly added tests should only be run for compilers that
support the ARM target. This has been resolved by adding a config file
for these tests.

- Pass optimizes memcpy's by padding out destinations and sources to a
  full word to make ARM backend generate full word loads instead of
  loading a single byte (ldrb) and/or half word (ldrh). Only pads
  destination when it's a stack allocated constant size array and source
  when it's constant string. Heuristic to decide whether to pad or not
  is very basic and could be improved to allow more examples to be
  padded.
- Pass works at the midend level
2024-10-24 10:12:01 +01:00
Thomas Fransham
b8fddca7bd
[llvm] Support llvm::Any across shared libraries on windows (#108051)
This is part of the effort to support for enabling plugins on windows by
adding better support for building llvm as a DLL. The export macros used
here were added in #96630

Since shared library symbols aren't deduplicated across multiple
libraries on windows like Linux we have to manually explicitly import
and export `Any::TypeId` template instantiations for the uses of
`llvm::Any` in the LLVM codebase to support LLVM Windows shared library
builds.
This change ensures that external code, including LLVM's own tests, can
use PassManager callbacks when LLVM is built as a DLL.

I also removed the only use of llvm::Any for LoopNest that only existed
in debug code and there also doesn't seem to be any code creating
`Any<LoopNest>`
2024-10-24 08:07:13 +03:00
Ramkumar Ramachandra
d897ea37db
LAA: check nusw on GEP in place of inbounds (#112223)
With the introduction of the nusw flag in GEPNoWrapFlags, it should be
safe to weaken the check in LoopAccessAnalysis to just check the nusw
flag on the GEP, instead of inbounds.
2024-10-22 09:58:54 +01:00
Ramkumar Ramachandra
f719cfa868
LAA: be less conservative in isNoWrap (#112553)
isNoWrap has exactly one caller which handles Assume = true separately,
but too conservatively. Instead, pass Assume to isNoWrap, so it is
threaded into getPtrStride, which has the correct handling for the
Assume flag. Also note that the Stride == 1 check in isNoWrap is
incorrect: getPtrStride returns Strides == 1 or -1, except when
isNoWrapAddRec or Assume are true, assuming ShouldCheckWrap is true; we
can include the case of -1 Stride, and when isNoWrapAddRec is true. With
this change, passing Assume = true to getPtrStride could return a
non-unit stride, and we correctly handle that case as well.
2024-10-22 09:55:51 +01:00
c8ef
b90ea5caad
[ConstantFold] Fold erf and erff when the input parameter is a constant value. (#113079)
This patch adds support for constant folding for the `erf` and `erff`
libc functions.
2024-10-22 12:58:11 +08:00
Farzon Lotfi
dcbf2c2ca0
[Scalarizer][DirectX] support structs return types (#111569)
Based on this RFC:
https://discourse.llvm.org/t/rfc-allow-the-scalarizer-pass-to-scalarize-vectors-returned-in-structs/82306

LLVM intrinsics do not support out params. To get around this limitation
implementers will make intrinsics return structs to capture a return
type and an out param. This implementation detail should not impact
scalarization since these cases should be elementwise operations.

## Three changes are needed. 
- The CallInst visitor needs to be updated to handle Structs
- A new visitor is needed for `ExtractValue` instructions
- finsh needs to be update to handle structs so that insert elements are
properly propogated.

## Testing changes
- Add support for `llvm.frexp`
- Add support for `llvm.dx.splitdouble`

fixes https://github.com/llvm/llvm-project/issues/111437
2024-10-21 12:51:01 -04:00
Nikita Popov
a18dd29077 [ConstantFolding] Set signed/implicitTrunc when handling GEP offsets
GEP offsets have sext_or_trunc semantics. We were already doing
this for the outer-most GEP, but not for the inner ones.

I believe one of the sanitizer buildbot failures was due to this,
but I did not manage to reproduce the issue or come up with a
test case. Usually the problematic case will already be folded
away due to index type canonicalization.
2024-10-21 12:47:02 +02:00
Fawdlstty
20bda93e43
[TLI] Add basic support for scalbnxx (#112936)
This patch adds basic support for `scalbln, scalblnf, scalblnl, scalbn,
scalbnf, scalbnl`. Constant folding support will be submitted in a
subsequent patch.

Related issue: <#112631>
2024-10-20 14:17:15 -07:00
c8ef
1336e3d0b9
[ConstantFold] Fold ilogb and ilogbf when the input parameter is a constant value. (#113014)
This patch adds support for constant folding for the `ilogb` and
`ilogbf` libc functions.
2024-10-20 10:46:35 +08:00
Teresa Johnson
5995e4b97b
[MemProf] Disable memprof ICP support by default (#112940)
A failure showed up after this was committed, rather than revert simply
disable this new support to simplify investigation and further testing.
2024-10-18 10:40:27 -07:00
Teresa Johnson
6264288d70
[MemProf] Fix the option to disable memprof ICP (#112917)
The -enable-memprof-indirect-call-support meant to guard the recently
added memprof ICP support was not used in enough places. Specifically,
it was not checked in mayHaveMemprofSummary, which is called from the
ThinLTO backend applyImports. This led to failures when checking the
callsite records, as we incorrectly expected records for indirect calls.

Fix the option to be checked in all necessary locations, and add
testing.
2024-10-18 10:12:23 -07:00
Mohammed Keyvanzadeh
721b796809
[llvm] prefer isa_and_nonnull over v && isa (#112541)
Use `isa_and_nonnull<T>(v)` instead of `v && isa<T>(v)`, where `v` is
evaluated twice in the latter.
2024-10-18 19:12:04 +03:30
Yingwei Zheng
c89d731c5d
[LVI] Infer non-zero from equality icmp (#112838)
This following pattern is common in loop headers:
```
  %101 = sub nuw i64 %78, %98
  %103 = icmp eq i64 %78, %98
  br i1 %103, label %.thread.i.i, label %.preheader.preheader.i.i

.preheader.preheader.i.i:
  %invariant.umin.i.i = call i64 @llvm.umin.i64(i64 %101, i64 9)
  %umax.i = call i64 @llvm.umax.i64(i64 %invariant.umin.i.i, i64 1)
  br label %.preheader.i.i

.preheader.i.i:
  ...
  %116 = add nuw nsw i64 %.011.i.i, 1
  %exitcond.not.i = icmp eq i64 %116, %umax.i
  br i1 %exitcond.not.i, label %.critedge.i.i, label %.preheader.i.i
```
As `%78` is not equal to `%98` in BB `.preheader.preheader.i.i`, we can
prove `%101` is non-zero. Then we can simplify the loop exit condition.

Addresses regression introduced by
https://github.com/llvm/llvm-project/pull/112742.
2024-10-18 21:19:02 +08:00
c8ef
761fa5844e
[TLI] Add support for the ilogb libcall. (#112725)
This patch adds the `ilogb` libcall. Constant folding will be handled in
subsequent patches.
2024-10-18 14:20:34 +08:00
Kazu Hirata
b47849b4cb
[SCEV] Avoid repeated hash lookups (NFC) (#112656) 2024-10-17 07:46:32 -07:00
Nashe Mncube
370fd74361
Revert "[llvm][ARM]Add widen global arrays pass" (#112701)
Reverts llvm/llvm-project#107120 

Unexpected build failures in post-commit pipelines. Needs investigation
2024-10-17 13:38:01 +01:00
Nashe Mncube
ab90d2793c
[llvm][ARM]Add widen global arrays pass (#107120)
- Pass optimizes memcpy's by padding out destinations and sources to a
full word to make backend generate full word loads instead of loading a
single byte (ldrb) and/or half word (ldrh). Only pads destination when
it's a stack allocated constant size array and source when it's constant
array. Heuristic to decide whether to pad or not is very basic and could
be improved to allow more examples to be padded.
- Pass works within GlobalOpt but is disabled by default on all targets
except ARM.
2024-10-17 11:56:00 +01:00
Nikita Popov
255a99c29f
[APInt] Fix APInt constructions where value does not fit bitwidth (NFCI) (#80309)
This fixes all the places that hit the new assertion added in
https://github.com/llvm/llvm-project/pull/106524 in tests. That is,
cases where the value passed to the APInt constructor is not an N-bit
signed/unsigned integer, where N is the bit width and signedness is
determined by the isSigned flag.

The fixes either set the correct value for isSigned, set the
implicitTrunc flag, or perform more calculations inside APInt.

Note that the assertion is currently still disabled by default, so this
patch is mostly NFC.
2024-10-17 08:48:08 +02:00
Yingwei Zheng
aad3a1630e
[ValueTracking] Respect samesign flag in isKnownInversion (#112390)
In https://github.com/llvm/llvm-project/pull/93591 we introduced
`isKnownInversion` and assumes `X` is poison implies `Y` is poison
because they share common operands. But after introducing `samesign`
this assumption no longer hold if `X` is an icmp has `samesign` flag.

Alive2 link: https://alive2.llvm.org/ce/z/rj3EwQ (Please run it locally
with this patch and https://github.com/AliveToolkit/alive2/pull/1098).

This approach is the most conservative way in my mind to address this
problem. If `X` has `samesign` flag, it will check if `Y` also has this
flag and make sure constant RHS operands have the same sign.

Fixes https://github.com/llvm/llvm-project/issues/112350.
2024-10-17 00:27:21 +08:00
Rahul Joshi
6924fc0326
[LLVM] Add Intrinsic::getDeclarationIfExists (#112428)
Add `Intrinsic::getDeclarationIfExists` to lookup an existing
declaration of an intrinsic in a `Module`.
2024-10-16 07:21:10 -07:00
Amr Hesham
4ba1800be6
[LLVM][NFC] Reduce copying of parameter in lambda (#110299)
Reduce redundant copy parameter in lambda

Fixes #95642
2024-10-16 09:55:01 +01:00
Alexey Bader
583fa4f5b7
[InstCombine] Extend fcmp+select folding to minnum/maxnum intrinsics (#112088)
Today, InstCombine can fold fcmp+select patterns to minnum/maxnum
intrinsics when the nnan and nsz flags are set. The ordering of the
operands in both the fcmp and select instructions is important for the
folding to occur.

maxnum patterns:
1. (a op b) ? a : b -> maxnum(a, b), where op is one of {ogt, oge}
2. (a op b) ? b : a -> maxnum(a, b), where op is one of {ule, ult}

The second pattern is supposed to make the order of the operands in the
select instruction irrelevant. However, the pattern matching code uses
the CmpInst::getInversePredicate method to invert the comparison
predicate. This method doesn't take into account the fast-math flags,
which can lead missing the folding opportunity.

The patch extends the pattern matching code to handle unordered fcmp
instructions. This allows the folding to occur even when the select
instruction has the operands in the inverse order.

New maxnum patterns:
1. (a op b) ? a : b -> maxnum(a, b), where op is one of {ugt, uge}
2. (a op b) ? b : a -> maxnum(a, b), where op is one of {ole, olt}

The same changes are applied to the minnum intrinsic.
2024-10-15 22:05:16 +04:00
c8ef
47a6da2d4d
[ConstantFold] Fold log1p and log1pf when the input parameter is a constant value. (#112113)
This patch adds support for constant folding for the `log1p` and
`log1pf` libc functions.
2024-10-16 00:19:26 +08:00
Ramkumar Ramachandra
1c6c850937
InstCombine: extend select-equiv to support vectors (#111966)
foldSelectEquivalence currently doesn't support GVN-like replacements on
vector types. Put in the checks for potentially lane-crossing
operations, and lift the limitation.
2024-10-15 11:10:45 +01:00
Alexey Bataev
f9bc00e4bb
[SLP]Initial support for interleaved loads
Adds initial support for interleaved loads, which allows
emission of segmented loads for RISCV RVV.

Vectorizes extra code for RISCV
CFP2006/447.dealII, CFP2006/453.povray,
CFP2017rate/510.parest_r, CFP2017rate/511.povray_r,
CFP2017rate/526.blender_r, CFP2017rate/538.imagick_r, CINT2006/403.gcc,
CINT2006/473.astar, CINT2017rate/502.gcc_r, CINT2017rate/525.x264_r

Reviewers: RKSimon, preames

Reviewed By: preames

Pull Request: https://github.com/llvm/llvm-project/pull/112042
2024-10-14 09:12:33 -04:00
Ramkumar Ramachandra
bdf241cab3
ValueTracking: handle more ops in isNotCrossLaneOperation (#112183)
Reuse llvm::isTriviallyVectorizable in llvm::isNotCrossLaneOperation, in
order to get it to handle more intrinsics.

Alive2 proofs for changed tests: https://alive2.llvm.org/ce/z/XSV_GT
2024-10-14 14:08:12 +01:00
Florian Hahn
7f06d8afb0
[SCEV] Retain SCEVSequentialMinMaxExpr if an operand may trigger UB. (#110824)
Retain SCEVSequentialMinMaxExpr if an operand may trigger UB, e.g. if
there is an UDiv operand that may divide by 0 or poison

PR: https://github.com/llvm/llvm-project/pull/110824
2024-10-14 13:08:49 +01:00
Ramkumar Ramachandra
c5f82f7893
ValueTracking: introduce llvm::isNotCrossLaneOperation (#112011)
Factor out and unify common code from InstSimplify and InstCombine that
partially guard against cross-lane vector operations into
llvm::isNotCrossLaneOperation in ValueTracking.

Alive2 proofs for changed tests: https://alive2.llvm.org/ce/z/68H4ka
2024-10-14 11:37:30 +01:00
Kazu Hirata
c9a1cffd3d
[Analysis] Simplify code with DenseMap::operator[] (NFC) (#112082) 2024-10-12 08:04:38 -07:00
Tim Renouf
76007138f4
[LLVM] New NoDivergenceSource function attribute (#111832)
A call to a function that has this attribute is not a source of
divergence, as used by UniformityAnalysis. That allows a front-end to
use known-name calls as an instruction extension mechanism (e.g.
https://github.com/GPUOpen-Drivers/llvm-dialects ) without such a call
being a source of divergence.
2024-10-12 09:34:45 +01:00
Teresa Johnson
1de71652fd
[MemProf] Support cloning for indirect calls with ThinLTO (#110625)
This patch enables support for cloning in indirect callsites.

This is done by synthesizing callsite records for each virtual call
target from the profile metadata. In the thin link all the synthesized
records for a particular indirect callsite initially share the same
context node, but support is added to partition the callsites and
outgoing edges based on the callee function, creating a separate node
for each target.

In the LTO backend, when cloning is needed we first perform indirect
call promotion, then change the target of the new direct call to the
desired clone.

Note this is ThinLTO-specific, since for regular LTO indirect call
promotion should have already occurred.
2024-10-11 13:53:35 -07:00
Shilei Tian
e34e27f198
[TTI][AMDGPU] Allow targets to adjust LastCallToStaticBonus via getInliningLastCallToStaticBonus (#111311)
Currently we will not be able to inline a large function even if it only
has one live use because the inline cost is still very high after
applying `LastCallToStaticBonus`, which is a constant. This could
significantly impact the performance because CSR spill is very
expensive.

This PR adds a new function `getInliningLastCallToStaticBonus` to TTI to
allow targets to customize this value.

Fixes SWDEV-471398.
2024-10-11 10:19:54 -04:00
David Sherwood
72f339de45
[LoopVectorize] Use predicated version of getSmallConstantMaxTripCount (#109928)
There are a number of places where we call getSmallConstantMaxTripCount
without passing a vector of predicates:

getSmallBestKnownTC
isIndvarOverflowCheckKnownFalse
computeMaxVF
isMoreProfitable

I've changed all of these to now pass in a predicate vector so that
we get the benefit of making better vectorisation choices when we
know the max trip count for loops that require SCEV predicate checks.

I've tried to add tests that cover all the cases affected by these
changes.
2024-10-11 10:10:15 +01:00
c8ef
923566a67d
[ConstantFold] Fold logb and logbf when the input parameter is a constant value. (#111232)
This patch adds support for constant folding for the `logb` and `logbf`
libc functions.
2024-10-10 07:56:16 +08:00
Jeffrey Byrnes
853c43d04a
[TTI] NFC: Port TLI.shouldSinkOperands to TTI (#110564)
Porting to TTI provides direct access to the instruction cost model,
which can enable instruction cost based sinking without introducing code
duplication.
2024-10-09 14:30:09 -07:00
Paul Walker
87cdc8328d
[LLVM][ConstFolds] Verify a scalar src before attempting scalar->vector bitcast transformation. (#111149)
It was previously safe to assume isa<Constant{Int,FP}> meant a scalar
value. This is not true when use-constant-##-for-###-splat are enabled.
2024-10-08 13:28:44 +01:00