30252 Commits

Author SHA1 Message Date
Florian Hahn
17bad1a9da
[LV] Bail out on header phis in shouldConsiderInvariant.
This fixes an infinite recursion in rare cases.

Fixes https://github.com/llvm/llvm-project/issues/113794.
2024-11-01 20:51:25 +00:00
Han-Kuan Chen
a795a18bba
[SLP][REVEC] VF should be scaled when ScalarTy is FixedVectorType. (#114551) 2024-11-02 03:03:52 +08:00
Simon Pilgrim
718d50d6d0 [VectorCombine] foldPermuteOfBinops - prefer the new fold for matching costs.
Minor tweak to #114101 - as we're reducing the instruction count, we should prefer the fold if the old/new costs are the same.
2024-11-01 17:28:37 +00:00
Min-Yih Hsu
64314dedeb
[InlineCost] Print inline cost for invoke call sites as well (#114476)
Previously InlineCostAnnotationPrinter only prints inline cost for call
instructions. I don't think there is any reason not to analyze invoke
and its callee, and this patch adds such support.
2024-11-01 09:55:17 -07:00
Yingwei Zheng
a77dedcacb
[InstSimplify][InstCombine][ConstantFold] Move vector div/rem by zero fold to InstCombine (#114280)
Previously we fold `div/rem X, C` into `poison` if any element of the
constant divisor `C` is zero or undef. However, it is incorrect when
threading udiv over a vector select:
https://alive2.llvm.org/ce/z/3Ninx5
```
define <2 x i32> @vec_select_udiv_poison(<2 x i1> %x) {
  %sel = select <2 x i1> %x, <2 x i32> <i32 -1, i32 -1>, <2 x i32> <i32 0, i32 1>
  %div = udiv <2 x i32> <i32 42, i32 -7>, %sel
  ret <2 x i32> %div
}
```
In this case, `threadBinOpOverSelect` folds `udiv <i32 42, i32 -7>, <i32
-1, i32 -1>` and `udiv <i32 42, i32 -7>, <i32 0, i32 1>` into
`zeroinitializer` and `poison`, respectively. One solution is to
introduce a new flag indicating that we are threading over a vector
select. But that requires modifying both `InstSimplify` and
`ConstantFold`.
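
The miscompile described above can be reproduced with a small model. This is a sketch only, not LLVM code: `Lane`, `Vec2`, `Mask2`, `udivOldFold`, `selectLanes`, `directResult` and `threadedResult` are hypothetical names, and `std::nullopt` stands in for a poison lane. It assumes the old fold turns the whole result into poison as soon as any divisor lane is zero.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>

// Hypothetical model: std::nullopt represents a poison lane.
using Lane = std::optional<uint32_t>;
using Vec2 = std::array<Lane, 2>;
using Mask2 = std::array<bool, 2>;

// Models the old fold: any zero (or poison) divisor lane poisons every lane.
Vec2 udivOldFold(const Vec2& num, const Vec2& den) {
  for (const Lane& d : den)
    if (!d || *d == 0)
      return Vec2{};  // both lanes poison
  Vec2 out{};
  for (std::size_t i = 0; i < 2; ++i)
    if (num[i])
      out[i] = *num[i] / *den[i];
  return out;
}

// Lane-wise select, as for `select <2 x i1>`.
Vec2 selectLanes(const Mask2& c, const Vec2& t, const Vec2& f) {
  Vec2 out{};
  for (std::size_t i = 0; i < 2; ++i)
    out[i] = c[i] ? t[i] : f[i];
  return out;
}

// Operands from the IR example: <42, -7>, <-1, -1>, <0, 1>.
const Vec2 kNum{{Lane(42u), Lane(0xFFFFFFF9u)}};
const Vec2 kTrueArm{{Lane(0xFFFFFFFFu), Lane(0xFFFFFFFFu)}};
const Vec2 kFalseArm{{Lane(0u), Lane(1u)}};

// Select first, then divide (the original program).
Vec2 directResult(const Mask2& c) {
  return udivOldFold(kNum, selectLanes(c, kTrueArm, kFalseArm));
}

// Divide each arm first, then select (threading over the select).
Vec2 threadedResult(const Mask2& c) {
  return selectLanes(c, udivOldFold(kNum, kTrueArm),
                     udivOldFold(kNum, kFalseArm));
}
```

With `%x = <true, false>` the selected divisor is `<-1, 1>`, so the direct form yields a defined lane 1, while the threaded form picks lane 1 from the all-poison false arm.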

However, this optimization doesn't provide benefits to real-world
programs:

https://dtcxzyw.github.io/llvm-opt-benchmark/coverage/data/zyw/opt-ci/actions-runner/_work/llvm-opt-benchmark/llvm-opt-benchmark/llvm/llvm-project/llvm/lib/IR/ConstantFold.cpp.html#L908

https://dtcxzyw.github.io/llvm-opt-benchmark/coverage/data/zyw/opt-ci/actions-runner/_work/llvm-opt-benchmark/llvm-opt-benchmark/llvm/llvm-project/llvm/lib/Analysis/InstructionSimplify.cpp.html#L1107

This patch moves the fold into InstCombine to avoid breaking numerous
existing tests.

Fixes #114191 and #113866 (only poison-safety issue).
2024-11-01 22:56:22 +08:00
Yingwei Zheng
e577f14b67
[InstCombine] Use m_NotForbidPoison when folding (X u< Y) ? -1 : (~X + Y) --> uadd.sat(~X, Y) (#114345)
Alive2: https://alive2.llvm.org/ce/z/mTGCo-
We cannot reuse `~X` if `m_AllOnes` matches a vector constant with some
poison elts. An alternative solution is to create a new not instead of
reusing `~X`. But it isn't worth the effort because we would need to add a
one-use check.
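
For reference, the scalar, poison-free form of this fold can be checked exhaustively at 8 bits. This is a sketch under that assumption: it does not model the vector poison-element case the patch is about, and `uaddSat8`, `selectForm` and `uaddSatFoldHoldsForAllI8` are hypothetical helper names.

```cpp
#include <cassert>
#include <cstdint>

// 8-bit unsigned saturating add, mirroring what llvm.uadd.sat.i8 computes.
uint8_t uaddSat8(uint8_t a, uint8_t b) {
  unsigned sum = static_cast<unsigned>(a) + static_cast<unsigned>(b);
  return sum > 0xFFu ? static_cast<uint8_t>(0xFF) : static_cast<uint8_t>(sum);
}

// The select form before the fold: (X u< Y) ? -1 : (~X + Y).
uint8_t selectForm(uint8_t x, uint8_t y) {
  uint8_t notX = static_cast<uint8_t>(~x);
  return x < y ? static_cast<uint8_t>(0xFF)
               : static_cast<uint8_t>(notX + y);
}

// Compares both forms for every pair of 8-bit inputs.
bool uaddSatFoldHoldsForAllI8() {
  for (unsigned x = 0; x < 256; ++x) {
    for (unsigned y = 0; y < 256; ++y) {
      uint8_t x8 = static_cast<uint8_t>(x);
      uint8_t y8 = static_cast<uint8_t>(y);
      uint8_t notX = static_cast<uint8_t>(~x8);
      if (selectForm(x8, y8) != uaddSat8(notX, y8))
        return false;
    }
  }
  return true;
}
```

The two forms agree because `~X + Y` overflows exactly when `Y u> X`, which is the case the select maps to all-ones.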

Fixes https://github.com/llvm/llvm-project/issues/113869.
2024-11-01 22:18:44 +08:00
David Green
0f919444ad
[ValueTracking] Handle recursive phis in knownFPClass (#114008)
As a follow-on to #113686, this breaks the recursion between phi nodes
that have p1 = phi(x, p2) and p2 = phi(y, p1). The knownFPClass can be
calculated from the classes of p1 and p2.
2024-11-01 13:38:29 +00:00
Han-Kuan Chen
e4aeeba84c
[SLP][REVEC] When ScalarTy is FixedVectorType, the insertion index should consider the number of elements of ScalarTy. (#114526) 2024-11-01 21:17:57 +08:00
Nuno Lopes
344d972736 AssumeBundleBuilder: switch placeholder from undef to poison [NFC] 2024-11-01 10:12:10 +00:00
Yingwei Zheng
f16bff1261
[GVN][NewGVN][Local] Handle attributes for function calls after CSE (#114011)
This patch intersects attributes of two calls to avoid introducing UB.
It also skips incompatible call pairs in GVN/NewGVN. However, I cannot
provide negative tests for these changes.

Fixes https://github.com/llvm/llvm-project/issues/113997.
2024-11-01 12:44:33 +08:00
Lei Wang
bef3b54ea1
[InstrPGO] Avoid using global variable to fix potential data race (#114364)
https://github.com/llvm/llvm-project/pull/109837 set a global
variable (`PGOInstrumentColdFunctionOnly`) in PassBuilderPipelines.cpp,
which introduced a data race detected by TSan. To fix this, I decoupled
the flag setting; the flags are now set separately
(`instrument-cold-function-only-path` is required to be used
with `--pgo-instrument-cold-function-only`).
2024-10-31 21:28:13 -07:00
Yingwei Zheng
96b14f2ccb
[Reland][InstCombine] Fix FMF propagation in foldSelectIntoOp (#114499)
Relands #114356. Compared to the last version, this patch only merges
poison-generating/nsz flags from the select to fix LV regression in
`llvm/test/Transforms/PhaseOrdering/AArch64/predicated-reduction.ll`.
2024-11-01 12:22:57 +08:00
c8ef
cf0b6cc711
Revert "[ConstantFold] Fold tgamma and tgammaf when the input parameter is a constant value." (#114496)
Reverts llvm/llvm-project#114065
2024-11-01 09:26:11 +08:00
c8ef
1f07f995cc
[ConstantFold] Fold tgamma and tgammaf when the input parameter is a constant value. (#114065)
This patch adds support for constant folding for the `tgamma` and
`tgammaf` libc functions.
2024-11-01 09:07:55 +08:00
Ruiling, Song
54d31bde32
Reapply "StructurizeCFG: Optimize phi insertion during ssa reconstruction (#101301)" (#114347)
This reverts commit be40c723ce2b7bf2690d22039d74d21b2bd5b7cf.
2024-11-01 08:29:59 +08:00
Florian Hahn
b021464d35
[VPlan] Introduce scalar loop header in plan, remove VPLiveOut. (#109975)
Update VPlan to include the scalar loop header. This allows retiring
VPLiveOut, as the remaining live-outs can now be handled by adding
operands to the wrapped phis in the scalar loop header.

Note that the current version only includes the scalar loop header, no
other loop blocks and also does not wrap it in a region block.

PR: https://github.com/llvm/llvm-project/pull/109975
2024-10-31 21:36:44 +01:00
Igor Kudrin
454abad7b0
[CFI][LowerTypeTests] Fix indirect call with alias (#113987)
This is a fixed version of #106185, which was reverted in #113978 due to
a buildbot failure.

Motivating example:
```
> cat test.cpp
extern "C" [[gnu::weak]] void f() {}
void alias() __attribute__((alias("f")));
int main() { auto p = alias; p(); }
> clang test.cpp -fsanitize=cfi-icall -flto=thin -fuse-ld=lld
> ./a.out
[1]    1868 illegal hardware instruction  ./a.out
```

If the address of a function was only taken through its alias, the
function was not considered exported and therefore was not included in
the CFI jumptable. This resulted in `@llvm.type.test()` being lowered to
`false`, and consequently the indirect call to the function was
eventually optimized to `ubsantrap()`.
2024-10-31 13:29:07 -07:00
gulfemsavrun
d183dc7c24
Revert "[InstCombine] Fix FMF propagation in foldSelectIntoOp" (#114458)
Reverts llvm/llvm-project#114356 because it caused test failures.
https://lab.llvm.org/buildbot/#/builders/190/builds/8601

https://luci-milo.appspot.com/ui/p/fuchsia/builders/toolchain.ci/clang-base-linux-x64/b8732549597609293617/overview
2024-10-31 13:21:52 -07:00
Matt Arsenault
9cc298108a
AtomicExpand: Copy metadata from atomicrmw to cmpxchg (#109409)
When expanding an atomicrmw with a cmpxchg, preserve any metadata
attached to it. This will avoid unwanted double expansions
in a future commit.

The initial load should also probably receive the same metadata
(which for some reason is not emitted as an atomic).
2024-10-31 11:54:07 -07:00
Matt Arsenault
e3222e6f80
AMDGPU: Add baseline tests for cmpxchg custom expansion (#109408)
We need a non-atomic path if flat may access private.
2024-10-31 11:46:13 -07:00
Alexey Bataev
e05def081e
[SLP]Do not vectorize code in EH and non-returning blocks
The code in EH and non-returning blocks can be skipped by the
vectorizer, since it does not add to the performance and just consumes
compile/link time.

Reviewers: RKSimon

Reviewed By: RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/112221
2024-10-31 13:50:02 -04:00
Alexey Bataev
19a34dded7
[SLP]Do not account external uses in EH block and in non-returning blocks
No need to account for the cost of the external uses in EH and
non-returning basic blocks.

Reviewers: RKSimon

Reviewed By: RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/112045
2024-10-31 13:23:43 -04:00
Alexey Bataev
e7080fd735 [SLP]Extra check if the instruction marked for removal, must be replaced in reduction ops
If the instruction is vectorized and it is a part of the reduced values
gather/buildvector node, it should be replaced in reduced operation
instructions before removal properly, to avoid compiler crash.

Fixes #114371
2024-10-31 09:59:35 -07:00
Artem Belevich
8129b6b53b
[NVPTX, InstCombine] instcombine known pointer AS checks. (#114325)
The change improves the code in general and, as a side effect, avoids
crashing on impossible address space casts guarded
by `__isGlobal/__isShared`, which partially fixes
https://github.com/llvm/llvm-project/issues/112760

It's still possible to trigger the issue by using explicit AS casts w/o
AS checks, but LLVM should no longer crash on valid code.

This is #112964 plus a small fix for the crash on unintended argument
access, which was the reason the earlier version of the patch was reverted.
2024-10-31 09:24:51 -07:00
Yingwei Zheng
cf1963afad
[InstCombine] Fix FMF propagation in foldSelectIntoOp (#114356)
Closes https://github.com/llvm/llvm-project/issues/113423.
2024-10-31 23:26:45 +08:00
Matt Arsenault
1d0370872f
AMDGPU: Expand flat atomics that may access private memory (#109407)
If the runtime flat address resolves to a scratch address,
64-bit atomics do not work correctly. Insert a runtime address
space check (which is quite likely to be uniform) and select between
the non-atomic and real atomic cases.

Consider noalias.addrspace metadata and avoid this expansion when
possible (we also need to consider it to avoid infinitely expanding
after adding the predication code).
2024-10-31 08:08:48 -07:00
Kenji Mouri / 毛利 研二
7e877fc0ac
[Reland][TLI] Add support for hypot libcall. (#114343)
This patch adds basic support for `hypot`. Constant folding support will
be submitted in a subsequent patch.

Related issue: https://github.com/llvm/llvm-project/issues/113711

Note: It's my first time contributing to LLVM, with encouragement
from one of my friends, @fawdlstty. I learned a lot from
https://github.com/llvm/llvm-project/pull/99611, and thanks for that.

Note: I created and merged the same PR
(https://github.com/llvm/llvm-project/pull/113724), but it was reverted
because of an issue at merge time. (The CI issue happened at 3 A.M. in my
timezone, so I had to go back to sleep after replying about why the issue
happened.) So I rebased onto the latest main branch and recreated the PR,
and I hope I won't have to create the same PR a third time.

I hope @arsenm can help me review the code again. I’m sorry for that.

Kenji Mouri
2024-10-31 07:50:29 -07:00
goldsteinn
1e072ae289
[CGP] [CodeGenPrepare] Folding urem with loop invariant value plus offset (#104724)
This extends the existing fold:

```
for(i = Start; i < End; ++i)
   Rem = (i nuw+- IncrLoopInvariant) u% RemAmtLoopInvariant;
```
 ->
```
Rem = (Start nuw+- IncrLoopInvariant) % RemAmtLoopInvariant;
for(i = Start; i < End; ++i, ++Rem)
   Rem = Rem == RemAmtLoopInvariant ? 0 : Rem;
```

To work with a non-zero `IncrLoopInvariant`.

This is a common usage in cases such as:

```
for(i = 0; i < N; ++i)
    if ((i + 1) % X == 0)
        do_something_occasionally_but_not_first_iter();
```

Alive2 w/ i4/unrolled 6x (needs to be run locally due to timeout):
https://alive2.llvm.org/ce/z/6tgyN3

Exhaustive proof over all uint8_t combinations in C++:
https://godbolt.org/z/WYa561388
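
A minimal local sketch of the same idea, covering only the add (`nuw+`) case over small inputs, might look like the following. It is not the godbolt proof linked above, and `foldedMatchesDirect` and `checkAllSmallInputs` are hypothetical names. It assumes `remAmt != 0` and that `i + incr` never wraps.

```cpp
#include <cassert>
#include <cstdint>

// Returns true when the incrementally updated remainder equals the direct
// (i + incr) % remAmt on every iteration of [start, end).
// Assumes remAmt != 0 and that i + incr does not wrap (the nuw precondition).
bool foldedMatchesDirect(uint32_t start, uint32_t end, uint32_t incr,
                         uint32_t remAmt) {
  if (start >= end)
    return true;  // loop body never runs
  // Computed once in the preheader.
  uint32_t rem = (start + incr) % remAmt;
  for (uint32_t i = start; i < end; ++i) {
    uint32_t direct = (i + incr) % remAmt;
    if (rem != direct)
      return false;
    // Step the remainder without a urem: increment, then wrap to zero.
    ++rem;
    if (rem == remAmt)
      rem = 0;
  }
  return true;
}

// Checks every combination with all parameters below `limit`.
bool checkAllSmallInputs(uint32_t limit) {
  for (uint32_t start = 0; start < limit; ++start)
    for (uint32_t len = 0; len < limit; ++len)
      for (uint32_t incr = 0; incr < limit; ++incr)
        for (uint32_t remAmt = 1; remAmt < limit; ++remAmt)
          if (!foldedMatchesDirect(start, start + len, incr, remAmt))
            return false;
  return true;
}
```

Calling `checkAllSmallInputs(16)` exercises about 60 thousand loop configurations, including non-zero `incr`.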
2024-10-31 09:14:33 -05:00
Hari Limaye
b396921d0c
[SCCP] Handle llvm.vscale intrinsic calls (#114033)
Teach SCCP to compute a constant range for calls to llvm.vscale
intrinsics.
2024-10-31 12:22:15 +00:00
Simon Pilgrim
92af82a48d
[VectorCombine] Fold "shuffle (binop (shuffle, shuffle)), undef" --> "binop (shuffle), (shuffle)" (#114101)
Add foldPermuteOfBinops - to fold a permute (single source shuffle) through a binary op that is being fed by other shuffles.

Fixes #94546
Fixes #49736
2024-10-31 10:58:09 +00:00
Dmitry Chernenkov
d924a9ba03 Revert "[InstrPGO] Support cold function coverage instrumentation (#109837)"
This reverts commit e517cfc531886bf6ed64b4e7109bb3141ac7f430.
2024-10-31 10:55:17 +00:00
Ami-zhang
1897bf61f0
[LoongArch] Enable FeatureExtLSX for generic-la64 processor (#113421)
This commit makes the `generic` target support FP and LSX, as
discussed in #110211. Thereby, it allows 128-bit vectors to be enabled by
default in the loongarch64 backend.
2024-10-31 15:58:15 +08:00
David Green
9735c05186
[ValueTracking] Compute KnownFP state from recursive select/phi. (#113686)
Given a recursive phi with select:
 %p = phi [ 0, entry ], [ %sel, loop]
 %sel = select %c, %other, %p

The fp state can be calculated using the knowledge that the select/phi
pair can only be the initial state (0 here) or from %other. This adds a
short-cut into computeKnownFPClass for PHI to detect that the select is
recursive back to the phi, and if so use the state from the other
operand.
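
The reasoning can be illustrated with a small simulation. This is a sketch of the recurrence only, not of `computeKnownFPClass`, and `runSelectPhi` and `neverNaN` are hypothetical names: every value the phi takes is either the initial value or one of the `%other` values, so a property such as "never NaN" carries over from those two sources.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Simulates
//   %p   = phi [ init, entry ], [ %sel, loop ]
//   %sel = select %c, %other, %p
// and records the value of %p at each loop header plus the final value.
// conds[i] and others[i] are the %c and %other values of iteration i;
// both vectors are assumed to have the same length.
std::vector<double> runSelectPhi(double init, const std::vector<bool>& conds,
                                 const std::vector<double>& others) {
  std::vector<double> seen;
  double p = init;
  for (std::size_t i = 0; i < conds.size(); ++i) {
    seen.push_back(p);
    double sel = conds[i] ? others[i] : p;  // the select
    p = sel;                                // the backedge into the phi
  }
  seen.push_back(p);
  return seen;
}

// True if no recorded value is NaN.
bool neverNaN(const std::vector<double>& values) {
  for (double v : values)
    if (std::isnan(v))
      return false;
  return true;
}
```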

This helps to address a regression from #83200.
2024-10-31 07:50:44 +00:00
Paul Kirth
b01e2a8b56
[llvm] Allow always dropping all llvm.type.test sequences
Currently, the `DropTypeTests` parameter only fully works with phi nodes
and llvm.assume instructions. However, we'd like CFI to work in
conjunction with FatLTO, in so far as the bitcode section should be able
to contain the CFI instrumentation, while any incompatible bits are
dropped when compiling the object code.

To do that, we need to drop the llvm.type.test instructions everywhere,
and not just their uses in phi nodes. This patch updates the
LowerTypeTest pass so that uses are removed, and replaced with `true` in
all cases, and not just in phi nodes.

Addressing this will allow us to fix #112053 by modifying the FatLTO
pipeline.

Reviewers: pcc, nikic

Reviewed By: pcc

Pull Request: https://github.com/llvm/llvm-project/pull/112787
2024-10-30 16:56:30 -07:00
Artem Belevich
04e876e6c6
Revert "[NVPTX] instcombine known pointer AS checks." (#114319)
Reverts llvm/llvm-project#112964

Crashes MLIR: https://lab.llvm.org/buildbot/#/builders/138/builds/5665
2024-10-30 15:34:08 -07:00
Artem Belevich
1cecc58c3f
[NVPTX] instcombine known pointer AS checks. (#112964)
The change improves the code in general and, as a side effect, avoids crashing
on impossible address space casts guarded by `__isGlobal/__isShared`, which
partially fixes https://github.com/llvm/llvm-project/issues/112760
It's still possible to trigger the issue by using explicit AS casts w/o
AS checks, but LLVM should no longer crash on valid code.
2024-10-30 15:13:06 -07:00
gulfemsavrun
36d5692570
Revert "[TLI] Add support for hypot libcall." (#114312)
Reverts llvm/llvm-project#113724
2024-10-30 15:10:29 -07:00
Luke Lau
14045de250
[RISCV] Account for factor in interleave memory op costs (#111511)
Currently we cost an interleaved memory op as if it were a load/store of
the widened vector type, but this was undercosting in all cases when
compared to the measured performance of today's hardware.

On the x280 at NF=2 and spacemit-x60 at NF=2,3 and 4, a segmented load
is carried out as a wide load and NF LMUL shuffle ops:
https://github.com/preames/bp3-microarch#vlseg_lmul_x_sew_throughput

All other NFs go through a slow path. On the spacemit-x60 this is
proportional to VLMAX * NF, and on the x280 proportional to the number
of segments.

This patch increases the cost by implementing a wide load + NF LMUL
shuffle op cost for the lowest common denominator NF=2, and then a
slower cost proportional to VL for the other NFs.

In a follow up patch we can add a tuning flag to use the faster cost
model for NF=3 and 4 on the spacemit-x60.

Note that the FIXME about illegal vectors seems to have been fixed in
#100436
2024-10-31 05:36:46 +08:00
Kenji Mouri / 毛利 研二
feb2d867fa
[TLI] Add support for hypot libcall. (#113724)
This patch adds basic support for `hypot`. Constant folding support will
be submitted in a subsequent patch.

Related issue: https://github.com/llvm/llvm-project/issues/113711

Note: It's my first time contributing to LLVM, with encouragement
from one of my friends, @fawdlstty. I learned a lot from
https://github.com/llvm/llvm-project/pull/99611, and thanks for that.

Kenji Mouri
2024-10-30 10:34:32 -07:00
Steven Perron
f405c683ba
[OPT] Search whole BB for convergence token. (#112728)
The spec for llvm.experimental.convergence.entry says that it must be in
the entry block for a function, and must precede any other convergent
operation. It does not have to be the first instruction in the entry
block.

Inlining assumes that the call to llvm.experimental.convergence.entry
will be the first instruction after any phi instructions. This commit
modifies inlining to search the entire block for the call.
2024-10-30 11:19:23 -04:00
David Sherwood
7f498a865f
[CostModel][LoopVectorize] Move some loop vectoriser tests (#113702)
Many tests that were in test/Analysis/CostModel were actually
loop vectoriser tests. I've moved them as follows:

Analysis/CostModel/X86 -> Transforms/LoopVectorize/X86/CostModel
Analysis/CostModel/AArch64/arith-fp-frem.ll ->
  Transforms/LoopVectorize/AArch64/arith-fp-frem-costs.ll
2024-10-30 13:50:02 +00:00
Simon Pilgrim
80c8ecd565 [VectorCombine] Add baseline "shuffle (binop (shuffle, shuffle)), undef" tests for #114101 2024-10-30 13:42:58 +00:00
Simon Pilgrim
bc999ee57a [PhaseOrdering][X86] Add test coverage for #94546 2024-10-30 11:55:04 +00:00
Simon Pilgrim
2de1fc8286 [PhaseOrdering][X86] Add additional test coverage for #49736
I've kept the old PR50392 tag since this is such an old issue....
2024-10-30 11:10:48 +00:00
David Green
f358422268 [Attributor] Add nofpclass test for phi+select recurrences. NFC 2024-10-30 08:10:35 +00:00
Teresa Johnson
bb3915149a
[MemProf] Support for random hotness when writing profile (#113998)
Add support for generating random hotness in the memprof profile writer,
to be used for testing. The random seed is printed to stderr, and an
additional option enables providing a specific seed in order to
reproduce a particular random profile.
2024-10-29 22:10:33 -07:00
Matthias Braun
255e441613
X86: Do not return invalid cost for fp16 conversion (#114128)
Returning invalid instruction costs when converting from/to fp16 in
`X86TTIImpl::getCastInstrCost` when there is no hardware support
available was triggering asserts. This changes the code to return a
large (arbitrary) number to model the fact that libcalls are used to
implement the conversion.

This also simplifies the code by only reporting costs for the scalar
fp16 conversion; vectorized costs being left to the fallback assuming
scalarization.

This is a follow-up to assertion issues reported for the changes in
#113195
2024-10-29 17:16:17 -07:00
Hari Limaye
e19a5fc6d3
[FuncSpec] Improve accounting of specialization codesize growth (#113448)
Only accumulate the codesize increase of functions that are actually
specialized, rather than for every candidate specialization that we
analyse.

This fixes a subtle bug where prior analysis of candidate
specializations that were deemed unprofitable could prevent subsequent
profitable candidates from being recognised.
2024-10-29 11:53:12 +00:00
Hari Limaye
06664fdc76
[FuncSpec] Enable SpecializeLiteralConstant by default (#113442)
Enable specialization on literal constant arguments by default in
Function Specialization.

---------

Co-authored-by: Alexandros Lamprineas <alexandros.lamprineas@arm.com>
2024-10-29 11:41:25 +00:00
Rohit Aggarwal
dfb60bb919
Adding more vector calls for -fveclib=AMDLIBM (#109662)
AMD has its own implementation of vector calls.
New vector calls are introduced in the library for exp10, log10, sincos and finite asin/acos.
Please refer to https://github.com/amd/aocl-libm-ose

---------

Co-authored-by: Rohit Aggarwal <Rohit.Aggarwal@amd.com>
2024-10-29 10:09:55 +00:00