26330 Commits

Author SHA1 Message Date
Aleksandr Popov
bca5501869 [IRCE] Add NSW flag to main loop's indvar base
We have guarantees that the induction variable will not overflow in the
main loop after the loop is constrained. Therefore we can add no-wrap
flags on its base so as not to lose the information that the loop is
countable.

Add only the NSW flag for now, since adding the NUW flag requires a
somewhat more complicated analysis.

Reviewed By: skatkov

Differential Revision: https://reviews.llvm.org/D154954
2023-07-17 01:03:52 +02:00
Nuno Lopes
68f1391a62 [ScalarizeMaskedMemIntrin] Use poison instead of undef as placeholder [NFC]
This is used for masked-out lanes, which are replaced with the passthrough value.
2023-07-17 10:11:14 +01:00
ManuelJBrito
ace9b6bbf5 [NewGVN] Canonicalize expressions for commutative intrinsics
Ensure that commutative intrinsics that only differ by a permutation
of their operands get the same value number by sorting the operand value
numbers.

Fixes https://github.com/llvm/llvm-project/issues/46753

Differential Revision: https://reviews.llvm.org/D155309
2023-07-16 17:24:17 +01:00
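The canonicalization described in ace9b6bbf5 can be illustrated with a toy value-numbering table. This is only a sketch of the idea (sort the operand value numbers of commutative calls before building the lookup key), not NewGVN's actual data structures; the `ValueTable` class and the `COMMUTATIVE` set are hypothetical names introduced for illustration.

```python
from typing import Dict, Tuple

# Hypothetical set of commutative intrinsic names, for illustration only.
COMMUTATIVE = {"smax", "smin", "umax", "umin"}


class ValueTable:
    """Toy value-numbering table; not NewGVN's real implementation."""

    def __init__(self) -> None:
        self._numbers: Dict[Tuple, int] = {}

    def number_of(self, key: Tuple) -> int:
        # Hand out the next free number the first time a key is seen.
        return self._numbers.setdefault(key, len(self._numbers))

    def leaf(self, name: str) -> int:
        return self.number_of(("leaf", name))

    def call(self, intrinsic: str, *operands: int) -> int:
        ops = tuple(operands)
        if intrinsic in COMMUTATIVE:
            # Canonicalize: sort the operand value numbers so that
            # f(a, b) and f(b, a) map to the same key.
            ops = tuple(sorted(ops))
        return self.number_of((intrinsic, ops))


table = ValueTable()
a, b = table.leaf("a"), table.leaf("b")

# Commutative call: both operand orders share one value number.
same = table.call("umax", a, b) == table.call("umax", b, a)
# Non-commutative call: operand order still matters.
different = table.call("usub.sat", a, b) == table.call("usub.sat", b, a)
```

Here `same` evaluates to `True` and `different` to `False`, which is the behaviour the commit message asks for.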
Maksim Kita
da822ce90e [InstCombine] Generalise ((x1 ^ y1) | (x2 ^ y2)) == 0 transform
Generalise the ((x1 ^ y1) | (x2 ^ y2)) == 0 transform to more than two pairs of variables (https://github.com/llvm/llvm-project/issues/57831).
Depends D154384.

Reviewed By: goldstein.w.n, nikic

Differential Revision: https://reviews.llvm.org/D154306
2023-07-15 16:57:16 -05:00
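The identity behind da822ce90e can be checked exhaustively on a small bit width. The sketch below assumes the target form is a conjunction of pairwise equalities, (x1 == y1) && (x2 == y2) && ..., and uses 2-bit values so that every input combination is covered; the helper names are made up for this example.

```python
from itertools import product

BITS = 2  # small width so the check is exhaustive
VALUES = range(1 << BITS)


def or_of_xors_is_zero(pairs):
    # ((x1 ^ y1) | (x2 ^ y2) | ...) == 0
    acc = 0
    for x, y in pairs:
        acc |= x ^ y
    return acc == 0


def all_pairs_equal(pairs):
    # (x1 == y1) && (x2 == y2) && ...
    return all(x == y for x, y in pairs)


def identity_holds(num_pairs):
    for flat in product(VALUES, repeat=2 * num_pairs):
        pairs = list(zip(flat[0::2], flat[1::2]))
        if or_of_xors_is_zero(pairs) != all_pairs_equal(pairs):
            return False
    return True


# Check the original two-pair case and the generalised three-pair case.
results = {n: identity_holds(n) for n in (2, 3)}
```

An exhaustive check at 2 bits is not a proof for wider types, but the argument is bit-wise, so the same reasoning carries over.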
Maksim Kita
39f0afde98 [InstCombine] Generalise ((x1 ^ y1) | (x2 ^ y2)) == 0 transform tests
Precommit tests for D154306.

Differential Revision: https://reviews.llvm.org/D154384
2023-07-15 16:57:16 -05:00
zhongyunde
4d2723bd00 [ValueTracking] Support vscale assumes for isKnownToBeAPowerOfTwo
This patch is separated from D154953, as requested in a review comment,
to see which tests are affected by this change alone.
Depends on the related LangRef update in D155193.

Reviewed By: paulwalker-arm, nikic, david-arm
Differential Revision: https://reviews.llvm.org/D155350
2023-07-15 19:42:58 +08:00
zhongyunde
a41e7a2a5d [tests] precommit tests for D155350
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D155363
2023-07-15 19:36:37 +08:00
khei4
b02d349cbf Revert "Revert "[MemCpyOpt] implement single BB stack-move optimization which unify the static unescaped allocas""
This reverts commit 36a6eb7d12a9f827bf3d5d4e5fdc68b8a62807b2.

[MemCpyOpt] check that load/store and dest/src alloca are all in the same bb

Differential Revision: https://reviews.llvm.org/D153453
Co-authored-by: serge-sans-paille <sguelton@mozilla.com>
2023-07-15 16:27:38 +09:00
khei4
a92e197114 [MemCpyOpt] precommit tests to add multi-BB stack-move optimization to check crash for D153453 (NFC)
Differential Revision: https://reviews.llvm.org/D155179
Co-authored-by: serge-sans-paille <sguelton@mozilla.com>
2023-07-15 16:27:38 +09:00
Zhongyunde
7203286329 [LangRef] vscale_range implies the vscale is power-of-two
According to the discussion on D154953, we need to make the LangRef change
before any optimization relies on the new behaviour:
  vscale_range implies that vscale is a power-of-two value; parsing of the
  attribute rejects values that are not a power of two.

Thanks to nikic for the wonderful summary of the discussion on D154953:
To provide a bit more context here. We would like to have power of two vscale exposed in a target-independent way, so we can make use of this in places like ValueTracking, just like we currently do the vscale range. Some options that have been discussed are:
  - Remove support for non-power-of-two vscales entirely. (This is my personal preference, but this is hard to undo if it turns out someone does need them.)
  - Add an extra attribute vscale_pow2, or a data layout property.
  - Make vscale_range imply power-of-two vscale, as a compromise solution (what this patch does). This would be relatively easy to turn into one of the two above at a later point.

Reviewed By: paulwalker-arm, nikic, efriedma
Differential Revision: https://reviews.llvm.org/D155193
2023-07-15 09:13:48 +08:00
Johannes Doerfert
232ce90541 [OpenMP][FIX] Adjust "known" attributes for runtime functions
This showed up when we started to deduce readnone for the argument of
__kmpc_global_thread_num. The known attributes for "getters" did not
allow reading arguments, but that is sometimes the case.
2023-07-14 17:01:48 -07:00
Johannes Doerfert
55544518c6 [Attributor] Allow IR-attr deduction for non-IPO amendable functions
If the function is non-IPO amendable we do skip most attributes/AAs.
However, if an AA has a isImpliedByIR that can deduce the attribute from
other attributes, we can run those. For now, we manually enable them;
if we have more later, we can use some automation/flag.
2023-07-14 13:54:04 -07:00
Johannes Doerfert
4dc5662c27 [Attributor][NFC] Update all tests with the script
Three tests needed manual adjustment after
https://reviews.llvm.org/D148216 got reverted. See
https://github.com/llvm/llvm-project/issues/63746.
2023-07-14 13:53:38 -07:00
Anna Thomas
dfaf4587e4 Precommit follow-up testcase for interleaved miscompile
Follow-up testcase for PR63602.

Suggested by Ayal in D154309; a more complete fix that should also
handle this testcase is coming up.
2023-07-14 16:04:56 -04:00
Nikita Popov
2bc7d02312 Revert "[InstSimplify] Make simplifyWithOpReplaced() recursive (PR63104)"
This is very likely the cause of a stage 2 failure in
Transforms/LoopVectorize/check-prof-info.ll. Revert until I can
investigate this.

This reverts commit 3d199d086e076f0b9b90d4c59f2226a417a639b5.
2023-07-14 18:33:39 +02:00
Nikita Popov
3d199d086e [InstSimplify] Make simplifyWithOpReplaced() recursive (PR63104)
Support replacement of operands not only in the immediate
instruction, but also instructions it uses.

For the most part, this extension is straightforward, but there are
two bits worth highlighting:

First, we can now no longer assume that if the Op is a vector, the
instruction also returns a vector. If Op is a vector and the
instruction returns a scalar, we should consider it as a cross-lane
operation.

Second, for the x ^ x special case, we can no longer assume that
the operand is RepOp, as we might have a replacement higher up the
instruction chain.

There is one optimization regression, but it is in a fuzzer-generated
test case.

Fixes https://github.com/llvm/llvm-project/issues/63104.
2023-07-14 16:33:40 +02:00
Jay Foad
70eafa391b [InstCombine] Regenerate AMDGPU test checks
2023-07-14 15:28:55 +01:00
Alexey Bataev
8ab962e411 [SLP]Relax assertion to check if the input scalars were extended to
match the size of base node (PR63668).

Need to adjust the check for the assert and take into account the case where the
original scalars are reused and were extended to match the vector factor
of the reused SLP node.
2023-07-14 07:19:49 -07:00
Nikita Popov
547544112b [InstSimplify] Allow gep inbounds x, 0 -> x in non-refining op replacement
After the semantics change from https://reviews.llvm.org/D154051,
gep inbounds x, 0 can no longer produce poison. As such, we can
also perform this fold during non-refining operand replacement
and avoid unnecessary drops of the inbounds flag.

The online alive2 version has not been updated to the new
semantics yet, but we can use the following proof locally:

    define ptr @src(ptr %base, i64 %offset) {
      %cmp = icmp eq i64 %offset, 0
      %gep = getelementptr inbounds i8, ptr %base, i64 %offset
      %sel = select i1 %cmp, ptr %base, ptr %gep
      ret ptr %sel
    }

    define ptr @tgt(ptr %base, i64 %offset) {
      %gep = getelementptr inbounds i8, ptr %base, i64 %offset
      ret ptr %gep
    }
2023-07-14 16:14:50 +02:00
Nikita Popov
91b84811ab [InstSimplify] Add tests for recursive simplify with op replaced (NFC)
2023-07-14 16:06:34 +02:00
Alexey Bataev
bc8abb42bb Revert "[SLP]Relax assertion to check if the input scalars were extended to"
This reverts commit 6fdfc81287ecdc2a7f409d08538ec6ce2bd698da to fix the
check in the assert (need to use the end, not the begin, function).
2023-07-14 07:04:06 -07:00
Alexey Bataev
6fdfc81287 [SLP]Relax assertion to check if the input scalars were extended to
match the size of base node (PR63668).

Need to adjust the check for the assert and take into account the case where the
original scalars are reused and were extended to match the vector factor
of the reused SLP node.
2023-07-14 06:48:25 -07:00
Nikita Popov
21827268ad [InstCombine] Fold add of zext and sext of i1
(zext a) + (sext a) is 0 if a is a bool.

The regression is in a fuzzer-generated test.

Proof: https://alive2.llvm.org/ce/z/KotnN6
2023-07-14 14:52:13 +02:00
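The fold in 21827268ad can be sanity-checked by modelling the two extensions of an i1 value in a fixed-width integer. The 32-bit destination width and the helper names below are assumptions chosen for illustration; the Alive2 link in the commit is the actual proof.

```python
WIDTH = 32                 # assumed destination width for the sketch
MASK = (1 << WIDTH) - 1    # wrap results to WIDTH bits


def zext_i1(a):
    # Zero extension: 0 -> 0, 1 -> 1.
    return a & 1


def sext_i1(a):
    # Sign extension: 0 -> 0, 1 -> all ones (i.e. -1).
    return MASK if a & 1 else 0


# (zext a) + (sext a), with wrap-around, for both possible i1 values.
sums = [(zext_i1(a) + sext_i1(a)) & MASK for a in (0, 1)]
```

Both entries of `sums` are 0: for a = 1 the sum is 1 + (-1), and for a = 0 both terms are already 0.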
Nikita Popov
893ad30d11 [InstCombine] Add test for add of zext and sext (NFC)
2023-07-14 14:52:13 +02:00
Nikita Popov
dc2b2ae7dc [InstCombine] Fold cttz of lowest set bit
cttz(-a & a) is the same as cttz(a). -a & a is an idiom to extract
the lowest set bit, which naturally does not affect the number of
trailing zeroes.

Proof: https://alive2.llvm.org/ce/z/Yp26x7
2023-07-14 14:31:35 +02:00
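The claim in dc2b2ae7dc, that cttz(-a & a) equals cttz(a), can be checked exhaustively for a small width. The sketch below uses 8-bit values and defines cttz(0) as the bit width; that convention and the helper names are assumptions made for the example.

```python
BITS = 8
MASK = (1 << BITS) - 1


def cttz(value, bits=BITS):
    """Count trailing zeros; returns `bits` for a zero input (assumed convention)."""
    value &= (1 << bits) - 1
    if value == 0:
        return bits
    count = 0
    while (value & 1) == 0:
        value >>= 1
        count += 1
    return count


def lowest_set_bit(a):
    # -a & a isolates the lowest set bit (0 stays 0).
    return (-a & a) & MASK


# Collect every 8-bit input where the two sides disagree.
mismatches = [a for a in range(1 << BITS)
              if cttz(lowest_set_bit(a)) != cttz(a)]
```

`mismatches` is empty: isolating the lowest set bit clears only the bits above it, so the run of trailing zeros is unchanged.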
Nikita Popov
c8bc1abf55 [InstCombine] Add tests for cttz of lowest set bit (NFC)
2023-07-14 14:31:35 +02:00
Jay Foad
9ff71814cb [EarlyCSE] Do not CSE convergent calls with memory effects
D149348 did this for readnone calls, which are handled by SimpleValue.
This patch does the same for all other CSEable calls, which are handled
by CallValue.

Differential Revision: https://reviews.llvm.org/D153151
2023-07-14 11:43:41 +01:00
Jay Foad
c2f8fe7cd8 [EarlyCSE] Precommit test for D153151
Differential Revision: https://reviews.llvm.org/D155210
2023-07-14 11:43:41 +01:00
Nikita Popov
cd1dcd2c95 [InstCombine] Handle const select arm in foldSelectCtlzToCttz()
The select arm that takes the ctlz result can also instead be a
constant with the bit width (as this is what the ctlz evaluates to
for a==0).

This avoids a regression when strengthening the
simplifyWithOpReplaced() fold.

Proof: https://alive2.llvm.org/ce/z/DMRL5A
2023-07-14 12:00:39 +02:00
Nikita Popov
701a8b348e [InstCombine] Add test for ctlz->cttz fold with constant in select (NFC)
2023-07-14 11:52:14 +02:00
Noah Goldstein
ddd18d02c7 [InstCombine] Transform icmp eq/ne ({su}div exact X,Y),C -> icmp eq/ne X, Y*C
We can do this if `Y*C` doesn't overflow. This is trivial if `C` is
0/1. Otherwise we actually generate a `mul` instruction iff the `div`
has one use.

Alive2 Links:
    udiv: https://alive2.llvm.org/ce/z/GWPW67
    sdiv: https://alive2.llvm.org/ce/z/bUoX9h

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D150091
2023-07-13 19:36:59 -05:00
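The no-overflow condition in ddd18d02c7 can be illustrated for the unsigned case with an exhaustive 4-bit check. The sketch covers only `udiv exact` (the `sdiv` case is left to the Alive2 link), models "exact" by enumerating only dividends that are multiples of the divisor, and uses helper names invented for this example.

```python
BITS = 4
LIMIT = 1 << BITS


def exact_cases():
    """Yield (x, y, c) where x is an exact multiple of a non-zero y."""
    for y in range(1, LIMIT):
        for x in range(0, LIMIT, y):
            for c in range(LIMIT):
                yield x, y, c


# When y * c fits in BITS bits, (x / y == c) and (x == y * c) agree.
no_overflow_ok = all(
    ((x // y) == c) == (x == y * c)
    for x, y, c in exact_cases()
    if y * c < LIMIT
)

# When y * c wraps, the rewritten comparison can give a different answer.
wrapped_failures = [
    (x, y, c)
    for x, y, c in exact_cases()
    if y * c >= LIMIT and ((x // y) == c) != (x == (y * c) % LIMIT)
]
```

For instance, with x = 0, y = 4, c = 4 the product wraps to 0, so the rewritten compare would report equality even though 0 / 4 is not 4; this is why the fold requires that `Y*C` does not overflow.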
Noah Goldstein
fd691fce59 [InstCombine] Add tests for icmp eq/ne ({su}div exact X, Y), C; NFC
Differential Revision: https://reviews.llvm.org/D150090
2023-07-13 19:36:59 -05:00
Alexey Bataev
ec6b40ab9b [SLP]Add a test with the stores with long distances between them, NFC.
2023-07-13 15:14:09 -07:00
Nikita Popov
ddb46abd3c [LSR] Don't consider users of constant outside loop
In CollectLoopInvariantFixupsAndFormulae(), LSR looks at users
outside the loop. E.g. if we have an addrec based on %base, and
%base is also used outside the loop, then we have to keep it in a
register anyway, which may make it more profitable to use
%base + %idx style addressing.

This reasoning doesn't hold up when the base is a constant, because
the constant can be rematerialized. The lsr-memcpy.ll test regressed
when enabling opaque pointers, because inttoptr (i64 6442450944 to ptr)
now also has a use outside the loop (previously it didn't due to a
pointer type difference), and that extra "use" results in worse use
of addressing modes in the loop. However, the use outside the loop
actually gets rematerialized, so the alleged register saving does
not occur.

The same reasoning also applies to other types of constants, such
as global variable references.

Differential Revision: https://reviews.llvm.org/D155073
2023-07-13 12:22:38 +02:00
Nikita Popov
e8a5df7beb [LSR] Add test variant with global variables (NFC)
A variant of the test using globals instead of inttoptr expressions
for D155073.
2023-07-13 12:12:48 +02:00
khei4
36a6eb7d12 Revert "[MemCpyOpt] implement single BB stack-move optimization which unify the static unescaped allocas"
This reverts commit 96ae0851c26237378fa1280b0a9ad713e1b72bdb.
2023-07-13 18:04:49 +09:00
khei4
96ae0851c2 [MemCpyOpt] implement single BB stack-move optimization which unify the static unescaped allocas
Differential Revision: https://reviews.llvm.org/D153453
2023-07-13 14:52:30 +09:00
khei4
393215649b [MemCpyOpt] precommit test to add single BB stack-move optimization (NFC)
Differential Revision: https://reviews.llvm.org/D152277
2023-07-13 14:52:30 +09:00
Noah Goldstein
d50c1fcb5d [InstCombine] Fold (icmp eq/ne (zext i1 X) (sext i1 Y))-> (icmp eq/ne (or X, Y), 0)
This comes up when adding two `bool` types in C/C++
```
    bool foo(bool a, bool b) {
        return a + b;
    }
    ...
    ->
    define i1 @foo(i1 %a, i1 %b) {
        %conv = zext i1 %a to i32
        %conv3.neg = sext i1 %b to i32
        %tobool4 = icmp ne i32 %conv, %conv3.neg
        ret i1 %tobool4
    }
```

Proof: https://alive2.llvm.org/ce/z/HffWAN

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D154574
2023-07-12 17:17:52 -05:00
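The fold in d50c1fcb5d has only four input combinations, so it can be tabulated directly. The sketch assumes a 32-bit destination type, as in the IR of the commit message, and shows the `eq` form; the `ne` form is its negation.

```python
WIDTH = 32                 # destination width, as in the commit's IR example
MASK = (1 << WIDTH) - 1

rows = []
for x in (0, 1):
    for y in (0, 1):
        zext_x = x                      # zext i1 -> 0 or 1
        sext_y = MASK if y else 0       # sext i1 -> 0 or all ones
        before = (zext_x == sext_y)     # icmp eq (zext X), (sext Y)
        after = ((x | y) == 0)          # icmp eq (or X, Y), 0
        rows.append((x, y, before, after))

fold_is_sound = all(before == after for _, _, before, after in rows)
```

The two sides are equal only when both extensions are 0, that is, when neither X nor Y is set, which is exactly the `(or X, Y) == 0` condition.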
Noah Goldstein
83ad4cb61f [InstCombine] Add tests for folding (icmp eq/ne (zext i1) (sext i1)); NFC
Differential Revision: https://reviews.llvm.org/D154573
2023-07-12 17:17:52 -05:00
Shilei Tian
bcba20b5d0 [Attributor] Add AAAddressSpace to deduce address spaces
This patch adds initial support for the `AAAddressSpace` abstract
attributor interface to deduce and query address space information for a
pointer. We simply query the underlying objects that a pointer can point
to and find a common address space if one exists. This is the minimal
support for the interface; we currently manifest changes on loads and
stores. Additionally we should use the target transform information to
deduce if an address space transformation is a no-op for the target
machine when calculating compatibility.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D120586
2023-07-12 15:47:41 -04:00
Eli Friedman
60712732ea [IndVars] Teach replaceCongruentIVs to avoid scrambling induction variables
replaceCongruentIVs analysis is based on ScalarEvolution; this makes
comparing different PHIs and performing the replacement straightforward.
However, it can have some side-effects: it isn't aware whether an
induction variable is in canonical form, so it can perform replacements
which obscure the meaning of the IR.

In test22 in widen-loop-comp.ll, the resulting loop can't be analyzed by
ScalarEvolution at all.

My attempted solution is to restrict the transform: don't try to replace
induction variables using PHI nodes that don't represent simple
induction variables.

I'm not sure if this is the best solution; suggestions welcome.

Differential Revision: https://reviews.llvm.org/D121950
2023-07-12 12:27:39 -07:00
Anna Thomas
1159266734 [SLP] Add support for fmaximum/fminimum reduction
This patch adds support for vectorized reduction of maximum/minimum
intrinsics which are under the appropriate reduction kind.

Differential Revision: https://reviews.llvm.org/D154463
2023-07-12 15:22:38 -04:00
Anna Thomas
a43aebcd91 [SLP] Test for minimum/maximum reduction
minimum/maximum tests from D154463. This contains tests where we vectorize
minimum/maximum as well as the tests where we currently do not identify
reduction patterns.

Differential Revision: https://reviews.llvm.org/D155096
2023-07-12 15:22:37 -04:00
Matt Arsenault
6ed48ebf2e ValueTracking: Recognize fpclass clamping select patterns
Improve computeKnownFPClass select handling to cover the case where
the condition performs a class test. This allows us to recognize
no-nans in cases like:

  %not.nan = fcmp ord float %x, 0.0
  %select = select i1 %not.nan, float %x, float 0.0

Math library code has similar edge case filtering on the inputs and
final results.

https://reviews.llvm.org/D153089
2023-07-12 13:14:05 -04:00
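The clamping pattern from 6ed48ebf2e can be mirrored in a few lines to show why the select result is never NaN. This is a sketch of the arithmetic only, not of computeKnownFPClass itself; `fcmp_ord` and `clamp_nan_to_zero` are hypothetical helper names.

```python
import math


def fcmp_ord(a, b):
    # "Ordered" comparison: true when neither operand is NaN.
    return not (math.isnan(a) or math.isnan(b))


def clamp_nan_to_zero(x):
    # Mirrors: %select = select i1 %not.nan, float %x, float 0.0
    return x if fcmp_ord(x, 0.0) else 0.0


samples = [float("nan"), float("inf"), float("-inf"), 0.0, -0.0, 1.5, -2.25]
never_nan = all(not math.isnan(clamp_nan_to_zero(x)) for x in samples)
```

On the true arm the condition has already excluded NaN, and the false arm is the constant 0.0, so neither arm can contribute a NaN to the result.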
Matt Arsenault
05f0de3d74 ValueTracking: Add base computeKnownFPClass select handling tests
Prepare to handle class clamping patterns. Working around some kind of
select special casing bug in attributor where computeKnownFPClass is
never called on select.
2023-07-12 13:14:05 -04:00
Peixin Qiao
ab73bd3897 [InstCombine] Enhance select icmp and folding
This folds (a << k) ? 2^k * a : 0 to 2^k * a.

https://alive2.llvm.org/ce/z/_dDRjo

Fix #62155.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D148420
2023-07-12 22:39:45 +08:00
Maciej Gabka
5b0e19a7ab [TLI][AArch64] Add mappings to vectorized functions from ArmPL
Arm Performance Libraries contain a math library which provides
vectorized versions of common math functions.
This patch allows it to be used with clang and llvm via -fveclib=ArmPL or
-vector-library=ArmPL, so loops with such calls can be vectorized.
The executable needs to be linked with the amath library.

Arm Performance Libraries are available at:
https://developer.arm.com/Tools%20and%20Software/Arm%20Performance%20Libraries

Reviewed by: paulwalker-arm
Differential Revision: https://reviews.llvm.org/D154508
2023-07-12 12:53:18 +00:00
Nikita Popov
edb2fc6dab [llvm] Remove explicit -opaque-pointers flag from tests (NFC)
Opaque pointers mode is enabled by default, no need to explicitly
enable it.
2023-07-12 14:35:55 +02:00
Krasimir Georgiev
c256e19671 Revert "Revert "IRBuilder: Fix not handling strictfp minnum/maxnum""
This reverts commit 593797ab9bedca6e9b0b7a9ed0589cf76023ab00.

I didn't realize that there was already a fix for the broken tests: fd2254b7358d0f78a79784688bd8012c1a52b9cf.
2023-07-12 14:13:31 +02:00