25910 Commits

Author SHA1 Message Date
zhongyunde
df19d87227 [LV] Add option to tune the cost model, NFC
For Neon, the default nonconst stride cost is conservative,
and it is a local variable, which is not convenience to
to tune the loop vectorize.
So I try to use a option, which is similar to SVEGatherOverhead brought in D115143.
Fix https://github.com/llvm/llvm-project/issues/63082.

Reviewed By: dmgreen, fhahn
Differential Revision: https://reviews.llvm.org/D152253
2023-06-07 22:08:29 +08:00
Serguei Katkov
d57ed844fe [CGP] Add test to show the missed case in remove llvm.assume 2023-06-07 17:20:57 +07:00
luxufan
e9ddb584e8 [LoopIdiom] Freeze BitPos if !isGuaranteedNotToBeUndefOrPoison
Fixes: https://github.com/llvm/llvm-project/issues/62873

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D151690
2023-06-07 14:50:22 +08:00
Joshua Cao
cb9f1aadda [ValueTracking] Implied conditions for lshr
`V1 >> V2 u<= V1` for any V1, V2

This works for lshr and any div's that are changed to lshr's

This fixes issues in clang and rustc:
https://github.com/llvm/llvm-project/issues/62441
https://github.com/rust-lang/rust/issues/110971

Reviewed By: goldstein.w.n

Differential Revision: https://reviews.llvm.org/D151541
2023-06-06 21:06:22 -07:00
Joshua Cao
f23b4faaff [InstSimplify] Add tests for shl implied conditions 2023-06-06 21:06:22 -07:00
Chuanqi Xu
84c033d9ba [LICM] [Coroutines] Don't hoist threadlocals within presplit coroutines
Close https://github.com/llvm/llvm-project/issues/63022

This is the following of https://reviews.llvm.org/D135550, which is
discussed in
https://discourse.llvm.org/t/rfc-unify-memory-effect-attributes/65579.
In my imagination, we could fix the issue fundamentally after we
introduces new memory kind thread id. But I am not very sure if we can
fix the issue fundamentally in time.

Besides that, I think the correctness is the most important. So it
should not be bad to land this given it is innocent.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D151774
2023-06-07 10:25:47 +08:00
Chuanqi Xu
eab8c1eb62 [Coroutines] [LICM] Precommit test for D151774
This is required in the review.

Differential Revision: https://reviews.llvm.org/D151774
2023-06-07 10:11:22 +08:00
Paul Kirth
9ad3ca4e9a Revert "[TypePromotion] Don't treat bitcast as a Source"
This reverts commit 27aea17fe061f9778bb1e8ff5fdf9fc0fb03abe1.
For details, see: https://reviews.llvm.org/D152112

Fuchsia CI failure: https://ci.chromium.org/ui/p/fuchsia/builders/toolchain.ci/clang-linux-arm64/b8779118297575483793/overview
2023-06-07 00:51:20 +00:00
Matt Arsenault
95a3ae58b8 ValueTracking: Add baseline test for ldexp computeKnownFPClass 2023-06-06 17:09:25 -04:00
Matt Arsenault
eece6ba283 IR: Add llvm.ldexp and llvm.experimental.constrained.ldexp intrinsics
AMDGPU has native instructions and target intrinsics for this, but
these really should be subject to legalization and generic
optimizations. This will enable legalization of f16->f32 on targets
without f16 support.

Implement a somewhat horrible inline expansion for targets without
libcall support. This could be better if we could introduce control
flow (GlobalISel version not yet implemented). Support for strictfp
legalization is less complete but works for the simple cases.
2023-06-06 17:07:18 -04:00
Guozhi Wei
84bcfa0e1b [GVN] Improve PRE on load instructions
This patch implements the enhancement proposed by
https://github.com/llvm/llvm-project/issues/59312.

Suppose we have following code

   v0 = load %addr
   br %LoadBB

LoadBB:
   v1 = load %addr
   ...

PredBB:
   ...
   br %cond, label %LoadBB, label %SuccBB

SuccBB:
   v2 = load %addr
   ...

Instruction v1 in LoadBB is partially redundant, edge (PredBB, LoadBB) is a
critical edge. SuccBB is another successor of PredBB, it contains another load
v2 which is identical to v1. Current GVN splits the critical edge
(PredBB, LoadBB) and inserts a new load in it. A better method is move the load
of v2 into PredBB, then v1 can be changed to a PHI instruction.

If there are two or more similar predecessors, like the test case in the bug
entry, current GVN simply gives up because otherwise it needs to split multiple
critical edges. But we can move all loads in successor blocks into predecessors.

Differential Revision: https://reviews.llvm.org/D141712
2023-06-06 19:45:34 +00:00
Jolanta Jensen
a963dbb5ac [SVE ACLE] Extend existing aarch64_sve_mul combines to also act on aarch64_sve_mul_u.
Differential Revision: https://reviews.llvm.org/D152004
2023-06-06 15:26:33 +00:00
Mikhail Goncharov
df3a8f3760 Revert "Reland [MergeICmps] Adapt to non-eq comparisons, bugfix"
Causes miscompile. See https://reviews.llvm.org/D141188.

This reverts commit fb2c98a929aa65603e9d984307a41325e577e9d3
2023-06-06 16:26:52 +02:00
Florian Hahn
8f781b96e2
Revert "[VPlan] Mark recurrence recipes as not having side-effects."
This reverts commit 02369b75fdd7b5fc5d9b47f1b60587c225918511.

At the moment, live-outs used *only* for the resume values in the scalar
loop are not modeled in VPlan yet. This means first-order recurrence
recipes could be removed, when a scalar epilogue is required and the
only use of a FOR is outside the loop.

Keep treating recurrence recipes as having side-effects for now, to
avoid them being removed.

Fixes #62954.
2023-06-06 11:35:26 +02:00
Florian Hahn
f47084ecfb
[LV] Use force-vector-width for X86 recurrence test.
This makes sure that all tests that can be vectorized in the file are
vectorized.
2023-06-06 11:27:35 +02:00
Florian Hahn
4c51a45e80
[LV] Add test for #62954. 2023-06-06 11:20:22 +02:00
khei4
116670d192 [InstCombine] add overflow checking on Add ~X + C --> (C-1) - X
Differential Revision: https://reviews.llvm.org/D152088
2023-06-06 12:24:45 +09:00
khei4
0505fcdccd [InstCombine] precommit test for D152088(NFC)
Differential Revision: https://reviews.llvm.org/D152089
2023-06-06 12:24:45 +09:00
Johannes Doerfert
cb17c48fdd [Attributor] Identify and remove no-op fences
The logic and implementation follows the removal of no-op barriers. If
the fence is not making updates visible, either to the world or the
current thread, it is not needed. Said differently, the fences we remove
do not establish synchronization (happens-before) edges.
This allows us to eliminate some of the regression caused by:
  https://reviews.llvm.org/D145290
2023-06-05 17:14:00 -07:00
Johannes Doerfert
532356e82d [Attributor] Merge ranges by expansion, avoid unknown ranges
Different offsets can be handled by expansion rather than defaulting to
an unknown offset. Thus, [4,4] & [8,8] will result in [4, 12] rather
than [unknown, unknown].
2023-06-05 16:53:46 -07:00
Johannes Doerfert
87d13b8776 [Attributor][NFC] Precommit vector write range tests 2023-06-05 16:53:45 -07:00
Johannes Doerfert
8f4fadd1b4 [OpenMP] Use "kernel" attribute consistently 2023-06-05 16:33:53 -07:00
Johannes Doerfert
dbbe9b3776 [Attributor] Create AAMustProgress for the mustprogress attribute
Derive the mustprogress attribute based on the willreturn attribute
or the fact that all callers are mustprogress.

Differential Revision: https://reviews.llvm.org/D94740
2023-06-05 16:33:52 -07:00
Noah Goldstein
73ce343125 [InstCombine] Add transform (icmp pred (shl {nsw and/or nuw} X, Y), C) -> (icmp pred X, C)
Three new transforms:
    1) `(icmp pred (shl nsw nuw X, Y), C)` [if `C <= 0`] -> `(icmp pred X, C)`
        - ugt: https://alive2.llvm.org/ce/z/K_57J_
        - sgt: https://alive2.llvm.org/ce/z/BL8u_a
        - sge: https://alive2.llvm.org/ce/z/yZZVYz
        - uge: https://alive2.llvm.org/ce/z/R4jwwJ
        - ule: https://alive2.llvm.org/ce/z/-gbmth
        - sle: https://alive2.llvm.org/ce/z/ycZVsh
        - slt: https://alive2.llvm.org/ce/z/4MzHYm
        - sle: https://alive2.llvm.org/ce/z/fgNfex
        - ult: https://alive2.llvm.org/ce/z/cXfvH5
        - eq : https://alive2.llvm.org/ce/z/sZh_Ti
        - ne : https://alive2.llvm.org/ce/z/UrqSWA

    2) `(icmp eq/ne (shl {nsw|nuw} X, Y), 0)` -> `(icmp eq/ne X, 0)`
        - eq+nsw: https://alive2.llvm.org/ce/z/aSJN6D
        - eq+nuw: https://alive2.llvm.org/ce/z/r2_-br
        - ne+nuw: https://alive2.llvm.org/ce/z/RkETtu
        - ne+nsw: https://alive2.llvm.org/ce/z/8iSfW3

    3) `(icmp slt (shl nsw X, Y), 0/1)` -> `(icmp pred X, 0/1)`
       `(icmp sgt (shl nsw X, Y), 0/-1)` -> `(icmp pred X, 0/-1)`
        - slt: https://alive2.llvm.org/ce/z/eZYRan
        - sgt: https://alive2.llvm.org/ce/z/QQeP26

    Transform 3) is really sle/slt/sge/sgt with 0, but sle/sge
    canonicalize to slt/sgt respectively so its implemented as such.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D145341
2023-06-05 13:01:03 -05:00
Noah Goldstein
e8e8528085 [InstCombine] Add tests for tranforming (icmp pred (shl {nsw and/or nuw} X, Y), C); NFC
Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D145340
2023-06-05 13:01:03 -05:00
Krzysztof Drewniak
23098bd454 [AMDGPU] Add intrinsic for converting global pointers to resources
Define the function @llvm.amdgcn.make.buffer.rsrc, which take a 64-bit
pointer, the 16-bit stride/swizzling constant that replace the high 16
bits of an address in a buffer resource, the 32-bit extent/number of
elements, and the 32-bit flags (the latter two being the 3rd and 4th
wards of the resource), and combines them into a ptr addrspace(8).

This intrinsic is lowered during the early phases of the backend.

This intrinsic is needed so that alias analysis can correctly infer
that a certain buffer resource points to the same memory as some
global pointer. Previous methods of constructing buffer resources,
which relied on ptrtoint, would not allow for such an inference.

Depends on D148184

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D148957
2023-06-05 17:07:59 +00:00
Krzysztof Drewniak
faa2c678aa [AMDGPU] Add buffer intrinsics that take resources as pointers
In order to enable the LLVM frontend to better analyze buffer
operations (and to potentially enable more precise analyses on the
backend), define versions of the raw and structured buffer intrinsics
that use `ptr addrspace(8)` instead of `<4 x i32>` to represent their
rsrc arguments.

The new intrinsics are named by replacing `buffer.` with `buffer.ptr`.

One advantage to these intrinsic definitions is that, instead of
specifying that a buffer load/store will read/write some memory, we
can indicate that the memory read or written will be based on the
pointer argument. This means that, for example, a read from a
`noalias` buffer can be pulled out of a loop that is modifying a
distinct buffer.

In the future, we will define custom PseudoSourceValues that will
allow us to package up the (buffer, index, offset) triples that buffer
intrinsics contain and allow for more precise backend analysis.

This work also enables creating address space 7, which represents
manipulation of raw buffers using native LLVM load and store
instructions.

Where tests simply used a buffer intrinsic while testing some other
code path (such as the tests for VGPR spills), they have been updated
to use the new intrinsic form. Tests that are "about" buffer
intrinsics (for instance, those that ensure that they codegen as
expected) have been duplicated, either within existing files or into
new ones.

Depends on D145441

Reviewed By: arsenm, #amdgpu

Differential Revision: https://reviews.llvm.org/D147547
2023-06-05 16:59:07 +00:00
Nikita Popov
79115aebb7 [LoopUnroll] Add test for SCEV invalidation issue (NFC)
Test for the issue reported at https://reviews.llvm.org/D149331#4387931.
2023-06-05 17:28:32 +02:00
Antonio Frighetto
420cf63ead [ConstraintElimination] Refactor checkAndReplaceCondition (NFC)
Handling `true` and `false` constant replacements is now abstracted
out into a single lambda function `ReplaceCmpWithConstant`, so as to
reduce code duplication.
2023-06-05 16:54:58 +02:00
David Green
27aea17fe0 [TypePromotion] Don't treat bitcast as a Source
This removes BitCasts from isSource in Type Promotion, as I don't believe they
need to be treated as Sources. They will usually be from floats or hoisted
constants, where constants will be handled already.

This fixes #62513, but didn't otherwise cause any differences in the tests I
ran.

Differential Revision: https://reviews.llvm.org/D152112
2023-06-05 14:42:08 +01:00
khei4
4db8d4f839 [InstCombine] add overflow checking on AddSub C-(X+C2) --> (C-C2)-X
Differential Revision: https://reviews.llvm.org/D152068
2023-06-05 20:05:06 +09:00
khei4
41588b5880 [InstCombine] precommit test for D152068(NFC)
Differential Revision: https://reviews.llvm.org/D152091
2023-06-05 20:05:06 +09:00
Mateja Marjanovic
88421ea973 [AMDGPU] Trim zero components from buffer and image stores
For image and buffer stores the default behaviour on GFX11 and
older is to set all unset components to zero. So if we pass
only X component it will be the same as X000, or XY same as XY00.

This patch simplifies the passed vector of components in InstCombine
by removing zero components from the end.

For image stores it also trims DMask if necessary.

Reviewed by: arsenm, foad, nhaehnle, piotr
2023-06-05 12:30:21 +02:00
Mikhail Gudim
d37d4072f2 Reapply [SCCP] Constant propagation through freeze instruction
Reapply with extra check for struct types, which caused buildbot
failures last time.

-----

The freeze instruction has not been handled by SCCPInstVisitor.
This patch adds SCCPInstVisitor::visitFreezeInst(FreezeInst &I)
method to handle freeze instructions.

Differential Revision: https://reviews.llvm.org/D151659
2023-06-05 11:47:36 +02:00
zhongyunde
34d380e1f6 [IndVars] Add check of loop invariant for indirect use
We usually only check direct use instruction of IV, while the
bitcast of 'ptrtoint ptr to i64' doesn't affect the result, so go
a step further.
Fix https://github.com/llvm/llvm-project/issues/59633.

Reviewed By: markoshorro
Differential Revision: https://reviews.llvm.org/D151877
2023-06-03 22:29:09 +08:00
luxufan
1ac99bc452 [InstSimplify] Simplify select i1 ConstExpr, i1 true, i1 false to ConstExpr
`select i1 non-const, i1 true, i1 false` has been optimized to
`non-const`. There is no reason that we can not optimize `select i1
ConstExpr, i1 true, i1 false` to `ConstExpr`.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D151631
2023-06-03 15:09:00 +08:00
Kazu Hirata
d6f994acb3 [InlineCost] Check for conflicting target attributes early
When we inline a callee into a caller, the compiler needs to make sure
that the caller supports a superset of instruction sets that the
callee is allowed to use.  Normally, we check for the compatibility of
target features via functionsHaveCompatibleAttributes, but that
happens after we decide to honor call site attribute
Attribute::AlwaysInline.  If the caller contains a call marked with
Attribute::AlwaysInline, which can happen with
__attribute__((flatten)) placed on the caller, the caller could end up
with code that cannot be lowered to assembly code.

This patch fixes the problem by checking the target feature
compatibility before we honor Attribute::AlwaysInline.

Fixes https://github.com/llvm/llvm-project/issues/62664

Differential Revision: https://reviews.llvm.org/D150396
2023-06-02 16:00:47 -07:00
Matt Arsenault
1536e299e6 InstSimplify: Require instruction be parented
Unlike every other analysis and transform, simplifyInstruction
permitted operating on instructions which are not inserted
into a function. This created an edge case no other code needs
to really worry about, and limited transforms in cases that
can make use of the context function. Only the inliner and a handful
of other utilities were making use of this, so just fix up these
edge cases. Results in some IR ordering differences since
cloned blocks are inserted eagerly now. Plus some additional
simplifications trigger (e.g. some add 0s now folded out that
previously didn't).
2023-06-02 18:14:28 -04:00
Sami Tolvanen
2831a271c8 [KCFI] Emit debugtrap to make indirect call checks recoverable
KCFI traps should always be recoverable, but as Intrinsic::trap
is marked noreturn, it's not possible to continue execution after
handling the trap as the compiler is free to assume we never
return. Switch to debugtrap instead to ensure we have the option
to resume execution after the trap.
2023-06-02 19:39:13 +00:00
Matt Arsenault
2fef38f82d SimpleLoopUnswitch: Add missing test coverage for divergent target check
No tests failed when I removed the hasBranchDivergence check, so
add one.
2023-06-02 08:30:06 -04:00
Nikita Popov
fa45fb7f0c [InstCombine] Handle assumes in multi-use demanded bits simplification
This fixes the largest remaining discrepancy between results of
computeKnownBits() and SimplifyDemandedBits(). We only care about
the multi-use case here, because the assume necessarily introduces
an extra use.
2023-06-02 14:24:24 +02:00
Jolanta Jensen
dc63b35b02 [SVE ACLE] Extend IR combines for fmul, fsub, fadd to cover _u variants
This patch extends existing IR combines for: fmul, fsub and fadd,
relying on all active predicate to also apply to their equivalent
undef (_u) intrinsics.

Differential Revision: https://reviews.llvm.org/D150768
2023-06-02 11:06:57 +00:00
Matt Arsenault
8609df7c6e AMDGPU: Refine undef handling for llvm.amdgcn.class intrinsic
This barely matters since 99% are converted to the generic intrinsic now,
and the only real difference is the target intrinsic supports a variable
test mask. Start propagating poison. Prefer folding to a defined result (false)
for an undef test mask. Propagate undef for the first operand.
2023-06-01 18:35:55 -04:00
Alexandros Lamprineas
b1f41685a6 [IPSCCP] Decouple queries for function analysis results.
The SCCPSolver is using a structure (AnalysisResultsForFn) where it keeps
pointers to various analyses needed by the IPSCCP pass. These analyses are
requested all at the same time, which can become problematic in some cases.
For example one could be retrieved via getCachedAnalysis() prior to the
actual execution of the analysis. In more detail:

The IPSCCP pass uses a DomTreeUpdater to preserve the PostDominatorTree
in case the PostDominatorTreeAnalysis had run before IPSCCP. Starting with
commit 1b1232047e83b the IPSCCP pass may use BlockFrequencyAnalysis for
some functions in the module. As a result, the PostDominatorTreeAnalysis
may not run until the BlockFrequencyAnalysis has run, since the latter
analysis depends on the former. Currently, we setup the DomTreeUpdater
using getCachedAnalysis to retrieve a PostDominatorTree. This happens
before BlockFrequencyAnalysis has run, therefore the cached analysis can
become invalid by the time we use it.

Differential Revision: https://reviews.llvm.org/D151666
2023-06-01 16:38:04 +01:00
Florian Hahn
3b912e269a
[LV] Bail out on loop-variant steps when rewriting SCEV exprs.
If the step is not loop-invariant, we cannot create a modified AddRec,
as the start needs to be loop-invariant. Mark those cases as
CannotAnalyze and bail out, to fix a crash.
2023-06-01 16:14:02 +01:00
Nikita Popov
0213c6d0df [InstCombine] Use DL-aware constant folding for phi compare
Serves the dual purpose of avoiding an extra InstCombine iteration
for the DL-aware folding and removing one icmp constexpr use.
2023-06-01 16:02:36 +02:00
Paulo Matos
9485d983ac [InstCombine] Disable generation of fshl/fshr for rotates
Disable conversion of funnel shifts (fshl/fshr) into rotates
unless one of the operands is known to be a constant value.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D150670
2023-06-01 15:31:49 +02:00
Nikita Popov
223f9b096e Revert "[SCCP] Constant propagation through freeze instruction"
This reverts commit 559d47a1790e1a9f9b1f8838a443eb7624ef1ac7.

Caused failure on sanitizer-aarch64-linux-bootstrap-ubsan:

    clang++: /b/sanitizer-aarch64-linux-bootstrap-ubsan/build/llvm-project/llvm/lib/Transforms/Utils/SCCPSolver.cpp:442: llvm::ValueLatticeElement &llvm::SCCPInstVisitor::getValueState(llvm::Value *): Assertion `!V->getType()->isStructTy() && "Should use getStructValueState"' failed.
2023-06-01 15:30:46 +02:00
Mikhail Gudim
559d47a179 [SCCP] Constant propagation through freeze instruction
The freeze instruction has not been handled by SCCPInstVisitor.
This patch adds SCCPInstVisitor::visitFreezeInst(FreezeInst &I)
method to handle freeze instructions.

Differential Revision: https://reviews.llvm.org/D151659
2023-06-01 15:06:59 +02:00
Igor Kirillov
50dfc9e35d [LoopLoadElimination] Add support for stride equal to -1
This patch allows us to gain all the benefits provided by
LoopLoadElimination pass to descending loops.

Differential Revision: https://reviews.llvm.org/D151448
2023-06-01 12:10:53 +00:00