22124 Commits

Author SHA1 Message Date
Sanjay Patel
6fedc6a2b4 Revert "[InstCombine] add narrowing transform for low-masked binop with zext operand"
This reverts commit afa192cfb6049a15c5542d132d500b910b802c74.
This can cause an infinite loop as shown with an example in the
post-commit thread.
2022-06-10 08:25:10 -04:00
David Sherwood
8daaea206b [InstCombine] Use +0.0 instead of -0.0 as the FP identity for some folds
In foldSelectIntoOp we sometimes transform a select of a fadd into a
fadd of a select, where we select between data and an identity value.
For both fadd and fsub the identity is always -0.0, but if the nsz
flag is set on the select instruction we can use +0.0 instead. Doing
so then triggers other optimisations, such as when folding the select
of masked load into a new masked load.
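
A small illustration of why the sign of the zero matters, written in Python rather than IR. It assumes IEEE-754 doubles with the default rounding mode, and the helper name `sign_of` is made up for this sketch:

```python
import math


def sign_of(x: float) -> float:
    """Return +1.0 or -1.0 according to the sign bit of x (zeros included)."""
    return math.copysign(1.0, x)


# Adding -0.0 preserves every input, including both zeros.
assert sign_of(0.0 + -0.0) == 1.0
assert sign_of(-0.0 + -0.0) == -1.0

# Adding +0.0 turns -0.0 into +0.0, so +0.0 is only usable as the identity
# when the sign of a zero result may be ignored (which is what nsz permits).
assert sign_of(-0.0 + 0.0) == 1.0
assert 1.5 + 0.0 == 1.5
```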

Differential Revision: https://reviews.llvm.org/D126774
2022-06-10 12:42:34 +01:00
Bin Cheng
8b360c69e9 [FuncSpec]Fix assertion failure when value is not added to solver
This patch improves the fix in D110529 to prevent crashing on a value
with the byval attribute that has not been added to the SCCP solver.

Authored-by: sinan.lin@linux.alibaba.com
Reviewed By: ChuanqiXu

Differential Revision: https://reviews.llvm.org/D126355
2022-06-10 18:45:53 +08:00
David Green
4a5cb957a1 [AggressiveInstcombine] Conditionally fold saturated fptosi to llvm.fptosi.sat
This adds a fold for aggressive instcombine that converts
smin(smax(fptosi(x))) into a llvm.fptosi.sat, providing that the
saturation constants are correct and the cost of the llvm.fptosi.sat is
lower.

Unfortunately, a llvm.fptosi.sat cannot always be converted back to a
smin/smax/fptosi. The llvm.fptosi.sat intrinsic is more defined than the
original, which produces poison if the original fptosi was out of range.
The llvm.fptosi.sat will saturate any value, so needs to be expanded to
a fptosi(fpmin(fpmax(x))), which can be worse for code generation
depending on the target.
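
The difference in definedness can be sketched in Python. This is only a model under stated assumptions: `None` stands in for poison, the function names are invented for the sketch, and the saturating version follows the documented behaviour of the intrinsic (NaN gives 0, out-of-range values clamp, everything else rounds toward zero):

```python
import math


def int_range(bits: int) -> tuple[int, int]:
    """Signed range of an integer type with the given bit width."""
    return -(1 << (bits - 1)), (1 << (bits - 1)) - 1


def fptosi(x: float, bits: int):
    """Model of plain fptosi: None (poison) if the result does not fit."""
    if math.isnan(x) or math.isinf(x):
        return None
    lo, hi = int_range(bits)
    v = math.trunc(x)
    return v if lo <= v <= hi else None


def clamp_pattern(x: float, narrow: int, wide: int):
    """Model of smin(smax(fptosi(x))) with the clamp done in the wide type."""
    v = fptosi(x, wide)
    if v is None:
        return None  # poison propagates through smax/smin
    lo, hi = int_range(narrow)
    return min(max(v, lo), hi)


def fptosi_sat(x: float, bits: int) -> int:
    """Model of llvm.fptosi.sat: defined for every input."""
    if math.isnan(x):
        return 0
    lo, hi = int_range(bits)
    if x <= lo:
        return lo
    if x >= hi:
        return hi
    return math.trunc(x)
```

Under this model the two agree whenever the original is defined (for example `300.7` clamped to 8 bits), while an input such as `1e20` is poison for the i32 pattern but saturates to 127 in the intrinsic. That is why the reverse conversion is not always possible.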

So this change is conditional on the backend reporting that the
llvm.fptosi.sat is cheaper than the original smin+smax+fptosi.  This is
a change to the way that AggressiveInstcombine has worked in the past.
Instead of just being a canonicalization pass, that canonicalization can
be dependent on the target in certain specific cases.

Differential Revision: https://reviews.llvm.org/D125755
2022-06-10 09:36:09 +01:00
Nikita Popov
3c514d31d7 [EarlyCSE] Update tests to use opaque pointers (NFC)
Update the EarlyCSE tests to use opaque pointers.

Worth noting that this leaves some bitcast ptr to ptr instructions
in the input IR behind which are no longer necessary. This is
because these use numbered instructions, so it's hard to drop them
in an automated fashion (as it would require renumbering all other
instructions as well). I'm leaving that as a problem for another day.

The test updates have been performed using
https://gist.github.com/nikic/98357b71fd67756b0f064c9517b62a34.

Differential Revision: https://reviews.llvm.org/D127278
2022-06-10 09:53:35 +02:00
Nikita Popov
c10921fa1a [CGP] Also freeze ctlz/cttz operand when despeculating
D125887 changed the ctlz/cttz despeculation transform to insert
a freeze for the introduced branch on zero. While this does fix
the "branch on poison" issue, we may still get in trouble if we
pick a different value for the branch and for the ctz argument
(i.e. non-zero for the branch, but zero for the ctz). To avoid
this, we should use the same frozen value in both positions.

This does cause a regression in RISCV codegen by introducing an
additional sext. The DAG looks like this:

    t0: ch = EntryToken
        t2: i64,ch = CopyFromReg t0, Register:i64 %3
      t4: i64 = AssertSext t2, ValueType:ch:i32
    t23: i64 = freeze t4
          t9: ch = CopyToReg t0, Register:i64 %0, t23
          t16: ch = CopyToReg t0, Register:i64 %4, Constant:i64<32>
        t18: ch = TokenFactor t9, t16
            t25: i64 = sign_extend_inreg t23, ValueType:ch:i32
          t24: i64 = setcc t25, Constant:i64<0>, seteq:ch
        t28: i64 = and t24, Constant:i64<1>
      t19: ch = brcond t18, t28, BasicBlock:ch<cond.end 0x8311f68>
    t21: ch = br t19, BasicBlock:ch<cond.false 0x8311e80>

I don't see a really obvious way to improve this, as we can't push
the freeze past the AssertSext (which may produce poison).

Differential Revision: https://reviews.llvm.org/D126638
2022-06-10 09:46:10 +02:00
chenglin.bi
cde377db85 [InstCombine] Add negative vector tests for lshr+shl+and/shl+lshr+and transforms; NFC 2022-06-10 11:36:39 +08:00
chenglin.bi
87b5840b34 [InstCombine] Add baseline tests for lshr+shl+and transforms; NFC 2022-06-10 11:00:41 +08:00
chenglin.bi
de7a6ae1ff [InstCombine] Optimize shl+lshr+and conversion pattern
if `C1` and `C3` are pow2 and `Log2(C3)+C2 < BitWidth`:
    ((C1 << X) >> C2) & C3 -> X == (Log2(C3)+C2-Log2(C1)) ? C3 : 0;
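
As a sanity check alongside the alive2 proof, the identity can be brute-forced at a small bit width. The sketch below is Python, not the InstCombine code; it assumes 8-bit values, a logical right shift, and shift amounts below the bit width, and the function names are invented here:

```python
def log2(c: int) -> int:
    """Exact log2 of a power of two."""
    return c.bit_length() - 1


def lhs(c1: int, x: int, c2: int, c3: int, bits: int = 8) -> int:
    """((C1 << X) >> C2) & C3 with wrap-around at the given bit width."""
    mask = (1 << bits) - 1
    return (((c1 << x) & mask) >> c2) & c3


def rhs(c1: int, x: int, c2: int, c3: int) -> int:
    """X == (Log2(C3) + C2 - Log2(C1)) ? C3 : 0"""
    return c3 if x == log2(c3) + c2 - log2(c1) else 0


def check(bits: int = 8) -> bool:
    """Compare both sides for every power-of-two C1/C3 and in-range C2, X."""
    pow2 = [1 << i for i in range(bits)]
    for c1 in pow2:
        for c3 in pow2:
            for c2 in range(bits):
                if log2(c3) + c2 >= bits:
                    continue  # precondition from the commit message
                for x in range(bits):
                    if lhs(c1, x, c2, c3, bits) != rhs(c1, x, c2, c3):
                        return False
    return True
```

Dropping the `log2(c3) + c2 >= bits` guard makes the check fail, because the single set bit of `C1 << X` is then shifted out before it can reach the position selected by `C3`.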

https://alive2.llvm.org/ce/z/Pus5bd

Fix issue https://github.com/llvm/llvm-project/issues/55739

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D126617
2022-06-10 09:36:58 +08:00
Philip Reames
206f10d3f6 Plumb InstructionCost through unroll costing
Teach the unroller(s) how to handle an invalid cost. This avoids crashes when the backend can't provide a cost due to either a fundamental limitation or an unimplemented cost model case.

Differential Revision: https://reviews.llvm.org/D127305
2022-06-09 15:42:53 -07:00
Philip Reames
f85c5079b8 Pipe potentially invalid InstructionCost through CodeMetrics
Per the documentation in Support/InstructionCost.h, the purpose of an invalid cost is so that clients can change behavior on impossible to cost inputs. CodeMetrics was instead asserting that invalid costs never occurred.

On a target with an incomplete cost model - e.g. RISCV - this means that transformations would crash on (falsely) invalid constructs - e.g. scalable vectors. While we certainly should improve the cost model - and I plan to do so in the near future - we also shouldn't be crashing. This violates the explicitly stated purpose of an invalid InstructionCost.

I updated all of the "easy" consumers where bailouts were locally obvious. I plan to follow up with loop unroll in a following change.

Differential Revision: https://reviews.llvm.org/D127131
2022-06-09 15:17:24 -07:00
Sanjay Patel
afa192cfb6 [InstCombine] add narrowing transform for low-masked binop with zext operand
https://alive2.llvm.org/ce/z/hRy3rE

As shown in D123408, we can produce this pattern when moving
cast around, and we already have a related fold for a binop
with a constant operand.
2022-06-09 16:59:26 -04:00
Sanjay Patel
48a606d0c7 [InstCombine] add tests for masked binop narrowing; NFC 2022-06-09 16:55:24 -04:00
David Green
f8f50a4975 [AggressiveInstcombine] Add target tests for fptosi.sat fold. NFC 2022-06-09 21:47:05 +01:00
Johannes Doerfert
6555558a80 Revert "[Attributor] Replace AAValueSimplify with AAPotentialValues"
This reverts commit da50dab1ae111e9e6cb0248a47a038b17f798705.

Patch broke AMD GPU OpenMP offload buildbots.
https://lab.llvm.org/buildbot/#/builders/193/builds/13246
2022-06-09 17:04:01 +02:00
Johannes Doerfert
da50dab1ae [Attributor] Replace AAValueSimplify with AAPotentialValues
For the longest time we used `AAValueSimplify` and
`genericValueTraversal` to determine "potential values". This was
problematic for many reasons:
- We recomputed the result a lot as there was no caching for the 9
  locations calling `genericValueTraversal`.
- We added the idea of "intra" vs. "inter" procedural simplification
  only as an afterthought. `genericValueTraversal` did offer an option
  but `AAValueSimplify` did not. Thus, we might end up with "too much"
  simplification in certain situations and then gave up on it.
- Because `genericValueTraversal` was not a real `AA` we ended up with
  problems like the infinite recursion bug (#54981) as well as code
  duplication.

This patch introduces `AAPotentialValues` and replaces the
`AAValueSimplify` uses with it. `genericValueTraversal` is folded into
`AAPotentialValues` as are the instruction simplifications performed in
`AAValueSimplify` before. We further distinguish "intra" and "inter"
procedural simplification now.

`AAValueSimplify` was not deleted as we haven't ported the
re-materialization of instructions yet. There are other differences over
the former handling, e.g., we may not fold trivially foldable
instructions right now, e.g., `add i32 1, 1` is not folded to `i32 2`
but if an operand would be simplified to `i32 1` we would fold it still.

We are also even more aware of function/SCC boundaries in CGSCC passes,
which is good.

Fixes: https://github.com/llvm/llvm-project/issues/54981
2022-06-09 16:48:53 +02:00
Johannes Doerfert
94841c713f [Attributor] Try to delete stores and simplify stored values
By default we should try to eliminate unused stores and simplify values
stored while we are at it.
2022-06-09 16:48:53 +02:00
Johannes Doerfert
a3273c0c06 [Attributor] Ensure to use the proper liveness AA
When determining liveness via Attributor::isAssumedDead(...) we might
end up without a liveness AA or with one pointing into another function.
Neither is helpful and we will avoid both from now on.
2022-06-09 16:48:53 +02:00
Philip Reames
0e29a80fdc [RISCV] Add cost model for reverse shuffle
The majority of the cost appears to be forming the indices vector.

Differential Revision: https://reviews.llvm.org/D127141
2022-06-09 07:21:40 -07:00
Florian Hahn
20d798bd47 Recommit "[SCEV] Look through single value PHIs." (take 3)
This reverts commit 1fbdbb559569641f6d509b569966901c8fb02b63.

All known issues surfaced by this patch should have been fixed now.

The fixes included fixing issues with SCEV expansion in LV and DA's
reliance on LCSSA phis.
2022-06-09 15:20:10 +01:00
Johannes Doerfert
ae10b8a582 [Attributor][FIX] Give registered simplification callbacks precedence
We accidentally checked for constants before we looked for registered
simplification callbacks. The latter needs to take precedence though.
2022-06-09 15:31:53 +02:00
Johannes Doerfert
393be12b74 [Attributor] Look at base values for align, nonnull, and deref
Stripping bitcasts and 0-geps helps normalization and minimizes the
impact of a follow up change.
2022-06-09 13:41:23 +02:00
Johannes Doerfert
cb8adf76f7 [Attributor] Simplify loads from constant globals
If a global is constant and the initializer is known we can simplify
loads from it as the value has to be the initializer.
2022-06-09 13:41:23 +02:00
Florian Hahn
85983ca42e [VPlan] Replace remaining use of needsScalarIV.
All information is already available in VPlan. Note that there are some
test changes, because we now can correctly look through instructions
like truncates to analyze the actual users.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D123541
2022-06-09 12:05:37 +01:00
Johannes Doerfert
1df6e171c3 [Attributor] Simplify (integer range) state handling
We used to be very conservative when integer states were merged.
Instead of adding the known range (which is large due to uncertainty)
into the assumed range (which is hopefully small), we can also only
allow to merge in both at the same time into their respective
counterpart. This will ensure we keep the invariant that assumed is part
of known.
2022-06-09 12:00:26 +02:00
Johannes Doerfert
4277c1be88 [Attributor][FIX] Avoid metadata and duplicate replication assertion
When we recreate instructions as part of simplification we need to take
care of debug metadata and replacing the value multiple times. For now,
we handle both conservatively.
2022-06-09 12:00:26 +02:00
Biplob Mishra
d87bfa9ad0 [InstCombine] Combine instructions of type or/and where AND masks can be combined.
The patch simplifies some of the patterns as below

(A | (B & C0)) | (B & C1) -> A | (B & (C0|C1))
((B & C0) | A) | (B & C1) -> (B & (C0|C1)) | A
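
The first rewrite is a plain bitwise identity (the second is its commuted form), which an exhaustive check at a small width confirms. This is a Python sketch with invented function names, assuming 4-bit operands to keep the search small:

```python
from itertools import product


def before(a: int, b: int, c0: int, c1: int) -> int:
    """(A | (B & C0)) | (B & C1)"""
    return (a | (b & c0)) | (b & c1)


def after(a: int, b: int, c0: int, c1: int) -> int:
    """A | (B & (C0 | C1))"""
    return a | (b & (c0 | c1))


def check(bits: int = 4) -> bool:
    """Compare both forms for every combination of operands."""
    values = range(1 << bits)
    return all(
        before(a, b, c0, c1) == after(a, b, c0, c1)
        for a, b, c0, c1 in product(values, repeat=4)
    )
```

The check only covers value equivalence; it says nothing about the one-use conditions discussed below, which concern profitability and termination rather than correctness.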

In some scenarios like byte reverse on half word, we can see this pattern multiple times and this conversion can optimize these patterns.

Additionally this commit fixes the issue reported with the test case.
int f(int a, int b) {
  int c = ((unsigned char)(a >> 23) & 925);
  if (a)
    c = (a >> 23 & b) | ((unsigned char)(a >> 23) & 925) | (b >> 23 & 157);
  return c;
}

The previous revision/commit did not check one-use of an intermediate value that this transform re-uses.
When that value has another use, an existing transform will try to invert the transform here.
By adding one-use checks, we avoid the infinite loops seen with the earlier commit.

Differential Revision: https://reviews.llvm.org/D124119
2022-06-09 10:58:30 +01:00
Chenbing Zheng
38992d2c5e [InstCombine] improve fold for icmp-ugt-ashr
The existing condition for the fold
icmp ugt (ashr X, ShAmtC), C --> icmp ugt X, ((C + 1) << ShAmtC) - 1
missed some boundary cases, so the fold did not apply in cases where it
is valid. The cause is signed number overflow.
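
To see why the fold needs a guard at all, it can be evaluated exhaustively at a small width. The following Python sketch assumes 8-bit values and invented helper names; it does not reproduce the actual condition used in InstCombine, it only tests whether the rewritten compare agrees with the original for a given `C` and `ShAmtC`:

```python
def ashr(x: int, sh: int, bits: int = 8) -> int:
    """Arithmetic shift right of an unsigned bit pattern."""
    mask = (1 << bits) - 1
    signed = x - (1 << bits) if x >> (bits - 1) else x
    return (signed >> sh) & mask


def fold_holds(c: int, sh: int, bits: int = 8) -> bool:
    """True if 'ashr(X, sh) u> c' equals 'X u> ((c + 1) << sh) - 1' for all X."""
    mask = (1 << bits) - 1
    new_c = (((c + 1) << sh) - 1) & mask  # constant wraps at the bit width
    return all(
        (ashr(x, sh, bits) > c) == (x > new_c)
        for x in range(1 << bits)
    )
```

With these definitions `fold_holds(1, 1)` is true, while `fold_holds(127, 1)` is false because `(C + 1) << ShAmtC` wraps around. A guard has to separate those cases without also rejecting the valid ones.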

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D127188
2022-06-09 16:22:12 +08:00
Nikita Popov
56c9976d46 [IndVarSimplify] Don't assert that terminator is not SCEVable (PR55925)
The IV widening code currently asserts that terminators aren't SCEVable
-- however, this is not the case for invokes with a returned attribute.

As far as I can tell, this assertion is not necessary -- even if we
have a critical edge (the second test case), the trunc gets inserted
in a legal position.

Fixes https://github.com/llvm/llvm-project/issues/55925.

Differential Revision: https://reviews.llvm.org/D127288
2022-06-09 10:12:13 +02:00
chenglin.bi
226c564329 [InstCombine] Add vector tests for shl+lshr+and transforms; NFC
D126617
2022-06-09 11:14:26 +08:00
Danila Malyutin
ed6c309d4b [APFloat] Fix truncation of certain subnormal numbers
Certain subnormals would be incorrectly rounded away from zero.

Fixes #55838

Differential Revision: https://reviews.llvm.org/D127140
2022-06-08 21:54:35 +03:00
Florian Hahn
3d663308a5 [LV] Add test that caused revert of D123720. 2022-06-08 12:25:17 +01:00
Max Kazantsev
16c028a8c8 [Test] Add XFAIL test for PR55689
SCEV issues in dynamically unreached code, see details at https://github.com/llvm/llvm-project/issues/55689

1st reduced test by Nikic!
2022-06-08 16:01:29 +07:00
Chuanqi Xu
733d7cf964 [Debug] [Coroutines] Add deref operator for non complex expression
Background:

When we construct coroutine frame, we would insert a dbg.declare
intrinsic for it:
```
%hdl = call void @llvm.coro.begin() ; would return coroutine handle
call void @llvm.dbg.declare(metadata ptr %hdl, metadata
![[DEBUG_VARIABLE: __coro_frame]], metadata !DIExpression())
```

And in the split coroutine, it looks like:
```
define void @coro_func.resume(ptr %hdl) {
entry.resume:
    call void @llvm.dbg.declare(metadata ptr %hdl, metadata
![[DEBUG_VARIABLE: __coro_frame]], metadata !DIExpression())
}
```

And we would salvage the debug info by inserting a new alloca here:
```
define void @coro_func.resume(ptr %hdl) {
entry.resume:
    %frame.debug = alloca ptr
    call void @llvm.dbg.declare(metadata ptr %frame.debug, metadata
![[DEBUG_VARIABLE: __coro_frame]], metadata !DIExpression())
    store ptr %hdl, %frame.debug
}
```

But now the problem is that the `dbg.declare` refers to the address
of that alloca instead of the actual coroutine handle. There is existing
code that solves this problem, but it applies to complex expressions only.
I think it is OK to relax the condition to make it work for
`__coro_frame`.

Reviewed By: jmorse

Differential Revision: https://reviews.llvm.org/D126277
2022-06-08 10:53:51 +08:00
Wael Yehia
0952cf5bbb [InstCombine] decomposeSimpleLinearExpr should bail out on negative operands.
InstCombine tries to rewrite

  %prod = mul nsw i64 %X,   Scale
  %acc = add nsw i64 %prod,   Offset
  %0 = alloca i8, i64 %acc, align 4
  %1 = bitcast i8* %0 to i32*
  Use ( %1 )

into

  %prod = mul nsw i64 %X,   Scale/4
  %acc = add nsw i64 %prod,   Offset/4
  %0 = alloca i32, i64 %acc, align 4
  Use (%0)

But it assumes Scale is unsigned, and performs an unsigned division.
So we should bail out if Scale cannot be interpreted as an unsigned safely.
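
The effect of dividing a negative Scale as if it were unsigned can be shown with a few lines of Python. This is a model of 64-bit two's complement arithmetic with invented helper names, not the InstCombine code:

```python
BITS = 64
MASK = (1 << BITS) - 1


def to_signed(v: int) -> int:
    """Reinterpret a 64-bit pattern as a signed integer."""
    return v - (1 << BITS) if v >> (BITS - 1) else v


def udiv_as_signed(scale: int, divisor: int) -> int:
    """Divide the 64-bit pattern of 'scale' as unsigned, then read it as signed."""
    return to_signed(((scale & MASK) // divisor) & MASK)


# A non-negative scale divides as expected.
assert udiv_as_signed(8, 4) == 2

# A scale of -8 should become -2, but the unsigned division yields a huge
# positive value instead, so the rewritten alloca size would be wrong.
assert udiv_as_signed(-8, 4) == (1 << 62) - 2
```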

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D126546
2022-06-08 00:57:25 +00:00
Sanjay Patel
cae993d4c8 [InstCombine] reduce left-shift-of-right-shifted constant via demanded bits
If we don't demand low bits and it is valid to pre-shift a constant:
(C2 >> X) << C1 --> (C2 << C1) >> X
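
A brute-force check of this rewrite at a small width, as a Python sketch. The assumptions are mine rather than taken from the patch: 8-bit values, a logical right shift, "valid to pre-shift" read as `C2 << C1` not dropping any set bit, and "low bits not demanded" read as the low `C1` bits being ignored:

```python
def check_shl_of_lshr(bits: int = 8) -> bool:
    """Compare (C2 >> X) << C1 with (C2 << C1) >> X on the demanded bits."""
    mask = (1 << bits) - 1
    for c1 in range(1, bits):
        demanded = mask & ~((1 << c1) - 1)  # ignore the low C1 bits
        for c2 in range(1 << bits):
            if (c2 << c1) > mask:
                continue  # pre-shifting would drop set high bits
            for x in range(bits):
                old = ((c2 >> x) << c1) & mask
                new = ((c2 << c1) & mask) >> x
                if (old & demanded) != (new & demanded):
                    return False
    return True
```

The two forms generally differ in the low `C1` bits (the original always has zeros there), which is why the rewrite depends on those bits not being demanded.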

https://alive2.llvm.org/ce/z/_UzTMP

This is the reverse-order shift sibling to 82040d414b3c ( D127122 ).
It seems likely that we would want to add this to the SDAG version of
the code too to keep it on par with IR.
2022-06-07 18:43:27 -04:00
Sanjay Patel
0856a6cb7a [InstCombine] add tests for left-shift-of-right-shifted constant; NFC
The tests are adapted from the sibling folds' tests (see D127122).
2022-06-07 18:43:27 -04:00
Martin Sebor
dd2a6d78ee [InstCombine] Fold memchr of sequences of same characters
Enhance memchr libcall folder to handle constant arrays consisting
of one or two sequences of consecutive equal characters.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D126515
2022-06-07 13:45:10 -06:00
Sanjay Patel
82040d414b [InstCombine] reduce right-shift-of-left-shifted constant via demanded bits
If we don't demand high bits (zeros) and it is valid to pre-shift a constant:
(C2 << X) >> C1 --> (C2 >> C1) << X
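
The same kind of exhaustive check works for this direction. Again a Python sketch under my own assumptions: 8-bit values, a logical right shift, "valid to pre-shift" read as the low `C1` bits of `C2` being zero, and "high bits not demanded" read as the top `C1` bits being ignored:

```python
def check_lshr_of_shl(bits: int = 8) -> bool:
    """Compare (C2 << X) >> C1 with (C2 >> C1) << X on the demanded bits."""
    mask = (1 << bits) - 1
    for c1 in range(1, bits):
        demanded = mask >> c1  # ignore the high C1 bits
        for c2 in range(1 << bits):
            if c2 & ((1 << c1) - 1):
                continue  # pre-shifting would drop set low bits
            for x in range(bits):
                old = ((c2 << x) & mask) >> c1
                new = ((c2 >> c1) << x) & mask
                if (old & demanded) != (new & demanded):
                    return False
    return True
```

The original always produces zeros in the top `C1` bits, while the rewritten form may not, so the two only agree once those bits are masked off.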

https://alive2.llvm.org/ce/z/P3dWDW

There are a variety of related patterns, but I haven't found a single solution
that gets all of the motivating examples - so pulling this piece out of
D126617 along with more tests.

We should also handle the case where we shift-right followed by shift-left,
but I'll make that a follow-on patch assuming this one is ok. It seems likely
that we would want to add this to the SDAG version of the code too to keep it
on par with IR.

Differential Revision: https://reviews.llvm.org/D127122
2022-06-07 13:28:18 -04:00
Sanjay Patel
8956f80e4b [InstCombine] add vector tests for shift-shift; NFC
D127122
2022-06-07 13:28:18 -04:00
Philip Reames
d20f3fb6a2 Add initial coverage for invalid instruction costs in LoopRotate
Once extended with a case which requires duplication, will serve as test for crash being fixed in D127131.
2022-06-07 09:55:41 -07:00
Craig Topper
d73684e223 [LoopFlatten] Fix crash if the inner loop trip count comes from a sext instruction.
If we look through a truncate in matchLinearIVUser, it's possible
we find a sext/zext instruction that didn't come from widening.
This will fail the MatchedItCount->getType() == InnerInductionPHI->getType()
assertion.

Fix this by checking that we did not look through a truncate already.

Reviewed By: SjoerdMeijer

Differential Revision: https://reviews.llvm.org/D127149
2022-06-07 08:21:21 -07:00
Alexey Bataev
3731bbc425 [SLP]Add a test for geps with non-const indices in scatter vectorize
nodes, NFC.
2022-06-07 08:02:14 -07:00
David Sherwood
bc92045013 [NFC][InstCombine] Add two more tests to select-binop-foldable-floating-point.ll
Pre-commit some tests as part of https://reviews.llvm.org/D126774
2022-06-07 11:16:04 +01:00
David Sherwood
997ecb0036 [LoopVectorize] Add FastMathFlags to the select used for reductions with tail-folding
Based on reviewer comments on https://reviews.llvm.org/D126692 I've
added FastMathFlags to the select instruction used when tail-folding
with reductions. These flags can then be used by InstCombine to
decide upon the most optimal floating point identity value for
fadd/fsub. Doing so unlocks further optimisations, such as folding
selects into masked loads.

Differential Revision: https://reviews.llvm.org/D126778
2022-06-07 10:21:31 +01:00
Nikita Popov
7fa97b473c [SCCP] Don't mark ranges from branch conditions as potentially undef
Now that transforms introducing branch on poison have been removed,
we can stop marking ranges that have been derived from branch
conditions as containing undef. The existing comment explains why
this is legal. I've checked that alive2 is happy with SCCP tests
after this change.

Differential Revision: https://reviews.llvm.org/D126647
2022-06-07 10:20:24 +02:00
Philip Reames
6071de3db6 [RISCV] Autogen a test for ease of update 2022-06-06 12:44:34 -07:00
Vasileios Porpodas
6c6ad5143a [SLP][NFC] Precommit test for followup patch that fixes vector phi poison input.
Differential Revision: https://reviews.llvm.org/D126938
2022-06-06 10:00:27 -07:00
Sanjay Patel
4b2681ffa8 [InstCombine] add/move tests for opposite direction shifts; NFC 2022-06-06 11:35:50 -04:00
Florian Hahn
eaf48dd9b0 [VPlan] Replace BranchOnCount with BranchOnCond if TC <= UF * VF.
Try to simplify BranchOnCount to `BranchOnCond true` if TC <= UF * VF.

This is an alternative to D121899 which simplifies the VPlan directly
instead of doing so late in code-gen.

The potential benefit of doing this in VPlan is that this may help
cost-modeling in the future. The reason this is done in prepareToExecute
at the moment is that a single plan may be used for multiple VFs/UFs.

There are further simplifications that can be applied as follow ups:

1. Replace inductions with constants
2. Replace vector region with regular block.

Fixes #55354.

Depends on D126679.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D126680
2022-06-06 09:38:53 +01:00