20587 Commits

Author SHA1 Message Date
Roman Lebedev
6a563e2570
[NFC][SCEV][IndVars] Add more tests for exit count w/ select
See https://github.com/llvm/llvm-project/issues/53020
2022-01-07 01:30:21 +03:00
Congzhe Cao
c251bfc3b9 [LoopInterchange] Remove a limitation in LoopInterchange legality
There was a limitation in legality that in the original inner loop latch,
no instruction was allowed between the induction variable increment
and the branch instruction. This is because we used to split the
inner latch at the induction variable increment instruction. Since
now we have split at the inner latch branch instruction and have
properly duplicated instructions over to the split block, we remove
this limitation.

Please refer to the test case updates to see how we now interchange
loops where instructions exist between the induction variable
increment and the branch instruction.

Reviewed By: bmahjour

Differential Revision: https://reviews.llvm.org/D115238
2022-01-06 15:56:32 -05:00
Alexey Bataev
d130df544d [SLP]Improve reordering for the nodes beeing used in alternate vectorization.
No need to include the order of the scalars beeing used as part of the
alternate vectorization into account when trying to reorder the whole
graph. Such elements better to reorder in the following phase because
the subtree still ends up in shuffle.

Part of D116688, fixes the regression in D116690.

Differential Revision: https://reviews.llvm.org/D116740
2022-01-06 11:18:57 -08:00
Alexey Bataev
7cb19fe493 [SLP]Initialize the lane with the given value instead of default 0.
There is a bug in the reordering analysis stage. If the element with the
given hash is not added to the map but has the same number of APOs and
instructions with same parent, but different instruction opcode, it will
be initalized with default values and then the counter is increased by
1. But the lane is not updated and default to 0 instead of the actual
   `Lane` value. It leads to the fact that the analysis is useless in
   many cases and default to lane 0 instead of actual lane with the
   minimum amount of APO operands.

Differential Revision: https://reviews.llvm.org/D116690
2022-01-06 10:57:11 -08:00
Nikita Popov
918015c9ba [EarlyCSE] Support opaque pointers
Explicitly check the load/store value type, because this is no
longer implicitly checked through the pointer type.
2022-01-06 17:08:50 +01:00
Alexey Bataev
bf5a688252 [SLP][NFC]Add a test for the extra shuffle after alternate node, NFC. 2022-01-06 06:34:58 -08:00
Nikita Popov
41a522779d [LICM] Check for noalias call instead of alloc like fn
When determining whether the memory is local to the function (and
we can thus introduce spurious writes without thread-safety issues),
check for a noalias call rather than the hardcoded list of memory
allocation functions. Noalias calls are the more general way to
determine allocation functions, as long as we're only interested
in the property that the returned value is distinct from any other
accessible memory.

Differential Revision: https://reviews.llvm.org/D116728
2022-01-06 14:38:19 +01:00
Nikita Popov
f430c1eb64 [Tests] Add elementtype attribute to indirect inline asm operands (NFC)
This updates LLVM tests for D116531 by adding elementtype attributes
to operands that correspond to indirect asm constraints.
2022-01-06 14:23:51 +01:00
Florian Hahn
86d113a8b8
[SCEVExpand] Do not create redundant 'or false' for pred expansion.
This patch updates SCEVExpander::expandUnionPredicate to not create
redundant 'or false, x' instructions. While those are trivially
foldable, they can be easily avoided and hinder code that checks the
size/cost of the generated checks before further folds.

I am planning on look into a few other similar improvements to code
generated by SCEVExpander.

I remember a while ago @lebedev.ri working on doing some trivial folds
like that in IRBuilder itself, but there where concerns that such
changes may subtly break existing code.

Reviewed By: reames, lebedev.ri

Differential Revision: https://reviews.llvm.org/D116696
2022-01-06 11:52:19 +00:00
Nikita Popov
0fa174398b [LICM] Add test for noalias call (NFC)
Add a test with a noalias call that is not a known allocation
function.
2022-01-06 11:46:27 +01:00
Nikita Popov
c41aa41957 [ConstFold] Add missing check for inbounds gep
If the gep is not inbounds, then the gep might compute a null
value even if the base pointer is non-null.
2022-01-06 09:59:40 +01:00
Nikita Popov
37c9171764 [ConstantFold] Add test for invalid non-inbounds gep icmp fold
The gep evaluated to null in this case, and as such is not ne null.
2022-01-06 09:59:40 +01:00
Congzhe Cao
8ade3d43a3 Revert "[LoopInterchange] Remove a limitation in LoopInterchange legality"
This reverts commit 15702ff9ce28b3f4aafec13be561359d4c721595 while I
investigate a ppc build bot failure at
https://lab.llvm.org/buildbot#builders/36/builds/16051.
2022-01-05 23:34:36 -05:00
Congzhe Cao
15702ff9ce [LoopInterchange] Remove a limitation in LoopInterchange legality
There was a limitation in legality that in the original inner loop latch,
no instruction was allowed between the induction variable increment
and the branch instruction. This is because we used to split the
inner latch at the induction variable increment instruction. Since
now we have split at the inner latch branch instruction and have
properly duplicated instructions over to the split block, we remove
this limitation.

Please refer to the test case updates to see how we now interchange
loops where instructions exist between the induction variable increment
and the branch instruction.

Reviewed By: bmahjour

Differential Revision: https://reviews.llvm.org/D115238
2022-01-05 22:37:54 -05:00
Philip Reames
cfcd7af8de [instcombine] Add test coverage for (x >>u y) pred x [part 2] 2022-01-05 13:37:17 -08:00
Philip Reames
8cc52ca734 [instcombine] Add test coverage for (x >>u y) pred x 2022-01-05 13:34:05 -08:00
Philip Reames
4016d440fe Precommit test for D116683 2022-01-05 11:57:15 -08:00
Philip Reames
1a97138a1c Add test case from 356ada9 2022-01-05 11:19:16 -08:00
Philip Reames
58a0e449e1 [instcombine] Allow sinking of calls with known writes to uses
If we have a call whose only side effect is a write to a location which is known to be dead, we can sink said call to the users of the call's result value. This is analogous to the recent changes to delete said calls if unused, but framed as a sinking transform instead.

Differential Revision: https://reviews.llvm.org/D116200
2022-01-05 10:37:22 -08:00
Nico Weber
085f078307 Revert "Revert D109159 "[amdgpu] Enable selection of s_cselect_b64.""
This reverts commit 859ebca744e634dcc89a2294ffa41574f947bd62.
The change contained many unrelated changes and e.g. restored
unit test failes for the old lld port.
2022-01-05 13:10:25 -05:00
Sanjay Patel
e2165e0968 [InstCombine] remove trunc user restriction for match of bswap
This does not appear to cause any problems, and it
fixes #50910

Extra tests with a trunc user were added with:
3a239379
...but they don't match either way, so there's an
opportunity to improve the matching further.
2022-01-05 13:04:11 -05:00
David Salinas
859ebca744 Revert D109159 "[amdgpu] Enable selection of s_cselect_b64."
This reverts commit 640beb38e7710b939b3cfb3f4c54accc694b1d30.

That commit caused performance degradtion in Quicksilver test QS:sGPU and a functional test failure in (rocPRIM rocprim.device_segmented_radix_sort).
Reverting until we have a better solution to s_cselect_b64 codegen cleanup

Change-Id: Ibf8e397df94001f248fba609f072088a46abae08

Reviewed By: kzhuravl

Differential Revision: https://reviews.llvm.org/D115960

Change-Id: Id169459ce4dfffa857d5645a0af50b0063ce1105
2022-01-05 17:57:32 +00:00
Sanjay Patel
3a2393795f [InstCombine] add tests for bswap; NFC 2022-01-05 08:33:04 -05:00
Sanjay Patel
4a8c0aa094 [InstSimplify] add tests for udiv/urem with known bits; NFC 2022-01-05 08:33:04 -05:00
Nikita Popov
6e474d3308 [GlobalOpt][Evaluator] Fix off by one error in bounds check (PR53002)
We should bail out if the index is >= the size, not > the size.

Fixes https://github.com/llvm/llvm-project/issues/53002.
2022-01-05 14:06:02 +01:00
Sander de Smalen
95a93722db [LV] Remove what seems like stale code in collectElementTypesForWidening.
This was originally added in rG22174f5d5af1eb15b376c6d49e7925cbb7cca6be
although that patch doesn't really mention any reasons for ignoring the
pointer type in this calculation if the memory access isn't consecutive.

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D115356
2022-01-05 12:20:59 +00:00
Nikita Popov
3dc1907d06 [ConstantFold] Use ConstantFoldLoadFromUniformValue() in more places
In particular, this also preserves undef when loading from padding,
rather than converting it to zero through a different codepath.

This is the remaining part of D115924.
2022-01-05 12:47:50 +01:00
Nikita Popov
4e62d210c4 [ConstantFold] Add test for load of padding (NFC)
This currently load zero rather than undef.
2022-01-05 12:47:49 +01:00
Nikita Popov
99c6b12b92 [ConstantFolding] Unify handling of load from uniform value
There are a number of places that specially handle loads from a
uniform value where all the bits are the same (zero, one, undef,
poison), because we a) don't care about the load offset in that
case b) it bypasses casts that might not be legal generally but
do work with uniform values.

We had multiple implementations of this, with a different set of
supported values each time. This replaces two usages with a more
complete helper. Other usages will be replaced separately, because
they have larger impact.

This is part of D115924.
2022-01-05 12:30:46 +01:00
Nikita Popov
00686ab4af [ConstantFold] Add additional load from uniform value tests (NFC) 2022-01-05 12:30:46 +01:00
Benjamin Kramer
5f0a349738 Revert "Revert "[InferAttrs] Add writeonly to all the math functions""
This reverts commit 29b6e967f3e99ac45340ea37a70262c70e4e7528. The bug it
found in PartiallyInlineLibCalls was fixed in
c8ffc73350dbb6044ca947bbead127b9b914cdf3.
2022-01-05 12:16:35 +01:00
Benjamin Kramer
c8ffc73350 [PartiallyInlineLibCalls] Don't crash when there's a writeonly attribute on the call
readnone subsumes writeonly, so just swap out the attributes. The
verifier doesn't allow us to have both on a call.
2022-01-05 12:16:26 +01:00
Florian Hahn
65c4d6191f
[VPlan] Add VPCanonicalIVPHIRecipe, partly retire createInductionVariable.
At the moment, the primary induction variable for the vector loop is
created as part of the skeleton creation. This is tied to creating the
vector loop latch outside of VPlan. This prevents from modeling the
*whole* vector loop in VPlan, which in turn is required to model
preheader and exit blocks in VPlan as well.

This patch introduces a new recipe VPCanonicalIVPHIRecipe to represent the
primary IV in VPlan and CanonicalIVIncrement{NUW} opcodes for
VPInstruction to model the increment.

This allows us to partly retire createInductionVariable. At the moment,
a bit of patching up is done after executing all blocks in the plan.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D113223
2022-01-05 10:46:06 +00:00
Martin Storsjö
29b6e967f3 Revert "[InferAttrs] Add writeonly to all the math functions"
This reverts commit ea75be3d9df448b6abafaf752a8141764d93ca33 and
1eb5b6e85045d22720f177a02aaf7097930e4b4f.

That commit caused crashes with compilation e.g. like this
(not fixed by the follow-up commit):

$ cat sqrt.c
float a;
b() { sqrt(a); }
$ clang -target x86_64-linux-gnu -c -O2 sqrt.c
Attributes 'readnone and writeonly' are incompatible!
  %sqrtf = tail call float @sqrtf(float %0) #1
in function b
fatal error: error in backend: Broken function found, compilation aborted!
2022-01-05 11:12:19 +02:00
Nikita Popov
00e6869463 [MemCpyOpt] Look through pointer casts when checking capture
The user scanning loop above looks through pointer casts, so we
also need to strip pointer casts in the capture check. Previously
the source was incorrectly considered not captured if a bitcast
was passed to the call.
2022-01-05 09:50:33 +01:00
Nikita Popov
487a34ed9d [MemCpyOpt] Make capture check during call slot optimization more precise
Call slot optimization is currently supposed to be prevented if
the call can capture the source pointer. Due to an implementation
bug, this check currently doesn't trigger if a bitcast of the source
pointer is passed instead. I'm somewhat afraid of the fallout of
fixing this bug (due to heavy reliance on call slot optimization
in rust), so I'd like to strengthen the capture reasoning a bit first.

In particular, I believe that the capture is fine as long as a)
the call itself cannot depend on the pointer identity, because
neither dest has been captured before/at nor src before the
call and b) there is no potential use of the captured pointer
before the lifetime of the source alloca ends, either due to
lifetime.end or a return from a function. At that point the
potentially captured pointer becomes dangling.

Differential Revision: https://reviews.llvm.org/D115615
2022-01-05 09:39:25 +01:00
Nikita Popov
c2e77c9122 [MemCpyOpt] Add additional call slot capture tests (NFC) 2022-01-05 09:33:04 +01:00
Nikita Popov
787f86e68c [GlobalOpt][Evaluator] Don't create bitcast for same type (PR52994)
isBitOrNoopPointerCastable() returns true if the types are the
same, but it's not actually possible to create a bitcast for all
such types. The assumption seems to be that the user will omit
creating the cast in that case, as it is unnecessary.

Fixes https://github.com/llvm/llvm-project/issues/52994.
2022-01-05 09:17:07 +01:00
Fangrui Song
1eb5b6e850 [InferAttrs] If readonly is already set, set readnone instead of writeonly
D116426 may lead to an assertion failure `Attributes 'readonly and writeonly' are incompatible!` if the builtin function already has `readonly`.
2022-01-04 18:59:35 -08:00
Chuanqi Xu
c75cedc237 [Coroutines] Set presplit attribute in Clang and mlir
This fixes bug49264.

Simply, coroutine shouldn't be inlined before CoroSplit. And the marker
for pre-splited coroutine is created in CoroEarly pass, which ran after
AlwaysInliner Pass in O0 pipeline. So that the AlwaysInliner couldn't
detect it shouldn't inline a coroutine. So here is the error.

This patch set the presplit attribute in clang and mlir. So the inliner
would always detect the attribute before splitting.

Reviewed By: rjmccall, ezhulenev

Differential Revision: https://reviews.llvm.org/D115790
2022-01-05 10:25:02 +08:00
Philip Reames
11a46b1749 precommit tests for a planned followon to D116200 2022-01-04 12:02:25 -08:00
Philip Reames
1be54bc764 precommit additional tests for D116200 2022-01-04 11:50:44 -08:00
Philip Reames
0b09313cd5 [funcattrs] Infer writeonly argument attribute [part 2]
This builds on the code from D114963, and extends it to handle calls both direct and indirect. With the revised code structure (from series of previously landed NFCs), this is pretty straight forward.

One thing to note is that we can not infer writeonly for arguments which might be captured. If the pointer can be read back by the caller, and then read through, we have no way to track that. This is the same restriction we have for readonly, except that we get no mileage out of the "callee can be readonly" exception since a writeonly param on a readonly function is either a) readnone or b) UB. This means we can't actually infer much unless nocapture has already been inferred.

Differential Revision: https://reviews.llvm.org/D115003
2022-01-04 09:07:54 -08:00
Benjamin Kramer
ea75be3d9d [InferAttrs] Add writeonly to all the math functions
All of these functions would be `readnone`, but can't be on platforms
where they can set `errno`. A `writeonly` function with no pointer
arguments can only write (but never read) global state.

Writeonly theoretically allows these calls to be CSE'd (a writeonly call
with the same arguments will always result in the same global stores) or
hoisted out of loops, but that's not implemented currently.

There are a few functions in this list that could be `readnone` instead
of `writeonly`, if someone is interested.

Differential Revision: https://reviews.llvm.org/D116426
2022-01-04 16:58:05 +01:00
Florian Hahn
d8276208be
[LAA] Remove overeager assertion for aggregate types.
0a00d64 turned an early exit here into an assertion, but the assertion
can be triggered, as PR52920 shows.

The later code is agnostic to the accessed type, so just drop the
assert. The patch also adds tests for LAA directly and
loop-load-elimination to show the behavior is sane.
2022-01-04 15:20:35 +00:00
Nikita Popov
6c031780aa [ConstantFold] Remove another incorrect icmp of gep fold
This folded (null + X) == g to false, but of course this is
incorrect if X == g.

Possibly this got confused with the null == g case, which is
already handled elsewhere.
2022-01-04 16:08:09 +01:00
Nikita Popov
25448826dd [InstSimplify] Update test to make miscompile more obvious (NFC)
This is now testing (null + g3) != g3 and still coming up with
"true" as the answer. The original case was a less obvious
miscompile with index overflow involved.
2022-01-04 16:08:09 +01:00
Nikita Popov
d74212987b [ConstantFold] Remove unnecessary bounded index restriction
The fold for merging a GEP of GEP into a single GEP currently bails
if doing so would result in notional overindexing. The justification
given in the comment above this check is dangerously incorrect: GEPs
with notional overindexing are perfectly fine, and if some code
treats them incorrectly, then that code is broken, not the GEP.
Such a GEP might legally appear in source IR, so only preventing
its creation cannot be sufficient. (The constant folder also ends
up canonicalizing the GEP to remove the notional overindexing, but
that's neither here nor there.)

This check dates back to
bd4fef4a89,
and as far as I can tell the original issue this was trying to
patch around has since been resolved.

Differential Revision: https://reviews.llvm.org/D116587
2022-01-04 15:23:09 +01:00
Nikita Popov
75db002725 [ConstantFold] Remove another incorrect icmp of GEP fold
This fold is not correct, because indices might evaluate to zero
even if they are not a literal zero integer. Additionally, this
fold would be wrong (in the general case) for non-i8 types as well,
due to index overflow.

Drop this fold and instead let the target-dependent constant
folder compute the actual offset and fold the comparison based
on that.
2022-01-04 12:27:40 +01:00
Nikita Popov
aefab6f8d5 [InstSimplify] Use weak symbol in test to show miscompile (NFC)
This fold is incorrect, because it assumes that all indices are
non-zero. This happens to be true for the test as written, but
doesn't hold if we use an extern weak global instead, for which
ptrtoint might be zero.

Add separate tests for the simple constant int case.
2022-01-04 12:27:40 +01:00