529 Commits

Author SHA1 Message Date
Han Zhu
68a07c156e [SROA] Fix bug where CandidateTys is appended while being iterated
Fix a crash when compiling Skia. See https://reviews.llvm.org/D143225#4180342
for more details
2023-03-09 01:20:08 -08:00
Han Zhu
f9c2a341b9 [SROA] Create additional vector type candidates based on store and load slices
Second try at A-Wadhwani's https://reviews.llvm.org/D132096, which was reverted.
The original patch had three issues:
* https://reviews.llvm.org/D134032, which bjope kindly fixed. That patch is merged into this one.
* [GHI #57796](https://github.com/llvm/llvm-project/issues/57796). Fixed and added a test.
* [GHI #57821](https://github.com/llvm/llvm-project/issues/57821). I believe this is an undefined behavior which is not the fault of the original patch. Please see the issue for more details.

Original diff summary:

This patch adds additional vector types to be considered when doing promotion in
SROA, based on the types of the store and load slices. This provides more
promotion opportunities, by potentially using an optimal "intermediate" vector
type.

For example, the following code would currently not be promoted to a vector,
since `__m128i` is a `<2 x i64>` vector.
```

__m128i packfoo0(int a, int b, int c, int d) {
  int r[4] = {a, b, c, d};
  __m128i rm;
  std::memcpy(&rm, r, sizeof(rm));
  return rm;
}
```
```
packfoo0(int, int, int, int):
  mov     dword ptr [rsp - 24], edi
  mov     dword ptr [rsp - 20], esi
  mov     dword ptr [rsp - 16], edx
  mov     dword ptr [rsp - 12], ecx
  movaps  xmm0, xmmword ptr [rsp - 24]
  ret
```
By also considering the types of the elements, we could find that the `<4 x i32>` type would be valid for promotion, hence removing the memory accesses for this function. In other words, we can explore other new vector types, with the same size but different element types based on the load and store instructions from the Slices, which can
provide us more promotion opportunities.

Additionally, the step for removing duplicate elements from the `CandidateTys` vector was not using an equality comparator, which has been fixed.

Differential Revision: https://reviews.llvm.org/D143225
2023-03-08 12:01:31 -08:00
Han Zhu
d888496e3c [SROA] Fix bug where RankVectorTypes is used in std::unique
`RankVectorTypes` is a not an equivalence relation so when it is used in
`std::unique`, the behavior is undefined. Create `RankVectorTypesEq` and use
that instead.
2023-03-07 14:10:58 -08:00
Arthur Eubanks
edd021368e [SROA] Make order of analysis fetching more predictable
For pipeline tests.
2023-03-06 09:01:29 -08:00
J. Ryan Stinnett
0bbe6040be [DebugInfo] Remove dbg.addr from Transforms
Part of `dbg.addr` removal
Discussed in https://discourse.llvm.org/t/what-is-the-status-of-dbg-addr/62898

Differential Revision: https://reviews.llvm.org/D144797
2023-03-02 09:29:43 +00:00
OCHyams
620a529760 [Assignment Tracking] Choose better passes for RemoveRedundantDbgInstrs call
Enabling assignment tracking without this patch, a significant amount of
additional compiler run time comes from the RemoveRedundantDbgInstrs call in
InstCombine. This patch reduces compiler run time by choosing better places to
call RemoveRedundantDbgInstrs.

In non-assignment-tracking builds, RemoveRedundantDbgInstrs is called by
InstCombine if LowerDbgDeclare makes a change (i.e. it is _sometimes_
called). In assignment tracking builds LowerDbgDeclare doesn't do anything. We
still need to clean up redundant intrinsics to avoid a large performance hit
due to the number of instructions, so the current approach is to have
InstCombine _always_ call RemoveRedundantDbgInstrs.

Instrumenting the compiler to run RemoveRedundantDbgInstrs after every pass and
dump the numbers and building CTMark/tramp3d-v4 indicates that SROA and
LoopVectorize give us a bigger bang (number removed) for buck (times pass is
run).

The compile time tracker reports that this patch reduces the number of
instructions retired building CTMark projects by an average of 1.1%.

Reviewed By: scott.linder

Differential Revision: https://reviews.llvm.org/D144483
2023-02-22 16:28:06 +00:00
Sander de Smalen
462227f115 [SROA] NFC: Look at TypeStoreSize scalable property, rather than at type directly.
Some places in the code have checks for isa<ScalableVectorType> and use
that to bail out of the code. It's also possible to look directly at the
allocated type-size and check if the size is scalable. This means it's
possible to also support other scalable types that are not vectors (i.e.
TargetExtType).

This is split out from D136861.
2023-02-15 09:01:14 +00:00
OCHyams
295f5fafcb [Assignment Tracking] Fix migrateDebuginfo in SROA
Without this patch, migrateDebugInfo doesn't understand how to handle existing
fragments that are smaller than the to-be-split store. This can occur
if. e.g. a vector store (1 dbg.assign) is split (many dbg.assigns - 1 fragment
for each scalar) and later those stores are re-vectorized (many dbg.assigns),
and then SROA runs on that.

The approach taken in this patch is to drop intrinsics with fragments outside
of the slice.

For example, starting with:

  store <2 x float> %v, ptr %dest !DIAssignID !1
  call void @llvm.dbg.assign(..., DIExpression(DW_OP_LLVM_fragment, 0, 32), !1, ...)
  call void @llvm.dbg.assign(..., DIExpression(DW_OP_LLVM_fragment, 32, 32), !1, ...)

When visiting the slice of bits 0 to 31 we get:

  store float %v.extract.0, ptr %dest !DIAssignID !2
  call void @llvm.dbg.assign(..., DIExpression(DW_OP_LLVM_fragment, 0, 32), !2, ...)

The other dbg.assign associated with the currently-split store is dropped for
this split part. And visiting bits 32 to 63 we get the following:

  store float %v.extract.1, ptr %adjusted.dest !DIAssignID !3
  call void @llvm.dbg.assign(..., DIExpression(DW_OP_LLVM_fragment, 32, 32), !3, ...)

I've added two tests that cover this case.

Implementing this meant re-writing the fragment-calculation part of
migrateDebugInfo to work with the absolute offset of the new slice in terms of
the base alloca (instead of the offset of the slice into the new alloca), the
fragment (if any) of the variable associated with the base alloca, and the
fragment associated with the split store. Because we need the offset into the
base alloca for the variables being split, some careful wiring is required for
memory intrinsics due to the fact that memory intrinsics can be split when
either the source or dest allocas are split. In the case where the source
alloca drives the splitting, we need to be careful to pass migrateDebugInfo the
information in relation to the dest alloca.

Reviewed By: StephenTozer

Differential Revision: https://reviews.llvm.org/D143146
2023-02-10 18:10:11 +00:00
OCHyams
bb059e85d6 [Assignment Tracking][SROA] Delete dbg.assigns linked to rewritten stores
AggLoadStoreRewriter splits aggregate loads and stores into scalars (before the
alloca is split up). The new stores and debug intrinsics are already wired up
correctly - we just need to also delete the dbg.assign that is linked to the
split to-be-deleted store too.

Reviewed By: jmorse

Differential Revision: https://reviews.llvm.org/D142882
2023-02-10 09:57:05 +00:00
Nikita Popov
e7fbb381c8 [SROA] Break typed pointer support
This removes typed pointer support in a prominent place in the
optimization pipeline, to ensure that any non-trivial consumers
of tip-of-tree LLVM are aware that this is no longer a supported
configuration.
2023-01-26 11:42:12 +01:00
Nikita Popov
d49b842ea2 [SROA] Use copyMetadataForLoad() helper
Instead of copying just nonnull metadata, use the generic helper
to copy metadata to the new load. This helper is specifically
designed for the case where the load type may change, so it's
safe to use in this context.
2023-01-20 15:24:10 +01:00
Guillaume Chatelet
8fd5558b29 [NFC] Use TypeSize::geFixedValue() instead of TypeSize::getFixedSize()
This change is one of a series to implement the discussion from
https://reviews.llvm.org/D141134.
2023-01-11 16:49:38 +00:00
serge-sans-paille
38818b60c5
Move from llvm::makeArrayRef to ArrayRef deduction guides - llvm/ part
Use deduction guides instead of helper functions.

The only non-automatic changes have been:

1. ArrayRef(some_uint8_pointer, 0) needs to be changed into ArrayRef(some_uint8_pointer, (size_t)0) to avoid an ambiguous call with ArrayRef((uint8_t*), (uint8_t*))
2. CVSymbol sym(makeArrayRef(symStorage)); needed to be rewritten as CVSymbol sym{ArrayRef(symStorage)}; otherwise the compiler is confused and thinks we have a (bad) function prototype. There was a few similar situation across the codebase.
3. ADL doesn't seem to work the same for deduction-guides and functions, so at some point the llvm namespace must be explicitly stated.
4. The "reference mode" of makeArrayRef(ArrayRef<T> &) that acts as no-op is not supported (a constructor cannot achieve that).

Per reviewers' comment, some useless makeArrayRef have been removed in the process.

This is a follow-up to https://reviews.llvm.org/D140896 that introduced
the deduction guides.

Differential Revision: https://reviews.llvm.org/D140955
2023-01-05 14:11:08 +01:00
Roman Lebedev
2cb393590e
Reland "[NFC][SROA] speculateSelectInstLoads(): play nice with typed pointers for now"
This reverts commit bf88ba0f8718c1e89e28e977839ad0a6186d44fe,
relands 9f27f4536e19e93349b0662338408efe6d1cb2fd, but without a bug:
we *REALLY* should not be defaulting to address space 0
when address space is not specified...
2022-12-22 00:47:40 +03:00
Max Kazantsev
bf88ba0f87 Revert "[NFC][SROA] speculateSelectInstLoads(): play nice with typed pointers for now"
This reverts commit 9f27f4536e19e93349b0662338408efe6d1cb2fd.

Supposed to be NFC, but broke buildbots (test addrspacecast.ll is failing).
2022-12-21 11:21:56 +07:00
Roman Lebedev
9f27f4536e
[NFC][SROA] speculateSelectInstLoads(): play nice with typed pointers for now
As requested in https://reviews.llvm.org/D138238#inline-1356685
2022-12-21 05:17:02 +03:00
Joshua Cranmer
e6b02214c6 [IR] Add a target extension type to LLVM.
Target-extension types represent types that need to be preserved through
optimization, but otherwise are not introspectable by target-independent
optimizations. This patch doesn't add any uses of these types by an existing
backend, it only provides basic infrastructure such that these types would work
correctly.

Reviewed By: nikic, barannikov88

Differential Revision: https://reviews.llvm.org/D135202
2022-12-20 11:02:11 -05:00
Roman Lebedev
96d3c82645
Revert "[SROA] isVectorPromotionViable(): memory intrinsics operate on vectors of bytes (take 3)"
While the PPC litte-endian miscompile did get addressed
by https://reviews.llvm.org/D140046
the PPV big-endian bots are still unhappy.
https://lab.llvm.org/buildbot/#/builders/93/builds/12560

This reverts commit 7bd358bcb4e358b4351c69e02ef76939e08acdc7.
2022-12-16 22:58:41 +03:00
Roman Lebedev
cfd594f8bb
[SROA] isVectorPromotionViable(): memory intrinsics operate on vectors of bytes (take 3)
* This is a recommit of 3c4d2a03968ccf5889bacffe02d6fa2443b0260f,
* which was reverted in 25f01d593ce296078f57e872778b77d074ae5888,
  because it exposed a miscompile in PPC backend,  which was resolved
  in https://reviews.llvm.org/D140089 / cb3f415cd2019df7d14683842198bc4b7a492bc5.
* which was a recommit of cf624b23bc5d5a6161706d1663def49380ff816a,
* which was reverted in 5cfc22cafe3f2465e0bb324f8daba82ffcabd0df,
  because the cut-off on the number of vector elements was not low enough,
  and it triggered both SDAG SDNode operand number assertions,
  5and caused compile time explosions in some cases.

Let's try with something really *REALLY* conservative first,
just to get somewhere, and try to bump it later.

FIXME: should this respect TTI reg width * num vec regs?

Original commit message:

Now, there's a big caveat here - these bytes
are abstract bytes, not the i8 we have in LLVM,
so strictly speaking this is not exactly legal,
see e.g. https://github.com/AliveToolkit/alive2/issues/860
^ the "bytes" "could" have been a pointer,
and loading it as an integer inserts an implicit ptrtoint.

But at the same time,
InstCombine's `InstCombinerImpl::SimplifyAnyMemTransfer()`
would expand a memtransfer of 1/2/4/8 bytes
into integer-typed load+store,
so this isn't exactly a new problem.

Note that in memory, poison is byte-wise,
so we really can't widen elements,
but SROA seems to be inconsistent here.

Fixes #59116.
2022-12-16 19:27:38 +03:00
OCHyams
f354716b05 Reapply [Assignment Tracking][13/*] Account for assignment tracking in SROA
The Assignment Tracking debug-info feature is outlined in this RFC:

https://discourse.llvm.org/t/
rfc-assignment-tracking-a-better-way-of-specifying-variable-locations-in-ir

Split dbg.assign intrinsics into fragments similarly to what SROA already does
for dbg.declares, except that there's many more intrinsics to split. The
function migrateDebugInfo generates new dbg.assigns intrinsic for each part of
a split store.

Reviewed By: jmorse

Differential Revision: https://reviews.llvm.org/D133296
2022-12-13 12:52:45 +00:00
Krasimir Georgiev
5e6351d927 Revert "[Assignment Tracking][13/*] Account for assignment tracking in SROA"
This reverts commit 3bfba672afd52dfd5bde54dc8b67ec96275f9e15.

Temporary revert since this potentially causes
https://github.com/llvm/llvm-project/issues/59490.
2022-12-13 09:54:26 +00:00
Roman Lebedev
fd21361a79
[NFC][SROA] rewriteMemOpOfSelect(): play nice with typed pointers for now
89a6106ce5 (commitcomment-92824429)
2022-12-12 21:16:14 +03:00
OCHyams
3bfba672af [Assignment Tracking][13/*] Account for assignment tracking in SROA
The Assignment Tracking debug-info feature is outlined in this RFC:

https://discourse.llvm.org/t/
rfc-assignment-tracking-a-better-way-of-specifying-variable-locations-in-ir

Split dbg.assign intrinsics into fragments similarly to what SROA already does
for dbg.declares, except that there's many more intrinsics to split. The
function migrateDebugInfo generates new dbg.assigns intrinsic for each part of
a split store.

Reviewed By: jmorse

Differential Revision: https://reviews.llvm.org/D133296
2022-12-12 09:24:06 +00:00
Roman Lebedev
89a6106ce5
[SROA] Rewrite store-into-selected-address into predicated stores
Same basic idea as with unfolding loads into predicated loads,
but we obviously can't have speculative stores.
2022-12-10 21:07:03 +03:00
Roman Lebedev
4f7e5d2206
[SROA] For non-speculatable loads of selects -- split block, insert then/else blocks, form two-entry PHI node, take 2
Currently, SROA is CFG-preserving.
Not doing so does not affect any pipeline test. (???)
Internally, SROA requires Dominator Tree, and uses it solely for the final `-mem2reg` call.

By design, we can't really SROA alloca if their address escapes somehow,
but we have logic to deal with `load` of `select`/`PHI`,
where at least one of the possible addresses prevents promotion,
by speculating the `load`s and `select`ing between loaded values.

As one would expect, that requires ensuring that the speculation is actually legal.
Even ignoring complexity bailouts, that logic does not deal with everything,
e.g. `isSafeToLoadUnconditionally()` does not recurse into hands of `select`.
There can also be cases where the load is genuinely non-speculate.

So if we can't prove that the load can be speculated,
unfold the select, produce two-entry phi node, and perform predicated load.

Now, that transformation must obviously update Dominator Tree,
since we require it later on. Doing so is trivial.
Additionally, we don't want to do this for the final SROA invocation (D136806).

In the end, this ends up having negative (!) compile-time cost:
https://llvm-compile-time-tracker.com/compare.php?from=c6d7e80ec4c17a415673b1cfd25924f98ac83608&to=ddf9600365093ea50d7e278696cbfa01641c959d&stat=instructions:u

Though indeed, this only deals with `select`s, `PHI`s are still using speculation.

Should we update some more analysis?

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D138238

This reverts commit 739611870d3b06605afe25cc07833f6a62de9545,
and recommits 03e6d9d9d1d48e43f3efc35eb75369b90d4510d5
with a fixed assertion - we should check that DTU is there,
not just assert false...
2022-12-08 20:19:55 +03:00
Roman Lebedev
739611870d
Revert "[SROA] For non-speculatable loads of selects -- split block, insert then/else blocks, form two-entry PHI node"
The assertion about not modifying the CFG seems to not hold,
will recommit in a bit.

https://lab.llvm.org/buildbot#builders/139/builds/32412

This reverts commit 03e6d9d9d1d48e43f3efc35eb75369b90d4510d5.
This reverts commit 4f90f4ada33718f9025d0870a4fe3fe88276b3da.
2022-12-08 19:51:15 +03:00
Roman Lebedev
4f90f4ada3
[NFCI][SROA] rewriteSelectInstLoads(): add forgotten false into assertion 2022-12-08 19:40:35 +03:00
Roman Lebedev
03e6d9d9d1
[SROA] For non-speculatable loads of selects -- split block, insert then/else blocks, form two-entry PHI node
Currently, SROA is CFG-preserving.
Not doing so does not affect any pipeline test. (???)
Internally, SROA requires Dominator Tree, and uses it solely for the final `-mem2reg` call.

By design, we can't really SROA alloca if their address escapes somehow,
but we have logic to deal with `load` of `select`/`PHI`,
where at least one of the possible addresses prevents promotion,
by speculating the `load`s and `select`ing between loaded values.

As one would expect, that requires ensuring that the speculation is actually legal.
Even ignoring complexity bailouts, that logic does not deal with everything,
e.g. `isSafeToLoadUnconditionally()` does not recurse into hands of `select`.
There can also be cases where the load is genuinely non-speculate.

So if we can't prove that the load can be speculated,
unfold the select, produce two-entry phi node, and perform predicated load.

Now, that transformation must obviously update Dominator Tree,
since we require it later on. Doing so is trivial.
Additionally, we don't want to do this for the final SROA invocation (D136806).

In the end, this ends up having negative (!) compile-time cost:
https://llvm-compile-time-tracker.com/compare.php?from=c6d7e80ec4c17a415673b1cfd25924f98ac83608&to=ddf9600365093ea50d7e278696cbfa01641c959d&stat=instructions:u

Though indeed, this only deals with `select`s, `PHI`s are still using speculation.

Should we update some more analysis?

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D138238
2022-12-08 16:51:32 +03:00
Matt Arsenault
27387896cf SROA: Simplify addrspacecasted allocas with volatile accesses
If the alloca is accessed through an addrspacecasted pointer, allow
the normal changes on the alloca. Cast back to the original use
address space instead of the new alloca's natural address space.
2022-12-02 15:20:56 -05:00
OCHyams
9517806064 Revert "[Assignment Tracking][13/*] Account for assignment tracking in SROA"
This reverts commit e16d59973ffec77eeef73409570bdf04a69c2405.

Buildbot failure:
https://lab.llvm.org/buildbot/#/builders/236/builds/1205
2022-11-28 16:07:34 +00:00
OCHyams
5e0b29bf23 Revert "[Assignment Tracking][SROA] Follow-up for failing test"
This reverts commit 285d46ef4b60c0919c00661199c1b010996cc2c1.

Failing buildbot:
https://lab.llvm.org/buildbot/#/builders/236/builds/1205
2022-11-28 16:07:34 +00:00
OCHyams
285d46ef4b [Assignment Tracking][SROA] Follow-up for failing test
Follow-up for D133296 / e16d59973ffec77eeef73409570bdf04a69c2405

Buildbot example: https://lab.llvm.org/buildbot/#/builders/6/builds/16989
2022-11-28 12:16:37 +00:00
OCHyams
e16d59973f [Assignment Tracking][13/*] Account for assignment tracking in SROA
The Assignment Tracking debug-info feature is outlined in this RFC:

https://discourse.llvm.org/t/
rfc-assignment-tracking-a-better-way-of-specifying-variable-locations-in-ir

Split dbg.assign intrinsics into fragments similarly to what SROA already does
for dbg.declares, except that there's many more intrinsics to split. The
function migrateDebugInfo generates new dbg.assigns intrinsic for each part of
a split store.

Reviewed By: jmorse

Differential Revision: https://reviews.llvm.org/D133296
2022-11-28 11:31:59 +00:00
Roman Lebedev
25f01d593c
Revert "[SROA] isVectorPromotionViable(): memory intrinsics operate on vectors of bytes (take 2)"
TableGen is still getting miscompiled on PPC buildbots.
Sent a mail with request for help.

This reverts commit 3c4d2a03968ccf5889bacffe02d6fa2443b0260f.
2022-11-27 00:00:06 +03:00
Roman Lebedev
3c4d2a0396
[SROA] isVectorPromotionViable(): memory intrinsics operate on vectors of bytes (take 2)
This is a recommit of cf624b23bc5d5a6161706d1663def49380ff816a,
which was reverted in 5cfc22cafe3f2465e0bb324f8daba82ffcabd0df,
because the cut-off on the number of vector elements was not low enough,
and it triggered both SDAG SDNode operand number assertions,
and caused compile time explosions in some cases.

Let's try with something really *REALLY* conservative first,
just to get somewhere, and try to bump it (to 64/128) later.

FIXME: should this respect TTI reg width * num vec regs?

Original commit message:

Now, there's a big caveat here - these bytes
are abstract bytes, not the i8 we have in LLVM,
so strictly speaking this is not exactly legal,
see e.g. https://github.com/AliveToolkit/alive2/issues/860
^ the "bytes" "could" have been a pointer,
and loading it as an integer inserts an implicit ptrtoint.

But at the same time,
InstCombine's `InstCombinerImpl::SimplifyAnyMemTransfer()`
would expand a memtransfer of 1/2/4/8 bytes
into integer-typed load+store,
so this isn't exactly a new problem.

Note that in memory, poison is byte-wise,
so we really can't widen elements,
but SROA seems to be inconsistent here.

Fixes #59116.
2022-11-26 23:19:15 +03:00
Benjamin Kramer
5cfc22cafe Revert "[SROA] isVectorPromotionViable(): memory intrinsics operate on vectors of bytes"
This reverts commit cf624b23bc5d5a6161706d1663def49380ff816a. It
triggers crashes in clang, see the comments on github on the original
change.
2022-11-23 13:11:16 +01:00
Roman Lebedev
655d857325
[SROA] isVectorPromotionViable(): avoid allowing overly large vectors
Otherwise, `compiler-rt/test/asan/TestCases/pr33372.cpp` fails with an assertion:
```
clang-16: /repositories/llvm-project/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp:11988: void llvm::SelectionDAG::createOperands(llvm::SDNode *, ArrayRef<llvm::SDValue>): Assertion `SDNode::getMaxNumOperands() >= Vals.size() && "too many operands to fit into SDNode"' failed.
```
I'm not sure if this should be even more conservative,
or if we have a named constant for this in middle-end.
2022-11-23 03:23:08 +03:00
Roman Lebedev
cf624b23bc
[SROA] isVectorPromotionViable(): memory intrinsics operate on vectors of bytes
Now, there's a big caveat here - these bytes
are abstract bytes, not the i8 we have in LLVM,
so strictly speaking this is not exactly legal,
see e.g. https://github.com/AliveToolkit/alive2/issues/860
^ the "bytes" "could" have been a pointer,
and loading it as an integer inserts an implicit ptrtoint.

But at the same time,
InstCombine's `InstCombinerImpl::SimplifyAnyMemTransfer()`
would expand a memtransfer of 1/2/4/8 bytes
into integer-typed load+store,
so this isn't exactly a new problem.

Note that in memory, poison is byte-wise,
so we really can't widen elements,
but SROA seems to be inconsistent here.

Fixes #59116.
2022-11-23 02:38:25 +03:00
Roman Lebedev
529eafd9be
[SROA] isVectorPromotionViable(): integer-ify non-pointer non-common types
This rectifies a FIXME that dates all the way back
to 2014 about not doing so due to the backend issues.

Presumably sufficient amount of time has passes
and all the known issues have been addressed,
or at least we will find out of there are some left...
2022-11-23 00:23:00 +03:00
Roman Lebedev
4e18d51ac5
[SROA] isVectorPromotionViable(): pointer-ness is sticky
As it has been established previously by precedent,
if we see a pointer type, then that is the type we must use.
Essentially, we don't want to introduce `inttoptr`'s.
2022-11-23 00:23:00 +03:00
Arthur Eubanks
6219ec07c6 [SROA] Don't speculate phis with different load user types
Fixes an SROA crash.

Fallout from opaque pointers since with typed pointers we'd bail out at the bitcast.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D136119
2022-10-18 08:44:13 -07:00
Arthur Eubanks
308b4bca14 [NFC][SROA] Update comment to use opaque pointers for clarity 2022-10-17 16:37:29 -07:00
Douglas Yung
91e0423595 Revert "[SROA] Create additional vector type candidates based on store and load slices"
This reverts commit de3445e0ef15c420955ad720fccf08473f460443.

This is causing GHI #57796 and #57821.
2022-09-23 12:24:07 -07:00
Douglas Yung
0a7f4e03a9 Revert "[SROA] Check typeSizeEqualsStoreSize in isVectorPromotionViable"
This reverts commit 3f08d248c44c744deda38423409b86720822739e.

The commit this change is fixing is being reverted due to GHI #57796 and #37821, so revert this commit as well.
2022-09-23 12:24:07 -07:00
Bjorn Pettersson
3f08d248c4 [SROA] Check typeSizeEqualsStoreSize in isVectorPromotionViable
Commit de3445e0ef15c4209 (https://reviews.llvm.org/D132096) made
changes to isVectorPromotionViable basically doing

  // Create Vector with size of V, and each element of type Ty
  ...
  uint64_t ElementSize = DL.getTypeStoreSizeInBits(Ty).getFixedSize();
  uint64_t VectorSize = DL.getTypeSizeInBits(V).getFixedSize();
  ...
  VectorType *VTy = VectorType::get(Ty, VectorSize / ElementSize, false);

Not quite sure why it uses the TypeStoreSize for the ElementSize,
but the new vector would only match in size with the old vector in
situations when the TypeStoreSize equals the TypeSize for Ty.
Therefore this patch adds a typeSizeEqualsStoreSize check as yet
another condition for allowing the the new type as a promotion
candidate.

Without this fix the new @test15 test would fail with an assert
like this:

opt: ../lib/Transforms/Scalar/SROA.cpp:1966:
  auto isVectorPromotionViable(llvm::sroa::Partition &,
                               const llvm::DataLayout &)
       ::(anonymous class)::operator()(llvm::VectorType *,
                                       llvm::VectorType *) const:
  Assertion `DL.getTypeSizeInBits(RHSTy).getFixedSize() ==
             DL.getTypeSizeInBits(LHSTy).getFixedSize() &&
             "Cannot have vector types of different sizes!"' failed.
 ...
 #8  isVectorPromotionViable(...)::$_10::operator()...
 #9  llvm::SROAPass::rewritePartition(...)
#10  llvm::SROAPass::splitAlloca(...)
#11  llvm::SROAPass::runOnAlloca(...)
#12  llvm::SROAPass::runImpl(...)
#13  llvm::SROAPass::run(...)

Reviewed By: MatzeB

Differential Revision: https://reviews.llvm.org/D134032
2022-09-21 09:45:05 +02:00
A-Wadhwani
de3445e0ef [SROA] Create additional vector type candidates based on store and load slices
This patch adds additional vector types to be considered when doing
promotion in SROA, based on the types of the store and load slices. This
provides more promotion opportunities, by potentially using an optimal
"intermediate" vector type.

For example, the following code would currently not be promoted to a
vector, since `__m128i` is a `<2 x i64>` vector.

```

__m128i packfoo0(int a, int b, int c, int d) {
  int r[4] = {a, b, c, d};
  __m128i rm;
  std::memcpy(&rm, r, sizeof(rm));
  return rm;
}
```

```
packfoo0(int, int, int, int):
        mov     dword ptr [rsp - 24], edi
        mov     dword ptr [rsp - 20], esi
        mov     dword ptr [rsp - 16], edx
        mov     dword ptr [rsp - 12], ecx
        movaps  xmm0, xmmword ptr [rsp - 24]
        ret
```

By also considering the types of the elements, we could find that the
`<4 x i32>` type would be valid for promotion, hence removing the memory
accesses for this function. In other words, we can explore other new
vector types, with the same size but different element types based on
the load and store instructions from the Slices, which can provide us
more promotion opportunities.

Additionally, the step for removing duplicate elements from the
`CandidateTys` vector was not using an equality comparator, which has
been fixed.

Differential Revision: https://reviews.llvm.org/D132096
2022-09-12 09:55:37 -07:00
Vang Thao
257251247a [SROA] Try harder to find a vector promotion viable type when rewriting
We are seeing significant performance loss when an alloca fails to get promoted
to register. I have observed that this is due to the common type found when
attempting to rewrite partition users being unviable for promotion. While if we
would have continue looking for a type, we would have found a subtype in the
original allocated type that would have enabled promotion. Thus first check if
the initial common type found is promotion viable and if not then continue
looking instead of stopping with the initial common type found.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D128073
2022-08-08 11:04:01 -07:00
Alex Bradbury
9bf2d8cbbe [NFC] Use AllocaInst's getAddressSpace helper 2022-08-01 10:11:16 +01:00
Nikita Popov
0af53fcb99 [SROA] Don't create constant expressions (NFC)
Use IRBuilder instead, which will fold these. Just to clarify
that this does not actually create any udiv expression.
2022-06-29 11:51:22 +02:00
Guillaume Chatelet
12ccdd67aa [NFC] Use proper getSliceAlign type in SROA 2022-06-10 12:37:41 +00:00