560 Commits

Author SHA1 Message Date
Nikita Popov
54067c5fbe [SROA] Use memcpy if type size does not match store size
The original memcpy also copies the padding, so make sure that
this is still the case after splitting.

Fixes https://github.com/llvm/llvm-project/issues/64081.
2023-12-22 10:19:22 +01:00
Kazu Hirata
4b4dcb4988 [Transforms] Fix a warning
This patch fixes:

  llvm/lib/Transforms/Scalar/SROA.cpp:4855:9: error: unused variable
  'NewAssign' [-Werror,-Wunused-variable]
2023-12-12 08:50:49 -08:00
Orlando Cazalet-Hyams
3d42557872
[RemoveDI] Handle DPValues in SROA (#74089)
Handle dbg.declares in SROA using DPValues.

In order to reduce duplication, the migrate-debug-info loop has been changed
to a generic lambda with some helper function overloads, which is called
for dbg.declares, dbg.assigns, and DPValues alike.

The tests will become "live" once #74090 lands (see for more info).
2023-12-12 15:49:24 +00:00
Orlando Cazalet-Hyams
2d9d9a1a55
[NFC] Change FindDbgDeclareUsers interface to match findDbgUsers/values (#73498)
This simplifies an upcoming patch to support the RemoveDIs project (tracking
variable locations without using intrinsics).

Next in this series is #73500.
2023-12-12 09:43:58 +00:00
Youngsuk Kim
c419f3e10e [llvm][SROA] Replace calls to Type::getPointerTo (NFC)
NFC cleanup towards removing method Type::getPointerTo.

* Remove unnecessary call to Type::getPointerTo
* Replace call to Type::getPointerTo with IRB.getPtrTy
2023-11-26 20:17:06 -06:00
Craig Topper
aba040182a
[IR] Replace uses of IRBuilder::getInt8PtrTy with getPtrTy. NFC (#73154) 2023-11-22 12:24:18 -08:00
Bruno De Fraine
cd88047765
[NFC][SROA] Remove implementation details from SROA header (#72846)
This moves the SROA implementation from SROAPass into a separate SROA
class that is defined in the cpp file, and reduces the SROAPass class to
a thin NewPM wrapper. This allows to remove all implementation details
from the SROA header, and the SROALegacyPass can wrap the SROA class
instead of the NewPM SROAPass.

The trigger for this change is a GCC warning about visibility of
implementation details in the SROA header after D138238. Credits to
Nikita Popov for suggesting this reorganization.
2023-11-20 14:00:22 +01:00
Antonio Frighetto
ebbb9cdbb3 [SROA] Allow llvm.launder.invariant.group intrinsic to be splittable
Let `llvm.launder.invariant.group` intrinsic as well as instructions
operating on memory addresses, whose invariance may be broken by the
intrinsic, to be rewritten.

Fixes: https://github.com/llvm/llvm-project/issues/72035.
2023-11-16 17:56:34 +01:00
Nikita Popov
ed86e740ef Revert "[SROA] Limit the number of allowed slices when trying to split allocas"
This reverts commit e13e808283f7fd9e873ae922dd1ef61aeaa0eb4a.

This causes performance regressions on GPU targets, see
https://github.com/llvm/llvm-project/issues/69785. Revert the
change for now.
2023-11-09 16:38:52 +01:00
Kazu Hirata
8a7f4eeb60 [llvm] Use llvm::is_contained (NFC) 2023-09-22 17:09:27 -07:00
Nikita Popov
57a554800b [SROA] Don't shrink volatile load past end
For volatile atomic, this may result in a verifier errors, if the
new alloca type is not legal for atomic accesses.

I've opted to disable this special case for volatile accesses in
general, as changing the size of the volatile access seems
dubious in any case.

Fixes https://github.com/llvm/llvm-project/issues/64721.
2023-09-20 14:12:31 +02:00
Nikita Popov
ddf7cc27f7 [SROA] Remove unnecessary IsStorePastEnd handling (NFCI)
Unlike the load case, stores past the end of the alloca are
removed by SROA as undefined behavior. As such, there is no need
to handle this case when rewriting stores.
2023-09-19 16:53:24 +02:00
Jeremy Morse
e54277fa10 [NFC][RemoveDIs] Use iterators over inst-pointers when using IRBuilder
This patch adds a two-argument SetInsertPoint method to IRBuilder that
takes a block/iterator instead of an instruction, and updates many call
sites to use it. The motivating reason for doing this is given here [0],
we'd like to pass around more information about the position of debug-info
in the iterator object. That necessitates passing iterators around most of
the time.

[0] https://discourse.llvm.org/t/rfc-instruction-api-changes-needed-to-eliminate-debug-intrinsics-from-ir/68939

Differential Revision: https://reviews.llvm.org/D152468
2023-09-11 20:01:19 +01:00
Dhruv Chawla
e13e808283
[SROA] Limit the number of allowed slices when trying to split allocas
This patch adds a hidden CLI option "--sroa-max-alloca-slices", which is
an integer that controls the maximum number of alloca slices SROA can
consider before bailing out. This is useful because it may not be
profitable to split memcpys into (possibly tens of) thousands of loads/stores.
This also prevents an issue with exponential compile time explosion in passes
like DSE and MemCpyOpt caused by excessive alloca splitting.

Fixes https://github.com/rust-lang/rust/issues/88580.

Differential Revision: https://reviews.llvm.org/D159354
2023-09-09 11:00:47 +05:30
Zhongyunde
c570531c3d [SROA] Skip uses of allocas where the type is scalable
When visiting load and store instructions in SROA skip scalable vectors.
This is relevant in the implementation of the 'arm_sve_vector_bits'
attribute that is used to define VLS types, similar to D85725.

Fix https://gcc.godbolt.org/z/o561P9zj4

Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D158631
2023-08-24 18:44:09 +08:00
Bjorn Pettersson
e53b28c833 [llvm] Drop some bitcasts and references related to typed pointers
Differential Revision: https://reviews.llvm.org/D157551
2023-08-10 15:07:07 +02:00
Bjorn Pettersson
4ce7c4a92a [llvm] Drop some typed pointer handling/bitcasts
Differential Revision: https://reviews.llvm.org/D157016
2023-08-03 22:54:33 +02:00
Nikita Popov
61e0822efa [llvm][clang] Remove uses of isOpaquePointerTy() (NFC)
This now always returns true (for pointer types).
2023-07-14 10:27:58 +02:00
eopXD
c8eb535aed [1/11][IR] Permit load/store/alloca for struct of the same scalable vector type
This patch-set aims to simplify the existing RVV segment load/store
intrinsics to use a type that represents a tuple of vectors instead.

To achieve this, first we need to relax the current limitation for an
aggregate type to be a target of load/store/alloca when the aggregate
type contains homogeneous scalable vector types. Then to adjust the
prolog of an LLVM function during lowering to clang. Finally we
re-define the RVV segment load/store intrinsics to use the tuple types.

The pull request under the RVV intrinsic specification is
riscv-non-isa/rvv-intrinsic-doc#198

---

This is the 1st patch of the patch-set. This patch is originated from
D98169.

This patch allows aggregate type (StructType) that contains homogeneous
scalable vector types to be a target of load/store/alloca. The RFC of
this patch was posted in LLVM Discourse.

https://discourse.llvm.org/t/rfc-ir-permit-load-store-alloca-for-struct-of-the-same-scalable-vector-type/69527

The main changes in this patch are:

Extend `StructLayout::StructSize` from `uint64_t` to `TypeSize` to
accommodate an expression of scalable size.

Allow `StructType:isSized` to also return true for homogeneous
scalable vector types.

Let `Type::isScalableTy` return true when `Type` is `StructType`
and contains scalable vectors

Extra description is added in the LLVM Language Reference Manual on the
relaxation of this patch.

Authored-by: Hsiangkai Wang <kai.wang@sifive.com>
Co-Authored-by: eop Chen <eop.chen@sifive.com>

Reviewed By: craig.topper, nikic

Differential Revision: https://reviews.llvm.org/D146872
2023-05-19 09:39:36 -07:00
Nikita Popov
53500e333d Reapply [SimplifyCFG][LICM] Preserve nonnull, range and align metadata when speculating
This exposed another miscompile in GVN, which was fixed by
20e9b31f88149a1d5ef78c0be50051e345098e41.

-----

After D141386, violation of nonnull, range and align metadata
results in poison rather than immediate undefined behavior,
which means that these are now safe to retain when speculating.
We only need to remove UB-implying metadata like noundef.

This is done by adding a dropUBImplyingAttrsAndMetadata() helper,
which lists the metadata which is known safe to retain on speculation.

Differential Revision: https://reviews.llvm.org/D146629
2023-04-20 14:17:15 +02:00
OCHyams
571eaead17 Reapply "[Assignment Tracking] Fix fragment error for some DSE-shortened stores"
This reverts commit 6db6ab4815a44bfcaabfcdd84a0ff458394f6f52 which reverts
D148536.

Build issues addressed in D148698.
2023-04-19 13:36:47 +01:00
OCHyams
6db6ab4815 Revert "[Assignment Tracking] Fix fragment error for some DSE-shortened stores"
This reverts commit fca3e8e024f0015604d21e6f76f3e199345679c5.

Buildbot: https://lab.llvm.org/buildbot/#/builders/121/builds/29766
2023-04-19 10:03:33 +01:00
OCHyams
fca3e8e024 [Assignment Tracking] Fix fragment error for some DSE-shortened stores
`shortenAssignment` inserts dbg.assigns with fragments describing the dead part
of a shortened store after each dbg.assign linked to the store.

Without this patch it doesn't take into account that the dead part of a
shortened store may be outside the bounds of a variable of a linked
dbg.assign. It also doesn't correctly account for a non-zero offset in the
address modifying `DIExpression` of the dbg.assign (which is possible for
fragments now even though whole variables currently cannot have a non-zero
offset in their alloca).

Fix this by moving the dead slice into variable-space and performing an
intersect of that adjusted slice with the existing fragment.

This fixes a verifier error reported when building fuchsia with assignment
tracking enabled:
https://ci.chromium.org/ui/p/fuchsia/builders/ci/
        clang_toolchain.ci.core.x64-release/b8784000953022145169/overview

Reviewed By: jmorse

Differential Revision: https://reviews.llvm.org/D148536
2023-04-19 09:32:09 +01:00
Krasimir Georgiev
bf7f6b4436 Revert "Reapply [SimplifyCFG][LICM] Preserve nonnull, range and align metadata when speculating"
This reverts commit 6f7e5c0f1ac6cc3349a2e1479ac4208465b272c6.

Seems to expose a miscompile in rust, possibly exposing a bug in LLVM
somewhere. Investigation thread over at:
https://rust-lang.zulipchat.com/#narrow/stream/187780-t-compiler.2Fwg-llvm/topic/LLVM.20D146629.20breakage
2023-04-19 08:28:48 +00:00
Nikita Popov
6f7e5c0f1a Reapply [SimplifyCFG][LICM] Preserve nonnull, range and align metadata when speculating
This exposed a miscompile in GVN, which was fixed by D148129.

-----

After D141386, violation of nonnull, range and align metadata
results in poison rather than immediate undefined behavior,
which means that these are now safe to retain when speculating.
We only need to remove UB-implying metadata like noundef.

This is done by adding a dropUBImplyingAttrsAndMetadata() helper,
which lists the metadata which is known safe to retain on speculation.

Differential Revision: https://reviews.llvm.org/D146629
2023-04-17 14:15:14 +02:00
DianQK
2832d7941f
[SROA] Remove UB-implying metadata when promoting speculative instruction.
After D138238 introduced the then/else blocks, we should remove UB-implying metadata for the promoted speculative instruction.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D148456
2023-04-16 22:35:52 +08:00
Kazu Hirata
1ca496bd61 Remove redundant initialization of std::optional (NFC) 2023-04-16 00:40:05 -07:00
OCHyams
9106960724 [Assignment Tracking][SROA] Don't un-poison dbg.assigns using multiple loc ops
Some dbg.assigns using poison become un-poisoned in SROA. The reason this
happens at all is because dbg.assigns linked to memory intrinsics use poison to
indicate they can't describe the stored value, but the value becomes available
after some optimisations. This needs reworking eventually, but for now we need
to ensure that when it does occur we don't create invalid expressions.

D147312 prevented this occuring when the dbg.assign uses DIArgLists, but that
wasn't a complete fix. We also need to ensure we avoid un-poisoning when the
existing expression uses more than one location operand (DW_OP_arg, n).

Reviewed By: jmorse

Differential Revision: https://reviews.llvm.org/D148020
2023-04-11 18:18:11 +01:00
OCHyams
086635d6b9 [Assignment Tracking][SROA] Fix fragment when slice size equals variable size
Correctly handle the case of splitting an alloca which backs contiguous
distinct variables, where a slice's size equals the size of a backed variable.

We need to ensure that we don't generate fragments expressions with fragments
of the same size as the variable as this is a verifier error.

Prior to this patch a fragment expression would be created in this
situation. e.g. splitting an alloca i64 with two adjacent 32-bit variables into
two 32-bit allocas, the new dbg.assign expressions would contain
(DW_OP_LLVM_fragment, 0, 32) and (DW_OP_LLVM_fragment, 32, 32) even though
those fragments cover each variable entirely.

Reviewed By: jmorse

Differential Revision: https://reviews.llvm.org/D147696
2023-04-06 15:29:18 +01:00
OCHyams
76740fb40e [Assignment Tracking][SROA] Handle createFragmentExpression failure
createFragmentExpression will fail if it determines that the expression cannot
be split over fragments. Handle this case in SROA. Similarly to D147312 this
should be a rare occurrence as the `dbg.assign` will usually reference the
`Value` being stored without modifying it with a `DIExpression`.

Reviewed By: jmorse

Differential Revision: https://reviews.llvm.org/D147431
2023-04-05 11:20:32 +01:00
OCHyams
01c02cded1 [Assignment Tracking][SROA] Handle DIArgList in migrateDebugInfo
If the to-be-split dbg.assign has a `DIArgList` and a new `Value` has been
requested then use a kill-location for the new dbg.assign. We can't simply
replace the value component (a `DIArgList`) with the new `Value` as that would
leave the `DIExpression` in an invalid state (`DW_OP_LLVM_arg` operands with no
arglist).

Reviewed By: jmorse

Differential Revision: https://reviews.llvm.org/D147312
2023-03-31 12:38:47 +01:00
Han Zhu
68a07c156e [SROA] Fix bug where CandidateTys is appended while being iterated
Fix a crash when compiling Skia. See https://reviews.llvm.org/D143225#4180342
for more details
2023-03-09 01:20:08 -08:00
Han Zhu
f9c2a341b9 [SROA] Create additional vector type candidates based on store and load slices
Second try at A-Wadhwani's https://reviews.llvm.org/D132096, which was reverted.
The original patch had three issues:
* https://reviews.llvm.org/D134032, which bjope kindly fixed. That patch is merged into this one.
* [GHI #57796](https://github.com/llvm/llvm-project/issues/57796). Fixed and added a test.
* [GHI #57821](https://github.com/llvm/llvm-project/issues/57821). I believe this is an undefined behavior which is not the fault of the original patch. Please see the issue for more details.

Original diff summary:

This patch adds additional vector types to be considered when doing promotion in
SROA, based on the types of the store and load slices. This provides more
promotion opportunities, by potentially using an optimal "intermediate" vector
type.

For example, the following code would currently not be promoted to a vector,
since `__m128i` is a `<2 x i64>` vector.
```

__m128i packfoo0(int a, int b, int c, int d) {
  int r[4] = {a, b, c, d};
  __m128i rm;
  std::memcpy(&rm, r, sizeof(rm));
  return rm;
}
```
```
packfoo0(int, int, int, int):
  mov     dword ptr [rsp - 24], edi
  mov     dword ptr [rsp - 20], esi
  mov     dword ptr [rsp - 16], edx
  mov     dword ptr [rsp - 12], ecx
  movaps  xmm0, xmmword ptr [rsp - 24]
  ret
```
By also considering the types of the elements, we could find that the `<4 x i32>` type would be valid for promotion, hence removing the memory accesses for this function. In other words, we can explore other new vector types, with the same size but different element types based on the load and store instructions from the Slices, which can
provide us more promotion opportunities.

Additionally, the step for removing duplicate elements from the `CandidateTys` vector was not using an equality comparator, which has been fixed.

Differential Revision: https://reviews.llvm.org/D143225
2023-03-08 12:01:31 -08:00
Han Zhu
d888496e3c [SROA] Fix bug where RankVectorTypes is used in std::unique
`RankVectorTypes` is a not an equivalence relation so when it is used in
`std::unique`, the behavior is undefined. Create `RankVectorTypesEq` and use
that instead.
2023-03-07 14:10:58 -08:00
Arthur Eubanks
edd021368e [SROA] Make order of analysis fetching more predictable
For pipeline tests.
2023-03-06 09:01:29 -08:00
J. Ryan Stinnett
0bbe6040be [DebugInfo] Remove dbg.addr from Transforms
Part of `dbg.addr` removal
Discussed in https://discourse.llvm.org/t/what-is-the-status-of-dbg-addr/62898

Differential Revision: https://reviews.llvm.org/D144797
2023-03-02 09:29:43 +00:00
OCHyams
620a529760 [Assignment Tracking] Choose better passes for RemoveRedundantDbgInstrs call
Enabling assignment tracking without this patch, a significant amount of
additional compiler run time comes from the RemoveRedundantDbgInstrs call in
InstCombine. This patch reduces compiler run time by choosing better places to
call RemoveRedundantDbgInstrs.

In non-assignment-tracking builds, RemoveRedundantDbgInstrs is called by
InstCombine if LowerDbgDeclare makes a change (i.e. it is _sometimes_
called). In assignment tracking builds LowerDbgDeclare doesn't do anything. We
still need to clean up redundant intrinsics to avoid a large performance hit
due to the number of instructions, so the current approach is to have
InstCombine _always_ call RemoveRedundantDbgInstrs.

Instrumenting the compiler to run RemoveRedundantDbgInstrs after every pass and
dump the numbers and building CTMark/tramp3d-v4 indicates that SROA and
LoopVectorize give us a bigger bang (number removed) for buck (times pass is
run).

The compile time tracker reports that this patch reduces the number of
instructions retired building CTMark projects by an average of 1.1%.

Reviewed By: scott.linder

Differential Revision: https://reviews.llvm.org/D144483
2023-02-22 16:28:06 +00:00
Sander de Smalen
462227f115 [SROA] NFC: Look at TypeStoreSize scalable property, rather than at type directly.
Some places in the code have checks for isa<ScalableVectorType> and use
that to bail out of the code. It's also possible to look directly at the
allocated type-size and check if the size is scalable. This means it's
possible to also support other scalable types that are not vectors (i.e.
TargetExtType).

This is split out from D136861.
2023-02-15 09:01:14 +00:00
OCHyams
295f5fafcb [Assignment Tracking] Fix migrateDebuginfo in SROA
Without this patch, migrateDebugInfo doesn't understand how to handle existing
fragments that are smaller than the to-be-split store. This can occur
if. e.g. a vector store (1 dbg.assign) is split (many dbg.assigns - 1 fragment
for each scalar) and later those stores are re-vectorized (many dbg.assigns),
and then SROA runs on that.

The approach taken in this patch is to drop intrinsics with fragments outside
of the slice.

For example, starting with:

  store <2 x float> %v, ptr %dest !DIAssignID !1
  call void @llvm.dbg.assign(..., DIExpression(DW_OP_LLVM_fragment, 0, 32), !1, ...)
  call void @llvm.dbg.assign(..., DIExpression(DW_OP_LLVM_fragment, 32, 32), !1, ...)

When visiting the slice of bits 0 to 31 we get:

  store float %v.extract.0, ptr %dest !DIAssignID !2
  call void @llvm.dbg.assign(..., DIExpression(DW_OP_LLVM_fragment, 0, 32), !2, ...)

The other dbg.assign associated with the currently-split store is dropped for
this split part. And visiting bits 32 to 63 we get the following:

  store float %v.extract.1, ptr %adjusted.dest !DIAssignID !3
  call void @llvm.dbg.assign(..., DIExpression(DW_OP_LLVM_fragment, 32, 32), !3, ...)

I've added two tests that cover this case.

Implementing this meant re-writing the fragment-calculation part of
migrateDebugInfo to work with the absolute offset of the new slice in terms of
the base alloca (instead of the offset of the slice into the new alloca), the
fragment (if any) of the variable associated with the base alloca, and the
fragment associated with the split store. Because we need the offset into the
base alloca for the variables being split, some careful wiring is required for
memory intrinsics due to the fact that memory intrinsics can be split when
either the source or dest allocas are split. In the case where the source
alloca drives the splitting, we need to be careful to pass migrateDebugInfo the
information in relation to the dest alloca.

Reviewed By: StephenTozer

Differential Revision: https://reviews.llvm.org/D143146
2023-02-10 18:10:11 +00:00
OCHyams
bb059e85d6 [Assignment Tracking][SROA] Delete dbg.assigns linked to rewritten stores
AggLoadStoreRewriter splits aggregate loads and stores into scalars (before the
alloca is split up). The new stores and debug intrinsics are already wired up
correctly - we just need to also delete the dbg.assign that is linked to the
split to-be-deleted store too.

Reviewed By: jmorse

Differential Revision: https://reviews.llvm.org/D142882
2023-02-10 09:57:05 +00:00
Nikita Popov
e7fbb381c8 [SROA] Break typed pointer support
This removes typed pointer support in a prominent place in the
optimization pipeline, to ensure that any non-trivial consumers
of tip-of-tree LLVM are aware that this is no longer a supported
configuration.
2023-01-26 11:42:12 +01:00
Nikita Popov
d49b842ea2 [SROA] Use copyMetadataForLoad() helper
Instead of copying just nonnull metadata, use the generic helper
to copy metadata to the new load. This helper is specifically
designed for the case where the load type may change, so it's
safe to use in this context.
2023-01-20 15:24:10 +01:00
Guillaume Chatelet
8fd5558b29 [NFC] Use TypeSize::geFixedValue() instead of TypeSize::getFixedSize()
This change is one of a series to implement the discussion from
https://reviews.llvm.org/D141134.
2023-01-11 16:49:38 +00:00
serge-sans-paille
38818b60c5
Move from llvm::makeArrayRef to ArrayRef deduction guides - llvm/ part
Use deduction guides instead of helper functions.

The only non-automatic changes have been:

1. ArrayRef(some_uint8_pointer, 0) needs to be changed into ArrayRef(some_uint8_pointer, (size_t)0) to avoid an ambiguous call with ArrayRef((uint8_t*), (uint8_t*))
2. CVSymbol sym(makeArrayRef(symStorage)); needed to be rewritten as CVSymbol sym{ArrayRef(symStorage)}; otherwise the compiler is confused and thinks we have a (bad) function prototype. There was a few similar situation across the codebase.
3. ADL doesn't seem to work the same for deduction-guides and functions, so at some point the llvm namespace must be explicitly stated.
4. The "reference mode" of makeArrayRef(ArrayRef<T> &) that acts as no-op is not supported (a constructor cannot achieve that).

Per reviewers' comment, some useless makeArrayRef have been removed in the process.

This is a follow-up to https://reviews.llvm.org/D140896 that introduced
the deduction guides.

Differential Revision: https://reviews.llvm.org/D140955
2023-01-05 14:11:08 +01:00
Roman Lebedev
2cb393590e
Reland "[NFC][SROA] speculateSelectInstLoads(): play nice with typed pointers for now"
This reverts commit bf88ba0f8718c1e89e28e977839ad0a6186d44fe,
relands 9f27f4536e19e93349b0662338408efe6d1cb2fd, but without a bug:
we *REALLY* should not be defaulting to address space 0
when address space is not specified...
2022-12-22 00:47:40 +03:00
Max Kazantsev
bf88ba0f87 Revert "[NFC][SROA] speculateSelectInstLoads(): play nice with typed pointers for now"
This reverts commit 9f27f4536e19e93349b0662338408efe6d1cb2fd.

Supposed to be NFC, but broke buildbots (test addrspacecast.ll is failing).
2022-12-21 11:21:56 +07:00
Roman Lebedev
9f27f4536e
[NFC][SROA] speculateSelectInstLoads(): play nice with typed pointers for now
As requested in https://reviews.llvm.org/D138238#inline-1356685
2022-12-21 05:17:02 +03:00
Joshua Cranmer
e6b02214c6 [IR] Add a target extension type to LLVM.
Target-extension types represent types that need to be preserved through
optimization, but otherwise are not introspectable by target-independent
optimizations. This patch doesn't add any uses of these types by an existing
backend, it only provides basic infrastructure such that these types would work
correctly.

Reviewed By: nikic, barannikov88

Differential Revision: https://reviews.llvm.org/D135202
2022-12-20 11:02:11 -05:00
Roman Lebedev
96d3c82645
Revert "[SROA] isVectorPromotionViable(): memory intrinsics operate on vectors of bytes (take 3)"
While the PPC litte-endian miscompile did get addressed
by https://reviews.llvm.org/D140046
the PPV big-endian bots are still unhappy.
https://lab.llvm.org/buildbot/#/builders/93/builds/12560

This reverts commit 7bd358bcb4e358b4351c69e02ef76939e08acdc7.
2022-12-16 22:58:41 +03:00
Roman Lebedev
cfd594f8bb
[SROA] isVectorPromotionViable(): memory intrinsics operate on vectors of bytes (take 3)
* This is a recommit of 3c4d2a03968ccf5889bacffe02d6fa2443b0260f,
* which was reverted in 25f01d593ce296078f57e872778b77d074ae5888,
  because it exposed a miscompile in PPC backend,  which was resolved
  in https://reviews.llvm.org/D140089 / cb3f415cd2019df7d14683842198bc4b7a492bc5.
* which was a recommit of cf624b23bc5d5a6161706d1663def49380ff816a,
* which was reverted in 5cfc22cafe3f2465e0bb324f8daba82ffcabd0df,
  because the cut-off on the number of vector elements was not low enough,
  and it triggered both SDAG SDNode operand number assertions,
  5and caused compile time explosions in some cases.

Let's try with something really *REALLY* conservative first,
just to get somewhere, and try to bump it later.

FIXME: should this respect TTI reg width * num vec regs?

Original commit message:

Now, there's a big caveat here - these bytes
are abstract bytes, not the i8 we have in LLVM,
so strictly speaking this is not exactly legal,
see e.g. https://github.com/AliveToolkit/alive2/issues/860
^ the "bytes" "could" have been a pointer,
and loading it as an integer inserts an implicit ptrtoint.

But at the same time,
InstCombine's `InstCombinerImpl::SimplifyAnyMemTransfer()`
would expand a memtransfer of 1/2/4/8 bytes
into integer-typed load+store,
so this isn't exactly a new problem.

Note that in memory, poison is byte-wise,
so we really can't widen elements,
but SROA seems to be inconsistent here.

Fixes #59116.
2022-12-16 19:27:38 +03:00