33436 Commits

Author SHA1 Message Date
OCHyams
086635d6b9 [Assignment Tracking][SROA] Fix fragment when slice size equals variable size
Correctly handle the case of splitting an alloca which backs contiguous
distinct variables, where a slice's size equals the size of a backed variable.

We need to ensure that we don't generate fragment expressions with fragments
of the same size as the variable, as this is a verifier error.

Prior to this patch a fragment expression would be created in this
situation. e.g. splitting an alloca i64 with two adjacent 32-bit variables into
two 32-bit allocas, the new dbg.assign expressions would contain
(DW_OP_LLVM_fragment, 0, 32) and (DW_OP_LLVM_fragment, 32, 32) even though
those fragments cover each variable entirely.

Reviewed By: jmorse

Differential Revision: https://reviews.llvm.org/D147696
2023-04-06 15:29:18 +01:00
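
The rule described in 086635d6b9 can be illustrated with a small standalone sketch. It does not use LLVM's API: `FragmentInfo` and `fragmentForSlice` are hypothetical names introduced here, sizes are in bits, and the slice is assumed to lie entirely within the variable. A fragment is described only when the slice covers strictly less than the whole variable.

```cpp
#include <cstdint>

// Hypothetical stand-in for a (DW_OP_LLVM_fragment, offset, size) pair.
struct FragmentInfo {
  bool Needed;           // false: the slice covers the whole variable
  uint64_t OffsetInBits; // offset of the slice within the variable
  uint64_t SizeInBits;   // size of the slice
};

// Decide whether a slice of a variable needs a fragment expression.
// Assumption: the slice lies entirely within the variable.
inline FragmentInfo fragmentForSlice(uint64_t VarSizeInBits,
                                     uint64_t OffsetInVarBits,
                                     uint64_t SliceSizeInBits) {
  // A slice that covers the variable exactly must not be a fragment.
  if (OffsetInVarBits == 0 && SliceSizeInBits == VarSizeInBits)
    return {false, 0, 0};
  return {true, OffsetInVarBits, SliceSizeInBits};
}
```

In the commit's example each 32-bit variable is covered by a 32-bit slice starting at offset 0 of that variable, so under this sketch neither new dbg.assign would carry a fragment.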
Dmitry Makogon
3d7242f05e Reapply "[LSR] Preserve LCSSA when rewriting instruction with PHI user"
This reverts commit efd34ba60f3839b0a68b2e32ff9011b6823bc16f.

Reapplies 8ff4832679e1. Missed a failing test. Needed to just
update test checks.
2023-04-06 17:31:27 +07:00
Serguei Katkov
6bda53c591 [GuardWidening] Re-factor freezeAndPush.
Re-write the code to avoid iteration over users of
constants and global values.

Reviewed By: mkazantsev
Differential Revision: https://reviews.llvm.org/D147450
2023-04-06 16:46:47 +07:00
David Sherwood
9278dd7b2b [LoopVectorize] Fix zext/sext cost calculations when types are shrunk
In getInstructionCost if we know a zext/sext is going to be shrunk
we should only be changing the destination type, and leave the
source type unchanged. For example, we may change a zext from

  zext <16 x i8> %a to <16 x i32>

to

  zext <16 x i8> %a to <16 x i16>

However, we were previously calculating the cost for doing

  zext <16 x i16> %a to <16 x i16>

which is incorrect.

Differential Revision: https://reviews.llvm.org/D147152
2023-04-06 08:52:25 +00:00
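
As a rough illustration of the fix in 9278dd7b2b, the following standalone sketch models only the element bit widths of an extend. `ExtendTypes` and `typesForShrunkExtend` are hypothetical names, not part of the LoopVectorize cost model, and the sketch assumes the minimal width is not below the source width.

```cpp
#include <algorithm>

// Hypothetical description of a zext/sext: element widths in bits.
struct ExtendTypes {
  unsigned SrcBits;
  unsigned DstBits;
};

// Types to cost when the result is known to need only MinBits bits.
// Only the destination is narrowed; the source type is left unchanged.
inline ExtendTypes typesForShrunkExtend(ExtendTypes Orig, unsigned MinBits) {
  return {Orig.SrcBits, std::min(Orig.DstBits, MinBits)};
}
```

For the example in the message, an i8 to i32 extend with a minimal width of 16 is costed as i8 to i16, rather than as i16 to i16.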
Nikita Popov
503ef0a8e7 [InstCombine] Remove addrspacecast bitcast extraction fold (NFC)
This is not relevant for opaque pointers, and as such no longer
necessary.
2023-04-06 09:53:32 +02:00
Bjorn Pettersson
44773b798a [SimpleLoopUnswitch] Fix SCEV invalidation issue
This patch is making sure that we use getTopMostExitingLoop when
finding out which loops to forget, when dealing with
unswitchNontrivialInvariants and unswitchTrivialSwitch. It seems
to at least be needed for unswitchNontrivialInvariants as detected
by the included test case.

Note that unswitchTrivialBranch already used getTopMostExitingLoop.
This was done in commit 4a9cde5a791cd49b96993e6. The commit
message in that commit says "If the patch makes sense, I will also
update those places to a similar approach ...", referring to these
functions mentioned above. As far as I can tell that never happened,
but this is an attempt to finally fix that.

Fixes https://github.com/llvm/llvm-project/issues/61080

Differential Revision: https://reviews.llvm.org/D147058
2023-04-06 09:46:42 +02:00
Nikita Popov
a162ddf7f2 [InstCombine] Remove various checks for opaque pointers (NFC)
All pointers are opaque now, so these are no longer necessary.
2023-04-06 09:45:51 +02:00
Nikita Popov
db6b30b183 [InstCombine] Remove GEP of bitcast folds (NFC)
These only support typed pointers, and as such are no longer
relevant.
2023-04-06 09:15:33 +02:00
Nikita Popov
cf9f1a8203 [InstCombine] Remove visitGEPOfBitcast() fold (NFC)
This does not apply to opaque pointers, and as such is no longer
necessary.
2023-04-06 09:04:31 +02:00
David Green
28c8616a5b [LV] Cleanup and reformatting for some debug messages. NFC
This is just some cleanup of various debug messages, pulled out of another
patch to simplify it a little.
2023-04-05 17:50:01 +01:00
Alexey Bataev
40105a9933 [SLP]Find reused scalars in buildvector sequences, if any.
Patch generalizes analysis of scalars. The main part is outlined into
lambda, which can be used to find reused inserted scalars and emit
shuffle for them instead of multiple insertelement instructions, if the
permutation is found already. I.e. some scalars are transformed by the
permutation of previously vectorized nodes, and some are inserted
directly.

Reworked part of D110978

Differential Revision: https://reviews.llvm.org/D146564
2023-04-05 09:37:05 -07:00
Philip Reames
c416f6700f [IVDescriptors] Add pointer InductionDescriptors with non-constant strides (try 2)
(JFYI - This has been heavily reframed since original attempt at landing.)

This change updates the InductionDescriptor logic to allow matching a pointer IV with a non-constant stride, but also updates the LoopVectorizer to bail out on such descriptors by default. This preserves the default vectorizer behavior.

In review, it was pointed out that there are multiple unfortunate performance implications which need to be addressed before this can be enabled. Having a flag allows us to exercise the behavior, and write test cases for logic which is otherwise unreachable (or hard to reach).

This will also enable non-constant stride pointer recurrences for other consumers. I've audited said code, and don't see any obvious issues.

Differential Revision: https://reviews.llvm.org/D147336
2023-04-05 09:32:35 -07:00
Nikita Popov
7c78cb4b1f Revert "[SimplifyCFG][LICM] Preserve nonnull, range and align metadata when speculating"
This reverts commit 78b1fbc63f78660ef10e3ccf0e527c667a563bc8.

This causes or exposes miscompiles in Rust, revert until they
have been investigated.
2023-04-05 17:05:39 +02:00
Florian Hahn
04681243b4 [Matrix] Limit dot lowering to column major matrixes.
Limit dot product lowering to column major matrixes for now. This
simplifies the code and reasoning for upcoming planned improvements.
Support for row-major matrixes can be added later as an extension.
2023-04-05 15:49:06 +01:00
Nikita Popov
238a59c3f1 [InstCombine] Remove varargs cast transform (NFC)
This is no longer relevant with opaque pointers.

Also drop the CastInst::isLosslessCast() method, which was only
used here.
2023-04-05 16:36:21 +02:00
Jie Fu
ae5f049378 [Transforms] Fix -Wunused-function for 'GetReplicateRegion' with -DLLVM_ENABLE_ASSERTIONS=OFF (NFC)
/Users/jiefu/llvm-project/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp:614:23: error: unused function 'GetReplicateRegion' [-Werror,-Wunused-function]
static VPRegionBlock *GetReplicateRegion(VPRecipeBase *R) {
                      ^
1 error generated.
2023-04-05 22:34:42 +08:00
Nikita Popov
032e5d403e [InstCombine] Remove convertBitCastToGEP() fold (NFC)
This only applies to typed pointers, so the fold is no longer
necessary.
2023-04-05 16:20:14 +02:00
Jie Fu
d1dd995196 [InstCombine] Remove unneeded internal function 'decomposeSimpleLinearExpr' in InstCombineCasts.cpp (NFC)
/data/llvm-project/llvm/lib/Transforms/InstCombine/InstCombineCasts.cpp:32:15: error: function 'decomposeSimpleLinearExpr' is not needed and will not be emitted [-Werror,-Wunneeded-internal-declaration]
static Value *decomposeSimpleLinearExpr(Value *Val, unsigned &Scale,
              ^
1 error generated.
2023-04-05 22:18:39 +08:00
Eric Gullufsen
b9bbe2f603 [InstCombine] Preserve nsw/nuw flags in canonicalization
canonicalizeLogicFirst reorders logic op / math op for suitable
constants, and this commit makes this function pass through
nsw/nuw flags on the Add.

Differential Revision: https://reviews.llvm.org/D147568
2023-04-05 10:12:54 -04:00
Nikita Popov
3cbdcd6ebf [InstCombine] Remove PromoteCastOfAllocation() fold (NFC)
This fold does not apply to opaque pointers, and as such is no
longer needed.
2023-04-05 15:55:43 +02:00
Florian Hahn
c18bc7f7fe [VPlan] Replace check for replicate regions with assert (NFCI).
After recent changes, replication regions only get introduced later, so
there's no need to check for them.
2023-04-05 14:29:24 +01:00
Nikita Popov
53280dba83 [InstCombine] Use CreateGEP() API (NFC)
Use the IRBuilder API that accepts inbounds as a boolean parameter,
rather than using a ternary.
2023-04-05 15:02:45 +02:00
Nikita Popov
b066505d88 [ArgPromotion] Require noundef to copy poison-generating metadata
For poison-generating (rather than IUB) metadata, only copy it
from the dominating must-exec load if it is combined with !noundef.
This could be further extended by additionally intersecting the
metadata from all loads, which does not require !noundef.
2023-04-05 14:34:33 +02:00
OCHyams
76740fb40e [Assignment Tracking][SROA] Handle createFragmentExpression failure
createFragmentExpression will fail if it determines that the expression cannot
be split over fragments. Handle this case in SROA. Similarly to D147312 this
should be a rare occurrence as the `dbg.assign` will usually reference the
`Value` being stored without modifying it with a `DIExpression`.

Reviewed By: jmorse

Differential Revision: https://reviews.llvm.org/D147431
2023-04-05 11:20:32 +01:00
Graham Hunter
185863f7de [LV] Use available masked vector function variants when required
LLVM has the ability to vectorize using function variants that require
a mask by creating an all-true mask, and to vectorize a conditional
call via scalarization. Now we want to join the two parts together
and use a masked variant when a mask is required.

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D136251
2023-04-05 11:18:38 +01:00
Nikita Popov
7553bad1ac [LICM] Don't require optimized uses
LICM currently requests optimized use MSSA form. This is wasteful,
because LICM doesn't actually care about most uses, only those of
invariant pointers in loops. Everything else doesn't need to be
optimized.

LICM already uses the clobber walker in most places. This patch
adjusts one place that was using getDefiningAccess() to use it as
well, so we no longer have a dependence on pre-optimized uses.

This change is not NFC in that the fallback on the defining access
when there are too many clobber calls may now fall back to an
unoptimized use. In practice, I've not seen any problems with this
though. If desired, we could also increase licm-mssa-optimization-cap
to a higher value (increasing this from 100 to 200 has no impact on
average compile-time -- but also doesn't appear to have any impact
on LICM quality either).

This makes for a 0.9% geomean compile-time improvement on CTMark.

Differential Revision: https://reviews.llvm.org/D147437
2023-04-05 11:20:25 +02:00
Evgenii Stepanov
e0f7ef4b9c [msan] Fix handling of ParamTLS overflow.
Ironically, MSan copies uninitialized data off the stack into
VAArgTLSCopy in the callee-side handling of va_start. Clamp the copy
size to the actual length of the buffer, and zero-initialize the
remainder.

Differential Revision: https://reviews.llvm.org/D146858
2023-04-04 13:52:09 -07:00
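
The "clamp, then zero the remainder" pattern from e0f7ef4b9c can be sketched in plain C++. This is not MSan's instrumentation code: `copyClamped` is a hypothetical helper, and the sketch assumes the clamp is against the length of the source buffer, which is one reading of the message.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: build a destination of Wanted bytes from Src.
// At most Src.size() bytes are copied; the rest of the destination
// stays zero instead of being read from past the end of Src.
inline std::vector<uint8_t> copyClamped(const std::vector<uint8_t> &Src,
                                        std::size_t Wanted) {
  std::vector<uint8_t> Dst(Wanted, 0); // zero-initialized up front
  std::size_t N = std::min(Wanted, Src.size()); // clamp the copy size
  std::copy(Src.begin(), Src.begin() + N, Dst.begin());
  return Dst;
}
```

Requesting more bytes than the source holds yields the source bytes followed by zeros; requesting fewer simply truncates.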
Jeff Byrnes
9b79d0b610 [MergedLoadStoreMotion] Merge stores with conflicting value types
Since memory does not have an intrinsic type, we do not need to require value type matching on stores in order to sink them. To facilitate that, this patch finds stores which are sinkable, but have conflicting types, and bitcasts the ValueOperand so they are easily sinkable into a PHINode. Rather than doing fancy analysis to optimally insert the bitcast, we always insert right before the relevant store in the diamond branch. The assumption is that later passes (e.g. GVN, SimplifyCFG) will clean up bitcasts as needed.

Differential Revision: https://reviews.llvm.org/D147348
2023-04-04 12:01:29 -07:00
serge-sans-paille
ad9ad3735c Do not move "auto-init" instructions if they're volatile
This is overly conservative, but at least it's safe.

This is a follow-up to https://reviews.llvm.org/D137707
2023-04-04 20:42:05 +02:00
David Sherwood
b4089cfa2f [NFC][LoopVectorize] Simplify preferPredicateOverEpilogue interface
Given just how many arguments we pass to
preferPredicateOverEpilogue and considering this list may
grow over time I've decided to pass in a pointer to a new
TailFoldingInfo structure instead, similar to what we do
with IntrinsicCostAttributes, etc. In addition, many of the
arguments we pass in are actually available in the
LoopVectorizationLegality class so I've managed to
reduce the set of pointers that we need to pass in the
TailFoldingInfo struct.

Differential Revision: https://reviews.llvm.org/D146127
2023-04-04 14:00:49 +00:00
Nikita Popov
78b1fbc63f [SimplifyCFG][LICM] Preserve nonnull, range and align metadata when speculating
After D141386, violation of nonnull, range and align metadata
results in poison rather than immediate undefined behavior,
which means that these are now safe to retain when speculating.
We only need to remove UB-implying metadata like noundef.

This is done by adding a dropUBImplyingAttrsAndMetadata() helper,
which lists the metadata which is known safe to retain on speculation.

Differential Revision: https://reviews.llvm.org/D146629
2023-04-04 10:03:45 +02:00
Craig Topper
1f60c8d025 [IR] Replace calls to ConstantFP::getNullValue with ConstantFP::getZero. NFC
There is no getNullValue in ConstantFP. Due to inheritance, we're calling
Constant::getNullValue which handles any type including FP.
Since we already know we want an FP constant we can use ConstantFP::getZero
which might be faster and is a more readable name for an FP zero.
2023-04-03 23:14:02 -07:00
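
The name lookup behaviour that 1f60c8d025 relies on is ordinary C++: a static member function called through a derived class name resolves to the base class if the derived class does not declare it. A minimal sketch with hypothetical `Base` and `Derived` types (stand-ins only, not LLVM's Constant and ConstantFP):

```cpp
#include <string>

// Hypothetical stand-in for a base class with a generic factory.
struct Base {
  static std::string makeNull() { return "Base::makeNull"; }
};

// Hypothetical stand-in for a derived class with a more specific factory.
// It does not declare makeNull, so Derived::makeNull finds Base::makeNull.
struct Derived : Base {
  static std::string makeZero() { return "Derived::makeZero"; }
};
```

Here `Derived::makeNull()` compiles and runs the base implementation, which is why spelling the call with the more specific factory name reads more clearly.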
serge-sans-paille
50b2a113db Move "auto-init" instructions to the dominator of their users
As a result of -ftrivial-auto-var-init, clang generates instructions to
set alloca'd memory to a given pattern, right after the allocation site.
In some cases, this (somewhat costly) operation could be delayed, so that
it only executes conditionally.

This is not an uncommon situation: it happens ~500 times on the CPython
code base, and many more times on the LLVM codebase. The benefit varies
greatly with the execution path, but it should not regress performance.

This is a recommit of cca01008cc31a891d0ec70aff2201b25d05d8f1b with
MemorySSA update fixes.

Differential Revision: https://reviews.llvm.org/D137707
2023-04-04 07:30:03 +02:00
Philip Reames
f6b217c7cb [LV] Remove unused default argument to isLegalGatherOrScatter [nfc]
2023-04-03 11:03:35 -07:00
Alexey Bataev
c1660006b2 [SLP]Reorder counters for same values, if the root node is reordered.
The counters for the repeated scalars are ordered in the natural order,
but the original scalars might be reordered during SLP graph reordering
and this order can be dropped. Need to use the scalars after the
reordering, not the original ones, to emit correct code for same value
counters.
2023-04-03 07:52:49 -07:00
Nikita Popov
9b5ff4436e [EarlyCSE] Call combineMetadataForCSE() when CSEing loads
We may have to adjust metadata on the replacement load if the
metadata is poison-generating.
2023-04-03 16:10:19 +02:00
Nikita Popov
d68800d15d [Local] Preserve !invariant.load of dominating instruction
Per LangRef:

> If a load instruction tagged with the !invariant.load metadata
> is executed, the memory location referenced by the load has to
> contain the same value at all points in the program where the
> memory location is dereferenceable; otherwise, the behavior is
> undefined.

As invariant.load violation is immediate undefined behavior, it
is sufficient for it to be present on the dominating load (for
the case where K does not move).
2023-04-03 16:05:02 +02:00
serge-sans-paille
11ae47dfc6 Revert "Move "auto-init" instructions to the dominator of their users"
This reverts commit cca01008cc31a891d0ec70aff2201b25d05d8f1b.

This change breaks memory ssa checks, see https://lab.llvm.org/buildbot#builders/109/builds/60970
2023-04-03 15:46:18 +02:00
Nikita Popov
e20331cec0 [Local] Use combineMetadataForCSE() in patchReplacementInstruction()
patchReplacementInstruction() is used for CSE-style transforms.
Avoid the need to maintain two separate lists of known metadata IDs,
which can and do go out of sync.
2023-04-03 15:30:21 +02:00
Nikita Popov
0b5068695a [Local] Add MD_fpmath to combineMetadataForCSE()
This was present in patchReplacementInstruction() but not
combineMetadataForCSE(). combineMetadata() already knows how to
merge these properly.
2023-04-03 15:27:59 +02:00
serge-sans-paille
cca01008cc Move "auto-init" instructions to the dominator of their users
As a result of -ftrivial-auto-var-init, clang generates instructions to
set alloca'd memory to a given pattern, right after the allocation site.
In some cases, this (somewhat costly) operation could be delayed, so that
it only executes conditionally.

This is not an uncommon situation: it happens ~500 times on the CPython
code base, and many more times on the LLVM codebase. The benefit varies
greatly with the execution path, but it should not regress performance.

Differential Revision: https://reviews.llvm.org/D137707
2023-04-03 15:27:27 +02:00
Zain Jaffal
1d23d60c8d [ConstraintElimination] Add function arguments to constraint system before solving
If there is an optimisation opportunity and the function argument hasn’t been added to the constraint
system through previous facts, we fail to optimise it.

It might be a good idea to start the constraint system with all the function arguments added to the system.

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D144879
2023-04-03 14:16:49 +01:00
Nikita Popov
b58a697f3e [LICM] Don't promote store to global even in single-thread mode
Even if there are no thread-safety concerns, we should not promote
(not guaranteed-to-execute) stores to globals without further
analysis: While the global may be writable, we may not have
provenance to perform the write. The @promote_global_noalias test
case illustrates a miscompile in the presence of a noalias pointer
to the global.

Worth noting that the load-only promotion may also not be well-defined
depending on precise semantics (we don't specify whether load
violating noalias is poison or UB -- though I believe the general
inclination is to make it poison, and only stores UB), but that's
a more general issue.

This is inspired by https://github.com/llvm/llvm-project/issues/60860,
which is a related issue with TBAA metadata.

Differential Revision: https://reviews.llvm.org/D146233
2023-04-03 14:20:06 +02:00
Serguei Katkov
2b9509627c [GuardWidening] Fix the crash while replacing the users of poison.
When we replace poison with freeze poison, a user of poison may turn out
to be a constant (for example, a vector constant).

In this case the constant would end up with a non-constant operand.

Moreover, replacing poison and GlobalValue everywhere in the module seems
to be overkill. So the solution is to make the replacement only in the
instructions we visited (those contributing to the hoisted condition).
Moreover, if a user of poison is a constant, that constant also needs
a freeze, so it does not make sense to replace poison with its frozen
version; just freeze the using constant instead.

Reviewed By: mkazantsev
Differential Revision: https://reviews.llvm.org/D147429
2023-04-03 17:20:38 +07:00
Nikita Popov
0b9259c00d [LICM] Extract helper for getClobberingMemoryAccess()
Extract a helper that does the clobber walk while taking into
account the cap. Slightly reflow things to check this first in
the store case, before we start walking over all accesses in the
loop.
2023-04-03 12:02:55 +02:00
Kazu Hirata
52dd9deb15 [Scalar] Use SmallPtrSet::contains (NFC)
2023-03-31 23:50:17 -07:00
Alexey Bataev
c1bcf5dd0a [SLP]Fix PR61835: Assertion `I->use_empty() && "trying to erase instruction with users."' failed.

If the externally used scalar is part of the tree and is replaced by
an extractelement instruction, we need to add the generated extractelement
instruction to the list of ExternallyUsedValues to avoid its deletion
during vectorization.
2023-03-31 14:21:19 -07:00
Jie Fu
297242a2bb [InstCombine] Fix -Wimplicit-fallthrough in InstCombinerImpl::visitCallInst (NFC)
/data/llvm-project/llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp:3078:3: error: unannotated fall-through between switch labels [-Werror,-Wimplicit-fallthrough]
  default:
  ^
/data/llvm-project/llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp:3078:3: note: insert 'break;' to avoid fall-through
  default:
  ^
  break;
1 error generated.
2023-03-31 22:52:46 +08:00
Nikita Popov
6261adfa51 [InstCombine] Fold more intrinsics over selects
Move this handling to a centralized place and extend it to handle
saturating add/sub intrinsics.

I originally wanted to make this fully generic rather than
whitelist based, because this is legal and likely profitable for all
speculatable intrinsics. The caveat is that for vector selects,
the intrinsic can't perform cross-lane operations like a shuffle
or reduction, which we don't really expose as a generic property
right now. So for now I'm just extending the list.
2023-03-31 16:32:21 +02:00
Nikita Popov
cbca9ce91c [InstCombine] Remove min/max special case when folding into select
Now that we canonicalize to min/max intrinsics, we no longer need
to guard against this here.

In fact, it seems like the issue from PR46271 was the final push
for introducing the intrinsics in the first place...
2023-03-31 13:48:21 +02:00