Given two store instructions with equivalent pointer operands,
they can be merged into their common successor basic block if
the value operand of one is bitcast to match the type of the
other.
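A minimal IR sketch of the idea (all names hypothetical):

  if.then:
    store i64 %i, ptr %p
    br label %exit
  if.else:
    store double %d, ptr %p
    br label %exit

can become:

  if.else:
    %d.cast = bitcast double %d to i64
    br label %exit
  exit:
    %v = phi i64 [ %i, %if.then ], [ %d.cast, %if.else ]
    store i64 %v, ptr %p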
Differential Revision: https://reviews.llvm.org/D150900
This patch-set aims to simplify the existing RVV segment load/store
intrinsics to use a type that represents a tuple of vectors instead.
To achieve this, we first need to relax the current limitation so
that an aggregate type can be a target of load/store/alloca when the
aggregate type contains homogeneous scalable vector types. Then we
adjust the prologue of an LLVM function during lowering in Clang.
Finally, we redefine the RVV segment load/store intrinsics to use the
tuple types.
The corresponding pull request to the RVV intrinsic specification is
riscv-non-isa/rvv-intrinsic-doc#198.
---
This is the first patch of the patch-set. It originated from
D98169.
This patch allows an aggregate type (StructType) that contains
homogeneous scalable vector types to be a target of
load/store/alloca. The RFC for this patch was posted on LLVM
Discourse:
https://discourse.llvm.org/t/rfc-ir-permit-load-store-alloca-for-struct-of-the-same-scalable-vector-type/69527
The main changes in this patch are:
- Extend `StructLayout::StructSize` from `uint64_t` to `TypeSize` to
accommodate an expression of scalable size.
- Allow `StructType::isSized` to also return true for homogeneous
scalable vector types.
- Let `Type::isScalableTy` return true when `Type` is a `StructType`
that contains scalable vectors.
The LLVM Language Reference Manual has been extended to describe this
relaxation.
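With this relaxation, IR such as the following (a hypothetical
two-element tuple) becomes valid:

  %a = alloca { <vscale x 4 x i32>, <vscale x 4 x i32> }
  %v = load { <vscale x 4 x i32>, <vscale x 4 x i32> }, ptr %a
  store { <vscale x 4 x i32>, <vscale x 4 x i32> } %v, ptr %a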
Authored-by: Hsiangkai Wang <kai.wang@sifive.com>
Co-Authored-by: eop Chen <eop.chen@sifive.com>
Reviewed By: craig.topper, nikic
Differential Revision: https://reviews.llvm.org/D146872
- As the address space cast may not be valid on a specific target,
`addrspacecast` was previously not handled when an `alloca` could be
replaced with the source of memcpy/memmove. This patch addresses that
by querying a target hook on whether the address space cast is valid.
For example, on most GPU targets, the cast from a global pointer to a
generic pointer is valid.
- If that cast is allowed (by querying `isValidAddrSpaceCast`), the
replacement is enhanced to handle that `addrspacecast` as well.
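A minimal IR sketch of the GPU case (names hypothetical):

  @g = addrspace(1) constant [4 x i32] [i32 0, i32 1, i32 2, i32 3]

    %a = alloca [4 x i32]
    call void @llvm.memcpy.p0.p1.i64(ptr %a, ptr addrspace(1) @g, i64 16, i1 false)

If the target reports the cast as valid, uses of %a can instead be
rewritten in terms of:

    %g.cast = addrspacecast ptr addrspace(1) @g to ptr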
Reviewed By: yaxunl
Differential Revision: https://reviews.llvm.org/D147025
Many uses of getIntPtrType() were using that type to calculate the
needed type for GEP offset arguments. However, some time ago,
DataLayout was extended to support pointers where the size of the
pointer is not equal to the size of the values used to index it.
Much code was already migrated to, for example, use getIndexSizeInBits
instead of getPointerSizeInBits, but some rewrites still used
getIntPtrType() to get the type for GEP offsets.
This commit changes uses of getIntPtrType() to getIndexType() where
they are involved in a GEP-related calculation.
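For example, with a hypothetical datalayout where pointers are 128
bits wide but indexed with 64-bit offsets:

  target datalayout = "p:128:128:128:64"

getIntPtrType() would yield i128, whereas the correct type for a GEP
offset is the i64 returned by getIndexType():

  %elt = getelementptr i32, ptr %p, i64 %idx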
In at least one case (bounds check insertion) this resolves a compiler
crash that the new test added here would previously trigger.
This commit does not impact:
- C library-related rewriting (memcpy()), which operates under the
assumption that intptr_t == size_t. While all the mechanisms for
breaking this assumption now exist, doing so is outside the scope of
this commit.
- Code generation and below. Note that the use of getIntPtrType() in
CodeGenPrepare will be changed in a future commit.
- Usage of getIntPtrType() in any backend
Depends on D143435
Reviewed By: arichardson
Differential Revision: https://reviews.llvm.org/D143437
InstCombine is supposed to be a superset of InstSimplify, but
failed to invoke load simplification.
Unfortunately, this causes a minor compile-time regression, which
will be mitigated in a future commit.
Allow iterating through a SelectInst user of the alloca when
checking if it is only ever overwritten from constant memory.
Recursively determine whether the SelectInst is replaceable and, if
so, insert it into the Worklist. Finally, define a new SelectInst to
replace the old one, with both of its values replaced according
to the WorkMap.
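A minimal IR sketch of the pattern now handled (names hypothetical):

  @g = constant [32 x i8] zeroinitializer

    %a = alloca [32 x i8]
    call void @llvm.memcpy.p0.p0.i64(ptr %a, ptr @g, i64 32, i1 false)
    %p0 = getelementptr inbounds [32 x i8], ptr %a, i64 0, i64 0
    %p1 = getelementptr inbounds [32 x i8], ptr %a, i64 0, i64 16
    %sel = select i1 %c, ptr %p0, ptr %p1
    %v = load i8, ptr %sel

Since %a is only ever written by the memcpy from constant memory, the
GEPs and the select can be recreated over @g and the alloca removed.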
Differential Revision: https://reviews.llvm.org/D136524
This patch builds on the functionality implemented
in rG42ab5dc5a5dd6c79476104bdc921afa2a18559cf,
where PHI nodes were supported in the use-def traversal
algorithm that determines whether an alloca is ever overwritten
by anything other than a memmove/memcpy. This patch implements
the support needed by the PointerReplacer to collect
all (indirect) users of the alloca in cases where a PHI
is involved. Finally, a new PHI is defined in the replace
method, which takes the replaced incoming values and
updates the WorkMap accordingly.
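A minimal IR sketch (names hypothetical), where %a is an alloca that
is only ever written by a memcpy from a constant global:

  bb1:
    %p1 = getelementptr inbounds [32 x i8], ptr %a, i64 0, i64 0
    br label %merge
  bb2:
    %p2 = getelementptr inbounds [32 x i8], ptr %a, i64 0, i64 16
    br label %merge
  merge:
    %phi = phi ptr [ %p1, %bb1 ], [ %p2, %bb2 ]
    %v = load i8, ptr %phi

The PointerReplacer can now recreate the GEPs and the PHI over the
global and remove the alloca.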
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D136201
As long as the memcpy occurs on a phi input (rather than the phi
output), we can look through phi nodes in
isOnlyCopiedFromConstantMemory().
This is split out of D136201, to only handle the case where the
address spaces are the same, and no pointer rewrite is necessary.
This fold currently performs an unbounded recursive use walk.
Make sure that we don't visit too many instructions (the limit is
chosen arbitrarily).
This is with an eye on also handling phi nodes, which will further
extend the considered use graph.
I don't think this matters right now (because InstCombine cleans
up unreachable code early), but this will help to make sure that
we don't infinite loop once we handle phi nodes. The added test
is an example where this would happen.
Use deduction guides instead of helper functions.
The only non-automatic changes have been:
1. ArrayRef(some_uint8_pointer, 0) needs to be changed into ArrayRef(some_uint8_pointer, (size_t)0) to avoid an ambiguous call with ArrayRef((uint8_t*), (uint8_t*))
2. CVSymbol sym(makeArrayRef(symStorage)); needed to be rewritten as CVSymbol sym{ArrayRef(symStorage)}; otherwise the compiler is confused and thinks we have a (bad) function prototype. There were a few similar situations across the codebase.
3. ADL doesn't seem to work the same for deduction-guides and functions, so at some point the llvm namespace must be explicitly stated.
4. The "reference mode" of makeArrayRef(ArrayRef<T> &) that acts as a no-op is not supported (a constructor cannot achieve that).
Per reviewers' comments, some useless makeArrayRef calls have been removed in the process.
This is a follow-up to https://reviews.llvm.org/D140896 that introduced
the deduction guides.
Differential Revision: https://reviews.llvm.org/D140955
The Assignment Tracking debug-info feature is outlined in this RFC:
https://discourse.llvm.org/t/rfc-assignment-tracking-a-better-way-of-specifying-variable-locations-in-ir
Most of the updates here are just to ensure DIAssignID attachments are
maintained and propagated correctly.
Reviewed By: jmorse
Differential Revision: https://reviews.llvm.org/D133307
The pointsToConstantMemory() method returns true only if the memory pointed to
by the memory location is globally invariant. However, the LLVM memory model
also has the semantic notion of *locally-invariant*: memory that is known to be
invariant for the life of the SSA value representing that pointer. The most
common example of this is a pointer argument that is marked readonly noalias,
which the Rust compiler frequently emits.
It'd be desirable for LLVM to treat locally-invariant memory the same way as
globally-invariant memory when it's safe to do so. This patch implements that,
by introducing the concept of a *ModRefInfo mask*. A ModRefInfo mask is a bound
on the Mod/Ref behavior of an instruction that writes to a memory location,
based on the knowledge that the memory is globally-constant memory (in which
case the mask is NoModRef) or locally-constant memory (in which case the mask
is Ref). ModRefInfo values for an instruction can be combined with the
ModRefInfo mask by simply using the & operator. Where appropriate, this patch
has modified uses of pointsToConstantMemory() to instead examine the mask.
The most notable optimization change I noticed with this patch is that now
redundant loads from readonly noalias pointers can be eliminated across calls,
even when the pointer is captured. Internally, before this patch,
AliasAnalysis was assigning Ref to reads from constant memory; now AA can
assign NoModRef, which is a tighter bound.
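A minimal IR sketch of that case (names hypothetical):

  declare void @g(ptr)

  define i32 @f(ptr noalias readonly %p) {
    %v1 = load i32, ptr %p
    call void @g(ptr %p)       ; may capture %p, but the pointee is
                               ; invariant for the lifetime of %p
    %v2 = load i32, ptr %p     ; redundant; can be replaced with %v1
    %r = add i32 %v1, %v2
    ret i32 %r
  }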
Differential Revision: https://reviews.llvm.org/D136659
Currently, InstCombine can elide a memcpy from a constant to a local alloca if
that alloca is passed as a nocapture parameter to a *function* that's readnone
or readonly, but it can't forward the memcpy if the *argument* is marked
readonly nocapture, even though readonly guarantees that the callee won't
mutate the pointee through that pointer. This patch adds support for detecting
and handling such situations, which arise relatively frequently in Rust, a
frontend that liberally emits readonly.
A more general version of this optimization would use alias analysis to check
the call's ModRef info for the pointee, but I was concerned about blowing up
compile time, so for now I'm just checking for one of readnone on the function,
readonly on the function, or readonly on the parameter.
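A minimal IR sketch (names hypothetical):

  @g = constant [4 x i32] [i32 1, i32 2, i32 3, i32 4]
  declare void @callee(ptr nocapture readonly)

    %a = alloca [4 x i32]
    call void @llvm.memcpy.p0.p0.i64(ptr %a, ptr @g, i64 16, i1 false)
    call void @callee(ptr %a)

With readonly nocapture on the parameter, the call can be given @g
directly, and the memcpy and alloca can be removed.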
Differential Revision: https://reviews.llvm.org/D136822
InstCombine can replace memcpy to an alloca with a pointer directly to the
source in certain cases. Unfortunately, it also did so for volatile memcpys.
This patch makes it stop doing that.
This was discovered in D136822.
Differential Revision: https://reviews.llvm.org/D137031
Most clients only used these methods (zextOrSelf, sextOrSelf and
truncOrSelf) because they wanted to be able to
extend or truncate to the same bit width (which is a no-op). Now that
the standard zext, sext and trunc allow this, there is no reason to use
the OrSelf versions.
The OrSelf versions additionally have the strange behaviour of allowing
extending to a *smaller* width, or truncating to a *larger* width, which
are also treated as no-ops. A small amount of client code relied on this
(ConstantRange::castOp and MicrosoftCXXNameMangler::mangleNumber) and
needed rewriting.
Differential Revision: https://reviews.llvm.org/D125557
This removes memset calls where the fill value is undef. We already
do this for stores of undef values.
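A minimal sketch of the removed pattern:

  call void @llvm.memset.p0.i64(ptr %p, i8 undef, i64 32, i1 false)

mirroring the existing handling of stores such as
`store i8 undef, ptr %p`.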
This comes with the caveat that this optimization is not, strictly
speaking, legal for undef values, because we might be overwriting
a poison value. However, our entire load/store model currently still
operates on undef values, so we need to support undef here as well
for internal consistency.
Once https://github.com/llvm/llvm-project/issues/52930 is resolved,
these and related folds can be limited to poison -- I've added
FIXMEs to that effect.
Differential Revision: https://reviews.llvm.org/D124173
This code replaces the address space of the pointers while keeping
the element type. Use the appropriate helpers to make this work
with opaque pointers.
Instead use either Type::getPointerElementType() or
Type::getNonOpaquePointerElementType().
This is part of D117885, in preparation for deprecating the API.
Officially, allocas are currently required to always use the
datalayout's alloca address space. This may change in the future, and
it's cleaner to propagate the existing alloca's address space anyway.
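A minimal IR sketch (address space number hypothetical):

  %a = alloca i32, i32 1, addrspace(5)

simplifyAllocaArraySize should produce an alloca in the same address
space:

  %a = alloca i32, addrspace(5)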
This is a triple fix. Initially, the change in simplifyAllocaArraySize
would drop the address space but still produce output. Fixing this hit
an assertion in the cast combine.
This patch also makes the changes from
a33e12801279a947c74fdee2655b24480941fb39 that handled this situation
dead, so eliminate them. InstCombine should not take it upon itself to
introduce addrspacecasts; it should preserve the original address
space instead.
These are deprecated and should be replaced with getAlign().
Some of these asserts don't do anything because Load/Store/AllocaInst never have a 0 align value.
This patch continues unblocking optimizations that are blocked by pseudo probe instrumentation.
Unlike debug intrinsics, the PseudoProbe intrinsic has attributes
(such as mayread, maywrite, mayhaveSideEffect) that can block
optimizations. The issues fixed are:
- Flipped the default param of the getFirstNonPHIOrDbg API to skip pseudo probes
- Unblocked CSE by avoiding pseudo probes clobbering MemorySSA
- Unblocked induction variable simplification
- Allowed empty loop deletion by treating the probe intrinsic as droppable (isDroppable)
- Some refactoring.
Reviewed By: wenlei
Differential Revision: https://reviews.llvm.org/D110847
getMetadata() currently uses a weird API where it populates a
structure passed to it, and optionally merges into it. Instead,
we can return the AAMDNodes and provide a separate merge() API.
This makes usages more compact.
Differential Revision: https://reviews.llvm.org/D109852
Patch by Mohammad Fawaz
This issue started happening after b373b5990d.
Basically, if the memcpy is volatile, the collectUsers() function should
return false, just like we do for volatile loads.
Differential Revision: https://reviews.llvm.org/D106950
Patch by Mohammad Fawaz
This patch allows lifetime calls to be ignored (and later erased) if we
know that the copy-constant-to-alloca optimization is going to happen.
The case that was missed is when the global variable is in a different
address space than the alloca (as shown in the example added to the
lit test).
This used to work before 6da31fa4a6
Differential Revision: https://reviews.llvm.org/D106573
While we might eventually want to disallow allocas that do not have the
alloca-AS set, it seems undesirable to crash on them. Add a cast when
required so that we can support such allocas (at least here).
Differential Revision: https://reviews.llvm.org/D104866
When the load type is changed to ptr, we need the load pointer type
to also be ptr, because it's not allowed to create a pointer to an
opaque pointer. This is achieved by adjusting the getPointerTo() API
to return an opaque pointer for an opaque pointer base type.
Differential Revision: https://reviews.llvm.org/D104718
This patch updates InstCombine to use a poison constant to represent the resulting value of (either semantically or syntactically) unreachable instructions, or the don't-care value of an unreachable store instruction.
This allows more aggressive folding of unused results, as shown in llvm/test/Transforms/InstCombine/getelementptr.ll.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D104602
As a follow-up to D95982, this patch continues unblocking optimizations that are blocked by pseudo probe instrumentation.
The optimizations unblocked are:
- In-block load propagation.
- In-block dead store elimination.
- Memory copy optimization that turns stores to consecutive memory locations into a memset.
These optimizations are local to a block, so they shouldn't affect the profile quality.
Reviewed By: wmi
Differential Revision: https://reviews.llvm.org/D100075
This transformation is fundamentally broken when it comes to dominance;
it just happened to work when the source of the memcpy could be moved
into the place of the alloca. The bug shows up a lot more often since
077bff39d46364035a5dcfa32fc69910ad0975d0 allows the source to be a
switch.
It would be possible to check dominance of the source and all its
operands, but that seems very heavy for instcombine.
After 077bff39d46364035a5dcfa32fc69910ad0975d0,
isDereferenceableForAllocaSize() can recurse into selects,
which causes a problem for the new test case, reduced from
https://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20210412/904154.html,
because the replacement (the select) is defined after the first use
of an alloca, so we'd end up with a verifier error.
Now, this new check is too restrictive.
We likely can handle *some* cases, by trying to sink all uses of an
alloca to after the def.
FindAvailableLoadedValue() accepts an iterator by reference. If no
available value is found, then the iterator will either be left
at a clobbering instruction or the beginning of the basic block.
This allows using FindAvailableLoadedValue() across multiple blocks.
If this functionality is not needed, as is the case in InstCombine,
then we can use a much more efficient implementation: First try
to find an available value, and only perform clobber checks if
we actually found one. As this function only looks at a very small
number of instructions (6 by default) and usually doesn't find an
available value, this saves many expensive alias analysis queries.