The undetectable FMV features predres and ls64 have been removed,
therefore the optimization is now re-enabled. The llvm testsuite
Graviton4 bots are expected to remain green.
Adds `optimize-non-fmv-callers` (disabled by default) as a short term
workaround to keep the llvm testsuite bots green, until we decide what
is the right solution for the problem which was previously addressed
with https://github.com/llvm/llvm-project/pull/123383.
This fixes a runtime regression in the llvm testsuite:
https://lab.llvm.org/buildbot/#/builders/198/builds/1237
On clang-aarch64-sve2-vla:
predres
FAIL
A 'predres' version is unexpectedly trapping on GravitonG4. My
explanation is that when the caller in not a versioned function, the
compiler exclusively relies on the command line option, or target
attribute to deduce whether a feature is available. However, there is no
guarantee that in reality the host supports those implied features.
This is a quickfix. We may rather change the mcpu option in the llvm
testsuite build instead.
To deduce whether the optimization is legal we need to compare the target
features between caller and callee versions. The criteria for bypassing
the resolver are the following:
* If the callee's feature set is a subset of the caller's feature set,
then the callee is a candidate for direct call.
* Among such candidates the one of highest priority is the best match
and it shall be picked, unless there is a version of the callee with
higher priority than the best match which cannot be picked from a
higher priority caller (directly or through the resolver).
* For every higher priority callee version than the best match, there
is a higher priority caller version whose feature set availability
is implied by the callee's feature set.
Example:
Callers and Callees are ordered in decreasing priority.
The arrows indicate successful call redirections.
Caller Callee Explanation
=========================================================================
mops+sve2 --+--> mops all the callee versions are subsets of the
| caller but mops has the highest priority
|
mops --+ sve2 between mops and default callees, mops wins
sve sve between sve and default callees, sve wins
but sve2 does not have a high priority caller
default -----> default sve (callee) implies sve (caller),
sve2(callee) implies sve (caller),
mops(callee) implies mops(caller)
The logic had a flaw where the alignment from the original aggregate is
unintentionally retained for elements when the calculated known
alignment is not higher than the element's ABI type alignment.
Fixes#115282.
This is a recommit of #107120 . The original PR was approved but failed
buildbot. The newly added tests should only be run for compilers that
support the ARM target. This has been resolved by adding a config file
for these tests.
- Pass optimizes memcpy's by padding out destinations and sources to a
full word to make ARM backend generate full word loads instead of
loading a single byte (ldrb) and/or half word (ldrh). Only pads
destination when it's a stack allocated constant size array and source
when it's constant string. Heuristic to decide whether to pad or not
is very basic and could be improved to allow more examples to be
padded.
- Pass works at the midend level
- Pass optimizes memcpy's by padding out destinations and sources to a
full word to make backend generate full word loads instead of loading a
single byte (ldrb) and/or half word (ldrh). Only pads destination when
it's a stack allocated constant size array and source when it's constant
array. Heuristic to decide whether to pad or not is very basic and could
be improved to allow more examples to be padded.
- Pass works within GlobalOpt but is disabled by default on all targets
except ARM.
This patch fixes an issue in the GlobalOpt pass where deleting a global
variable fails to update the corresponding dbg.value and it references
an empty metadata entry. The SalvageDebugInfo() function has been
updated to handle dbg.value intrinsic when globals are deleted.
Fixes https://github.com/llvm/llvm-project/issues/96197.
A global alias should always point to a definition. Ifuncs are
definitions, so far so good. However an ifunc may be statically resolved
to a function that is declared but not defined in the translation unit.
With this patch we perform static resolution if:
* the resolvee is defined, else if
* none of the ifunc users is a global alias
https://godbolt.org/z/frjhqMKqc for an example.
Removal of allocations due to empty `__cxa_atexit` destructor calls is
done by the following globalopt pass.
This pass currently does not look for `atexit` handlers generated for
platforms that do not use `__cxa_atexit`.
By default Win32 and AIX use `atexit`.
I don't see an easy way to only remove `atexit` calls that the compiler
generated without looking at the generated mangled name of the atexit
handler that is being registered.
However we can easily remove all `atexit` calls that register empty
handlers since it is trivial to ensure the removed call still returns
`0` which is the value for success.
As part of the RemoveDIs project we need LLVM to insert instructions using
iterators wherever possible, so that the iterators can carry a bit of
debug-info. This commit implements some of that by updating the contents of
llvm/lib/Transforms/Utils to always use iterator-versions of instruction
constructors.
There are two general flavours of update:
* Almost all call-sites just call getIterator on an instruction
* Several make use of an existing iterator (scenarios where the code is
actually significant for debug-info)
The underlying logic is that any call to getFirstInsertionPt or similar
APIs that identify the start of a block need to have that iterator passed
directly to the insertion function, without being converted to a bare
Instruction pointer along the way.
Noteworthy changes:
* FindInsertedValue now takes an optional iterator rather than an
instruction pointer, as we need to always insert with iterators,
* I've added a few iterator-taking versions of some value-tracking and
DomTree methods -- they just unwrap the iterator. These are purely
convenience methods to avoid extra syntax in some passes.
* A few calls to getNextNode become std::next instead (to keep in the
theme of using iterators for positions),
* SeparateConstOffsetFromGEP has it's insertion-position field changed.
Noteworthy because it's not a purely localised spelling change.
All this should be NFC.
Teaching ConstantFoldLoadFromUniformValue that types that are padded in
memory can't be considered as uniform.
Using the big hammer to prevent optimizations when loading from a
constant for which DataLayout::typeSizeEqualsStoreSize would return
false.
Main problem solved would be something like this:
store i17 -1, ptr %p, align 4
%v = load i8, ptr %p, align 1
If for example the i17 occupies 32 bits in memory, then LLVM IR doesn't
really tell where the padding goes. And even if we assume that the 15
most significant bits are padding, then they should be considered as
undefined (even if LLVM backend typically would pad with zeroes).
Anyway, for a big-endian target the load would read those most
significant bits, which aren't guaranteed to be one's. So it would be
wrong to constant fold the load as returning -1.
If LLVM IR had been more explicit about the placement of padding, then
we could allow the constant fold of the load in the example, but only
for little-endian.
Fixes: https://github.com/llvm/llvm-project/issues/81793
The hasAddressTaken() call in hasOnlyColdCalls() has quadratic
complexity if there are many cold calls to a function: We're going to
visit each call of the function, and then for each of them iterate all
the users of the function.
We've recently encountered a case where GlobalOpt spends more than an
hour in these hasAddressTaken() checks when full LTO is used.
Avoid this by moving the hasAddressTaken() check into hasChangeableCC()
and caching its result, so it is only computed once per function.
Fix crash on RAUW due to locals and globals having different address
spaces. This is the intent of the original code, but it assumes the
alloca address space is 0. This patch fixes the code to check that the
global's address space matches `DL.getAllocaAddrSpace()` instead.
Fixes#65155
This allows use with non-0 address space stacks. llvm_ptr_ty should
never be used. This could use some more percolation up through mlir,
but this is enough to fix existing tests.
https://reviews.llvm.org/D156666
Fixes assert with pointers with different address spaces. We
could keep looking through addrspacecast, but it would require
checking for null handling of the access address space.
Fixes#62384
As long as aliasee has `@llvm.used` or `@llvm.compiler.used` references, we cannot do the related replace or delete operations. Even if it is a Local Linkage, we cannot infer if there is no other use for it, such as asm or other future added cases.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D145293
The llvm.used (or llvm.compiler.used) global variable is an array that contains a list of pointers to global variables and functions.
The GlobalOpt (Global Variable Optimizer) pass is not preserving the address space for llvm.used and llvm.compiler.used global variables.This patch updates the setUsedInitializer() function in GlobalOpt.cpp, so the address space is preserved.
Reviewed By: aeubanks
Differential Revision: https://reviews.llvm.org/D144518
Rather than cleanup up dead constant expressions as we go along,
do this once at the end. This aligns it with the
CleanupConstantGlobalUsers() implementation and avoids
any invalidation issues.
Fixes https://github.com/llvm/llvm-project/issues/61674.
Update the code to collect the initializer when collecting the types for
a global, as suggested by @nikic in D144476.
This replaces code which bailed out if *any* part doesn't have an
initializer with a continue to just skip parts which don't have an
initializer.
Depends on D145489.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D145490
If all stores only store the initializer value of a global, consider it
as not stored in the heuristic. GlobalOpt will remove such stores later
on.
Depends on D129857.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D144476
Pointer bitcasts no longer occur with opaque pointers -- and in
this case not handling them allows us to drop the code for
promoting constant expressions to instructions as well.
Extend CleanupPointerRootUsers to iterate over a worklist, add users of
constant expressions to the worklist to enable additional cleanups.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D144468
When limiting the number of parts we split a global into, ignore
any parts that are either only loaded or only stored, because we
expect these to be optimized away after SRA.
Differential Revision: https://reviews.llvm.org/D129857
This patch adds several missing GlobalList modifier functions, like
removeGlobalVariable(), eraseGlobalVariable() and insertGlobalVariable().
There is no longer need to access the list directly so it also makes
getGlobalList() private.
Differential Revision: https://reviews.llvm.org/D144027