Specifically in the context of the once-stored transformation, GlobalOpt
would strip
all pointer casts unconditionally, even though addrspacecasts might be
runtime operations.
This manifested particularly on CHERI targets.
This patch was inspired by an existing change in CHERI LLVM
(91afa60f17),
but has been reimplemented with updated conventions, and a testcase
constructed from scratch.
FMV priority is the returned value of a polymorphic function. On RISC-V
and X86 targets a 32-bit value is enough. On AArch64 we currently need
64 bits and we will soon exceed that. APInt seems to be a suitable
replacement for uint64_t, presumably with minimal compile time overhead.
It allows bit manipulation, comparison and variable bit width.
Partially reverts e37d736def5b95a2710f92881b5fc8b0494d8a05.
The transform has a number of correctness and code quality issues, and
will benefit from a from-scratch re-review more than incremental fixes.
The correctness issues are hinted at in
https://github.com/llvm/llvm-project/pull/144641, but I think it needs a
larger rework to stop working on ArrayTypes and the implementation could
use some other improvements (like callInstIsMemcpy should just be
`dyn_cast<MemCpyInst>`). I can comment in more detail on a resubmission
of the patch.
Following the work in PR #107279, this patch applies the annotative
DebugLocs, which indicate that a particular instruction is intentionally
missing a location for a given reason, to existing sites in the compiler
where their conditions apply. This is NFC in ordinary LLVM builds (each
function `DebugLoc::getFoo()` is inlined as `DebugLoc()`), but marks the
instruction in coverage-tracking builds so that it will be ignored by
Debugify, allowing only real errors to be reported. From a developer
standpoint, it also communicates the intentionality and reason for a
missing DebugLoc.
Some notes for reviewers:
- The difference between `I->dropLocation()` and
`I->setDebugLoc(DebugLoc::getDropped())` is that the former _may_ decide
to keep some debug info alive, while the latter will always be empty; in
this patch, I always used the latter (even if the former could
technically be correct), because the former could result in some
(barely) different output, and I'd prefer to keep this patch purely NFC.
- I've generally documented the uses of `DebugLoc::getUnknown()`, with
the exception of the vectorizers - in summary, they are a huge cause of
dropped source locations, and I don't have the time or the domain
knowledge currently to solve that, so I've plastered it all over them as
a form of "fixme".
Currently BlockAddresses store both the Function and the BasicBlock they
reference, and the BlockAddress is part of the use list of both the
Function and BasicBlock.
This is quite awkward, because this is not really a use of the function
itself (and walks of function uses generally skip block addresses for
that reason). This also has weird implications on function RAUW (as that
will replace the function in block addresses in a way that generally
doesn't make sense), and causes other peculiar issues, like the ability
to have multiple block addresses for one block (with different
functions).
Instead, I believe it makes more sense to specify only the basic block
and let the function be implied by the BB parent. This does mean that we
may have block addresses without a function (if the BB is not inserted),
but this should only happen during IR construction.
When converting a malloc stored to a global into a global, we will
introduce an i1 flag to track whether the global has been initialized.
In case of atomic loads/stores, this will result in verifier failures,
because atomic ops on i1 are illegal. Even if we changed this to i8, I
don't think it is a good idea to change atomic types in that way.
Instead, bail out of the transform is we encounter any atomic
loads/stores of the global.
Fixes https://github.com/llvm/llvm-project/issues/137152.
Some optimizations in globalopt simplify uses of a global value to uses
of a generated global bool value; in some cases where this happens, the
newly-generated instructions would not have the original source
location(s) of the instructions they replaced propagated to them; this
patch properly preserves those source locations.
Found using https://github.com/llvm/llvm-project/pull/107279.
Querying TTI creates a Subtarget object, but an llvm.memcpy declaration
doesn't have target-cpu and target-feature attributes like functions
with definitions. This can cause a warning to be printed on RISC-V
because the target-abi in the Module requires floating point, but the
subtarget features don't enable floating point. So far we've only seen
this in LTO when an -mcpu is not supplied for the TargetMachine.
To fix this, get TTI for the calling function instead.
Fixes the issue reported here
https://github.com/llvm/llvm-project/issues/69780#issuecomment-2665273161
The undetectable FMV features predres and ls64 have been removed,
therefore the optimization is now re-enabled. The llvm testsuite
Graviton4 bots are expected to remain green.
Adds `optimize-non-fmv-callers` (disabled by default) as a short term
workaround to keep the llvm testsuite bots green, until we decide what
is the right solution for the problem which was previously addressed
with https://github.com/llvm/llvm-project/pull/123383.
This fixes a runtime regression in the llvm testsuite:
https://lab.llvm.org/buildbot/#/builders/198/builds/1237
On clang-aarch64-sve2-vla:
predres
FAIL
A 'predres' version is unexpectedly trapping on GravitonG4. My
explanation is that when the caller in not a versioned function, the
compiler exclusively relies on the command line option, or target
attribute to deduce whether a feature is available. However, there is no
guarantee that in reality the host supports those implied features.
This is a quickfix. We may rather change the mcpu option in the llvm
testsuite build instead.
To deduce whether the optimization is legal we need to compare the target
features between caller and callee versions. The criteria for bypassing
the resolver are the following:
* If the callee's feature set is a subset of the caller's feature set,
then the callee is a candidate for direct call.
* Among such candidates the one of highest priority is the best match
and it shall be picked, unless there is a version of the callee with
higher priority than the best match which cannot be picked from a
higher priority caller (directly or through the resolver).
* For every higher priority callee version than the best match, there
is a higher priority caller version whose feature set availability
is implied by the callee's feature set.
Example:
Callers and Callees are ordered in decreasing priority.
The arrows indicate successful call redirections.
Caller Callee Explanation
=========================================================================
mops+sve2 --+--> mops all the callee versions are subsets of the
| caller but mops has the highest priority
|
mops --+ sve2 between mops and default callees, mops wins
sve sve between sve and default callees, sve wins
but sve2 does not have a high priority caller
default -----> default sve (callee) implies sve (caller),
sve2(callee) implies sve (caller),
mops(callee) implies mops(caller)
The logic had a flaw where the alignment from the original aggregate is
unintentionally retained for elements when the calculated known
alignment is not higher than the element's ABI type alignment.
Fixes#115282.
This is a recommit of #107120 . The original PR was approved but failed
buildbot. The newly added tests should only be run for compilers that
support the ARM target. This has been resolved by adding a config file
for these tests.
- Pass optimizes memcpy's by padding out destinations and sources to a
full word to make ARM backend generate full word loads instead of
loading a single byte (ldrb) and/or half word (ldrh). Only pads
destination when it's a stack allocated constant size array and source
when it's constant string. Heuristic to decide whether to pad or not
is very basic and could be improved to allow more examples to be
padded.
- Pass works at the midend level
- Pass optimizes memcpy's by padding out destinations and sources to a
full word to make backend generate full word loads instead of loading a
single byte (ldrb) and/or half word (ldrh). Only pads destination when
it's a stack allocated constant size array and source when it's constant
array. Heuristic to decide whether to pad or not is very basic and could
be improved to allow more examples to be padded.
- Pass works within GlobalOpt but is disabled by default on all targets
except ARM.
This patch fixes an issue in the GlobalOpt pass where deleting a global
variable fails to update the corresponding dbg.value and it references
an empty metadata entry. The SalvageDebugInfo() function has been
updated to handle dbg.value intrinsic when globals are deleted.
Fixes https://github.com/llvm/llvm-project/issues/96197.
A global alias should always point to a definition. Ifuncs are
definitions, so far so good. However an ifunc may be statically resolved
to a function that is declared but not defined in the translation unit.
With this patch we perform static resolution if:
* the resolvee is defined, else if
* none of the ifunc users is a global alias
https://godbolt.org/z/frjhqMKqc for an example.
Removal of allocations due to empty `__cxa_atexit` destructor calls is
done by the following globalopt pass.
This pass currently does not look for `atexit` handlers generated for
platforms that do not use `__cxa_atexit`.
By default Win32 and AIX use `atexit`.
I don't see an easy way to only remove `atexit` calls that the compiler
generated without looking at the generated mangled name of the atexit
handler that is being registered.
However we can easily remove all `atexit` calls that register empty
handlers since it is trivial to ensure the removed call still returns
`0` which is the value for success.
As part of the RemoveDIs project we need LLVM to insert instructions using
iterators wherever possible, so that the iterators can carry a bit of
debug-info. This commit implements some of that by updating the contents of
llvm/lib/Transforms/Utils to always use iterator-versions of instruction
constructors.
There are two general flavours of update:
* Almost all call-sites just call getIterator on an instruction
* Several make use of an existing iterator (scenarios where the code is
actually significant for debug-info)
The underlying logic is that any call to getFirstInsertionPt or similar
APIs that identify the start of a block need to have that iterator passed
directly to the insertion function, without being converted to a bare
Instruction pointer along the way.
Noteworthy changes:
* FindInsertedValue now takes an optional iterator rather than an
instruction pointer, as we need to always insert with iterators,
* I've added a few iterator-taking versions of some value-tracking and
DomTree methods -- they just unwrap the iterator. These are purely
convenience methods to avoid extra syntax in some passes.
* A few calls to getNextNode become std::next instead (to keep in the
theme of using iterators for positions),
* SeparateConstOffsetFromGEP has it's insertion-position field changed.
Noteworthy because it's not a purely localised spelling change.
All this should be NFC.
Teaching ConstantFoldLoadFromUniformValue that types that are padded in
memory can't be considered as uniform.
Using the big hammer to prevent optimizations when loading from a
constant for which DataLayout::typeSizeEqualsStoreSize would return
false.
Main problem solved would be something like this:
store i17 -1, ptr %p, align 4
%v = load i8, ptr %p, align 1
If for example the i17 occupies 32 bits in memory, then LLVM IR doesn't
really tell where the padding goes. And even if we assume that the 15
most significant bits are padding, then they should be considered as
undefined (even if LLVM backend typically would pad with zeroes).
Anyway, for a big-endian target the load would read those most
significant bits, which aren't guaranteed to be one's. So it would be
wrong to constant fold the load as returning -1.
If LLVM IR had been more explicit about the placement of padding, then
we could allow the constant fold of the load in the example, but only
for little-endian.
Fixes: https://github.com/llvm/llvm-project/issues/81793
The hasAddressTaken() call in hasOnlyColdCalls() has quadratic
complexity if there are many cold calls to a function: We're going to
visit each call of the function, and then for each of them iterate all
the users of the function.
We've recently encountered a case where GlobalOpt spends more than an
hour in these hasAddressTaken() checks when full LTO is used.
Avoid this by moving the hasAddressTaken() check into hasChangeableCC()
and caching its result, so it is only computed once per function.
Fix crash on RAUW due to locals and globals having different address
spaces. This is the intent of the original code, but it assumes the
alloca address space is 0. This patch fixes the code to check that the
global's address space matches `DL.getAllocaAddrSpace()` instead.
Fixes#65155
This allows use with non-0 address space stacks. llvm_ptr_ty should
never be used. This could use some more percolation up through mlir,
but this is enough to fix existing tests.
https://reviews.llvm.org/D156666
Fixes assert with pointers with different address spaces. We
could keep looking through addrspacecast, but it would require
checking for null handling of the access address space.
Fixes#62384
As long as aliasee has `@llvm.used` or `@llvm.compiler.used` references, we cannot do the related replace or delete operations. Even if it is a Local Linkage, we cannot infer if there is no other use for it, such as asm or other future added cases.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D145293
The llvm.used (or llvm.compiler.used) global variable is an array that contains a list of pointers to global variables and functions.
The GlobalOpt (Global Variable Optimizer) pass is not preserving the address space for llvm.used and llvm.compiler.used global variables.This patch updates the setUsedInitializer() function in GlobalOpt.cpp, so the address space is preserved.
Reviewed By: aeubanks
Differential Revision: https://reviews.llvm.org/D144518