Goal is simply to reduce direct usage of getLength and setLength so that
if we end up moving memset.pattern (whose length is in elements) there
are fewer places to audit.
If FunctionAttrs infers additional attributes on a function, it also
invalidates analysis on callers of that function. The way it does this
right now limits this to calls with matching signature. However, the
function attributes will also be used when the signatures do not match.
Use getCalledOperand() to avoid a signature check.
This is not a correctness fix, just improves analysis quality. I noticed
this due to
https://github.com/llvm/llvm-project/pull/144497#issuecomment-3199330709,
where LICM ends up with a stale MemoryDef that could be a MemoryUse
(which is a bug in LICM, but still non-optimal).
This patch replaces SmallSet<T *, N> with SmallPtrSet<T *, N>. Note
that SmallSet.h "redirects" SmallSet to SmallPtrSet for pointer
element types:
template <typename PointeeType, unsigned N>
class SmallSet<PointeeType*, N> : public SmallPtrSet<PointeeType*, N>
{};
We only have 140 instances that rely on this "redirection", with the
vast majority of them under llvm/. Since relying on the redirection
doesn't improve readability, this patch replaces SmallSet with
SmallPtrSet for pointer element types.
These are identified by misc-include-cleaner. I've filtered out those
that break builds. Also, I'm staying away from llvm-config.h,
config.h, and Compiler.h, which likely cause platform- or
compiler-specific build failures.
Specifically in the context of the once-stored transformation, GlobalOpt
would strip
all pointer casts unconditionally, even though addrspacecasts might be
runtime operations.
This manifested particularly on CHERI targets.
This patch was inspired by an existing change in CHERI LLVM
(91afa60f17),
but has been reimplemented with updated conventions, and a testcase
constructed from scratch.
Fixes#139023.
This PR essentially removes unused global variables:
- Restores the `GlobalDCE` Legacy pass and adds it to the DirectX
backend after the finalize linkage pass
- Converts external global variables with no usage to internal linkage
in the finalize linkage pass
- (so they can be removed by `GlobalDCE`)
- Makes the `dxil-finalize-linkage` pass usable using the new pass
manager flag syntax
- Adds tests to `finalize_linkage.ll` that make sure unused global
variables are removed
- Adds a use for variable `@CBV` in `opaque-value_as_metadata.ll` so it
isn't removed
- Changes the `scalar-data.ll` run command to avoid removing its global
variables
---------
Co-authored-by: Farzon Lotfi <farzonlotfi@microsoft.com>
Call `recordInliningWithCalleeDeleted` before dropping the contents of
the Callee. Otherwise the handlers don't have access to e.g. the
DebugLoc, so the Callee DebugLoc was missing in inlining remarks for
functions with internal linkage.
The test is the same as `optimization-remarks-passed-yaml.ll` except
that the function `foo` has internal linkage instead of external linkage.
PredicateInfo needs some no-op to which the predicate can be attached.
Currently this is an ssa.copy intrinsic. This PR replaces it with a
no-op bitcast.
Using a bitcast is more efficient because we don't have the overhead of
an overloaded intrinsic. It also makes things slightly simpler overall.
Now that #149310 has restricted lifetime intrinsics to only work on
allocas, we can also drop the explicit size argument. Instead, the size
is implied by the alloca.
This removes the ability to only mark a prefix of an alloca alive/dead.
We never used that capability, so we should remove the need to handle
that possibility everywhere (though many key places, including stack
coloring, did not actually respect this).
We weren't performing node merging on newly created nodes in some cases.
Use a simple iteration over the node and its callers until no more
opportunities are found. I confirmed that for several large codes the
max iterations is 3 (meaning we only needed to do any work on the first
2, as expected). This can potentially be made more elegant in the
future, but it is a simple and effective solution.
Also fix a bug exposed by the test case, getting the function for a call
instruction in the FullLTO handling, using an existing method to look
through aliases if needed.
std::make_optional<T> is a lot like std::make_unique<T> in that it
performs perfect forwarding of arguments for T's constructor. As a
result, we don't have to repeat type names twice.
With the current aliases metadata we lose information about which groups
of aliases survive symbol resolution. This causes various problems such
as #150075 where symbol resolution breaks the link between alias groups.
In this redesign of the aliases metadata, we stop representing the
individual aliases in !aliases. Instead, the individual aliases are
represented in !cfi.functions in the same way as functions, and the
alias groups (i.e. groups of symbols with the same address) are stored
in !aliases. At symbol resolution time, we filter out all non-prevailing
members of !aliases; the resulting set is used by LowerTypeTests to
recreate the aliases.
With this change it is now possible for a jump table entry to refer
to an alias in one of the ThinLTO object files (e.g. if a function is
non-prevailing but its alias is prevailing), so instead of deleting them,
rename them with the ".cfi" suffix.
Fixes#150070.
Fixes#150075.
Reviewers: teresajohnson, vitalybuka
Reviewed By: vitalybuka
Pull Request: https://github.com/llvm/llvm-project/pull/150690
There is no reason to use std::map for the call maps maintained for
function clones during function clone assignment, as we don't iterate
over them and don't need deterministic ordering, so use the more
efficient DenseMap.
When inferring attributes, we should not bail out early on unknown calls
(such as virtual calls), as we may still have call-site attributes that
can be used for inference.
Fixes https://github.com/llvm/llvm-project/issues/150817.
This patch fixes:
llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp:4771:9:
error: non-void lambda does not return a value in all control paths
[-Werror,-Wreturn-type]
This reverts commit 314e22bcab2b0f3d208708431a14215058f0718f, reapplying
PR150735 with a fix for the unstable iteration order exposed by the new
tests (PR151039).
We iterate over a std::map indexed by FuncInfo, which is a pair of a
pointer and a clone number. In the ThinLTO case, this isn't an issue as
the function pointer always points to the same FunctionSummary object.
However, for regular LTO, this is a pointer to a Function object, which
is different for each clone. This will lead to unstable iteration order.
This was exposed in a test case added for PR150735, which added a new
instance of iteration over this map.
Since these function clones are added and numbered sequentially, change
this to a vector indexed by clone number, which points to a structure
containing the clone FuncInfo and the call map (the old map's key and
value, respectively).
Fix a bug in function assignment where we were not assigning all
callsite clones to a function clone. This led to incorrect call updates
because multiple callsite clones could look like they were assigned to
the same function clone.
Add in a stat and debug message to help identify and debug cases where
this is still happening.
We already included the assigned clone of the callsite node's callee in
the dot graph after function assignment. This adds the same information
for the enclosing caller function to aid debugging.
FMV priority is the returned value of a polymorphic function. On RISC-V
and X86 targets a 32-bit value is enough. On AArch64 we currently need
64 bits and we will soon exceed that. APInt seems to be a suitable
replacement for uint64_t, presumably with minimal compile time overhead.
It allows bit manipulation, comparison and variable bit width.
This is another prune of dead code -- we never generate debug intrinsics
nowadays, therefore there's no need for these codepaths to run.
---------
Co-authored-by: Nikita Popov <github@npopov.com>
With the advent of intrinsic-less debug-info, we no longer need to
scatter calls to getPrevNonDebugInstruction around the codebase. Remove
most of them -- there are one or two that have the "SkipPseudoOp" flag
turned on, however they don't seem to be in positions where skipping
anything would be reasonable.
In rare cases the declaration of a function may not match its callsite
after function importing, when the declaration was imported from a
module where the function had void return type (presumably due to
incomplete types). Instead of using setCalledFunction() to change a call
to call its clone, which updates the call's function type as well, just
call setCalledOperand directly so the only thing changed is the function
target.
Note this can't happen for the other places where we call
setCalledFunction: FullLTO requires the cloned callee to be defined in
the same FullLTO merged module; ThinLTO memprof ICP calls an ICP
facility to first perform the promotion and that will be blocked if the
function type doesn't match the callsite (the new test explicitly tests
this latter case).
Allow users to set the minimum absolute count for inlining of indirect
calls promoted during cloning. This is primarily meant to enable
generation of synthetic vp metadata introduced in PR141164 when
profiling memprof-optimized binaries.
Patched and tested the `AAInvariantLoadPointer` attributor from #141800,
which identifies pointers whose loads are eligible to be marked as
`!invariant.load`.
The bug in the attributor was due to `AAMemoryBehavior` always
identifying pointers obtained from `alloca`s as having no writes. I'm
not entirely sure why `AAMemoryBehavior` behaves this way, but it seems
to be beceause it identifies the scope of an `alloca` to be limited to
only that instruction (and, certainly, no memory writes occur within the
`alloca` instructin). This patch just adds a check to disallow all loads
from `alloca` pointers from being marked `!invariant.load` (since any
well-defined program will have to write to stack pointers at some
point).
Avoid constructing invalid ConstantRange when Offset + Length in memset
overflows signed 64-bit integer space. This prevents assertion failures
when inferring the initializes attribute.
Fixes#140345
Partially reverts e37d736def5b95a2710f92881b5fc8b0494d8a05.
The transform has a number of correctness and code quality issues, and
will benefit from a from-scratch re-review more than incremental fixes.
The correctness issues are hinted at in
https://github.com/llvm/llvm-project/pull/144641, but I think it needs a
larger rework to stop working on ArrayTypes and the implementation could
use some other improvements (like callInstIsMemcpy should just be
`dyn_cast<MemCpyInst>`). I can comment in more detail on a resubmission
of the patch.
Expensive checks complains when we mark them as preserved. The bitcode
being embedded generally doesn't change anything important in the
module, but some things are modified under ThinLTO, like vtables under
WPD. This became a non-issue when we cloned the module, but after we had
to revert that in #145987, we need to handle this case properly.
The comment was outdated, as getDeclarationIfExists has been
introduced in the meantime.
We also only use this in one place where neither the Tys.empty()
case nor the FT is relevant, so just include the call to
getDeclarationIfExists().
When a customer class inherits from a libc++ class, and is built with
"-flto -fwhole-program-vtables -static-libstdc++ \
-Wl,-plugin-opt=-whole-program-visibility", the libc++ class's vtable is
available_externally, meanwhile the customer class vtable is private.
And
both of them are !vcall_visibility == Linkage Unit.
In this case, icall.branch.funnel might be generated.
But the icall.branch.funnel would cause crash in LowerTypeTests because
available_externally Global_Object's GlobalTypeMember would not be
saved and finally leads to a NULL GlobalTypeMember which causes a crash.
Even saving the available_externally GO's GlobalTypeMember so that it is
not NULL to avoid the crash in LowerTypeTests, it still will crash in
SelectionDAGBuilder or Verifier, because operands linkage type
consistency
check of icall.branch.funnel can not pass.
So any one of available externally vtable would stop to generate
icall.branch.funnel.
This patch fixes FullLTO mode and split-LTO-unit ThinLTO mode.
Reverts llvm/llvm-project#141800
The implementation critically misunderstands the `AAMemoryBehavior`
attributor, which it relies on heavily.
@shiltian, since I do not have commit permissions.
The motivation for this is that it causes the jump table entry's symbol
to have an st_size equal to the jump table entry size, instead of being
equal to the size of the entire jump table, which is incorrect and can
lead to unexpected behavior in binary analysis tools that rely on the
size field such as Bloaty.
Reviewers: fmayer
Reviewed By: fmayer
Pull Request: https://github.com/llvm/llvm-project/pull/144462
Currently, the `EliminateAvailableExternallyPass` only converts certain
available externally functions to local if `avail-extern-to-local` is
set or in
contextual profiling mode. For global variables, it only drops their
initializers.
This PR adds an option to allow the pass to convert global variables in
a
specified address space to local. The motivation for this change is to
correctly
support lowering of LDS variables (`__shared__` variables, in more
generic
terminology) when ThinLTO is enabled for AMDGPU.
A `__shared__` variable is lowered to a hidden global variable in a
particular
address space by the frontend, which is roughly same as a `static` local
variable. To properly lower it in the backend, the compiler needs to
check all
its uses. Enabling ThinLTO currently breaks this when a function
containing a
`__shared__` variable is imported from another module. Even though the
global
variable is imported along with its associated function, and the
function is
privatized by the `EliminateAvailableExternallyPass`, the global
variable itself
is not.
It's safe to privatize such global variables, because they're _local_ to
their
associated functions. If the function itself is privatized, its
associated
global variables should also be privatized accordingly.
Related to #143219.
Function specialization does not kick in if flang sets `noalias`
attributes on the function arguments of `digits_2`, because PRE
optimizes several `srem` instructions and other memory accesses
from the inner loops causing the latency bonus to be lower than
the current 40% threshold.
While looking at this, I did not really get why we compute the latency
bonus as a ratio of the latency of the "eliminated" instructions
and the code-size of the whole function. It did not make much sense
to me.
I tried computing the total latency as a sum of latencies
of the instructions that belong to non-dead code (including
the instructions that would be executed had they not been
"eliminated" due to the constant propagation). This total
latency should identify the total cost of executing the function
with the given argument being dynamically equal to the tried
constant value. Then the latency bonus would be computed
as the ratio between the latency of the "eliminated" instructions
and the total latency. Unfortunately, this did not given me a good
heuristics either. The bonus was close to 0% on some targets,
and as big as 3-5% on other targets. This does match very well
with the performance gain achieved by function specialization
for exchange2, so it seemd like another artificial heuristic
not better than the current one.
It seems that GCC uses a set of different heuristics for function
specialization, but I am not an expert here and I cannot say
if we can match them in LLVM.
With all that said, I decided to try to lower the threshold
to avoid the regression and be able to re-enable the generally
good change for `noalias` attribute.
With this patch, I was able to reduce the effect of `noalias`,
so that `-force-no-alias=true` is only ~10% slower than
`-force-no-alias=false` code on neoverse-v1 and neoverse-v2.
On neoverse-n1, `-force-no-alias=true` is >2x faster than
`-force-no-alias=false` regardless of this patch.
This threshold has been changed before also due to improved
alias information:
2fb51fba8c (diff-066363256b7b4164e66b28a3028b2cb9e405c9136241baa33db76ebd2edb87cd)
Please let me know what testing I should run to make sure this change
is safe. As I understand, it may affect the compilation time
performance,
and I will appreciate it if someone points out which benchmarks
need to be checked before merging this.
Seeing how we can't generate any debug intrinsics any more: delete a
variety of codepaths where they're handled. For the most part these are
plain deletions, in others I've tweaked comments to remain coherent, or
added a type to (what was) type-generic-lambdas.
This isn't all the DbgInfoIntrinsic call sites but it's most of the
simple scenarios.
Co-authored-by: Nikita Popov <github@npopov.com>
I ran into the same issue as
https://github.com/llvm/llvm-project/pull/139962 regarding the comdat
corresponding to a renamed key function but for thinlto. My last patch
had not considered the thinlto case, so this applies the same fix for
imported functions.