- Put the helper function in `ProfDataUtil.h/cpp`, which is already a
dependency of `Instructions.cpp`
- The helper function could be re-used to update profiles of
`InvokeInst` (in a follow-up pull request)
As part of the migration to ptradd
(https://discourse.llvm.org/t/rfc-replacing-getelementptr-with-ptradd/68699),
we need to change the representation of the `inrange` attribute, which
is used for vtable splitting.
Currently, inrange is specified as follows:
```
getelementptr inbounds ({ [4 x ptr], [4 x ptr] }, ptr @vt, i64 0, inrange i32 1, i64 2)
```
The `inrange` is placed on a GEP index, and all accesses must be "in
range" of that index. The new representation is as follows:
```
getelementptr inbounds inrange(-16, 16) ({ [4 x ptr], [4 x ptr] }, ptr @vt, i64 0, i32 1, i64 2)
```
This specifies which offsets are "in range" of the GEP result. The new
representation will continue working when canonicalizing to ptradd
representation:
```
getelementptr inbounds inrange(-16, 16) (i8, ptr @vt, i64 48)
```
The inrange offsets are relative to the return value of the GEP. An
alternative design could make them relative to the source pointer
instead. The result-relative format was chosen on the off-chance that we
want to extend support to non-constant GEPs in the future, in which case
this variant is more expressive.
This implementation "upgrades" the old inrange representation in bitcode
by simply dropping it. This is a very niche feature, and I don't think
trying to upgrade it is worthwhile. Let me know if you disagree.
When inlining across functions with different target features, we
perform roughly two checks:
1. The caller features must be a superset of the callee features.
2. Calls in the callee cannot use types where the target features would
change the call ABI (e.g. by changing whether something is passed in a
zmm or two ymm registers). The latter check is very crude right now.
The latter check currently also catches inline asm "calls". I believe
that inline asm should be excluded from this check, as it is independent
from the usual call ABI, and instead governed by the inline asm
constraint string.
Fixes https://github.com/llvm/llvm-project/issues/67054.
Originally, when `EnableImportMetadata` enabled, `SourceFileName` will
be recorded as `thinlto_src_module`. Now `SourceFileName` will be
recorded as `thinlto_src_file` and `ModuleIdentifier` will be recorded
as `thinlto_src_module`.
If the signing scheme is different that maybe the functions assumes
different behaviours and dangerous to inline them without analysing
them. This should be a rare case.
Since https://github.com/ARM-software/acle/pull/276 the ACLE
defines attributes to better describe the use of a given SME state.
Previously the attributes merely described the possibility of it being
'shared' or 'preserved', whereas the new attributes have more semantics
and also describe how the data flows through the program.
For ZT0 we already had to add new LLVM IR attributes:
* aarch64_new_zt0
* aarch64_in_zt0
* aarch64_out_zt0
* aarch64_inout_zt0
* aarch64_preserves_zt0
We have now done the same for ZA, such that we add:
* aarch64_new_za (previously `aarch64_pstate_za_new`)
* aarch64_in_za (more specific variation of `aarch64_pstate_za_shared`)
* aarch64_out_za (more specific variation of `aarch64_pstate_za_shared`)
* aarch64_inout_za (more specific variation of
`aarch64_pstate_za_shared`)
* aarch64_preserves_za (previously `aarch64_pstate_za_shared,
aarch64_pstate_za_preserved`)
This explicitly removes 'pstate' from the name, because with SME2 and
the new ACLE attributes there is a difference between "sharing ZA"
(sharing
the ZA matrix register with the caller) and "sharing PSTATE.ZA" (sharing
either the ZA or ZT0 register, both part of PSTATE.ZA with the caller).
Calling a `__arm_locally_streaming` function from a function that
is not a streaming-SVE function would lead to incorrect inlining.
The issue didn't surface because the tests were not testing what
they were supposed to test.
This patch deduces `noundef` attributes for return values.
IIUC, a function returns `noundef` values iff all of its return values
are guaranteed not to be `undef` or `poison`.
Definition of `noundef` from LangRef:
```
noundef
This attribute applies to parameters and return values. If the value representation contains any
undefined or poison bits, the behavior is undefined. Note that this does not refer to padding
introduced by the type’s storage representation.
```
Alive2: https://alive2.llvm.org/ce/z/g8Eis6
Compile-time impact: http://llvm-compile-time-tracker.com/compare.php?from=30dcc33c4ea3ab50397a7adbe85fe977d4a400bd&to=c5e8738d4bfbf1e97e3f455fded90b791f223d74&stat=instructions:u
|stage1-O3|stage1-ReleaseThinLTO|stage1-ReleaseLTO-g|stage1-O0-g|stage2-O3|stage2-O0-g|stage2-clang|
|--|--|--|--|--|--|--|
|+0.01%|+0.01%|-0.01%|+0.01%|+0.03%|-0.04%|+0.01%|
The motivation of this patch is to reduce the number of `freeze` insts
and enable more optimizations.
With intrinsics representing debug-info, we just clone all the
intrinsics when inlining a function and don't think about it any
further. With non-instruction debug-info however we need to be a bit
more careful and manually move the debug-info from one place to another.
For the most part, this means keeping a "cursor" during block cloning of
where we last copied debug-info from, and performing debug-info copying
whenever we successfully clone another instruction.
There are several utilities in LLVM for doing this, all of which now
need to manually call cloneDebugInfo. The testing story for this is not
well covered as we could rely on normal instruction-cloning mechanisms
to do all the hard stuff. Thus, I've added a few tests to explicitly
test dbg.value behaviours, ahead of them becoming not-instructions.
This is a stacked PR following on from #68415
This patch has two purposes:
(1) It tries to make inlining more likely when it can avoid a
streaming-mode change.
(2) It avoids inlining when inlining causes more streaming-mode changes.
An example of (1) is:
```
void streaming_compatible_bar(void);
void foo(void) __arm_streaming {
/* other code */
streaming_compatible_bar();
/* other code */
}
void f(void) {
foo(); // expensive streaming mode change
}
->
void f(void) {
/* other code */
streaming_compatible_bar();
/* other code */
}
```
where it wouldn't have inlined the function when foo would be a
non-streaming function.
An example of (2) is:
```
void streaming_bar(void) __arm_streaming;
void foo(void) __arm_streaming {
streaming_bar();
streaming_bar();
}
void f(void) {
foo(); // expensive streaming mode change
}
-> (do not inline into)
void f(void) {
streaming_bar(); // these are now two expensive streaming mode changes
streaming_bar();
}```
The use-case here is to support things like:
int foo(int x, int y) __arm_streaming { return std::max<int>(x, y); }
where the call to non-streaming `std::max<int>(x, y)` can be safely
inlined into the streaming function.
This is a first step and will need further work to allow more cases
(e.g. more finegrained analysis of the function calls to ensure they
don't result in any incompatible instructions for the requested mode).
This patch switched the default value of the mandatory-inlining-first
flag from true to false. This broke one of the MLGO tests that relied on
the default value of this flag. This patch explicitly sets the value to
fix the test and avoid future breakages.
This reverts commit 86bfeb906e3a95ae428f3e97d78d3d22a7c839f3.
This is a long time coming re-application that was originally reverted due to
regressions, unrelated to the actual inlining change. These regressions have since
been fixed due to another long-in-the-making change of a66051c6 landing.
Original commit message for reference:
---
We have several situations where it's beneficial for code size to ensure that every
call to always-inline functions are inlined before normal inlining decisions are
made. While the normal inliner runs in a "MandatoryOnly" mode to try to do this,
it only does it on a per-SCC basis, rather than the whole module. Ensuring that
all mandatory inlinings are done before any heuristic based decisions are made
just makes sense.
Despite being referred to the "legacy" AlwaysInliner pass, it's already necessary
for -O0 because the CGSCC inliner is too expensive in compile time to run at -O0.
This also fixes an exponential compile time blow up in
https://github.com/llvm/llvm-project/issues/59126
Differential Revision: https://reviews.llvm.org/D143624
---
Post #68263, the inline advisor printer tries to print SCC Nodes' names,
but if we perform a full pipeline (like O1), there'll be some DCE-ing
happening and the Node pointers kept in the advisor for this (printing)
purpose are dangling. Using the more eager printer post each scc inline
pass is sufficient.
For attributes assosiated with a value (like `dereferenceable(N)`)
instead of always using the attribute from the to-be inlined caller,
it should keep using the value at existing callsites that have the
attribute if the value is higher (provides more information).
- The motivation is to expose tunable knobs to control the aggressiveness of inlines for different backend (e.g., machines with different icache size, and workload with different icache/itlb PMU counters). Tuning inline aggressiveness shows a small (~+0.3%) but stable improvement on workload/hardware that is more frontend bound.
- Both multipliers could be overridden from command line.
Reviewed By: kazu
Differential Revision: https://reviews.llvm.org/D153154
Poison generating return attributes can't be propagated the same as
others, as they can change the behavior of other uses and/or create UB
where it otherwise wouldn't have occurred.
For example:
```
define nonnull ptr @foo() {
%p = call ptr @bar()
call void @use(ptr %p)
ret ptr %p
}
```
If we inline `@foo` and propagate `nonnull` to `@bar`, it could change
the behavior of `@use` as instead of taking `null`, `@use` will
now be passed `poison`.
This can be even worth in a case like:
```
define nonnull ptr @foo() {
%p = call noundef ptr @bar()
ret ptr %p
}
```
Where propagating `nonnull` to `@bar` will cause UB on `null` return
of `@bar` (`noundef` + `poison`) where it previously wouldn't
have occurred.
To fix this, we only propagate poison generating return attributes if
either 1) The only use of the callsite to propagate too is return and
the callsite to propagate too doesn't have `noundef`. Or 2) the
callsite to be be inlined has `noundef`.
The former case ensures no new UB or `poison` values will be
added. The latter is UB anyways if the value is `poison` so we can go
ahead without worrying about behavior changes.
When updating the return type of deoptimize call during inline, we need
to drop incompatible return attributes. This bug was exposed once we
relaxed the contraint of adding the attributes through D156844. With
that change deoptimize (are not willreturn) will start having return
attributes added to it.
Fixes https://github.com/llvm/llvm-project/issues/64804.
Differential Revision: https://reviews.llvm.org/D158286
When a convergencectrl token is passed to a convergent call, and the called
function in turn calls the entry intrinsic, the intrinsic is now now replaced
with the convergencectrl token.
The spec requires the following check:
A call from function F to function G can be inlined only if:
- at least one of F or G does not make any convergent calls, or,
- both F and G make the same kind of convergent calls: controlled or
uncontrolled.
But this change does not implement this complete check. A proper implemenation
require a whole new analysis that identifies convergence in every function. For
now, we skip that and just do a cursory check for the entry intrinsic. The
underlying assumption is that in a compiler flow that fully implements
convergence control tokens, there is no mixing of controlled and uncontrolled
convergent operations in the whole program.
This is a reboot of the original change D85606 by
Nicolai Haehnle <nicolai.haehnle@amd.com>.
Reviewed By: arsenm, nhaehnle
Differential Revision: https://reviews.llvm.org/D152431
The actual callsite we are adding to doesn't need to be
`willreturn`/`nounwind`, only ever instructions between the callsite
and the return.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D156844
We can do this by just querying attribute in the callsite itself. This
is both cleaner code and produces bette results.
Differential Revision: https://reviews.llvm.org/D156843
This allows use with non-0 address space stacks. llvm_ptr_ty should
never be used. This could use some more percolation up through mlir,
but this is enough to fix existing tests.
https://reviews.llvm.org/D156666
Before this patch, the compiler gave a bump to the inline-threshold
when the total size of the allocas passed as arguments to the
callee was below 256 bytes.
This heuristic ignores that some of these allocas could have be removed
by SROA if inlining was applied.
Ideally, this bonus would be attributed to the threshold once the
size of all the allocas that could not be handled by SROA is known:
at the end of the InlineCost analysis.
However, we may never reach this point if the inline-cost analysis exits
early when the inline cost goes over the threshold mid-analysis.
This patch proposes:
* Attribute the bonus in the inline-threshold when allocas are passed
as arguments (regardless of their total size).
* Assigns a cost to each alloca proportional to its size,
such that the cost of all the allocas cancels the bonus.
Potential problems:
* This patch assumes that removing alloca instructions with SROA is
always profitable. This may not be the case if the total size of the
allocas is still too big to be promoted to registers/LDS.
* Redundant calls to getTotalAllocaSize
* Awkwardly, the threshold attributed contributes to the single-bb and
vector bonus.
Reviewed By: scchan
Differential Revision: https://reviews.llvm.org/D149741
This reverts commit 464dcab8a6c823c9cb462bf4107797b8173de088.
Going to fix forward size regression instead due to more dependent patches needing to be reverted otherwise.
This reverts commit 0c03f48480f69b854f86d31235425b5cb71ac921.
Going to fix forward size regression instead due to more dependent patches needing to be reverted otherwise.
This was failing to inline the opencl libraries with daz enabled. As a
modifier to the base mode, denormal-fp-mode-f32 is weird and has no
meaning if it's missing.