1066 Commits

Author SHA1 Message Date
Nikita Popov
e6d9542b77
[X86][Inline] Check correct function for target feature check (#152515)
The check for ABI differences for inlined calls involves the caller, the
callee and the nested callee. Before inlining, the ABI is determined by
the target features of the callee. After inlining it is determined by
the caller. The features of the nested callee should never actually
matter.
2025-08-19 09:44:00 +02:00
Justin Fargnoli
58de8f2c25
[Inliner] Add option (default off) to inline all calls regardless of the cost (#152365)
Add a default off option to the inline cost calculation to always inline
all viable calls regardless of the cost/benefit and cost/threshold
calculations.

For performance reasons, some users require that all calls be inlined.
Rather than forcing them to adjust the inlining threshold to an
arbitrarily high value, offer an option to inline all calls.
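
A minimal sketch of the decision this option changes, assuming a much simplified cost model. `InlineParams`, its fields, and `shouldInline` are hypothetical names used for illustration only, not LLVM's API:

```cpp
// Hypothetical, simplified model of an inlining decision; not LLVM's API.
struct InlineParams {
  int Threshold;   // cost limit used by the normal heuristic
  bool InlineAll;  // models the new default-off option
};

// A call must be viable (legal to inline) in every case. With InlineAll set,
// the cost/threshold comparison is skipped entirely.
bool shouldInline(bool Viable, int Cost, const InlineParams &P) {
  if (!Viable)
    return false;
  if (P.InlineAll)
    return true;
  return Cost < P.Threshold;
}
```

In this model the option never overrides viability; it only bypasses the cost comparison, which is the difference from raising the threshold to an arbitrary value.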
2025-08-18 17:48:49 +00:00
Tobias Stadler
d803a93f55
[Inliner] Report inlining decision before deleting Callee contents (#153616)
Call `recordInliningWithCalleeDeleted` before dropping the contents of
the Callee. Otherwise the handlers don't have access to e.g. the
DebugLoc, so the Callee DebugLoc was missing in inlining remarks for
functions with internal linkage.

The test is the same as `optimization-remarks-passed-yaml.ll` except
that the function `foo` has internal linkage instead of external linkage.
2025-08-15 12:00:34 +01:00
choikwa
1d30f71b21
[AMDGPU] Make ds/global load intrinsics IntrArgMemOnly (#152792)
This along with IntrReadMem means that the Intrinsic only reads memory
through the given argument ptr and its derivatives. This allows passes
like Inliner to attach alias.scope to the call instruction as it sees
that no other memory is accessed.

Discovered via SWDEV-543741

---------

Co-authored-by: Matt Arsenault <arsenm2@gmail.com>
2025-08-12 21:51:39 +09:00
Mircea Trofin
75b3cc9e6c
[mlgo] Remove tests involving the size estimator, after PR #139357 (#152813)
We'll remove the size estimator afterwards; this change is to get the
`ml-*` build bots green after the aforementioned PR.

We never used the size estimator again after the initial DQN-based
training. Should we want to again, we now have IR2Vec, which the old
estimator was approximating in functionality.
2025-08-08 16:42:40 -07:00
Nikita Popov
c23b4fbdbb
[IR] Remove size argument from lifetime intrinsics (#150248)
Now that #149310 has restricted lifetime intrinsics to only work on
allocas, we can also drop the explicit size argument. Instead, the size
is implied by the alloca.

This removes the ability to only mark a prefix of an alloca alive/dead.
We never used that capability, so we should remove the need to handle
that possibility everywhere (though many key places, including stack
coloring, did not actually respect this).
2025-08-08 11:09:34 +02:00
Nikita Popov
18e4f775c3
[SystemZ] Remove incorrect areInlineCompatible hook (#152494)
This reverts https://github.com/llvm/llvm-project/pull/132976.

The PR incorrectly claimed that this makes inlining more liberal,
referencing the string comparison in TargetTransformInfoImpl.h.

However, the implementation that actually applies is the one in
BasicTTIImpl.h, which performs a feature subset comparison. As such,
this regressed inlining, most concerningly of functions without +vector
into functions with +vector.

Revert the change to restore the previous behavior.
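
The difference between the two comparisons can be sketched as follows. This is an illustrative model only: `FeatureSet` is a hypothetical 8-bit stand-in for a target feature bitset, and both function names are made up for this example.

```cpp
#include <bitset>
#include <string>

// Hypothetical 8-bit feature set; real targets have many more features.
using FeatureSet = std::bitset<8>;

// Subset comparison: every feature the callee requires is also enabled in
// the caller, so the callee's code is legal in the caller's context.
bool inlineCompatibleSubset(FeatureSet Caller, FeatureSet Callee) {
  return (Caller & Callee) == Callee;
}

// Strict comparison: the feature strings must match exactly.
bool inlineCompatibleString(const std::string &Caller,
                            const std::string &Callee) {
  return Caller == Callee;
}
```

Under the subset rule, a callee with no features can be inlined into a caller with one feature enabled, whereas the strict string comparison rejects that pair.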
2025-08-08 10:06:19 +02:00
Sander de Smalen
76ce464073
[AArch64] Don't inline streaming fn into non-streaming caller (#150595)
Without this change, the following test would fail to compile
with `-march=armv8-a+sme`:

```cpp
  #include <arm_sve.h>  // provides svuint32_t

  void func1(const svuint32_t *in, svuint32_t *out) {
    [&]() __arm_streaming { *out = *in; }();
  }
```

But in general, it's probably better never to inline
streaming functions into non-streaming functions, because
they will have been marked as 'streaming' for a reason
by the user.
2025-08-01 09:05:19 +01:00
Sander de Smalen
d97566a5f5
[AArch64] NFC: Precommit test changes to sme-pstatesm-attrs.ll
The tests need feature attributes, in order for the inliner to
make a more informed decision.
2025-07-25 09:10:27 +00:00
Nikita Popov
18edd82716
[Inline] Regenerate test checks (NFC)
Do not omit check lines for any functions, to avoid spurious diffs
on regeneration. Also update to a newer UTC version which properly
generates the metadata checks.
2025-07-23 14:21:22 +02:00
Nikita Popov
92c55a315e
[IR] Only allow lifetime.start/end on allocas (#149310)
lifetime.start and lifetime.end are primarily intended for use on
allocas, to enable stack coloring and other liveness optimizations. This
is necessary because all (static) allocas are hoisted into the entry
block, so lifetime markers are the only way to convey the actual
lifetimes.

However, lifetime.start and lifetime.end are currently *allowed* to be
used on non-alloca pointers. We don't actually do this in practice, but
just the mere fact that this is possible breaks the core purpose of the
lifetime markers, which is stack coloring of allocas. Stack coloring can
only work correctly if all lifetime markers for an alloca are
analyzable.

* If a lifetime marker may operate on multiple allocas via a select/phi,
we don't know which lifetime actually starts/ends and handle it
incorrectly (https://github.com/llvm/llvm-project/issues/104776).
* Stack coloring operates on the assumption that all lifetime markers
are visible, and not, for example, hidden behind a function call or
escaped pointer. It's not possible to change this, as part of the
purpose of lifetime markers is that they work even in the presence of
escaped pointers, where simple use analysis is insufficient.

I don't think there is any way to have coherent semantics for lifetime
markers on allocas, while also permitting them on arbitrary pointer
values.

This PR restricts lifetimes to operate on allocas only. As a followup, I
will also drop the size argument, which is superfluous if we always
operate on an alloca. (This change also renders various code handling
lifetime markers on non-alloca dead. I plan to clean up that kind of
code after dropping the size argument as well.)

In practice, I've only found a few places that currently produce
lifetimes on non-allocas:

* CoroEarly replaces the promise alloca with the result of an intrinsic,
which will later be replaced back with an alloca. I think this is the
only place where there is some legitimate loss of functionality, but I
don't think this is particularly important (I don't think we'd expect
the promise in a coroutine to admit useful lifetime optimization.)
* SafeStack moves unsafe allocas onto a separate frame. We can safely
drop lifetimes here, as SafeStack performs its own stack coloring.
* Similarly for AddressSanitizer, which also moves allocas into separate
memory.
* LSR sometimes replaces the lifetime argument with a GEP chain of the
alloca (where the offsets ultimately cancel out). This is just
unnecessary. (Fixed separately in
https://github.com/llvm/llvm-project/pull/149492.)
* InferAddrSpaces sometimes makes lifetimes operate on an addrspacecast
of an alloca. I don't think this is necessary.
2025-07-21 15:04:50 +02:00
Teresa Johnson
e57315e6ca
[MemProf] Fix discarding of noncold contexts after inlining (#149599)
When we rebuild the call site tries after inlining of an allocation with
MD_memprof metadata, we don't want to reapply the discarding of small
non-cold contexts (under -memprof-callsite-cold-threshold=). Either we
have no context size info (without -memprof-report-hinted-sizes or
another option that causes us to keep that as metadata), or, even with
that information in the metadata, we have imperfect information at that
point because we have already discarded some contexts during matching.

The first case was even worse because we didn't guard our check by
whether the number of cold bytes was 0, leading to very aggressive
pruning during post-inline metadata rebuilding without the context size
information.
2025-07-18 21:11:37 -07:00
Prabhu Rajasekaran
921c6dbeca
[llvm] Introduce callee_type metadata
Introduce `callee_type` metadata which will be attached to the indirect
call instructions.

The `callee_type` metadata will be used to generate `.callgraph` section
described in this RFC:
https://lists.llvm.org/pipermail/llvm-dev/2021-July/151739.html

Reviewers: morehouse, petrhosek, nikic, ilovepi

Reviewed By: nikic, ilovepi

Pull Request: https://github.com/llvm/llvm-project/pull/87573
2025-07-18 14:40:54 -07:00
Rahman Lavaee
cd9236d788
Account for inline assembly instructions in inlining cost. (#146628)
Inliner currently treats every "call asm" IR instruction as a single
instruction regardless of how many instructions the inline assembly may
contain. This may underestimate the cost of inlining for a callee
containing long inline assembly. Besides, we may need to assign a higher
cost to instructions in inline assembly since they cannot be analyzed
and optimized by the compiler.

This PR introduces a new option `-inline-asm-instr-cost` -- set to zero by
default, which can control the cost of inline assembly instructions in
inliner's cost-benefit analysis.
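
As a rough illustration of the idea, the sketch below estimates the cost of an inline assembly string by counting statements and multiplying by a per-instruction cost. The counting rule (statements separated by newlines or `;`) and both function names are assumptions made for this example; it does not describe how the PR actually counts instructions, and it would also count labels and directives.

```cpp
#include <string>

// Hypothetical helper: count non-empty statements separated by '\n' or ';'.
int countAsmInstructions(const std::string &Asm) {
  int Count = 0;
  bool InStatement = false;
  for (char C : Asm) {
    if (C == '\n' || C == ';') {
      InStatement = false;  // a separator ends the current statement
    } else if (C != ' ' && C != '\t' && !InStatement) {
      InStatement = true;   // first non-blank character starts a statement
      ++Count;
    }
  }
  return Count;
}

// With InstrCost == 0 (the default described above), the assembly body
// contributes no per-instruction cost in this model.
int inlineAsmCost(const std::string &Asm, int InstrCost) {
  return countAsmInstructions(Asm) * InstrCost;
}
```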
2025-07-09 10:48:07 -07:00
Tobias Stadler
9186df9b08
[InlineCost] Simplify extractvalue across callsite (#145054)
Motivation: When using libc++, `std::bitset<64>::count()` doesn't
optimize to a single popcount instruction on AArch64, because we fail to
inline the library code completely. Inlining fails, because the internal
bit_iterator struct is passed as a [2 x i64] %arg value on AArch64. The
value is built using insertvalue instructions and only one of the array
entries is constant. If we know that this entry is constant, we can
prove that half the function becomes dead. However, InlineCost only
considers operands for simplification if they are Constants, which %arg
is not. Without this simplification the function is too expensive to
inline.

Therefore, we had to teach InlineCost to support non-Constant simplified values
(PR #145083). Now, we enable this for extractvalue, because we want to simplify
the extractvalue with the insertvalues from the caller function. This is enough to
get bitset::count fully optimized.
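
The simplification can be modelled in a few lines. This is a toy model, not InlineCost's implementation: `Aggregate`, `insertValue`, and `extractValue` are hypothetical stand-ins for a `[2 x i64]` value built by insertvalue in the caller and read by extractvalue in the callee.

```cpp
#include <array>
#include <optional>

// Each element is either a known constant or unknown (std::nullopt).
using Aggregate = std::array<std::optional<long long>, 2>;

// Models insertvalue: returns a copy with one element replaced.
Aggregate insertValue(Aggregate Agg, unsigned Idx,
                      std::optional<long long> V) {
  Agg[Idx] = V;
  return Agg;
}

// Models extractvalue seen across the call site: the result is a known
// constant only if the caller inserted a constant at that index.
std::optional<long long> extractValue(const Aggregate &Agg, unsigned Idx) {
  return Agg[Idx];
}
```

Even though the aggregate as a whole is not a constant, an extract of the constant element yields a constant, which is what lets the cost analysis treat the dependent code as dead.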

There are similar opportunities we can explore for BinOps in the future
(e.g. cmp eq %arg1, %arg2 when the caller passes the same value into
both arguments), but we need to be careful here, because InstSimplify
isn't completely safe to use with operands owned by different functions.
2025-06-24 14:15:27 +01:00
Nikita Popov
6157028fea
[BasicAA][ValueTracking] Increase depth for underlying object search (#143714)
This depth limits a linear search (rather than the usual potentially
exponential one) and is not particularly important for compile-time in
practice.

The change in #137297 is going to increase the length of GEP chains, so
I'd like to increase this limit a bit to reduce the chance of
regressions (https://github.com/dtcxzyw/llvm-opt-benchmark/pull/2419
showed a 13% increase in SearchLimitReached). There is no particular
significance to the new value of 10.
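
A sketch of why this limit bounds a linear walk, under the assumption that each value derives from at most one base pointer. `Value`, `getUnderlying`, and `reachesRoot` are hypothetical names for this illustration, not the ValueTracking API.

```cpp
#include <vector>

// Hypothetical model: each value optionally derives from one base pointer
// (like a GEP), so the search follows a single chain rather than a tree.
struct Value {
  const Value *Base;  // nullptr for an underlying object such as an alloca
};

// Walks at most MaxLookup steps and returns the value reached. That is the
// true underlying object only if the chain was short enough.
const Value *getUnderlying(const Value *V, unsigned MaxLookup) {
  for (unsigned I = 0; I < MaxLookup && V->Base; ++I)
    V = V->Base;
  return V;
}

// Builds a chain of ChainLength derived pointers on top of one root and
// reports whether the search gets all the way back to the root.
bool reachesRoot(unsigned ChainLength, unsigned MaxLookup) {
  std::vector<Value> Chain(ChainLength + 1, Value{nullptr});
  for (unsigned I = 1; I <= ChainLength; ++I)
    Chain[I].Base = &Chain[I - 1];
  return getUnderlying(&Chain[ChainLength], MaxLookup) == &Chain[0];
}
```

The work done is proportional to the limit, so longer GEP chains only need a modestly larger limit to avoid giving up early.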

Compile-time is neutral.
2025-06-12 09:19:50 +02:00
Hassnaa Hamdi
c81d84c30b
[InlineCost]: Optimize inlining of recursive function. (#139982)
- Consider inlining recursive function of depth 1 only when
    the caller is the function itself instead of inlining it
    for each callsite so that we avoid redundant work.
- Use CondContext instead of DomTree for better compilation time.
2025-06-04 11:41:55 +01:00
Teresa Johnson
49d48c32e0
[MemProf] Emit remarks when hinting allocations not needing cloning (#141859)
The context disambiguation code already emits remarks when hinting
allocations (by adding hotness attributes) during cloning. However,
we did not yet emit hints when applying the hotness attributes during
building of the metadata (during matching and again after inlining).
Add remarks when we apply the hint attributes for these
non-context-sensitive allocations.
2025-05-28 16:44:44 -07:00
Mircea Trofin
ead27e69d7
[mlgo][inliner] Tighter test for interactive mode (#141677)
Prompted by #141453 - have the test check the feature that was
previously clobbered in interactive mode, if the default decision was
requested.
2025-05-27 16:49:43 -07:00
Nikita Popov
904d0c293e
[Inline] Only consider provenance captures for scoped alias metadata (#138540)
When determining whether an escape source may alias with a noalias
argument, only take provenance captures into account. If only the
address of the argument was captured, an access through the escape
source is not legal.
2025-05-27 15:15:57 +02:00
Jellytabby
18fced40d5
[LLVM][MLGO] Fix: Index correctly into features to get default inlining decision (#141453)
Currently, `InlineCostFeatureIndex::NumberOfFeatures` results in an
index in the middle of the feature vector, therefore not correctly
setting the default inlining decision and overwriting another feature.
`FeatureIndex::NumberOfFeatures` is the last index of the feature
vector, where the default inlining decision gets appended to when
enabled.
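
The indexing problem can be illustrated with two shortened, hypothetical enums. The enumerator names `A` through `D` and the helper `makeFeatures` are invented for this sketch; only the two `NumberOfFeatures` names come from the description above.

```cpp
#include <cstddef>
#include <vector>

// Shortened stand-ins: the cost features are a prefix of the full set.
enum class InlineCostFeatureIndex : std::size_t { A, B, NumberOfFeatures };
enum class FeatureIndex : std::size_t { A, B, C, D, NumberOfFeatures };

// Builds a feature vector with one extra trailing slot and stores the
// default inlining decision there.
std::vector<long> makeFeatures(bool DefaultDecision) {
  const auto N = static_cast<std::size_t>(FeatureIndex::NumberOfFeatures);
  std::vector<long> Features(N + 1, 0);
  Features[N] = DefaultDecision ? 1 : 0;  // slot just past the regular features
  return Features;
}
```

Indexing with `InlineCostFeatureIndex::NumberOfFeatures` instead would select slot 2 in this model, which belongs to a regular feature rather than to the appended decision.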
2025-05-26 14:57:04 +09:00
Daniel Paoliello
97a58b04c6
[aarch64][x86][win] Add compiler support for MSVC's /funcoverride flag (Windows kernel loader replaceable functions) (#125320)
Adds support for MSVC's undocumented `/funcoverride` flag, which marks
functions as being replaceable by the Windows kernel loader. This is
used to allow functions to be upgraded depending on the capabilities of
the current processor (e.g., the kernel can be built with the naive
implementation of a function, but that function can be replaced at boot
with one that uses SIMD instructions if the processor supports them).

For each marked function we need to generate:
* An undefined symbol named `<name>_$fo$`.
* A defined symbol `<name>_$fo_default$` that points to the `.data`
section (anywhere in the data section, it is assumed to be zero sized).
* An `/ALTERNATENAME` linker directive that points from `<name>_$fo$` to
`<name>_$fo_default$`.

This is used by the MSVC linker to generate the appropriate metadata in
the Dynamic Value Relocation Table.
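
The naming scheme above can be sketched with plain string helpers. The function names are hypothetical, and the `/ALTERNATENAME:from=to` spelling is the usual MSVC linker directive form, assumed here rather than taken from the patch.

```cpp
#include <string>

// Hypothetical helpers that only build the names described above.
std::string overrideSymbol(const std::string &Name) {
  return Name + "_$fo$";
}

std::string defaultSymbol(const std::string &Name) {
  return Name + "_$fo_default$";
}

// Directive that makes the undefined override symbol fall back to the
// default symbol when nothing else defines it.
std::string alternateNameDirective(const std::string &Name) {
  return "/ALTERNATENAME:" + overrideSymbol(Name) + "=" + defaultSymbol(Name);
}
```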

Marked functions must never be inlined (otherwise those inline sites
can't be replaced).

Note that I've chosen to implement this in AsmPrinter as there was no
way to create a `GlobalVariable` for `<name>_$fo$` that would result in
a symbol being emitted (as nothing consumes it and it has no
initializer). I tried to have `llvm.used` and `llvm.compiler.used` point
to it, but this didn't help.

Within LLVM I referred to this feature as "loader replaceable" as
"function override" already has a different meaning to C++ developers...

I also took the opportunity to extract the feature symbol generation
code used by both AArch64 and X86 into a common function in AsmPrinter.
2025-05-09 14:56:38 -07:00
Hassnaa Hamdi
0159a26744
[InlineCost]: Add a new heuristic to branch folding for better inlining decisions.
Recursive functions are generally not inlined to avoid issues 
like infinite inlining or excessive code expansion. However,
this conservative approach misses opportunities for optimization in
cases where a recursive call is guaranteed to execute only once.

This patch detects a scenario where a guarding branch condition of a recursive
call will become false after the first iteration of the recursive function.
If such a condition is met, and the recursion depth is confirmed to be one,
the Inliner will now consider this recursive function for inlining.
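
A hypothetical source-level example of the pattern, written for illustration and not taken from the patch's test: the recursive call is guarded by a condition that the call itself makes false, so the recursion depth is one.

```cpp
// The guard `First` is true only on the outermost call, because the
// recursive call always passes false.
int scaleOnce(int X, bool First) {
  if (First)
    return scaleOnce(X + 1, /*First=*/false);  // executes at most once
  return X * 2;
}
```

Once the inner call is inlined into the outer body, the guard in the inlined copy is known to be false, so no further recursive call remains.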

A new test case (`test/Transforms/Inline/inline-recursive-fn.ll`)
has been added to verify this behaviour.
2025-05-06 11:25:37 +01:00
Nikita Popov
c66ce08041
[Inline] Add tests for captures-before check for scoped AA metadata (NFC)
2025-05-05 17:11:58 +02:00
Alexander Richardson
ee13638362
[AMDGPU] Remove explicit datalayout from tests where not needed
Since e39f6c1844fab59c638d8059a6cf139adb42279a opt will infer the
correct datalayout when given a triple. Avoid explicitly specifying it
in tests that depend on the AMDGPU target being present to avoid the
string becoming out of sync with the TargetInfo value.
Only tests with REQUIRES: amdgpu-registered-target or a local lit.cfg
were updated to ensure that tests for non-target-specific passes that
happen to use the AMDGPU layout still pass when building with a limited
set of targets.

Reviewed By: shiltian, arsenm

Pull Request: https://github.com/llvm/llvm-project/pull/137921
2025-04-30 10:58:17 -07:00
sallto
419a2cb218
[Inliner] Preserve alignment of byval arguments (#137455)
Previously the inliner always produced a memcpy with alignment 1 for src
and destination, leading to potentially suboptimal Codegen.

Since the Src ptr alignment is only available through the CallBase it
has to be passed to HandleByValArgumentInit. Dst Alignment is already
known so it doesn't have to be passed along.

If there is no specified Src Alignment, my changes cause the ptr to have
no align data attached instead of align 1 as before (see
inline-tail.ll). I believe this is fine, but since I'm a first-time
contributor, please confirm.

My changes are already covered by 4 existing regression tests, so I did
not add any additional ones.

The example from #45778 now results in:
```llvm
; opt -S -passes=inline,instcombine,sroa,instcombine test.ll

define dso_local i32 @test(ptr %t) {
entry:
  %.sroa.0.0.copyload = load ptr, ptr %t, align 8       ; this used to be align 1 in the original issue
  %arrayidx.i = getelementptr inbounds nuw i8, ptr %.sroa.0.0.copyload, i64 24
  %0 = load i32, ptr %arrayidx.i, align 4
  ret i32 %0
}
```

Fixes #45778.
2025-04-26 21:38:58 +02:00
Matt Arsenault
f819f46284
Reapply "Inline: Propagate callsite nofpclass attribute" (#135018)
This reverts commit 3f38cd07d820248fd2043efb1341fabaac2d84a6.

Fix case where inner callsite has nofpclass but callsite does not.
2025-04-10 07:15:58 +02:00
Stephen Tozer
5039bf4e26
[DebugInfo][Inline] Propagate source locs when simplifying cond branches (#134827)
During inlining, we may opportunistically simplify conditional branches
(incl. switches) to unconditional branches if, after inlining, their
destination is fixed. While we do this, we should propagate any
DILocation attached to the original branch to the simplified branch,
which this patch enables.

Found using https://github.com/llvm/llvm-project/pull/107279.
2025-04-09 16:52:05 +01:00
Stephen Tozer
9344b2196c
[DebugInfo][Inlining] Propagate inlined resume source loc to new br (#134826)
As part of inlining an invoke instruction, we may replace an inlined
resume instruction with a simple branch to the landing pad block. When
this happens, we should also propagate the resume's DILocation to this
branch, which this patch enables.

Found using https://github.com/llvm/llvm-project/pull/107279.
2025-04-09 16:42:06 +01:00
Matt Arsenault
3f38cd07d8
Revert "Inline: Propagate callsite nofpclass attribute"
This reverts commit b0cb672b9968eeee6eb022e98476957dbdf8e6e2.

Breaks bot
2025-04-08 23:15:00 +07:00
Matt Arsenault
b0cb672b99
Inline: Propagate callsite nofpclass attribute (#134800)

Fixes #134070
2025-04-08 22:53:17 +07:00
Andres Chavarria
9b63a92ca7
Implement areInlineCompatible for SystemZ using feature bitset (#132976)
## What?
Implement `areInlineCompatible` for the SystemZ target using
FeatureBitset comparison.

## Why?
The default implementation in `TargetTransformInfoImpl.h` makes a string
comparison and only inlines when the target-cpu and the target-features
for caller and callee are the same. We are missing out on optimizations
when the callee has a subset of features of the caller.

## How?
Get the FeatureBitset of the caller and callee and check whether the callee's
features are a subset of, or equal to, the caller's. It's a similar
implementation to ARM, PowerPC...

## Testing?
Test cases check for when the callee is a subset of the caller, when
it's not a subset, and when both are equal.
2025-04-08 00:50:30 +02:00
Matt Arsenault
5b8d8bb90a
Inliner: Fix missing test coverage for incompatible gc rejection (#133708)
2025-04-01 06:24:59 +07:00
Vasileios Porpodas
973ea045aa
Revert "[Analysis][EphemeralValuesAnalysis][NFCI] Remove EphemeralValuesCache class (#132454)"
This reverts commit 4adefcfb856aa304b7b0a9de1eec1814f3820e83.
2025-03-22 10:13:39 -07:00
vporpo
4adefcfb85
[Analysis][EphemeralValuesAnalysis][NFCI] Remove EphemeralValuesCache class (#132454)
This is a follow-up to https://github.com/llvm/llvm-project/pull/130210.
The EphemeralValuesAnalysis pass used to return an EphemeralValuesCache
object, which held the ephemeral values, collected them lazily, and
supported invalidation through the `clear()` function.

This patch removes the EphemeralValuesCache class completely and instead
returns the SmallVector containing the ephemeral values.
2025-03-21 18:18:03 -07:00
vporpo
08dda4dcbf
[Analysis][EphemeralValuesCache][InlineCost] Ephemeral values caching for the CallAnalyzer (#130210)
This patch does two things:

1. It implements an ephemeral values cache analysis pass that collects the ephemeral values of a function and caches them for fast lookups. The collection of the ephemeral values is done lazily when the user calls `EphemeralValuesCache::ephValues()`.

2. It adds caching of ephemeral values using the `EphemeralValuesCache` to speed up `CallAnalyzer::analyze()`. Without caching, this can take a long time to run in cases where the function contains a large number of `@llvm.assume()` calls and a large number of callsites. The time is spent in `collectEphemeralValues()`.
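
The lazy-collection pattern described in the two points above can be sketched generically. `LazyValueCache` is a hypothetical class using `int` in place of IR values; it is not the `EphemeralValuesCache` interface.

```cpp
#include <functional>
#include <set>
#include <utility>

// Hypothetical lazy cache: the collector runs on the first query only, and
// again on the first query after clear() invalidates the cached result.
class LazyValueCache {
  std::function<std::set<int>()> Collect;
  std::set<int> Values;
  bool Collected = false;

public:
  explicit LazyValueCache(std::function<std::set<int>()> Fn)
      : Collect(std::move(Fn)) {}

  // Returns the cached values, collecting them first if necessary.
  const std::set<int> &values() {
    if (!Collected) {
      Values = Collect();
      Collected = true;
    }
    return Values;
  }

  // Drops the cached values so the next query recollects them.
  void clear() {
    Values.clear();
    Collected = false;
  }
};
```

Repeated queries for the same function then pay the collection cost once rather than once per analyzed callsite.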
2025-03-19 18:18:45 -07:00
Jeremy Morse
792a6f8119
[RemoveDIs] Remove "try-debuginfo-iterators..." test flags (#130298)
These date back to when the non-intrinsic format of variable locations
was still being tested and was behind a compile-time flag, so not all
builds / bots would correctly run them. The solution at the time, to get
at least some test coverage, was to have tests opt-in to non-intrinsic
debug-info if it was built into LLVM.

Nowadays, non-intrinsic format is the default and has been on for more
than a year, so there's no need for this flag to exist.

(I've downgraded the flag from "try" to explicitly requesting
non-intrinsic format in some places, so that we can deal with tests that
are explicitly about non-intrinsic format in their own commit).
2025-03-14 15:50:49 +00:00
Henry Jiang
6d0cfbc9c0
[PPC] Implement areInlineCompatible (#126562)
After the default implementation swap in
https://github.com/llvm/llvm-project/pull/117493,
`areInlineCompatible` checks whether the callee features are a subset of
the caller features. This is not a safe assumption in general on PPC. We
fall back to checking for strict feature set equality for now, and will
see what improvements we can make.
2025-02-24 17:53:43 -05:00
Csanád Hajdú
a190f15d2b
[AArch64] Add support for SHF_AARCH64_PURECODE ELF section flag (1/3) (#125687)
Add support for the new SHF_AARCH64_PURECODE ELF section flag:
https://github.com/ARM-software/abi-aa/pull/304

The general implementation follows the existing one for ARM targets.
Generating object files with the `SHF_AARCH64_PURECODE` flag set is
enabled by the `+execute-only` target feature.

Related PRs:
* Clang: https://github.com/llvm/llvm-project/pull/125688
* LLD: https://github.com/llvm/llvm-project/pull/125689
2025-02-14 08:56:07 +00:00
Nikita Popov
d8b2e432d6
[IR] Remove mul constant expression (#127046)
Remove support for the mul constant expression, which has previously
already been marked as undesirable. This removes the APIs to create mul
expressions and updates tests to stop using mul expressions.

Part of:
https://discourse.llvm.org/t/rfc-remove-most-constant-expressions/63179
2025-02-14 09:28:57 +01:00
Nikita Popov
29441e4f5f
[IR] Convert from nocapture to captures(none) (#123181)
This PR removes the old `nocapture` attribute, replacing it with the new
`captures` attribute introduced in #116990. This change is
intended to be essentially NFC, replacing existing uses of `nocapture`
with `captures(none)` without adding any new analysis capabilities.
Making use of non-`none` values is left for a followup.

Some notes:
* `nocapture` will be upgraded to `captures(none)` by the bitcode
   reader.
* `nocapture` will also be upgraded by the textual IR reader. This is to
   make it easier to use old IR files and somewhat reduce the test churn in
   this PR.
* Helper APIs like `doesNotCapture()` will check for `captures(none)`.
* MLIR import will convert `captures(none)` into an `llvm.nocapture`
   attribute. The representation in the LLVM IR dialect should be updated
   separately.
2025-01-29 16:56:47 +01:00
Kerry McLaughlin
d8d4c18761
[AArch64][SME] Disable inlining of callees with new ZT0 state (#121338)
Inlining must be disabled for new-ZT0 callees as the callee is required
to save ZT0 and toggle PSTATE.ZA on entry.
2025-01-06 12:02:28 +00:00
Sander de Smalen
2ce168baed
[AArch64] SME implementation for agnostic-ZA functions (#120150)
This implements the lowering of calls from agnostic-ZA functions to
non-agnostic-ZA functions, using the ABI routines
`__arm_sme_state_size`, `__arm_sme_save` and `__arm_sme_restore`.

This implements the proposal described in the following PRs:
* https://github.com/ARM-software/acle/pull/336
* https://github.com/ARM-software/abi-aa/pull/264
2024-12-23 19:10:21 +00:00
Sander de Smalen
b85ddba421
[AArch64] NFC: Fix inlining tests for SME ZA state.
By adding inline-asm to test, we now actually test the code-path they're
meant to test.
2024-12-23 15:36:32 +00:00
Nikita Popov
2be41e7aee
[AlwaysInline] Fix analysis invalidation (#119566)
This is a followup to #117750. Currently, AlwaysInline only invalidates
analyses at the end, by returning that no analyses are preserved.
However, this means that analyses fetched during inlining may be
outdated. The aforementioned PR exposed this issue.

Instead, bring the logic closer to what the normal inliner does, by
directly invalidating the caller in FAM. This should make sure that we
don't receive any outdated analyses even if they are fetched during
inlining.

Also drop the BFI updating entirely -- there's no point in doing it if
we're going to invalidate everything anyway.
2024-12-12 12:59:59 +01:00
Owen Anderson
ab15976173
CallPromotionUtils: Correctly use IndexSize when determining the bit width of pointer offsets. (#119483)
This reapplies #119138 with a defensive fix for the assertion failure
when building libcxx.
Unfortunately the failure does not reproduce on my machine, so I am not
able to extract a test case.

The key insight for the fix comes from Jessica Clarke, who observes that
`VTablePtr` may, in fact, not be a pointer on return from
`FindAvailableLoadedValue`.

Co-authored-by: Alexander Richardson <alexander.richardson@cl.cam.ac.uk>
2024-12-11 16:49:48 +13:00
Owen Anderson
9b6bb83860
Revert "CallPromotionUtils: Correctly use IndexSize when determining the bit width of pointer offsets. (#119138)"
Reverting due to ASAN bootstrap failures.

This reverts commit 4027e2f248044d944aaf3d9bc9c8eb6928506d44.
2024-12-11 13:20:17 +13:00
Owen Anderson
4027e2f248
CallPromotionUtils: Correctly use IndexSize when determining the bit width of pointer offsets. (#119138)
Co-authored-by: Alexander Richardson <alexander.richardson@cl.cam.ac.uk>
2024-12-11 12:43:40 +13:00
Marina Taylor
8fb748b4a7
[Inliner] Don't count a call penalty for foldable __memcpy_chk and similar (#117876)
When the size is an appropriate constant, __memcpy_chk will turn into a
memcpy that gets folded away by InstCombine. Therefore this patch avoids
counting these as calls for purposes of inlining costs.
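
A simplified model of the rule, assuming that "appropriate constant" means a known copy length that fits within a known destination object size. Both function names are hypothetical, and the real check in the patch may differ.

```cpp
#include <cstdint>
#include <optional>

// Assumed folding condition: both sizes are known constants and the copy
// fits in the destination object.
bool foldsToPlainMemcpy(std::optional<std::uint64_t> Len,
                        std::optional<std::uint64_t> ObjSize) {
  return Len && ObjSize && *Len <= *ObjSize;
}

// A foldable checked call is not charged the call penalty in this model.
int callPenalty(bool Foldable, int Penalty) {
  return Foldable ? 0 : Penalty;
}
```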

This is only really relevant on platforms whose headers redirect memcpy
to __memcpy_chk (such as Darwin). On platforms that use intrinsics,
memcpy and similar functions are already exempt from call penalties.
2024-11-29 18:28:39 +00:00
hev
2523439021
[LoongArch] Add a test case for inline compatibility checks (#117144)
2024-11-25 12:34:46 +08:00