The previous position of llvm.protected.field.ptr lowering for loads
and stores was problematic as it not only inhibited optimizations such
as DSE (as stores to a llvm.protected.field.ptr were not considered to
must-alias stores to the non-protected.field pointer) but also required
changes to other optimization passes to avoid transformations that would
reduce PFP coverage.
Address this by moving the load/store part of the lowering to
InstCombine, where it will run earlier than the PFP-breaking and
AA-relying transformations. The deactivation symbol, null comparison
and EmuPAC parts of the lowering remain in PreISelLowering.
Now that the transformation inhibitions are no longer needed, remove them
(i.e. partially revert #151649, and revert #182976).
This change resulted in a 2.4% reduction in Fleetbench .text size and
the following improvements to PFP performance overhead for BM_PROTO_Arena
on various microarchitectures:
before after
Apple M2 Ultra 3.5% 3.3%
Google Axion C4A 3.3% 2.9%
Google Axion N4A 2.7% 2.2%
Reviewers: fmayer, nikic, vitalybuka
Reviewed By: fmayer
Pull Request: https://github.com/llvm/llvm-project/pull/186548
BasicBlock::getTerminator() is frequently called on valid IR, yet the
function has to check that the last instruction is in fact a terminator,
even in release builds. This check can only be optimized away when the
instruction is dereferenced.
Therefore, introduce the functions hasTerminator() and
getTerminatorOrNull() as replacement and require (assert) that
getTerminator() always returns a valid terminator. As a side effect,
this forces explicit expression of intent at call sites when unfinished
basic blocks should be supported.
Branch weight metadata can overflow when folding large branch weights.
Updated branch weights to uint64_t, added check for overflow, and then
set branch weights using setFittedBranchWeights to ensure branch weight
metadata is not lost.
ptrauth intrinsic to safely construct relative ptr for swift coroutines.
A ptrauth intrinsic for swift co-routine support that allows creation of
signed pointer
from offset stored at address relative to the pointer.
Following C-like pseudo code (ignoring keys,discriminators) explains its
operation:
let rawptr = PACauth(inputptr);
return PACsign( rawptr + signextend64( *(int32*)(rawptr+addend) ))
What: Authenticate a signed pointer, load a 32bit value at offset
'addend' from pointer,
add this value to pointer, sign this new pointer.
builtin: __builtin_ptrauth_auth_load_relative_and_sign
intrinsic: ptrauth_auth_resign_load_relative
Previously, `gatherIncomingValuesToPhi` populated the `IncomingValues`
map with *all* non-undef incoming values from the PHI node. For PHI
nodes with a large number of incoming blocks, this caused the
`SmallDenseMap` to grow significantly, triggering expensive resizing and
rehashing operations, even when the caller
(`redirectValuesFromPredecessorsToPhi`) was only interested in a small
subset of predecessors.
This patch optimizes the logic to prevent this unnecessary map growth.
Instead of collecting all values, we now:
1. Initialize the `IncomingValues` map specifically for the blocks in
`BBPreds` (setting them to `nullptr` initially).
2. Iterate through the PHI node and update the map entries only if the
incoming block is already present in the map.
This ensures that the size of the map is bounded by the size of
`BBPreds`. Since `BBPreds` is typically small, this change keeps the map
within the `SmallDenseMap`'s inline storage in most cases, eliminating
heap allocations and resizing overhead for large PHI nodes.
The `selectIncomingValueForBlock` and `replaceUndefValuesInPhi` helpers
are also updated to handle map entries where the value is `nullptr`.
This adds the analogous metadata to the nofpclass attribute
to assert values are not a certain set of floating-point classes.
This allows the same information to be expressed if a function
argument is passed indirectly. This matches the bitmask encoding
of nofpclass.
I also think this should be allowed for stores to symmetrically handle
sret, but leave that for later.
Alternatively we could add a more expressive !fprange metadata,
but that would be much more complex. It's useful to match the attribute,
and more annotations can always be added.
Fixes#133560
Protected pointer field loads/stores should be paired with the intrinsic
to avoid unnecessary address escapes.
Reviewers: nikic
Reviewed By: nikic
Pull Request: https://github.com/llvm/llvm-project/pull/151649
Reverts llvm/llvm-project#154069. I pointed out a number of issues
post-merge, most importantly examples of miscompiles:
https://github.com/llvm/llvm-project/pull/154069#issuecomment-3603854626.
While the motivation of the change is clear, I think the implementation
approach is flawed. It seems like the goal is to allow elements like
`load <2xi16>` and `load i32` to be vectorized together despite the
current algorithm not grouping them into the same equivalence classes. I
personally think that if we want to attempt this it should be a more
wholistic approach, maybe even redefining the concept of an equivalence
class. This current solution seems like it would be really hard to do
bug-free, and even if the bugs were not present, it is only able to
merge chains that happen to be adjacent to each other after
`splitChainByContiguity`, which seems like it is leaving things up to
chance whether this optimization kicks in. But we can discuss more in
the re-land. Maybe the broader approach I'm proposing is too difficult,
and a narrow optimization is worthwhile. Regardless, this should be
reverted, it needs more iteration before it is correct.
This change enables the LoadStoreVectorizer to merge and vectorize
contiguous chains even when their scalar element types differ, as long
as the total bitwidth matches. To do so, we rebase offsets between
chains, normalize value types to a common integer type, and insert the
necessary casts around loads and stores. This uncovers more
vectorization opportunities and explains the expected codegen updates
across AMDGPU tests.
Key changes:
- Chain merging
- Build contiguous subchains and then merge adjacent ones when:
- They refer to the same underlying pointer object and address space.
- They are either all loads or all stores.
- A constant leader-to-leader delta exists.
- Rebasing one chain into the other's coordinate space does not overlap.
- All elements have equal total bit width.
- Rebase the second chain by the computed delta and append it to the
first.
- Type normalization and casting
- Normalize merged chains to a common integer type sized to the total
bits.
- For loads: create a new load of the normalized type, copy metadata,
and cast back to the original type for uses if needed.
- For stores: bitcast the value to the normalized type and store that.
- Insert zext/trunc for integer size changes; use bit-or-pointer casts
when sizes match.
- Cleanups
- Erase replaced instructions and DCE pointer operands when safe.
- New helpers: computeLeaderDelta, chainsOverlapAfterRebase,
rebaseChain, normalizeChainToType, and allElemsMatchTotalBits.
Impact:
- Increases vectorization opportunities across mixed-typed but
size-compatible access chains.
- Large set of expected AMDGPU codegen diffs due to more/changed
vectorization.
This PR resolves#97715.
Previously, sign-extending a 1-bit boolean operand in `#DBG_VALUE` would
convert `true` to -1 (i.e., 0xffffffffffffffff). However, DWARF treats
booleans as unsigned values, so this resulted in the attribute
`DW_AT_const_value(0xffffffffffffffff)` being emitted. As a result, the
debugger would display the value as `255` instead of `true`.
This change modifies the behavior to use zero-extension for 1-bit values
instead, ensuring that `true` is represented as 1. Consequently, the
DWARF attribute emitted is now `DW_AT_const_value(1)`, which allows the
debugger to correctly display the boolean as `true`.
GVN currently has two different implementation of equality propagation.
One of them is used for branch conditions (dominating an edge), which
performs replacements across multiple blocks. This is also used for
assumes to handle uses outside the current block.
However, uses inside the block are handled using a completely separate
implementation, which involves populating a replacement map and then
checking it for individual instructions during normal GVN. While this
approach generally makes sense, it is kind of pointless if we already do
a use walk to handle the cross-block case anyway.
This PR generalizes propagateEquality() to accept either a
BasicBlockEdge or an Instruction* and replace dominated users. This
removes the need for special handling of uses in the same block for
assumes, as they're covered by instruction dominance, and ensures that
both implementations do not go out of sync.
This introduces `!captures` metadata on stores, which looks like this:
```
store ptr %x, ptr %y, !captures !{!"address", !"read_provenance"}
```
The semantics are the same as replacing the store with a call like this:
```
call void @llvm.store(ptr captures(address, read_provenance) %x, ptr %y)
```
This metadata is intended for annotation by frontends -- it's not
something we can feasibly infer at this point, as it would require
analyzing uses of the pointer stored in memory.
The motivating use case for this is Rust's `println!()` machinery, which
involves storing a reference to the value inside a structure. This means
that printing code (including conditional debugging code), can inhibit
optimizations because the pointer escapes. With the new metadata we can
annotate this as a read-only capture, which has less impact on
optimizations.
fixes#152348
SimplifyCFG collapses raw buffer store from a if\else load into a
select.
This change prevents the TargetExtType dx.Rawbuffer from being replace
thus preserving the if\else blocks.
A further change was needed to eliminate the phi node before we process
Intrinsic::dx_resource_getpointer in DXILResourceAccess.cpp
This patch replaces SmallSet<T *, N> with SmallPtrSet<T *, N>. Note
that SmallSet.h "redirects" SmallSet to SmallPtrSet for pointer
element types:
template <typename PointeeType, unsigned N>
class SmallSet<PointeeType*, N> : public SmallPtrSet<PointeeType*, N>
{};
We only have 140 instances that rely on this "redirection", with the
vast majority of them under llvm/. Since relying on the redirection
doesn't improve readability, this patch replaces SmallSet with
SmallPtrSet for pointer element types.
Now that #149310 has restricted lifetime intrinsics to only work on
allocas, we can also drop the explicit size argument. Instead, the size
is implied by the alloca.
This removes the ability to only mark a prefix of an alloca alive/dead.
We never used that capability, so we should remove the need to handle
that possibility everywhere (though many key places, including stack
coloring, did not actually respect this).
This slightly relaxes the invariant established in #149310, by also
allowing the lifetime argument to be poison. This is to support the
typical pattern of RAUWing with poison when removing an instruction.
It's worth noting that this does not require any conservative
assumptions, lifetimes with poison arguments can simply be skipped.
Fixes https://github.com/llvm/llvm-project/issues/151119.
This was always undesirable, and after #149310 it is illegal and will
result in a verifier error.
Fix this by moving SimplifyCFG's check for this into
canReplaceOperandWithVariable(), so it's shared with GVNSink.
This is another prune of dead code -- we never generate debug intrinsics
nowadays, therefore there's no need for these codepaths to run.
---------
Co-authored-by: Nikita Popov <github@npopov.com>
At this stage I'm just opportunistically deleting any code using
debug-intrinsic types, largely adjacent to calls to findDbgUsers. I'll
get to deleting that in probably one or more two commits.
SROA and a few other facilities use generic-lambdas and some overloaded
functions to deal with both intrinsics and debug-records at the same time.
As part of stripping out intrinsic support, delete a swathe of this code
from things in the Utils directory.
This is a large diff, but is mostly about removing functions that were
duplicated during the migration to debug records. I've taken a few
opportunities to replace comments about "intrinsics" with "records",
and replace generic lambdas with plain lambdas (I believe this makes
it more readable).
All of this is chipping away at intrinsic-specific code until we get to
removing parts of findDbgUsers, which is the final boss -- we can't
remove that until almost everything else is gone.
`combineMetadata` helper currently drops `!nosanitize` metadata when
merging two instructions, even if both originally carried `!nosanitize`.
This is problematic because `!nosanitize` is a key mechanism used by
sanitizer (e.g., ASan) to suppress instrumentation. Removing it can lead
to unintended sanitizer behavior.
This patch adds `nosanitize` to the whitelist in combineMetadata,
preserving it only if both instructions carry `!nosanitize`; otherwise,
it is dropped. This patch also adds corresponding tests in a test file
and regenerates it.
---
### Details
**Example (see [Godbolt](https://godbolt.org/z/83P5eWczx) for
details**):
```llvm
%v1 = load i32, ptr %p, !nosanitize
%v2 = load i32, ptr %p, !nosanitize
```
When merged via `combineMetadata(%v1, %v2, ...)`, the resulting
instruction loses its `!nosanitize` metadata.
Tools such as UBSan and AFL rely on `nosanitize` to prevent unwanted
transformations or checks. However, the current implementation of
combineMetadata mistakenly drops !nosanitize. This may lead to
unintended behavior during optimization.
For example, under `-fsanitize=address,undefined -O2`, IR emitted by
UBSan may lose its `!nosanitize` metadata due to the incorrect metadata
merging in optimization. As a result, ASan could unexpectedly instrument
those instructions.
> Note: due to the current UBSan handlers having relatively
coarse-grained attributes, this specific case is difficult to reproduce
end-to-end from source code—UBSan currently inhibits such optimizations
(refer to #135135 for details).
Still, I believe it's necessary to fix this now, to support future
versions of UBSan that might allow such optimizations, and to support
third-party tools (such as AFL-based fuzzers) that rely on the presence
of !nosanitize.
Seeing how we can't generate any debug intrinsics any more: delete a
variety of codepaths where they're handled. For the most part these are
plain deletions, in others I've tweaked comments to remain coherent, or
added a type to (what was) type-generic-lambdas.
This isn't all the DbgInfoIntrinsic call sites but it's most of the
simple scenarios.
Co-authored-by: Nikita Popov <github@npopov.com>
These are identified by misc-include-cleaner. I've filtered out those
that break builds. Also, I'm staying away from llvm-config.h,
config.h, and Compiler.h, which likely cause platform- or
compiler-specific build failures.
Following the work in PR #107279, this patch applies the annotative
DebugLocs, which indicate that a particular instruction is intentionally
missing a location for a given reason, to existing sites in the compiler
where their conditions apply. This is NFC in ordinary LLVM builds (each
function `DebugLoc::getFoo()` is inlined as `DebugLoc()`), but marks the
instruction in coverage-tracking builds so that it will be ignored by
Debugify, allowing only real errors to be reported. From a developer
standpoint, it also communicates the intentionality and reason for a
missing DebugLoc.
Some notes for reviewers:
- The difference between `I->dropLocation()` and
`I->setDebugLoc(DebugLoc::getDropped())` is that the former _may_ decide
to keep some debug info alive, while the latter will always be empty; in
this patch, I always used the latter (even if the former could
technically be correct), because the former could result in some
(barely) different output, and I'd prefer to keep this patch purely NFC.
- I've generally documented the uses of `DebugLoc::getUnknown()`, with
the exception of the vectorizers - in summary, they are a huge cause of
dropped source locations, and I don't have the time or the domain
knowledge currently to solve that, so I've plastered it all over them as
a form of "fixme".
## Purpose
This patch is one in a series of code-mods that annotate LLVM’s public
interface for export. This patch annotates the `llvm/Transforms`
library. These annotations currently have no meaningful impact on the
LLVM build; however, they are a prerequisite to support an LLVM Windows
DLL (shared library) build.
## Background
This effort is tracked in #109483. Additional context is provided in
[this
discourse](https://discourse.llvm.org/t/psa-annotating-llvm-public-interface/85307),
and documentation for `LLVM_ABI` and related annotations is found in the
LLVM repo
[here](https://github.com/llvm/llvm-project/blob/main/llvm/docs/InterfaceExportAnnotations.rst).
The bulk of these changes were generated automatically using the
[Interface Definition Scanner (IDS)](https://github.com/compnerd/ids)
tool, followed formatting with `git clang-format`.
The following manual adjustments were also applied after running IDS on
Linux:
- Removed a redundant `operator<<` from Attributor.h. IDS only
auto-annotates the 1st declaration, and the 2nd declaration being
un-annotated resulted in an "inconsistent linkage" error on Windows when
building LLVM as a DLL.
- `#include` the `VirtualFileSystem.h` in PGOInstrumentation.h and
remove the local declaration of the `vfs::FileSystem` class. This is
required because exporting the `PGOInstrumentationUse` constructor
requires the class be fully defined because it is used by an argument.
- Add #include "llvm/Support/Compiler.h" to files where it was not
auto-added by IDS due to no pre-existing block of include statements.
- Add `LLVM_TEMPLATE_ABI` and `LLVM_EXPORT_TEMPLATE` to exported
instantiated templates.
## Validation
Local builds and tests to validate cross-platform compatibility. This
included llvm, clang, and lldb on the following configurations:
- Windows with MSVC
- Windows with Clang
- Linux with GCC
- Linux with Clang
- Darwin with Clang
Currently, GlobalObject has an "alignment" property... but it's
basically nonsense: alignment doesn't mean the same thing for variables
and functions, and it's completely meaningless for ifuncs.
This "removes" (actually marking protected) the methods from
GlobalObject, adds the relevant methods to Function and GlobalVariable,
and adjusts the code appropriately.
This should make future alignment-related cleanups easier.
Start removing debug intrinsics support -- starting with the flag that
controls production of their replacement, debug records. This patch
removes the command-line-flag and with it the ability to switch back to
intrinsics. The module / function / block level "IsNewDbgInfoFormat"
flags get hardcoded to true, I'll to incrementally remove things that
depend on those flags.
Having a finite Depth (or recursion limit) for computeKnownBits is very
limiting, but is currently a load-bearing necessity, as all KnownBits
are recomputed on each call and there is no caching. As a prerequisite
for an effort to remove the recursion limit altogether, either using a
clever caching technique, or writing a easily-invalidable KnownBits
analysis, make the Depth argument in APIs in ValueTracking uniformly the
last argument with a default value. This would aid in removing the
argument when the time comes, as many callers that currently pass 0
explicitly are now updated to omit the argument altogether.
Move parts of the logic used by Reassociate to OverflowTracking
(mergeFlags & applyFlags) and move the definition to Local.h.
For now it just moves the NUW/NSW handling, as this matches the uses in
LICM. I'll look into the FP math handling separately, as it looks like
there's a difference between Reassociate (takes all flags from I, while
LICM takes the intersection of the flags on both instructions).
PR: https://github.com/llvm/llvm-project/pull/140403
Fixes#138345. Before this patch, gvn-sink would try to sink inline
assembly statements. Other GVN passes avoid them (see
b4fac94181/llvm/lib/Transforms/Scalar/GVN.cpp (L2932)
Similarly, gvn-sink should skip these instructions, since they are not
safe to move. To do this, we update the early exit in
canReplaceOperandWithVariable, since it should have caught this case.
It's more efficient to also skip numbering in GVNSink if the instruction
is InlineAsm, but that should be infrequent.
The test added is reduced from a failure when compiling Fuchsia with
gvn-sink.