(Reland of #190092 with verifier change to look through GlobalAliases)
So that it's preserved across all inline invocations rather than just
one inliner pass run.
This prevents cases where devirtualization in the simplification
pipeline uncovers inlining opportunities that should be discarded due to
inline history, but we dropped the inline history between inliner pass
runs, causing code size to blow up, sometimes exponentially.
For compile time reasons, we want to limit this to only call sites that
have the potential to inline through SCCs, potentially with the help of
devirtualization. This means that the callee is in a non-trivial
(Ref)SCC, or the call site was previously an indirect call, which can
potentially be devirtualized to call any function.
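
To illustrate the idea of inline history travelling with a call site, here is a minimal standalone sketch. It is not the LLVM implementation: `CallSite`, `isBlockedByHistory`, and `inlineAndExpose` are hypothetical names, and functions are identified by plain strings. The assumption is that every call site exposed by inlining inherits the history of the call site it came from, so the check still works on a later inliner run.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model of a call site that carries its own inline history,
// i.e. the functions whose bodies were inlined to produce this call.
struct CallSite {
  std::string Callee;
  std::vector<std::string> InlineHistory;
};

// A call site must not be inlined if its callee already appears in the
// history: inlining it again would walk around the same cycle once more.
inline bool isBlockedByHistory(const CallSite &CS) {
  return std::find(CS.InlineHistory.begin(), CS.InlineHistory.end(),
                   CS.Callee) != CS.InlineHistory.end();
}

// Inlining `Inlined` exposes a new call to `NewCallee` (copied from the
// callee's body). The new call site inherits the old history plus the
// function that was just inlined.
inline CallSite inlineAndExpose(const CallSite &Inlined,
                                std::string NewCallee) {
  assert(!isBlockedByHistory(Inlined) && "call site should not be inlined");
  CallSite Exposed;
  Exposed.Callee = std::move(NewCallee);
  Exposed.InlineHistory = Inlined.InlineHistory;
  Exposed.InlineHistory.push_back(Inlined.Callee);
  return Exposed;
}
```

Because the history lives on the call site rather than in per-run state, a call that becomes direct only after devirtualization in a later pipeline stage is still rejected.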
The CGSCCUpdater::InlinedInternalEdges logic still seems to be relevant
even with this change, as monster_scc.ll blows up if I remove that code.
http://llvm-compile-time-tracker.com/compare.php?from=e830d88e8ae5f44a97cc76136a0a4e83aa9157c0&to=ed535e732fc41b79ab8efda2417886cbd0812f7f&stat=instructions:u
Fixes #186926.
This fixes a rematerializer issue wherein re-creating the interval of a
non-rematerializable super-register defined over multiple MIs, some of
which define entirely dead sub-registers, could cause a crash when
changing the order of sub-definitions (for example during scheduling)
because the re-created interval could end up with multiple connected
components, which is illegal. The solution is to split separate
components of the interval in such cases. The added unit test crashes
without that added behavior.
Block pointers are only stored while constructing the analysis, so the
value handle to catch erased blocks is no longer needed when using
stable block numbers.
This is a fix for
[187902](https://github.com/llvm/llvm-project/issues/187902), where
`LoopInfo` is not in a valid state at the beginning of `ScalarEvolution::createSCEVIter`.
The reason for the bug is that `mergeLatch()` is called at a place
where control flow and dominator trees have been updated but `LoopInfo`
has not completed the update yet. `mergeLatch()` calls into
`ScalarEvolution` that uses `LoopInfo`, where out-of-date `LoopInfo` would
result in a crash or unpredictable results.
This patch moves `mergeLatch()` to the place where `LoopInfo` has
completed its update and hence is in a valid state.
Fixes found by fuzzer:
OnDiskTrieRawHashMap:
- Bounds-check data slot offsets in TrieVerifier::visitSlot() before
calling getRecord(), preventing asData() assertion on out-of-bounds
trie entries.
- Validate subtrie headers (NumBits, bounds) before constructing
SubtrieHandle, preventing SEGV in getSlots() from corrupt NumBits.
- Validate arena bump pointer alignment, catching misaligned BumpPtr
that would crash store() with an alignment assertion.
- Fix comma operator bug in getOrCreateRoot() where the
compare_exchange_strong result was discarded, causing asSubtrie()
assertion when RootTrieOffset was corrupted to zero.
OnDiskGraphDB:
- Reject invalid (zero) ref offsets in validate callback, preventing
asData() assertion when corrupt data pool refs are resolved via
recoverFromFileOffset().
- Validate DataRecordHandle layout flags before calling getTotalSize(),
preventing llvm_unreachable on corrupt NumRefsFlags/DataSizeFlags.
- Validate data pool bump pointer alignment, catching misaligned
BumpPtr that would crash store() in DataRecordHandle::constructImpl().
- Check data record refs offset alignment before calling getRefs(),
preventing PointerUnion assertion from misaligned refs pointer.
MappedFileRegionArena:
- Convert assertions in initializeHeader() to errors so corrupted
arena headers return an error on CAS open instead of crashing.
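
The comma-operator item above is an instance of a general C++ pitfall. The following is a minimal standalone sketch of that bug shape, not the actual `getOrCreateRoot()` code: `publishBuggy` and `publishFixed` are hypothetical names, and a zero slot value is assumed to mean "not yet set".

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Buggy shape: the comma operator evaluates the CAS, throws its result
// away, and yields `true`, so a lost race is reported as success.
inline bool publishBuggy(std::atomic<std::uint64_t> &Slot,
                         std::uint64_t New) {
  assert(New != 0 && "0 is reserved for 'unset' in this sketch");
  std::uint64_t Expected = 0;
  return (Slot.compare_exchange_strong(Expected, New), true);
}

// Fixed shape: the CAS result decides the outcome, and on failure the
// caller sees the value that was already installed.
inline bool publishFixed(std::atomic<std::uint64_t> &Slot, std::uint64_t New,
                         std::uint64_t &Observed) {
  assert(New != 0 && "0 is reserved for 'unset' in this sketch");
  std::uint64_t Expected = 0;
  bool Won = Slot.compare_exchange_strong(Expected, New);
  Observed = Won ? New : Expected;
  return Won;
}
```

With the buggy form, a caller that assumes its own value was installed continues with a stale or invalid offset, which is the kind of state that later trips an assertion.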
Assisted-By: Claude
The rematerializer implements support for rolling back
rematerializations by modifying MIs that should normally be deleted in
an attempt to make them "transparent" to other analyses. This involves:
1. setting their opcode to DBG_VALUE and
2. setting their read register operands to the sentinel register.
This approach has several drawbacks.
1. It forces the rematerializer to support tracking these "dead MIs"
(even if support is optional, these data-structures have to exist).
2. It is not actually clear whether this mechanism will interact well
with all other analyses. This is an issue since the intent of the
rematerializer is to be usable in as many contexts as possible.
3. In practice, it has shown itself to be relatively error-prone.
This commit removes rollback support from the rematerializer and moves
those capabilities to a rematerializer listener that can be instantiated
on-demand and implements the same functionality on top of standard
rematerializer operations. The rematerializer now actually deletes MIs
that are no longer useful after rematerializations, and has support for
re-creating them on-demand without requiring additional tracking on its
part.
The visited set can grow rather large and we can use an unused field in
SDNode to store the same information without the use of a hash set.
This improves compile times: stage2-O3 -0.14%.
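
As a rough illustration of marking nodes in place instead of keeping a hash set, here is a standalone sketch. It does not use `SDNode`: the `Node` type, its `Visited` field, and `countReachable` are hypothetical, and resetting the marks after the walk is an assumption of this sketch rather than something the patch description states.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical node type: operands are indices into the graph vector, and
// `Visited` plays the role of an otherwise unused per-node field.
struct Node {
  std::vector<std::size_t> Operands;
  bool Visited = false;
};

// Counts the nodes reachable from Root. The visited state is stored in the
// nodes themselves, so no hash set is allocated or probed.
inline std::size_t countReachable(std::vector<Node> &Graph,
                                  std::size_t Root) {
  assert(Root < Graph.size() && "root index out of range");
  std::vector<std::size_t> Worklist{Root};
  std::vector<std::size_t> Touched;
  Graph[Root].Visited = true;
  std::size_t Count = 0;
  while (!Worklist.empty()) {
    std::size_t N = Worklist.back();
    Worklist.pop_back();
    ++Count;
    Touched.push_back(N);
    for (std::size_t Op : Graph[N].Operands) {
      if (!Graph[Op].Visited) {
        Graph[Op].Visited = true;
        Worklist.push_back(Op);
      }
    }
  }
  // Clear the marks so the field is clean for the next traversal.
  for (std::size_t N : Touched)
    Graph[N].Visited = false;
  return Count;
}
```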
Previously, `computeProcResourceMasks()` would print resource masks in
debug mode from multiple call sites, creating noise in the debug output.
This patch aims to fix this and also print more info about the
resources.
It splits the debug prints for resources into 2 types:
1. No simulation - mask only
2. Simulation - mask + other info
For 2, it shares the printing in a single place, the `ResourceManager`
constructor, which should cover all the other simulation cases
indirectly:
1. `llvm/lib/MCA/HardwareUnits/ResourceManager` - covered
2. `llvm/lib/MCA/InstrBuilder.c` - should be covered indirectly - only
used by `llvm-mca` before simulation that constructs a `ResourceManager`
3. `llvm/tools/llvm-mca/Views/SummaryView.cpp` - after simulation that
constructs a `ResourceManager`
4. `llvm/tools/llvm-mca/Views/BottleneckAnalysis.cpp` - after simulation
that constructs a `ResourceManager`
It also adds `BufferSize` to the output, which should be useful to debug
scheduling model + MCA integration.
For 1, it inlines mask-only printing into 2 other callers:
1. `llvm/include/llvm/MCA/Stages/InstructionTables.h`
2. `llvm/tools/llvm-exegesis/lib/SchedClassResolution.cpp`
as they only use the masks there. I think this is a reasonable
duplication across distinguishably different users/tools.
Now every pair of callers, even across groups (1 and 2), effectively
print in a mutually exclusive way.
The patch adds debug tests for the 3 new callers, in the corresponding
root test directories, to drive further location of logically
target-independent tests that just require some target at the root. I
think this convention is more discoverable, and is pretty widely used in
the project.
So that it's preserved across all inline invocations rather than just
one inliner pass run.
This prevents cases where devirtualization in the simplification
pipeline uncovers inlining opportunities that should be discarded due to
inline history, but we dropped the inline history between inliner pass
runs, causing code size to blow up, sometimes exponentially.
For compile time reasons, we want to limit this to only call sites that
have the potential to inline through SCCs, potentially with the help of
devirtualization. This means that the callee is in a non-trivial
(Ref)SCC, or the call site was previously an indirect call, which can
potentially be devirtualized to call any function.
The CGSCCUpdater::InlinedInternalEdges logic still seems to be relevant
even with this change, as monster_scc.ll blows up if I remove that code.
http://llvm-compile-time-tracker.com/compare.php?from=e830d88e8ae5f44a97cc76136a0a4e83aa9157c0&to=ed535e732fc41b79ab8efda2417886cbd0812f7f&stat=instructions:u
Fixes #186926.
Some functions used `new`/`delete` to allocate/free arrays. To avoid
memory leaks, it would be better to avoid using raw pointers. This patch
replaces the use of them with `SmallVector`.
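
The shape of such a replacement can be sketched as follows. This is a standalone illustration, not code from the patch: `sumOfSquares` is a hypothetical function, and `std::vector` stands in for `SmallVector` so the example compiles without LLVM headers.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Before (leak-prone): long *Scratch = new long[N]; ... delete[] Scratch;
// Any early return between the two statements would leak the array.
// After: the container owns the storage and frees it on every exit path.
inline long sumOfSquares(std::size_t N) {
  std::vector<long> Scratch(N); // stand-in for a SmallVector
  assert(Scratch.size() == N);
  for (std::size_t I = 0; I != N; ++I)
    Scratch[I] = static_cast<long>(I) * static_cast<long>(I);
  long Sum = 0;
  for (long V : Scratch) {
    if (V < 0)
      return -1; // Early exit: no manual cleanup needed.
    Sum += V;
  }
  return Sum;
}
```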
The patch moves the no-wrap flags out of SCEV's scope so they can be
re-used for SCEVUse.
SCEVUse gets an additional getNoWrapFlags helper that returns the union
of the expression's SCEV flags and the use-specific flags.
SCEVExpander has been updated to use this new helper.
In order to avoid other changes, the original names are exposed via
constexpr in SCEV. Not sure if there's a nicer way. One alternative
would be to define the enum in struct, and have SCEV inherit from it.
The patch also clarifies that the SCEVUse flags encode NUW/NSW, and
hides getInt, setInt, etc to avoid potential mis-use.
PR: https://github.com/llvm/llvm-project/pull/190199
The string literal "stack frame size" passed to the base class
constructor created a temporary Twine that was destroyed after
the base constructor completed, leaving a dangling reference.
Fix by storing the Twine as a member variable in the derived class,
ensuring it lives as long as the diagnostic object itself.
Fixes ASAN stack-use-after-scope error in
Clang :: Misc/backend-stack-frame-diagnostics-fallback.cpp
LLVM :: CodeGen/X86/2007-04-24-Huge-Stack.ll
LLVM :: CodeGen/X86/huge-stack-offset.ll
LLVM :: CodeGen/X86/huge-stack-offset2.ll
LLVM :: CodeGen/X86/huge-stack.ll
LLVM :: CodeGen/X86/large-displacements.ll
LLVM :: CodeGen/X86/stack-clash-extra-huge.ll
LLVM :: CodeGen/X86/warn-stack.ll
LLVM :: CodeGen/X86/win64-stackprobe-overflow.ll
Calls `Streamer.setLFIRewriter` during generic LFIMCStreamer
initialization rather than requiring it to be done during
backend-specific initialization. This better follows the existing
conventions in `create*` functions in `TargetRegistry.h`.
Also re-adds the call to initSections for LFI in `llvm-mc.cpp`
(necessary in order to emit the ABI Note section), along with a test to
make sure ABI note emission with the rewriter is working.
Summary:
Allocation kinds were added after these were introduced. We only needed
the TLI to identify these in the attributor so we can now just use
attributes. Update the usage in OpenMP and drop the TLI interface.
Fixes: https://github.com/llvm/llvm-project/issues/190072
Avoid expensive hash map of block to value by using a vector. To avoid
allocating and clearing the entire vector per query, cache the
allocation and use an epoch to identify stale values from previous
queries.
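
A minimal sketch of the epoch technique, under stated assumptions: `EpochCache` is a hypothetical class, blocks are identified by a dense index, and the cached value is a plain `int`. It is not the code from this patch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical per-block cache: slot I belongs to the block numbered I.
class EpochCache {
  struct Slot {
    std::uint32_t Epoch = 0; // 0 never matches a live query epoch.
    int Value = 0;
  };
  std::vector<Slot> Slots;
  std::uint32_t CurrentEpoch = 1;

public:
  explicit EpochCache(std::size_t NumBlocks) : Slots(NumBlocks) {}

  // Start a new query. All entries from earlier queries become stale
  // without touching the vector; only epoch wrap-around forces a clear.
  void beginQuery() {
    if (++CurrentEpoch == 0) {
      for (Slot &S : Slots)
        S = Slot();
      CurrentEpoch = 1;
    }
  }

  void set(std::size_t BlockNumber, int V) {
    assert(BlockNumber < Slots.size() && "block number out of range");
    Slots[BlockNumber].Epoch = CurrentEpoch;
    Slots[BlockNumber].Value = V;
  }

  // Returns true and fills Out only if the entry was written during the
  // current query.
  bool lookup(std::size_t BlockNumber, int &Out) const {
    assert(BlockNumber < Slots.size() && "block number out of range");
    const Slot &S = Slots[BlockNumber];
    if (S.Epoch != CurrentEpoch)
      return false;
    Out = S.Value;
    return true;
  }
};
```

The allocation is reused across queries, and invalidation costs a single increment instead of a pass over every slot.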
Fix a bug where `distributeIRToProfileLocationMap` fails to find
location mappings from IR to profile for renamed functions because
`FuncMappings` is indexed by the IR function name while
`distributeIRToProfileLocationMap` looks up by the profile function
name. Fixed by making `FuncMappings` use the profile function name as
its key.
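
The key mismatch can be shown with a small standalone sketch. The types here (`LocationMap`, `FuncMappingsTy`, `RenamedFunction`) and the helper names are simplified, hypothetical stand-ins, not the real data structures.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Hypothetical simplified types: a location map from IR location id to
// profile location id, stored per function.
using LocationMap = std::map<int, int>;
using FuncMappingsTy = std::map<std::string, LocationMap>;

struct RenamedFunction {
  std::string IRName;      // name in the current IR
  std::string ProfileName; // name recorded in the profile
};

// Store under the profile name, because that is the name the consumer
// uses for its lookup.
inline void recordMapping(FuncMappingsTy &M, const RenamedFunction &F,
                          LocationMap Locs) {
  assert(!F.ProfileName.empty() && "profile name must be known");
  M[F.ProfileName] = std::move(Locs);
}

inline const LocationMap *findByProfileName(const FuncMappingsTy &M,
                                            const std::string &ProfileName) {
  auto It = M.find(ProfileName);
  return It == M.end() ? nullptr : &It->second;
}
```

If the mapping were stored under `IRName` instead, the lookup for a renamed function would return nothing even though the mapping had been computed.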
Matches the naming convention of the other m_Specific* matchers, and
frees up the m_Opc() matcher for future use in #84940 to allow us to
capture the opcode of an unknown binop.
Moving to m_SpecificOpc does mess up the formatting in a few places;
I've tried to refactor to use the m_Value(SDValue, ....) matcher where I
can to recover some whitespace.
Fix issue reported on
https://github.com/llvm/llvm-project/pull/188296#issuecomment-4179103756
`SwiftErrorValueTracking` holds per-function state used by
`IRTranslator`.
On targets where `TargetLowering::supportSwiftError()` is false (e.g.
wasm), `SwiftErrorValueTracking::setFunction()` exits early.
Historically, that early return happened before clearing per-function
containers, and pointer members (including `SwiftErrorArg`) had no
in-class initialization.
The bad case is a function with a swifterror argument on such a target:
`IRTranslator` uses `SwiftError.getFunctionArg()` without checking
`supportSwiftError()` and this could read an uninitialized
`SwiftErrorArg` value. (SelectionDAG gates the `getFunctionArg` usages
behind `supportSwiftError()`, so it's specific to GlobalISel)
29391328ab66 added [a first test
case](llvm/test/CodeGen/WebAssembly/GlobalISel/irtranslator/args-swiftcc.ll)
that satisfies:
- the target is `supportSwiftError` = false
- use swiftcc
- use GlobalISel
and it made the issue observable with sanitizer builds. This commit
fixes the per-function container reinitialization and defensively adds
explicit pointer member initializations.
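
The two parts of the fix (reset before the early return, plus in-class initializers) can be sketched in isolation. `Tracking` is a hypothetical, heavily simplified class, not `SwiftErrorValueTracking`; the argument is modelled as a pointer to `int`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical simplified tracker holding per-function state.
class Tracking {
  const int *ErrorArg = nullptr; // In-class init: never indeterminate.
  std::vector<int> PerFunctionState;
  bool TargetSupportsFeature;

public:
  explicit Tracking(bool Supports) : TargetSupportsFeature(Supports) {}

  void setFunction(const int *Arg) {
    // Reset first, so the early return below cannot leave stale or
    // uninitialized state from a previous function behind.
    ErrorArg = nullptr;
    PerFunctionState.clear();
    if (!TargetSupportsFeature)
      return;
    assert(Arg && "expected an argument on supporting targets");
    ErrorArg = Arg;
    PerFunctionState.push_back(*Arg);
  }

  const int *getFunctionArg() const { return ErrorArg; }
  std::size_t stateSize() const { return PerFunctionState.size(); }
};
```

A caller that forgets to check for target support now reads a well-defined null pointer instead of an indeterminate value.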
This patch completely removes `isLoopCarriedDep`, which was used
previously to identify loop-carried dependencies in the DAG. Now that we
have the DDG representation, this special handling is no longer
necessary. Simply replacing its usage with the DDG causes several tests
to fail, since cycle detection takes some of the validation-only edges
in the DDG into account. To address this, this patch introduces extra
edges in the DDG, which are used only for cycle detection and not for
other parts of the pass (e.g., scheduling). The extra edges are
determined to preserve the existing behavior of the pass as closely as
possible, which makes the predicates for adding them somewhat complex.
Split off from #135148, and the final patch in the series for #135148
Rename several arguments to intrinsic related functions from `ArgsTys`
to `OverloadTys` to better reflect their meaning. The only variables
left with name `ArgTys` now actually mean function argument types.
Also remove an incorrect comment in Intrinsics.td. Dependent types do
allow forward references starting with
7957fc6547
The `-mno-incremental-linker-compatible` switch translates to the Brepro
linker flag and must be passed on to the underlying linker to match the
behavior of the Windows triples that produce PE COFF.
This commit adds the GetDimensions methods to Texture2D. For DXIL, it
requires intrinsics that are not yet available. They are added, but not
implemented.
Assisted-by: Gemini
Co-authored-by: Helena Kotas <hekotas@microsoft.com>
Add SCEVUseVisitor, a new visitor class where all visit methods receive
a SCEVUse instead of a const SCEV*. Use it for SCEVExpander, so it can
use use-specific flags in the future.
PR: https://github.com/llvm/llvm-project/pull/188863
The Exact SIV test and the Exact RDIV test behave almost identically,
except that the Exact SIV test also explores the directions in the final
step. This patch consolidates the two duplicate implementations into a
single function that can be used by both tests. While this change
slightly affects things like debug output and metrics, it is not
intended to alter the actual test results.
The main change is to eliminate the use of "Argument" terminology when
dealing with overloaded types since overloaded types can be either
argument or return values, and some additional renaming for clarity.
1. Rename `Tys` argument to various intrinsic APIs to `OverloadTys` to
better reflect its meaning.
2. Rename `IITDescriptorKind::Argument` to
`IITDescriptorKind::Overloaded` to better convey that it's an overloaded
type. Removed "Argument" suffix for other kinds for dependent types.
3. Rename `ArgKind` to `AnyKind`, `getArgumentNumber` to
`getOverloadIndex`, `getArgumentKind` to `getOverloadKind`,
`getRefArgNumber` to `getRefOverloadIndex`, and `IIT_ARG` to `IIT_ANY`.
4. Rename `IIT_ANYPTR` (used to represent a pointer qualified with
address space) to `IIT_PTR_AS` to clearly distinguish it from
`llvm_anyptr_ty`
5. Change the packing of [ref overload index & overload index] for
`VecOfAnyPtrsToElt` to pack the overload index into the lower bits, so
we can use the `getOverloadIndex` function to get the overload index.
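
Item 5 can be illustrated with a small packing sketch. The bit widths and all function names below are assumptions made for the example and do not reflect the real IIT encoding. The point is only that, with the overload index in the low bits, one extractor works for both packed and unpacked values.

```cpp
#include <cassert>
#include <cstdint>

// Assumed layout for this sketch only: low 16 bits hold the overload
// index, the bits above hold the ref overload index.
constexpr unsigned IndexBits = 16;
constexpr std::uint32_t IndexMask = (1u << IndexBits) - 1;

constexpr std::uint32_t packIndices(std::uint32_t RefOverloadIndex,
                                    std::uint32_t OverloadIndex) {
  assert(OverloadIndex <= IndexMask && RefOverloadIndex <= IndexMask);
  return (RefOverloadIndex << IndexBits) | OverloadIndex;
}

// Works on a packed value and on a bare overload index alike, because the
// overload index always sits in the low bits.
constexpr std::uint32_t unpackOverloadIndex(std::uint32_t Packed) {
  return Packed & IndexMask;
}

constexpr std::uint32_t unpackRefOverloadIndex(std::uint32_t Packed) {
  return Packed >> IndexBits;
}

static_assert(unpackOverloadIndex(packIndices(3, 5)) == 5, "low bits");
static_assert(unpackRefOverloadIndex(packIndices(3, 5)) == 3, "high bits");
```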
This adds the CalculateLevelOfDetail and CalculateLevelOfDetailUnclamped
methods to Texture2D using the established pattern used for other methods.
Assisted-by: Gemini
Initially added in #187709. It was reverted in #188833, because
[llvm-clang-x86_64-sie-win](https://lab.llvm.org/buildbot/#/builders/46/builds/32873)
was failing in
`cross-project-tests/debuginfo-tests/dexter-tests/nrvo.cpp`.
The test passed for me locally. After checking on another machine, I
found that `S_DEFRANGE_REGISTER_REL_INDIR` is only supported by
dbgeng/WinDbg from Windows 10.0 Build 19041 (released 2020) onwards.
SDKs before this will fail to read the value. That buildbot is on
Windows 10.0 Build 17763.
I'm not sure if we should make the generation of that record
conditional. Debuggers that can't read the record will skip it. They'll
still see that there's some local variable, but won't be able to display
the value.
As far as I know, users of older Windows 10 builds should be able to
install a newer Windows SDK and use the WinDbg from that version. But I
haven't tested that.
Reland of #172062 (a71b1d2), which was reverted in b0234d1.
This patch makes essential Bitset member functions constexpr (`set()`,
`any()`, `none()`, `count()`, `operator==`, `!=`, `<`, `~`) and adds a
new `all()` method. It also introduces a `maskLastWord()` invariant to
ensure unused high bits in the last word are always zero, which is
required for correctness of `operator~`, `set()`, `all()`, and
comparisons on non-word-aligned sizes (e.g., `Bitset<33>`).
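
To show why the invariant matters, here is a simplified standalone sketch. `MiniBitset` is a hypothetical stand-in written for this example, not `llvm::Bitset`; it assumes 64-bit words and C++17.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

template <std::size_t N> class MiniBitset {
  static_assert(N > 0, "MiniBitset needs at least one bit");
  static constexpr std::size_t WordBits = 64;
  static constexpr std::size_t NumWords = (N + WordBits - 1) / WordBits;
  std::uint64_t Words[NumWords] = {};

  // Re-establish the invariant: bits at positions >= N are always zero.
  constexpr void maskLastWord() {
    if (N % WordBits != 0)
      Words[NumWords - 1] &= (std::uint64_t(1) << (N % WordBits)) - 1;
  }

public:
  constexpr MiniBitset() = default;

  constexpr MiniBitset &set() {
    for (std::size_t I = 0; I != NumWords; ++I)
      Words[I] = ~std::uint64_t(0);
    maskLastWord(); // Without this, count() would report 64 for N == 33.
    return *this;
  }

  constexpr MiniBitset operator~() const {
    MiniBitset R;
    for (std::size_t I = 0; I != NumWords; ++I)
      R.Words[I] = ~Words[I];
    R.maskLastWord(); // Without this, ~set() would not compare equal to {}.
    return R;
  }

  constexpr bool test(std::size_t I) const {
    assert(I < N && "bit index out of range");
    return (Words[I / WordBits] >> (I % WordBits)) & 1;
  }

  constexpr std::size_t count() const {
    std::size_t C = 0;
    for (std::size_t I = 0; I != NumWords; ++I)
      for (std::uint64_t W = Words[I]; W != 0; W &= W - 1)
        ++C; // Clears the lowest set bit on each step.
    return C;
  }

  constexpr bool all() const { return count() == N; }
  constexpr bool none() const { return count() == 0; }

  constexpr bool operator==(const MiniBitset &O) const {
    for (std::size_t I = 0; I != NumWords; ++I)
      if (Words[I] != O.Words[I])
        return false;
    return true;
  }
};

constexpr MiniBitset<33> Empty33{};
static_assert((~Empty33).count() == 33, "unused high bits must stay zero");
static_assert((~Empty33).all(), "all() relies on the same invariant");
```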
Changes from the original reverted PR:
- Replaced `llvm::any_of` with an inline loop to avoid depending on
constexpr `any_of`/`none_of` from `STLExtras` (#172536), which was also
reverted due to a GCC 15.2.1 bootstrap miscompile.
- The patch is now fully self-contained with no prerequisite changes.
Motivation: This is a prerequisite for making `LaneBitmask` a wrapper
around `Bitset`, enabling scalable lane bitmasks beyond 64 bits
(https://discourse.llvm.org/t/rfc-out-of-lanebitmask-bits-again/88613).
This is a follow-up of the suggestion left here:
https://github.com/llvm/llvm-project/pull/181707#discussion_r2995733831
The override functions in AMDGPU/ARM/SystemZ/X86 are required to avoid
enabling partial reductions where they were previously disabled (I've
added this for all targets that implement getArithmeticReductionCost).
Moving these into the middle-end pipeline will allow for additional
optimization of the expansion result, such as CSE of redundant loads
(c.f. https://godbolt.org/z/bEna4Md9r). For now, we conservatively place
the passes at the end of the middle-end pipeline, so we mostly don't
benefit from additional optimizations yet. The pipeline position will be
moved in a future change.
This builds on work done by legrosbuffle in
https://reviews.llvm.org/D60318.
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This was probably intended to be a `const SymbolStringPtr&` originally,
but if we were going to copy it anyway it's better to just take the
argument by value and std::move it.
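
The by-value-then-move idiom can be sketched in isolation. `Handle` and `Holder` are hypothetical names, and `std::shared_ptr` stands in for a ref-counted handle such as `SymbolStringPtr`, so the example has no LLVM dependency.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical ref-counted handle for this sketch.
using Handle = std::shared_ptr<std::string>;

class Holder {
  Handle Name;

public:
  // Sink parameter: taken by value, then moved into the member. An lvalue
  // argument costs one copy; an rvalue argument costs no copy at all.
  explicit Holder(Handle N) : Name(std::move(N)) {
    assert(Name && "expected a non-null handle");
  }

  const std::string &name() const { return *Name; }
  long useCount() const { return Name.use_count(); }
};
```

Taking a `const Handle &` and copying inside would always pay for the ref-count increment, even when the caller is giving up its own handle.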
Nested TaskGroups run serially to prevent deadlock, as documented by
https://reviews.llvm.org/D61115 and refined by
https://reviews.llvm.org/D148984 to use threadIndex.
Enable nested parallelism by having worker threads actively execute
tasks from the work queue while waiting (work-stealing), instead of
just blocking. Root-level TaskGroups (main thread) keep the efficient
blocking Latch::sync(), so there is no overhead for the common
non-nested case.
In lld, https://reviews.llvm.org/D131247 worked around the limitation
by passing a single root TaskGroup into OutputSection::writeTo and
spawning 4MB-chunked tasks into it. However, SyntheticSection::writeTo
calls with internal parallelism (e.g. GdbIndexSection,
MergeNoTailSection) still ran serially on worker threads. With this
change, their internal parallelFor/parallelForEach calls parallelize
automatically via helpSync work-stealing.
The increased parallelism can reorder error messages from parallel
phases (e.g. relocation processing during section writes), so one lld
test is updated to use --threads=1 for deterministic output.
This makes the rematerializer able to rematerialize MIs at the end of a
basic block. We achieve this by tracking the parent basic block of every
region inside the rematerializer and adding an explicit target region to
some of the class's methods. The latter removes the requirement that we
track the MI of every region (`Rematerializer::MIRegion`) after the
analysis phase; the class member is therefore deleted.
This new ability will be used shortly to improve the design of the
rollback mechanism.
The class MCAsmBaseStreamer serves as the common base class for streamers
which emit assembly output. It has the same role as MCObjectStreamer has
for streamers which emit object files.
Implements intrinsics used to get the level-of-detail given a texture,
sampler, and a coordinate. It will be used to implement the
corresponding HLSL methods.
Assisted-by: Gemini