This computes a structural hash while allowing for selective ignoring of
certain operands based on a custom function that is provided. Instead of
a single hash value, it now returns FunctionHashInfo which includes a
hash value, an instruction mapping, and a map to track the operand
location and its corresponding hash value that is ignored.
Depends on https://github.com/llvm/llvm-project/pull/112621.
This is a patch for
https://discourse.llvm.org/t/rfc-global-function-merging/82608.
Add the llvm-canon tool. Description from the [original
PR](https://reviews.llvm.org/D66029#change-wZv3yOpDdxIu):
> Added a new llvm-canon tool which aims to transform LLVM Modules into
a canonical form by reordering and renaming instructions while
preserving the same semantics. This tool makes it easier to spot
semantic differences while diffing two modules which have undergone
different transformation passes.
The current version of this tool can:
- Reorder instructions within a function.
- Rename instructions based on the operands.
- Sort commutative operands.
This code was originally written by @michalpaszkowski and [submitted to
mainline
LLVM](14d358537f).
However, it was quickly
[reverted](335de55fa3)
to do BuildBot errors.
Michal presented his version of the tool in [LLVM-Canon: Shooting for
Clear Diffs](https://www.youtube.com/watch?v=c9WMijSOEUg).
@AidanGoldfarb and I ported the code to the new pass manager, added more
tests, and fixed some bugs related to PHI nodes that may have been the
root cause of the BuildBot errors that caused the patch to be reverted.
Additionally, we rewrote the implementation of instruction reordering to
fix cases where the original algorithm would break use-def chains.
Note that this is @AidanGoldfarb and I's first time submitting to LLVM.
Please liberally critique the PR!
CC @plotfi for initial review.
---------
Co-authored-by: Aidan <aidan.goldfarb@mail.mcgill.ca>
This patch is episode three of the middle end implementation for the
coroutine HALO improvement project published on discourse:
https://discourse.llvm.org/t/language-extension-for-better-more-deterministic-halo-for-c-coroutines/80044
After we attribute the calls to some coroutines as "coro_elide_safe" in
the C++ FE and creating a `noalloc` ramp function, we use a new middle
end pass to move the call to coroutines to the noalloc variant.
This pass should be run after CoroSplit. For each node we process in
CoroSplit, we look for its callers and replace the attributed ones in
presplit coroutines to the noalloc one. The transformed `noalloc` ramp
function will also require a frame pointer to a block of memory it can
use as an activation frame. We allocate this on the caller's frame with
an alloca.
Please note that we cannot safely transform such attributed calls in
post-split coroutines due to memory lifetime reasons. The CoroSplit pass
is responsible for creating the coroutine frame spills for all the
allocas in the coroutine. Therefore it will be unsafe to create new
allocas like this one in post-split coroutines. This happens relatively
rarely because CGSCC performs the passes on the callees before the
caller. However, if multiple coroutines coexist in one SCC, this
situation does happen (and prevents us from having potentially unbound
frame size due to recursion.)
You can find episode 1: Clang FE of this patch series at
https://github.com/llvm/llvm-project/pull/99282
Episode 2: CoroSplit at https://github.com/llvm/llvm-project/pull/99283
The primary motivation is to remove `EntryCount` from `FunctionSummary`.
This frees 8 bytes out of `sizeof(FunctionSummary)` (136 bytes as of
64498c5483).
While I'm at it, this PR clean up {SummaryBasedOptimizations,
SyntheticCountsPropagation} since they were not used and there are no
plans to further invest on them.
With this patch, bitcode writer writes a placeholder 0 at the byte
offset of `EntryCount` and bitcode reader can parse the function entry
count at the correct byte offset. Added a TODO to stop writing
`EntryCount` and bump bitcode version
Pass to flatten and lower the contextual profile to profile (i.e. `MD_prof`) metadata. This is expected to be used after all IPO transformations have happened.
Prior to lowering, the instrumentation is maintained during IPO and the contextual profile is kept in sync (see PRs #105469, #106154). Flattening (#104539) sums up all the counters belonging to all a function's context nodes.
We first propagate counter values (from the flattened profile) using the same propagation algorithm as `PGOUseFunc::populateCounters`, then map the edge values to `branch_weights`. Functions. in the module that don't have an entry in the flattened profile are deemed cold, and any `MD_prof` metadata they may have is reset. The profile summary is also reset at this point.
Issue [#89287](https://github.com/llvm/llvm-project/issues/89287)
This patch implements the new pass and registers it with the pass
manager. For context, this is a vectorizer that operates on Sandbox IR,
which is a transactional IR on top of LLVM IR.
This is simplifycfg part of
https://github.com/llvm/llvm-project/pull/95515
In this PR, we support hoisting load/store with conditional faulting in
`SimplifyCFGOpt::speculativelyExecuteBB` to eliminate conditional
branches.
This is for cases like
```
void test (int a, int *b) {
if (a)
*b = a;
}
```
In the following patches, we will support the hoist in
`SimplifyCFGOpt::hoistCommonCodeFromSuccessors`.
That is for cases like
```
void test (int a, int *c, int *d) {
if (a)
*c = a;
else
*d = a;
}
```
This transformation doesn't actually use any of the internal state of
LSR and recomputes all information from SCEV. Splitting it out makes
it easier to test.
Note that long term I would like to write a version of this transform
which *is* integrated with LSR's solver, but if that happens, we'll
just delete the extra pass.
Integration wise, I switched from using TTI to using a pass configuration
variable. This seems slightly more idiomatic, and means we don't run
the extra logic on any target other than RISCV.
DXIL Metadata Analysis passes (one for legacy PM and one for new PM)
that collect following DXIL module metadata information in a structure
are added.
1. Shader Model version
2. DXIL version
3. Shader Stage
Information collected using the legacy pass is verified by adding
additional test commands to existing metadata test sources.
Split from #100596.
Introduce the RealtimeSanitizer transform, which inserts the
rtsan_enter/exit functions at the appropriate places in an instrumented
function.
This is an immutable analysis that loads and makes the contextual profile available to other passes. This patch introduces the analysis and an analysis printer pass. Subsequent patches will introduce the APIs that IPO passes will call to modify the profile as result of their changes.
The `!unpredictable` metadata has been present for a long time, but
it's usage in optimizations is still limited. This patch teaches
`FoldTwoEntryPHINode()` to be more aggressive with an unpredictable
branch to reduce mispredictions.
A TTI interface `getBranchMispredictPenalty()` is added to distinguish
between different hardwares to ensure we don't go too far for simpler
cores. For simplicity, only a naive x86 implementation is included for
the time being.
[CodeGen] Change the prototype of regalloc filter function
Change the prototype of the filter function so that we can
filter not just by RegClass. We need to implement more
complicated filter based upon some other info associated
with each register.
Patch provided by: Gang Chen (gangc@amd.com)
- Add `MachineVerifierPass`.
- Use complete `MachineVerifierPass` in `VerifyInstrumentation` if
possible.
`LiveStacksAnalysis` will be added in future, all other analyses are
done.
- Add `MachineBlockFrequencyAnalysis`.
- Add `MachineBlockFrequencyPrinterPass`.
- Use `MachineBlockFrequencyInfoWrapperPass` in legacy pass manager.
- `LazyMachineBlockFrequencyInfo::print` is empty, drop it due to new
pass manager migration.
- Add `LiveIntervalsAnalysis`.
- Add `LiveIntervalsPrinterPass`.
- Use `LiveIntervalsWrapperPass` in legacy pass manager.
- Use `std::unique_ptr` instead of raw pointer for `LICalc`, so
destructor and default move constructor can handle it correctly.
This would be the last analysis required by `PHIElimination`.
This PR introduces the numerical sanitizer originally proposed by
Clement Courbet on https://reviews.llvm.org/D97854
(https://arxiv.org/abs/2102.12782).
The main additions include:
- Migration to LLVM opaque pointers
- Migration to various updated APIs
- Extended coverage for LLVM instructions/intrinsics
- Code refactoring
The tool is still very experimental, the coverage (e.g. for intrinsics /
library functions) is incomplete.
Link: https://discourse.llvm.org/t/rfc-revival-of-numerical-sanitizer/79601
---------
Co-authored-by: Fangrui Song <i@maskray.me>
This reverts commit ab58b6d58edf6a7c8881044fc716ca435d7a0156.
In `CodeGen/Generic/MachineBranchProb.ll`, `llc` crashed with dumped MIR
when targeting PowerPC. Move test to `llc/new-pm`, which is X86
specific.
In NewPM pass managers, add a "pretty stack frame" that tells you which
pass crashed while running which function.
For example `opt -O3 -passes-ep-peephole=trigger-crash-function test.ll`
will print something like this:
```
Stack dump:
0. Program arguments: build/bin/opt -S -O3 -passes-ep-peephole=trigger-crash-function test.ll
1. Running pass "function<eager-inv>(mem2reg,instcombine<max-iterations=1;no-use-loop-info;no-verify-fixpoint>,trigger-crash-function,simplifycfg<bonus-inst-threshold=1;no-forward-switch-cond;switch-range-to-icmp;no-switch-to-lookup;keep-loops;no-hoist-common-insts;no-sink-common-insts;speculate-blocks;simplify-cond-branch>)" on module "test.ll"
2. Running pass "trigger-crash-function" on function "fshl_concat_i8_i8"
```
While the crashing pass is usually evident from the stack trace, this
also shows which function triggered the crash, as well as the pipeline
string for the pass (including options).
Similar functionality existed in the LegacyPM.
On MSVC the `this` uses inside `decltype` require a lambda capture. On
clang they result in an unused capture warning instead. Add the capture
and suppress the warning with `(void)this`.
-----
Initializing this map is somewhat expensive (especially for O0), so we
currently only do it if certain flags are used. I would like to make use
of it for crash dumps (#96078), where we don't know in advance whether
it will be needed or not.
This patch changes the initialization to a lazy approach, where a
callback is registered that does the actual initialization. The
callbacks will be run the first time the pass name is requested.
This way there is no compile-time impact if the mapping is not used.
My attempt to fix the Windows build made things worse,
revert entirely for now.
This reverts commit e7137f2fed5cfee822ae3c4c6d39188adb59a16c.
This reverts commit 6eaf204dbb0a6a81cddfd02f625c130f7bb1aae5.
This reverts commit 957dc4366dd2ce9d5d2991c3ad76bbf438e9954e.