There's a pattern throughout LLVM of cl::opts being exported. That in
itself is probably a bit unfortunate, but what's especially bad about it
is that a lot of those symbols are in the global namespace. Move them
into the llvm namespace.
While doing this, I noticed some other variables in the global namespace
and moved them as well.
This change builds on https://github.com/llvm/llvm-project/pull/160319
which tries to clarify which *callers* (not backends) assume that the
result is actually trivial.
This change itself should be NFC. Essentially, I'm just renaming the
existing isTrivialRematerializable to the non-trivial version and then
adding a new trivial version (with the same name as the prior function)
and simplifying a few callers which want that semantic.
This change does *not* enable non-trivial remat any more broadly than
was already done for our targets which were lying through the old APIs;
that will come separately. The goal here is simply to make the code
easier to follow in terms of what assumptions are being made where.
---------
Co-authored-by: Luke Lau <luke_lau@icloud.com>
This reverts commit bd2dac98ed4f19dcf90c098ae0b9976604880b59, and part of ca2e8fc928ad103f46ca9f827e147c43db3a5c47.
My speculative attempt at fixing buildbot failed, so just roll back the
relavant part of the change.
This is a preparatory change for an upcoming reorganization of our
rematerialization APIs. Despite the interface being documented as
"trivial" (meaning no virtual register uses on the instruction being
considered for remat), our actual implementation inconsistently supports
non-trivial remat, and certain backends (AMDGPU and RISC-V mostly) lie
about instructions being trivial to abuse that. We want to allow
non-triial remat more broadly, but first we need to do some cleanup to
make it understandable what's going on.
These three call sites are ones which appear to actually want the
trivial definition, and appear fairly low risk to change.
p.s. I'm deliberately *not* updating any APIs in this change, I'm going
to do that as a followup once it's clear which category each callsite
fits in.
## Purpose
This patch is one in a series of code-mods that annotate LLVM’s public
interface for export. This patch annotates the remaining LLVM CodeGen
and CodeGenTypes library interfaces that were missed in, or modified
since, previous patches. The annotations currently have no meaningful
impact on the LLVM build; however, they are a prerequisite to support an
LLVM Windows DLL (shared library) build.
## Background
This effort is tracked in #109483. Additional context is provided in
[this
discourse](https://discourse.llvm.org/t/psa-annotating-llvm-public-interface/85307),
and documentation for `LLVM_ABI` and related annotations is found in the
LLVM repo
[here](https://github.com/llvm/llvm-project/blob/main/llvm/docs/InterfaceExportAnnotations.rst).
## Overview
The bulk of these changes were generated automatically using the
[Interface Definition Scanner (IDS)](https://github.com/compnerd/ids)
tool, followed formatting with `git clang-format`.
The following manual adjustments were also applied after running IDS:
- Explicitly instantiate `CallLowering::setArgFlags` template method
instances in `CodeGen/GlobalISel/CallLowering.h` and annotate them with
`LLVM_ABI`. These methods are already explicitly instantiated in
`lib/CodeGen/GlobalISel/CallLowering.cpp` but were not `extern` declared
in the header.
- Annotate several explicit template instantiations with
`LLVM_EXPORT_TEMPLATE`.
- Include `llvm/CodeGen/Passes.h` from
`llvm/lib/CodeGen/GlobalMergeFunctions.cpp` to pick up the declaration
of `llvm::createGlobalMergeFuncPass` with the `LLVM_ABI` annotation
(instead of adding `LLVM_ABI` to the function definition in this file)
## Validation
Local builds and tests to validate cross-platform compatibility. This
included llvm, clang, and lldb on the following configurations:
- Windows with MSVC
- Windows with Clang
- Linux with GCC
- Linux with Clang
- Darwin with Clang
This was stored in LiveIntervals, but not actually used for anything
related to LiveIntervals. It was only used in one check for if a load
instruction is rematerializable. I also don't think this was entirely
correct, since it was implicitly assuming constant loads are also
dereferenceable.
Remove this and rely only on the invariant+dereferenceable flags in
the memory operand. Set the flag based on the AA query upfront. This
should have the same net benefit, but has the possible disadvantage of
making this AA query nonlazy.
Preserve the behavior of assuming pointsToConstantMemory implying
dereferenceable for now, but maybe this should be changed.
This reverts commit 7f230feeeac8a67b335f52bd2e900a05c6098f20.
Breaks CodeGenCUDA/link-device-bitcode.cu in check-clang,
and many LLVM tests, see comments on https://reviews.llvm.org/D121169
Add the calculation of a score, which will be used during ML training. The
score qualifies the quality of a regalloc policy, and is independent of
what we train (currently, just eviction), or the regalloc algo itself.
We can then use scores to guide training (which happens offline), by
formulating a reward based on score variation - the goal being lowering
scores (currently, that reward is percentage reduction relative to
Greedy's heuristic)
Currently, we compute the score by factoring different instruction
counts (loads, stores, etc) with the machine basic block frequency,
regardless of the instructions' provenance - i.e. they could be due to
the regalloc policy or be introduced previously. This is different from
RAGreedy::reportStats, which accummulates the effects of the allocator
alone. We explored this alternative but found (at least currently) that
the more naive alternative introduced here produces better policies. We
do intend to consolidate the two, however, as we are actively
investigating improvements to our reward function, and will likely want
to re-explore scoring just the effects of the allocator.
In either case, we want to decouple score calculation from allocation
algorighm, as we currently evaluate it after a few more passes after
allocation (also, because score calculation should be reusable
regardless of allocation algorithm).
We intentionally accummulate counts independently because it facilitates
per-block reporting, which we found useful for debugging - for instance,
we can easily report the counts indepdently, and then cross-reference
with perf counter measurements.
Differential Revision: https://reviews.llvm.org/D115195