We extract the binding logic out of the DXILResource analysis passes into the
FrontendHLSL library. This will allow us to use this logic for resource and
root signature bindings in both the DirectX backend and the HLSL frontend.
Introduce a fresh analysis for recognizing polynomial hashes, with the
rationale that several targets have specific instructions to optimize
things like CRC and GHASH (eg. X86 and RISC-V crypto extension). We
limit the scope to polynomial hashes computed in a Galois field of
characteristic 2, since this class of operations can also be optimized
in the absence of target-specific instructions to use a lookup table.
At the moment, we only recognize the CRC algorithm.
RFC:
https://discourse.llvm.org/t/rfc-new-analysis-for-polynomial-hash-recognition/86268
add `GenericFloatingPointPredicateUtils` in order to generalize
effects of floating point comparisons on `KnownFPClass` for both IR and
MIR.
---------
Co-authored-by: Matt Arsenault <arsenm2@gmail.com>
To perform constant folding in math operations, the implementation of
the ConstantFolding Analysis relies on the use of the math functions
from the host's libm. In particular, it relies on checking the value of
errno and IEEE exceptions to determine when an operation is safe to be
constant-folded.
On some platforms, such as BSD or Darwin, math library functions don't
set errno, so the ConstantFolding check depends only on the value of
IEEE exceptions. As the FP exception behaviour is set to `ignore` by
default, the compiler can perform optimisations that would get in the
way of such checks being performed correctly.
This patch sets the FP exception behaviour to `strict` when compiling
the `ConstantFolding.cpp` source file, ensuring the value of IEEE
exceptions can be reliably used by its implementation.
This re-lands the changes from #136139, but using the `-ftrapping-math`
compile option instead of `-ffp-exception-behavior` for GCC support.
To perform constant folding in math operations, the implementation of
the ConstantFolding Analysis relies on the use of the math functions
from the host's libm. In particular, it relies on checking the value of
errno and IEEE exceptions to determine when an operation is safe to be
constant-folded.
On some platforms, such as BSD or Darwin, math library functions don't
set errno, so the ConstantFolding check depends only on the value of
IEEE exceptions. As the FP exception behaviour is set to `ignore` by
default, the compiler can perform optimisations that would get in the
way of such checks being performed correctly.
This patch sets the FP exception behaviour to `strict` when compiling
the `ConstantFolding.cpp` source file, ensuring the value of IEEE
exceptions can be reliably used by its implementation.
In this PR, static-data-splitter pass finds out the local-linkage global
variables in {`.rodata`, `.data.rel.ro`, `bss`, `.data`} sections by
analyzing machine instruction operands, and aggregates their accesses
from code across functions.
A follow-up item is to analyze global variable initializers and count
for access from data.
* This limitation is demonstrated by `bss2` and `data3` in
`llvm/test/CodeGen/X86/global-variable-partition.ll`.
Some stats of static-data-splitter with this patch:
**section**|**bss**|**rodata**|**data**
:-----:|:-----:|:-----:|:-----:
hot-prefixed section coverage|99.75%|97.71%|91.30%
unlikely-prefixed section size percentage|67.94%|39.37%|63.10%
1. The coverage is defined as `#perf-sample-in-hot-prefixed <data>
section / #perf-sample in <data.*> section` for each <data> section.
* The perf command samples
`MEM_INST_RETIRED.ALL_LOADS:u:pinned:precise=2` events at a high
frequency (`perf -c 2251`) for 30 seconds. The profiled binary is built
as non-PIE so `data.rel.ro` coverage data is not available.
2. The unlikely-prefixed `<data>` section size percentage is defined as
`unlikely <data> section size / the sum size of <data>.* sections` for
each `<data>` section
This patch does two things:
1. It implements an ephemeral values cache analysis pass that collects the ephemeral values of a function and caches them for fast lookups. The collection of the ephemeral values is done lazily when the user calls `EphemeralValuesCache::ephValues()`.
2. It adds caching of ephemeral values using the `EphemeralValuesCache` to speed up `CallAnalyzer::analyze()`. Without caching this can take a long time to run in cases where the function contains a large number of `@llvm.assume()` calls and a large number of callsites. The time is spent in `collectEphemeralvalues()`.
This patch implements an LLVM IR pass, named kernel-info, that reports
various statistics for codes compiled for GPUs. The ultimate goal of
these statistics to help identify bad code patterns and ways to mitigate
them. The pass operates at the LLVM IR level so that it can, in theory,
support any LLVM-based compiler for programming languages supporting
GPUs. It has been tested so far with LLVM IR generated by Clang for
OpenMP offload codes targeting NVIDIA GPUs and AMD GPUs.
By default, the pass runs at the end of LTO, and options like
``-Rpass=kernel-info`` enable its remarks. Example `opt` and `clang`
command lines appear in `llvm/docs/KernelInfo.rst`. Remarks include
summary statistics (e.g., total size of static allocas) and individual
occurrences (e.g., source location of each alloca). Examples of its
output appear in tests in `llvm/test/Analysis/KernelInfo`.
This is done because the CodeGen library and Passes library both link
against Analysis, to avoid adding a dependency between CodeGen and
Passes if we want to extend the DroppedVariableStats code for MIR stats
as well, as seen in https://github.com/llvm/llvm-project/pull/120501
LLVM has a CMake variable to control whether to consider logf128
constant folding which libAnalysis ignores. This patch changes the
logf128 check to rely on the global LLVM_HAS_LOGF128 setting made in
config-ix.cmake.
ConstantFolding behaves differently depending on host's `HAS_IEE754_FLOAT128`.
LLVM should not change the behavior depending on host configurations.
This reverts commit 14c7e4a1844904f3db9b2dc93b722925a8c66b27.
(llvmorg-20-init-3262-g14c7e4a18449 and llvmorg-20-init-3498-g001e423ac626)
This is a reland of (#96287). This patch attempts to reduce the reverted
patch's clang compile time by removing #includes of float128.h and
inlining convertToQuad functions instead.
This is a reland of #96287. This change makes tests in logf128.ll ignore
the sign of NaNs for negative value tests and moves an #include <cmath>
to be blocked behind #ifndef _GLIBCXX_MATH_H.
DXIL Metadata Analysis passes (one for legacy PM and one for new PM)
that collect following DXIL module metadata information in a structure
are added.
1. Shader Model version
2. DXIL version
3. Shader Stage
Information collected using the legacy pass is verified by adding
additional test commands to existing metadata test sources.
This is an immutable analysis that loads and makes the contextual profile available to other passes. This patch introduces the analysis and an analysis printer pass. Subsequent patches will introduce the APIs that IPO passes will call to modify the profile as result of their changes.
I had put this in Transforms/Utils, but that doesn't actually make
sense if we want to populate these structures via an analysis pass.
Pull Request: https://github.com/llvm/llvm-project/pull/100621
This is a second attempt to land #84501 which failed on several targets.
This patch adds the HAS_IEE754_FLOAT128 define which makes the check for
typedef'ing float128 more precise by checking whether __uint128_t is
available and checking if the host does not use __ibm128 which is
prevalent on power pc targets and replaces IEEE754 float128s.
This is a second attempt to land #84501 which failed on several targets.
This patch adds the HAS_IEE754_FLOAT128 define which makes the check for
typedef'ing float128 more precise by checking whether __uint128_t is available
and checking if the host does not use __ibm128 which is prevalent on power pc
targets and replaces IEEE754 float128s.
This patch enables constant folding for 128 bit floating-point logf
calls. This is achieved by querying if the host system has the logf128()
symbol available with a CMake test. If so, replace the runtime call with
the compile time value returned from logf128.
This removes the old legacy PM "intervals" analysis pass (aka
IntervalPartition). It also removes the associated Interval and
IntervalIterator help classes.
Reasons for removal:
1) The pass is not used by llvm-project (not even being tested by
any regression tests).
2) Pass has not been ported to new pass manager, which at least
indicates that it isn't used by the middle-end.
3) ASan reports heap-use-after-free on
++I; // After the first one...
even if false is passed to intervals_begin. Not sure if that is
a false positive, but it makes the code a bit less trustworthy.
This adds support for using dominating conditions in computeKnownBits()
when called from InstCombine. The implementation uses a
DomConditionCache, which stores which branches may provide information
that is relevant for a given value.
DomConditionCache is similar to AssumptionCache, but does not try to do
any kind of automatic tracking. Relevant branches have to be explicitly
registered and invalidated values explicitly removed. The necessary
tracking is done inside InstCombine.
The reason why this doesn't just do exactly the same thing as
AssumptionCache is that a lot more transforms touch branches and branch
conditions than assumptions. AssumptionCache is an immutable analysis
and mostly gets away with this because only a handful of places have to
register additional assumptions (mostly as a result of cloning). This is
very much not the case for branches.
This change regresses compile-time by about ~0.2%. It also improves
stage2-O0-g builds by about ~0.2%, which indicates that this change results
in additional optimizations inside clang itself.
Fixes https://github.com/llvm/llvm-project/issues/74242.
This patch adds in a StructuralHash printer pass that prints out the
hexadeicmal representation of the hash of a module and all of the
functions within it.
Reviewed By: aeubanks
Differential Revision: https://reviews.llvm.org/D158317
The last use was removed by:
commit 8005332835246c54a4a6b026eede930ed559deb4
Author: Nikita Popov <npopov@redhat.com>
Date: Fri Dec 9 11:57:50 2022 +0100
Differential Revision: https://reviews.llvm.org/D150748
UniformityAnalysis offers all of the same features and much more, there is no reason left to use the legacy DAs.
See RFC: https://discourse.llvm.org/t/rfc-deprecate-divergenceanalysis-legacydivergenceanalysis/69538
- Remove LegacyDivergenceAnalysis.h/.cpp
- Remove DivergenceAnalysis.h/.cpp + Unit tests
- Remove SyncDependenceAnalysis - it was not a real registered analysis and was only used by DAs
- Remove/adjust references to the passes in the docs where applicable
- Remove TTI hook associated with those passes.
- Move tests to UniformityAnalysis folder.
- Remove RUN lines for the DA, leave only the UA ones.
- Some tests had to be adjusted/removed depending on how they used the legacy DAs.
Reviewed By: foad, sameerds
Differential Revision: https://reviews.llvm.org/D148116
This patch involves boolean ring to simplify logical operations. We can treat `&` as ring multiplication and `^` as ring addition.
So we need to canonicalize all other operations to `*` `+`. Like:
```
a & b -> a * b
a ^ b -> a + b
~a -> a + 1
a | b -> a * b + a + b
c ? a : b -> c * a + (c + 1) * b
```
In the code, we use a mask set to represent an expression. Every value that is not comes from logical operations could be a bit in the mask.
The mask itself is a multiplication chain. The mask set is an addiction chain.
We can calculate two expressions based on boolean algebras.
For now, the initial patch only enabled on and/or/xor, Later we can enhance the code step by step.
Reference: https://en.wikipedia.org/wiki/Boolean_ring
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D142803
This is a model runner for ML researchers using environments like
CompilerGym. In such environments, researchers host the compiler and
want to be able to observe the problem space (features) at each decision
step of some optimization pass, at which point the compiler is stopped,
waiting for the host makes a decision and provide an advice back to
the compiler, which then continues its normal operation, and so on.
The InteractiveModelRunner supports this scenario for the feature set
exposed by the compiler at a given time. It uses 2 files - ideally FIFO
pipes - one to pass data to the host, the other to get advices back from
the host. This means this scenario is supported with no special
dependencies. The file creation and deletion is the responsibility of
the host. Hooking up this model evaluator to a MLGO-ed pass is the
responsibilty of the pass author, and subsequent patches will do so for
the current set of mlgo passes, and offer an API to easily "just opt in"
by default when mlgo-ing a new pass.
The data protocol is that of the training logger: the host sees a training
log doled out observation by observation by reading from one of the
files, and passes back its advice as a serialized tensor (i.e. tensor value
memory dump) via the other file.
There are some differences wrt the log seen during training: the
interactive model doesn't currently include the outcome (because it should be
identical to the decision, and it's also not present in the "release"
mode); and partial rewards aren't currently communicated back.
The assumption - just like with the training logger - is that the host
is co-located, thus avoiding any endianness concerns. In a distributed
environment, it is up to the hosting infrastructure to intermediate
that.
Differential Revision: https://reviews.llvm.org/D142642
Computing EH-related information was only relevant for analysis passes so far. Lifting it to IR will allow the IR Verifier to calculate EH funclet coloring and validate funclet operand bundles in a follow-up step.
Reviewed By: rnk, compnerd
Differential Revision: https://reviews.llvm.org/D138122
The dependency was due to the log format. This change switches to the
previously-introduced (D139370) "dependency-free" logger instead of the
protobuf-based one.
A subsequent change will clean out the unnecessary abstraction left
behind.
This change drops the logger unittest, we have sufficient test coverage
via lit tests, and a unit test would require adding, unnecesarily, a log
reader (the reader is expected to be python, for the ML side, and there
is a reader for that under Analysis/models, used for tests).
Differential Revision: https://reviews.llvm.org/D141720
This is a fairly large changeset, but it can be broken into a few
pieces:
- `llvm/Support/*TargetParser*` are all moved from the LLVM Support
component into a new LLVM Component called "TargetParser". This
potentially enables using tablegen to maintain this information, as
is shown in https://reviews.llvm.org/D137517. This cannot currently
be done, as llvm-tblgen relies on LLVM's Support component.
- This also moves two files from Support which use and depend on
information in the TargetParser:
- `llvm/Support/Host.{h,cpp}` which contains functions for inspecting
the current Host machine for info about it, primarily to support
getting the host triple, but also for `-mcpu=native` support in e.g.
Clang. This is fairly tightly intertwined with the information in
`X86TargetParser.h`, so keeping them in the same component makes
sense.
- `llvm/ADT/Triple.h` and `llvm/Support/Triple.cpp`, which contains
the target triple parser and representation. This is very intertwined
with the Arm target parser, because the arm architecture version
appears in canonical triples on arm platforms.
- I moved the relevant unittests to their own directory.
And so, we end up with a single component that has all the information
about the following, which to me seems like a unified component:
- Triples that LLVM Knows about
- Architecture names and CPUs that LLVM knows about
- CPU detection logic for LLVM
Given this, I have also moved `RISCVISAInfo.h` into this component, as
it seems to me to be part of that same set of functionality.
If you get link errors in your components after this patch, you likely
need to add TargetParser into LLVM_LINK_COMPONENTS in CMake.
Differential Revision: https://reviews.llvm.org/D137838
Uniformity analysis is a generalization of divergence analysis to
include irreducible control flow:
1. The proposed spec presents a notion of "maximal convergence" that
captures the existing convention of converging threads at the
headers of natual loops.
2. Maximal convergence is then extended to irreducible cycles. The
identity of irreducible cycles is determined by the choices made
in a depth-first traversal of the control flow graph. Uniformity
analysis uses criteria that depend only on closed paths and not
cycles, to determine maximal convergence. This makes it a
conservative analysis that is independent of the effect of DFS on
CycleInfo.
3. The analysis is implemented as a template that can be
instantiated for both LLVM IR and Machine IR.
Validation:
- passes existing tests for divergence analysis
- passes new tests with irreducible control flow
- passes equivalent tests in MIR and GMIR
Based on concepts originally outlined by
Nicolai Haehnle <nicolai.haehnle@amd.com>
With contributions from Ruiling Song <ruiling.song@amd.com> and
Jay Foad <jay.foad@amd.com>.
Support for GMIR and lit tests for GMIR/MIR added by
Yashwant Singh <yashwant.singh@amd.com>.
Differential Revision: https://reviews.llvm.org/D130746
This patch replaces uses of LLVM_HAVE_TF_API with LLVM_HAVE_TFLITE in
a couple of CMakeLists.txt.
Now that 842b0d0fe2dd142305a9461e50cdce9aff7f86bc has landed,
we now have:
LLVM_HAVE_TF_API is defined if and only if LLVM_HAVE_TFLITE
evaluates to true
in the CMake variable world (assuming that you do not set
LLVM_HAVE_TF_API on the cmake invocation).
FWIW, the story is a little different in the C++ macro world, where:
LLVM_HAVE_TF_API is defined if and only if LLVM_HAVE_TFLITE is
defined
This is why edc83a15b45e6b91fce3f35622a6b0a6d34e5211 consisted only of
mechanical replacements.
Differential Revision: https://reviews.llvm.org/D140061