ActionCache stores a mapping from CASID to CASID. The current
implementation can only associate keys and values that come from the
same hash context.
ActionCache has two operations: `put` to store a key/value pair and
`get` to look up the value for a key. ActionCache uses the same
TrieRawHashMap data structure to store the mapping, where the CASID of
the key is the hash used to index the map.
While the CASIDs for keys and values are often associated with an actual
CAS ObjectStore, ActionCache does not guarantee that such objects exist
in any ObjectStore.
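The `put`/`get` behaviour described above can be sketched roughly as follows. This is a toy model in Python, not the LLVM API: `ToyCASID`, `make_id`, and `ToyActionCache` are hypothetical names, a plain dict stands in for TrieRawHashMap, and the hash context is assumed to be identified by a hashlib algorithm name.

```python
import hashlib
from typing import Dict, NamedTuple, Optional


class ToyCASID(NamedTuple):
    """Hypothetical stand-in for a CASID: a hash plus its hash context."""
    context: str   # assumed: name of the hash algorithm, e.g. "sha256"
    digest: bytes  # raw hash bytes


def make_id(data: bytes, context: str = "sha256") -> ToyCASID:
    """Hash some bytes in the given context to obtain an ID."""
    return ToyCASID(context, hashlib.new(context, data).digest())


class ToyActionCache:
    """Maps key IDs to value IDs; a dict stands in for TrieRawHashMap."""

    def __init__(self, context: str = "sha256") -> None:
        self.context = context
        self._map: Dict[bytes, ToyCASID] = {}

    def _check_context(self, casid: ToyCASID) -> None:
        # Keys and values must come from the same hash context.
        if casid.context != self.context:
            raise ValueError("CASID comes from a different hash context")

    def put(self, key: ToyCASID, value: ToyCASID) -> None:
        self._check_context(key)
        self._check_context(value)
        # The key's hash indexes the map directly.
        self._map[key.digest] = value

    def get(self, key: ToyCASID) -> Optional[ToyCASID]:
        self._check_context(key)
        return self._map.get(key.digest)
```

Note that the sketch never consults an object store: a returned value ID says nothing about whether the object it names exists anywhere.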
This change introduces a new kernel attribute that allows thread blocks to be mapped to clusters.
It also adds support for the `+ptx90` PTX ISA.
For each function with the AMDGPU_CS_Chain calling convention, with
dynamic VGPRs enabled, add a _dvgpr$ symbol, with the value of the
function symbol, plus an offset encoding one less than the number of
VGPR blocks used by the function (16 VGPRs per block, no more than 128)
in bits 5..3 of the symbol value. This is used by a front-end to have
functions that are chained rather than called, and a dispatcher that
dynamically resizes the VGPR count before dispatching to a function.
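As a rough illustration of the encoding described above, the following Python sketch computes a `_dvgpr$` symbol value. The function names are hypothetical, the function symbol value is assumed to have bits 5..3 clear so that adding the offset does not carry, and rounding a VGPR count up to whole blocks is an assumption rather than something stated here.

```python
VGPRS_PER_BLOCK = 16
MAX_VGPRS = 128


def vgpr_blocks(num_vgprs: int) -> int:
    """Number of 16-VGPR blocks needed (assumed: round up)."""
    if not 1 <= num_vgprs <= MAX_VGPRS:
        raise ValueError("VGPR count must be in 1..128")
    return (num_vgprs + VGPRS_PER_BLOCK - 1) // VGPRS_PER_BLOCK


def dvgpr_symbol_value(function_value: int, num_vgprs: int) -> int:
    """Function symbol value plus (blocks - 1) placed in bits 5..3."""
    if function_value & 0b111000:
        raise ValueError("sketch assumes bits 5..3 of the function value are clear")
    return function_value + ((vgpr_blocks(num_vgprs) - 1) << 3)


def decode_vgpr_blocks(symbol_value: int) -> int:
    """Recover the block count a dispatcher would read from bits 5..3."""
    return ((symbol_value >> 3) & 0b111) + 1
```

With at most 128 VGPRs there are at most 8 blocks, so "one less than the number of blocks" always fits in the three bits 5..3.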
The script copies `ReleaseNotesTemplate.txt` to the corresponding
`ReleaseNotes.rst`/`.md` to clear the release notes.
The suffix of `ReleaseNotesTemplate.txt` must be `.txt`. If it is
`.rst`/`.md`, it will be treated as a documentation source file when
building documentation.
This does a few things:
* LLVM_CONFIG_PATH is deprecated, use LLVM_CMAKE_DIR instead.
* Don't use $ before command examples. I would normally, but the key
cmake commands didn't use it so I removed it from all commands.
* Makes the commands shown full commands, so you don't have to piece
them together.
* Uses shell variables to cut down on repetition and make this easier to
port to other targets.
* Adds a few options to disable more compiler-rt things.
* Use the built-in cmake options for sysroot and toolchains.
* Include test options in the first cmake command, so you don't have to
re-do the whole thing after you read the testing section.
* Removes the section about using BaremetalARM.cmake.
The closest I got to getting that cache to work was:
```
SYSROOT=/home/david.spickett/arm-gnu-toolchain-14.3.rel1-x86_64-arm-none-eabi/arm-none-eabi/libc
LLVM_TOOLCHAIN=/home/david.spickett/LLVM-20.1.8-Linux-X64/
cmake \
-G Ninja \
-DCMAKE_C_COMPILER=${LLVM_TOOLCHAIN}/bin/clang \
-DBAREMETAL_ARMV6M_SYSROOT=${SYSROOT} \
-DBAREMETAL_ARMV7M_SYSROOT=${SYSROOT} \
-DBAREMETAL_ARMV7EM_SYSROOT=${SYSROOT} \
-DCMAKE_BUILD_TYPE=Release \
-DLLVM_ENABLE_RUNTIMES="compiler-rt" \
-C ../llvm-project/clang/cmake/caches/BaremetalARM.cmake \
-DCOMPILER_RT_BUILD_BUILTINS=ON \
-DCOMPILER_RT_BUILD_LIBFUZZER=OFF \
-DCOMPILER_RT_BUILD_MEMPROF=OFF \
-DCOMPILER_RT_BUILD_PROFILE=OFF \
-DCOMPILER_RT_BUILD_CTX_PROFILE=OFF \
-DCOMPILER_RT_BUILD_SANITIZERS=OFF \
-DCOMPILER_RT_BUILD_XRAY=OFF \
-DCOMPILER_RT_BUILD_ORC=OFF \
-DCOMPILER_RT_BUILD_CRT=OFF \
../llvm-project/runtimes
```
All this does is build the x86 builtins. I tried forcing the issue with:
```
-DBUILTIN_SUPPORTED_ARCH="armv7m;armv6m;armv7em" \
```
But again, just x86.
It's probably something deep in compiler-rt failing a compiler check for
the Arm targets. Even if that's the case, fixing that means adding more
options to the cmake command.
I can't find evidence of a full command using this cache file since the
commit that introduced it and that command no longer works.
I think if you ever got this to work again the command would be as long
and complex as the ones already shown in the document.
I would also argue that some of the other caches, for example Fuchsia's,
are much better examples of multi-target runtimes builds. If what's in
this document isn't enough, folks should be learning from those files
and about the runtimes build overall before attempting anything complex
(though it does not take much to be "complex").
Add the llvm::cas::ObjectStore abstraction and InMemoryCAS as an
in-memory CAS object store implementation.
The ObjectStore models its objects as:
* Content: An array of bytes for the data to be stored.
* Refs: An array of references to other objects in the ObjectStore.
Each CAS Object can be identified by a unique ID/Hash.
ObjectStore supports the following general actions:
* Expected<ID> store(Content, ArrayRef<Ref>)
* Expected<Ref> get(ID)
It also introduces the following types to interact with a CAS ObjectStore:
* CASID: Hash representation for a CAS Object, with its context to help
print/compare CASIDs.
* ObjectRef: A light-weight ref for an object in the ObjectStore. It is
implementation defined so it can be optimized for
read/store/references depending on the implementation.
* ObjectProxy: A proxy for the users of CAS to interact with the data
inside a CAS Object. It bundles an ObjectHandle and an ObjectStore
instance.
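The object model above (content plus refs, addressed by a hash) can be sketched as a toy store in Python. This is only an illustration of the idea, not the llvm::cas API: `ToyObjectStore` is a hypothetical name, IDs are assumed to be SHA-256 digests over the refs and content, and refs are represented by the IDs themselves rather than by a separate lightweight reference type.

```python
import hashlib
from typing import Dict, Sequence, Tuple


class ToyObjectStore:
    """In-memory content-addressed store: ID -> (content, refs)."""

    def __init__(self) -> None:
        self._objects: Dict[bytes, Tuple[bytes, Tuple[bytes, ...]]] = {}

    def store(self, content: bytes, refs: Sequence[bytes] = ()) -> bytes:
        # Refs must point at objects that are already in this store.
        for ref in refs:
            if ref not in self._objects:
                raise KeyError("reference to an object not in this store")
        # The ID covers both the refs and the content, so objects with the
        # same bytes but different refs get different IDs.
        hasher = hashlib.sha256()
        hasher.update(len(refs).to_bytes(8, "little"))
        for ref in refs:
            hasher.update(ref)
        hasher.update(content)
        object_id = hasher.digest()
        # Storing identical data twice is a no-op that returns the same ID.
        self._objects.setdefault(object_id, (content, tuple(refs)))
        return object_id

    def get(self, object_id: bytes) -> Tuple[bytes, Tuple[bytes, ...]]:
        return self._objects[object_id]
```

Storing the same content and refs twice yields the same ID, which is the property that makes the store content-addressed.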
Add a new constraint corresponding to the AV_* register classes
for operands which can allocate AGPRs or VGPRs. This applies
to loads and stores on gfx90a+, and srcA / srcB for MFMA instructions.
The error emitted on unsupported targets isn't ideal: it is
produced by the register allocator without a rationale, but it is
consistent with the existing errors.
I mostly want this for writing allocation tests.
Added initial check for potential fmad conversion in reductions and
operands vectorization.
Added the check for instruction to fix #152683.
Skipped the code for reduction to avoid regressions.
This introduces a new `ptrtoaddr` instruction which is similar to
`ptrtoint` but has two differences:
1) Unlike `ptrtoint`, `ptrtoaddr` does not capture provenance
2) `ptrtoaddr` only extracts (and then extends/truncates) the low
index-width bits of the pointer
For most architectures, difference 2) does not matter since index (address)
width and pointer representation width are the same, but this does make a
difference for architectures that have pointers that aren't just plain
integer addresses such as AMDGPU fat pointers or CHERI capabilities.
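Difference 2) can be modelled with plain integers. The Python sketch below is an illustration only, not LLVM code: `ptrtoaddr_model` is a hypothetical name, the pointer is treated as an unsigned bit pattern whose low index-width bits hold the address, and zero-extension to a wider result type is an assumption.

```python
def ptrtoaddr_model(pointer_bits: int, index_width: int, result_width: int) -> int:
    """Keep the low index-width bits, then fit them into result_width bits."""
    # Extract only the address portion of the pointer representation.
    address = pointer_bits & ((1 << index_width) - 1)
    # Truncate if the result type is narrower; a wider result leaves the
    # value unchanged, which models zero-extension (an assumption here).
    return address & ((1 << result_width) - 1)


# A made-up 128-bit capability-like pointer: metadata in the high 64 bits,
# a 64-bit address in the low 64 bits.
capability = (0xABCD << 64) | 0x1_0000_1234
address_64 = ptrtoaddr_model(capability, index_width=64, result_width=64)
address_32 = ptrtoaddr_model(capability, index_width=64, result_width=32)
```

When the index width equals the full pointer width, the mask keeps every bit and the model degenerates to a plain integer conversion, which is why the difference does not matter on most architectures.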
This commit introduces textual and bitcode IR support as well as basic code
generation, but optimization passes do not handle the new instruction yet
so it may result in worse code than using ptrtoint. Follow-up changes will
update capture tracking, etc. for the new instruction.
RFC: https://discourse.llvm.org/t/clarifiying-the-semantics-of-ptrtoint/83987/54
Reviewed By: nikic
Pull Request: https://github.com/llvm/llvm-project/pull/139357
This patch takes care of the highly mechanical part of proofreading
SourceLevelDebugging.rst, namely:
- hyphenating "32 bit value" and similar and
- hyphenating "Objective C"
Now that #149310 has restricted lifetime intrinsics to only work on
allocas, we can also drop the explicit size argument. Instead, the size
is implied by the alloca.
This removes the ability to only mark a prefix of an alloca alive/dead.
We never used that capability, so we should remove the need to handle
that possibility everywhere (though many key places, including stack
coloring, did not actually respect this).
This implements very basic support for RISC-V mapping symbols in
llvm-objdump, sharing the implementation with how Arm/AArch64/CSKY
implement this feature.
This only supports the `$x` (instruction) and `$d` (data) mapping
symbols for RISC-V, and not the version of `$x` which includes an
architecture string suffix.
[NVPTX] Add Prefetch tensormap intrinsics
This PR adds prefetch intrinsics with the relevant tensormap_space.
* Lit tests are added as part of prefetch.ll
* The generated PTX is verified with a 12.3 ptxas executable.
* Added docs for these intrinsics in NVPTXUsage.rst.
For more information, refer to the PTX ISA for the prefetch intrinsic:
[Prefetch
Tensormap](https://docs.nvidia.com/cuda/parallel-thread-execution/#data-movement-and-conversion-instructions-prefetch-prefetchu)
@durga4github @schwarzschild-radius
This slightly relaxes the invariant established in #149310, by also
allowing the lifetime argument to be poison. This is to support the
typical pattern of RAUWing with poison when removing an instruction.
It's worth noting that this does not require any conservative
assumptions: lifetimes with poison arguments can simply be skipped.
Fixes https://github.com/llvm/llvm-project/issues/151119.
Refactor llvm-ir2vec to use subcommands instead of a mode flag for better CLI usability.
- Converted the `--mode` flag to three distinct subcommands: `triplets`, `entities`, and `embeddings`
- Updated documentation, tests, and python script
This patch hyphenates words that are used as adjectives, such as:
- architecture specific
- human readable
- implementation defined
- language independent
- language specific
- machine readable
- machine specific
- target independent
- target specific
The option is `-spirv-ext`, not `-spirv-extensions`. Also move the
examples after the description of the option, instead of after the list
of extensions, where it's easy to miss them when skimming.
---------
Co-authored-by: Nathan Gauër <github@keenuts.net>
See the related issue. We want to set up a build bot where `opt` runs with `-enable-profcheck`, which inserts `MD_prof` before running the rest of the pipeline requested from `opt`, and then validates the resulting profile information (more info in the RFC linked by the aforementioned issue).
In that setup, we will also ignore `FileCheck`: while the profile info inserted is, currently, equivalent to the profile info a pass would observe via `BranchProbabilityInfo`/`BlockFrequencyInfo`, (1) we may want to change that, and (2) some tests are quite sensitive to the output IR, and break if, for instance, extra metadata is present (which it would be due to `-enable-profcheck`). Since we're just interested in profile consistency on the upcoming bot, ignoring `FileCheck` is simpler and sufficient. However, this makes XFAIL tests pass unexpectedly. Rather than listing them all, the alternative is to just exclude XFAIL tests.
This PR adds support for that by introducing a `--exclude-xfail` option to `llvm-lit`.
Issue #147390
This patch implements the `llvm.loop.estimated_trip_count` metadata
discussed in [[RFC] Fix Loop Transformations to Preserve Block
Frequencies](https://discourse.llvm.org/t/rfc-fix-loop-transformations-to-preserve-block-frequencies/85785).
As [suggested in the RFC
comments](https://discourse.llvm.org/t/rfc-fix-loop-transformations-to-preserve-block-frequencies/85785/4),
it adds the new metadata to all loops at the time of profile ingestion
and estimates each trip count from the loop's `branch_weights` metadata.
As [suggested in the PR #128785
review](https://github.com/llvm/llvm-project/pull/128785#discussion_r2151091036),
it does so via a new `PGOEstimateTripCountsPass` pass, which creates the
new metadata for each loop but omits the value if it cannot estimate a
trip count due to the loop's form.
An important observation not previously discussed is that
`PGOEstimateTripCountsPass` *often* cannot estimate a loop's trip count,
but later passes can sometimes transform the loop in a way that makes it
possible. Currently, such passes do not necessarily update the metadata,
but eventually that should be fixed. Until then, if the new metadata has
no value, `llvm::getLoopEstimatedTripCount` disregards it and tries
again to estimate the trip count from the loop's current
`branch_weights` metadata.
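The fallback described above can be sketched in Python. This is a simplified model, not the code in this patch: the function names are hypothetical, the loop is assumed to have a single latch branch with a backedge weight and an exit weight, and the formula (rounded backedge-to-exit ratio plus one) is one plausible way to turn `branch_weights` into a trip count.

```python
from typing import Optional


def estimate_from_branch_weights(backedge_weight: int, exit_weight: int) -> Optional[int]:
    """Estimate a trip count from latch branch weights, if possible."""
    if exit_weight == 0:
        # The loop is never observed to exit; no finite estimate.
        return None
    # Rounded ratio of backedge to exit executions, plus the final iteration.
    return (backedge_weight + exit_weight // 2) // exit_weight + 1


def get_estimated_trip_count(metadata_value: Optional[int],
                             backedge_weight: int,
                             exit_weight: int) -> Optional[int]:
    """Prefer the stored metadata value; otherwise re-estimate from weights."""
    if metadata_value is not None:
        return metadata_value
    # The metadata is present but has no value, so disregard it and try
    # again with the loop's current branch weights.
    return estimate_from_branch_weights(backedge_weight, exit_weight)
```

In this model, a value of `None` for `metadata_value` plays the role of the metadata being present with no value.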
Someone asked about this on Discord and it was a bit hard to follow. I
found them a config that worked, but the doc was not as much help as it
should have been.
It probably needs some updates for the runtime build era, but for now,
I'm just making it easier to read. I know the basic build can work at
least.
Some aspects of it may be literally wrong now, but I'll check that
later.
* Remove contractions.
* Remove references to the old separate llvm repo layout.
* Remove mentions of cmake versions older than what llvm requires now.
* Make a bunch of things plain text.
* Make a bunch of things code blocks so they are easier to copy and
paste from.
Added a Python utility script for generating IR2Vec triplets and updated documentation to reference it.
The script generates triplets in a form suitable for training the vocabulary.
(Tracking issues - #141817, #141834; closes - #141834)