SILoadStoreOptimizer can now recognise consecutive 16-bit and 8-bit
`TBUFFER_LOAD`/`TBUFFER_STORE` instructions that each access
* a single component (`X`), or
* two components (`XY`),
and fold them into the wider native variants:
```
X + X --> XY
X + X + X + X --> XYZW
XY + XY --> XYZW
X + X + X --> XYZ
XY + X --> XYZ
```
The optimisation cuts the number of TBUFFER instructions, shrinking code
size and improving memory throughput.
This is an attempt to reland #151660, adding a missing STL header that
was flagged by a buildbot failure.
The stable function map can be huge for a large application. Fully
loading it is slow and consumes a significant amount of memory, which is
unnecessary and drastically slows down compilation, especially for
non-LTO and distributed-ThinLTO setups. This patch introduces opt-in
lazy-loading support for the stable function map. The detailed changes
are:
- `StableFunctionMap`
  - The map now stores entries in an `EntryStorage` struct, which
    includes offsets for serialized entries and a `std::once_flag` for
    thread-safe lazy loading.
  - The underlying map type is changed from `DenseMap` to
    `std::unordered_map` for compatibility with `std::once_flag`.
  - `contains()`, `size()`, and `at()` are implemented to load only the
    requested entries, on demand.
- Lazy Loading Mechanism
  - When reading indexed codegen data, if the newly introduced
    `-indexed-codegen-data-lazy-loading` flag is set, the stable function
    map is not fully deserialized up front. The binary format for the
    stable function map now includes offsets and sizes to support lazy
    loading.
  - Lazy loading is made thread-safe by a per-function-hash once flag.
    This guarantees that even in a multi-threaded environment, the
    deserialization for a given function hash happens exactly once: the
    first thread to request it performs the load, and subsequent threads
    wait for it to complete before using the data. For single-threaded
    builds, the overhead is negligible (a single check on the once flag).
    In multi-threaded scenarios, users can omit the command-line flag to
    retain the previous eager-loading behavior (see the sketch below).
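A minimal sketch of the per-entry lazy-loading pattern described above;
the field names, `LazyMap`, and `loadEntry` are illustrative, not the
actual LLVM implementation:
```
#include <cstdint>
#include <mutex>
#include <unordered_map>
#include <vector>

struct EntryStorage {
  uint64_t Offset = 0;          // Start of the serialized entry.
  uint64_t Size = 0;            // Serialized size of the entry.
  std::vector<uint8_t> Decoded; // Filled in on first access.
  std::once_flag Loaded;        // Guarantees exactly-once deserialization.
};

class LazyMap {
  // std::unordered_map is node-based, so elements never move and the
  // non-movable std::once_flag member is usable (unlike with DenseMap).
  std::unordered_map<uint64_t, EntryStorage> Entries;

  void loadEntry(EntryStorage &E) {
    // Placeholder: read E.Size bytes at E.Offset from the mapped buffer.
  }

public:
  bool contains(uint64_t Hash) const { return Entries.count(Hash) != 0; }
  size_t size() const { return Entries.size(); }

  const std::vector<uint8_t> &at(uint64_t Hash) {
    EntryStorage &E = Entries.at(Hash);
    // The first thread deserializes; concurrent callers block until done.
    std::call_once(E.Loaded, [&] { loadEntry(E); });
    return E.Decoded;
  }
};
```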
When testing LLDB, we want to make sure to use the same Python as the
one we used to build it.
We already did this in https://github.com/llvm/llvm-project/pull/143183
for the Unit and Shell tests. This patch does the same thing for the API
tests as well.
This patch reworks how VG is handled around streaming mode changes.
Previously, for functions with streaming mode changes, we would:
- Save the incoming VG in the prologue
- Emit `.cfi_offset vg, <offset>` and `.cfi_restore vg` around streaming
mode changes
Additionally, for locally streaming functions, we would:
- Also save the streaming VG in the prologue
- Emit `.cfi_offset vg, <incoming VG offset>` in the prologue
- Emit `.cfi_offset vg, <streaming VG offset>` and `.cfi_restore vg`
around streaming mode changes
In both cases, this ends up doing more than necessary and would be hard
for an unwinder to parse, as using `.cfi_offset` in this way does not
follow the semantics of the underlying DWARF CFI opcodes.
So the new scheme in this patch is, for functions with streaming mode
changes (including locally streaming functions), to:
- Save the incoming VG in the prologue
- Emit `.cfi_offset vg, <offset>` in the prologue (not at streaming mode
changes)
- Emit `.cfi_restore vg` after the saved VG has been deallocated
  - This will be in the function epilogue, where VG is always the same
    as the entry VG
- Explicitly reference the incoming VG in expressions for SVE
callee-saves
- Ensure the CFA is not described in terms of VG
A more in-depth discussion of this scheme is available in:
https://gist.github.com/MacDue/b7a5c45d131d2440858165bfc903e97b
But the TL;DR is that, following this scheme, SME unwinding can be
implemented with minimal changes to existing unwinders. All an unwinder
needs to do is initialize VG to `CNTD` at the start of unwinding;
everything else is handled by standard opcodes (which don't need changes
to handle VG).
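For illustration, a minimal sketch of that unwinder-side seeding,
assuming a hypothetical `UnwindContext` register-state type (VG is DWARF
register 46 on AArch64; the inline `cntd` needs SVE assembly support):
```
#include <cstdint>
#include <unordered_map>

constexpr int AARCH64_DWARF_VG = 46; // DWARF register number for VG.

// Illustrative stand-in for an unwinder's register state.
struct UnwindContext {
  std::unordered_map<int, uint64_t> Regs;
  void setRegister(int Reg, uint64_t Val) { Regs[Reg] = Val; }
};

static uint64_t readCNTD() {
#if defined(__aarch64__)
  uint64_t VG;
  asm("cntd %0" : "=r"(VG)); // Number of 64-bit elements in an SVE vector.
  return VG;
#else
  return 2; // Placeholder when not targeting AArch64.
#endif
}

void beginUnwind(UnwindContext &Ctx) {
  // Seed VG once at the start of unwinding; from here on, standard CFI
  // opcodes (including the .cfi_offset/.cfi_restore pairs above) suffice.
  Ctx.setRegister(AARCH64_DWARF_VG, readCNTD());
}
```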
Fixes #150163
MLIR bytecode does not preserve alias definitions, so each attribute
encountered during deserialization is treated as a new one. This can
generate duplicate `DISubprogram` nodes.
The patch adds a `StringMap` cache that records attributes and fetches
them when they are encountered again, as sketched below.
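A minimal sketch of the caching idea, assuming attributes can be keyed
by a stable string form; the type and function names here are
illustrative, not the actual patch (which uses `llvm::StringMap` and
MLIR's attribute types):
```
#include <functional>
#include <string>
#include <unordered_map>

// Deduplicate values by a string key: materialize on first sight, reuse
// afterwards, so repeated occurrences map to a single node.
template <typename AttrT> class AttrDedupCache {
  std::unordered_map<std::string, AttrT> Cache;

public:
  AttrT getOrCreate(const std::string &Key,
                    const std::function<AttrT()> &Create) {
    auto It = Cache.find(Key);
    if (It != Cache.end())
      return It->second; // Seen before: reuse, avoiding a duplicate.
    AttrT A = Create();
    Cache.emplace(Key, A);
    return A;
  }
};
```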
This patch implements pages 15-17 from
jhauser.us/RISCV/ext-P/RVP-instrEncodings-015.pdf
Documentation:
jhauser.us/RISCV/ext-P/RVP-baseInstrs-014.pdf
jhauser.us/RISCV/ext-P/RVP-instrEncodings-015.pdf
Summary:
It's extremely common to conditionally blend two vectors. Previously
this was done with mask registers, which is what the normal ternary code
generation does when used on a vector. However, since Clang 15 we have
supported boolean vector types in the compiler. These are useful in
general for checking the mask registers, but are currently limited
because they do not map to an LLVM-IR select instruction.
This patch simply relaxes these checks. Such operations are technically
forbidden by the OpenCL standard, but general vector support should be
able to handle them. We already support this for Arm SVE types, so this
makes the behaviour more consistent with the Clang vector types.
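For illustration, the kind of code this enables (a sketch using Clang's
`ext_vector_type` extension; exact type spellings in user code may
vary):
```
typedef float float4 __attribute__((ext_vector_type(4)));
typedef bool bool4 __attribute__((ext_vector_type(4)));

// With this patch, the ternary on a boolean vector lowers to an LLVM-IR
// select, blending per lane: result[i] = mask[i] ? a[i] : b[i].
float4 blend(bool4 mask, float4 a, float4 b) {
  return mask ? a : b;
}
```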
Fix the `sys.path` logic in the GDB plugin to insert the intended
self-path in the first position rather than appending it to the end. The
latter meant that if `sys.path` (naturally) contained GDB's
`gdb-plugin` directory, `import ompd` would return the top-level
`ompd/__init__.py` module rather than the `ompd/ompd.py` submodule, as
intended by adding the `ompd/` directory to `sys.path`.
This is intended to be a minimal change necessary to fix the issue.
Alternatively, the code could be modified to import `ompd.ompd` and stop
modifying `sys.path` entirely. However, I do not know why this approach
was chosen in the first place, so I can't tell whether changing it would
break something.
Fixes #153954
Signed-off-by: Michał Górny <mgorny@gentoo.org>
Both of these functions update an `AllocInfoMap` structure in the
context; however, they did not take any locks, causing random failures
in threaded code. Now they use a mutex, as sketched below.
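A minimal sketch of the fix, with `AllocInfo` and the surrounding
context type as illustrative stand-ins:
```
#include <cstddef>
#include <map>
#include <mutex>

struct AllocInfo {
  size_t Size;
};

class Context {
  std::map<void *, AllocInfo> AllocInfoMap;
  std::mutex AllocInfoMapMutex; // Serializes all map updates.

public:
  void recordAlloc(void *Ptr, size_t Size) {
    std::lock_guard<std::mutex> Lock(AllocInfoMapMutex);
    AllocInfoMap[Ptr] = {Size}; // Safe under concurrent callers.
  }

  void recordFree(void *Ptr) {
    std::lock_guard<std::mutex> Lock(AllocInfoMapMutex);
    AllocInfoMap.erase(Ptr);
  }
};
```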
These builtins are modeled on the clzg/ctzg builtins, which accept an
optional second argument. This second argument is returned if the first
argument is 0. These builtins unconditionally exhibit zero-is-undef
behaviour, regardless of target preference for the other ctz/clz
builtins. The builtins have constexpr support.
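For reference, the two-argument form of the existing builtins these are
modeled on works like this (a usage sketch of `__builtin_clzg`, not of
the new builtins themselves):
```
int leading_zeros(unsigned x) {
  // One-argument form: undefined for x == 0.
  // Two-argument form: returns the fallback (here 32) for x == 0.
  return __builtin_clzg(x, 32);
}
```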
Fixes #154113
Unlike CVTTP2SI, CVTTP2UI is only available on AVX512 targets, so we
can't fall back to the AVX1 variant when splitting a 512-bit vector, and
we can only use the 128/256-bit variants if we have AVX512VL.
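For reference, a sketch of the intrinsic-level picture (the unsigned
conversions require AVX512, with AVX512VL for the narrower widths):
```
#include <immintrin.h>

// Signed float->int truncation has existed since SSE2/AVX...
__m256i s256(__m256 v) { return _mm256_cvttps_epi32(v); } // AVX
// ...but the unsigned variants are AVX512-only.
__m512i u512(__m512 v) { return _mm512_cvttps_epu32(v); } // AVX512F
__m256i u256(__m256 v) { return _mm256_cvttps_epu32(v); } // AVX512VL
__m128i u128(__m128 v) { return _mm_cvttps_epu32(v); }    // AVX512VL
```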
Fixes #154492
After a485e0e, we may not set the vector trip count in
preparePlanForEpilogueVectorLoop if it is zero. We should not choose a
VF * UF that makes the main vector loop dead (i.e. vector trip count is
zero), but there are some cases where this can happen currently.
In those cases, set EPI.VectorTripCount to zero.
This patch adds the constant attribute to cir.global, adds the
appropriate lowering to an LLVM constant, and updates the tests.
---------
Co-authored-by: Andy Kaylor <akaylor@nvidia.com>
When moving fcti results from float registers to normal registers
through memory, even though MPI was adjusted to account for endianness,
FIPtr was always adjusted for big-endian, which caused loads of the
wrong half of a value in little-endian mode.
`std::wstring AnsiToUtf16(const std::string &ansi)` is a
reimplementation of `llvm::sys::windows::UTF8ToUTF16`. This patch
removes `AnsiToUtf16` and its usages entirely.
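For reference, a usage sketch of the existing helper that `AnsiToUtf16`
duplicated (Windows-only; error handling simplified):
```
#include "llvm/ADT/SmallVector.h"
#include "llvm/Support/Windows/WindowsSupport.h"
#include <string>

std::wstring toWide(const std::string &S) {
  llvm::SmallVector<wchar_t, 128> Wide;
  // UTF8ToUTF16 returns a std::error_code; truthy means failure.
  if (llvm::sys::windows::UTF8ToUTF16(S, Wide))
    return std::wstring();
  return std::wstring(Wide.begin(), Wide.end());
}
```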
Both these attributes were introduced in ab1dc2d54db5 ("Thread Safety
Analysis: add support for before/after annotations on mutexes") back in
2015 as "beta" features.
Anecdotally, we've been using `-Wthread-safety-beta` for years without
problems.
Furthermore, this feature requires the user to explicitly use these
attributes in the first place.
After 10 years, let's graduate the feature to the stable feature set,
and reserve `-Wthread-safety-beta` for new upcoming features.
The AArch64 build attribute specification now allows switching to an
already-defined subsection using its name alone, without repeating the
optionality and type parameters.
This patch updates the parser to support that behavior.
Spec reference: https://github.com/ARM-software/abi-aa/pull/230/files
If FunctionAttrs infers additional attributes on a function, it also
invalidates analyses on callers of that function. The way it does this
right now limits it to calls with a matching signature. However, the
function attributes will also be used when the signatures do not match.
Use getCalledOperand() to avoid the signature check.
This is not a correctness fix, just improves analysis quality. I noticed
this due to
https://github.com/llvm/llvm-project/pull/144497#issuecomment-3199330709,
where LICM ends up with a stale MemoryDef that could be a MemoryUse
(which is a bug in LICM, but still non-optimal).
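A sketch of the distinction the `getCalledOperand()` approach relies on
(illustrative, not the exact patch):
```
#include "llvm/IR/Function.h"
#include "llvm/IR/InstrTypes.h"

using namespace llvm;

// getCalledFunction() returns null when the call-site signature doesn't
// match the callee, so invalidation used to be skipped for such calls.
// Looking through the called operand instead also catches those cases.
Function *getCalleeIgnoringSignature(const CallBase &CB) {
  return dyn_cast<Function>(CB.getCalledOperand()->stripPointerCasts());
}
```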
Whether an argument is fixed is now available via ArgFlags, so make use
of it. The previous implementation was quite problematic because it
stored the number of fixed arguments in a global variable, which is not
thread-safe.
The immediate evaluation context needs the lambda scope info (LSI) to
propagate some flags; however, that LSI was removed in
ActOnFinishFunctionBody, which happened before rebuilding a lambda
expression.
The last attempt destroyed the LSI at the end of the block scope, but we
still need it after that, in DiagnoseShadowingLambdaDecls.
This also converts the wrapper function to default arguments as a
drive-by fix, and does some cleanup.
Fixes https://github.com/llvm/llvm-project/issues/145776
Add trailing newlines to the following files to comply with POSIX
standards:
- llvm/lib/Target/RISCV/RISCVInstrInfoXSpacemiT.td
- llvm/test/MC/RISCV/xsmtvdot-invalid.s
- llvm/test/MC/RISCV/xsmtvdot-valid.s
Closes #151706
This is a cherry-pick of #154053 with a fix for bad handling of
endianness when loading float and double literals from the binary.
---------
Co-authored-by: Ferdinand Lemaire <ferdinand.lemaire@woven-planet.global>
Co-authored-by: Jessica Paquette <jessica.paquette@woven-planet.global>
Co-authored-by: Luc Forget <luc.forget@woven.toyota>
In streaming mode, the @llvm.aarch64.sme.cnts and @llvm.aarch64.sve.cnt
intrinsics are equivalent. For SVE, cnt* is lowered in
instCombineIntrinsic to @llvm.vscale(). This patch lowers the SME
intrinsics similarly when in streaming mode.
This patch handles structs of fixed vectors and structs of arrays of
fixed vectors correctly for the VLS calling convention in
EmitFunctionProlog, EmitFunctionEpilog, and EmitCall.
Stacked on: https://github.com/llvm/llvm-project/pull/147173
I was looking over this pass and noticed it was using shared pointers
for CompositeNodes. However, all nodes are owned by the deinterleaving
graph and are not released until the graph is destroyed. This means a
bump allocator and raw pointers can be used, which have a simpler
ownership model and less overhead than shared pointers.
The changes in this PR are to:
- Add a `SpecificBumpPtrAllocator<CompositeNode>` to the
  `ComplexDeinterleavingGraph` (see the sketch below)
  - This allocates new nodes and deallocates them when the graph is
    destroyed
- Replace `NodePtr` and `RawNodePtr` with `CompositeNode *`
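A minimal sketch of the resulting ownership model (node fields elided):
```
#include "llvm/Support/Allocator.h"

struct CompositeNode {
  // ... operands, deinterleaving role, etc. ...
};

class ComplexDeinterleavingGraph {
  // Owns every node; destroying the graph frees (and destructs) them all.
  llvm::SpecificBumpPtrAllocator<CompositeNode> Allocator;

public:
  CompositeNode *createNode() {
    // Placement-new into bump-allocated storage; no per-node refcounting.
    return new (Allocator.Allocate()) CompositeNode();
  }
};
```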
This updates the fp<->int tests to include some store(fptoi) and
itofp(load) test cases. It also cuts down on the number of large vector
cases that are not testing anything new.