This sets the cache line size to 64 for the Neoverse V2 and V3. I've
tested this with loop-interchange: it doesn't result in extra
compile-times, but it does enable a lot more interchange.
Commit a6293228fdd5aba8c04c63f02f3d017443feb3f2 forced the register
class ZPR[24]StridedOrContiguous for spills/fills of ZPR2 and ZPR4.
This can cause problems when the regclass for the fill is a plain
ZPR2/ZPR4, because the register allocator may then pick `z1_z2`, which
ZPR2StridedOrContiguous does not support: that class only contains
tuples of the form (strided) `z0_z8`, `z1_z9` or (contiguous, starting
at a multiple of 2) `z0_z1`, `z2_z3`. For spills we could add a new
register class that supports any of the tuple forms, but I've decided
to use two pseudos, similar to the fills, for consistency.
Fixes https://github.com/llvm/llvm-project/issues/148655
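
As a rough illustration of the mismatch described above (not code from the patch), the tuple forms can be modelled as predicates over register numbers. The function names are hypothetical, and the exact ranges (strided pairs starting at z0..z7 or z16..z23, no wraparound for plain pairs) are assumptions extrapolated from the examples in the text.

```cpp
// Sketch only: models two-register Z tuples by their register numbers.
// All names and numeric ranges here are assumptions for illustration.

// Contiguous form: starts at a multiple of 2, e.g. z0_z1, z2_z3.
bool isContiguousPair(unsigned First, unsigned Second) {
  return First % 2 == 0 && Second == First + 1 && Second < 32;
}

// Strided form: second register is 8 above the first, e.g. z0_z8, z1_z9.
// Assumed start range: z0..z7 and z16..z23.
bool isStridedPair(unsigned First, unsigned Second) {
  return First % 16 < 8 && Second == First + 8 && Second < 32;
}

// What a "strided or contiguous" class would accept under these assumptions.
bool fitsStridedOrContiguous(unsigned First, unsigned Second) {
  return isContiguousPair(First, Second) || isStridedPair(First, Second);
}

// A plain consecutive pair (wraparound deliberately ignored in this sketch).
bool fitsPlainPair(unsigned First, unsigned Second) {
  return Second == First + 1 && Second < 32;
}
```

Under this model `z1_z2` is a valid plain pair but fits neither the strided nor the contiguous form, which is the case the register allocator could run into.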
This analysis currently just crashes when applied to a graph region that
has a use-def cycle. This PR fixes that by keeping track of the
operations the DFS has already visited when following use-def edges and
stopping once we visit an operation again.
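
A minimal sketch of the visited-set idea, assuming a simplified `Op` type rather than the real MLIR classes (the struct and function names are hypothetical):

```cpp
#include <cstddef>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for an operation; `operands` points at the ops that
// define this op's operands (the use-def edges).
struct Op {
  std::vector<Op *> operands;
};

// Counts the ops reachable from `root` along use-def edges, visiting each
// op at most once so that cycles terminate.
std::size_t countReachable(Op *root) {
  std::unordered_set<Op *> visited;
  std::vector<Op *> worklist{root};
  while (!worklist.empty()) {
    Op *op = worklist.back();
    worklist.pop_back();
    if (!visited.insert(op).second)
      continue; // already seen: stop here instead of looping forever
    for (Op *def : op->operands)
      worklist.push_back(def);
  }
  return visited.size();
}
```

Without the `visited` check, two ops that use each other's results would keep the worklist non-empty indefinitely.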
This patch introduces two new ops to the SPIR-V dialect:
- `spirv.EXT.ConstantCompositeReplicate`
- `spirv.EXT.SpecConstantCompositeReplicate`
These ops represent composite constants and specialization constants,
respectively, constructed by replicating a single splat constant across
all elements. They correspond to `SPV_EXT_replicated_composites`
extension instructions:
- `OpConstantCompositeReplicatedEXT`
- `OpSpecConstantCompositeReplicatedEXT`
No transformation to these new ops has been introduced in this patch.
This approach is chosen as per the discussions on RFC
https://discourse.llvm.org/t/rfc-basic-support-for-spv-ext-replicated-composites-in-mlir-spir-v-compile-time-constant-lowering-only/86987
---------
Signed-off-by: Mohammadreza Ameri Mahabadian <mohammadreza.amerimahabadian@arm.com>
Before this patch, when a reduction exists in the loop, the legality
check of LoopInterchange only verified if there exists a
non-reassociative floating-point instruction in the reduction
calculation. However, this is insufficient, because reordering integer
reductions can also lead to incorrect transformations. Consider the
following example:
```c
int A[2][2] = {
  { INT_MAX, INT_MAX },
  { INT_MIN, INT_MIN },
};
int sum = 0;
for (int i = 0; i < 2; i++)
  for (int j = 0; j < 2; j++)
    sum += A[j][i];
```
To make this exchange legal, we must drop nuw/nsw flags from the
instructions involved in the reduction operations.
This patch extends the legality check to correctly handle such cases. In
particular, for integer addition and multiplication, it verifies that
the nsw and nuw flags are set on the involved instructions, and drops
them when the transformation is actually performed. This patch also
introduces explicit checks for the kind of reduction and permits only
those that are known to be safe for interchange. Consequently, some
"unknown" reductions (at the moment, `FindFirst*` and `FindLast*`) are
rejected.
Fix #148228
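
To see why the flags matter for the example above, the sketch below (illustrative only, with a hypothetical helper name) replays the reduction in both loop orders using a wider accumulator and reports whether any partial sum leaves the range of `int`:

```cpp
#include <climits>

// Returns true if some partial sum of the reduction falls outside the range
// of int. `interchanged == false` is the original order (i outer, j inner);
// `interchanged == true` swaps the two loops.
bool partialSumOverflows(bool interchanged) {
  const int A[2][2] = {{INT_MAX, INT_MAX}, {INT_MIN, INT_MIN}};
  long long sum = 0; // wide accumulator so the check itself cannot overflow
  bool overflow = false;
  for (int outer = 0; outer < 2; ++outer) {
    for (int inner = 0; inner < 2; ++inner) {
      int i = interchanged ? inner : outer;
      int j = interchanged ? outer : inner;
      sum += A[j][i];
      if (sum > INT_MAX || sum < INT_MIN)
        overflow = true;
    }
  }
  return overflow;
}
```

In the original order the values alternate between `INT_MAX` and `INT_MIN`, so every partial sum stays in range. After interchange the first two additions are `INT_MAX + INT_MAX`, which is why an `nsw` flag carried over from the original loop would no longer hold.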
When constructing the protocol list in the class metadata generation
(`GenerateClass`), only the protocols from the base class are added but
not protocols declared in class extensions.
This is fixed by using `all_referenced_protocol_{begin, end}` instead of
`protocol_{begin, end}`, matching the behaviour on Apple platforms.
A unit test is included to check if all protocol metadata was emitted
and that no duplication occurs in the protocol list.
Fixes https://github.com/gnustep/libobjc2/issues/339
CC: @davidchisnall
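
The intended result can be pictured with a small model, assuming a hypothetical `ClassDecl` struct in place of Clang's real AST types: the emitted list should cover protocols from the class declaration and from its extensions, each exactly once.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical model of a class declaration; not Clang's ObjCInterfaceDecl.
struct ClassDecl {
  std::vector<std::string> protocols;                       // on the @interface
  std::vector<std::vector<std::string>> extensionProtocols; // per class extension
};

// Collects every referenced protocol once, keeping first-seen order.
std::vector<std::string> allReferencedProtocols(const ClassDecl &D) {
  std::vector<std::string> result;
  auto addUnique = [&result](const std::string &P) {
    if (std::find(result.begin(), result.end(), P) == result.end())
      result.push_back(P);
  };
  for (const std::string &P : D.protocols)
    addUnique(P);
  for (const std::vector<std::string> &Ext : D.extensionProtocols)
    for (const std::string &P : Ext)
      addUnique(P);
  return result;
}
```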
These objects are used as local stack variables during parsing, and they
are not small. This patch reduces their sizes:
* `ParsedAttributesView`: 72 → 40 bytes
* `AttributePool`: 72 → 40 bytes
No negative performance impact has been
[observed](https://llvm-compile-time-tracker.com/compare.php?from=a709621cd545b061782b03136286227867b452a6&to=f50500b3c178e97c0c861301e853e6d5b859040b&stat=instructions:u).
**Context:**
We have some verilator-generated code with extremely deep nesting of
parenthesized expressions, e.g.:
```cpp
bool s =
(...(bool)(i[0])
|(bool)(i[1]))
|(bool)(i[2]))
| ...
|(bool)(i[n]));
```
Before this patch, on my local machine, Clang begins emitting
`-Wstack-exhausted` when `n` is 715. After the patch, that threshold
increases to `950`.
As discussed in PR #142353, the current testsuite of the `clang` Python
bindings has several issues:
- If `libclang.so` cannot be loaded into `python` to run the testsuite,
the whole `ninja check-all` aborts.
- The result of running the testsuite isn't reported like the `lit`-based
tests, rendering them almost invisible.
- The testsuite is disabled in a non-obvious way (`RUN_PYTHON_TESTS`) in
`tests/CMakeLists.txt`, which again doesn't show up in the test results.
All these issues can be avoided by integrating the Python bindings tests
with `lit`, which is what this patch does:
- The actual test lives in `clang/test/bindings/python/bindings.sh` and
is run by `lit`.
- The current `clang/bindings/python/tests` directory (minus the
now-superfluous `CMakeLists.txt`) is moved into the same directory.
- The check if `libclang` is loadable (originally from PR #142353) is
now handled via a new `lit` feature, `libclang-loadable`.
- The various ways to disable the tests have been turned into `XFAIL`s
as appropriate. This isn't complete and not completely tested yet.
Tested on `sparc-sun-solaris2.11`, `sparcv9-sun-solaris2.11`,
`i386-pc-solaris2.11`, `amd64-pc-solaris2.11`, `i686-pc-linux-gnu`, and
`x86_64-pc-linux-gnu`.
Co-authored-by: Rainer Orth <ro@gcc.gnu.org>
It assumed that the VF remains constant throughout the tree. That's not
always true. This meant that we could query the extraction cost for a
lane that is out of bounds.
While experimenting with re-vectorisation for AArch64, we ran into this
issue. We cannot add a proper AArch64 test as more changes would need to
be brought in.
This commit is only fixing the computation of VF and adding an assert.
Some tests were failing after adding the assert:
- foo() in llvm/test/Transforms/SLPVectorizer/X86/horizontal.ll
- test() in
llvm/test/Transforms/SLPVectorizer/X86/reduction-with-removed-extracts.ll
- test_with_extract() in
llvm/test/Transforms/SLPVectorizer/RISCV/segmented-loads.ll
As a legacy of Dexter's role as a test runner, it selects a name for
result files based on the relative path from the test root to each
individual test. Since Dexter no longer takes a test directory as an
argument, only the basename for each test is ever used. This patch adds
an optional --test-root-dir argument, allowing for relative paths to be
used for result files again.
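
The naming rule can be sketched as follows. This is a hypothetical helper written for illustration, not Dexter's actual code, and it assumes `/`-separated paths:

```cpp
#include <cstddef>
#include <string>

// Picks a result-file name for a test. With a test root, the name is the
// path relative to that root; otherwise only the basename is used.
std::string resultName(const std::string &testPath,
                       const std::string &testRootDir = "") {
  if (!testRootDir.empty()) {
    std::string prefix = testRootDir;
    if (prefix.back() != '/')
      prefix += '/';
    // Only strip the root if the test really lives underneath it.
    if (testPath.compare(0, prefix.size(), prefix) == 0)
      return testPath.substr(prefix.size());
  }
  std::size_t slash = testPath.find_last_of('/');
  return slash == std::string::npos ? testPath : testPath.substr(slash + 1);
}
```

With a root, two tests that share a basename in different subdirectories get distinct result names.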
Reverts llvm/llvm-project#148266
I'm reverting this temporarily, since the release branch is today and
this is ABI sensitive. Let's wait until after the branch so that we have
plenty of time to discuss the patch.
calculateByteProvider only cares about scalars or a single element
within a vector. For the latter there is the VectorIndex parameter to
identify the element. All other properties, and specifically Index, are
related to the underlying scalar type and thus when taking the size of a
type it's the scalar size that matters.
Fixes https://github.com/llvm/llvm-project/issues/148387
Stop using hardcoded function names and check availability. This only fixes
the forced usage via command line in the pass itself; the implementations
inside of TargetLoweringBase hide additional call emission.
After changes in https://github.com/llvm/llvm-project/pull/145793.
/home/david.spickett/llvm-project/lldb/source/Plugins/Disassembler/LLVMC/DisassemblerLLVMC.cpp:1360:49: error: non-const lvalue reference to type 'uint64_t' (aka 'unsigned long long') cannot bind to a value of unrelated type 'size_t' (aka 'unsigned int')
1360 | status = m_disasm_up->getInstruction(mc_inst, size, data, pc, llvm::nulls());
| ^~~~
/home/david.spickett/llvm-project/llvm/include/llvm/MC/MCDisassembler/MCDisassembler.h:135:64: note: passing argument to parameter 'Size' here
135 | virtual DecodeStatus getInstruction(MCInst &Instr, uint64_t &Size,
| ^
1 error generated.
The type used in the LLVM method we call is uint64_t so use that instead.
It's overkill for what it is, but that's a separate issue if anyone cares.
Also removed the unused form of GetMCInst.
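
The shape of the fix can be reproduced in isolation. In this sketch `decodeOne` is a hypothetical stand-in for an API that reports a size through a `uint64_t &` parameter, and the caller declares its local with that exact type:

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for an API that writes a size through a reference.
bool decodeOne(std::uint64_t &Size) {
  Size = 4;
  return true;
}

std::size_t decodedSize() {
  // A `std::size_t size;` local would not compile here on targets where
  // size_t is a different type from uint64_t (e.g. 32-bit), because a
  // non-const lvalue reference cannot bind to a value of another type.
  std::uint64_t size = 0;
  decodeOne(size);
  return static_cast<std::size_t>(size);
}
```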
This is a behavior change; previously SPIRV inherited
a default set of calls which seems like a mistake. This
defines a library set with no calls. Add memcpy and memset
as a hack; this avoids PreISelIntrinsicLowering performing
the default expansion. SPIRVPrepareFunctions also calls
the utilities to expand these but the resulting output
is slightly different. The backend specific version
can probably be removed; for some reason it has a larger
output than the default one.
GCC 15 also supports `__builtin_operator_{new,delete}` now, so the
`#else` cases are dead code. This patch inlines the calls to the wrapper
functions and simplifies some surrounding code.
SjLjEHPrepare and WasmEHPrepare directly emit calls to these by
name, and these are not tracked in RuntimeLibcalls. It will be easier
to fix this when RuntimeLibcalls is turned into an analysis, so just
add the entries for now.
The compiler should not introduce calls to arbitrary strings
that aren't defined in RuntimeLibcalls. Previously OpenBSD was
disabling the default __stack_chk_fail, but there was no record
of the alternative __stack_smash_handler function it emits instead.
This also avoids a random triple check in the pass.
Reduction support: https://github.com/llvm/llvm-project/pull/146671
IF support is fixed in this PR.
The problem for the IF clause in composite constructs was that wsloop
and simd both operate on the same CanonicalLoopInfo structure: with the
SIMD processed first, followed by the wsloop. Previously the IF clause
generated code like
```
if (cond) {
  while (...) {
    simd_loop_body;
  }
} else {
  while (...) {
    nonsimd_loop_body;
  }
}
```
The problem with this is that this invalidates the CanonicalLoopInfo
structure to be processed by the wsloop later. To avoid this, in this
patch I preserve the original loop, moving the IF clause inside of the
loop:
```
while (...) {
  if (cond) {
    simd_loop_body;
  } else {
    non_simd_loop_body;
  }
}
```
On the simple examples I tried, LLVM was able to hoist the if condition
outside of the loop at -O3.
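
As a plain C++ analogy of the two shapes (not the generated IR), the sketch below uses hypothetical stand-in bodies and shows that both forms compute the same result while the second keeps a single loop:

```cpp
#include <vector>

// Hypothetical stand-ins for the SIMD and non-SIMD loop bodies.
static int simdBody(int x) { return x * 2; }
static int nonSimdBody(int x) { return x + x; }

// Previous shape: the condition selects between two separate loops.
long long versionedLoops(const std::vector<int> &v, bool cond) {
  long long sum = 0;
  if (cond) {
    for (int x : v)
      sum += simdBody(x);
  } else {
    for (int x : v)
      sum += nonSimdBody(x);
  }
  return sum;
}

// New shape: one loop is preserved and the condition moves inside it.
long long singleLoop(const std::vector<int> &v, bool cond) {
  long long sum = 0;
  for (int x : v) {
    if (cond)
      sum += simdBody(x);
    else
      sum += nonSimdBody(x);
  }
  return sum;
}
```

Because `cond` is loop-invariant, an optimizer is free to turn the second form back into the first, which matches the hoisting observed at -O3.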
The disadvantage of this is that we cannot add the
llvm.loop.vectorize.enable attribute on either the SIMD or non-SIMD
loops because they both share a loop back edge. There's no way of
solving this without keeping the old design of having two different
loops: which cannot be represented using only one CanonicalLoopInfo
structure. I don't think the presence or absence of this attribute makes
much difference. In my testing it is the llvm.loop.parallel_access
metadata which makes the difference to vectorization. LLVM will
vectorize if legal whether or not this attribute is there in the TRUE
branch. In the FALSE branch this means the loop might be vectorized even
when the condition is false: but I think this is still standards
compliant: OpenMP 6.0 says that when the if clause is false that should
be treated like the SIMDLEN clause is one. The SIMDLEN clause is defined
as a "hint". For the same reason, SIMDLEN and SAFELEN clauses are
silently ignored when SIMD IF is used.
I think it is better to implement SIMD IF and ignore SIMDLEN and SAFELEN
and some vectorization encouragement metadata when combined with IF than
to ignore IF because IF could have correctness consequences whereas the
rest are optimization hints. For example, the user might use the IF
clause to disable SIMD programmatically when it is known not safe to
vectorize the loop. In this case it is not at all safe to add the
parallel access or SAFELEN metadata.
The code here probably needs to change to handle types more uniformly,
but this patch prevents it from trying to use a simple type where it does
not exist.
Fixes #148438.
Assertion semantics closely mimic C++26 Contracts evaluation semantics.
This brings our implementation closer in line with C++26 Library
Hardening (one particular benefit is that using the `observe` semantic
makes adopting hardening easier for projects).
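
For orientation, the sketch below models how the evaluation semantics differ, based on my understanding of the four C++26 Contracts semantics (ignore, observe, quick-enforce, enforce). The types and the function are hypothetical and are not libc++'s implementation; termination is recorded in the result instead of actually ending the program.

```cpp
// Hypothetical model of assertion semantics; not libc++'s implementation.
enum class Semantic { Ignore, Observe, QuickEnforce, Enforce };

struct Outcome {
  bool evaluated;  // was the condition checked at all?
  bool reported;   // would a violation be logged?
  bool terminated; // would the program be terminated?
};

template <class Pred>
Outcome checkAssertion(Semantic S, Pred Condition) {
  if (S == Semantic::Ignore)
    return {false, false, false}; // condition is never evaluated
  if (Condition())
    return {true, false, false}; // assertion holds, nothing to do
  switch (S) {
  case Semantic::Observe:
    return {true, true, false}; // log the violation, then keep running
  case Semantic::QuickEnforce:
    return {true, false, true}; // terminate without logging
  default:
    return {true, true, true}; // Enforce: log, then terminate
  }
}
```

The `observe` row is what eases adoption: violations become visible without changing whether the program keeps running.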
Similar to how clang-cl driver does it, make it possible to build arm64x
binaries with a mingw-style invocation.
Signed-off-by: Sasha Finkelstein <fnkl.kernel@gmail.com>
In getScaledReductions for the case where we try to match a partial
reduction of the form:
%phi = phi i32 ...
...
%add = add i32 %phi, %zext
where
%zext = zext i8 %some_val to i32
we should ensure that %zext is actually inside the loop.
Fixes https://github.com/llvm/llvm-project/issues/148260
In c87d198cd964f37343083848f8fdd58bb0b00156, the `__jit_debug_*` symbols
gained explicit dllexport annotations. Unfortunately, mingw's linkers
have a quirk where the presence of any dllexport symbols at all will
switch off the `-export-all-symbols` flag, so without a full conversion
to dllexport annotations (#109483), the mingw LLVM dll build is broken
in LLVM 20+ when building with GCC (when building with clang,
LLVM_ALWAYS_EXPORT expands to the default visibility attribute,
see extended discussion in #148772).
Fix this by adding the flag explicitly as was done for
clang-shlib earlier in https://reviews.llvm.org/D151620.
Refactor the safe pattern analysis of pointer and size expression pairs
so that the check can be re-used in more places. For example, it can be
used to check whether the following cases are safe:
- `std::span<T>{ptr, size} // span construction`
- `snprintf(ptr, size, "%s", ...) // unsafe libc call`
- `printf("%.*s", size, ptr) // unsafe libc call`
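
To make the pointer/size pairing concrete, here is a small sketch of call sites where both arguments are derived from the same object. The helper names are hypothetical, and whether the actual analysis accepts these exact forms is an assumption.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Pointer and size both come from the same local array.
std::string copyBounded(const char *src) {
  char buf[8];
  std::snprintf(buf, sizeof(buf), "%s", src); // truncates to 7 chars + NUL
  return buf;
}

// The "%.*s" precision and the pointer both come from the same string.
std::string prefixOf(const std::string &s, std::size_t n) {
  if (n > s.size())
    n = s.size(); // never read past the end of `s`
  char buf[64];
  std::snprintf(buf, sizeof(buf), "%.*s", static_cast<int>(n), s.data());
  return buf;
}
```

In both functions the size argument can be traced back to the object the pointer refers to, which is the property a shared pointer/size check would need to establish.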