Hexagon instructions are VLIW "bundles" of up to four instruction words
encoded as a single MCInst with operands for each sub-instruction.
Previously, the disassembler's getInstruction() returned the full
bundle, which made it difficult to work with llvm-objdump.
For example, since all instructions are bundles, and bundles do not
branch, branch targets could not be printed.
This patch modifies the Hexagon disassembler to return individual
sub-instructions instead of entire bundles, enabling correct printing of
branch targets and relocations. It also introduces
`MCDisassembler::getInstructionBundle` for cases where the full bundle
is still needed.
By default, llvm-objdump separates instructions with newlines. However,
this does not work well for Hexagon syntax:
{ inst1
inst2
inst3
inst4 <branch> } :endloop0
Instructions may be followed by a closing brace, a closing brace with
`:endloop`, or a newline. Branches must appear within the braces.
To address this, `PrettyPrinter::getInstructionSeparator()` is added and
overridden for Hexagon.
(cherry picked from commit ac7ceb3dabfac548caa993e7b77bbadc78af4464)
cc https://github.com/llvm/llvm-project/pull/138299
rustc sets `allockind("uninitialized")` - if we copy the attributes
as-is when creating a dummy function, Verify complains about
`allockind("uninitialized,zeroed")` conflicting, so we need to clear the
flag.
Co-authored-by: Jamie Hill-Daniel <jamie@osec.io>
(cherry picked from commit 74c396afb26dec74c0b799e218c63f1a26e90d21)
When not in the prologue we do not want to set the FrameSetup flag, by
passing the flag as argument we can use allocateStack correctly on those
cases.
This fixes the allocation and probe in eliminateCallFramePseudoInstr.
(cherry picked from commit 1db9eb23209826d9e799e68a9a4090f0328bf70c)
Reverts llvm/llvm-project#148613
Considering object argument conversion qualifications perfect leads to
situations where we prefer a non-template const qualified function over
a non-qualified template function, which is very wrong indeed.
I explored solutions to work around that, but instead, we might want to
go the GCC road and prefer the friend overload in the #147374 example,
as this seems a lot more consistent and reliable
(cherry picked from commit 28e1e7e1b4b059a2e42f68061475cddb4ad0a6a3)
This is a follow-up to #149079. Seems like we forgot about the fact that
the symbols also need to be in `libclang.map`.
(cherry picked from commit 7e0fde0c2f6b0b9d727ce9196956b36e91961ac4)
Fixes: https://github.com/llvm/llvm-project/issues/148953
Currently when coroutine return object type is const qualified, we don't
do direct emission. The regular emission logic assumed that the auto var
emission will always result in an `AllocaInst`. However, based on my
findings, NRVO var emissions don't result in `AllocaInst`s. Therefore,
this
[assertion](1a940bfff9/clang/lib/CodeGen/CGCoroutine.cpp (L712))
will fail.
Since the NRVOed returned object don't live on the coroutine frame, we
won't have the problem of it outliving the coroutine frame, therefore,
we can safely omit this metadata.
(cherry picked from commit c36156de45a0f5e24e7a4ee2259c3302ea814785)
A constexpr-unknown reference can be equal to an arbitrary value, except
values allocated during constant evaluation. Fix the handling.
The standard is unclear exactly which pointer comparisons count as
"unknown" in this context; for example, in some cases we could use
alignment to prove two constexpr-unknown references are not equal. I
decided to ignore all the cases involving variables not allocated during
constant evaluation.
While looking at this, I also spotted that there might be issues with
lifetimes, but I didn't try to address it.
(cherry picked from commit 20c8e3c2a4744524396cc473a370cfb7855850b6)
This was dropped in #147948 and causes symbol conflicts if libblake3 is
also linked.
(cherry picked from commit 1754a7d5733d5305e4ec25ef0945b39d6882bb28)
There's probably an existing test this should be added to,
but our test coverage is really bad that this wasn't caught
by one.
(cherry picked from commit 0110168f6aa5c8a8d02ffd9e62c7929ce6d24d26)
This patch switches the windows testing over to server 2022 by switching
to the recently introduced runner set.
(cherry picked from commit 3248a6d76abccbbe78e853c76bc022b70d594347)
For more context, see
https://github.com/llvm/llvm-project/pull/119269#issuecomment-3075444493,
but briefly, when removing ARCMigrate, I also removed some symbols in
libclang, which constitutes an ABI break that we don’t want, so this pr
reintroduces the removed symbols; the declarations are marked as
deprecated for future removal, and the implementations print an error
and do nothing, which is what we used to do when ARCMigrate was
disabled.
(cherry picked from commit 1600450f9098e5c9cb26840bd53f1be8a2559b7d)
Added by #147948, blake3_xof_many and blake3_compress_subtree_wide cause
conflicts when linking llvm and blake3 statically into the same binary.
Similar to #148607.
(cherry picked from commit 60579ec3059b2b6cc9dad90eaac1ed363fc395a7)
Happened to spot this while looking at libclang.map for other reasons.
clang_visitCXXMethods was added in LLVM 21, not LLVM 20.
(cherry picked from commit 116110e1a93531a64d82f049b6e36403bc14f278)
The transformation done in #147349 was incorrect since we were not
passing the input node of the `OR` instruction to the `QC.INSBI`
instruction leading to the generated instruction doing the wrong thing.
In order to do this we first needed to add the output register to
`QC.INSBI` as being both an input and output.
The code produced after the above fix will need a copy (mv) to preserve
the register input to the OR instruction if it has more than one use
making the transformation net neutral ( `6-byte QC.E.ORI/ORAI` vs
`2-byte C.MV + 4-byte QC.INSB`I). Avoid doing the transformation if
there is more than one use of the input register to the OR instruction.
(cherry picked from commit d67d91a9906366585162cebf292f923a3f28c8a6)
To fold a FrameIndex, we need to teach eliminateFrameIndex to respect
the uimm9 range.
(cherry picked from commit 63d099af146a19bc8fd5a791d6184125e6cc42e7)
This sets the cache line size to 64 for the Neoverse V2 and V3. I've
tested this with loop-interchange: it doesn't result in extra
compile-times, but it does enable a lot more interchange.
Commit a6293228fdd5aba8c04c63f02f3d017443feb3f2 forced the register
class of ZPR[24]StridedOrContiguous for spills/fills of ZPR2 and ZPR4,
but this may result in issues when the regclass for the fill is a
ZPR2/ZPR4 which would allow the register allocator to pick `z1_z2`,
which is not a supported register for ZPR2StridedOrContiguous that only
supports tuples of the form (strided) `z0_z8`, `z1_z9` or (contiguous,
start at multiple of 2) `z0_z1`, `z2_z3`. For spills we could add a new
register class that supports any of the tuple forms, but I've decided
to use two pseudos similar to the fills for consistency.
Fixes https://github.com/llvm/llvm-project/issues/148655
This analysis currently just crashes when applied to a graph region that
has a use-def cycle. This PR fixes that by keeping track of the
operations the DFS has already visited when following use-def edges and
stopping once we visit an operation again.
This patch introduces two new ops to the SPIR-V dialect:
- `spirv.EXT.ConstantCompositeReplicate`
- `spirv.EXT.SpecConstantCompositeReplicate`
These ops represent composite constants and specialization constants,
respectively, constructed by replicating a single splat constant across
all elements. They correspond to `SPV_EXT_replicated_composites`
extension instructions:
- `OpConstantCompositeReplicatedEXT`
- `OpSpecConstantCompositeReplicatedEXT`
No transformation to these new ops has been introduced in this patch.
This approach is chosen as per the discussions on RFC
https://discourse.llvm.org/t/rfc-basic-support-for-spv-ext-replicated-composites-in-mlir-spir-v-compile-time-constant-lowering-only/86987
---------
Signed-off-by: Mohammadreza Ameri Mahabadian <mohammadreza.amerimahabadian@arm.com>
Before this patch, when a reduction exists in the loop, the legality
check of LoopInterchange only verified if there exists a
non-reassociative floating-point instruction in the reduction
calculation. However, it is insufficient, because reordering integer
reductions can also lead to incorrect transformations. Consider the
following example:
```c
int A[2][2] = {
{ INT_MAX, INT_MAX },
{ INT_MIN, INT_MIN },
};
int sum = 0;
for (int i = 0; i < 2; i++)
for (int j = 0; j < 2; j++)
sum += A[j][i];
```
To make this exchange legal, we must drop nuw/nsw flags from the
instructions involved in the reduction operations.
This patch extends the legality check to correctly handle such cases. In
particular, for integer addition and multiplication, it verifies that
the nsw and nuw flags are set on involved instructions, and drop them
when the transformation actually performed. This patch also introduces
explicit checks for the kind of reduction and permits only those that
are known to be safe for interchange. Consequently, some "unknown"
reductions (at the moment, `FindFirst*` and `FindLast*`) are rejected.
Fix#148228
When constructing the protocol list in the class metadata generation
(`GenerateClass`), only the protocols from the base class are added but
not protocols declared in class extensions.
This is fixed by using `all_referenced_protocol_{begin, end}` instead of
`protocol_{begin, end}`, matching the behaviour on Apple platforms.
A unit test is included to check if all protocol metadata was emitted
and that no duplication occurs in the protocol list.
Fixes https://github.com/gnustep/libobjc2/issues/339
CC: @davidchisnall
These objects are used as local stack variables during parsing, and they
are not small. This patch reduces their sizes:
* `ParsedAttributesView`: 72 → 40 bytes
* `AttributePool`: 72 → 40 bytes
No negative performance impact has been
[observed](https://llvm-compile-time-tracker.com/compare.php?from=a709621cd545b061782b03136286227867b452a6&to=f50500b3c178e97c0c861301e853e6d5b859040b&stat=instructions:u).
**Context:**
We have some verilator-generated code with extremely deep nesting of
parenthesized expressions, e.g.:
```cpp
bool s =
(...(bool)(i[0])
|(bool)(i[1]))
|(bool)(i[2]))
| ...
|(bool)(i[n]));
```
Before this patch, on my local machine, Clang begins emitting
`-Wstack-exhausted` when `n` is 715. After the patch, that threshold
increases to `950`.
As discussed in PR #142353, the current testsuite of the `clang` Python
bindings has several issues:
- It `libclang.so` cannot be loaded into `python` to run the testsuite,
the whole `ninja check-all` aborts.
- The result of running the testsuite isn't report like the `lit`-based
tests, rendering them almost invisible.
- The testsuite is disabled in a non-obvious way (`RUN_PYTHON_TESTS`) in
`tests/CMakeLists.txt`, which again doesn't show up in the test results.
All these issues can be avoided by integrating the Python bindings tests
with `lit`, which is what this patch does:
- The actual test lives in `clang/test/bindings/python/bindings.sh` and
is run by `lit`.
- The current `clang/bindings/python/tests` directory (minus the
now-subperfluous `CMakeLists.txt`) is moved into the same directory.
- The check if `libclang` is loadable (originally from PR #142353) is
now handled via a new `lit` feature, `libclang-loadable`.
- The various ways to disable the tests have been turned into `XFAIL`s
as appropriate. This isn't complete and not completely tested yet.
Tested on `sparc-sun-solaris2.11`, `sparcv9-sun-solaris2.11`,
`i386-pc-solaris2.11`, `amd64-pc-solaris2.11`, `i686-pc-linux-gnu`, and
`x86_64-pc-linux-gnu`.
Co-authored-by: Rainer Orth <ro@gcc.gnu.org>