Run for the clang-tidy checks available in the release/19.x branch.
Some notable findings:
- altera-id-dependent-backward-branch stays SLOW at 13%.
- misc-const-correctness became faster, going from 261% to 67%, but is
still above the 8% threshold.
- misc-header-include-cycle is a new SLOW check with a 10% runtime
impact.
- readability-container-size-empty went from 16% to 13%, still SLOW.
This patch introduces lowering of the partial add reduction intrinsic to
a udot or svdot for AArch64. This also involves adding a
`shouldExpandPartialReductionIntrinsic` target hook, from which AArch64
returns false in the cases where the intrinsic can be lowered.
Due to a reviewer request on PR #88385 I have created this patch
to add a getPredicatedExitCount function, which is similar to
getExitCount except that it uses the predicated backedge-taken count
information. With PR #88385 we will start to care about more
loops with multiple exits, and want the ability to query exit
counts for a particular exiting block. Such loops may require
predicates in order to be vectorised.
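A rough usage sketch follows; the exact parameters of the new function are an assumption here, modelled on the existing predicated backedge-taken count API:

```cpp
#include "llvm/ADT/SmallVector.h"
#include "llvm/Analysis/ScalarEvolution.h"
using namespace llvm;

// Hypothetical usage: ask for the exit count of one particular exiting
// block, collecting the SCEV predicates that must hold for that count
// to be valid.
const SCEV *exitCountWithPredicates(ScalarEvolution &SE, const Loop *L,
                                    const BasicBlock *ExitingBB) {
  SmallVector<const SCEVPredicate *, 4> Predicates;
  // Assumed shape of the new helper, mirroring getExitCount():
  return SE.getPredicatedExitCount(L, ExitingBB, &Predicates);
}
```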
New tests added here:
Analysis/ScalarEvolution/predicated-exit-count.ll
because that doesn't work (results in `LINK : error LNK2001: unresolved
external symbol malloc`).
Based on the title of #91862 it was only intended for use in 64-bit
builds.
Previously, we were returning an error if we couldn't read the whole
region. This doesn't matter most of the time, because lldb caches memory
reads, and in that process it aligns them to cache line boundaries. As
(LLDB) cache lines are smaller than pages, the reads are unlikely to
cross page boundaries.
Nonetheless, this can cause a problem for large reads (which bypass the
cache), where we're unable to read anything even if just a single byte
of the memory is unreadable. This patch changes lldb-server to return
the data it did manage to read instead of an error, and also changes the
Linux implementation to reuse any partial
results it got from the process_vm_readv call (to avoid having to
re-read everything again using ptrace, only to find that it stopped at
the same place).
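A minimal sketch of the Linux-side idea, assuming a raw pid/ptrace interface rather than lldb's actual NativeProcessLinux plumbing:

```cpp
#include <algorithm>
#include <cerrno>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/uio.h>

// Sketch only: keep whatever process_vm_readv copied, then fall back to
// word-sized ptrace reads for the rest, stopping at the first unreadable
// word instead of reporting total failure.
static size_t ReadMemoryPartial(pid_t pid, uint64_t addr, void *buf,
                                size_t size) {
  iovec local = {buf, size};
  iovec remote = {reinterpret_cast<void *>(addr), size};
  ssize_t copied = process_vm_readv(pid, &local, 1, &remote, 1, 0);
  size_t bytes_read = copied > 0 ? static_cast<size_t>(copied) : 0;

  while (bytes_read < size) {
    errno = 0;
    long word = ptrace(PTRACE_PEEKDATA, pid,
                       reinterpret_cast<void *>(addr + bytes_read), nullptr);
    if (errno != 0)
      break; // hit an unreadable page; return what we have so far
    size_t chunk = std::min(sizeof(word), size - bytes_read);
    std::memcpy(static_cast<char *>(buf) + bytes_read, &word, chunk);
    bytes_read += chunk;
  }
  return bytes_read; // may be partial; 0 means nothing was readable
}
```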
This matches debugserver behavior. It is also consistent with the gdb
remote protocol documentation, but -- notably -- not with actual
gdbserver behavior (which returns errors instead of partial results). We
filed a
[clarification
bug](https://sourceware.org/bugzilla/show_bug.cgi?id=24751) several
years ago. Though we did not really reach a conclusion there, I think
this is the most logical behavior.
The associated test does not currently pass on Windows, because the
Windows memory read APIs don't support partial reads (I have a WIP patch
to work around that).
The LoopIdiomVectorize pass already creates calls to the intrinsic
experimental_cttz_elts, but PR #88385 will start using it more widely,
so I've created a helper for it.
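For reference, a rough sketch of emitting the intrinsic through IRBuilder; the helper name below is made up for illustration and the actual helper added alongside PR #88385 may differ:

```cpp
#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/Intrinsics.h"
using namespace llvm;

// Emits roughly:
//   %n = call i64 @llvm.experimental.cttz.elts.i64.<mask ty>(<mask>, i1 true)
static Value *emitCTTZElts(IRBuilder<> &B, Value *Mask) {
  return B.CreateIntrinsic(Intrinsic::experimental_cttz_elts,
                           {B.getInt64Ty(), Mask->getType()},
                           {Mask, B.getInt1(/*ZeroIsPoison=*/true)});
}
```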
We weren't taking account of the space we require in the stubs for
things that are dllimported, and as a result we could hit the assertion
failure for running out of stub space. Fix that.
Also add a couple of `override` specifiers that were missing last time
(#102586).
rdar://133473673
When serialising to textual IR, there can be constant Values referred to
by DbgRecords that don't appear anywhere else, and have types hidden
even deeper inside them. Enumerate these when enumerating all types.
Test by Mikael Holmén.
This reverts commit fa93be4, restoring
commit d884b77, with fixes that ensure the CAPI declarations are
exported properly.
This commit implements LLVM_DIRecursiveTypeAttrInterface for the
DISubprogramAttr to ensure cyclic subprograms can be imported properly.
In the process multiple shortcuts around the recently introduced
DIImportedEntityAttr can be removed.
This renames:
- `arm_sme.move_tile_slice_to_vector` to `arm_sme.extract_tile_slice`
- `arm_sme.move_vector_to_tile_slice` to `arm_sme.insert_tile_slice`
The new names are more consistent with the rest of MLIR and should be
easier to understand. The current names (to me personally) are hard to
parse and easy to mix up when skimming through code.
Additionally, the syntax for `insert_tile_slice` has changed from:
```mlir
%4 = arm_sme.insert_tile_slice %0, %1, %2
: vector<[16]xi8> into vector<[16]x[16]xi8>
```
To:
```mlir
%4 = arm_sme.insert_tile_slice %0, %1[%2]
: vector<[16]xi8> into vector<[16]x[16]xi8>
```
This is for consistency with `extract_tile_slice`, but also helps with
readability as it makes it clear which operand is the index.
When we decompose the GEP offset expression, and the arithmetic is not
performed using nuw operations, we cannot retain the nuw flag on the
decomposed GEP.
For example, if we have `gep nuw p, (a-1)`, this is not at all the same
as `gep nuw (gep nuw p, a), -1`.
Fix this by tracking NUW through linear expression decomposition,
similarly to what we already do for the NSW flag.
This fixes the miscompilation reported in
https://github.com/llvm/llvm-project/pull/105496#issuecomment-2315322220.
f80 is only a thing on x86, and even then the size of long double can be
changed with compiler flags. Instead set the size according to the host
system (this is what is already done for integer types).
We weren't taking account of the space we require in the stubs for
things that are dllimported, and as a result we could hit the assertion
failure for running out of stub space. Fix that.
rdar://133473673
---------
Co-authored-by: Saleem Abdulrasool <compnerd@compnerd.org>
Co-authored-by: Lang Hames <lhames@gmail.com>
Co-authored-by: Ben Barham <b.n.barham@gmail.com>
This test case was failing to compile with a "ran out of registers
during register allocation" error at -O0. This was because CMP_SWAP_64
has 3 operands which must each be an even-odd register pair, and two other
GPR operands. All of the def operands are also early-clobber, so
registers can't be shared between uses and defs. Because the function
has an over-aligned alloca it needs frame and base pointers, so r6 and
r11 are both reserved. That leaves r0/r1, r2/r3, r4/r5 and r8/r9 as the
only valid register pairs, and if the two individual GPR operands happen
to get allocated to registers in different pairs then only 2 pairs will
be available for the three GPRPair operands.
To fix this, I've merged the two GPR operands into a single GPRPair
operand. This means that the instruction now has 4 GPRPair operands,
which can always be allocated without relying on luck. This does
constrain register allocation a bit more, but this pseudo instruction is
only used at -O0, so I don't think that's a problem.
If the uint64_t constructor is used, assert that the value is actually a
signed or unsigned N-bit integer depending on whether the isSigned flag
is set. Provide an implicitTrunc flag to restore the previous behavior,
where the argument is silently truncated instead.
In this commit, implicitTrunc is enabled by default, which means that
the new assertions are disabled and no actual change in behavior occurs.
The plan is to flip the default once all places violating the assertion
have been fixed. See #80309 for the scope of the necessary changes.
The primary motivation for this change is to avoid incorrectly specified
isSigned flags. A recurring problem we have is that people write
something like `APInt(BW, -1)` and this works perfectly fine -- until
the code path is hit with `BW > 64`. Most of our i128 specific
miscompilations are caused by variants of this issue.
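A minimal illustration of that pitfall (not code from the patch):

```cpp
#include "llvm/ADT/APInt.h"
using namespace llvm;

void example() {
  // -1 is converted to the uint64_t 0xFFFF'FFFF'FFFF'FFFF and, without
  // isSigned, is zero-extended: this is NOT the all-ones 128-bit value.
  APInt Wrong(128, -1);
  // With isSigned the value is sign-extended, giving the intended all-ones.
  APInt Right(128, -1, /*isSigned=*/true);
}
```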
The cost of the change is that we have to specify the correct isSigned
flag (and make sure there are no excess bits) for uses where BW is
always <= 64 as well.
It may be profitable to revert SCCP propagation of C++ static values,
if such constants are pointers, in order to avoid redundant pointer
computation, since the method returning the constant is non-removable.
These would implicitly cast the register to `unsigned`. Switching most
of them to use printReg gives more readable output. Change some
others to use Register::id() so we can eventually remove the implicit
cast to `unsigned`.
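For example (illustrative only, not code from this patch):

```cpp
#include "llvm/CodeGen/Register.h"
#include "llvm/CodeGen/TargetRegisterInfo.h"
#include "llvm/Support/raw_ostream.h"
using namespace llvm;

// printReg produces a readable name such as "$x0" or "%3", while streaming
// the Register directly relies on the implicit conversion to unsigned.
static void dumpReg(Register Reg, const TargetRegisterInfo *TRI) {
  errs() << printReg(Reg, TRI) << " (raw id " << Reg.id() << ")\n";
}
```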
Most of it is redundant with bfloat-convert.ll. One testcase is found in
bfloat-imm.ll. The load and stores are more thoroughly tested in
bfloat-mem.ll.
The op of phi transform wants to prevent moving an operation across a
backedge, as this may lead to an infinite combine loop.
Currently, this is done using isPotentiallyReachable(). The problem with
that is that all blocks inside a loop are reachable from each other.
This means that the op of phi transform is effectively completely
disabled for code inside loops, even when it's not actually operating on
a loop phi (just a phi that happens to be in a loop).
Fix this by explicitly computing the backedges inside the function
instead. Do this via RPOT, which is a bit more efficient than using
FindFunctionBackedges() (which does it without any pre-computed
analyses).
For irreducible cycles, there may be multiple possible choices of
backedge, and this just picks one of them. This is still sufficient to
prevent combine loops.
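A minimal sketch of that backedge computation (names are illustrative, not the patch's exact code):

```cpp
#include "llvm/ADT/DenseMap.h"
#include "llvm/ADT/PostOrderIterator.h"
#include "llvm/ADT/SmallPtrSet.h"
#include "llvm/IR/CFG.h"
#include "llvm/IR/Function.h"
using namespace llvm;

// Number the reachable blocks in reverse post-order; any edge whose target
// does not come strictly later in that order is treated as a backedge.
static void collectBackEdgeTargets(
    Function &F, SmallPtrSetImpl<const BasicBlock *> &BackEdgeTargets) {
  ReversePostOrderTraversal<Function *> RPOT(&F);
  DenseMap<const BasicBlock *, unsigned> RPONumber;
  unsigned N = 0;
  for (const BasicBlock *BB : RPOT)
    RPONumber[BB] = N++;
  for (const BasicBlock *BB : RPOT)
    for (const BasicBlock *Succ : successors(BB))
      if (RPONumber.lookup(Succ) <= RPONumber.lookup(BB))
        BackEdgeTargets.insert(Succ);
}
```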
This also removes the last use of LoopInfo in InstCombine -- I'll drop
the analysis in a followup.
This patch extends TypeQuery matching to support anonymous namespaces. A
new flag is added to control the behavior. In the "strict" mode, the
query must match the type exactly -- all anonymous namespaces included.
The dynamic type resolver in the Itanium ABI (the motivating use case
for this) uses this flag, as it queries using the name from the
demangler, which includes anonymous namespaces.
This ensures we don't confuse a type with a same-named type in an
anonymous namespace. However, this does *not* ensure we don't confuse
two types in anonymous namespaces (in different CUs). To resolve this, we
would need to use a completely different lookup algorithm, which
probably also requires a DWARF extension.
In the "lax" mode (the default), the anonymous namespaces in the query
are optional, and this allows one to search for the type using the usual
language rules (`::A` matches `::(anonymous namespace)::A`).
This patch also changes the type context computation algorithm in
DWARFDIE, so that it includes anonymous namespace information. This
causes a slight change in behavior: the algorithm previously stopped
computing the context after encountering an anonymous namespace, which
caused the outer namespaces to be ignored. This meant that a type like
`NS::(anonymous namespace)::A` would be (incorrectly) recognized as
`::A`. This can cause code depending on the old behavior to misbehave.
The fix is to specify all the enclosing namespaces in the query, or use
a non-exact match.
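A small illustration of the types involved (hypothetical source, not from the test suite):

```cpp
// Translation unit 1:
namespace NS {
namespace {
struct A {}; // NS::(anonymous namespace)::A -- previously (mis)treated as ::A
} // namespace
} // namespace NS

// Translation unit 2:
struct A {}; // ::A -- the only type a strict query for "::A" should match
```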