7dc20ab introduced an extra COPY when spilling and filling a PNR
register, which can't be elided as the input (PNR predicate) and output
(PPR predicate) register classes differ. The patch adds a new register
class that covers both PPR and PNR so that STR_PXI and LDR_PXI can
take either of them, removing the need for the copy.
This reverts commit e770153865c53c4fd72a68f23acff33c24e42a08.
This wasn't reviewed, and the functionality in question was
intentionally rejected the last time it was discussed in
https://reviews.llvm.org/D56305 .
-fsanitize=function emits a signature and function hash before a
function. Similar to 7f6e2c9, these can be sheared off when
`.subsections_via_symbols` is used.
This change uses the same technique 7f6e2c9 introduced for prefixes:
emitting a symbol for the metadata, then marking the actual function
entry as an .alt_entry symbol.
llvm/test/CodeGen/AArch64/elf-globals-pic.ll:
Since https://reviews.llvm.org/D91734, elf-globals-static.ll test
contains several `CHECK-PIC` lines. They do not seem to bring any value
since there are no FileCheck run lines checking against this prefix. The
right place for such tests should be elf-globals-pic.ll, which already
contains check lines being deleted in this commit. Both
elf-globals-pic.ll and elf-globals-static.ll were created after
splitting arm64-elf-globals.ll in 6dbd0ea. The `CHECK-PIC` lines in
elf-globals-static.ll are probably an artifact of git treating
elf-globals-pic.ll as a new file and elf-globals-static.ll as a rename
of arm64-elf-globals.ll.
llvm/test/CodeGen/AArch64/tagged-globals-pic.ll:
Similar to elf-globals-pic.ll, contains unneeded
`CHECK-SELECTIONDAGISEL` and `CHECK-GLOBALISEL` directives not checked
by any FileCheck invocation. These directives are present in
tagged-globals-static.ll. Both tests have been in the tree since
fd32639, when tagged-globals.ll was split into
tagged-globals-{pic|static}.ll.
SVE has some non-temporal masked loads and stores. The metadata coming
from the nodes is not copied to the MMO at the moment though, meaning it
will generate a normal instruction. This patch ensures that the right
flags are set if the instruction has non-temporal metadata.
This attempts to standardize and extend some of the insert vector
element lowering. Most notably:
- More types are handled by splitting illegal vectors.
- The index type for G_INSERT_VECTOR_ELT is canonicalized to
TLI.getVectorIdxTy(), similar to extract_vector_element.
- Some of the existing patterns now have the index type specified to
make sure they can apply to GISel too.
- The C++ selection code has been removed, relying on tablegen patterns.
- G_INSERT_VECTOR_ELT instructions with small GPR input elements are
pre-selected to use an i32 type, allowing the existing patterns to apply.
- Variable index inserts are lowered in post-legalizer lowering,
expanding into a stack store and reload.
Fold BICi if all destination bits are already known to be zeroes
```llvm
define <8 x i16> @haddu_known(<8 x i8> %a0, <8 x i8> %a1) {
  %x0 = zext <8 x i8> %a0 to <8 x i16>
  %x1 = zext <8 x i8> %a1 to <8 x i16>
  %hadd = call <8 x i16> @llvm.aarch64.neon.uhadd.v8i16(<8 x i16> %x0, <8 x i16> %x1)
  %res = and <8 x i16> %hadd, <i16 511, i16 511, i16 511, i16 511, i16 511, i16 511, i16 511, i16 511>
  ret <8 x i16> %res
}
declare <8 x i16> @llvm.aarch64.neon.uhadd.v8i16(<8 x i16>, <8 x i16>)
```
```
haddu_known:                            // @haddu_known
        ushll   v0.8h, v0.8b, #0
        ushll   v1.8h, v1.8b, #0
        uhadd   v0.8h, v0.8h, v1.8h
        bic     v0.8h, #254, lsl #8     <-- this one will be removed, as the high bits are known to be zero after the zero-extension
        ret
```
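The reasoning behind the fold can be sketched in plain C++. This is only an illustration under simplifying assumptions, not the code from the patch: `KnownLane`, `leadingRun`, `knownAfterZext8`, `knownAfterUHAdd` and `bicIsRedundant` are hypothetical names, and the sketch tracks just the leading known-zero bits of a single 16-bit lane.

```cpp
#include <cassert>
#include <cstdint>

// Simplified known-bits record for one 16-bit vector lane (hypothetical type,
// loosely modelled on the idea of known-bits analysis).
struct KnownLane {
  uint16_t zero; // bits known to be 0
  uint16_t one;  // bits known to be 1
};

// Keep only the contiguous run of set bits that starts at bit 15.
inline uint16_t leadingRun(uint16_t mask) {
  uint16_t out = 0;
  for (int bit = 15; bit >= 0 && ((mask >> bit) & 1u); --bit)
    out = static_cast<uint16_t>(out | (1u << bit));
  return out;
}

// zext from i8 to i16: the top 8 bits are known zero.
inline KnownLane knownAfterZext8() { return {0xFF00, 0x0000}; }

// Unsigned halving add, (a + b) >> 1, never exceeds max(a, b), so leading
// zeros shared by both inputs survive. Nothing else is tracked in this sketch.
inline KnownLane knownAfterUHAdd(KnownLane a, KnownLane b) {
  return {leadingRun(static_cast<uint16_t>(a.zero & b.zero)), 0x0000};
}

// A BIC that clears `clearedBits` is a no-op when all of them are known zero.
inline bool bicIsRedundant(KnownLane src, uint16_t clearedBits) {
  assert((src.zero & src.one) == 0 && "a bit cannot be known 0 and known 1");
  return (src.zero & clearedBits) == clearedBits;
}
```

For the example above, `bic v0.8h, #254, lsl #8` clears the bits `0xFE00`, which lie entirely inside the `0xFF00` known-zero region left by the two zero-extensions.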
Fixes #53881
Fixes #53622
Allow using atomicrmw fadd, fsub, fmin, and fmax with vectors of
floating-point type. AMDGPU supports atomic fadd for <2 x half> and <2 x
bfloat> on some targets and address spaces.
Note this only supports the proper floating-point operations; float
vector typed xchg is still not supported. cmpxchg still only supports
integers, so this inserts bitcasts for the loop expansion.
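As a rough analogy for the loop expansion, the following C++ sketch performs a floating-point vector add through an integer-only compare-exchange, with the bit reinterpretation playing the role of the inserted bitcasts. It is a sketch under assumptions, not the generated code: `Float2`, `toBits`, `fromBits`, `atomicFAddV2` and `exampleUse` are hypothetical names, and a pair of `float` lanes stands in for the vector type.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for a two-lane floating-point vector.
struct Float2 { float lane[2]; };
static_assert(sizeof(Float2) == sizeof(uint64_t), "sketch assumes 32-bit float lanes");

// Bit reinterpretation in both directions (the "bitcasts").
inline uint64_t toBits(Float2 v) { uint64_t b; std::memcpy(&b, &v, sizeof b); return b; }
inline Float2 fromBits(uint64_t b) { Float2 v; std::memcpy(&v, &b, sizeof v); return v; }

// Atomic lane-wise add built from an integer compare-exchange loop.
// Returns the value held before the update, like atomicrmw does.
inline Float2 atomicFAddV2(std::atomic<uint64_t> &cell, Float2 operand) {
  uint64_t expected = cell.load();
  for (;;) {
    Float2 updated = fromBits(expected);
    updated.lane[0] += operand.lane[0];
    updated.lane[1] += operand.lane[1];
    // On failure, compare_exchange_weak reloads `expected` and we retry.
    if (cell.compare_exchange_weak(expected, toBits(updated)))
      return fromBits(expected);
  }
}

// Usage example with exactly representable values.
inline void exampleUse() {
  std::atomic<uint64_t> cell(toBits(Float2{{1.0f, 2.0f}}));
  Float2 old = atomicFAddV2(cell, Float2{{1.0f, 1.0f}});
  assert(old.lane[0] == 1.0f && old.lane[1] == 2.0f);
}
```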
I have separately implemented support for fp-vector-typed xchg and for
vectors of int/ptr, but I don't have an immediate need for those beyond
feature consistency.
The existing heuristics were assuming that every core behaves like an
Apple A7, where any extend/shift costs an extra micro-op... but in
reality, nothing else behaves like that.
On some older Cortex designs, shifts by 1 or 4 cost extra, but all other
shifts/extensions are free. On all other cores, as far as I can tell,
all shifts/extensions for integer loads are free (i.e. the same cost as
an unshifted load).
To reflect this, this patch:
- Enables aggressive folding of shifts into loads by default.
- Removes the old AddrLSLFast feature, since it applies to everything
except A7 (and even if you are explicitly targeting A7, we want to
assume extensions are free because the code will almost always run on a
newer core).
- Adds a new feature AddrLSLSlow14 that applies specifically to the
Cortex cores where shifts by 1 or 4 cost extra.
I didn't add support for AddrLSLSlow14 on the GlobalISel side because it
would require a bunch of refactoring to work correctly. Someone can pick
this up as a followup.
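The resulting policy can be summarised with a small C++ sketch. It only restates the rules listed above and is not the heuristic code from the patch: `SubtargetSketch` and `isShiftFoldFree` are hypothetical names, and the single boolean models the new AddrLSLSlow14 feature.

```cpp
#include <cassert>

// Hypothetical subtarget description carrying only the feature from this patch.
struct SubtargetSketch {
  bool addrLSLSlow14; // shifts by 1 or 4 in an address cost extra
};

// Returns true if folding `base + (index << shiftAmount)` into a load's
// addressing mode is expected to cost the same as an unshifted load.
inline bool isShiftFoldFree(const SubtargetSketch &st, unsigned shiftAmount) {
  // Register-offset loads scale by the access size, which is at most 16 bytes.
  assert(shiftAmount <= 4 && "shift amount out of range for a scaled load");
  if (st.addrLSLSlow14 && (shiftAmount == 1 || shiftAmount == 4))
    return false;
  return true; // default: fold aggressively
}
```

With the feature clear, every shift amount folds for free; with it set, only shifts by 1 and 4 are treated as costing extra.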
Depends on #87545
Emit `GNU_PROPERTY_AARCH64_FEATURE_PAUTH` property in
`.note.gnu.property` section depending on
`aarch64-elf-pauthabi-platform` and `aarch64-elf-pauthabi-version` llvm
module flags.
- When both operands are constant, the matcher runs into an infinite
loop. The commutation should be applied only when the LHS is a constant
and the RHS is not.
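A minimal C++ sketch of why the guard matters, assuming a combiner that re-applies a rule until it stops firing. The names `OperandSketch`, `commuteConstantToRHS` and `countRewrites` are hypothetical and do not come from the matcher itself; the iteration cap exists only so the non-terminating case can be observed.

```cpp
#include <cassert>
#include <utility>

// Hypothetical operand: all the rule needs to know is whether it is constant.
struct OperandSketch { bool isConstant; };

// Move a constant LHS to the RHS. With `requireNonConstantRHS` set, the rule
// declines when both operands are constant. Returns true if it swapped.
inline bool commuteConstantToRHS(OperandSketch &lhs, OperandSketch &rhs,
                                 bool requireNonConstantRHS) {
  if (!lhs.isConstant)
    return false;
  if (requireNonConstantRHS && rhs.isConstant)
    return false;
  std::swap(lhs, rhs);
  return true;
}

// Re-apply the rule until it stops firing or the cap is reached.
inline int countRewrites(OperandSketch lhs, OperandSketch rhs,
                         bool requireNonConstantRHS, int maxRewrites) {
  assert(maxRewrites > 0 && "need a positive cap to detect non-termination");
  int rewrites = 0;
  while (rewrites < maxRewrites &&
         commuteConstantToRHS(lhs, rhs, requireNonConstantRHS))
    ++rewrites;
  return rewrites;
}
```

With two constant operands, the unguarded rule keeps swapping until it hits the cap, while the guarded rule never fires.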
Reviewers: arsenm
Reviewed By: arsenm
Pull Request: https://github.com/llvm/llvm-project/pull/87426
This tries to fill in the basic vector handling for sadd_sat/uadd_sat
and ssub_sat/usub_sat. It just handles the basics, marking legal types
and clamping illegally sized vectors to legal ones.
The ID argument of `gc.statepoint` gets incorrectly truncated to 32 bits
during code generation.
This is fixed by using `uint64_t` instead of `unsigned` for the `ID`
member in `SelectionDAGBuilder::StatepointLoweringInfo`, and a
`patchpoint` test case is extended to check for 64 bit ID generation in
stackmaps.
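The effect of the narrower member can be reproduced with a small C++ sketch. The struct and function names here are hypothetical, and `uint32_t` models `unsigned` on the common targets where it is 32 bits wide.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for the lowering info before and after the fix.
struct NarrowInfoSketch { uint32_t ID; }; // models the old `unsigned` member
struct WideInfoSketch { uint64_t ID; };   // models the new `uint64_t` member

// Store a 64-bit ID in the narrow record and read it back.
inline uint64_t recordNarrow(uint64_t id) {
  NarrowInfoSketch info{static_cast<uint32_t>(id)}; // upper 32 bits are dropped
  return info.ID;
}

// Store a 64-bit ID in the wide record and read it back.
inline uint64_t recordWide(uint64_t id) {
  WideInfoSketch info{id};
  return info.ID;
}

// Usage example: an ID that does not fit in 32 bits.
inline void demoTruncation() {
  const uint64_t id = 0x1234567890ABCDEFULL;
  assert(recordNarrow(id) == 0x90ABCDEFULL);
  assert(recordWide(id) == id);
}
```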
`RegisterClassInfo::getRegPressureSetLimit` has been changed to return a
smaller value than before so the limit may become negative in later
calculations. As a workaround, change to use
`TargetRegisterInfo::getRegPressureSetLimit`.
Also improve tests.
We currently just use the mangled name. This works fine, because the
linker should detect that and demangle it for the export table. However,
MSVC is more specific and also passes the demangled name, using
EXPORTAS. This PR aims to match that. MSVC doesn't use quotes in this
case, so I added '#' to the list of characters that don't need quoting.
Previously we wouldn't remove dead copies from basic blocks with
successors. The comment said we didn't want to trust the live-in lists.
The comment is very old so I'm not sure if that's still a concern today.
This patch checks the live-in lists and removes copies from
MaybeDeadCopies if they are referenced by any live-ins in any
successors. We only do this if the tracksLiveness property is set. If
that property is not set, we retain the old behavior.
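The filtering step described above could look roughly like the following C++ sketch. It is an illustration under simplifying assumptions rather than the pass itself: registers are plain strings, sub- and super-register overlap is ignored, and `SuccessorSketch`, `removableCopies` and `exampleUse` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical successor block: only its live-in list matters here.
struct SuccessorSketch { std::set<std::string> liveIns; };

// Given the destination registers of the MaybeDeadCopies candidates, return
// those that may still be deleted at the end of the block.
inline std::vector<std::string>
removableCopies(std::vector<std::string> maybeDeadDests,
                const std::vector<SuccessorSketch> &successors,
                bool tracksLiveness) {
  if (successors.empty())
    return maybeDeadDests; // no successors: every candidate is removable
  if (!tracksLiveness)
    return {};             // old behavior: do not trust the live-in lists
  auto liveInSomeSuccessor = [&](const std::string &reg) {
    return std::any_of(successors.begin(), successors.end(),
                       [&](const SuccessorSketch &s) {
                         return s.liveIns.count(reg) != 0;
                       });
  };
  maybeDeadDests.erase(std::remove_if(maybeDeadDests.begin(),
                                      maybeDeadDests.end(),
                                      liveInSomeSuccessor),
                       maybeDeadDests.end());
  return maybeDeadDests;
}

// Usage example: x1 is live into the successor, so only x0 can be removed.
inline void exampleUse() {
  SuccessorSketch succ;
  succ.liveIns.insert("x1");
  std::vector<std::string> dests = {"x0", "x1"};
  std::vector<std::string> kept =
      removableCopies(dests, std::vector<SuccessorSketch>(1, succ), true);
  assert(kept.size() == 1 && kept[0] == "x0");
}
```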
We try to clamp the index to be within the bounds of the stack object
we create, but if we don't freeze it, poison can propagate into the
clamp code. This can cause the access to leave the bounds of the
stack object.
We have other instances of this issue in type legalization and
extract_elt/subvector, but I am posting this patch first as a direction
check.
Fixes #86717
Adds logic to the IR verifier that checks whether !tbaa.struct nodes are
well-formed. That is, it checks that the operands of !tbaa.struct nodes
are in groups of three, that each group of three operands consists of
two integers and a valid tbaa node, and that the regions described by
the offset and size operands are non-overlapping.
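The three checks can be illustrated with a standalone C++ sketch. It does not use the verifier's data structures: `FieldSketch`, `operandCountIsValid` and `fieldsAreWellFormed` are hypothetical names, and the validity of the tbaa node is reduced to a boolean flag.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One (offset, size, tag) group; the tbaa tag is reduced to a validity flag.
struct FieldSketch {
  uint64_t offset;
  uint64_t size;
  bool hasValidTag;
};

// Operands must come in groups of three.
inline bool operandCountIsValid(std::size_t numOperands) {
  return numOperands % 3 == 0;
}

// Every group needs a valid tag, and no two regions may overlap.
inline bool fieldsAreWellFormed(std::vector<FieldSketch> fields) {
  for (const FieldSketch &f : fields)
    if (!f.hasValidTag)
      return false;
  std::sort(fields.begin(), fields.end(),
            [](const FieldSketch &a, const FieldSketch &b) {
              return a.offset < b.offset;
            });
  for (std::size_t i = 1; i < fields.size(); ++i) {
    assert(fields[i].offset >= fields[i - 1].offset && "fields are sorted");
    // Comparing the gap against the size avoids overflow in offset + size.
    if (fields[i].offset - fields[i - 1].offset < fields[i - 1].size)
      return false;
  }
  return true;
}
```

Sorting by offset first means only neighbouring regions need to be compared.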
PR: https://github.com/llvm/llvm-project/pull/86709
Currently patchpoints can only have two result types, `void` and `i64`.
This limits the result to general purpose registers.
This patch makes `patchpoint.i64` an overloadable intrinsic, allowing
result values that can fit in a single register (e.g. integers,
pointers, floats).
Similar to how we protected FP/fixed-vector arguments and results from
calls, we should do the same for arguments/results from locally-streaming
functions such that those are not spilled/filled as ZPR registers.
This may cause a small regression (additional spills/fills), which is
addressed by #85386.
This extends the concat load patch from
https://reviews.llvm.org/D121400, which was later moved to a combine, to
handle v2i8 and v2i16 concat loads too.