This patch extends #149095 for EOR and ORR.
It uses a simple partition scheme to try to find two suitable disjoint
bitmasks that can be used with EOR/ORR to reconstruct the original mask.
Fixes: #148987.
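As a rough sketch of the partition idea (illustrative only; the actual patch checks encodability as AArch64 logical immediates rather than the simplified single-run predicate used here), the split can be thought of as cutting the mask at the first gap between set bits and validating both halves:
```
#include <cstdint>
#include <optional>
#include <utility>

// Stand-in predicate for illustration: accept only a single contiguous run
// of ones. The real check would be whether the value is encodable as an
// AArch64 logical immediate.
static bool isSingleRunOfOnes(uint64_t M) {
  if (M == 0)
    return false;
  uint64_t Shifted = M >> __builtin_ctzll(M);
  return (Shifted & (Shifted + 1)) == 0;
}

// Try to split Mask into two disjoint halves, each cheap to materialise on
// its own; ORR (or EOR, since the halves are disjoint) of the two halves
// then reconstructs the original mask.
static std::optional<std::pair<uint64_t, uint64_t>>
trySplitMask(uint64_t Mask) {
  if (Mask == 0 || isSingleRunOfOnes(Mask))
    return std::nullopt; // zero, or a single instruction already suffices
  unsigned LowZeros = __builtin_ctzll(Mask);
  unsigned RunLen = __builtin_ctzll(~(Mask >> LowZeros)); // low run of ones
  uint64_t Lo = ((uint64_t(1) << RunLen) - 1) << LowZeros;
  uint64_t Hi = Mask & ~Lo;
  if (isSingleRunOfOnes(Hi))
    return std::make_pair(Lo, Hi);
  return std::nullopt;
}
```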
Whether a specific argument is a vararg or a fixed argument is currently
stored separately from all the other argument information in ArgFlags.
This means it is not accessible from CCAssign, and backends have
developed all kinds of workarounds to get at it anyway.
Move this information to ArgFlags to make it directly available in all
relevant places.
I've opted to invert the sense and store it as IsVarArg, as I think this
both makes the meaning more obvious and provides a better default
(IsVarArg = false).
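As a minimal sketch of what this enables (the `isVarArg()` accessor name is my assumption, not necessarily what the patch adds), a hand-written calling-convention helper can now branch on the flag directly instead of threading a separate IsVarArg parameter around:
```
#include "llvm/CodeGen/CallingConvLower.h"
#include "llvm/CodeGen/TargetCallingConv.h"
using namespace llvm;

// Sketch only: an illustrative assignment rule, not code from the patch.
static void assignArgSketch(unsigned ValNo, MVT ValVT, MVT LocVT,
                            CCValAssign::LocInfo LocInfo,
                            ISD::ArgFlagsTy ArgFlags, CCState &State) {
  if (ArgFlags.isVarArg()) {
    // Illustrative rule: variadic arguments always go on the stack.
    int64_t Offset = State.AllocateStack(8, Align(8));
    State.addLoc(CCValAssign::getMem(ValNo, ValVT, Offset, LocVT, LocInfo));
    return;
  }
  // Fixed arguments would be assigned by the usual register/stack logic.
}
```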
We can generate an ext from shuffles such as <2, 3, 0, 1> of a single
vector source. Add handling to isShuffleMaskLegal to allow DAG combines
to optimize to it.
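For reference, the mask shape in question is a lane rotation of a single source; a standalone sketch of the check (not the in-tree isShuffleMaskLegal code):
```
#include <cstddef>
#include <vector>

// A single-source ext corresponds to a shuffle mask that rotates the lanes,
// e.g. <2, 3, 0, 1> on a 4-element vector is a rotation by 2. Undef lanes
// are ignored here for simplicity.
static bool isSingleSourceRotation(const std::vector<int> &Mask) {
  size_t N = Mask.size();
  if (N == 0 || Mask[0] < 0)
    return false;
  size_t Rot = static_cast<size_t>(Mask[0]);
  for (size_t I = 0; I < N; ++I)
    if (Mask[I] >= 0 && Mask[I] != static_cast<int>((Rot + I) % N))
      return false;
  return true;
}
```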
The current disassembly of `ldt{add,set,clr}` instructions when using
`xzr/wzr` is incorrect. The Armv9.6-A Memory Systems specification says:
```
For each of LDT{ADD|SET|CLR}{L}, there is the corresponding STT{ADD|SET|CLR}{L}
alias, for the case where the register selected by the Rt field is XZR or WZR
```
and:
```
LDT{ADD|SET|CLR}{A}{L} is equivalent to LD{ADD|SET|CLR}{A}{L} except that: <..conditions..>
```
The Arm ARM specifies the preferred form of disassembly for these
aliases:
```
STADD <Xs>, [<Xn|SP>]
is equivalent to
LDADD <Xs>, XZR, [<Xn|SP>]
and is always the preferred disassembly.
```
(ref: DDI 0487L.b C6-2317)
This means that `sttadd w0, [x2]` is the preferred disassembly of
`ldtadd w0, wzr, [x2]`, and likewise whenever Rt is `xzr` or `wzr`.
This change also aligns LLVM's disassembly with GNU binutils, as shown by
the following examples.
LLVM before this change:
```
% cat test.s
stadd w0, [sp]
sttadd w0, [sp]
ldadd w0, wzr, [sp]
ldtadd w0, wzr, [sp]
% llvm-mc-20 -triple aarch64 -mattr=+lse,+lsui test.s
stadd w0, [sp]
ldtadd w0, wzr, [sp]
stadd w0, [sp]
ldtadd w0, wzr, [sp]
```
LLVM after this change:
```
% llvm-mc -triple aarch64 -mattr=+lse,+lsui test.s
stadd w0, [sp]
sttadd w0, [sp]
stadd w0, [sp]
sttadd w0, [sp]
```
GNU binutils (from the GCC 15 toolchain) for comparison:
```
% gas test.s -march=armv8-a+lsui+lse -o test.o
% objdump -dr test.o
0: b82003ff stadd w0, [sp]
4: 192007ff sttadd w0, [sp]
8: b82003ff stadd w0, [sp]
c: 192007ff sttadd w0, [sp]
```
Many thanks to Ezra Sitorus and Alice Carlotti for reporting and
confirming this issue.
This patch avoids a comparison against zero when lowering abs(sub(a, b))
patterns, instead reusing the condition flags set by a subs of the
operands directly.
For example, currently:
```
sxtb w8, w0
sub w8, w8, w1, sxtb
cmp w8, #0
cneg w0, w8, mi
```
becomes:
```
sxtb w8, w0
subs w8, w8, w1, sxtb
cneg w0, w8, mi
```
Together with #151177, this should handle the remaining patterns in
#118413.
As a follow-up to #151177, when lowering SELECT_CC nodes of absolute
difference patterns, drop poison-generating flags from the negated
operand to avoid inadvertently propagating poison.
As discussed in the PR above, I didn't find practical issues with the
current code, but it seems safer to do this preemptively.
The main change in this patch is that we go from emitting the expression:
`@ cfa - NumBytes - NumScalableBytes * VG`
to:
`@ cfa - VG * NumScalableBytes - NumBytes`
That is, VG now comes first in the expression. This is in preparation for
a future patch that adds an alternative way to resolve VG (which uses the
CFA, so it is convenient for the CFA to be at the top of the stack).
Since doing this is fairly churn-heavy, I took the opportunity to also
save up to 4 bytes per SVE CFI expression. This is done by folding
LEB128 constants in the range 0 to 31 into literal operations, and by
using the offset field of `DW_OP_breg*` expressions.
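A minimal sketch of the small-constant folding (illustrative only, not the emission code from this patch):
```
#include "llvm/BinaryFormat/Dwarf.h"
#include "llvm/Support/LEB128.h"
#include "llvm/Support/raw_ostream.h"
using namespace llvm;

// Constants 0..31 fit in a single DW_OP_lit<n> byte; anything else falls
// back to DW_OP_consts followed by an SLEB128 payload.
static void emitConstOp(raw_ostream &OS, int64_t V) {
  if (V >= 0 && V <= 31) {
    OS << static_cast<char>(dwarf::DW_OP_lit0 + V); // one byte total
    return;
  }
  OS << static_cast<char>(dwarf::DW_OP_consts); // opcode byte
  encodeSLEB128(V, OS);                         // variable-length payload
}
```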
* `tools/llvm-objcopy/MachO/update-section-object.test` was failing on
Windows because the input file (`macho_sections.s`) might be checked out
with the wrong line endings, resulting in a difference in the size of the
sections being checked.
* Removed the Windows check in `AArch64Arm64ECCallLowering`: when
`llc` is run without an explicit target, the module's target triple is
unknown, so the assert fires.
* Expect `llvm/test/CodeGen/Generic/allow-check.ll` to fail for Arm64EC:
GlobalISel is not supported.
Some vector instructions override AsmString in their TableGen description
but did not include the Apple syntax variant, so they were printed without
operands.
Fixes #151330
Compare sinking is selected based on the result of
hasMultipleConditionRegisters. This function is too coarse-grained, as it
does not take into account the differences between scalar and vector
compares. This PR extends the interface to take an EVT to allow finer
control.
The new interface is used by AArch64 to disable sinking of scalable
vector compares, with isProfitableToSinkOperands updated to maintain
the cases that are specifically tested.
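A sketch of the kind of policy this enables (names and signature are assumptions based on the description above, not the actual hook):
```
#include "llvm/CodeGen/ValueTypes.h"
using namespace llvm;

// With a type-aware hook, a target can keep sinking scalar compares (NZCV
// is effectively a single flags register) while opting out for scalable
// vector compares, whose results live in predicate registers.
static bool hasMultipleConditionRegistersFor(EVT VT) {
  return VT.isScalableVector();
}
```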
While testing Arm64EC, I observed that LLVM crashes when an empty
function name is used. My original fix in #151409 was to raise an error,
but this change now handles the empty name via
`Mangler::getNameWithPrefix` (which assigns a name to the function).
To get this working, I had to create the `Mangler` in
`TargetLoweringObjectFile` early so it would be available to Arm64EC's
lowering. There's no reason for the `Mangler` to be created only when
`Initialize` is called (or re-created if it already exists), so I moved
its creation to the constructor and replaced the raw pointer with a
`unique_ptr` to avoid the explicit `delete` in the destructor.
Certain fcmp predicates need to be expanded into multiple operations that
are then or'd together. This adds more accurate cost modelling for them
based on the predicate. Unsupported operations are given the cost of a
libcall, and the latency is set to 2 as that seemed fairly common across
different CPUs.
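For example, the `one` predicate (ordered and not equal) is one of the cases that expands to two compares or'd together; in scalar C++ terms:
```
// "one" == ordered and not equal, which decomposes into two ordered
// comparisons joined by an OR. An unordered input (NaN) makes both false.
static bool fcmp_one(double a, double b) { return (a > b) || (a < b); }
```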
This prevents them from generating Invalid costs, as generating the
instructions seems to work fine with and without +bf16. The costs are
mostly taken from the number of instructions (minus ptrue and constants).
The object file format specific derived classes are used in contexts
where the type is statically known. We don't use isa/dyn_cast, and we
want to eliminate MCSymbol::Kind in the base class.
The object file format specific derived classes are used in contexts
where the type is statically known. We don't use isa/dyn_cast, and we
want to eliminate MCSymbol::Kind in the base class.
The object file format specific derived classes are used in contexts
where the type is statically known. We don't use isa/dyn_cast, and we
want to eliminate MCSymbol::Kind in the base class.
`Data` now points to the first byte at the fixup offset within the current fragment.
MCAssembler::layout asserts that the fixup offset is within either the
fixed-size content or the optional variable-size tail, as this is the
most the generic code can validate without knowing the target-specific
fixup size.
Many backends' applyFixup implementations assert:
```
assert(Offset + Size <= F.getSize() && "Invalid fixup offset!");
```
This refactoring allows a subsequent change to move the fixed-size
content outside of MCSection::ContentStorage, fixing the
-fsanitize=pointer-overflow issue from #150846.
Pull Request: https://github.com/llvm/llvm-project/pull/151724
While testing Arm64EC, I observed that LLVM crashes when an
`available_externally` function is used, as it tries to place the
function in a COMDAT, which is not permitted by the verifier.
This is the fix from #151409 plus a dedicated test.
Without this change, the following test would fail to compile
with `-march=armv8-a+sme`:
```
void func1(const svuint32_t *in, svuint32_t *out) {
  [&]() __arm_streaming { *out = *in; }();
}
```
But in general, it's probably better never to inline
streaming functions into non-streaming functions, because
they will have been marked as 'streaming' for a reason
by the user.
This change facilitates replacing `MutableArrayRef<char> Data` (the
fragment content) with the relocated location. It is necessary to fix
the pointer-overflow sanitizer issue and reland #150846.
#147420 changed the unrolling preferences to permit unrolling of
non-auto-vectorized loops by checking for the isvectorized attribute.
However, when a loop is vectorized this attribute is put on both the
vector loop and the scalar epilogue, so that change prevented the scalar
epilogue from being unrolled.
Restore the previous behaviour of unrolling the scalar epilogue by
checking both for the isvectorized attribute and for vector instructions
in the loop.
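A sketch of the combined check (using generic LoopInfo helpers; the exact helper and predicate used in the patch may differ):
```
#include "llvm/ADT/STLExtras.h"
#include "llvm/Analysis/LoopInfo.h"
#include "llvm/IR/Instructions.h"
using namespace llvm;

// Sketch only: treat a loop as "already vectorized" only if it both carries
// the llvm.loop.isvectorized metadata and actually contains vector
// instructions. The scalar epilogue carries the metadata but is scalar, so
// it stays eligible for unrolling.
static bool isVectorizedBody(const Loop &L) {
  if (!getBooleanLoopAttribute(&L, "llvm.loop.isvectorized"))
    return false;
  for (BasicBlock *BB : L.blocks())
    for (Instruction &I : *BB)
      if (I.getType()->isVectorTy() ||
          any_of(I.operands(),
                 [](const Use &U) { return U->getType()->isVectorTy(); }))
        return true;
  return false;
}
```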
This extends the existing push_add_through_zext to handle mul, similar
to performVectorExtCombine in SDAG. This allows muls to be pushed up
through the tree of extends, operating on smaller vector types whilst
keeping the result the same (provided the output has more than 2x the
bits of the inputs).
matchExtAddvToUdotAddv needs to be adjusted to make sure it keeps
generating dot instructions from add(ext(mul(ext, ext))).
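A scalar analogue of why the narrower multiply is legal when the output has more than 2x the input bits:
```
#include <cstdint>

// An 8-bit * 8-bit product needs at most 16 bits, so multiplying in 16 bits
// and extending afterwards gives the same value as extending both operands
// to 32 bits first.
static uint32_t widening_mul(uint8_t a, uint8_t b) {
  uint16_t narrow = static_cast<uint16_t>(a) * static_cast<uint16_t>(b);
  return narrow; // equal to uint32_t(a) * uint32_t(b)
}
```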
Stores can be issued faster if the result is kept in the SIMD/FP
registers.
The `HasOneUse` check guards against creating two floating-point
conversions when, for example, some arithmetic is also done on the
converted value. Another approach would be to inspect the user
instructions during lowering, but I don't see that type of check used in
lowering very often.
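A small example of the two situations the `HasOneUse` check distinguishes (illustrative, not from the patch):
```
#include <cstdint>

// Single use: the converted value only feeds the store, so the convert and
// store can stay on the SIMD/FP side.
void store_only(float f, int32_t *p) { *p = static_cast<int32_t>(f); }

// Extra use: the integer result is also needed for arithmetic, so it ends
// up in a GPR anyway; converting once for the store (FP side) and once for
// the arithmetic would create two conversions, hence the combine bails out.
int32_t store_and_add(float f, int32_t *p) {
  int32_t i = static_cast<int32_t>(f);
  *p = i;
  return i + 1;
}
```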
The current code generated for absolute difference patterns
(a > b ? a - b : b - a) typically consists of sequences of:
```
sub w8, w1, w0
subs w9, w0, w1
csel w0, w9, w8, hi
```
The first sub is redundant if the csel is replaced by a cneg:
```
subs w8, w0, w1
cneg w0, w8, ls
```
This is achieved by canonicalising
```
select_cc lhs, rhs, sub(lhs, rhs), sub(rhs, lhs), cc ->
select_cc lhs, rhs, sub(lhs, rhs), neg(sub(lhs, rhs)), cc
select_cc lhs, rhs, sub(rhs, lhs), sub(lhs, rhs), cc ->
select_cc lhs, rhs, neg(sub(lhs, rhs)), sub(lhs, rhs), cc
```
as the second forms can already be matched.
This helps with some of the patterns in #118413.
The selection code for aarch64_sve_ld[nt]1_pn_x{2,4} intrinsics gates
the use of strided load instructions behind the SME2 target feature.
However, the instructions are only available in streaming mode.
The object file format specific derived classes are used in contexts like
MCStreamer and MCObjectTargetWriter where the type is statically known.
We don't use isa/dyn_cast, and we want to eliminate
MCSection::SectionVariant in the base class.
The object file format specific derived classes are used in contexts like
MCStreamer and MCObjectTargetWriter where the type is statically known.
We don't use isa/dyn_cast, and we want to eliminate
MCSection::SectionVariant in the base class.
Introduce a pseudo instruction carrying the address and immediate
modifiers as separate operands, to be selected instead of a pair of
`MOVKXi` and `PAC[ID][AB]`. The new pseudo instruction is expanded in
AsmPrinter so that `MOVKXi` is emitted immediately before `PAC[ID][AB]`.
This way, an attacker cannot control the immediate modifier used to sign
the value, even if the address modifier can be substituted.
To simplify the instruction selection, select the `AArch64::PAC` pseudo
using a TableGen pattern and post-process its `$AddrDisc` operand in a
custom inserter hook; this eliminates duplicating the logic for DAGISel
and GlobalISel. Furthermore, this improves cross-BB analysis in the case
of DAGISel.
I think it is OK that we always use v2i64 as the type of LDNP/STNP
nodes. Bitcasting the type should be fine for little endian. This helps
with #150125.
Unable to reproduce yet, but this definitely seems wrong. Better safe
than sorry.
No effect on codegen as far as I know (because I have not been able to
repro).
Consistently use `FlagsVT` for operands/results of nodes that
consume/produce NZCV flags.
Previously, some of the operands/results had the incorrect `MVT::Glue`
type, while others had the `MVT_CC` type, which is supposed to be used
for condition codes (the `AArch64CC::CondCode` enum).
Found by #150125.
Previously, separate load, zext and FMOV instructions were emitted. This
patch adds a new TableGen pattern to avoid the unnecessary FMOV. A test
is included in test/CodeGen/AArch64/load_u64_from_u32.ll.
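The kind of source pattern involved, reduced to illustrative C++ (the .ll test itself is IR; this is just the shape of the computation):
```
#include <cstdint>
#include <cstring>

// Load a u32, zero-extend it to u64, and use the bits as an FP/SIMD value.
// Ideally this lowers to a single 32-bit load into an FP/SIMD register
// rather than an integer load + zext + fmov.
static double u64_bits_from_u32(const uint32_t *p) {
  uint64_t bits = *p;               // load + implicit zero-extension
  double d;
  std::memcpy(&d, &bits, sizeof d); // reinterpret the bits in an FP register
  return d;
}
```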
These unused includes were identified by misc-include-cleaner. I've
filtered out those whose removal breaks builds. I'm also staying away
from llvm-config.h, config.h, and Compiler.h, as removing them would
likely cause platform- or compiler-specific build failures.