This was requiring all fast math flags, which is practically
useless. This wouldn't fire using all the standard OpenCL fast math
flags. This only needs afn, nnan and ninf.
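As a rough illustration of the difference between the two checks, here is a minimal sketch. The flag bits and function names are hypothetical and do not mirror LLVM's FastMathFlags encoding or API; the sketch only shows "require every flag" versus "require afn, nnan and ninf".

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical flag bits for illustration only; not LLVM's real encoding.
enum FMF : std::uint8_t {
  NNan = 1 << 0,
  NInf = 1 << 1,
  NSZ = 1 << 2,
  ARcp = 1 << 3,
  Contract = 1 << 4,
  AFN = 1 << 5,
  Reassoc = 1 << 6,
  AllFast = NNan | NInf | NSZ | ARcp | Contract | AFN | Reassoc
};

// Old-style check: matches only when every fast math flag is present.
constexpr bool matchesWithAllFlags(std::uint8_t Flags) {
  return (Flags & AllFast) == AllFast;
}

// Relaxed check: matches as soon as afn, nnan and ninf are present.
constexpr bool matchesWithNeededFlags(std::uint8_t Flags) {
  return (Flags & (AFN | NNan | NInf)) == (AFN | NNan | NInf);
}
```

With only afn, nnan and ninf set, the first predicate rejects the input and the second accepts it.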
https://reviews.llvm.org/D158904
This is a way to prevent the register allocator from inserting instructions
which behave differently for different runtime vector-lengths, inside a
call-sequence which changes the streaming-SVE mode before/after the call.
I've considered using BUNDLEs in Machine IR, but found that this is
not possible for a few reasons:
* Most passes don't look inside BUNDLEs, but some passes would need to
look inside these call-sequence bundles, for example the PrologEpilog
pass (to remove the CALLSEQSTART/END), a PostRA pass to remove COPY
instructions, or the AArch64PseudoExpand pass.
* Within the streaming-mode-changing call sequence, one of the instructions
is a CALLSEQEND (AArch64::ADJCALLSTACKUP). The corresponding CALLSEQSTART
is outside this sequence. This means we'd end up with a BUNDLE that has
[SMSTART, COPY, BL, ADJCALLSTACKUP, COPY, SMSTOP]. The MachineVerifier
doesn't accept this, and we also can't move the CALLSEQSTART into the
call sequence.
Maybe in the future we could model this differently by modelling
the runtime vector-length as a value that's used by certain operations
(similar to e.g. NZCV flags) and clobbered by SMSTART/SMSTOP, such that the
register allocator can consider these as actual dependences and avoid
rematerialization. For now we just want to address the immediate problem.
Reviewed By: paulwalker-arm, aemerson
Differential Revision: https://reviews.llvm.org/D159193
When a function is compiled to be in Streaming(-compatible) mode, the full
set of SVE instructions may not be available. This patch adds an interface
to query that and changes the codegen for FADDA (not legal in Streaming-SVE
mode) to instead be expanded for fixed-length vectors, or otherwise not to
code-generate for scalable vectors.
Reviewed By: david-arm
Differential Revision: https://reviews.llvm.org/D156109
Extension to the signbit case: if the signbits extend down through all the demanded bits, then SMIN/SMAX/UMIN/UMAX nodes can be simplified to OR/AND/AND/OR respectively.
Alive2: https://alive2.llvm.org/ce/z/mFVFAn (general case)
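The simplest instance of this can be checked by brute force. The sketch below is not the DAG combine itself; it assumes all 8 bits are demanded and that both operands are all-sign-bits values (0x00 or 0xFF), and the function name is made up for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Returns true if, for every pair of 8-bit all-sign-bits values (0x00 or
// 0xFF), smin/smax/umin/umax agree with or/and/and/or respectively.
bool signSplatMinMaxMatchesLogic() {
  const std::uint8_t Vals[] = {0x00, 0xFF};
  for (std::uint8_t UA : Vals) {
    for (std::uint8_t UB : Vals) {
      const auto SA = static_cast<std::int8_t>(UA);
      const auto SB = static_cast<std::int8_t>(UB);
      const auto Or = static_cast<std::uint8_t>(UA | UB);
      const auto And = static_cast<std::uint8_t>(UA & UB);
      if (static_cast<std::uint8_t>(std::min(SA, SB)) != Or)
        return false; // smin -> or
      if (static_cast<std::uint8_t>(std::max(SA, SB)) != And)
        return false; // smax -> and
      if (std::min(UA, UB) != And)
        return false; // umin -> and
      if (std::max(UA, UB) != Or)
        return false; // umax -> or
    }
  }
  return true;
}
```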
Differential Revision: https://reviews.llvm.org/D158364
LLVM currently emits `.note.GNU-stack` sections on all ELF targets.
However, Solaris ld doesn't know/care about them. Even worse, with the
revised Solaris GNU ld patch (D85309 <https://reviews.llvm.org/D85309>),
there are hundreds of warnings:
/usr/gnu/bin/ld: warning: /usr/lib/amd64/crtn.o: missing .note.GNU-stack
section implies executable stack
/usr/gnu/bin/ld: NOTE: This behaviour is deprecated and will be removed
in a future version of the linker
The Solaris crts are not going to change here, and even if they were, GNU
ld would emit `PT_GNU_STACK` segments that Solaris `ld.so.1` ignores.
So the note sections are completely useless on Solaris and this patch
disables their creation.
Instead, Solaris has its own mechanisms to control stack executability:
`PT_SUNW_STACK`, `DT_SUNW_SX_NXSTACK` and the system-wide control via
`sxadm` where `nxstack` defaults to on.
Tested on `amd64-pc-solaris2.11` and `sparcv9-sun-solaris2.11` with Solaris
ld and GNU ld, and `x86_64-pc-linux-gnu`.
Differential Revision: https://reviews.llvm.org/D159179
This adds some more extensive test coverage for frem through global isel,
making sure that vector types are all scalarized and all fp16 become f32
libcalls.
Now that an earlier commit adopted SDAG's heuristics for expanding 32/64b
fpimms to either GPR materializations or CP loads, we can improve the
localizer to understand the same heuristics. This avoids localizing
expensive immediates, as that increases code size.
The combination of these two changes results in minor improvements in CTMark -Os,
and bigger improvements in some other cases.
If the high and low 32 bits are the same, we try to use
(ADD X, (SLLI X, 32)) but that only works if bit 31 is clear since
the low 32 bits will be sign extended.
If we have Zba we can use add.uw to zero the sign extended bits.
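The arithmetic can be modelled in plain C++ as a sanity check. This is a sketch of the instruction semantics only, not the materialization code; the helper names are hypothetical, and X is assumed to hold the low 32 bits sign-extended to 64 bits, as on RV64.

```cpp
#include <cassert>
#include <cstdint>

// Sign-extend a 32-bit value to 64 bits, as RV64 does for a 32-bit constant.
constexpr std::uint64_t sext32(std::uint32_t Lo) {
  return static_cast<std::uint64_t>(
      static_cast<std::int64_t>(static_cast<std::int32_t>(Lo)));
}

// Models (ADD X, (SLLI X, 32)) with X = sext32(Lo).
constexpr std::uint64_t addSlli(std::uint32_t Lo) {
  const std::uint64_t X = sext32(Lo);
  return X + (X << 32);
}

// Models Zba add.uw rd, rs1, rs2: rd = rs2 + zero-extended low word of rs1.
constexpr std::uint64_t addUw(std::uint64_t Rs1, std::uint64_t Rs2) {
  return Rs2 + (Rs1 & 0xFFFFFFFFull);
}

// Models (ADD.UW X, (SLLI X, 32)) with X = sext32(Lo).
constexpr std::uint64_t addUwSlli(std::uint32_t Lo) {
  const std::uint64_t X = sext32(Lo);
  return addUw(X, X << 32);
}
```

For 0x12345678 both forms give 0x1234567812345678. For 0x80000001 the plain ADD form gives 0x8000000080000001 because the sign-extended upper bits leak into the sum, while the add.uw form gives the intended 0x8000000180000001.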
Reviewed By: reames, wangpc
Differential Revision: https://reviews.llvm.org/D159253
With the large code model, the label difference may not fit into 32 bits.
Even if we assume that any individual function is no larger than 2^32
and use a difference from the function entry to the target destination,
things like BOLT can rearrange blocks (even if BOLT doesn't necessarily
work with the large code model right now).
set directives avoid static relocations in some 32-bit entry cases, but
don't worry about set directives for 64-bit jump table entries (we can
do that later if somebody really cares about it).
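To make the overflow concern concrete, here is a minimal sketch that checks whether a label difference is representable in a 32-bit entry. The function name and the addresses used with it are hypothetical; this is not code from the patch.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Returns true if the signed distance Target - Base fits in a 32-bit
// jump table entry. Both addresses are hypothetical inputs.
constexpr bool fitsIn32BitEntry(std::uint64_t Target, std::uint64_t Base) {
  const std::int64_t Diff = static_cast<std::int64_t>(Target - Base);
  return Diff >= std::numeric_limits<std::int32_t>::min() &&
         Diff <= std::numeric_limits<std::int32_t>::max();
}
```

A destination 8 GiB away from the base fails this check, which is the situation a 64-bit entry is meant to cover.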
check-llvm in a bootstrapped clang with the large code model passes.
Fixes #62894
Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D159297
but there are some in the epilogue.
Make a decision whether or not to have a startepilogue/endepilogue
based on whether we actually insert SEH opcodes in the epilogue,
rather than whether we had SEH opcodes in the prologue or not.
This fixes an assert failure when there are no SEH opcodes in the
prologue but there are SEH opcodes in the epilogue (for example, when
there is no stack frame but there are stack arguments) which was not
covered in https://reviews.llvm.org/D88641.
Assertion failed: HasWinCFI == MF.hasWinCFI(), file C:\Users\hiroshi\llvm-project\llvm\lib\Target\AArch64\AArch64FrameLowering.cpp, line 1988
Differential Revision: https://reviews.llvm.org/D159238
The CodeView `S_ARMSWITCHTABLE` debug symbol is used to describe the layout of a jump table. It contains the following information:
* The address of the branch instruction that uses the jump table.
* The address of the jump table.
* The "base" address that the values in the jump table are relative to.
* The type of each entry (an absolute pointer, a relative integer, or a relative integer that is shifted).
Together this information can be used by debuggers and binary analysis tools to understand what a jump-table indirect branch is doing and where it might jump to.
Documentation for the symbol can be found in the Microsoft PDB library dumper: 0fe89a942f/cvdump/dumpsym7.cpp (L5518)
This change adds support to LLVM to emit the `S_ARMSWITCHTABLE` debug symbol as well as to dump it out (for testing purposes).
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D149367
A shuffle of v256i1 with a large enough minimum vlen might make it through type
legalization and into lowering. In this case, zvl1024b was enough. The
bitreverse shuffle lowering would then try to convert this to a v1i256 type
which is invalid (v1i128 exists though, which is why the existing v128i1 tests
were fine).
This patch checks to make sure that the new type is not only legal but also
valid.
Reviewed By: craig.topper, reames
Differential Revision: https://reviews.llvm.org/D159215
Summary:
There is no support in XCOFF for labels on common symbols. Therefore, an alias for a common symbol is not supported. Issue an error in the front end when an aliasee is a common symbol. Issue a similar error in the back end in case an IR specifies an alias for a common symbol.
Reviewed by: hubert.reinterpretcast, DiggerLin
Differential Revision: https://reviews.llvm.org/D158739
This patch contains a few changes:
* It changes the alignment of the strided/contiguous ZPR2/ZPR4 registers to
128 bits. This is important, because when we spill these registers to the
stack, the address doesn't need to be 256/512 bits aligned because we
split the single-store/reload pseudo instruction up into multiple
STR_ZXI/LDR_ZXI (single vector store/load) instructions, which only
require a 128-bit alignment. Additionally, an alignment larger than the
stack-alignment is not supported for scalable vectors.
* It adds support for these register classes in storeRegToStackSlot,
loadRegFromStackSlot and copyPhysReg.
* It adds tests only for the strided forms. There is no need to also
test the contiguous forms, because registers such as z2_z3 or
z4_z5_z6_z7 are also part of the regular ZPR2 and ZPR4 register classes,
respectively, which are already covered and tested.
Reviewed By: dtemirbulatov
Differential Revision: https://reviews.llvm.org/D159189
We can just bitcast the pre-shifted mask as an integer and use TEST/BT directly.
This can be extended further to better handle sub-i8 mask cases, but just getting rid of KSHIFTR nodes makes a notable difference.
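The equivalence being relied on can be shown with ordinary integers. This sketch assumes a 16-bit mask and a bit index below 16, and the function names are hypothetical; it models the bit test only, not the instruction selection.

```cpp
#include <cassert>
#include <cstdint>

// Shift-first form: move bit Idx down to bit 0, then test it.
constexpr bool testBitViaShift(std::uint16_t Mask, unsigned Idx) {
  return ((Mask >> Idx) & 1u) != 0;
}

// Direct form: test bit Idx of the unshifted mask in place.
constexpr bool testBitDirect(std::uint16_t Mask, unsigned Idx) {
  return (Mask & (1u << Idx)) != 0;
}

// Returns true if both forms agree for every bit index of the given mask.
constexpr bool formsAgree(std::uint16_t Mask) {
  for (unsigned Idx = 0; Idx < 16; ++Idx)
    if (testBitViaShift(Mask, Idx) != testBitDirect(Mask, Idx))
      return false;
  return true;
}
```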
When replacing ComplexDeinterleavingPass::ReductionOperation, we can do it
either from the Real or Imaginary part. The correct way is to take whichever
is later in the BasicBlock, but before the patch, we just always took the
Real part.
Fixes https://github.com/llvm/llvm-project/issues/65044
Differential Revision: https://reviews.llvm.org/D159209
For inline asm with memory operands, we can merge the offset into
the second operand of memory constraint operands.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D158062
This adds some more extensive test coverage for fdiv through global isel,
switching the opcodes to use the more complete ActionDefinitions to handle more
cases and moving it into the position of the existing code which is no longer
needed.
This re-implements the special casing we had in lowerScalarSplat as a DAG combine. As can be seen in the tests, this ends up triggering in a bunch more cases.
The semantically interesting bit of this change is the use of the implicit truncate semantics for when XLEN > SEW. We'd already been doing this for vmv.v.x, but this change extends e.g. the constant matching to make the same assumption about vmv.s.x. Per my reading of the specification, this should be fine, and if anything, is more obviously true of vmv.s.x than vmv.v.x.
Differential Revision: https://reviews.llvm.org/D158874
We'd discussed this in the original set of patches months ago, but decided against it. I think we should reverse ourselves here as the code is significantly more readable, and we do pick up cases we'd missed by not calling the appropriate helper routine.
Differential Revision: https://reviews.llvm.org/D158854
There are really two rounding modes, so only return the standard
values if both modes are the same. Otherwise, return a bitmask
representing the two modes.
Annoyingly the register doesn't use the same values as FLT_ROUNDS. Use
a simple integer table we can shift into to convert.
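A table of this kind can be sketched as a packed integer with one 4-bit slot per hardware value. The hardware encoding below is an assumption made for illustration (0 = nearest even, 1 = toward +inf, 2 = toward -inf, 3 = toward zero), and the names are hypothetical; the FLT_ROUNDS values are the standard C ones (0 = toward zero, 1 = to nearest, 2 = toward +inf, 3 = toward -inf). Only the single-mode conversion is shown, not the two-mode bitmask.

```cpp
#include <cassert>
#include <cstdint>

// Packed lookup table: slot N (4 bits wide) holds the FLT_ROUNDS value for
// the assumed hardware rounding mode N.
constexpr std::uint32_t FltRoundsTable = (1u << 0)    // hw 0: to nearest
                                         | (2u << 4)  // hw 1: toward +inf
                                         | (3u << 8)  // hw 2: toward -inf
                                         | (0u << 12); // hw 3: toward zero

// Converts an assumed 2-bit hardware mode to FLT_ROUNDS by shifting the
// matching slot down and masking it out.
constexpr int hwModeToFltRounds(unsigned HwMode) {
  return static_cast<int>((FltRoundsTable >> (HwMode * 4)) & 0xFu);
}
```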
https://reviews.llvm.org/D153158