The MVETRUNC operation can perform the same truncate of two vectors, without
requiring lane inserts/extracts from every vector lane. This moves the concat
i1 lowering to use it for v8i1 and v16i1 result types, trading a bit of extra
stack space for less instructions.
When the code is built with -mbackchain, it is possible to retrieve the
caller's frame and return addresses. GCC already can do this, add this
support to Clang as well. Use RISCVTargetLowering and GCC's
s390_return_addr_rtx() as inspiration. Add tests based on what GCC is
emitting.
Waterfall loop is overwriting SCC bit of status register. Make sure SCC
bit is saved and restored across.
We need to save/restore only in cases where SCC is live across waterfall
loop.
Co-authored-by: Sirish Pande <sirish.pande@amd.com>
The image intrinsic optimizer pass was neglecting to check any arguments
of the load intrinsic after the VAddr arguments. For example multiple
loads from different resources should not have been combined but were,
because the pass was not checking the resource argument.
If `C0` is a mask and `C1` shifts out all the masked bits (to
essentially compare two subsets of `X`), we can arbitrarily re-order
shift as `srl` or `shl`.
If `C1` (shift amount) is a power of 2, we can replace the and+shift
with a rotate.
Otherwise, based on target preference we can arbitrarily swap `shl`
and `shl` in/out to get better constants.
On x86 we can use this re-ordering to:
1) get better `and` constants for `C0` (zero extended moves or
avoid imm64).
2) covert `srl` to `shl` if `shl` will be implementable with `lea`
or `add` (both of which can be preferable).
Proofs: https://alive2.llvm.org/ce/z/qzGM_w
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D152116
This patch is a bandage to get G_FRAME_INDEX working. We could import
the SelectionDAG patterns for the ComplexPattern FrameAddrRegImm, and
perhaps we will do that in the future. For now we just select it as an
addition with 0.
The `YAMLParser.h` header file claims support for YAML 1.2 with a few
deviations, but our plain scalar parsing failed to parse some valid YAML
according to the spec. This change puts us more in compliance with the
YAML spec, now letting us parse plain scalars containing additional
special characters in cases where they are not ambiguous.
Enable FoldImmediate for X86 by implementing X86InstrInfo::FoldImmediate.
Also enhanced peephole by deleting identical instructions after FoldImmediate.
Differential Revision: https://reviews.llvm.org/D151848
There was an error in decoding shift type, which permitted shift types
other than LSL to be (incorrectly) folded into the addressing mode of a
load/store instruction.
If a function has odd number of same type of registers to save, and the
calling convention also requires odd number of such type of CSRs, an FP
register would be accidentally marked as saved when producePairRegisters
returns true.
This patch also fixes the AArch64LowerHomogeneousPrologEpilog pass not
handling AArch64::NoRegister; actually this pass must be fixed along
with the register pairing so i can write a test for it.
This patch commits tests that can be optimized by improving
performCONCAT_VECTORCombine to do a better job at decomposing the base
pointer and recognizing a constant offset.
V3 has been deprecated for a while as well, so it can safely be removed
like V2 was removed.
- [Clang] Set minimum code object version to 4
- [lld] Fix tests using code object v3
- Remove code object V3 from the AMDGPU backend, and delete or port v3
tests to v4.
- Update docs to make it clear V3 can no longer be emitted.
`M68k_RTD` is really similar to X86's stdcall, in which callee pops the
arguments from stack. In LLVM IR it can be written as `m68k_rtdcc`.
This patch also improves how ExpandPseudo Pass handles popping stack at
function returns in the absent of the RTD instruction.
Differential Revision: https://reviews.llvm.org/D149864
The current lowering of statepoints does not take into account return
attributes present on the `gc.result` leading to different code being
generated than if one were to not use statepoints. These return
attributes can affect the ABI which is why it is important that they are
applied in the lowering.
This moves the legalization of G_FMA to the action builder that can handle more
types. The existing arm64-vfloatintrinsics.ll has been removed as they are
covered in other test files.
Update `LegalizerHelper::widenScalarMulo` to not create a mulo if we aren't going to use the overflow flag. This prevents needing to legalize the widened operation. This generates better code when we need to make a libcall for multiply.