Eliding the vReg to NZCV conversion instruction for G_UADDE/... is illegal if
it causes the carry generating instruction to become dead because ISel
will just remove the dead instruction.
I accidentally introduced this here: https://reviews.llvm.org/D153164.
As far as I can tell, this is not exposed on the default clang settings,
because on O0 there is always a G_AND between boolean defs and uses, so
the optimization doesn't apply. Thus, when I tried to commit
https://reviews.llvm.org/D159140, which removes these G_ANDs on O0, I
broke some UBSan tests.
We fix this by recursively selecting the previous (NZCV-setting) instruction before continuing selection for the current instruction.
It's not a verifier enforced property that implicit_def may only have
one operand. Fixes assertions after the coalescer implicit-defs to
preserve super register liveness to arbitrary instructions.
For some reason I'm unable to reproduce this as a MIR test running only
the allocator for the x86 test. Not sure it's worth keeping around.
Teach the LiveIntervals path in isPlainlyKilled to handle physical
registers, to get equivalent functionality with the LiveVariables path.
Test this by adding -early-live-intervals RUN lines to a handful of
tests that would fail without this.
This reverts revert 19505072123e43eccf528b660973067b5c9b4a26.
An issue was fixed in bea3684944c0d7962cd53ab77aad756cfee76b7c
and some newly appeared tests updated.
The issue #55208 noticed that std::rint is vectorized by the
SLPVectorizer, but a very similar function, std::lrint, is not.
std::lrint corresponds to ISD::LRINT in the SelectionDAG, and
std::llrint is a familiar cousin corresponding to ISD::LLRINT. Now,
neither ISD::LRINT nor ISD::LLRINT have a corresponding vector variant,
and the LangRef makes this clear in the documentation of llvm.lrint.*
and llvm.llrint.*.
This patch extends the LangRef to include vector variants of
llvm.lrint.* and llvm.llrint.*, and lays the necessary ground-work of
scalarizing it for all targets. However, this patch would be devoid of
motivation unless we show the utility of these new vector variants.
Hence, the RISCV target has been chosen to implement a custom lowering
to the vfcvt.x.f.v instruction. The patch also includes a CostModel for
RISCV, and a trivial follow-up can potentially enable the SLPVectorizer
to vectorize std::lrint and std::llrint, fixing #55208.
The patch includes tests, obviously for the RISCV target, but also for
the X86, AArch64, and PowerPC targets to justify the addition of the
vector variants to the LangRef.
This is done by lowering v16i8 loads into LoadV4 operations with i32
results instead of letting ReplaceLoadVector split it into smaller
loads during legalization. This is done at dag-combine1 time, so that
vector operations with i8 elements can be optimised away instead of
being needlessly split during legalization, which involves storing to
the stack and loading it back.
Similar to what we already do for add/sub + saturation variants.
Scalar support will be added in a future patch covering the other variants at the same time.
Alive2: https://alive2.llvm.org/ce/z/rBDrNEFixes#69080
The RHS of a shift can have a different type than the LHS. If there are
undefs in the vector, we need the undef added to the RHS to match the
type of any shift amounts that are also added to the vector.
For now just don't add shifts if their RHS and LHS don't match.
PR #67391 improved atomic codegen by handling memory ordering specified
by the `cmpxchg` instruction. An acquire barrier needs to be generated
when memory ordering includes an acquire operation. This PR improves the
codegen further by only handling the failure ordering.
This patch consists of porting SDISel patterns of SHXADD instructions to
GISel.
Note that `non_imm12`, a predicate that was implemented with `PatLeaf`,
is now turned into a `PatFrag` of `<op>_with_non_imm12` where `op` is
the operator that uses `the non_imm12` operand, as GISel doesn't have
equivalence of `PatLeaf` at this moment.
The MVETRUNC operation can perform the same truncate of two vectors, without
requiring lane inserts/extracts from every vector lane. This moves the concat
i1 lowering to use it for v8i1 and v16i1 result types, trading a bit of extra
stack space for less instructions.
When the code is built with -mbackchain, it is possible to retrieve the
caller's frame and return addresses. GCC already can do this, add this
support to Clang as well. Use RISCVTargetLowering and GCC's
s390_return_addr_rtx() as inspiration. Add tests based on what GCC is
emitting.
Waterfall loop is overwriting SCC bit of status register. Make sure SCC
bit is saved and restored across.
We need to save/restore only in cases where SCC is live across waterfall
loop.
Co-authored-by: Sirish Pande <sirish.pande@amd.com>
The image intrinsic optimizer pass was neglecting to check any arguments
of the load intrinsic after the VAddr arguments. For example multiple
loads from different resources should not have been combined but were,
because the pass was not checking the resource argument.
If `C0` is a mask and `C1` shifts out all the masked bits (to
essentially compare two subsets of `X`), we can arbitrarily re-order
shift as `srl` or `shl`.
If `C1` (shift amount) is a power of 2, we can replace the and+shift
with a rotate.
Otherwise, based on target preference we can arbitrarily swap `shl`
and `shl` in/out to get better constants.
On x86 we can use this re-ordering to:
1) get better `and` constants for `C0` (zero extended moves or
avoid imm64).
2) covert `srl` to `shl` if `shl` will be implementable with `lea`
or `add` (both of which can be preferable).
Proofs: https://alive2.llvm.org/ce/z/qzGM_w
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D152116
This patch is a bandage to get G_FRAME_INDEX working. We could import
the SelectionDAG patterns for the ComplexPattern FrameAddrRegImm, and
perhaps we will do that in the future. For now we just select it as an
addition with 0.
The `YAMLParser.h` header file claims support for YAML 1.2 with a few
deviations, but our plain scalar parsing failed to parse some valid YAML
according to the spec. This change puts us more in compliance with the
YAML spec, now letting us parse plain scalars containing additional
special characters in cases where they are not ambiguous.
Enable FoldImmediate for X86 by implementing X86InstrInfo::FoldImmediate.
Also enhanced peephole by deleting identical instructions after FoldImmediate.
Differential Revision: https://reviews.llvm.org/D151848