For an <8 x i32> -> <2 x i128> bitcast, which under AArch64 is split into
two halves, the scalar i128 remainder was causing problems, leading to a
crash with invalid vector types. This makes sure such remainders are
handled correctly in fewerElementsBitcast.
These functions are for building G_PTR_ADDs when we know that the base
pointer and the result are both valid pointers into (or just after) the
same object. They are similar to SelectionDAG::getObjectPtrOffset.
This PR also switches call sites of the generic (build|materialize)PtrAdd
functions that implement the pointer arithmetic used to split large memory
accesses over to the new functions. Since memory accesses have to fit into an
object in memory, pointer arithmetic to an offset into a large memory
access also yields an address in that object.
Currently, these (build|materialize)ObjectPtrOffset functions only add
"nuw" to the generated G_PTR_ADD, but I intend to introduce an
"inbounds" MIFlag in a later PR (analogous to a concurrent effort in
SDAG: #131862, related: #140017, #141725) that will also be set in the
(build|materialize)ObjectPtrOffset functions.
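For illustration, a minimal sketch of what such a helper could look like
(hypothetical name and simplified signature, not the PR's exact code): it is
just a G_PTR_ADD tagged with the no-unsigned-wrap flag.

  #include "llvm/CodeGen/GlobalISel/MachineIRBuilder.h"

  using namespace llvm;

  // Hypothetical sketch of an object-offset helper: emit a G_PTR_ADD and
  // mark it "nuw", since an offset that stays within (or just after) the
  // base pointer's object cannot wrap the address space.
  static MachineInstrBuilder buildObjectPtrOffsetSketch(MachineIRBuilder &B,
                                                        const DstOp &Res,
                                                        const SrcOp &Base,
                                                        const SrcOp &Offset) {
    return B.buildPtrAdd(Res, Base, Offset).setMIFlag(MachineInstr::NoUWrap);
  }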
Most test changes just add "nuw" to G_PTR_ADDs. Exceptions are AMDGPU's
call-outgoing-stack-args.ll, flat-scratch.ll, and freeze.ll tests, where
offsets are now folded into scratch instructions, and cases where the
behavior of the check regeneration script changed, resulting, e.g., in
better checks for "nusw G_PTR_ADD" instructions, matched empty lines,
and the use of "CHECK-NEXT" in MIPS tests.
For SWDEV-516125.
This is a GISel equivalent of #130665, preventing a double-rounding
issue in sitofp/uitofp by scalarizing i64->f32 converts. Most of the
changes are made in the ActionDefinitionsBuilder for G_SITOFP/G_UITOFP.
Because an i64->f16 itofp can be converted without double-rounding, but an
f64->f16 fpround cannot, that variant is lowered by building the two
extends.
Instead of reporting ___memmove as an implementation of memcpy,
make it unavailable and let the lowering logic consider memmove as
a fallback path.
This avoids a special case 1:N mapping for libcall implementations.
If the pointer is aligned to more than the size of the vector, we can
widen the load up to the next power-of-2 size, as SDAG does.
Some of the v3 tests are currently worse; those should be addressed in
other issues.
This is the bare minimum to get the intrinsic to compile for AMDGPU,
and it's not optimal. We need to follow along closer with the existing
G_FMINNUM/G_FMAXNUM with custom lowering to handle the IEEE=0 case
better.
Just re-use the existing lowering for the old semantics of
G_FMINNUM/G_FMAXNUM. This does not change the treatment of
G_FMINNUM/G_FMAXNUM, nor does it try to handle the general expansion
without an underlying min/max variant (or with G_FMINIMUM/G_FMAXIMUM).
GlobalISel was previously inefficient in handling bitreverses of vector
types. This deals with i16, i32, i64 vector types and converts them into
i8 bitreverses and rev instructions.
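As a scalar illustration of that decomposition (plain C++, not the legalizer
code): a wide bit-reverse is a per-byte bit-reverse followed by a byte swap,
which is what the rev instructions provide.

  #include <cstdint>

  // Bit-reverse a single byte.
  static uint8_t reverseByte(uint8_t B) {
    uint8_t R = 0;
    for (int I = 0; I < 8; ++I)
      R |= ((B >> I) & 1u) << (7 - I);
    return R;
  }

  // A 32-bit bit-reverse is equivalent to bit-reversing each byte and then
  // reversing the byte order (the rev step).
  static uint32_t reverse32(uint32_t X) {
    uint32_t R = 0;
    for (int I = 0; I < 4; ++I) {
      uint8_t Byte = (X >> (8 * I)) & 0xFF;
      R |= uint32_t(reverseByte(Byte)) << (8 * (3 - I));
    }
    return R;
  }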
Handles bitreverse for vector types which were previously falling back
onto SelectionDAG. Includes 8-bit element vectors larger than 128 bits or
smaller than 64 bits (<32 x i8>, <4 x i8>) and odd vector types such as
<9 x i8>.
This is a reland of #138434 except that:
- the bits for llvm/lib/CodeGen/RenameIndependentSubregs.cpp
have been dropped because they caused a test failure under asan, and
- the bits for llvm/lib/CodeGen/SelectionDAG/ScheduleDAGFast.cpp have
been improved with structured bindings.
This reverts commit a9699a334bc9666570418a3bed9520bcdc21518b.
Breaks CodeGen/AMDGPU/collapse-endcf.ll in several configs
(sanitizer builds; macOS; possibly more), see comments on
https://github.com/llvm/llvm-project/pull/138434
LegalizerHelper::reduceLoadStoreWidth does not work for non-byte-sized
types, because this would require (un)packing of bits across byte
boundaries.
Precommit tests: #134904
This patch replaces:
llvm::copy(Src, std::back_inserter(Dst));
with:
llvm::append_range(Dst, Src);
for brevity.
One side benefit is that llvm::append_range eventually calls
llvm::SmallVector::reserve if Dst is an llvm::SmallVector.
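For reference, a minimal self-contained example of the replacement
(illustrative only; names are made up):

  #include "llvm/ADT/STLExtras.h"
  #include "llvm/ADT/SmallVector.h"
  #include <vector>

  void copyInto(const std::vector<int> &Src, llvm::SmallVector<int, 8> &Dst) {
    // Before: llvm::copy(Src, std::back_inserter(Dst));
    // After: a single call; with a SmallVector destination the needed space
    // can be reserved up front instead of growing element by element.
    llvm::append_range(Dst, Src);
  }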
Similar to other operations, s8, s16, s32 and s64 vector elements are
clamped to legal vector sizes, odd numbers of elements are widened to the
next power of 2, and s128 is scalarized.
This helps legalize cttz as well as ctpop.
Similar to other operations, s8, s16 and s32 vector elements are clamped
to legal vector sizes, but in this case s64 elements are scalarized to use
the GPR instructions. This allows vector types to be split as opposed to
scalarized.
From #106446, this adds a variant of getVectorIdxTy that returns an LLT.
Many uses only look at the width, so a getVectorIdxWidth was added as
the common base.
The SDAG version uses fminnum/fmaxnum; in converting it to fcmp+select,
it appears the order of the operands was chosen badly. This switches the
conditions used so that the constant stays on the RHS.
This reverts commit 36eaf0daf5d6dd665d7c7a9ec38ea22f27709fed.
This is not a sound approach to dealing with this instruction change.
The new behavior is a different opcode pair, not a modifier on the
existing opcode.
For targets that support IEEE fminimum_num/fmaximum_num, the
corresponding *_min_num_fXY/*_max_num_fXY instructions themselves
already canonicalize the inputs. As a result, we do not need to
explicitly canonicalize the inputs for fminnum/fmaxnum.
Non-power-of-2 vectors will now be padded with zero elements, and smaller
vectors will be widened using anyext, which I believe will be better in
many situations than padding with zeros, although some small types may
prefer being scalarized depending on the code. Padding with zeros may
not be best for all sizes (v5i8 being the worst); we can hopefully
improve that in the future, but these types no longer fall back. We
scalarize other types like i128.
This case is different from the <8 x i1> case handled earlier because it
triggers a legalization failure in lowerStore() that's intended for scalar code.
It was also triggering incorrect bitcast actions in the AArch64 rules, which
weren't expecting truncating stores.
With these two fixed, more cases are handled. The code is still bad, including
some missing load promotion in our combiners that results in dead stores hanging
around at the end of codegen. Again, we can fix these in separate changes.
Reviewers: davemgreen, madhur13490, topperc, arsenm
Reviewed By: davemgreen
Pull Request: https://github.com/llvm/llvm-project/pull/121185
This is essentially a port of TargetLowering::scalarizeVectorStore(), which
is used for the case where we have something like a store of <8 x s8> truncating
to <8 x s1> in memory. The naive lowering is a sequence of extracts to compute
a scalar value to store.
AArch64's DAG implementation has some more smarts to improve this further which
we can do later.
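As a plain C++ illustration of the value the naive lowering computes for an
<8 x s8> -> <8 x s1> truncating store (the legalizer emits generic
extract/shift/or instructions rather than this code, and the bit order shown
assumes little-endian lane numbering):

  #include <cstdint>

  // Scalar equivalent of the naive lowering: take each lane, keep only its
  // low bit, and OR it into the right position of the single byte that is
  // then stored to memory.
  uint8_t packLanesToStoredByte(const uint8_t (&Lanes)[8]) {
    uint8_t Packed = 0;
    for (unsigned I = 0; I != 8; ++I)
      Packed |= static_cast<uint8_t>((Lanes[I] & 1u) << I);
    return Packed;
  }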
Reviewers: topperc, davemgreen
Pull Request: https://github.com/llvm/llvm-project/pull/121169
Convert the LLT to EVT and call
TargetLowering::shouldExpandCmpUsingSelects to determine if we should do
this.
We don't have a getSetccResultType, so I'm boolean extending the
compares to the result type and using that. If the compares legalize to
the same type, these extends will get removed. Unfortunately, if the
compares legalize to a different type, we end up with truncates or
extends that might not be optimally placed.
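For context, the two expansion strategies the hook chooses between look
roughly like this for a signed three-way compare (plain C++ illustration,
not the legalizer code):

  #include <cstdint>

  // Select-based expansion: two compares feeding nested selects.
  int8_t scmpWithSelects(int32_t A, int32_t B) {
    return A < B ? int8_t(-1) : (A > B ? int8_t(1) : int8_t(0));
  }

  // Compare/subtract expansion: boolean-extend both compares to the result
  // type and subtract them, avoiding the selects.
  int8_t scmpWithSub(int32_t A, int32_t B) {
    return static_cast<int8_t>(int8_t(A > B) - int8_t(A < B));
  }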
This, like other operations, scalarizes shuffle vector operations with
types larger than 64 bits. ImplicitDef and Freeze are also handled the
same way, to allow them to legalize. The legalization of
fewerElementsVectorShuffle is adjusted to handle scalarization.
This allows us to support i128 G_ICMP on RV32. I'm not sure how to test
the "left over" part of this as RISC-V always widens to a power of 2
before narrowing.
Retain LLT type information by creating new LLTs from the original LLT
instead of only using the original scalar size.
This PR prepares for the [LLT FPInfo
RFC](https://discourse.llvm.org/t/rfc-globalisel-adding-fp-type-information-to-llt/83349/24)
where LLTs will carry additional floating point type information in
addition to the scalar size.
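As a rough sketch of the pattern (hypothetical helper, assuming
LLT::changeElementSize and the current header location):

  #include "llvm/CodeGenTypes/LowLevelType.h"

  using namespace llvm;

  // Hypothetical helper: derive the narrowed type from the original LLT
  // rather than rebuilding it from a raw bit width, so any extra information
  // the LLT carries (e.g. future FP info) is preserved.
  LLT getNarrowedType(LLT OrigTy, unsigned NewSizeInBits) {
    // Before (keeps only the size): return LLT::scalar(NewSizeInBits);
    return OrigTy.changeElementSize(NewSizeInBits);
  }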
Each call to push_back contains a check to see if the vector needs to
grow. Using resize or giving the size to the constructor can reduce
the number of checks for growing.
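A minimal illustration of the difference (generic example, not the patched
code):

  #include "llvm/ADT/SmallVector.h"

  llvm::SmallVector<unsigned, 8> squaresPushBack(unsigned N) {
    llvm::SmallVector<unsigned, 8> V;
    for (unsigned I = 0; I != N; ++I)
      V.push_back(I * I); // every call re-checks whether V must grow
    return V;
  }

  llvm::SmallVector<unsigned, 8> squaresPreSized(unsigned N) {
    llvm::SmallVector<unsigned, 8> V(N); // size set once up front
    for (unsigned I = 0; I != N; ++I)
      V[I] = I * I; // plain stores, no growth checks
    return V;
  }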
narrowScalarAddSub was creating a virtual register and then overwriting
the Register variable without using it. Add an else and only create it
when needed.
When we have legal instructions, we want to promote to sXLen and let isel
pattern matching remove the and/sext_inreg.
When using a libcall we want to use a 'si' libcall for small types
instead of 'di'. To match the RV64 ABI, we need to sign extend `unsigned
int` arguments. We reuse the shouldSignExtendTypeInLibCall hook from
SelectionDAG.
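For reference, the effect on a 32-bit unsigned argument looks like this
(plain C++ illustration of the RV64 calling convention, not the GISel code):

  #include <cstdint>

  // RV64 passes 32-bit integer arguments sign-extended to 64 bits, even when
  // the C type is unsigned, so the caller must in effect do this before the
  // libcall:
  int64_t extendUnsignedIntForRV64(uint32_t Arg) {
    return static_cast<int32_t>(Arg); // sign-extend bit 31 into the upper half
  }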