This case differs from the earlier <8 x i1> case because it triggers
a legalization failure in lowerStore() that's intended for scalar code.
It was also triggering incorrect bitcast actions in the AArch64 rules that weren't
expecting truncating stores.
With these two fixed, more cases are handled. The code is still bad, including
some missing load promotion in our combiners that results in dead stores hanging
around at the end of codegen. Again, we can fix these in separate changes.
Reviewers: davemgreen, madhur13490, topperc, arsenm
Reviewed By: davemgreen
Pull Request: https://github.com/llvm/llvm-project/pull/121185
This is essentially a port of TargetLowering::scalarizeVectorStore(), which
is used for the case where we have something like a store of <8 x s8> truncating
to <8 x s1> in memory. The naive lowering is a sequence of extracts to compute
a scalar value to store.
AArch64's DAG implementation has some more smarts to improve this further which
we can do later.
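As a rough illustration of the naive lowering described above, the sketch below packs the low bit of each of eight byte-sized lanes into a single byte that could then be stored. The function name `packTruncatedLanes` is hypothetical, and the lane-to-bit order (lane 0 in bit 0) is an assumption that matches a little-endian layout; it is not the legalizer code itself.

```cpp
#include <array>
#include <cstdint>

// Hypothetical helper: models "extract each lane, truncate to 1 bit,
// shift into place, OR together" for an <8 x s8> -> <8 x s1> store.
// Assumption: lane i lands in bit i (little-endian ordering).
std::uint8_t packTruncatedLanes(const std::array<std::uint8_t, 8>& lanes) {
  std::uint8_t packed = 0;
  for (unsigned i = 0; i < 8; ++i) {
    // Keep only the low bit of the lane, then move it to bit position i.
    packed |= static_cast<std::uint8_t>((lanes[i] & 1u) << i);
  }
  return packed;
}
```

The single packed byte is what ends up in memory, which is why the naive form needs one extract per lane.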
Reviewers: topperc, davemgreen
Pull Request: https://github.com/llvm/llvm-project/pull/121169
Convert the LLT to EVT and call
TargetLowering::shouldExpandCmpUsingSelects to determine if we should do
this.
We don't have a getSetccResultType, so I'm boolean extending the
compares to the result type and using that. If the compares legalize to
the same type, these extends will get removed. Unfortunately, if the
compares legalize to a different type, we end up with truncates or
extends that might not be optimally placed.
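For reference, a three-way compare can be expanded in two common ways: as a chain of selects, or as the difference of two boolean compares. The scalar sketch below shows both on plain `int`; the function names are hypothetical and this is only a model of the semantics, not the legalizer's output.

```cpp
// Select-based expansion: pick -1, 1 or 0 depending on the two compares.
int threeWayViaSelects(int a, int b) {
  return a < b ? -1 : (a > b ? 1 : 0);
}

// Arithmetic expansion: extend both compare results and subtract them.
int threeWayViaSub(int a, int b) {
  return static_cast<int>(a > b) - static_cast<int>(a < b);
}
```

Both forms return the same value for every input; which one is cheaper depends on the target, which is the question the hook answers.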
This, like other operations, scalarizes shuffle vector operations with
types larger than 64 bits. ImplicitDef and Freeze are also handled the
same way, to allow them to legalize. The legalization of
fewerElementsVectorShuffle is adjusted to handle scalarization.
This allows us to support i128 G_ICMP on RV32. I'm not sure how to test
the "left over" part of this as RISC-V always widens to a power of 2
before narrowing.
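The idea behind narrowing a wide integer compare can be sketched on a smaller scale: below, a 64-bit value stands in for the wide type and is compared using only 32-bit pieces. The names `Parts`, `splitParts`, `narrowEq` and `narrowUlt` are hypothetical, and the exact instruction sequence the legalizer emits may differ.

```cpp
#include <cstdint>

// Two 32-bit pieces of a 64-bit value (stand-in for i128 on a 32-bit target).
struct Parts {
  std::uint32_t lo;
  std::uint32_t hi;
};

Parts splitParts(std::uint64_t v) {
  return {static_cast<std::uint32_t>(v), static_cast<std::uint32_t>(v >> 32)};
}

// Equality: XOR each pair of pieces, OR the results, compare against zero.
bool narrowEq(std::uint64_t a, std::uint64_t b) {
  Parts x = splitParts(a), y = splitParts(b);
  return ((x.lo ^ y.lo) | (x.hi ^ y.hi)) == 0;
}

// Unsigned less-than: the high pieces decide unless they are equal,
// in which case the low pieces decide.
bool narrowUlt(std::uint64_t a, std::uint64_t b) {
  Parts x = splitParts(a), y = splitParts(b);
  return x.hi == y.hi ? x.lo < y.lo : x.hi < y.hi;
}
```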
Retain LLT type information by creating new LLTs from the original LLT
instead of only using the original scalar size.
This PR prepares for the [LLT FPInfo
RFC](https://discourse.llvm.org/t/rfc-globalisel-adding-fp-type-information-to-llt/83349/24)
where LLTs will carry additional floating point type information in
addition to the scalar size.
Each call to push_back contains a check to see if the vector needs to
grow. Using resize or giving the size to the constructor can reduce
the number of checks for growing.
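A minimal sketch of the two patterns, using `std::vector` as a stand-in for the LLVM containers involved (the function names are hypothetical):

```cpp
#include <cstddef>
#include <vector>

// Every push_back checks whether the vector needs to grow.
std::vector<int> squaresWithPushBack(std::size_t n) {
  std::vector<int> out;
  for (std::size_t i = 0; i < n; ++i)
    out.push_back(static_cast<int>(i * i));
  return out;
}

// Passing the size to the constructor allocates once; the loop only
// assigns elements and never checks for growth.
std::vector<int> squaresPresized(std::size_t n) {
  std::vector<int> out(n);
  for (std::size_t i = 0; i < n; ++i)
    out[i] = static_cast<int>(i * i);
  return out;
}
```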
narrowScalarAddSub was creating a virtual register and then overwriting
the Register variable without using it. Add an else and only create it
when needed.
When we have legal instructions we want to promote to sXLen and let isel
pattern matching remove the and/sext_inreg.
When using a libcall we want to use a 'si' libcall for small types
instead of 'di'. To match the RV64 ABI, we need to sign extend `unsigned
int` arguments. We reuse the shouldSignExtendTypeInLibCall hook from
SelectionDAG.
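To make the ABI point concrete: on RV64 a 32-bit `unsigned int` argument is passed sign-extended to 64 bits. The sketch below models that extension with well-defined unsigned arithmetic; `extendForRV64Libcall` is a hypothetical name, not an LLVM function.

```cpp
#include <cstdint>

// Sign-extend a 32-bit value to 64 bits, as the RV64 calling convention
// expects for 32-bit integer arguments (including unsigned int).
// Flipping bit 31 and subtracting 2^31 copies bit 31 into the upper half.
std::uint64_t extendForRV64Libcall(std::uint32_t v) {
  return (static_cast<std::uint64_t>(v) ^ 0x80000000ULL) - 0x80000000ULL;
}
```

A zero extension would leave the upper 32 bits clear for values with bit 31 set, which is what the hook avoids.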
Need to force libcall legalization to treat the integer argument as
signed so that it can be promoted to XLen in call lowering for RV64.
Alternatively we could promote the operand before converting to libcall,
but going through call lowering is closer to what SelectionDAG does.
This converts all ptr element shuffle vectors to s64, so that the
existing vector legalization handling can lower them as needed. This
prevents a lot of fallbacks that currently try to generate things like
`<2 x ptr> G_EXT`.
I'm not sure if bitcast/inttoptr/ptrtoint is intended to be necessary
for vectors of pointers, but it uses buildCast for the casts, which now
generates a ptrtoint/inttoptr.
This change is part of this proposal:
https://discourse.llvm.org/t/rfc-all-the-math-intrinsics/78294
- `VecFuncs.def`: define intrinsic to sleef/armpl mapping
- `LegalizerHelper.cpp`: add missing fewerElementsVector handling for
the new atan2 intrinsic
- `AArch64ISelLowering.cpp`: Add AArch64 specializations for lowering
like neon instructions
- `AArch64LegalizerInfo.cpp`: Legalize atan2.
Part 5 for Implement the atan2 HLSL Function #70096.
This changes the existing promote logic to lower, so that it can use
normal integer operations. A minor change was needed to fneg lower code
to handle vectors.
This is an implementation of the saturating fp to int conversions for
GlobalISel. On AArch64 the conversion instructions work this way,
producing saturating results. LegalizerHelper::lowerFPTOINT_SAT is
ported from SDAG.
AArch64 has a lot of existing tests for fptosi_sat, covering a wide
range of types. I have tried to make most of them work all at once, but
a few fall back due to other missing features such as f128 handling for
min/max.
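For readers unfamiliar with the semantics, a scalar model of a saturating float to signed 32-bit conversion is sketched below: NaN maps to 0 and out-of-range inputs clamp to the integer limits. The function name is hypothetical and this models the intrinsic's behaviour, not the generic lowering sequence.

```cpp
#include <cmath>
#include <cstdint>
#include <limits>

// Model of fptosi.sat for float -> i32.
std::int32_t fpToSIntSat(float x) {
  if (std::isnan(x))
    return 0;  // NaN converts to zero.
  if (x <= -2147483648.0f)  // -2^31 is exactly representable as a float.
    return std::numeric_limits<std::int32_t>::min();
  if (x >= 2147483648.0f)   // 2^31 is exactly representable as a float.
    return std::numeric_limits<std::int32_t>::max();
  return static_cast<std::int32_t>(x);  // In range: truncate toward zero.
}
```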
This patch adds a common lower action for `G_FABS`, which generates `and
x8, x8, #0x7fffffffffffffff` to reset the sign bit. The action does not
support vectors since `G_AND` does not support fp128.
This approach is different than what SDAG is doing. SDAG stores the
value onto stack, clears the sign bit in the most significant byte, and
loads the value back into register. This involves multiple memory ops
and sounds slower.
Currently, `getStackAlignment` asserts if the stack alignment wasn't
specified. This makes it inconvenient to use and complicates testing.
This change also makes `exceedsNaturalStackAlignment` method redundant.
- Generate libcall for supported predicates.
- Generate unsupported predicates as combinations of supported
predicates.
- Vectors are scalarized, however some cases like `v3f128_fp128` are still failing, because we failed to legalize G_OR for these types.
GISel now generates the same code as SDAG, however, note the difference
in the `one` case.
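As an illustration of building predicates out of other predicates, the sketch below expresses `one` and `ueq` on `double` in terms of simpler compares. It assumes, for the sake of the example, that `olt`, `ogt`, `oeq` and `uno` are the directly available ones; the function names are hypothetical and the combinations the legalizer actually picks may differ.

```cpp
#include <cmath>

// Assumed "supported" predicates for this example.
bool fcmpOLT(double a, double b) { return a < b; }
bool fcmpOGT(double a, double b) { return a > b; }
bool fcmpOEQ(double a, double b) { return a == b; }
bool fcmpUNO(double a, double b) { return std::isnan(a) || std::isnan(b); }

// one: ordered and not equal == (a < b) or (a > b).
bool fcmpONE(double a, double b) { return fcmpOLT(a, b) || fcmpOGT(a, b); }

// ueq: unordered or equal == (a == b) or (either operand is NaN).
bool fcmpUEQ(double a, double b) { return fcmpOEQ(a, b) || fcmpUNO(a, b); }
```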
This patch enables the target-independent lowering of llvm.lround via
GlobalISel. For SelectionDAG, the intrinsic is custom lowered for
AMDGPU. In order to support vector floating point input for llvm.lround,
this patch extends the target independent APIs and provides support for
scalarizing. pr98950 is needed to let the verifier allow vector floating
point types.
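Scalarizing a vector lround amounts to applying the scalar operation lane by lane. The sketch below models that with `std::lround`, which rounds halfway cases away from zero; `lroundLanes` is a hypothetical name and the four-lane width is chosen only for the example.

```cpp
#include <array>
#include <cmath>
#include <cstddef>

// Apply lround to each lane independently, as scalarization would.
std::array<long, 4> lroundLanes(const std::array<float, 4>& in) {
  std::array<long, 4> out{};
  for (std::size_t i = 0; i < in.size(); ++i)
    out[i] = std::lround(in[i]);  // Halfway cases round away from zero.
  return out;
}
```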
This reverts commit 740161a9b98c9920dedf1852b5f1c94d0a683af5.
I moved the `ISD` dependencies into the CodeGen portion of the handling,
it's a little awkward but it's the easiest solution I can think of for
now.
This PR adds a new vector intrinsic `@llvm.experimental.vector.compress`
to "compress" data within a vector based on a selection mask, i.e., it
moves all selected values (i.e., where `mask[i] == 1`) to consecutive
lanes in the result vector. A `passthru` vector can be provided, from
which remaining lanes are filled.
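A scalar reference model of these semantics, on a four-lane integer vector, might look like the sketch below. The aliases and the function name are hypothetical; the model only mirrors the description above and says nothing about how a target lowers the intrinsic.

```cpp
#include <array>
#include <cstddef>

using Vec4 = std::array<int, 4>;
using Mask4 = std::array<bool, 4>;

// Move selected lanes to the front, in order; lanes past the selected
// count keep the value of the same lane in passthru.
Vec4 compressLanes(const Vec4& vec, const Mask4& mask, const Vec4& passthru) {
  Vec4 out = passthru;
  std::size_t next = 0;
  for (std::size_t i = 0; i < vec.size(); ++i) {
    if (mask[i])
      out[next++] = vec[i];
  }
  return out;
}
```

For example, compressing `{1, 2, 3, 4}` with mask `{1, 0, 1, 0}` and passthru `{9, 9, 9, 9}` yields `{1, 3, 9, 9}`.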
The main reason for this is that the existing
`@llvm.masked.compressstore` has very strong constraints in that it can
only write values that were selected, resulting in guard branches for
all targets except AVX-512 (and even there the AMD implementation is
_very_ slow). More instruction sets support "compress" logic, but only
within registers. So to store the values, an additional store is needed.
But this combination is likely significantly faster on many targets as it
avoids branches.
In follow up PRs, my plan is to add target-specific lowerings for x86,
SVE, and possibly RISCV. I also want to combine this with a store
instruction, as this is probably a common case and we can avoid some
memory writes in that case.
See [discussion in
forum](https://discourse.llvm.org/t/new-intrinsic-for-masked-vector-compress-without-store/78663)
for initial discussion on the design.
Summary:
The LTO pass and LLD linker have logic in them that forces extraction
and prevents internalization of needed runtime calls. However, these
currently take all RTLibcalls into account, even if the target does not
support them. The target opts out of a libcall if it sets its name to
nullptr. This patch pulls this logic out into a class in the header so
that LTO / lld can use it to determine if a symbol actually needs to be
kept.
This is important for targets like AMDGPU that want to be able to use
`lld` to perform the final link step, but do not want the overhead of
uncalled functions. (This adds like a second to the link time trivially)
Attempts to handle illegal G_CONCAT_VECTORS instructions by bitcasting the sources
into scalar values and using G_BUILD_VECTOR instead.
Handling the G_CONCAT_VECTORS instruction in the legalization artifact combiner, by folding
away concat(bitcast, ...) into buildvector(...), would require checking for ImpDefs created
by the shuffles in llvm.
Previously we had the same instructions being generated for `ISD::CTLZ` and `ISD::CTLZ_ZERO_UNDEF` which did not take advantage of the fact that zero is an invalid input for `ISD::CTLZ_ZERO_UNDEF`. This commit separates codegen for the two cases to allow for the optimization for the latter case.
The details of the optimization are outlined in #82075.
Fixes #82075
Co-authored-by: Manish Kausik H <hmamishkausik@gmail.com>
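The semantic difference that makes the optimization possible can be sketched as follows: the zero-undef variant may assume a non-zero input, so it needs no special case for zero. The function names are hypothetical and the bit loop stands in for whatever instruction sequence a target uses.

```cpp
#include <cstdint>

// Models CTLZ_ZERO_UNDEF: the caller guarantees x != 0, so no zero check.
unsigned ctlzZeroUndef32(std::uint32_t x) {
  unsigned n = 0;
  while ((x & 0x80000000u) == 0) {  // Terminates because x != 0.
    x <<= 1;
    ++n;
  }
  return n;
}

// Models CTLZ: must additionally define the result for zero.
unsigned ctlz32(std::uint32_t x) {
  return x == 0 ? 32u : ctlzZeroUndef32(x);
}
```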
We masked out the sign bit from one value, and the non-sign bits
from the other so there should be no common bits set.
No idea how to test this on the DAG path, other than scraping
the debug logs. A few targets hit this path with f16 values, but
the resulting i16 ors get anyext promoted and lose the disjoint
flag. In the fp128 case, PPC gets further and the or loses the flag
somewhere else later. Adding a haveNoCommonBits assert shows this
works though.
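The reasoning about disjoint bits can be checked on a `double`-sized copysign built from integer operations. The sketch below is a model with a hypothetical function name: one operand keeps only the sign bit and the other keeps everything except the sign bit, so the two OR inputs can never share a set bit.

```cpp
#include <cstdint>
#include <cstring>

// copysign(mag, sgn) via masking and OR.
double copysignViaOr(double mag, double sgn) {
  constexpr std::uint64_t SignMask = 0x8000000000000000ULL;
  std::uint64_t m, s;
  std::memcpy(&m, &mag, sizeof m);
  std::memcpy(&s, &sgn, sizeof s);
  std::uint64_t magBits = m & ~SignMask;  // Sign bit cleared.
  std::uint64_t signBit = s & SignMask;   // Only the sign bit kept.
  // (magBits & signBit) == 0 always holds, so the OR is disjoint.
  std::uint64_t r = magBits | signBit;
  double out;
  std::memcpy(&out, &r, sizeof out);
  return out;
}
```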
Any fp128 needs to end up as a libcall, as will f32->i128 and f64->i128.
f16 is a bit special as the maximum range of the result fits in an i17,
so it can be shrunk to an i64. Vectors with i128/fp128 types are scalarized.
As far as I can tell, this pull request was not approved, and
did not go through an RFC on discourse.
This reverts commit 89881480030f48f83af668175b70a9798edca2fb.
This reverts commit 225d8fc8eb24fb797154c1ef6dcbe5ba033142da.
Currently, the behavior of llvm.minnum differs across platforms
if one operand is sNaN.
When we compare sNaN vs NUM:
- ARM/AArch64/PowerPC: follow IEEE754-2008's minNum: return qNaN.
- RISC-V/Hexagon: follow IEEE754-2019's minimumNumber: return NUM.
- X86: returns NUM, but not the same as IEEE754-2019's minimumNumber, as
+0.0 is not always greater than -0.0.
- MIPS/LoongArch/Generic: return NUM.
- LIBCALL: returns qNaN.
So, let's introduce llvm.minimumnum/llvm.maximumnum, which always follow
IEEE754-2019's minimumNumber/maximumNumber.
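A scalar model of the IEEE754-2019 minimumNumber behaviour, as understood here, is sketched below: a NaN operand is ignored in favour of the number, two NaNs give a quiet NaN, and -0.0 is treated as smaller than +0.0. The function name is hypothetical and the sketch does not model floating-point exception flags.

```cpp
#include <cmath>
#include <limits>

// Model of minimumNumber on double.
double minimumNumberModel(double a, double b) {
  const bool aNaN = std::isnan(a), bNaN = std::isnan(b);
  if (aNaN && bNaN)
    return std::numeric_limits<double>::quiet_NaN();
  if (aNaN)
    return b;  // NaN vs NUM: return NUM.
  if (bNaN)
    return a;
  if (a == b)  // Covers +0.0 vs -0.0: prefer the negative zero.
    return std::signbit(a) ? a : b;
  return a < b ? a : b;
}
```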
Half-fix: #93033