A vector mul(sext, sext) or mul(zext, zext) will be code generated as a
single smull or umull instruction. This most notably effects v2i64
multiplies, which are otherwise not legal and need to be expanded.
The oneuse check has also been slightly changed, as it is already
checked from the use of isWideningInstruction in getCastInstrCost.
Differential Revision: https://reviews.llvm.org/D123006
Based off the script from D103695, we were exaggerating the cost of the v2i64 comparison expansion using instruction count instead of effective throughput
To perform the cost model of vector casting, the patch consider most vector
casts as their scalar form and consider those vector form of free scalr castings
as 1.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D121771
Currently we allow half types in vectors if the scalar Zfh extension
is enabled. This behavior is not inline with the vector spec. For f32
and f64 types, the Zve32f, Zve64f, Zve64d, and V explicitly control
the availablity of floating point types in vectors.
In order to make our compiler compliant, we either need to remove all support
for half in vectors or we need an extension to control it.
Draft spec here https://github.com/riscv/riscv-v-spec/pull/780
Reviewed By: kito-cheng
Differential Revision: https://reviews.llvm.org/D121345
The vectoriser sometimes generates predicated vector loops using
the llvm.get.active.lane.mask intrinsic so it's important that we
are able to calculate a valid cost for the call instruction. When
SVE is enabled we are able to use a single whilelo instruction
for some vector types - in such cases I've marked the cost as 1.
For all other cases I've set the cost according to how the intrinsic
will be expanded.
Tests added here:
Analysis/CostModel/AArch64/sve-intrinsics.ll
Analysis/CostModel/ARM/active_lane_mask.ll
Analysis/CostModel/RISCV/active_lane_mask.ll
Differential Revision: https://reviews.llvm.org/D121109
Currently the cost model under-estimates the cost of certain
FP16 conversions.
This patch updates getCastInstrCost to return more accurate costs for
the cases improved in c2ed9fd05479.
Reviewed By: dmgreen
Differential Revision: https://reviews.llvm.org/D113700
The analysis passes output function name encapsulated in `'` braces,
but LV uses `"`. Harmonizing this may help in creating an update script
for the LV costmodel test checks.
Reviewed By: fhahn
Differential Revision: https://reviews.llvm.org/D121105
The patch adds very basic cost model for masked memory op on scalable vector.
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D117884
The costs of vector shifts was 2 as opposed to 1, as the nodes are
marked custom. Fix this like the others and mark the nodes as cheap.
Differential Revision: https://reviews.llvm.org/D120773
The patch adds very basic cost model for masked memory op on scalable vector.
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D117884
This reverts commit 77a0da926c9ea86afa9baf28158d79c7678fc6b9 as we've
received multiple reports of this significantly impacting performance,
in ways that don't seem to just be target specific cost models going
wrong. I would offer some reproducers, but the test changes here seem to
be full of them!
Reverting for now and hopefully we can remove the "hack" more carefully
as we go.
While testing scalable vectors I found that if we generate a
vector splice intrinsic and run the code through the loop unroller,
we'll crash due to an invalid cost.
This adds a basic cost based on the 2 slide instructions used by the
lowering in D119303.
We probably need to factor LMUL into this, but that's true for
arithmetic instructions too. So I've ignored for the moment.
Reviewed By: ABataev
Differential Revision: https://reviews.llvm.org/D119316
D43208 extracted `useEmulatedMaskMemRefHack()` from legality into cost model.
What it essentially does is prevents scalarized vectorization of masked memory operations:
```
// TODO: Cost model for emulated masked load/store is completely
// broken. This hack guides the cost model to use an artificially
// high enough value to practically disable vectorization with such
// operations, except where previously deployed legality hack allowed
// using very low cost values. This is to avoid regressions coming simply
// from moving "masked load/store" check from legality to cost model.
// Masked Load/Gather emulation was previously never allowed.
// Limited number of Masked Store/Scatter emulation was allowed.
```
While i don't really understand about what specifically `is completely broken`
was talking about, i believe that at least on X86 with AVX2-or-later,
this is no longer true. (or at least, i would like to know what is still broken).
So i would like to follow suit after D111460, and like wise disable that hack for AVX2+.
But since this was added for X86 specifically, let's just instead completely remove this hack.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D114779
The current cost-model overestimates the cost of vector compares &
selects for ordered floating point compares. This patch fixes that by
extending the existing logic for integer predicates.
Reviewed By: dmgreen
Differential Revision: https://reviews.llvm.org/D118256
We have some bitcasts which we know will be simplified,
so their cost is zero.
Reviewed By: david-arm, sdesmalen
Differential Revision: https://reviews.llvm.org/D118019
If we are inserting into or extracting from a scalable vector we do
not know the number of elements at runtime, so we can only let the
index wrap for fixed-length vectors.
Tests added here:
Analysis/CostModel/AArch64/sve-insert-extract.ll
Differential Revision: https://reviews.llvm.org/D117099
The code size cost model for most targets uses the legalization cost for the type of the pointer of a load. If this load is followed directly by a trunc instruction, and is the only use of the result of the load, only one instruction is generated in the target assembly language. This adds a check for this case, and uses the target type of the trunc instruction if so.
This did not show any changes in CTMark code size benchmarks.
Reviewers: paquette, samparker, dmgreen
Differential Revision: https://reviews.llvm.org/D109388
Similar to D116732, this adds basic scalar sadd_with_overflow,
uadd_with_overflow, ssub_with_overflow and usub_with_overflow costs for
aarch64, which are usually quite efficiently lowered.
Differential Revision: https://reviews.llvm.org/D116734
This adds some AArch64 specific smul_with_overflow and umul_with_overflow
costs, overriding the default costs. The code generation for these mul
with overflow intrinsics is usually better than the default expansion on
AArch64. The costs come from https://godbolt.org/z/zEzYhMWqo with various
types, or llvm/test/CodeGen/AArch64/arm64-xaluo.ll.
Differential Revision: https://reviews.llvm.org/D116732
This reverts commit 859ebca744e634dcc89a2294ffa41574f947bd62.
The change contained many unrelated changes and e.g. restored
unit test failes for the old lld port.
This reverts commit 640beb38e7710b939b3cfb3f4c54accc694b1d30.
That commit caused performance degradtion in Quicksilver test QS:sGPU and a functional test failure in (rocPRIM rocprim.device_segmented_radix_sort).
Reverting until we have a better solution to s_cselect_b64 codegen cleanup
Change-Id: Ibf8e397df94001f248fba609f072088a46abae08
Reviewed By: kzhuravl
Differential Revision: https://reviews.llvm.org/D115960
Change-Id: Id169459ce4dfffa857d5645a0af50b0063ce1105
1. Fixed costs inconsistency for llvm.fma.vXf16 instinsiscs.
2. Added tests for llvm.sadd.sat, llvm.ssub.sat, llvm.uadd.sat, llvm.usub.sat
intrisics since they have special processing in cost model.
3. Minor intrisics' costs tests updat and refinement.
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D115385