5 Commits

Author SHA1 Message Date
Sam Parker
e5b6833e49
[WebAssembly] vi8 mul cost modelling. (#175177)
We've already optimised these, so update the cost model to reflect it.
And skip the isBeforeLegalize check when lowering i8 muls, because it
then misses the cases where, say v32i8, has been type legalised into 2x
v16i8.

Also explicitly disable memory interleaving for any factor other than
two or four.
2026-01-12 09:25:54 +00:00
Sam Parker
d10a85167a
[WebAssembly] Implement more of getCastInstrCost (#164612)
Fill out more information for sign and zero extend and add some truncate
information; however, the primary change is to int/fp conversions. In
particular, fp to (narrow) int appears to be relatively expensive.
2025-11-10 08:07:16 +00:00
Florian Hahn
bfc322dd72
Revert "[VPlan] Run narrowInterleaveGroups during general VPlan optimizations. (#149706)"
This reverts commit 8d29d09309654541fb2861524276ada6a3ebf84c.

There have been reports of mis-compiles
in https://github.com/llvm/llvm-project/pull/149706.

Revert while I investigate.
2025-10-22 21:27:11 +01:00
Sam Parker
20340accf2
[NFC][WebAssembly] FP conversion interleave tests (#164576) 2025-10-22 11:43:44 +01:00
Sam Parker
7b3e77f8d9
[WebAssembly] Implement getInterleavedMemoryOpCost (#146864)
First pass where we calculate the cost of the memory operation, as well
as the shuffles required. Interleaving by a factor of two should be
relatively cheap, as many ISAs have dedicated instructions to perform
the (de)interleaving. Several of these permutations can be combined for
an interleave stride of 4 and this is the highest stride we allow.

I've costed larger vectors, and more lanes, as more expensive because
not only is more work is needed but the risk of codegen going 'wrong'
rises dramatically. I also filled in a bit of cost modelling for vector
stores.

It appears the main vector plan to avoid is an interleave factor of 4
with v16i8. I've used libyuv and ncnn for benchmarking, using V8 on
AArch64, and observe geomean improvement of ~3% with some kernels
improving 40-60%.

I know there is still significant performance being left on the table,
so this will need more development along with the rest of the cost
model.
2025-08-27 12:43:52 +01:00