llvm-project

Author	SHA1	Message	Date
Sam Parker	e5b6833e49	[WebAssembly] vi8 mul cost modelling. (#175177) We've already optimised these, so update the cost model to reflect it. Also skip the isBeforeLegalize check when lowering i8 muls, because the check misses the cases where, say, v32i8 has been type legalised into 2x v16i8. Also explicitly disable memory interleaving for any factor other than two or four.	2026-01-12 09:25:54 +00:00
Sam Parker	d10a85167a	[WebAssembly] Implement more of getCastInstrCost (#164612) Fill out more information for sign and zero extend and add some truncate information; however, the primary change is to int/fp conversions. In particular, fp to (narrow) int appears to be relatively expensive.	2025-11-10 08:07:16 +00:00
Florian Hahn	bfc322dd72	Revert "[VPlan] Run narrowInterleaveGroups during general VPlan optimizations. (#149706)" This reverts commit 8d29d09309654541fb2861524276ada6a3ebf84c. There have been reports of mis-compiles in https://github.com/llvm/llvm-project/pull/149706. Revert while I investigate.	2025-10-22 21:27:11 +01:00
Sam Parker	20340accf2	[NFC][WebAssembly] FP conversion interleave tests (#164576)	2025-10-22 11:43:44 +01:00
Sam Parker	7b3e77f8d9	[WebAssembly] Implement getInterleavedMemoryOpCost (#146864) First pass where we calculate the cost of the memory operation, as well as the shuffles required. Interleaving by a factor of two should be relatively cheap, as many ISAs have dedicated instructions to perform the (de)interleaving. Several of these permutations can be combined for an interleave stride of 4, and this is the highest stride we allow. I've costed larger vectors, and more lanes, as more expensive because not only is more work needed but the risk of codegen going 'wrong' rises dramatically. I also filled in a bit of cost modelling for vector stores. It appears the main vector plan to avoid is an interleave factor of 4 with v16i8. I've used libyuv and ncnn for benchmarking, using V8 on AArch64, and observe a geomean improvement of ~3%, with some kernels improving 40-60%. I know there is still significant performance being left on the table, so this will need more development along with the rest of the cost model.	2025-08-27 12:43:52 +01:00

5 Commits