11 Commits

Author SHA1 Message Date
Roman Lebedev
be51fa4580
[NFC] Port all runlines for LoopVectorize pass tests to -passes syntax 2022-12-05 22:17:30 +03:00
Vitaly Buka
bbef90ace4 [IRBuilder] Use PoisonValue in CreateMasked*
Followup to 72b776168c7c80d2035c7226488462dcffc97e75

Reviewed By: nlopes

Differential Revision: https://reviews.llvm.org/D133967
2022-09-19 11:01:41 -07:00
Philip Reames
4d87591028 [RISCV] Use VScaleForTuning in costing of operations whose cost depends on VL
On known hardware, reductions, gather, and scatter operations have execution latencies which correlated with the vector length (VL) of the operation. Most other operations (e.g. simply arithmetic) don't correlated in this way, and instead essentially fixed cost as VL varies.

When I'd implemented initial scalable cost model support for reductions, gather, and scatter operations, I had used an upper bound on the statically unknown VL. The argument at the time was that this prevented falsely low costs, and biased the vectorizer away from generating bad (on some hardware) code. Unfortunately, practical experience shows we were a bit too effective at that goal, and the high costs defacto prevents vectorization using these constructs at all.

This patch reverses course, and ties the returned cost not to the maximum possible VL, but the VL which would correspond to VScaleForTuning. This parameter is the same one the vectorizer uses when normalizing loop costs, so the term effectively cancels out. The result is that the vectorizer now sees these constructs as comparable in cost to their fixed length variants.

This does introduce the possibility of the cost for these operations being a significant under estimate on platforms where actual VLEN is far from that implied by VScaleForTuning. On such platforms, we might make poor heuristic choices. Probably not in LV itself (due to the cancellation mentioned above), but possibly during e.g. lowering. I'm not currently aware of any concrete examples of this, but this patch does open a concern which did not previously exist.

Previously, we had the problem of overestimating costs causing the same problem on machines much closer to default values for vscale for tuning. With this patch, we still have that problem potentially if vscale for tuning is set high (manually), and then the code is run on a narrow VLEN machine.

Differential Revision: https://reviews.llvm.org/D131519
2022-08-18 13:10:03 -07:00
Philip Reames
0ae46693f0 {RISCV][LV] Split out and expand tests for uniform loads and stores 2022-07-21 10:42:18 -07:00
Philip Reames
20dd3297b1 [LV] Allow scalable vectorization with vscale = 1
This change is a bit subtle. If we have a type like <vscale x 1 x i64>, the vectorizer will currently reject vectorization. The reason is that a type like <1 x i64> is likely to get simply rescalarized, and the vectorizer doesn't want to be in the game of simple unrolling.

(I've given the example in terms of 1 x types which use a single register, but the same issue exists for any N x types which use N registers. e.g. RISCV LMULs.)

This change distinguishes scalable types from fixed types under the reasoning that converting to a scalable type isn't unrolling. Because the actual vscale isn't known until runtime, using a vscale type is potentially very profitable.

This makes an important, but unchecked, assumption. Specifically, the scalable type is assumed to only be legal per the cost model if there's actually a scalable register class which is distinct from the scalar domain. This is, to my knowledge, true for all targets which return non-invalid costs for scalable vector ops today, but in theory, we could have a target decide to lower scalable to fixed length vector or even scalar registers. If that ever happens, we'd need to revisit this code.

In practice, this patch unblocks scalable vectorization for ELEN types on RISCV.

Let me sketch one alternate implementation I considered. We could have restricted this to when we know a minimum value for vscale. Specifically, for the default +v extension for RISCV, we actually know that vscale >= 2 for ELEN types. However, doing it this way means we can't generate scalable vectors when using the various embedded vector extensions which have a minimum vscale of 1.

Differential Revision: https://reviews.llvm.org/D128542
2022-06-27 13:38:57 -07:00
Philip Reames
9803b0d1e7 [RISCV] Implement getVScaleForTuning and thus prefer scalable vectorization when enabled
LoopVectorizer uses getVScaleForTuning for deciding how to discount the cost of a potential vector factor by the amount of work performed. Without the callback implemented, the vectorizer was defaulting to an estimated vscale of 1. This results in fixed vectorization looking falsely profitable (since it used the command line VLEN).

The test change is pretty limited since a) we don't have much coverage of the vectorizer with scalable vectors at all, and b) what little coverage we have mostly uses i64 element types. There's a separate issue with <vscale x 1 x i64> which prevents us from getting to this stage of costing, and thus only the one test explicitly written to avoid that is visible in the diff. However, this is actually a very wide impact change as it changes the practical vectorization result when both fixed and scalable is enabled to scalable.

As an aside, I think the vectorizer is at little too strongly biased towards scalable when both are legal, but we can explore that separately. For now, let's just get the cost model working the way it was intended.

Differential Revision: https://reviews.llvm.org/D128547
2022-06-25 11:25:23 -07:00
Philip Reames
ae8fac6f98 [LV][RISCV] Add coverage showing scalable codegen when etype != ELEN
We currently have a costing bug around the etype == ELEN case, so add otherwise duplicate tests to show test diffs as I work on other parts of costing.
2022-06-24 11:38:54 -07:00
Philip Reames
056d63938a [RISCV] Split a vectorizer test runline so that upcoming changes in defaults are visible 2022-06-24 08:48:11 -07:00
Philip Reames
46ea4b5ea1 [LV] Avoid a crash when costing a uniform store which doesn't correspond to a legal scatter
If we have an unaligned uniform store, then when costing a scalable VF we can't emit code to scalarize it.  (Well, we could, but we haven't implemented that case.)  This change replaces an assert with a cost-model bailout such that we reject vectorization with the scalable VF instead of crashing.
2022-06-23 12:41:09 -07:00
Philip Reames
8ae0664282 LoopVect, tests] Add some basic coverage for scalable costing of scatter/gather patterns on RISCV
This just adds some very basic vectorizer testing with both fixed and scalable vectorization enabled.
2022-06-21 13:54:53 -07:00
Philip Reames
2cf320d41e [LoopVect, tests] Add some basic coverage for scalable costing on RISCV
This just adds some very basic vectorizer testing with both fixed and scalable vectorization enabled.  For context, I just yesterday fixed a crash in costing of the splat_ptr example - see bbf3fd.
2022-06-21 13:35:38 -07:00