llvm-project

Author SHA1 Message Date


Simon Pilgrim

c1af46cc20

[CostModel][X86] Add BITREVERSE cost model estimations

Use a modified version of the D103695 script to determine more accurate throughput/latency/codesize/size-latency cost estimates

2023-04-18 11:25:26 +01:00

Roman Lebedev

8e37b53360

[X86] Rewrite `getScalarizationOverhead()`

All of our insert/extract ops work on 128-bit lanes.

For `Insert`, we need to extract the affected 128-bit lane,
unless it's being fully overwritten (FIXME: do we need to be
careful about legalization-induced padding that we obviously don't demand?),
perform insertions, and then insert the 128-bit lane back.

But hold on. If we are operating on a 256-bit legal vector,
and thus have two 128-bit subvectors, and are fully overwriting them both,
we don't actually need to insert *both* subvectors,
only the second one, into the implicitly-widened first one.

Also, `Insert` wasn't actually querying the costs,
but just assuming them to be `1`.
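The lane-wise accounting described above can be sketched as a standalone operation counter. This is only an illustrative sketch, not the code from the commit: `InsertOverhead` and `countInsertOps` are hypothetical names, the sketch counts operations instead of querying target costs, and applying the "skip the first lane insert" rule to more than two lanes is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical tally of the operations needed to build a vector from scalars.
// Not an LLVM type; a real cost model would query a cost for each operation.
struct InsertOverhead {
  unsigned LaneExtracts = 0;   // 128-bit subvector extractions
  unsigned ElementInserts = 0; // scalar insertions into a 128-bit lane
  unsigned LaneInserts = 0;    // 128-bit subvector insertions
};

// Demanded[i] is true if element i gets overwritten.
// EltsPerLane is the number of elements that fit in one 128-bit lane.
InsertOverhead countInsertOps(const std::vector<bool> &Demanded,
                              std::size_t EltsPerLane) {
  assert(EltsPerLane != 0 && Demanded.size() % EltsPerLane == 0 &&
         "vector must split evenly into lanes");
  InsertOverhead Ops;
  const std::size_t NumLanes = Demanded.size() / EltsPerLane;
  unsigned TouchedLanes = 0;
  bool EveryLaneFullyOverwritten = true;

  for (std::size_t Lane = 0; Lane < NumLanes; ++Lane) {
    unsigned Count = 0;
    for (std::size_t I = 0; I < EltsPerLane; ++I)
      if (Demanded[Lane * EltsPerLane + I])
        ++Count;
    if (Count != EltsPerLane)
      EveryLaneFullyOverwritten = false;
    if (Count == 0)
      continue; // Untouched lane: nothing to do.
    ++TouchedLanes;
    Ops.ElementInserts += Count;
    // A partially overwritten lane of a wider vector must be extracted first
    // so that its remaining elements are preserved.
    if (NumLanes > 1 && Count != EltsPerLane)
      ++Ops.LaneExtracts;
  }

  // A single-lane vector needs no subvector insertion at all. If every lane is
  // fully overwritten, the first lane is implicitly widened, so only the
  // remaining lanes are inserted. Otherwise each touched lane goes back in.
  if (NumLanes > 1)
    Ops.LaneInserts = EveryLaneFullyOverwritten
                          ? static_cast<unsigned>(NumLanes - 1)
                          : TouchedLanes;
  return Ops;
}
```

For a 256-bit vector of eight 32-bit elements (`EltsPerLane == 4`), overwriting all eight elements yields eight element insertions and a single lane insertion, while overwriting only element 5 yields one lane extraction, one element insertion, and one lane insertion.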

`getShuffleCost(TTI::SK_ExtractSubvector)` notes:
```
  // Note that in general, the insertion starting at the beginning of a vector
  // isn't free, because we need to preserve the rest of the wide vector.
```
... so as far as I can tell, we didn't account for that.

I was hoping this would allow vectorization at a higher VF in one case I looked at,
but the subvector insertion cost still advises against that.

The change for `Extract` is NFC, and is for consistency only:
I wanted to get rid of that weird explicit discounting of the insertion of the 0'th element,
since the general code should already deal with that.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D137913

2022-11-15 21:07:12 +03:00

Simon Pilgrim

ac89b03934

[CostModel][X86] Add CostKinds test coverage for bitreverse intrinsics

2022-09-13 11:23:03 +01:00

3 Commits