This was achieved with an updated version of the 'cost-tables vs llvm-mca' script D103695
As we're using 'typical' worst case values, not all cost entries come from a single CPU - e.g. the latency/throughput from haswell but the size-latency(uops) from zen1/alderlake-e due to 'double pumping'
This was achieved with an updated version of the 'cost-tables vs llvm-mca' script D103695
As we're using 'typical' worst case values, not all cost entries come from a single CPU - e.g. the latency/throughput from haswell but the size-latency(uops) from zen1/alderlake-e due to 'double pumping'
These were missed in an earlier commit, the latency/codesize/size-latency numbers aren't different from the SSE2 values that it was falling through to, hence no test change, but it did mean we were wasting a lookup.
This was achieved with an updated version of the 'cost-tables vs llvm-mca' script D103695 which I'll update shortly
As we're using 'typical' worst case values, not all cost entries come from a single CPU - e.g. the latency/throughput from haswell but the size-latency(uops) from zen1/alderlake-e due to 'double pumping'
Building on D132216, use CostKindTblEntry cost tables to simplify the transition to supporting cost kinds other than recip-throughput
Adding full cost kinds support is going to take a while, but by converting to CostKindTblEntry first it will make it easier to support the costs on a per-ISD basis.
Most of our cost model tables have been created assuming cost kind == recip-throughput. But we're starting to see passes wanting to get accurate costs for the other kinds as well. Some of these can be determined procedurally (e.g. codesize by default could just be the split count after type legalization), but others are going to need to be handled in cost tables - this is especially true for x86 which has so many ISA combinations.
I've created a 'CostKindCosts' struct which can hold cost values for the 4 cost kinds, defaulting to -1U for unknown cost, this can be used with the existing CostTblEntryT/CostTableLookup template code. I've also added a [TargetCostKind] accessor to make it much easier to look up individual <Optional> costs.
This just changes the ISD::SELECT costs to check the effect (and also to check that the ISD::SETCC are correctly handled for default/None cost kinds) - the plan would be to slowly extend this and move the CostKindTblEntry type somewhere generic to allow other targets to use it once its matured.
I'm also going to resurrect D103695 so that it can help with latency/codesize/sizelatency coverage testing.
For sizelatency - IIRC the definition was vague to let it be target specific - I've tried to use typical uop counts so they're comparable to MicroOpBufferSize etc.
REAPPLIED: Added early out to prevent getCmpSelInstrCost being used for anything but generic integer/float scalar/vector types - getTypeLegalizationCost can't handle the "exotic" TypeID enums that some passes attempt to get a costs for (aggregates etc.).
Differential Revision: https://reviews.llvm.org/D132216
Most of our cost model tables have been created assuming cost kind == recip-throughput. But we're starting to see passes wanting to get accurate costs for the other kinds as well. Some of these can be determined procedurally (e.g. codesize by default could just be the split count after type legalization), but others are going to need to be handled in cost tables - this is especially true for x86 which has so many ISA combinations.
I've created a 'CostKindCosts' struct which can hold cost values for the 4 cost kinds, defaulting to -1U for unknown cost, this can be used with the existing CostTblEntryT/CostTableLookup template code. I've also added a [TargetCostKind] accessor to make it much easier to look up individual <Optional> costs.
This just changes the ISD::SELECT costs to check the effect (and also to check that the ISD::SETCC are correctly handled for default/None cost kinds) - the plan would be to slowly extend this and move the CostKindTblEntry type somewhere generic to allow other targets to use it once its matured.
I'm also going to resurrect D103695 so that it can help with latency/codesize/sizelatency coverage testing.
For sizelatency - IIRC the definition was vague to let it be target specific - I've tried to use typical uop counts so they're comparable to MicroOpBufferSize etc.
Differential Revision: https://reviews.llvm.org/D132216
Enables fixed sized vectors to detect SK_Splice shuffle patterns and provides basic X86 cost support
Differential Revision: https://reviews.llvm.org/D132374
This has the effect of exposing the power-of-two property for use in memory op costing, but no target actually uses it yet. The main point of this change is simple consistency with the recently changes getArithmeticInstrCost, and to remove the last (interface) use of OperandValueKind.
This change completes the process of replacing OperandValueKind and OperandValueProperties which were previously passed independently in this API with a single container class which contains both.
This is the change which motivated the whole sequence which preceeded it. In an original spike version of this change, I'd noticed a nasty bug: I'd changed the signature without changing names, and as result, we silently passed additional information through a callsite which previously dropped the power-of-two fact. This might be harmless in most cases, but at least a couple clearly dependend for correctness on not passing that property through.
I did my best to split off prior changes which reduced the scope of this one, and which made it possible to use compiler assistance. For instance, every parameter which changes type in this change also changes name. This was intentional to make sure that every call site possible effected must show up in the diff. This let me audit each one closely.
This is part of an ongoing transition to use OperandValueInfo which combines OperandValueKind and OperandValueProperties. This change adds some accessor methods and uses them to simplify backend code. The primary motivation of doing so is removing uses of the parameters so that an upcoming api change is less error prone.
Both are reasonable names; this is solely that an upcoming change can use the OpNInfo name, and the compiler can tell me if I forgot to update something (instead of silently passing along properties that might not hold.)
SK_Splice should be equivalent to a PALIGNR instruction etc. - but as discussed on D132308, until full fixed vector support for SK_Splice is in place, just assume its a SK_PermuteTwoSrc.
Defaults to TCK_RecipThroughput - as most explicit calls were assuming TCK_RecipThroughput (vectorizers) or was just doing a before-vs-after comparison (vectorcombiner). Calls via getInstructionCost were just dropping the CostKind, so again there should be no change at this time (as getShuffleCost and its expansions don't use CostKind yet) - but it will make it easier for us to better account for size/latency shuffle costs in inline/unroll passes in the future.
Differential Revision: https://reviews.llvm.org/D132287
In many cases constant buildvector results in a vector load from a
constant/data pool. Need to consider this cost too.
Differential Revision: https://reviews.llvm.org/D126885
* Replace getUserCost with getInstructionCost, covering all cost kinds.
* Remove getInstructionLatency, it's not implemented by any backends, and we should fold the functionality into getUserCost (now getInstructionCost) to make it easier for targets to handle the cost kinds with their existing cost callbacks.
Original Patch by @samparker (Sam Parker)
Differential Revision: https://reviews.llvm.org/D79483
TragetLowering had two last InstructionCost related `getTypeLegalizationCost()`
and `getScalingFactorCost()` members, but all other costs are processed in TTI.
E.g. it is not comfortable to use other TTI members in these two functions
overrided in a target.
Minor refactoring: `getTypeLegalizationCost()` now doesn't need DataLayout
parameter - it was always passed from TTI.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D117723
This is follow up of D107082, which enable vector support according to psABI.
Reviewed By: skan
Differential Revision: https://reviews.llvm.org/D127982
During the reordering transformation we should try to avoid reordering bundles
like fadd,fsub because this may block them being matched into a single vector
instruction in x86.
We do this by checking if a TreeEntry is such a pattern and adding it to the
list of TreeEntries with orders that need to be considered.
Differential Revision: https://reviews.llvm.org/D125712
We were using the default getScalarizationOverhead expansion for extraction costs, which adds up all the individual element extraction costs.
This is fine for 128-bit vectors, but for 256/512-bit vectors each element extraction also has to account for extracting the upper 128-bit subvector extraction before it can handle the element. For scalarization costs we only need to extract each demanded subvector once.
Differential Revision: https://reviews.llvm.org/D125527
Most clients only used these methods because they wanted to be able to
extend or truncate to the same bit width (which is a no-op). Now that
the standard zext, sext and trunc allow this, there is no reason to use
the OrSelf versions.
The OrSelf versions additionally have the strange behaviour of allowing
extending to a *smaller* width, or truncating to a *larger* width, which
are also treated as no-ops. A small amount of client code relied on this
(ConstantRange::castOp and MicrosoftCXXNameMangler::mangleNumber) and
needed rewriting.
Differential Revision: https://reviews.llvm.org/D125557