The check for ABI differences for inlined calls involves the caller, the
callee and the nested callee. Before inlining, the ABI is determined by
the target features of the callee. After inlining it is determined by
the caller. The features of the nested callee should never actually
matter.
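A conceptual sketch of that rule (the helper `abiChangesWhenInlined` is hypothetical and not the in-tree code, though `areTypesABICompatible` is an existing TTI hook):

```cpp
#include "llvm/Analysis/TargetTransformInfo.h"
#include "llvm/IR/Function.h"
using namespace llvm;

// Hypothetical helper illustrating the rule above: for a call that Callee
// makes to some nested callee, the ABI before inlining is implied by Callee's
// target features and after inlining by Caller's, so the compatibility check
// only needs to compare Caller and Callee -- the nested callee's own features
// never enter into it.
bool abiChangesWhenInlined(const TargetTransformInfo &TTI,
                           const Function &Caller, const Function &Callee,
                           ArrayRef<Type *> NestedCallArgTypes) {
  return !TTI.areTypesABICompatible(&Caller, &Callee, NestedCallArgTypes);
}
```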
This patch adds a cost kind to `getAddressComputationCost()` for #149955.
Note that this patch also removes all of the default argument values in `getAddressComputationCost()`.
In some places we were passing the type of value being accessed, in
other cases we were passing the type of the pointer for the access.
The most "involved" user is
LoopVectorizationCostModel::getMemInstScalarizationCost, which is the
only call site that passes in the SCEV, and it passes along the pointer
type.
This changes call sites to consistently pass the pointer type, and
renames the arguments to clarify this.
No target actually inspects the contents of the type passed, only whether
it is a vector or not, so this shouldn't have a functional effect.
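A minimal sketch of what an updated call site might look like, assuming the new cost-kind parameter is simply appended to the existing arguments (the exact parameter order is an assumption):

```cpp
#include "llvm/Analysis/TargetTransformInfo.h"
using namespace llvm;

// Sketch only: pass the pointer type of the access and an explicit cost kind;
// there are no default arguments left to fall back on.
InstructionCost addressComputationCost(const TargetTransformInfo &TTI,
                                       Type *PtrTy, ScalarEvolution *SE,
                                       const SCEV *PtrSCEV) {
  return TTI.getAddressComputationCost(
      PtrTy, SE, PtrSCEV, TargetTransformInfo::TCK_RecipThroughput);
}
```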
When these were converted to CostKindTblEntry the throughput was mainly
copied to all cost kinds
Regenerated with my check_cost_tables.py helper script
The memory value and mask value types might legalise differently - e.g. a v64i32 might split into 4 x v16i32 / 8 x v8i32 but the mask might legalize as 1 x v64i8 / 2 x v32i8 etc.
If the legalised value type has been split, then we must ensure we compute the cost for the entire mask value type and let getShuffleCost handle any legalisation, not assume that only a single trailing split mask will require widening.
A shuffle will take two input vectors and a mask, to produce a new
vector of size <MaskElts x SrcEltTy>. Historically it has been assumed
that the SrcTy and the DstTy are the same for getShuffleCost, with that
being relaxed in recent years. If the Tp passed to getShuffleCost is the
SrcTy, then the DstTy can be calculated from the Mask elts and the src
elt size, but the Mask is not always provided and the Tp is not reliably
always the SrcTy. This has led to situations, notably in the SLP
vectorizer but also in the generic cost routines, where assumptions about
how vectors will be legalized are built into the generic cost routines -
for example whether they will widen or promote, with the cost modelling
assuming they will widen while the default lowering promotes integer
vectors.
This patch attempts to start improving that - it originally tried to
alter more of the cost model but that too quickly became too many
changes at once, so this patch just plumbs in a DstTy to getShuffleCost
so that DstTy and SrcTy can be reliably distinguished. The callers of
getShuffleCost have been updated to try to include a DstTy that is more
accurate. Otherwise the patch tries to be fairly non-functional, keeping
SrcTy as the primary type used in the shuffle cost routines and only using
DstTy where it was used in the past (for InsertSubVector, for example).
Some asserts have been added that help to check for consistent values
when a Mask and a DstTy are provided to getShuffleCost. Some of them
took a while to get right, and some non-mask calls might still be
incorrect. Hopefully this will provide a useful base to build more
shuffles that alter size.
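A sketch of a typical updated call site; the exact parameter order is an assumption, the point being that DstTy now travels alongside SrcTy:

```cpp
#include "llvm/Analysis/TargetTransformInfo.h"
using namespace llvm;

// Sketch only: DstTy is <MaskElts x SrcEltTy> and no longer has to be inferred
// from SrcTy and the mask (which may not even be provided).
InstructionCost twoSrcShuffleCost(const TargetTransformInfo &TTI,
                                  VectorType *DstTy, VectorType *SrcTy,
                                  ArrayRef<int> Mask) {
  return TTI.getShuffleCost(TargetTransformInfo::SK_PermuteTwoSrc, DstTy,
                            SrcTy, Mask,
                            TargetTransformInfo::TCK_RecipThroughput);
}
```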
Increase specificity by using the correct unit sizes. KBytes is an
abbreviation for kB (1000 bytes), and the hardware industry, as well as
several operating systems, has now switched to using 1000-byte kBs.
If this change is acceptable, note that GitHub sometimes mangles merges to
use the account's original email; $dayjob asks that contributions carry my
work email. Thanks!
If the lower 32 bits of an i64 value are known to be zero, then icmp
lowering will shift+truncate down to an i32, allowing the immediate to be
embedded.
There's a lot more that could be done here to match icmp lowering, but
this PR just focuses on known regressions.
Fixes #142513
Fixes #62145
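A small standalone illustration (plain C++, not the lowering code itself) of why the narrowing is sound when both the value and the constant have zero low 32 bits:

```cpp
#include <cassert>
#include <cstdint>

// If the low 32 bits of X are known to be zero and the constant C also has
// zero low 32 bits, then the 64-bit compare is equivalent to a 32-bit compare
// of the high halves, so the (now 32-bit) immediate can be embedded directly.
bool cmp64(uint64_t X, uint64_t C) { return X == C; }
bool cmp32(uint64_t X, uint64_t C) {
  return static_cast<uint32_t>(X >> 32) == static_cast<uint32_t>(C >> 32);
}

int main() {
  const uint64_t C = 0x0000'1234'0000'0000ULL; // low 32 bits are zero
  for (uint64_t Hi : {0u, 0x1234u, 0xffffu}) {
    uint64_t X = Hi << 32; // low 32 bits known zero
    assert(cmp64(X, C) == cmp32(X, C));
  }
  return 0;
}
```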
`ad9909d "[SLP]Fix perfect diamond match with extractelements in scalars"`
changed SLPVectorizer getScalarizationOverhead() to call
TTI.getVectorInstrCost() instead of TTI.getScalarizationOverhead() in some
cases. This was due to X86-specific handling in these (overridden) methods,
and unfortunately the general preference for TTI.getScalarizationOverhead()
was dropped. If VL is available it should always be preferred to use
getScalarizationOverhead(), and this is indeed the case for SystemZ, which
has a special insertion instruction that can insert two GPR64s.
Then `33af951 "[SLP]Synchronize cost of gather/buildvector nodes with
codegen"` reworked SLPVectorizer getGatherCost(), which together with
ad9909d caused the SystemZ test vec-elt-insertion.ll to fail.
This patch restores the SystemZ test and reverts the change in SLPVectorizer
getScalarizationOverhead() so that TTI.getScalarizationOverhead() is always
called again. The ForPoisonSrc argument is now passed on to the TTI method
so that X86 can handle this as required.
Fixes: #135346
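A sketch of the restored call shape; the parameter order and names of the TTI hook here are assumptions:

```cpp
#include "llvm/Analysis/TargetTransformInfo.h"
using namespace llvm;

// Sketch only: always route through the TTI hook so targets that benefit from
// seeing the scalar Values (VL) -- e.g. SystemZ's two-GPR64 insertion -- get
// them, while ForPoisonSrc is forwarded for X86's special handling.
InstructionCost buildVectorInsertCost(const TargetTransformInfo &TTI,
                                      VectorType *VecTy,
                                      const APInt &DemandedElts,
                                      bool ForPoisonSrc,
                                      ArrayRef<Value *> VL) {
  return TTI.getScalarizationOverhead(VecTy, DemandedElts, /*Insert=*/true,
                                      /*Extract=*/false,
                                      TargetTransformInfo::TCK_RecipThroughput,
                                      ForPoisonSrc, VL);
}
```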
InstructionCost is already an optional value, containing an Invalid
state that can be checked with isValid(). There is little point in
returning another optional from getValue(). Most uses do not make use of
it being a std::optional, dereferencing the value directly (either
isValid has been checked previously or the Cost is assumed to be valid).
The one case that did, in AMDGPU, used value_or, which has been replaced
by an isValid() check.
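A minimal sketch of the resulting usage pattern (hypothetical helper):

```cpp
#include "llvm/Support/InstructionCost.h"
using namespace llvm;

// InstructionCost already carries an Invalid state, so callers check isValid()
// and then read the value directly; getValue() no longer wraps it in a
// std::optional.
InstructionCost::CostType costOrDefault(InstructionCost Cost,
                                        InstructionCost::CostType Default) {
  if (!Cost.isValid())
    return Default; // roughly what value_or() provided before
  return Cost.getValue();
}
```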
These are not diagnosed because implementations hide the methods of the base class rather than overriding them.
This works as long as a hiding function is callable with the same arguments as the same function from the base class.
Pull Request: https://github.com/llvm/llvm-project/pull/136655
Making `TargetTransformInfo::Model::Impl` `const` makes sure all
interface methods are `const`, in `BasicTTIImpl`, its bases, and in all
derived classes.
Pull Request: https://github.com/llvm/llvm-project/pull/136598
The main change is making the `thisT` method `const`; the rest of the
changes fix compilation errors (*).
(*) There are two tricky methods, `getVectorInstrCost()` and
`getIntImmCost()`.
They have several overloads; some of these overloads are typically
pulled into derived classes with `using` declarations and then
hidden by methods in the derived class.
The compiler does not complain if the hiding methods are not marked as
`const`, which means that clients will use the methods from the base
class. If after this change your target fails cost model tests, this
must be the reason. To resolve the issue you need to make all hiding
overloads `const`. See the second commit in this PR.
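A self-contained C++ illustration of the pitfall (all names made up):

```cpp
// The derived, non-const overload compiles without any diagnostic, but on a
// const object only the const base overload is viable.
struct Base {
  int getIntImmCost(int Imm) const { return Imm; }
  int getIntImmCost(int Imm, int Extra) const { return Imm + Extra; }
};

struct Derived : Base {
  using Base::getIntImmCost;   // pull the base overloads into Derived
  int getIntImmCost(int Imm) { // NOTE: not const -- hides on non-const objects
    return Imm * 2;
  }
};

int callThroughConst(const Derived &D) {
  // The derived (non-const) version is quietly bypassed: returns 1, not 2.
  return D.getIntImmCost(1);
}

int main() {
  Derived D;
  return callThroughConst(D);
}
```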
Pull Request: https://github.com/llvm/llvm-project/pull/136575
I noticed these destructors taking time with -ftime-trace and moved some
of them for minor build efficiency improvements.
The main impact of moving destructors out of line is that it avoids
requiring the types held in container fields to be complete,
i.e. one can have uptr<T> or vector<T> as a field with an incomplete
type T, and that means we can reduce transitive includes, as with
LegalizerInfo.h.
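A minimal sketch of the pattern, with hypothetical names:

```cpp
#include <memory>

// "Header" part: Widget is deliberately incomplete here.
class Widget;

class Owner {
  std::unique_ptr<Widget> W; // OK with an incomplete Widget at this point
public:
  Owner();
  ~Owner(); // declared only; an inline/implicit destructor would need Widget
};

// "Source" part: Widget is complete from here on, so the out-of-line special
// members can instantiate ~unique_ptr<Widget> safely.
class Widget {
public:
  int Value = 0;
};

Owner::Owner() : W(std::make_unique<Widget>()) {}
Owner::~Owner() = default;

int main() { Owner O; }
```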
Move expensive getDebugOperandsForReg template out-of-line. The
std::function instantiation shows up in time trace even if you don't use
the function.
In order to facilitate targets that only support masked loads/stores
on certain address spaces (AMDGPU will support them in an upcoming
patch, but only for address space 7), add an AddressSpace parameter
to isLegalMaskedLoad and isLegalMaskedStore.
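A sketch of a call site under the extended hooks; the exact position of the address-space parameter is an assumption:

```cpp
#include "llvm/Analysis/TargetTransformInfo.h"
using namespace llvm;

// Sketch only: legality of a masked load can now depend on the address space
// of the pointer operand (e.g. only AS 7 on AMDGPU), not just type/alignment.
bool canVectorizeMaskedLoad(const TargetTransformInfo &TTI, Type *DataTy,
                            Align Alignment, unsigned AddrSpace) {
  return TTI.isLegalMaskedLoad(DataTy, Alignment, AddrSpace);
}
```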
Historically this function returned an int, with negative values meaning
invalid. It was migrated to InstructionCost in 43ace8b5ce07a, but the
code was not updated to return an invalid cost instead of -1. In that
commit, the caller in LSR was updated to assert that the cost is valid
rather than positive. We should return invalid instead of a negative
value so LSR will assert if the cost isn't valid.
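A minimal sketch of the intended pattern (hypothetical helper, not the actual LSR/TTI code):

```cpp
#include "llvm/Support/InstructionCost.h"
using namespace llvm;

// Sketch only: report "don't know" as an invalid InstructionCost rather than
// a negative sentinel, so callers asserting isValid() catch it.
InstructionCost scalingFactorCost(bool Supported, int64_t Cost) {
  if (!Supported)
    return InstructionCost::getInvalid(); // previously: return -1;
  return Cost;
}
```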
These were based off instruction count, not throughput - we can probably improve these further, but these throughput numbers match the worst expanded shuffles we see in the vector-shuffle-128-v* codegen tests.
This patch fixes:
llvm/lib/Target/X86/X86TargetTransformInfo.cpp:1583:47: error:
comparison of integers of different signs: 'size_t' (aka 'unsigned
long') and 'typename iterator_traits<const int *>::difference_type'
(aka 'long') [-Werror,-Wsign-compare]
If we're just moving a single element around inside a 128-bit lane (probably as an alternative to extracting it), we can assume this is cheap as a single PSRLDQ/PSHUFD/SHUFPS.
I've got the horrid feeling we're moving towards matching all SSE shuffle patterns inside the cost model, but I'm going to do my best to avoid this for now :|
getVectorCallCosts determines the cost of a vector intrinsic, based off
an existing scalar intrinsic call - but we were including the scalar
argument data in the IntrinsicCostAttributes, which meant that not only
was the cost calculation not type-only based, it was making incorrect
assumptions about constant values etc.
This also exposed an issue that x86 relied on fallback calculations for
funnel shift costs - this is great when we have the argument data as
that improves the accuracy of uniform shift amounts etc., but meant that
type-only costs would default to Cost=2 for all custom lowered funnel
shifts, which was far too cheap.
This is the reverse of #124129 where we weren't including argument data
when we could.
Fixes #63980
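A sketch of the type-only cost query described above; the helper itself is hypothetical, but the type-only `IntrinsicCostAttributes` constructor and `getIntrinsicInstrCost` are existing TTI APIs:

```cpp
#include "llvm/Analysis/TargetTransformInfo.h"
using namespace llvm;

// Sketch only: build the attributes from types alone (no scalar argument
// Values), so the returned cost cannot depend on accidental constant operands.
InstructionCost typeOnlyIntrinsicCost(const TargetTransformInfo &TTI,
                                      Intrinsic::ID ID, Type *RetTy,
                                      ArrayRef<Type *> ArgTys) {
  IntrinsicCostAttributes Attrs(ID, RetTy, ArgTys);
  return TTI.getIntrinsicInstrCost(Attrs,
                                   TargetTransformInfo::TCK_RecipThroughput);
}
```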
Patch adds usage of processShuffleMasks in codegen
in lowerShuffleViaVRegSplitting. This function is already used for X86
shuffle cost estimation and in DAGTypeLegalizer::SplitVecRes_VECTOR_SHUFFLE;
this unifies the code.
Reviewers: topperc, wangpc-pp, lukel97, preames
Reviewed By: preames
Pull Request: https://github.com/llvm/llvm-project/pull/121765
This is just the MOVSS instruction (SSE41 INSERTPS is still necessary for index != 0)
This exposed an issue in VectorCombine::foldInsExtFNeg - we need to use the more general SK_PermuteTwoSrc shuffle kind to allow getShuffleCost to match other shuffle kinds (not just SK_Select).
improveShuffleKindFromMask matches this as a SK_InsertSubvector of a v1f32 (which legalises to f32) into a v4f32 base vector, making it easy to recognise. MOVSS is limited to index 0.
Avoid always assuming the worst for v4f32 2 input shuffles, and match the SHUFPS pattern where possible - each pair of output elements must come from the same source register.
Now that processShuffleMasks can correctly handle 2 src shuffles, we can completely remove the shuffle kind limits and correctly recognize the number of active subvectors per legalized shuffle - improveShuffleKindFromMask will determine the shuffle kind for each split subvector.
processShuffleMasks can now correctly handle 2 src shuffles, so we can use the existing SK_PermuteSingleSrc splitting cost logic to handle SK_PermuteTwoSrc as well and correctly recognise the number of active subvectors per legalised shuffle.
If the shuffle split results in referencing a single legalised whole vector (i.e. no permutation), then this can be treated as free.
We already do something similar for broadcasts / whole subvector insertion + extraction - it's purely an issue for register allocation.
In cases where the base/sub vector types in an insert_subvector pattern legalize to the same width through splitting, we can assume that the shuffle becomes free, as the legalized vectors will not overlap.
Note this isn't true if the vectors have been widened during legalization (e.g. v2f32 insertion into v4f32 would legalize to v4f32 into v4f32).
Noticed while working on adding processShuffleMasks handling for SK_PermuteTwoSrc.
Similar to what we do for broadcast shuffles, when legalising load costs, if the value is known to be uniform, then we will only load a single vector and reuse this across the split legalised registers.
Fixes #111126
Nothing really uses these yet, but we shouldn't be losing the info.
We can also pass on the OpInfo arg to the getMemoryOpCost constant load call to indicate if it's constant/uniform/pow2 etc.
Prep cleanup for #111126
SK_PermuteTwoSrc legalization has to assume any of the legalised source registers could be referenced in split shuffles, but if we already know that each 128-bit lane only references elements from the same lane of the source operands, then this scaling won't occur.
Hopefully this can help with #113356 without us having to get full processShuffleMasks canonicalization finished first.
As vector element loads are free on SystemZ, this patch improves the cost
computation in getGatherCost() to reflect this.
getScalarizationOverhead() gets an optional parameter which can hold the actual
Values so that they in turn can be passed (by BasicTTIImpl) to
getVectorInstrCost().
SystemZTTIImpl::getVectorInstrCost() will now recognize a LoadInst and
typically return a 0 cost for it, with some exceptions.
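A simplified sketch of the idea (not the actual SystemZ implementation):

```cpp
#include "llvm/IR/Instructions.h"
#include "llvm/Support/InstructionCost.h"
using namespace llvm;

// Simplified sketch: if the scalar being inserted comes straight from a load,
// treat the insertion as free, since the element can be loaded directly into
// the vector register (the real hook keeps some exceptions).
InstructionCost elementInsertCost(const Value *Scalar) {
  if (Scalar && isa<LoadInst>(Scalar))
    return 0;
  return 1;
}
```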
Most of the x86 shuffle instructions operate within each 128-bit subvector lane, but our shuffle costs struggle to handle this and have to fall back to worst-case shuffles that reference elements from any lane.
This patch detects shuffle masks that we know are "inlane" and enables us to assume a cheaper shuffle cost.
Returning invalid instruction costs when converting from/to fp16 in
`X86TTIImpl::getCastInstrCost` when there is no hardware support
available was triggering asserts. This changes the code to return a
large (arbitrary) number to model the fact that libcalls are used to
implement the conversion.
This also simplifies the code by only reporting costs for the scalar
fp16 conversion; vectorized costs are left to the fallback assuming
scalarization.
This is a follow-up to assertion issues reported for the changes in
#113195
Improve cost-modeling for x86 __fp16 conversions so the SLPVectorizer
transforms the patterns:
- Override `X86TTIImpl::getStoreMinimumVF` to report a minimum VF of 4 (an SSE
register can hold 4 x float converted/stored to 4 x f16); this is necessary as
fp16 stores are neither modeled as trunc-stores nor can we mark direct Xxfp16
stores as legal, as we generally expand fp16 operations (see the sketch after
this list).
- Add missing cost entries to `X86TTIImpl::getCastInstrCost` for
conversions from/to fp16. Note that conversion from f64 to f16 is not
supported by an X86 instruction.
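A rough sketch of the first bullet's behaviour as a hypothetical free function (not the actual override):

```cpp
#include "llvm/IR/Type.h"
#include <algorithm>
using namespace llvm;

// Sketch only: report a minimum VF of 4 when storing f16 values produced from
// f32, mirroring the getStoreMinimumVF behaviour described above.
unsigned storeMinimumVFSketch(unsigned VF, Type *ScalarMemTy,
                              Type *ScalarValTy) {
  if (ScalarMemTy->isHalfTy() && ScalarValTy->isFloatTy())
    return std::max(VF, 4u);
  return VF;
}
```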
Porting to TTI provides direct access to the instruction cost model,
which can enable instruction cost based sinking without introducing code
duplication.