As fmul and fmadd are so similar, their performance characteristics tend
to be the same on most platforms, at least in terms of reciprocal
throughputs. Processors capable of performing a given number of fmul per
cycle can usually perform the same number of fma, with the extra add
being relatively simple on top. This patch makes the scores of the two
operations the same, bringing the throughput cost of an fma/fmuladd
to 2 and the latency to 3, which are the defaults for fmul.
Note that we might also want to change the throughput cost of a fmul to
1, as most processors have ample bandwidth for them, but they should
still stay in line with one another.
LD2 is represented in IR as deinterleave-shuffle(load), and ST2 as
store(interleave-shuffle). Whilst the shuffle would be expensive in
general for MVE (it does not have zip/uzp instructions), it should be
treated as cheap when part of the LD2/ST2 pattern. This borrows some
code from the AArch64 backend to produce lower costs. (Some of the
costs still show as higher than they should; that just shows how broken
the generic shuffle costs are at the moment. They would be lower if
getShuffleCost were called directly as opposed to going through
getInstructionCost.)
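For reference, a minimal sketch of the two patterns (hypothetical
example, not taken from the tests):

```llvm
; LD2: deinterleave-shuffle(load)
define <4 x i16> @ld2_even(ptr %p) {
  %wide = load <8 x i16>, ptr %p, align 2
  %even = shufflevector <8 x i16> %wide, <8 x i16> poison,
                        <4 x i32> <i32 0, i32 2, i32 4, i32 6>
  ret <4 x i16> %even
}

; ST2: store(interleave-shuffle)
define void @st2(<4 x i16> %a, <4 x i16> %b, ptr %p) {
  %il = shufflevector <4 x i16> %a, <4 x i16> %b,
        <8 x i32> <i32 0, i32 4, i32 1, i32 5, i32 2, i32 6, i32 3, i32 7>
  store <8 x i16> %il, ptr %p, align 2
  ret void
}
```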
If we have a floating point vector and no zve32f/zve64f/zve64d, we can
end up with an invalid type-legalization cost from
getTypeLegalizationCost.
Previously this triggered an assertion that the type must have been
legalized if the "legal" type is a vector, but in this case, where it's
not possible to legalize, the original type is spat back out.
This fixes it by just checking that the legalization cost is valid.
We don't have much testing for zve64x, so we may have other places in
the cost model with this issue.
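A hypothetical reproducer sketch (the actual case in the issue may
differ):

```llvm
; With only +zve64x (no zve32f/zve64f/zve64d) there is no legal
; container type for a float vector, so legalization fails and the
; legalization cost is invalid.
define <vscale x 1 x float> @f(<vscale x 1 x float> %a, <vscale x 1 x float> %b) {
  %r = fadd <vscale x 1 x float> %a, %b
  ret <vscale x 1 x float> %r
}
```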
Fixes #153008
This extracts the code for modelling an fp16 operation as
`fptrunc(fpop(fpext,fpext))` into a new function named
getFP16BF16PromoteCost so that it can be reused by the
arithmetic instructions. The function takes a lambda to
calculate the cost of the operation with the promoted type.
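As a sketch, the modelled promotion for a half fadd without native
fp16 support looks like:

```llvm
define half @promoted_fadd(half %a, half %b) {
  %ae = fpext half %a to float
  %be = fpext half %b to float
  %s = fadd float %ae, %be      ; cost of the op at the promoted type
  %r = fptrunc float %s to half
  ret half %r
}
```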
Now that #149310 has restricted lifetime intrinsics to only work on
allocas, we can also drop the explicit size argument. Instead, the size
is implied by the alloca.
This removes the ability to only mark a prefix of an alloca alive/dead.
We never used that capability, so this removes the need to handle
that possibility everywhere (though many key places, including stack
coloring, did not actually respect it).
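A before/after sketch of the intrinsic usage:

```llvm
define void @f() {
  %a = alloca i64
  ; before: call void @llvm.lifetime.start.p0(i64 8, ptr %a)
  ; after, with the size implied by the alloca:
  call void @llvm.lifetime.start.p0(ptr %a)
  call void @llvm.lifetime.end.p0(ptr %a)
  ret void
}
```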
Certain fcmp predicates need to be expanded into multiple operations
that are then or'd together. This adds more accurate cost modelling for
them based on the predicate. Unsupported operations are given the cost
of a libcall, and the latency is set to 2 as that seemed to be fairly
common across different CPUs.
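For example (a sketch of the kind of expansion being costed), a `one`
predicate can be expanded into two supported compares or'd together:

```llvm
define i1 @fcmp_one(double %a, double %b) {
  ; fcmp one double %a, %b expands to:
  %lt = fcmp olt double %a, %b
  %gt = fcmp ogt double %a, %b
  %r = or i1 %lt, %gt
  ret i1 %r
}
```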
This prevents them from generating Invalid costs, as generating the
instructions seems to work fine with and without +bf16. The costs are
mostly taken from the number of instructions (minus ptrue and constants).
Follow-up on a post-landing review of bd66fd0 ([CostModel/RISCV] Fix
costs of vector [l](lrint|lround)) to fix a subtle problem with the cost
of vector [l](lrint|lround): we should use the source LMUL in the case
of a narrowing op.
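A sketch of a narrowing case (hypothetical types): with i32 results
from double sources, the source vector occupies twice the LMUL of the
result, so the cost should be based on the source LMUL.

```llvm
define <vscale x 2 x i32> @narrowing_lrint(<vscale x 2 x double> %x) {
  %r = call <vscale x 2 x i32> @llvm.lrint.nxv2i32.nxv2f64(<vscale x 2 x double> %x)
  ret <vscale x 2 x i32> %r
}
```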
Co-authored-by: Luke Lau <luke@igalia.com>
This reverts commit fe4f6c1a58ab4f00a88a97af01000b6783b573ee, but leaves
the tests that were added.
The original commit mistakenly assumed that if regular bf16/f16 loads
and stores could be lowered without zvfbfmin/zvfhmin, then so too could
masked loads/stores and gathers/scatters.
However, SelectionDAG can't actually type-legalize masked.load/store,
since that needs to be done in ScalarizeMaskedMemIntrinPass.
This was causing crashes on IREE because we now returned true for
isLegalMaskedLoadStore.
The original intent of this was to remove a discrepancy in the loop
vectorizer tests whenever predication was enabled, but this has gone
away after 92d09245d61dce80d3e68a27cc34d5fc6f062c93. So I don't think we
need to reapply this patch.
Take the actual instruction cost into account, and don't fall through
to code that doesn't apply to [l]lrint. Also strip invalid costs for
[b]f16, as a companion to #146507, and unify it with [l]lround costs as
a companion to #147713.
When vectorizing with predication, some loops that were previously
vectorized without zvfhmin/zvfbfmin will no longer be vectorized because
the masked load/store or gather/scatter cost returns Invalid.
This is due to a discrepancy where for these costs we check
isLegalElementTypeForRVV but for regular memory accesses we don't.
But for bf16 and f16 vectors we don't actually need the extension
support for loads and stores, so this adds a new function which takes
this into account.
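For example (sketch), a masked bf16 load only moves element values
around without converting them, so it shouldn't need zvfbfmin:

```llvm
define <vscale x 4 x bfloat> @mload(ptr %p, <vscale x 4 x i1> %m) {
  %v = call <vscale x 4 x bfloat> @llvm.masked.load.nxv4bf16.p0(
           ptr %p, i32 2, <vscale x 4 x i1> %m,
           <vscale x 4 x bfloat> poison)
  ret <vscale x 4 x bfloat> %v
}
```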
For regular memory accesses we should probably also e.g. return an
invalid cost for i64 elements on zve32x, but it doesn't look like we
have tests for this yet.
We also should probably not be vectorizing these bf16/f16 loops to begin
with if we don't have zvfhmin/zvfbfmin and zfhmin/zfbfmin. I think this
is due to the scalar costs being too cheap. I've added tests for this in
a100f6367205c6a909d68027af6a8675a8091bd9 to fix in another patch.
Vector costs without zvfhmin/zvfbfmin and zfhmin/zfbfmin are somehow
cheaper than with zvfhmin/zvfbfmin at smaller vector sizes, despite the
fact that the former are scalarized to libcalls. This adds a RUN line to
showcase this, splitting out the bfloat tests into their own functions
so we don't have duplicate lines for the regular float/double costs.
When these were converted to CostKindTblEntry the throughput was mainly
copied to all cost kinds
Regenerated with my check_cost_tables.py helper script
Whilst testing fixed-length vectors with +sve might be useful, this was
just a mistake in the test generation; it should be using scalable
vectors.
We currently provide a generic cost for llvm.vector.reverse in BasicTTI
by reusing the reverse shuffle cost, but only for the value-based cost.
Since the argument values aren't actually used, move this into the
type-based costing method so that type-based costing can also reuse it.
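That is (sketch), a call like the one below is costed as the
equivalent reverse shuffle, whether a value or only a type is
available:

```llvm
define <4 x i32> @rev(<4 x i32> %v) {
  ; costed like: shufflevector <4 x i32> %v, <4 x i32> poison,
  ;              <4 x i32> <i32 3, i32 2, i32 1, i32 0>
  %r = call <4 x i32> @llvm.vector.reverse.v4i32(<4 x i32> %v)
  ret <4 x i32> %r
}
```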
lifetime.start and lifetime.end are primarily intended for use on
allocas, to enable stack coloring and other liveness optimizations. This
is necessary because all (static) allocas are hoisted into the entry
block, so lifetime markers are the only way to convey the actual
lifetimes.
However, lifetime.start and lifetime.end are currently *allowed* to be
used on non-alloca pointers. We don't actually do this in practice, but
the mere fact that this is possible breaks the core purpose of the
lifetime markers, which is stack coloring of allocas. Stack coloring
can only work correctly if all lifetime markers for an alloca are
analyzable.
* If a lifetime marker may operate on multiple allocas via a select/phi,
we don't know which lifetime actually starts/ends and handle it
incorrectly (https://github.com/llvm/llvm-project/issues/104776); see
the sketch after this list.
* Stack coloring operates on the assumption that all lifetime markers
are visible, and not, for example, hidden behind a function call or
escaped pointer. It's not possible to change this, as part of the
purpose of lifetime markers is that they work even in the presence of
escaped pointers, where simple use analysis is insufficient.
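A minimal sketch of the select/phi problem from the first point above
(IR that this PR now disallows):

```llvm
define void @f(i1 %c) {
  %a = alloca i64
  %b = alloca i64
  %p = select i1 %c, ptr %a, ptr %b
  ; Which alloca's lifetime ends here? Stack coloring cannot tell.
  call void @llvm.lifetime.end.p0(i64 8, ptr %p)
  ret void
}
```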
I don't think there is any way to have coherent semantics for lifetime
markers on allocas, while also permitting them on arbitrary pointer
values.
This PR restricts lifetimes to operate on allocas only. As a followup, I
will also drop the size argument, which is superfluous if we always
operate on an alloca. (This change also renders dead various code that
handles lifetime markers on non-allocas. I plan to clean up that kind
of code after dropping the size argument as well.)
In practice, I've only found a few places that currently produce
lifetimes on non-allocas:
* CoroEarly replaces the promise alloca with the result of an intrinsic,
which will later be replaced back with an alloca. I think this is the
only place where there is some legitimate loss of functionality, but I
don't think this is particularly important (I don't think we'd expect
the promise in a coroutine to admit useful lifetime optimization.)
* SafeStack moves unsafe allocas onto a separate frame. We can safely
drop lifetimes here, as SafeStack performs its own stack coloring.
* Similarly for AddressSanitizer: it also moves allocas into separate
memory.
* LSR sometimes replaces the lifetime argument with a GEP chain of the
alloca (where the offsets ultimately cancel out). This is just
unnecessary. (Fixed separately in
https://github.com/llvm/llvm-project/pull/149492.)
* InferAddrSpaces sometimes makes lifetimes operate on an addrspacecast
of an alloca. I don't think this is necessary.
Since the costs were added, the codegen for i8/i16 and/or/xor reductions
has improved. This updates the cost model to produce matching costs in
terms of the number of instructions.
Currently we always produce a cost of 1 for all CostKinds that are not
RecipThroughput, which can underestimate the cost if the type has a
higher legalization cost (like larger vectors). This relaxes it to cover
all cost kinds.
getVectorInstrCostHelper would return costs of zero for vector
inserts/extracts that move data between GPR and vector registers, if
there was no 'real' use, i.e. there was no corresponding existing
instruction.
This meant that passes like LoopVectorize and SLPVectorizer, which are
likely the main users of the interface, would underestimate the cost
of inserts/extracts that move data between GPR and vector registers,
which have non-trivial costs.
The patch removes the special case and only returns a cost of zero for
lane 0 if there is no need to transfer between integer and vector
registers.
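For example (sketch), even a lane-0 extract to an integer type needs a
vector-to-GPR transfer, so it shouldn't be free:

```llvm
define i32 @ext_lane0(<4 x i32> %v) {
  ; Needs a vector-to-GPR move (e.g. fmov/umov), so the cost is non-zero.
  %e = extractelement <4 x i32> %v, i64 0
  ret i32 %e
}
```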
This impacts a number of SLP tests, and most of them look like general
improvements. I think the change should make things more accurate for any
AArch64 target, but if not it could also just be Apple CPU specific.
I am seeing +2% end-to-end improvements on SLP-heavy workloads.
PR: https://github.com/llvm/llvm-project/pull/146526
The code here probably needs to change to handle types more uniformly,
but this patch prevents it from trying to use a simple type where one
does not exist.
Fixes #148438.
After #147677 we now preserve value based costing for vp intrinsics
instead of switching it to type based costing.
However for vp.gather and vp.scatter, even though they fall back to
their functionally equivalent masked.gather and masked.scatter, the
number of arguments is different due to the alignment being a dedicated
argument.
This caused a crash detected at
https://lab.llvm.org/staging/#/builders/210/builds/988
This fixes it by explicitly handling the two intrinsics and adding test
coverage.
Note that the type based costing isn't yet implemented for
masked.gather/masked.scatter, so it doesn't show up correctly.
An experimental_vp_reverse isn't exactly functionally the same as
vector_reverse, so previously it wasn't getting picked up by the generic
VP costing code that reuses the non-VP equivalents.
But for costing purposes it's close enough, so we can reuse it.
The type-based cost for vector_reverse is still incorrect and should be
fixed in another patch.
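For reference, a sketch of the intrinsic: it only reverses the first
%evl lanes, which is why it isn't exactly vector_reverse but is close
enough to share its cost.

```llvm
define <vscale x 2 x i32> @vp_rev(<vscale x 2 x i32> %v, <vscale x 2 x i1> %m, i32 %evl) {
  %r = call <vscale x 2 x i32> @llvm.experimental.vp.reverse.nxv2i32(
           <vscale x 2 x i32> %v, <vscale x 2 x i1> %m, i32 %evl)
  ret <vscale x 2 x i32> %r
}
```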
Previously we only carried the type arguments, which caused value-based
costs to be inadvertently changed into type-based costs.
I'm just using vp.is.fpclass as an example intrinsic for now since the
type based cost seems to differ from the value based cost, and most
normal intrinsics e.g. min/max have the same value + type based cost.
We still need to handle the cost properly for is.fpclass in a second
patch.
This is needed for an upcoming patch to handle the cost of
llvm.experimental.vp.reverse which suffers from the same problem.
Currently we have slightly different costing for the vp and non-vp
version of the rounding intrinsics.
We can delete this code and use the generic BasicTTIImpl code for the vp
intrinsics which falls back to the non-vp versions.
I'm not sure if the zvfh costing is correct; this should probably be
fixed in a follow-up patch. At the moment the non-vp cost is more
important since it is what the loop vectorizer will use.
For cast instructions such as `fptoui.sat` and `fptosi.sat`, we need to
check that both the source type and the result type can be lowered
legally. If either of them is invalid, return an invalid cost.
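For example (a hypothetical sketch): in the cast below the source type
needs FP vector support while the i64 result type needs ELEN=64, so
both sides have to be checked.

```llvm
define <2 x i64> @sat(<2 x float> %x) {
  %r = call <2 x i64> @llvm.fptoui.sat.v2i64.v2f32(<2 x float> %x)
  ret <2 x i64> %r
}
```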
Fixes https://github.com/llvm/llvm-project/issues/142973.
Co-authored-by: Craig Topper <craig.topper@sifive.com>
This removes the CostKind == TCK_RecipThroughput limitation from
getCmpSelInstrCost, allowing it to return more accurate costs for CodeSize and
Lat / SizeLat. Especially for larger vectors under CodeSize, the returned costs
are currently 1, not the legalization cost.
The ldexp intrinsic is incorrectly costed as 1, due to a missing entry
in BasicTTIImpl::getTypeBasedIntrinsicCost: fix this. While at it, add
the missing entries for [l]lround, which are costed as 1 anyway, making
that part of the change non-functional.
This patch adjusts the cost model to account for the ability of the
AMDGPU optimizer to group together i8 values into i32 values.
Co-authored-by: Erich Keane <ekeane@nvidia.com>