llvm-project

Author	SHA1	Message	Date
Kevin P. Neal	9c9f94063c	[FPEnv][CostModel] Correct strictfp test. Correct strictfp tests to follow the rules documented in the LangRef: https://llvm.org/docs/LangRef.html#constrained-floating-point-intrinsics These tests needed the strictfp attribute added to some function definitions. Test changes verified with D146845.	2024-04-02 13:53:56 -04:00
David Green	7433120137	[CostModel] Mark ssa_copy as free (#75294) These intrinsics are only used ephemerally and should be given a zero cost.	2023-12-13 11:24:47 +00:00
David Green	b003fed283	[CostModel] Add some ssa.copy costmodel tests. NFC	2023-12-13 07:26:17 +00:00
Alex Richardson	e39f6c1844	[opt] Infer DataLayout from triple if not specified There are many tests that specify a target triple/CPU flags but no DataLayout which can lead to IR being generated that has unusual behaviour. This commit attempts to use the default DataLayout based on the relevant flags if there is no explicit override on the command line or in the IR file. One thing that is not currently possible is to differentiate a missing datalayout from `target datalayout = ""` in the IR file, since the current APIs don't allow detecting this case. If it is considered useful to support this case (instead of passing "-data-layout=" on the command line), I can change IR parsers to track whether they have seen such a directive and change the callback type. Differential Revision: https://reviews.llvm.org/D141060	2023-10-26 12:07:37 -07:00
David Green	233fb987fc	[ARM] Improve bitwise reduction costs This adds some basic and/or/xor reduction costs for NEON/MVE, handling them like other reductions where vector operations are used to reduce to legal sizes, followed by an optional VREV+VAND/VORR/VEOR step and scalarization from there.	2023-09-04 16:22:52 +01:00
David Green	4cef24a886	[ARM] Improve reduction integer min/max costs This adds some basic smin/smax/umin/umax reduction costs for MVE/NEON, similar to the existing Add reduction costs. They follow the same style as Add reductions, but include a higher cost as the costs tend to be dependent on the element size for vminv/vmaxv. These costs may not be precise, but will be more in line than the default that extracts each element.	2023-09-04 15:47:06 +01:00
David Green	2955cc15ff	[ARM] Improve costs for FMin/Max reductions Similar to the other reductions, this changes the cost of fmin/fmax reductions under MVE/NEON to perform vector operations until the types need to be scalarized. The fp16 vectors can perform a VREV+FMIN/FMAX to skip a step of the reduction, and otherwise need lanewise extract from the top lanes.	2023-09-04 12:49:13 +01:00
David Green	4530f02916	[ARM] Improve reduction fadd/fmul costs This adds some basic fadd/fmul reduction costs for MVE/NEON. It reduces by halving the vector size until it gets scalarized, with some additional costs for fp16 which may require extracting the top lanes. Differential Revision: https://reviews.llvm.org/D159367	2023-09-04 11:37:14 +01:00
David Green	5afb161ed5	[ARM] Add various vector reduce costmodel tests. NFC See D159367 and the followups.	2023-09-04 10:50:58 +01:00
David Green	12025cef3e	[CostModel] Use min/max intrinsics for vecreduce.min/max costs This changes the costmodelling of the vecreduce.min/max nodes to use the costs of the relevant min/max intrinsics instead of expanding them to compare and selects. getMinMaxReductionCost has changed to take an Opcode for the relevant intrinsic, dropping the IsUnsigned and CondTy parameters as they are no longer needed. A follow up patch will add some basic fminimum/fmaximum costmodelling. Differential Revision: https://reviews.llvm.org/D153547	2023-07-04 15:02:30 +01:00
Luke Lau	a68dcd09e8	[TTI] Use users of GEP to guess access type in getGEPCost Currently getGEPCost uses the target type of the GEP as a heuristic for the type that will be accessed, to pass onto isLegalAddressingMode. Targets use this to work out if a GEP can then be folded into the load/store instruction that uses the GEP. For example, on RISC-V loads and stores can have an offset added to a base register folded into a single instruction, so the following GEP is free: %p = getelementptr i32, ptr %base, i32 42 ; getInstructionCost = 0 %x = load i32, ptr %p ; getInstructionCost = 1 ------------------------------------------------------------------------ lw t0, a0(42) However vector loads and stores cannot have an offset folded into them, so the following GEP is costed: %p = getelementptr <2 x i32>, ptr %base, i32 42 ; getInstructionCost = 1 %x = load <2 x i32>, ptr %p ; getInstructionCost = 1 ------------------------------------------------------------------------ addi a0, 42 vle32 v8, (a0) The issue arises whenever there is a mismatch between the target type of the GEP and the type that is actually accessed: %p = getelementptr i32, ptr %base, i32 42 ; getInstructionCost = 0 %x = load <2 x i32>, ptr %p ; getInstructionCost = 1 ------------------------------------------------------------------------ addi a0, 42 vle32 v8, (a0) Even though this GEP will result in an add instruction, because TTI thinks it's loading an i32, it will think it can be folded and not charge for it. The target type can become mismatched with the memory access during transformations, noticeably during SLP where a scalar base pointer will be reused to perform a vector load or store. This patch adds an optional AccessType argument to getGEPCost which allows the type of memory accessed by users to be passed in as a hint, so that we can more accurately determine if the GEP can be folded into its users. 
If AccessType is not provided, getGEPCost falls back to the old behaviour of using the PointeeType to guess the memory access type. This can be revisited in a later patch. Also for now, only GEPs with exactly one user use the access type hint. Whilst we could look through all users and use all access types to determine if we can fold the GEP, this patch avoids doing so to prevent O(N) behaviour. Differential Revision: https://reviews.llvm.org/D149889	2023-06-29 13:44:37 +01:00
Tobias Hieta	f84bac329b	[NFC][Py Reformat] Reformat lit.local.cfg python files in llvm This is a follow-up to b71edfaa4ec3c998aadb35255ce2f60bba2940b0 since I forgot the lit.local.cfg files in that one. Reformatting is done with `black`. If you end up having problems merging this commit because you have made changes to a python file, the best way to handle that is to run git checkout --ours <yourfile> and then reformat it with black. If you run into any problems, post to discourse about it and we will try to help. RFC Thread below: https://discourse.llvm.org/t/rfc-document-and-standardize-python-code-style Reviewed By: barannikov88, kwk Differential Revision: https://reviews.llvm.org/D150762	2023-05-17 17:03:15 +02:00
David Green	a39d2d50af	[ARM] Increase the Scalarized cost of masked gather/scatter operations If a gather/scatter is masked and will need to be scalarized then the cost should be higher than we currently produce. An additional cost for scalarizing the mask, extracting i1s and branching on the result needs to be added, which this patch gives a cost of 5. Differential Revision: https://reviews.llvm.org/D147331	2023-04-11 14:49:46 +01:00
Nikita Popov	68c50b111d	[CostModel] Convert some tests to opaque pointers (NFC)	2022-12-15 09:50:34 +01:00
David Green	de6dfbbb30	[ARM] Fix for MVE i128 vector icmp costs. We were hitting an assert as the legalized type needn't be a vector. Fixes #58364	2022-10-14 18:49:25 +01:00
Simon Pilgrim	fdec50182d	[CostModel] Replace getUserCost with getInstructionCost * Replace getUserCost with getInstructionCost, covering all cost kinds. * Remove getInstructionLatency, it's not implemented by any backends, and we should fold the functionality into getUserCost (now getInstructionCost) to make it easier for targets to handle the cost kinds with their existing cost callbacks. Original Patch by @samparker (Sam Parker) Differential Revision: https://reviews.llvm.org/D79483	2022-08-18 11:55:23 +01:00
Simon Pilgrim	4178e33470	[CostModel] Update RUN -passes=* to double quotes to appease update scripts on windows DOS really doesn't like `` quotes to be used in command lines Some prep work as I'm intending to resurrect D79483 soon	2022-08-10 17:54:06 +01:00
David Green	0a11ad2aa8	[ARM] Expand MVE i1 fptoint and inttofp if mve.fp is not present. If MVE.fp is not present then we cannot select the vector i1 fp operations to VCMP instructions, so need to expand.	2022-07-11 13:03:30 +01:00
David Green	438ffdb821	[ARM] Switch the costs of mve1beat and mve4beat These three subtarget features are meant to control whether MVE instructions take 1 vs 2 vs 4 architectural beats. The mve1beat feature is described as "Model MVE instructions as a 1 beat per tick architecture", meaning an MVE instruction will execute over 4 cycles. mve4beat is the opposite where the entire 4 beats of the MVE instruction execute in a single cycle. The costs for the two were backwards though, not matching the cycle counts like they should. This patch switches the costs on the two to bring them in-line with expectations. Differential Revision: https://reviews.llvm.org/D129141	2022-07-07 16:10:00 +01:00
David Green	53be6ab25c	[ARM] Fix MVE getShuffleCost legalized type check The MVE shuffle costing for VREV instructions was making incorrect assumptions as to legalized vector types remaining as vectors. Add a quick check to ensure they are indeed vectors before attempting to get the number of elements.	2022-06-07 14:36:04 +01:00
David Green	b4dd9fc370	[ARM] Cost modelling for MVE vector fptoi_sat Building on top of D125665, this adds MVE costs for fptosi.sat and fptoui.sat, providing MVE is available and the types are legal. Differential Revision: https://reviews.llvm.org/D125666	2022-05-20 11:00:34 +01:00
David Green	80aab0312a	[ARM] Cost modelling for scalar fptoi_sat Similar to D124357, this adds some cost modelling for fptoi_sat for Arm targets. Where VFP2 is available (and FP64/FP16 for the relevant types), the operations are legal as the Arm instructions naturally saturate. Otherwise they will need an extra smin/smax clamp, similar to AArch64. Differential Revision: https://reviews.llvm.org/D125665	2022-05-19 19:53:21 +01:00
David Green	4a8c13a6f4	[CostModel] Add basic fptoi_sat costs This adds some basic fptosi_sat and fptoui_sat target independent cost modelling. The fptosi_sat is modelled as a fmin/fmax to saturate the value, followed by a fp convert. The signed values then have an additional fcmp+select for handling NaN correctly. The AArch64/Arm costs may be more incorrect, as the instructions exist natively. This can be fixed with target specific cost updates. Differential Revision: https://reviews.llvm.org/D124269	2022-04-27 09:30:00 +01:00
David Green	1159984802	[CostModel] Add fptoi_sat costmodel tests. NFC	2022-04-25 18:44:35 +01:00
David Sherwood	e7b89c2fc3	Add BasicTTIImpl cost model for llvm.get.active.lane.mask intrinsic The vectoriser sometimes generates predicated vector loops using the llvm.get.active.lane.mask intrinsic so it's important that we are able to calculate a valid cost for the call instruction. When SVE is enabled we are able to use a single whilelo instruction for some vector types - in such cases I've marked the cost as 1. For all other cases I've set the cost according to how the intrinsic will be expanded. Tests added here: Analysis/CostModel/AArch64/sve-intrinsics.ll Analysis/CostModel/ARM/active_lane_mask.ll Analysis/CostModel/RISCV/active_lane_mask.ll Differential Revision: https://reviews.llvm.org/D121109	2022-03-14 09:35:05 +00:00
Arthur Eubanks	15ba588d6d	[test] Migrate '-analyze -cost-model' to '-passes=print<cost-model>'	2022-02-09 15:42:16 -08:00
Andrew Litteken	4ff4e7ea30	[CostModel] Use cost of target trunc type when it is the only use of a non-register sized load The code size cost model for most targets uses the legalization cost for the type of the pointer of a load. If this load is followed directly by a trunc instruction, and is the only use of the result of the load, only one instruction is generated in the target assembly language. This adds a check for this case, and uses the target type of the trunc instruction if so. This did not show any changes in CTMark code size benchmarks. Reviewers: paquette, samparker, dmgreen Differential Revision: https://reviews.llvm.org/D109388	2022-01-12 18:03:50 -06:00
David Green	255ad73424	[ARM] Make MVE v2i1 predicates legal MVE can treat v16i1, v8i1, v4i1 and v2i1 as different views onto the same 16bit VPR.P0 register, with v2i1 holding two 8 bit values for the two halves. This was never treated as a legal type in llvm in the past as there are not many 64bit instructions and no 64bit compares. There are a few instructions that could use it though, notably a VSELECT (as it can handle any size using the underlying v16i8 VPSEL), AND/OR/XOR for similar reasons, some gathers/scatters and long multiplies and VCTP64 instructions. This patch goes through and makes v2i1 a legal type, handling all the cases that fall out of that. It also makes VSELECT legal for v2i64 as a side benefit. A lot of the codegen changes as a result - usually in a way that is a little better or a little worse, but still expensive. Costs can change a little too in the process, again in a way that expensive things remain expensive. A lot of the tests that changed are mainly to ensure correctness - the code can hopefully be improved in the future where it comes up in practice. The intrinsics currently remain using the v4i1 they previously did to emulate a v2i1. This will be changed in a followup patch but this one was already large enough. Differential Revision: https://reviews.llvm.org/D114449	2021-12-03 14:05:41 +00:00
Zarko Todorovski	7f7dac7126	[NFC][llvm] Inclusive language: reword uses of sanity test and check Part of continuing work to use more inclusive language. Reworded uses of sanity check and sanity test in llvm/test/	2021-11-25 07:21:42 -05:00
David Green	309f1e4ac8	[ARM] Add datalayout to costmodel tests. NFC This adds a sensible datalayout to the ARM cost model tests, to prevent the costs reported being incorrect for the size of pointers.	2021-11-16 09:49:42 +00:00
Simon Pilgrim	7bd097fd1e	[CostModel][TTI] Fix ops used for generic smulo/umulo cost expansion Fix copy+pasta that was checking for smul_fix instead of smul_with_overflow to detect signed values. The LShr is performed on the extended type as we use it to truncate+extract the upper/hi bits of the extended multiply. More closely matches the default expansion from TargetLowering::expandMULO	2021-10-06 19:11:32 +01:00
Craig Topper	765348298c	[CostModel] Update default cost model for sadd/ssub overflow to match TargetLowering The expansion for these was updated in https://reviews.llvm.org/D47927 but the cost model was not adjusted. I believe the cost model was also incorrect for the old expansion. The expansion prior to D47927 used 3 icmps using LHS, RHS, and Result to calculate their signs. Then 2 icmps to compare the signs. Followed by an And. The previous cost model was using 3 icmps and 2 selects. Digging back through git blame, those 2 selects in the cost model used to be 2 icmps, but were changed in https://reviews.llvm.org/D90681 Differential Revision: https://reviews.llvm.org/D110739	2021-09-30 09:41:14 -07:00
Simon Pilgrim	7397dcb403	[TTI] Add basic SK_InsertSubvector shuffle mask recognition This patch adds an initial ShuffleVectorInst::isInsertSubvectorMask helper to recognize 2-op shuffles where the lowest elements of one of the sources are being inserted into the "in-place" other operand, this includes "concat_vectors" patterns as can be seen in the Arm shuffle cost changes. This also helped fix an x86 issue with irregular/length-changing SK_InsertSubvector costs - I'm hoping this will help with D107188 This doesn't currently attempt to work with 1-op shuffles that could either be a "widening" shuffle or a self-insertion. The self-insertion case is tricky, but we currently always match this with the existing SK_PermuteSingleSrc logic. The widening case will be addressed in a follow up patch that treats the cost as 0. Masks with a high number of undef elts will still struggle to match optimal subvector widths - it's currently bounded by minimum-width possible insertion, whilst some cases would benefit from wider (pow2?) subvectors. Differential Revision: https://reviews.llvm.org/D107228	2021-08-02 11:23:44 +01:00
Sander de Smalen	97215fe3f4	[CostModel] Express cost(urem) as cost(div+mul+sub) when set to Expand. The Legalizer expands the operations of urem/srem into a div+mul+sub or divrem when those are legal/custom. This patch changes the cost-model to reflect that cost. Since there is no 'divrem' Instruction in LLVM IR, the cost of divrem is assumed to be the same as div+mul+sub since the three operations will need to be executed at runtime regardless. Patch co-authored by David Sherwood (@david-arm) Reviewed By: RKSimon, paulwalker-arm Differential Revision: https://reviews.llvm.org/D103799	2021-07-07 14:40:28 +01:00
Sander de Smalen	4ca860742d	[InstructionCost] Don't conflate Invalid costs with Unknown costs. We previously made a change to getUserCost to return an Invalid cost when one of the TTI costs returned '-1' (meaning 'unknown' or 'infinitely expensive'). It makes no sense to say that: shufflevector <2 x i8> %x, <2 x i8> %y, <4 x i32> <i32 0, i32 1, i32 2, i32 3> has an invalid cost. Perhaps the cost is not known, but the IR is valid and can be code-generated. Invalid should only be used for IR that cannot possibly be code-generated and where a cost is nonsensical. With more passes now asserting that the cost must be valid, it is possible that those assertions will fail for perfectly valid IR. An incomplete cost-model probably shouldn't be a reason for the compiler to break. It's better to consider these costs as 'very expensive' and ignore them for other reasons. At some point, we should consider replacing -1 with some other mechanism. Reviewed By: paulwalker-arm, dmgreen Differential Revision: https://reviews.llvm.org/D99502	2021-03-30 09:29:42 +01:00
David Green	a2e0312cda	[ARM] Tone down the MVE scalarization overhead The scalarization overhead was set deliberately high for MVE, whilst the codegen was new. It helps protect us against the negative ramifications of mixing scalar and vector instructions. This decreases that, especially for floating point where the cost of extracting/inserting lane elements can be low. For integer the cost is still fairly high due to the cross-register-bank copy, but is no longer n^2 in the length of the vector. In general, this will decrease the cost of scalarizing floats and long integer vectors. i64 increases in cost, having a high cost before and after this patch. For floats this allows us to start doing things like vectorizing fdiv instructions, even if they are scalarized. Differential Revision: https://reviews.llvm.org/D98245	2021-03-19 18:30:11 +00:00
Alexey Bataev	14ae0cf0f5	[Cost]Canonicalize the cost for logical or/and reductions. The generic cost of logical or/and reductions should be cost of bitcast <ReduxWidth x i1> to iReduxWidth + cmp eq\|ne iReduxWidth. Differential Revision: https://reviews.llvm.org/D97961	2021-03-19 11:01:58 -07:00
David Green	35e0567d58	[ARM] Add VREV MVE shuffle costs This uses the shuffle mask cost from D98206 to give a better cost of MVE VREV instructions. This helps especially in VectorCombine where the cost of shuffles is used to reorder bitcasts, which this helps keep the phase ordering test for fp16 reductions producing optimal code. The isVREVMask has been moved to a header file to allow it to be used across target transform and isel lowering. Differential Revision: https://reviews.llvm.org/D98210	2021-03-17 21:21:43 +00:00
Alexey Bataev	60470ac7ff	[Cost]Add tests for boolean and/or reductions, NFC. Tests with the default costs for boolean and/or reductions. Differential Revision: https://reviews.llvm.org/D97793	2021-03-03 12:34:30 -08:00
Juneyoung Lee	c89d9d8a48	[TTI] Consider select form of and/or i1 as having arithmetic cost This is a patch that updates the cost of `select i1 a, b, false` to be equivalent to that of `and i1 a, b` as well as the cost of `select i1 a, true, b` equivalent to `or i1 a, b`. Until now, these selects were folded into and/or i1 by InstCombine, but the transformation is poison-unsafe. This is a step towards removing the unsafe transformation. D93065 has relevant transformations linked. These selects should be translated into the assemblies as and/or i1 do in the same manner. The cost should be equivalent. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D97360	2021-03-02 02:18:19 +09:00
David Green	33ba220611	[ARM] Ensure types provided to getIntrinsicCost are valid It appears that pointer types were causing issues for the min/max cost code in getIntrinsicInstrCost. This makes sure that when matching icmp/select to a min/max, we only do that for normal int or float types.	2021-02-18 14:00:23 +00:00
David Green	1a6744e3dc	[ARM] Add larger than legal ICmp costs A v8i32 compare will produce a v8i1 predicate, but during codegen the v8i32 will be split into two v4i32, potentially requiring two v4i1 predicates to be merged into a single v8i1. Because this merging of two v4i1's into a v8i1 is very expensive, we need to make the cost of the compare equally high. This patch adds the cost of that to ARMTTIImpl::getCmpSelInstrCost. Because we don't know whether the user of the predicate can be split, and the cost model is mostly per-instruction, we may be pessimistic but that should only be for larger than legal types. This also adds min/max detection to the costmodel where it can be detected, to keep those in line with the cost of simple min/max instructions. Otherwise for the most part, costs that were already expensive have become more expensive. Differential Revision: https://reviews.llvm.org/D96692	2021-02-18 11:42:17 +00:00
David Green	1fbb3287fc	[ARM] MVE ICmp costing tests. NFC	2021-02-18 10:50:34 +00:00
David Green	6d835c5fcd	[ARM] Add MVE abs costs Similar to min/max, this increases the accuracy of abs intrinsics costs under MVE.	2021-02-17 14:21:09 +00:00
David Green	415deff10b	[ARM] MVE abs intrinsic costs. NFC	2021-02-17 13:54:17 +00:00
David Green	0a98efb049	[ARM] Add some basic Min/Max costs This adds basic MVE costs for SMIN/SMAX/UMIN/UMAX, as well as MINNUM and MAXNUM representing fmin and fmax. It tightens up the costs, not using a ICmp+Select cost. Differential Revision: https://reviews.llvm.org/D96603	2021-02-15 15:06:19 +00:00
David Green	6abe362ed7	[ARM] Fix duplicate fdiv tests, changing them to frem. NFC	2021-02-13 15:16:11 +00:00
David Green	7c2e061188	[ARM] Extra vector shuffle tests of various kinds. NFC	2021-02-13 15:03:10 +00:00
David Green	b7c3de8d5a	[ARM] MVE min/max cost tests. NFC	2021-02-13 11:12:12 +00:00
Sander de Smalen	63d787e5d4	[CostModel] An extending load to illegal type is not free. COST(zext (<4 x i32> load(...) to <4 x i64>)) != 0 when <4 x i64> is an illegal result type that requires splitting of the operation. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D96250	2021-02-12 07:59:21 +00:00

165 Commits