llvm-project

Author	SHA1	Message	Date
Vladi Krapp	f8d270474c	[ARM] Reduce loop unroll when low overhead branching is available (#120065 ) For processors with low overhead branching (LOB), runtime unrolling the innermost loop is often detrimental to performance. In these cases the loop remainder gets unrolled into a series of compare-and-jump blocks, which in deeply nested loops get executed multiple times, negating the benefits of LOB. This is particularly noticable when the loop trip count of the innermost loop varies within the outer loop, such as in the case of triangular matrix decompositions. In these cases we will prefer to not unroll the innermost loop, with the intention for it to be executed as a low overhead loop.	2024-12-18 10:10:51 +00:00
Benjamin Maxwell	c3260c65e8	[IR] Add `llvm.sincos` intrinsic (#109825 ) This adds the `llvm.sincos` intrinsic, legalization, and lowering. The `llvm.sincos` intrinsic takes a floating-point value and returns both the sine and cosine (as a struct). ``` declare { float, float } @llvm.sincos.f32(float %Val) declare { double, double } @llvm.sincos.f64(double %Val) declare { x86_fp80, x86_fp80 } @llvm.sincos.f80(x86_fp80 %Val) declare { fp128, fp128 } @llvm.sincos.f128(fp128 %Val) declare { ppc_fp128, ppc_fp128 } @llvm.sincos.ppcf128(ppc_fp128 %Val) declare { <4 x float>, <4 x float> } @llvm.sincos.v4f32(<4 x float> %Val) ``` The lowering is built on top of the existing FSINCOS ISD node, with additional type legalization to allow for f16, f128, and vector values.	2024-10-29 10:52:20 +00:00
Nashe Mncube	e37d736def	Recommit: [llvm][ARM][GlobalOpt]Add widen global arrays pass (#113289 ) This is a recommit of #107120 . The original PR was approved but failed buildbot. The newly added tests should only be run for compilers that support the ARM target. This has been resolved by adding a config file for these tests. - Pass optimizes memcpy's by padding out destinations and sources to a full word to make ARM backend generate full word loads instead of loading a single byte (ldrb) and/or half word (ldrh). Only pads destination when it's a stack allocated constant size array and source when it's constant string. Heuristic to decide whether to pad or not is very basic and could be improved to allow more examples to be padded. - Pass works at the midend level	2024-10-24 10:12:01 +01:00
Nashe Mncube	370fd74361	Revert "[llvm][ARM]Add widen global arrays pass" (#112701 ) Reverts llvm/llvm-project#107120 Unexpected build failures in post-commit pipelines. Needs investigation	2024-10-17 13:38:01 +01:00
Nashe Mncube	ab90d2793c	[llvm][ARM]Add widen global arrays pass (#107120 ) - Pass optimizes memcpy's by padding out destinations and sources to a full word to make backend generate full word loads instead of loading a single byte (ldrb) and/or half word (ldrh). Only pads destination when it's a stack allocated constant size array and source when it's constant array. Heuristic to decide whether to pad or not is very basic and could be improved to allow more examples to be padded. - Pass works within GlobalOpt but is disabled by default on all targets except ARM.	2024-10-17 11:56:00 +01:00
Jeffrey Byrnes	853c43d04a	[TTI] NFC: Port TLI.shouldSinkOperands to TTI (#110564 ) Porting to TTI provides direct access to the instruction cost model, which can enable instruction cost based sinking without introducing code duplication.	2024-10-09 14:30:09 -07:00
Philip Reames	d288574363	[TTI][RISCV] Model cost of loading constants arms of selects and compares (#109824 ) This follows in the spirit of 7d82c99403f615f6236334e698720bf979959704, and extends the costing API for compares and selects to provide information about the operands passed in an analogous manner. This allows us to model the cost of materializing the vector constant, as some select-of-constants are significantly more expensive than others when you account for the cost of materializing the constants involved. This is a stepping stone towards fixing https://github.com/llvm/llvm-project/issues/109466. A separate SLP patch will be required to utilize the new API.	2024-09-25 07:25:57 -07:00
Nikita Popov	a7697c8655	[ARM] Do not assume alignment in vld1xN and vst1xN intrinsics (#106984 ) These intrinsics currently assume natural alignment. Instead, respect the alignment attribute on the intrinsic. Teach InstCombine to improve that alignment. If desired I could also adjust the clang frontend to add alignment annotations equivalent to the previous behavior, but I don't see any indication that such an assumption is correct in the ARM intrinsics docs. Fixes https://github.com/llvm/llvm-project/issues/59081.	2024-09-05 09:26:53 +02:00
David Green	dcd246cbde	[ARM] Add scalar add_sat costs. (#100988 ) These can usually generate: - qadd / qsub for signed i32 scalars - uqadd16 / qadd16 / uqsub16 / qsub16 with an extend for signed/unsigned i8/i16 - Are expanded to an add + cmp + sel otherwise This can lead to differences in unrolling etc, but should be a better cost for the instructions.	2024-08-05 18:56:04 +01:00
David Green	ea7cc12f61	[ARM] Add fallback fptoi_sat costs. This makes sure that the custom operations get a fallback cost, even if they are not perfect.	2024-07-28 23:38:21 +01:00
Nikita Popov	11484cb817	[InstCombine] Pass SimplifyQuery to SimplifyDemandedBits() This will enable calling SimplifyDemandedBits() with a SimplifyQuery that has CondContext set in the future. Additionally this also marginally strengthens the analysis by retaining the original context instruction for one-use chains.	2024-07-01 12:41:21 +02:00
Andreas Jonson	34a2889e90	[InstCombine] Swap out range metadata to range attribute for arm_mve_pred_v2i (#94847 )	2024-06-19 18:17:44 +02:00
Graham Hunter	2e8d815596	[TTI] Support scalable offsets in getScalingFactorCost (#88113 ) Part of the work to support vscale-relative immediates in LSR.	2024-05-10 11:22:11 +01:00
David Green	4ac2721e51	[AArch64] Add costs for ST3 and ST4 instructions, modelled as store(shuffle). (#87934 ) This tries to add some costs for the shuffle in a ST3/ST4 instruction, which are represented in LLVM IR as store(interleaving shuffle). In order to detect the store, it needs to add a CxtI context instruction to check the users of the shuffle. LD3 and LD4 are added, LD2 should be a zip1 shuffle, which will be added in another patch. It should help fix some of the regressions from #87510.	2024-04-09 16:36:08 +01:00
Alexey Bataev	7bc079c852	[TTI]Fallback to SingleSrcPermute shuffle kind, if no direct estimation for extract subvector. Many targets do not have cost for extractsubvector shuffle kind, but have the costs for single source permute. If there are no costs estimation for extractsubvector, better to switchto single source permute for better cost estimation. Reviewers: RKSimon, davemgreen, arsenm Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/79837	2024-02-12 07:09:49 -05:00
Nico Weber	184ca39529	[llvm] Move CodeGenTypes library to its own directory (#79444 ) Finally addresses https://reviews.llvm.org/D148769#4311232 :) No behavior change.	2024-01-25 12:01:31 -05:00
Kazu Hirata	586ecdf205	[llvm] Use StringRef::{starts,ends}_with (NFC) (#74956 ) This patch replaces uses of StringRef::{starts,ends}with with StringRef::{starts,ends}_with for consistency with std::{string,string_view}::{starts,ends}_with in C++20. I'm planning to deprecate and eventually remove StringRef::{starts,ends}with.	2023-12-11 21:01:36 -08:00
David Green	fee2953f23	[ARM] Fix for undef elements from demanded elements (#70504 ) I think this is right, that the undef bits should be the undef bits from the passthrough (operand 0), with the top/bottom lanes cleared, as they come from the second arg (operand 1). We don't yet attempt to look for undef elements in the second operand, but this should fix the bug with all elements being marked as undef and the instruction being optimized away.	2023-11-02 14:28:40 +00:00
David Green	75b3c3d267	[ARM] Disable UpperBound loop unrolling for MVE tail predicated loops. (#69709 ) For MVE tail predicated loops, better code can be generated by keeping the loop whole than to unroll to an upper bound, which requires the expansion of active lane masks that can be difficult to generate good code for. This patch disables UpperBound unrolling when we find a active_lane_mask in the loop.	2023-10-31 09:51:30 +00:00
David Green	233fb987fc	[ARM] Improve bitwise reduction costs This adds some basic and/or/xor reduction costs for NEON/MVE, handling them like other reductions where vector operations are used to reduce to legal sizes, followed by an optional VREV+VAND/VORR/VEOR step and scalarization from there.	2023-09-04 16:22:52 +01:00
David Green	4cef24a886	[ARM] Improve reduction integer min/max costs This adds some basic smin/smax/umin/umax reduction costs for MVE/NEON, similar to the existing Add reduction costs. They follow the same style as Add reductions, but include a higher cost as the costs tend to be dependant on the element size for vminv/vmaxv. These costs may not be precise, but will be more inline than the default that extracts each element.	2023-09-04 15:47:06 +01:00
David Green	2955cc15ff	[ARM] Improve costs for FMin/Max reductions Similar to the other reductions, this changes the cost of fmin/fmax reductions under MVE/NEON to perform vector operations until the types need to be scalarized. The fp16 vectors can perform a VREV+FMIN/FMAX to skip a step of the reduction, and otherwise need lanewise extract fro the top lanes.	2023-09-04 12:49:13 +01:00
David Green	4530f02916	[ARM] Improve reduction fadd/fmul costs This adds some basic fadd/fmul reduction costs for MVE/NEON. It reduces by halving the vector size until it it gets scalarized, with some additional costs for fp16 which may require extracting the top lanes. Differential Revision: https://reviews.llvm.org/D159367	2023-09-04 11:37:14 +01:00
Alexey Bataev	9a207578ac	[TTI]Add InsertSubvector pattern in improveShuffleKindFromMask(). It improves shuffle instructions estimation and improves vectorization outcome. Differential Revision: https://reviews.llvm.org/D157425	2023-08-18 13:47:01 -07:00
Nikita Popov	9deee6bffa	[SDAG] Don't transfer !range metadata without !noundef to SDAG (PR64589) D141386 changed the semantics of !range metadata to return poison on violation. If !range is combined with !noundef, violation is immediate UB instead, matching the old semantics. In theory, these IR semantics should also carry over into SDAG. In practice, DAGCombine has at least one key transform that is invalid in the presence of poison, namely the conversion of logical and/or to bitwise and/or (`c7b537bf09/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp (L11252)`). Ideally, we would fix this transform, but this will require substantial work to avoid codegen regressions. In the meantime, avoid transferring !range metadata without !noundef, effectively restoring the old !range metadata semantics on the SDAG layer. Fixes https://github.com/llvm/llvm-project/issues/64589. Differential Revision: https://reviews.llvm.org/D157685	2023-08-14 09:04:27 +02:00
Job Noorman	8de9f2b558	Move SubtargetFeature.h from MC to TargetParser SubtargetFeature.h is currently part of MC while it doesn't depend on anything in MC. Since some LLVM components might have the need to work with target features without necessarily needing MC, it might be worthwhile to move SubtargetFeature.h to a different location. This will reduce the dependencies of said components. Note that I choose TargetParser as the destination because that's where Triple lives and SubtargetFeatures feels related to that. This issues came up during a JITLink review (D149522). JITLink would like to avoid a dependency on MC while still needing to store target features. Reviewed By: MaskRay, arsenm Differential Revision: https://reviews.llvm.org/D150549	2023-06-26 11:20:08 +02:00
NAKAMURA Takumi	c1221251fb	Restore CodeGen/MachineValueType.h from `Support` This is rework of; - rG13e77db2df94 (r328395; MVT) Since `LowLevelType.h` has been restored to `CodeGen`, `MachinveValueType.h` can be restored as well. Depends on D148767 Differential Revision: https://reviews.llvm.org/D149024	2023-05-03 00:13:20 +09:00
Simon Pilgrim	fb8038db73	[TTI] getExtendedReductionCost - replace std::optional<FastMathFlags> args with FastMathFlags Followup to D148149 where it was noticed that the std::optional wrapper wasn't helping with anything (we can just use an empty FastMathFlags()).	2023-04-13 11:26:28 +01:00
David Green	a39d2d50af	[ARM] Increase the Scalarized cost of masked gather/scatter operations If a gather/scatter is masked and will need to be scalarized then the cost should be higher than we currently produce. An additional cost for scalarizing the mask, extracting i1s and branching on the result needs to be added, which this patch gives a cost of 5. Differential Revision: https://reviews.llvm.org/D147331	2023-04-11 14:49:46 +01:00
David Sherwood	b4089cfa2f	[NFC][LoopVectorize] Simplify preferPredicateOverEpilogue interface Given just how many arguments we pass to preferPredicateOverEpilogue and considering this list may grow over time I've decided to pass in a pointer to a new TailFoldingInfo structure instead, similar to what we do with IntrinsicCostAttributes, etc. In addition, many of the arguments we pass in are actually available in the LoopVectorizationLegality class so I've managed to reduce the set of pointers that we need to pass in the TailFoldingInfo struct. Differential Revision: https://reviews.llvm.org/D146127	2023-04-04 14:00:49 +00:00
Sander de Smalen	c41b41eb11	[LoopVectorize] Use overflow-check analysis to improve tail-folding. This work follows on from D142109 and addresses a possible regression when we know the loop iteration counter cannot overflow. When we know the overflow-check always evaluates to false, it's better to use the other style of tail folding where it assumes a runtime check was added, because that avoids having to calculate a modified trip-count. Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D142894	2023-03-01 14:17:58 +00:00
Simon Tatham	a8cd35c3b7	[LowerTypeTests] Support generating Armv6-M jump tables. (reland) [Originally committed as f6ddf7781471b71243fa3c3ae7c93073f95c7dff; reverted in bbef38352fbade9e014ec97d5991da5dee306da7 due to test breakage; now relanded with the Arm tests conditioned on `arm-registered-target`] The LowerTypeTests pass emits a jump table in the form of an `inlineasm` IR node containing a string representation of some assembly. It tests the target triple to see what architecture it should be generating assembly for. But that's not good enough for `Triple::thumb`, because the 32-bit PC-relative `b.w` branch instruction isn't available in all supported architecture versions. In particular, Armv6-M doesn't support that instruction (although the similar Armv8-M Baseline does). Most of this patch is concerned with working out whether the compilation target is Armv6-M or not, which I'm doing by going through all the functions in the module, retrieving a TargetTransformInfo for each one, and querying it via a new method I've added to check its SubtargetInfo. If any function's TTI indicates that it's targeting an architecture supporting B.W, then we assume we're also allowed to use B.W in the jump table. The Armv6-M compatible jump table format requires a temporary register, and therefore also has to use the stack in order to restore that register. Another consequence of this change is that jump tables on Arm/Thumb are no longer always the same size. In particular, on an architecture that supports Arm and Thumb-1 but not Thumb-2, the Arm and Thumb tables are different sizes from //each other//. As a consequence, ``getJumpTableEntrySize`` can no longer base its answer on the target triple's architecture: it has to take into account the decision that ``selectJumpTableArmEncoding`` made, which meant I had to move that function to an earlier point in the code and store its answer in the ``LowerTypeTestsModule`` class. Reviewed By: lenary Differential Revision: https://reviews.llvm.org/D143576	2023-02-20 10:46:47 +00:00
Kazu Hirata	397265d88f	[llvm] Use APInt::isAllOnes instead of isAllOnesValue (NFC) Note that isAllOnesValue has been soft-deprecated in favor of isAllOnes.	2023-02-19 23:35:39 -08:00
Simon Tatham	bbef38352f	Revert "[LowerTypeTests] Support generating Armv6-M jump tables." This reverts commit f6ddf7781471b71243fa3c3ae7c93073f95c7dff. Eight buildbots reported that the two test files changed by that commit had started failing. The buildbots in question all had in common that they build with a very restricted `LLVM_TARGETS_TO_BUILD`, such as only X86 or AArch64 or Hexagon. I didn't notice this before commit because my own build has the full default set of targets, and in that circumstance, the tests pass. I assume the problem has something to do with the attempt to query TargetTransformInfo: if you can't make a valid TTI for the target triple then you can't ask it what kind of inline assembler you should be emitting, and so `opt` without the Arm backend can't get the Arm cases of these tests right. I don't have time to fix this until next week, so I'll revert the change for now to keep the buildbots happy.	2023-02-16 17:11:06 +00:00
Simon Tatham	f6ddf77814	[LowerTypeTests] Support generating Armv6-M jump tables. The LowerTypeTests pass emits a jump table in the form of an `inlineasm` IR node containing a string representation of some assembly. It tests the target triple to see what architecture it should be generating assembly for. But that's not good enough for `Triple::thumb`, because the 32-bit PC-relative `b.w` branch instruction isn't available in all supported architecture versions. In particular, Armv6-M doesn't support that instruction (although the similar Armv8-M Baseline does). Most of this patch is concerned with working out whether the compilation target is Armv6-M or not, which I'm doing by going through all the functions in the module, retrieving a TargetTransformInfo for each one, and querying it via a new method I've added to check its SubtargetInfo. If any function's TTI indicates that it's targeting an architecture supporting B.W, then we assume we're also allowed to use B.W in the jump table. The Armv6-M compatible jump table format requires a temporary register, and therefore also has to use the stack in order to restore that register. Another consequence of this change is that jump tables on Arm/Thumb are no longer always the same size. In particular, on an architecture that supports Arm and Thumb-1 but not Thumb-2, the Arm and Thumb tables are different sizes from //each other//. As a consequence, ``getJumpTableEntrySize`` can no longer base its answer on the target triple's architecture: it has to take into account the decision that ``selectJumpTableArmEncoding`` made, which meant I had to move that function to an earlier point in the code and store its answer in the ``LowerTypeTestsModule`` class. Reviewed By: lenary Differential Revision: https://reviews.llvm.org/D143576	2023-02-16 15:34:49 +00:00
Sander de Smalen	005311399e	[LoopVectorize][TTI] NFCI: Clarify enum for the tail folding style. This NFC (intended) patch has several small changes: * It renames PredicationStyle to TailFoldingStyle. * It renames TTI.emitActiveLaneMask() to TTI.getPreferredTailFoldingStyle() * Simplifies some of its uses in the LoopVectorizer Rationale: To my surprise PredicationStyle::None did not mean 'no predication', but rather 'no active lane mask intrinsic', such that the predicate is created using a splat + compare with stepvector. The enum is also highly specific to tail folding, so it seems better to name this around that feature, i.e. 'tail folding style'. This also makes it more amenable to extend it to other tail folding styles, such as the one added in D142109. Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D142887	2023-02-03 14:59:57 +00:00
ShihPo Hung	5fb3a57ea7	[Cost] Add CostKind to getVectorInstrCost and its related users LoopUnroll estimates the loop size via getInstructionCost(), but getInstructionCost() cannot pass CostKind to getVectorInstrCost(). And so does getShuffleCost() to getBroadcastShuffleOverhead(), getPermuteShuffleOverhead(), getExtractSubvectorOverhead(), and getInsertSubvectorOverhead(). To address this, this patch adds an argument CostKind to these functions. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D142116	2023-01-21 05:29:24 -08:00
Guillaume Chatelet	8fd5558b29	[NFC] Use TypeSize::geFixedValue() instead of TypeSize::getFixedSize() This change is one of a series to implement the discussion from https://reviews.llvm.org/D141134.	2023-01-11 16:49:38 +00:00
Alexey Bataev	9b5f62685a	[SLP]Fix cost of the broadcast buildvector/gather. Need to include the cost of the initial insertelement to the cost of the broadcasts. Also, need to adjust the cost of the gather/buildvector if the element is inserted into poison/undef vector. Differential Revision: https://reviews.llvm.org/D140498	2023-01-06 09:25:05 -08:00
Krzysztof Parzyszek	86fe4dfdb6	TargetTransformInfo: convert Optional to std::optional Recommit: added missing "#include <cstdint>".	2022-12-02 11:42:15 -08:00
Krzysztof Parzyszek	4e12d1836a	Revert "TargetTransformInfo: convert Optional to std::optional" This reverts commit b83711248cb12639e7ef7303cfbb4452b4067e85. Some buildbots are failing.	2022-12-02 11:34:04 -08:00
Krzysztof Parzyszek	b83711248c	TargetTransformInfo: convert Optional to std::optional	2022-12-02 11:27:12 -08:00
David Green	de6dfbbb30	[ARM] Fix for MVE i128 vector icmp costs. We were hitting an assert as the legalied type needn't be a vector. Fixes #58364	2022-10-14 18:49:25 +01:00
Philip Reames	f6d110e26f	[LAA] Make getPtrStride return Option instead of overloading zero as error value [nfc] This is purely NFC restructure in advance of a change which actually exposes zero strides. This is mostly because I find this interface confusing each time I look at it.	2022-09-27 15:55:44 -07:00
Kazu Hirata	2833760c57	[Target] Qualify auto in range-based for loops (NFC)	2022-08-28 17:35:09 -07:00
Philip Reames	df20ff9ae2	[TTI] Kill last couple uses of OperandValueKind in targets [nfc] Use the accessor methods on the containing class instead so that we can change the representation.	2022-08-23 08:54:41 -07:00
Philip Reames	c9608d57b8	[TTI] Plumb through OperandValueInfo in getMemoryOpCost [NFC] This has the effect of exposing the power-of-two property for use in memory op costing, but no target actually uses it yet. The main point of this change is simple consistency with the recently changes getArithmeticInstrCost, and to remove the last (interface) use of OperandValueKind.	2022-08-23 07:55:42 -07:00
Philip Reames	104fa367ee	[TTI] Use OperandValueInfo in getArithmeticInstrCost implementation [NFC] This change completes the process of replacing OperandValueKind and OperandValueProperties which were previously passed independently in this API with a single container class which contains both. This is the change which motivated the whole sequence which preceeded it. In an original spike version of this change, I'd noticed a nasty bug: I'd changed the signature without changing names, and as result, we silently passed additional information through a callsite which previously dropped the power-of-two fact. This might be harmless in most cases, but at least a couple clearly dependend for correctness on not passing that property through. I did my best to split off prior changes which reduced the scope of this one, and which made it possible to use compiler assistance. For instance, every parameter which changes type in this change also changes name. This was intentional to make sure that every call site possible effected must show up in the diff. This let me audit each one closely.	2022-08-22 15:16:39 -07:00
Simon Pilgrim	5263155d5b	[CostModel] Add CostKind argument to getShuffleCost Defaults to TCK_RecipThroughput - as most explicit calls were assuming TCK_RecipThroughput (vectorizers) or was just doing a before-vs-after comparison (vectorcombiner). Calls via getInstructionCost were just dropping the CostKind, so again there should be no change at this time (as getShuffleCost and its expansions don't use CostKind yet) - but it will make it easier for us to better account for size/latency shuffle costs in inline/unroll passes in the future. Differential Revision: https://reviews.llvm.org/D132287	2022-08-21 10:54:51 +01:00
Alexey Bataev	d53e245951	[COST][NFC]Introduce OperandValueKind in getMemoryOpCost, NFC. Added OperandValueKind OpdInfo parameter to getMemoryOpCost functions to better estimate cost with immediate values. Part of D126885.	2022-08-19 07:33:00 -07:00

1 2 3 4 5 ...

351 Commits