llvm-project

Author	SHA1	Message	Date
Osama Abdelkader	aad7259ff6	[AArch64] Optimize memset to use NEON DUP instruction for more sizes (#166030) This change improves memset code generation for non-zero values on AArch64 by using NEON's DUP instruction instead of the less efficient multiplication with 0x01010101 pattern. For small sizes, the value is extracted from a larger DUP. For non-power-of-two sizes, overlapping stores are used in some cases. TargetLowering::findOptimalMemOpLowering is modified to allow explicitly specifying the size of the constant in cases where the constant is larger than the store operations. Fixes #165949	2026-01-29 13:03:38 -08:00
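The sketch below shows, in plain C++, two scalar ideas mentioned in this message. The first is the 0x01010101 multiplication that copies one byte into all four bytes of a 32-bit word. The second is covering a non-power-of-two size with overlapping stores, for example 7 bytes as two 4-byte stores at offsets 0 and 3. The function names are hypothetical and the NEON DUP path is not shown. The real lowering happens in SelectionDAG, not in source code like this.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Scalar splat: replicate one byte into all four bytes of a 32-bit word by
// multiplying with 0x01010101 (the pattern the patch replaces with DUP).
uint32_t splat_byte_mul(uint8_t value) {
  return static_cast<uint32_t>(value) * 0x01010101u;
}

// Fill n bytes (4 <= n <= 8) with two possibly overlapping 4-byte stores:
// one at the start and one ending exactly at dst + n.
void memset_overlapping_4_to_8(unsigned char *dst, uint8_t value, std::size_t n) {
  assert(n >= 4 && n <= 8 && "two 4-byte stores cover 4..8 bytes");
  const uint32_t word = splat_byte_mul(value);
  std::memcpy(dst, &word, sizeof word);          // bytes [0, 4)
  std::memcpy(dst + n - 4, &word, sizeof word);  // bytes [n - 4, n)
}
```

Every byte of the splatted word is identical. The second store can therefore overlap the first without changing the result, which is what makes overlapping stores safe for memset.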
Florian Hahn	b794baf8e7	[TTI] Add VectorInstrContext for context-aware insert/extract costs. (#175982) This commit introduces the VectorInstrContext (VIC) infrastructure to improve cost estimates for insert/extracts based on the context instruction in which the insert/extract is used. This is similar to CastContextHint, and allows providing context on how the insert/extract is going to be used before creating IR. This is useful in the LoopVectorizer, where costs need to be estimated before creating IR. The new hint currently only replaces an existing check in AArch64, but new uses will be introduced in follow-ups, including https://github.com/llvm/llvm-project/pull/177201. PR: https://github.com/llvm/llvm-project/pull/175982	2026-01-27 16:30:29 +00:00
Croose	fab06fae00	[ARM] Fix inlining issue in ARM (#169337) There is an issue on ARM where a function won't be inlined due to mismatching target features between caller and callee. The caller has `HasV8Ops` and `FeatureDotProd` and the callee does not, but AFAIK this should not be a problem. https://godbolt.org/z/f19h3zT66 is an example showing how the call is not inlined on armv7. The expected asm output would be something like: ```asm .fnstart vsdot.s8 q0, q1, d4[0] bx lr .Lfunc_end0: ``` Thanks to @Amichaxx we managed to narrow it down and can now resolve this problem by adding `ARM::FeatureDotProd, ARM::HasV8Ops` to InlineFeaturesAllowed in llvm/lib/Target/ARM/ARMTargetTransformInfo.h, after which the inlining occurs successfully. Whilst we're at it we have also added some debugging to make it easier to tell why (or why not) a function is being inlined for ARM, and a couple of other features that seem to be missing from the list. This patch was motivated by an issue experienced with Rust that was traced back to LLVM, and thus was designed to address that.	2026-01-16 12:06:20 +00:00
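The sketch below shows how an allow-list like InlineFeaturesAllowed can decide inline compatibility. It assumes the ARM hook combines two checks:

- Features outside the list must match exactly.
- Allowed features used by the callee must also be enabled in the caller.

The feature names and bit positions are stand-ins, not the real TableGen-generated `FeatureBitset`, and the function name is hypothetical.

```cpp
#include <bitset>
#include <cassert>

// Stand-in feature indices; real ARM feature bits are generated by TableGen.
enum Feature : unsigned { HasV8Ops = 0, FeatureDotProd = 1, FeatureNEON = 2 };
using FeatureBits = std::bitset<3>;

bool areInlineCompatibleSketch(const FeatureBits &caller, const FeatureBits &callee,
                               const FeatureBits &allowed) {
  // Features outside the allow-list must be identical in caller and callee.
  const bool matchExact = (caller & ~allowed) == (callee & ~allowed);
  // Allowed features used by the callee must also be enabled in the caller.
  const bool matchSubset = (caller & callee & allowed) == (callee & allowed);
  return matchExact && matchSubset;
}
```

In the scenario from the message, the caller has `HasV8Ops` and `FeatureDotProd` and the callee does not. The exact-match check fails until those two features are added to the allow-list. After that, only the subset direction is enforced, so a caller with extra features can still inline a callee that uses fewer.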
Shih-Po Hung	c2409b4bca	[TTI] Remove masked/gather-scatter/strided/expand-compress costing from TTIImpl (#169885) Following #165532, this patch moves scalarization-cost computation into BaseT::getMemIntrinsicCost and lets backends override it via their getMemIntrinsicCost. It also removes the masked/gather-scatter/strided/expand-compress costing interfaces from TTIImpl. Targets may keep them locally if needed. Stacked on #170426 and #170436.	2025-12-04 01:34:29 +00:00
Shih-Po Hung	1c86f4a8f1	[TTI] Use MemIntrinsicCostAttributes for getGatherScatterOpCost (#168650) - Following #168029. This is a step toward a unified interface for masked/gather-scatter/strided/expand-compress cost modeling. - Replace the ad-hoc parameter list with a single attributes object. API change: `InstructionCost getGatherScatterOpCost(Opcode, DataTy, Ptr, VariableMask, Alignment, CostKind, Inst)` becomes `InstructionCost getGatherScatterOpCost(MemIntrinsicCostAttributes, CostKind)`. Notes: - NFCI intended: callers populate MemIntrinsicCostAttributes with the same information as before.	2025-12-03 03:01:35 +00:00
David Green	f741851731	Revert "[AArch64][ARM] Move ARM-specific InstCombine transforms into `Transforms/Utils` (#169589)" This reverts commit 1c32b6f51ccaaf9c65be11d7dca9e5a476cddb5a due to failures on BUILD_SHARED_LIBS builds.	2025-12-02 11:46:50 +00:00
valadaptive	1c32b6f51c	[AArch64][ARM] Move ARM-specific InstCombine transforms into `Transforms/Utils` (#169589) Back when `TargetTransformInfo::instCombineIntrinsic` was added in https://reviews.llvm.org/D81728, several transforms common to both ARM and AArch64 were kept in the non-target-specific `InstCombineCalls.cpp` so they could be shared between the two targets. I want to extend the transform of the `tbl` intrinsics into static `shufflevector`s in a similar manner to https://github.com/llvm/llvm-project/pull/169110 (right now it only works with a 64-bit `tbl1`, but `shufflevector` should allow it to work with up to 2 operands, and it can definitely work with 128-bit vectors). I think separating out the transform into a TTI hook is a prerequisite. ~~I'm not happy about creating an entirely new module for this and having to wire it up through CMake and everything, but I'm not sure about the alternatives. If any maintainers can think of a cleaner way of doing this, I'm very open to it.~~ I've moved the transforms into `Transforms/Utils/ARMCommonInstCombineIntrinsic.cpp`, which is a lot simpler.	2025-12-02 11:17:12 +00:00
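The sketch below makes the `tbl` to `shufflevector` idea concrete. It first shows the lookup semantics for a single 16-byte table, where an out-of-range index yields 0 as with the NEON TBL instruction. It then shows how constant indices could map onto a two-operand shuffle mask whose second operand is an all-zero vector. Both helper names are hypothetical; this is not the InstCombine code from the patch.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <vector>

// Reference semantics of a one-register table lookup over a 16-byte table:
// in-range indices select a table byte, out-of-range indices yield 0.
std::vector<uint8_t> tbl1(const std::array<uint8_t, 16> &table,
                          const std::vector<uint8_t> &indices) {
  std::vector<uint8_t> out;
  for (uint8_t idx : indices)
    out.push_back(idx < table.size() ? table[idx] : uint8_t{0});
  return out;
}

// With constant indices, the same result is a shuffle of (table, zeroes):
// out-of-range lanes select element 16, the first lane of the zero operand.
std::vector<int> tblToShuffleMask(const std::vector<uint8_t> &indices) {
  std::vector<int> mask;
  for (uint8_t idx : indices)
    mask.push_back(idx < 16 ? static_cast<int>(idx) : 16);
  return mask;
}
```

The transform only works when every index is a compile-time constant. That is why the message talks about static `shufflevector`s.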
Vladi Krapp	34c699246d	[Arm] Control forced unrolling of small loops (#170127) * Add flag to control cost threshold for forced unrolling of loops. Existing value preserved as default.	2025-12-02 08:39:26 +00:00
Drew Kersnar	17852deda7	[NVPTX] Lower LLVM masked vector loads and stores to PTX (#159387) This backend support will allow the LoadStoreVectorizer, in certain cases, to fill in gaps when creating load/store vectors and generate LLVM masked load/stores (https://llvm.org/docs/LangRef.html#llvm-masked-store-intrinsics). To accomplish this, changes are separated into two parts. This first part has the backend lowering and TTI changes, and a follow-up PR will have the LSV generate these intrinsics: https://github.com/llvm/llvm-project/pull/159388. In this backend change, Masked Loads get lowered to PTX with `#pragma "used_bytes_mask" [mask];` (https://docs.nvidia.com/cuda/parallel-thread-execution/#pragma-strings-used-bytes-mask). And Masked Stores get lowered to PTX using the new sink symbol syntax (https://docs.nvidia.com/cuda/parallel-thread-execution/#data-movement-and-conversion-instructions-st). # TTI Changes TTI changes are needed because NVPTX only supports masked loads/stores with _constant_ masks. `ScalarizeMaskedMemIntrin.cpp` is adjusted to check that the mask is constant and pass that result into the TTI check. Behavior shouldn't change for non-NVPTX targets, which do not care whether the mask is variable or constant when determining legality, but all TTI files that implement these APIs need to be updated. # Masked store lowering implementation details If the masked stores make it to the NVPTX backend without being scalarized, they are handled by the following: * `NVPTXISelLowering.cpp` - Sets up a custom operation action and handles it in lowerMSTORE. Similar handling to normal store vectors, except we read the mask and place a sentinel register `$noreg` in each position where the mask reads as false.
For example, ``` t10: v8i1 = BUILD_VECTOR Constant:i1<-1>, Constant:i1<0>, Constant:i1<0>, Constant:i1<-1>, Constant:i1<-1>, Constant:i1<0>, Constant:i1<0>, Constant:i1<-1> t11: ch = masked_store<(store unknown-size into %ir.lsr.iv28, align 32, addrspace 1)> t5:1, t5, t7, undef:i64, t10 -> STV_i32_v8 killed %13:int32regs, $noreg, $noreg, killed %16:int32regs, killed %17:int32regs, $noreg, $noreg, killed %20:int32regs, 0, 0, 1, 8, 0, 32, %4:int64regs, 0, debug-location !18 :: (store unknown-size into %ir.lsr.iv28, align 32, addrspace 1); ``` * `NVPTXInstrInfo.td` - changes the definition of store vectors to allow for a mix of sink symbols and registers. * `NVPTXInstPrinter.h/.cpp` - Handles the `$noreg` case by printing "_". # Masked load lowering implementation details Masked loads are routed to normal PTX loads, with one difference: a `#pragma "used_bytes_mask"` is emitted before the load instruction (https://docs.nvidia.com/cuda/parallel-thread-execution/#pragma-strings-used-bytes-mask). To accomplish this, a new operand is added to every NVPTXISD Load type representing this mask. * `NVPTXISelLowering.h/.cpp` - Masked loads are converted into normal NVPTXISD loads with a mask operand in two ways. 1) In type legalization through replaceLoadVector, which is the normal path, and 2) through LowerMLOAD, to handle the legal vector types (v2f16/v2bf16/v2i16/v4i8/v2f32) that will not be type legalized. Both share the same convertMLOADToLoadWithUsedBytesMask helper. Both default this operand to UINT32_MAX, representing all bytes on. For the latter, we need a new `NVPTXISD::MLoadV1` type to represent that edge case because we cannot put the used bytes mask operand on a generic LoadSDNode. * `NVPTXISelDAGToDAG.cpp` - Extract used bytes mask from loads, add them to created machine instructions. * `NVPTXInstPrinter.h/.cpp` - Print the pragma when the used bytes mask isn't all ones.
* `NVPTXForwardParams.cpp`, `NVPTXReplaceImageHandles.cpp` - Update manual indexing of load operands to account for new operand. * `NVPTXInstrInfo.td`, `NVPTXIntrinsics.td` - Add the used bytes mask to the MI definitions. * `NVPTXTagInvariantLoads.cpp` - Ensure that masked loads also get tagged as invariant. Some generic changes that are needed: * `LegalizeVectorTypes.cpp` - Ensure flags are preserved when splitting masked loads. * `SelectionDAGBuilder.cpp` - Preserve `MD_invariant_load` on masked load SDNode creation	2025-11-25 10:26:15 -06:00
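The sketch below illustrates the store-side idea in that entry. A constant mask picks, per lane, either the value register or the sink symbol `_`, which is what the printer emits for `$noreg`. The mask matches the `t10` BUILD_VECTOR in the example (true, false, false, true, true, false, false, true). The register names and the function are hypothetical; the real work is split between instruction selection and `NVPTXInstPrinter`.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Build the value-operand list of a vector store from a constant mask:
// enabled lanes keep their register, disabled lanes use the "_" sink symbol.
std::string maskedStoreOperands(const std::vector<bool> &mask,
                                const std::vector<std::string> &regs) {
  assert(mask.size() == regs.size());
  std::string out = "{";
  for (std::size_t i = 0; i < mask.size(); ++i) {
    if (i != 0) out += ", ";
    out += mask[i] ? regs[i] : "_";
  }
  return out + "}";
}
```

This only works because the mask is known at compile time. A variable mask could not be turned into a fixed operand list, which is why the TTI legality check distinguishes constant masks.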
Shih-Po Hung	961940e1a7	[TTI] Use MemIntrinsicCostAttributes for getMaskedMemoryOpCost (#168029) - Split from #165532. This is a step toward a unified interface for masked/gather-scatter/strided/expand-compress cost modeling. - Replace the ad-hoc parameter list with a single attributes object. API change: `InstructionCost getMaskedMemoryOpCost(Opcode, Src, Alignment, AddressSpace, CostKind)` becomes `InstructionCost getMaskedMemoryOpCost(MemIntrinsicCostAttributes, CostKind)`. Notes: - NFCI intended: callers populate MemIntrinsicCostAttributes with the same information as before. - Follow-up: migrate gather/scatter, strided, and expand/compress cost queries to the same attributes-based entry point.	2025-11-19 09:51:12 +08:00
Florian Hahn	af9a4263a1	[LAA] Only use inbounds/nusw in isNoWrap if the GEP is dereferenced. (#161445) Update isNoWrap to only use the inbounds/nusw flags from GEPs that are guaranteed to be dereferenced on every iteration. This fixes a case where we incorrectly determine no dependence. I think the issue is isolated to code that evaluates the resulting AddRec at BTC; just using it to compute the distance between accesses should still be fine, since if the access does not execute in a given iteration, there's no dependence in that iteration. But isolating the code is not straightforward, so be conservative for now. The practical impact should be very minor (only one loop changed across a corpus with 27k modules from large C/C++ workloads). Fixes https://github.com/llvm/llvm-project/issues/160912. PR: https://github.com/llvm/llvm-project/pull/161445	2025-11-04 17:08:12 +00:00
Sam Tebbs	37127f74f4	[LV] Bundle sub reductions into VPExpressionRecipe (#147255) This PR bundles sub reductions into the VPExpressionRecipe class and adjusts the cost functions to take the negation into account. Stacked PRs: 1. https://github.com/llvm/llvm-project/pull/147026 2. -> https://github.com/llvm/llvm-project/pull/147255 3. https://github.com/llvm/llvm-project/pull/147302 4. https://github.com/llvm/llvm-project/pull/147513	2025-09-01 17:25:01 +01:00
Elvis Wang	01fac67e2a	[TTI] Add cost kind to getAddressComputationCost(). NFC. (#153342) This patch adds a cost kind to `getAddressComputationCost()` for #149955. Note that this patch also removes all the default values in `getAddressComputationCost()`.	2025-08-14 16:01:44 +08:00
David Green	d9d9d9ad19	[ARM][MVE] Add shuffle costs for LDn and STn instructions. (#145304) LD2 is represented in IR as deinterleave-shuffle(load), and ST2 as store(interleave-shuffle). Whilst the shuffle would be expensive in general for MVE (it does not have zip/uzp instructions), it should be treated as cheap when part of the LD2/ST2 pattern. This borrows some code from the AArch64 backend to produce lower costs. (Some of these still show as higher than they should; that just shows how broken the generic shuffle costs are at the moment, as they would be lower if getShuffleCost was called directly as opposed to going through getInstructionCost.)	2025-08-14 06:59:37 +01:00
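The sketch below shows the two shuffle shapes behind the LDn/STn patterns.

- A deinterleave mask picks every `factor`-th element of the wide loaded vector.
- An interleave mask alternates lanes from `factor` vectors, which are assumed to be concatenated into one operand for factors above 2.

The helper names are hypothetical. LLVM's VectorUtils provides comparable helpers (`createStrideMask`, `createInterleaveMask`).

```cpp
#include <cassert>
#include <vector>

// Mask selecting element `index` of every group of `factor` elements from a
// wide interleaved vector: the shuffle that follows the load for LDn.
std::vector<int> deinterleaveMask(unsigned factor, unsigned index, unsigned numElts) {
  std::vector<int> mask;
  for (unsigned i = index; i < numElts; i += factor)
    mask.push_back(static_cast<int>(i));
  return mask;
}

// Mask interleaving `factor` concatenated vectors of `laneCount` elements:
// the shuffle that precedes the store for STn.
std::vector<int> interleaveMask(unsigned factor, unsigned laneCount) {
  std::vector<int> mask;
  for (unsigned lane = 0; lane < laneCount; ++lane)
    for (unsigned vec = 0; vec < factor; ++vec)
      mask.push_back(static_cast<int>(vec * laneCount + lane));
  return mask;
}
```

Matched in isolation, these masks look like expensive general permutes on a target without zip/uzp. They are only cheap when the shuffle feeds, or is fed by, an interleaving memory access. That is why the cost hook needs the context instruction to recognize the pattern.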
Luke Lau	acb86fb9e0	[TTI] Consistently pass the pointer type to getAddressComputationCost. NFCI (#152657) In some places we were passing the type of the value being accessed; in other cases we were passing the type of the pointer for the access. The most "involved" user is LoopVectorizationCostModel::getMemInstScalarizationCost, which is the only call site that passes in the SCEV, and it passes along the pointer type. This changes call sites to consistently pass the pointer type, and renames the arguments to clarify this. No target actually checks the contents of the type passed, only whether it's a vector or not, so this shouldn't have an effect.	2025-08-11 18:00:12 +08:00
David Green	9d0ac3980d	[ARM] Use CostKind in getShuffleCost getMVEVectorCostFactor. These calls pre-date CostKind being added to getShuffleCost in 5263155d5be64b435a97fd4fa12f7f0aa97f88a8.	2025-07-13 08:05:51 +01:00
Boyao Wang	697beb3f17	[TargetLowering] Change getOptimalMemOpType and findOptimalMemOpLowering to take LLVM Context (#147664) Add LLVM Context to getOptimalMemOpType and findOptimalMemOpLowering so that we can use EVT::getVectorVT to generate EVT types in getOptimalMemOpType. Related to [#146673](https://github.com/llvm/llvm-project/pull/146673).	2025-07-10 11:11:09 +08:00
David Green	77941eba7f	[CostModel] Add a DstTy to getShuffleCost (#141634) A shuffle will take two input vectors and a mask, to produce a new vector of type <MaskElts x SrcEltTy>. Historically it has been assumed that the SrcTy and the DstTy are the same for getShuffleCost, with that being relaxed in recent years. If the Tp passed to getShuffleCost is the SrcTy, then the DstTy can be calculated from the Mask elts and the src elt size, but the Mask is not always provided and the Tp is not reliably always the SrcTy. This has led to situations, notably in the SLP vectorizer but also in the generic cost routines, where assumptions about how vectors will be legalized are built into the generic cost routines - for example whether they will widen or promote, with the cost modelling assuming they will widen but the default lowering to promote for integer vectors. This patch attempts to start improving that - it originally tried to alter more of the cost model but that too quickly became too many changes at once, so this patch just plumbs in a DstTy to getShuffleCost so that DstTy and SrcTy can be reliably distinguished. The callers of getShuffleCost have been updated to try and include a DstTy that is more accurate. Otherwise it tries to be fairly non-functional, keeping the SrcTy used as the primary type used in shuffle cost routines, only using DstTy where it was in the past (for InsertSubVector for example). Some asserts have been added that help to check for consistent values when a Mask and a DstTy are provided to getShuffleCost. Some of them took a while to get right, and some non-mask calls might still be incorrect. Hopefully this will provide a useful base to build more shuffles that alter size.	2025-06-21 12:29:29 +01:00
Mel Chen	688bccb290	[TTI][LV] Simplify the prototype of preferPredicatedReductionSelect. nfc (#139265)	2025-05-12 17:24:37 +08:00
Harald van Dijk	32752913b1	[ARM] Do not assume memory intrinsics specify alignment. (#138356)	2025-05-07 16:25:03 +01:00
Kazu Hirata	d144c13ae5	[Target] Remove unused local variables (NFC) (#138443)	2025-05-04 07:56:38 -07:00
David Green	abd2c07e39	[CostModel] Make Op0 and Op1 const in getVectorInstrCost. NFC (#137631) This does not alter much at the moment, but allows const pointers to be passed as Op0 and Op1, simplifying later patches	2025-05-01 15:55:08 +01:00
Sergei Barannikov	3334c3597d	[TTI] Fix discrepancies in prototypes between interface and implementations (NFCI) (#136655) These are not diagnosed because implementations hide the methods of the base class rather than overriding them. This works as long as a hiding function is callable with the same arguments as the same function from the base class. Pull Request: https://github.com/llvm/llvm-project/pull/136655	2025-04-22 11:40:12 +03:00
Sergei Barannikov	0014b49482	[TTI] Make all interface methods const (NFCI) (#136598) Making `TargetTransformInfo::Model::Impl` `const` makes sure all interface methods are `const`, in `BasicTTIImpl`, its bases, and in all derived classes. Pull Request: https://github.com/llvm/llvm-project/pull/136598	2025-04-22 06:27:29 +03:00
Sergei Barannikov	e0c1e23b99	[TTI] Constify BasicTTIImplBase::thisT() (NFCI) (#136575) The main change is making the `thisT` method `const`; the rest of the changes fix the resulting compilation errors. There are two tricky methods, `getVectorInstrCost()` and `getIntImmCost()`. They have several overloads; some of these overloads are typically pulled into derived classes using the `using` directive, and then hidden by methods in the derived class. The compiler does not complain if the hiding methods are not marked as `const`, which means that clients will use the methods from the base class. If after this change your target fails cost model tests, this must be the reason. To resolve the issue you need to make all hiding overloads `const`. See the second commit in this PR. Pull Request: https://github.com/llvm/llvm-project/pull/136575	2025-04-21 21:42:40 +03:00
Mel Chen	409df9f74c	[TTI][LV] Change the prototype of preferInLoopReduction. nfc (#132698) This patch changes the preferInLoopReduction function to take a RecurKind instead of an unsigned Opcode. This makes it possible to distinguish non-arithmetic reductions such as min/max, AnyOf, and FindLastIV, and also helps unify IAnyOf with FAnyOf and IFindLastIV with FFindLastIV. Related patch #118393 #131830	2025-04-07 19:10:16 +08:00
Krzysztof Drewniak	554859c736	[TTI] Make isLegalMasked{Load,Store} take an address space (#134006) In order to facilitate targets that only support masked loads/stores on certain address spaces (AMDGPU will support them in an upcoming patch, but only for address space 7), add an AddressSpace parameter to isLegalMaskedLoad and isLegalMaskedStore	2025-04-02 15:38:10 -05:00
Nashe Mncube	29925b7044	[NFC][CostModel][ARM] Remove redundant lambda capture (#132018) Buildbot failures were caused by PR #122713. This was due to unused captures in a lambda function.	2025-03-19 13:52:53 +00:00
Nashe Mncube	4ddc8df6ca	[CostModel][ARM]Adjust cost of muls in (U/S)MLAL and patterns (#122713) PR #117350 made changes to the SLP vectorizer which introduced a regression on some ARM benchmarks. Investigation narrowed it down to suboptimal codegen for benchmarks that previously only used scalar (U/S)MLAL instructions. The linked change meant the SLPVectorizer thought that these could be vectorized. This change makes the cost of muls in (U/S)MLAL patterns slightly cheaper to make sure scalar instructions are preferred in these cases over SLP vectorization on targets supporting DSP	2025-03-19 12:25:44 +00:00
Elvis Wang	6dba5f6595	[TTI] Align optional FMFs in getExtendedReductionCost() to getArithmeticReductionCost(). (#131968) The implementation of getExtendedReductionCost() often calls getArithmeticReductionCost() with FMFs, but calling getArithmeticReductionCost() with FMFs for non-floating-point reductions returns the wrong cost. This patch makes the FMFs in getExtendedReductionCost() optional, aligning it with getArithmeticReductionCost(), so the TTI will return the correct cost for non-FP extended-reduction queries without FMFs. This patch is not quite NFC but it's hard to test from the CostModel side. Split from #113903.	2025-03-19 18:53:38 +08:00
Craig Topper	39c454af01	[TTI] getScalingFactorCost should return InstructionCost::getInvalid() instead of -1. (#129802) Historically this function returned an int, with negative values meaning invalid. It was migrated to InstructionCost in 43ace8b5ce07a, but the code was not updated to return an invalid cost instead of -1. In that commit, the caller in LSR was updated to assert that the cost is valid instead of positive. We should return invalid instead of a negative value so LSR will assert if the cost isn't valid.	2025-03-05 09:10:45 -08:00
Luke Lau	e1cea0d928	[LV][TTI] Remove unused ReductionFlags. NFC (#129858) No in-tree targets currently use it in the preferInLoopReduction/preferPredicatedReductionSelect TTI hooks. It looks like it used to be used in LoopUtils, at least in 8ca60db40bd944dc5f67e0f200a403b4e03818ea, but I presume it was replaced by RecurrenceDescriptor.	2025-03-05 18:31:12 +08:00
Mats Jun Larsen	416f1c465d	[IR] Replace of PointerType::get(Type) with opaque version (NFC) (#123617) In accordance with https://github.com/llvm/llvm-project/issues/123569. In order to keep the patch at a reasonable size, this PR only covers the llvm subproject, unittests excluded.	2025-01-21 00:32:56 +09:00
Vladi Krapp	f8d270474c	[ARM] Reduce loop unroll when low overhead branching is available (#120065) For processors with low overhead branching (LOB), runtime unrolling the innermost loop is often detrimental to performance. In these cases the loop remainder gets unrolled into a series of compare-and-jump blocks, which in deeply nested loops get executed multiple times, negating the benefits of LOB. This is particularly noticeable when the loop trip count of the innermost loop varies within the outer loop, such as in the case of triangular matrix decompositions. In these cases we will prefer to not unroll the innermost loop, with the intention for it to be executed as a low overhead loop.	2024-12-18 10:10:51 +00:00
Benjamin Maxwell	c3260c65e8	[IR] Add `llvm.sincos` intrinsic (#109825) This adds the `llvm.sincos` intrinsic, legalization, and lowering. The `llvm.sincos` intrinsic takes a floating-point value and returns both the sine and cosine (as a struct). ``` declare { float, float } @llvm.sincos.f32(float %Val) declare { double, double } @llvm.sincos.f64(double %Val) declare { x86_fp80, x86_fp80 } @llvm.sincos.f80(x86_fp80 %Val) declare { fp128, fp128 } @llvm.sincos.f128(fp128 %Val) declare { ppc_fp128, ppc_fp128 } @llvm.sincos.ppcf128(ppc_fp128 %Val) declare { <4 x float>, <4 x float> } @llvm.sincos.v4f32(<4 x float> %Val) ``` The lowering is built on top of the existing FSINCOS ISD node, with additional type legalization to allow for f16, f128, and vector values.	2024-10-29 10:52:20 +00:00
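The sketch below is a plain C++ analogue of the intrinsic's shape: one input and both results returned together in a struct, mirroring `{ float, float } @llvm.sincos.f32(float %Val)`. The struct and function names are hypothetical. The example shows semantics only; the point of the intrinsic is that a backend can lower the pair to a single FSINCOS node or libcall instead of two separate calls.

```cpp
#include <cassert>
#include <cmath>

// Mirrors the { float, float } return type of @llvm.sincos.f32.
struct SinCosF32 {
  float sinVal;
  float cosVal;
};

// Computes both results from one input; a real lowering may share work.
SinCosF32 sincos_f32(float x) {
  return {std::sin(x), std::cos(x)};
}
```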
Nashe Mncube	e37d736def	Recommit: [llvm][ARM][GlobalOpt]Add widen global arrays pass (#113289) This is a recommit of #107120. The original PR was approved but failed buildbot. The newly added tests should only be run for compilers that support the ARM target. This has been resolved by adding a config file for these tests. - Pass optimizes memcpy's by padding out destinations and sources to a full word to make ARM backend generate full word loads instead of loading a single byte (ldrb) and/or half word (ldrh). Only pads destination when it's a stack allocated constant size array and source when it's constant string. Heuristic to decide whether to pad or not is very basic and could be improved to allow more examples to be padded. - Pass works at the midend level	2024-10-24 10:12:01 +01:00
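The arithmetic behind the padding idea can be sketched as follows. Round the copy size up to a multiple of the 4-byte word used on 32-bit ARM, then compare a rough load count before and after padding. The count assumes a greedy split into ldr, ldrh and ldrb accesses; real codegen may differ, and both function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Round a byte count up to the next multiple of the word size (4 bytes assumed).
constexpr std::size_t padToWord(std::size_t bytes, std::size_t wordSize = 4) {
  return (bytes + wordSize - 1) / wordSize * wordSize;
}

// Rough number of loads to copy `bytes` bytes, assuming greedy word (ldr),
// then halfword (ldrh), then byte (ldrb) accesses.
constexpr std::size_t loadCount(std::size_t bytes) {
  return bytes / 4 + (bytes % 4) / 2 + (bytes % 2);
}
```

Under these assumptions, a 7-byte copy needs three loads: one word, one halfword and one byte. Padding it to 8 bytes reduces that to two word loads.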
Nashe Mncube	370fd74361	Revert "[llvm][ARM]Add widen global arrays pass" (#112701) Reverts llvm/llvm-project#107120 Unexpected build failures in post-commit pipelines. Needs investigation	2024-10-17 13:38:01 +01:00
Nashe Mncube	ab90d2793c	[llvm][ARM]Add widen global arrays pass (#107120) - Pass optimizes memcpy's by padding out destinations and sources to a full word to make backend generate full word loads instead of loading a single byte (ldrb) and/or half word (ldrh). Only pads destination when it's a stack allocated constant size array and source when it's constant array. Heuristic to decide whether to pad or not is very basic and could be improved to allow more examples to be padded. - Pass works within GlobalOpt but is disabled by default on all targets except ARM.	2024-10-17 11:56:00 +01:00
Jeffrey Byrnes	853c43d04a	[TTI] NFC: Port TLI.shouldSinkOperands to TTI (#110564) Porting to TTI provides direct access to the instruction cost model, which can enable instruction cost based sinking without introducing code duplication.	2024-10-09 14:30:09 -07:00
Philip Reames	d288574363	[TTI][RISCV] Model cost of loading constants arms of selects and compares (#109824) This follows in the spirit of 7d82c99403f615f6236334e698720bf979959704, and extends the costing API for compares and selects to provide information about the operands passed in an analogous manner. This allows us to model the cost of materializing the vector constant, as some select-of-constants are significantly more expensive than others when you account for the cost of materializing the constants involved. This is a stepping stone towards fixing https://github.com/llvm/llvm-project/issues/109466. A separate SLP patch will be required to utilize the new API.	2024-09-25 07:25:57 -07:00
Nikita Popov	a7697c8655	[ARM] Do not assume alignment in vld1xN and vst1xN intrinsics (#106984) These intrinsics currently assume natural alignment. Instead, respect the alignment attribute on the intrinsic. Teach InstCombine to improve that alignment. If desired I could also adjust the clang frontend to add alignment annotations equivalent to the previous behavior, but I don't see any indication that such an assumption is correct in the ARM intrinsics docs. Fixes https://github.com/llvm/llvm-project/issues/59081.	2024-09-05 09:26:53 +02:00
David Green	dcd246cbde	[ARM] Add scalar add_sat costs. (#100988) These can usually generate: - qadd / qsub for signed i32 scalars - uqadd16 / qadd16 / uqsub16 / qsub16 with an extend for signed/unsigned i8/i16 - Are expanded to an add + cmp + sel otherwise. This can lead to differences in unrolling, etc., but should give a better cost for the instructions.	2024-08-05 18:56:04 +01:00
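The sketch below shows what these costs describe, in two forms.

- The generic unsigned expansion is literally add + cmp + select: if the wrapped sum is smaller than an operand, the add overflowed and the result is clamped.
- The signed i32 form gives the result a single `qadd` produces. Here it is computed through a 64-bit intermediate purely for clarity.

Function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Generic expansion of an unsigned saturating add: add, compare, select.
uint32_t uadd_sat_u32(uint32_t a, uint32_t b) {
  uint32_t sum = a + b;            // add (wraps on overflow)
  bool overflow = sum < a;         // cmp
  return overflow ? std::numeric_limits<uint32_t>::max() : sum;  // sel
}

// Signed i32 saturating add (the result of a single qadd), computed here
// through a wider type rather than with a target instruction.
int32_t sadd_sat_i32(int32_t a, int32_t b) {
  int64_t wide = static_cast<int64_t>(a) + b;
  if (wide > std::numeric_limits<int32_t>::max())
    return std::numeric_limits<int32_t>::max();
  if (wide < std::numeric_limits<int32_t>::min())
    return std::numeric_limits<int32_t>::min();
  return static_cast<int32_t>(wide);
}
```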
David Green	ea7cc12f61	[ARM] Add fallback fptoi_sat costs. This makes sure that the custom operations get a fallback cost, even if they are not perfect.	2024-07-28 23:38:21 +01:00
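The saturating conversions costed here (`llvm.fptosi.sat` / `llvm.fptoui.sat` in LLVM IR) differ from a plain `fptosi` in their edge cases. NaN converts to 0, out-of-range values clamp to the destination type's minimum or maximum, and in-range values truncate toward zero. The sketch below is a reference model of the signed double-to-i32 case only; the function name is hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <limits>

// Reference semantics of a saturating double -> i32 conversion.
int32_t fptosi_sat_i32(double x) {
  if (std::isnan(x))
    return 0;
  if (x <= static_cast<double>(std::numeric_limits<int32_t>::min()))
    return std::numeric_limits<int32_t>::min();
  if (x >= static_cast<double>(std::numeric_limits<int32_t>::max()))
    return std::numeric_limits<int32_t>::max();
  return static_cast<int32_t>(x);  // truncates toward zero
}
```

The extra NaN and clamping checks are why the operation typically needs custom lowering, and why a fallback cost is better than none.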
Nikita Popov	11484cb817	[InstCombine] Pass SimplifyQuery to SimplifyDemandedBits() This will enable calling SimplifyDemandedBits() with a SimplifyQuery that has CondContext set in the future. Additionally this also marginally strengthens the analysis by retaining the original context instruction for one-use chains.	2024-07-01 12:41:21 +02:00
Andreas Jonson	34a2889e90	[InstCombine] Swap out range metadata to range attribute for arm_mve_pred_v2i (#94847)	2024-06-19 18:17:44 +02:00
Graham Hunter	2e8d815596	[TTI] Support scalable offsets in getScalingFactorCost (#88113) Part of the work to support vscale-relative immediates in LSR.	2024-05-10 11:22:11 +01:00
David Green	4ac2721e51	[AArch64] Add costs for ST3 and ST4 instructions, modelled as store(shuffle). (#87934) This tries to add some costs for the shuffle in a ST3/ST4 instruction, which are represented in LLVM IR as store(interleaving shuffle). In order to detect the store, it needs to add a CxtI context instruction to check the users of the shuffle. ST3 and ST4 are added; ST2 should be a zip1 shuffle, which will be added in another patch. It should help fix some of the regressions from #87510.	2024-04-09 16:36:08 +01:00
Alexey Bataev	7bc079c852	[TTI]Fallback to SingleSrcPermute shuffle kind, if no direct estimation for extract subvector. Many targets do not have costs for the extractsubvector shuffle kind, but do have costs for single source permute. If there is no cost estimate for extractsubvector, it is better to switch to single source permute for a better cost estimate. Reviewers: RKSimon, davemgreen, arsenm Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/79837	2024-02-12 07:09:49 -05:00
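The sketch below shows the fallback described in this message as a lookup in a per-target cost table. When the table has no entry for an extract-subvector shuffle, the query is retried as a single-source permute. The enum, the table type and the cost values in the usage are hypothetical stand-ins, not LLVM's `TTI::ShuffleKind` machinery.

```cpp
#include <cassert>
#include <map>
#include <optional>

// Hypothetical subset of shuffle kinds.
enum class ShuffleKind { Broadcast, ExtractSubvector, PermuteSingleSrc };

// Hypothetical per-target table: kinds without an entry have no estimate.
using CostTable = std::map<ShuffleKind, unsigned>;

std::optional<unsigned> lookupShuffleCost(const CostTable &table, ShuffleKind kind) {
  auto it = table.find(kind);
  if (it != table.end())
    return it->second;
  // Fallback: cost an uncosted extract-subvector as a single-source permute.
  if (kind == ShuffleKind::ExtractSubvector)
    return lookupShuffleCost(table, ShuffleKind::PermuteSingleSrc);
  return std::nullopt;
}
```

A single-source permute can express any extract-subvector, so its cost is a conservative upper bound. That makes it a reasonable stand-in when no direct estimate exists.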
Nico Weber	184ca39529	[llvm] Move CodeGenTypes library to its own directory (#79444) Finally addresses https://reviews.llvm.org/D148769#4311232 :) No behavior change.	2024-01-25 12:01:31 -05:00
Kazu Hirata	586ecdf205	[llvm] Use StringRef::{starts,ends}_with (NFC) (#74956) This patch replaces uses of StringRef::{starts,ends}with with StringRef::{starts,ends}_with for consistency with std::{string,string_view}::{starts,ends}_with in C++20. I'm planning to deprecate and eventually remove StringRef::{starts,ends}with.	2023-12-11 21:01:36 -08:00


384 Commits