384 Commits

Author SHA1 Message Date
Osama Abdelkader
aad7259ff6
[AArch64] Optimize memset to use NEON DUP instruction for more sizes (#166030)
This change improves memset code generation for non-zero values on
AArch64 by using NEON's DUP instruction instead of
the less efficient multiplication with 0x01010101 pattern.

For small sizes, the value is extracted from a larger DUP. For
non-power-of-two sizes, overlapping stores are used in some cases.

TargetLowering::findOptimalMemOpLowering is modified to allow explicitly
specifying the size of the constant in cases where the constant is
larger than the store operations.

Fixes #165949
2026-01-29 13:03:38 -08:00
Florian Hahn
b794baf8e7
[TTI] Add VectorInstrContext for context-aware insert/extract costs. (#175982)
This commit introduces the VectorInstrContext (VIC) infrastructure to
improve cost estimates for insert/extracts based on the context
instruction in which the insert/extract is used.

This is similar to CastContextHint, and allows providing context on how
the insert/extract is going to be used before creating IR. This is
useful in the LoopVectorizer, where costs need to estimated before
creating IR.

The new hint currently only replaces an existing check in AArch64,
but new uses will be introduced in follow-ups, including
https://github.com/llvm/llvm-project/pull/177201.

PR: https://github.com/llvm/llvm-project/pull/175982
2026-01-27 16:30:29 +00:00
Croose
fab06fae00
[ARM] Fix inlining issue in ARM (#169337)
There is an issue on ARM where a function wont be inlined due to
mismatching target features between caller and callee.
The caller has `HasV8Ops` and `FeatureDotProd` and the callee does not,
but AFAIK this should not be a problem.
https://godbolt.org/z/f19h3zT66 is an example showing how the call is
not inlined on armv7.
The expected asm output would be something like:
```asm
.fnstart
	vsdot.s8	q0, q1, d4[0]
	bx	lr
.Lfunc_end0:

```
Thanks to @Amichaxx we managed to narrow it down and now can resolve
this problem by adding `ARM::FeatureDotProd, ARM::HasV8Ops` to
InlineFeaturesAllowed in llvm/lib/Target/ARM/ARMTargetTransformInfo.h,
after which the inlining occurs successfully.

Whilst we're at it we have also added some debugging to make it easier
to tell why (or why not) a function is being inlined for ARM, and a
couple other features that seem to be missing from the list.

This patch was motivated by an issue experienced with rust that was
traced back to llvm, and thus was designed to address that.
2026-01-16 12:06:20 +00:00
Shih-Po Hung
c2409b4bca
[TTI] Remove masked/gather-scatter/strided/expand-compress costing from TTIImpl (#169885)
Following #165532, this patch moves scalarization‑cost computation into
BaseT::getMemIntrinsicCost and lets backends override it via their
getMemIntrinsicCost.
It also removes the masked/gather‑scatter/strided/expand‑compress
costing interfaces from TTIImpl.
Targets may keep them locally if needed.

Stacked on #170426 and #170436.
2025-12-04 01:34:29 +00:00
Shih-Po Hung
1c86f4a8f1
[TTI] Use MemIntrinsicCostAttributes for getGatherScatterOpCost (#168650)
- Following #168029. This is a step toward a unified interface for
masked/gather-scatter/strided/expand-compress cost modeling.
- Replace the ad-hoc parameter list with a single attributes object.

API change:
```
- InstructionCost getGatherScatterOpCost(Opcode, DataTy, Ptr, VariableMask,
-                                        Alignment, CostKind, Inst);

+ InstructionCost getGatherScatterOpCost(MemIntrinsicCostAttributes,
+                                       CostKind);
```

Notes:
- NFCI intended: callers populate MemIntrinsicCostAttributes with same
information as before.
2025-12-03 03:01:35 +00:00
David Green
f741851731 Revert "[AArch64][ARM] Move ARM-specific InstCombine transforms into Transforms/Utils (#169589)"
This reverts commit 1c32b6f51ccaaf9c65be11d7dca9e5a476cddb5a due to failures on
BUILD_SHARED_LIBS builds.
2025-12-02 11:46:50 +00:00
valadaptive
1c32b6f51c
[AArch64][ARM] Move ARM-specific InstCombine transforms into Transforms/Utils (#169589)
Back when `TargetTransformInfo::instCombineIntrinsic` was added in
https://reviews.llvm.org/D81728, several transforms common to both ARM
and AArch64 were kept in the non-target-specific `InstCombineCalls.cpp`
so they could be shared between the two targets.

I want to extend the transform of the `tbl` intrinsics into static
`shufflevector`s in a similar manner to
https://github.com/llvm/llvm-project/pull/169110 (right now it only
works with a 64-bit `tbl1`, but `shufflevector` should allow it to work
with up to 2 operands, and it can definitely work with 128-bit vectors).
I think separating out the transform into a TTI hook is a prerequisite.

~~I'm not happy about creating an entirely new module for this and
having to wire it up through CMake and everything, but I'm not sure
about the alternatives. If any maintainers can think of a cleaner way of
doing this, I'm very open to it.~~

I've moved the transforms into
`Transforms/Utils/ARMCommonInstCombineIntrinsic.cpp`, which is a lot
simpler.
2025-12-02 11:17:12 +00:00
Vladi Krapp
34c699246d
[Arm] Control forced unrolling of small loops (#170127)
* Add flag to control cost threshold for forced unrolling of loops.
  Existing value preserved as default.
2025-12-02 08:39:26 +00:00
Drew Kersnar
17852deda7
[NVPTX] Lower LLVM masked vector loads and stores to PTX (#159387)
This backend support will allow the LoadStoreVectorizer, in certain
cases, to fill in gaps when creating load/store vectors and generate
LLVM masked load/stores
(https://llvm.org/docs/LangRef.html#llvm-masked-store-intrinsics). To
accomplish this, changes are separated into two parts. This first part
has the backend lowering and TTI changes, and a follow up PR will have
the LSV generate these intrinsics:
https://github.com/llvm/llvm-project/pull/159388.

In this backend change, Masked Loads get lowered to PTX with `#pragma
"used_bytes_mask" [mask];`
(https://docs.nvidia.com/cuda/parallel-thread-execution/#pragma-strings-used-bytes-mask).
And Masked Stores get lowered to PTX using the new sink symbol syntax
(https://docs.nvidia.com/cuda/parallel-thread-execution/#data-movement-and-conversion-instructions-st).

# TTI Changes
TTI changes are needed because NVPTX only supports masked loads/stores
with _constant_ masks. `ScalarizeMaskedMemIntrin.cpp` is adjusted to
check that the mask is constant and pass that result into the TTI check.
Behavior shouldn't change for non-NVPTX targets, which do not care
whether the mask is variable or constant when determining legality, but
all TTI files that implement these API need to be updated.

# Masked store lowering implementation details
If the masked stores make it to the NVPTX backend without being
scalarized, they are handled by the following:
* `NVPTXISelLowering.cpp` - Sets up a custom operation action and
handles it in lowerMSTORE. Similar handling to normal store vectors,
except we read the mask and place a sentinel register `$noreg` in each
position where the mask reads as false.

For example, 
```
t10: v8i1 = BUILD_VECTOR Constant:i1<-1>, Constant:i1<0>, Constant:i1<0>, Constant:i1<-1>, Constant:i1<-1>, Constant:i1<0>, Constant:i1<0>, Constant:i1<-1>
t11: ch = masked_store<(store unknown-size into %ir.lsr.iv28, align 32, addrspace 1)> t5:1, t5, t7, undef:i64, t10

->

STV_i32_v8 killed %13:int32regs, $noreg, $noreg, killed %16:int32regs, killed %17:int32regs, $noreg, $noreg, killed %20:int32regs, 0, 0, 1, 8, 0, 32, %4:int64regs, 0, debug-location !18 :: (store unknown-size into %ir.lsr.iv28, align 32, addrspace 1);

```

* `NVPTXInstInfo.td` - changes the definition of store vectors to allow
for a mix of sink symbols and registers.
* `NVPXInstPrinter.h/.cpp` - Handles the `$noreg` case by printing "_".

# Masked load lowering implementation details
Masked loads are routed to normal PTX loads, with one difference: a
`#pragma "used_bytes_mask"` is emitted before the load instruction
(https://docs.nvidia.com/cuda/parallel-thread-execution/#pragma-strings-used-bytes-mask).
To accomplish this, a new operand is added to every NVPTXISD Load type
representing this mask.
* `NVPTXISelLowering.h/.cpp` - Masked loads are converted into normal
NVPTXISD loads with a mask operand in two ways. 1) In type legalization
through replaceLoadVector, which is the normal path, and 2) through
LowerMLOAD, to handle the legal vector types
(v2f16/v2bf16/v2i16/v4i8/v2f32) that will not be type legalized. Both
share the same convertMLOADToLoadWithUsedBytesMask helper. Both default
this operand to UINT32_MAX, representing all bytes on. For the latter,
we need a new `NVPTXISD::MLoadV1` type to represent that edge case
because we cannot put the used bytes mask operand on a generic
LoadSDNode.
* `NVPTXISelDAGToDAG.cpp` - Extract used bytes mask from loads, add them
to created machine instructions.
* `NVPTXInstPrinter.h/.cpp` - Print the pragma when the used bytes mask
isn't all ones.
* `NVPTXForwardParams.cpp`, `NVPTXReplaceImageHandles.cpp` - Update
manual indexing of load operands to account for new operand.
* `NVPTXInsrtInfo.td`, `NVPTXIntrinsics.td` - Add the used bytes mask to
the MI definitions.
* `NVPTXTagInvariantLoads.cpp` - Ensure that masked loads also get
tagged as invariant.

Some generic changes that are needed:
* `LegalizeVectorTypes.cpp` - Ensure flags are preserved when splitting
masked loads.
* `SelectionDAGBuilder.cpp` - Preserve `MD_invariant_load` on masked
load SDNode creation
2025-11-25 10:26:15 -06:00
Shih-Po Hung
961940e1a7
[TTI] Use MemIntrinsicCostAttributes for getMaskedMemoryOpCost (#168029)
- Split from #165532. This is a step toward a unified interface for
masked/gather-scatter/strided/expand-compress cost modeling.
- Replace the ad-hoc parameter list with a single attributes object.

API change:
```
- InstructionCost getMaskedMemoryOpCost(Opcode, Src, Alignment,
-                                       AddressSpace, CostKind);

+ InstructionCost getMaskedMemoryOpCost(MemIntrinsicCostAttributes,
+                                       CostKind);
```
Notes:
- NFCI intended: callers populate MemIntrinsicCostAttributes with the
same information as before.
- Follow-up: migrate gather/scatter, strided, and expand/compress cost
queries to the same attributes-based entry point.
2025-11-19 09:51:12 +08:00
Florian Hahn
af9a4263a1
[LAA] Only use inbounds/nusw in isNoWrap if the GEP is dereferenced. (#161445)
Update isNoWrap to only use the inbounds/nusw flags from GEPs that are
guaranteed to be dereferenced on every iteration. This fixes a case
where we incorrectly determine no dependence.

I think the issue is isolated to code that evaluates the resulting
AddRec at BTC, just using it to compute the distance between accesses
should still be fine; if the access does not execute in a given
iteration, there's no dependence in that iteration. But isolating the
code is not straight-forward, so be conservative for now. The practical
impact should be very minor (only one loop changed across a corpus with
27k modules from large C/C++ workloads.

Fixes https://github.com/llvm/llvm-project/issues/160912.

PR: https://github.com/llvm/llvm-project/pull/161445
2025-11-04 17:08:12 +00:00
Sam Tebbs
37127f74f4
[LV] Bundle sub reductions into VPExpressionRecipe (#147255)
This PR bundles sub reductions into the VPExpressionRecipe class and
adjusts the cost functions to take the negation into account.

Stacked PRs:
1. https://github.com/llvm/llvm-project/pull/147026
2. -> https://github.com/llvm/llvm-project/pull/147255
3. https://github.com/llvm/llvm-project/pull/147302
4. https://github.com/llvm/llvm-project/pull/147513
2025-09-01 17:25:01 +01:00
Elvis Wang
01fac67e2a
[TTI] Add cost kind to getAddressComputationCost(). NFC. (#153342)
This patch add cost kind to `getAddressComputationCost()` for #149955.

Note that this patch also remove all the default value in `getAddressComputationCost()`.
2025-08-14 16:01:44 +08:00
David Green
d9d9d9ad19
[ARM][MVE] Add shuffle costs for LDn and STn instructions. (#145304)
LD2 is represented in IR as deinterleave-shuffle(load), and ST2 as
store(interleave-shuffle). Whilst the shuffle would be expensive in
general for MVE (it does not have zip/uzp instructions), it should be
treated as cheap when part of the LD2/ST2 pattern. This borrows some
code from the AArch64 backed to produce lower costs. (Some of which
still shows as higher than it should - that just shows how broken the
generic shuffle costs are at the moment, they would be lower if
getShuffleCost was called directly as opposed to going through
getInstructionCost).
2025-08-14 06:59:37 +01:00
Luke Lau
acb86fb9e0
[TTI] Consistently pass the pointer type to getAddressComputationCost. NFCI (#152657)
In some places we were passing the type of value being accessed, in
other cases we were passing the type of the pointer for the access.

The most "involved" user is
LoopVectorizationCostModel::getMemInstScalarizationCost, which is the
only call site that passes in the SCEV, and it passes along the pointer
type.

This changes call sites to consistently pass the pointer type, and
renames the arguments to clarify this.

No target actually checks the contents of the type passed, only to see
if it's a vector or not, so this shouldn't have an effect.
2025-08-11 18:00:12 +08:00
David Green
9d0ac3980d [ARM] Use CostKind in getShuffleCost getMVEVectorCostFactor.
These calls pre-date CostKind being added to getShuffleCost in
5263155d5be64b435a97fd4fa12f7f0aa97f88a8.
2025-07-13 08:05:51 +01:00
Boyao Wang
697beb3f17
[TargetLowering] Change getOptimalMemOpType and findOptimalMemOpLowering to take LLVM Context (#147664)
Add LLVM Context to getOptimalMemOpType and findOptimalMemOpLowering. So
that we can use EVT::getVectorVT to generate EVT type in
getOptimalMemOpType.

Related to [#146673](https://github.com/llvm/llvm-project/pull/146673).
2025-07-10 11:11:09 +08:00
David Green
77941eba7f
[CostModel] Add a DstTy to getShuffleCost (#141634)
A shuffle will take two input vectors and a mask, to produce a new
vector of size <MaskElts x SrcEltTy>. Historically it has been assumed
that the SrcTy and the DstTy are the same for getShuffleCost, with that
being relaxed in recent years. If the Tp passed to getShuffleCost is the
SrcTy, then the DstTy can be calculated from the Mask elts and the src
elt size, but the Mask is not always provided and the Tp is not reliably
always the SrcTy. This has led to situations notably in the SLP
vectorizer but also in the generic cost routines where assumption about
how vectors will be legalized are built into the generic cost routines -
for example whether they will widen or promote, with the cost modelling
assuming they will widen but the default lowering to promote for integer
vectors.

This patch attempts to start improving that - it originally tried to
alter more of the cost model but that too quickly became too many
changes at once, so this patch just plumbs in a DstTy to getShuffleCost
so that DstTy and SrcTy can be reliably distinguished. The callers of
getShuffleCost have been updated to try and include a DstTy that is more
accurate. Otherwise it tries to be fairly non-functional, keeping the
SrcTy used as the primary type used in shuffle cost routines, only using
DstTy where it was in the past (for InsertSubVector for example).

Some asserts have been added that help to check for consistent values
when a Mask and a DstTy are provided to getShuffleCost. Some of them
took a while to get right, and some non-mask calls might still be
incorrect. Hopefully this will provide a useful base to build more
shuffles that alter size.
2025-06-21 12:29:29 +01:00
Mel Chen
688bccb290
[TTI][LV] Simplify the prototype of preferPredicatedReductionSelect. nfc (#139265) 2025-05-12 17:24:37 +08:00
Harald van Dijk
32752913b1
[ARM] Do not assume memory intrinsics specify alignment. (#138356) 2025-05-07 16:25:03 +01:00
Kazu Hirata
d144c13ae5
[Target] Remove unused local variables (NFC) (#138443) 2025-05-04 07:56:38 -07:00
David Green
abd2c07e39
[CostModel] Make Op0 and Op1 const in getVectorInstrCost. NFC (#137631)
This does not alter much at the moment, but allows const pointers to be
passed as Op0 and Op1, simplifying later patches
2025-05-01 15:55:08 +01:00
Sergei Barannikov
3334c3597d
[TTI] Fix discrepancies in prototypes between interface and implementations (NFCI) (#136655)
These are not diagnosed because implementations hide the methods of the base class rather than overriding them.
This works as long as a hiding function is callable with the same arguments as the same function from the base class.

Pull Request: https://github.com/llvm/llvm-project/pull/136655
2025-04-22 11:40:12 +03:00
Sergei Barannikov
0014b49482
[TTI] Make all interface methods const (NFCI) (#136598)
Making `TargetTransformInfo::Model::Impl` `const` makes sure all
interface methods are `const`, in `BasicTTIImpl`, its bases, and in all
derived classes.

Pull Request: https://github.com/llvm/llvm-project/pull/136598
2025-04-22 06:27:29 +03:00
Sergei Barannikov
e0c1e23b99
[TTI] Constify BasicTTIImplBase::thisT() (NFCI) (#136575)
The main change is making `thisT` method `const`, the rest of the
changes is fixing compilation errors (*).

(*) There are two tricky methods, `getVectorInstrCost()` and
`getIntImmCost()`.
They have several overloads; some of these overloads are typically
pulled in to derived classes using the `using` directive, and then
hidden by methods in the derived class.
The compiler does not complain if the hiding methods are not marked as
`const`, which means that clients will use the methods from the base
class. If after this change your target fails cost model tests, this
must be the reason. To resolve the issue you need  to make all hiding
overloads `const`. See the second commit in this PR.

Pull Request: https://github.com/llvm/llvm-project/pull/136575
2025-04-21 21:42:40 +03:00
Mel Chen
409df9f74c
[TTI][LV] Change the prototype of preferInLoopReduction. nfc (#132698)
This patch changes the preferInLoopReduction function to take a
RecurKind instead of an unsigned Opcode.
This makes it possible to distinguish non-arithmetic reductions such as
min/max, AnyOf, and FindLastIV, and also helps unify IAnyOf with FAnyOf
and IFindLastIV with FFindLastIV.

Related patch #118393 #131830
2025-04-07 19:10:16 +08:00
Krzysztof Drewniak
554859c736
[TTI] Make isLegalMasked{Load,Store} take an address space (#134006)
In order to facilitate targets that only support masked loads/stores
on certain address spaces (AMDGPU will support them in an upcoming
patch, but only for address space 7), add an AddressSpace parameter
to isLegalMaskedLoad and isLegalMaskedStore
2025-04-02 15:38:10 -05:00
Nashe Mncube
29925b7044
[NFC][CostModel][ARM] Remove redundant lambda capture (#132018)
Buildbot failures were caused by PR #122713. This was due to unused captures in a lambda function.
2025-03-19 13:52:53 +00:00
Nashe Mncube
4ddc8df6ca
[CostModel][ARM]Adjust cost of muls in (U/S)MLAL and patterns (#122713)
PR #117350 made changes to the SLP vectorizer which introduced a
regression on some ARM benchmarks. Investigation narrowed it down to
suboptimal codegen for benchmarks that previously only used scalar (U/S)MLAL
instructions. The linked change meant the SLPVectorizer thought that
these could be vectorized. This change makes the cost of muls in
(U/S)MLAL patterns slightly cheaper to make sure scalar instructions are
preferred in these cases over SLP vectorization on targets supporting DSP
2025-03-19 12:25:44 +00:00
Elvis Wang
6dba5f6595
[TTI] Align optional FMFs in getExtendedReductionCost() to getArithmeticReductionCost(). (#131968)
In the implementation of the getExtendedReductionCost(), it ofter calls
getArithmeticReductionCost() with FMFs. But we shouldn't call
getArithmeticReductionCost() with FMFs for non-floating-point reductions
which will return the wrong cost.

This patch makes FMFs in getExtendedReductionCost() optional and align
to the getArithmeticReductionCost(). So the TTI will return the correct
cost for non-FP extended-reductions query without FMFs.

This patch is not quite NFC but it's hard to test from the CostModel
side.

Split from #113903.
2025-03-19 18:53:38 +08:00
Craig Topper
39c454af01
[TTI] getScalingFactorCost should return InstructionCost::getInvalid() instead of -1. (#129802)
Historically this function return an int with negative values meaning
invalid. It was migrated to InstructionCost in 43ace8b5ce07a, but the
code was not updated to return invalid cost instead of -1. In that
commit, the caller in LSR was updated to assert that the cost is valid
instead of positive. We should return invalid instead of a negative
value so LSR will assert if the cost isn't valid.
2025-03-05 09:10:45 -08:00
Luke Lau
e1cea0d928
[LV][TTI] Remove unused ReductionFlags. NFC (#129858)
No in-tree targets currently use it in the
preferInLoopReduction/preferPredicatedReductionSelect TTI hooks. It
looks like it used to be used in LoopUtils, at least in
8ca60db40bd944dc5f67e0f200a403b4e03818ea, but I presume it was replaced
by RecurrenceDescriptor.
2025-03-05 18:31:12 +08:00
Mats Jun Larsen
416f1c465d
[IR] Replace of PointerType::get(Type) with opaque version (NFC) (#123617)
In accordance with https://github.com/llvm/llvm-project/issues/123569

In order to keep the patch at reasonable size, this PR only covers for
the llvm subproject, unittests excluded.
2025-01-21 00:32:56 +09:00
Vladi Krapp
f8d270474c
[ARM] Reduce loop unroll when low overhead branching is available (#120065)
For processors with low overhead branching (LOB), runtime unrolling the
innermost loop is often detrimental to performance. In these cases the
loop remainder gets unrolled into a series of compare-and-jump blocks,
which in deeply nested loops get executed multiple times, negating the
benefits of LOB.

This is particularly noticable when the loop trip count of the innermost
loop varies within the outer loop, such as in the case of triangular
matrix decompositions.

In these cases we will prefer to not unroll the innermost loop, with the
intention for it to be executed as a low overhead loop.
2024-12-18 10:10:51 +00:00
Benjamin Maxwell
c3260c65e8
[IR] Add llvm.sincos intrinsic (#109825)
This adds the `llvm.sincos` intrinsic, legalization, and lowering.

The `llvm.sincos` intrinsic takes a floating-point value and returns
both the sine and cosine (as a struct).

```
declare { float, float }          @llvm.sincos.f32(float  %Val)
declare { double, double }        @llvm.sincos.f64(double %Val)
declare { x86_fp80, x86_fp80 }    @llvm.sincos.f80(x86_fp80  %Val)
declare { fp128, fp128 }          @llvm.sincos.f128(fp128 %Val)
declare { ppc_fp128, ppc_fp128 }  @llvm.sincos.ppcf128(ppc_fp128  %Val)
declare { <4 x float>, <4 x float> } @llvm.sincos.v4f32(<4 x float>  %Val)
```

The lowering is built on top of the existing FSINCOS ISD node, with
additional type legalization to allow for f16, f128, and vector values.
2024-10-29 10:52:20 +00:00
Nashe Mncube
e37d736def
Recommit: [llvm][ARM][GlobalOpt]Add widen global arrays pass (#113289)
This is a recommit of #107120 . The original PR was approved but failed
buildbot. The newly added tests should only be run for compilers that
support the ARM target. This has been resolved by adding a config file
for these tests.

- Pass optimizes memcpy's by padding out destinations and sources to a
  full word to make ARM backend generate full word loads instead of
  loading a single byte (ldrb) and/or half word (ldrh). Only pads
  destination when it's a stack allocated constant size array and source
  when it's constant string. Heuristic to decide whether to pad or not
  is very basic and could be improved to allow more examples to be
  padded.
- Pass works at the midend level
2024-10-24 10:12:01 +01:00
Nashe Mncube
370fd74361
Revert "[llvm][ARM]Add widen global arrays pass" (#112701)
Reverts llvm/llvm-project#107120 

Unexpected build failures in post-commit pipelines. Needs investigation
2024-10-17 13:38:01 +01:00
Nashe Mncube
ab90d2793c
[llvm][ARM]Add widen global arrays pass (#107120)
- Pass optimizes memcpy's by padding out destinations and sources to a
full word to make backend generate full word loads instead of loading a
single byte (ldrb) and/or half word (ldrh). Only pads destination when
it's a stack allocated constant size array and source when it's constant
array. Heuristic to decide whether to pad or not is very basic and could
be improved to allow more examples to be padded.
- Pass works within GlobalOpt but is disabled by default on all targets
except ARM.
2024-10-17 11:56:00 +01:00
Jeffrey Byrnes
853c43d04a
[TTI] NFC: Port TLI.shouldSinkOperands to TTI (#110564)
Porting to TTI provides direct access to the instruction cost model,
which can enable instruction cost based sinking without introducing code
duplication.
2024-10-09 14:30:09 -07:00
Philip Reames
d288574363
[TTI][RISCV] Model cost of loading constants arms of selects and compares (#109824)
This follows in the spirit of 7d82c99403f615f6236334e698720bf979959704,
and extends the costing API for compares and selects to provide
information about the operands passed in an analogous manner. This
allows us to model the cost of materializing the vector constant, as
some select-of-constants are significantly more expensive than others
when you account for the cost of materializing the constants involved.

This is a stepping stone towards fixing
https://github.com/llvm/llvm-project/issues/109466. A separate SLP patch
will be required to utilize the new API.
2024-09-25 07:25:57 -07:00
Nikita Popov
a7697c8655
[ARM] Do not assume alignment in vld1xN and vst1xN intrinsics (#106984)
These intrinsics currently assume natural alignment. Instead, respect
the alignment attribute on the intrinsic. Teach InstCombine to improve
that alignment.

If desired I could also adjust the clang frontend to add alignment
annotations equivalent to the previous behavior, but I don't see any
indication that such an assumption is correct in the ARM intrinsics
docs.

Fixes https://github.com/llvm/llvm-project/issues/59081.
2024-09-05 09:26:53 +02:00
David Green
dcd246cbde
[ARM] Add scalar add_sat costs. (#100988)
These can usually generate:
 - qadd / qsub for signed i32 scalars
- uqadd16 / qadd16 / uqsub16 / qsub16 with an extend for signed/unsigned
i8/i16
 - Are expanded to an add + cmp + sel otherwise

This can lead to differences in unrolling etc, but should be a better
cost for the instructions.
2024-08-05 18:56:04 +01:00
David Green
ea7cc12f61 [ARM] Add fallback fptoi_sat costs.
This makes sure that the custom operations get a fallback cost, even if they
are not perfect.
2024-07-28 23:38:21 +01:00
Nikita Popov
11484cb817 [InstCombine] Pass SimplifyQuery to SimplifyDemandedBits()
This will enable calling SimplifyDemandedBits() with a SimplifyQuery
that has CondContext set in the future.

Additionally this also marginally strengthens the analysis by
retaining the original context instruction for one-use chains.
2024-07-01 12:41:21 +02:00
Andreas Jonson
34a2889e90
[InstCombine] Swap out range metadata to range attribute for arm_mve_pred_v2i (#94847) 2024-06-19 18:17:44 +02:00
Graham Hunter
2e8d815596
[TTI] Support scalable offsets in getScalingFactorCost (#88113)
Part of the work to support vscale-relative immediates in LSR.
2024-05-10 11:22:11 +01:00
David Green
4ac2721e51
[AArch64] Add costs for ST3 and ST4 instructions, modelled as store(shuffle). (#87934)
This tries to add some costs for the shuffle in a ST3/ST4 instruction,
which are represented in LLVM IR as store(interleaving shuffle). In
order to detect the store, it needs to add a CxtI context instruction to
check the users of the shuffle. LD3 and LD4 are added, LD2 should be a
zip1 shuffle, which will be added in another patch.

It should help fix some of the regressions from #87510.
2024-04-09 16:36:08 +01:00
Alexey Bataev
7bc079c852
[TTI]Fallback to SingleSrcPermute shuffle kind, if no direct estimation for
extract subvector.

Many targets do not have cost for extractsubvector shuffle kind, but
have the costs for single source permute. If there are no costs
estimation for extractsubvector, better to switchto single source
permute for better cost estimation.

Reviewers: RKSimon, davemgreen, arsenm

Reviewed By: RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/79837
2024-02-12 07:09:49 -05:00
Nico Weber
184ca39529
[llvm] Move CodeGenTypes library to its own directory (#79444)
Finally addresses https://reviews.llvm.org/D148769#4311232 :)

No behavior change.
2024-01-25 12:01:31 -05:00
Kazu Hirata
586ecdf205
[llvm] Use StringRef::{starts,ends}_with (NFC) (#74956)
This patch replaces uses of StringRef::{starts,ends}with with
StringRef::{starts,ends}_with for consistency with
std::{string,string_view}::{starts,ends}_with in C++20.

I'm planning to deprecate and eventually remove
StringRef::{starts,ends}with.
2023-12-11 21:01:36 -08:00