This reverts commit 14cd1339318b16e08c1363ec6896bd7d1e4ae281. The
buildbot failure seems to have been a CMake issue, which is discussed in
more detail in this Discourse post:
https://discourse.llvm.org/t/cmake-doesnt-regenerate-all-tablegen-target-files/87901
If any buildbots fail to select arbitrary intrinsics with this patch,
it's worth considering using clean builds with ccache instead of
incremental builds, as recommended here:
https://llvm.org/docs/HowToAddABuilder.html#:~:text=Use%20CCache%20and%20NOT%20incremental%20builds
The original commit message for this patch:
Add the llvm.amdgcn.call.whole.wave intrinsic for calling whole wave
functions. It takes as its first argument the callee, which must have the
amdgpu_gfx_whole_wave calling convention, followed by the call
parameters, which must match the signature of the callee except for the
first function argument (the i1 original EXEC mask, which doesn't need
to be passed in). Indirect calls are not allowed.
Make direct calls to amdgpu_gfx_whole_wave functions a verifier error.
Tail calls will be handled in a future patch.
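A rough sketch of the intended IR shape (the exact intrinsic overloading
and mangling isn't spelled out in this message, and @wwf and its
parameters are made up for illustration):

  ; Callee: a whole wave function; its first argument is the original EXEC mask.
  define amdgpu_gfx_whole_wave i32 @wwf(i1 %active, i32 %x) {
    ret i32 %x
  }

  ; Caller: the callee is the first operand of the intrinsic; the remaining
  ; operands match the callee's signature minus the leading i1.
  %r = call i32 (ptr, ...) @llvm.amdgcn.call.whole.wave(ptr @wwf, i32 %x)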
As part of the Root Signature Spec, we need to validate that Root
Signatures do not define overlapping ranges.
Closes: https://github.com/llvm/llvm-project/issues/126645
---------
Co-authored-by: joaosaffran <joao.saffran@microsoft.com>
Co-authored-by: Joao Saffran <{ID}+{username}@users.noreply.github.com>
Co-authored-by: Joao Saffran <jderezende@microsoft.com>
As fmul and fmadd are so similar, their performance characteristics tend
to be the same on most platforms, at least in terms of reciprocal
throughputs. Processors capable of performing a given number of fmul per
cycle can usually perform the same number of fma, with the extra add
being relatively simple on top. This patch makes the scores of the two
operations the same, which brings the throughput cost of an fma/fmuladd
to 2 and the latency to 3, which are the defaults for fmul.
Note that we might also want to change the throughput cost of an fmul to
1, as most processors have ample bandwidth for them, but the two should
still stay in line with one another.
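For reference (not part of the patch), the fused operation whose cost is
being aligned with fmul shows up in IR either as the llvm.fmuladd
intrinsic or as a contractable fmul/fadd pair, roughly:

  %mul = fmul fast float %a, %b
  %add = fadd fast float %mul, %c    ; may be contracted to an fma by the backend
  %fma = call float @llvm.fmuladd.f32(float %a, float %b, float %c)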
LD2 is represented in IR as deinterleave-shuffle(load), and ST2 as
store(interleave-shuffle). Whilst the shuffle would be expensive in
general for MVE (it does not have zip/uzp instructions), it should be
treated as cheap when part of the LD2/ST2 pattern. This borrows some
code from the AArch64 backend to produce lower costs. (Some of the costs
still show as higher than they should; that just shows how broken the
generic shuffle costs are at the moment. They would be lower if
getShuffleCost were called directly as opposed to going through
getInstructionCost.)
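For illustration (types chosen for MVE's 128-bit registers, not taken
from the patch), the LD2-style pattern being costed is a wide load
followed by even/odd deinterleaving shuffles:

  %wide = load <8 x i32>, ptr %p, align 4
  %even = shufflevector <8 x i32> %wide, <8 x i32> poison, <4 x i32> <i32 0, i32 2, i32 4, i32 6>
  %odd  = shufflevector <8 x i32> %wide, <8 x i32> poison, <4 x i32> <i32 1, i32 3, i32 5, i32 7>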
If we have a floating point vector and no zve32f/zve64f/zve64d, we can
end up with an invalid type-legalization cost from
getTypeLegalizationCost.
Previously this triggered an assertion that the type must have been
legalized if the "legal" type is a vector, but in this case, when
legalization isn't possible, the original type is spat back out.
This fixes it by just checking that the legalization cost is valid.
We don't have much testing for zve64x, so we may have other places in
the cost model with this issue.
Fixes #153008
Pre-commit tests exercising different types of source/sink in
depend_diff_types.ll, in preparation for weakening the HasSameSize check
in LoopAccessAnalysis.
Co-authored-by: Igor Kirillov <igor.kirillov@arm.com>
This introduces a new `ptrtoaddr` instruction which is similar to
`ptrtoint` but has two differences:
1) Unlike `ptrtoint`, `ptrtoaddr` does not capture provenance
2) `ptrtoaddr` only extracts (and then extends/truncates) the low
index-width bits of the pointer
For most architectures, difference 2) does not matter, since the index
(address) width and the pointer representation width are the same, but it
does make a difference for architectures whose pointers aren't just plain
integer addresses, such as AMDGPU fat pointers or CHERI capabilities.
This commit introduces textual and bitcode IR support as well as basic code
generation, but optimization passes do not handle the new instruction yet
so it may result in worse code than using ptrtoint. Follow-up changes will
update capture tracking, etc. for the new instruction.
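A minimal sketch of the difference, assuming a hypothetical data layout
where addrspace(1) pointers are 128 bits wide with a 64-bit index width:

  %addr = ptrtoaddr ptr addrspace(1) %p to i64   ; only the low 64 address bits, no provenance capture
  %rep  = ptrtoint  ptr addrspace(1) %p to i128  ; full 128-bit representation, captures provenance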
RFC: https://discourse.llvm.org/t/clarifiying-the-semantics-of-ptrtoint/83987/54
Reviewed By: nikic
Pull Request: https://github.com/llvm/llvm-project/pull/139357
When InstTy is a type like IntrinsicInst, which can have a variable
number of arguments, we can encounter a case where Operation has
fewer than two arguments and we error at the getOperand() calls.
Fixes: https://github.com/llvm/llvm-project/issues/152725.
This extracts the code for modelling an fp16 operation as
`fptrunc(fpop(fpext,fpext))` into a new function named
getFP16BF16PromoteCost so that it can be reused by the
arithmetic instructions. The function takes a lambda to
calculate the cost of the operation with the promoted type.
The existing functions `getIndexExpressionsFromGEP` and
`tryDelinearizeFixedSizeImpl` provide functionality to delinearize
memory accesses for fixed-size arrays. They use the GEP source element
type in their optimization heuristics. However, driving optimization
heuristics based on GEP type information is not allowed.
This patch introduces new functions `findFixedSizeArrayDimensions` and
`delinearizeFixedSizeArray` to delinearize a fixed-size array without
using the type information in GEP. The new function
`findFixedSizeArrayDimensions` infers the size of each dimension of the
array based on the value to be added to the address as induction
variables are incremented. `delinearizeFixedSizeArray` attempts to
restore the subscripts of each dimension based on the estimated array
size.
This is an initial implementation that may not cover all cases, but is
intended to replace the existing function in the future.
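As a rough illustration (hypothetical IR, not taken from the patch):
for an access whose flat offset is i*64 + j, the per-iteration address
steps (4 bytes per j, 256 bytes per i) are enough to recover an inner
dimension of 64 without consulting the GEP source element type:

  %row  = mul nsw i64 %i, 64
  %flat = add nsw i64 %row, %j
  %gep  = getelementptr inbounds i32, ptr %A, i64 %flat
  %val  = load i32, ptr %gep, align 4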
Related:
- https://discourse.llvm.org/t/enabling-loop-interchange/82589/4
- https://github.com/llvm/llvm-project/pull/124911#issuecomment-2962499501
Now that #149310 has restricted lifetime intrinsics to only work on
allocas, we can also drop the explicit size argument. Instead, the size
is implied by the alloca.
This removes the ability to only mark a prefix of an alloca alive/dead.
We never used that capability, so we should remove the need to handle
that possibility everywhere (though many key places, including stack
coloring, did not actually respect this).
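An illustrative sketch of the change in textual IR (names and sizes here
are made up):

  %a = alloca [8 x i8]
  ; before: the size was passed explicitly
  call void @llvm.lifetime.start.p0(i64 8, ptr %a)
  ; after: the size is implied by the alloca
  call void @llvm.lifetime.start.p0(ptr %a)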
Delinearization provides two values: the size of the array, and the
subscript of the access. DA checks their validity (`0 <= subscript <
size`), with some special handling. In particular, to ensure `subscript
< size`, it calculates the maximum value of `subscript - size` and checks
whether it is negative. There was an issue in this process: when
`subscript - size` is expressed in an affine form like `init + step * i`,
the value in the last iteration (`init + step * (num_iterations - 1)`)
was assumed to be the maximum value. This assumption is incorrect in the
following cases:
- When `step` is negative (e.g. for `{10,+,-2}` over five iterations the
  last value is 2, while the maximum is the initial value 10)
- When the AddRec wraps
This patch introduces extra checks to verify the sign of `step` and the
presence of nsw/nuw flags.
Also, `isKnownNonNegative(S - smax(1, Size))` was used as a regular
check, which is incorrect when `Size` is negative. This patch also
replaces it with `isKnownNonNegative(S - Size)`, although it's still
unclear whether using `isKnownNonNegative` is appropriate in the first
place.
Fix #150604
Globally addressable scratch is a new feature introduced in gfx1250.
However, this feature changes how scratch space is mapped into the flat
aperture, making address space casts from private to flat no longer
uniform.
Split out from #150248:
Since #150944 the size passed to lifetime.start/end is considered
meaningless. The lifetime always applies to the whole alloca.
Accordingly, remove the handling for size mismatches in the StackLifetime
analysis.
Certain fcmp predicates need to be expanded into multiple comparisons
that are then or'd together. This adds some more accurate cost modelling
for them based on the predicate. Unsupported operations are given the
cost of a libcall, and the latency is set to 2, as that seems to be
fairly common across different CPUs.
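For example (an illustrative expansion, not necessarily the exact
lowering used), `fcmp one` can be implemented as two ordered comparisons
or'd together:

  ; %c = fcmp one float %x, %y  becomes, roughly:
  %lt = fcmp olt float %x, %y
  %gt = fcmp ogt float %x, %y
  %c  = or i1 %lt, %gt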
This prevents them from generating Invalid costs, as generating the
instructions seems to work fine with and without +bf16. The costs are
mostly taken from the number of instructions (minus ptrue and constants).
Try to push the constant operand into a ZExt:
A + zext (-A + B) -> zext (B), if trunc (A) + -A + B does not
unsigned-wrap.
The actual code supports ZExts of adds with an arbitrary number of
operands, hence the getAddExpr in the return.
This helps SCEV reasoning in some cases, commonly when adding an offset
to a zero-extended SCEV that subtracts the same offset.
Note that this is restricted to cases where we can fold away an operand
of the inner Add. This is needed to avoid bad interactions with patterns
when forming ZExts, which try to push the ZExt to the add operands.
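A concrete instance with A = 4 (hypothetical IR for illustration): if the
inner subtraction cannot unsigned-wrap, re-adding the offset outside the
zext cancels it:

  ; SCEV for %add simplifies to (zext i32 %b to i64),
  ; provided %b - 4 does not unsigned-wrap (i.e. %b u>= 4).
  %sub = sub nuw i32 %b, 4
  %ext = zext i32 %sub to i64
  %add = add i64 %ext, 4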
https://alive2.llvm.org/ce/z/q7d303
PR: https://github.com/llvm/llvm-project/pull/151227
Follow-up to a post-landing review of bd66fd0 ([CostModel/RISCV] Fix
costs of vector [l](lrint|lround)), fixing a subtle problem with the cost
of vector [l](lrint|lround): we should use the source LMUL in the case of
a narrowing op.
Co-authored-by: Luke Lau <luke@igalia.com>
This reverts commit fe4f6c1a58ab4f00a88a97af01000b6783b573ee, but leaves
the tests that were added.
The original commit mistakenly assumed that if regular bf16/f16 loads
and stores could be lowered without zvfbfmin/zvfhmin, then so too could
masked loads/stores and gathers/scatters.
However, SelectionDAG can't actually type-legalize masked loads/stores,
since that needs to be done in ScalarizeMaskedMemIntrinPass.
This was causing crashes on IREE because we now returned true for
isLegalMaskedLoadStore.
The original intent of this was to remove a discrepancy in the loop
vectorizer tests whenever predication was enabled, but this has gone
away after 92d09245d61dce80d3e68a27cc34d5fc6f062c93. So I don't think we
need to reapply this patch.
Take the actual instruction cost into account, and don't fall through to
code that doesn't apply to [l]lrint. Also strip invalid costs for
[b]f16, as a companion to #146507, and unify it with [l]lround costs as
a companion to #147713.
When vectorizing with predication, some loops that were previously
vectorized without zvfhmin/zvfbfmin will no longer be vectorized because
the masked load/store or gather/scatter cost returns illegal.
This is due to a discrepancy where for these costs we check
isLegalElementTypeForRVV but for regular memory accesses we don't.
But for bf16 and f16 vectors we don't actually need the extension
support for loads and stores, so this adds a new function which takes
this into account.
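For instance (illustrative types, not taken from the patch), a masked
bf16 load like the following only needs the memory operation itself
rather than bf16 arithmetic support:

  %v = call <vscale x 4 x bfloat> @llvm.masked.load.nxv4bf16.p0(ptr %p, i32 2, <vscale x 4 x i1> %mask, <vscale x 4 x bfloat> poison)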
For regular memory accesses we should probably also e.g. return an
invalid cost for i64 elements on zve32x, but it doesn't look like we
have tests for this yet.
We also should probably not be vectorizing these bf16/f16 loops to begin
with if we don't have zvfhmin/zvfbfmin and zfhmin/zfbfmin. I think this
is due to the scalar costs being too cheap. I've added tests for this in
a100f6367205c6a909d68027af6a8675a8091bd9 to fix in another patch.
Vector costs without zvfhmin/zvfbfmin and zfhmin/zfbfmin are somehow
cheaper than with zvfhmin/zvfbfmin at smaller vector sizes, despite the
fact that the former are scalarized to libcalls. This adds a RUN line to
showcase this, splitting out the bfloat tests into their own functions
so we don't have duplicate lines for the regular float/double costs.
When these were converted to CostKindTblEntry, the throughput was mainly
copied to all cost kinds.
Regenerated with my check_cost_tables.py helper script.
When these were converted to CostKindTblEntry, the throughput was mainly
copied to all cost kinds.
Regenerated with my check_cost_tables.py helper script.
As described in #53942, DA assumes base pointer invariance in its
process. Some cases were fixed by #116628. However, that PR only
addressed the parts related to AliasAnalysis, so the original issue
persists in later stages, especially when AliasAnalysis returns
`MustAlias`.
This patch inserts an explicit loop-invariance check for the base pointer
and skips the analysis when it is not loop-invariant.
Fix the cases added in #148240.
Whilst testing fixed-length vectors with +sve might be useful, this was
just a mistake in the generation of the test, which should be using
scalable vectors.
We currently provide a generic cost for llvm.vector.reverse in BasicTTI
by reusing the reverse shuffle cost, but only for the value-based cost.
Since the argument values aren't actually used, move this into the
type-based costing method so that type-based costing can also reuse it.