If we have a shuffle which can be split via VLA where two or more of the
destinations have exactly the same elements, then we only need to
account for them once in costing. The duplicate copies are (at
worst) whole register moves.
Note that this change only handles the single source case. Doing the
multiple source case seemed a bit more complicated, and I didn't have a
motivating test case.
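As an illustrative sketch (hypothetical IR, not taken from the patch's tests): if the shuffle below is split into two destination registers, both halves select elements <0,1,2,3> from the single source, so only one sub-shuffle needs to be costed and the second copy is at worst a whole-register move.
```
define <8 x i32> @sketch(<8 x i32> %v) {
  ; Both halves of the result use the same element selection from one source.
  %s = shufflevector <8 x i32> %v, <8 x i32> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 0, i32 1, i32 2, i32 3>
  ret <8 x i32> %s
}
```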
This protects against invalid size requests on scalable vectors by checking the
original VT rather than the legalized type when checking for scalars. The cost
returned is now Invalid, which lines up with codegen not being able to
produce a result.
We have a lot of missing CodeSize costs for vector operations. This
patch starts things off by adding CodeSize costs for getVectorInstrCost,
returning a single cost instead of the VectorInsertExtractBaseCost
(which is typically 2). Inserts of a load are given a cost of 0, as they
can use ld1; otherwise the cost is 1.
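As a hand-written sketch of the distinction (not one of the patch's tests; the expected costs follow the description above):
```
define <4 x float> @sketch(ptr %p, float %f, <4 x float> %acc) {
  ; Inserting a freshly loaded scalar can use ld1 into a lane: CodeSize cost 0.
  %l  = load float, ptr %p
  %v1 = insertelement <4 x float> %acc, float %l, i32 0
  ; Inserting an already-computed value needs an insert of its own: CodeSize cost 1.
  %v2 = insertelement <4 x float> %v1, float %f, i32 1
  ret <4 x float> %v2
}
```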
After #130665 these operations are scalarized to avoid
double-rounding. This updates the cost model to match.
In the future we might be able to use SVE instructions to help, but for
the moment the costs should be higher. CodeSize and Latency costs are
not yet expected to be accurate. The vector insert/extract will use the
cost of VectorInsertExtractBaseCost (2 by default).
Improve the modeling of the memory effects and instruction cost of
inline assembly.
- MemoryEffects: The CUDA spec states that inline assembly is not
assumed to have any side effects or to read or write memory. Inline
assembly may be treated as NoModRef unless it is explicitly marked as
having side effects or has an explicit memory clobber (see the IR sketch
after this list).
https://docs.nvidia.com/cuda/inline-ptx-assembly/index.html#incorrect-optimization
> Normally any memory that is written to will be specified as an out
operand, but if there is a hidden read or write on user memory (for
example, indirect access of a memory location via an operand), or if you
want to stop any memory optimizations around the asm() statement
performed during generation of PTX, you can add a “memory” clobbers
specification after a 3rd colon.
- InstructionCost: This change implements a very rough string-parsing
system to count the number of instructions in an inline asm string. There are
corner cases it will not handle well, but in general this is an
improvement over the current cost of the number of arguments plus one.
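A minimal IR sketch of the MemoryEffects distinction (hand-written for illustration; the asm strings are placeholders, not from the patch):
```
define i32 @sketch(ptr %p, i32 %v) {
  ; No sideeffect flag and no memory clobber: may be treated as NoModRef.
  %tid = call i32 asm "mov.u32 $0, %tid.x;", "=r"()
  ; Marked sideeffect with an explicit "memory" clobber: must be treated as
  ; potentially reading and writing memory.
  call void asm sideeffect "st.global.u32 [$0], $1;", "l,r,~{memory}"(ptr %p, i32 %v)
  ret i32 %tid
}
```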
If a scalable vector uitofp or sitofp effectively extends the size of
each element as part of the conversion, the AArch64 backend
may need to plant multiple unpacks before converting. Increase
the cost in those cases to account for this.
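For instance (an illustrative example, not taken from the patch): converting nxv8i16 to nxv8f32 widens each element from 16 to 32 bits, so the single source register has to be unpacked (e.g. sunpklo/sunpkhi) into two registers before the converts can be issued.
```
define <vscale x 8 x float> @sketch(<vscale x 8 x i16> %v) {
  %r = sitofp <vscale x 8 x i16> %v to <vscale x 8 x float>
  ret <vscale x 8 x float> %r
}
```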
Currently, the handling for calls is split between AA and BasicAA in an
awkward way. BasicAA does argument alias analysis for non-escaping
objects (but without considering MemoryEffects), while AA handles the
generic case using MemoryEffects. However, fundamentally, both of these
are really trying to do the same thing.
The new merged logic first tries to remove the OtherMR component of the
memory effects, which includes accesses to escaped memory. If a
function-local object does not escape, OtherMR can be set to NoModRef.
Then we perform the argument scan in basically the same way as AA
previously did. However, we also need to look at the operand bundles. To
support that, I've adjusted getArgModRefInfo to accept operand bundle
arguments.
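A small IR sketch of the kind of query this affects (hand-written, not from the patch):
```
declare void @g() memory(readwrite)

define i32 @sketch() {
  %a = alloca i32        ; function-local object that never escapes
  store i32 1, ptr %a
  ; @g may access escaped memory (OtherMR), but %a has not escaped and is not
  ; passed as an argument or bundle operand, so the call is NoModRef for %a.
  call void @g()
  %x = load i32, ptr %a
  ret i32 %x
}
```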
PR #117350 made changes to the SLP vectorizer which introduced a
regression on some ARM benchmarks. Investigation narrowed it down to
suboptimal codegen for benchmarks that previously only used scalar (U/S)MLAL
instructions. The linked change meant the SLPVectorizer thought that
these could be vectorized. This change makes the cost of muls in
(U/S)MLAL patterns slightly cheaper to make sure scalar instructions are
preferred in these cases over SLP vectorization on targets supporting DSP.
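The pattern in question looks roughly like this (illustrative IR for the 64-bit accumulating SMLAL form):
```
define i64 @sketch(i32 %a, i32 %b, i64 %acc) {
  ; 32x32->64-bit multiply accumulated into a 64-bit sum maps onto SMLAL.
  ; Keeping the mul slightly cheaper stops SLP from breaking the chain up.
  %sa = sext i32 %a to i64
  %sb = sext i32 %b to i64
  %m  = mul nsw i64 %sa, %sb
  %r  = add nsw i64 %acc, %m
  ret i64 %r
}
```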
When we collect a contextual profile, we sample the threads entering its root and only collect on one at a time (see `ContextRoot::Taken`). If we want to compare counter values across contextual profiles, and/or against flat profiles, we have a problem: we don't know how the counter values relate to each other. To that end, we add `ContextRoot::TotalEntries`, which is incremented every time a root is entered and serves as a multiplier for the counter values collected under that root.
We expose this in the profile and leave the normalization to the user of the profile, for a few reasons:
* it's only needed if reasoning about all profiles in aggregate.
* the goal, in compiler_rt, is to flush out the profile as quickly as possible; performing multiplications adds overhead that may not even be necessary if the consumer of the profile doesn't care about combining profiles.
* the information itself may be interesting as an indication of relative sampling of various contexts.
For the purposes of alias analysis, we should only consider provenance
captures, not address captures. To support this, change (or add)
CaptureTracking APIs to accept a Mask and StopFn argument. The Mask
determines which components we are interested in (for AA that would be
Provenance).
The StopFn determines when we can abort the walk early. Currently, we
want to do this as soon as any of the components in the Mask is
captured. The purpose of making this a separate predicate is that in the
future we will also want to distinguish between capturing full
provenance and read-only provenance. In that case, we can only stop
early once full provenance is captured. The earliest escape analysis
does not get a StopFn, because it must always inspect all captures.
This is essentially the tests from b021bdbb3997 re-done with the new cost-model
output format from #130490, to add cost-model coverage for all the cost kinds.
More to come.
Run lines with and without +fullfp16 are added to check the differences between
the two, and the fp16 tests are separated out to keep the other check lines
simpler. FP128 tests are added for all operations, and fmuladd tests are added
similar to fma.
In order to make the different cost model kinds easier to test, and to
manage the complexity of all the different variants, this patch
introduces a -cost-kind=all option that will print the output of all
cost model kinds. It feels especially helpful for tests that already have
multiple run lines (with / without +fullfp16, for example).
It currently produces the output:
```
Cost Model: Found costs of RThru:1 CodeSize:1 Lat:3 SizeLat:1 for: %F16 = fadd half undef, undef
```
The output is collapsed into a single value if all costs are the same.
Invalid costs print "Invalid" via the normal InstructionCost printing.
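For reference, when all four kinds agree the line collapses to a single value, along these lines (illustrative: an integer add whose costs are all 1):
```
Cost Model: Found costs of 1 for: %V = add i32 undef, undef
```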
Two test files are updated to show some examples with
-intrinsic-cost-strategy=type-based-intrinsic-cost and Invalid costs.
Once we have something we are happy with, I will try to use this to
update more tests, as in b021bdbb3997ef6dd13980dc44f24754f15f3652 but
for more variants.
This patch revises the cost model for sdiv/srem, drawing inspiration from the udiv/urem patch #122236.
The typical codegen for the different scenarios is documented as notes/comments in the code itself (there are too many scenarios to usefully enumerate here in the patch description).
I recently realised that we return an invalid cost when requesting
the type-based cost for the get.active.lane.mask intrinsic. I've
fixed that in this patch by reusing the existing code for the
non-type-based model.
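For reference, the intrinsic in question looks like this (illustrative example):
```
declare <vscale x 4 x i1> @llvm.get.active.lane.mask.nxv4i1.i64(i64, i64)

define <vscale x 4 x i1> @sketch(i64 %index, i64 %tc) {
  ; Previously the type-based query for this call returned an invalid cost.
  %m = call <vscale x 4 x i1> @llvm.get.active.lane.mask.nxv4i1.i64(i64 %index, i64 %tc)
  ret <vscale x 4 x i1> %m
}
```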
This replaces the `-prefer-intrinsic-cost` and
`type-based-intrinsic-cost` flags with a single
`-intrinsic-cost-strategy=<strategy>` flag.
The possible strategies are:
* `instruction-cost`
- Use TargetTransformInfo::getInstructionCost()
* `intrinsic-cost`
- Use TargetTransformInfo::getIntrinsicInstrCost()
* `type-based-intrinsic-cost`
- Calculate the intrinsic cost based only on argument types
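A run line of roughly this shape exercises the new flag (illustrative; the surrounding options follow the usual cost-model test conventions):
```
; RUN: opt < %s -passes="print<cost-model>" -cost-kind=all \
; RUN:   -intrinsic-cost-strategy=type-based-intrinsic-cost -disable-output 2>&1 | FileCheck %s
```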
Fixes (keep it open) #130110.
If the incoming value is the PHI itself, we can skip that incoming value.
If we can guarantee that the other incoming values are neither undef nor
poison, then we can also guarantee that the PHI isn't either. If we cannot
guarantee that, there is no point in computing it.
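A minimal sketch of the self-referencing case (illustrative IR):
```
define i32 @sketch(i32 %x, i1 %c) {
entry:
  br label %loop
loop:
  ; %p lists itself as an incoming value on the backedge; to prove %p is not
  ; undef/poison it is enough to check the other incoming value, %x.
  %p = phi i32 [ %x, %entry ], [ %p, %loop ]
  br i1 %c, label %loop, label %exit
exit:
  ret i32 %p
}
```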