The `!unpredictable` metadata has been present for a long time, but
its usage in optimizations is still limited. This patch teaches
`FoldTwoEntryPHINode()` to be more aggressive with an unpredictable
branch to reduce mispredictions.
A TTI interface `getBranchMispredictPenalty()` is added to distinguish
between different hardware, ensuring we don't go too far for simpler
cores. For simplicity, only a naive x86 implementation is included for
the time being.
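As an illustration, the kind of source pattern this affects (a hedged sketch; the function below is invented for this example, not taken from the patch):

```cpp
// A two-entry PHI: 'r' merges values from the then- and else-paths. When
// the branch is marked !unpredictable, folding the PHI into a select
// (x86: CMOV) trades a likely misprediction for a couple of always-
// executed instructions.
int clamp_to_limit(int x, int limit) {
  int r;
  if (x > limit) // assume profiling marks this branch unpredictable
    r = limit;
  else
    r = x;
  return r; // after the fold: r = (x > limit) ? limit : x
}
```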
1. Add a TTI interface for conditional load/store.
2. Mark 1 x i16/i32/i64 masked load/store as legal so that they are not
legalized by the scalarize-masked-mem-intrin pass.
3. Visit 1 x i16/i32/i64 masked load/store to build a target-specific
CLOAD/CSTORE node, avoiding an error in
`DAGTypeLegalizer::ScalarizeVectorResult`.
4. Combine the DAG to simplify CLOAD/CSTORE nodes.
5. Lower CLOAD/CSTORE to CFCMOV by pattern matching (see the sketch
below).
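Conceptually (a hedged sketch; this function is illustrative and not taken from the patch), the single-element masked-load pattern that can now survive to CFCMOV looks like:

```cpp
// A 1 x i64 conditional load: previously the masked-load intrinsic was
// scalarized into a branch; with APX, CFCMOV can predicate the load
// itself (conditional faulting), so no branch is needed.
long load_or_default(bool cond, const long *p, long fallback) {
  return cond ? *p : fallback;
}
```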
This is CodeGen part of #95515
Extends https://github.com/llvm/llvm-project/pull/95403 to handle non-constant cases - we can avoid unpacks/extensions from vXi8 to vXi16 by using PMADDUBSW instead and truncating the vXi16 results back down to vXi8.
Most targets benefit from performing this for non-constant cases as well - it's just Intel Core/SandyBridge era CPUs that might experience additional Port0/15 contention (but a lower instruction count).
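A hedged sketch of the idea with SSSE3 intrinsics (illustrative only; `mul_v16i8_pmaddubsw` is not the patch's code):

```cpp
#include <immintrin.h>

// Multiply vXi8 without widening to vXi16: PMADDUBSW with the odd/even
// bytes of one operand zeroed yields per-pair products in i16 lanes (a
// single u8*i8 product cannot saturate), and the low product bytes are
// then recombined.
static __m128i mul_v16i8_pmaddubsw(__m128i a, __m128i b) {
  const __m128i LoMask = _mm_set1_epi16(0x00FF);
  // Even-lane products: zero the odd bytes of b.
  __m128i Even = _mm_maddubs_epi16(a, _mm_and_si128(b, LoMask));
  // Odd-lane products: zero the even bytes of b.
  __m128i Odd = _mm_maddubs_epi16(a, _mm_andnot_si128(LoMask, b));
  // Keep the low byte of each product and interleave the results.
  return _mm_or_si128(_mm_and_si128(Even, LoMask), _mm_slli_epi16(Odd, 8));
}
```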
Fixes https://github.com/llvm/llvm-project/issues/90748
As far as I can tell, this pull request was not approved and did not
go through an RFC on Discourse.
This reverts commit 89881480030f48f83af668175b70a9798edca2fb.
This reverts commit 225d8fc8eb24fb797154c1ef6dcbe5ba033142da.
Currently, the behavior of llvm.minnum differs across platforms when
one operand is sNaN. When we compare sNaN vs NUM:
* ARM/AArch64/PowerPC: follow IEEE 754-2008's minNum and return qNaN.
* RISC-V/Hexagon: follow IEEE 754-2019's minimumNumber and return NUM.
* X86: returns NUM, but does not fully match IEEE 754-2019's
minimumNumber, as +0.0 is not always treated as greater than -0.0.
* MIPS/LoongArch/Generic: return NUM.
* LIBCALL: returns qNaN.
So, let's introduce llvm.minimumnum/llvm.maximumnum, which always
follow IEEE 754-2019's minimumNumber/maximumNumber.
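For reference, a hedged C++ sketch of IEEE 754-2019 minimumNumber semantics (illustrative; not part of the patch):

```cpp
#include <cmath>
#include <limits>

// minimumNumber: any NaN (even signaling) is treated as missing data,
// and -0.0 compares less than +0.0.
double minimum_number(double a, double b) {
  if (std::isnan(a))
    return std::isnan(b) ? std::numeric_limits<double>::quiet_NaN() : b;
  if (std::isnan(b))
    return a;
  if (a == b) // distinguish -0.0 from +0.0
    return std::signbit(a) ? a : b;
  return a < b ? a : b;
}
```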
Partially fixes: #93033
Move load/store folding 'free costs' inside the adjustTableCost helper so we can add additional intrinsics in the future.
The plan is to do something similar for other cost callbacks as well (getArithmeticInstrCost etc.).
Later levels were inheriting some of the worst-case costs from SSE/AVX1 etc.
Based on llvm-mca numbers from the check_cost_tables.py script in https://github.com/RKSimon/llvm-scripts
Cleanup prep work for #90748
This uses the TargetLowering getSimpleValueType mechanism to retrieve
the ValueType info inside the X86 cost model.
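Roughly (a hedged sketch of the query; the helper name and shape are assumptions, not the patch's exact code):

```cpp
#include "llvm/CodeGen/TargetLowering.h"
#include <utility>

// Ask TargetLowering for the MVT of an IR type; only simple
// (non-extended) value types have one.
static std::pair<bool, llvm::MVT>
getSimpleVT(const llvm::TargetLowering &TLI, const llvm::DataLayout &DL,
            llvm::Type *Ty) {
  llvm::EVT VT = TLI.getValueType(DL, Ty, /*AllowUnknown=*/true);
  if (!VT.isSimple())
    return {false, llvm::MVT()};
  return {true, VT.getSimpleVT()};
}
```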
This resolves a build issue we were seeing for the miniQMC application after
https://github.com/llvm/llvm-project/pull/92671.
Add TypeConversionCostKindTblEntry to hold the cost kinds and update the cast tables to take the existing default codesize/latency/sizelatency values (I'll update these values in future commits).
I've moved AdjustCost to the end of the function to ensure we don't accidentally use it, apart from when we fall back to default cost calculations.
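The entry shape is roughly (a hedged sketch; field names are assumptions based on the description, not the actual declaration):

```cpp
// One cost per cost kind, instead of a single throughput-only value.
struct CostKindCosts {
  unsigned RecipThroughput;
  unsigned Latency;
  unsigned CodeSize;
  unsigned SizeAndLatency;
};

// Cast-table entry: an ISD cast opcode plus destination/source types.
struct TypeConversionCostKindTblEntry {
  int ISD;      // e.g. ISD::SIGN_EXTEND
  int Dst, Src; // MVT::SimpleValueType values
  CostKindCosts Cost;
};
```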
These can nearly always be folded into the existing cost of the branch, and this brings the throughput costs of the scalarised gather/scatter code much closer to the llvm-mca/uica estimates.
Don't just assume gather/scatter non-throughput costs are 1 - latency and sizelatency (#uops) costs will be high, and codesize (#instructions) needs to account for splitting.
Noticed in the #90883 review - for non-throughput costs, we weren't applying the split count to the '0 or 1' cost value.
This still doesn't work well, as many of the type legalizations are hidden and we don't have the split count; really we need to move to a CostKindCosts-based cost table, but that's going to be a lot of work :/
AVX1+ can handle 32/64-bit broadcast loads, AVX2+ can handle all broadcast loads (we should be able to improve isLegalBroadcastLoad to handle more of this type matching).
Inspired by the recent patches by @shamithoke - we now have real scheduler model numbers for GFNI instructions, allowing us to calculate an upper-bound cost table instead of deriving the costs analytically.
This tries to add some costs for the shuffle in a ST3/ST4 instruction,
which is represented in LLVM IR as store(interleaving shuffle). In
order to detect the store, we need to add a CxtI context instruction to
check the users of the shuffle. LD3 and LD4 are added too; LD2 should
be a zip1 shuffle, which will be added in another patch.
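For illustration, the ST3 pattern whose shuffle this costs (a hedged sketch using NEON intrinsics, not code from the patch):

```cpp
#include <arm_neon.h>

// vst3q_u32 stores a, b, c interleaved (a0 b0 c0 a1 b1 c1 ...). In LLVM
// IR this becomes one wide store fed by an interleaving shufflevector -
// the shuffle the cost model now recognises through its store user.
void store_interleaved3(uint32_t *p, uint32x4_t a, uint32x4_t b,
                        uint32x4_t c) {
  uint32x4x3_t v = {{a, b, c}};
  vst3q_u32(p, v);
}
```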
It should help fix some of the regressions from #87510.
Currently patchpoints can only have two result types, `void` and `i64`.
This limits the result to general purpose registers.
This patch makes `patchpoint.i64` an overloadable intrinsic, allowing
result values that can fit in a single register (e.g. integers,
pointers, floats).
We don't have a concat_vector shuffle kind, and improveShuffleKindFromMask won't alter the base type to match it as InsertSubvector.
But since this is how X86 will lower concat_vector anyhow, just recognise it explicitly.
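For example (a hedged sketch using the clang vector extension; illustrative only):

```cpp
typedef int v4i __attribute__((vector_size(16)));
typedef int v8i __attribute__((vector_size(32)));

// Concatenation expressed as a shuffle: the identity mask <0..7> over
// two v4i sources. With no dedicated concat shuffle kind, the cost model
// now recognises this mask pattern explicitly.
v8i concat(v4i a, v4i b) {
  return __builtin_shufflevector(a, b, 0, 1, 2, 3, 4, 5, 6, 7);
}
```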
Another step for #67803
Since `llvm.compressstore` and `llvm.expandload` require memory
access, it's essential for some targets to be able to check whether the
alignment is sufficient to lower them to target-specific instructions.
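A hedged sketch of how a caller might use this (the helper is illustrative and assumes the TTI hooks now take an alignment argument, per the description above):

```cpp
#include "llvm/Analysis/TargetTransformInfo.h"

// Ask the target whether a compressstore of this type at this alignment
// can be lowered natively rather than scalarized.
static bool canLowerCompressStore(const llvm::TargetTransformInfo &TTI,
                                  llvm::Type *DataTy, llvm::Align A) {
  return TTI.isLegalMaskedCompressStore(DataTy, A);
}
```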
When inlining across functions with different target features, we
perform roughly two checks:
1. The caller's features must be a superset of the callee's features.
2. Calls in the callee cannot use types where the target features would
change the call ABI (e.g. by changing whether something is passed in a
zmm register or in two ymm registers).

The latter check is very crude right now and currently also catches
inline asm "calls". I believe that inline asm should be excluded from
this check, as it is independent of the usual call ABI and instead
governed by the inline asm constraint string.
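A hedged sketch of the kind of code this affects (invented for illustration, not a test from the patch):

```cpp
typedef double v8df __attribute__((vector_size(64)));

// The inline asm consumes a 512-bit value, but how it is passed is fixed
// by the constraint string ("v" = an EVEX-encodable vector register),
// not by the call ABI - so it should not block inlining into a caller
// with different target features.
__attribute__((target("avx512f")))
static inline double first_lane(v8df x) {
  asm("" : "+v"(x)); // opaque use of x in a vector register
  return x[0];
}
```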
Fixes https://github.com/llvm/llvm-project/issues/67054.
In most cases, SETCC lowering will be able to simplify/commute the comparison by adjusting the constant.
TODO: We still need to adjust ExtraCost based on CostKind
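For example (an illustrative sketch, not from the patch):

```cpp
// (x > 7) and (x >= 8) are the same predicate over integers, so SETCC
// lowering can adjust the constant to commute/simplify the comparison
// into whichever form the target handles more cheaply.
bool above_limit(int x) {
  return x > 7; // may be lowered as x >= 8 (SETGT C -> SETGE C+1)
}
```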
Fixes #80122
Fall back to a single-source permute if there is no direct cost estimation
for extract subvector.
Many targets do not have a cost for the extractsubvector shuffle kind,
but do have costs for single-source permutes. If there is no cost
estimation for extractsubvector, it is better to switch to a
single-source permute for a better cost estimate.
Reviewers: RKSimon, davemgreen, arsenm
Reviewed By: RKSimon
Pull Request: https://github.com/llvm/llvm-project/pull/79837
SLP/TTI do not know about the cost estimation for the addsub pattern
supported by X86. Previously, support for detecting the pattern was
added (see TTI::isLegalAltInstr), but the cost was still not estimated
properly.
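The pattern in question (a hedged sketch; the function is illustrative):

```cpp
// Alternating subtract/add across lanes: SLP vectorizes this into an
// alternate-opcode node that X86 can lower to (v)addsubps, and the cost
// model can now price it as such.
void addsub(float *r, const float *a, const float *b) {
  r[0] = a[0] - b[0];
  r[1] = a[1] + b[1];
  r[2] = a[2] - b[2];
  r[3] = a[3] + b[3];
}
```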
It seems TypeSize is currently broken in the sense that:

  TypeSize::Fixed(4) + TypeSize::Scalable(4) => TypeSize::Fixed(8)

without failing its assert that explicitly tests for this case:

  assert(LHS.Scalable == RHS.Scalable && ...);

The reason this fails is that `Scalable` is a static method of class
TypeSize, and LHS and RHS are both objects of class TypeSize. So the
assert is evaluating whether the pointer to the function `Scalable`
equals the pointer to the function `Scalable`, which is always true
because LHS and RHS have the same class.
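A distilled repro of the pitfall (a hedged sketch with simplified names, not LLVM's actual classes):

```cpp
#include <cassert>

struct Quantity {
  bool Scalable = false; // the flag the assert was meant to compare
  unsigned Value = 0;
};

struct Size : Quantity {
  static Size Fixed(unsigned V) { Size S; S.Value = V; return S; }
  static Size Scalable(unsigned V) { // hides the inherited 'Scalable' flag
    Size S;
    S.Quantity::Scalable = true; // must qualify to reach the member
    S.Value = V;
    return S;
  }
};

Size operator+(Size L, Size R) {
  // Compares the static member function with itself - always true.
  assert(L.Scalable == R.Scalable && "mixed fixed/scalable");
  L.Value += R.Value;
  return L;
}
```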
This patch fixes the issue by renaming `TypeSize::Scalable` to
`TypeSize::getScalable` and `TypeSize::Fixed` to `TypeSize::getFixed`,
so that the methods no longer clash with the `Scalable` member variable
in FixedOrScalableQuantity. The new names also better match the coding
standard, which specifies that:
* Variable names should be nouns (as they represent state)
* Function names should be verb phrases (as they represent actions)
Need to add a NumSrcElts param to the is..Mask functions in the
ShuffleVectorInst class for better mask analysis: Mask.size() does not
always match the size of the permuted vector(s). This allows better
cost estimation in SLP and fixes uses of the functions in other cases.
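An example where the mask size and the source element count differ (a hedged sketch using the clang `__builtin_shufflevector` extension):

```cpp
typedef int v4i __attribute__((vector_size(16)));
typedef int v2i __attribute__((vector_size(8)));

// Mask.size() == 2, but each source has 4 elements, so mask analysis
// needs NumSrcElts passed in rather than inferred from the mask length.
v2i pick_lanes(v4i a, v4i b) {
  return __builtin_shufflevector(a, b, 0, 5); // a[0], b[1]
}
```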
Differential Revision: https://reviews.llvm.org/D158449
Add initial half/bfloat broadcast shuffle test coverage (more to follow).
Fixes #68117 - which was stuck in a loop between getting scalarized insert/extract costs for the shuffle and then trying to convert a bfloat insert into a shuffle again.