llvm-project

Author	SHA1	Message	Date
Sander de Smalen	81b7f115fb	[llvm][TypeSize] Fix addition/subtraction in TypeSize. (#72979 ) It seems TypeSize is currently broken in the sense that: TypeSize::Fixed(4) + TypeSize::Scalable(4) => TypeSize::Fixed(8) without failing its assert that explicitly tests for this case: assert(LHS.Scalable == RHS.Scalable && ...); The reason this fails is that `Scalable` is a static method of class TypeSize, and LHS and RHS are both objects of class TypeSize. So this is evaluating if the pointer to the function Scalable == the pointer to the function Scalable, which is always true because LHS and RHS have the same class. This patch fixes the issue by renaming `TypeSize::Scalable` -> `TypeSize::getScalable`, as well as `TypeSize::Fixed` to `TypeSize::getFixed`, so that it no longer clashes with the variable in FixedOrScalableQuantity. The new methods now also better match the coding standard, which specifies that: * Variable names should be nouns (as they represent state) * Function names should be verb phrases (as they represent actions)	2023-11-22 08:52:53 +00:00
Fangrui Song	8e247b8f47	Replace TypeSize::{getFixed,getScalable} with canonical TypeSize::{Fixed,Scalable}. NFC	2023-10-27 00:30:41 -07:00
Phoebe Wang	58d4fe287e	[X86][EVEX512] Do not allow 512-bit memcpy without EVEX512 (#70420 ) Solves crash mentioned in #65920.	2023-10-27 15:26:05 +08:00
Alexey Bataev	e22818d5c9	[IR]Add NumSrcElts param to is..Mask static function in ShuffleVectorInst. Need to add NumSrcElts param to is..Mask functions in ShuffleVectorInstruction class for better mask analysis. Mask.size() not always matches the sizes of the permuted vector(s). Allows to better estimate the cost in SLP and fix uses of the functions in other cases. Differential Revision: https://reviews.llvm.org/D158449	2023-10-05 06:17:07 -07:00
Simon Pilgrim	baecc9e997	[CostModel][X86] getShuffleCost - add fallback (to half vector) for bfloat vector shuffle costs Add initial half/bfloat broadcast shuffles test coverage (more to follow) Fixes #68117 - which was stuck in a loop between getting scalarized insert/extract costs for the shuffle and then trying to convert a bfloat insert into a shuffle again......	2023-10-05 11:12:40 +01:00
Arthur Eubanks	07389535a7	Revert "[IR]Add NumSrcElts param to is..Mask static function in ShuffleVectorInst." This reverts commit b186f1f68be11630355afb0c08b80374a6d31782. Causes crashes, see https://reviews.llvm.org/D158449.	2023-10-04 14:37:16 -07:00
Alexey Bataev	b186f1f68b	[IR]Add NumSrcElts param to is..Mask static function in ShuffleVectorInst. Need to add NumSrcElts param to is..Mask functions in ShuffleVectorInstruction class for better mask analysis. Mask.size() not always matches the sizes of the permuted vector(s). Allows to better estimate the cost in SLP and fix uses of the functions in other cases. Differential Revision: https://reviews.llvm.org/D158449	2023-10-04 07:53:30 -07:00
Alexey Bataev	1129dec778	Revert "[IR]Add NumSrcElts param to is..Mask static function in ShuffleVectorInst." This reverts commit 6f43d28f3452b3ef598bc12b761cfc2dbd0f34c9 to fix a crash reported in https://reviews.llvm.org/D158449.	2023-10-03 13:02:16 -07:00
Alexey Bataev	6f43d28f34	[IR]Add NumSrcElts param to is..Mask static function in ShuffleVectorInst. Need to add NumSrcElts param to is..Mask functions in ShuffleVectorInstruction class for better mask analysis. Mask.size() not always matches the sizes of the permuted vector(s). Allows to better estimate the cost in SLP and fix uses of the functions in other cases. Differential Revision: https://reviews.llvm.org/D158449	2023-10-03 10:26:11 -07:00
Alexey Bataev	ebcb5d59fc	Revert "[IR]Add NumSrcElts param to is..Mask static function in ShuffleVectorInst." This reverts commit 9f5960e004ff54082ccfa9396522e07358f5b66b to fix buildbots reported here https://lab.llvm.org/buildbot/#/builders/230/builds/19412.	2023-09-29 15:03:46 -07:00
Alexey Bataev	9f5960e004	[IR]Add NumSrcElts param to is..Mask static function in ShuffleVectorInst. Need to add NumSrcElts param to is..Mask functions in ShuffleVectorInstruction class for better mask analysis. Mask.size() not always matches the sizes of the permuted vector(s). Allows to better estimate the cost in SLP and fix uses of the functions in other cases. Differential Revision: https://reviews.llvm.org/D158449	2023-09-29 13:16:03 -07:00
Alexey Bataev	3204f88a8b	Revert "[IR]Add NumSrcElts param to is..Mask static function in ShuffleVectorInst." This reverts commit c88c281cf1ac1a01c55231b93826d7c8ae83985b to fix the crash revealed by https://lab.llvm.org/buildbot/#/builders/230/builds/19353.	2023-09-28 11:57:32 -07:00
Alexey Bataev	c88c281cf1	[IR]Add NumSrcElts param to is..Mask static function in ShuffleVectorInst. Need to add NumSrcElts param to is..Mask functions in ShuffleVectorInstruction class for better mask analysis. Mask.size() not always matches the sizes of the permuted vector(s). Allows to better estimate the cost in SLP and fix uses of the functions in other cases. Differential Revision: https://reviews.llvm.org/D158449	2023-09-28 11:03:21 -07:00
Youngsuk Kim	e5026f0179	[llvm] Remove uses of Type::getPointerTo() (NFC) Partial progress towards removing in-tree uses of `getPointerTo()`, by employing the following options: * Drop the call entirely if the sole purpose of it is to support a no-op bitcast (remove the no-op bitcast as well). * Replace with `PointerType::get()`/`PointerType::getUnqual()` This is a NFC cleanup effort. Reviewed By: barannikov88 Differential Revision: https://reviews.llvm.org/D155232	2023-09-22 19:44:38 -04:00
Alexey Bataev	9a207578ac	[TTI]Add InsertSubvector pattern in improveShuffleKindFromMask(). It improves shuffle instructions estimation and improves vectorization outcome. Differential Revision: https://reviews.llvm.org/D157425	2023-08-18 13:47:01 -07:00
XinWang10	993bdb047c	[X86]Support options -mno-gather -mno-scatter Gather instructions could lead to security issues, details please refer to https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/technical-documentation/gather-data-sampling.html. This supported options -mno-gather and -mno-scatter, which could avoid generating gather/scatter instructions in backend except using intrinsics or inline asms. Reviewed By: pengfei Differential Revision: https://reviews.llvm.org/D157680	2023-08-17 23:02:25 -07:00
Simon Pilgrim	bbfdb8cc2d	[CostModel][X86] Add scalar rotate-by-immediate costs As noted on #63980 rotate by immediate amounts is much cheaper than variable amounts. This still needs to be expanded to vector rotate cases, and we need to add reasonable funnel-shift costs as well (very tricky as there's a huge range in CPU behaviour for these).	2023-07-27 16:54:30 +01:00
Simon Pilgrim	9da119a6a6	[X86] getIntImmCostInst - avoid repeating getNumOperands() in for-loop (style). NFC.	2023-07-23 15:49:33 +01:00
Simon Pilgrim	1ebc965116	[X86] getIntImmCostInst - silence static analyzer overflow warning. NFCI. Use the divideCeil uint64_t return type directly	2023-07-23 15:49:33 +01:00
Phoebe Wang	f11526b091	[X86][BF16] Do not scalarize masked load for BF16 when we have AVX512BF16 Fixes #63017 Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D155952	2023-07-22 18:16:49 +08:00
Phoebe Wang	fbae3d1d3c	Revert "[X86][BF16] Do not scalarize masked load for BF16 when we have BWI" This reverts commit ca1c05208ed35ba72869c65ad773b2cca4bbd360. It caused Buildbot fail: https://lab.llvm.org/buildbot#builders/220/builds/24870	2023-07-21 23:29:11 +08:00
Phoebe Wang	ca1c05208e	[X86][BF16] Do not scalarize masked load for BF16 when we have BWI Fixes #63017 Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D155952	2023-07-21 23:18:54 +08:00
David Green	12025cef3e	[CostModel] Use min/max intrinsics for vecreduce.min/max costs This changes the costmodelling of the vecreduce.min/max nodes to use the costs of the relevant min/max intrinsics instead of expanding them to compare and selects. The getMinMaxReductionCost have changed to take a Opcode for the relevant intrinsic, dropping the IsUnsigned and CondTy parameters as they are no longer needed. A follow up patch will add some basic fminimum/fmaximum costmodelling. Differential Revision: https://reviews.llvm.org/D153547	2023-07-04 15:02:30 +01:00
Luke Lau	a68dcd09e8	[TTI] Use users of GEP to guess access type in getGEPCost Currently getGEPCost uses the target type of the GEP as a heuristic for the type that will be accessed, to pass onto isLegalAddressingMode. Targets use this to work out if a GEP can then be folded into the load/store instruction that uses the GEP. For example, on RISC-V loads and stores can have an offset added to a base register folded into a single instruction, so the following GEP is free: %p = getelementptr i32, ptr %base, i32 42 ; getInstructionCost = 0 %x = load i32, ptr %p ; getInstructionCost = 1 ------------------------------------------------------------------------ lw t0, a0(42) However vector loads and stores cannot have an offset folded into them, so the following GEP is costed: %p = getelementptr <2 x i32>, ptr %base, i32 42 ; getInstructionCost = 1 %x = load <2 x i32>, ptr %p ; getInstructionCost = 1 ------------------------------------------------------------------------ addi a0, 42 vle32 v8, (a0) The issue arises whenever there is a mismatch between the target type of the GEP and the type that is actually accessed: %p = getelementptr i32, ptr %base, i32 42 ; getInstructionCost = 0 %x = load <2 x i32>, ptr %p ; getInstructionCost = 1 ------------------------------------------------------------------------ addi a0, 42 vle32 v8, (a0) Even though this GEP will result in an add instruction, because TTI thinks it's loading an i32, it will think it can be folded and not charge for it. The target type can become mismatched with the memory access during transformations, noticeably during SLP where a scalar base pointer will be reused to perform a vector load or store. This patch adds an optional AccessType argument to getGEPCost which allows the type of memory accessed by users to be passed in as a hint, so that we can more accurately determine if the GEP can be folded into its users. If AccessType is not provided, getGEPCost falls back to the old behaviour of using the PointeeType to guess the memory access type. This can be revisited in a later patch. Also for now, only GEPs with exactly one user use the access type hint. Whilst we could look through all users and use all access types to determine if we can fold the GEP, this patch avoids doing so to prevent O(N) behaviour. Differential Revision: https://reviews.llvm.org/D149889	2023-06-29 13:44:37 +01:00
Simon Pilgrim	595a74391d	[CostModel][X86] Tweak SSE2 v2i64 multiply costs based off D46276 script It looks like we were trying to account for SLM costs, which are actually handled separately Fixes #62969	2023-06-14 11:06:15 +01:00
Simon Pilgrim	e64f9140c5	[TTI][X86] Recognise PMULUDQ costs for vXi64 multiplies Addresses part of Issue #62969 - if the upper 32-bits of the vXi64 elements are known to be zero, then a multiply simplifies to a single (fast) PMULUDQ instruction We still have the problem that minRequiredElementSize can't determine that the upper bits are zero for the test case from Issue #62969 - I'll take a look at that next.	2023-06-14 10:34:02 +01:00
Luke Lau	c27a0b21c5	[SLP][RISCV] Account for offset folding in getPointersChainCost For a GEP in a pointer chain, if: 1) a pointer chain is unit-strided 2) the base pointer wasn't folded and is sitting in a register somewhere 3) the distance between the GEP and the base pointer is small enough and can be folded into the addressing mode of the using load/store Then we can exclude that GEP from the total cost of the pointer chain, as it will likely be folded away. In order to check if 3) holds, we need to know the type of memory access being made by the users of the pointer chain. For that, we need to pass along a new argument to getPointersChainCost. (Using the source pointer type of the GEP isn't accurate, see https://reviews.llvm.org/D149889 for more details). Also note that 2) is currently an assumption, and could be modelled more accurately. This prevents some unprofitable cases from being SLP vectorized on RISC-V by making the scalar costs cheaper and closer to the actual codegen. For now the getPointersChainCost hook is duplicated for RISC-V to prevent disturbing other targets, but could be merged back in and shared with other targets in a following patch. Reviewed By: ABataev Differential Revision: https://reviews.llvm.org/D149654	2023-05-22 13:55:30 +01:00
ManuelJBrito	d22edb9794	[IR][NFC] Change UndefMaskElem to PoisonMaskElem Following the change in shufflevector semantics, poison will be used to represent undefined elements in shufflevector masks. Differential Revision: https://reviews.llvm.org/D149256	2023-04-27 18:01:54 +01:00
Simon Pilgrim	aca5f9aeea	[CostModel][X86] getMemoryOpCost - increase cost of sub-32-bit vector load/stores For 8-bit/16-bit vector loads/stores we scalarize and transfer to/from the vector unit, or use the (usually slow) PINSR/PEXTR instructions. Fixes #59867	2023-04-23 21:48:25 +01:00
Simon Pilgrim	fed28ada47	[CostModel][X86] Add i64 MUL latency/codesize/size-latency cost estimates	2023-04-21 15:55:22 +01:00
Simon Pilgrim	ceccc59aac	[CostModel][X86] Add i32 MUL latency/codesize/size-latency cost estimates	2023-04-21 15:42:41 +01:00
Simon Pilgrim	3e9d046bfc	[CostModel][X86] Improve i16 and vXi16 MUL costs Use a modified version of the D103695 script to determine more accurate throughput/latency/codesize/size-latency cost estimates	2023-04-21 15:42:40 +01:00
Simon Pilgrim	4060042384	[CostModel][X86] Improve i8 and vXi8 MUL costs We were treating vXi8 multiply as the sum of a trunc(mul(extend(),extend())) which diverged from the costs from llvm-mca once we extended beyond legal types Use a modified version of the D103695 script to determine more accurate throughput/latency/codesize/size-latency cost estimates Helps address some of the regressions identified in D148806	2023-04-20 19:38:51 +01:00
Alexey Bataev	0e1312fbe0	[SLP][X86]Fix the cost of reused gathers/buildvectors and floats insert. There are 2 problems in the cost estimation for buildvector/gather. 1. If the buildvector/gather node is the same as another one node, need to estimate the cost of this node as 0. 2. The cost of inserting float point register to non-poison vector is not 0, it should not be considered free. Differential Revision: https://reviews.llvm.org/D148801	2023-04-20 09:34:46 -07:00
Simon Pilgrim	16808117c3	[CostModel][X86] Add BSWAP cost model estimations Use a modified version of the D103695 script to determine more accurate throughput/latency/codesize/size-latency cost estimates	2023-04-18 16:04:59 +01:00
Simon Pilgrim	c1af46cc20	[CostModel][X86] Add BITREVERSE cost model estimations Use a modified version of the D103695 script to determine more accurate throughput/latency/codesize/size-latency cost estimates	2023-04-18 11:25:26 +01:00
Simon Pilgrim	48fca4b6f3	[CostModel][X86] Add latency/code-size/size-latency target costs for minnum/maxnum intrinsics Using the latest version of the script from D103695 to compare costmodel vs llvm-mca statistics. Avoids using the default costs, which was assuming libm calls.	2023-04-13 18:07:03 +01:00
Simon Pilgrim	2bfd7a07b3	[TTI][X86] getMinMaxCost - use existing float minnum/maxnum intrinsic cost values instead of maintaining a duplicate cost table Without fastmath (nnan) flags, minnum/maxnum must perform isnan handling as well as fmin/fmax - meaning the costs are notably higher, this is correctly handled in getIntrinsicInstrCost but was missing from the getMinMaxCost cost tables (which assumed fastmath). Followup to 63c3895327839ba5b57f5b99ec9e888abf976ac6 which handled the integer cases	2023-04-13 11:58:38 +01:00
Simon Pilgrim	9e30b87afb	[TTI] getMinMaxReductionCost - add FastMathFlag argument Similar to the getArithmeticReductionCost / getExtendedReductionCost calls (which really don't need to use std::optional<>). This will be necessary to correct recognize fast/nnan fmax/fmul reductions which can avoid nan handling - which will allow us to remove the fmax/fmin special case in X86TTIImpl::getMinMaxCost and use getIntrinsicInstrCost like we do for integer reductions (63c3895327839ba5b57f5b99ec9e888abf976ac6). Differential Revision: https://reviews.llvm.org/D148149	2023-04-13 10:42:42 +01:00
Simon Pilgrim	63c3895327	[TTI][X86] getMinMaxCost - use existing integer min/max intrinsic cost values instead of maintaining a duplicate cost table getMinMaxCost has an alternative set of min/max costs to getIntrinsicInstrCost that are only used by getMinMaxReductionCost, but are a lot less thorough and fallback to an expansion in most cases resulting in cost overestimations - we're better off just using getIntrinsicInstrCost. getIntrinsicInstrCost is still missing complete FMINNUM/FMAXNUM costs, so until then getMinMaxCost will still be used for these, after that we can remove getMinMaxCost and have getMinMaxReductionCost call getIntrinsicInstrCost directly. Fixes regression noticed in D148036	2023-04-12 15:33:12 +01:00
Simon Pilgrim	4b5a4d4814	[X86] Cleanup reduction cost table names. NFC. We merged the costs for split/pairwise reductions sometime ago.	2023-04-12 15:33:12 +01:00
Wang, Xin10	7bb14f196b	[X86] Remove unreachable code in X86TargetTransformInfo.cpp In Function getVectorInstrCost, situation Opcode == Instruction::ExtractElement and Opcode == Instruction::InsertElement are all handled in the first 2 if-statements, So we have no chance for the code in line 4401. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D145908	2023-03-17 00:33:06 -04:00
Valery N Dmitriev	4c2299003f	[TTI] Add X86 target specific version of getPointersChainCost. When all the pointers are off the same base address and have known distances to each other these differences can be encoded into displacements in x86 arch. So the only cost that matters is cost of the base GEP. Differential Revision: https://reviews.llvm.org/D146102	2023-03-16 10:26:50 -07:00
Luke Lau	b02b1e0ed6	[LV][NFC] Use ElementCount for getMaxInterleaveFactor In order to allow targets to disable interleaving for scalable vectors, pass the entire VF's ElementCount to getMaxInterleaveFactor. This is based off of the approach used here: `8d36708507` The plan would then be to disable interleaving on scalable VFs on RISC-V in a follow up patch. See https://reviews.llvm.org/D143723#4132349 Reviewed By: reames Differential Revision: https://reviews.llvm.org/D144474	2023-02-22 10:15:05 +00:00
Kazu Hirata	a7baaab952	Use APInt::isZero instead of APInt::isNulLValue (NFC) Note that APInt::isNullValue has been soft-deprecated in favor of APInt::isZero.	2023-02-19 22:23:58 -08:00
Kazu Hirata	cbde2124f1	Use APInt::popcount instead of APInt::countPopulation (NFC) This is for consistency with the C++20-style bit manipulation functions in <bit>.	2023-02-19 11:29:12 -08:00
ShihPo Hung	5fb3a57ea7	[Cost] Add CostKind to getVectorInstrCost and its related users LoopUnroll estimates the loop size via getInstructionCost(), but getInstructionCost() cannot pass CostKind to getVectorInstrCost(). And so does getShuffleCost() to getBroadcastShuffleOverhead(), getPermuteShuffleOverhead(), getExtractSubvectorOverhead(), and getInsertSubvectorOverhead(). To address this, this patch adds an argument CostKind to these functions. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D142116	2023-01-21 05:29:24 -08:00
Guillaume Chatelet	8fd5558b29	[NFC] Use TypeSize::geFixedValue() instead of TypeSize::getFixedSize() This change is one of a series to implement the discussion from https://reviews.llvm.org/D141134.	2023-01-11 16:49:38 +00:00
Alexey Bataev	f698c21345	[X86][NFC]Move and rephrase the comment, NFC	2023-01-10 04:35:11 -08:00
Alexey Bataev	9b5f62685a	[SLP]Fix cost of the broadcast buildvector/gather. Need to include the cost of the initial insertelement to the cost of the broadcasts. Also, need to adjust the cost of the gather/buildvector if the element is inserted into poison/undef vector. Differential Revision: https://reviews.llvm.org/D140498	2023-01-06 09:25:05 -08:00
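The TypeSize fix in 81b7f115fb turns on a C++ name-lookup detail: when a class declares a static member function named `Scalable`, the expression `LHS.Scalable` names that function and hides an inherited data member of the same name, so `LHS.Scalable == RHS.Scalable` compares two pointers to the same function. The sketch below reproduces the pattern with hypothetical, stripped-down types (`Quantity`, `BuggySize`, `RenamedSize`); it is not LLVM's actual `TypeSize` or `FixedOrScalableQuantity` code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical base holding the state; loosely mirrors the role of
// FixedOrScalableQuantity but is NOT LLVM's real class.
struct Quantity {
  uint64_t Value = 0;
  bool Scalable = false; // data member: true for scalable quantities
};

// Buggy variant: the static factory `Scalable` hides Quantity::Scalable.
struct BuggySize : Quantity {
  static BuggySize Fixed(uint64_t V) {
    BuggySize S;
    S.Value = V;
    return S;
  }
  static BuggySize Scalable(uint64_t V) {
    BuggySize S;
    S.Value = V;
    S.Quantity::Scalable = true; // must qualify to reach the hidden member
    return S;
  }
};

// Lookup of `Scalable` finds the static function, so both operands decay to
// the same function pointer and the comparison is always true.
bool sameKindBuggy(const BuggySize &LHS, const BuggySize &RHS) {
  return LHS.Scalable == RHS.Scalable;
}

// Renamed variant: factories are verb phrases, so nothing hides the member.
struct RenamedSize : Quantity {
  static RenamedSize getFixed(uint64_t V) {
    RenamedSize S;
    S.Value = V;
    return S;
  }
  static RenamedSize getScalable(uint64_t V) {
    RenamedSize S;
    S.Value = V;
    S.Scalable = true;
    return S;
  }
};

// Here `Scalable` is the bool data member, so mixed kinds are detected.
bool sameKindRenamed(const RenamedSize &LHS, const RenamedSize &RHS) {
  return LHS.Scalable == RHS.Scalable;
}
```

With the buggy names, a fixed size and a scalable size compare as the "same kind"; after renaming the factories to `getFixed`/`getScalable`, as the commit does, `Scalable` resolves to the data member again and the check can actually fail.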
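Commit e22818d5c9 (and its earlier, reverted attempts) adds a NumSrcElts parameter to the `is..Mask` functions because a shuffle mask's length need not match the element count of the vectors being permuted. The helpers below (`kPoisonMaskElem`, `readsOnlyFirstSource`, `isIdentitySketch`) are hypothetical and are not the ShuffleVectorInst API; they only illustrate, under that assumption, why the source width has to be passed separately: mask indices address the concatenation of two NumSrcElts-wide sources, so using Mask.size() as the split point misclassifies lanes.

```cpp
#include <cassert>
#include <vector>

// Sentinel for an undefined (poison) mask lane. Defined locally here; it
// plays the same role as LLVM's PoisonMaskElem in this sketch.
constexpr int kPoisonMaskElem = -1;

// Hypothetical: does the mask read only from the first source operand?
// Indices in [0, NumSrcElts) select from source 0 and indices in
// [NumSrcElts, 2 * NumSrcElts) select from source 1.
bool readsOnlyFirstSource(const std::vector<int> &Mask, int NumSrcElts) {
  for (int M : Mask)
    if (M != kPoisonMaskElem && M >= NumSrcElts)
      return false;
  return true;
}

// Hypothetical identity check: the result must be exactly as wide as the
// source (otherwise it is a subvector extract) and every defined lane I must
// read element I of the first source.
bool isIdentitySketch(const std::vector<int> &Mask, int NumSrcElts) {
  if (static_cast<int>(Mask.size()) != NumSrcElts)
    return false;
  for (int I = 0; I < NumSrcElts; ++I)
    if (Mask[I] != kPoisonMaskElem && Mask[I] != I)
      return false;
  return true;
}
```

For two 4-element sources, the 2-lane mask `<2, 3>` reads only the first source; treating Mask.size() (2) as the source width would instead attribute both lanes to the second source.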
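Commit e64f9140c5 relies on the fact that a vXi64 multiply whose operands have zero upper 32 bits gives the same result as PMULUDQ, which multiplies the low unsigned 32 bits of each 64-bit lane into a full 64-bit product. The per-lane scalar model below uses hypothetical helper names and no SIMD intrinsics; it only checks that identity and says nothing about the cost values themselves.

```cpp
#include <cassert>
#include <cstdint>

// Scalar model of one 64-bit lane of PMULUDQ: zero-extend the low 32 bits
// of each operand and keep the full 64-bit product.
uint64_t pmuludqLane(uint64_t A, uint64_t B) {
  return static_cast<uint64_t>(static_cast<uint32_t>(A)) *
         static_cast<uint64_t>(static_cast<uint32_t>(B));
}

// A plain i64 multiply, wrapping modulo 2^64 as in LLVM IR.
uint64_t mul64(uint64_t A, uint64_t B) { return A * B; }

// True when the upper 32 bits of X are known to be zero.
bool upper32Zero(uint64_t X) { return (X >> 32) == 0; }

// Hypothetical check: the single-PMULUDQ lowering is only valid when both
// operands have zero upper halves, in which case the products must agree.
bool matchesPmuludqWhenUpperZero(uint64_t A, uint64_t B) {
  if (!upper32Zero(A) || !upper32Zero(B))
    return false;
  return pmuludqLane(A, B) == mul64(A, B);
}
```

If either operand has a nonzero upper half, the two results generally differ (for example, `0x100000002 * 3`), which is why the cheaper cost applies only when the known-bits analysis proves those bits are zero.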
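Commit 2bfd7a07b3 notes that without nnan flags, minnum/maxnum must handle NaNs on top of a plain fmin/fmax, so they cost more. The sketch below models why, assuming a bare hardware min behaves like a compare-and-select that returns its second operand whenever the comparison is false (as x86 MINSS/MINPS do when either input is NaN). The function names are hypothetical; `std::fmin` is used as the reference for minnum's "return the non-NaN operand" behaviour.

```cpp
#include <cassert>
#include <cmath>

// Model of a bare compare-and-select min: if either input is NaN the
// comparison is false, so B is returned (possibly a NaN).
float selectMin(float A, float B) { return A < B ? A : B; }

// minnum-style result built from selectMin plus an explicit NaN check on B;
// this extra check is the additional work needed when nnan is absent.
float minnumViaSelect(float A, float B) {
  return std::isnan(B) ? A : selectMin(A, B);
}
```

The bare select propagates a NaN in the second operand, while the NaN-checked version returns the other operand, matching `std::fmin` for these inputs.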


828 Commits