llvm-project

Author	SHA1	Message	Date
Sanjay Patel	92fb5c49e8	[SLP] rename variable to improve readability; NFC The OperationData in the 2nd block (visiting the operands) is completely independent of the 1st block.	2021-01-12 16:03:57 -05:00
Sanjay Patel	554be30a42	[SLP] reduce code duplication in processing reductions; NFC	2021-01-12 16:03:57 -05:00
Sanjay Patel	46507a96fc	[SLP] reduce code duplication while matching reductions; NFC	2021-01-12 16:03:57 -05:00
David Sherwood	b7ccaca537	[NFC] Remove min/max functions from InstructionCost Removed the InstructionCost::min/max functions because it's fine to use std::min/max instead. Differential Revision: https://reviews.llvm.org/D94301	2021-01-11 09:00:12 +00:00
Sanjay Patel	3f09c77d33	[SLP] fix typo in assert This snuck into 0aa75fb12faa , but I didn't catch it locally.	2021-01-10 13:15:04 -05:00
Sanjay Patel	0aa75fb12f	[SLP] put verifyFunction call behind EXPENSIVE_CHECKS A severe compile-time slowdown from this call is noted in: https://llvm.org/PR48689 My naive fix was to put it under LLVM_DEBUG ( 267ff79 ), but that's not limiting in the way we want. This is a quick fix (or we could just remove the call completely and rely on some later pass to discover potentially wrong IR?). A bigger/better fix would be to improve/limit verifyFunction() as noted in: https://llvm.org/PR47712 Differential Revision: https://reviews.llvm.org/D94328	2021-01-10 12:32:21 -05:00
Alexander Belyaev	bcbdeafa9c	Revert "[SLP]Need shrink the load vector after reordering." This reverts commit 4284afdf9432f7d756f56b0ab21d69191adefa8d. This changes computed values in fused_batchnorm_test_cpu. Not equal to tolerance rtol=1e-06, atol=0.001 Mismatched value: a is different from b. not close where = (array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]), array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]), array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1]), array([0, 1, 2, 3, 4, 5, 0, 1, 2, 3, 4, 5, 0, 1, 2, 3, 4, 5, 0, 1, 2, 3, 4, 5])) not close lhs = [-0.6636615 -0.9804948 -1.148275 -0.68193716 -0.8572368 -0.65046215 -0.6993756 -1.2244141 -1.0938729 -0.50369143 -0.51830524 -0.738452 -0.7214286 -0.48115745 -0.9380924 -0.9341769 -0.5916775 -1.2896856 -0.7264182 -0.9746917 -0.783249 -0.7659018 -0.86214024 -0.47784212] not close rhs = [ 0.44102234 0.12418899 -0.04359123 0.42274666 0.24744703 0.45422167 0.40530816 -0.11973029 0.01081094 0.6009924 0.5863786 0.3662318 0.38325527 0.62352633 0.1665914 0.1705069 0.5130063 -0.18500176 0.37826565 0.12999213 0.3214348 0.338782 0.24254355 0.62684166] not close dif = [1.1046839 1.1046838 1.1046838 1.1046839 1.1046839 1.1046839 1.1046838 1.1046839 1.1046839 1.1046839 1.1046839 1.1046839 1.1046839 1.1046838 1.1046839 1.1046839 1.1046839 1.1046839 1.1046839 1.1046839 1.1046839 1.1046839 1.1046838 1.1046838] not close tol = [0.00100044 0.00100012 0.00100004 0.00100042 0.00100025 0.00100045 0.00100041 0.00100012 0.00100001 0.0010006 0.00100059 0.00100037 0.00100038 0.00100062 0.00100017 0.00100017 0.00100051 0.00100019 0.00100038 0.00100013 0.00100032 0.00100034 0.00100024 0.00100063]	2021-01-08 14:42:26 +01:00
Sanjay Patel	267ff7901c	[SLP] limit verifyFunction to debug build (PR48689) As noted in PR48689, the verifier may have some kind of exponential behavior that should be addressed separately. For now, only run it in debug mode to prevent problems for release+asserts. That limit is what we had before D80401, and I'm not sure if there was a reason to change it in that patch.	2021-01-08 08:10:17 -05:00
Kazu Hirata	33bf1cad75	[llvm] Use *Set::contains (NFC)	2021-01-07 20:29:34 -08:00
Sanjay Patel	4c7148d75c	[SLP] remove opcode identifier for reduction; NFC Another step towards allowing intrinsics in reduction matching.	2021-01-07 14:07:27 -05:00
Alexey Bataev	4284afdf94	[SLP]Need shrink the load vector after reordering. After merging the shuffles, we cannot rely on the previous shuffle anymore and need to shrink the final shuffle, if it is required. Reported in D92668 Differential Revision: https://reviews.llvm.org/D93967	2021-01-07 04:50:48 -08:00
Oliver Stannard	76f6b125ce	Revert "[llvm] Use BasicBlock::phis() (NFC)" Reverting because this causes crashes on the 2-stage buildbots, for example http://lab.llvm.org:8011/#/builders/7/builds/1140. This reverts commit 9b228f107d43341ef73af92865f73a9a076c5a76.	2021-01-07 09:43:33 +00:00
Kazu Hirata	cfeecdf7b6	[llvm] Use llvm::all_of (NFC)	2021-01-06 18:27:36 -08:00
Kazu Hirata	9b228f107d	[llvm] Use BasicBlock::phis() (NFC)	2021-01-06 18:27:35 -08:00
Sanjay Patel	4c022b5a41	[SLP] use reduction kind's opcode to create new instructions; NFC Similar to 5a1d31a28 - This should be no-functional-change because the reduction kind opcodes are 1-for-1 mappings to the instructions we are matching as reductions. But we want to remove the need for the `OperationData` opcode field because that does not work when we start matching intrinsics (eg, maxnum) as reduction candidates.	2021-01-06 14:37:44 -05:00
Sanjay Patel	5d24089a70	[SLP] reduce code for propagating flags on reductions; NFC If we add/change to match intrinsics, this might get more wordy, but there's no need to list each kind currently.	2021-01-06 14:37:44 -05:00
Juneyoung Lee	4a8e6ed2f7	[SLP,LV] Use poison constant vector for shufflevector/initial insertelement This patch makes SLP and LV emit operations with initial vectors set to poison constant instead of undef. This is a part of efforts for using poison vector instead of undef to represent "doesn't care" vector. The goal is to make nice shufflevector optimizations valid that is currently incorrect due to the tricky interaction between undef and poison (see https://bugs.llvm.org/show_bug.cgi?id=44185 ). Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D94061	2021-01-06 11:22:50 +09:00
Sanjay Patel	6a03f8ab62	[SLP] reduce code for finding reduction costs; NFC We can get both (vector/scalar) costs in a single switch instead of sequentially.	2021-01-05 17:35:54 -05:00
Sanjay Patel	5a1d31a284	[SLP] use reduction kind's opcode for cost model queries; NFC This should be no-functional-change because the reduction kind opcodes are 1-for-1 mappings to the instructions we are matching as reductions. But we want to remove the need for the `OperationData` opcode field because that does not work when we start matching intrinsics (eg, maxnum) as reduction candidates.	2021-01-05 15:12:40 -05:00
Sanjay Patel	d4a999b453	[SLP] reduce code duplication; NFC	2021-01-05 15:12:40 -05:00
Sanjay Patel	3b8b2c7da2	[SLP] delete unused pairwise reduction option SLP tries to model 2 forms of vector reductions: pairwise and splitting. From the cost model code comments, those are defined using an example as: /// Pairwise: /// (v0, v1, v2, v3) /// ((v0+v1), (v2+v3), undef, undef) /// Split: /// (v0, v1, v2, v3) /// ((v0+v2), (v1+v3), undef, undef) I don't know the full history of this functionality, but it was partly added back in D29402. There are apparently no users at this point (no regression tests change). X86 might have managed to work-around the need for this through cost model and codegen improvements. Removing this code makes it easier to continue the work that was started in D87416 / D88193. The alternative -- if there is some target that is silently using this option -- is to move this logic into LoopUtils. We have related/duplicate functionality there via llvm::createTargetReduction(). Differential Revision: https://reviews.llvm.org/D93860	2021-01-05 13:23:07 -05:00
Sanjay Patel	36263a7ccc	[LoopUtils] remove redundant opcode parameter; NFC While here, rename the inaccurate getRecurrenceBinOp() because that was also used to get CmpInst opcodes. The recurrence/reduction kind should always refer to the expected opcode for a reduction. SLP appears to be the only direct caller of createSimpleTargetReduction(), and that calling code ideally should not be carrying around both an opcode and a reduction kind. This should allow us to generalize reduction matching to use intrinsics instead of only binops.	2021-01-04 17:05:28 -05:00
Sanjay Patel	c74e8539ff	[Analysis] flatten enums for recurrence types This is almost all mechanical search-and-replace and no-functional-change-intended (NFC). Having a single enum makes it easier to match/reason about the reduction cases. The goal is to remove `Opcode` from reduction matching code in the vectorizers because that makes it harder to adapt the code to handle intrinsics. The code in RecurrenceDescriptor::AddReductionVar() is the only place that required closer inspection. It uses a RecurrenceDescriptor and a second InstDesc to sometimes overwrite part of the struct. It seem like we should be able to simplify that logic, but it's not clear exactly which cmp+sel patterns that we are trying to handle/avoid.	2021-01-01 12:20:16 -05:00
Sanjay Patel	8ca60db40b	[LoopUtils] reduce FMF and min/max complexity when forming reductions I don't know if there's some way this changes what the vectorizers may produce for reductions, but I have added test coverage with 3567908 and 5ced712 to show that both passes already have bugs in this area. Hopefully this does not make things worse before we can really fix it.	2020-12-30 15:22:26 -05:00
Sanjay Patel	21a3a0225d	[SLP] replace local reduction enum with RecurrenceKind; NFCI I'm not sure if the SLP enum was created before the IVDescriptor RecurrenceDescriptor / RecurrenceKind existed, but the code in SLP is now redundant with that class, so it just makes things more complicated to have both. We eventually call LoopUtils createSimpleTargetReduction() to create reduction ops, so we might as well standardize on those enum names. There's still a question of whether we need to use TTI::ReductionFlags vs. MinMaxRecurrenceKind, but that can be another clean-up step. Another option would just be to flatten the enums in RecurrenceDescriptor into a single enum. There isn't much benefit (smaller switches?) to having a min/max subset.	2020-12-29 14:52:11 -05:00
Kazu Hirata	789d250613	[CodeGen, Transforms] Use *Map::lookup (NFC)	2020-12-27 09:57:27 -08:00
Sanjay Patel	badf0f20f3	[SLP] rename reduction variables for readability; NFC I am hoping to extend the reduction matching code, and it is hard to distinguish "ReductionData" from "ReducedValueData". So extend the tree/root metaphor to include leaves. Another problem is that the name "OperationData" does not provide insight into its purpose. I'm not sure if we can alter that underlying data structure to make the code clearer.	2020-12-26 11:20:25 -05:00
Sanjay Patel	c4ca108966	[SLP] use switch to improve readability; NFC This will get more complicated when we handle intrinsics like maxnum.	2020-12-26 10:59:45 -05:00
Kazu Hirata	df812115e3	[CodeGen, Transforms] Use llvm::any_of (NFC)	2020-12-24 09:08:36 -08:00
Sanjay Patel	0d15d4b6f4	[SLP] use operand index abstraction for number of operands I think this is NFC currently, but the bug would be exposed when we allow binary intrinsics (maxnum, etc) as candidates for reductions. The code in matchAssociativeReduction() is using OperationData::getNumberOfOperands() when comparing whether the "EdgeToVisit" iterator is in-bounds, so this code must use the same (potentially offset) operand value to set the "EdgeToVisit".	2020-12-22 16:05:39 -05:00
David Sherwood	3bf7d47a97	[NFC][InstructionCost] Remove isValid() asserts in SLPVectorizer.cpp An earlier patch introduced asserts that the InstructionCost is valid because at that time the ReuseShuffleCost variable was an unsigned. However, now that the variable is an InstructionCost instance the asserts can be removed. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html See this patch for the introduction of the type: https://reviews.llvm.org/D91174	2020-12-21 09:12:28 +00:00
Sanjay Patel	37d0dda739	[SLP] fix typo; NFC	2020-12-18 16:55:52 -05:00
Caroline Concatto	be9184bc55	[SLPVectorizer]Migrate getEntryCost to return InstructionCost This patch also changes: the return type of getGatherCost and the signature of the debug function dumpTreeCosts to use InstructionCost. This patch is part of a series of patches to use InstructionCost instead of unsigned/int for the cost model functions. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html See this patch for the introduction of the type: https://reviews.llvm.org/D91174 Depends on D93049 Differential Revision: https://reviews.llvm.org/D93127	2020-12-16 14:18:40 +00:00
Caroline Concatto	07217e0a1b	[CostModel]Migrate getTreeCost() to use InstructionCost This patch changes the type of cost variables (for instance: Cost, ExtractCost, SpillCost) to use InstructionCost. This patch also changes the type of cost variables to InstructionCost in other functions that use the result of getTreeCost() This patch is part of a series of patches to use InstructionCost instead of unsigned/int for the cost model functions. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html Depends on D91174 Differential Revision: https://reviews.llvm.org/D93049	2020-12-16 13:08:37 +00:00
Stanislav Mekhanoshin	87d7757bbe	[SLP] Control maximum vectorization factor from TTI D82227 has added a proper check to limit PHI vectorization to the maximum vector register size. That unfortunately resulted in at least a couple of regressions on SystemZ and x86. This change reverts PHI handling from D82227 and replaces it with a more general check in SLPVectorizerPass::tryToVectorizeList(). Moved to tryToVectorizeList() it allows to restart vectorization if initial chunk fails. However, this function is more general and handles not only PHI but everything which SLP handles. If vectorization factor would be limited to maximum vector register size it would limit much more vectorization than before leading to further regressions. Therefore a new TTI callback getMaximumVF() is added with the default 0 to preserve current behavior and limit nothing. Then targets can decide what is better for them. The callback gets ElementSize just like a similar getMinimumVF() function and the main opcode of the chain. The latter is to avoid regressions at least on the AMDGPU. We can have loads and stores up to 128 bit wide, and <2 x 16> bit vector math on some subtargets, where the rest shall not be vectorized. I.e. we need to differentiate based on the element size and operation itself. Differential Revision: https://reviews.llvm.org/D92059	2020-12-14 08:49:40 -08:00
Anton Afanasyev	fac7c7ec3c	[SLP] Fix vector element size for the store chains Vector element size could be different for different store chains. This patch prevents wrong computation of maximum number of elements for that case. Differential Revision: https://reviews.llvm.org/D93192	2020-12-14 15:51:43 +03:00
Kazu Hirata	5891ad4e22	[Transforms] Use llvm::erase_value (NFC)	2020-12-13 09:48:47 -08:00
David Sherwood	9b76160e53	[Support] Introduce a new InstructionCost class This is the first in a series of patches that attempts to migrate existing cost instructions to return a new InstructionCost class in place of a simple integer. This new class is intended to be as light-weight and simple as possible, with a full range of arithmetic and comparison operators that largely mirror the same sets of operations on basic types, such as integers. The main advantage to using an InstructionCost is that it can encode a particular cost state in addition to a value. The initial implementation only has two states - Normal and Invalid - but these could be expanded over time if necessary. An invalid state can be used to represent an unknown cost or an instruction that is prohibitively expensive. This patch adds the new class and changes the getInstructionCost interface to return the new class. Other cost functions, such as getUserCost, etc., will be migrated in future patches as I believe this to be less disruptive. One benefit of this new class is that it provides a way to unify many of the magic costs in the codebase where the cost is set to a deliberately high number to prevent optimisations taking place, e.g. vectorization. It also provides a route to represent the extremely high, and unknown, cost of scalarization of scalable vectors, which is not currently supported. Differential Revision: https://reviews.llvm.org/D91174	2020-12-11 08:12:54 +00:00
Anton Afanasyev	e5bf2e8989	[SLP] Use the width of value truncated just before storing For stores chain vectorization we choose the size of vector elements to ensure we fit to minimum and maximum vector register size for the number of elements given. This patch corrects vector element size choosing the width of value truncated just before storing instead of the width of value stored. Fixes PR46983 Differential Revision: https://reviews.llvm.org/D92824	2020-12-09 16:38:45 +03:00
Sanjay Patel	5fe1a49f96	[SLP] fix typo in debug string; NFC	2020-12-07 15:09:21 -05:00
Alexey Bataev	438682de6a	[SLP]Merge reorder and reuse shuffles. It is possible to merge reuse and reorder shuffles and reduce the total cost of the ivectorization tree/number of final instructions. Differential Revision: https://reviews.llvm.org/D92668	2020-12-07 07:50:00 -08:00
Sanjay Patel	56fd29e93b	[SLP] use 'match' for binop/select; NFC This might be a small improvement in readability, but the real motivation is to make it easier to adapt the code to deal with intrinsics like 'maxnum' and/or integer min/max. There is potentially help in doing that with D92086, but we might also just add specialized wrappers here to deal with the expected patterns.	2020-12-02 09:04:08 -05:00
Caroline Concatto	4b0ef2b075	[NFC][CostModel]Extend class IntrinsicCostAttributes to use ElementCount Type This patch replaces the attribute `unsigned VF` in the class IntrinsicCostAttributes by `ElementCount VF`. This is a non-functional change to help upcoming patches to compute the cost model for scalable vector inside this class. Differential Revision: https://reviews.llvm.org/D91532	2020-12-01 11:12:51 +00:00
Sjoerd Meijer	10ad64aa3b	[SLP] Dump Tree costs. NFC. This adds LLVM_DEBUG messages to dump the (intermediate) tree cost calculations, which is useful to trace and see how the final cost is calculated.	2020-11-27 11:37:33 +00:00
Alexey Bataev	0b420d674a	[SLP][NFC]Fix assert condition in newTreeEntry, NFC.	2020-11-20 13:25:21 -08:00
Hongtao Yu	f3c445697d	[CSSPGO] IR intrinsic for pseudo-probe block instrumentation This change introduces a new IR intrinsic named `llvm.pseudoprobe` for pseudo-probe block instrumentation. Please refer to https://reviews.llvm.org/D86193 for the whole story. A pseudo probe is used to collect the execution count of the block where the probe is instrumented. This requires a pseudo probe to be persisting. The LLVM PGO instrumentation also instruments in similar places by placing a counter in the form of atomic read/write operations or runtime helper calls. While these operations are very persisting or optimization-resilient, in theory we can borrow the atomic read/write implementation from PGO counters and cut it off at the end of compilation with all the atomics converted into binary data. This was our initial design and we’ve seen promising sample correlation quality with it. However, the atomics approach has a couple issues: 1. IR Optimizations are blocked unexpectedly. Those atomic instructions are not going to be physically present in the binary code, but since they are on the IR till very end of compilation, they can still prevent certain IR optimizations and result in lower code quality. 2. The counter atomics may not be fully cleaned up from the code stream eventually. 3. Extra work is needed for re-targeting. We choose to implement pseudo probes based on a special LLVM intrinsic, which is expected to have most of the semantics that comes with an atomic operation but does not block desired optimizations as much as possible. More specifically the semantics associated with the new intrinsic enforces a pseudo probe to be virtually executed exactly the same number of times before and after an IR optimization. The intrinsic also comes with certain flags that are carefully chosen so that the places they are probing are not going to be messed up by the optimizer while most of the IR optimizations still work. The core flags given to the special intrinsic is `IntrInaccessibleMemOnly`, which means the intrinsic accesses memory and does have a side effect so that it is not removable, but is does not access memory locations that are accessible by any original instructions. This way the intrinsic does not alias with any original instruction and thus it does not block optimizations as much as an atomic operation does. We also assign a function GUID and a block index to an intrinsic so that they are uniquely identified and not merged in order to achieve good correlation quality. Let's now look at an example. Given the following LLVM IR: ``` define internal void @foo2(i32 %x, void (i32)* %f) !dbg !4 { bb0: %cmp = icmp eq i32 %x, 0 br i1 %cmp, label %bb1, label %bb2 bb1: br label %bb3 bb2: br label %bb3 bb3: ret void } ``` The instrumented IR will look like below. Note that each `llvm.pseudoprobe` intrinsic call represents a pseudo probe at a block, of which the first parameter is the GUID of the probe’s owner function and the second parameter is the probe’s ID. ``` define internal void @foo2(i32 %x, void (i32)* %f) !dbg !4 { bb0: %cmp = icmp eq i32 %x, 0 call void @llvm.pseudoprobe(i64 837061429793323041, i64 1) br i1 %cmp, label %bb1, label %bb2 bb1: call void @llvm.pseudoprobe(i64 837061429793323041, i64 2) br label %bb3 bb2: call void @llvm.pseudoprobe(i64 837061429793323041, i64 3) br label %bb3 bb3: call void @llvm.pseudoprobe(i64 837061429793323041, i64 4) ret void } ``` Reviewed By: wmi Differential Revision: https://reviews.llvm.org/D86490	2020-11-20 10:39:24 -08:00
Benjamin Kramer	4dbe12e866	[SLP] Use the minimum alignment of the load bundle when forming a masked.gather Instead of the first load. That works when vectorizing contiguous loads, but not for gathers. Fixes a miscompile introduced in fcad8d3635cff61a2749dcef94c0d51fa1e3e413.	2020-11-18 12:53:39 +01:00
Sanjay Patel	08834979e3	[SLP] avoid unreachable code crash/infloop Example based on the post-commit comments for D88735.	2020-11-17 15:10:23 -05:00
Anton Afanasyev	0a1d315f9f	[SLPVectorizer] Fix assert	2020-11-17 18:46:31 +03:00
Anton Afanasyev	fcad8d3635	[SLP] Make SLPVectorizer to use `llvm.masked.gather` intrinsic For the scattered operands of load instructions it makes sense to use gathering load intrinsic, which can lower to native instruction for X86/AVX512 and ARM/SVE. This also enables building vectorization tree with entries containing scattered operands. The next step is to add scattered store. Fixes PR47629 and PR47623 Differential Revision: https://reviews.llvm.org/D90445	2020-11-17 18:11:45 +03:00

1 2 3 4 5 ...

785 Commits