llvm-project

Author	SHA1	Message	Date
Krzysztof Drewniak	087b67cc06	[AMDGPU][LoadStoreVectorizer] Pre-commit test for addrspace 7 crash Differential Revision: https://reviews.llvm.org/D151751	2023-05-30 21:15:21 +00:00
Justin Lebar	420cf6927c	[LSV] Return same bitwidth from getConstantOffset. Previously, getConstantOffset could return an APInt with a different bitwidth than the input pointers. For example, we might be loading an opaque 64-bit pointer, but stripAndAccumulateInBoundsConstantOffsets might give a 32-bit offset. This was OK in most cases because in gatherChains, we casted the APInt back to the original ASPtrBits. But it was not OK when considering selects. We'd call getConstantOffset twice and compare the resulting APInt's, which might not have the same bit width. This fixes that. Now getConstantOffset always returns offsets with the correct width, so we don't need the hack of casting it in gatherChains, and it works correctly when we're handling selects. Differential Revision: https://reviews.llvm.org/D151640	2023-05-29 08:43:47 -07:00
Justin Lebar	f225471c68	[LSV] Fix the ContextInst for computeKnownBits. Previously we used the later of GEPA or GEPB. This is hacky because really we should be using the later of the two load/store instructions being considered. But also it's flat-out incorrect, because GEPA and GEPB might be in different BBs, in which case we cannot ask which one comes last (assertion failure, https://reviews.llvm.org/D149893#4378332). Fixed, now we use the correct context instruction. Differential Revision: https://reviews.llvm.org/D151630	2023-05-28 08:00:52 -07:00
Justin Lebar	2be0abb7fe	Rewrite load-store-vectorizer. The motivation for this change is a workload generated by the XLA compiler targeting nvidia GPUs. This kernel has a few hundred i8 loads and stores. Merging is critical for performance. The current LSV doesn't merge these well because it only considers instructions within a block of 64 loads+stores. This limit is necessary to contain the O(n^2) behavior of the pass. I'm hesitant to increase the limit, because this pass is already one of the slowest parts of compiling an XLA program. So we rewrite basically the whole thing to use a new algorithm. Before, we compared every load/store to every other to see if they're consecutive. The insight (from tra@) is that this is redundant. If we know the offset from PtrA to PtrB, then we don't need to compare PtrC to both of them in order to tell whether C may be adjacent to A or B. So that's what we do. When scanning a basic block, we maintain a list of chains, where we know the offset from every element in the chain to the first element in the chain. Each instruction gets compared only to the leaders of all the chains. In the worst case, this is still O(n^2), because all chains might be of length 1. To prevent compile time blowup, we only consider the 64 most recently used chains. Thus we do no more comparisons than before, but we have the potential to make much longer chains. This rewrite affects many tests. The changes to tests fall into two categories. 1. The old code had what appears to be a bug when deciding whether a misaligned vectorized load is fast. Suppose TTI reports that load <i32 x 4> align 4 has relative speed 1, and suppose that load i32 align 4 has relative speed 32. The intent of the code seems to be that we prefer the scalar load, because it's faster. But the old code would choose the vectorized load. accessIsMisaligned would set RelativeSpeed to 0 for the scalar load (and not even call into TTI to get the relative speed), because the scalar load is aligned. After this patch, we will prefer the scalar load if it's faster. 2. This patch changes the logic for how we vectorize. Usually this results in vectorizing more. Explanation of changes to tests: - AMDGPU/adjust-alloca-alignment.ll: #1 - AMDGPU/flat_atomic.ll: #2, we vectorize more. - AMDGPU/int_sideeffect.ll: #2, there are two possible locations for the call to @foo, and the pass is brittle to this. Before, we'd vectorize in case 1 and not case 2. Now we vectorize in case 2 and not case 1. So we just move the call. - AMDGPU/adjust-alloca-alignment.ll: #2, we vectorize more - AMDGPU/insertion-point.ll: #2 we vectorize more - AMDGPU/merge-stores-private.ll: #1 (undoes changes from git rev 86f9117d476, which appear to have hit the bug from #1) - AMDGPU/multiple_tails.ll: #1 - AMDGPU/vect-ptr-ptr-size-mismatch.ll: Fix alignment (I think related to #1 above). - AMDGPU CodeGen: I have difficulty commenting on these changes, but many of them look like #2, we vectorize more. - NVPTX/4x2xhalf.ll: Fix alignment (I think related to #1 above). - NVPTX/vectorize_i8.ll: We don't generate <3 x i8> vectors on NVPTX because they're not legal (and eventually get split) - X86/correct-order.ll: #2, we vectorize more, probably because of changes to the chain-splitting logic. - X86/subchain-interleaved.ll: #2, we vectorize more - X86/vector-scalar.ll: #2, we can now vectorize scalar float + <1 x float> - X86/vectorize-i8-nested-add-inseltpoison.ll: Deleted the nuw test because it was nonsensical. It was doing `add nuw %v0, -1`, but this is equivalent to `add nuw %v0, 0xffff'ffff`, which is equivalent to asserting that %v0 == 0. - X86/vectorize-i8-nested-add.ll: Same as nested-add-inseltpoison.ll Differential Revision: https://reviews.llvm.org/D149893	2023-05-26 15:15:39 -07:00
Tobias Hieta	f84bac329b	[NFC][Py Reformat] Reformat lit.local.cfg python files in llvm This is a follow-up to b71edfaa4ec3c998aadb35255ce2f60bba2940b0 since I forgot the lit.local.cfg files in that one. Reformatting is done with `black`. If you end up having problems merging this commit because you have made changes to a python file, the best way to handle that is to run git checkout --ours <yourfile> and then reformat it with black. If you run into any problems, post to discourse about it and we will try to help. RFC Thread below: https://discourse.llvm.org/t/rfc-document-and-standardize-python-code-style Reviewed By: barannikov88, kwk Differential Revision: https://reviews.llvm.org/D150762	2023-05-17 17:03:15 +02:00
Krzysztof Drewniak	f0415f2a45	Re-land "[AMDGPU] Define data layout entries for buffers" Re-land D145441 with data layout upgrade code fixed to not break OpenMP. This reverts commit 3f2fbe92d0f40bcb46db7636db9ec3f7e7899b27. Differential Revision: https://reviews.llvm.org/D149776	2023-05-03 19:43:56 +00:00
Krzysztof Drewniak	3f2fbe92d0	Revert "[AMDGPU] Define data layout entries for buffers" This reverts commit f9c1ede2543b37fabe9f2d8f8fed5073c475d850. Differential Revision: https://reviews.llvm.org/D149758	2023-05-03 16:11:00 +00:00
Krzysztof Drewniak	f9c1ede254	[AMDGPU] Define data layout entries for buffers Per discussion at https://discourse.llvm.org/t/representing-buffer-descriptors-in-the-amdgpu-target-call-for-suggestions/68798, we define two new address spaces for AMDGCN targets. The first is address space 7, a non-integral address space (which was already in the data layout) that has 160-bit pointers (which are 256-bit aligned) and uses a 32-bit offset. These pointers combine a 128-bit buffer descriptor and a 32-bit offset, and will be usable with normal LLVM operations (load, store, GEP). However, they will be rewritten out of existence before code generation. The second of these is address space 8, the address space for "buffer resources". These will be used to represent the resource arguments to buffer instructions, and new buffer intrinsics will be defined that take them instead of <4 x i32> as resource arguments. ptr addrspace(8). These pointers are 128-bits long (with the same alignment). They must not be used as the arguments to getelementptr or otherwise used in address computations, since they can have arbitrarily complex inherent addressing semantics that can't be represented in LLVM. Even though, like their address space 7 cousins, these pointers have deterministic ptrtoint/inttoptr semantics, they are defined to be non-integral in order to prevent optimizations that rely on pointers being a [0, [addr_max]] value from applying to them. Future work includes: - Defining new buffer intrinsics that take ptr addrspace(8) resources. - A late rewrite to turn address space 7 operations into buffer intrinsics and offset computations. This commit also updates the "fallback address space" for buffer intrinsics to the buffer resource, and updates the alias analysis table. Depends on D143437 Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D145441	2023-05-03 15:25:58 +00:00
Artem Belevich	faa631f939	[LSV] Improve chain splitting in some corner cases. Currently we happen to split a chain of 12xi8 accesses into 6xi8 + 6xi8, which produces rather suboptimal code. This change attempts to split-off non-multiples of 4bytes at the end and if that does not work, splits on the smaller power-of-2 boundary. Differential Revision: https://reviews.llvm.org/D147976	2023-04-17 13:42:00 -07:00
Krzysztof Drewniak	916425b2d1	[llvm] Use pointer index type for more GEP offsets (pre-codegen) Many uses of getIntPtrType() were using that type to calculate the needed type for GEP offset arguments. However, some time ago, DataLayout was extended to support pointers where the size of the pointer is not equal to the size of the values used to index it. Much code was already migrated to, for example, use getIndexSizeInBits instead of getPtrSizeInBits, but some rewrites still used getIntPtrType() to get the type for GEP offsets. This commit changes uses of getIntPtrType() to getIndexType() where they are involved in a GEP-related calculation. In at least one case (bounds check insertion) this resolves a compiler crash that the new test added here would previously trigger. This commit does not impact - C library-related rewriting (memcpy()), which are operating under the assumption that intptr_t == size_t. While all the mechanisms for breaking this assumption now exist, doing so is outside the scope of this commit. - Code generation and below. Note that the use of getIntPtrType() in CodeGenPrepare will be changed in a future commit. - Usage of getIntPtrType() in any backend Depends on D143435 Reviewed By: arichardson Differential Revision: https://reviews.llvm.org/D143437	2023-03-28 16:41:02 +00:00
Jeffrey Byrnes	b89236a96f	[AMDGPU] Vectorize misaligned global loads & stores Based on experimentation on gfx906,908,90a and 1030, wider global loads / stores are more performant than multiple narrower ones independent of alignment -- this is especially true when combining 8 bit loads / stores, in which case speedup was usually 2x across all alignments. Differential Revision: https://reviews.llvm.org/D145170 Change-Id: I6ee6c76e6ace7fc373cc1b2aac3818fc1425a0c1	2023-03-03 13:18:25 -08:00
Nikita Popov	5867241eac	[Transforms] Convert some tests to opaque pointers (NFC)	2023-01-06 12:14:45 +01:00
Nikita Popov	0d18d36b18	[LoadStoreVectorizer] Convert tests to opaque pointers (NFC)	2022-12-27 13:13:56 +01:00
Nikita Popov	314d0dbb20	[LoadStoreVectorize] Regenerate test checks (NFC)	2022-12-27 13:08:57 +01:00
Nikita Popov	ba1759c498	[LoadStoreVectorizer] Convert some tests to opaque pointers (NFC)	2022-12-27 12:57:01 +01:00
Roman Lebedev	c37dfd0fae	[NFC] Port last few Transforms tests to `-passes=` syntax	2022-12-09 02:07:27 +03:00
Roman Lebedev	b1a9584818	[opt] Disincentivize new tests from using old pass syntax Over the past day or so, I've taken a large swing at our tests, and reduced the number of tests that were still using the old syntax from ~1800 to just 200. Left to handle: (as it is seen in this patch) * Transforms/LSR * Transforms/CGP * Transforms/TypePromotion * Transforms/HardwareLoops * Analysis/* * some misc. I think this is the right point to start actively refusing to honor the old syntax, except for the old tests, to prevent the old syntax from creeping back in. Thus, let's add a temporary default-off flag, and if it is not passed, refuse to accept the old syntax. The tests that still need porting are annotated with this flag. Reviewed By: aeubanks Differential Revision: https://reviews.llvm.org/D139647	2022-12-08 23:54:03 +03:00
Roman Lebedev	d6e7e477ee	[NFC] Port all LoadStoreVectorizer tests to `-passes=` syntax	2022-12-08 02:38:45 +03:00
Arthur Eubanks	f3a928e233	[opt] Don't translate legacy -analysis flag to require<analysis> Tests relying on this should explicitly use -passes='require<analysis>,foo'.	2022-10-07 14:54:34 -07:00
Arthur Eubanks	d3d8465446	[opt] Stop treating alias analysis specially when translating legacy opt syntax I've attempted to keep AA tests as close to their original intent as possible.	2022-10-07 11:50:43 -07:00
Johannes Doerfert	1fb415fee9	[AMDGPU][FIX] Proper load-store-vectorizer result with opaque pointers The original code relied on the fact that we needed a bitcast instruction (for non constant base objects). With opaque pointers there might not be a bitcast. Always check if reordering is required instead. Fixes: https://github.com/llvm/llvm-project/issues/54896 Differential Revision: https://reviews.llvm.org/D123694	2022-04-15 13:42:46 -05:00
Stanislav Mekhanoshin	a41a676e8a	[AMDGPU] Check SI LDS offset bug in the allowsMisalignedMemoryAccesses Differential Revision: https://reviews.llvm.org/D123268	2022-04-06 18:05:02 -07:00
Benjamin Kramer	0776f6e04d	[LSV] Vectorize loads of vectors by turning it into a larger vector Use shufflevector to do the subvector extracts. This allows a lot more load merging on AMDGPU and also on NVPTX when <2 x half> is involved. Differential Revision: https://reviews.llvm.org/D117219	2022-01-26 11:38:41 +01:00
Nikita Popov	330cb03269	[LoadStoreVectorizer] Check for guaranteed-to-transfer (PR52950) Rather than checking for nounwind in particular, make sure the instruction is guaranteed to transfer execution, which will also handle non-willreturn calls correctly. Fixes https://github.com/llvm/llvm-project/issues/52950.	2022-01-03 10:55:47 +01:00
hyeongyu kim	cf284f6c5e	[LSV] Change the default value of InsertElement to poison This patch is changing the InsertElement's placeholder to poison without changing the LSV's behavior. Regardless of whether `StoreTy` is FixedVectorType or not, the poison value will be overwritten with a different value. Therefore, whether the InsertElement's placeholder is poison or undef will not affect the result of the program. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D111005	2021-10-03 17:57:34 +09:00
Nikita Popov	90ec6dff86	[OpaquePtr] Forbid mixing typed and opaque pointers Currently, opaque pointers are supported in two forms: The -force-opaque-pointers mode, where all pointers are opaque and typed pointers do not exist. And as a simple ptr type that can coexist with typed pointers. This patch removes support for the mixed mode. You either get typed pointers, or you get opaque pointers, but not both. In the (current) default mode, using ptr is forbidden. In -opaque-pointers mode, all pointers are opaque. The motivation here is that the mixed mode introduces additional issues that don't exist in fully opaque mode. D105155 is an example of a design problem. Looking at D109259, it would probably need additional work to support mixed mode (e.g. to generate GEPs for typed base but opaque result). Mixed mode will also end up inserting many casts between i8* and ptr, which would require significant additional work to consistently avoid. I don't think the mixed mode is particularly valuable, as it doesn't align with our end goal. The only thing I've found it to be moderately useful for is adding some opaque pointer tests in between typed pointer tests, but I think we can live without that. Differential Revision: https://reviews.llvm.org/D109290	2021-09-10 15:18:23 +02:00
Nikita Popov	9d720dcb89	[LoadStoreVectorizer] Make aliasing check more precise The load store vectorizer currently uses isNoAlias() to determine whether memory-accessing instructions should prevent vectorization. However, this only works for loads and stores. Additionally, a couple of intrinsics like assume are special-cased to be ignored. Instead use getModRefInfo() to generically determine whether the instruction accesses/modifies the relevant location. This will automatically handle all inaccessiblememonly intrinsics correctly (as well as other calls that don't modref for other reasons). This requires generalizing the code a bit, as it was previously only considering loads and stores in particular. Differential Revision: https://reviews.llvm.org/D109020	2021-09-01 18:10:09 +02:00
Nikita Popov	dc37f5374c	[LoadStoreVectorizer] Add test for inaccessiblememonly call (NFC)	2021-08-31 22:12:45 +02:00
Nikita Popov	a9129f8964	[LoadStoreVectorizer] Support opaque pointers There are remaining redundant bitcasts.	2021-06-27 15:42:16 +02:00
Slava Nikolaev	119965865c	LoadStoreVectorizer: support different operand orders in the add sequence match First we refactor the code which does no wrapping add sequences match: we need to allow different operand orders for the key add instructions involved in the match. Then we use the refactored code trying 4 variants of matching operands. Originally the code relied on the fact that the matching operands of the two last add instructions of memory index calculations had the same LHS argument. But which operand is the same in the two instructions is actually not essential, so now we allow that to be any of LHS or RHS of each of the two instructions. This increases the chances of vectorization to happen. Reviewed By: volkan Differential Revision: https://reviews.llvm.org/D103912	2021-06-10 16:31:35 -07:00
Justin Bogner	e7d26aceca	Change the context instruction for computeKnownBits in LoadStoreVectorizer pass This change enables cases for which the index value for the first load/store instruction in a pair could be a function argument. This allows using llvm.assume to provide known bits information in such cases. Patch by Viacheslav Nikolaev. Thanks! Differential Revision: https://reviews.llvm.org/D101680	2021-05-12 15:29:29 -07:00
Justin Bogner	9542721085	Add support for llvm.assume intrinsic to the LoadStoreVectorizer pass Patch by Viacheslav Nikolaev. Thanks!	2021-04-30 13:39:46 -07:00
Juneyoung Lee	db7a2f347f	Precommit transform tests that have poison as insertelement's placeholder This commit copies existing tests at llvm/Transforms and replaces 'insertelement undef' in those files with 'insertelement poison'. (see https://reviews.llvm.org/D93586) Tests listed using this script: grep -R -E '^[^;]insertelement <.> undef,' . | cut -d":" -f1 | uniq | wc -l Tests updated: file_org=llvm/test/Transforms/$1 file=${file_org%.ll}-inseltpoison.ll cp $file_org $file sed -i -E 's/^([^;])insertelement <(.)> undef/\1insertelement <\2> poison/g' $file head -1 $file | grep "Assertions have been autogenerated by utils/update_test_checks.py" -q if [ "$?" == 1 ]; then echo "$file : should be manually updated" # I manually updated the script exit 1 fi python3 ./llvm/utils/update_test_checks.py --opt-binary=./build-releaseassert/bin/opt $file	2020-12-24 11:46:17 +09:00
Stanislav Mekhanoshin	ca4bf58e4e	[AMDGPU] Support unaligned flat scratch in TLI Adjust SITargetLowering::allowsMisalignedMemoryAccessesImpl for unaligned flat scratch support. Mostly needed for global isel. Differential Revision: https://reviews.llvm.org/D93669	2020-12-22 16:12:31 -08:00
Mircea Trofin	f9a27df16b	[FileCheck] Enforce --allow-unused-prefixes=false for llvm/test/Transforms Explicitly opt-out llvm/test/Transforms/Attributor. Verified by flipping the default value of allow-unused-prefixes and observing that none of the failures were under llvm/test/Transforms. Differential Revision: https://reviews.llvm.org/D92404	2020-12-09 08:51:38 -08:00
Jay Foad	830ed64ccd	Revert "Revert "[AMDGPU] Reorganize GCN subtarget features for unaligned access"" This reverts commit 8b08fa0103c8d8e624b19fad5a5006e7a783ecb7. The underlying problems were fixed by D90607.	2020-11-11 14:40:14 +00:00
Mirko Brkusanin	8b08fa0103	Revert "[AMDGPU] Reorganize GCN subtarget features for unaligned access" This reverts commit f5cd7ec9f3fc969ff5e1feed961996844333de3b. Certain rocPRIM/rocThrust/hipCUB tests were failing because of this change.	2020-09-29 15:33:34 +02:00
Mirko Brkusanin	f5cd7ec9f3	[AMDGPU] Reorganize GCN subtarget features for unaligned access Features UnalignedBufferAccess and UnalignedDSAccess are now used to determine whether hardware supports such access. UnalignedAccessMode should be used to enable them. hasUnalignedBufferAccessEnabled() and hasUnalignedDSAccessEnabled() can be now used to quickly check both. Differential Revision: https://reviews.llvm.org/D84522	2020-08-21 12:26:31 +02:00
Mirko Brkusanin	5bd1febe21	[AMDGPU] Fix alignment requirements for 96bit and 128bit local loads and stores Adjust alignment requirements for ds_read/write_b96/b128. GFX9 and onwards allow misaligned access for reads and writes but only if SH_MEM_CONFIG.alignment_mode allows it. UnalignedDSAccess is set on GCN subtargets from GFX9 onward to let us know if we can relax alignment requirements. UnalignedAccessMode acts similarly to UnalignedBufferAccess for DS instructions but only from GFX9 onward and is supposed to match alignment_mode. By default alignment of 4 is required. Differential Revision: https://reviews.llvm.org/D82788	2020-08-21 12:26:31 +02:00
Arthur Eubanks	9bb6ce78be	Rename scoped-noalias -> scoped-noalias-aa Summary: To match NewPM name. Also the new name is clearer and more consistent. Subscribers: jvesely, nhaehnle, hiraditya, asbirlea, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D84542	2020-07-24 12:14:27 -07:00
Fangrui Song	f31811f2dc	[BasicAA] Rename deprecated -basicaa to -basic-aa Follow-up to D82607 Revert an accidental change (empty.ll) of D82683	2020-06-26 20:41:37 -07:00
Arthur Eubanks	9c56e94a9f	[NPM] Bail out when -foo and --passes=foo are both specified Summary: Currently when --passes is used, any passes specified via -foo are ignored. Explicitly bail out when that happens. This requires changing some tests. Most were straightforward, but codegenprepare-produced-address-math.ll is tricky. One of its RUNs runs CodeGenPrepare. I tried porting CodeGenPrepare to the NPM, but ended up getting stuck when I needed a TargetMachine. NPM doesn't have support for MachineFunctions yet. So I just deleted that RUN line, since it was mass-added in https://reviews.llvm.org/D54848 and is likely not that useful. Reviewers: echristo, hans Subscribers: llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D82271	2020-06-22 08:27:13 -07:00
Volkan Keles	63081dc6f6	LoadStoreVectorizer: Match nested adds to prove vectorization is safe If both OpA and OpB are adds with NSW/NUW and with the same LHS operand, we can guarantee that the transformation is safe if we can prove that OpA won't overflow when IdxDiff is added to the RHS of OpA. Review: https://reviews.llvm.org/D79817	2020-05-18 12:13:01 -07:00
Matt Arsenault	3f465d0d36	AMDGPU: Fix broken check lines	2020-04-01 10:52:22 -07:00
Matt Arsenault	86f9117d47	AMDGPU: Don't report 2-byte alignment as fast This is apparently worse than 1-byte alignment. This does not attempt to decompose 2-byte aligned wide stores, but will stop trying to produce them. Also fix bug in LoadStoreVectorizer which was decreasing the alignment and vectorizing stack accesses. It was assuming a stack object was an alloca that could have its base alignment changed, which is not true if the pointer is derived from a function argument.	2020-02-11 18:35:00 -05:00
Stanislav Mekhanoshin	6fe00a21f2	Handle casts changing pointer size in the vectorizer Added code to truncate or shrink offsets so that we can continue base pointer search if size has changed along the way. Differential Revision: https://reviews.llvm.org/D65612 llvm-svn: 367646	2019-08-02 04:03:37 +00:00
Stanislav Mekhanoshin	eee9312a85	Relax load store vectorizer pointer strip checks The previous change to fix crash in the vectorizer introduced performance regressions. The condition to preserve pointer address space during the search is too tight, we only need to match the size. Differential Revision: https://reviews.llvm.org/D65600 llvm-svn: 367624	2019-08-01 22:18:56 +00:00
Stanislav Mekhanoshin	ba1e845c21	[AMDGPU] Fix for vectorizer crash with pointers of different size When vectorizer strips pointers it can eventually end up with pointers of two different sizes, then SCEV will crash. Differential Revision: https://reviews.llvm.org/D65480 llvm-svn: 367443	2019-07-31 16:33:11 +00:00
Fangrui Song	ac14f7b10c	[lit] Delete empty lines at the end of lit.local.cfg NFC llvm-svn: 363538	2019-06-17 09:51:07 +00:00
Eric Christopher	cee313d288	Revert "Temporarily Revert "Add basic loop fusion pass."" The reversion apparently deleted the test/Transforms directory. Will be re-reverting again. llvm-svn: 358552	2019-04-17 04:52:47 +00:00

104 Commits