llvm-project

Author	SHA1	Message	Date
Thomas	0a7a926007	[NVPTX] Make i16x2 a native type and add supported vec instructions (#65799) recommit https://github.com/llvm/llvm-project/pull/65432 with minor bug fix for bitcasts	2023-09-08 13:44:58 -07:00
Dmitri Gribenko	b3a14cac4f	Revert "[NVPTX] Make i16x2 a native type and add supported vec instructions (#65432)" This reverts commit db5d845c73ee2d64f1a5bab3fc72edece9e3a7ba. As per PR discussion "Looks like we've missed lowering of bitcasts between v2f16 and v2i16 and it breaks XLA."	2023-09-08 19:28:15 +02:00
Thomas	db5d845c73	[NVPTX] Make i16x2 a native type and add supported vec instructions (#65432) On sm_90 some instructions now support i16x2, which lets the hardware execute add, min and max instructions more efficiently. To support that, i16x2 needs to be a native type in the backend. This makes the changes necessary for i16x2 to be a native type and adds support for the instructions that natively support i16x2. This caused a negative test in nvptx slp to start passing. Changed the test to a positive one, as the IR is correctly vectorized.	2023-09-06 21:59:13 -07:00
Artem Belevich	ef8655adc8	[NVPTX] Adapt tests to make them usable with CUDA-12.x CUDA-12 no longer supports 32-bit compilation. Tests agnostic to 32/64 compilation mode are switched to use nvptx64. Tests that do care about it have 32-bit ptxas compilation disabled with cuda-12+. Differential Revision: https://reviews.llvm.org/D152199	2023-06-06 14:22:12 -07:00
Andrew Savonichev	0f1b5f115a	[NVPTX] Integrate ptxas to LIT tests ptxas is a proprietary compiler from Nvidia that can compile PTX to machine code (SASS). It has a lot of diagnostics to catch errors in PTX, which can be used to verify PTX output from llc. Set the -DPTXAS_EXECUTABLE=/path/to/ptxas CMake option to enable it. If this option is not set, then ptxas is substituted with true, which effectively disables all ptxas RUN lines. The LLVM_PTXAS_EXECUTABLE environment variable takes precedence over the CMake option and allows overriding the ptxas executable used for LIT without a complete re-configuration. Differential Revision: https://reviews.llvm.org/D121727	2022-04-28 14:59:45 +03:00
Artem Belevich	29bbdc1c32	[NVPTX] Unify vectorization of load/stores of aggregate arguments and return values. Original code only used vector loads/stores for explicit vector arguments. It could also do more loads/stores than necessary (e.g. v5f32 would touch 8 f32 values). Aggregate types were loaded one element at a time, even the vectors contained within. This change attempts to generalize (and simplify) parameter space loads/stores so that vector loads/stores can be used more broadly. Functionality of the patch has been verified by compiling the thrust test suite and manually checking the differences between PTX generated by llvm with and without the patch. General algorithm: * ComputePTXValueVTs() flattens input/output argument into a flat list of scalars to load/store and returns their types and offsets. * VectorizePTXValueVTs() uses that data to create a vectorization plan, which returns an array of flags marking boundaries of vectorized load/stores. Scalars are represented as 1-element vectors. * Code that generates loads/stores implements a simple state machine that constructs a vector according to the plan. Differential Revision: https://reviews.llvm.org/D30011 llvm-svn: 295784	2017-02-21 22:56:05 +00:00
Nico Rieck	b5262d6d8f	Fix non-deterministic SDNodeOrder-dependent codegen Reset SelectionDAGBuilder's SDNodeOrder to ensure deterministic code generation. llvm-svn: 199050	2014-01-12 14:09:17 +00:00
Justin Holewinski	dff28d215f	[NVPTX] Fix vector loads from parameters that span multiple loads, and fix some typos llvm-svn: 185332	2013-07-01 12:59:01 +00:00

8 Commits