8 Commits

Author SHA1 Message Date
Thomas
0a7a926007
[NVPTX] Make i16x2 a native type and add supported vec instructions (#65799)
recommit https://github.com/llvm/llvm-project/pull/65432 with minor bug
fix for bitcasts
2023-09-08 13:44:58 -07:00
Dmitri Gribenko
b3a14cac4f Revert "[NVPTX] Make i16x2 a native type and add supported vec instructions (#65432)"
This reverts commit db5d845c73ee2d64f1a5bab3fc72edece9e3a7ba.

As per PR discussion "Looks like we've missed lowering of bitcasts
between v2f16 and v2i16 and it breaks XLA."
2023-09-08 19:28:15 +02:00
Thomas
db5d845c73
[NVPTX] Make i16x2 a native type and add supported vec instructions (#65432)
On sm_90 some instructions now support i16x2 which allows hardware to
execute more efficiently add, min and max instructions.

In order to support that we need to make i16x2 a native type in the
backend. This does the necessary changes to make i16x2 a native type and
adds support for the instructions natively supporting i16x2.

This caused a negative test in nvptx slp to start passing. Changed the
test to a positive one as the IR is correctly vectorized.
2023-09-06 21:59:13 -07:00
Artem Belevich
ef8655adc8 [NVPTX] Adapt tests to make them usable with CUDA-12.x
CUDA-12 no longer supports 32-bit compilation.

Tests agnostic to 32/64 compilation mode are switched to use nvptx64.
Tests that do care about it have 32-bit ptxas compilation disabled with cuda-12+.

Differential Revision: https://reviews.llvm.org/D152199
2023-06-06 14:22:12 -07:00
Andrew Savonichev
0f1b5f115a [NVPTX] Integrate ptxas to LIT tests
ptxas is a proprietary compiler from Nvidia that can compile PTX to
machine code (SASS). It has a lot of diagnostics to catch errors
in PTX, which can be used to verify PTX output from llc.

Set -DPXTAS_EXECUTABLE=/path/to/ptxas CMake option to enable it.
If this option is not set, then ptxas is substituted to true which
effectively disables all ptxas RUN lines.

LLVM_PTXAS_EXECUTABLE environment variable takes precedence over
the CMake option, and allows to override ptxas executable that is used for LIT
without complete re-configuration.

Differential Revision: https://reviews.llvm.org/D121727
2022-04-28 14:59:45 +03:00
Artem Belevich
29bbdc1c32 [NVPTX] Unify vectorization of load/stores of aggregate arguments and return values.
Original code only used vector loads/stores for explicit vector arguments.
It could also do more loads/stores than necessary (e.g v5f32 would
touch 8 f32 values). Aggregate types were loaded one element at a time,
even the vectors contained within.

This change attempts to generalize (and simplify) parameter space
loads/stores so that vector loads/stores can be used more broadly.
Functionality of the patch has been verified by compiling thrust
test suite and manually checking the differences between PTX
generated by llvm with and without the patch.

General algorithm:
* ComputePTXValueVTs() flattens input/output argument into a flat list
  of scalars to load/store and returns their types and offsets.
* VectorizePTXValueVTs() uses that data to create vectorization plan
  which returns an array of flags marking boundaries of vectorized
  load/stores. Scalars are represented as 1-element vectors.
* Code that generates loads/stores implements a simple state machine
  that constructs a vector according to the plan.

Differential Revision: https://reviews.llvm.org/D30011

llvm-svn: 295784
2017-02-21 22:56:05 +00:00
Nico Rieck
b5262d6d8f Fix non-deterministic SDNodeOrder-dependent codegen
Reset SelectionDAGBuilder's SDNodeOrder to ensure deterministic code
generation.

llvm-svn: 199050
2014-01-12 14:09:17 +00:00
Justin Holewinski
dff28d215f [NVPTX] Fix vector loads from parameters that span multiple loads, and fix some typos
llvm-svn: 185332
2013-07-01 12:59:01 +00:00