13 Commits

Author SHA1 Message Date
Alex MacLean
11fba35916
[NVPTX] Add SimplifyDemandedBitsForTargetNode for PRMT (#149395) 2025-07-22 18:44:50 -07:00
Alex MacLean
86203b6b33
[NVPTX] Use PRMT more widely, and improve folding around this instruction (#148261)
Replace uses of BFE with PRMT when lowering v4i8 vectors. This will
generally lead to equivalent or better SASS and reduces the number of
target specific operations we need to represent.
(https://cuda.godbolt.org/z/M75W6f8xd) Also implement KnownBits tracking
for PRMT allowing elimination of redundant AND instructions when
lowering various i8 operations.
2025-07-13 15:06:53 -07:00
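To illustrate why PRMT can stand in for BFE on packed v4i8 values, here is a minimal Python emulation of `prmt.b32` in its default mode, following the semantics described in the PTX ISA. The names `prmt` and `extract_byte` and the sample values are illustrative assumptions, not code from the patch.

```python
def prmt(a: int, b: int, selector: int) -> int:
    """Emulate PTX prmt.b32 in default mode (a sketch, not LLVM code)."""
    # The eight candidate bytes: indices 0-3 come from a, 4-7 from b.
    src = (a & 0xFFFFFFFF).to_bytes(4, "little") + (b & 0xFFFFFFFF).to_bytes(4, "little")
    result = 0
    for i in range(4):
        nibble = (selector >> (4 * i)) & 0xF
        byte = src[nibble & 0x7]
        # If the nibble's top bit is set, replicate the sign bit of the chosen byte.
        if nibble & 0x8:
            byte = 0xFF if byte & 0x80 else 0x00
        result |= byte << (8 * i)
    return result


def extract_byte(vec: int, index: int) -> int:
    """Zero-extended extract of element `index` from a packed v4i8 (hypothetical helper)."""
    # Selector 0x444N: result byte 0 is source byte N; the upper three
    # result bytes all pick byte 4, which comes from the zero operand b.
    return prmt(vec, 0, 0x4440 | index)


packed = 0x44332211
element = extract_byte(packed, 2)  # same value as (packed >> 16) & 0xFF
```

Because the upper three result bytes are taken from a zero operand, the top 24 bits of the result are known to be zero. That is the kind of fact KnownBits tracking can use to drop a following AND with 255.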
Alex MacLean
76c9bfefa4
[NVPTX] Remove Float register classes (#140487)
These classes are redundant, as the untyped "Int" classes can be used
for all float operations. This change is intended to be as minimal as
possible and leaves the many potential simplifications and refactors
this exposes as future work.
2025-05-21 11:33:57 -07:00
Alex MacLean
369891b674
[NVPTX] use untyped loads and stores wherever possible (#137698)
In most cases, the type information attached to load and store
instructions is meaningless and inconsistently applied. We can usually
use ".b" loads and avoid the complexity of trying to assign the correct
type. The one exception is sign-extending loads, which will continue to
use ".s" to ensure the sign extension into a larger register is done
correctly.
2025-05-10 08:26:26 -07:00
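The difference between an untyped load and a sign-extending load can be sketched in a few lines of Python. This assumes the PTX rule that bit-size and unsigned loads zero-extend into a wider destination register while signed loads sign-extend; `load_extend` is a hypothetical helper, not part of the backend.

```python
def load_extend(raw: bytes, signed: bool, reg_bits: int = 32) -> int:
    """Model loading a narrow value into a wider register (illustrative only)."""
    value = int.from_bytes(raw, "little", signed=signed)
    # Mask to the register width so a negative value shows its bit pattern.
    return value & ((1 << reg_bits) - 1)


untyped = load_extend(b"\x80", signed=False)   # models ld.b8 into a 32-bit register
sign_ext = load_extend(b"\x80", signed=True)   # models ld.s8 into a 32-bit register
```

Here `untyped` is 0x00000080 and `sign_ext` is 0xFFFFFF80, which is why the signed form has to be kept for sign-extending loads.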
Alex MacLean
b6f32ad8b0
[NVPTX] Switch to untyped float registers (#137011)
Register types in PTX are simply syntactic sugar and emitting them has
added lots of unnecessary complexity to the NVPTX backend. This change
takes the first step to their removal by using ".b" registers instead of
".f" in all cases. This should shake out any potential issues or bugs in
ptxas preventing full removal and pre-fetches many of the required test
updates.
2025-04-23 15:37:38 -07:00
Drew Kersnar
932d9c13fa
[NVPTX] Generalize and extend upsizing when lowering 8/16-bit-element vector loads/stores (#119622)
This addresses the following issue I opened:
https://github.com/llvm/llvm-project/issues/118851.

This change generalizes the Type Legalization mechanism that currently
handles `v8[i/f/bf]16` upsizing to include loads _and_ stores of `v8i8`
and `v16i8`, allowing all of the mentioned vectors to be lowered to PTX as
vectors of `b32`. This extension also allows us to remove the DagCombine
that only handled exactly `load v16i8`, thus centralizing all the
upsizing logic into one place.

Test changes include adding v8i8, v16i8, and v8i16 cases to
load-store.ll, and updating the CHECKs for other tests to match the
improved codegen.
2024-12-17 15:23:22 -08:00
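As a rough picture of what "upsizing" means here, the sketch below repacks a vector of 8-bit elements into 32-bit words, the way a `v16i8` value can be carried as four `b32` lanes. It assumes little-endian packing; `upsize_to_b32` is a hypothetical name and this is not the legalizer's actual code.

```python
import struct


def upsize_to_b32(elements: bytes) -> list[int]:
    """Reinterpret a byte vector as a vector of 32-bit words (illustrative only)."""
    if len(elements) % 4 != 0:
        raise ValueError("byte count must be a multiple of 4")
    # "<" selects little-endian; "I" is an unsigned 32-bit integer.
    return list(struct.unpack(f"<{len(elements) // 4}I", elements))


v16i8 = bytes(range(16))
words = upsize_to_b32(v16i8)           # four 32-bit lanes
v8i8_words = upsize_to_b32(bytes(8))   # two 32-bit lanes
```

The same repacking applies to a `v8i16` value, which also occupies 16 bytes and therefore four 32-bit lanes.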
Youngsuk Kim
0f0a96b862
[llvm][NVPTX] Strip unneeded '+0' in PTX load/store (#113017)
Remove the extraneous '+0' immediate offset part in PTX load/stores, to
improve readability of output PTX code.
2024-10-19 10:05:36 -04:00
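The effect on the printed operand can be sketched as follows. `format_mem_operand` is a hypothetical helper written for illustration and the register name is made up; the real change lives in the NVPTX instruction printer.

```python
def format_mem_operand(base: str, offset: int = 0) -> str:
    """Print a PTX-style address operand, omitting a zero offset (illustrative only)."""
    if offset == 0:
        return f"[{base}]"
    return f"[{base}+{offset}]"


before_style = "[%rd1+0]"                      # what the old output looked like
after_style = format_mem_operand("%rd1", 0)    # zero offset is dropped
with_offset = format_mem_operand("%rd1", 4)    # non-zero offset is kept
```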
Artem Belevich
26b786ae2f
[NVPTX] Restrict combining to properly aligned v16i8 vectors. (#107919)
Fixes generation of invalid loads leading to misaligned access errors.
The bug got exposed by SLP vectorizer change ec360d6 which allowed SLP
to produce `v16i8` vectors.

Also updated the tests to use the automatic check generator.
2024-09-09 16:15:00 -07:00
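The shape of the restriction can be sketched as a simple predicate. This assumes the combined form is a four-element 32-bit vector load, which PTX requires to be aligned to its full 16-byte size; `can_combine_v16i8_load` is a hypothetical name, not the function used in the patch.

```python
def can_combine_v16i8_load(alignment: int) -> bool:
    """Return True if a v16i8 load may become one 16-byte vector load (illustrative only)."""
    # Alignments are powers of two, so ">= 16" means "16-byte aligned".
    return alignment >= 16


aligned_ok = can_combine_v16i8_load(16)
underaligned = can_combine_v16i8_load(4)   # would fault with a misaligned access if combined
```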
Pierre-Andre Saulais
0b80288e9e [NVPTX] Preserve v16i8 vector loads when legalizing
This is done by lowering v16i8 loads into LoadV4 operations with i32
results instead of letting ReplaceLoadVector split it into smaller
loads during legalization. This is done at dag-combine1 time, so that
vector operations with i8 elements can be optimised away instead of
being needlessly split during legalization, which involves storing to
the stack and loading it back.
2023-10-19 12:34:25 +01:00
Nikita Popov
9b81548a68 [NVPTX] Convert some tests to opaque pointers (NFC) 2022-12-19 12:57:23 +01:00
Andrew Savonichev
0f1b5f115a [NVPTX] Integrate ptxas to LIT tests
ptxas is a proprietary compiler from Nvidia that can compile PTX to
machine code (SASS). It has a lot of diagnostics to catch errors
in PTX, which can be used to verify PTX output from llc.

Set the -DPTXAS_EXECUTABLE=/path/to/ptxas CMake option to enable it.
If this option is not set, ptxas is replaced with the true command, which
effectively disables all ptxas RUN lines.

The LLVM_PTXAS_EXECUTABLE environment variable takes precedence over
the CMake option and allows overriding the ptxas executable used by LIT
without a complete reconfiguration.

Differential Revision: https://reviews.llvm.org/D121727
2022-04-28 14:59:45 +03:00
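The precedence described above can be sketched in Python, the language LIT configuration files are written in. `resolve_ptxas` and its parameters are hypothetical names chosen for illustration; this is not the code from the LIT configuration itself.

```python
def resolve_ptxas(environ: dict, cmake_ptxas: str) -> str:
    """Pick the command substituted for ptxas in RUN lines (illustrative only)."""
    # 1. The environment variable wins if set and non-empty.
    # 2. Otherwise fall back to the value configured through CMake.
    # 3. Otherwise use "true", which turns ptxas RUN lines into no-ops.
    return environ.get("LLVM_PTXAS_EXECUTABLE") or cmake_ptxas or "true"


from_env = resolve_ptxas({"LLVM_PTXAS_EXECUTABLE": "/opt/cuda/bin/ptxas"}, "/usr/bin/ptxas")
from_cmake = resolve_ptxas({}, "/usr/bin/ptxas")
disabled = resolve_ptxas({}, "")
```

Falling back to `true` keeps the RUN lines syntactically valid on machines without the proprietary tool, so the tests still pass instead of failing on a missing binary.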
Artem Belevich
620db1f3dd [NVPTX] Added support for .f16x2 instructions.
This patch enables support for .f16x2 operations.

Added new register type Float16x2.
Added support for .f16x2 instructions.
Added handling of vectorized loads/stores of v2f16 values.

Differential Revision: https://reviews.llvm.org/D30057
Differential Revision: https://reviews.llvm.org/D30310

llvm-svn: 296032
2017-02-23 22:38:24 +00:00
Justin Lebar
cd564c6b46 [NVPTX] Enable the load-store vectorizer on nvptx.
Reviewers: tra

Subscribers: jholewinski, arsenm, asbirlea

Differential Revision: https://reviews.llvm.org/D22592

llvm-svn: 276196
2016-07-20 22:11:36 +00:00