16 Commits

Author SHA1 Message Date
Justin Fargnoli
4d2e1e1c74
Reland "[lit] Refactor available ptxas features" (#155923)
Reland #154439.  Reverted with #155914.

Account for:
- Windows `ptxas` outputting error messages to `stdout` instead of
`stderr`
- Tests in `llvm/test/DebugInfo/NVPTX`
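
The stdout/stderr difference can be absorbed by reading both streams as one. The following is a minimal sketch of that idea, not the code from this PR: `run_and_collect` is a hypothetical helper, and the child process is a Python stand-in rather than `ptxas`.

```python
import subprocess
import sys


def run_and_collect(cmd):
    """Run cmd and return its stdout and stderr merged into one string."""
    result = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # fold stderr into the stdout pipe
        text=True,
        check=False,  # a failing probe is expected; inspect the text instead
    )
    return result.stdout


# Stand-in children (not ptxas): one reports on stdout, one on stderr.
on_stdout = [sys.executable, "-c", "print('error: unsupported option')"]
on_stderr = [sys.executable, "-c",
             "import sys; sys.stderr.write('error: unsupported option')"]
```

With this shape, the caller sees the same text no matter which stream the tool chose for its error messages.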
2025-09-02 10:42:25 -07:00
Justin Fargnoli
826780a84e
Revert "[lit] Refactor available ptxas features" (#155914)
Reverts llvm/llvm-project#154439 in order to resolve
https://github.com/llvm/llvm-project/pull/154439#issuecomment-3234638253.
2025-08-28 14:18:58 -07:00
Justin Fargnoli
d77cf579d8
[lit] Refactor available ptxas features (#154439)
ToT `lit` currently assumes that a given `ptxas` version supports all
capabilities of prior `ptxas` releases. This approach was flexible
enough to support the removal of 32-bit address compilation from `ptxas`
in CUDA 12.1, but it struggles with the removal of Volta and prior
compilation in CUDA 13.0.

To deal with this, this PR refactors how `lit` defines the set of
features available for a given `ptxas` version. It invokes `ptxas` not
just to get its version, but also to get the list of supported SMs,
supported PTX ISA versions, and support for 32-bit compilation.

This approach should be flexible enough to deal with the changing
support matrix of `ptxas` as it goes forward. One obvious downside is
that this relies on parsing the `stdout` of `ptxas`, something that's
inherently unstable. But, IMO, this is something that we can fix as
needed.
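
As a rough illustration of the approach, the sketch below derives a feature set from probe output instead of from the version number alone. Everything specific here is an assumption: the sample text is made up and does not reproduce real `ptxas` output, and the feature names (`ptxas-sm_*`, `ptxas-isa-*`, `ptxas-ptr32`) and the helper `available_features` are hypothetical.

```python
import re

# Made-up text standing in for what a ptxas probe might print.
# The real ptxas output format is not reproduced here.
SAMPLE_OUTPUT = """\
supported targets: sm_75 sm_80 sm_90a
supported isa: 7.8 8.0 9.0
32-bit addresses: no
"""


def available_features(text):
    """Return hypothetical lit feature names derived from probe output."""
    features = set()
    # One feature per SM architecture the tool lists.
    for sm in re.findall(r"\bsm_\d+[a-z]?\b", text):
        features.add("ptxas-" + sm)
    # One feature per PTX ISA version on the (hypothetical) isa line.
    isa_line = re.search(r"^supported isa:(.*)$", text, re.MULTILINE)
    if isa_line:
        for version in isa_line.group(1).split():
            features.add("ptxas-isa-" + version)
    # 32-bit support is its own feature rather than inferred from the version.
    if re.search(r"^32-bit addresses: yes$", text, re.MULTILINE):
        features.add("ptxas-ptr32")
    return features
```

Because each capability is probed on its own, a release that drops an older SM or 32-bit compilation simply yields a smaller feature set.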
2025-08-28 09:43:49 -07:00
Alex MacLean
d494eb0fa3
[NVPTX] Skip numbering unreferenced virtual registers (readability) (#154391)
When assigning numbers to registers, skip any with neither uses nor
defs. This will not have any impact at all on the final SASS but it
makes for slightly more readable PTX. This change should also ensure
that future minor changes are less likely to cause noisy diffs in
register numbering.
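
The numbering rule can be sketched as follows. This is a toy model, not the NVPTX backend code: `number_registers` is a hypothetical helper, registers are plain strings, and starting the count at 1 is an assumption.

```python
def number_registers(vregs, referenced):
    """Assign consecutive numbers to registers that have a use or a def."""
    numbering = {}
    next_number = 1  # assumed starting index
    for reg in vregs:
        if reg not in referenced:
            continue  # neither used nor defined: leave it unnumbered
        numbering[reg] = next_number
        next_number += 1
    return numbering
```

For example, `number_registers(["a", "b", "c"], {"a", "c"})` gives `c` the number 2 rather than 3, so dropping or adding an unreferenced register no longer shifts the numbers that follow it.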
2025-08-19 12:27:46 -07:00
Alex MacLean
35693daa70
[NVPTX] Fix v2i8 call lowering, use generic ld/st nodes for call params (#146930)
2025-07-28 10:41:51 -07:00
Alex MacLean
70333de6cf
[NVPTX] Consolidate and cleanup various NVPTXISD nodes (NFC) (#145581)
This change consolidates and cleans up various NVPTXISD target-specific
nodes in order to simplify SDAG ISel. While there are some whitespace
changes in the emitted PTX, it is otherwise a non-functional change.

NVPTXISD::Wrapper - This node was used to wrap external-symbol and
global-address nodes. It is redundant and has been removed. Instead we
use the non-target versions of these nodes and convert them
appropriately during ISel.

NVPTXISD::CALL - Much of the family of nodes used to represent a PTX
call instruction has been replaced by this new single node. It
corresponds to a single instruction and is therefore much simpler to
create and lower.
2025-06-25 11:42:21 -07:00
Alex MacLean
76c9bfefa4
[NVPTX] Remove Float register classes (#140487)
These classes are redundant, as the untyped "Int" classes can be used
for all float operations. This change is intended to be as minimal as
possible and leaves the many potential simplifications and refactors
this exposes as future work.
2025-05-21 11:33:57 -07:00
Alex MacLean
831592d617
[NVPTX] Fixup under-aligned dynamic alloca lowering (#139628)
The alignment on an ISD::DYNAMIC_STACKALLOC node may be 0 to indicate
that the default stack alignment should be used. Prior to this change,
we passed this alignment through unchanged leading to an error in
ptxas. Now, we use the stack-alignment in this case. Also did a little
cleanup while I'm here.
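
The fallback described above amounts to the following rule. This is a minimal sketch under the assumption that 0 is the only special value; `effective_alloca_alignment` is a hypothetical name, not a function in the backend.

```python
def effective_alloca_alignment(node_align, stack_align):
    """Pick the alignment to emit for a dynamic alloca.

    A node alignment of 0 means "use the default", so substitute the
    stack alignment instead of passing 0 through to ptxas.
    """
    if node_align == 0:
        return stack_align
    return node_align
```

For instance, `effective_alloca_alignment(0, 16)` yields 16, while an explicit alignment such as 32 is passed through unchanged.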
2025-05-13 09:56:41 -07:00
Alex MacLean
369891b674
[NVPTX] use untyped loads and stores wherever possible (#137698)
In most cases, the type information attached to load and store
instructions is meaningless and inconsistently applied. We can usually
use ".b" loads and avoid the complexity of trying to assign the correct
type. The one exception is sign-extending loads, which will continue to
use ".s" to ensure the sign extension into a larger register is done
correctly.
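
The selection rule reads roughly like this. It is a simplified sketch, not the backend's logic: `load_type_suffix` is a hypothetical helper that only models the choice between the ".b" and ".s" type suffixes for a given bit width.

```python
def load_type_suffix(bits, sign_extending=False):
    """Return the PTX type suffix for a load of the given bit width.

    Untyped ".b" is the default; ".s" is kept only when the load must
    sign-extend into a larger register.
    """
    kind = "s" if sign_extending else "b"
    return f".{kind}{bits}"
```

So a plain 32-bit load gets `.b32`, while a sign-extending 16-bit load still gets `.s16`.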
2025-05-10 08:26:26 -07:00
peterbell10
0068078dca
[NVPTX] Remove NVPTX::IMAD opcode, and rely on instruction selection only (#121724)
I noticed that NVPTX will sometimes emit `mad.lo` to multiply by 1, e.g.
in https://gcc.godbolt.org/z/4j47Y9W4c.

This happens when DAGCombiner operates on the add before the mul, so the
imad contraction happens regardless of whether the mul could have been
simplified.

To fix this, I remove `NVPTXISD::IMAD` and only combine to mad during
selection. This allows the default DAGCombiner patterns to simplify
the graph without any NVPTX-specific intervention.
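
The ordering problem can be shown on a toy expression tree. This sketch is not DAGCombiner code: expressions are plain tuples, and `simplify` and `contract_mad` are hypothetical stand-ins for the generic mul simplification and the mad contraction.

```python
def simplify(node):
    """Fold multiplications by 1: mul(x, 1) -> x and mul(1, x) -> x."""
    if not isinstance(node, tuple):
        return node
    op, *args = node
    args = [simplify(a) for a in args]
    if op == "mul" and len(args) == 2:
        if args[1] == 1:
            return args[0]
        if args[0] == 1:
            return args[1]
    return (op, *args)


def contract_mad(node):
    """Rewrite add(mul(a, b), c) into mad(a, b, c)."""
    if isinstance(node, tuple) and node[0] == "add":
        lhs, rhs = node[1], node[2]
        if isinstance(lhs, tuple) and lhs[0] == "mul":
            return ("mad", lhs[1], lhs[2], rhs)
    return node


expr = ("add", ("mul", "x", 1), "y")
contracted_first = contract_mad(expr)            # mad by 1 survives
simplified_first = contract_mad(simplify(expr))  # plain add remains
```

Contracting first locks in `("mad", "x", 1, "y")`, whereas simplifying first leaves `("add", "x", "y")`, which is the motivation for deferring the mad combine to selection.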
2025-01-15 20:09:18 +00:00
Fangrui Song
b279f6b098
[NVPTX,test] Change llc -march= to -mtriple=
Similar to 806761a7629df268c8aed49657aeccffa6bca449

-mtriple= specifies the full target triple while -march= merely sets the
architecture part of the default target triple (e.g. Windows, macOS),
leaving a target triple which may not make sense.

Therefore, -march= is error-prone and not recommended for tests without
a target triple. The issue has been benign as we recognize
nvptx{,64}-apple-darwin as ELF instead of rejecting it outright.
2024-12-15 10:45:11 -08:00
Youngsuk Kim
0f0a96b862
[llvm][NVPTX] Strip unneeded '+0' in PTX load/store (#113017)
Remove the extraneous '+0' immediate offset part in PTX load/stores, to
improve readability of output PTX code.
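
The printing rule can be sketched as below. `format_address` is a hypothetical helper for illustration only, and the bracketed `[base+offset]` form is a simplified model of a PTX address operand.

```python
def format_address(base, offset=0):
    """Format a PTX-style address operand, omitting a zero offset."""
    if offset == 0:
        return f"[{base}]"  # no trailing '+0'
    return f"[{base}+{offset}]"
```

With this rule an operand such as `[foo_param_0+0]` is printed as `[foo_param_0]`, while non-zero offsets are unchanged.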
2024-10-19 10:05:36 -04:00
Youngsuk Kim
5a0942cd74
[llvm][NVPTX] Don't emit unused var 'temp_param_reg' (NFC) (#89004)
Don't emit unused variable 'temp_param_reg' which has been around since
ae556d3ef72dfe5f40a337b7071f42b7bf5b66a4 .
2024-04-17 14:45:33 -04:00
Adrian Kuegel
f0a5e50550
[llvm][NVPTX] Add missing feature guard.
2024-03-19 06:53:14 +00:00
Alex MacLean
89b7b3b995
[NVPTX] support dynamic allocas with PTX alloca instruction (#84585)
Add support for dynamically sized alloca instructions with the PTX
alloca instruction introduced in PTX 7.3
([9.7.15.3. Stack Manipulation Instructions: alloca](https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#stack-manipulation-instructions-alloca)).
2024-03-15 11:51:46 -07:00
Youngsuk Kim
f9304974cc
[llvm][NVPTX] Inform that 'DYNAMIC_STACKALLOC' is unsupported (#74684)
Catch the unsupported path early and emit an informative error.

Motivated by the following threads:
* https://discourse.llvm.org/t/nvptx-problems-with-dynamic-alloca/70745
* #64017
2023-12-14 22:06:22 -05:00