Reland #154439. Reverted with #155914.
Account for:
- Windows `ptxas` outputting error messages to `stdout` instead of
`stderr`
- Tests in `llvm/test/DebugInfo/NVPTX`
ToT `lit` currently assumes that a given `ptxas` version supports all
capabilities of prior `ptxas` releases. This approach was flexible
enough to support the removal of 32-bit address compilation from `ptxas`
in CUDA 12.1, but it struggles with the removal of Volta and prior
compilation in CUDA 13.0.
To deal with this, this PR refactors how `lit` defines the set of
features available for a given `ptxas` version. It invokes `ptxas` not
just to get its version, but also to get the list of supported SMs,
supported PTX ISA versions, and support for 32-bit compilation.
This approach should be flexible enough to deal with the changing
support matrix of `ptxas` as it goes forward. One obvious downside is
that this relies on parsing the `stdout` of `ptxas`, something that's
inherently unstable. But, IMO, this is something that we can fix as
needed.
When assigning numbers to registers, skip any with neither uses nor
defs. This is will not have any impact at all on the final SASS but it
makes for slightly more readable PTX. This change should also ensure
that future minor changes are less likely to cause noisy diffs in
register numbering.
This change consolidates and cleans up various NVPTXISD target-specific
nodes in order to simplify SDAG ISel. While there are some whitespace
changes in the emitted PTX it is otherwise a non-functional change.
NVPTXISD::Wrapper - This node was used to wrap external-symbol and
global-address nodes. It is redundant and has been removed. Instead we
use the non-target versions of these nodes and convert them
appropriately during ISel.
NVPTXISD::CALL - Much of the family of nodes used to represent a PTX
call instruction have been replaced by this new single node. It
corresponds to a single instruction and is therefore much simpler to
create and lower.
These classes are redundant, as the untyped "Int" classes can be used
for all float operations. This change is intended to be as minimal as
possible and leaves the many potential simplifications and refactors
this exposes as future work.
The alignment on a ISD::DYNAMIC_STACKALLOC node may be 0 to indicate
that the default stack alignment should be used. Prior to this change,
we passed this alignment through unchanged leading to an error in
ptxas. Now, we use the stack-alignment in this case. Also did a little
cleanup while I'm here.
In most cases, the type information attached to load and store
instructions is meaningless and inconsistently applied. We can usually
use ".b" loads and avoid the complexity of trying to assign the correct
type. The one expectation is sign-extending load, which will continue to
use ".s" to ensure the sign extension into a larger register is done
correctly.
I noticed that NVPTX will sometimes emit `mad.lo` to multiply by 1, e.g.
in https://gcc.godbolt.org/z/4j47Y9W4c.
This happens when DAGCombiner operates on the add before the mul, so the
imad contraction happens regardless of whether the mul could have been
simplified.
To fix this, I remove `NVPTXISD::IMAD` and only combine to mad during
selection. This allows the default DAGCombiner patterns to simplify
the graph without any NVPTX-specific intervention.
Similar to 806761a7629df268c8aed49657aeccffa6bca449
-mtriple= specifies the full target triple while -march= merely sets the
architecture part of the default target triple (e.g. Windows, macOS),
leaving a target triple which may not make sense.
Therefore, -march= is error-prone and not recommended for tests without
a target triple. The issue has been benign as we recognize
nvptx{,64}-apple-darwin as ELF instead of rejecting it outrightly.