4 Commits

Author SHA1 Message Date
Justin Fargnoli
312055d1da
[NVPTX] Fix ptxas failures (NFC) (#125147)
Note:
[lower-args.ll](https://github.com/llvm/llvm-project/compare/main...justinfargnoli:dev/jf/ptxas?expand=1#diff-649d37d1f897d829fb809025437ba5df2e0c8da8395bbac7be713cd8f5bd8237)
and
[kernel-param-align.ll](https://github.com/llvm/llvm-project/compare/main...justinfargnoli:dev/jf/ptxas?expand=1#diff-31f196478b41b95b51298eb8e2efccc8a6f1156f13b648c07db27dd09579f74e)
fail because `ptxas` doesn't support constant pointers in separate
compilation mode (`-c`).
2025-02-01 16:44:37 -08:00
Alex MacLean
4583f6d344
[NVPTX] Switch front-ends and tests to ptx_kernel cc (#120806)
The `ptx_kernel` calling convention is a more idiomatic and standard way
of specifying an NVPTX kernel than using the metadata, which is not
supposed to change the meaning of the program. Further, checking the
calling convention is significantly faster than traversing the metadata,
improving compile time.

This change updates the clang and mlir frontends as well as the
NVPTXCtorDtorLowering pass to emit kernels using the calling convention.
In addition, this updates all NVPTX unit tests to use the calling
convention as well.
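
As a rough illustration of the two kernel markers described above, the following Python sketch checks textual IR for each form. The IR snippets and the helper names `is_kernel_by_cc` and `is_kernel_by_metadata` are hypothetical and written for this example; they are not taken from the patch, and LLVM itself inspects in-memory IR rather than text.

```python
import re

# Hypothetical IR snippets for illustration only (not from the commit).
OLD_STYLE = """
define void @foo(ptr %a) {
  ret void
}
!nvvm.annotations = !{!0}
!0 = !{ptr @foo, !"kernel", i32 1}
"""

NEW_STYLE = """
define ptx_kernel void @foo(ptr %a) {
  ret void
}
"""

def is_kernel_by_cc(ir, name):
    """True if the define line for `name` carries the ptx_kernel calling convention."""
    pattern = r"^define\b[^\n]*\bptx_kernel\b[^\n]*@" + re.escape(name) + r"\("
    return re.search(pattern, ir, flags=re.MULTILINE) is not None

def is_kernel_by_metadata(ir, name):
    """True if a metadata node marks `name` as a kernel (the older form)."""
    pattern = r'^!\d+ = !\{ptr @' + re.escape(name) + r', !"kernel", i32 1\}'
    return re.search(pattern, ir, flags=re.MULTILINE) is not None

print(is_kernel_by_cc(NEW_STYLE, "foo"))        # calling-convention form
print(is_kernel_by_metadata(OLD_STYLE, "foo"))  # metadata form
```

The calling-convention check only needs the function's own definition, whereas the metadata check has to search annotation nodes elsewhere in the module, which is the compile-time difference the message refers to.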
2025-01-07 18:24:50 -08:00
Fangrui Song
b279f6b098
[NVPTX,test] Change llc -march= to -mtriple=
Similar to 806761a7629df268c8aed49657aeccffa6bca449

-mtriple= specifies the full target triple while -march= merely sets the
architecture part of the default target triple (which may be, e.g., a
Windows or macOS triple), leaving a target triple which may not make sense.

Therefore, -march= is error-prone and not recommended for tests without
a target triple. The issue has been benign as we recognize
nvptx{,64}-apple-darwin as ELF instead of rejecting it outright.
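
A minimal sketch of the kind of mechanical rewrite this describes, assuming test RUN lines use `-march=nvptx` or `-march=nvptx64` and that the replacement keeps the same architecture name. The helper `march_to_mtriple` is hypothetical and is not the script used for the commit.

```python
import re

def march_to_mtriple(line):
    """Rewrite -march=nvptx{,64} to -mtriple=nvptx{,64} in a single RUN line."""
    return re.sub(r"-march=(nvptx(?:64)?)\b", r"-mtriple=\1", line)

# Hypothetical RUN line for illustration.
run_line = "; RUN: llc < %s -march=nvptx64 -mcpu=sm_20 | FileCheck %s"
print(march_to_mtriple(run_line))
```

Lines that mention other architectures are left untouched, since the pattern only matches the two NVPTX spellings.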
2024-12-15 10:45:11 -08:00
Lewis Crawford
6d058317e6
Enable .ptr .global .align attributes for kernel attributes for CUDA (#114874)
Emit .ptr, .address-space, and .align attributes for kernel
args in CUDA (previously handled only for OpenCL).

This allows for more vectorization opportunities if the PTX consumer
is able to know about the pointer alignments.

If no alignment is explicitly specified, .align 1 will be emitted
to match the LLVM IR semantics in this case.
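
To make the emitted form concrete, here is a small Python sketch that assembles a PTX kernel parameter declaration with the `.ptr`, address-space, and `.align` attributes, falling back to `.align 1` when no alignment is given. The function `ptx_kernel_param`, its parameters, and the parameter names in the usage lines are assumptions made for illustration; the exact text produced by the NVPTX backend may differ.

```python
def ptx_kernel_param(name, addr_space=None, align=None, bits=64):
    """Build a PTX .param declaration for a pointer kernel argument.

    addr_space: optional state space such as "global"; omitted if None.
    align: alignment in bytes; defaults to 1 when not specified.
    """
    parts = [".param", f".u{bits}", ".ptr"]
    if addr_space is not None:
        parts.append(f".{addr_space}")
    parts += [".align", str(align if align is not None else 1)]
    parts.append(name)
    return " ".join(parts)

# Pointer into global memory with a known 16-byte alignment.
print(ptx_kernel_param("foo_param_0", addr_space="global", align=16))
# Pointer with no explicit alignment: falls back to .align 1.
print(ptx_kernel_param("foo_param_1"))
```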

PTX ISA doc -
https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#kernel-parameter-attribute-ptr

This is a rework of the original patch proposed in #79646

---------

Co-authored-by: Vandana <vandanak@nvidia.com>
2024-11-15 12:40:53 +00:00