10 Commits

Author SHA1 Message Date
Alex MacLean
76c9bfefa4
[NVPTX] Remove Float register classes (#140487)
These classes are redundant, as the untyped "Int" classes can be used
for all float operations. This change is intended to be as minimal as
possible and leaves the many potential simplifications and refactors
this exposes as future work.
2025-05-21 11:33:57 -07:00
Alex MacLean
831592d617
[NVPTX] Fixup under-aligned dynamic alloca lowering (#139628)
The alignment on a ISD::DYNAMIC_STACKALLOC node may be 0 to indicate
that the default stack alignment should be used. Prior to this change,
we passed this alignment through unchanged leading to an error in
ptxas. Now, we use the stack-alignment in this case. Also did a little
cleanup while I'm here.
2025-05-13 09:56:41 -07:00
Alex MacLean
369891b674
[NVPTX] use untyped loads and stores where ever possible (#137698)
In most cases, the type information attached to load and store
instructions is meaningless and inconsistently applied. We can usually
use ".b" loads and avoid the complexity of trying to assign the correct
type. The one expectation is sign-extending load, which will continue to
use ".s" to ensure the sign extension into a larger register is done
correctly.
2025-05-10 08:26:26 -07:00
peterbell10
0068078dca
[NVPTX] Remove NVPTX::IMAD opcode, and rely on intruction selection only (#121724)
I noticed that NVPTX will sometimes emit `mad.lo` to multiply by 1, e.g.
in https://gcc.godbolt.org/z/4j47Y9W4c.

This happens when DAGCombiner operates on the add before the mul, so the
imad contraction happens regardless of whether the mul could have been
simplified.

To fix this, I remove `NVPTXISD::IMAD` and only combine to mad during
selection. This allows the default DAGCombiner patterns to simplify
the graph without any NVPTX-specific intervention.
2025-01-15 20:09:18 +00:00
Fangrui Song
b279f6b098 [NVPTX,test] Change llc -march= to -mtriple=
Similar to 806761a7629df268c8aed49657aeccffa6bca449

-mtriple= specifies the full target triple while -march= merely sets the
architecture part of the default target triple (e.g. Windows, macOS),
leaving a target triple which may not make sense.

Therefore, -march= is error-prone and not recommended for tests without
a target triple. The issue has been benign as we recognize
nvptx{,64}-apple-darwin as ELF instead of rejecting it outrightly.
2024-12-15 10:45:11 -08:00
Youngsuk Kim
0f0a96b862
[llvm][NVPTX] Strip unneeded '+0' in PTX load/store (#113017)
Remove the extraneous '+0' immediate offset part in PTX load/stores, to
improve readability of output PTX code.
2024-10-19 10:05:36 -04:00
Youngsuk Kim
5a0942cd74
[llvm][NVPTX] Don't emit unused var 'temp_param_reg' (NFC) (#89004)
Don't emit unused variable 'temp_param_reg' which has been around since
ae556d3ef72dfe5f40a337b7071f42b7bf5b66a4 .
2024-04-17 14:45:33 -04:00
Adrian Kuegel
f0a5e50550 [llvm][NVPTX] Add missing feature guard. 2024-03-19 06:53:14 +00:00
Alex MacLean
89b7b3b995
[NVPTX] support dynamic allocas with PTX alloca instruction (#84585)
Add support for dynamically sized alloca instructions with the PTX
alloca instruction introduced in PTX 7.3 
([9.7.15.3. Stack Manipulation Instructions: alloca]
(https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#stack-manipulation-instructions-alloca))
2024-03-15 11:51:46 -07:00
Youngsuk Kim
f9304974cc
[llvm][NVPTX] Inform that 'DYNAMIC_STACKALLOC' is unsupported (#74684)
Catch unsupported path early up, and emit error with information.

Motivated by the following threads:
* https://discourse.llvm.org/t/nvptx-problems-with-dynamic-alloca/70745
* #64017
2023-12-14 22:06:22 -05:00