llvm-project

[NVPTX] Vectorize and lower 256-bit global loads/stores for sm_100+/ptx88+ (#139292)

PTX 8.8+ introduces 256-bit-wide vector loads/stores under certain
conditions. This change extends the backend to lower these loads/stores.
It also overrides getLoadStoreVecRegBitWidth for NVPTX, allowing the
LoadStoreVectorizer to create these wider vector operations.

See the spec for the three relevant PTX instructions here:
- https://docs.nvidia.com/cuda/parallel-thread-execution/#data-movement-and-conversion-instructions-ld
- https://docs.nvidia.com/cuda/parallel-thread-execution/#data-movement-and-conversion-instructions-ld-global-nc
- https://docs.nvidia.com/cuda/parallel-thread-execution/#data-movement-and-conversion-instructions-st
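
The width selection described in the commit message above can be modeled roughly as follows. This is a minimal stand-alone sketch, not the code from the change: `TargetInfo`, `has256BitGlobalAccess`, and `loadStoreVecRegBitWidth` are hypothetical names, the 128-bit fallback is an assumption, and only the conditions named in the commit title (global address space, sm_100+, PTX 8.8+) are checked.

```cpp
// Hypothetical model of the width query; not the LLVM source.
// NVPTX numbers the global address space as 1.
constexpr unsigned kGlobalAddrSpace = 1;

struct TargetInfo {
  unsigned smVersion;  // e.g. 100 for sm_100
  unsigned ptxVersion; // e.g. 88 for PTX 8.8
};

// True when the target meets the sm_100+/ptx88+ condition from the title.
constexpr bool has256BitGlobalAccess(TargetInfo t) {
  return t.smVersion >= 100 && t.ptxVersion >= 88;
}

// Widest vector register width, in bits, that a vectorizer may form
// for memory accesses in the given address space.
constexpr unsigned loadStoreVecRegBitWidth(TargetInfo t, unsigned addrSpace) {
  return (addrSpace == kGlobalAddrSpace && has256BitGlobalAccess(t))
             ? 256
             : 128; // assumed narrower default for all other cases
}

static_assert(loadStoreVecRegBitWidth({100, 88}, kGlobalAddrSpace) == 256,
              "global accesses on sm_100 with PTX 8.8 may use 256 bits");
static_assert(loadStoreVecRegBitWidth({90, 88}, kGlobalAddrSpace) == 128,
              "older architectures keep the narrower width");
```

Under this model, a non-global address space (for example, 3) keeps the narrower width even on sm_100 with PTX 8.8, which matches the commit's restriction to global loads/stores.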

2025-05-13 13:36:09 -07:00

4x2xhalf.ll

…

lit.local.cfg

…

load-store-256-bit.ll

…

many_loads_stores.ll

…

merge-across-side-effects.ll

…

non-instr-bitcast.ll

…

overlapping_chains.ll

…

propagate-invariance-metadata.ll

…

vectorize_i1.ll

…

vectorize_i8.ll

…

vectorize_i16.ll

…

vectorize_i24.ll

…

vectorize_vectors.ll

…