llvm-project

[NVPTX] Vectorize and lower 256-bit global loads/stores for sm_100+/ptx88+ (#139292)

PTX 8.8+ introduces 256-bit-wide vector loads/stores under certain
conditions. This change extends the backend to lower these loads/stores.
It also overrides getLoadStoreVecRegBitWidth for NVPTX, allowing the
LoadStoreVectorizer to create these wider vector operations.

See the spec for the three relevant PTX instructions here:
- https://docs.nvidia.com/cuda/parallel-thread-execution/#data-movement-and-conversion-instructions-ld
- https://docs.nvidia.com/cuda/parallel-thread-execution/#data-movement-and-conversion-instructions-ld-global-nc
- https://docs.nvidia.com/cuda/parallel-thread-execution/#data-movement-and-conversion-instructions-st

2025-05-13 13:36:09 -07:00

AArch64

Revert "LSV: forbid load-cycles when vectorizing; fix bug (#104815)" (#106245)

2024-08-27 18:45:22 +02:00

AMDGPU

[NFC] Precommit tests for an LSV patch (#138167)

2025-05-01 12:50:31 -04:00

NVPTX

[NVPTX] Vectorize and lower 256-bit global loads/stores for sm_100+/ptx88+ (#139292)

2025-05-13 13:36:09 -07:00

X86

[LoadStoreVectorizer] Postprocess and merge equivalence classes (#121861)

2025-01-07 17:17:26 -08:00

int_sideeffect.ll

Rewrite load-store-vectorizer.

2023-05-26 15:15:39 -07:00