llvm-project

Author	SHA1	Message	Date
Alex MacLean	11fba35916	[NVPTX] Add SimplifyDemandedBitsForTargetNode for PRMT (#149395)	2025-07-22 18:44:50 -07:00
Alex MacLean	86203b6b33	[NVPTX] Use PRMT more widely, and improve folding around this instruction (#148261) Replace uses of BFE with PRMT when lowering v4i8 vectors. This will generally lead to equivalent or better SASS and reduces the number of target specific operations we need to represent. (https://cuda.godbolt.org/z/M75W6f8xd) Also implement KnownBits tracking for PRMT allowing elimination of redundant AND instructions when lowering various i8 operations.	2025-07-13 15:06:53 -07:00
Alex MacLean	76c9bfefa4	[NVPTX] Remove Float register classes (#140487) These classes are redundant, as the untyped "Int" classes can be used for all float operations. This change is intended to be as minimal as possible and leaves the many potential simplifications and refactors this exposes as future work.	2025-05-21 11:33:57 -07:00
Alex MacLean	369891b674	[NVPTX] use untyped loads and stores where ever possible (#137698) In most cases, the type information attached to load and store instructions is meaningless and inconsistently applied. We can usually use ".b" loads and avoid the complexity of trying to assign the correct type. The one exception is sign-extending loads, which will continue to use ".s" to ensure the sign extension into a larger register is done correctly.	2025-05-10 08:26:26 -07:00
Alex MacLean	b6f32ad8b0	[NVPTX] Switch to untyped float registers (#137011) Register types in PTX are simply syntactic sugar and emitting them has added lots of unnecessary complexity to the NVPTX backend. This change takes the first step to their removal by using ".b" registers instead of ".f" in all cases. This should shake out any potential issues or bugs in ptxas preventing full removal and pre-fetches many of the required test updates.	2025-04-23 15:37:38 -07:00
Drew Kersnar	932d9c13fa	[NVPTX] Generalize and extend upsizing when lowering 8/16-bit-element vector loads/stores (#119622) This addresses the following issue I opened: https://github.com/llvm/llvm-project/issues/118851. This change generalizes the Type Legalization mechanism that currently handles `v8[i/f/bf]16` upsizing to include loads _and_ stores of `v8i8` + `v16i8`, allowing all of the mentioned vectors to be lowered to ptx as vectors of `b32`. This extension also allows us to remove the DagCombine that only handled exactly `load v16i8`, thus centralizing all the upsizing logic into one place. Test changes include adding v8i8, v16i8, and v8i16 cases to load-store.ll, and updating the CHECKs for other tests to match the improved codegen.	2024-12-17 15:23:22 -08:00
Youngsuk Kim	0f0a96b862	[llvm][NVPTX] Strip unneeded '+0' in PTX load/store (#113017) Remove the extraneous '+0' immediate offset part in PTX load/stores, to improve readability of output PTX code.	2024-10-19 10:05:36 -04:00
Artem Belevich	26b786ae2f	[NVPTX] Restrict combining to properly aligned v16i8 vectors. (#107919) Fixes generation of invalid loads leading to misaligned access errors. The bug got exposed by SLP vectorizer change ec360d6 which allowed SLP to produce `v16i8` vectors. Also updated the tests to use automatic check generator.	2024-09-09 16:15:00 -07:00
Pierre-Andre Saulais	0b80288e9e	[NVPTX] Preserve v16i8 vector loads when legalizing This is done by lowering v16i8 loads into LoadV4 operations with i32 results instead of letting ReplaceLoadVector split it into smaller loads during legalization. This is done at dag-combine1 time, so that vector operations with i8 elements can be optimised away instead of being needlessly split during legalization, which involves storing to the stack and loading it back.	2023-10-19 12:34:25 +01:00
Nikita Popov	9b81548a68	[NVPTX] Convert some tests to opaque pointers (NFC)	2022-12-19 12:57:23 +01:00
Andrew Savonichev	0f1b5f115a	[NVPTX] Integrate ptxas to LIT tests ptxas is a proprietary compiler from Nvidia that can compile PTX to machine code (SASS). It has a lot of diagnostics to catch errors in PTX, which can be used to verify PTX output from llc. Set -DPXTAS_EXECUTABLE=/path/to/ptxas CMake option to enable it. If this option is not set, then ptxas is substituted with true, which effectively disables all ptxas RUN lines. The LLVM_PTXAS_EXECUTABLE environment variable takes precedence over the CMake option, and allows overriding the ptxas executable used for LIT without a complete reconfiguration. Differential Revision: https://reviews.llvm.org/D121727	2022-04-28 14:59:45 +03:00
Artem Belevich	620db1f3dd	[NVPTX] Added support for .f16x2 instructions. This patch enables support for .f16x2 operations. Added new register type Float16x2. Added support for .f16x2 instructions. Added handling of vectorized loads/stores of v2f16 values. Differential Revision: https://reviews.llvm.org/D30057 Differential Revision: https://reviews.llvm.org/D30310 llvm-svn: 296032	2017-02-23 22:38:24 +00:00
Justin Lebar	cd564c6b46	[NVPTX] Enable the load-store vectorizer on nvptx. Reviewers: tra Subscribers: jholewinski, arsenm, asbirlea Differential Revision: https://reviews.llvm.org/D22592 llvm-svn: 276196	2016-07-20 22:11:36 +00:00

13 Commits
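Several of the commits above (#148261, #149395) rely on the byte-permute semantics of the PTX `prmt.b32` instruction and on tracking which result bits are known to be zero. The following is a minimal Python sketch, assuming the default (generic) `prmt` mode as described in the PTX ISA: the eight source bytes are {b, a}, with a supplying bytes 0-3 and b supplying bytes 4-7, and each result byte has a 4-bit selector whose low 3 bits pick a source byte and whose top bit requests sign replication. The names `prmt`, `KnownBits`, and `prmt_known_bits` are hypothetical; this is not LLVM's C++ implementation, and the selector `0x4442` is just one possible way to zero-extend a byte, not necessarily what the NVPTX backend emits.

```python
from typing import NamedTuple

MASK32 = 0xFFFFFFFF


def prmt(a: int, b: int, c: int) -> int:
    """Model of PTX `prmt.b32 d, a, b, c` in its default (generic) mode only."""
    src = ((b & MASK32) << 32) | (a & MASK32)  # bytes 0-3 come from a, 4-7 from b
    result = 0
    for i in range(4):
        sel = (c >> (4 * i)) & 0xF            # one 4-bit selector per result byte
        byte = (src >> (8 * (sel & 0x7))) & 0xFF
        if sel & 0x8:                          # top bit set: replicate the byte's sign bit
            byte = 0xFF if byte & 0x80 else 0x00
        result |= byte << (8 * i)
    return result


class KnownBits(NamedTuple):
    zero: int  # mask of bits known to be 0
    one: int   # mask of bits known to be 1


def prmt_known_bits(a: KnownBits, b: KnownBits, c: int) -> KnownBits:
    """Hypothetical sketch: propagate known bits through prmt byte by byte."""
    src_zero = ((b.zero & MASK32) << 32) | (a.zero & MASK32)
    src_one = ((b.one & MASK32) << 32) | (a.one & MASK32)
    zero = one = 0
    for i in range(4):
        sel = (c >> (4 * i)) & 0xF
        shift = 8 * (sel & 0x7)
        z = (src_zero >> shift) & 0xFF
        o = (src_one >> shift) & 0xFF
        if sel & 0x8:
            # The result byte is eight copies of the source byte's bit 7.
            z = 0xFF if z & 0x80 else 0x00
            o = 0xFF if o & 0x80 else 0x00
        zero |= z << (8 * i)
        one |= o << (8 * i)
    return KnownBits(zero, one)


# Zero-extend byte 2 of a: selector nibbles (low to high) are 2, 4, 4, 4.
# With b = 0, source byte 4 (b's low byte) is zero.
print(hex(prmt(0x11223344, 0, 0x4442)))  # 0x22

unknown = KnownBits(0, 0)
all_zero = KnownBits(MASK32, 0)
kb = prmt_known_bits(unknown, all_zero, 0x4442)
print(hex(kb.zero))  # 0xffffff00
```

Because the upper 24 bits of that result are known to be zero, a following `and 0xff` on it would be redundant; this is the kind of AND that KnownBits tracking for PRMT lets a compiler drop. Non-default `prmt` modes are not modeled here.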