llvm-project

Author	SHA1	Message	Date
Dávid Bolvanský	f02a0a69af	[NFCI] Fixed missing colon in CHECK directives	2022-04-03 11:52:38 +02:00
Daniil Kovalev	a8c277041a	[NVPTX] Fix poorly designed assertion introduced in D120129 NVPTXTargetLowering::getFunctionParamOptimizedAlign, which was introduced in D120129, contained a poorly designed assertion checking that a function with internal or private linkage is not a kernel. It relied on invariants that were not actually guaranteed, and that resulted in a compiler crash with some CUDA versions (see discussion with @jdoerfert in D120129). This patch changes that assertion and makes it use isKernelFunction which is designed exactly for such checks. This patch also includes a test with IR that caused a compiler crash before. Differential Revision: https://reviews.llvm.org/D122562	2022-03-28 17:34:58 +03:00
Daniil Kovalev	828b63c309	[NVPTX] Enhance vectorization of ld.param & st.param Since function parameters and return values are passed via param space, we can force special alignment for values held in it which will add vectorization options. This change may be done if the function has private or internal linkage. Special alignment is forced during 2 phases. 1) Instruction selection lowering. Here we use special alignment for function prototypes (changing both own return value and parameters alignment), call lowering (changing both callee's return value and parameters alignment). 2) IR pass nvptx-lower-args. Here we change alignment of byval parameters that belong to param space (or are cast to it). We only handle cases when all uses of such parameters are loads from it. For such loads, we can change the alignment according to special type alignment and the load offset. Then, load-store-vectorizer IR pass will perform vectorization where alignment allows it. Special alignment is calculated as the maximum of the default ABI type alignment and alignment 16. Alignment 16 is chosen because it's the maximum size of vectorized ld.param & st.param. Before specifying such special alignment, we should check if it is a multiple of the alignment that the type already has. For example, if a value has an enforced alignment of 64, default ABI alignment of 4 and special alignment of 16, we should preserve 64. This patch will be followed by a refactoring patch that removes duplicating code in handling byval and non-byval arguments. Differential Revision: https://reviews.llvm.org/D120129	2022-03-24 12:36:52 +03:00
Daniil Kovalev	a034878564	Revert "[NVPTX] Enhance vectorization of ld.param & st.param" This reverts commit f854434f0f2a01027bdaad8e6fdac5a782fce291. Placed URL to wrong differential revision in commit message.	2022-03-24 12:32:06 +03:00
Daniil Kovalev	f854434f0f	[NVPTX] Enhance vectorization of ld.param & st.param Since function parameters and return values are passed via param space, we can force special alignment for values held in it which will add vectorization options. This change may be done if the function has private or internal linkage. Special alignment is forced during 2 phases. 1) Instruction selection lowering. Here we use special alignment for function prototypes (changing both own return value and parameters alignment), call lowering (changing both callee's return value and parameters alignment). 2) IR pass nvptx-lower-args. Here we change alignment of byval parameters that belong to param space (or are cast to it). We only handle cases when all uses of such parameters are loads from it. For such loads, we can change the alignment according to special type alignment and the load offset. Then, load-store-vectorizer IR pass will perform vectorization where alignment allows it. Special alignment is calculated as the maximum of the default ABI type alignment and alignment 16. Alignment 16 is chosen because it's the maximum size of vectorized ld.param & st.param. Before specifying such special alignment, we should check if it is a multiple of the alignment that the type already has. For example, if a value has an enforced alignment of 64, default ABI alignment of 4 and special alignment of 16, we should preserve 64. This patch will be followed by a refactoring patch that removes duplicating code in handling byval and non-byval arguments. Differential Revision: https://reviews.llvm.org/D121549	2022-03-24 12:25:36 +03:00
Igor Kudrin	d7681d9f77	[NVPTX] Avoid a crash when 'llc' is called with '-filetype=null' For '-filetype=null', 'NVPTXTargetStreamer' is not created, so the return value of 'OutStreamer->getTargetStreamer()' should be checked before calling the methods. Differential Revision: https://reviews.llvm.org/D122001	2022-03-22 16:46:47 +04:00
Kristina Bessonova	57aaab3b17	[NVPTX] Fix nvvm.match.sync*.i64 intrinsics return type (i64 -> i32) NVVM IR specification defines them with i32 return type: declare i32 @llvm.nvvm.match.any.sync.i64(i32 %membermask, i64 %value) declare {i32, i1} @llvm.nvvm.match.all.sync.i64(i32 %membermask, i64 %value) ... The i32 return value is a 32-bit mask where bit position in mask corresponds to thread’s laneid. as well as PTX ISA: 9.7.12.8. Parallel Synchronization and Communication Instructions: match.sync match.any.sync.type d, a, membermask; match.all.sync.type d[|p], a, membermask; ... Destination d is a 32-bit mask where bit position in mask corresponds to thread’s laneid. Additionally, ptxas doesn't accept instructions produced by the NVPTX backend. After this patch, it compiles with no issues. Reviewed By: tra Differential Revision: https://reviews.llvm.org/D120499	2022-03-01 12:26:16 +02:00
Kristina Bessonova	3fe6f9388f	[NVPTX][AsmPrinter] Emit .attribute(.managed) for global variable declarations Declaration and definition attributes must match, otherwise it may cause issues on linking. Reviewed By: tra Differential Revision: https://reviews.llvm.org/D120493	2022-02-25 10:21:31 +02:00
Nicolas Miller	69a8350c23	[NVPTX] Add ex2.approx.f16/f16x2 support This patch adds builtins and intrinsics for the f16 and f16x2 variants of the ex2 instruction. These two variants were added in PTX7.0, and are supported by sm_75 and above. Note that this isn't wired with the exp2 llvm intrinsic because the ex2 instruction is only available in its approx variant. Running ptxas on the assembly generated by the test f16-ex2.ll works as expected. Differential Revision: https://reviews.llvm.org/D119157	2022-02-23 13:56:53 -08:00
Jakub Chlanda	be672934ff	[NVPTX] Add more FMA intrinsics/builtins This patch adds builtins/intrinsics for the following variants of FMA: - f16, f16x2 - rn - rn_ftz - rn_sat - rn_ftz_sat - rn_relu - rn_ftz_relu - bf16, bf16x2 - rn - rn_relu ptxas (Cuda compilation tools, release 11.0, V11.0.194) is happy with the generated assembly. Differential Revision: https://reviews.llvm.org/D118977	2022-02-23 13:56:53 -08:00
Jakub Chlanda	e0dc4ac28f	[NVPTX] Expose float tys min, max, abs, neg as builtins Adds support for the following builtins: - abs, neg: - .bf16, - .bf16x2 - min, max - {.ftz}{.NaN}{.xorsign.abs}.f16 - {.ftz}{.NaN}{.xorsign.abs}.f16x2 - {.NaN}{.xorsign.abs}.bf16 - {.NaN}{.xorsign.abs}.bf16x2 - {.ftz}{.NaN}{.xorsign.abs}.f32 Differential Revision: https://reviews.llvm.org/D117887	2022-02-23 13:56:53 -08:00
Dmitry Vassiliev	885140171a	[NVPTX] Fix NVPTXReplaceImageHandles for multiple uses of a texref The texsurf_handle is removed by NVPTXReplaceImageHandles.cpp. There are more than one uses of the texsurf_handle, one of them is a regular function call, and one of them is a texture intrinsic. The current hacky logic in NVPTXReplaceImageHandles.cpp for CUDA cannot handle such a mixed use. This patch fixes this issue. Reviewed By: tra Differential Revision: https://reviews.llvm.org/D119635	2022-02-15 01:30:13 +03:00
Dmitry Vassiliev	6645bfa8f5	[NVPTX] Fix bug with int_nvvm_rotate_b64 when operand immediate Need to subtract from 64, not 32. Reviewed By: tra Differential Revision: https://reviews.llvm.org/D119639	2022-02-15 01:23:11 +03:00
Fangrui Song	a00ae86ab2	Revert D119669 "[NVPTX] Prefix "$L__" for branch label names" This reverts commit cccef321096c20825fe8738045c1d91d3b9fd57d. Broke clang-cuda-t4 ``` /buildbot/cuda-t4-0/work/clang-cuda-t4/clang/bin/clang++ -DNDEBUG -O3 -DNDEBUG -w -Werror=date-time -UNDEBUG --cuda-path=/buildbot/cuda-t4-0/work/clang-cuda-t4/external/cuda/cuda-11.0 -I/buildbot/cuda-t4-0/work/clang-cuda-t4/external/cuda/cuda-11.0/include --cuda-gpu-arch=sm_75 -std=c++20 -stdlib=libstdc++ --gcc-toolchain=/buildbot/cuda-t4-0/work/clang-cuda-t4/external/cuda/gcc-8 -DSTDLIB_VERSION=2014 -MD -MT External/CUDA/CMakeFiles/complex-cuda-11.0-c++20-libstdc++-8.dir/complex.cu.o -MF External/CUDA/CMakeFiles/complex-cuda-11.0-c++20-libstdc++-8.dir/complex.cu.o.d -o External/CUDA/CMakeFiles/complex-cuda-11.0-c++20-libstdc++-8.dir/complex.cu.o -c /buildbot/cuda-t4-0/work/clang-cuda-t4/llvm-test-suite/External/CUDA/complex.cu ptxas /tmp/complex-cfa050/complex-sm_75.s, line 250; fatal : Parsing error near '$L__BB6_2': syntax error ptxas fatal : Ptx assembly aborted due to errors ```	2022-02-14 13:23:22 -08:00
Dmitry Vassiliev	cccef32109	[NVPTX] Prefix "$L__" for branch label names A global variable may have the same name as a label, and ptxas does not accept it. Prefix labels with $L__ to fix this. Reviewed By: MaskRay, tra Differential Revision: https://reviews.llvm.org/D119669	2022-02-14 23:51:36 +03:00
Nikita Popov	1c729d719a	[NVPTX] Use align attribute for kernel pointer arg alignment Instead of determining the alignment based on the pointer element type (which is incompatible with opaque pointers), make use of alignment annotations added by the frontend. In particular, clang will add alignment attributes to OpenCL kernels since D118894. Other frontends might need to be adjusted to add the attribute as well. Differential Revision: https://reviews.llvm.org/D119247	2022-02-10 11:56:48 +01:00
Daniil Kovalev	0f9109cc9d	[NVPTX] Eliminate StoreRetval instructions with undef operand Previously a lot of StoreRetval instructions with undef operand were generated on NVPTX target when a big struct was returned by value. It resulted in a lot of unneeded st.param.* instructions in final assembly. The patch solves the issue by implementing the logic in NVPTX-specific part of DAG combiner. Differential Revision: https://reviews.llvm.org/D118973	2022-02-10 11:39:43 +03:00
Christian Sigg	f7da4a5d4d	[NVPTX] Remove fmin/fmax.NaN.f64 again Added in https://reviews.llvm.org/D117204, but it does not exist. Reviewed By: tra Differential Revision: https://reviews.llvm.org/D118398	2022-01-28 07:46:16 +01:00
Christian Sigg	dc441d776f	[NVPTX] NFC: Remove unused arguments and attribute from test	2022-01-26 15:57:27 +01:00
Jack Kirk	bef3eb8344	[Clang][NVPTX]Add NVPTX intrinsics and builtins for CUDA PTX cvt sm80 instructions Adds NVPTX intrinsics and builtins for CUDA PTX cvt instructions for sm80 architectures and above. Requires ptx 7.0. PTX ISA description of cvt instructions : https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#data-movement-and-conversion-instructions-cvt Signed-off-by: JackAKirk <jack.kirk@codeplay.com> Differential Revision: https://reviews.llvm.org/D116673	2022-01-13 13:29:48 -08:00
Christian Sigg	ffee3b2f7a	[NVPTX] Add version test for sm_75, sm_80, sm_86. Combine the sm-version tests into a single file. Reviewed By: bkramer, tra Differential Revision: https://reviews.llvm.org/D117198	2022-01-13 20:24:09 +01:00
Christian Sigg	efb8d4cff3	[NVPTX] Add fmin/fmax.NaN lowering for sm_80+. Reviewed By: bkramer, tra Differential Revision: https://reviews.llvm.org/D117204	2022-01-13 20:22:41 +01:00
Christian Sigg	cc1b9acf55	[NVPTX] Lower fp16 fminnum, fmaxnum to native on sm_80. Reviewed By: bkramer, tra Differential Revision: https://reviews.llvm.org/D117122	2022-01-13 08:52:31 +01:00
Andrew Savonichev	e29ba97d23	[NVPTX] Auto-generate tests for surface and texture instructions The patch adds LIT tests for SULD, SUST, TEX and TLD4 instructions as a follow up for D112232. There are a number of FIXME marks that highlight possible bugs or missed instruction variants. Differential Revision: https://reviews.llvm.org/D114367	2021-12-07 15:27:51 +03:00
Andrew Savonichev	00aa0aeb06	[NVPTX] Add imm variants for surface and texture instructions Texture/sampler/surface operands can be either a register or an immediate (an index of .texref, .samplerref or .surfref). TableGen declarations for these instructions used to only have Int64Regs operands, so this caused issues when machine verifier is turned on: * Bad machine code: Expected a register operand. * - function: bar - basic block: %bb.0 (0x55b144d99ab8) - instruction: %4:int32regs = SULD_1D_I32_TRAP 0, killed %2:int32regs - operand 1: 0 The solution is to duplicate these instructions for all possible operand types (i16imm and Int64Regs). Since this would essentially double the amount code in TableGen, the patch also does some refactoring for the original instructions to keep things manageable. Differential Revision: https://reviews.llvm.org/D112232	2021-11-10 19:05:03 +03:00
Andrew Savonichev	123ad720f1	[NVPTX] Mark special registers as reserved A reserved register: - is not allocatable - is considered always live - is ignored by liveness tracking NVPTX special registers match the criteria, and marking them as reserved helps to avoid machine verifier error: * Bad machine code: Using an undefined physical register * - function: foo - basic block: %bb.0 (0x557bb178b708) - instruction: %0:int32regs = MOV_SPECIAL $envreg0 - operand 1: $envreg0 Differential Revision: https://reviews.llvm.org/D113008	2021-11-03 15:48:04 +03:00
Andrew Savonichev	0e70785538	[NVPTX] Add MoveParam instruction for TargetExternalSymbol operand TargetExternalSymbol is considered to be an immediate and not a register, so machine verifier emits an error: * Bad machine code: Expected a register operand. * - function: static_offset - basic block: %bb.0 bb (0x560e9b306028) - instruction: %3:int64regs = MoveParamI64 &static_offset_param_1 - operand 1: &static_offset_param_1 The patch adds variants of this instruction with an immediate operand for byval arguments on 64-bit and 32-bit targets. Differential Revision: https://reviews.llvm.org/D113006	2021-11-03 14:43:41 +03:00
Andrew Savonichev	30a3a17df8	[NVPTX] Copy machine operand flags in TII::insertBranch Before this patch, flags such as undef were dropped by TII::insertBranch (used by BranchFolding pass), resulting in the following error from machine verifier: * Bad machine code: Reading virtual register without a def * - function: hoge - basic block: %bb.0 bb (0x562e9c240e68) - instruction: CBranch %2:int1regs, %bb.3 - operand 0: %2:int1regs Differential Revision: https://reviews.llvm.org/D113001	2021-11-03 12:38:27 +03:00
Artem Belevich	b6b7fe60a4	[NVPTX] Add a late SROA pass which allows optimizing away more allocas. Fixes performance regression https://bugs.llvm.org/show_bug.cgi?id=52037 Differential Revision: https://reviews.llvm.org/D111471	2021-10-19 16:18:28 -07:00
Arthur Eubanks	15fefcb9eb	[opt] Directly translate -O# to -passes='default<O#>' Right now when we see -O# we add the corresponding 'default<O#>' into the list of passes to run when translating legacy -pass-name. This has the side effect of not using the default AA pipeline. Instead, treat -O# as -passes='default<O#>', but don't allow any other -passes or -pass-name. I think we can keep `opt -O#` as shorthand for `opt -passes='default<O#>` but disallow anything more than just -O#. Tests need to be updated to not use `opt -O# -pass-name`. Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D112036	2021-10-18 16:48:10 -07:00
Andrew Savonichev	51eefa8164	[NVPTX] Add VRFrame and VRFrameLocal to integer register classes These registers are used as operands for instructions that expect an integer register, so they should be added to Int32Regs or Int64Regs register classes. Otherwise the machine verifier emits an error for the following LIT tests when LLVM_ENABLE_MACHINE_VERIFIER=1 environment variable is set: * Bad machine code: Illegal physical register for instruction * - function: kernel_func - basic block: %bb.0 entry (0x55c8903d5438) - instruction: %3:int64regs = LEA_ADDRi64 $vrframelocal, 0 - operand 1: $vrframelocal $vrframelocal is not a Int64Regs register. CodeGen/NVPTX/call-with-alloca-buffer.ll CodeGen/NVPTX/disable-opt.ll CodeGen/NVPTX/lower-alloca.ll CodeGen/NVPTX/lower-args.ll CodeGen/NVPTX/param-align.ll CodeGen/NVPTX/reg-types.ll DebugInfo/NVPTX/dbg-declare-alloca.ll DebugInfo/NVPTX/dbg-value-const-byref.ll Differential Revision: https://reviews.llvm.org/D110164	2021-10-14 16:19:03 +03:00
Jake Egan	56049b7129	Fix tests defaulting to incorrect triples on AIX The tests only specify -march, so when the tests are run on AIX the target OS defaults to AIX, which causes the tests to misbehave. This patch constrains the tests by specifying -mtriple instead of -march. Reviewed By: daltenty, jsji, MaskRay Differential Revision: https://reviews.llvm.org/D110186	2021-09-27 11:30:45 -04:00
Artem Belevich	d99a83b4e5	[NVPTX] Simplify and generalize constant printer. This allows handling i128 values and fixes https://bugs.llvm.org/show_bug.cgi?id=51789. Differential Revision: https://reviews.llvm.org/D109458	2021-09-09 11:30:19 -07:00
Steffen Larsen	1b4c85fc02	[NVPTX] Add NVPTX intrinsics for CUDA PTX 6.5 ldmatrix instructions Adds NVPTX intrinsics for the CUDA PTX `ldmatrix.sync.aligned` instructions added in PTX 6.5. PTX ISA description of `ldmatrix.sync.aligned`: https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#warp-level-matrix-instructions-ldmatrix Authored-by: Steffen Larsen <steffen.larsen@codeplay.com> Reviewed By: tra Differential Revision: https://reviews.llvm.org/D107046	2021-08-06 16:13:35 -07:00
Eli Friedman	bdd55b2f18	Fix the default alignment of i1 vectors. Currently, the default alignment is much larger than the actual size of the vector in memory. Fix this to use a sane default. For SVE, temporarily remove lowering of load/store operations for predicates with less than 16 elements. The layout the backend was assuming for SVE predicates with less than 16 elements doesn't agree with the frontend. More work probably needs to be done here. This change is, strictly speaking, not backwards-compatible at the bitcode level. But probably nobody is actually depending on that; i1 vectors in memory are rare, and the code that does use them probably ends up forcing the alignment to something sane anyway. If we think this is a concern, I can restrict this to scalable vectors for now (where it's actually causing issues for me at the moment). Differential Revision: https://reviews.llvm.org/D88994	2021-07-31 14:09:59 -07:00
Simon Pilgrim	fcb710a7ad	[NVPTX] Add select(cc,binop(),binop()) fast-math tests As discussed on D106058 - we're not propagating the common flags to the merged binop	2021-07-18 15:30:24 +01:00
Artem Belevich	d774b4aa5e	[NVPTX, CUDA] Add .and.popc variant of the b1 MMA instruction. That should allow clang to compile mma.h from CUDA-11.3. Differential Revision: https://reviews.llvm.org/D105384	2021-07-15 12:02:09 -07:00
Simon Pilgrim	3cc38703d5	[NVPTX] Tweak fast-math tests to avoid select(binop(x,y),binop(x,z)) fold As suggested on D106058, tweak the tests to keep the combineRepeatedFPDivisors test coverage.	2021-07-15 15:42:25 +01:00
Simon Pilgrim	e21663d32b	[NVPTX] Add selp.f32 checks to select(cond,fpbinop(),fpbinop()) tests Will help show codegen diffs in an upcoming patch	2021-07-15 12:42:29 +01:00
Tom Stellard	7f1c077c30	tests/CodeGen: Use %python lit substitution when invoking python This will use the python that LLVM was configured to use rather than python from PATH. Reviewed By: serge-sans-paille Differential Revision: https://reviews.llvm.org/D105224	2021-07-06 18:46:36 -07:00
Steffen Larsen	3644726a78	[Clang][NVPTX] Add NVPTX intrinsics and builtins for CUDA PTX 6.5 and 7.0 WMMA and MMA instructions Adds NVPTX builtins and intrinsics for the CUDA PTX `wmma.load`, `wmma.store`, `wmma.mma`, and `mma` instructions added in PTX 6.5 and 7.0. PTX ISA description of - `wmma.load`: https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#warp-level-matrix-instructions-wmma-ld - `wmma.store`: https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#warp-level-matrix-instructions-wmma-st - `wmma.mma`: https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#warp-level-matrix-instructions-wmma-mma - `mma`: https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#warp-level-matrix-instructions-mma Overview of `wmma.mma` and `mma` matrix shape/type combinations added with specific PTX versions: https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#warp-level-matrix-shape Authored-by: Steffen Larsen <steffen.larsen@codeplay.com> Co-Authored-by: Stuart Adams <stuart.adams@codeplay.com> Reviewed By: tra Differential Revision: https://reviews.llvm.org/D104847	2021-06-29 15:44:07 -07:00
Bjorn Pettersson	4c7f820b2b	Update @llvm.powi to handle different int sizes for the exponent This can be seen as a follow up to commit 0ee439b705e82a4fe20e2, that changed the second argument of __powidf2, __powisf2 and __powitf2 in compiler-rt from si_int to int. That was to align with how those runtimes are defined in libgcc. One thing that seems to have been missing in that patch was to make sure that the rest of LLVM also handles that the argument now depends on the size of int (not using the si_int machine mode for 32-bit). When using __builtin_powi for a target with 16-bit int clang crashed. And when emitting libcalls to those rtlib functions (typically when lowering @llvm.powi), the backend would always prepare the exponent argument as an i32 which caused miscompiles when the rtlib was compiled with 16-bit int. The solution used here is to use an overloaded type for the second argument in @llvm.powi. This way clang can use the "correct" type when lowering __builtin_powi, and then later when emitting the libcall it is assumed that the type used in @llvm.powi matches the rtlib function. One thing that needed some extra attention was that when vectorizing calls several passes did not support that several arguments could be overloaded in the intrinsics. This patch allows overload of a scalar operand by adding hasVectorInstrinsicOverloadedScalarOpd, with an entry for powi. Differential Revision: https://reviews.llvm.org/D99439	2021-06-17 09:38:28 +02:00
serge-sans-paille	4ab3041acb	Revert "[NFC] remove explicit default value for strboolattr attribute in tests" This reverts commit bda6e5bee04c75b1f1332b4fd1ac4e8ef6c3c247. See https://lab.llvm.org/buildbot/#/builders/109/builds/15424 for instance	2021-05-24 19:43:40 +02:00
serge-sans-paille	bda6e5bee0	[NFC] remove explicit default value for strboolattr attribute in tests Since d6de1e1a71406c75a4ea4d5a2fe84289f07ea3a1, no attribute is equivalent to setting attribute to false. This is a preliminary commit for https://reviews.llvm.org/D99080	2021-05-24 19:31:04 +02:00
thomasraoux	505933a489	[NVPTX] Fix lowering of frem for negative values to match fmod frem result must have the dividend sign. Previous implementation had the wrong sign when passing negative numbers. For ex: frem(-16, 7) was returning 5 instead of -2. We should just use ftrunc instead of floor when lowering to get the right behavior. Differential Revision: https://reviews.llvm.org/D102528	2021-05-24 07:45:03 -07:00
Steffen Larsen	f226e28a88	[Clang][NVPTX] Add NVPTX intrinsics and builtins for CUDA PTX redux.sync instructions Adds NVPTX builtins and intrinsics for the CUDA PTX `redux.sync` instructions for `sm_80` architecture or newer. PTX ISA description of `redux.sync`: https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#parallel-synchronization-and-communication-instructions-redux-sync Authored-by: Steffen Larsen <steffen.larsen@codeplay.com> Differential Revision: https://reviews.llvm.org/D100124	2021-05-17 09:46:59 -07:00
Stuart Adams	02c2468864	[Clang][NVPTX] Add NVPTX intrinsics and builtins for CUDA PTX cp.async instructions Adds NVPTX builtins and intrinsics for the CUDA PTX `cp.async` instructions for `sm_80` architecture or newer. PTX ISA description of `cp.async`: https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#data-movement-and-conversion-instructions-asynchronous-copy https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#parallel-synchronization-and-communication-instructions-cp-async-mbarrier-arrive Authored-by: Stuart Adams <stuart.adams@codeplay.com> Co-Authored-by: Alexander Johnston <alexander@codeplay.com> Differential Revision: https://reviews.llvm.org/D100394	2021-05-17 09:46:59 -07:00
William S. Moses	7aa3cad46a	[NVPTX] Enable lowering of atomics on local memory LLVM does not have valid assembly backends for atomicrmw on local memory. However, as this memory is thread local, we should be able to lower this to the relevant load/store. Differential Revision: https://reviews.llvm.org/D98650	2021-04-26 20:12:12 -04:00
William S. Moses	8ede96493c	Revert "[NVPTX] Enable lowering of atomics on local memory" This reverts commit fede99d386ec9e7bab2762aa16cb10c0513ae464.	2021-04-26 19:33:01 -04:00
William S. Moses	fede99d386	[NVPTX] Enable lowering of atomics on local memory LLVM does not have valid assembly backends for atomicrmw on local memory. However, as this memory is thread local, we should be able to lower this to the relevant load/store. Differential Revision: https://reviews.llvm.org/D98650	2021-04-26 19:27:27 -04:00
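
The frem fix in 505933a489 above can be illustrated outside the compiler. The following is only a minimal sketch in Python, not the NVPTX lowering itself: it assumes the remainder is computed as x - y * round(x / y), and the helper names frem_floor and frem_trunc are hypothetical. It shows why rounding the quotient toward zero gives a result with the dividend's sign while rounding down does not.

```python
import math


def frem_floor(x: float, y: float) -> float:
    # Hypothetical helper: remainder with the quotient rounded down (floor).
    # For a negative dividend this yields a result with the divisor's sign.
    return x - y * math.floor(x / y)


def frem_trunc(x: float, y: float) -> float:
    # Hypothetical helper: remainder with the quotient rounded toward zero
    # (trunc). The result keeps the dividend's sign, like C's fmod.
    return x - y * math.trunc(x / y)


if __name__ == "__main__":
    print(frem_floor(-16.0, 7.0))  # 5.0, the wrong sign described in the commit
    print(frem_trunc(-16.0, 7.0))  # -2.0, matches fmod
    print(math.fmod(-16.0, 7.0))   # -2.0, reference value
```

The example values are the ones quoted in the commit message; math.fmod serves only as a reference for the expected sign.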

406 Commits