llvm-project

Author	SHA1	Message	Date
Benjamin Chetioui	2c3f82b775	[NVPTX] Fix NVPTX lowering of frem when denominator is infinite. `frem x, {+,-}inf` must return x to match the specification of LLVM's frem. Reviewed By: tra Differential Revision: https://reviews.llvm.org/D140846	2023-01-05 09:27:54 +01:00
Hugh Delaney	ce43e2f074	[llvm][CUDA] Allow NVVMReflect to process OpenCL-specific __nvvm_reflect_ocl() OpenCL requires constant string arguments to be in a particular address space, so OpenCL sources can't use the regular `__nvvm_reflect()`. Allow the NVVMReflect pass to accept an OpenCL-specific variant with a constant string in a non-default address space. Differential Revision: https://reviews.llvm.org/D139213	2023-01-04 12:03:00 -08:00
Dmitry Borisenkov	0ec51a460a	DAG: Prevent store value forwarding to distinct addrspace load DAGCombiner replaces (load const_addr1) directly chained with (store (val, const_addr2)) with val if address space stripped const_addr1 == const_addr2. The patch fixes the issue by checking address spaces as well. However, it might make sense not to chain together side effects that belong to different address spaces in the first place and make SelectionDAG::root address space aware.	2022-12-29 18:19:55 -05:00
Pavel Kopyl	fa023e0fe8	[NVPTX] Emit .noreturn directive Differential Revision: https://reviews.llvm.org/D140238	2022-12-28 21:45:51 +03:00
Nikita Popov	a2087a9c81	[NVPTX] Convert test to opaque pointers (NFC)	2022-12-22 14:04:41 +01:00
Nikita Popov	9b81548a68	[NVPTX] Convert some tests to opaque pointers (NFC)	2022-12-19 12:57:23 +01:00
Ron Lieberman	38f1abef86	Revert "[SelectionDAG] Do not second-guess alignment for alloca" Breaks amdgpu buildbot https://lab.llvm.org/buildbot/#/builders/193 23491 This reverts commit ffedf47d8b793e07317f82f9c2a5f5425ebb71ad.	2022-12-15 10:55:18 -06:00
Andrew Savonichev	ffedf47d8b	[SelectionDAG] Do not second-guess alignment for alloca Alignment of an alloca in IR can be lower than the preferred alignment on purpose, but this override essentially treats the preferred alignment as the minimum alignment. The patch changes this behavior to always use the specified alignment. If alignment is not set explicitly in LLVM IR, it is set to DL.getPrefTypeAlign(Ty) in computeAllocaDefaultAlign. Tests are changed as well: explicit alignment is increased to match the preferred alignment if it changes output, or omitted when it is hard to determine the right value (e.g. for pointers, some structs, or weird types). Differential Revision: https://reviews.llvm.org/D135462	2022-12-15 18:18:12 +03:00
Pavel Kopyl	619b7cecf3	[NVPTX] Backend support for variadic functions This patch adds lowering for function calls with a variadic number of arguments and enables support for the following instructions/intrinsics: - va_arg - va_start - va_end - va_copy Note that this patch doesn't intend to include clang's support for variadic functions for CUDA. According to the docs: PTX version 6.0 supports passing unsized array parameter to a function which can be used to implement variadic functions. [0] The last parameter in the parameter list may be a .param array of type .b8 with no size specified. It is used to pass an arbitrary number of parameters to the function packed into a single array object. When calling a function with such an unsized last argument, the last argument may be omitted from the call instruction if no parameter is passed through it. Accesses to this array parameter must be within the bounds of the array. The result of an access is undefined if no array was passed, or if the access was outside the bounds of the actual array being passed. [1] Note that aggregates passed by value as variadic arguments are not currently supported. [0] https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#variadic-functions [1] https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#kernel-and-function-directives-func Differential Revision: https://reviews.llvm.org/D138531	2022-12-13 19:07:43 +03:00
Roman Lebedev	62f91c1262	[NFC] Port codegen NVPTX tests that invoke opt to `-passes=` syntax	2022-12-09 01:04:47 +03:00
Roman Lebedev	b1a9584818	[opt] Disincentivize new tests from using old pass syntax Over the past day or so, I've taken a large swing at our tests, and reduced the number of tests that were still using the old syntax from ~1800 to just 200. Left to handle: (as it is seen in this patch) * Transforms/LSR * Transforms/CGP * Transforms/TypePromotion * Transforms/HardwareLoops * Analysis/* * some misc. I think this is the right point to start actively refusing to honor the old syntax, except for the old tests, to prevent the old syntax from creeping back in. Thus, let's add a temporary default-off flag, and if it is not passed, refuse to accept the old syntax. The tests that still need porting are annotated with this flag. Reviewed By: aeubanks Differential Revision: https://reviews.llvm.org/D139647	2022-12-08 23:54:03 +03:00
Andrew Savonichev	4f9321f92c	[NVPTX] Fix alignment for arguments of function pointer calls Alignment of function arguments can be increased only if we can do this for all call sites. Therefore we do not increase it for external functions, and now we skip functions that have address taken, to avoid any issues with functions pointers. Differential Revision: https://reviews.llvm.org/D135708	2022-11-15 21:43:06 +03:00
Andrew Savonichev	69e73d076b	[NVPTX] Fix pointer argument declaration for --nvptx-short-ptr When --nvptx-short-ptr is set, local pointers are stored as 32-bit on nvptx64 target. Before this patch, arguments for a function declaration were always emitted as b64 regardless of their address space, but they were set as b32 for the corresponding call instruction: .extern .func test ( .param .b64 test_param_0 ) [...] .param .b32 param0; st.param.b32 [param0+0], %r1; call.uni test, (param0); This is not supported: ptxas: Type of argument does not match formal parameter 'test_param_0' Now short pointers in a function declaration are emitted as b32 if --nvptx-short-ptr is set. Differential Revision: https://reviews.llvm.org/D135674	2022-11-15 21:41:33 +03:00
Andrew Savonichev	c38fa7c014	[NVPTX] Fix pointer type for short 32-bit pointers Global variables used to be printed as u64/b64 even when -nvptx-short-ptr is set. Differential Revision: https://reviews.llvm.org/D127668	2022-11-15 21:39:34 +03:00
Dmitry Vassiliev	c6a199fb4f	[NVPTX] Emit pragma nounroll for llvm.loop.unroll.count=1 Emit pragma nounroll for llvm.loop.unroll.count=1 (#pragma unroll 1). Reviewed By: tra Differential Revision: https://reviews.llvm.org/D137991	2022-11-15 04:30:00 +04:00
Artem Belevich	0e8a414ab3	[CUDA, NVPTX] Added basic __bf16 support for NVPTX. Recent Clang changes expose __bf16 types for SSE2-enabled host compilations and that makes those types visible during GPU-side compilation, where it currently fails with Sema complaining that __bf16 is not supported. Considering that __bf16 is a storage-only type, enabling it for NVPTX if it's enabled on the host should pose no issues, correctness-wise. Recent NVIDIA GPUs have introduced bf16 support, so we'll likely grow better support for __bf16 on NVPTX going forward. Differential Revision: https://reviews.llvm.org/D136311	2022-10-25 11:08:06 -07:00
Jakub Chlanda	8407fdbd69	[NVPTX] Support neg{.ftz} for f16 and f16x2 Differential Revision: https://reviews.llvm.org/D135428	2022-10-13 10:48:33 -07:00
Luke Drummond	940fa35ece	[NVPTX] Fix a segfault for bitcasted calls with byval params `getFunctionParamOptimizedAlign` was being passed a null function argument when getting the callee of a bitcasted function symbol. This is because `CallBase::getCalledFunction` does not look through bitcasts. There is already code to handle this case in `NVPTXTargetLowering::getArgumentAlignment`, which is now hoisted into an NVPTX util. The alignment computation now gracefully handles computing alignment of virtual functions with a check for null.	2022-10-11 15:12:25 +01:00
Andrew Savonichev	d420110a1e	[NVPTX] Fix constant expression initializers for global variables Before this patch the code in printScalarConstant was unable to handle nested constant expressions like (gep (addrspacecast ptr)) and crashed with: LLVM ERROR: Unsupported expression in static initializer: addrspacecast ([4 x i8] addrspace(1)* @ga to [4 x i8]*) We can use lowerConstantForGV instead which is a customized version of lowerConstant that supports generic() and nested expressions. Differential Revision: https://reviews.llvm.org/D127878	2022-10-04 00:29:42 +03:00
Andrew Savonichev	5585d99835	[NVPTX] Fix issues in ptxas integration to LIT tests 1) Fixed a typo in PTXAS_EXECUTABLE CMake variable (PXTAS -> PTXAS). 2) Version check was implemented incorrectly, now version (major, minor) is converted to int for comparison. 3) ptxas -arch argument was incorrect (or missing) in 3 tests. Differential Revision: https://reviews.llvm.org/D127866	2022-10-04 00:29:42 +03:00
Shivam Gupta	e2632fbcdd	[NVPTX] Use MBB.begin() instead of MBB.front() in NVPTXFrameLowering::emitPrologue The second argument of `NVPTXFrameLowering::emitPrologue(MachineFunction &MF, MachineBasicBlock &MBB)` is the first MBB of the MF. In that function, it assumes the first MBB always contains instructions, so it gets the first instruction by MachineInstr *MI = &MBB.front();. However, with the reproducer/test case attached, all instructions in the first MBB are cleared in a previous pass for stack coloring. As a consequence, MBB.front() triggers the assertion that the first node is actually a sentinel node. Hence we are using MachineBasicBlock::iterator to iterate over MBB. Fix #52623. Differential Revision: https://reviews.llvm.org/D132663	2022-09-14 08:30:55 +05:30
Craig Topper	efd5acf120	[LegalizeTypes][NVPTX] Remove extra compare from fallback code for ISD::ADD in ExpandIntRes_ADDSUB. This is the ultimate fallback code if UADDO isn't supported. If the target uses 0/1 we used one compare, but if the target doesn't use 0/1 we emitted two compares. Regardless of boolean constants we should only need to check that the Result is less than one of the original operands. So we only need one compare. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D133708	2022-09-13 09:07:56 -07:00
Benjamin Kramer	3ccaabe051	[NVPTX] Lower llvm.roundeven to cvt.rni	2022-08-25 13:36:22 +02:00
Dmitry Vassiliev	9174a5e9a8	[NVPTX] SHL.64 $r, 31 cannot be converted to a mulwide.s32 In order to convert to mulwide.s32, we compute the 2nd operand as MulWide.32 $r, (1 << 31). (1 << 31) is interpreted as a negative number, and is not equivalent to the original instruction. The code `int64_t r = (int64_t)a << 31;` incorrectly compiled to `mul.wide.s32 %rd7, %r1, -2147483648;` Reviewed By: jchlanda Differential Revision: https://reviews.llvm.org/D132516	2022-08-24 11:39:41 +02:00
Kjetil Kjeka	ff1920d106	[NVPTX] Promote i24, i40, i48 and i56 to next power-of-two register when passing Today llc will crash when attempting to use non-power-of-two integer types as function arguments or returns. This patch enables passing non standard integer values in functions by promoting them before store and truncating after load. The main motivation of implementing this change is that rust casts small structs (less than pointer size) into an integer of the same size. As an example, if a struct contains three u8 then it will be passed as an i24. This patch is a step towards enabling rust compilation to ptx while retaining the target independent optimizations. More context can be found in https://github.com/llvm/llvm-project/issues/55764 Differential Revision: https://reviews.llvm.org/D129291	2022-07-22 14:14:12 -07:00
Artem Belevich	35029d8374	Changed EOL to UNIX. NFC.	2022-07-22 14:11:36 -07:00
Igor Kudrin	32eed8828e	Reapply "[NVPTX] Use the mask() operator to initialize packed structs with pointers" The original patch revealed an issue of reading incorrect values on BE hosts. That is now changed to use `endian::read32le()` and `endian::read64le()`. Original commit message: The current implementation assumes that all pointers used in the initialization of an aggregate are aligned according to the pointer size of the target; that might not be so if the object is packed. In that case, an array of .u8 should be used and pointers should be decorated with the mask() operator. The operator was introduced in PTX ISA 7.1, so an error is issued if the case is detected for an earlier version. Differential Revision: https://reviews.llvm.org/D127504	2022-07-18 20:56:26 +04:00
Igor Kudrin	1e451369d2	Revert "[NVPTX] Use the mask() operator to initialize packed structs with pointers" The new test fails on BE hosts. This reverts commit 04e978ccba1e6c8b600b2fbad1a82b4b64ffc34b.	2022-07-18 20:08:39 +04:00
Igor Kudrin	04e978ccba	[NVPTX] Use the mask() operator to initialize packed structs with pointers The current implementation assumes that all pointers used in the initialization of an aggregate are aligned according to the pointer size of the target; that might not be so if the object is packed. In that case, an array of .u8 should be used and pointers should be decorated with the mask() operator. The operator was introduced in PTX ISA 7.1, so an error is issued if the case is detected for an earlier version. Differential Revision: https://reviews.llvm.org/D127504	2022-07-18 04:08:59 -07:00
Igor Kudrin	9ff10a0d62	[NVPTX] Add missing pass names Differential Revision:	2022-07-12 07:58:13 -07:00
Igor Kudrin	8958e70ccb	[NVPTX] Keep metadata attached to module-scope variables This helps to preserve the debug information of global variables. Differential Revision: https://reviews.llvm.org/D127510	2022-06-22 05:51:29 -07:00
Shilei Tian	ecf5b78053	[NVPTX] Enable AtomicExpandPass for NVPTX This patch enables `AtomicExpandPass` for NVPTX. Depend on D125652. Reviewed By: tra Differential Revision: https://reviews.llvm.org/D125639	2022-05-20 17:25:28 -04:00
Dmitry Vassiliev	2e7e0975c0	[NVPTX] Prefix "$L__" for branch label names A global variable may have the same name as a label, and ptxas does not accept it. Prefix labels with $L__ to fix this. Reviewed By: MaskRay, tra Differential Revision: https://reviews.llvm.org/D119669	2022-04-30 21:55:20 +02:00
Dmitry Vassiliev	8c49ab040c	[NVPTX] Add add.cc/addc.cc/sub.cc/subc.cc for i64 PTX supports those instructions for i64 starting from 4.3. The patch also marks corresponding DAG nodes legal for both i32 and i64. Reviewed By: tra Differential Revision: https://reviews.llvm.org/D124698	2022-04-29 15:32:22 -07:00
Andrew Savonichev	0f1b5f115a	[NVPTX] Integrate ptxas to LIT tests ptxas is a proprietary compiler from Nvidia that can compile PTX to machine code (SASS). It has a lot of diagnostics to catch errors in PTX, which can be used to verify PTX output from llc. Set -DPXTAS_EXECUTABLE=/path/to/ptxas CMake option to enable it. If this option is not set, then ptxas is substituted to true which effectively disables all ptxas RUN lines. LLVM_PTXAS_EXECUTABLE environment variable takes precedence over the CMake option, and allows to override ptxas executable that is used for LIT without complete re-configuration. Differential Revision: https://reviews.llvm.org/D121727	2022-04-28 14:59:45 +03:00
Igor Chebykin	84cf290c84	[NVPTX][tests] Do not run the tests which are not supported by nvptx Some generic tests are not currently supported by nvptx. Moreover, there are no plans to fix the tested features in nvptx. So, mark them as UNSUPPORTED. Differential Revision: https://reviews.llvm.org/D123928	2022-04-26 17:26:56 +03:00
Jakub Chlanda	76d1f5eaa8	[NVPTX] Support float <-> 2 x half bitcasts Make sure NVPTX backend can handle bitcasting between `float` and `<2 x half>` types. This was discovered through: https://github.com/intel/llvm/issues/5969 I'm not suggesting that such bitcasts make much sense, but it feels like the compiler should not hard crash on them. Reviewed By: tra Differential Revision: https://reviews.llvm.org/D124171	2022-04-25 14:37:41 -07:00
Artem Belevich	993054c1c9	Change NVPTX/f16x2-instructions.ll to use unix EOL. NFC	2022-04-25 14:30:23 -07:00
Daniil Kovalev	eb3d64695f	[NVPTX] Use opaque pointers in param space vectorization tests Opaque pointers are enabled by default since D123300, so test IR should be regenerated correspondingly. Differential Revision: https://reviews.llvm.org/D123842	2022-04-17 19:08:31 +03:00
Andrew Savonichev	52053aa94f	[NVPTX] Disable parens for identifiers starting with '$' ptxas fails to parse such syntax: mov.u64 %rd1, ($str); fatal : Parsing error near '$str': syntax error A new MCAsmInfo option was added because InParens parameter of MCExpr::print is not sufficient to disable parens completely. MCExpr::print resets it to false for a recursive call in case of unary or binary expressions. Targets that require parens around identifiers that start with '$' should always pass MCAsmInfo to MCExpr::print. Therefore 'operator<<(raw_ostream &, MCExpr&)' should be avoided because it calls MCExpr::print with nullptr MAI. Differential Revision: https://reviews.llvm.org/D123702	2022-04-17 18:02:33 +03:00
Andrew Savonichev	5193f2a558	Revert "[NVPTX] Disable parens for identifiers starting with '$'" This reverts commit 78d70a1c976934587e6d4c5698c348b8f09d9d96. Failed on Mips32: https://lab.llvm.org/buildbot#builders/109/builds/36628 # CHECK: # fixup A - offset: 0, value: ($tmp0), kind: fixup_Mips_26 <stdin>:580:2: note: possible intended match here # fixup A - offset: 0, value: $tmp0, kind: fixup_Mips_26	2022-04-14 21:25:31 +03:00
Andrew Savonichev	78d70a1c97	[NVPTX] Disable parens for identifiers starting with '$' ptxas fails to parse such syntax: mov.u64 %rd1, ($str); fatal : Parsing error near '$str': syntax error A new MCAsmInfo option was added because InParens parameter of MCExpr::print is not sufficient to disable parens completely. MCExpr::print resets it to false for a recursive call in case of unary or binary expressions. Differential Revision: https://reviews.llvm.org/D123702	2022-04-14 21:07:43 +03:00
Andrew Savonichev	b6183a57a1	[NVPTX] Fix barrier.ll LIT test The second parameter should be a multiple of the warp size (32). PTX ISA spec, s9.7.12.1. Parallel Synchronization and Communication Instructions: bar, barrier barrier.sync{.aligned} a{, b}; Operand b specifies the number of threads participating in the barrier. If no thread count is specified, all threads in the CTA participate in the barrier. When specifying a thread count, the value must be a multiple of the warp size. Differential Revision: https://reviews.llvm.org/D123470	2022-04-14 17:07:53 +03:00
Andrew Savonichev	32949401a8	[NVPTX] Avoid dots in global names It seems that ptxas cannot parse them: ptxas fatal: Parsing error near '.2': syntax error Differential Revision: https://reviews.llvm.org/D123041	2022-04-14 17:07:52 +03:00
Andrew Savonichev	4cef5c397d	[NVPTX] .attribute(.managed) is only supported for sm_30 and PTX 4.0 PTX ISA spec, s5.4.8. Variable Attribute Directive: .attribute PTX ISA Notes Introduced in PTX ISA version 4.0. Target ISA Notes .managed attribute requires sm_30 or higher. Differential Revision: https://reviews.llvm.org/D123040	2022-04-14 17:07:52 +03:00
Andrew Savonichev	230f326964	[NVPTX] shfl.sync is introduced in PTX 6.0 PTX ISA spec, s9.7.8.6. Data Movement and Conversion Instructions: shfl.sync PTX ISA Notes Introduced in PTX ISA version 6.0. Target ISA Notes Requires sm_30 or higher. Differential Revision: https://reviews.llvm.org/D123039	2022-04-14 17:07:51 +03:00
Andrew Savonichev	369adba043	[NVPTX] 64-bit atom.{and,or,xor,min,max} require sm_32 or higher PTX ISA spec, s9.7.12.4. Parallel Synchronization and Communication Instructions: atom Target ISA Notes 64-bit atom.{and,or,xor,min,max} require sm_32 or higher. Differential Revision: https://reviews.llvm.org/D123038	2022-04-14 17:07:51 +03:00
Johannes Doerfert	0f070bee82	[NVPTX][FIX] Allow __nvvm_reflect in the presence of opaque pointers Differential Revision: https://reviews.llvm.org/D123522	2022-04-12 16:42:50 -05:00
Matt Arsenault	9fdd25848a	Transforms: Fix code duplication between LowerAtomic and AtomicExpand	2022-04-08 19:06:36 -04:00
Dávid Bolvanský	f02a0a69af	[NFCI] Fixed missing colon in CHECK directives	2022-04-03 11:52:38 +02:00
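
The 2c3f82b775 entry states that `frem x, {+,-}inf` must return x. The sketch below illustrates why an infinite denominator is a special case. It assumes, purely for illustration, that the remainder is expanded as `x - trunc(x / y) * y`; the log does not spell out the actual lowering, and `naive_frem` and `fixed_frem` are hypothetical names, not LLVM functions. Only finite `x` and nonzero `y` are considered.

```python
import math


def naive_frem(x: float, y: float) -> float:
    """Assumed naive expansion: x - trunc(x / y) * y (finite x, nonzero y)."""
    quotient = x / y
    # For y = +/-inf the quotient is 0, and 0 * inf is NaN in IEEE 754.
    return x - float(math.trunc(quotient)) * y


def fixed_frem(x: float, y: float) -> float:
    """Same expansion, but returns x unchanged when the denominator is infinite."""
    if math.isinf(y):
        return x
    return naive_frem(x, y)


if __name__ == "__main__":
    for x, y in [(5.0, math.inf), (-5.0, -math.inf), (5.5, 2.0)]:
        print(x, y, naive_frem(x, y), fixed_frem(x, y), math.fmod(x, y))
```

Python's `math.fmod` is printed alongside as a reference, since it also returns x for a finite x and an infinite y.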

455 Commits
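
The 9174a5e9a8 entry explains that a 64-bit shift left by 31 cannot be rewritten as a signed 32-bit widening multiply, because `(1 << 31)` is read as a negative number. The following is a minimal sketch of that arithmetic in Python, not the backend code. The helpers `to_s32`, `to_s64`, `shl64` and `mul_wide_s32` are hypothetical names that only model two's-complement behaviour.

```python
def to_s32(value: int) -> int:
    """Interpret the low 32 bits of value as a signed 32-bit integer."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


def to_s64(value: int) -> int:
    """Interpret the low 64 bits of value as a signed 64-bit integer."""
    value &= 0xFFFFFFFFFFFFFFFF
    return value - (1 << 64) if value & 0x8000000000000000 else value


def shl64(a: int, shift: int) -> int:
    """Reference result: sign-extend a 32-bit value to 64 bits, then shift left."""
    return to_s64(to_s32(a) << shift)


def mul_wide_s32(a: int, b: int) -> int:
    """Model of a signed 32 x 32 -> 64-bit multiply; both operands are read as s32."""
    return to_s64(to_s32(a) * to_s32(b))


if __name__ == "__main__":
    a = 3
    print(to_s32(1 << 31))                          # the multiplier as the s32 operand sees it
    print(shl64(a, 30), mul_wide_s32(a, 1 << 30))   # shift by 30: both forms agree
    print(shl64(a, 31), mul_wide_s32(a, 1 << 31))   # shift by 31: the sign is flipped
```

In this model, shift amounts up to 30 keep the multiplier positive as a signed 32-bit value, which is why only the shift by 31 diverges.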