455 Commits

Author SHA1 Message Date
Benjamin Chetioui
2c3f82b775 [NVPTX] Fix NVPTX lowering of frem when denominator is infinite.
`frem x, {+,-}inf` must return x to match the specification of LLVM's frem.
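
As a rough illustration in Python (not the backend's code): assuming the lowering expands frem as x - trunc(x / y) * y, which this message does not spell out, an infinite y makes the product 0 * inf and the result NaN. The helper names below are hypothetical, and the sketch only covers finite x and non-zero y.

```python
import math


def frem_naive(x: float, y: float) -> float:
    # Assumed expansion: x - trunc(x / y) * y.
    # With y = +/-inf, x / y is 0 and 0 * inf is NaN.
    return x - float(math.trunc(x / y)) * y


def frem_fixed(x: float, y: float) -> float:
    # Special-case an infinite denominator: the remainder is x itself.
    if math.isinf(y):
        return x
    return frem_naive(x, y)
```

For finite x, `frem_fixed(x, math.inf)` agrees with `math.fmod(x, math.inf)`, while `frem_naive` yields NaN.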

Reviewed By: tra

Differential Revision: https://reviews.llvm.org/D140846
2023-01-05 09:27:54 +01:00
Hugh Delaney
ce43e2f074 [llvm][CUDA] Allow NVVMReflect to process OpenCL-specific __nvvm_reflect_ocl()
OpenCL requires constant string arguments to be in a particular address space,
so OpenCL sources can't use the regular `__nvvm_reflect()`.

Allow the NVVMReflect pass to accept an OpenCL-specific variant with a constant
string in a non-default address space.

Differential Revision: https://reviews.llvm.org/D139213
2023-01-04 12:03:00 -08:00
Dmitry Borisenkov
0ec51a460a DAG: Prevent store value forwarding to distinct addrspace load
DAGCombiner replaces (load const_addr1) directly chained with (store
(val, const_addr2)) with val if address space stripped const_addr1 ==
const_addr2. The patch fixes the issue by checking address spaces as
well. However, it might make sense not to chain together side
effects that belong to different address spaces in the first place and
to make SelectionDAG::root address space aware.
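
The check can be pictured with a small Python model (a sketch, not DAGCombiner code). Pointers are modeled as a hypothetical (address space, address) pair, and a stored value is forwarded to a load only when both components match.

```python
from typing import NamedTuple, Optional


class Ptr(NamedTuple):
    # Hypothetical model of a constant address in a given address space.
    addrspace: int
    address: int


def forward_store_value(load_ptr: Ptr, store_ptr: Ptr, value: int) -> Optional[int]:
    # Comparing only the address (the address-space-stripped pointer)
    # would be the behavior the commit describes as a bug.
    if load_ptr.address != store_ptr.address:
        return None
    # The added condition: the address spaces must match as well.
    if load_ptr.addrspace != store_ptr.addrspace:
        return None
    return value
```

Here `None` stands for "no forwarding; the load must be kept".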
2022-12-29 18:19:55 -05:00
Pavel Kopyl
fa023e0fe8 [NVPTX] Emit .noreturn directive
Differential Revision: https://reviews.llvm.org/D140238
2022-12-28 21:45:51 +03:00
Nikita Popov
a2087a9c81 [NVPTX] Convert test to opaque pointers (NFC) 2022-12-22 14:04:41 +01:00
Nikita Popov
9b81548a68 [NVPTX] Convert some tests to opaque pointers (NFC) 2022-12-19 12:57:23 +01:00
Ron Lieberman
38f1abef86 Revert "[SelectionDAG] Do not second-guess alignment for alloca"
Breaks amdgpu buildbot https://lab.llvm.org/buildbot/#/builders/193
 23491

This reverts commit ffedf47d8b793e07317f82f9c2a5f5425ebb71ad.
2022-12-15 10:55:18 -06:00
Andrew Savonichev
ffedf47d8b [SelectionDAG] Do not second-guess alignment for alloca
Alignment of an alloca in IR can be lower than the preferred alignment
on purpose, but this override essentially treats the preferred
alignment as the minimum alignment.

The patch changes this behavior to always use the specified
alignment. If alignment is not set explicitly in LLVM IR, it is set to
DL.getPrefTypeAlign(Ty) in computeAllocaDefaultAlign.

Tests are changed as well: explicit alignment is increased to match
the preferred alignment if it changes output, or omitted when it is
hard to determine the right value (e.g. for pointers, some structs, or
weird types).

Differential Revision: https://reviews.llvm.org/D135462
2022-12-15 18:18:12 +03:00
Pavel Kopyl
619b7cecf3 [NVPTX] Backend support for variadic functions
This patch adds lowering for function calls with variadic number of
arguments as well as enables support for the following
instructions/intrinsics:

  - va_arg
  - va_start
  - va_end
  - va_copy

Note that this patch does not intend to include clang's support for
variadic functions for CUDA.

According to the docs:

  PTX version 6.0 supports passing unsized array parameter to a
  function which can be used to implement variadic functions. [0]

  The last parameter in the parameter list may be a .param array of
  type .b8 with no size specified. It is used to pass an arbitrary
  number of parameters to the function packed into a single array
  object.

  When calling a function with such an unsized last argument, the last
  argument may be omitted from the call instruction if no parameter is
  passed through it.  Accesses to this array parameter must be within
  the bounds of the array.  The result of an access is undefined if no
  array was passed, or if the access was outside the bounds of the
  actual array being passed. [1]

Note that aggregates passed by value as variadic arguments are not
currently supported.

[0] https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#variadic-functions
[1] https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#kernel-and-function-directives-func

Differential Revision: https://reviews.llvm.org/D138531
2022-12-13 19:07:43 +03:00
Roman Lebedev
62f91c1262 [NFC] Port codegen NVPTX tests that invoke opt to -passes= syntax 2022-12-09 01:04:47 +03:00
Roman Lebedev
b1a9584818 [opt] Disincentivize new tests from using old pass syntax
Over the past day or so, I've taken a large swing at our tests,
and reduced the number of tests that were still using the old syntax
from ~1800 to just 200.

Left to handle (as seen in this patch):
* Transforms/LSR
* Transforms/CGP
* Transforms/TypePromotion
* Transforms/HardwareLoops
* Analysis/*
* some misc.

I think this is the right point to start actively refusing
to honor the old syntax, except for the old tests,
to prevent the old syntax from creeping back in.

Thus, let's add a temporary default-off flag,
and if it is not passed, refuse to accept the old syntax.
The tests that still need porting are annotated with this flag.

Reviewed By: aeubanks

Differential Revision: https://reviews.llvm.org/D139647
2022-12-08 23:54:03 +03:00
Andrew Savonichev
4f9321f92c [NVPTX] Fix alignment for arguments of function pointer calls
Alignment of function arguments can be increased only if we can do
this for all call sites. Therefore we do not increase it for external
functions, and now we also skip functions that have their address taken,
to avoid any issues with function pointers.

Differential Revision: https://reviews.llvm.org/D135708
2022-11-15 21:43:06 +03:00
Andrew Savonichev
69e73d076b [NVPTX] Fix pointer argument declaration for --nvptx-short-ptr
When --nvptx-short-ptr is set, local pointers are stored as 32-bit on
nvptx64 target.

Before this patch, arguments for a function declaration were always
emitted as b64 regardless of their address space, but they were set as
b32 for the corresponding call instruction:

   .extern .func test
   (
    .param .b64 test_param_0
   )
   [...]
    .param .b32 param0;
    st.param.b32 [param0+0], %r1;
    call.uni test, (param0);

This is not supported:

  ptxas: Type of argument does not match formal parameter
  'test_param_0'

Now short pointers in a function declaration are emitted as b32 if
--nvptx-short-ptr is set.

Differential Revision: https://reviews.llvm.org/D135674
2022-11-15 21:41:33 +03:00
Andrew Savonichev
c38fa7c014 [NVPTX] Fix pointer type for short 32-bit pointers
Global variables used to be printed as u64/b64 even when
-nvptx-short-ptr is set.

Differential Revision: https://reviews.llvm.org/D127668
2022-11-15 21:39:34 +03:00
Dmitry Vassiliev
c6a199fb4f [NVPTX] Emit pragma nounroll for llvm.loop.unroll.count=1
Emit pragma nounroll for llvm.loop.unroll.count=1 (#pragma unroll 1).

Reviewed By: tra

Differential Revision: https://reviews.llvm.org/D137991
2022-11-15 04:30:00 +04:00
Artem Belevich
0e8a414ab3 [CUDA, NVPTX] Added basic __bf16 support for NVPTX.
Recent Clang changes expose __bf16 types for SSE2-enabled host compilations and
that makes those types visible during GPU-side compilation, where it currently
fails with Sema complaining that __bf16 is not supported.

Considering that __bf16 is a storage-only type, enabling it for NVPTX if it's
enabled on the host should pose no issues, correctness-wise.

Recent NVIDIA GPUs have introduced bf16 support, so we'll likely grow better
support for __bf16 on NVPTX going forward.

Differential Revision: https://reviews.llvm.org/D136311
2022-10-25 11:08:06 -07:00
Jakub Chlanda
8407fdbd69 [NVPTX] Support neg{.ftz} for f16 and f16x2
Differential Revision: https://reviews.llvm.org/D135428
2022-10-13 10:48:33 -07:00
Luke Drummond
940fa35ece [NVPTX] Fix a segfault for bitcasted calls with byval params
`getFunctionParamOptimizedAlign` was being passed a null function
argument when getting the callee of a bitcasted function symbol. This is
because `CallBase::getCalledFunction` does not look through bitcasts.

There is already code to handle this case in
`NVPTXTargetLowering::getArgumentAlignment`, which is now hoisted into
an NVPTX util.

The alignment computation now gracefully handles computing alignment of
virtual functions with a check for null.
2022-10-11 15:12:25 +01:00
Andrew Savonichev
d420110a1e [NVPTX] Fix constant expression initializers for global variables
Before this patch the code in printScalarConstant was unable to handle
nested constant expressions like (gep (addrspacecast ptr)) and crashed
with:

LLVM ERROR: Unsupported expression in static initializer:
  addrspacecast ([4 x i8] addrspace(1)* @ga to [4 x i8]*)

We can use lowerConstantForGV instead which is a customized version of
lowerConstant that supports generic() and nested expressions.

Differential Revision: https://reviews.llvm.org/D127878
2022-10-04 00:29:42 +03:00
Andrew Savonichev
5585d99835 [NVPTX] Fix issues in ptxas integration to LIT tests
1) Fixed a typo in PTXAS_EXECUTABLE CMake variable (PXTAS -> PTXAS).

2) Version check was implemented incorrectly,
   now version (major, minor) is converted to int for comparison.

3) ptxas -arch argument was incorrect (or missing) in 3 tests.
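
The message does not show the faulty check from item 2. One assumed way such a check goes wrong is comparing version strings lexicographically; the sketch below contrasts that with comparing integer (major, minor) tuples. The function names are hypothetical.

```python
def parse_version(text: str) -> tuple:
    # Convert "major.minor" into a tuple of ints so that the
    # comparison is numeric rather than character by character.
    major, minor = text.split(".")[:2]
    return int(major), int(minor)


def version_at_least(actual: str, required: str) -> bool:
    return parse_version(actual) >= parse_version(required)


def version_at_least_as_strings(actual: str, required: str) -> bool:
    # Assumed buggy variant: "11.4" sorts before "9.2" as a string.
    return actual >= required
```

With these definitions, `version_at_least("11.4", "9.2")` holds while the string-based variant rejects the same pair.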

Differential Revision: https://reviews.llvm.org/D127866
2022-10-04 00:29:42 +03:00
Shivam Gupta
e2632fbcdd [NVPTX] Use MBB.begin() instead of MBB.front() in NVPTXFrameLowering::emitPrologue
The second argument of `NVPTXFrameLowering::emitPrologue(MachineFunction &MF, MachineBasicBlock &MBB)`
is the first MBB of the MF. The function assumes that the first MBB always
contains instructions, so it gets the first instruction with
`MachineInstr *MI = &MBB.front();`. However, with the attached
reproducer/test case, all instructions in the first MBB are removed by an
earlier stack coloring pass. As a consequence, MBB.front() trips an
assertion because the first node is actually the sentinel node. Hence we
now use MachineBasicBlock::iterator to iterate over the MBB.

Fix #52623.

Differential Revision: https://reviews.llvm.org/D132663
2022-09-14 08:30:55 +05:30
Craig Topper
efd5acf120 [LegalizeTypes][NVPTX] Remove extra compare from fallback code for ISD::ADD in ExpandIntRes_ADDSUB.
This is the ultimate fallback code if UADDO isn't supported.

If the target uses 0/1 we used one compare, but if the target doesn't
use 0/1 we emitted two compares. Regardless of boolean constants we
should only need to check that the Result is less than one of the
original operands. So we only need one compare.
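
A minimal sketch of the single-compare idea, assuming a 64-bit add split into 32-bit halves (the widths and the function name are illustrative, not taken from the patch): the carry out of the low half is set exactly when the wrapped low result is unsigned-less-than one of the low operands.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def add64_expanded(a: int, b: int) -> int:
    # Split both 64-bit operands into 32-bit halves.
    a_lo, a_hi = a & MASK32, (a >> 32) & MASK32
    b_lo, b_hi = b & MASK32, (b >> 32) & MASK32

    lo = (a_lo + b_lo) & MASK32
    # One unsigned compare recovers the carry: the low sum wrapped
    # around if and only if it is smaller than an original operand.
    carry = 1 if lo < a_lo else 0
    hi = (a_hi + b_hi + carry) & MASK32
    return (hi << 32) | lo
```

The result matches `(a + b) & MASK64` for any pair of 64-bit inputs.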

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D133708
2022-09-13 09:07:56 -07:00
Benjamin Kramer
3ccaabe051 [NVPTX] Lower llvm.roundeven to cvt.rni 2022-08-25 13:36:22 +02:00
Dmitry Vassiliev
9174a5e9a8 [NVPTX] SHL.64 $r, 31 cannot be converted to a mulwide.s32
In order to convert to mulwide.s32, we compute the 2nd operand as MulWide.32 $r, (1 << 31).
(1 << 31) is interpreted as a negative number, and is not equivalent to the original instruction.
The code `int64_t r = (int64_t)a << 31;` was incorrectly compiled to `mul.wide.s32 %rd7, %r1, -2147483648;`
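
The mismatch can be reproduced with plain integer arithmetic. The Python sketch below models a signed 32x32 -> 64 widening multiply and a 64-bit shift; the helper names are hypothetical and only mimic the semantics described above.

```python
def to_signed(value: int, bits: int) -> int:
    # Interpret the low `bits` bits of value as a two's-complement integer.
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def shl64(a: int, amount: int) -> int:
    # Models `(int64_t)a << amount` for a 32-bit signed a.
    return to_signed(to_signed(a, 32) << amount, 64)


def mul_wide_s32(a: int, b: int) -> int:
    # Models a signed 32x32 -> 64 multiply: both operands are read as s32.
    return to_signed(to_signed(a, 32) * to_signed(b, 32), 64)
```

For a shift by 30 the two agree, but `1 << 31` read as a signed 32-bit value is -2147483648, so `mul_wide_s32(3, 1 << 31)` gives -6442450944 while `shl64(3, 31)` gives 6442450944.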

Reviewed By: jchlanda

Differential Revision: https://reviews.llvm.org/D132516
2022-08-24 11:39:41 +02:00
Kjetil Kjeka
ff1920d106 [NVPTX] Promote i24, i40, i48 and i56 to next power-of-two register when passing
Today llc will crash when attempting to use non-power-of-two integer types as
function arguments or returns. This patch enables passing non-standard integer
values in functions by promoting them before store and truncating after load.

The main motivation for implementing this change is that Rust casts small structs
(less than pointer size) into an integer of the same size. As an example, if a
struct contains three u8 then it will be passed as an i24. This patch is a step
towards enabling Rust compilation to PTX while retaining the target-independent
optimizations.
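
A small Python model of the promote/truncate round trip (a sketch under assumptions: zero extension and an 8-bit minimum width are illustrative choices, and the function names are hypothetical).

```python
def promoted_width(bits: int) -> int:
    # Next power-of-two width that can hold `bits` bits (assumed minimum of 8).
    width = 8
    while width < bits:
        width *= 2
    return width


def promote(value: int, bits: int) -> int:
    # Zero-extend the iN value before it is stored in the wider register.
    return value & ((1 << bits) - 1)


def truncate(value: int, bits: int) -> int:
    # Drop the padding bits again after the load.
    return value & ((1 << bits) - 1)
```

For the three-u8 struct mentioned above, `promoted_width(24)` is 32, and i40, i48 and i56 all map to 64.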

More context can be found in https://github.com/llvm/llvm-project/issues/55764

Differential Revision: https://reviews.llvm.org/D129291
2022-07-22 14:14:12 -07:00
Artem Belevich
35029d8374 Changed EOL to UNIX. NFC. 2022-07-22 14:11:36 -07:00
Igor Kudrin
32eed8828e Reapply "[NVPTX] Use the mask() operator to initialize packed structs with pointers"
The original patch revealed an issue of reading incorrect values on BE hosts.
That is now changed to use `endian::read32le()` and `endian::read64le()`.

Original commit message:

The current implementation assumes that all pointers used in the
initialization of an aggregate are aligned according to the pointer size
of the target; that might not be so if the object is packed. In that
case, an array of .u8 should be used and pointers should be decorated
with the mask() operator.

The operator was introduced in PTX ISA 7.1, so an error is issued if the
case is detected for an earlier version.

Differential Revision: https://reviews.llvm.org/D127504
2022-07-18 20:56:26 +04:00
Igor Kudrin
1e451369d2 Revert "[NVPTX] Use the mask() operator to initialize packed structs with pointers"
The new test fails on BE hosts.

This reverts commit 04e978ccba1e6c8b600b2fbad1a82b4b64ffc34b.
2022-07-18 20:08:39 +04:00
Igor Kudrin
04e978ccba [NVPTX] Use the mask() operator to initialize packed structs with pointers
The current implementation assumes that all pointers used in the
initialization of an aggregate are aligned according to the pointer size
of the target; that might not be so if the object is packed. In that
case, an array of .u8 should be used and pointers should be decorated
with the mask() operator.

The operator was introduced in PTX ISA 7.1, so an error is issued if the
case is detected for an earlier version.

Differential Revision: https://reviews.llvm.org/D127504
2022-07-18 04:08:59 -07:00
Igor Kudrin
9ff10a0d62 [NVPTX] Add missing pass names
Differential Revision:
2022-07-12 07:58:13 -07:00
Igor Kudrin
8958e70ccb [NVPTX] Keep metadata attached to module-scope variables
This helps to preserve the debug information of global variables.

Differential Revision: https://reviews.llvm.org/D127510
2022-06-22 05:51:29 -07:00
Shilei Tian
ecf5b78053 [NVPTX] Enable AtomicExpandPass for NVPTX
This patch enables `AtomicExpandPass` for NVPTX.

Depends on D125652.

Reviewed By: tra

Differential Revision: https://reviews.llvm.org/D125639
2022-05-20 17:25:28 -04:00
Dmitry Vassiliev
2e7e0975c0 [NVPTX] Prefix "$L__" for branch label names
A global variable may have the same name as a label, and ptxas does not accept it.
Prefix labels with $L__ to fix this.

Reviewed By: MaskRay, tra

Differential Revision: https://reviews.llvm.org/D119669
2022-04-30 21:55:20 +02:00
Dmitry Vassiliev
8c49ab040c [NVPTX] Add add.cc/addc.cc/sub.cc/subc.cc for i64
PTX supports those instructions for i64 starting from 4.3.
The patch also marks corresponding DAG nodes legal for both i32 and i64.

Reviewed By: tra

Differential Revision: https://reviews.llvm.org/D124698
2022-04-29 15:32:22 -07:00
Andrew Savonichev
0f1b5f115a [NVPTX] Integrate ptxas to LIT tests
ptxas is a proprietary compiler from Nvidia that can compile PTX to
machine code (SASS). It has a lot of diagnostics to catch errors
in PTX, which can be used to verify PTX output from llc.

Set -DPXTAS_EXECUTABLE=/path/to/ptxas CMake option to enable it.
If this option is not set, ptxas is substituted with `true`, which
effectively disables all ptxas RUN lines.

The LLVM_PTXAS_EXECUTABLE environment variable takes precedence over
the CMake option and allows overriding the ptxas executable used by LIT
without complete reconfiguration.

Differential Revision: https://reviews.llvm.org/D121727
2022-04-28 14:59:45 +03:00
Igor Chebykin
84cf290c84 [NVPTX][tests] Do not run the tests which are not supported by nvptx
Some generic tests are not currently supported by NVPTX. Moreover, there
are no plans to fix the tested features in NVPTX, so mark
them as UNSUPPORTED.

Differential Revision: https://reviews.llvm.org/D123928
2022-04-26 17:26:56 +03:00
Jakub Chlanda
76d1f5eaa8 [NVPTX] Support float <-> 2 x half bitcasts
Make sure NVPTX backend can handle bitcasting between `float` and `<2 x half>` types.

This was discovered through: https://github.com/intel/llvm/issues/5969
I'm not suggesting that such bitcasts make much sense, but it feels like the compiler should not hard crash on them.

Reviewed By: tra

Differential Revision: https://reviews.llvm.org/D124171
2022-04-25 14:37:41 -07:00
Artem Belevich
993054c1c9 Change NVPTX/f16x2-instructions.ll to use unix EOL. NFC 2022-04-25 14:30:23 -07:00
Daniil Kovalev
eb3d64695f [NVPTX] Use opaque pointers in param space vectorization tests
Opaque pointers are enabled by default since D123300, so test IR should be
regenerated correspondingly.

Differential Revision: https://reviews.llvm.org/D123842
2022-04-17 19:08:31 +03:00
Andrew Savonichev
52053aa94f [NVPTX] Disable parens for identifiers starting with '$'
ptxas fails to parse such syntax:

    mov.u64 %rd1, ($str);
    fatal   : Parsing error near '$str': syntax error

A new MCAsmInfo option was added because InParens parameter of
MCExpr::print is not sufficient to disable parens
completely. MCExpr::print resets it to false for a recursive call in
case of unary or binary expressions.

Targets that require parens around identifiers that start with '$'
should always pass MCAsmInfo to MCExpr::print.
Therefore 'operator<<(raw_ostream &, MCExpr&)' should be avoided
because it calls MCExpr::print with nullptr MAI.

Differential Revision: https://reviews.llvm.org/D123702
2022-04-17 18:02:33 +03:00
Andrew Savonichev
5193f2a558 Revert "[NVPTX] Disable parens for identifiers starting with '$'"
This reverts commit 78d70a1c976934587e6d4c5698c348b8f09d9d96.

Failed on Mips32:
https://lab.llvm.org/buildbot#builders/109/builds/36628

   # CHECK: # fixup A - offset: 0, value: ($tmp0), kind: fixup_Mips_26
   <stdin>:580:2: note: possible intended match here
   # fixup A - offset: 0, value: $tmp0, kind: fixup_Mips_26
2022-04-14 21:25:31 +03:00
Andrew Savonichev
78d70a1c97 [NVPTX] Disable parens for identifiers starting with '$'
ptxas fails to parse such syntax:

    mov.u64 %rd1, ($str);
    fatal   : Parsing error near '$str': syntax error

A new MCAsmInfo option was added because InParens parameter of
MCExpr::print is not sufficient to disable parens
completely. MCExpr::print resets it to false for a recursive call in
case of unary or binary expressions.

Differential Revision: https://reviews.llvm.org/D123702
2022-04-14 21:07:43 +03:00
Andrew Savonichev
b6183a57a1 [NVPTX] Fix barrier.ll LIT test
The second parameter should be a multiple of the warp size (32).

PTX ISA spec, s9.7.12.1. Parallel Synchronization and Communication
Instructions: bar, barrier

barrier.sync{.aligned}      a{, b};

Operand b specifies the number of threads participating in the
barrier. If no thread count is specified, all threads in the CTA
participate in the barrier. When specifying a thread count, the value
must be a multiple of the warp size.

Differential Revision: https://reviews.llvm.org/D123470
2022-04-14 17:07:53 +03:00
Andrew Savonichev
32949401a8 [NVPTX] Avoid dots in global names
It seems that ptxas cannot parse them:
ptxas fatal: Parsing error near '.2': syntax error

Differential Revision: https://reviews.llvm.org/D123041
2022-04-14 17:07:52 +03:00
Andrew Savonichev
4cef5c397d [NVPTX] .attribute(.managed) is only supported for sm_30 and PTX 4.0
PTX ISA spec, s5.4.8. Variable Attribute Directive: .attribute

PTX ISA Notes
Introduced in PTX ISA version 4.0.

Target ISA Notes
.managed attribute requires sm_30 or higher.

Differential Revision: https://reviews.llvm.org/D123040
2022-04-14 17:07:52 +03:00
Andrew Savonichev
230f326964 [NVPTX] shfl.sync is introduced in PTX 6.0
PTX ISA spec, s9.7.8.6. Data Movement and Conversion Instructions:
shfl.sync

PTX ISA Notes
Introduced in PTX ISA version 6.0.

Target ISA Notes
Requires sm_30 or higher.

Differential Revision: https://reviews.llvm.org/D123039
2022-04-14 17:07:51 +03:00
Andrew Savonichev
369adba043 [NVPTX] 64-bit atom.{and,or,xor,min,max} require sm_32 or higher
PTX ISA spec, s9.7.12.4. Parallel Synchronization and Communication
Instructions: atom

Target ISA Notes
64-bit atom.{and,or,xor,min,max} require sm_32 or higher.

Differential Revision: https://reviews.llvm.org/D123038
2022-04-14 17:07:51 +03:00
Johannes Doerfert
0f070bee82 [NVPTX][FIX] Allow __nvvm_reflect in the presence of opaque pointers
Differential Revision: https://reviews.llvm.org/D123522
2022-04-12 16:42:50 -05:00
Matt Arsenault
9fdd25848a Transforms: Fix code duplication between LowerAtomic and AtomicExpand 2022-04-08 19:06:36 -04:00
Dávid Bolvanský
f02a0a69af [NFCI] Fixed missing colon in CHECK directives 2022-04-03 11:52:38 +02:00