NVPTXTargetLowering::getFunctionParamOptimizedAlign, which was introduced in
D120129, contained a poorly designed assertion checking that a function with
internal or private linkage is not a kernel. It relied on invariants that
were not actually guaranteed, and that resulted in a compiler crash with some
CUDA versions (see discussion with @jdoerfert in D120129). This patch changes
that assertion to use isKernelFunction, which is designed exactly for such
checks. This patch also includes a test with IR that caused the compiler crash
before.
Differential Revision: https://reviews.llvm.org/D122562
Since function parameters and return values are passed via param space, we
can force a special alignment for values held in it, which opens up more
vectorization opportunities. This change may be done only if the function has
private or internal linkage. Special alignment is forced in two phases.
1) Instruction selection lowering. Here we use special alignment for function
prototypes (changing the alignment of both the function's own return value and
its parameters) and for call lowering (changing the alignment of both the
callee's return value and its parameters).
2) IR pass nvptx-lower-args. Here we change the alignment of byval parameters
that belong to param space (or are cast to it). We only handle cases where all
uses of such parameters are loads from it. For such loads, we can change the
alignment according to the special type alignment and the load offset. Then,
the load-store-vectorizer IR pass will perform vectorization where alignment
allows it.
The special alignment is calculated as the maximum of the default ABI type
alignment and 16. Alignment 16 is chosen because 16 bytes is the maximum size
of a vectorized ld.param or st.param.
Before specifying such special alignment, we should check if it is a multiple
of the alignment that the type already has. For example, if a value has an
enforced alignment of 64, default ABI alignment of 4 and special alignment
of 16, we should preserve 64.
This patch will be followed by a refactoring patch that removes duplicated
code in the handling of byval and non-byval arguments.
Differential Revision: https://reviews.llvm.org/D120129
Since function parameters and return values are passed via param space, we
can force a special alignment for values held in it, which opens up more
vectorization opportunities. This change may be done only if the function has
private or internal linkage. Special alignment is forced in two phases.
1) Instruction selection lowering. Here we use special alignment for function
prototypes (changing the alignment of both the function's own return value and
its parameters) and for call lowering (changing the alignment of both the
callee's return value and its parameters).
2) IR pass nvptx-lower-args. Here we change the alignment of byval parameters
that belong to param space (or are cast to it). We only handle cases where all
uses of such parameters are loads from it. For such loads, we can change the
alignment according to the special type alignment and the load offset. Then,
the load-store-vectorizer IR pass will perform vectorization where alignment
allows it.
The special alignment is calculated as the maximum of the default ABI type
alignment and 16. Alignment 16 is chosen because 16 bytes is the maximum size
of a vectorized ld.param or st.param.
Before specifying such special alignment, we should check if it is a multiple
of the alignment that the type already has. For example, if a value has an
enforced alignment of 64, default ABI alignment of 4 and special alignment
of 16, we should preserve 64.
This patch will be followed by a refactoring patch that removes duplicated
code in the handling of byval and non-byval arguments.
Differential Revision: https://reviews.llvm.org/D121549
For '-filetype=null', 'NVPTXTargetStreamer' is not created, so the
return value of 'OutStreamer->getTargetStreamer()' must be checked for
null before calling its methods.
Differential Revision: https://reviews.llvm.org/D122001
The NVVM IR specification defines them with an i32 return type:
declare i32 @llvm.nvvm.match.any.sync.i64(i32 %membermask, i64 %value)
declare {i32, i1} @llvm.nvvm.match.all.sync.i64(i32 %membermask, i64 %value)
...
The i32 return value is a 32-bit mask where bit position in mask corresponds
to thread’s laneid.
The PTX ISA specifies the same:
9.7.12.8. Parallel Synchronization and Communication Instructions: match.sync
match.any.sync.type d, a, membermask;
match.all.sync.type d[|p], a, membermask;
...
Destination d is a 32-bit mask where bit position in mask corresponds
to thread’s laneid.
Additionally, ptxas did not accept the instructions produced by the NVPTX
backend. After this patch, they compile with no issues.
Reviewed By: tra
Differential Revision: https://reviews.llvm.org/D120499
Declaration and definition attributes must match; otherwise, the mismatch
may cause issues at link time.
Reviewed By: tra
Differential Revision: https://reviews.llvm.org/D120493
This patch adds builtins and intrinsics for the f16 and f16x2 variants of the ex2
instruction.
These two variants were added in PTX 7.0 and are supported on sm_75 and above.
Note that they are not wired to the llvm.exp2 intrinsic because the ex2
instruction is only available in its approx variant.
Running ptxas on the assembly generated by the test f16-ex2.ll works as
expected.
Differential Revision: https://reviews.llvm.org/D119157
The texsurf_handle is removed by NVPTXReplaceImageHandles.cpp. A texsurf_handle can have more than one use: for example, one use may be a regular function call and another a texture intrinsic.
The existing ad hoc logic in NVPTXReplaceImageHandles.cpp for CUDA cannot handle such mixed uses. This patch fixes this issue.
Reviewed By: tra
Differential Revision: https://reviews.llvm.org/D119635
A global variable may have the same name as a label, which ptxas does not accept.
Prefix labels with $L__ to fix this.
Reviewed By: MaskRay, tra
Differential Revision: https://reviews.llvm.org/D119669
Instead of determining the alignment based on the pointer element
type (which is incompatible with opaque pointers), make use of
alignment annotations added by the frontend.
In particular, clang will add alignment attributes to OpenCL kernels
since D118894. Other frontends might need to be adjusted to add
the attribute as well.
Differential Revision: https://reviews.llvm.org/D119247
Previously, many StoreRetval instructions with undef operands were
generated on the NVPTX target when a big struct was returned by value.
This resulted in many unneeded st.param.* instructions in the final
assembly. The patch solves the issue by dropping such stores in the
NVPTX-specific part of the DAG combiner.
Differential Revision: https://reviews.llvm.org/D118973
The patch adds LIT tests for SULD, SUST, TEX and TLD4 instructions as
a follow up for D112232. There are a number of FIXME marks that
highlight possible bugs or missed instruction variants.
Differential Revision: https://reviews.llvm.org/D114367
Texture/sampler/surface operands can be either a register or an
immediate (an index of .texref, .samplerref or .surfref).
TableGen declarations for these instructions used to only have
Int64Regs operands, so this caused issues when machine verifier
is turned on:
*** Bad machine code: Expected a register operand. ***
- function: bar
- basic block: %bb.0 (0x55b144d99ab8)
- instruction: %4:int32regs = SULD_1D_I32_TRAP 0, killed %2:int32regs
- operand 1: 0
The solution is to duplicate these instructions for all possible
operand types (i16imm and Int64Regs). Since this would
essentially double the amount of code in TableGen, the patch also
refactors the original instructions to keep things manageable.
Differential Revision: https://reviews.llvm.org/D112232
A reserved register:
- is not allocatable
- is considered always live
- is ignored by liveness tracking
NVPTX special registers match the criteria, and marking them as
reserved helps to avoid machine verifier error:
*** Bad machine code: Using an undefined physical register ***
- function: foo
- basic block: %bb.0 (0x557bb178b708)
- instruction: %0:int32regs = MOV_SPECIAL $envreg0
- operand 1: $envreg0
Differential Revision: https://reviews.llvm.org/D113008
TargetExternalSymbol is considered to be an immediate and not a
register, so machine verifier emits an error:
*** Bad machine code: Expected a register operand. ***
- function: static_offset
- basic block: %bb.0 bb (0x560e9b306028)
- instruction: %3:int64regs = MoveParamI64 &static_offset_param_1
- operand 1: &static_offset_param_1
The patch adds variants of this instruction with an immediate operand
for byval arguments on 64-bit and 32-bit targets.
Differential Revision: https://reviews.llvm.org/D113006
Before this patch, flags such as undef were dropped by TII::insertBranch
(used by BranchFolding pass), resulting in the following error from
machine verifier:
*** Bad machine code: Reading virtual register without a def ***
- function: hoge
- basic block: %bb.0 bb (0x562e9c240e68)
- instruction: CBranch %2:int1regs, %bb.3
- operand 0: %2:int1regs
Differential Revision: https://reviews.llvm.org/D113001
Right now, when we see -O#, we add the corresponding 'default<O#>' into
the list of passes to run when translating legacy -pass-name. This has
the side effect of not using the default AA pipeline.
Instead, treat -O# as -passes='default<O#>', but don't allow any other
-passes or -pass-name. I think we can keep `opt -O#` as shorthand for
`opt -passes='default<O#>'` but disallow anything more than just -O#.
Tests need to be updated to not use `opt -O# -pass-name`.
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D112036
These registers are used as operands for instructions that expect an
integer register, so they should be added to the Int32Regs or Int64Regs
register classes. Otherwise, the machine verifier emits an error when the
LLVM_ENABLE_MACHINE_VERIFIER=1 environment variable is set:
*** Bad machine code: Illegal physical register for instruction ***
- function: kernel_func
- basic block: %bb.0 entry (0x55c8903d5438)
- instruction: %3:int64regs = LEA_ADDRi64 $vrframelocal, 0
- operand 1: $vrframelocal
$vrframelocal is not an Int64Regs register.
The error is reported for the following LIT tests:
CodeGen/NVPTX/call-with-alloca-buffer.ll
CodeGen/NVPTX/disable-opt.ll
CodeGen/NVPTX/lower-alloca.ll
CodeGen/NVPTX/lower-args.ll
CodeGen/NVPTX/param-align.ll
CodeGen/NVPTX/reg-types.ll
DebugInfo/NVPTX/dbg-declare-alloca.ll
DebugInfo/NVPTX/dbg-value-const-byref.ll
Differential Revision: https://reviews.llvm.org/D110164
The tests only specify -march, so when the tests are run on AIX the target OS defaults to AIX, which causes the tests to misbehave.
This patch constrains the tests by specifying -mtriple instead of -march.
Reviewed By: daltenty, jsji, MaskRay
Differential Revision: https://reviews.llvm.org/D110186
Currently, the default alignment is much larger than the actual size of
the vector in memory. Fix this to use a sane default.
For SVE, temporarily remove lowering of load/store operations for
predicates with fewer than 16 elements. The layout the backend was
assuming for SVE predicates with fewer than 16 elements doesn't agree
with the frontend. More work probably needs to be done here.
This change is, strictly speaking, not backwards-compatible at the
bitcode level. But probably nobody is actually depending on that; i1
vectors in memory are rare, and the code that does use them probably
ends up forcing the alignment to something sane anyway. If we think
this is a concern, I can restrict this to scalable vectors for now
(where it's actually causing issues for me at the moment).
Differential Revision: https://reviews.llvm.org/D88994
This will use the python that LLVM was configured to use rather than
python from PATH.
Reviewed By: serge-sans-paille
Differential Revision: https://reviews.llvm.org/D105224
This can be seen as a follow up to commit 0ee439b705e82a4fe20e2,
that changed the second argument of __powidf2, __powisf2 and
__powitf2 in compiler-rt from si_int to int. That was to align with
how those runtimes are defined in libgcc.
One thing that seems to have been missing in that patch was to make
sure that the rest of LLVM also handles the fact that the argument now depends
on the size of int (not using the si_int machine mode for 32-bit).
When using __builtin_powi for a target with 16-bit int, clang crashed.
And when emitting libcalls to those rtlib functions (typically when
lowering @llvm.powi), the backend would always prepare the exponent
argument as an i32, which caused miscompiles when the rtlib was
compiled with 16-bit int.
The solution used here is to use an overloaded type for the second
argument in @llvm.powi. This way clang can use the "correct" type
when lowering __builtin_powi, and then later when emitting the libcall
it is assumed that the type used in @llvm.powi matches the rtlib
function.
One thing that needed some extra attention was that, when vectorizing
calls, several passes did not support intrinsics with more than one
overloaded argument. This patch allows overloading a scalar operand by
adding hasVectorInstrinsicOverloadedScalarOpd, with an entry for powi.
Differential Revision: https://reviews.llvm.org/D99439
Since d6de1e1a71406c75a4ea4d5a2fe84289f07ea3a1, having no attribute is
equivalent to setting the attribute to false.
This is a preliminary commit for https://reviews.llvm.org/D99080
To match fmod, the frem result must have the sign of the dividend. The previous
implementation produced the wrong sign for negative operands. For example,
frem(-16, 7) was returning 5 instead of -2. We should use ftrunc instead of
floor when lowering to get the right behavior.
Differential Revision: https://reviews.llvm.org/D102528
LLVM does not have a valid assembly lowering for atomicrmw on local memory. However, as this memory is thread-local, we should be able to lower it to the corresponding load/store.
Differential Revision: https://reviews.llvm.org/D98650