52796 Commits

Author SHA1 Message Date
Simon Pilgrim
53b9d479d5 [X86] i256-add - replace i386 triple X32 check prefixes with X86 and add gnux32 triple tests 2024-01-31 11:04:19 +00:00
Vyacheslav Levytskyy
5a07774fe1
[SPIR-V] Improve how lowering of formal arguments in SPIR-V Backend interprets a value of 'kernel_arg_type' (#78730)
The goal of this PR is to tolerate differences between description of
formal arguments by function metadata (represented by "kernel_arg_type")
and LLVM actual parameter types. A compiler may use "kernel_arg_type" of
function metadata fields to encode detailed type information, whereas
LLVM IR may utilize for an actual parameter a more general type, in
particular, opaque pointer type. This PR proposes to resolve this by a
fallback to LLVM actual parameter types during the lowering of formal
function arguments in cases when the type can't be created by string
content of "kernel_arg_type", i.e., when "kernel_arg_type" contains a
type unknown for the SPIR-V Backend.

An example of the issue manifestation is
https://github.com/KhronosGroup/SPIRV-LLVM-Translator/blob/main/test/transcoding/KernelArgTypeInOpString.ll,
where a compiler generates for the following kernel function detailed
`kernel_arg_type` info in a form of `!{!"image_kernel_data*", !"myInt",
!"struct struct_name*"}`, and in LLVM IR same arguments are referred to
as `@foo(ptr addrspace(1) %in, i32 %out, ptr addrspace(1) %outData)`.
Both definitions are correct, and the resulting LLVM IR is correct, but
lowering stage of SPIR-V Backend fails to generate SPIR-V type.

```
typedef int myInt;

 typedef struct {
   int width;
   int height;
 } image_kernel_data;

 struct struct_name {
   int i;
   int y;
 };
 void kernel foo(__global image_kernel_data* in,
                 __global struct struct_name *outData,
                 myInt out) {}
```

```
define spir_kernel void @foo(ptr addrspace(1) %in, i32 %out, ptr addrspace(1) %outData) ... !kernel_arg_type !7 ... {
entry:
  ret void
}
...
!7 = !{!"image_kernel_data*", !"myInt", !"struct struct_name*"}
```

The PR changes a contract of `SPIRVType *getArgSPIRVType(...)` in a way
that it may return `nullptr` to signal that the metadata string content
is not recognized, so corresponding comments are added and a couple of
checks for `nullptr` are inserted where appropriate.
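
The fallback described above can be sketched roughly as follows. This is a minimal illustration, not the backend's code: `TypeId`, `typeFromMetadataString`, and `resolveArgType` are hypothetical names, and `std::nullopt` stands in for the `nullptr` return of `getArgSPIRVType(...)`.

```cpp
#include <optional>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for a SPIR-V type handle; not the backend's SPIRVType.
struct TypeId {
  int id;
};

// Hypothetical lookup: yields a type only when the metadata string is known.
std::optional<TypeId> typeFromMetadataString(const std::string &name) {
  static const std::unordered_map<std::string, int> known = {
      {"int", 1}, {"float", 2}, {"int*", 3}};
  auto it = known.find(name);
  if (it == known.end())
    return std::nullopt; // plays the role of the nullptr return described above
  return TypeId{it->second};
}

// Prefer the metadata-derived type; otherwise fall back to the type of the
// actual IR parameter.
TypeId resolveArgType(const std::string &kernelArgType, TypeId irParamType) {
  if (auto t = typeFromMetadataString(kernelArgType))
    return *t;
  return irParamType;
}
```

With this shape, an unrecognized string such as `"image_kernel_data*"` simply yields whatever type the IR parameter already has.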
2024-01-31 02:58:50 -08:00
Jay Foad
c2c650f62e
[AMDGPU] Stop combining arbitrary offsets into PAL relocs (#80034)
PAL uses ELF REL (not RELA) relocations which can only store a 32-bit
addend in the instruction, even for reloc types like R_AMDGPU_ABS32_HI
which require the upper 32 bits of a 64-bit address calculation to be
correct. This means that it is not safe to fold an arbitrary offset into
a GlobalAddressSDNode, so stop doing that.

In practice this is mostly a problem for small negative offsets which do
not work as expected because PAL treats the 32-bit addend as unsigned.
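
A small arithmetic sketch of the failure mode, assuming a simplified model in which the high-half relocation computes `(symbol + addend) >> 32` and the REL addend is the low 32 bits of the offset read back as unsigned. The function names are illustrative only.

```cpp
#include <cstdint>

// Upper 32 bits of (symbol + addend) when the full signed 64-bit addend is
// available, as it would be with an explicit RELA addend.
uint32_t hiWithFullAddend(uint64_t sym, int64_t addend) {
  return static_cast<uint32_t>((sym + static_cast<uint64_t>(addend)) >> 32);
}

// Same calculation when only 32 bits of the addend survive in the instruction
// field and are treated as unsigned (assumed model of the REL case).
uint32_t hiWithRel32Addend(uint64_t sym, int64_t addend) {
  uint32_t stored = static_cast<uint32_t>(addend);
  return static_cast<uint32_t>((sym + stored) >> 32);
}
```

For a symbol at `0x200000010` and an offset of `-4`, the first function yields 2 while the second yields 3, because the truncated addend `0xFFFFFFFC` carries into the upper half.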
2024-01-31 10:28:23 +00:00
Yingwei Zheng
50e80e06d1
[ValueTracking] Merge cannotBeOrderedLessThanZeroImpl into computeKnownFPClass (#76360)
This patch merges the logic of `cannotBeOrderedLessThanZeroImpl` into
`computeKnownFPClass` to improve the signbit inference.

---------

Co-authored-by: Matt Arsenault <arsenm2@gmail.com>
2024-01-31 18:26:50 +08:00
Yingwei Zheng
89f87c3876
[RISCV][MC] Add MC layer support for the experimental zabha extension (#80005)
This patch implements the zabha (Byte and Halfword Atomic Memory
Operations) v1.0-rc1 extension.
See also https://github.com/riscv/riscv-zabha/blob/v1.0-rc1/zabha.adoc.
2024-01-31 17:06:43 +08:00
Sander de Smalen
dd73666182
[SME] Stop RA from coalescing COPY instructions that transcend beyond smstart/smstop. (#78294)
This patch introduces a 'COALESCER_BARRIER' which is a pseudo node that
expands to a 'nop', but which stops the register allocator from
coalescing a COPY node when its use/def crosses a SMSTART or SMSTOP
instruction.

For example:

    %0:fpr64 = COPY killed $d0
    undef %2.dsub:zpr = COPY %0       // <- Do not coalesce this COPY
    ADJCALLSTACKDOWN 0, 0
    MSRpstatesvcrImm1 1, 0, csr_aarch64_smstartstop, implicit-def dead $d0
    $d0 = COPY killed %0
    BL @use_f64, csr_aarch64_aapcs

If the COPY would be coalesced, that would lead to:

    $d0 = COPY killed %0

being replaced by:

    $d0 = COPY killed %2.dsub

which means the whole ZPR reg would be live up to the call, causing the
MSRpstatesvcrImm1 (smstop) to spill/reload the ZPR register:

    str     q0, [sp]   // 16-byte Folded Spill
    smstop  sm
    ldr     z0, [sp]   // 16-byte Folded Reload
    bl      use_f64

which would be incorrect for two reasons:
1. The program may load more data than it has allocated.
2. If there are other SVE objects on the stack, the compiler might use
   the 'mul vl' addressing modes to access the spill location.

By disabling the coalescing, we get the desired results:

    str     d0, [sp, #8]  // 8-byte Folded Spill
    smstop  sm
    ldr     d0, [sp, #8]  // 8-byte Folded Reload
    bl      use_f64
2024-01-31 09:04:13 +00:00
Chia
dc5dca1d01
[RISCV][Isel] Remove redundant vmerge for the scalable vwadd(u).wv (#80079)
Similar to #78403, but for scalable `vwadd(u).wv`, given that #76785 is recommitted.

### Code
```
define <vscale x 8 x i64> @vwadd_wv_mask_v8i32(<vscale x 8 x i32> %x, <vscale x 8 x i64> %y) {
    %mask = icmp slt <vscale x 8 x i32> %x, shufflevector (<vscale x 8 x i32> insertelement (<vscale x 8 x i32> poison, i32 42, i64 0), <vscale x 8 x i32> poison, <vscale x 8 x i32> zeroinitializer)
    %a = select <vscale x 8 x i1> %mask, <vscale x 8 x i32> %x, <vscale x 8 x i32> zeroinitializer
    %sa = sext <vscale x 8 x i32> %a to <vscale x 8 x i64>
    %ret = add <vscale x 8 x i64> %sa, %y
    ret <vscale x 8 x i64> %ret
}
```

### Before this patch
[Compiler Explorer](https://godbolt.org/z/xsoa5xPrd)
```
vwadd_wv_mask_v8i32:
        li      a0, 42
        vsetvli a1, zero, e32, m4, ta, ma
        vmslt.vx        v0, v8, a0
        vmv.v.i v12, 0
        vmerge.vvm      v24, v12, v8, v0
        vwadd.wv        v8, v16, v24
        ret
```

### After this patch
```
vwadd_wv_mask_v8i32:
        li a0, 42
        vsetvli a1, zero, e32, m4, ta, ma
        vmslt.vx v0, v8, a0
        vsetvli zero, zero, e32, m4, tu, mu
        vwadd.wv v16, v16, v8, v0.t
        vmv8r.v v8, v16
        ret
```
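
Per element, the two instruction sequences compute the same value. The scalar sketch below is only an illustration of that equivalence (the function names are made up, and it is not how the ISel pattern is written): selecting zero before the sign extension and add is the same as a masked widening add whose inactive lanes keep `y`.

```cpp
#include <cstdint>

// Before: select the narrow operand (or zero), sign-extend, then add.
int64_t selectThenWiden(bool mask, int32_t x, int64_t y) {
  int32_t a = mask ? x : 0;
  return static_cast<int64_t>(a) + y;
}

// After: masked widening add with y as the passthru for inactive lanes.
int64_t maskedWideningAdd(bool mask, int32_t x, int64_t y) {
  return mask ? static_cast<int64_t>(x) + y : y;
}
```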
2024-01-31 17:11:07 +09:00
Changpeng Fang
3564666fe1
[AMDGPU]: Fix type signatures for wmma intrinsics, NFC (#80087)
Make the wmma intrinsic type signatures canonical. We need
a type signature as long as the type is not fixed. However, when an
argument's type matches a previous argument's type, we do not need the
signature for this argument.

This patch fixes three general cases:
  1. add missing signatures
  2. remove signatures for matching arguments
  3. reorder the signatures -- the return type signature should always
     appear first
2024-01-30 23:17:35 -08:00
Craig Topper
8a98091162 [RISCV] Use disjoint flag in or_is_add. 2024-01-30 22:12:28 -08:00
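
For context on the `disjoint` flag named in the commit above: an `or` whose operands share no set bits produces the same result as an `add`, because no bit position can generate a carry. A minimal sketch of that property, with a hypothetical helper name:

```cpp
#include <cstdint>

// True when a and b have no set bits in common, in which case a | b == a + b.
bool operandsAreDisjoint(uint32_t a, uint32_t b) {
  return (a & b) == 0;
}
```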
Shengchen Kan
8e77390c06
[X86][CodeGen] Support folding memory broadcast in X86InstrInfo::foldMemoryOperandImpl (#79761) 2024-01-31 12:51:03 +08:00
Oskar Wirga
ff4636a4ab
Refactor recomputeLiveIns to converge on added MachineBasicBlocks (#79940)
This is a fix for the regression seen in
https://github.com/llvm/llvm-project/pull/79498

> Currently, the way that recomputeLiveIns works is that it will
recompute the livein registers for that MachineBasicBlock but it matters
what order you call recomputeLiveIn which can result in incorrect
register allocations down the line.

Now we do not recompute the entire CFG but we do ensure that the newly
added MBBs reach convergence.
2024-01-30 19:33:04 -08:00
Congcong Cai
c43fda3efc Revert "[WebAssembly] avoid to use explicit disabled feature"
This reverts commit 1a17f2beb9cd1f5bbaa64502ab5c02ff74c199a4.
2024-01-31 11:20:34 +08:00
Congcong Cai
1a17f2beb9 [WebAssembly] avoid to use explicit disabled feature
In `CoalesceFeaturesAndStripAtomics`, the feature string is converted to a FeatureBitset and back to a feature string, which loses information about explicitly disabled features.
2024-01-31 11:14:40 +08:00
Billy Laws
c761b4a5e4
[AArch64] Fix variadic tail-calls on ARM64EC (#79774)
ARM64EC varargs calls expect that x4 = sp at entry, special handling is
needed to ensure this with tail calls since they occur after the
epilogue and the x4 write happens before.

I tried going through AArch64MachineFrameLowering for this, hoping to
avoid creating the dummy object, but this was the best I could do since
the stack info that it uses isn't populated at this stage, and
CreateFixedObject also explicitly forbids 0-sized objects.
2024-01-30 18:32:15 -08:00
PiJoules
a356e6ccad
[SelectionDAG] Expand fixed point multiplication into libcall (#79352)
32-bit ARMv6 with thumb doesn't support MULHS/MUL_LOHI as legal/custom
nodes during expansion, which causes fixed point multiplication of
_Accum types to fail. Prior to this, we just happened to use fixed point
multiplication on platforms that support MULHS/MUL_LOHI.

This patch attempts to check if the multiplication can be done via
libcalls, which are provided by the arm runtime. These libcall attempts
are made elsewhere, so this patch refactors that libcall logic into its
own functions and the fixed point expansion calls and reuses that logic.
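
As background on why a high-half multiply (or a wider multiply libcall) is needed at all, here is a scalar sketch of fixed point multiplication. It is not the libcall path this patch adds; `fixedMul` is a hypothetical name, and the scale of 15 fractional bits used in the example values is an assumption about the `_Accum` layout.

```cpp
#include <cstdint>

// Multiply two signed fixed-point values that each carry `scale` fractional
// bits. The product is formed in 64 bits so the upper half is not lost before
// the result is scaled back down.
int32_t fixedMul(int32_t a, int32_t b, unsigned scale) {
  int64_t wide = static_cast<int64_t>(a) * static_cast<int64_t>(b);
  return static_cast<int32_t>(wide >> scale);
}
```

With 15 fractional bits, 1.5 is 49152 and 2.0 is 65536; their raw product exceeds 32 bits, yet the scaled result 98304 (3.0) fits again.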
2024-01-30 13:58:55 -08:00
Vyacheslav Levytskyy
9e02e8f1a7
fix producing multiple identical opaque pointer types (#79060)
This PR fixes https://github.com/llvm/llvm-project/issues/79057 and
improves code generation for opaque pointers by replacing the culprit
SPIRVGlobalRegistry::getOpTypePointer() call with a more appropriate
SPIRVGlobalRegistry::getOrCreateSPIRVPointerType() call. The latter
function works together with the `DuplicatesTracker`
(`SPIRVGeneralDuplicatesTracker DT;` from `class SPIRVGlobalRegistry`)
to track previous definitions of opaque pointers. This makes it
possible to produce just one `OpTypePointer` command for all identical
opaque pointers definitions and to return the very same type record for
subsequent `SPIRVGlobalRegistry::createSPIRVType()` invocations.

This PR alone improves code generation by producing a single needed
definition per all opaque pointers to i8 of the same address space
instead of multiple identical definitions produced before the patch.
From the root cause analysis of
https://github.com/llvm/llvm-project/issues/79057 we see also that this
PR resolves the problem of inconsistency between keeping multiple
instructions for identical opaque pointer types and just a single record
for all such instructions in the `DuplicatesTracker`, and so it also
resolves the issue with crashes on creation of a struct with opaque
pointer fields due to the fact that now such struct fields refer to the
same operand `<id>` having a required record in the data structure used
for dependencies analysis (see
https://github.com/llvm/llvm-project/issues/79057).
2024-01-30 18:11:53 +01:00
Vyacheslav Levytskyy
39483797b8
prevent undefined behaviour of SPIR-V Backend non-asserts builds when dealing with token type (#78437)
The goal of this PR is to fix the issue when use of token type in LLVM
intrinsic causes undefined behavior of SPIR-V Backend code generator
when assertions are disabled:
https://github.com/llvm/llvm-project/issues/78434

Among possible fix options, discussed in the
https://github.com/llvm/llvm-project/issues/78434 issue description, the
option to generate a meaningful error before execution arrives at the
`llvm_unreachable` call looks like a better solution for now, because
SPIR-V doesn't support token type anyway without additional extensions.

The PR is to generate a user-friendly error message and exit without
generating a stack dump when such a usage of token type was detected
that would lead to undefined behavior of SPIR-V Backend code generator.
2024-01-30 18:10:57 +01:00
Vyacheslav Levytskyy
b9d623105d
generate a name of an unnamed global variable for Instruction Selection (#78293)
The goal of this PR is to fix the issue of global unnamed variables
causing SPIR-V Backend code generation to crash:
https://github.com/llvm/llvm-project/issues/78278

The reason for the crash is that GlobalValue's getGlobalIdentifier()
would fail for unnamed global variable when trying to access the first
character of the name (see lib/IR/Globals.cpp:150). This leads to assert
in Debug and undefined behaviour in Release builds.

The proposed fix generates a name of an unnamed global variable as
__unnamed_<unsigned number>, in a style of similar existing LLVM
implementation (see lib/IR/Mangler.cpp:131). A new class member variable
is added into `SPIRVInstructionSelector` class to keep track of the
number we give to anonymous global values to generate the same name
every time when this is needed.

The patch adds a new LIT test with the smallest implementation of
reproducer ll code.
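
The naming scheme can be sketched as a small counter-plus-map helper. Everything here is hypothetical (the class name, the `const void *` key standing in for a global value, and numbering from 0); it only shows how the same anonymous global keeps the same `__unnamed_<N>` name on repeated queries.

```cpp
#include <string>
#include <unordered_map>

// Hypothetical helper: hands out stable "__unnamed_<N>" names.
class UnnamedGlobalNamer {
  std::unordered_map<const void *, unsigned> ids; // global -> assigned number
  unsigned nextId = 0;                            // assumed to start at 0

public:
  std::string nameFor(const void *global) {
    // try_emplace inserts only if the global has not been seen before.
    auto [it, inserted] = ids.try_emplace(global, nextId);
    if (inserted)
      ++nextId;
    return "__unnamed_" + std::to_string(it->second);
  }
};
```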
2024-01-30 18:09:52 +01:00
Florian Hahn
d1e162e5d9
[AArch64] Add custom lowering for load <3 x i8>. (#78632)
Add custom combine to lower load <3 x i8> as the more efficient sequence
below:
   ldrb wX, [x0, #2]
   ldrh wY, [x0]
   orr wX, wY, wX, lsl #16
   fmov s0, wX

At the moment, there are almost no cases in which such vector operations
will be generated automatically. The motivating case is non-power-of-2
SLP vectorization: https://github.com/llvm/llvm-project/pull/77790
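
In scalar terms, the four-instruction sequence above packs the three bytes into one 32-bit value. A sketch of the same data movement, assuming little-endian byte order and a hypothetical function name:

```cpp
#include <cstdint>

// Combine a 16-bit load of bytes 0..1 with an 8-bit load of byte 2.
uint32_t load3Bytes(const uint8_t *p) {
  // ldrh wY, [x0]
  uint32_t lo = static_cast<uint32_t>(p[0]) | (static_cast<uint32_t>(p[1]) << 8);
  // ldrb wX, [x0, #2]
  uint32_t hi = p[2];
  // orr wX, wY, wX, lsl #16 (the fmov then moves this value into s0)
  return lo | (hi << 16);
}
```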
2024-01-30 14:04:27 +00:00
Florian Hahn
6251b6bd8d
[AArch64] Add tests with sext of vec3 loads.
Another round of additional tests for
https://github.com/llvm/llvm-project/pull/7863
with different sext/zext and use variants.
2024-01-30 13:21:51 +00:00
Shengchen Kan
e5054fb5c6 [X86][test] Update CodeGen/X86/popcnt.ll after #78545 2024-01-30 18:38:16 +08:00
XinWang10
1a219e989f
[X86] Support EVEX compression from MOVBErr to BSWAP (#79775)
APX promoted MOVBE instructions were supported in #77431. The reg2reg
variants of MOVBE are newly introduced by APX and can be optimized to
BSWAP instruction when the 2 register operands are same.

This patch adds manual entries for MOVBErr instructions when we do ndd
to non-ndd compression #77731.
RFC:
https://discourse.llvm.org/t/rfc-design-for-apx-feature-egpr-and-ndd-support/73031/4
2024-01-30 16:57:51 +08:00
XinWang10
5910e34a2f
[X86][MC] Support encoding optimization & assembler relaxation about immediate operands for APX instructions (#78545)
Encoding optimization:
```
mi/mi32 -> mi8
ri/ri32 -> ri8
```
if the immediate operand is 8-bit wide.

Assembler relaxation:
```
mi8 -> mi/mi32
ri8 -> ri/ri32
```
If the immediate operand is a symbol expression and its value is
unknown.
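
The two rules can be read together as one decision, sketched below. This is a simplification under stated assumptions: "8-bit wide" is taken to mean that the value fits a sign-extended 8-bit immediate, `ImmForm` and `pickImmForm` are hypothetical names, and the real assembler reaches the wide form through relaxation rather than a single up-front choice.

```cpp
#include <cstdint>

enum class ImmForm { Imm8, Imm32 };

// Pick the short form only when the value is known and fits in a signed byte;
// an unresolved symbol expression has to assume the wide form.
ImmForm pickImmForm(bool valueKnown, int64_t value) {
  if (valueKnown && value >= INT8_MIN && value <= INT8_MAX)
    return ImmForm::Imm8;
  return ImmForm::Imm32;
}
```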
2024-01-30 14:21:06 +08:00
Arthur Eubanks
198652a0ff
[X86] Treat __start_*/__stop_* symbols as large (#79909)
Followup to #79884.

The linker adds __start_foo/__stop_foo symbols pointing to the
beginning/end of the foo section. These can be far away from text, so
treat them as large symbols under the medium/large code models.
Performance to access these is almost certainly not important.
2024-01-29 21:00:16 -07:00
Liao Chunyu
45188c64db
[DAGCombiner] Use generalized pattern matcher in foldBoolSelectToLogic (#79101)
Support vp.select.

TODO: Possibly other functions could be supported, e.g. SimplifySelect()
2024-01-30 10:26:51 +08:00
Justin Fargnoli
577738a12d
Revert "Disable incorrect peephole optimizations" (#79916)
This reverts commit ff77058141e8026357ca514ad0d45c6c50921290.
2024-01-29 16:22:07 -08:00
Justin Fargnoli
ff77058141
Disable incorrect peephole optimizations 2024-01-29 15:54:40 -08:00
Jivan Hakobyan
0461448313
[RISCV][ISel] Add ISel support for experimental Zimop extension (#77089)
This implements ISel support for mopr[0-31] and moprr[0-7] instructions
for 32 and 64 bits

---------

Co-authored-by: ln8-8 <lyut.nersisyan@gmail.com>
2024-01-29 15:24:00 -08:00
Craig Topper
7855703194 [RISCV] Move vp.splice tests into rvv directory. NFC 2024-01-29 15:01:52 -08:00
Arthur Eubanks
d6e07e0845
[X86] Treat __ehdr_start as large (#79884)
The __ehdr_start symbol is added by the linker and points to the ELF
file headers, which can be very far away from text. Treat it as a large
symbol under the medium/large code models. Performance to access
__ehdr_start is almost certainly not important.

There are a couple of other symbols that the linker adds [1], but this
is the most relevant one that may be far away from text.

[1]
547c395b27/lld/ELF/Writer.cpp (L226)
2024-01-29 14:25:40 -08:00
Joseph Huber
e633807a1f
[NVPTX] Add builtin support for 'globaltimer' (#79765)
Summary:
This patch adds support for `globaltimer` to match `clock` and
`clock64`. See the PTX ISA reference for details. This patch does not
implement the `hi` or `lo` variants for brevity as they can be obtained
from this with the cost of an additional register.

https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#special-registers-globaltimer-globaltimer-lo-globaltimer-hi
2024-01-29 14:11:54 -06:00
Joseph Huber
ea8014046c
[NVPTX] Add builtin for 'exit' handling (#79777)
Summary:
The PTX ISA has always supported the 'exit' instruction to terminate
individual threads. This patch adds a builtin to handle it. See the PTX
documentation for further details.

https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#control-flow-instructions-exit
2024-01-29 14:09:34 -06:00
Joseph Huber
5f12cc912a
[NVPTX] Add builtin support for 'nanosleep' PTX instruction (#79888)
Summary:
This patch adds a builtin for the `nanosleep` PTX function. It takes
either an immediate or a register and sleeps for [0, 2t] nanoseconds
given t. More information at the documentation:

https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#miscellaneous-instructions-nanosleep
2024-01-29 14:07:58 -06:00
Joseph Huber
d492faa7aa
[NVPTX] Add 'activemask' builtin and intrinsic support (#79768)
Summary:
This patch adds support for getting the 'activemask' instruction's value
without needing to use inline assembly. See the relevant PTX reference
for details.


https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#parallel-synchronization-and-communication-instructions-activemask
2024-01-29 14:07:30 -06:00
Simon Pilgrim
3ab5dbb199 [X86] sext-i1.ll - replace X32 check prefixes with X86
We try to only use X32 for gnux32 triple tests.
2024-01-29 18:00:36 +00:00
Simon Pilgrim
2aef33230d [X86] fast-isel-store.ll - cleanup check prefixes
The 32/64-bit triples and check prefixes were inverted, and the tests were missing the unwind attribute used to strip cfi noise
2024-01-29 18:00:35 +00:00
Simon Pilgrim
cbe5985ff7 [X86] Replace X32 check prefixes with X86
We try to only use X32 for gnux32 triple tests.
2024-01-29 16:50:32 +00:00
David Green
9520773c46
[AArch64] Don't generate neon integer complex numbers with +sve2. NFC (#79829)
The condition for allowing integer complex number support could also
allow neon fixed length complex numbers if +sve2 was specified. This
tightens the condition to only allow integer complex number support for
scalable vectors.

We could generalize this in the future to generate SVE intrinsics for
fixed-length vectors, but for the moment this opts for the simpler fix.
2024-01-29 16:46:22 +00:00
Alex Bradbury
d833b9d677
[RISCV] Graduate Zicond to non-experimental (#79811)
The Zicond extension was ratified in the last few months, with no
changes that affect the LLVM implementation. Although there's surely
more tuning that could be done about when to select Zicond or not, there
are no known correctness issues. Therefore, we should mark support as
non-experimental.
2024-01-29 15:58:54 +00:00
Pierre van Houtryve
ce72f78f37
[AMDGPU] Fix mul combine for MUL24 (#79110)
MUL24 can now return a i64 for i32 operands, but the combine was never
updated to handle this case. Extend the operand when rewriting the ADD
to handle it.

Fixes SWDEV-436654
2024-01-29 16:37:20 +01:00
Simon Pilgrim
06f5b956a0 [X86] pmovsx-inreg.ll - replace X32 check prefixes with X86
We try to only use X32 for gnux32 triple tests.
2024-01-29 14:23:08 +00:00
Simon Pilgrim
ccb2810ee3 [X86] anyext.ll - replace X32 check prefixes with X86
We try to only use X32 for gnux32 triple tests.
2024-01-29 14:23:08 +00:00
Simon Pilgrim
8a074c84ff [X86] fixup-bw-copy.ll - replace X32 check prefixes with X86
We try to only use X32 for gnux32 triple tests.
2024-01-29 14:23:08 +00:00
Simon Pilgrim
bc879a9019 [X86] mul-i256.ll - simplify function attributes and remove cfi noise 2024-01-29 13:19:47 +00:00
Simon Pilgrim
3a4a7dcd62 [X86] Replace X32 check prefixes with X86
We try to only use X32 for gnux32 triple tests.
2024-01-29 13:19:47 +00:00
Michal Paszkowski
0fbaf03f70
[SPIR-V] Cast ptr kernel args to i8* when used as Store's value operand (#78603)
Handle a special case when StoreInst's value operand is a kernel
argument of a pointer type. Since these arguments could have either a
basic element type (e.g. float*) or OpenCL builtin type (sampler_t),
bitcast the StoreInst's value operand to default pointer element type
(i8).

This pull request addresses the issue
https://github.com/llvm/llvm-project/issues/72864
2024-01-28 19:30:14 -08:00
chuongg3
a7cfff8dc6
[AArch64][GlobalISel] Lower Shuffle Vector to REV (#79591)
Add lowering for i16 and i32 vectors for Shuffle Vector instructions
with REV mask
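
A REV-style mask reverses the elements inside each fixed-size group. The check below is a simplified sketch with a hypothetical name, not the helper GlobalISel uses; in particular it rejects undefined (negative) mask entries, which a real matcher may accept as wildcards.

```cpp
#include <cstddef>
#include <vector>

// True when `mask` reverses elements within each group of `groupElts` elements.
bool isReverseWithinGroups(const std::vector<int> &mask, std::size_t groupElts) {
  if (groupElts == 0 || mask.size() % groupElts != 0)
    return false;
  for (std::size_t i = 0; i < mask.size(); ++i) {
    std::size_t base = i - i % groupElts;                     // first index of the group
    std::size_t expected = base + (groupElts - 1 - i % groupElts);
    if (mask[i] < 0 || static_cast<std::size_t>(mask[i]) != expected)
      return false;
  }
  return true;
}
```

For example, `{1, 0, 3, 2}` reverses pairs and `{3, 2, 1, 0, 7, 6, 5, 4}` reverses groups of four.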
2024-01-28 20:35:02 +00:00
chuongg3
2c552d319a
[AArch64][GlobalISel] Legalize G_ABS for Larger/Smaller Vectors (#79117)
Legalize G_ABS for larger/smaller width vectors with legal element sizes

Falls back for the smaller width vector tests because it is unable to
legalize G_ANYEXT for smaller width vectors
2024-01-28 20:21:38 +00:00
David Green
915c3d9e5a Revert "[AArch64] merge index address with large offset into base address"
This reverts commit 32878c2065c8005b3ea30c79e16dfd7eed55d645 due to #79756 and #76202.
2024-01-28 17:01:21 +00:00
Shengchen Kan
6d7c8a6e06 [X86][test] Update failed tests in 60dbb2cec1bbf65aacf6752a59b0666a23aaa3ae after rebase 2024-01-29 00:32:30 +08:00