55755 Commits

Author SHA1 Message Date
Mirko
4d3c427f33
[CodeGen] Use first EHLabel as a stop gate for live range shrinking (#114195)
This fixes issue #114194

The issue happens during the `LiveRangeShrink` pass, which runs early,
before phi elimination. LandingPads, which are lowered to EHLabels, need
to be the first non phi instruction in an EHPad. In case of a phi node
being in front of the EHLabel and a use being after the EHLabel, we
hoist the use in front of the label.

This results in a portion of the landingpad missing due to being hoisted
in front of the label.
2024-11-01 19:13:18 -07:00
Alex MacLean
8ff60c4d47
[NVPTX] Add support for nvvm.flo.[us] intrinsics (#114489)
Add support for '`llvm.nvvm.flo.[su].*`' intrinsics which correspond to
a PTX `bfind` instruction.
See [PTX ISA 9.7.1.16. Integer Arithmetic Instructions: bfind]
(https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#integer-arithmetic-instructions-bfind)

The '`llvm.nvvm.flo.u`' family of intrinsics identifies the bit position
of the leading one, returning either it's offset from the most or least
significant bit.

The '`llvm.nvvm.flo.s`' family of intrinsics identifies the bit position
of the leading non-sign bit, returning either it's offset from the most
or least significant bit.
2024-11-01 16:35:43 -07:00
Alex MacLean
57183b6fe1
[NVPTX] Add support for stacksave, stackrestore intrinsics (#114484)
Add support for the '`@llvm.stacksave`' and '`@llvm.stackrestore`'
intrinsics to NVPTX. These are implemented with the `stacksave` and
`stackrestore` PTX instructions respectively.

See [PTX ISA 9.7.17. Stack Manipulation Instructions]
(https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#stack-manipulation-instructions).
2024-11-01 13:26:28 -07:00
Matt Arsenault
8e61aaa021
AMDGPU: Fix illegal commute with frame index (#114497)
In ca409892c5396fa3fbb8ea4dbf53d0e952f36d09, frame indexes started
being treated more like registers, rather than immediates. Update
the commute logic to avoid failing the verifier by moving illegal
SGPR operands in place of a frame index.
2024-11-01 10:02:29 -07:00
Luke Lau
faa385a9f4 [RISCV] Add tests for length changing shuffles
Tests taken from Luke's 88147 with minimal changes by me (preames).

The main case of interest here is when mask length is less than source
length (i.e. length is decreasing).  We often scalarize these, which
on RISCV can be quite painful.
2024-11-01 09:51:39 -07:00
Krzysztof Drewniak
ea33af63de
Reapply "[AMDGPU][GlobalISel] Fix load/store of pointer vectors, buffer.*.pN (#110714)" v3 (#114443)
This reverts commit 8a849a2a567d4e519b246a16936b6e7519936d4b.

It seems I missed a spot when trying to ensure the code in the
instruction selection tests were actually legalized MIR.
2024-11-01 11:13:29 -05:00
peterbell10
b74e588e1f
[NVPTX] Don't use stack memory when bitcasting to/from v2i8 (#113928)
`v2i8` is an unsupported type, so we hit the default legalization rules
which perform the bitcast in stack memory and is very inefficient on
GPU.

This adds a custom lowering where we pack `v2i8` into `i16` and from
there use another bitcast node to reach the final desired type. And also
the inverse unpacking `i16` into `v2i8`.
2024-11-01 08:02:43 -07:00
Philip Reames
58f525a23c [RISCV] Add tests for deinterleave shuffles w/o vnsrl.vv
With SEW=64, the vnsrl trick we primary rely on does not work.  This
is handled correctly today, but we have fairly minimal testing of the
resulting shuffles which makes it hard to demonstrate value of an
upcoming change.
2024-11-01 08:02:04 -07:00
Oliver Stannard
33411d5207
[ARM] Fix CMSE S->NS calls when CONTROL_S.SFPA==0 (CVE-2024-7883) (#114433)
When doing a call from CMSE secure state to non-secure state for
v8-M.main, we use the VLLDM and VLSTM instructions to save, clear and
restore the FP registers around the call. These instructions both check
the CONTROL_S.SFPA bit, and if it is clear (meaning the current contents
of the FP registers are not secret) they execute as no-ops.

This causes a problem when CONTROL_S.SFPA==0 before the call, which
happens if there are no floating-point instructions executed between
entry to secure state and the call. If this is the case, then the VLSTM
instruction will do nothing, leaving the save area in the stack
uninitialised. If the called function returns a value in floating-point
registers, the call sequence includes an instruction to copy the return
value from a floating-point register to a GPR, which must be before the
VLLDM instruction. This copy sets CONTROL_S.SFPA, meaning that the VLLDM
will fully execute, and load the uninitialised stack memory into the FP
registers.

This causes two problems:
* The FP register file is clobbered, including all of the callee-saved
  registers, which might contain live values.
* The stack region might contain secret values, which will be leaked to
  non-secure state through the floating-point registers if/when we
  return to non-secure state.

The fix is to insert a `vmov s0, s0` instruction before the VLSTM
instruction, to ensure that CONTROL_S.SFPA is set for both the VLLDM and
VLSTM instruction.

CVE: https://www.cve.org/cverecord?id=CVE-2024-7883
Security bulletin:
https://developer.arm.com/Arm%20Security%20Center/Cortex-M%20Security%20Extensions%20Vulnerability
2024-11-01 09:36:13 +00:00
Daniil Kovalev
da083e358e
[PAC][CodeGen][ELF][AArch64] Support signed GOT (#113811)
This re-applies #96164 after revert in #102434.

Support the following relocations and assembly operators:

- `R_AARCH64_AUTH_ADR_GOT_PAGE` (`:got_auth:` for `adrp`)
- `R_AARCH64_AUTH_LD64_GOT_LO12_NC` (`:got_auth_lo12:` for `ldr`)
- `R_AARCH64_AUTH_GOT_ADD_LO12_NC` (`:got_auth_lo12:` for `add`)

`LOADgotAUTH` pseudo-instruction is introduced which is later expanded to
actual instruction sequence like the following.

```
adrp x16, :got_auth:sym
add x16, x16, :got_auth_lo12:sym
ldr x0, [x16]
autia x0, x16
```

If a resign is requested, like below, `LOADgotPAC` pseudo is used, and GOT
load is lowered similarly to `LOADgotAUTH`.

```
@var = global i32 0
define ptr @resign_globalvar() {
  ret ptr ptrauth (ptr @var, i32 3, i64 43)
}
```

If FPAC bit is not set and auth instruction is emitted, a check+trap sequence
similar to one used for `AUT` pseudo is emitted to ensure auth success.

Both SelectionDAG and GlobalISel are suppported.
For FastISel, we fall back to SelectionDAG.

Tests starting with 'ptrauth-' have corresponding variants w/o this prefix.

See also specification
https://github.com/ARM-software/abi-aa/blob/main/pauthabielf64/pauthabielf64.rst#appendix-signed-got
2024-11-01 12:21:10 +03:00
Phoebe Wang
c72a751dab
[X86][AMX] Support AMX-TRANSPOSE (#113532)
Ref.: https://cdrdv2.intel.com/v1/dl/getContent/671368
2024-11-01 16:45:03 +08:00
Thorsten Schütt
8e3772744d
[GlobalISel][AArch64] Legalize G_INSERT_VECTOR_ELT for SVE (#114470)
There are patterns for:
* {nxv2s32, s32, s64},
* {nxv4s16, s16, s64},
* {nxv2s16, s16, s64}
2024-11-01 06:10:26 +01:00
Hervé Poussineau
6fa1647a47
[MC][Mips] Rename MipsMCAsmInfo to MipsELFMCAsmInfo (#112592)
Also change MipsAsmPrinter::emitStartOfAsmFile to emit ELF-related
sections only when using ELF output file format.
2024-11-01 08:42:34 +08:00
Ruiling, Song
54d31bde32
Reapply "StructurizeCFG: Optimize phi insertion during ssa reconstruction (#101301)" (#114347)
This reverts commit be40c723ce2b7bf2690d22039d74d21b2bd5b7cf.
2024-11-01 08:29:59 +08:00
Justin Fargnoli
a1987beac5
Reland "[NVPTX] Prefer prmt.b32 over bfi.b32" (#114326)
Fix
[failure](https://github.com/llvm/llvm-project/pull/110766#discussion_r1796832635)
identified by @akuegel.

---

In [[NVPTX] Improve lowering of
v4i8](cbafb6f2f5)
@Artem-B add the ability to lower ISD::BUILD_VECTOR with bfi PTX
instructions. @Artem-B did this because:
(https://github.com/llvm/llvm-project/pull/67866#discussion_r1343066911)

Under the hood byte extraction/insertion ends up as BFI/BFE
instructions, so we may as well do that in PTX, too.
https://godbolt.org/z/Tb3zWbj9b

However, the example that @Artem-B linked was targeting sm_52. On modern
architectures, ptxas uses prmt.b32.
[Example](https://godbolt.org/z/Ye4W1n84o).

Thus, remove uses of NVPTXISD::BFI in favor of NVPTXISD::PRMT.
2024-10-31 16:09:20 -07:00
Thorsten Schütt
aa70d846b0
[GlobalISel][AArch64] Legalize G_SPLAT_VECTOR (#114006)
{nxv8s16, s16} fails to select.
{nxv16s8, s8} no patterns available.
2024-10-31 22:20:08 +01:00
zhijian lin
674574d25c
Promote 32bit pseudo instr that infer extsw removal to 64bit in PPCMIPeephole (#85451)
Fixes:   https://github.com/llvm/llvm-project/issues/71030

Bug only happens in 64bit involving spills. Since we don't know when the
spill will happen, all instructions in the chain used to deduce sign
extension for eliminating 'extsw' will need to be promoted to 64-bit
pseudo instructions.

The following instruction will promoted in PPCMIPeepholes: EXTSH, LHA,
ISEL to EXTSH8, LHA8, ISEL8
2024-10-31 15:49:36 -04:00
Matt Arsenault
e3222e6f80
AMDGPU: Add baseline tests for cmpxchg custom expansion (#109408)
We need a non-atomic path if flat may access private.
2024-10-31 11:46:13 -07:00
Zaara Syeda
155956834a
Fix failing test gcov_ctr_ref_init.ll (#114428)
Fix test failing after merge of commit:
ccddd136024305be8b6aa77e4ce576d8e6521529
2024-10-31 13:08:16 -04:00
Simon Pilgrim
9fb4bc5bf4
[DAG] SimplifyMultipleUseDemandedBits - ignore SRL node if we're just demanding known sign bits (#114389)
Check to see if we are only demanding (shifted) signbits from a SRL node that are also signbits in the source node.

We can't demand any upper zero bits that the SRL will shift in (up to max shift amount), and the lower demanded bits bound must already be all signbits.
2024-10-31 16:40:29 +00:00
WÁNG Xuěruì
f246b5f547
[LoongArch] Support bswap for LSX/LASX VTs (#114171)
On top of #114170
2024-11-01 00:38:13 +08:00
hev
f7a96dc664
[LoongArch] Ensure pcaddu18i and jirl adjacency in tail calls for correct relocation (#113932)
Prior to this patch, both `pcaddu18i` and `jirl` were marked as
scheduling boundaries to prevent instruction reordering that would
disrupt their adjacency. However, in certain cases, epilogues were still
being inserted between these two instructions, breaking the required
proximity. This patch ensures that `pcaddu18i` and `jirl` remain
adjacent even in the presence of epilogues, maintaining correct
relocation behavior for tail calls on LoongArch.
2024-11-01 00:08:15 +08:00
Zaara Syeda
ccddd13602
Enable aggressive constant merge in GlobalMerge for AIX (#113956)
Enable merging all constants without looking at use in GlobalMerge by
default to replace PPCMergeStringPool pass on AIX.
2024-10-31 11:22:48 -04:00
Matt Arsenault
1d0370872f
AMDGPU: Expand flat atomics that may access private memory (#109407)
If the runtime flat address resolves to a scratch address,
64-bit atomics do not work correctly. Insert a runtime address
space check (which is quite likely to be uniform) and select between
the non-atomic and real atomic cases.

Consider noalias.addrspace metadata and avoid this expansion when
possible (we also need to consider it to avoid infinitely expanding
after adding the predication code).
2024-10-31 08:08:48 -07:00
Matt Arsenault
db5bcb24c2
GlobalISel: Fix combine duplicating atomic loads (#111730)
The sext_inreg (load) combine was not deleting the old load instruction,
and it would never be deleted if volatile or atomic.
2024-10-31 07:55:12 -07:00
Sander de Smalen
41448c1d07 [AArch64] NFC: Add RUN line for +sve2 for sve-intrinsics-perm-select.ll
The codegen for SVE and SVE2 may be different (e.g. for splice and ext).
A follow-up patch will improve codegen for EXT.
2024-10-31 14:46:00 +00:00
Matt Arsenault
12409024d3
AMDGPU/GlobalISel: Handle atomic sextload and zextload (#111721)
Atomic loads are handled differently from the DAG, and have separate opcodes
and explicit control over the extensions, like ordinary loads. Add
new patterns for these.

There's room for cleanup and improvement. d16 cases aren't handled.

Fixes #111645
2024-10-31 07:44:52 -07:00
WÁNG Xuěruì
5581e43a2b
[LoongArch][NFC] Pre-commit tests for LSX/LASX bswap codegen (#114170) 2024-10-31 21:10:26 +08:00
Nathan Gauër
cf3d6fded9
[SPIR-V] Re-enable -verify-machineinstrs on tests (#114388)
Many tests had this flag removed because of the G_BITCAST emission
issue. Now that the PR is merged, we can re-enable this additional
check.

2 tests (basic_int_types) just have the TODO removed because they are
not useful for SPIR-V as-is: SPIR-V requires reg2mem/mem2reg to run,
which removes all the body. Integers are used in other spirv tests, and
seems like testing for spirv32/64 and relying on others for the logical
target coverage should be fine.

Signed-off-by: Nathan Gauër <brioche@google.com>
2024-10-31 13:55:30 +01:00
Benjamin Maxwell
89a8c71db6
[SDAG] Support expanding FSINCOS to vector library calls (#114039)
This shares most of its code with the scalar sincos expansion. It allows
expanding vector FSINCOS nodes to a library call from the specified
`-vector-library`. The upside of this is it will mean the vectorizer
only needs to handle the sincos intrinsic, which has no memory effects,
and this can handle lowering the intrinsic to a call that takes output
pointers.
2024-10-31 12:41:43 +00:00
Pengcheng Wang
18f0f70934
[RISCV] Support llvm.masked.expandload intrinsic (#101954)
We can use `viota`+`vrgather` to synthesize `vdecompress` and lower
expanding load to `vcpop`+`load`+`vdecompress`.

And if `%mask` is all ones, we can lower expanding load to a normal
unmasked load.

Fixes #101914.
2024-10-31 20:03:58 +08:00
SpencerAbson
0800351da4
[AArch64][SVE] Use INS when moving elements from bottom 128b of SVE type (#114034)
Moving elements from a scalable vector to a fixed-lengh vector should
use[ INS (vector, element)
](https://developer.arm.com/documentation/100069/0606/SIMD-Vector-Instructions/INS--vector--element-)
when we know that the extracted element is in the bottom 128-bits of the
scalable vector. This avoids inserting unecessary UMOV/FMOV
instructions.
2024-10-31 10:36:00 +00:00
dnsampaio
28d0718033
[DAGCombiner] Add combine avg from shifts (#113909)
This teaches dagcombiner to fold:
`(asr (add nsw x, y), 1) -> (avgfloors x, y)`
`(lsr (add nuw x, y), 1) -> (avgflooru x, y)`

as well the combine them to a ceil variant:
`(avgfloors (add nsw x, y), 1) -> (avgceils x, y)` 
`(avgflooru (add nuw x, y), 1) -> (avgceilu x, y)`

iff valid for the target.

Removes some of the ARM MVE patterns that are now dead code.
It adds the avg opcodes to `IsQRMVEInstruction` as to preserve the
immediate splatting as before.
2024-10-31 10:57:27 +01:00
WANG Rui
862074fa57 [LoongArch][NFC] Pre-commit tests for the adjacency of expanded pseudo-insns 2024-10-31 16:59:41 +08:00
Stanislav Mekhanoshin
7cd29741fa
[AMDGPU] Extend mov_dpp8 intrinsic lowering for generic types (#114296)
The int_amdgcn_mov_dpp8 is overloaded, but we can only select i32.
To allow a corresponding builtin to be overloaded the same way as
int_amdgcn_mov_dpp we need it to be able to split unsupported values.
2024-10-31 01:15:25 -07:00
Ami-zhang
1897bf61f0
[LoongArch] Enable FeatureExtLSX for generic-la64 processor (#113421)
This commit makes the `generic` target to support FP and LSX, as
discussed in #110211. Thereby, it allows 128-bit vector to be enabled by
default in the loongarch64 backend.
2024-10-31 15:58:15 +08:00
Luke Lau
6da5968f5e
[RISCV] Lower scalar_to_vector for supported FP types (#114340)
In https://reviews.llvm.org/D147608 we added custom lowering for
integers, but inadvertently also marked it as custom for scalable FP
vectors despite not handling it.

This adds handling for floats and marks it as custom lowered for
fixed-length FP vectors too.

Note that this doesn't handle bf16 or f16 vectors that would need
promotion, but these scalar_to_vector nodes seem to be emitted when
expanding them.
2024-10-31 13:15:17 +08:00
Thorsten Schütt
6effab990c
Revert "[GlobalISel][AArch64] Legalize G_INSERT_VECTOR_ELT for SVE" (#114353)
Reverts llvm/llvm-project#114310
2024-10-31 05:41:16 +01:00
Thorsten Schütt
6bf214b7c6
[GlobalISel][AArch64] Legalize G_INSERT_VECTOR_ELT for SVE (#114310)
There are patterns for:
* {nxv2s32, s32, s64},
* {nxv4s16, s16, s64},
* {nxv2s16, s16, s64}
2024-10-31 04:56:41 +01:00
Adam Yang
948249d804
Revert "[DXIL] Add GroupMemoryBarrierWithGroupSync intrinsic" (#114322)
Reverts llvm/llvm-project#111884
2024-10-30 20:44:54 -07:00
Feng Zou
8127162427
[X86][AMX] Support AMX-FP8 (#113850)
Ref.: https://cdrdv2.intel.com/v1/dl/getContent/671368
2024-10-31 10:14:25 +08:00
Luke Lau
1cb599835c [RISCV] Remove redundant +zfh from +zvfh[min] tests. NFC
In the vast majority of f16 tests we don't end up emitting any scalar
code that needs +zfh, so remove it.
2024-10-31 06:51:39 +08:00
Vyacheslav Levytskyy
c616f24bcb
[SPIR-V] Do instruction selection for G_BITCAST on an earlier stage (#114216)
This PR implements instruction selection for G_BITCAST on an earlier
stage to avoid MachineVerifier complains on subtle semantics difference
between G_BITCAST and OpBitcast.

We do instruction selections for OpBitcast after IR Translation instead
of calling MIB.buildBitcast() generating the general op code G_BITCAST,
because when MachineVerifier validates G_BITCAST we see a check of a
kind: 'if Source Type is equal to Destination Type then report error
"bitcast must change the type"'. This doesn't take into account the
notion of a typed pointer that is important for SPIR-V where a user may
and should use bitcast between pointers with different pointee types
(https://registry.khronos.org/SPIR-V/specs/unified1/SPIRV.html#OpBitcast).

It's important for correct lowering in SPIR-V, because interpretation of
the data type is not left to instructions that utilize the pointer, but
encoded by the pointer declaration, and the SPIRV target can and must
handle the declaration and use of pointers that specify the type of data
they point to.

It's not feasible to improve validation of G_BITCAST using just
information provided by low level types of source and destination.
Therefore we don't produce G_BITCAST as the general op code with
semantics different from OpBitcast, but rather lower to OpBitcast
immediately.

See discussion in https://github.com/llvm/llvm-project/pull/110270 for
even more context.
2024-10-30 20:49:21 +01:00
Steven Perron
d8295e2eec
[SPIRV][HLSL] Handle arrays of resources (#111564)
This commit adds the ability to get a particular resource from an array
of resources using the handle_fromBinding intrinsic.

The main changes are:

1. Create an array when generating the type.
2. Add capabilities from

[SPV_EXT_descriptor_indexing](https://htmlpreview.github.io/?https://github.com/KhronosGroup/SPIRV-Registry/blob/main/extensions/EXT/SPV_EXT_descriptor_indexing.html).

We are still missing the ability to declare a runtime array. That will
be done in a follow up PR.
2024-10-30 15:01:02 -04:00
Thorsten Schütt
b3bb6f18bb
[GlobalISel] Import samesign flag (#114267)
Credits: https://github.com/llvm/llvm-project/pull/111419

Fixes icmp-flags.mir

First attempt: https://github.com/llvm/llvm-project/pull/113090

Revert: https://github.com/llvm/llvm-project/pull/114256
2024-10-30 19:56:25 +01:00
Craig Topper
408c84f35b
[RISCV] Add hasPostISelHook to sf.vfnrclip pseudo instructions. (#114274)
Add Uses = [FRM] to the underlying MC instructions.
    
Tweak a couple test cases so the MachineVerifier would have caught this.
2024-10-30 11:52:49 -07:00
Craig Topper
c3724ba866
[RISCV] Add OperandType for vector rounding mode operands. (#114179)
Use TSFlags to distinquish which type of rounding mode it is. We use the same tablegen base classes for vxrm and frm sometimes so its hard to have different types for different instructions.
2024-10-30 11:46:15 -07:00
Changpeng Fang
ca1154d1d4
AMDGPU: Disable pattern matching "x<<32-y>>32-y" to "bfe x, 0, y" (#114279)
It is not correct to lower "x<<32-y>>32-y" to "bfe x, 0, y". When y
equals to 32, the left-hand side is still x (unchanged), however, the
right-hand side will be evaluated to 0. So it is not always correct to
do such transformation.

We may be able to keep the pattern for immediate y while y is within [0,
31]. However, the immediate operands of the sub (32 - y) are easily
folded, and "(x << imm) >> imm" will be lowered to "and x,
(2^(32-imm))-1" anyway. So no bfe matching is needed.
2024-10-30 11:07:15 -07:00
Jay Foad
311c0772f9 [AMDGPU] Fix test failures after #114232 and #114200 2024-10-30 16:51:44 +00:00
Jay Foad
6bf4476ffb
[AMDGPU] Fix @llvm.amdgcn.cs.chain with callee not provably uniform (#114200)
The correct behavior is to insert a readfirstlane. This worked except
for an inappropriate assertion in SITargetLowering::LowerCall.
2024-10-30 16:18:29 +00:00