80797 Commits

Author SHA1 Message Date
Hubert Tong
0812cde3bf NFC: Make isPPC64 const and use member initializer 2024-11-01 20:41:25 -04:00
Alex MacLean
8ff60c4d47
[NVPTX] Add support for nvvm.flo.[us] intrinsics (#114489)
Add support for '`llvm.nvvm.flo.[su].*`' intrinsics which correspond to
a PTX `bfind` instruction.
See [PTX ISA 9.7.1.16. Integer Arithmetic Instructions: bfind]
(https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#integer-arithmetic-instructions-bfind)

The '`llvm.nvvm.flo.u`' family of intrinsics identifies the bit position
of the leading one, returning either it's offset from the most or least
significant bit.

The '`llvm.nvvm.flo.s`' family of intrinsics identifies the bit position
of the leading non-sign bit, returning either it's offset from the most
or least significant bit.
2024-11-01 16:35:43 -07:00
Alex MacLean
57183b6fe1
[NVPTX] Add support for stacksave, stackrestore intrinsics (#114484)
Add support for the '`@llvm.stacksave`' and '`@llvm.stackrestore`'
intrinsics to NVPTX. These are implemented with the `stacksave` and
`stackrestore` PTX instructions respectively.

See [PTX ISA 9.7.17. Stack Manipulation Instructions]
(https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#stack-manipulation-instructions).
2024-11-01 13:26:28 -07:00
Simon Pilgrim
8634e358eb
[AArch64][ARM] Avoid some APFloat copies in tablegen patterns. NFC. (#114416)
Either the N->getValueAPF() was being unused or we were failing to make use of it returning a const APFloat&
2024-11-01 18:14:59 +00:00
Matt Arsenault
8e61aaa021
AMDGPU: Fix illegal commute with frame index (#114497)
In ca409892c5396fa3fbb8ea4dbf53d0e952f36d09, frame indexes started
being treated more like registers, rather than immediates. Update
the commute logic to avoid failing the verifier by moving illegal
SGPR operands in place of a frame index.
2024-11-01 10:02:29 -07:00
c8ef
b57b3f6425
[NFC] Simple typo correction. (#114548) 2024-11-02 00:40:57 +08:00
Shilei Tian
10a1ea9b53
[NFC][AMDGPU] Remove the empty FPM as well as the adaptor to MPM (#114558) 2024-11-01 12:21:26 -04:00
Krzysztof Drewniak
ea33af63de
Reapply "[AMDGPU][GlobalISel] Fix load/store of pointer vectors, buffer.*.pN (#110714)" v3 (#114443)
This reverts commit 8a849a2a567d4e519b246a16936b6e7519936d4b.

It seems I missed a spot when trying to ensure the code in the
instruction selection tests were actually legalized MIR.
2024-11-01 11:13:29 -05:00
David Green
95fb7f8cb8 [AArch64] Move FeatureUseFixedOverScalableIfEqualCost with other tuning features. NFC
This was in the with the armv9 architecture extensions.
2024-11-01 15:47:47 +00:00
peterbell10
b74e588e1f
[NVPTX] Don't use stack memory when bitcasting to/from v2i8 (#113928)
`v2i8` is an unsupported type, so we hit the default legalization rules
which perform the bitcast in stack memory and is very inefficient on
GPU.

This adds a custom lowering where we pack `v2i8` into `i16` and from
there use another bitcast node to reach the final desired type. And also
the inverse unpacking `i16` into `v2i8`.
2024-11-01 08:02:43 -07:00
Wang Qiang
b77e40265c
[llvm][NFC] Fix typos: replace “avaliable” with “available” across various files (#114524)
This pull request corrects multiple occurrences of the typo "avaliable"
to "available" across the LLVM and Clang codebase. These changes improve
the clarity and accuracy of comments and documentation. Specific
modifications are in the following files:

1. clang-tools-extra/clang-tidy/readability/FunctionCognitiveComplexityCheck.cpp:
Updated comments in readability checks for cognitive complexity.
2. llvm/include/llvm/ExecutionEngine/Orc/ExecutionUtils.h: Corrected
documentation for JITDylib responsibilities.
3. llvm/include/llvm/Target/TargetMacroFusion.td: Fixed descriptions for
FusionPredicate variables.
4. llvm/lib/CodeGen/SafeStack.cpp: Improved comments on DominatorTree
availability.
5. llvm/lib/Target/RISCV/RISCVSchedSiFive7.td: Enhanced resource usage
descriptions for vector units.
6. llvm/lib/Transforms/Scalar/LoopIdiomRecognize.cpp: Updated invariant
description in shift-detect idiom logic.
7. llvm/test/MC/ARM/mve-fp-registers.s: Amended ARM MVE register
availability notes.
8. mlir/lib/Bytecode/Reader/BytecodeReader.cpp: Adjusted forward
reference descriptions for bytecode reader operations.

These changes have no impact on code functionality, focusing solely on
documentation clarity.

Co-authored-by: wangqiang <wangqiang1@kylinos.cn>
2024-11-01 13:25:04 +00:00
Jay Foad
550501f21c
[AMDGPU] Simplify GFX12 VBUFFER definitions. NFC. (#114403)
For GFX12 hasTFE is always true because it does not have the buffer load
to LDS instructions.
2024-11-01 10:06:45 +00:00
Oliver Stannard
33411d5207
[ARM] Fix CMSE S->NS calls when CONTROL_S.SFPA==0 (CVE-2024-7883) (#114433)
When doing a call from CMSE secure state to non-secure state for
v8-M.main, we use the VLLDM and VLSTM instructions to save, clear and
restore the FP registers around the call. These instructions both check
the CONTROL_S.SFPA bit, and if it is clear (meaning the current contents
of the FP registers are not secret) they execute as no-ops.

This causes a problem when CONTROL_S.SFPA==0 before the call, which
happens if there are no floating-point instructions executed between
entry to secure state and the call. If this is the case, then the VLSTM
instruction will do nothing, leaving the save area in the stack
uninitialised. If the called function returns a value in floating-point
registers, the call sequence includes an instruction to copy the return
value from a floating-point register to a GPR, which must be before the
VLLDM instruction. This copy sets CONTROL_S.SFPA, meaning that the VLLDM
will fully execute, and load the uninitialised stack memory into the FP
registers.

This causes two problems:
* The FP register file is clobbered, including all of the callee-saved
  registers, which might contain live values.
* The stack region might contain secret values, which will be leaked to
  non-secure state through the floating-point registers if/when we
  return to non-secure state.

The fix is to insert a `vmov s0, s0` instruction before the VLSTM
instruction, to ensure that CONTROL_S.SFPA is set for both the VLLDM and
VLSTM instruction.

CVE: https://www.cve.org/cverecord?id=CVE-2024-7883
Security bulletin:
https://developer.arm.com/Arm%20Security%20Center/Cortex-M%20Security%20Extensions%20Vulnerability
2024-11-01 09:36:13 +00:00
Daniil Kovalev
da083e358e
[PAC][CodeGen][ELF][AArch64] Support signed GOT (#113811)
This re-applies #96164 after revert in #102434.

Support the following relocations and assembly operators:

- `R_AARCH64_AUTH_ADR_GOT_PAGE` (`:got_auth:` for `adrp`)
- `R_AARCH64_AUTH_LD64_GOT_LO12_NC` (`:got_auth_lo12:` for `ldr`)
- `R_AARCH64_AUTH_GOT_ADD_LO12_NC` (`:got_auth_lo12:` for `add`)

`LOADgotAUTH` pseudo-instruction is introduced which is later expanded to
actual instruction sequence like the following.

```
adrp x16, :got_auth:sym
add x16, x16, :got_auth_lo12:sym
ldr x0, [x16]
autia x0, x16
```

If a resign is requested, like below, `LOADgotPAC` pseudo is used, and GOT
load is lowered similarly to `LOADgotAUTH`.

```
@var = global i32 0
define ptr @resign_globalvar() {
  ret ptr ptrauth (ptr @var, i32 3, i64 43)
}
```

If FPAC bit is not set and auth instruction is emitted, a check+trap sequence
similar to one used for `AUT` pseudo is emitted to ensure auth success.

Both SelectionDAG and GlobalISel are suppported.
For FastISel, we fall back to SelectionDAG.

Tests starting with 'ptrauth-' have corresponding variants w/o this prefix.

See also specification
https://github.com/ARM-software/abi-aa/blob/main/pauthabielf64/pauthabielf64.rst#appendix-signed-got
2024-11-01 12:21:10 +03:00
Phoebe Wang
c72a751dab
[X86][AMX] Support AMX-TRANSPOSE (#113532)
Ref.: https://cdrdv2.intel.com/v1/dl/getContent/671368
2024-11-01 16:45:03 +08:00
Thorsten Schütt
8e3772744d
[GlobalISel][AArch64] Legalize G_INSERT_VECTOR_ELT for SVE (#114470)
There are patterns for:
* {nxv2s32, s32, s64},
* {nxv4s16, s16, s64},
* {nxv2s16, s16, s64}
2024-11-01 06:10:26 +01:00
Hervé Poussineau
6fa1647a47
[MC][Mips] Rename MipsMCAsmInfo to MipsELFMCAsmInfo (#112592)
Also change MipsAsmPrinter::emitStartOfAsmFile to emit ELF-related
sections only when using ELF output file format.
2024-11-01 08:42:34 +08:00
Justin Fargnoli
a1987beac5
Reland "[NVPTX] Prefer prmt.b32 over bfi.b32" (#114326)
Fix
[failure](https://github.com/llvm/llvm-project/pull/110766#discussion_r1796832635)
identified by @akuegel.

---

In [[NVPTX] Improve lowering of
v4i8](cbafb6f2f5)
@Artem-B add the ability to lower ISD::BUILD_VECTOR with bfi PTX
instructions. @Artem-B did this because:
(https://github.com/llvm/llvm-project/pull/67866#discussion_r1343066911)

Under the hood byte extraction/insertion ends up as BFI/BFE
instructions, so we may as well do that in PTX, too.
https://godbolt.org/z/Tb3zWbj9b

However, the example that @Artem-B linked was targeting sm_52. On modern
architectures, ptxas uses prmt.b32.
[Example](https://godbolt.org/z/Ye4W1n84o).

Thus, remove uses of NVPTXISD::BFI in favor of NVPTXISD::PRMT.
2024-10-31 16:09:20 -07:00
Thorsten Schütt
aa70d846b0
[GlobalISel][AArch64] Legalize G_SPLAT_VECTOR (#114006)
{nxv8s16, s16} fails to select.
{nxv16s8, s8} no patterns available.
2024-10-31 22:20:08 +01:00
zhijian lin
674574d25c
Promote 32bit pseudo instr that infer extsw removal to 64bit in PPCMIPeephole (#85451)
Fixes:   https://github.com/llvm/llvm-project/issues/71030

Bug only happens in 64bit involving spills. Since we don't know when the
spill will happen, all instructions in the chain used to deduce sign
extension for eliminating 'extsw' will need to be promoted to 64-bit
pseudo instructions.

The following instruction will promoted in PPCMIPeepholes: EXTSH, LHA,
ISEL to EXTSH8, LHA8, ISEL8
2024-10-31 15:49:36 -04:00
Min-Yih Hsu
b9d7117ebd
[RISCV] Assign separate PseudoVSHA2MS_VV opcodes for each SEW (#114317)
The vsha2ms.vv from Zvknh[ab] currently supports both SEW=32 and SEW=64.
It might have different performance characteristics depending on the SEW
on some processors. This patch splits these two different SEWs into
their own VPsuedo opcodes and scheduling classes.

This is effectively a NFC change.
2024-10-31 10:21:14 -07:00
WÁNG Xuěruì
f246b5f547
[LoongArch] Support bswap for LSX/LASX VTs (#114171)
On top of #114170
2024-11-01 00:38:13 +08:00
Artem Belevich
8129b6b53b
[NVPTX, InstCombine] instcombine known pointer AS checks. (#114325)
The change improves the code in general and, as a side effect, avoids
crashing on an impossible address space casts guarded 
by `__isGlobal/__isShared`, which partially fixes 
https://github.com/llvm/llvm-project/issues/112760

It's still possible to trigger the issue by using explicit AS casts w/o
AS checks, but LLVM should no longer crash on valid code.

This is #112964 + a small fix for the crash on unintended argument
access which was the root cause to revers the earlier version of the patch.
2024-10-31 09:24:51 -07:00
hev
f7a96dc664
[LoongArch] Ensure pcaddu18i and jirl adjacency in tail calls for correct relocation (#113932)
Prior to this patch, both `pcaddu18i` and `jirl` were marked as
scheduling boundaries to prevent instruction reordering that would
disrupt their adjacency. However, in certain cases, epilogues were still
being inserted between these two instructions, breaking the required
proximity. This patch ensures that `pcaddu18i` and `jirl` remain
adjacent even in the presence of epilogues, maintaining correct
relocation behavior for tail calls on LoongArch.
2024-11-01 00:08:15 +08:00
Shilei Tian
9234ae1bbe [NFC] clang-format -i llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp 2024-10-31 11:44:15 -04:00
Luke Lau
9c7188871c
[RISCV] Cost ordered bf16/f16 w/ zvfhmin reductions as invalid (#114250)
In #111000 we removed promotion of fadd/fmul reductions for bf16 and f16
without zvfh, and marked the cost as invalid to prevent the vectorizers
from emitting them. However it inadvertently didn't change the cost for
ordered reductions, so this moves the check earlier to fix this.

This also uses BasicTTIImpl instead which now assigns a valid but
expensive cost for fixed-length vectors, which reflects how codegen will
actually scalarize them.
2024-10-31 23:36:09 +08:00
Zaara Syeda
ccddd13602
Enable aggressive constant merge in GlobalMerge for AIX (#113956)
Enable merging all constants without looking at use in GlobalMerge by
default to replace PPCMergeStringPool pass on AIX.
2024-10-31 11:22:48 -04:00
Matt Arsenault
1d0370872f
AMDGPU: Expand flat atomics that may access private memory (#109407)
If the runtime flat address resolves to a scratch address,
64-bit atomics do not work correctly. Insert a runtime address
space check (which is quite likely to be uniform) and select between
the non-atomic and real atomic cases.

Consider noalias.addrspace metadata and avoid this expansion when
possible (we also need to consider it to avoid infinitely expanding
after adding the predication code).
2024-10-31 08:08:48 -07:00
Matt Arsenault
12409024d3
AMDGPU/GlobalISel: Handle atomic sextload and zextload (#111721)
Atomic loads are handled differently from the DAG, and have separate opcodes
and explicit control over the extensions, like ordinary loads. Add
new patterns for these.

There's room for cleanup and improvement. d16 cases aren't handled.

Fixes #111645
2024-10-31 07:44:52 -07:00
SpencerAbson
c485ee1968
[AArch64] Add assembly/disassembly for zeroing SVE REV{B,H,W,D} and RBIT (#114110)
This patch adds assembly/disassembly for the following SVE2.2
instructions

      - RBIT (zeroing)
      - REVB (zeroing)
      - REVH (zeroing)
      - REVW (zeroing)
      - REVD (zeroing)

- In accordance with:
https://developer.arm.com/documentation/ddi0602/2024-09/SVE-Instructions

Co-authored-by: Marian Lukac marian.lukac@arm.com
2024-10-31 14:30:11 +00:00
Momchil Velikov
b185e925ad
[AArch64] Add assembly/disassembly for {S,U,SU,US}TMOPA instructions (#113946)
The new instructions are described in
https://developer.arm.com/documentation/ddi0602/2024-09/SME-Instructions

Co-Authored-By:  Marian Lukac <Marian.Lukac@arm.com>
2024-10-31 12:16:17 +00:00
Pengcheng Wang
18f0f70934
[RISCV] Support llvm.masked.expandload intrinsic (#101954)
We can use `viota`+`vrgather` to synthesize `vdecompress` and lower
expanding load to `vcpop`+`load`+`vdecompress`.

And if `%mask` is all ones, we can lower expanding load to a normal
unmasked load.

Fixes #101914.
2024-10-31 20:03:58 +08:00
Momchil Velikov
95c5042db8
[AArch64] Add assembly/disassembly for {S,SU,US,U}MOP4{A,S} instructions (#113349)
The new instructions are described in
https://developer.arm.com/documentation/ddi0602/2024-09/SME-Instructions

Co-Authored-By:  Marian Lukac <Marian.Lukac@arm.com>
2024-10-31 11:12:14 +00:00
Sven van Haastregt
22081dc40b
[SPIR-V] Add missing ScalarOpts library (#114384)
Fixes an "undefined reference to `llvm::createRegToMemWrapperPass()'"
linker error introduced by cba70550ccf5 ("[SPIR-V] Fix BB ordering &
register lifetime (#111026)", 2024-10-30).
2024-10-31 11:50:01 +01:00
SpencerAbson
0800351da4
[AArch64][SVE] Use INS when moving elements from bottom 128b of SVE type (#114034)
Moving elements from a scalable vector to a fixed-lengh vector should
use[ INS (vector, element)
](https://developer.arm.com/documentation/100069/0606/SIMD-Vector-Instructions/INS--vector--element-)
when we know that the extracted element is in the bottom 128-bits of the
scalable vector. This avoids inserting unecessary UMOV/FMOV
instructions.
2024-10-31 10:36:00 +00:00
dnsampaio
28d0718033
[DAGCombiner] Add combine avg from shifts (#113909)
This teaches dagcombiner to fold:
`(asr (add nsw x, y), 1) -> (avgfloors x, y)`
`(lsr (add nuw x, y), 1) -> (avgflooru x, y)`

as well the combine them to a ceil variant:
`(avgfloors (add nsw x, y), 1) -> (avgceils x, y)` 
`(avgflooru (add nuw x, y), 1) -> (avgceilu x, y)`

iff valid for the target.

Removes some of the ARM MVE patterns that are now dead code.
It adds the avg opcodes to `IsQRMVEInstruction` as to preserve the
immediate splatting as before.
2024-10-31 10:57:27 +01:00
Stanislav Mekhanoshin
7cd29741fa
[AMDGPU] Extend mov_dpp8 intrinsic lowering for generic types (#114296)
The int_amdgcn_mov_dpp8 is overloaded, but we can only select i32.
To allow a corresponding builtin to be overloaded the same way as
int_amdgcn_mov_dpp we need it to be able to split unsupported values.
2024-10-31 01:15:25 -07:00
Ami-zhang
1897bf61f0
[LoongArch] Enable FeatureExtLSX for generic-la64 processor (#113421)
This commit makes the `generic` target to support FP and LSX, as
discussed in #110211. Thereby, it allows 128-bit vector to be enabled by
default in the loongarch64 backend.
2024-10-31 15:58:15 +08:00
Elvis Wang
a8575c1459
[RISCV] Sink ordered reduction check into FAdd. NFC (#114180) 2024-10-31 13:35:37 +08:00
Luke Lau
6da5968f5e
[RISCV] Lower scalar_to_vector for supported FP types (#114340)
In https://reviews.llvm.org/D147608 we added custom lowering for
integers, but inadvertently also marked it as custom for scalable FP
vectors despite not handling it.

This adds handling for floats and marks it as custom lowered for
fixed-length FP vectors too.

Note that this doesn't handle bf16 or f16 vectors that would need
promotion, but these scalar_to_vector nodes seem to be emitted when
expanding them.
2024-10-31 13:15:17 +08:00
Craig Topper
50896e7ef5 [ARM] Use getSignedConstant. NFC 2024-10-30 21:43:16 -07:00
Thorsten Schütt
6effab990c
Revert "[GlobalISel][AArch64] Legalize G_INSERT_VECTOR_ELT for SVE" (#114353)
Reverts llvm/llvm-project#114310
2024-10-31 05:41:16 +01:00
Thorsten Schütt
6bf214b7c6
[GlobalISel][AArch64] Legalize G_INSERT_VECTOR_ELT for SVE (#114310)
There are patterns for:
* {nxv2s32, s32, s64},
* {nxv4s16, s16, s64},
* {nxv2s16, s16, s64}
2024-10-31 04:56:41 +01:00
Adam Yang
948249d804
Revert "[DXIL] Add GroupMemoryBarrierWithGroupSync intrinsic" (#114322)
Reverts llvm/llvm-project#111884
2024-10-30 20:44:54 -07:00
Craig Topper
55dbacbf07
[RISCV] Remove RISCVISD::VFCVT_X(U)_F_VL by using VFCVT_RM_X(U)_F_VL with DYN rounding mode. NFC (#114306) 2024-10-30 19:16:23 -07:00
Feng Zou
8127162427
[X86][AMX] Support AMX-FP8 (#113850)
Ref.: https://cdrdv2.intel.com/v1/dl/getContent/671368
2024-10-31 10:14:25 +08:00
Yingwei Zheng
cf9d1c1486
[SDAG] Simplify SDNodeFlags with bitwise logic (#114061)
This patch allows using enumeration values directly and simplifies the
implementation with bitwise logic. It addresses the comment in
https://github.com/llvm/llvm-project/pull/113808#discussion_r1819923625.
2024-10-31 08:10:07 +08:00
Craig Topper
51628faa01
[RISCV] Sink hasPostISelHook = 1 for vector pseudos into the subclasses that set HasRoundModeOp. NFC (#114294) 2024-10-30 16:33:18 -07:00
Craig Topper
847f4ef21b
[X86] Use getAllOnesConstant instead of getConstant(-1). NFC (#114299) 2024-10-30 16:22:23 -07:00
Artem Belevich
04e876e6c6
Revert "[NVPTX] instcombine known pointer AS checks." (#114319)
Reverts llvm/llvm-project#112964

Crashes MLIR: https://lab.llvm.org/buildbot/#/builders/138/builds/5665
2024-10-30 15:34:08 -07:00