55866 Commits

Author SHA1 Message Date
Krzysztof Drewniak
3b0f506c87
[AMDGPU] Support nuw and nusw in buffer fat pointer lowering (#115039)
This commit uses the `nuw` flag on `getelementptr` to set the `nuw` flag
on buffer offset additions, and also moves from `inbounds` to the looser
`nusw` for the existing case.
2024-11-06 11:42:47 -06:00
Matt Arsenault
aa7941289e
AMDGPU: Fold copy of scalar add of frame index (#115058)
This is a pre-optimization to avoid a regression in a future
commit. Currently we almost always emit frame index with
a v_mov_b32 and use vector adds for the pointer operations. We
need to consider the users of the frame index (or rather, the
transitive users of derived pointer operations) to know whether
the value will be used in a vector or scalar context. This saves
an sgpr->vgpr copy.

This optimization could be more general for any opcode that's
trivially convertible from a scalar to vector form (although this
is a workaround for a proper regbankselect).
2024-11-06 09:10:58 -08:00
Craig Topper
5dc8d61177
[RISCV][GISel] Implement zexti32/zexti16 ComplexPatterns. (#115097) 2024-11-06 08:48:43 -08:00
Matt Arsenault
efe87fbc9d
AMDGPU: Improve vector of pointer handling in amdgpu-promote-alloca (#114144) 2024-11-06 08:47:15 -08:00
dlav-sc
83f92c33a4
[RISCV] fix SP recovery in varargs functions (#114316)
This patch fixes sp recovery in the epilogue of varargs functions when
the fp register is present and a second sp adjustment is applied.

Source of the issue: https://github.com/llvm/llvm-project/pull/110809
2024-11-06 19:30:32 +03:00
Yingwei Zheng
f74aed7938
[DAGCombiner] Add basic support for trunc nsw/nuw (#113808)
This patch adds basic support for `trunc nsw/nuw` in SDAG. It will allow
DAGCombiner to further eliminate in-reg `zext/sext` instructions.
2024-11-07 00:23:53 +08:00
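To make the `trunc nsw/nuw` entry above concrete, here is a minimal sketch of the flag semantics as described in the LLVM LangRef: `nuw` yields poison if any truncated bit is non-zero, and `nsw` yields poison if any truncated bit differs from the top bit of the result. The helper names and the use of `None` for poison are assumptions made for illustration; this is not DAGCombiner code.

```python
def sext(value, from_bits, to_bits):
    """Sign-extend an unsigned bit pattern from from_bits to to_bits."""
    sign = 1 << (from_bits - 1)
    signed = (value ^ sign) - sign          # reinterpret as a signed integer
    return signed & ((1 << to_bits) - 1)    # back to an unsigned bit pattern


def trunc(value, src_bits, dst_bits, nuw=False, nsw=False):
    """Model `trunc` on unsigned bit patterns; None stands in for poison."""
    assert 0 <= value < (1 << src_bits) and dst_bits < src_bits
    result = value & ((1 << dst_bits) - 1)
    # nuw: zero-extending the result must give back the original value.
    if nuw and result != value:
        return None
    # nsw: sign-extending the result must give back the original value.
    if nsw and sext(result, dst_bits, src_bits) != value:
        return None
    return result
```

Whenever `trunc(..., nuw=True)` is not poison, zero-extending the result reproduces the input, which is why a following in-register `zext` is redundant. The same holds for `nsw` and `sext`.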
Sarah Spall
fb90733e19
[HLSL] implement elementwise firstbithigh hlsl builtin (#111082)
Implements elementwise firstbithigh hlsl builtin.
Implements firstbituhigh intrinsic for spirv and directx, which handles
unsigned integers.
Implements firstbitshigh intrinsic for spirv and directx, which handles
signed integers.
Fixes #113486
Closes #99115
2024-11-06 07:31:39 -08:00
yingopq
86e4beb702
[MIPS] LLVM data layout give i128 an alignment of 16 for mips64 (#112084)
Fix parts of #102783.
2024-11-06 16:14:30 +01:00
Oliver Stannard
9b016e3cb2
[ARM] Add early-clobber to MVE VCMLA.f32 (#114995)
This instruction (but not the f16 variant) cannot use the same register
for the output as either of the inputs, so it needs to be marked as
early-clobber.
2024-11-06 14:46:08 +00:00
Paul Walker
38fffa630e
[LLVM][IR] Use splat syntax when printing Constant[Data]Vector. (#112548) 2024-11-06 11:53:33 +00:00
Vyacheslav Levytskyy
5a062191f7
[SPIR-V] Ensure correct pointee types of some OpenCL Extended Instructions' pointer arguments (#114846)
OpenCL Extended Instruction Set Specification defines relations between
return/operand types and pointee type of pointer arguments in case of
remquo, fract, frexp, lgamma_r, modf, sincos and prefetch instructions
(https://registry.khronos.org/SPIR-V/specs/unified1/OpenCL.ExtendedInstructionSet.100.html).
This PR ensures correct pointee types of those OpenCL Extended
Instructions' pointer arguments.
2024-11-06 12:44:53 +01:00
hev
cab606c306
[LoongArch] Enable alias analysis by default (#114980)
Enable use of alias analysis during code generation.
2024-11-06 19:30:57 +08:00
Benjamin Maxwell
ea6b8fa4b9
[SDAG] Merge multiple-result libcall expansion into DAG.expandMultipleResultFPLibCall() (#114792)
This merges the logic for expanding both FFREXP and FSINCOS into one
method `DAG.expandMultipleResultFPLibCall()`. This reduces duplication
and also allows FFREXP to benefit from the stack slot elimination
implemented for FSINCOS. This method will also be used in future to
implement more multiple-result intrinsics (such as modf and sincospi).
2024-11-06 11:06:06 +00:00
Simon Pilgrim
c75353313e [X86] combineConcatVectorOps - add 256-bit concat(shuffle(),shuffle()) handling
Improve IsConcatFree detection to handle splat vector-loads (which can be folded as X86ISD::SUBV_BROADCAST_LOAD).

Fixes #114959
2024-11-06 10:47:01 +00:00
Oliver Stannard
2d56de9e7e Revert "[ARM] Add extra tests for CVE-2024-7883 with undef/poison"
Reverting because this causes a test failure in the expensive-checks
buildbot.

This reverts commit ed9dab67e2932baf11bfa514b07b159c3bffd518.
2024-11-06 10:35:44 +00:00
Vyacheslav Levytskyy
ebfafa2511
[SPIR-V] Fix OpFunctionParameter vs. OpTypeFunction types for pointer arguments when there are functions with aggregate arguments (#115044)
The goal of the PR is to ensure that if a module contains functions with
mutated signatures (due to preprocessing of aggregate types), those functions
still go through re-creation of the function type to preserve pointee
type information for arguments.

This fixes a bug when a module with (1) a function having aggregate
arguments and/or return, and (2) at least two functions with signatures
different only wrt. pointee types is translated so that one of two
similar functions gets an incorrect OpFunctionParameter type that is
different from the corresponding OpTypeFunction definition.

A reproducer is attached as a new test case.
2024-11-06 11:17:45 +01:00
David Green
3d4d033cea [AArch64][Arm] Add nested double reduction tests. NFC 2024-11-06 10:08:14 +00:00
Simon Pilgrim
270bfb2f2a [X86] Add test coverage for #114959 2024-11-06 09:44:10 +00:00
Simon Pilgrim
e29d092af8 [X86] getFauxShuffleMask - add ISD::SHL/SRL handling
This is currently mostly the same as the VSHLI/VSRLI handling below, although I've kept them separate as I'm investigating adding non-uniform shift amount handling as a followup
2024-11-06 09:44:10 +00:00
Zhaoxin Yang
8c565de5ec
[LoongArch] Support llvm.lround intrinsics with i32 return type. (#114733)
This is needed by flang, similar to RISCV-64 in
https://reviews.llvm.org/D147195.
2024-11-06 17:34:13 +08:00
Oliver Stannard
ed9dab67e2 [ARM] Add extra tests for CVE-2024-7883 with undef/poison 2024-11-06 09:28:14 +00:00
Wang Pengcheng
37ce18951f [RISCV] Add requirement of asserts
We forgot to add `REQUIRES: asserts` here.
2024-11-06 17:01:24 +08:00
BoyaoWang430
69d0bab826
[RISCV] Add load/store clustering in post machine schedule (#111504)
#73789 added load clustering and #73796 tried to add store clustering.
If post machine schedule is used, load/store clusters formed in
machine schedule may be broken up. To solve this, add load/store
clustering to post machine schedule.
2024-11-06 16:21:30 +08:00
Gergely Futo
08411c855f
[RISCV] Correct fcopysign pattern for zdinx (#114954)
Correcting the pattern fixes the following error:
fatal error: error in backend: Cannot select: t17: f64 = fcopysign t5, t8
2024-11-06 09:10:37 +01:00
Pengcheng Wang
7a5b040e20
[RISCV] Add initial support of memcmp expansion
There are two passes that have dependency on the implementation
of `TargetTransformInfo::enableMemCmpExpansion` : `MergeICmps` and
`ExpandMemCmp`.

This PR adds the initial implementation of `enableMemCmpExpansion`
so that we can have some basic benefits from these two passes.

We currently don't enable expansion when there is no unaligned access
support because there are some issues with unaligned loads and
stores in the `ExpandMemCmp` pass. We should fix these issues and enable
the expansion later.

Vector case hasn't been tested as we don't generate inlined vector
instructions for memcmp currently.

Reviewers: preames, arcbbb, topperc, asb, dtcxzyw

Reviewed By: topperc, preames

Pull Request: https://github.com/llvm/llvm-project/pull/107548
2024-11-06 15:44:12 +08:00
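As an illustration of what the memcmp expansion entry above enables, the sketch below models how a fixed-size 4-byte `memcmp` can be replaced by one load per operand, a byte swap, and an integer comparison on a little-endian target. It is a behavioural model only, not the `ExpandMemCmp` implementation. The function names are hypothetical.

```python
def memcmp_reference(a: bytes, b: bytes, n: int) -> int:
    """Byte-by-byte comparison, returning -1, 0 or 1."""
    for x, y in zip(a[:n], b[:n]):
        if x != y:
            return -1 if x < y else 1
    return 0


def bswap32(word: int) -> int:
    """Reverse the byte order of a 32-bit value."""
    return int.from_bytes(word.to_bytes(4, "little"), "big")


def memcmp4_expanded(a: bytes, b: bytes) -> int:
    """Model of an inlined 4-byte memcmp on a little-endian target."""
    # One 4-byte load per operand, in the target's (little-endian) order.
    wa = int.from_bytes(a[:4], "little")
    wb = int.from_bytes(b[:4], "little")
    # Byte-swap so the first byte in memory becomes the most significant,
    # making unsigned integer order agree with lexicographic byte order.
    wa, wb = bswap32(wa), bswap32(wb)
    return (wa > wb) - (wa < wb)
```

Without the byte swap, the last byte in memory would dominate the comparison and the sign of the result could be wrong.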
Pengcheng Wang
5adb5c05a2
[RISCV] Add tests for memcmp expansion
We add tests for the following cases:
* Length = 0, 1, 2, 3, 4, 5, 6, 7, 8, 15, 16, 31, 32, 63, 64, 127,
  128, runtime.
* Comparisons against zero.
* RUN lines for scalar/vector w/ or w/o strict align.
* Optimize for size.

Reviewers: topperc, preames

Reviewed By: topperc, preames

Pull Request: https://github.com/llvm/llvm-project/pull/107824
2024-11-06 15:12:35 +08:00
Craig Topper
f4270045f4 [RISCV] Add Zfinx/Zdinx RUN lines to rv64d-double-convert-strict.ll and rv64f-float-convert-strict.ll. NFC 2024-11-05 21:48:38 -08:00
Heejin Ahn
492812f613
[WebAssembly] Fix rethrow's index calculation (#114693)
So far we have assumed that we only rethrow the exception caught in the
innermost EH pad. This is true in code we directly generate, but after
inlining this may not be the case. For example, consider this code:
```ll
ehcleanup:
  %0 = cleanuppad ...
  call @destructor
  cleanupret from %0 unwind label %catch.dispatch
```

If `destructor` gets inlined into this function, the code can be like
```ll
ehcleanup:
  %0 = cleanuppad ...
  invoke @throwing_func
    to label %unreachable unwind label %catch.dispatch.i

catch.dispatch.i:
  catchswitch ... [ label %catch.start.i ]

catch.start.i:
  %1 = catchpad ...
  invoke @some_function
    to label %invoke.cont.i unwind label %terminate.i

invoke.cont.i:
  catchret from %1 to label %destructor.exit

destructor.exit:
  cleanupret from %0 unwind label %catch.dispatch
```

We lower a `cleanupret` into `rethrow`, which assumes it rethrows the
exception caught by the nearest dominating EH pad. But after the
inlining, the nearest dominating EH pad is not `ehcleanup` but
`catch.start.i`.

The problem exists in the same manner in the new (exnref) EH, because it
assumes the exception comes from the nearest EH pad and saves an exnref
from that EH pad and rethrows it (using `throw_ref`).

This problem can be fixed easily if `cleanupret` has the basic block
where its matching `cleanuppad` is. The bitcode instruction `cleanupret`
kind of has that info (it has a token from the `cleanuppad`), but that
info is lost when we enter ISel, because `TargetSelectionDAG.td`'s
`cleanupret` node does not have any arguments:

5091a359d9/llvm/include/llvm/Target/TargetSelectionDAG.td (L700)
Note that `catchret` already has two basic block arguments, even though
neither of them means `catchpad`'s BB.

This PR adds the `cleanuppad`'s BB as an argument to `cleanupret` node
in ISel and uses it in the Wasm backend. Because this node is also used
in X86 backend we need to note its argument there too but nothing more
needs to change there as long as X86 doesn't need it.

---

- Details about changes in the Wasm backend:

After this PR, our pseudo `RETHROW` instruction takes a BB, which means
the EH pad whose exception it needs to rethrow. There are currently two
ways to generate a `RETHROW`: one is from `llvm.wasm.rethrow` intrinsic
and the other is from `CLEANUPRET` we discussed above. In case of
`llvm.wasm.rethrow`, we add a '0' as a placeholder argument when it is
lowered to a `RETHROW`, and change it to a BB in LateEHPrepare. As
written in the comments, this PR doesn't change how this BB is computed.
The BB argument will be converted to an immediate argument as with other
control flow instructions in CFGStackify.

In case of `CLEANUPRET`, it already has a BB argument pointing to an EH
pad, so it is just converted to a `RETHROW` with the same BB argument in
LateEHPrepare. This will also be lowered to an immediate in CFGStackify
with other control flow instructions.

---

Fixes #114600.
2024-11-05 21:45:13 -08:00
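The difference described in the rethrow entry above can be sketched with a toy model of the inlined example. The immediate-dominator map is written by hand from the IR snippet in that message, and treating `ehcleanup` and `catch.start.i` as the EH pads follows the message's wording. All names in the code are hypothetical and nothing here reflects the actual Wasm backend data structures.

```python
# Hand-written immediate dominators for the inlined example (an assumption
# read off the snippet, not computed by a real dominator analysis).
IDOM = {
    "ehcleanup": None,
    "catch.dispatch.i": "ehcleanup",
    "catch.start.i": "catch.dispatch.i",
    "invoke.cont.i": "catch.start.i",
    "destructor.exit": "invoke.cont.i",
}
EH_PADS = {"ehcleanup", "catch.start.i"}

# The pad each `cleanupret` names through its token (`%0` in the snippet).
CLEANUPRET_PAD = {"destructor.exit": "ehcleanup"}


def nearest_dominating_eh_pad(block):
    """Walk up the dominator chain until an EH pad is found."""
    while block is not None:
        if block in EH_PADS:
            return block
        block = IDOM[block]
    return None


def rethrow_pad_by_dominance(block):
    """Previous assumption: rethrow the nearest dominating pad's exception."""
    return nearest_dominating_eh_pad(block)


def rethrow_pad_by_token(block):
    """Approach in this PR: use the pad carried as the cleanupret argument."""
    return CLEANUPRET_PAD[block]
```

For `destructor.exit`, the dominance-based lookup returns `catch.start.i`, while the token-based lookup returns `ehcleanup`, which is the pad whose exception should be rethrown.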
Craig Topper
cbc7812565 [RISCV] Add Zdinx RUN line to rv64d-double-convert.ll. NFC
We already have a Zfinx RUN line for rv64f-float-convert.ll.
2024-11-05 21:12:09 -08:00
WANG Rui
a165bbddf9 [LoongArch][NFC] Reland "Pre-commit tests for codegen with alias analysis" 2024-11-06 11:54:34 +08:00
WANG Rui
9ba0e5c27d Revert "[LoongArch][NFC] Pre-commit tests for codegen with alias analysis"
This reverts commit 445db93844cb50eeb6f587bef0749c2950b46e70.
2024-11-06 11:45:18 +08:00
Madhur Amilkanthwar
895a8e66c6
[AArch64][GISel] Support neon.abs intrinsic for vector types (#107226)
This patch lowers the intrinsic to G_ABS and thus supports the intrinsic in GISel.
2024-11-06 08:31:46 +05:30
Luke Lau
3a26feb607
[RISCV] Lower fixed-length mgather/mscatter for zvfhmin/zvfbfmin (#114945)
In preparation for allowing zvfhmin and zvfbfmin in
isLegalElementTypeForRVV, this lowers fixed-length masked gathers and
scatters.

We need to mark f16 and bf16 as legal in isLegalMaskedGatherScatter
otherwise ScalarizeMaskedMemIntrin will just scalarize them, but we can
move this back into isLegalElementTypeForRVV afterwards.

The scalarized codegen required #114938, #114927 and #114915 to not
crash.
2024-11-06 10:33:06 +08:00
Craig Topper
db21dbd12a [RISCV][GISel] Add constant_fold_cast_op to RISCVPostLegalizerCombiner. 2024-11-05 17:48:54 -08:00
Jon Roelofs
4c3e1e3c4a
[llvm][AsmPrinter] Add an option to print instruction latencies (#113243)
... matching what we have in the disassembler. This isn't turned on by
default since several of the scheduling models are not completely
accurate, and we don't want to be misleading.
2024-11-05 17:28:52 -08:00
ZhaoQi
92be2cb086
[LoongArch] Use LSX for scalar FP rounding with explicit rounding mode (#114766)
The LoongArch FP base ISA only has the frint.{s/d} instruction, which reads the
global rounding mode. Utilize LSX for explicit rounding modes in scalar
ceil/floor/trunc/roundeven calls when -mlsx is enabled. It is faster than
calling the libm library functions.

Same as what gcc did:
https://gcc.gnu.org/pipermail/gcc-cvs/2023-November/394218.html
2024-11-06 09:26:28 +08:00
Philip Reames
a905203b9e
[RISCV] Prefer strided load for interleave load with only one lane active (#115069)
If only one of the elements is actually used, then we can legally use a
strided load in place of the segment load. Doing so reduces vector
register pressure, so if both segment and strided are believed to be
element/segment at a time, then prefer the strided load variant.

Note that I've seen the vectorizer emitting wide interleave loads to
represent a strided load, so this does happen in practice. It doesn't
matter much for small LMUL*NF, but at large NF can start causing
problems in register allocation.

Note that this patch only covers the fixed vector formation cases. In
theory, we should do the same patch for scalable, but we can currently
only represent NF2 in scalable IR, and NF2 is assumed to be optimized to
better than segment-at-a-time by default, so there's currently nothing
to do.
2024-11-05 16:15:20 -08:00
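The legality argument in the strided-load entry above can be checked with a small model. Memory is indexed in elements rather than bytes, and `segment_load` and `strided_load` are hypothetical helpers that only mimic the access patterns. They do not model RVV instructions, LMUL, or masking.

```python
def segment_load(memory, base, nf, vl):
    """Deinterleave vl segments of nf fields into nf separate vectors."""
    return [
        [memory[base + seg * nf + field] for seg in range(vl)]
        for field in range(nf)
    ]


def strided_load(memory, base, stride, vl):
    """Load vl elements spaced stride elements apart."""
    return [memory[base + i * stride] for i in range(vl)]


# If only field `lane` of the interleaved data is used, a strided load
# starting at that field, with stride nf, reads exactly the same elements.
memory = list(range(100, 124))
nf, vl, lane = 4, 6, 2
all_fields = segment_load(memory, 0, nf, vl)
one_field = strided_load(memory, lane, nf, vl)
```

The segment load produces `nf` result vectors even though only one is used, which is the register pressure cost the message refers to. The strided load produces just one.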
Craig Topper
339f395ece [RISCV][GISel] Enable commute_constant_to_rhs in RISCVPostLegalizerCombiner. 2024-11-05 15:08:43 -08:00
Craig Topper
a20b902b35 [RISCV][GISel] Copy some Zbb and Zbkb IR tests. NFC
These are copies of SDAG tests with some of the more specialized
cases removed. We can add them later when we're ready to improve them.
2024-11-05 15:08:43 -08:00
Craig Topper
13b5899c29
[SelectionDAGBuilder][X86] Don't form FMAXNUM for f16 vectors if FMAXNUM needs to be promoted. (#114943)
In #70357, I changed a isLegalOrCustom to isLegalOrCustomOrPromote in
visitSelect to enable integer min/max to be formed when the operation
was promoted. Unfortunately, this also affected floating point. For
floating point, fmaxnum may require a libcall so we also need to check
if the operation on the promoted type is legal or custom.

Other changes to RISC-V have since made the original change untested, so
this patch restores the original isLegalOrCustom.

Fixes #114520.
2024-11-05 15:06:37 -08:00
Stanislav Mekhanoshin
6d7e51de5e
[AMDGPU] Extend type support for update_dpp intrinsic (#114597)
We can split 64-bit DPP as a post-RA pseudo if control values are
supported, but cannot handle other types.
2024-11-05 13:59:14 -08:00
dlav-sc
97982a8c60
[RISCV][CFI] add function epilogue cfi information (#110810)
This patch adds CFI instructions in the function epilogue.

Before patch:
addi sp, s0, -32
ld ra, 24(sp) # 8-byte Folded Reload
ld s0, 16(sp) # 8-byte Folded Reload
ld s1, 8(sp) # 8-byte Folded Reload
addi sp, sp, 32
ret

After patch:
addi sp, s0, -32
.cfi_def_cfa sp, 32
ld ra, 24(sp) # 8-byte Folded Reload
ld s0, 16(sp) # 8-byte Folded Reload
ld s1, 8(sp) # 8-byte Folded Reload
.cfi_restore ra
.cfi_restore s0
.cfi_restore s1
addi sp, sp, 32
.cfi_def_cfa_offset 0
ret

This functionality is already present in `riscv-gcc`, but it’s not in
`clang` and this slightly impairs the `lldb` debugging experience, e.g.
backtrace.
2024-11-06 00:20:21 +03:00
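Why the epilogue directives in the entry above matter for unwinding can be sketched with a tiny simulation of the CFA (canonical frame address) rule. It assumes the prologue left the rule as "CFA = s0 + 0", which the message does not show, and it uses made-up register values. Only the instructions that affect the CFA computation are modelled, and `.cfi_restore` is not.

```python
def cfa(rule, regs):
    """Evaluate a CFA rule of the form (register, offset)."""
    reg, offset = rule
    return regs[reg] + offset


def run_epilogue(with_patch):
    """Return (true CFA, CFA computed after each modelled step)."""
    entry_sp = 1000                       # sp at function entry, the true CFA
    regs = {"sp": 900, "s0": entry_sp}    # sp was moved after the prologue
    rule = ("s0", 0)                      # ASSUMED rule left by the prologue
    caller_s0 = 2000                      # value reloaded by `ld s0, 16(sp)`
    seen = []

    regs["sp"] = regs["s0"] - 32          # addi sp, s0, -32
    if with_patch:
        rule = ("sp", 32)                 # .cfi_def_cfa sp, 32
    seen.append(cfa(rule, regs))

    regs["s0"] = caller_s0                # ld s0, 16(sp)
    seen.append(cfa(rule, regs))

    regs["sp"] += 32                      # addi sp, sp, 32
    if with_patch:
        rule = (rule[0], 0)               # .cfi_def_cfa_offset 0
    seen.append(cfa(rule, regs))
    return entry_sp, seen
```

With the directives, the computed CFA stays at the true value at every step. Without them, under the stated assumption, the rule still refers to `s0` after it has been reloaded with the caller's value, so a debugger stopped in the epilogue would compute the wrong frame.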
Brox Chen
e8644e3b47
[AMDGPU][True16][MC] VOP2 update instructions with fake16 format (#114436)
Some old "t16" VOP2 instructions are actually in fake16 format. Correct
them and update the test file.
2024-11-05 16:12:49 -05:00
Matt Arsenault
0b40f97929
AMDGPU: Treat uint32_max as the default value for amdgpu-max-num-workgroups (#113751)
0 does not make sense as a value for this to be, much less the default.
Also stop emitting each individual field if it is the default, rather than
if any element was the default. Also fix the name of the test since it didn't
exactly match the real attribute name.
2024-11-05 12:50:44 -08:00
Kai Nacke
4a37799a48
[SystemZ][XRay] Implement XRay instrumentation for SystemZ (#113253)
Expands pseudo instructions PATCHABLE_FUNCTION_ENTER and PATCHABLE_RET
into a small instruction sequence which calls into the XRay library.
2024-11-05 15:42:55 -05:00
Kai Nacke
8b659736f7
[SystemZ] Make lit test more specific (#115050)
The lit test fmuladd-soft-float.ll only specifies s390x as platform,
but the test is Linux specific, causing problems when run on z/OS.
This change updates the triple to fix this.
2024-11-05 15:29:32 -05:00
Craig Topper
e566ae8812 [RISCV][GISel] Remove s32 support for G_ABS on RV64.
I plan to remove s32 as a legal type to match SelectionDAG
and to remove i32 from the GPR regclass on RV64.
2024-11-05 12:05:30 -08:00
Matt Arsenault
ce067c5a3b AMDGPU: Rename test file 2024-11-05 10:42:12 -08:00
Finn Plummer
3cdac06708
[HLSL][SPIRV][DXIL] Implement dot4add_i8packed intrinsic (#113623)
- create a clang built-in in Builtins.td
- link dot4add_i8packed in hlsl_intrinsics.h
- add lowering to spirv backend through expansion of operation as OpSDot
is missing up to SPIRV 1.6 in SPIRVInstructionSelector.cpp
- add lowering to spirv backend using OpSDot in applicable SPIRV version
or if SPV_KHR_integer_dot_product is enabled
- add dot4add_i8packed intrinsic to IntrinsicsDirectX.td and mapping to
DXIL.td op Dot4AddI8Packed

- add tests for HLSL intrinsic lowering to dx/spv intrinsic in
dot4add_i8packed.hlsl
- add tests for sema checks in dot4add_i8packed-errors.hlsl
- add test of spir-v lowering in SPIRV/dot4add_i8packed.ll
- add test to dxil lowering in DirectX/dot4add_i8packed.ll
    
Resolves #99220
2024-11-05 10:29:08 -08:00
Simon Pilgrim
61d5addd94 [X86] SimplifyDemandedBitsForTargetNode - call SimplifyMultipleUseDemandedBits on SSE shift-by-immediate nodes.
Attempt to peek through multiple-use SHLI/SRLI/SRAI source vectors.
2024-11-05 18:24:13 +00:00