52796 Commits

Author SHA1 Message Date
Evgenii Kudriashov
c13e310fa7 [DAGCombine] Support truncated constants for fptosi.sat combining
Closes https://github.com/llvm/llvm-project/issues/56779

Reviewed By: RKSimon, dmgreen

Differential Revision: https://reviews.llvm.org/D152926
2023-07-28 18:54:39 +03:00
Mikhail Gudim
cb15e657b5 [RISCV] A test for conditional binary ops.
Consider the following pattern:

```
%binop_ = binop %x, %y
%select_ = select %c, %binop_, %x
```

If there is an identity `%identity` operand for `binop`, it is possible to transform
the above code to:
```
%operand = select %c, %y, %identity
%result = binop %x, %operand
```
This transformation is profitable when `%identity` is all zeroes or
ones.
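
As a sanity check of the rewrite described above, here is a minimal sketch that models both forms on 8-bit values and compares them. The table of operations, the helper names, and the 8-bit width are illustrative assumptions and are not taken from the patch.

```python
import operator

MASK = 0xFF  # model an i8, so the "all ones" identity is 0xFF

# (binop, identity) pairs, where binop(x, identity) == x for every x.
BINOPS = {
    "add": (lambda a, b: (a + b) & MASK, 0),
    "or": (operator.or_, 0),
    "xor": (operator.xor, 0),
    "and": (operator.and_, MASK),  # identity is all ones
}


def original(op, c, x, y):
    """Models: select %c, (binop %x, %y), %x"""
    binop, _ = BINOPS[op]
    return binop(x, y) if c else x


def transformed(op, c, x, y):
    """Models: binop %x, (select %c, %y, %identity)"""
    binop, identity = BINOPS[op]
    operand = y if c else identity
    return binop(x, operand)


def forms_agree():
    """Compare both forms over all x and a sample of y values."""
    return all(
        original(op, c, x, y) == transformed(op, c, x, y)
        for op in BINOPS
        for c in (False, True)
        for x in range(256)
        for y in range(0, 256, 17)
    )
```

When `%c` is false the selected operand is the identity, so the binop returns `%x` unchanged, which is exactly what the original select produced.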

This patch commits a test for such patterns.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D155481
2023-07-28 11:08:02 -04:00
melonedo
afb9c04a5a [RISCV] Add support for XCVbi extension in CV32E40P
Implement XCVbi intrinsics for CV32E40P according to the specification.

This commit is part of a patch-set to upstream the 7 vendor specific extensions of CV32E40P.

Contributors: @CharKeaney, @jeremybennett, @lewis-revill, @liaolucy, Nandni Jamnadas, @paolos, @simoncook, @xmj.

bf2ad26b4ff856aab9a62ad168e6bdefeedc374f was the original commit.
e4777dc4b9cb371971523cc603e1b8a5c7255e7e reverted it due to test failures caused by a merge conflict marker in llvm/test/CodeGen/RISCV/attributes that was accidentally checked in.
This commit removes the conflict marker and recommits the change.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D154412
2023-07-28 21:54:10 +08:00
Ben Shi
7ca6b76934 [CSKY] Optimize conditional branches with float comparison
Reviewed By: zixuan-wu

Differential Revision: https://reviews.llvm.org/D155424
2023-07-28 21:23:21 +08:00
melonedo
e4777dc4b9 Revert "[RISCV] Add support for XCVbi extension in CV32E40P"
This reverts commit bf2ad26b4ff856aab9a62ad168e6bdefeedc374f as it
checked in merge conflict markers.
2023-07-28 19:28:20 +08:00
Tuan Chuong Goh
8eeabf674c [AArch64] Add funnel shift lowering for SelectionDAG
Consider FSHR legal if shift amount is constant
Lower FSHL to FSHR if shift amount is constant
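
The relationship between the two funnel shifts can be sketched as follows. This is a model of the generic identity for a constant amount that is nonzero modulo the bit width, not the lowering code from the patch; the function names and the default 32-bit width are assumptions.

```python
def fshl(a, b, c, bw=32):
    """Funnel shift left: concatenate a:b, shift left by c, keep the high bw bits."""
    mask = (1 << bw) - 1
    c %= bw
    wide = ((a & mask) << bw) | (b & mask)
    return ((wide << c) >> bw) & mask


def fshr(a, b, c, bw=32):
    """Funnel shift right: concatenate a:b, shift right by c, keep the low bw bits."""
    mask = (1 << bw) - 1
    c %= bw
    wide = ((a & mask) << bw) | (b & mask)
    return (wide >> c) & mask


def fshl_via_fshr(a, b, c, bw=32):
    """Rewrite fshl as fshr with amount bw - c; only valid when c % bw != 0."""
    assert c % bw != 0, "a zero shift selects a for fshl but b for fshr"
    return fshr(a, b, bw - (c % bw), bw)
```

The zero case is excluded because `fshl` with a zero amount returns the first operand while `fshr` with a zero amount returns the second.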

Differential Revision: https://reviews.llvm.org/D155565
2023-07-28 11:56:25 +01:00
Kevin Athey
d5f496a2cb Revert "[RISCVRVVInitUndef] Remove implicit single use assumption for IMPLICIT_DEF"
This reverts commit 9cf675923afa73a3dbe575803ebbbe9146701df8.

Breaking sanitizer buildbots: asan and fast
https://lab.llvm.org/buildbot/#/builders/168/builds/14824
https://lab.llvm.org/buildbot/#/builders/5/builds/35419
2023-07-28 12:23:18 +02:00
John Brawn
8336d38be9 [ARM] Correctly handle combining segmented stacks with execute-only
Using segmented stacks with execute-only mostly works, but we need to
use the correct movi32 opcode in 6-M, and there's one place where for
thumb1 (i.e. 6-M and 8-M.base) a constant pool was unconditionally
used which needed to be fixed.

Differential Revision: https://reviews.llvm.org/D156339
2023-07-28 10:37:40 +01:00
melonedo
bf2ad26b4f [RISCV] Add support for XCVbi extension in CV32E40P
Implement XCVbi intrinsics for CV32E40P according to the specification.

This commit is part of a patch-set to upstream the 7 vendor specific extensions of CV32E40P.

Contributors: @CharKeaney, @jeremybennett, @lewis-revill, @liaolucy, Nandni Jamnadas, @PaoloS, @simoncook, @xmj.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D154412
2023-07-28 17:36:57 +08:00
melonedo
3c0604b224 [RISCV] Add support for XCVsimd extension in CV32E40P
Implement XCVsimd intrinsics for CV32E40P according to the specification.

This commit is part of a patch-set to upstream the 7 vendor specific extensions of CV32E40P.

Contributors: @CharKeaney, @jeremybennett, @lewis-revill, @liaolucy, Nandni Jamnadas, @PaoloS, @simoncook, @xmj.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D153721
2023-07-28 16:52:32 +08:00
Jun Sha (Joshua)
e56bf13317 [RISCV] Remove some instructions from Zvfbfwma by implying Zfbfmin according to the latest spec
According to the latest spec, Zvfbfwma requires Zvfbfmin and Zvfbfmin requires Zfbfmin, with FLH/FSH/FMV.H.X/FMV.X.H removed from Zvfbfwma.

Reviewed By: asb

Differential Revision: https://reviews.llvm.org/D155916
2023-07-28 15:52:03 +08:00
Freddy Ye
c9d92e6638 [X86] Support -march=arrowlake,arrowlake-s,lunarlake
Reviewed By: pengfei

Differential Revision: https://reviews.llvm.org/D156239
2023-07-28 15:05:54 +08:00
Qihan Cai
092e60a3fc [RISCV] Add support for XCValu extension in CV32E40P
Implement XCValu intrinsics for CV32E40P according to the specification.

This commit is part of a patch-set to upstream the 7 vendor specific extensions of CV32E40P.

Contributors: @CharKeaney, Nandni Jamnadas, Serkan Muhcu, @jeremybennett, @lewis-revill, @liaolucy, @simoncook, @xmj

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D153748
2023-07-28 11:37:31 +08:00
Fangrui Song
845d83d85f [test] Add --show-all-symbols to some llvm-objdump -d commands
llvm-objdump -d will be changed to not display mapping symbols by
default (D156190).
Add --show-all-symbols to make the intent clearer and prevent test
adjustment with the new behavior.
2023-07-27 19:33:51 -07:00
Jun Sha (Joshua)
2b6df4a336 [RISCV] Add codegen support for bf16 vector
This patch adds codegen support for vectors with bfloat16 element type in the LLVM backend.
With this patch, Zvfbfmin/Zvfbfwma instructions as well as vle16/vse16 can be generated from the newly added bf16 IR intrinsics.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D156287
2023-07-28 09:54:23 +08:00
Philip Reames
9cf675923a [RISCVRVVInitUndef] Remove implicit single use assumption for IMPLICIT_DEF
The code was written with the implicit assumption that each IMPLICIT_DEF is either a) the tied operand, or b) an untied source, but not both. This is true right now, but an upcoming change may allow CSE of IMPLICIT_DEFs in some cases, so let's rewrite the code to handle that possibility.

I added an MIR case which demonstrates the multiple use IMPLICIT_DEF. To my knowledge, this is not a reachable configuration from IR right now.

As an aside, this makes the structure a much closer match with the sub-reg liveness case, and we can probably just merge these routines. (Future work.)

Differential Revision: https://reviews.llvm.org/D156477
2023-07-27 16:25:56 -07:00
Jon Roelofs
3e0cdf332f Upgrade a rdar://5907648 link to a github issue
https://github.com/llvm/llvm-project/issues/64174
2023-07-27 13:37:48 -07:00
Hiroshi Yamauchi
a90228b911 [AArch64][Windows] Fix the slot offset of the swift async context register.
This fixes a code gen issue where saving the swift async context
register (x22) accidentally overwrites the saved value of another
callee-saved register, corrupts its value and causes a crash.

Differential Revision: https://reviews.llvm.org/D156391
2023-07-27 12:32:43 -07:00
Craig Topper
a81e1f0fb2 [RISCV] When using vror.vi for left rotate, mask the inverted immediate to SEW.
This makes the assembly more readable.
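
One reading of the immediate masking is sketched below: a left rotate by `n` equals a right rotate by `-n` reduced modulo SEW, so the rotate-right immediate can be kept in the range `0..SEW-1`. The helper names are hypothetical and SEW is assumed to be a power of two.

```python
def rotl(x, n, sew):
    """Rotate the sew-bit value x left by n."""
    mask = (1 << sew) - 1
    n %= sew
    return ((x << n) | (x >> (sew - n))) & mask


def rotr(x, n, sew):
    """Rotate the sew-bit value x right by n."""
    mask = (1 << sew) - 1
    n %= sew
    return ((x >> n) | (x << (sew - n))) & mask


def ror_immediate_for_rotl(n, sew):
    """Negate the left-rotate amount and mask it to log2(sew) bits."""
    return (-n) & (sew - 1)
```

For example, a left rotate by 1 at SEW=8 becomes a right rotate by 7 rather than by an unmasked negative amount.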

Reviewed By: luke

Differential Revision: https://reviews.llvm.org/D156348
2023-07-27 12:16:21 -07:00
Matt Arsenault
e5f04830c5 ARM: Use explicit triple in a test to avoid inheriting windows from the host
2023-07-27 13:18:50 -04:00
Daniel Hoekwater
3435a6a0bb [AArch64] [XRay] Account for XRay event instrs in Branch Relaxation
PATCHABLE_TYPED_EVENT_CALL and PATCHABLE_EVENT_CALL are pseudo
instructions that expand to XRay sleds, so getInstSizeInBytes
should reflect the size of the sleds, not the pseudo-instructions.

Differential Revision: https://reviews.llvm.org/D156272
2023-07-27 17:10:58 +00:00
Matt Arsenault
95e5a461f5 AMDGPU: Always custom lower extract_subvector
The patterns were ripped out in
a4a3ac10cb1a40ccebed4e81cd7e94f1eb71602d so this always needs to be
custom lowered. I absolutely hate how difficult it is to write tests
for these, I have no doubt there are more of these hidden.

Fixes #64142
2023-07-27 08:46:44 -04:00
Jay Foad
2dcf051259 [CodeGen] Store call frame size in MachineBasicBlock
Record the call frame size on entry to each basic block. This is usually
zero except when a basic block has been split in the middle of a call
sequence.

This simplifies PEI::replaceFrameIndices which previously had to visit
basic blocks in a specific order and had special handling for
unreachable blocks. More importantly it paves the way for an equally
simple implementation of a backwards version of replaceFrameIndices,
which is required to fully convert PrologEpilogInserter to backwards
register scavenging, which is preferred because it does not rely on
accurate kill flags.

Differential Revision: https://reviews.llvm.org/D156113
2023-07-27 10:32:00 +01:00
David Green
beabfe747b [AArch64] Sink splat to fmlal intrinsics
Similar to other neon index instructions, it is beneficial to sink the splat to
the instruction for fmlal in order for it to create the index.
2023-07-27 10:07:01 +01:00
David Green
509cb33469 [AArch64] Correct the regtype of indexed fmlal
The indexed fmlal should use a low numbered register for the index operand,
which this fixes by making it V128_lo.

Fixes #64104

Differential Revision: https://reviews.llvm.org/D156296
2023-07-27 08:27:03 +01:00
David Green
e012c5cfac [AArch64] Add test showing incorrect register usage of FMLAL. NFC
See D156296
2023-07-27 07:39:10 +01:00
Vitaly Buka
a496c8be6e Revert "[CodeGen]Allow targets to use target specific COPY instructions for live range splitting"
And dependent commits.

Details in D150388.

This reverts commit 825b7f0ca5f2211ec3c93139f98d1e24048c225c.
This reverts commit 7a98f084c4d121244ef7286bc6503b6a181d446e.
This reverts commit b4a62b1fa546312d882fa12dfdcd015177d66826.
This reverts commit b7836d856206ec39509d42529f958c920368166b.

No conflicts in the code; a few tests had conflicts in autogenerated CHECKs:
llvm/test/CodeGen/Thumb2/mve-float32regloops.ll
llvm/test/CodeGen/AMDGPU/fix-frame-reg-in-custom-csr-spills.ll

Reviewed By: alexfh

Differential Revision: https://reviews.llvm.org/D156381
2023-07-26 22:13:32 -07:00
Pravin Jagtap
1462053608 [AMDGPU] Propagate constants for llvm.amdgcn.wave.reduce.umin/umax
Reviewed By: arsenm, #amdgpu

Differential Revision: https://reviews.llvm.org/D156077
2023-07-26 23:46:01 -04:00
Nitin John Raj
474cf4feb7 [RISCV][GlobalISel] Test legalization of binary logical instructions with wider types
Without any additional tweaking, we can successfully legalize for wider
types (i64, i96 for rv32; i128, i192 for rv64) that are integer
multiples of XLen.
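
To illustrate why bitwise logic on such types splits cleanly, here is a minimal sketch that breaks a wide AND into XLen-sized pieces and reassembles the result. It models the arithmetic only and says nothing about how the GlobalISel legalizer is implemented; all helper names are assumptions.

```python
def split_limbs(value, width, xlen):
    """Split a width-bit integer into width // xlen limbs, least significant first."""
    assert width % xlen == 0, "sketch only covers exact multiples of XLen"
    mask = (1 << xlen) - 1
    return [(value >> (i * xlen)) & mask for i in range(width // xlen)]


def join_limbs(limbs, xlen):
    """Reassemble limbs (least significant first) into one integer."""
    result = 0
    for i, limb in enumerate(limbs):
        result |= limb << (i * xlen)
    return result


def wide_and(a, b, width, xlen):
    """AND two width-bit values using only xlen-bit operations."""
    limbs = [
        x & y
        for x, y in zip(split_limbs(a, width, xlen), split_limbs(b, width, xlen))
    ]
    return join_limbs(limbs, xlen)
```

Because AND, OR and XOR have no carries between bit positions, each limb can be processed independently.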

Reviewed By: arsenm, craig.topper

Differential Revision: https://reviews.llvm.org/D155639
2023-07-26 15:37:13 -07:00
Matthew Voss
380dbfd8ca Revert "Reapply [IR] Mark and/or constant expressions as undesirable"
This reverts commit 0cab8d20417c0e2ccc1ffc5505e080126f5de8e6.

Reverted due to an LTO crash. I've put a reduced test case here:
https://github.com/llvm/llvm-project/issues/64114
2023-07-26 12:54:07 -07:00
Luke Lau
33a93a41df [RISCV] Add SDNode patterns for vwsll.[vv,vx,vi]
This reuses the patterns introduced to help lower vnsr[a,l].vx in D155698.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D155936
2023-07-26 20:26:35 +01:00
Luke Lau
ce8f094da8 [RISCV] Add patterns for vnsrl.vx where shift amount is truncated
Similar to D155698 where the shift amount is extended, this patch extends the
ComplexPattern to handle the case where the shift amount has been truncated.
Truncations are custom lowered to truncate_vector_vl, and in cases like i64 ->
i16 they are truncated by one power of two at a time, so we need to unravel
nested layers of them.
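
The nesting mentioned above can be pictured with a small sketch that lists the intermediate widths visited when halving from a source width to a destination width. It assumes both widths are powers of two; the function names are hypothetical and do not correspond to the ComplexPattern in the patch.

```python
def truncation_widths(src_bits, dst_bits):
    """Widths visited when halving from src_bits down to dst_bits."""
    widths = []
    width = src_bits
    while width > dst_bits:
        width //= 2
        widths.append(width)
    return widths


def truncate_stepwise(value, src_bits, dst_bits):
    """Apply one truncation per halving step; the result matches a direct truncation."""
    for width in truncation_widths(src_bits, dst_bits):
        value &= (1 << width) - 1
    return value
```

An i64 to i16 truncation therefore appears as two nested layers (i64 to i32, then i32 to i16), and matching the shift amount means looking through both.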

The pattern can also be reused for Zvbb's vwsll.vx in an upcoming patch.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D155928
2023-07-26 20:26:32 +01:00
Luke Lau
7c652feb95 [RISCV] Add tests for vnsrl.vx where shift amount is truncated
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D155927
2023-07-26 20:26:27 +01:00
Kevin P. Neal
33e25cdd48 [FPEnv][X86] Correct strictfp tests.
Recommit only the tests that look good this time.

Correct X86 strictfp tests to follow the rules documented in the LangRef:
https://llvm.org/docs/LangRef.html#constrained-floating-point-intrinsics

Mostly these tests just needed the strictfp attribute on function
definitions. After D154991 the constrained intrinsics have the
strictfp attribute by default so they don't need it here, but other
functions do.

Test changes verified with D146845.
2023-07-26 13:35:58 -04:00
Craig Topper
e28307e93a [RISCV] Handle seteq/setne conditions for CZERO_NEZ/CZERO_EQZ during isel.
This removes selectSETCC and adds isel patterns for seteq/setne
conditions.

This removes the duplication of selectSETCC between lowering and
isel. This also gets some cases in xaluo.ll that we missed previously.

Reviewed By: wangpc

Differential Revision: https://reviews.llvm.org/D156250
2023-07-26 10:06:08 -07:00
Yonghong Song
6c412b6c6f [BPF] Add a few new insns under cpu=v4
In [1], a few new insns are proposed to expand the BPF ISA to
  . fix the limitation of existing insns (e.g., 16bit jmp offset)
  . add new insns which may improve code quality
    (sign_ext_ld, sign_ext_mov, st)
  . make the ISA feature complete (sdiv, smod)
  . provide a better user experience (bswap)

This patch implemented insn encoding for
  . sign-extended load
  . sign-extended mov
  . sdiv/smod
  . bswap insns
  . unconditional jump with 32bit offset

The new bswap insns are generated under cpu=v4 for __builtin_bswap.
For cpu=v3 or earlier, for __builtin_bswap, be or le insns are generated
which is not intuitive for the user.

To support 32-bit branch offset, a 32-bit ja (JMPL) insn is implemented.
For conditional branch which is beyond 16-bit offset, llvm will do
some transformation 'cond_jmp' -> 'cond_jmp + jmpl' to simulate 32bit
conditional jmp. See BPFMIPeephole.cpp for details. The algorithm is
heuristic based. I have tested bpf selftest pyperf600 with unroll count
600, which can indeed generate a 32-bit jump insn, e.g.,
        13:       06 00 00 00 9b cd 00 00 gotol +0xcd9b <LBB0_6619>
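
A minimal sketch of the range check behind this: the offset in the example above, +0xcd9b, does not fit in a signed 16-bit field, which is why a 32-bit jump is needed. The helper names are hypothetical and the actual heuristic in BPFMIPeephole.cpp is not modeled here.

```python
def fits_signed(offset, bits):
    """True when offset is representable as a signed integer of the given width."""
    lo = -(1 << (bits - 1))
    hi = (1 << (bits - 1)) - 1
    return lo <= offset <= hi


def needs_long_jump(offset):
    """True when an offset overflows the 16-bit field but fits in 32 bits."""
    return not fits_signed(offset, 16) and fits_signed(offset, 32)
```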

Eduard is working on adding the 'st' insn to cpu=v4.

A list of llc flags:
  disable-ldsx, disable-movsx, disable-bswap,
  disable-sdiv-smod, disable-gotol
can be used to disable a particular insn for cpu v4.
For example, user can do:
  llc -march=bpf -mcpu=v4 -disable-movsx t.ll
to enable cpu v4 without movsx insns.

References:
  [1] https://lore.kernel.org/bpf/4bfe98be-5333-1c7e-2f6d-42486c8ec039@meta.com/

Differential Revision: https://reviews.llvm.org/D144829
2023-07-26 08:37:30 -07:00
Simon Pilgrim
4aa06ba39e [X86] Cleanup vector-trunc-* test prefixes
Add missing SSE2-SSSE3 common prefix to vector-trunc-ssat.ll
2023-07-26 16:27:56 +01:00
Alexander Kornienko
0def4e6b0f Revert "[AArch64] Merge LDRSWpre-LD[U]RSW pair into LDPSWpre"
This reverts commit b0093e13fcfdd4eea5bbd7ae57d3d1b82f4135c3 due to a miscompile
under MSan. See https://reviews.llvm.org/D152407#4533478 for more details.

Reviewed By: asmok-g

Differential Revision: https://reviews.llvm.org/D156328
2023-07-26 16:22:24 +02:00
Kevin P. Neal
3a5f8c3af8 Revert "[FPEnv][X86] Correct strictfp tests."
This reverts commit d6857060a3b7428d1e9319d85fcef44e4b6b8db7.

I'm getting build bot failures due to i128-fpconv-win64-strict.ll.
2023-07-26 09:18:32 -04:00
Kevin P. Neal
f57fb82e0f [FPEnv][AArch64] Correct strictfp tests.
Correct AArch64 strictfp tests to follow the rules documented in the LangRef:
https://llvm.org/docs/LangRef.html#constrained-floating-point-intrinsics

Mostly these tests just needed the strictfp attribute on function
definitions.  I've also removed the strictfp attribute from uses
of the constrained intrinsics because it comes by default since
D154991, but I only did this in tests I was changing anyway.

I have removed attributes added to declare lines of intrinsics. The
attributes of intrinsics cannot be changed in a test so I eliminated
attempts to do so.

Test changes verified with D146845.
2023-07-26 09:14:25 -04:00
Kevin P. Neal
7e0e8b7ace [FPEnv][PowerPC] Correct strictfp tests.
Correct PowerPC strictfp tests to follow the rules documented in the LangRef:
https://llvm.org/docs/LangRef.html#constrained-floating-point-intrinsics

Mostly these tests just needed the strictfp attribute on function
definitions.  I've also removed the strictfp attribute from uses
of the constrained intrinsics because it comes by default since
D154991, but I only did this in tests I was changing anyway.

I have removed attributes added to declare lines of intrinsics. The
attributes of intrinsics cannot be changed in a test so I eliminated
attempts to do so.

Test changes verified with D146845.
2023-07-26 09:12:29 -04:00
Kevin P. Neal
5ad2760ad9 [FPEnv][RISC-V] Correct a strictfp test.
Correct a RISC-V strictfp test to follow the rules documented in the LangRef:
https://llvm.org/docs/LangRef.html#constrained-floating-point-intrinsics

All function calls in a strictfp function require the strictfp attribute.

Test changes verified with D146845.
2023-07-26 09:10:19 -04:00
Kevin P. Neal
58ad5699e7 [FPEnv][SystemZ] Correct strictfp tests.
Correct a SystemZ strictfp test to follow the rules documented in the LangRef:
https://llvm.org/docs/LangRef.html#constrained-floating-point-intrinsics

This test, like many others, just needed the function definition corrected.

Test changes verified with D146845.
2023-07-26 09:08:46 -04:00
Kevin P. Neal
d6857060a3 [FPEnv][X86] Correct strictfp tests.
Correct X86 strictfp tests to follow the rules documented in the LangRef:
https://llvm.org/docs/LangRef.html#constrained-floating-point-intrinsics

Mostly these tests just needed the strictfp attribute on function
definitions. After D154991 the constrained intrinsics have the
strictfp attribute by default so they don't need it here, but other
functions do.

Test changes verified with D146845.
2023-07-26 09:07:03 -04:00
pvanhout
a8aabba587 [AMDGPU] Fix PromoteAlloca Subvector Stores for Single Elements
The previous condition was incorrect in some cases, like storing <2 x i32>
into a double. If IndexVal was >0, we ended up never storing anything.

Reviewed By: #amdgpu, arsenm

Differential Revision: https://reviews.llvm.org/D156308
2023-07-26 13:21:21 +02:00
pvanhout
6a767fbc36 [AMDGPU] Precommit tests for D156308
Also includes another, unrelated testcase that is just a sanity check.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D156309
2023-07-26 13:21:20 +02:00
Jolanta Jensen
c67e443895 [AArch64][NFC] Expand coverage of ReplaceWithVeclib testing using SLEEF vector library
This patch expands testing coverage for ReplaceWithVeclib pass
when SLEEF vector library is used. It adds testing for all LLVM
intrinsics which correspond to math functions from libm.

llrint, llround and lrint are not included because the IR verifier
currently does not allow vector types to be used with them.

Differential Revision: https://reviews.llvm.org/D155623
2023-07-26 08:57:56 +00:00
Jim Lin
2f726c22ce [RISCV] Merge rv32/rv64 vector slideup and slidedown intrinsic tests that have the same content. NFC.
2023-07-26 13:13:55 +08:00
esmeyi
e83b8a5e71 [XCOFF] Enable available_externally linkage for functions.
Summary: D80642 added support for emitting AvailableExternally linkage on AIX. However, the assertion "Trying to get csect representation of this symbol but none was set." fired when a function was declared as available_externally, because no csect was generated for the function. This patch fixes it.

Reviewed By: hubert.reinterpretcast, shchenz

Differential Revision: https://reviews.llvm.org/D156213

Signed-off-by: Esme Yi <esme.yi@ibm.com>
2023-07-25 22:47:11 -04:00
Weining Lu
c56514f21b Reland "[LoongArch] Support -march=native and -mtune="
As described in [1][2], `-mtune=` is used to select the type of target
microarchitecture and defaults to the value of `-march`. The set of
possible values should be a superset of `-march` values. Currently
possible values of `-march=` and `-mtune=` are `native`, `loongarch64`
and `la464`.

D136146 has supported `-march={loongarch64,la464}` and this patch adds
support for `-march=native` and `-mtune=`.

A new ProcessorModel called `loongarch64` is defined in LoongArch.td
to support `-mtune=loongarch64`.

`llvm::sys::getHostCPUName()` returns `generic` on unknown or future
LoongArch CPUs, e.g. the not yet added `la664`, leading to
`llvm::LoongArch::isValidArchName()` failing to parse the arch name.
In this case, use `loongarch64` as the default arch name for 64-bit
CPUs.
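
The defaulting and fallback rules described above can be sketched roughly as follows. The function names are hypothetical, the set of known names is taken from the values listed in this message, and resolving `-mtune=native` the same way as `-march=native` is an assumption; none of this mirrors the driver code.

```python
# Arch names accepted besides "native", per the commit message.
KNOWN_ARCHS = {"loongarch64", "la464"}


def resolve_march(march, host_cpu):
    """Resolve "native" to the host CPU, falling back to loongarch64 if unrecognized."""
    if march == "native":
        return host_cpu if host_cpu in KNOWN_ARCHS else "loongarch64"
    return march


def resolve_mtune(mtune, march, host_cpu):
    """-mtune= defaults to the resolved -march= value when it is not given."""
    if mtune is None:
        return resolve_march(march, host_cpu)
    return resolve_march(mtune, host_cpu)
```

Here `host_cpu` stands in for whatever the host detection reports, for example `generic` on an unknown or future CPU.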

And these two preprocessor macros are defined:
- __loongarch_arch
- __loongarch_tune

[1]: https://github.com/loongson/LoongArch-Documentation/blob/2023.04.20/docs/LoongArch-toolchain-conventions-EN.adoc
[2]: https://github.com/loongson/la-softdev-convention/blob/v0.1/la-softdev-convention.adoc

Reviewed By: xen0n, wangleiat

Differential Revision: https://reviews.llvm.org/D155824
2023-07-26 10:26:38 +08:00