50419 Commits

Author SHA1 Message Date
David Green
8a701024f3 [ARM] Lower i1 concat via MVETRUNC
The MVETRUNC operation can perform the same truncate of two vectors, without
requiring lane inserts/extracts from every vector lane. This moves the concat
i1 lowering to use it for v8i1 and v16i1 result types, trading a bit of extra
stack space for fewer instructions.
2023-10-18 19:40:11 +01:00
Stanislav Mekhanoshin
84f398af74
[AMDGPU] Add missing test checks. NFC. (#69484) 2023-10-18 11:26:39 -07:00
Ilya Leoshkevich
8e810dc7d9
[SystemZ] Support builtin_{frame,return}_address() with non-zero argument (#69405)
When the code is built with -mbackchain, it is possible to retrieve the
caller's frame and return addresses. GCC can already do this; add this
support to Clang as well. Use RISCVTargetLowering and GCC's
s390_return_addr_rtx() as inspiration. Add tests based on what GCC is
emitting.
2023-10-18 19:05:31 +02:00
Stanislav Mekhanoshin
47ed921985
[AMDGPU] Add legality check when folding short 64-bit literals (#69391)
We can only fold it if it fits into 32 bits. I believe it did not
trigger yet because we do not select 64-bit literals generally.
2023-10-18 09:22:23 -07:00
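
The legality condition described in 47ed921985 can be pictured with a minimal sketch. This is not the AMDGPU implementation: the helper names are hypothetical, the literal is treated purely as an integer, and accepting either a signed or an unsigned 32-bit fit is an assumption made for illustration.

```python
def fits_signed_32(value):
    """True if value is representable as a signed 32-bit integer."""
    return -(1 << 31) <= value < (1 << 31)


def fits_unsigned_32(value):
    """True if value is representable as an unsigned 32-bit integer."""
    return 0 <= value < (1 << 32)


def can_fold_short_literal(value):
    # Hypothetical policy: fold only when the 64-bit literal fits in 32 bits
    # under either interpretation. Real operand-type rules are not modelled.
    return fits_signed_32(value) or fits_unsigned_32(value)


print(can_fold_short_literal(0x1234))        # small literal
print(can_fold_short_literal(1 << 40))       # needs more than 32 bits
```
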
Sirish Pande
28e4f97320
[AMDGPU] Save/Restore SCC bit across waterfall loop. (#68363)
The waterfall loop overwrites the SCC bit of the status register. Make sure
the SCC bit is saved and restored across it.
We need to save/restore only in cases where SCC is live across the waterfall
loop.

Co-authored-by: Sirish Pande <sirish.pande@amd.com>
2023-10-18 08:43:29 -05:00
David Green
c060757bcc [ARM] Correct v2i1 concat extract types.
When concatenating two v2i1 into a v4i1, we cannot extract each i64 element as an i32.
This casts to a v4i32 instead and extracts the correct vector lanes.
2023-10-18 13:40:38 +01:00
pvanhout
868abf0961 Revert "[AMDGPU] Remove Code Object V3 (#67118)"
This reverts commit 544d91280c26fd5f7acd70eac4d667863562f4cc.
2023-10-18 12:55:36 +02:00
Jay Foad
104db26004
[AMDGPU] Fix image intrinsic optimizer on loads from different resources (#69355)
The image intrinsic optimizer pass was neglecting to check any arguments
of the load intrinsic after the VAddr arguments. For example multiple
loads from different resources should not have been combined but were,
because the pass was not checking the resource argument.
2023-10-18 11:08:01 +01:00
Paul Walker
675231eb09
[SVE ACLE] Allow default zero initialisation for svcount_t. (#69321)
This matches the behaviour of the other SVE ACLE types.
2023-10-18 10:40:07 +01:00
Amara Emerson
e93bddb287 [AArch64][GlobalISel] Precommit indexed sextload/zextload tests. 2023-10-18 00:23:20 -07:00
Shao-Ce SUN
f48dab5237
Add RV64 constraint to SRLIW (#69416)
Fixes #69408
2023-10-18 15:01:17 +08:00
Noah Goldstein
112e49b381 [DAGCombiner] Transform (icmp eq/ne (and X,C0),(shift X,C1)) to use rotate or to get better constants.
If `C0` is a mask and `C1` shifts out all the masked bits (to
essentially compare two subsets of `X`), we can arbitrarily re-order
shift as `srl` or `shl`.

If `C1` (shift amount) is a power of 2, we can replace the and+shift
with a rotate.

Otherwise, based on target preference we can arbitrarily swap `srl`
and `shl` in/out to get better constants.

On x86 we can use this re-ordering to:
    1) get better `and` constants for `C0` (zero extended moves or
       avoid imm64).
    2) convert `srl` to `shl` if `shl` will be implementable with `lea`
       or `add` (both of which can be preferable).

Proofs: https://alive2.llvm.org/ce/z/qzGM_w

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D152116
2023-10-18 01:16:55 -05:00
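
The rotate case from 112e49b381 can be checked on one concrete instance. The sketch below assumes a 32-bit `X`, `C0 = 0xFFFF` and `C1 = 16`; these constants and the function names are chosen for illustration and do not come from the patch. Comparing the low half of `X` with its high half gives the same answer as asking whether rotating `X` by 16 leaves it unchanged.

```python
MASK32 = 0xFFFFFFFF


def rotl32(x, n):
    """Rotate a 32-bit value left by n bits."""
    n %= 32
    return ((x << n) | (x >> (32 - n))) & MASK32


def and_shift_eq(x):
    # Original form: (icmp eq (and X, 0xFFFF), (srl X, 16))
    return (x & 0xFFFF) == (x >> 16)


def rotate_eq(x):
    # Rewritten form: (icmp eq (rotl X, 16), X)
    return rotl32(x, 16) == x


samples = [0, 1, 0x00010001, 0x12341234, 0x12345678, 0xFFFF0000, MASK32]
# Both forms agree on every sample value.
assert all(and_shift_eq(x) == rotate_eq(x) for x in samples)
```

The Alive2 link in the commit message remains the authoritative proof; this only spot-checks a handful of values.
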
Noah Goldstein
0c2d28a448 [X86] Add tests for transform (icmp eq/ne (and X, C0), (shift X, C1)); NFC
Differential Revision: https://reviews.llvm.org/D152115
2023-10-18 01:16:55 -05:00
Pierre van Houtryve
c464fea779
[DAG] Constant fold FMAD (#69324)
This has very little effect on codegen in practice, but is nice to
have, I think.

See #68315
2023-10-18 07:46:24 +02:00
Kai Luo
b42738805a [PowerPC] Auto gen test checks for #69299. NFC. 2023-10-18 02:21:22 +00:00
Nitin John Raj
ae3ba725b7
[RISCV][GlobalISel] Select G_FRAME_INDEX (#68254)
This patch is a bandage to get G_FRAME_INDEX working. We could import
the SelectionDAG patterns for the ComplexPattern FrameAddrRegImm, and
perhaps we will do that in the future. For now we just select it as an
addition with 0.
2023-10-17 17:56:42 -07:00
Mircea Trofin
ab91e05e48 [mlgo] Fix tests post 760e7d0 2023-10-17 12:19:54 -07:00
Artem Belevich
b33723710f
[NVPTX] Fixed a few more corner cases for v4i8 lowering. (#69263)
Fixes https://github.com/llvm/llvm-project/issues/69124
2023-10-17 11:06:11 -07:00
Stanislav Mekhanoshin
a22a1fe151
[AMDGPU] support 64-bit immediates in SIInstrInfo::FoldImmediate (#69260)
This is a part of https://github.com/llvm/llvm-project/issues/67781.
Until we select more 64-bit move immediates the impact is minimal.
2023-10-17 10:53:22 -07:00
David Green
4266815f4d
[AArch64] Convert negative constant aarch64_neon_sshl to VASHR (#68918)
In replacing shifts by splat with constant shifts, we can handle
negative shifts by flipping the sign and using a VASHR or VLSHR.
2023-10-17 18:41:23 +01:00
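
The sign flip described in 4266815f4d can be modelled per lane. This is a minimal sketch assuming 32-bit signed lanes and shift amounts smaller than the lane width; `sshl_lane` and `ashr_lane` are hypothetical helpers, not LLVM or NEON APIs. A signed shift-left by a negative constant behaves like an arithmetic shift-right by the negated amount.

```python
BITS = 32


def to_signed(x):
    """Wrap an integer into the signed range of a BITS-wide lane."""
    x &= (1 << BITS) - 1
    return x - (1 << BITS) if x >> (BITS - 1) else x


def sshl_lane(value, shift):
    """Model one signed lane: positive shifts go left, negative go right."""
    if shift >= 0:
        return to_signed(value << shift)
    return value >> -shift  # Python's >> on ints is an arithmetic shift


def ashr_lane(value, amount):
    """Arithmetic shift right by a non-negative constant."""
    return value >> amount


lanes = [-64, -1, 0, 7, 1024]
# Shifting left by -3 matches an arithmetic shift right by 3 in every lane.
assert [sshl_lane(v, -3) for v in lanes] == [ashr_lane(v, 3) for v in lanes]
```
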
David Green
658ed58de6 [AArch64] Add additional tests for fptosi/fptoui. NFC 2023-10-17 18:39:37 +01:00
akirchhoff-modular
4480e650b3
[YAMLParser] Improve plain scalar spec compliance (#68946)
The `YAMLParser.h` header file claims support for YAML 1.2 with a few
deviations, but our plain scalar parsing failed to parse some valid YAML
according to the spec. This change puts us more in compliance with the
YAML spec, now letting us parse plain scalars containing additional
special characters in cases where they are not ambiguous.
2023-10-17 11:28:14 -06:00
Guozhi Wei
760e7d00d1 [X86, Peephole] Enable FoldImmediate for X86
Enable FoldImmediate for X86 by implementing X86InstrInfo::FoldImmediate.

Also enhanced peephole by deleting identical instructions after FoldImmediate.

Differential Revision: https://reviews.llvm.org/D151848
2023-10-17 16:22:42 +00:00
Vladislav Dzhidzhoev
abd0d5d262 Reland: [AArch64][GlobalISel] Adopt dup(load) -> LD1R patterns from SelectionDAG
This relands fb8f59156f0f208f6192ed808fc223eda6c0e7ec and makes the
isAArch64FrameOffsetLegal function recognize LD1R instructions.

Original PR: https://github.com/llvm/llvm-project/pull/66914
PR of the fix: https://github.com/llvm/llvm-project/pull/69003
2023-10-17 17:40:05 +02:00
Phoebe Wang
3d6e4160d5
[X86] Enable bfloat type support in inline assembly constraints (#68469)
Similar to FP16 but we don't have native scalar instruction support, so
limit it to vector types only.

Fixes #68149
2023-10-17 22:56:25 +08:00
Weining Lu
b2773d170c [LoongArch] Precommit a test for atomic cmpxchg optimization 2023-10-17 22:29:51 +08:00
Simon Pilgrim
be9bc54218 [X86] vselect.ll - add vXi8 select-by-constant tests with repeated/broadcastable shuffle mask 2023-10-17 11:34:08 +01:00
Momchil Velikov
bea3684944
[AArch64] Allow only LSL to be folded into addressing mode (#69235)
There was an error in decoding shift type, which permitted shift types
other than LSL to be (incorrectly) folded into the addressing mode of a
load/store instruction.
2023-10-17 11:30:14 +01:00
Zhaoxuan Jiang
041a786c78
[AArch64] Fix pairing different types of registers when computing CSRs. (#66642)
If a function has an odd number of registers of the same type to save, and the
calling convention also requires an odd number of CSRs of that type, an FP
register would be accidentally marked as saved when producePairRegisters
returns true.

This patch also fixes the AArch64LowerHomogeneousPrologEpilog pass not
handling AArch64::NoRegister; actually this pass must be fixed along
with the register pairing so I can write a test for it.
2023-10-16 23:34:04 -07:00
Shao-Ce SUN
5a6ef95a1c
[RISCV][GISel] Add legalizer for G_UMAX, G_UMIN, G_SMAX, G_SMIN (#69150)
Similar to #67577, Lower G_UMAX, G_UMIN, G_SMAX, G_SMIN.
2023-10-17 10:36:24 +08:00
Jianjian Guan
b0eba8e209
[RISCV] Support STRICT_FP_ROUND and STRICT_FP_EXTEND when only have Zvfhmin (#68559)
This patch supports STRICT_FP_ROUND and STRICT_FP_EXTEND when we only
have Zvfhmin but no Zvfh.
2023-10-17 10:10:19 +08:00
Michael Maitland
c319c74146 [RISCV] Improve performCONCAT_VECTORCombine stride matching
If the load ptrs can be decomposed into a common (Base + Index) with a
common constant stride, then return the constant stride.
2023-10-16 16:45:26 -07:00
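
The stride matching described in c319c74146 can be sketched as follows. The pointer model is an assumption for illustration: each load pointer is a hypothetical `(base, index, constant_offset)` tuple, which is not how the RISC-V backend represents addresses. The function returns the stride only when every pointer shares the same base and index and the constant offsets advance by one fixed amount.

```python
def matching_constant_stride(ptrs):
    """Return the common constant stride of the pointers, or None."""
    if len(ptrs) < 2:
        return None
    base, index, _ = ptrs[0]
    # Every pointer must decompose into the same (Base + Index) part.
    if any((b, i) != (base, index) for b, i, _ in ptrs):
        return None
    offsets = [off for _, _, off in ptrs]
    stride = offsets[1] - offsets[0]
    # Consecutive constant offsets must all differ by the same amount.
    if any(cur - prev != stride for prev, cur in zip(offsets, offsets[1:])):
        return None
    return stride


loads = [("p", "i", 0), ("p", "i", 16), ("p", "i", 32)]
print(matching_constant_stride(loads))
```
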
Michael Maitland
30ca258614 [RISCV] Pre-commit concat-vectors-constant-stride.ll
This patch commits tests that can be optimized by improving
performCONCAT_VECTORCombine to do a better job at decomposing the base
pointer and recognizing a constant offset.
2023-10-16 16:45:16 -07:00
Pierre van Houtryve
cc3d2533cc
[AMDGPU] Add i1 mul patterns (#67291)
i1 muls can sometimes happen after SCEV. They resulted in ISel failures
because we were missing the patterns for them.

Solves SWDEV-423354
2023-10-16 16:18:27 +02:00
Pierre van Houtryve
4d6fc88946
[AMDGPU] Add patterns for V_CMP_O/U (#69157)
Fixes SWDEV-427162
2023-10-16 13:07:56 +02:00
Nikita Popov
a72d88fb4f Revert "Reapply [Verifier] Sanity check alloca size against DILocalVariable fragment size"
This reverts commit 8840da2db237cd714d975c199d5992945d2b71e9.

This results in verifier failures during LTO, see #68929.
2023-10-16 12:17:24 +02:00
chuongg3
dad563e3c2
[AArch64][GlobalISel] Add legalization for G_VECREDUCE_MUL (#68398) 2023-10-16 11:02:03 +01:00
Phoebe Wang
0ddca87b79
[X86][FP16] Do not combine to ADDSUB if target doesn't support FP16 (#69109)
Fix a crash when building code with `-mattr=f16c,fma` or `-mattr=avx512vl`.
2023-10-16 16:27:15 +08:00
Pierre van Houtryve
544d91280c
[AMDGPU] Remove Code Object V3 (#67118)
V3 has been deprecated for a while as well, so it can safely be removed
like V2 was removed.

- [Clang] Set minimum code object version to 4
- [lld] Fix tests using code object v3
- Remove code object V3 from the AMDGPU backend, and delete or port v3
tests to v4.
- Update docs to make it clear V3 can no longer be emitted.
2023-10-16 08:21:48 +02:00
Freddy Ye
819ac45d1c
[X86] Add USER_MSR instructions. (#68944)
For more details about this instruction, please refer to the latest ISE
document:
https://www.intel.com/content/www/us/en/develop/download/intel-architecture-instruction-set-extensions-programming-reference.html
2023-10-16 10:12:53 +08:00
Craig Topper
0ae4622126 [RISCV][GISel] Move variadic-call.ll from call-lowering directory to irtranslator. NFC
Keeps it consistent with the other call tests.
2023-10-15 18:16:38 -07:00
Min-Yih Hsu
fd84b1a99d [M68k] Add new calling convention M68k_RTD
`M68k_RTD` is really similar to X86's stdcall, in which callee pops the
arguments from stack. In LLVM IR it can be written as `m68k_rtdcc`.
This patch also improves how ExpandPseudo Pass handles popping stack at
function returns in the absence of the RTD instruction.

Differential Revision: https://reviews.llvm.org/D149864
2023-10-15 16:12:31 -07:00
Amara Emerson
1950507212 Revert "Re-apply '[AArch64] Enable "sink-and-fold" in MachineSink by default (#67432)'"
This reverts commit dbb9faedec5e28ab3f584f5e14d31e475ac268ac.

This seems to cause miscompiles on CTMark/sqlite3 and others with GISel.
2023-10-15 14:16:37 -07:00
Markus Böck
0ad92c0cbb
[StatepointLowering] Take return attributes of gc.result into account (#68439)
The current lowering of statepoints does not take into account return
attributes present on the `gc.result` leading to different code being
generated than if one were to not use statepoints. These return
attributes can affect the ABI which is why it is important that they are
applied in the lowering.
2023-10-14 18:38:18 +02:00
David Green
5e1c2bf3e6 [AArch64][GlobalISel] Expand coverage of FMA.
This moves the legalization of G_FMA to the action builder that can handle more
types. The existing arm64-vfloatintrinsics.ll has been removed as its tests are
covered in other test files.
2023-10-14 13:24:28 +01:00
David Green
a502dddfd0 [AArch64] Additional GISel test for FMA. NFC 2023-10-14 12:34:54 +01:00
LiqinWeng
64e7207ea5
[Test] Pre-submit tests for #68972 (#69040) 2023-10-14 12:18:43 +08:00
Craig Topper
3750558ee1
[RISCV][GISel] Legalize G_SMULO/G_UMULO (#67635)
Update `LegalizerHelper::widenScalarMulo` to not create a mulo if we aren't going to use the overflow flag. This prevents needing to legalize the widened operation. This generates better code when we need to make a libcall for multiply.
2023-10-13 20:34:45 -07:00
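
The idea behind 3750558ee1 can be illustrated with a minimal sketch of an unsigned multiply-with-overflow computed in a wider type. It assumes the wide type has at least twice the bits of the narrow one, so the wide product itself cannot overflow and a plain multiply suffices. The function name and the default 8-bit width are illustrative and are not taken from `LegalizerHelper`.

```python
def umulo_widened(a, b, narrow_bits=8):
    """Unsigned multiply with overflow, computed in a type twice as wide."""
    mask = (1 << narrow_bits) - 1
    # Plain multiply in the wide type: the product of two narrow values
    # always fits in 2 * narrow_bits, so no wide overflow flag is needed.
    wide = (a & mask) * (b & mask)
    result = wide & mask
    # The narrow operation overflowed if truncation lost any bits.
    overflow = wide != result
    return result, overflow


print(umulo_widened(15, 17))
print(umulo_widened(16, 16))
```
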
Amara Emerson
25d93f3f68 NFC: Precommit GISel checks for arm64-indexed-memory.ll 2023-10-13 16:51:39 -07:00
Amara Emerson
2f80dfc079 [GlobalISel][NFC] Add distinct CHECK/SDAG/GISEL run lines to test. 2023-10-13 16:21:52 -07:00