52796 Commits

Author SHA1 Message Date
Vladislav Dzhidzhoev
abd0d5d262 Reland: [AArch64][GlobalISel] Adopt dup(load) -> LD1R patterns from SelectionDAG
This relands fb8f59156f0f208f6192ed808fc223eda6c0e7ec and makes the
isAArch64FrameOffsetLegal function recognize LD1R instructions.

Original PR: https://github.com/llvm/llvm-project/pull/66914
PR of the fix: https://github.com/llvm/llvm-project/pull/69003
2023-10-17 17:40:05 +02:00
Phoebe Wang
3d6e4160d5
[X86] Enable bfloat type support in inline assembly constraints (#68469)
Similar to FP16 but we don't have native scalar instruction support, so
limit it to vector types only.

Fixes #68149
2023-10-17 22:56:25 +08:00
Weining Lu
b2773d170c [LoongArch] Precommit a test for atomic cmpxchg optimization 2023-10-17 22:29:51 +08:00
Simon Pilgrim
be9bc54218 [X86] vselect.ll - add vXi8 select-by-constant tests with repeated/broadcastable shuffle mask 2023-10-17 11:34:08 +01:00
Momchil Velikov
bea3684944
[AArch64] Allow only LSL to be folded into addressing mode (#69235)
There was an error in decoding shift type, which permitted shift types
other than LSL to be (incorrectly) folded into the addressing mode of a
load/store instruction.
2023-10-17 11:30:14 +01:00
Zhaoxuan Jiang
041a786c78
[AArch64] Fix pairing different types of registers when computing CSRs. (#66642)
If a function has an odd number of registers of the same type to save, and
the calling convention also requires an odd number of CSRs of that type, an
FP register would be accidentally marked as saved when producePairRegisters
returns true.

This patch also fixes the AArch64LowerHomogeneousPrologEpilog pass not
handling AArch64::NoRegister; this pass must be fixed along with the
register pairing so that I can write a test for it.
2023-10-16 23:34:04 -07:00
Shao-Ce SUN
5a6ef95a1c
[RISCV][GISel] Add legalizer for G_UMAX, G_UMIN, G_SMAX, G_SMIN (#69150)
Similar to #67577, lower G_UMAX, G_UMIN, G_SMAX, G_SMIN.
2023-10-17 10:36:24 +08:00
Jianjian Guan
b0eba8e209
[RISCV] Support STRICT_FP_ROUND and STRICT_FP_EXTEND when only have Zvfhmin (#68559)
This patch supports STRICT_FP_ROUND and STRICT_FP_EXTEND when we only
have Zvfhmin but no Zvfh.
2023-10-17 10:10:19 +08:00
Michael Maitland
c319c74146 [RISCV] Improve performCONCAT_VECTORCombine stride matching
If the load ptrs can be decomposed into a common (Base + Index) with a
common constant stride, then return the constant stride.
2023-10-16 16:45:26 -07:00
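The stride matching described in this commit can be pictured with a toy model. The sketch below is not the SelectionDAG code: it assumes each load pointer has already been decomposed into a hypothetical `(base, offset)` pair with an integer byte offset, and it returns the common stride only when every pointer shares one base and consecutive offsets differ by the same constant.

```python
from typing import Hashable, Optional, Sequence, Tuple

# Hypothetical representation: (base, byte offset from that base).
Ptr = Tuple[Hashable, int]


def constant_stride(ptrs: Sequence[Ptr]) -> Optional[int]:
    """Return the constant stride between consecutive pointers, or None."""
    if len(ptrs) < 2:
        return None
    # All pointers must decompose onto one common base.
    if len({base for base, _ in ptrs}) != 1:
        return None
    offsets = [offset for _, offset in ptrs]
    stride = offsets[1] - offsets[0]
    # Every consecutive pair must be separated by the same stride.
    for prev, cur in zip(offsets, offsets[1:]):
        if cur - prev != stride:
            return None
    return stride
```

For example, `constant_stride([("p", 0), ("p", 16), ("p", 32)])` yields a stride of 16, while mixed bases or uneven gaps yield `None`.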
Michael Maitland
30ca258614 [RISCV] Pre-commit concat-vectors-constant-stride.ll
This patch commits tests that can be optimized by improving
performCONCAT_VECTORCombine to do a better job at decomposing the base
pointer and recognizing a constant offset.
2023-10-16 16:45:16 -07:00
Pierre van Houtryve
cc3d2533cc
[AMDGPU] Add i1 mul patterns (#67291)
i1 muls can sometimes happen after SCEV. They resulted in ISel failures
because we were missing the patterns for them.

Solves SWDEV-423354
2023-10-16 16:18:27 +02:00
Pierre van Houtryve
4d6fc88946
[AMDGPU] Add patterns for V_CMP_O/U (#69157)
Fixes SWDEV-427162
2023-10-16 13:07:56 +02:00
Nikita Popov
a72d88fb4f Revert "Reapply [Verifier] Sanity check alloca size against DILocalVariable fragment size"
This reverts commit 8840da2db237cd714d975c199d5992945d2b71e9.

This results in verifier failures during LTO, see #68929.
2023-10-16 12:17:24 +02:00
chuongg3
dad563e3c2
[AArch64][GlobalISel] Add legalization for G_VECREDUCE_MUL (#68398) 2023-10-16 11:02:03 +01:00
Phoebe Wang
0ddca87b79
[X86][FP16] Do not combine to ADDSUB if target doesn't support FP16 (#69109)
Fix a crash when building code with `-mattr=f16c,fma` or `-mattr=avx512vl`.
2023-10-16 16:27:15 +08:00
Pierre van Houtryve
544d91280c
[AMDGPU] Remove Code Object V3 (#67118)
V3 has been deprecated for a while as well, so it can safely be removed
like V2 was removed.

- [Clang] Set minimum code object version to 4
- [lld] Fix tests using code object v3
- Remove code object V3 from the AMDGPU backend, and delete or port v3
tests to v4.
- Update docs to make it clear V3 can no longer be emitted.
2023-10-16 08:21:48 +02:00
Freddy Ye
819ac45d1c
[X86] Add USER_MSR instructions. (#68944)
For more details about this instruction, please refer to the latest ISE
document:
https://www.intel.com/content/www/us/en/develop/download/intel-architecture-instruction-set-extensions-programming-reference.html
2023-10-16 10:12:53 +08:00
Craig Topper
0ae4622126 [RISCV][GISel] Move variadic-call.ll from call-lowering directory to irtranslator. NFC
Keeps it consistent with the other call tests.
2023-10-15 18:16:38 -07:00
Min-Yih Hsu
fd84b1a99d [M68k] Add new calling convention M68k_RTD
`M68k_RTD` is very similar to X86's stdcall, in which the callee pops the
arguments from the stack. In LLVM IR it can be written as `m68k_rtdcc`.
This patch also improves how the ExpandPseudo pass handles popping the stack at
function returns in the absence of the RTD instruction.

Differential Revision: https://reviews.llvm.org/D149864
2023-10-15 16:12:31 -07:00
Amara Emerson
1950507212 Revert "Re-apply '[AArch64] Enable "sink-and-fold" in MachineSink by default (#67432)'"
This reverts commit dbb9faedec5e28ab3f584f5e14d31e475ac268ac.

This seems to cause miscompiles on CTMark/sqlite3 and others with GISel.
2023-10-15 14:16:37 -07:00
Markus Böck
0ad92c0cbb
[StatepointLowering] Take return attributes of gc.result into account (#68439)
The current lowering of statepoints does not take into account return
attributes present on the `gc.result` leading to different code being
generated than if one were to not use statepoints. These return
attributes can affect the ABI which is why it is important that they are
applied in the lowering.
2023-10-14 18:38:18 +02:00
David Green
5e1c2bf3e6 [AArch64][GlobalISel] Expand coverage of FMA.
This moves the legalization of G_FMA to the action builder that can handle more
types. The existing arm64-vfloatintrinsics.ll has been removed as its tests are
covered in other test files.
2023-10-14 13:24:28 +01:00
David Green
a502dddfd0 [AArch64] Additional GISel test for FMA. NFC 2023-10-14 12:34:54 +01:00
LiqinWeng
64e7207ea5
[Test] Pre-submit tests for #68972 (#69040) 2023-10-14 12:18:43 +08:00
Craig Topper
3750558ee1
[RISCV][GISel] Legalize G_SMULO/G_UMULO (#67635)
Update `LegalizerHelper::widenScalarMulo` to not create a mulo if we aren't going to use the overflow flag. This prevents needing to legalize the widened operation. This generates better code when we need to make a libcall for multiply.
2023-10-13 20:34:45 -07:00
Amara Emerson
25d93f3f68 NFC: Precommit GISel checks for arm64-indexed-memory.ll 2023-10-13 16:51:39 -07:00
Amara Emerson
2f80dfc079 [GlobalISel][NFC] Add distinct CHECK/SDAG/GISEL run lines to test. 2023-10-13 16:21:52 -07:00
Yingwei Zheng
53c81a8c16
[RISCV][SDAG] Fix constant narrowing when narrowing loads (#69015)
When narrowing logic ops (OR/XOR) with a constant rhs, `DAGCombiner` will fix up the constant rhs node.
This is incorrect when the lhs is also a constant. For example, we will incorrectly replace `xor OpaqueConstant:i64<8191>, Constant:i64<-1>` with `xor (and OpaqueConstant:i64<8191>, Constant:i64<65535>), Constant:i64<-1>`.

Fixes #68855.
2023-10-14 06:38:17 +08:00
Momchil Velikov
dbb9faedec Re-apply '[AArch64] Enable "sink-and-fold" in MachineSink by default (#67432)'
This re-applies commit a9d0ab2ee572f179f80483f3ebbbcdd03c3b4481, which
was reverted by 8abb2ace888bdd04a1bdb4ac2f2fc25d57a5760a.

The issue was fixed by 7510f32f906ab4e583542eae2611b020f88629af
2023-10-13 12:14:22 +01:00
Maurice Heumann
187e02fa2d
[CodeGenPrepare] Check types when unmerging GEPs across indirect branches (#68587)
The optimization in CodeGenPrepare, where GEPs are unmerged across
indirect branches, must respect the types of both GEPs and their sizes
when adjusting the indices.

The sample here shows the bug:

https://godbolt.org/z/8e9o5sYPP

The value `%elementValuePtr` addresses the second field of the
`%struct.Blub`. It is therefore a GEP with index 1 and type i8.
The value `%nextArrayElement` addresses the next array element. It is
therefore a GEP with index 1 and type `%struct.Blub`.

Both values point to completely different addresses, even if the indices
are the same, due to the types being different.
However, after CodeGenPrepare has run, `%nextArrayElement` is a bitcast
from `%elementValuePtr`, meaning both were treated as equal.

The cause for this is that the unmerging optimization does not take
types into consideration.
It sees both GEPs have `%currentArrayElement` as source operand and
therefore tries to rewrite `%nextArrayElement` in terms of
`%elementValuePtr`.
It changes the index to the difference of the two GEPs. As both indices
are `1`, the difference is `0`. As the indices are `0` the GEP is later
replaced with a simple bitcast in CodeGenPrepare.

Before adjusting the indices, the types of the GEPs would have to be
aligned and the indices scaled accordingly for the optimization to be
correct.
Due to the size of the struct being `16` and the `%elementValuePtr`
pointing to offset `1`, the correct index for the unmerged
`%nextArrayElement` would be 15.

I assume this bug emerged from the opaque pointer change, as GEPs like
`%elementValuePtr` that access the struct field based on type i8 did not
naturally occur before.

In light of future migration to ptradd, simply not performing the
optimization if the types mismatch should be sufficient.
2023-10-13 09:47:47 +02:00
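The index arithmetic in the CodeGenPrepare commit above can be checked with a small model. This is only a sketch of the byte-offset reasoning, not the pass itself: the struct size (16) and field offset (1) come from the commit message, while the helper `gep_offset` and the constant names are hypothetical.

```python
# Sizes taken from the commit message; names are illustrative only.
STRUCT_BLUB_SIZE = 16  # size of %struct.Blub in bytes
I8_SIZE = 1            # size of i8 in bytes


def gep_offset(elem_size: int, index: int) -> int:
    """Byte offset of a single-index GEP: index scaled by the element size."""
    return elem_size * index


# Both GEPs use %currentArrayElement as their base (offset 0 here).
element_value_ptr = gep_offset(I8_SIZE, 1)            # second field, byte 1
next_array_element = gep_offset(STRUCT_BLUB_SIZE, 1)  # next element, byte 16

# Buggy unmerging: subtract the raw indices and ignore the types.
buggy_index = 1 - 1
buggy_address = element_value_ptr + gep_offset(I8_SIZE, buggy_index)

# Correct unmerging: scale to a common type (i8) before taking the difference.
correct_index = (next_array_element - element_value_ptr) // I8_SIZE
fixed_address = element_value_ptr + gep_offset(I8_SIZE, correct_index)
```

With these numbers the buggy rewrite lands on byte 1 (the same address as `%elementValuePtr`), whereas the scaled index is 15 and lands on byte 16, matching the commit's description.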
Kai Luo
3104681686
[PowerPC][Atomics] Remove redundant block to clear reservation (#68430)
This PR follows what https://reviews.llvm.org/D134783 does for
quadword CAS.
2023-10-13 10:59:27 +08:00
john-brawn-arm
a574ef6176
[AArch64] Fix incorrect big-endian spill in foldMemoryOperandImpl (#65601)
When an sreg sub-register of a q register was spilled,
AArch64InstrInfo::foldMemoryOperandImpl would emit a spill of a d
register, which gives the wrong result when the target is big-endian as
the following q register fill will put the value in the top half.

Fix this by greatly simplifying the existing code for widening the spill
to only handle wzr to xzr widening, as the default result we get if the
function returns nullptr is already that a widened spill will be
emitted.
2023-10-12 16:10:28 +01:00
Yusra Syeda
6cf41ada44
[SystemZ][z/OS] Add vararg support to z/OS (#68834)
This PR adds vararg support to z/OS and updates the call-zos-vararg.ll
lit test.

Co-authored-by: Yusra Syeda <yusra.syeda@ibm.com>
2023-10-12 12:42:55 +02:00
Nikita Popov
127ed9ae26
[PowerPC] Use zext instead of anyext in custom and combine (#68784)
This custom combine currently converts `and(anyext(x),c)` into
`anyext(and(x,c))`. This is not correct, because the original expression
guaranteed that the high bits are zero, while the new one sets them to
undef.

Emit `zext(and(x,c))` instead.

Fixes https://github.com/llvm/llvm-project/issues/68783.
2023-10-12 09:32:17 +02:00
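The difference between the two forms in the PowerPC commit above can be modelled with plain integers. This is an illustrative sketch, not LLVM code: it assumes an 8-bit to 16-bit extension, a mask `c` that fits in the narrow type, and a hypothetical `junk` parameter standing in for the undefined high bits that `anyext` may produce.

```python
NARROW_BITS = 8
WIDE_BITS = 16
NARROW_MASK = (1 << NARROW_BITS) - 1
WIDE_MASK = (1 << WIDE_BITS) - 1


def anyext(x: int, junk: int) -> int:
    """Model anyext: low bits are x, high bits are whatever junk supplies."""
    return ((junk << NARROW_BITS) | (x & NARROW_MASK)) & WIDE_MASK


def zext(x: int) -> int:
    """Model zext: low bits are x, high bits are zero."""
    return x & NARROW_MASK


def original(x: int, c: int, junk: int) -> int:
    """and(anyext(x), c): the mask clears the undefined high bits."""
    return anyext(x, junk) & c


def buggy(x: int, c: int, junk: int) -> int:
    """anyext(and(x, c)): the high bits are undefined again."""
    return anyext(x & c, junk)


def fixed(x: int, c: int) -> int:
    """zext(and(x, c)): the high bits are guaranteed zero."""
    return zext(x & c)
```

Under these assumptions, `original` and `fixed` agree for every value of `junk`, while `buggy` can leak the junk bits into the result.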
WANG Xuerui
956482de13 [LoongArch] Support finer-grained DBAR hints for LA664+ (#68787)
These are treated as DBAR 0 on older uarchs, so we can start to
unconditionally emit the new hints right away.

Co-authored-by: WANG Rui <wangrui@loongson.cn>
2023-10-12 15:04:51 +08:00
Rahman Lavaee
28b9126879
[BasicBlockSections] Introduce the path cloning profile format to BasicBlockSectionsProfileReader. (#67214)
Following up on the prior RFC
(https://lists.llvm.org/pipermail/llvm-dev/2020-September/145357.html)
we can now improve upon our highly-optimized basic-block-sections
binary (e.g., 2% for clang) by applying path cloning. Cloning can
improve performance by reducing taken branches.

This patch prepares the profile format for applying cloning actions.

The basic block cloning profile format extends the basic block sections
profile in two ways.

1. Specifies the cloning paths with a 'p' specifier. For example, `p 1 4
5` specifies that blocks with BB ids 4 and 5 must be cloned along the
edge 1 --> 4.
2. For each cloned block, it will appear in the cluster info as
`<bb_id>.<clone_id>` where `clone_id` is the id associated with this
clone.

For example, the following profile specifies one cloned block (2) and
determines its cluster position as well.
```
f foo
p 1 2
c 0 1 2.1 3 2 5
```

This patch keeps backward-compatibility (retains the behavior for old
profile formats). This feature is only introduced for profile version >=
1.
2023-10-11 22:47:13 -07:00
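To make the profile format in the commit above concrete, here is a minimal reader sketch for the three specifiers it shows (`f`, `p`, `c`). It is not the `BasicBlockSectionsProfileReader` implementation; the function names and the dictionary layout are hypothetical, and treating a block without a `.<clone_id>` suffix as clone id 0 is an assumption of this sketch.

```python
from typing import Dict, List, Tuple

BBId = Tuple[int, int]  # (bb_id, clone_id)


def parse_bb(token: str) -> BBId:
    """Parse '<bb_id>' or '<bb_id>.<clone_id>' (missing clone id assumed 0)."""
    base, _, clone = token.partition(".")
    return (int(base), int(clone) if clone else 0)


def parse_profile(text: str) -> List[Dict]:
    """Parse 'f', 'p' and 'c' lines into one record per function."""
    functions: List[Dict] = []
    current = None
    for raw in text.splitlines():
        fields = raw.split()
        if not fields:
            continue
        spec, args = fields[0], fields[1:]
        if spec == "f":
            # Start a new function record.
            current = {"name": args[0], "clone_paths": [], "clusters": []}
            functions.append(current)
        elif current is None:
            raise ValueError(f"'{spec}' line appears before any 'f' line")
        elif spec == "p":
            # Cloning path: a list of BB ids.
            current["clone_paths"].append([int(a) for a in args])
        elif spec == "c":
            # Cluster: ordered (bb_id, clone_id) entries.
            current["clusters"].append([parse_bb(a) for a in args])
        else:
            raise ValueError(f"unknown specifier: {spec}")
    return functions


sample = "f foo\np 1 2\nc 0 1 2.1 3 2 5\n"
parsed = parse_profile(sample)
```

For the sample from the commit message, the single record holds the path `[1, 2]` and one cluster in which block 2 appears twice: once as clone 1 (`2.1`) and once as the original.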
weiguozhi
b6043f9867
[RA] Disable split around hint register if optimize for size (#68619)
Splitting a virtual register with a hint may generate COPY instructions in
multiple cold basic blocks and increase code size. So disable this
split when the function is optimized for size.
2023-10-11 14:57:15 -07:00
Harald van Dijk
8d520973b0
[X86] Use indirect addressing for high 2GB of x32 address space
Instructions that take immediate addresses sign-extend their operands, so cannot be used when we actually need zero extension. Use indirect addressing to avoid problems.

The functions in the test are modified versions of the functions by the same names in large-constants.ll, with i64 types changed to i32.

Fixes #55061

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D124406
2023-10-11 19:20:36 +01:00
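The sign-extension problem described in the x32 commit above can be seen with integer arithmetic alone. The sketch below is not tied to any particular instruction encoding; it assumes a 32-bit immediate widened to a 64-bit address, and the helper names are hypothetical.

```python
MASK_64 = (1 << 64) - 1


def sign_extend_32_to_64(value: int) -> int:
    """Widen a 32-bit value to 64 bits by replicating bit 31."""
    value &= 0xFFFFFFFF
    if value & 0x80000000:
        value -= 1 << 32
    return value & MASK_64


def zero_extend_32_to_64(value: int) -> int:
    """Widen a 32-bit value to 64 bits by filling the high bits with zero."""
    return value & 0xFFFFFFFF


low_address = 0x7FFFFFFF   # last byte of the low 2GB
high_address = 0x80000000  # first byte of the high 2GB

low_matches = sign_extend_32_to_64(low_address) == zero_extend_32_to_64(low_address)
high_matches = sign_extend_32_to_64(high_address) == zero_extend_32_to_64(high_address)
```

Addresses in the low 2GB widen identically either way, while an address in the high 2GB sign-extends to `0xFFFFFFFF80000000` rather than the intended `0x80000000`, which is why a different addressing form is needed there.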
chuongg3
d88d9834e9
[AArch64][GlobalISel] Support more types for TRUNC (#66927)
G_TRUNC will get lowered into trunc(merge(trunc(unmerge),
trunc(unmerge))) if the source is larger than 128 bits or the truncation
is more than half of the current bit size.

Now mirrors ZEXT/SEXT code more closely for vector types.
2023-10-11 16:05:25 +01:00
Anatoly Trosinenko
1d2b558265 [AArch64][PAC] Check authenticated LR value during tail call
When performing a tail call, check the value of the LR register after
authentication to prevent the callee from signing and spilling an
untrusted value. This commit implements a few variants of the check;
more can be added later.

If it is safe to assume that executable pages are always readable,
LR can be checked just by dereferencing the LR value via LDR.

As an alternative, LR can be checked as follows:

    ; lowered AUT* instruction
    ; <some variant of check that LR contains a valid address>
    b.cond break_block
  ret_block:
    ; lowered TCRETURN
  break_block:
    brk 0xc471

As the existing methods either break the compatibility with execute-only
memory mappings or can degrade the performance, they are disabled by
default and can be explicitly enabled with a command line option.

Individual subtargets can opt-in to use one of the available methods
by updating AArch64FrameLowering::getAuthenticatedLRCheckMethod().

Reviewed By: kristof.beyls

Differential Revision: https://reviews.llvm.org/D156716
2023-10-11 17:38:17 +03:00
Stephen Thomas
720be6c535
[AMDGPU] Add encoding/decoding support for non-result-returning ATOMIC_CSUB instructions (#68684)
The BUFFER_ATOMIC_CSUB and GLOBAL_ATOMIC_CSUB instructions have
encodings for non-value-returning forms, although actually using them
isn't supported by hardware. However, these encodings aren't supported
by the backend, meaning that they can't even be assembled or
disassembled.

Add support for the non-returning encodings, but gate actually using
them in instruction selection behind a new feature
FeatureAtomicCSubNoRtnInsts, which no target uses. This does allow the
non-returning instructions to be tested manually, and
llvm.amdgcn.atomic.csub.ll is extended to cover them. The feature does
not gate assembling or disassembling them; this is now not an error, and
encoding and decoding tests have been adapted accordingly.
2023-10-11 11:37:27 +01:00
hev
37b93f07cd
[LoongArch] Add some atomic tests (#68766) 2023-10-11 18:28:04 +08:00
Nikita Popov
0ead1faef0 [PowerPC] Add test for #68783 (NFC) 2023-10-11 12:15:26 +02:00
Harald van Dijk
a21abc782a
[X86] Align i128 to 16 bytes in x86 datalayouts
This is an attempt at rebooting https://reviews.llvm.org/D28990

I've included AutoUpgrade changes to modify the data layout to satisfy the compatible layout check. But this does mean alloca, loads, stores, etc in old IR will automatically get this new alignment.

This should fix PR46320.

Reviewed By: echristo, rnk, tmgross

Differential Revision: https://reviews.llvm.org/D86310
2023-10-11 10:23:38 +01:00
Piotr Sobczak
2888fa4313 [AMDGPU] Update test remat-smrd.mir
Update test/CodeGen/AMDGPU/remat-smrd.mir:

* Convert a negative case of a non-dereferenceable invariant load to a positive one.
* Add new cases for subreg.
2023-10-11 10:19:22 +02:00
Sacha Coppey
776889bc1c [RISCV] Add Stackmap/Statepoint/Patchpoint support without targets
This patch adds stackmap support for RISC-V without targets (i.e. the nop patchable forms).

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D123496
2023-10-11 09:18:55 +05:30
Evgenii Kudriashov
255f826d6f
[X86] Fix value-extending/truncating loads and stores of __ptr32/__ptr64 pointers (#67168)
The value extension and truncation were missed during casting __ptr32/__ptr64
pointers to the default address space.

Closes #66873
2023-10-11 05:19:36 +02:00
Wang Pengcheng
f3c92a06b9
[RISCV] Make PostRAScheduler a target feature (#68692)
This is what AArch64 has done in https://reviews.llvm.org/D20762.

Tests are added in macro fusion tests, which uncover a bug where
DAG mutations don't take effect.
2023-10-11 10:51:03 +08:00
hev
203ba238e3
[LoongArch] Improve codegen for atomic ops (#67391)
This PR improves memory barriers generated by atomic operations.

Memory barrier semantics of LL/SC:
```
LL: <memory-barrier> + <load-exclusive>
SC: <store-conditional> + <memory-barrier>
```

Changes:
* Remove unnecessary memory barriers before LL and between LL/SC.
* Fix acquire semantics. (If the SC instruction is not executed, then
the guarantee of acquiring semantics cannot be ensured. Therefore, an
acquire barrier needs to be generated when memory ordering includes an
acquire operation.)
2023-10-11 10:24:18 +08:00
Philip Reames
3a6cc52fe3 Revert "[RISCV] Shrink vslideup's LMUL when lowering fixed insert_subvector (#65997)"
This reverts commit b5ff71e261b637ab7088fb5c3314bf71d6e01da7.  As described in
https://github.com/llvm/llvm-project/issues/68730, this appears to have exposed
an existing liveness issue.  Revert to green until we can figure out how to
address the root cause.

Note: This was not a clean revert.  I ended up doing it by hand.
2023-10-10 15:13:57 -07:00