41828 Commits

Author SHA1 Message Date
Paweł Bylica
8f5b1d9e14
[test][DAGCombine] Add tests for cmp+add -> addcarry
Tests for https://reviews.llvm.org/D118037.
2022-01-25 22:17:09 +01:00
Dávid Bolvanský
b35ef580d8 [NFC] Added test with select with unpredictable metadata; regenerate x86-cmov-converter.ll 2022-01-25 21:25:12 +01:00
David Green
c415ff186d [AArch64] Add extra vecreduce.add tests, including extending reductions. NFC
This is all the reductions from i8 -> i64 with either sign or zero
extensions.
2022-01-25 18:10:09 +00:00
eopXD
b089e4072a [RISCV] Don't allow i64 vector div by constant to use mulh with Zve64x
EEW=64 of mulh and its vairants requires V extension.

Authored by: Craig Topper <craig.topper@sifive.com> @craig.topper

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D117947
2022-01-25 09:55:05 -08:00
Simon Pilgrim
ef0d90f682 [X86] Regenerate avx-vbroadcast.ll
Remove '' around mattr to stop update script crash and use X86 prefixes instead of X32
2022-01-25 16:23:22 +00:00
Sean Fertile
a2505bd063 [PowerPC][AIX] Override markFunctionEnd()
During fast-isel calling 'markFunctionEnd' in the base class will call
tidyLandingPads. This can cause an issue where we have determined that
we need ehinfo and emitted a traceback table with the bits set to
indicate that we will be emitting the ehinfo, but the tidying deletes
all landing pads. In this case we end up emitting a reference to
__ehinfo.N symbol, but not emitting a definition to said symbol and the
resulting file fails to assemble.

Differential Revision: https://reviews.llvm.org/D117040
2022-01-25 10:08:53 -05:00
Simon Pilgrim
ea4b0489f5 [X86][AVX] Add PR47194 shuffle test case 2022-01-25 15:06:39 +00:00
Simon Pilgrim
fc15ab7b1b [X86] Add folded load tests to PR46809 tests 2022-01-25 12:37:15 +00:00
Danila Malyutin
153b1e3cba [AArch64] Add patterns for relaxed atomic ld/st into fp registers
Adds patterns to match integer loads/stores bitcasted to fp values

Fixes https://github.com/llvm/llvm-project/issues/52927

Differential Revision: https://reviews.llvm.org/D117573
2022-01-25 15:33:37 +03:00
Paul Walker
d95cf1f6cf [SVE] Enable ISD::ABDS/U ISel for scalable vectors.
NOTE: This patch also includes tests that highlight those cases
where the existing DAG combine doesn't yet work well for SVE.

Differential Revision: https://reviews.llvm.org/D117873
2022-01-25 12:14:53 +00:00
Bjorn Pettersson
109cc5adcc [DAGCombine] Fold SRA of a load into a narrower sign-extending load
An sra is basically sign-extending a narrower value. Fold away the
shift by doing a sextload of a narrower value, when it is legal to
reduce the load width accordingly.

Differential Revision: https://reviews.llvm.org/D116930
2022-01-25 12:14:48 +01:00
Fraser Cormack
7cb452bfde [SelectionDAG][VP] Add widening support for VP_MERGE
This patch adds widening support for ISD::VP_MERGE, which widens
identically to VP_SELECT and similarly to other select-like nodes.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D118030
2022-01-25 10:59:40 +00:00
Victor Perez
19d3dc6e22 [VP] Update CodeGen/RISCV/rvv/vpgather-sdnode.ll test 2022-01-25 10:49:05 +00:00
Fraser Cormack
5f5c5603ce [SelectionDAG][VP] Add splitting support for VP_MERGE
This patch adds splitting support for ISD::VP_MERGE, which splits
identically to VP_SELECT and similarly to other select-like nodes.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D118032
2022-01-25 10:33:23 +00:00
Dávid Bolvanský
9fa6ad4c58 Revert "[NFC] Added test with select with unpredictable metadata; regenerate x86-cmov-converter.ll"
This reverts commit e2f8d28afba0a6545284ad3b54a4b7532c3253b6.
2022-01-25 11:28:26 +01:00
Dávid Bolvanský
e2f8d28afb [NFC] Added test with select with unpredictable metadata; regenerate x86-cmov-converter.ll 2022-01-25 11:13:20 +01:00
Victor Perez
2233befa5d [LegalizeTypes][VP] Add splitting support for vp.gather and vp.scatter
Split these nodes in a similar way as their masked versions.

Reviewed By: frasercrmck, craig.topper

Differential Revision: https://reviews.llvm.org/D117760
2022-01-25 10:08:07 +00:00
Simon Pilgrim
902184e6cc [X86] combinePredicateReduction - generalize allof(cmpeq(x,0)) handling to allof(cmpeq(x,y))
There's no further reasons to limit this to cmpeq-with-zero, the outstanding regressions with lowering to PTEST have now been addressed

Improves codegen for Issue #53379
2022-01-25 00:24:06 +00:00
Simon Pilgrim
11bb4a1111 [X86] combinePredicateReduction - split vXi16 allof(cmpeq()) to vXi8 allof(cmpeq())
vXi16 patterns allof(cmp()) reduction patterns will have to be pack the comparison results to vXi8 to use PMOVMSKB.

If we're reducing cmpeq(), then we can compare the vXi8 halves directly - similar to what we already do for vXi64 -> vXi32 for cases without PCMPEQQ.
2022-01-24 22:43:29 +00:00
Simon Pilgrim
8d298355ca [X86] combineSetCCMOVMSK - detect and(pcmpeq(),pcmpeq()) ptest pattern.
Handle cases where we've split an allof(cmpeq()) pattern to a legal vector type
2022-01-24 21:42:03 +00:00
Quinn Pham
6a028296fe [PowerPC] Emit warning when SP is clobbered by asm
This patch emits a warning when the stack pointer register (`R1`) is found in
the clobber list of an inline asm statement. Clobbering the stack pointer is
not supported.

Reviewed By: #powerpc, nemanjai

Differential Revision: https://reviews.llvm.org/D112073
2022-01-24 15:12:23 -06:00
Stanislav Mekhanoshin
bb1fe36977 [AMDGPU] Make v8i16/v8f16 legal
Differential Revision: https://reviews.llvm.org/D117721
2022-01-24 11:51:08 -08:00
Stanislav Mekhanoshin
c27f8fb968 [AMDGPU] Remove cndmask from readsExecAsData
Differential Revision: https://reviews.llvm.org/D117909
2022-01-24 11:24:47 -08:00
Sander de Smalen
11cea7e5ce [AArch64] NFC: Clarify and auto-generate some CodeGen tests.
* For ext-narrow-index.ll, move vscale_range attribute closer to the
  function definition, rather than through indirect #<num> attribute. This
  makes the test a bit easier to read.
* auto-generated CHECK lines for sve-cmp-select.ll and
  named-vector-shuffles-sve.ll.
* re-generated CHECK lines for tests that had a mention they were
  auto-generated, but where the CHECK lines were out of date.
2022-01-24 17:42:37 +00:00
Simon Pilgrim
6997f4d07f [X86] combineSetCCMOVMSK - fold allof(cmpeq(x,y)) -> ptest(sub(x,y)) (PR53379)
As suggested on PR53379, for all-of icmp-eq patterns, we can use ptest(sub(x,y)) on SSE41+ targets

This is a generalization of the existing allof(cmpeq(x,0)) -> ptest(x) pattern

We can probably extend this further, in particularly to handle 256-bit cases on pre-AVX2 targets, but this part of the generalization is pretty trivial

Fixes Issue #53379
2022-01-24 16:44:37 +00:00
Craig Topper
a43ed49f5b [DAGCombiner][RISCV] Canonicalize (bswap(bitreverse(x))->bitreverse(bswap(x)).
If the bitreverse gets expanded, it will introduce a new bswap. By
putting a bswap before the bitreverse, we can ensure it gets cancelled
out when this happens.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D118012
2022-01-24 08:31:53 -08:00
Craig Topper
b8c7cdcc81 [SelectionDAG][RISCV] Teach getNode to fold bswap(bswap(x))->x.
This can show up during when bitreverse is expanded to bswap and
swap of bits within a byte. If the input is already a bswap, we
should cancel them out before we further transform them in a way
that makes it harder to see the redundancy.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D118007
2022-01-24 08:17:46 -08:00
Craig Topper
cd2a9ff397 [RISCV] Select int_riscv_vsll with shift of 1 to vadd.vv.
Add might be faster than shift. We can't do this earlier without
using a Freeze instruction.

This is the intrinsic version of D106689.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D118013
2022-01-24 08:04:53 -08:00
Matt Arsenault
18aabae8e2 AMDGPU: Fix assertion on fixed stack objects with VGPR->AGPR spills
These have negative / out of bounds frame index values and would
assert when trying to set the BitVector. Fixed stack objects can't be
colored away so ignore them.
2022-01-24 09:45:41 -05:00
Bjorn Pettersson
354b2c36ee Pre-commit test cases for (sra (load)) -> (sextload) folds. NFC
Add test case to show missing folds for (sra (load)) -> (sextload).

Differential Revision: https://reviews.llvm.org/D116929
2022-01-24 15:30:55 +01:00
Matt Arsenault
99e8e17313 Reapply "Revert "GlobalISel: Add G_ASSERT_ALIGN hint instruction"
This reverts commit a97e20a3a8a58be751f023e610758310d5664562.
2022-01-24 09:26:52 -05:00
Paul Walker
34aedbe90d [AArch64] Regenerate CHECK lines for llvm/test/CodeGen/AArch64/sve2-int-mul.ll 2022-01-24 14:11:19 +00:00
Simon Pilgrim
0553f5e61a [X86] Add cmp-equality bool reductions
PR53379 test coverage
2022-01-24 14:05:10 +00:00
Simon Pilgrim
4436d4cd7c [X86] Rename cmp-with-zero bool reductions
Explicitly name them icmp0_* - I'm intending to add PR53379 test coverage shortly
2022-01-24 14:05:10 +00:00
Simon Pilgrim
f7079bf9ee [X86] Fix v8i8 -> v8i16 typo in bool reductions
We were supposed to be testing <8 x i16> reductions
2022-01-24 14:05:09 +00:00
Fraser Cormack
d42678b453 [RISCV] Add side-effect-free vsetvli intrinsics
This patch introduces new intrinsics that enable the use of vsetvli in
contexts where only the returned vector length is of interest. The
pre-existing intrinsics are marked with side-effects, which prevents
even trivial optimizations on/across them.

These intrinsics are intended to be used in situations where the vector
length is fed in turn to RVV intrinsics or to vector-predication
intrinsics during loop vectorization, for example. Those codegen paths
ensure that instructions are generated with their own implicit vsetvli,
so the vector length and vtype can be relied upon to be correct.

No corresponding C builtins are planned at this stage, though that is a
possibility for the future if the need arises.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D117910
2022-01-24 13:52:08 +00:00
Simon Pilgrim
0e70dd858e [X86] Add PR46249 test case showing poorly widened select predicate mask 2022-01-24 12:59:30 +00:00
SForeKeeper
70f83f3084 [RISCV] add support for zbkx subextension in MC layer.
This patch adds support for zbkx extension from K extension(v1.0.0) in MC layer.
Instructions with same functionality and same encoding is defined in the bitmanip extension.
It defines {Xperm8, Xperm4} as instruction aliases for xperm.* in Zbp extension. When Zbkx is enabled while Zbp is not, xperm.h will not be available. When Zbkx and Zbp are both enabled, the instructions will be decoded in Zbp format.

[[ https://reviews.llvm.org/D94999 | D94999 ]] this is the patch that introduces xperm.* instructions.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D117889
2022-01-24 20:38:46 +08:00
Bjorn Pettersson
46cacdbb21 [DAGCombiner] Adjust some checks in DAGCombiner::reduceLoadWidth
In code review for D117104 two slightly weird checks were found
in DAGCombiner::reduceLoadWidth. They were typically checking
if BitsA was a mulitple of BitsB by looking at (BitsA & (BitsB - 1)),
but such a comparison actually only make sense if BitsB is a power
of two.

The checks were related to the code that attempted to shrink a load
based on the fact that the loaded value would be right shifted.

Afaict the legality of the value types is checked later (typically in
isLegalNarrowLdSt), so the existing checks were both overly
conservative as well as being wrong whenever ExtVTBits wasn't a
power of two. The latter was a situation triggered by a number of
lit tests so we could not just assert on ExtVTBIts being a power of
two).

When attempting to simply remove the checks I found some problems,
that seems to have been guarded by the checks (maybe just out of
luck). A typical example would be a pattern like this:

  t1 = load i96* ptr
  t2 = srl t1, 64
  t3 = truncate t2 to i64

When DAGCombine is visiting the truncate reduceLoadWidth is called
attempting to narrow the load to 64 bits (ExtVT := MVT::i64). Then
the SRL is detected and we set ShAmt to 64.

In the past we've bailed out due to i96 not being a multiple of 64.
If we simply remove that check then we would end up replacing the
load with a new load that would read 64 bits but with a base pointer
adjusted by 64 bits. So we would read 32 bits the wasn't accessed by
the original load.
This patch will instead utilize the fact that the logical left shift
can be folded away by using a zextload. Thus, the pattern above will
now be combined into

  t3 = load i32* ptr+offset, zext to i64


Another case is shown in the X86/shift-folding.ll test case:

  t1 = load i32* ptr
  t2 = srl i32 t1, 8
  t3 = truncate t2 to i16

In the past we bailed out due to the shift count (8) not being a
multiple of 16. Now the narrowing kicks in and we get

  t3 = load i16* ptr+offset

Differential Revision: https://reviews.llvm.org/D117406
2022-01-24 12:22:04 +01:00
Bjorn Pettersson
12a499eb00 Pre-commit test case for trunc+lshr+load folds
This is a pre-commit of test cases relevant for D117406.

@srl_load_narrowing1 is showing a pattern that could be folded into
a more narrow load.

@srl_load_narrowing2 is showing a similar pattern that happens to
be optimized already, but that happens in two steps (first triggering
a combine based on SRL and later another combine based on TRUNCATE).

Differential Revision: https://reviews.llvm.org/D117588
2022-01-24 12:22:03 +01:00
Fraser Cormack
af773a1818 [RISCV][VP] Lower VP_MERGE to RVV instructions
This patch adds lowering of the llvm.vp.merge.* intrinsic
(ISD::VP_MERGE) to RVV vmerge/vfmerge instructions. It introduces a
special pseudo form of vmerge which allows a tied merge operand,
allowing us to specify the tail elements as being equal to the "on
false" operand, using a tied-def constraint and a "tail undisturbed"
policy.

While this strategy allows us to often lower the intrinsic to just one
instruction, it may be less efficient in fixed-vector types as the
number of tail elements may extend far beyond the length of the fixed
vector. Another strategy could be to use a vmerge/vfmerge instruction
with an AVL equal to the length of the vector type, and manipulate the
condition operand such that mask elements greater than the operation's
EVL are false.

I've also observed inefficient codegen in which our 'VF' patterns don't
match raw floating-point SPLAT_VECTORs, which occur in scalable-vector
code.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D117561
2022-01-24 11:05:05 +00:00
Fraser Cormack
e7926e8d97 [RISCV] Match VF variants for masked VFRDIV/VFRSUB
This patch follows up on D117697 to help the simple binary operations
behave similarly in the presence of masks.

It also enables CGP sinking support for vp.fdiv and vp.fsub intrinsics,
now that VFRDIV and VFRSUB are consistently matched with a LHS splat for
masked and unmasked variants.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D117783
2022-01-24 10:59:43 +00:00
Abinav Puthan Purayil
912af6b570 [AMDGPU][GlobalISel] Remove the post ':' part of vreg operands in fsh combine tests. 2022-01-24 16:30:40 +05:30
Jay Foad
aa50b93e7c [AMDGPU][GlobalISel] Add more sign/zero/any-extension tests
Add s1 to s16 cases, and for sgprs s1 to s64 and s32 to s64.
2022-01-24 10:16:51 +00:00
Jay Foad
906ebd5830 [AMDGPU][GlobalISel] Regenerate checks in inst-select-*ext.mir 2022-01-24 10:16:51 +00:00
Nikita Popov
0d1308a7b7 [AArch64][GlobalISel] Support returned argument with multiple registers
The call lowering code assumed that a returned argument could only
consist of one register. Pass an ArrayRef<Register> instead of
Register to make sure that all parts get assigned.

Fixes https://github.com/llvm/llvm-project/issues/53315.

Differential Revision: https://reviews.llvm.org/D117866
2022-01-24 10:55:28 +01:00
Nikita Popov
e7c9a6cae0 [SDAG] Don't move DBG_VALUE instructions after insertion point during scheduling (PR53243)
EmitSchedule() shouldn't be touching instructions after the provided
insertion point. The change introduced in D83561 performs a scan to
the end of the block, and thus may move unrelated instructions. In
particular, this ends up moving instructions that have been produced
by FastISel and will later be deleted. Moving them means that more
instructions than intended are removed.

Fix this by stopping the iteration when the insertion point is
reached.

Fixes https://github.com/llvm/llvm-project/issues/53243.

Differential Revision: https://reviews.llvm.org/D117489
2022-01-24 10:50:49 +01:00
Sander de Smalen
4f8fdf7827 [ISEL] Canonicalise constant splats to RHS.
SelectionDAG::getNode() canonicalises constants to the RHS if the
operation is commutative, but it doesn't do so for constant splat
vectors. Doing this early helps making certain folds on vector types,
simplifying the code required for target DAGCombines that are enabled
before Type legalization.

Somewhat to my surprise, DAGCombine doesn't seem to traverse the
DAG in a post-order DFS, so at the time of doing some custom fold where
the input is a MUL, DAGCombiner::visitMUL hasn't yet reordered the
constant splat to the RHS.

This patch leads to a few improvements, but also a few  minor regressions,
which I traced down to D46492. When I tried reverting this change to see
if the changes were still necessary, I ran into some segfaults. Not sure
if there is some latent bug there.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D117794
2022-01-24 09:38:36 +00:00
Chenbing.Zheng
9aaa74aeef [RISCV] Add patterns of SET[U]LT_VI for STECC forms
This patch optmizes "li a0, 5
                     vmsgt[u].vx v10, v8, a0"
                 -> "vmsgt[u].vi v10, v8, 5"

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D118014
2022-01-24 08:50:49 +00:00
jacquesguan
ba16e3c31f [RISCV] Decouple Zve* extensions and the V extension.
According to the spec, there are some difference between V and Zve64d. For example, the vmulh integer multiply variants that return the high word of the product (vmulh.vv, vmulh.vx, vmulhu.vv, vmulhu.vx, vmulhsu.vv, vmulhsu.vx) are not included for EEW=64 in Zve64*, but V extension does support these instructions. So we should decouple Zve* extensions and the V extension.

Differential Revision: https://reviews.llvm.org/D117854
2022-01-24 14:55:21 +08:00