6291 Commits

Author SHA1 Message Date
Florian Hahn
596a8d07d4
[AArch64] Add additional reassociation test.
Add a test where the reassociation candidates are split across 2 blocks.
2023-01-09 16:38:19 +00:00
Florian Hahn
70520e2f1c
[AArch64] Add test showing reassociation potential.
Add a test case where some ops of a reassociate-able expression are in
an earlier block.

This can appear in practice, e.g. when computing the final reduction
value after vectorization.
2023-01-09 15:20:55 +00:00
Sander de Smalen
8aff167b34 [AArch64][SME] Improve streaming-compatible codegen for extending loads/truncating stores.
This is another step in aligning addTypeForStreamingSVE with addTypeForFixedLengthSVE,
which also improves code quality for extending loads and truncating stores.

Reviewed By: hassnaa-arm

Differential Revision: https://reviews.llvm.org/D141266
2023-01-09 15:08:04 +00:00
David Green
07d6af6a71 [AArch64] Fold And/Or into CSel if possible
If we have `and x, (csel 0, 1, cc)` and we know that x is 0/1, then we
can emit a `csel ZR, x, cc`. Similarly for `or x, (csel 0, 1, cc)` we
can emit `csinc x, ZR, cc`. This can help where we can not otherwise
generate ccmp instructions.

Differential Revision: https://reviews.llvm.org/D141119
2023-01-09 11:52:37 +00:00
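The two identities in 07d6af6a71 can be checked exhaustively for single-bit values. The sketch below is a plain-Python model, not LLVM code. `csel`, `csinc` and `fold_holds` are hypothetical helpers that mimic the instruction semantics (`csel a, b, cc` yields `a` when `cc` holds and `b` otherwise; `csinc a, b, cc` yields `a` when `cc` holds and `b + 1` otherwise), and `ZR` stands for the zero register.

```python
from itertools import product

ZR = 0  # models the AArch64 zero register (illustrative)

def csel(a, b, cc):
    """Model of `csel a, b, cc`: a if the condition holds, else b."""
    return a if cc else b

def csinc(a, b, cc):
    """Model of `csinc a, b, cc`: a if the condition holds, else b + 1."""
    return a if cc else b + 1

def fold_holds():
    """Check both folds for every x in {0, 1} and both condition values."""
    for x, cc in product((0, 1), (False, True)):
        # and x, (csel 0, 1, cc)  ==>  csel ZR, x, cc
        if (x & csel(0, 1, cc)) != csel(ZR, x, cc):
            return False
        # or x, (csel 0, 1, cc)   ==>  csinc x, ZR, cc
        if (x | csel(0, 1, cc)) != csinc(x, ZR, cc):
            return False
    return True
```

The restriction to 0/1 matters: for `x = 2` with the condition false, `2 & 1` is 0 while `csel ZR, x, cc` would yield 2.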
Tim Northover
5b24d42106 TailDuplication: do not remove trivial PHIs from addr-taken blocks.
Unlike an anonymous block, it will not be removed even though we've resolved
all valid paths to get here. So removing a PHI can leave vregs with no
definition, violating SSA. Instead, this converts it to an IMPLICIT_DEF.
2023-01-09 11:12:33 +00:00
zhongyunde
9e83333445 [AArch64][SelectionDAG] Eliminates redundant zero-extension for 32-bit popcount
Fix https://github.com/llvm/llvm-project/issues/59597.
mov w8, w0 + fmov d0, x8 ==> fmov s0, w0

Reviewed By: dmgreen, efriedma

Differential Revision: https://reviews.llvm.org/D140649
2023-01-09 16:08:16 +08:00
Serguei Katkov
fd64bd94ed [Inline Spiller] Extend the snippet by statepoint uses
A snippet is a tiny live interval which has a copy- or fill-like def
and a copy- or spill-like use at the end (either of them might be absent).

Snippet has only one use/def inside interval and interval is located
in one basic block.

When the inline spiller spills a reg around its uses, it also forces the
spilling of connected snippets: those which were produced by splitting the
same original reg and whose def is a full copy of our reg or whose
last use is a full copy to our reg.

The definition of a snippet is extended to allow more than one use/def,
provided that all the other uses are statepoint instructions which will
fold the fill into their operand. That way we do not introduce new fills/spills.

Reviewed By: qcolombet, dantrushin
Differential Revision: https://reviews.llvm.org/D138093
2023-01-09 13:30:57 +07:00
Paul Walker
c9602e02fc [SVE] Fix incorrect VT usage when lowering fixed length vector divides.
Ensure the negation required when lowering negative power-of-two
divides uses the scalable vector container type with the fixed
length result extracted from it.

Fixes: #59647

Differential Revision: https://reviews.llvm.org/D140563
2023-01-08 12:22:05 +00:00
David Green
0d4ab5de7f [ARM][AArch64] Add tests for And/Or into CSel fold. NFC 2023-01-07 14:08:29 +00:00
James Y Knight
1ae36b1387 Remove special cases for invoke of non-throwing inline-asm.
Non-throwing inline asm infers the nounwind attribute in
instcombine. Thus, it can be handled in the same manner as
non-throwing target functions are generally. Further special casing is
unnecessary complexity.
2023-01-06 13:53:10 -05:00
Luke Lau
275658d1af [SelectionDAG] Implicitly truncate known bits in SPLAT_VECTOR
Now that D139525 fixes the Hexagon infinite loop, the stopgap can be
removed to provide more information about known bits in SPLAT_VECTOR
whose operands are smaller than the bit width (which is most of the
time)

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D141075
2023-01-06 15:43:47 +00:00
Hassnaa Hamdi
9eb698946d [AArch64][SME]: Make 'Expand' the default action for all Ops.
By default expand all operations, then change to Custom/Legal if needed.

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D141068
2023-01-06 15:32:07 +00:00
Sanjay Patel
bf82070ea4 [SDAG] try to avoid multiply for X*Y==0
Forking this off from D140850 -
https://alive2.llvm.org/ce/z/TgBeK_
https://alive2.llvm.org/ce/z/STVD7d

We could almost justify doing this in IR, but consideration for
"minsize" requires that we only try it in codegen -- the
transform is not reversible.

In all other cases, avoiding multiply should be a win because a
mul is more expensive than simple/parallelizable compares. AArch64
even has a trick to keep instruction count even for some types.

Differential Revision: https://reviews.llvm.org/D141086
2023-01-06 09:06:11 -05:00
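The rewrite in bf82070ea4 replaces a multiply with two compares. A minimal Python sketch of the underlying identity follows; it is not the DAG combine itself. The helper names are made up for illustration, and the first two helpers assume the product does not wrap. With wrapping fixed-width arithmetic the identity can fail, which the last helper shows for 32-bit values.

```python
def mul_is_zero(x, y):
    """x * y == 0, computed with Python's unbounded integers (no wrapping)."""
    return x * y == 0

def either_is_zero(x, y):
    """The multiply-free form: two independent compares combined with OR."""
    return (x == 0) | (y == 0)

def wrapping_mul32_is_zero(x, y):
    """x * y == 0 when the product is truncated to 32 bits."""
    return (x * y) & 0xFFFFFFFF == 0
```

For example, `wrapping_mul32_is_zero(1 << 16, 1 << 16)` is true although neither operand is zero, so a model of the transform has to rule out wrapping products.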
Sanjay Patel
bd87b84a02 [AArch64] add tests for x*y == 0; NFC 2023-01-06 08:37:04 -05:00
Ties Stuij
0b066e02a6 [AArch64] add GlobalIsel support for scalar CNT instruction
When the CSSC feature is available we should use the CNT instruction for s32, s64 and
s128 types in GlobalISel's G_CTPOP.

spec:
https://developer.arm.com/documentation/ddi0602/2022-09/Base-Instructions/CNT--Count-bits-

Reviewed By: aemerson

Differential Revision: https://reviews.llvm.org/D139417
2023-01-06 11:08:34 +00:00
David Green
8b5d0361c0 [AArch64] Regenerate fp16-vector-nvcast.ll check lines. NFC 2023-01-05 18:16:58 +00:00
Craig Topper
11e92bd61f [SelectionDAG] Improve codegen for udiv by constant if any divisors are 1.
If the divisor is 1, the magic algorithm does not return a correct
result and we end up using a select to pick the numerator for those
elements at the end.

Therefore we can use undef for that element of the earlier operations
when the divisor is 1. We sometimes get this through SimplifyDemandedVectorElts,
but not always. It definitely seems that we don't when the NPQ fixup is used.

Unfortunately, DAGCombiner is unable to fold srl X, <0, undef> to X so
I had to add flags to avoid emitting the srl unless one of the shift
amounts is non-zero.

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D141022
2023-01-05 08:41:44 -08:00
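The structure described in 11e92bd61f, a multiply-and-shift for most lanes plus a final select that returns the numerator where the divisor is 1, can be modelled per lane. The Python sketch below rests on assumptions: `magic_u32` is a textbook round-up multiplier computed with arbitrary-precision integers, not LLVM's magic-number routine, and both function names are hypothetical.

```python
def magic_u32(d):
    """Return (m, k) such that x // d == (x * m) >> k for 0 <= x < 2**32, d >= 2."""
    k = 32 + (d - 1).bit_length()   # 32 + ceil(log2(d))
    m = ((1 << k) + d - 1) // d     # ceil(2**k / d)
    return m, k

def udiv_lanes(xs, ds):
    """Divide each 32-bit lane of xs by the matching constant in ds."""
    out = []
    for x, d in zip(xs, ds):
        if d == 1:
            # The final select picks the numerator for this lane, so whatever
            # the earlier multiply/shift would have produced is never observed.
            out.append(x)
        else:
            m, k = magic_u32(d)
            out.append((x * m) >> k)
    return out
```

Because the `d == 1` lanes never read the multiply/shift result, those lanes of the earlier operations are free to hold any value, which is the property the commit uses when it substitutes undef.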
Ties Stuij
8d5b759a6c [AArch64][GlobalISel] implement GPR (U/S)(MIN/MAX) instr support
Lower umin, umax, smin, smax intrinsics to corresponding UMIN, UMAX, SMIN, SMAX
instructions when feat CSSC is available.

Reviewed By: aemerson

Differential Revision: https://reviews.llvm.org/D139420
2023-01-05 11:50:31 +00:00
Jay Foad
0d518ae50c [GlobalISel] New combine to commute constant operands to the RHS
Differential Revision: https://reviews.llvm.org/D140907
2023-01-05 11:12:40 +00:00
Diana Picus
6ee4f253b2 [GlobalISel] Add G_BUILD_VECTOR[_TRUNC] to CSE
Add G_BUILD_VECTOR and G_BUILD_VECTOR_TRUNC to the list of opcodes in
`shouldCSEOpc`. This simplifies the code generated for vector splats.

Differential Revision: https://reviews.llvm.org/D140965
2023-01-05 10:15:31 +01:00
Roman Lebedev
41005b7ab2
[DAGCombiner] Do try to combine ISD::ANY_EXTEND_VECTOR_INREG nodes
These weren't previously getting combined at all here,
only in target-specific combines.
2023-01-05 01:12:31 +03:00
Roman Lebedev
317a1adfe4
[DAGCombiner] Fold *_EXTEND_INREG of one of CONCAT_VECTORS operands into *_EXTEND of operand
This appears to be the root problematic pattern
for AArch64 regression in D140677.

We already do this, and many more, as target-specific X86 combines,
so this isn't causing much of an impact.
2023-01-05 01:12:31 +03:00
Roman Lebedev
2b1d077592
[NFC][AArch64] Add some tests for upcoming patch 2023-01-05 01:12:31 +03:00
Matt Arsenault
bf4596bf58 CodeGen: Clean up some tests with broken "strictfp" attribute 2023-01-03 20:26:57 -05:00
Samuel Parker
615333bc09 [TypePromotion] NewPM support.
Differential Revision: https://reviews.llvm.org/D140893
2023-01-03 15:09:29 +00:00
Roman Lebedev
4fc417ec37
[DAGCombiner] convertBuildVecZextToBuildVecWithZeros(): rework split factor calculation
The original computation was both making assumptions that do not hold
in practice, and being overly pessimistic. We should just check
every possible split factor, and pick the best one.

Fixes https://github.com/llvm/llvm-project/issues/59781
2023-01-02 18:34:35 +03:00
Roman Lebedev
16facf1ca6
[DAGCombiner][TLI] Do not fuse bitcast to <1 x ?> into a load/store of a vector
Single-element vectors are legalized by splitting,
so the memory operations would also get scalarized.
While we do have some support to reconstruct scalarized loads,
we clearly don't catch everything.

The comment for the affected AArch64 store suggests that
having two stores was the desired outcome in the first place.

This was showing as a source of *many* regressions
with more aggressive ZERO_EXTEND_VECTOR_INREG recognition.
2022-12-31 03:49:43 +03:00
Roman Lebedev
e4d25a9c23
[DAG] BUILD_VECTOR: absorb ZERO_EXTEND of a single first operand if all other ops are zeros
This kind of pattern seems to come up as regressions
with better ZERO_EXTEND_VECTOR_INREG recognition.

For initial implementation, this is quite restricted
to the minimal viable transform, otherwise there are
too many regressions to be dealt with.
2022-12-31 00:58:11 +03:00
Craig Topper
8abd70081f [TargetLowering] Teach BuildUDIV to take advantage of leading zeros in the dividend.
If the dividend has leading zeros, we can use them to reduce the
size of the multiplier and avoid the fixup cases.

This patch is for scalars only, but we might be able to do this
for vectors in a follow up.

Differential Revision: https://reviews.llvm.org/D140750
2022-12-29 13:58:46 -08:00
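To see why leading zeros help in 8abd70081f, compare the multiplier needed to divide by 7 over the full 32-bit range with the one needed when the dividend is known to fit in 16 bits. This Python sketch uses a textbook round-up multiplier, not the code in `BuildUDIV`; the function name and the choice of divisor are illustrative assumptions.

```python
def round_up_magic(d, dividend_bits):
    """Return (m, k) with x // d == (x * m) >> k for 0 <= x < 2**dividend_bits."""
    k = dividend_bits + (d - 1).bit_length()  # dividend_bits + ceil(log2(d))
    m = ((1 << k) + d - 1) // d               # ceil(2**k / d)
    return m, k

# Full 32-bit dividend: the multiplier needs 33 bits.
m32, k32 = round_up_magic(7, 32)

# Dividend with 16 known leading zeros: the multiplier needs only 17 bits.
m16, k16 = round_up_magic(7, 16)
```

In this model the full-range multiplier is one bit wider than a 32-bit register, which is the situation that calls for a fixup sequence, while the narrower dividend gets by with a multiplier that fits comfortably.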
zhongyunde
c69d83908a [AArch64][MachineScheduler] Set no side effect for movprfx
The movprfx is a vector copy, so it doesn't access memory. Set
hasSideEffects to 0 so that hasUnmodeledSideEffects() does not return true,
which would block the machine scheduler from scheduling load/store instructions.

Reviewed By: paulwalker-arm
Differential Revision: https://reviews.llvm.org/D140680
2022-12-28 01:18:14 +08:00
Roman Lebedev
e26e7ed69a
[DAG] combineShuffleToZeroExtendVectorInReg(): try to match w/ commuted operands
We don't have any reason to expect that the operand we will match
is on any particular hand of the shuffle, so we should try both.
2022-12-26 22:54:03 +03:00
Roman Lebedev
83288f8063
[AArch64] Custom lower ISD::ZERO_EXTEND_VECTOR_INREG
The baseline legalization for `ISD::ZERO_EXTEND_VECTOR_INREG`
(`VectorLegalizer::ExpandZERO_EXTEND_VECTOR_INREG`),
blends-in the zeros, but as mentioned e.g.
in b4bd0a404fe26071dab0854dfd9767974909c7c4,
there is no such thing for AArch64.

So some of the shuffles that would be nicely lowered
by `LowerVECTOR_SHUFFLE()`, e.g. into `ZIP1`,
would now be unrecognizable after round-tripping
through `ISD::ZERO_EXTEND_VECTOR_INREG` recognition & legalization.

The most obvious solution is to just custom-lower
`ISD::ZERO_EXTEND_VECTOR_INREG` as the `ZIP1`-with-zeros,
like it would have been originally in that test case.
2022-12-26 22:54:03 +03:00
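The equivalence that 83288f8063 relies on can be shown with a small lane model. This is a Python sketch under stated assumptions: little-endian lane order, 8-bit source lanes widened to 16 bits, and hypothetical helper names (`zip1`, `bitcast_u8_to_u16`, `zext_vector_inreg`) that model the operations rather than call into LLVM.

```python
def zip1(a, b):
    """Interleave the low halves of a and b: a0, b0, a1, b1, ..."""
    out = []
    for i in range(len(a) // 2):
        out += [a[i], b[i]]
    return out

def bitcast_u8_to_u16(lanes):
    """Reinterpret pairs of 8-bit lanes as 16-bit lanes (little-endian)."""
    return [lanes[i] | (lanes[i + 1] << 8) for i in range(0, len(lanes), 2)]

def zext_vector_inreg(lanes):
    """Zero-extend the low half of the 8-bit lanes into 16-bit lanes."""
    return list(lanes[: len(lanes) // 2])
```

Interleaving with a zero vector places a zero byte above every source byte, so reading the result as 16-bit lanes gives the zero-extended low half.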
Roman Lebedev
62fc5f1640
[DAGCombiner] Add a most basic combineShuffleToZeroExtendVectorInReg()
Sometimes we end up with shuffles in the DAG that would be
better represented as an `ISD::ZERO_EXTEND_VECTOR_INREG`,
and a failure to do so causes suboptimal codegen in a number of cases,
especially when we will then cast vector to scalar.

I acknowledge that the test changes here are rather underwhelming,
but as with all of codegen, it's always yak shaving,
and this is the most stripped down version of the patch
that shows *some* effect without having an insurmountable amount
of fallout to deal with. The next change resolves this regression.

The transformation will be extended in follow-ups.
2022-12-26 22:54:03 +03:00
Roman Lebedev
46458aadd4
[NFC][AArch64] Add a few vector shuffle tests that should be zip1
At least, they are equivalent to the `@vzipNoBlend`, which is lowered into zip1.
2022-12-26 22:54:03 +03:00
Danila Malyutin
821a59588b [TwoAddressInstruction] Constrain RegClass when processing a statepoint
This transformation could've triggered a verifier assert if RegA and RegB
were of different reg classes. Fix this by constraining as the comment
for replaceRegWith suggests.

Differential Revision: https://reviews.llvm.org/D140672
2022-12-26 19:00:34 +03:00
Roman Lebedev
110c5442b8
[NFC][Codegen] Add tests with oversized shifts by non-byte-multiple 2022-12-24 19:26:41 +03:00
Roman Lebedev
a9fbf25a14
[NFC][Codegen] Rename tests for oversized shifts by byte multiple 2022-12-24 19:26:41 +03:00
Roman Lebedev
387c1573f8
[NFC][Codegen] Tests with wide scalar shifts, for new potential legalization strategy 2022-12-24 00:47:25 +03:00
Jessica Paquette
7ef8f9c972 [IR/MachineOutliner] Add a "nooutline" function attr and respect it
Add `nooutline` + update LangRef to say it exists.

This makes it possible to say "don't outline from this function ever."

We want to be able to toggle whether or not a function should be in the search
set regardless of default behaviour.

Add testcases for the IR Outliner + Machine Outliner.

Also remove an unnecessary check for an empty function in the Machine Outliner.

Differential Revision: https://reviews.llvm.org/D140438
2022-12-22 10:22:08 -08:00
David Green
61b72f6abe [AArch64] Add RSHRN and RSHRN2 patterns
This adds some tablegen patterns for RSHRN, which performs a rounding
shift with narrow. This is similar to the existing SHRN patterns with an
extra addition to perform the rounding, that adds 1<<(shift-1) before
the right shift. Because the round immediate and the shift amount are
tied, it goes via a ComplexPattern that uses a SelectRoundingVLShr
method to perform the selection checks.

aarch64_neon_rshrn are expanded into the sequence of equivalent
instructions (trunc(shr(add(x, 1<<(sht-1)), sht))) so that they can be
converted back into RSHRN. Which also allows us to match raddhn through
the adjusted patterns that previously used aarch64_neon_rshrn.

Differential Revision: https://reviews.llvm.org/D140297
2022-12-22 16:49:19 +00:00
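The expansion named in 61b72f6abe, `trunc(shr(add(x, 1<<(sht-1)), sht))`, can be modelled for a single lane. The following Python sketch assumes 16-bit source lanes narrowed to 8 bits by default and a shift amount between 1 and the narrow width; `rshrn_lane` is a hypothetical name, not an LLVM or intrinsic API.

```python
def rshrn_lane(x, shift, wide_bits=16):
    """Rounding shift right and narrow for one unsigned lane."""
    narrow_bits = wide_bits // 2
    if not 1 <= shift <= narrow_bits:
        raise ValueError("shift must be in 1..narrow_bits")
    wide_mask = (1 << wide_bits) - 1
    narrow_mask = (1 << narrow_bits) - 1
    rounded = (x + (1 << (shift - 1))) & wide_mask  # add, wrapping in the wide type
    return (rounded >> shift) & narrow_mask         # shr, then trunc
```

The rounding constant and the shift amount are tied: the constant is always half of `1 << shift`, which is why the commit matches them together in one ComplexPattern.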
David Green
440d71f7b7 [AArch64] Additional RSHRN pattern tests. NFC 2022-12-21 18:32:52 +00:00
David Green
3e65ad7482 [AArch64] Combine Trunc(DUP) -> DUP
This adds a simple fold of TRUNCATE(AArch64ISD::DUP) -> AArch64ISD::DUP,
which can help generate more optimal UMULL sequences, and seems useful
in general.

Differential Revision: https://reviews.llvm.org/D140289
2022-12-21 14:59:59 +00:00
Peter Waller
6d877e6717 [AArch64][SVE][CodeGen] Prefer ld1r* over indexed-load when consumed by a splat
If a load is consumed by a single splat, don't consider indexed loads.

This is an alternative implementation to D138581.

Depends on D139637.

Differential Revision: https://reviews.llvm.org/D139850
2022-12-21 14:23:39 +00:00
Ties Stuij
50ddc8cca6 [AArch64] GlobalIsel codegen for gpr CTZ
If feature CSSC is available, CTTZ intrinsics are lowered using the CTZ
instruction when using GlobalIsel.

spec:
https://developer.arm.com/documentation/ddi0602/2022-09/Base-Instructions/CTZ--Count-Trailing-Zeros-

Reviewed By: paquette

Differential Revision: https://reviews.llvm.org/D139418
2022-12-21 11:31:50 +00:00
bipmis
eb7b8e3e2a [AArch64] Optimize muls with operands having enough zero bits.
Fix the regression in the reported test case lagarith-preproc.c.
Specific to the incorrect umsubl generation.

Differential Revision: https://reviews.llvm.org/D139411
2022-12-21 11:14:45 +00:00
Roman Lebedev
3a8e009f97
Revert "Reland "[SimplifyCFG] FoldBranchToCommonDest(): deal with mismatched IV's in PHI's in common successor block""
One of these two changes is exposing (or causing) some more miscompiles.
A reproducer is in progress, so reverting until resolved.

This reverts commit 428f36401b1b695fd501ebfdc8773bed8ced8d4e.
2022-12-20 18:36:42 +03:00
David Green
752819e813 [AArch64][ARM] Remove load from dup and vmul tests. NFC
These tests needn't use loads in their testing of dup and mul
instructions, and as the loads change, the tests may no longer test what
they are intended to (as in D140069).
2022-12-20 15:23:38 +00:00
KAWASHIMA Takahiro
347d2be7be [AArch64] Add Neon int instructions to isAssociativeAndCommutative
Differential Revision: https://reviews.llvm.org/D139810
2022-12-20 23:47:51 +09:00
KAWASHIMA Takahiro
673b4ad645 [AArch64] Add FP16 instructions to isAssociativeAndCommutative
`-mcpu=` in `llvm/test/CodeGen/AArch64/machine-combiner.ll` is changed
to `neoverse-n2` to use FP16 and SVE/SVE2 instructions. By this, the
register allocation and/or instruction scheduling are slightly changed
and some existing `CHECK` lines need to be updated.

Differential Revision: https://reviews.llvm.org/D139809
2022-12-20 23:47:51 +09:00
bipmis
454997d396 [AArch64] Optimize muls with operands having enough zero bits.
For muls with 64-bit operands where the top 32 bits of each operand are zero, we can generate a single umull instruction on 32-bit operands.

Differential Revision: https://reviews.llvm.org/D139411
2022-12-20 14:34:17 +00:00
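The condition in 454997d396 can be illustrated with a small Python model. It assumes `umull` behaves as an unsigned 32x32 to 64-bit multiply of the low halves of its inputs; the helper names are hypothetical and the code is a sketch of the arithmetic, not of the DAG combine.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF

def mul64(a, b):
    """64-bit multiply: the product truncated to 64 bits."""
    return (a * b) & MASK64

def umull(a, b):
    """Unsigned 32x32 -> 64-bit multiply of the low 32 bits of each input."""
    return (a & MASK32) * (b & MASK32)

def top32_zero(v):
    """True when the upper 32 bits of a 64-bit value are all zero."""
    return (v & MASK64) >> 32 == 0
```

When `top32_zero` holds for both operands, `mul64` and `umull` agree. Without it they can differ, for example `mul64(1 << 32, 1)` is `1 << 32` while `umull(1 << 32, 1)` is 0.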