12751 Commits

Author SHA1 Message Date
Simon Pilgrim
da570ef1b4 [DAG] Match select(icmp(x,y),sub(x,y),sub(y,x)) -> abd(x,y) patterns
Pulled out of PowerPC, and added ABDS support as well (hence the additional v4i32 PPC matches)

Differential Revision: https://reviews.llvm.org/D144789
2023-03-14 15:10:30 +00:00
Simon Pilgrim
4bf004e07e [DAG] Fold (bitcast (logicop (bitcast x), (c))) -> (logicop x, (bitcast c)) iff the current logicop type is illegal
Try to remove extra bitcasts around logicops if we're dealing with illegal types

Fixes the regressions in D145939

Differential Revision: https://reviews.llvm.org/D146032
2023-03-14 14:41:11 +00:00
pvanhout
1f1fea6c38 Reland: [DAG/AMDGPU] Use UniformityAnalysis in DAGISel
Switch DAGISel over to UniformityAnalysis, which was one of the last remaining users of the DivergenceAnalysis.
No explosions seen during internal testing so this looks like a smooth transition.

Reviewed By: sameerds

Differential Revision: https://reviews.llvm.org/D145918
2023-03-14 14:38:45 +01:00
pvanhout
0e79106fc9 Revert "[DAG/AMDGPU] Use UniformityAnalysis in DAGISel"
This reverts commit 0022b5803fd4f5a4e9fcf233267c0ffa1b88f763.
2023-03-14 11:48:58 +01:00
pvanhout
0022b5803f [DAG/AMDGPU] Use UniformityAnalysis in DAGISel
Switch DAGISel over to UniformityAnalysis, which was one of the last remaining users of the DivergenceAnalysis.
No explosions seen during internal testing so this looks like a smooth transition.

Reviewed By: sameerds

Differential Revision: https://reviews.llvm.org/D145918
2023-03-14 11:18:28 +01:00
Simon Pilgrim
c7d844ea0f [DAG] Use ISD::isBitwiseLogicOp in AND/OR/XOR checks. NFCI.
There's additional cases we can cleanup (mainly in target code), but this tries to cleanup generic code and PPC which had an equivalent helper.
2023-03-13 13:39:02 +00:00
Chen Zheng
4f0ed16a46 Reland rGf35a09daebd0a90daa536432e62a2476f708150d and rG63854f91d3ee1056796a5ef27753648396cac6ec
[DAGCombiner] handle more store value forwarding

When lowering calls on target like PPC, some stack loads
will be generated for by value parameters. Node CALLSEQ_START
prevents such loads from being combined.

Suggested by @RolandF, this patch removes the unnecessary
loads for the byval parameter by extending ForwardStoreValueToDirectLoad

Reviewed By: nemanjai, RolandF

Differential Revision: https://reviews.llvm.org/D138899
2023-03-12 21:59:18 -04:00
Jun Ma
00eef4f7c3 [SelectionDAG] Fix mismatched truncate when combine BUILD_VECTOR with EXTRACT_SUBVECTOR
Just use correct type for truncation. Fixes PR59625

Differential Revision: https://reviews.llvm.org/D145757
2023-03-13 08:59:52 +08:00
Simon Pilgrim
82dc04befd [DAG] visitZERO_EXTEND - pull out the repeated SDLoc(N) variables 2023-03-12 15:18:46 +00:00
Simon Pilgrim
4d7da0e711 [DAG] Cleanup the (zext (shl (zext x), cst)) -> (shl (zext x), cst) fold. NFC.
Preliminary cleanup before adding some additional legality and value tracking handling.
2023-03-12 15:01:33 +00:00
Simon Pilgrim
b53ea2b9c5 [DAG] visitAND - fold (and (any_ext V), c) -> (zero_ext (and (trunc V), c)) if profitable.
Try to more aggressively narrow masks of extended values.

This is mainly for cases where the mask is trying to zero out any_extended upper bits, assuming we can zext/trunc the values for free.

This catches a few actual missed folds, as well as helps canonicalize a number of other cases which were being caught in isel etc.

Differential Revision: https://reviews.llvm.org/D145866
2023-03-12 13:25:23 +00:00
Simon Pilgrim
fad852efe4 [DAG] combineShiftAnd1ToBitTest - improve support for peeking through truncations
Allows us to handle shift amounts that exceed the original bitwidth
2023-03-11 16:37:47 +00:00
Yeting Kuo
b2c48559c8 [IR][DAG][RISCV] Allow scalable vector ISD::STRICT_FP_EXTEND and RISC-V supports for vector ISD::STRICT_FP_EXTEND.
The patch mainly does two things. The first is allowing scalable vector
ISD::STRICT_FP_EXTEND. The second is making RISC-V customized lower
strict_fpextend to riscv_strict_fpextend_vl, the strict version of
riscv_fpextend_vl.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D145548
2023-03-09 17:37:59 +08:00
Juneyoung Lee
a66bc1c4a3 [DAGCombiner] Avoid converting (x or/xor const) + y to (x + y) + const if benefit is unclear
This patch resolves suboptimal code generation reported by https://github.com/llvm/llvm-project/issues/60571 .

DAGCombiner currently converts `(x or/xor const) + y` to `(x + y) + const` if this is valid.
However, if `.. + const` is broken down into a sequences of adds with carries, the benefit is not clear, introducing two more add(-with-carry) ops (total 6) in the case of the reported issue whereas the optimal sequence must only have 4 add(-with-carry)s.

This patch resolves this issue by allowing this conversion only when (1) `.. + const` is legal or promotable, or (2) `const` is a sign bit because it does not introduce more adds.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D144116
2023-03-08 18:13:57 +00:00
Chen Zheng
fc26ab36a2 [DAGCombiner] don't use the pointer info for widen store
The merged store touches memory for other underlying objects, so mapping
the merged store to the first underlying object is not correct. For example
in https://github.com/llvm/llvm-project/issues/60744, the merged store is
not correctly analyzed as dependent with memory operations which are also
part of the merged store.

Fixes #60744

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D144711
2023-03-07 20:31:09 -05:00
Jay Foad
0265dd9925 Fix "compatiable" typos 2023-03-07 12:57:39 +00:00
Noah Goldstein
c1ecd0a3f4 [DAGCombiner] Add fold for ~x + x -> -1
This is generally done by the InstCombine, but can be emitted as an
intermediate step and is cheap to handle.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D145177
2023-03-06 20:30:27 -06:00
Noah Goldstein
d4b24b4a55 [DAGCombiner] Add fold for ~x & x -> 0
This is generally done by the InstCombine, but can be emitted as an
intermediate step and is cheap to handle.

Differential Revision: https://reviews.llvm.org/D145143
2023-03-06 20:30:20 -06:00
Marco Elver
bdb4353ae0 [SelectionDAG] Optimize copyExtraInfo deep copy
It turns out that there are relatively trivial, albeit rare, cases that
require a MaxDepth of more than 16 (see added test). However, we want to
avoid having to rely on a large fixed MaxDepth.

Since these cases are relatively rare, apply the following strategy:

  1. Start with a low MaxDepth of 16 - if the entry node was not
     reached, we can return (the common case).

  2. If the entry node was reached, exponentially increase MaxDepth up
     to some large limit that should cover all cases and guard against
     stack exhaustion.

This retains the better performance with a low MaxDepth in the common
case, and in complex cases backs off and retries. On a whole, this is
preferable vs. starting with a large MaxDepth which would unnecessarily
penalize the common case where a low MaxDepth is sufficient.

Reviewed By: dvyukov

Differential Revision: https://reviews.llvm.org/D145386
2023-03-06 17:29:53 +01:00
Caroline Concatto
204800ad0a [IR][Legalization] Promote illegal deinterleave and interleave vectors
To make legalization easier, the operands and outputs have the same size for
these ISD Nodes. When legalizing the results in PromoteIntegerResult the operands
are legalized to the same size as the outputs.
The ISD Node has two output/results, therefore the legalizing functions update
both results/outputs.

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D144846
2023-03-03 10:54:52 +00:00
Marco Elver
7ecd2a23f5 [SelectionDAG] Fix missing lambda capture
Move MaxDepth into the lambda, since it is not needed outside. This
fixes some compilers that complain about missing capture:

  error C3493: 'MaxDepth' cannot be implicitly captured because no
  default capture mode has been specified

Fixes: f693932fbea7 ("[SelectionDAG] Transitively copy NodeExtraInfo on RAUW")
2023-03-02 23:47:36 +01:00
Marco Elver
f693932fbe [SelectionDAG] Transitively copy NodeExtraInfo on RAUW
During legalization of the SelectionDAG, some nodes are replaced with
arch-specific nodes. These may be complex nodes, where the root node no
longer corresponds to the node that should carry the extra info.

Fix the issue by copying extra info to the new node and all its new
transitive operands during RAUW. See code comments for more details.

This fixes the remaining pcsections-atomics.ll tests on X86.

v2: Optimize copyExtraInfo() deep copy. For now we assume that only
NodeExtraInfo that have PCSections set require deep copy. Furthermore,
limit the depth of graph search while pre-populating the visited set,
assuming the to-be-replaced subgraph 'From' has limited complexity. An
assertion catches if the maximum depth needs to be increased.

Reviewed By: dvyukov

Differential Revision: https://reviews.llvm.org/D144677
2023-03-02 23:07:19 +01:00
Craig Topper
06c6b787b2 [SelectionDAG][AArch64] Constant fold in SelectionDAG::getVScale if VScaleMin==VScaleMax.
Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D145113
2023-03-02 12:02:38 -08:00
Craig Topper
c546f13f1f [DAGCombiner] Replace LegalOperations check in visitSIGN_EXTEND with LegalTypes.
This is guarding a check for isTypeLegal so it should check is
LegalTypes.

Fixes PR61111.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D145139
2023-03-02 07:52:53 -08:00
Sander de Smalen
170e7a0ec2 [AArch64][SME2] Add CodeGen support for target("aarch64.svcount").
This patch adds AArch64 CodeGen support such that the type can be passed
and returned to/from functions, and also adds support to use this type in
load/store operations and PHI nodes.

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D136862
2023-03-02 12:07:41 +00:00
J. Ryan Stinnett
22b8e82c12 [DebugInfo] Remove dbg.addr from CodeGen
As part of this work, removing `SDDbgValue::clearIsEmitted` originally added for
`dbg.addr` in 045c67769d7fe577fc38cccb6fb40fd814437447 was attempted, but it
appears some tests for `DBG_INSTR_REF` now depend on that behaviour as well, so
it was kept and comments were updated instead.

Part of `dbg.addr` removal
Discussed in https://discourse.llvm.org/t/what-is-the-status-of-dbg-addr/62898

Differential Revision: https://reviews.llvm.org/D144800
2023-03-02 09:29:43 +00:00
J. Ryan Stinnett
f5b85c02e9 [DebugInfo][NFC] Remove FuncArgumentDbgValueKind::Addr from SelectionDAG
This removes the unused `FuncArgumentDbgValueKind::Addr` value originally added
by e24f5348798605a799c63ff09169d177d262cd37. The intent was to signal the
original intrinsic that marked a function argument, but the `Addr` part was
never used.

Part of `dbg.addr` removal
Discussed in https://discourse.llvm.org/t/what-is-the-status-of-dbg-addr/62898

Differential Revision: https://reviews.llvm.org/D144794
2023-03-02 09:29:42 +00:00
Marco Elver
e0bc779000 Revert "[SelectionDAG] Transitively copy NodeExtraInfo on RAUW"
This reverts commit 7f635b90e7bdf1378fd9a65fc62b99e8e07d4aaf.

The current implementation causes pathological slowdowns in certain
cases: https://github.com/llvm/llvm-project/issues/61108
2023-03-02 09:39:44 +01:00
Simon Pilgrim
73cdccad55 [DAG] expandIntMINMAX - attempt to match existing SETCC node
As noticed on D144789, when we have pairs of min/max nodes we often end up with multiple comparisons which we could reuse with commuted select ops, so check to see if a suitable SETCC already exists. This also allowed us to remove a similar X86 peephole.

There are other getSETCC cases where we could safely reuse other CondCodes as well - I've been trying to think of how we could reuse this logic in SelectionDAG but haven't found anything that always works well.

An alternative would be to have a TLI callback that returns a preferred CondCode from a list of options, I've noticed this helped fpclamptosat tests on some other targets (MVE + WebAssembly), but other tests suffered.

Differential Revision: https://reviews.llvm.org/D145065
2023-03-01 19:04:03 +00:00
David Green
337215ddf9 [DAG] ABD is not reassociative
I'm not sure how I missed this in the testing, but as far as I understand
whilst ABDS and ABDU are commutive they are not associative. This patch
disables reassociateOps from visitABD, fixing the problems found in #61069.
ABDU: https://alive2.llvm.org/ce/z/eiT5QG
ABDS: https://alive2.llvm.org/ce/z/HzE29l

Differential Revision: https://reviews.llvm.org/D145064
2023-03-01 16:22:13 +00:00
Caroline Concatto
cb96eba27c [IR][Legalization] Split illegal deinterleave and interleave vectors
To make legalization easier, the operands and outputs have the same size for
these ISD Nodes. When legalizing the results in SplitVectorResult the operands
are legalized to the same size as the outputs.
The ISD Node has two output/results, therefore the legalizing functions update
both results/outputs.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D144744
2023-03-01 08:30:16 +00:00
David Green
06daa515b2 [AArch64] Don't remove free sext_inreg(vector_extract(x)) if it leads to multiple extracts
If we have sext_inreg(vector_extract(x)) but the top bits are not used, DAG
will try to remove the sext_inreg, using vector_extract(x) directly. This can
lead to multiple uses of both sext_inreg(vector_extract(x)) and
vector_extract(x), leading to the generation of both umov and smov extracts.
This adds a target hook to prevent that under AArch64 where the sext_inreg can
be considered free if there are multiple uses of the sext and no uses of the
vector_extract. This helps fix a small regression from D144550.

Differential Revision: https://reviews.llvm.org/D144850
2023-02-27 19:20:10 +00:00
Marco Elver
7f635b90e7 [SelectionDAG] Transitively copy NodeExtraInfo on RAUW
During legalization of the SelectionDAG, some nodes are replaced with
arch-specific nodes. These may be complex nodes, where the root node no
longer corresponds to the node that should carry the extra info.

Fix the issue by copying extra info to the new node and all its new
transitive operands during RAUW. See code comments for more details.

This fixes the remaining pcsections-atomics.ll tests on X86.

Reviewed By: dvyukov

Differential Revision: https://reviews.llvm.org/D144677
2023-02-27 12:16:14 +01:00
Noah Goldstein
e981e6d10e Add transform for (and/or (icmp eq/ne A,-1),(icmp eq/ne A,-1+C))->(and/or (icmp eq/ne (and ~A,-1+C),0))
This works of `-1+C` is a negative power of 2.

This can be more useful than the `AddAnd` case as `~A` does not
necessarily require materializing a constant. This makes the transform
worth it for X86 vector types.

Alive2 Links:
EQ: https://alive2.llvm.org/ce/z/P6u8cq
NE: https://alive2.llvm.org/ce/z/_Kkqp1

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D144284
2023-02-24 15:22:09 -06:00
Noah Goldstein
8c74c5402f Make (and/or (icmp eq/ne A,C0), (icmp eq/ne A,C1)) where IsPow(dif(C0,C1)) work for more patterns.
`(and/or (icmp eq/ne A,C0), (icmp eq/ne A,C1))` can be lowered to
`(icmp eq/ne (and (sub A, (smin C0, C1)), (not (sub (smax C0, C1), (smin C0, C1)))), 0)`
generically if `(sub (smax C0, C1), (smin C0,C1))` is a power of 2.

This covers the existing case of `(and/or (icmp eq/ne A, C_Pow2),(icmp eq/ne A, -C_Pow2))`
as well as other cases.

Alive2 Links:
EQ: https://alive2.llvm.org/ce/z/mLJiUW
NE: https://alive2.llvm.org/ce/z/TKnzUr

Differential Revision: https://reviews.llvm.org/D144283
2023-02-24 15:22:09 -06:00
Serge Pavlov
7f81dd4dd6 [NFC] Make FPClassTest a bitmask enumeration
This is recommit of 2e416cdd52, fixed to be accepatble by GCC.
The original commit message is below.

With this change bitwise operations are allowed for FPClassTest
enumeration, it must simplify using this type. Also some functions
changed to get argument of type FPClassTest instead of unsigned.

Differential Revision: https://reviews.llvm.org/D144241
2023-02-24 15:12:16 +07:00
Samuel Parker
f48d3b6f46 Revert "[DAGCombine] Fold redundant select"
This reverts commit c7f9344d0f8f6a00adab138037e2e7b406ef2b69.
2023-02-23 17:59:41 +00:00
Craig Topper
230e61658b [LegalizeTypes] Add a special case for (add X, 1) to ExpandIntRes_ADDSUB.
On targets without ADDCARRY or ADDE, we need to emit a separate
SETCC to determine carry from the low half to the high half. Usually
we do (setult Lo, LHSLo). If RHSLo is 1 we can instead do (seteq Lo, 0).
This can reduce the live range of LHSLo.
2023-02-23 09:47:42 -08:00
Craig Topper
2fc5a5117c [LegalizeTypes][RISCV] Add a special case to ExpandIntRes_UADDSUBO for (uaddo X, 1).
On targets that lack ADDCARRY support we split a wide uaddo into
an ADD and a SETCC that both need to be split.

For (uaddo X, 1) we can observe that when the add overflows the result
will be 0. We can emit (seteq (or Lo, Hi), 0) to detect this.

This improves D142071.

There is an alternative here. We could use either ~(lo(X) & hi(X)) == 0 or
(lo(X) & hi(X)) == -1 before the addition. That would be closer to the
code before D142071.

Reviewed By: liaolucy

Differential Revision: https://reviews.llvm.org/D144614
2023-02-23 09:16:54 -08:00
Nikita Popov
8347ca7dc8 [PatternMatch] Don't require DataLayout for m_VScale()
The m_VScale() matcher is unusual in that it requires a DataLayout.
It is currently used to determine the size of the GEP type. However,
I believe it is sufficient to check for the canonical
<vscale x 1 x i8> form here -- I don't think there's a need to
recognize exotic variations like <vscale x 1 x i4> as a vscale
constant representation as well.

Differential Revision: https://reviews.llvm.org/D144566
2023-02-23 15:30:29 +01:00
Yeting Kuo
419948fe67 [VP] Reorder is_int_min_poison/is_zero_poison operand before mask for vp.abs/ctlz/cttz.
The patch ensures last two operands of vp.abs/ctlz/cttz are mask and evl.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D144536
2023-02-23 13:58:21 +08:00
Serge Pavlov
08a09235b6 Revert "[NFC] Make FPClassTest a bitmask enumeration"
This reverts commit e7613c1d9b259bdf2b0b06b4169d9a10dd553406.

GCC issues an error:

In file included from /home/buildbot/as-builder-4/lld-x86_64-ubuntu-fast/llvm-project/llvm/unittests/ADT/BitmaskEnumTest.cpp:9:
/home/buildbot/as-builder-4/lld-x86_64-ubuntu-fast/llvm-project/llvm/include/llvm/ADT/BitmaskEnum.h:66:22: error: explicit specialization of template<class E, class Enable> struct llvm::is_bitmask_enum outside its namespace must use a nested-name-specifier [-fpermissive]
   66 |   template <> struct is_bitmask_enum<Enum> : std::true_type {};                \
      |                      ^~~~~~~~~~~~~~~~~~~~~
/home/buildbot/as-builder-4/lld-x86_64-ubuntu-fast/llvm-project/llvm/unittests/ADT/BitmaskEnumTest.cpp:30:1: note: in expansion of macro LLVM_DECLARE_ENUM_AS_BITMASK
   30 | LLVM_DECLARE_ENUM_AS_BITMASK(Flags2, V4);
      | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
2023-02-23 12:55:58 +07:00
Serge Pavlov
e7613c1d9b [NFC] Make FPClassTest a bitmask enumeration
This is recommit of 2e416cdd52, reverted in 8555ab2fcd, because GCC
complains on extra qualification. The macro LLVM_DECLARE_ENUM_AS_BITMASK
does not specify llvm:: anymore, so the macro must occur in the namespace
llvm. Documentation updated accordingly. The original commit message is below.

With this change bitwise operations are allowed for FPClassTest
enumeration, it must simplify using this type. Also some functions
changed to get argument of type FPClassTest instead of unsigned.

Differential Revision: https://reviews.llvm.org/D144241
2023-02-23 12:38:57 +07:00
Cameron McInally
af4c4f4e21 [DAGCombine] Fix an ICE in combineMinNumMaxNum(...)
65420c8041f4 introduced an ICE in combineMinNumMaxNum(...) when
combineMinNumMaxNumImpl(...) returns an SDValue(). Make sure to check that a
value is returned before trying to perform an FNEG on it.

GitHub Issue: #60924

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D144571
2023-02-22 11:00:51 -08:00
Ricardo Jesus
272bd573dc [AArch64] Fix abs(sub nsw) -> absd
This partially reverts a regression introduced in 8f25e382c5b1 for
AArch64 targets. In particular, we restore the logic of `(abs (sub nsw
x, y)) -> abds(x, y)` for all targets except X86, which keeps the logic
introduced in 8f25e382c5b1. See also https://reviews.llvm.org/D142288.

Differential Revision: https://reviews.llvm.org/D144379
2023-02-22 09:17:25 +00:00
Nikita Popov
8555ab2fcd Revert "[NFC] Make FPClassTest a bitmask enumeration"
This reverts commit 2e416cdd52c1079b8c7cb1f7d7e557c889a4fb56.

Breaks the GCC build:

In file included from /home/npopov/repos/llvm-project/llvm/include/llvm/ADT/FloatingPointMode.h:18,
                 from /home/npopov/repos/llvm-project/llvm/include/llvm/ADT/APFloat.h:20,
                 from /home/npopov/repos/llvm-project/llvm/lib/Support/APFloat.cpp:14:
/home/npopov/repos/llvm-project/llvm/include/llvm/ADT/BitmaskEnum.h:66:22: error: extra qualification not allowed [-fpermissive]
   66 |   template <> struct llvm::is_bitmask_enum<Enum> : std::true_type {};          \
      |                      ^~~~
/home/npopov/repos/llvm-project/llvm/include/llvm/ADT/FloatingPointMode.h:223:1: note: in expansion of macro ‘LLVM_DECLARE_ENUM_AS_BITMASK’
  223 | LLVM_DECLARE_ENUM_AS_BITMASK(FPClassTest, /* LargestValue */ fcPosInf);
      | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/npopov/repos/llvm-project/llvm/include/llvm/ADT/BitmaskEnum.h:67:22: error: extra qualification not allowed [-fpermissive]
   67 |   template <> struct llvm::largest_bitmask_enum_bit<Enum> {                    \
      |                      ^~~~
/home/npopov/repos/llvm-project/llvm/include/llvm/ADT/FloatingPointMode.h:223:1: note: in expansion of macro ‘LLVM_DECLARE_ENUM_AS_BITMASK’
  223 | LLVM_DECLARE_ENUM_AS_BITMASK(FPClassTest, /* LargestValue */ fcPosInf);
      | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
[43/4396] Building CXX object lib/Supp...iles/LLVMSupport.dir/CommandLine.cpp.o
2023-02-22 08:56:19 +01:00
Serge Pavlov
2e416cdd52 [NFC] Make FPClassTest a bitmask enumeration
With this change bitwise operations are allowed for FPClassTest
enumeration, it must simplify using this type. Also some functions
changed to get argument of type FPClassTest instead of unsigned.

Differential Revision: https://reviews.llvm.org/D144241
2023-02-22 14:20:04 +07:00
Fangrui Song
e4f4f34e7a [SelectionDAG] Migrate away from soft-deprecated functions. NFC 2023-02-21 11:01:34 -08:00
Caroline Concatto
d515ecca68 [IR] Add new intrinsics interleave and deinterleave vectors
This patch adds 2 new intrinsics:

  ; Interleave two vectors into a wider vector
  <vscale x 4 x i64> @llvm.vector.interleave2.nxv2i64(<vscale x 2 x i64> %even, <vscale x 2 x i64> %odd)

  ; Deinterleave the odd and even lanes from a wider vector
  {<vscale x 2 x i64>, <vscale x 2 x i64>} @llvm.vector.deinterleave2.nxv2i64(<vscale x 4 x i64> %vec)

The main motivator for adding these intrinsics is to support vectorization of
complex types using scalable vectors.

The intrinsics are kept simple by only supporting a stride of 2, which makes
them easy to lower and type-legalize. A stride of 2 is sufficient to handle
complex types which only have a real/imaginary component.

The format of the intrinsics matches how `shufflevector` is used in
LoopVectorize. For example:

  using cf = std::complex<float>;

  void foo(cf * dst, int N) {
      for (int i=0; i<N; ++i)
          dst[i] += cf(1.f, 2.f);
  }

For this loop, LoopVectorize:
  (1) Loads a wide vector (e.g. <8 x float>)
  (2) Extracts odd lanes using shufflevector (leading to <4 x float>)
  (3) Extracts even lanes using shufflevector (leading to <4 x float>)
  (4) Performs the addition
  (5) Interleaves the two <4 x float> vectors into a single <8 x float> using
      shufflevector
  (6) Stores the wide vector.

In this example, we can 1-1 replace shufflevector in (2) and (3) with the
deinterleave intrinsic, and replace the shufflevector in (5) with the
interleave intrinsic.

The SelectionDAG nodes might be extended to support higher strides (3, 4, etc)
as well in the future.

Similar to what was done for vector.splice and vector.reverse, the intrinsic
is lowered to a shufflevector when the type is fixed width, so to benefit from
existing code that was written to recognize/optimize shufflevector patterns.

Note that this approach does not prevent us from adding new intrinsics for other
strides, or adding a more generic shuffle intrinsic in the future. It just solves
the immediate problem of being able to vectorize loops with complex math.

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D141924
2023-02-20 12:21:59 +00:00
Kazu Hirata
a28b252d85 Use APInt::getSignificantBits instead of APInt::getMinSignedBits (NFC)
Note that getMinSignedBits has been soft-deprecated in favor of
getSignificantBits.
2023-02-19 23:56:52 -08:00