1593 Commits

Author SHA1 Message Date
Sander de Smalen
8aff167b34 [AArch64][SME] Improve streaming-compatible codegen for extending loads/truncating stores.
This is another step in aligning addTypeForStreamingSVE with addTypeForFixedLengthSVE,
which also improves code quality for extending loads and truncating stores.

Reviewed By: hassnaa-arm

Differential Revision: https://reviews.llvm.org/D141266
2023-01-09 15:08:04 +00:00
Sander de Smalen
17a1936122 [AArch64] NFC: Align addTypeForStreamingSVE and addTypeForFixedLengthSVE
This patch is NFC and just moves things around so their implementation is very similar.
2023-01-09 09:47:33 +00:00
Paul Walker
c9602e02fc [SVE] Fix incorrect VT usage when lowering fixed length vector divides.
Ensure the negation required when lowering negative power-of-two
divides uses the scalable vector container type with the fixed
length result extracted from it.

Fixes: #59647

Differential Revision: https://reviews.llvm.org/D140563
2023-01-08 12:22:05 +00:00
Hassnaa Hamdi
9eb698946d [AArch64][SME]: Make 'Expand' the default action for all Ops.
By default expand all operations, then change to Custom/Legal if needed.

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D141068
2023-01-06 15:32:07 +00:00
Guillaume Chatelet
87b6b347fc Revert D141134 "[NFC] Only expose getXXXSize functions in TypeSize"
The patch should be discussed further.

This reverts commit dd56e1c92b0e6e6be249f2d2dd40894e0417223f.
2023-01-06 15:27:50 +00:00
Guillaume Chatelet
dd56e1c92b [NFC] Only expose getXXXSize functions in TypeSize
Currently 'TypeSize' exposes two functions that serve the same purpose:
 - getFixedSize / getFixedValue
 - getKnownMinSize / getKnownMinValue

source : bf82070ea4/llvm/include/llvm/Support/TypeSize.h (L337-L338)

This patch offers to remove one of the two and stick to a single function in the code base.

Differential Revision: https://reviews.llvm.org/D141134
2023-01-06 15:24:52 +00:00
serge-sans-paille
38818b60c5 Move from llvm::makeArrayRef to ArrayRef deduction guides - llvm/ part
Use deduction guides instead of helper functions.

The only non-automatic changes have been:

1. ArrayRef(some_uint8_pointer, 0) needs to be changed into ArrayRef(some_uint8_pointer, (size_t)0) to avoid an ambiguous call with ArrayRef((uint8_t*), (uint8_t*))
2. CVSymbol sym(makeArrayRef(symStorage)); needed to be rewritten as CVSymbol sym{ArrayRef(symStorage)}; otherwise the compiler is confused and thinks we have a (bad) function prototype. There were a few similar situations across the codebase.
3. ADL doesn't seem to work the same for deduction-guides and functions, so at some point the llvm namespace must be explicitly stated.
4. The "reference mode" of makeArrayRef(ArrayRef<T> &) that acts as no-op is not supported (a constructor cannot achieve that).

Per reviewers' comment, some useless makeArrayRef have been removed in the process.

This is a follow-up to https://reviews.llvm.org/D140896 that introduced
the deduction guides.

Differential Revision: https://reviews.llvm.org/D140955
2023-01-05 14:11:08 +01:00
Roman Lebedev
83288f8063 [AArch64] Custom lower ISD::ZERO_EXTEND_VECTOR_INREG
The baseline legalization for `ISD::ZERO_EXTEND_VECTOR_INREG`
(`VectorLegalizer::ExpandZERO_EXTEND_VECTOR_INREG`),
blends-in the zeros, but as mentioned e.g.
in b4bd0a404fe26071dab0854dfd9767974909c7c4,
there is no such thing for AArch64.

So some of the shuffles that would be nicely lowered
by `LowerVECTOR_SHUFFLE()`, e.g. into `ZIP1`,
would now be unrecognizable after round-tripping
through `ISD::ZERO_EXTEND_VECTOR_INREG` recognition & legalization.

The most obvious solution is to just custom-lower
`ISD::ZERO_EXTEND_VECTOR_INREG` as the `ZIP1`-with-zeros,
like it would have been originally in that test case.
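
As a rough illustration of why `ZIP1` with a zero vector behaves like a zero extension, here is a scalar C++ sketch. It assumes little-endian lane order and an 8 x i8 input; `V8x8`, `zip1`, and `lane16` are hypothetical names made up for this example and are not LLVM APIs.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

using V8x8 = std::array<uint8_t, 8>; // hypothetical model of an 8 x i8 vector

// Scalar model of ZIP1 on 8 x i8: interleave the low halves of a and b.
constexpr V8x8 zip1(const V8x8 &a, const V8x8 &b) {
  V8x8 r{};
  for (std::size_t i = 0; i < 4; ++i) {
    r[2 * i] = a[i];
    r[2 * i + 1] = b[i];
  }
  return r;
}

// Read 16-bit lane `lane` of the byte vector, little-endian (an assumption).
constexpr uint16_t lane16(const V8x8 &v, std::size_t lane) {
  return static_cast<uint16_t>(v[2 * lane] | (v[2 * lane + 1] << 8));
}

constexpr V8x8 kIn{0x11, 0x80, 0xff, 0x7f, 0xaa, 0xbb, 0xcc, 0xdd};
constexpr V8x8 kZipped = zip1(kIn, V8x8{}); // second operand is all zeros

// Each 16-bit lane equals the zero-extended low byte of the input.
static_assert(lane16(kZipped, 0) == 0x11 && lane16(kZipped, 1) == 0x80);
static_assert(lane16(kZipped, 2) == 0xff && lane16(kZipped, 3) == 0x7f);
```

The upper four input bytes are ignored, which matches the "in-register" extension of the low half only.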
2022-12-26 22:54:03 +03:00
David Green
61b72f6abe [AArch64] Add RSHRN and RSHRN2 patterns
This adds some tablegen patterns for RSHRN, which performs a rounding
shift with narrow. This is similar to the existing SHRN patterns with an
extra addition to perform the rounding, that adds 1<<(shift-1) before
the right shift. Because the round immediate and the shift amount are
tied, it goes via a ComplexPattern that uses a SelectRoundingVLShr
method to perform the selection checks.

aarch64_neon_rshrn are expanded into the sequence of equivalent
instructions (trunc(shr(add(x, 1<<(sht-1)), sht))) so that they can be
converted back into RSHRN. This also allows us to match raddhn through
the adjusted patterns that previously used aarch64_neon_rshrn.
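
The expanded sequence can be sketched per lane in scalar C++. This is only a model of the arithmetic described above, assuming a 32-bit source lane narrowed to 16 bits; `rshrn_lane` is a hypothetical name, not part of the patch.

```cpp
#include <cstdint>

// Scalar model of one lane: trunc(shr(add(x, 1 << (shift - 1)), shift)).
// Assumes 1 <= shift <= 16 for a 32-bit to 16-bit narrowing.
constexpr uint16_t rshrn_lane(uint32_t x, unsigned shift) {
  uint32_t rounded = x + (uint32_t{1} << (shift - 1)); // rounding addend
  return static_cast<uint16_t>(rounded >> shift);      // shift, then narrow
}

static_assert(rshrn_lane(7, 2) == 2); // 7 / 4 = 1.75, rounds to 2
static_assert(rshrn_lane(5, 2) == 1); // 5 / 4 = 1.25, rounds to 1
static_assert(rshrn_lane(6, 2) == 2); // 6 / 4 = 1.5, rounds up to 2
```

Note how the rounding addend depends on the shift amount, which is why the two immediates are tied in the pattern.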

Differential Revision: https://reviews.llvm.org/D140297
2022-12-22 16:49:19 +00:00
David Green
3e65ad7482 [AArch64] Combine Trunc(DUP) -> DUP
This adds a simple fold of TRUNCATE(AArch64ISD::DUP) -> AArch64ISD::DUP,
which can help generate more optimal UMULL sequences, and seems useful
in general.

Differential Revision: https://reviews.llvm.org/D140289
2022-12-21 14:59:59 +00:00
Peter Waller
6d877e6717 [AArch64][SVE][CodeGen] Prefer ld1r* over indexed-load when consumed by a splat
If a load is consumed by a single splat, don't consider indexed loads.

This is an alternative implementation to D138581.

Depends on D139637.

Differential Revision: https://reviews.llvm.org/D139850
2022-12-21 14:23:39 +00:00
David Green
3c0c24e0c1 [AArch64] Combine to UMULL if top bits are known zero
Given mul(zext(a), b), we can convert to a umull so long as we know that
the top bits of b are zero. This uses MaskedValueIsZero to detect that
case for NEON UMULL patterns.
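
A scalar sketch of the equivalence, assuming 32-bit lanes widened to 64 bits. `umull_lane`, `mul_zext`, and `top_bits_zero` are hypothetical helpers for illustration; the last one stands in for the known-bits query and is not the real MaskedValueIsZero interface.

```cpp
#include <cstdint>

// Per-lane effect of a widening unsigned multiply: 32 x 32 -> 64.
constexpr uint64_t umull_lane(uint32_t a, uint32_t b) {
  return static_cast<uint64_t>(a) * static_cast<uint64_t>(b);
}

// The original form mul(zext(a), b), where b is already 64 bits wide.
constexpr uint64_t mul_zext(uint32_t a, uint64_t b) {
  return static_cast<uint64_t>(a) * b;
}

// Stand-in for the known-bits check: are the top 32 bits of b zero?
constexpr bool top_bits_zero(uint64_t b) { return (b >> 32) == 0; }

constexpr uint64_t kB = 0x00000000fffffff0ull;
static_assert(top_bits_zero(kB));
// When the top bits are zero, truncating b loses nothing and both forms agree.
static_assert(mul_zext(0xdeadbeefu, kB) ==
              umull_lane(0xdeadbeefu, static_cast<uint32_t>(kB)));
```

If the top bits of b are not known to be zero, truncating b would change the product, so the combine must not fire.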

Differential Revision: https://reviews.llvm.org/D140287
2022-12-20 13:50:34 +00:00
Qiu Chaofan
a40ef656d8 [Intrinsic] Rename flt.rounds intrinsic to get.rounding
Address the inconsistency between FLT_ROUNDS_ and SET_ROUNDING SDAG
node. Rename FLT_ROUNDS_ to GET_ROUNDING and add llvm.get.rounding
intrinsic to replace flt.rounds.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D139507
2022-12-19 15:22:39 +08:00
Dinar Temirbulatov
7bce66edc6 [AArch64][SVE] Allow to lower WHILEop with constant operands to PTRUE
This allows folding WHILEop with constant operands into a PTRUE instruction
when the given range fits a predicate pattern. This change also
fixes the unsigned overflow error introduced in D137547 for WHILELO lowering.

Differential Revision: https://reviews.llvm.org/D139068
2022-12-18 01:27:03 +00:00
Archibald Elliott
82b51a1428 [AArch64] Support SLC in ACLE prefetch intrinsics
This change:
- Modifies the ACLE code to allow the new SLC value (3) for the prefetch
  target.

- Introduces a new intrinsic, @llvm.aarch64.prefetch which matches the
  PRFM family instructions much more closely, and can represent all
  values for the PRFM immediate.

  The target-independent @llvm.prefetch intrinsic does not have enough
  information for us to be able to lower to it from the ACLE intrinsics
  correctly.

- Lowers the acle calls to the new intrinsic on aarch64 (the ARM
  lowering is unchanged).

- Implements code generation for the new intrinsic in both SelectionDAG
  and GlobalISel. We specifically choose to continue to support lowering
  the target-independent @llvm.prefetch intrinsic so that other
  frontends can continue to use it.

Differential Revision: https://reviews.llvm.org/D139443
2022-12-16 14:42:27 +00:00
Nilanjana Basu
795868285d [AArch64] Minor changes and sanity checks in relation to https://reviews.llvm.org/D135229
2022-12-16 01:39:29 +05:30
Nilanjana Basu
02d09ffc1b [AArch64] Extending lowering of 'trunc <(8|16) x i64> %x to <(8|16) x i8>' to use tbl instructions
[AArch64] Patch for lowering trunc instructions to 'tbl' for (8|16)xi32 -> (8|16)xi8 conversions in https://reviews.llvm.org/D133495 is extended to support trunc to tbl lowering for (8|16) x i64 to (8|16) x i8.

A microbenchmark for runtime for these transformations is added in https://reviews.llvm.org/D136274

Reviewed by: fhahn, t.p.northover

Differential Revision: https://reviews.llvm.org/D135229
2022-12-15 20:50:40 +05:30
Tim Northover
10d34f5538 AArch64: use CAS instead of LDXR/STXR if available
This covers 128-bit loads, and atomicrmw operations without a single native
instruction. Using CAS has a better chance of succeeding under high
contention on some systems.
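
For readers unfamiliar with the pattern, an atomic read-modify-write with no single native instruction can be built from a compare-and-swap loop. The sketch below uses standard C++ atomics on a 64-bit value rather than the 128-bit case the commit covers; `fetch_nand` is a hypothetical helper, and how the compiler lowers `compare_exchange_weak` depends on the target features.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical helper: atomically store ~(old & mask) and return the old value.
inline uint64_t fetch_nand(std::atomic<uint64_t> &obj, uint64_t mask) {
  uint64_t old = obj.load(std::memory_order_relaxed);
  // On failure, compare_exchange_weak writes the current value into `old`,
  // so each retry recomputes the new value from fresh data.
  while (!obj.compare_exchange_weak(old, ~(old & mask),
                                    std::memory_order_seq_cst,
                                    std::memory_order_relaxed)) {
  }
  return old;
}
```

Calling `fetch_nand(v, 0x0f)` on a value holding 0xff returns 0xff and leaves `~0x0f` in `v`.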
2022-12-14 12:16:40 +00:00
David Green
1da4d5aafa [AArch64][SVE] Add hadd and rhadd support
This adds basic HADD and RHADD support for SVE, by marking the AVGFLOOR
and AVGCEIL as custom and converting those to HADD_PRED/RHADD_PRED
AArch64 nodes. Both the existing intrinsics and the _PRED nodes are then
lowered to the _ZPmZ instructions.
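
As a reminder of what the two averaging operations compute, here is a scalar sketch for unsigned 8-bit lanes only. `avgfloor_u8` and `avgceil_u8` are hypothetical names; the sum is formed in a wider type so it cannot overflow.

```cpp
#include <cstdint>

// Halving add: floor((a + b) / 2), computed without overflow.
constexpr uint8_t avgfloor_u8(uint8_t a, uint8_t b) {
  return static_cast<uint8_t>((static_cast<unsigned>(a) + b) >> 1);
}

// Rounding halving add: floor((a + b + 1) / 2), computed without overflow.
constexpr uint8_t avgceil_u8(uint8_t a, uint8_t b) {
  return static_cast<uint8_t>((static_cast<unsigned>(a) + b + 1) >> 1);
}

static_assert(avgfloor_u8(255, 254) == 254); // 509 >> 1
static_assert(avgceil_u8(255, 254) == 255);  // 510 >> 1
```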

Differential Revision: https://reviews.llvm.org/D131875
2022-12-14 09:24:54 +00:00
Martin Storsjö
899739cdbd Revert "[AArch64][GlobalISel] Lower formal arguments of AAPCS & ms_abi variadic functions."
This reverts commit 56fd846f370adf16bea333b12637038ea2f3c225.

This commit regressed handling of functions with floats as arguments,
reproducible e.g. like this:

$ cat test.c
double func(double f) {
    return f;
}
$ clang -target aarch64-windows -S -o - test.c -fno-asynchronous-unwind-tables
func:
	sub	sp, sp, #16
	str	x0, [sp, #8]
	ldr	d0, [sp, #8]
	add	sp, sp, #16
	ret
2022-12-13 11:37:35 +02:00
Vladislav Dzhidzhoev
56fd846f37 [AArch64][GlobalISel] Lower formal arguments of AAPCS & ms_abi variadic functions.
Reimplemented SelectionDAG code for GlobalISel.

Fixes https://github.com/llvm/llvm-project/issues/54079

Differential Revision: https://reviews.llvm.org/D130903
2022-12-12 19:35:40 +03:00
Zain Jaffal
ebae917294 Recommit "[AArch64] Select SMULL for zero extended vectors when top bit is zero"
This is a recommit of f9e0390751cb5eefbbbc191f851c52422acacab1
The previous commit failed to handle cases where the zero extended operand is an extended `BUILD_VECTOR`.
We don't replace zext with a sext operand to select smull if any operand is `BUILD_VECTOR`

Original commit message:

we can safely replace a `zext` instruction with `sext` if the top bit is zero. This is useful because we can select `smull` when both operands are sign extended.

Reviewed By: fhahn, dmgreen

Differential Revision: https://reviews.llvm.org/D134711
2022-12-12 14:45:54 +00:00
Zain Jaffal
dfc8ab2e25 Revert "Revert "[AArch64] Select SMULL for zero extended vectors when top bit is zero""
This reverts commit c07a01c2bb0d1f95689f809fd5be23829e364393.
2022-12-12 14:45:27 +00:00
Peter Waller
8812b6eed7 [AArch64][SVE][Fixed length] Fix div miscompile
The prior code worked before SVE DIV was enabled for 128-bit vectors.
With 128 bit vectors, when run on a 256 bit machine, it would split and
do a signed unpack, but this resulted in one full vector and one empty
vector with a half-sized predicate. The effect was that only half the
elements were treated correctly.

The fix is to bisect the vector, sign extend, do the division, truncate
and then concat.

Fixes #59357.

Differential Revision: https://reviews.llvm.org/D139618
2022-12-12 11:31:02 +00:00
Salvatore Dipietro
3a894fd90b [AArch64] Lower READCYCLECOUNTER using MRS CNTVCT_EL0
As suggested in D12425 it would be better for the readcyclecounter
function on ARM architecture to use the CNTVCT_EL0 register
(Counter-timer Virtual Count register) instead of the PMCCNTR_EL0
(Performance Monitors Cycle Count Register), because PMCCNTR_EL0 is a
PMU register which, depending on the configuration, might always
return zeroes and is not guaranteed to always increase.

Differential Revision: https://reviews.llvm.org/D136999
2022-12-09 10:36:16 +00:00
Nilanjana Basu
955c0f13cd [AArch64] Extending lowering of 'zext <Y x i8> %x to <Y x i8X>' to use tbl instructions
Adding support for ZExt lowering for destination types beyond the existing support for (8|16) x i32

Patch for lowering zext instructions to 'tbl' for (8|16)xi8 -> (8|16)xi32 conversions in https://reviews.llvm.org/D120571 is extended to support zext to 'tbl' lowering for Y x i8 to Y x i8X where X > 2 and X < 8, that is, any number of vector elements & any destination element type whose size is a multiple of 8 and lies between 16 & 64 is allowed for this transformation.

Related microbenchmarks are in https://reviews.llvm.org/D136274 & https://reviews.llvm.org/D138059

Differential Revision: https://reviews.llvm.org/D136722
2022-12-09 13:55:25 +05:30
Kazu Hirata
8a7cbea525 [llvm] Use std::nullopt instead of None in comments (NFC)
This is part of an effort to migrate from llvm::Optional to
std::optional:

https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
2022-12-08 23:22:00 -08:00
Florian Hahn
c07a01c2bb Revert "[AArch64] Select SMULL for zero extended vectors when top bit is zero"
This reverts commit f9e0390751cb5eefbbbc191f851c52422acacab1.

The patch causes a crash for the IR below:

target datalayout = "e-m:o-i64:64-i128:128-n32:64-S128"
target triple = "arm64-apple-macosx"

define void @test(ptr %data, <8 x i16> %v) {
entry:
  %0 = sext <8 x i16> %v to <8 x i32>
  %1 = mul <8 x i32> %0, <i32 35584, i32 35584, i32 35584, i32 35584, i32 35584, i32 35584, i32 35584, i32 35584>
  %2 = lshr <8 x i32> %1, <i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1>
  %3 = trunc <8 x i32> %2 to <8 x i16>
  store <8 x i16> %3, ptr %data, align 2
  ret void
}
2022-12-08 10:28:33 +00:00
Zain Jaffal
f9e0390751 [AArch64] Select SMULL for zero extended vectors when top bit is zero
we can safely replace a `zext` instruction with `sext` if the top bit is zero. This is useful because we can select `smull` when both operands are sign extended.
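
The underlying identity can be checked with a small scalar sketch, assuming a 16-bit source widened to 32 bits. `zext16` and `sext16` are hypothetical helpers for illustration only.

```cpp
#include <cstdint>

// Zero extension: the upper bits are filled with zeros.
constexpr int32_t zext16(uint16_t v) { return static_cast<int32_t>(v); }

// Sign extension: the upper bits are filled with copies of bit 15.
constexpr int32_t sext16(uint16_t v) {
  return (v & 0x8000u) ? static_cast<int32_t>(v) - 0x10000
                       : static_cast<int32_t>(v);
}

// With the top bit clear, both extensions produce the same value.
static_assert(zext16(0x7fff) == sext16(0x7fff));
// With the top bit set they differ, so the replacement would be wrong.
static_assert(zext16(0x8000) != sext16(0x8000));
```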

Reviewed By: fhahn, dmgreen

Differential Revision: https://reviews.llvm.org/D134711
2022-12-08 09:06:18 +02:00
Sander de Smalen
5922a04dbd [AArch64][SVE2p1] Make use of REVD instruction.
Reversing double-words within a quad-word is possible using the REVD instruction
when SVE2p1 is enabled.

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D139119
2022-12-06 15:42:32 +00:00
Haojian Wu
6f12281d8e Fix a -Wunused-variable warning in release build, NFC
2022-12-06 13:56:16 +01:00
Archibald Elliott
83b3304dd2 [AArch64] Implement __arm_rsr128/__arm_wsr128
This only contains the SelectionDAG implementation. GlobalISel to
follow.

The broad approach is:
- Introduce new builtins for 128-bit wide instructions.
- Lower these to @llvm.read_register.i128/@llvm.write_register.i128
- Introduce target-specific ISD nodes which have legal operands (two
  i64s rather than an i128). These are named AArch64::{MRRS, MSRR} to
  match the instructions they are for. These are a little complex as
  they need to match the "shape" of what they're replacing or the
  legaliser complains.
- Select these using the existing tryReadRegister/tryWriteRegister to
  share the MDString parsing code, and introduce additional code to
  ensure these are selected into the right MRRS/MSRR instructions. What
  makes this hard is ensuring that the two i64s end up in an XSeqPair
  register pair, because SelectionDAG doesn't care that much about
  register classes if it can avoid doing so.

The main change to existing code is the reorganisation of
tryReadRegister and tryWriteRegister to try to keep the string parsing
code separate from the instruction creating code.

This also includes the changes to clang to define and use the ACLE
feature macro named `__ARM_FEATURE_SYSREG128`.

Contributors:
  Sam Elliott
  Lucas Prates

Differential Revision: https://reviews.llvm.org/D139086
2022-12-06 11:39:05 +00:00
Ties Stuij
94e7e58fa4 [AArch64] implement GPR (U/S)(MIN/MAX) instruction SDag support
Using SelectionDag, lower umin, umax, smin, smax intrinsics to corresponding
UMIN, UMAX, SMIN, SMAX instructions when feat CSSC is available.

See specs for corresponding immediate and register versions in:
https://developer.arm.com/documentation/ddi0602/2022-09/Base-Instructions/

Reviewed By: lenary

Differential Revision: https://reviews.llvm.org/D138813
2022-12-06 10:57:49 +00:00
Ties Stuij
eaea4608e6 [AArch64] lower abs intrinsic to new ABS instruction in SelDag
When feature CSSC is available, the SelectionDag abs intrinsic should map to the
new scalar ABS instruction.

Additionally, the SIMDTwoScalarD tablegen defm includes a pattern match for
scalar i64, which we don't want to use when CSSC is enabled.

spec:
https://developer.arm.com/documentation/ddi0602/2022-09/Base-Instructions/ABS--Absolute-value-

Reviewed By: lenary

Differential Revision: https://reviews.llvm.org/D138812
2022-12-06 10:48:21 +00:00
Ties Stuij
2f778e60c9 [AArch64] SelectionDag codegen for gpr CTZ instruction
When feature CSSC is available we should use instruction CTZ in SelectionDag
where applicable:

- CTTZ intrinsics are lowered to using the gpr CTZ instruction
- BITREVERSE -> CTLZ instruction pattern gets replaced by CTZ
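
The second bullet relies on the identity cttz(x) = ctlz(bitreverse(x)). A scalar sketch for 32-bit values follows; the helper names are hypothetical and the loops are reference implementations, not what the backend emits.

```cpp
#include <cstdint>

// Reverse the bit order of a 32-bit value.
constexpr uint32_t bitreverse32(uint32_t x) {
  uint32_t r = 0;
  for (int i = 0; i < 32; ++i) {
    r = (r << 1) | (x & 1u);
    x >>= 1;
  }
  return r;
}

// Count leading zeros; returns 32 for x == 0.
constexpr unsigned ctlz32(uint32_t x) {
  unsigned n = 0;
  for (uint32_t m = 0x80000000u; m != 0 && (x & m) == 0; m >>= 1)
    ++n;
  return n;
}

// Count trailing zeros; returns 32 for x == 0.
constexpr unsigned cttz32(uint32_t x) {
  unsigned n = 0;
  for (uint32_t m = 1u; m != 0 && (x & m) == 0; m <<= 1)
    ++n;
  return n;
}

static_assert(cttz32(0x28) == 3);
static_assert(ctlz32(bitreverse32(0x28)) == cttz32(0x28));
```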

spec:
https://developer.arm.com/documentation/ddi0602/2022-09/Base-Instructions/CTZ--Count-Trailing-Zeros-

Reviewed By: lenary

Differential Revision: https://reviews.llvm.org/D138811
2022-12-06 10:42:07 +00:00
Sander de Smalen
4d2f0f723a [AArch64][SME] Avoid going through memory for streaming-compatible splats
Reviewed By: david-arm, paulwalker-arm

Differential Revision: https://reviews.llvm.org/D139111
2022-12-05 13:04:30 +00:00
Fangrui Song
b0df70403d [Target] llvm::Optional => std::optional
The updated functions are mostly internal with a few exceptions (virtual functions in
TargetInstrInfo.h, TargetRegisterInfo.h).
To minimize changes to LLVMCodeGen, GlobalISel files are skipped.

https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
2022-12-04 22:43:14 +00:00
Kazu Hirata
20cde15415 [Target] Use std::nullopt instead of None (NFC)
This patch mechanically replaces None with std::nullopt where the
compiler would warn if None were deprecated.  The intent is to reduce
the amount of manual work required in migrating from Optional to
std::optional.
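
A minimal sketch of what such a mechanical replacement looks like, using only the standard library. `findReg` is a hypothetical function invented for this example; the "before" form is shown as a comment.

```cpp
#include <optional>

// Before (llvm::Optional style, shown only for comparison):
//   Optional<int> findReg(bool ok) { if (!ok) return None; return 3; }

// After: the same shape with std::optional and std::nullopt.
constexpr std::optional<int> findReg(bool ok) {
  if (!ok)
    return std::nullopt;
  return 3;
}

// has_value() and value_or() are the standard accessors.
static_assert(findReg(true).has_value() && *findReg(true) == 3);
static_assert(findReg(false).value_or(-1) == -1);
```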

This is part of an effort to migrate from llvm::Optional to
std::optional:

https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
2022-12-02 20:36:06 -08:00
Ties Stuij
82a5f1c62b [AArch64] use CNT for ISD::popcnt and ISD::parity if available
These are the two places where we explicitly want to use cnt in
SelectionDAG when feature CSSC is available: ISD::popcnt and ISD::parity

For both, we need to make sure we're emitting optimized code for i32 (and
lower), i64 and i128. The most optimal way is of course using the GPR CNT
instruction. If we don't have CSSC, but we do have neon, we'll use floating
point CNT. If all else fails, we'll fall back on the general GPR popcnt and parity
implementations.
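
Both operations can share the counting instruction because parity is the low bit of the population count. A scalar reference sketch follows; `popcount64` and `parity64` are hypothetical helpers, and the loop is a generic fallback rather than the code the backend generates.

```cpp
#include <cstdint>

// Reference population count: clear the lowest set bit until none remain.
constexpr unsigned popcount64(uint64_t x) {
  unsigned n = 0;
  while (x != 0) {
    x &= x - 1;
    ++n;
  }
  return n;
}

// Parity is 1 when the number of set bits is odd.
constexpr unsigned parity64(uint64_t x) { return popcount64(x) & 1u; }

static_assert(popcount64(0xf0f0) == 8);
static_assert(parity64(0x7) == 1);
static_assert(parity64(0xf0f0) == 0);
```

A 128-bit count can be formed by adding the counts of the two 64-bit halves.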

spec:
https://developer.arm.com/documentation/ddi0602/2022-09/Base-Instructions/CNT--Count-bits-

Reviewed By: lenary

Differential Revision: https://reviews.llvm.org/D138808
2022-12-02 11:27:14 +00:00
Sander de Smalen
d32c9e8384 Reland "[AArch64][SME]: Generate streaming-compatible code for ld2-alloca."
Phabricator review for this patch was D138791
2022-12-01 14:48:30 +00:00
David Sherwood
4a5ccf4e93 Revert "[AArch64][SME]: Generate streaming-compatible code for ld2-alloca."
This reverts commit 279c0a83aa22cd35d4b7c7c52b85d2a86f2528a7.
2022-12-01 10:22:21 +00:00
Hassnaa Hamdi
2bda5a6287 [AArch64][SME][NFC]: Enable lowering truncate for enhancement.
Enable lowering truncate to enhance the generated code.
2022-12-01 03:54:28 +00:00
Hassnaa Hamdi
279c0a83aa [AArch64][SME]: Generate streaming-compatible code for ld2-alloca.
To generate code compatible with streaming mode:
 - disable lowering interleaved load to avoid generating invalid NEON intrinsics.

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D138791
2022-12-01 02:31:01 +00:00
Hassnaa Hamdi
02db5603ba [AArch64][SME]: Generate streaming-compatible code for fp-extend-trunc
To generate code compatible with streaming mode:
 - enable custom lowering for TruncStore to avoid crashing
   during legalization of TruncStore for non-integer vectors.

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D138720
2022-11-29 12:45:53 +00:00
Hassnaa Hamdi
9c7286f938 [AArch64][SME]: Generate streaming-compatible code for bit counting/select
To generate code compatible with streaming mode:
 - enable custom-lowering ISD::CTLZ and ISD::CTPOP.
 - disable combining OR into BSL.

- Testing files:
 - bit-counting.ll
 - bitselect.ll

Reviewed By: david-arm, sdesmalen

Differential Revision: https://reviews.llvm.org/D138682
2022-11-29 12:24:21 +00:00
Nicola Lancellotti
49cd18c55e Revert "[AArch64] Canonicalize ZERO_EXTEND to VSELECT"
This reverts commit 43fe14c056458501990c3db2788f67268d1bdf38.
2022-11-28 16:37:30 +00:00
Hassnaa Hamdi
60ab791aa0 [AArch64][SME]: Generate streaming-compatible code for fp-compares.
To generate code compatible with streaming mode:
 - enable expanding ISD::SETUEQ to avoid custom-lowering setcc to setcc_merge_zero,
   which causes a crash during instruction selection because no pattern matches it.

- Testing files:
 - fp-compares.ll

Differential Revision: https://reviews.llvm.org/D138670
2022-11-28 11:21:40 +00:00
Roman Lebedev
453f27bc9e [AArch64] LowerBUILD_VECTOR(): NormalizeBuildVector() might return non-BUILD_VECTOR
As apparent in the newly-added test, provided in:
cf624b23bc (commitcomment-90836329),
we should be more careful with handling wider vectors,
or we will assert later on.
2022-11-26 18:46:36 +03:00
Kazu Hirata
23ca55231a [AArch64] Use std::optional in AArch64ISelLowering.cpp (NFC)
This is part of an effort to migrate from llvm::Optional to
std::optional:

https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
2022-11-25 22:06:01 -08:00
Benjamin Maxwell
79b5829a15 [TargetLowering][AArch64] Teach DemandedBits about SVE count intrinsics
This allows DemandedBits to see that the SVE count intrinsics (CNTB,
CNTH, CNTW, CNTD) sans multiplier will only ever produce small
positive integers. The maximum value you could get here is 256, which
is CNTB on a machine with a 2048bit vector size (the maximum for SVE).

Using this, various redundant operations (zexts, sexts, ands, ors, etc.)
can be eliminated.
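
The bound can be worked through in a few lines. This sketch assumes SVE vector lengths are multiples of 128 bits up to 2048; `cntb`, `cntd`, and `kMaxVL` are hypothetical names modelling the counts, not the intrinsics themselves.

```cpp
#include <cstdint>

// Element counts for a given SVE vector length in bits.
constexpr uint64_t cntb(unsigned vl_bits) { return vl_bits / 8; }  // bytes
constexpr uint64_t cntd(unsigned vl_bits) { return vl_bits / 64; } // doublewords

constexpr unsigned kMaxVL = 2048; // largest SVE vector length in bits

// The largest possible result is CNTB at the maximum vector length.
static_assert(cntb(kMaxVL) == 256);

// 256 fits in 9 bits, so masking to 9 bits or round-tripping through a
// 32-bit truncate and zero extend leaves the value unchanged.
static_assert((cntb(kMaxVL) & 0x1ff) == cntb(kMaxVL));
static_assert(static_cast<uint64_t>(static_cast<uint32_t>(cntb(kMaxVL))) ==
              cntb(kMaxVL));
```

Once the optimizer knows the upper bits are zero, operations like the mask above are no-ops and can be dropped.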

Differential Revision: https://reviews.llvm.org/D138424
2022-11-25 10:15:14 +00:00