126 Commits

Author SHA1 Message Date
Kerry McLaughlin
c34cba0413
[AArch64][SME] Lower aarch64.sme.cnts* to vscale when in streaming mode (#154305)
In streaming mode, both the @llvm.aarch64.sme.cnts and @llvm.aarch64.sve.cnt
intrinsics are equivalent. For SVE, cnt* is lowered in instCombineIntrinsic
to @llvm.vscale(). This patch lowers the SME intrinsic similarly when
in streaming-mode.
2025-08-20 09:48:36 +01:00
Benjamin Maxwell
43a9ec2ecd
[AArch64][SME] Instcombine llvm.aarch64.sme.in.streaming.mode() (#147930)
This can fold away in functions with known streaming modes.
2025-07-13 13:20:20 +01:00
Paul Walker
e478a22d54
[LLVM][IRBuilder] Use NUW arithmetic for Create{ElementCount,TypeSize}. (#143532)
This puts the onus on the caller to ensure the result type is big enough.
In the unlikely event a cropped result is required then explicitly
truncate a safe value.
2025-06-19 13:24:39 +01:00
Ricardo Jesus
c70c0a86a5
[AArch64][InstCombine] Combine AES instructions with zero operands. (#142781)
We currently combine (AES (EOR (A, B)), 0) into (AES A, B) for Neon
intrinsics when the zero operand appears in the RHS of the AES
instruction.

This patch extends the combine to support AES SVE intrinsics and
the case where the zero operand appears in the LHS of the AES
instructions.
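
The fold works because an AESE/AESD-style operation XORs its two inputs before doing anything else, so only the XOR of the operands matters and its order does not. A minimal scalar sketch of that reasoning, assuming a one-byte "state" and a placeholder `rounds` function standing in for the real substitution and permutation steps (`rounds`, `aesModel` and `foldHoldsForAllBytes` are hypothetical names, not LLVM or ACLE APIs):

```cpp
#include <cassert>
#include <cstdint>

// Placeholder for the steps that follow the initial XOR. The real AES steps
// are different; any pure function of the XORed state works for this model.
constexpr std::uint8_t rounds(std::uint8_t state) {
  return static_cast<std::uint8_t>((state * 167u + 13u) & 0xffu);
}

// Model of an AESE-style operation: XOR the two operands, then transform.
constexpr std::uint8_t aesModel(std::uint8_t lhs, std::uint8_t rhs) {
  return rounds(static_cast<std::uint8_t>(lhs ^ rhs));
}

// Checks the fold for every pair of byte values, with the zero operand on
// either side of the operation.
inline bool foldHoldsForAllBytes() {
  for (unsigned a = 0; a < 256; ++a) {
    for (unsigned b = 0; b < 256; ++b) {
      const auto A = static_cast<std::uint8_t>(a);
      const auto B = static_cast<std::uint8_t>(b);
      const auto X = static_cast<std::uint8_t>(A ^ B);
      if (aesModel(X, 0) != aesModel(A, B)) return false; // zero on the RHS
      if (aesModel(0, X) != aesModel(A, B)) return false; // zero on the LHS
    }
  }
  return true;
}
```

Calling `foldHoldsForAllBytes()` exercises both operand orders that the commit message describes.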
2025-06-09 08:27:58 +01:00
Ricardo Jesus
fbd9a3160b
[AArch64][SVE] Combine UXT[BHW] intrinsics to AND. (#137956)
This patch combines uxt[bhw] intrinsics to and_u when the governing
predicate is all-true or the passthrough is undef (e.g. in cases of
"unknown" merging). This improves code gen as the latter can be
emitted as AND immediate instructions.

For example, given:
```cpp
svuint64_t foo(svuint64_t x) {
  return svextb_z(svptrue_b64(), x);
}
```

Currently:
```gas
foo:
  ptrue   p0.d
  movi    v1.2d, #0000000000000000
  uxtb    z0.d, p0/m, z0.d
  ret
```

Becomes:
```gas
foo:
  and     z0.d, z0.d, #0xff
  ret
```
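
The equivalence behind this combine can be sketched with a lane-by-lane model in portable C++. This is only an illustration under stated assumptions: the fixed four-lane `Vec`/`Pred` types and the helper names `uxtbMerging` and `andImm` are hypothetical and are not ACLE or LLVM APIs.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using Vec = std::array<std::uint64_t, 4>; // four 64-bit lanes (assumed width)
using Pred = std::array<bool, 4>;         // one predicate bit per lane

// Merging uxtb model: active lanes get the zero-extended low byte of x,
// inactive lanes keep the passthrough value.
inline Vec uxtbMerging(const Vec &passthru, const Pred &pg, const Vec &x) {
  Vec out = passthru;
  for (std::size_t i = 0; i < out.size(); ++i)
    if (pg[i]) out[i] = static_cast<std::uint8_t>(x[i]);
  return out;
}

// Unpredicated AND with an immediate, applied to every lane.
inline Vec andImm(const Vec &x, std::uint64_t imm) {
  Vec out{};
  for (std::size_t i = 0; i < out.size(); ++i) out[i] = x[i] & imm;
  return out;
}
```

With an all-true predicate no lane takes the passthrough, so `uxtbMerging(p, allTrue, x)` equals `andImm(x, 0xff)` for any `p`.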
2025-05-06 08:48:08 +01:00
Paul Walker
149d795ab0
[LLVM][InstCombine] Enable constant folding for SVE sdiv & udiv intrinsics. (#137966) 2025-05-01 13:20:05 +01:00
Paul Walker
8dc89e3419
[LLVM][InstCombine] Enable constant folding for SVE asr,lsl and lsr intrinsics. (#137350)
The SVE intrinsics support shift amounts greater-than-or-equal to the
element type's bit-length, essentially saturating the shift amount to
the bit-length. However, the IR instructions consider this undefined
behaviour that results in poison. To account for this we now ignore the
result of the simplifications that result in poison. This allows
existing code to be used to simplify the shifts but does mean:

1) We don't simplify cases like "svlsl_s32(x, splat(32)) => 0".
2) We no longer constant fold cases like "svadd(poison, X) => poison"

For (1) we'd need dedicated target specific combines anyway and the
result of (2) is not specified by the ACLE and replicating LLVM IR
behaviour might be confusing to ACLE writers.
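
The mismatch described above can be sketched with a scalar model of one 32-bit lane. It is an illustration only: `irShl`, `sveLsl` and `foldLsl` are hypothetical helper names, and poison is represented by `std::nullopt`.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Model of LLVM IR `shl i32`: an out-of-range shift amount yields poison,
// represented here by std::nullopt.
inline std::optional<std::uint32_t> irShl(std::uint32_t x, std::uint32_t amt) {
  if (amt >= 32) return std::nullopt;
  return x << amt;
}

// Model of one lane of the SVE lsl intrinsic: the shift amount effectively
// saturates at the bit-length, so shifting by 32 or more gives 0.
inline std::uint32_t sveLsl(std::uint32_t x, std::uint32_t amt) {
  return amt >= 32 ? 0u : x << amt;
}

// Fold helper: reuse the IR simplification, but discard poison results so
// the intrinsic call is left untouched in that case.
inline std::optional<std::uint32_t> foldLsl(std::uint32_t x, std::uint32_t amt) {
  return irShl(x, amt); // nullopt means "do not fold"
}
```

In this model `foldLsl(1, 32)` declines to fold even though `sveLsl(1, 32)` is a well-defined 0, which corresponds to limitation (1) in the message.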
2025-04-30 13:21:46 +01:00
Paul Walker
50d3febf17 [NFC][InstCombine][AArch64] Refactor SVE intrinsics tests.
Only test a single element type because the combines are not sensitive
to them. Extend coverage to cover the majority of SVE merging
intrinsics.
2025-04-29 10:44:08 +00:00
Paul Walker
93321966d9 [NFC][InstCombine][AArch64] Add simplify tests for reversed SVE intrinsics.
Add missing tests for fdivr, fsubr, sdivr, subr & udivr.
Add test case to demonstrate incorrect poison propagation.
2025-04-29 10:33:47 +00:00
Paul Walker
6edcb52f40 [NFC][InstCombine][AArch64] Remove SVE mul idempotency tests.
They have been subsumed into the SVE binop simplification tests.
2025-04-29 10:03:04 +00:00
Paul Walker
96ec17dfed
[LLVM][InstCombine] Enable constant folding for SVE add,and,eor,fadd,fdiv,fsub,orr & sub intrinsics. (#136849)
This is the subset of binops (mul and fmul are already enabled) whose
behaviour fully aligns with the equivalent SVE intrinsic. The omissions
are integer divides and shifts that are defined to return poison for
values where the intrinsics have a defined result. These will be covered
in a separate PR.
2025-04-25 11:30:03 +01:00
Paul Walker
013aab4051 [NFC][LLVM] Add test coverage for all binops to sve-intrinsic-simplify-binop.ll.
Also adds sve-intrinsic-simplify-shift.ll to test asr, lsl and lsr.
2025-04-23 11:02:43 +00:00
Matthew Devereau
91a205653e
[AArch64][SVE] Instcombine ptrue(all) to splat(i1) (#135016)
SVE operations such as predicated loads become canonicalized to LLVM
masked loads, and doing the same for ptrue(all) to splat(1) creates
further optimization opportunities from generic LLVM IR passes.
2025-04-13 20:40:51 +01:00
Paul Walker
1997073a54
[LLVM][InstCombine][SVE] Refactor sve.mul/fmul combines. (#134116)
After https://github.com/llvm/llvm-project/issues/126928 it's now
possible to rewrite the existing combines, which mostly only handle
cases where an operand is an identity value, to use existing simplify
code to unlock general constant folding.
2025-04-08 11:38:27 +01:00
Paul Walker
575656877f [LLVM][AArch64] Reduce uses of "undef" in SVE InstCombine tests.
Also removes a largely duplicate test file and changes the other
one to use autogenerated CHECK lines.
2025-02-26 11:11:02 +00:00
David Green
c71f9141a9 [AArch64] Add a phase-ordering test for dividing vscale. NFC
See #126411 / #127055, the test isn't expected to fold in a single instcombine
iteration, needing instcombine->cse->instcombine.
2025-02-18 10:48:50 +00:00
Nikita Popov
462cb3cd6c
[InstCombine] Infer nusw + nneg -> nuw for getelementptr (#111144)
If the gep is nusw (usually via inbounds) and the offset is
non-negative, we can infer nuw.

Proof: https://alive2.llvm.org/ce/z/ihztLy
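
The inference can also be checked exhaustively on a small model. The sketch below assumes an 8-bit index type and a single offset with no index scaling; the helper names are hypothetical and do not correspond to LLVM functions.

```cpp
#include <cassert>
#include <cstdint>

// nusw model: the base is unsigned, the offset is signed, and the exact sum
// must stay inside the 8-bit unsigned range.
constexpr bool noUnsignedSignedWrap(std::uint8_t base, std::int8_t off) {
  const int sum = static_cast<int>(base) + static_cast<int>(off);
  return sum >= 0 && sum <= 255;
}

// nuw model: both values are unsigned and the exact sum must not exceed 255.
constexpr bool noUnsignedWrap(std::uint8_t base, std::uint8_t off) {
  return static_cast<int>(base) + static_cast<int>(off) <= 255;
}

// For every base and every non-negative offset, nusw implies nuw.
inline bool inferenceHolds() {
  for (int b = 0; b < 256; ++b) {
    for (int o = 0; o < 128; ++o) { // non-negative offsets only
      const auto base = static_cast<std::uint8_t>(b);
      if (noUnsignedSignedWrap(base, static_cast<std::int8_t>(o)) &&
          !noUnsignedWrap(base, static_cast<std::uint8_t>(o)))
        return false;
    }
  }
  return true;
}
```

The non-negative condition matters: with base 10 and offset -1 the nusw check passes, but the same bits read as unsigned (255) fail the nuw check.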
2024-12-05 14:36:40 +01:00
Paul Walker
56c091ea71
[LLVM][IR] Use splat syntax when printing ConstantExpr based splats. (#116856)
This brings the printing of scalable vector constant splats in line with
their fixed length counterparts.
2024-11-21 11:21:12 +00:00
Paul Walker
33fcd6acc7 [NFC][LLVM] Migrate tests to use update_test_checks.py.
  Transforms/InstCombine/AArch64/sve-intrinsic-fmul-idempotency.ll
  Transforms/InstCombine/AArch64/sve-intrinsic-fmul_u-idempotency.ll
  Transforms/InstCombine/AArch64/sve-intrinsic-mul-idempotency.ll
  Transforms/InstCombine/AArch64/sve-intrinsic-mul_u-idempotency.ll
  Transforms/InstCombine/scalable-const-fp-splat.ll
  Transforms/InstSimplify/ConstProp/extractelement-vscale.ll
  Transforms/InstSimplify/ConstProp/vscale-shufflevector-inseltpoison.ll
  Transforms/InstSimplify/ConstProp/vscale-shufflevector.ll
2024-11-20 12:18:47 +00:00
Paul Walker
38fffa630e
[LLVM][IR] Use splat syntax when printing Constant[Data]Vector. (#112548) 2024-11-06 11:53:33 +00:00
Paul Walker
5bb34803a4 [NFC] Migrate tests to use autoupdate for CHECK lines. 2024-10-22 12:55:15 +00:00
Danila Malyutin
1a609052b6
[AArch64][InstCombine] Eliminate redundant barrier intrinsics (#112023)
If there are no memory ops on the path from one dmb to another then one
barrier can be eliminated.
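
The idea can be sketched over a straight-line list of operations. This is a simplified illustration, not the patch's implementation: it assumes a single basic block and one kind of barrier, and the `Op` enum and `dropRedundantBarriers` are hypothetical names.

```cpp
#include <cassert>
#include <vector>

enum class Op { Barrier, MemoryAccess, Other };

// Drops a barrier when no memory access separates it from the previously
// kept barrier.
inline std::vector<Op> dropRedundantBarriers(const std::vector<Op> &ops) {
  std::vector<Op> out;
  bool barrierPending = false; // true while no memory op follows the last barrier
  for (Op op : ops) {
    if (op == Op::Barrier) {
      if (barrierPending) continue; // redundant: nothing to order in between
      barrierPending = true;
    } else if (op == Op::MemoryAccess) {
      barrierPending = false;
    }
    out.push_back(op);
  }
  return out;
}
```

For the sequence barrier, other, barrier, memory access, barrier, only the second barrier is removed.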
2024-10-17 21:04:04 +04:00
Paul Walker
d283705829
[AArch64][SVE] Fix definition of bfloat fcvt intrinsics. (#110281)
Affected intrinsics:
  llvm.aarch64.sve.fcvt.bf16f32
  llvm.aarch64.sve.fcvtnt.bf16f32
    
The named intrinsics took a predicate based on the smallest element type
when it should be based on the largest. The intrinsics have been replaced
by v2 equivalents and affected code ported to use them.
    
Patch includes changes to getSVEPredicateBitCast() that ensure the
generated code for the auto-upgraded old intrinsics is unchanged.
2024-10-03 12:36:01 +01:00
Paul Walker
be9461cda6
[LLVM][InstCombine][SVE] fcvtnt(a,all_active,b) != fcvtnt(undef,all_active,b) (#110278)
The "narrowing top" convert instructions leave the bottom half of active
elements untouched and thus the first parameter of their associated
intrinsic remains live even when there are no inactive lanes.
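
A lane-level sketch of why the first operand stays live. Assumptions: two container lanes, each split into a bottom and a top 16-bit half, and a placeholder `narrow` function instead of a real float to bfloat conversion; all names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

struct Lane { std::uint16_t bottom, top; }; // two 16-bit halves per container
using Narrow = std::array<Lane, 2>;
using Wide = std::array<std::uint32_t, 2>;
using Pred = std::array<bool, 2>;

// Placeholder narrowing step; the real instruction converts floating point.
constexpr std::uint16_t narrow(std::uint32_t v) {
  return static_cast<std::uint16_t>(v >> 16);
}

// "Narrowing top" model: active lanes overwrite only the top half, so the
// bottom half always comes from `a`.
inline Narrow cvtntModel(Narrow a, const Pred &pg, const Wide &b) {
  for (std::size_t i = 0; i < a.size(); ++i)
    if (pg[i]) a[i].top = narrow(b[i]);
  return a;
}
```

Even with an all-active predicate, two different values of `a` give different results, so `a` cannot be replaced with undef.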
2024-10-01 11:13:04 +01:00
Paul Walker
622ae7ffa4
[LLVM][InstCombine][AArch64] sve.insr(splat(x), x) ==> splat(x) (#109445)
Fixes https://github.com/llvm/llvm-project/issues/100497
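
The fold in the subject line follows from what an insr-style operation does: it shifts every element up by one position and writes the scalar into element 0. A small sketch with a fixed four-lane vector (the lane count and the helper names `insr` and `splat` are assumptions for illustration):

```cpp
#include <array>
#include <cassert>
#include <cstddef>

using Vec = std::array<int, 4>; // four lanes (assumed width)

// Every lane holds the same value.
inline Vec splat(int x) {
  Vec v{};
  v.fill(x);
  return v;
}

// insr model: shift elements up by one, drop the last one, and insert the
// scalar into element 0.
inline Vec insr(const Vec &v, int scalar) {
  Vec out{};
  out[0] = scalar;
  for (std::size_t i = 1; i < out.size(); ++i) out[i] = v[i - 1];
  return out;
}
```

When the inserted scalar equals the splatted value the result is unchanged, which is why `insr(splat(x), x)` can be replaced by `splat(x)`.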
2024-09-24 15:11:36 +01:00
Matthew Devereau
1808fc13c8
[AArch64][InstCombine] Bail from combining SRAD on +/-1 divisor (#109274)
This fixes a crash when svdiv's third parameter is svdup_s64(1)
2024-09-20 13:53:02 +01:00
Lukacma
d57be195e3
[AArch64] replace SVE intrinsics with no active lanes with zero (#107413)
This patch extends https://github.com/llvm/llvm-project/pull/73964 and
optimises SVE intrinsics into zero constants when predicate is zero.
2024-09-09 10:28:01 +01:00
Lukacma
113806d187
[AArch64] optimise SVE cvt intrinsics with no active lanes (#104809)
This patch extends https://github.com/llvm/llvm-project/pull/73964 and
optimises SVE cvt intrinsics away when predicate is zero.
2024-08-29 11:45:14 +01:00
cceerczw
67a9093a47
[instCombine][bugfix] Fix crash caused by using of cast in instCombineSVECmpNE (#102472) 2024-08-23 15:30:51 +01:00
Lukacma
29cb1e6b4f
[AArch64] optimise SVE cmp intrinsics with no active lanes (#104779)
This patch extends https://github.com/llvm/llvm-project/pull/73964 and
optimises SVE cmp intrinsics to zero vector when predicate is zero.
2024-08-22 15:51:51 +01:00
Lukacma
d7aeea626d
[AArch64] optimise SVE prefetch intrinsics with no active lanes (#103052)
This patch extends https://github.com/llvm/llvm-project/pull/73964 and
optimises away SVE prefetch intrinsics when predicate is zero.
2024-08-15 13:52:35 +01:00
Lukacma
9ceb45cc19
[AArch64][SVE] optimisation for unary SVE store intrinsics with no active lanes (#95793)
This patch extends https://github.com/llvm/llvm-project/pull/73964 and
adds optimisation of store SVE intrinsics when predicate is zero.
2024-07-02 11:37:52 +02:00
Lukacma
0bd9c49a29
[AArch64][SVE] optimisation for SVE load intrinsics with no active lanes (#95269)
This patch extends #73964 and adds optimisation of load SVE intrinsics
when predicate is zero.
2024-06-25 10:58:16 +02:00
Paul Walker
fd07b8f809 [LLVM][tests/Transforms/InstCombine] Convert instances of ConstantExpr based splats to use splat().
This is mostly NFC but some output does change due to consistently
inserting into poison rather than undef and using i64 as the index
type for inserts.
2024-02-27 13:37:23 +00:00
Usman Nadeem
267d6b5ed2
[AArch64][SVE] Instcombine uzp1/reinterpret svbool to use vector.insert (#81069)
Concatenating two predicates using uzp1 after converting to double length
using sve.convert.to/from.svbool is optimized poorly in the backend,
resulting in additional `and` instructions to zero the lanes. See
https://github.com/llvm/llvm-project/pull/78623/

Combine this pattern to use `llvm.vector.insert` to concatenate and get
rid of convert to/from svbools.
2024-02-15 10:40:09 -08:00
Mark Harley
adfd13157d
[AArch64][SVE] Add optimisation for SVE intrinsics with no active lanes (#73964)
This patch introduces optimisations for SVE intrinsic function calls
which have all false predicates.
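
As one illustration of the idea, consider a merging binary operation. The commit message does not list which intrinsics are covered, so the example below is an assumption: a lane model of a merging add, with hypothetical `Vec`, `Pred` and `addMerging` names.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

using Vec = std::array<int, 4>;   // four lanes (assumed width)
using Pred = std::array<bool, 4>; // one predicate bit per lane

// Merging add model: active lanes compute a + b, inactive lanes keep a.
inline Vec addMerging(const Pred &pg, const Vec &a, const Vec &b) {
  Vec out = a;
  for (std::size_t i = 0; i < out.size(); ++i)
    if (pg[i]) out[i] = a[i] + b[i];
  return out;
}
```

With an all-false predicate no lane is computed, so the call can be replaced by its first vector operand.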
2024-01-10 11:56:52 +00:00
Nikita Popov
a5f3415533 [InstCombine] Replace non-demanded undef vector with poison
If an operand (esp to shufflevector or insertelement) is not
demanded, canonicalize it from undef to poison.
2023-12-18 16:12:37 +01:00
zhongyunde 00443407
bf90ffb9b4 [SVE][InstCombine] Delete redundante sel instructions with ptrue
svsel(ptrue, x, y) => x. Depends on D121792.
Reviewed By: paulwalker-arm, david-arm
2023-10-13 09:20:36 +08:00
zhongyunde 00443407
127cf4ead3 [SVE][InstCombine] Precommit tests for select + ptrue 2023-10-13 09:20:36 +08:00
Nikita Popov
c00f49cf12 [InstCombine] Remove instcombine-infinite-loop-threshold option
This option has been superseded by the fixpoint verification
functionality.
2023-09-21 15:30:05 +02:00
Fangrui Song
d39b4ce3ce [test] Replace aarch64-*-eabi with aarch64
Using "eabi" for aarch64 targets is a common mistake that the Clang driver warns about.
We want to avoid it elsewhere as well. Just use the common "aarch64" without
other triple components.
2023-06-27 20:02:52 -07:00
Fangrui Song
ebbfdca586 [test] Replace aarch64-arm-none-eabi with aarch64
Similar to 02e9441d6ca73314afa1973a234dce1e390da1da, but for llvm/test and one
lld/test/ELF test.
2023-06-27 19:36:27 -07:00
Jolanta Jensen
ecb07f481b [SVE ACLE] Implement IR combines to convert intrinsics used for _m C/C++ builtins
This patch implements IR combines to convert intrinsics used for _m C/C++ builtins
which take an all active predicate to their equivalent _u intrinsic.

Differential Revision: https://reviews.llvm.org/D152005
2023-06-21 10:35:13 +00:00
Paul Walker
b7287a82d3 [SVE][AArch64TTI] Fix invalid mla combine that miscomputes the value of inactive lanes.
Consider: add(pg, a, mul_u(pg, b, c))

Although the multiply's inactive lanes are undefined, they don't
contribute to the final result.  The overall result of the inactive
lanes comes from "a" and thus the above is another form of mla
rather than mla_u.
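
The argument can be sketched with a lane model in which the undefined inactive lanes of mul_u are filled with an arbitrary caller-chosen value. The four-lane types and the helper names are hypothetical and only serve the illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

using Vec = std::array<int, 4>;   // four lanes (assumed width)
using Pred = std::array<bool, 4>; // one predicate bit per lane

// mul_u model: inactive lanes are undefined, modeled by an arbitrary `junk`.
inline Vec mulU(const Pred &pg, const Vec &b, const Vec &c, int junk) {
  Vec out{};
  for (std::size_t i = 0; i < out.size(); ++i)
    out[i] = pg[i] ? b[i] * c[i] : junk;
  return out;
}

// Merging add model: inactive lanes take the value of the first operand.
inline Vec addMerging(const Pred &pg, const Vec &a, const Vec &m) {
  Vec out = a;
  for (std::size_t i = 0; i < out.size(); ++i)
    if (pg[i]) out[i] = a[i] + m[i];
  return out;
}

// Merging mla model: a + b * c on active lanes, a on inactive lanes.
inline Vec mlaMerging(const Pred &pg, const Vec &a, const Vec &b, const Vec &c) {
  Vec out = a;
  for (std::size_t i = 0; i < out.size(); ++i)
    if (pg[i]) out[i] = a[i] + b[i] * c[i];
  return out;
}
```

Whatever value `junk` takes, `addMerging(pg, a, mulU(pg, b, c, junk))` matches `mlaMerging(pg, a, b, c)`, including on the inactive lanes, which hold the values of `a`.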
2023-06-18 13:07:03 +01:00
Paul Walker
7a8f6a3eaa Increase test coverage of Transforms/InstCombine/AArch64/sve-intrinsic-muladdsub.ll 2023-06-18 13:07:03 +01:00
Nikita Popov
f9f8517e03 [InstCombine][AArch64] Fix phi insertion point
Fix the issue reported at https://reviews.llvm.org/rG724f4a5bac25#inline-9083,
by specifying the correct insertion point for the new phi.
2023-06-16 14:58:33 +02:00
Nikita Popov
f10103ba59 [InstCombine] Regenerate test checks (NFC) 2023-06-16 14:50:16 +02:00
Jolanta Jensen
a963dbb5ac [SVE ACLE] Extend existing aarch64_sve_mul combines to also act on aarch64_sve_mul_u.
Differential Revision: https://reviews.llvm.org/D152004
2023-06-06 15:26:33 +00:00
Jolanta Jensen
dc63b35b02 [SVE ACLE] Extend IR combines for fmul, fsub, fadd to cover _u variants
This patch extends existing IR combines for: fmul, fsub and fadd,
relying on all active predicate to also apply to their equivalent
undef (_u) intrinsics.

Differential Revision: https://reviews.llvm.org/D150768
2023-06-02 11:06:57 +00:00
Tobias Hieta
f84bac329b
[NFC][Py Reformat] Reformat lit.local.cfg python files in llvm
This is a follow-up to b71edfaa4ec3c998aadb35255ce2f60bba2940b0
since I forgot the lit.local.cfg files in that one.

Reformatting is done with `black`.

If you end up having problems merging this commit because you
have made changes to a python file, the best way to handle that
is to run git checkout --ours <yourfile> and then reformat it
with black.

If you run into any problems, post to discourse about it and
we will try to help.

RFC Thread below:

https://discourse.llvm.org/t/rfc-document-and-standardize-python-code-style

Reviewed By: barannikov88, kwk

Differential Revision: https://reviews.llvm.org/D150762
2023-05-17 17:03:15 +02:00