303 Commits

Author SHA1 Message Date
Florian Hahn
1217c8226b
[LoopIdiom] Add test for simplifying SCEV during expansion with flags. 2025-08-19 13:22:45 +01:00
Sushant Gokhale
e8918c318e
[SCEV] Consider non-volatile memory intrinsics as not having side-effect for forward progress (#150916)
For the attached test:
Before the loop-idiom pass, the inner loop contains a store that is
considered simple and free of side effects on the loop. After the
loop-idiom pass, the outer loop contains a memset that is considered to
introduce side effects on the loop. This changes the backedge-taken
count across the pass, hence the crash with verify-scev.

To fix the issue, treat non-volatile memory intrinsics as not having
side effects for the purposes of forward progress.

Fixes #149377
2025-08-11 00:24:50 -07:00
Florian Hahn
99d70e09a9
[SCEV] Allow adds of constants in tryToReuseLCSSAPhi. (#150693)
Update the logic added in
https://github.com/llvm/llvm-project/pull/147824 to also allow adds of
constants. There are a number of cases where this can help remove
redundant phis and replace some computation with a ptrtoint (which
likely is free in the backend).

PR: https://github.com/llvm/llvm-project/pull/150693
2025-07-31 16:33:25 +01:00
Florian Hahn
f9f68af4b8
[SCEV] Make sure LCSSA is preserved when re-using phi if needed.
If we insert a new add instruction, it may introduce a new use outside
the loop that contains the phi node we re-use. Use fixupLCSSAFormFor to
fix LCSSA form, if needed.

This fixes a crash reported in
https://github.com/llvm/llvm-project/pull/147824#issuecomment-3124670997.
2025-07-28 16:24:46 +01:00
Florian Hahn
8437038984
[LoopIdiom] Add test where LCSSA needs preserving when re-using PHI (NFC) 2025-07-28 16:02:18 +01:00
Florian Hahn
e21ee41be4
[SCEV] Try to re-use pointer LCSSA phis when expanding SCEVs. (#147824)
Generalize the code added in
https://github.com/llvm/llvm-project/pull/147214 to also support
re-using pointer LCSSA phis when expanding SCEVs with AddRecs.

Runtime checks emitted by LV based on the distance between two pointer
AddRecs are a common source of integer AddRecs with pointer bases.

This improves codegen in some cases when vectorizing, and prevents
regressions with https://github.com/llvm/llvm-project/pull/142309, which
turns some phis into single-entry ones. SCEV now looks through those
(and expands the whole AddRec), whereas before it had to treat the
LCSSA phi as SCEVUnknown.

Compile-time impact neutral:
https://llvm-compile-time-tracker.com/compare.php?from=fd5fc76c91538871771be2c3be2ca3a5f2dcac31&to=ca5fc2b3d8e6efc09f1624a17fdbfbe909f14eb4&stat=instructions:u

PR: https://github.com/llvm/llvm-project/pull/147824
2025-07-25 15:29:40 +01:00
Alex Bradbury
3877039fd1
[LoopIdiom] Select llvm.experimental.memset.pattern intrinsic rather than memset_pattern16 libcall (#126736)
In order to keep the change as incremental as possible, this only
introduces the memset.pattern intrinsic in cases where memset_pattern16
would have been used. Future patches can enable it on targets that don't
have the libcall, and select it in cases where the libcall isn't
directly usable. Because the memset.pattern intrinsic takes the number of
times to store the pattern as an argument, unlike memset_pattern16, which
takes the number of bytes to write, we no longer try to form an i128
pattern.

Special care is taken for cases where multiple stores in the same loop
iteration were combined to form a single pattern. For such cases, we
inherit the existing limitation that only loops such as the following are
supported:

```c
for (unsigned i = 0; i < 2 * n; i += 2) {
  f[i] = 2;
  f[i+1] = 2;
}
```

But the following doesn't result in a memset.pattern (even though it
could be, by forming an appropriate pattern):
```c
for (unsigned i = 0; i < 2 * n; i += 2) {
  f[i] = 2;
  f[i+1] = 3;
}
```

Addressing this existing deficiency is left for a follow-up due to a
desire not to change too much at once (i.e. to target equivalence to the
current codegen).
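A minimal C sketch of the count-versus-bytes distinction described above, assuming `f` is an `int32_t` array (the commit message does not state its type). `store_pattern_count` and `pattern_bytes` are hypothetical helpers that only model the two length conventions; they are not LLVM or libc APIs.

```c
#include <stddef.h>
#include <stdint.h>

/* The first loop above, assuming f points to int32_t elements. */
static void two_stores_per_iter(int32_t *f, unsigned n) {
    for (unsigned i = 0; i < 2 * n; i += 2) {
        f[i] = 2;
        f[i + 1] = 2;
    }
}

/* Hypothetical model of a count-based contract: the length argument is a
   repetition count, so the pattern is written exactly `count` times. */
static void store_pattern_count(int32_t *dst, int32_t pattern, size_t count) {
    for (size_t i = 0; i < count; i++)
        dst[i] = pattern;
}

/* Hypothetical helper: a byte-length interface such as memset_pattern16
   needs count * sizeof(element) bytes for the same region. */
static size_t pattern_bytes(size_t count) {
    return count * sizeof(int32_t);
}
```

Because both stores in an iteration write the same value, the whole loop corresponds to `store_pattern_count(f, 2, 2 * n)`, while a byte-based call would be given `pattern_bytes(2 * n)` bytes.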

A command-line option is introduced to force selection of the
intrinsic even in cases where the libcall wouldn't have been selected.
This is intended as a transitional option for testing and
experimentation, to be removed at a later point.

The only platforms this should impact are those that have the memset_pattern16 libcall (Apple platforms). The testing performed to check for unexpected codegen changes is described here: https://github.com/llvm/llvm-project/pull/126736#issuecomment-3005097468
2025-07-09 13:48:15 +01:00
Florian Hahn
1340ecf0ba
[SCEV] Add more tests with zext(add C, %var)<nsw>.
Add extra test coverage for
https://github.com/llvm/llvm-project/pull/142599.
2025-06-03 22:03:20 +01:00
Amy Huang
7175970fc5
Add debug location to strlen in LoopIdiomRecognize pass (#140164)
Pass down the debug location to the generated strlen call,
because LLVM requires calls to inlinable functions to have debug info.
2025-05-16 14:42:20 -07:00
Yingwei Zheng
76e07d8ba5
[LibCall] Infer nocallback for libcalls (#135173)
This patch adds `nocallback` attributes for string/math libcalls. It
allows FuncAttributor to infer `norecurse` more precisely and encourages
more aggressive global optimization.
2025-04-12 15:11:54 +08:00
Henry Jiang
68b7cba2b0
[LoopIdiom] Update strlen idiom body loop condition to be clean up by LoopDeletion (#134906)
Fixes the case where subsequent passes were unable to find and delete
the invariant loop left over by the strlen idiom conversion. Since
`loop-deletion` only operates on computable loops, we can update the loop
condition to something `loop-deletion` picks up more easily.

As pointed out in https://github.com/llvm/llvm-project/issues/134736
2025-04-11 12:55:45 -04:00
Henry Jiang
9694844d7e
Reland "[Transforms] LoopIdiomRecognize recognize strlen and wcslen #108985" (#132572)
Reland https://github.com/llvm/llvm-project/pull/108985

Extend `LoopIdiomRecognize` to find and replace loops of the form
```c
base = str;
while (*str)
  ++str;
```
and transform the `strlen` loop idiom into the appropriate `strlen`
or `wcslen` library call, which gives a small performance boost:
```c
str = base + strlen(base);
len = str - base;
```
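A self-contained C sketch of the equivalence the transform relies on; the function names are illustrative only. Each loop/call pair produces the same end pointer, so `len = str - base` is unchanged by the rewrite.

```c
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* The loop shape described above: advance str to the terminating NUL. */
static const char *end_by_loop(const char *str) {
    while (*str)
        ++str;
    return str;
}

/* The replacement: one strlen call yields the same end pointer. */
static const char *end_by_strlen(const char *base) {
    return base + strlen(base);
}

/* The wide-character form of the same idiom maps to wcslen. */
static const wchar_t *wend_by_loop(const wchar_t *str) {
    while (*str)
        ++str;
    return str;
}

static const wchar_t *wend_by_wcslen(const wchar_t *base) {
    return base + wcslen(base);
}
```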
2025-03-24 09:49:31 -04:00
Martin Storsjö
2a4522229c Revert "Reland "[Transforms] LoopIdiomRecognize recognize strlen and wcslen (#108985)" (#131412)"
This reverts commit ac9049df7e62e2ca4dc5d103593b51639b5715e3.

This change broke Wine (causing it to hang on startup), see
https://github.com/llvm/llvm-project/pull/108985 for discussion.
2025-03-22 12:41:04 +02:00
Martin Storsjö
1fdd8cb91e Revert "[Transform] Clean up strlen loop idiom (#132421)"
This reverts commit 7c52886700a5a70d873400ec022a99d7dce8b03b.

Reverting this as I have to revert another preceding commit,
ac9049df7e62e2ca4dc5d103593b51639b5715e3.
2025-03-22 12:40:40 +02:00
Henry Jiang
7c52886700
[Transform] Clean up strlen loop idiom (#132421)
We should bail out of modifying the CFG if the library functions are not
emittable or are disabled.
2025-03-21 19:10:25 -04:00
Henry Jiang
ac9049df7e
Reland "[Transforms] LoopIdiomRecognize recognize strlen and wcslen (#108985)" (#131412)
Relands https://github.com/llvm/llvm-project/pull/108985

This PR continues the effort made in
https://discourse.llvm.org/t/rfc-strlen-loop-idiom-recognition-folding/55848
and https://reviews.llvm.org/D83392 and https://reviews.llvm.org/D88460
to extend `LoopIdiomRecognize` to find and replace loops of the form
```c
base = str;
while (*str)
  ++str;
```
and transform the `strlen` loop idiom into the appropriate `strlen`
or `wcslen` library call, which gives a small performance boost:
```c
str = base + strlen(base);
len = str - base;
```
2025-03-21 09:49:10 -04:00
Henry Jiang
5b2a8819fb
Revert "[Transforms] LoopIdiomRecognize recognize strlen and wcslen (#108985)" (#131405)
This reverts commit bf6357f0f51eccc48b92a130afb51c0280d56180.
2025-03-14 19:11:26 -04:00
Henry Jiang
bf6357f0f5
[Transforms] LoopIdiomRecognize recognize strlen and wcslen (#108985)
This PR continues the effort made in
https://discourse.llvm.org/t/rfc-strlen-loop-idiom-recognition-folding/55848
and https://reviews.llvm.org/D83392 and https://reviews.llvm.org/D88460
to extend `LoopIdiomRecognize` to find and replace loops of the form
```c
base = str;
while (*str)
  ++str;
```
and transform the `strlen` loop idiom into the appropriate `strlen`
or `wcslen` library call, which gives a small performance boost:
```c
str = base + strlen(base);
len = str - base;
```

---------

Co-authored-by: Michael Kruse <github@meinersbur.de>
2025-03-14 18:56:34 -04:00
Jeremy Morse
792a6f8119
[RemoveDIs] Remove "try-debuginfo-iterators..." test flags (#130298)
These date back to when the non-intrinsic format of variable locations
was still being tested and was behind a compile-time flag, so not all
builds / bots would correctly run them. The solution at the time, to get
at least some test coverage, was to have tests opt-in to non-intrinsic
debug-info if it was built into LLVM.

Nowadays, non-intrinsic format is the default and has been on for more
than a year, so there's no need for this flag to exist.

(I've downgraded the flag from "try" to explicitly requesting
non-intrinsic format in some places, so that we can deal with tests that
are explicitly about non-intrinsic format in their own commit).
2025-03-14 15:50:49 +00:00
Ricardo Jesus
5f84b6edd9
[AArch64] Add MATCH loops to LoopIdiomVectorizePass (#101976)
This patch adds a new loop to LoopIdiomVectorizePass, enabling it to
recognise and vectorise loops such as:
```cpp
template<class InputIt, class ForwardIt>
InputIt find_first_of(InputIt first, InputIt last,
                      ForwardIt s_first, ForwardIt s_last)
{
  for (; first != last; ++first)
    for (ForwardIt it = s_first; it != s_last; ++it)
      if (*first == *it)
        return first;
  return last;
}
```

These loops match the C++ standard library function `std::find_first_of`.
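For reference, a C rendering of the nested search above, assuming byte elements (the element type is an assumption of this sketch). It shows only the scalar semantics, not the vectorised MATCH loop the pass emits.

```c
#include <stddef.h>

/* Return the index of the first hay[i] equal to any set[j], or n when no
   element of hay[0..n) appears in set[0..m). */
static size_t find_first_of_u8(const unsigned char *hay, size_t n,
                               const unsigned char *set, size_t m) {
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < m; j++)
            if (hay[i] == set[j])
                return i;
    return n;
}
```

For NUL-terminated strings this computes the same value as the standard `strcspn`; for example, `strcspn("hello, world", ",!")` is 5.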
2025-02-10 08:23:34 +00:00
Alex Bradbury
79762a10e4 [test][LoopIdiom][NFC] Add --check-globals to several tests
This reduces the diff for upcoming changes. In some cases there were
already CHECK lines for the globals, but re-running update_test_checks.py
deletes them unless --check-globals is passed. For
memset-pattern-tbaa.ll, the globals weren't checked but should have
been.
2025-02-05 05:07:08 +00:00
Nikita Popov
29441e4f5f
[IR] Convert from nocapture to captures(none) (#123181)
This PR removes the old `nocapture` attribute, replacing it with the new
`captures` attribute introduced in #116990. This change is
intended to be essentially NFC, replacing existing uses of `nocapture`
with `captures(none)` without adding any new analysis capabilities.
Making use of non-`none` values is left for a followup.

Some notes:
* `nocapture` will be upgraded to `captures(none)` by the bitcode
   reader.
* `nocapture` will also be upgraded by the textual IR reader. This is to
   make it easier to use old IR files and somewhat reduce the test churn in
   this PR.
* Helper APIs like `doesNotCapture()` will check for `captures(none)`.
* MLIR import will convert `captures(none)` into an `llvm.nocapture`
   attribute. The representation in the LLVM IR dialect should be updated
   separately.
2025-01-29 16:56:47 +01:00
Paul Walker
56c091ea71
[LLVM][IR] Use splat syntax when printing ConstantExpr based splats. (#116856)
This brings the printing of scalable vector constant splats in line with
their fixed-length counterparts.
2024-11-21 11:21:12 +00:00
Lee Wei
ead9ad2960
[llvm] Remove br i1 undef from some regression tests [NFC] (#116739)
This PR removes `br i1 undef` from tests under
`llvm/test/Transforms/`: JumpThreading, LCSSA, LICM, LoopDeletion and
LoopIdiom.
2024-11-19 08:12:25 +00:00
Ami-zhang
1897bf61f0
[LoongArch] Enable FeatureExtLSX for generic-la64 processor (#113421)
This commit makes the `generic` target support FP and LSX, as
discussed in #110211. This allows 128-bit vectors to be enabled by
default in the loongarch64 backend.
2024-10-31 15:58:15 +08:00
Nikita Popov
9f3d1695eb
[SCEVExpander] Preserve gep nuw during expansion (#102133)
When expanding SCEV adds to geps, transfer the nuw flag to the resulting
gep. (Note that this doesn't apply to IV increment GEPs, which go
through a different code path.)
2024-10-02 11:45:00 +02:00
Alex Voicu
5d734fa4c8
[llvm][SPIRV] Expose fast popcnt support for SPIR-V targets (#109845)
This adds the TTI predicate for conveying the availability of fast
`popcnt`, which subsequently allows passes like `LoopIdiomRecognize` to
collapse known popcount patterns. Since SPIR-V natively exposes
`OpBitcount`, it seems preferable to compress the resulting code, and
retain the information, even if a concrete target might have to expand
back into a loop structure.
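As an illustration of the kind of popcount pattern this enables, here is a C sketch of the common "clear the lowest set bit" loop next to a single population-count operation. `__builtin_popcount` is a GCC/Clang builtin used here as a stand-in for the collapsed form; the exact loop shapes `LoopIdiomRecognize` accepts are not spelled out in this commit.

```c
#include <stdint.h>

/* Popcount loop: each iteration clears the lowest set bit of x. */
static unsigned popcount_loop(uint32_t x) {
    unsigned cnt = 0;
    while (x) {
        x &= x - 1;
        cnt++;
    }
    return cnt;
}

/* Collapsed form: one population-count operation. */
static unsigned popcount_collapsed(uint32_t x) {
    return (unsigned)__builtin_popcount(x);
}
```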
2024-09-28 16:45:32 +01:00
wanglei
2ee7183e38
[LoongArch] Add TTI support for cpop with LSX
Reviewed By: SixWeining

Pull Request: https://github.com/llvm/llvm-project/pull/106961
2024-09-06 15:48:14 +08:00
Simon Pilgrim
254da5ab8b [CostModel][X86] Add missing costkinds for scalar CTLZ/CTTZ instructions
Based on worst-case llvm-mca numbers for CTLZ/CTTZ(+ZERO_UNDEF) codegen.

Prep work for #102885
2024-08-20 15:26:04 +01:00
Nikita Popov
c3c2370c9a [Tests] Regenerate test checks (NFC) 2024-08-06 12:59:55 +02:00
Hari Limaye
3bf83e3866
[LoopIdiom] Reland: Support 'shift until less-than' idiom #95002 (#98298)
The original patch failed to handle the case where the loopback
condition compared against a constant exceeding the 64-bit unsigned
range, which caused a buildbot failure.

This PR fixes this and relands the original PR #95002.

The current loop idiom code for recognising and inserting a CTLZ
intrinsic does not support loops where the loopback control is based on
an unsigned less-than condition. This patch adds support for recognising
these loops and inserting a CTLZ intrinsic.

Fixes the missed optimization cases in #51064.
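A hedged C sketch of why a "shift until less-than" loop reduces to a leading-zero count, assuming 32-bit unsigned values and a threshold that is a power of two, 2^k with k < 32. Since `x >> n < 2^k` holds exactly when `x < 2^(k+n)`, the loop runs `max(0, width(x) - k)` times, where `width(x) = 32 - clz(x)`. `__builtin_clz` is a GCC/Clang builtin standing in for the CTLZ intrinsic; the precise loop forms the pass accepts may differ.

```c
#include <stdint.h>

/* Loop form: count right shifts until x drops below 2^k (k < 32 assumed). */
static unsigned shifts_loop(uint32_t x, unsigned k) {
    unsigned n = 0;
    while (x >= (UINT32_C(1) << k)) {
        x >>= 1;
        n++;
    }
    return n;
}

/* Closed form: bit width of x minus k, clamped at zero. */
static unsigned shifts_clz(uint32_t x, unsigned k) {
    if (x == 0)
        return 0; /* __builtin_clz(0) is undefined */
    unsigned width = 32u - (unsigned)__builtin_clz(x);
    return width > k ? width - k : 0u;
}
```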

---------

Co-authored-by: David Sherwood <david.sherwood@arm.com>
2024-07-16 13:58:07 +01:00
Hari Limaye
ea39f97727
Revert "[LoopIdiom] Support 'shift until less-than' idiom (#95002)" (#98065)
Reverts #95002 while I investigate buildbot failure.

This reverts commit 83b01aaf51072a07261ee2e5fc14102f71273bc0.
2024-07-08 20:02:31 +01:00
Hari Limaye
83b01aaf51
[LoopIdiom] Support 'shift until less-than' idiom (#95002)
The current loop idiom code for recognising and inserting a CTLZ
intrinsic does not support loops where the loopback control is based on
an unsigned less-than condition. This patch adds support for recognising
these loops and inserting a CTLZ intrinsic.

Fixes the missed optimization cases in #51064

---------

Co-authored-by: David Sherwood <david.sherwood@arm.com>
2024-07-08 14:32:08 +01:00
Min-Yih Hsu
8b55d342b6
[RISCV][LoopIdiomVectorize] Support VP intrinsics in LoopIdiomVectorize (#94082)
Teach LoopIdiomVectorize to use VP intrinsics to replace the byte
compare loops. Right now only RISC-V uses LoopIdiomVectorize of this
style.
2024-07-02 18:48:28 -07:00
Stephen Tozer
094572701d
[RemoveDIs] Print IR with debug records by default (#91724)
This patch makes the final major change of the RemoveDIs project, changing the
default IR output from debug intrinsics to debug records. This is expected to
break a large number of tests: every single one that tests for uses or
declarations of debug intrinsics and does not explicitly disable writing
records. 

If this patch has broken your downstream tests (or upstream tests on a
configuration I wasn't able to run):
1. If you need to immediately unblock a build, pass
`--write-experimental-debuginfo=false` to LLVM's option processing for all
failing tests (remember to use `-mllvm` for clang/flang to forward arguments to
LLVM).
2. For most test failures, the changes are trivial and mechanical, enough that
they can be done by script; see the migration guide for a guide on how to do
this: https://llvm.org/docs/RemoveDIsDebugInfo.html#test-updates
3. If any tests fail for reasons other than FileCheck check lines that need
updating, such as assertion failures, that is most likely a real bug with this
patch and should be reported as such.

For more information, see the recent PSA:
https://discourse.llvm.org/t/psa-ir-output-changing-from-debug-intrinsics-to-debug-records/79578
2024-06-14 15:07:27 +01:00
Min-Yih Hsu
37e309f163
[AArch64][LoopIdiom] Generalize AArch64LoopIdiomTransform into LoopIdiomVectorize (#94081)
To facilitate sharing LoopIdiomTransform between AArch64 and RISC-V,
this first patch moves AArch64LoopIdiomTransform from lib/Target/AArch64
to lib/Transforms/Vectorize and renames it to LoopIdiomVectorize. The
following patch (#94082) will teach LoopIdiomVectorize how to generate VP
intrinsics (in addition to the current masked vector style) for the
benefit of RVV.
2024-06-07 14:06:11 -07:00
Min-Yih Hsu
f6315a9572
[AArch64][LoopIdiom] Disable LoopIdiomTransform when NoImplicitFloat is present (#87677)
This behavior is aligned with both LoopVectorizer and SLPVectorizer.
2024-04-08 09:10:23 -07:00
paperchalice
29bf32efbb
[NewPM][AArch64] Add AArch64PassRegistry.def (#85215)
PR #83567 ports `SelectionDAGISel` to the new pass manager, so each
backend should provide `<Target>DagToDagISel()` in new pass manager
style. Each target should then provide `<Target>PassRegistry.def` to
register backend passes in `registerPassBuilderCallbacks`, reducing
duplicate code.

This PR adds `AArch64PassRegistry.def` to the AArch64 backend, along
with boilerplate code in `registerPassBuilderCallbacks`.
2024-03-21 10:57:51 +08:00
paperchalice
44a81af510
[AArch64] Run LoopSimplifyPass in byte-compare-index.ll (#86053)
Make this test case work on both new and legacy pass manager. See also
#85215
2024-03-21 10:26:58 +08:00
Nikita Popov
07292b7203
[LIR][SCEVExpander] Restore original flags when aborting transform (#82362)
SCEVExpanderCleaner will currently remove instructions created by
SCEVExpander, but not restore poison generating flags that it may have
dropped. As such, running LIR can currently spuriously drop flags
without performing any transforms.

Fix this by keeping track of original instruction flags in SCEVExpander.

Fixes https://github.com/llvm/llvm-project/issues/82337.
2024-02-21 10:13:41 +01:00
Nikita Popov
fcd6549e58 [LIR] Add test for #82337 (NFC) 2024-02-20 14:42:40 +01:00
Nikita Popov
bec7181d5b [SCEVExpander] Don't use recursive expansion for ptr IV inc
Similar to the non-ptr case, directly create the getelementptr
instruction. Going through expandAddToGEP() no longer makes sense
with opaque pointers, where generating the necessary instruction
is trivial.

This avoids recursive expansion of (the SCEV of) StepV while the
IR is in an inconsistent state, in particular with an incomplete
IV phi node, which utilities may not be prepared to deal with.

Fixes https://github.com/llvm/llvm-project/issues/80954.
2024-02-07 11:27:26 +01:00
Nikita Popov
2d69827c5c [Transforms] Convert tests to opaque pointers (NFC) 2024-02-05 11:57:34 +01:00
paperchalice
e390c229a4
[Pass] Add hyphen to some pass names (#74287)
Here is the list of the renamed passes:
- `callbrprepare` -> `callbr-prepare`
- `dwarfehprepare` -> `dwarf-eh-prepare`
- `flattencfg` -> `flatten-cfg`
- `loweratomic` -> `lower-atomic`
- `lowerinvoke` -> `lower-invoke`
- `lowerswitch` -> `lower-switch`
- `winehprepare` -> `win-eh-prepare`
- `targetir` -> `target-ir`
- `targetlibinfo` -> `target-lib-info`

Legacy passes are not affected.
2024-01-25 16:05:54 +08:00
David Sherwood
fca6992be1
[AArch64] Fix a minor issue with AArch64LoopIdiomTransform (#78136)
I found another case where the end block could contain a PHI that we
handle incorrectly. Its two incoming values are unique: one is the
induction variable and the other is a value defined outside the
loop, e.g.

  %final_val = phi i32 [ %inc, %while.body ], [ %d, %while.cond ]

We don't correctly select between the two values in the new end block
that we create, and so we get the wrong result.
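A C sketch of why the exit value depends on the edge that leaves the loop, with `d` standing for the value defined outside the loop. Mapping the two `return` statements to the `%while.body` and `%while.cond` edges is an illustrative reading of the PHI above, not code taken from the pass.

```c
#include <stddef.h>

/* A mismatch exits via the %while.body edge and yields the index; running off
   the end exits via the %while.cond edge and yields d (i < n assumed). */
static int final_val(const unsigned char *a, const unsigned char *b,
                     int i, int n, int d) {
    while (++i != n) {
        if (a[i] != b[i])
            return i; /* corresponds to [ %inc, %while.body ] */
    }
    return d;         /* corresponds to [ %d, %while.cond ] */
}
```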
2024-01-17 14:30:06 +00:00
David Sherwood
ccaf9e0bc0
[AArch64] Enable AArch64 loop idiom transform pass (#77480)
Following on from

https://github.com/llvm/llvm-project/pull/72273

which added the new AArch64 loop idiom transformation pass, this patch
enables the pass by default for AArch64.
2024-01-10 10:03:14 +00:00
David Sherwood
2c651e6c38
[AArch64] Fix regression introduced by c7148467fc08eefaaae876c7d11d629c849f42cf (#77467)
2024-01-09 13:22:28 +00:00
David Sherwood
c7148467fc
[AArch64] Add an AArch64 pass for loop idiom transformations (#72273)
We have added a new pass that looks for loops such as the following:

```c
  while (++i != max_len)
      if (a[i] != b[i])
          break;

  /* ... use index i ... */
```

Although similar to a memcmp, this is slightly different because instead
of returning the difference between the values of the first non-matching
pair of bytes, it returns the index of the first mismatch. As such, we
are not able to lower this to a memcmp call.
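The following C sketch contrasts the scalar idiom with a hypothetical block-wise variant that uses `memcmp` only to skip equal chunks. Because `memcmp` reports just the sign of the first difference, the index still has to be found with a scan. This illustrates the problem only; it is not the predicated SVE loop the pass emits, and the 16-byte chunk width is arbitrary.

```c
#include <stddef.h>
#include <string.h>

/* Scalar form of the idiom: starting after index i (i < max_len assumed),
   return the index of the first mismatching byte, or max_len if none. */
static size_t mismatch_scalar(const unsigned char *a, const unsigned char *b,
                              size_t i, size_t max_len) {
    while (++i != max_len) {
        if (a[i] != b[i])
            break;
    }
    return i;
}

/* Hypothetical block-wise variant: memcmp skips equal chunks, then a scalar
   scan locates the mismatching index inside the chunk that differs. */
static size_t mismatch_blocked(const unsigned char *a, const unsigned char *b,
                               size_t i, size_t max_len) {
    const size_t block = 16; /* arbitrary chunk width, not a vector length */
    size_t j = i + 1;
    while (max_len - j >= block && memcmp(a + j, b + j, block) == 0)
        j += block;
    while (j != max_len && a[j] == b[j])
        j++;
    return j;
}
```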

The new pass can now spot such idioms and transform them into a
specialised predicated loop that gives a significant performance
improvement for AArch64. It is intended as a stop-gap solution until
this can be handled by the vectoriser, which doesn't currently deal with
early exits.

This specialised loop makes use of a generic intrinsic that counts the
trailing zero elements in a predicate vector. This was added in
https://reviews.llvm.org/D159283 and for SVE we end up with brkb & incp
instructions.
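A scalar C model of counting trailing zero elements, treating lane 0 as the least significant element and assuming the all-inactive case returns the lane count (the intrinsic can also treat that case as poison). `first_diff_in_chunk` is a hypothetical helper showing how such a count turns a per-lane "bytes differ" predicate into the offset of the first mismatch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Number of inactive lanes before the first active one, or `lanes` if none. */
static size_t cttz_elts(const bool *pred, size_t lanes) {
    size_t k = 0;
    while (k < lanes && !pred[k])
        k++;
    return k;
}

/* Build a "bytes differ" predicate for one chunk, then count to the first
   active lane to get the offset of the first mismatch in that chunk. */
static size_t first_diff_in_chunk(const unsigned char *a, const unsigned char *b,
                                  size_t lanes) {
    bool pred[64];
    if (lanes > 64)
        lanes = 64; /* keep the fixed-size scratch predicate in bounds */
    for (size_t k = 0; k < lanes; k++)
        pred[k] = a[k] != b[k];
    return cttz_elts(pred, lanes);
}
```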

Although we have added this pass only for AArch64, it was written in a
generic way so that in theory it could be used by other targets.
Currently the pass requires scalable vector support and needs to know
the minimum page size for the target, however it's possible to make it
work for fixed-width vectors too. Also, the llvm.experimental.cttz.elts
intrinsic used by the pass has generic lowering, but can be made
efficient for targets with instructions similar to SVE's brkb, cntp and
incp.

Original version of patch was posted on Phabricator:

 https://reviews.llvm.org/D158291

Patch co-authored by Kerry McLaughlin (@kmclaughlin-arm) and David
Sherwood (@david-arm)

See the original discussion on Discourse:

https://discourse.llvm.org/t/aarch64-target-specific-loop-idiom-recognition/72383
2024-01-09 11:29:28 +00:00
Yingwei Zheng
2c2de4b20e
[ValueTracking] Remove SPF support from computeKnownBitsFromOperator (#76630)
This patch removes redundant SPF support
(5350e1b509)
from `computeKnownBitsFromOperator` as we always canonicalize a SPF into
an intrinsic call.

Compile-time improvement:
http://llvm-compile-time-tracker.com/compare.php?from=3dc0638cfc19e140daff7bf1281648daca8212fa&to=8771ef0749fb2ba4304dc68d418c88ec5769346f&stat=instructions:u

|stage1-O3|stage1-ReleaseThinLTO|stage1-ReleaseLTO-g|stage1-O0-g|stage2-O3|stage2-O0-g|stage2-clang|
|--|--|--|--|--|--|--|
|-0.01%|-0.01%|+0.01%|+0.00%|+0.01%|+0.04%|-0.01%|
2023-12-31 04:38:18 +08:00
Nikita Popov
eecb99c5f6 [Tests] Add disjoint flag to some tests (NFC)
These tests rely on SCEV recognizing an "or" with no common
bits as an "add". Add the disjoint flag to relevant or instructions
in preparation for switching SCEV to use the flag instead of the
ValueTracking query. The IR with disjoint flag matches what
InstCombine would produce.
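An illustrative C check of the property these tests depend on: when two operands have no set bits in common, no bit position generates a carry, so `or` and `add` produce the same result. `(i << 1) | 1` is a typical loop-index shape; the helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Operands are disjoint when no bit is set in both. */
static bool disjoint(uint32_t a, uint32_t b) {
    return (a & b) == 0;
}

/* With disjoint operands, OR and ADD agree: (i << 1) has bit 0 clear. */
static uint32_t or_form(uint32_t i)  { return (i << 1) | 1u; }
static uint32_t add_form(uint32_t i) { return (i << 1) + 1u; }
```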
2023-12-05 14:09:36 +01:00