279 Commits

Author SHA1 Message Date
Ami-zhang
1897bf61f0
[LoongArch] Enable FeatureExtLSX for generic-la64 processor (#113421)
This commit makes the `generic` target support FP and LSX, as
discussed in #110211. This allows 128-bit vectors to be enabled by
default in the loongarch64 backend.
2024-10-31 15:58:15 +08:00
Nikita Popov
9f3d1695eb
[SCEVExpander] Preserve gep nuw during expansion (#102133)
When expanding SCEV adds to geps, transfer the nuw flag to the resulting
gep. (Note that this doesn't apply to IV increment GEPs, which go
through a different code path.)
2024-10-02 11:45:00 +02:00
Alex Voicu
5d734fa4c8
[llvm][SPIRV] Expose fast popcnt support for SPIR-V targets (#109845)
This adds the TTI predicate for conveying the availability of fast
`popcnt`, which subsequently allows passes like `LoopIdiomRecognize` to
collapse known popcount patterns. Since SPIR-V natively exposes
`OpBitcount`, it seems preferable to compress the resulting code, and
retain the information, even if a concrete target might have to expand
back into a loop structure.
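For illustration, the C loop below is the classic clear-lowest-set-bit popcount idiom. `LoopIdiomRecognize` can collapse it into a single `llvm.ctpop` call once the target reports fast `popcnt` support. This is a sketch, not code from the patch: the function names are hypothetical, and `__builtin_popcount` assumes a GCC/Clang-compatible compiler.

```c
#include <stdint.h>

/* Classic popcount loop: each iteration clears the lowest set bit, so the
 * trip count equals the number of set bits in x. */
static unsigned popcount_loop(uint32_t x) {
    unsigned count = 0;
    while (x != 0) {
        x &= x - 1; /* clear the lowest set bit */
        ++count;
    }
    return count;
}

/* The collapsed form: one population-count operation. __builtin_popcount
 * is a GCC/Clang builtin (assumed compiler support). */
static unsigned popcount_builtin(uint32_t x) {
    return (unsigned)__builtin_popcount(x);
}
```

Both functions return the same count for any input; for example, `popcount_loop(0xFFu)` is 8.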
2024-09-28 16:45:32 +01:00
wanglei
2ee7183e38
[LoongArch] Add TTI support for cpop with LSX
Reviewed By: SixWeining

Pull Request: https://github.com/llvm/llvm-project/pull/106961
2024-09-06 15:48:14 +08:00
Simon Pilgrim
254da5ab8b [CostModel][X86] Add missing costkinds for scalar CTLZ/CTTZ instructions
Based on worst-case llvm-mca numbers for CTLZ/CTTZ(+ZERO_UNDEF) codegen

Prep work for #102885
2024-08-20 15:26:04 +01:00
Nikita Popov
c3c2370c9a [Tests] Regenerate test checks (NFC)
2024-08-06 12:59:55 +02:00
Hari Limaye
3bf83e3866
[LoopIdiom] Reland: Support 'shift until less-than' idiom #95002 (#98298)
The original patch failed to handle the case where the loopback
condition compared against a constant exceeding the 64-bit unsigned
range, which caused a buildbot failure.

This PR fixes this and relands the original PR #95002.

The current loop idiom code for recognising and inserting a CTLZ
intrinsic does not support loops where the loopback control is based on
an unsigned less-than condition. This patch adds support for recognising
these loops and inserting a CTLZ intrinsic.
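The C below is a hypothetical illustration of a "shift until less-than" loop. It is paired with the closed form that a `ctlz`-based rewrite computes when the threshold is a power of two. The function names are made up, `__builtin_clz` assumes a GCC/Clang-compatible compiler, and the exact IR shape the pass matches may differ from this sketch.

```c
#include <stdint.h>

/* Hypothetical "shift until less-than" loop: shift x right until it is
 * below limit, counting the shifts. Requires limit >= 1, otherwise the
 * loop never terminates. */
static unsigned shifts_until_below_loop(uint32_t x, uint32_t limit) {
    unsigned n = 0;
    while (x >= limit) {
        x >>= 1;
        ++n;
    }
    return n;
}

/* Closed form for a power-of-two limit (1u << k, 0 <= k <= 31). Each shift
 * drops one significant bit, and x < (1u << k) exactly when x has at most
 * k significant bits, so the count is max(0, bitlength(x) - k), where
 * bitlength(x) = 32 - ctlz(x). __builtin_clz is undefined for 0, hence
 * the guard. */
static unsigned shifts_until_below_clz(uint32_t x, unsigned k) {
    unsigned bitlen = (x == 0) ? 0u : 32u - (unsigned)__builtin_clz(x);
    return bitlen > k ? bitlen - k : 0u;
}
```

For example, `shifts_until_below_loop(100, 8)` and `shifts_until_below_clz(100, 3)` both return 4.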

Fixes the missed optimization cases in #51064.

---------

Co-authored-by: David Sherwood <david.sherwood@arm.com>
2024-07-16 13:58:07 +01:00
Hari Limaye
ea39f97727
Revert "[LoopIdiom] Support 'shift until less-than' idiom (#95002)" (#98065)
Reverts #95002 while I investigate buildbot failure.

This reverts commit 83b01aaf51072a07261ee2e5fc14102f71273bc0.
2024-07-08 20:02:31 +01:00
Hari Limaye
83b01aaf51
[LoopIdiom] Support 'shift until less-than' idiom (#95002)
The current loop idiom code for recognising and inserting a CTLZ
intrinsic does not support loops where the loopback control is based on
an unsigned less-than condition. This patch adds support for recognising
these loops and inserting a CTLZ intrinsic.

Fixes the missed optimization cases in #51064.

---------

Co-authored-by: David Sherwood <david.sherwood@arm.com>
2024-07-08 14:32:08 +01:00
Min-Yih Hsu
8b55d342b6
[RISCV][LoopIdiomVectorize] Support VP intrinsics in LoopIdiomVectorize (#94082)
Teach LoopIdiomVectorize to use VP intrinsics to replace the byte
compare loops. Right now only RISC-V uses LoopIdiomVectorize of this
style.
2024-07-02 18:48:28 -07:00
Stephen Tozer
094572701d
[RemoveDIs] Print IR with debug records by default (#91724)
This patch makes the final major change of the RemoveDIs project, changing the
default IR output from debug intrinsics to debug records. This is expected to
break a large number of tests: every single one that tests for uses or
declarations of debug intrinsics and does not explicitly disable writing
records. 

If this patch has broken your downstream tests (or upstream tests on a
configuration I wasn't able to run):
1. If you need to immediately unblock a build, pass
`--write-experimental-debuginfo=false` to LLVM's option processing for all
failing tests (remember to use `-mllvm` for clang/flang to forward arguments to
LLVM).
2. For most test failures, the changes are trivial and mechanical, enough that
they can be done by script; see the migration guide for a guide on how to do
this: https://llvm.org/docs/RemoveDIsDebugInfo.html#test-updates
3. If any tests fail for reasons other than FileCheck check lines that need
updating, such as assertion failures, that is most likely a real bug with this
patch and should be reported as such.

For more information, see the recent PSA:
https://discourse.llvm.org/t/psa-ir-output-changing-from-debug-intrinsics-to-debug-records/79578
2024-06-14 15:07:27 +01:00
Min-Yih Hsu
37e309f163
[AArch64][LoopIdiom] Generalize AArch64LoopIdiomTransform into LoopIdiomVectorize (#94081)
To facilitate sharing LoopIdiomTransform between AArch64 and RISC-V,
this first patch moves AArch64LoopIdiomTransform from lib/Target/AArch64
to lib/Transforms/Vectorize and renames it to LoopIdiomVectorize. The
following patch (#94082) will teach LoopIdiomVectorize how to generate VP
intrinsics (in addition to the current masked vector style) for
RVV.
2024-06-07 14:06:11 -07:00
Min-Yih Hsu
f6315a9572
[AArch64][LoopIdiom] Disable LoopIdiomTransform when NoImplicitFloat is present (#87677)
This behavior is aligned with both LoopVectorizer and SLPVectorizer.
2024-04-08 09:10:23 -07:00
paperchalice
29bf32efbb
[NewPM][AArch64] Add AArch64PassRegistry.def (#85215)
PR #83567 ports `SelectionDAGISel` to the new pass manager, so each
backend should now provide `<Target>DagToDagISel()` in new pass manager
style. Each target should also provide `<Target>PassRegistry.def` to
register backend passes in `registerPassBuilderCallbacks`, reducing
duplicate code.
This PR adds `AArch64PassRegistry.def` to the AArch64 backend, along with
boilerplate code in `registerPassBuilderCallbacks`.
2024-03-21 10:57:51 +08:00
paperchalice
44a81af510
[AArch64] Run LoopSimplifyPass in byte-compare-index.ll (#86053)
Make this test case work with both the new and legacy pass managers. See also
#85215
2024-03-21 10:26:58 +08:00
Nikita Popov
07292b7203
[LIR][SCEVExpander] Restore original flags when aborting transform (#82362)
SCEVExpanderCleaner will currently remove instructions created by
SCEVExpander, but not restore poison generating flags that it may have
dropped. As such, running LIR can currently spuriously drop flags
without performing any transforms.

Fix this by keeping track of original instruction flags in SCEVExpander.

Fixes https://github.com/llvm/llvm-project/issues/82337.
2024-02-21 10:13:41 +01:00
Nikita Popov
fcd6549e58 [LIR] Add test for #82337 (NFC)
2024-02-20 14:42:40 +01:00
Nikita Popov
bec7181d5b [SCEVExpander] Don't use recursive expansion for ptr IV inc
Similar to the non-ptr case, directly create the getelementptr
instruction. Going through expandAddToGEP() no longer makes sense
with opaque pointers, where generating the necessary instruction
is trivial.

This avoids recursive expansion of (the SCEV of) StepV while the
IR is in an inconsistent state, in particular with an incomplete
IV phi node, which utilities may not be prepared to deal with.

Fixes https://github.com/llvm/llvm-project/issues/80954.
2024-02-07 11:27:26 +01:00
Nikita Popov
2d69827c5c [Transforms] Convert tests to opaque pointers (NFC)
2024-02-05 11:57:34 +01:00
paperchalice
e390c229a4
[Pass] Add hyphen to some pass names (#74287)
Here is the list of the renamed passes:
- `callbrprepare` -> `callbr-prepare`
- `dwarfehprepare` -> `dwarf-eh-prepare`
- `flattencfg` -> `flatten-cfg`
- `loweratomic` -> `lower-atomic`
- `lowerinvoke` -> `lower-invoke`
- `lowerswitch` -> `lower-switch`
- `winehprepare` -> `win-eh-prepare`
- `targetir` -> `target-ir`
- `targetlibinfo` -> `target-lib-info`

Legacy passes are not affected.
2024-01-25 16:05:54 +08:00
David Sherwood
fca6992be1
[AArch64] Fix a minor issue with AArch64LoopIdiomTransform (#78136)
I found another case where in the end block we could have a PHI that we
deal with incorrectly. The two incoming values are distinct: one of them
is the induction variable and the other is a value defined outside the
loop, e.g.

  %final_val = phi i32 [ %inc, %while.body ], [ %d, %while.cond ]

We won't correctly select between the two values in the new end block
that we create, and so we will get the wrong result.
2024-01-17 14:30:06 +00:00
David Sherwood
ccaf9e0bc0
[AArch64] Enable AArch64 loop idiom transform pass (#77480)
Following on from

https://github.com/llvm/llvm-project/pull/72273

which added the new AArch64 loop idiom transformation pass, this patch
enables the pass by default for AArch64.
2024-01-10 10:03:14 +00:00
David Sherwood
2c651e6c38
[AArch64] Fix regression introduced by c7148467fc08eefaaae876c7d11d629c849f42cf (#77467)
2024-01-09 13:22:28 +00:00
David Sherwood
c7148467fc
[AArch64] Add an AArch64 pass for loop idiom transformations (#72273)
We have added a new pass that looks for loops such as the following:

```
  while (++i != max_len)
      if (a[i] != b[i])
          break;

  ... use index i ...
```

Although similar to a memcmp, this is slightly different because instead
of returning the difference between the values of the first non-matching
pair of bytes, it returns the index of the first mismatch. As such, we
are not able to lower this to a memcmp call.
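A runnable C version of this loop makes the difference from memcmp concrete. It is a sketch that assumes the index is advanced with a pre-increment before each comparison, so position `i` itself is never compared. `first_mismatch` is a hypothetical name. Where memcmp only reports the sign of the difference at the first differing byte, this loop reports that byte's index.

```c
#include <stddef.h>

/* Scalar form of the byte-compare idiom: starting after index i, return the
 * index of the first byte where a and b differ, or max_len if every byte up
 * to max_len matches. Assumes i < max_len on entry; a[i] itself is never
 * compared because of the pre-increment. */
static size_t first_mismatch(const unsigned char *a, const unsigned char *b,
                             size_t i, size_t max_len) {
    while (++i != max_len)
        if (a[i] != b[i])
            break;
    return i;
}
```

With `a = "abcdef"` and `b = "abcxef"`, `first_mismatch(a, b, 0, 6)` returns 3. By contrast, `memcmp(a, b, 6)` only tells you that `a` sorts first.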

The new pass can now spot such idioms and transform them into a
specialised predicated loop that gives a significant performance
improvement for AArch64. It is intended as a stop-gap solution until
this can be handled by the vectoriser, which doesn't currently deal with
early exits.

This specialised loop makes use of a generic intrinsic that counts the
trailing zero elements in a predicate vector. This was added in
https://reviews.llvm.org/D159283 and for SVE we end up with brkb & incp
instructions.
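The following is a rough scalar model, offered as an illustrative assumption rather than the intrinsic's actual lowering. On a predicate vector, `llvm.experimental.cttz.elts` returns the number of trailing (lowest-numbered) false lanes. That is the index of the first true lane, or the lane count if no lane is set. Applied to a lane-wise `a[i] != b[i]` predicate, this gives the offset of the first mismatch within the vector. The sketch models the variant where an all-false input is not poison, and `cttz_elts` is a hypothetical name.

```c
#include <stdbool.h>
#include <stddef.h>

/* Scalar model of llvm.experimental.cttz.elts on a predicate vector: count
 * the trailing (lowest-numbered) false lanes. The result is the index of
 * the first true lane, or `lanes` when no lane is set. */
static size_t cttz_elts(const bool *pred, size_t lanes) {
    size_t n = 0;
    while (n < lanes && !pred[n])
        ++n;
    return n;
}
```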

Although we have added this pass only for AArch64, it was written in a
generic way so that in theory it could be used by other targets.
Currently the pass requires scalable vector support and needs to know
the minimum page size for the target, however it's possible to make it
work for fixed-width vectors too. Also, the llvm.experimental.cttz.elts
intrinsic used by the pass has generic lowering, but can be made
efficient for targets with instructions similar to SVE's brkb, cntp and
incp.

Original version of patch was posted on Phabricator:

 https://reviews.llvm.org/D158291

Patch co-authored by Kerry McLaughlin (@kmclaughlin-arm) and David
Sherwood (@david-arm)

See the original discussion on Discourse:

https://discourse.llvm.org/t/aarch64-target-specific-loop-idiom-recognition/72383
2024-01-09 11:29:28 +00:00
Yingwei Zheng
2c2de4b20e
[ValueTracking] Remove SPF support from computeKnownBitsFromOperator (#76630)
This patch removes redundant SPF (select pattern flavor) support
(5350e1b509) from `computeKnownBitsFromOperator`, as we always
canonicalize an SPF into an intrinsic call.

Compile-time improvement:
http://llvm-compile-time-tracker.com/compare.php?from=3dc0638cfc19e140daff7bf1281648daca8212fa&to=8771ef0749fb2ba4304dc68d418c88ec5769346f&stat=instructions:u

|stage1-O3|stage1-ReleaseThinLTO|stage1-ReleaseLTO-g|stage1-O0-g|stage2-O3|stage2-O0-g|stage2-clang|
|--|--|--|--|--|--|--|
|-0.01%|-0.01%|+0.01%|+0.00%|+0.01%|+0.04%|-0.01%|
2023-12-31 04:38:18 +08:00
Nikita Popov
eecb99c5f6 [Tests] Add disjoint flag to some tests (NFC)
These tests rely on SCEV recognizing an "or" with no common
bits as an "add". Add the disjoint flag to relevant "or" instructions
in preparation for switching SCEV to use the flag instead of the
ValueTracking query. The IR with disjoint flag matches what
InstCombine would produce.
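The property these tests rely on can be checked in plain C. Since `a + b` equals `(a | b) + (a & b)` modulo 2^32, an "or" of two values with no common bits is exactly an "add". `or_is_add` below is a hypothetical helper for illustration.

```c
#include <stdint.h>

/* Returns nonzero exactly when a | b == a + b. Because a + b equals
 * (a | b) + (a & b) (mod 2^32), this holds precisely when a and b share no
 * set bits, which is what the `disjoint` flag on an `or` asserts. */
static int or_is_add(uint32_t a, uint32_t b) {
    return (a | b) == (a + b);
}
```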
2023-12-05 14:09:36 +01:00
Jeremy Morse
d2d9dc8eb4
[DebugInfo][RemoveDIs] Make debugify pass convert to/from RemoveDIs mode (#73251)
Debugify is extremely useful as a testing and debugging tool, and a good
number of LLVM-IR transform tests use it. We need it to support "new"
non-instruction debug-info to get test coverage, but it's not important
enough to completely convert right now (and it'd be a large
undertaking). Thus: convert to/from dbg.value/DPValue mode on entry and
exit of the pass, which gives us the functionality without any further
work. The cost is compile-time, but again this is only happening during
tests.

Tested by: the large set of debugify tests enabled here. Note the
InstCombine test (cast-mul-select.ll) that hasn't been fully enabled:
this is because there's a debug-info sinking piece of code there that
hasn't been instrumented.
2023-11-29 13:19:50 +00:00
Philip Reames
f8742b8d6a
[SCEV] Teach SCEVExpander to use zext nneg when possible (#70815)
zext nneg was recently added to the IR in #67982. Teaching SCEVExpander
to emit nneg when possible is valuable since SCEV may have proved
non-trivial facts about loop bounds which would otherwise be lost when
materializing the value.
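The arithmetic fact behind `zext nneg` can be sketched in C. For a value known to be non-negative, zero-extension and sign-extension give the same wide result, which is why a `zext nneg` may also be treated as a `sext`. The helper names are hypothetical.

```c
#include <stdint.h>

/* Zero- and sign-extend a 32-bit value to 64 bits. The conversions through
 * unsigned types are well-defined in C for negative inputs as well. */
static uint64_t zext32(int32_t x) { return (uint64_t)(uint32_t)x; }
static uint64_t sext32(int32_t x) { return (uint64_t)(int64_t)x; }
```

`zext32(-1)` and `sext32(-1)` differ, which is exactly the case the flag rules out: a `zext nneg` of a negative value is poison.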
2023-10-31 09:33:07 -07:00
Philip Reames
6485978120 Refresh a couple of auto-gen tests [nfc]
Reducing spurious diff in an upcoming review.
2023-10-31 07:46:01 -07:00
Nikita Popov
97f1db2fdd [LoopIdiom] Use tryZExtValue() instead of getZExtValue()
To avoid an assertion for large BECounts.

I also suspect that this code is missing an overflow check.

Fixes https://github.com/llvm/llvm-project/issues/70008.
2023-10-24 11:05:42 +02:00
Jeremy Morse
1ce1732f82 [DebugInfo] Use getStableDebugLoc to pick IRBuilder DebugLocs
When IRBuilder is given an insertion position and there is debug-info, it
sets the DebugLoc of newly inserted instructions to the DebugLoc of the
insertion position. Unfortunately, that means if you insert in front of a
debug intrinsic, your "real" instructions get potentially misleading
source locations from the debug intrinsics. Worse, if you compile -gmlt to
get source locations but no variable locations, you'll get different source
locations to a normal -g build, which is silly.

Rectify this with the getStableDebugLoc method, which skips over any debug
intrinsics to find the next "real" instruction. This is the source location
that you would get if you compile with -gmlt, and it remains stable in the
presence of debug intrinsics. The changed tests show a few locations where
this has been happening, for example selecting line-zero locations for
instrumentation on a perfectly valid call site.

Differential Revision: https://reviews.llvm.org/D159485
2023-09-11 19:00:44 +01:00
Nikita Popov
69bd66b3ce [Tests] Remove some and/or constant expressions in tests (NFC)
In preparation for their removal in D158081.
2023-08-21 12:05:32 +02:00
Nikita Popov
174300a283 [LoopIdiom] Regenerate test checks (NFC)
2023-07-21 10:12:05 +02:00
William S. Moses
3eb6fefb97 [LoopIdiom] Preserve alias information for memset_pattern
TBAA/NoAlias/AliasScope and other information is currently preserved
when upgrading to a memcpy/memset. However, this is missing when upgrading to
the macOS memset_pattern function. This patch adds the same alias information
preservation to memset_pattern.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D152934
2023-06-14 16:14:53 -04:00
luxufan
e9ddb584e8 [LoopIdiom] Freeze BitPos if !isGuaranteedNotToBeUndefOrPoison
Fixes: https://github.com/llvm/llvm-project/issues/62873

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D151690
2023-06-07 14:50:22 +08:00
Nikita Popov
d5c56c5162 [SCEVExpander] Remember phi nodes inserted by LCSSA construction
SCEVExpander keeps track of all instructions it inserted. However,
it currently misses some phi nodes created during LCSSA construction.
Fix this by collecting these into another argument.

This also removes the IRBuilder argument, which was added for
essentially the same purpose, but only handles the root LCSSA nodes,
not those inserted by SSAUpdater.

This was reported as a regression on D149344, but the reduced test
case also reproduces without it.

Differential Revision: https://reviews.llvm.org/D150681
2023-05-25 09:34:19 +02:00
Tobias Hieta
f84bac329b
[NFC][Py Reformat] Reformat lit.local.cfg python files in llvm
This is a follow-up to b71edfaa4ec3c998aadb35255ce2f60bba2940b0
since I forgot the lit.local.cfg files in that one.

Reformatting is done with `black`.

If you end up having problems merging this commit because you
have made changes to a python file, the best way to handle that
is to run `git checkout --ours <yourfile>` and then reformat it
with black.

If you run into any problems, post to discourse about it and
we will try to help.

RFC Thread below:

https://discourse.llvm.org/t/rfc-document-and-standardize-python-code-style

Reviewed By: barannikov88, kwk

Differential Revision: https://reviews.llvm.org/D150762
2023-05-17 17:03:15 +02:00
OCHyams
72776850ed Revert "[DebugInfo] Print empty MDTuples wrapped in MetadataAsValue inline"
This reverts commit 1e6fe677f8aa98518e05218affa16e468819f5ed (D140900).

Buildbot: https://lab.llvm.org/buildbot/#/builders/196/builds/29937
2023-04-25 14:37:25 +01:00
OCHyams
1e6fe677f8 [DebugInfo] Print empty MDTuples wrapped in MetadataAsValue inline
This improves the readability of debugging intrinsics. Instead of:

    call void @llvm.dbg.value(metadata !2, ...)
    !2 = !{}

We will see:

    call void @llvm.dbg.value(metadata !{}, ...)
    !2 = !{}

Note that we still get a numbered metadata entry for the node even if it's not
used elsewhere. This is to avoid adding more context to the print functions.

This is already legal IR - LLVM can parse and understand it - so there is no
need to update the parser.

The next patches in this stack will make such empty metadata operands more
common and semantically important.

Related to https://discourse.llvm.org/t/auto-undef-debug-uses-of-a-deleted-value

Reviewed By: StephenTozer

Differential Revision: https://reviews.llvm.org/D140900
2023-04-25 14:13:47 +01:00
Craig Topper
8bba57b1f1 [LoopIdiomRecognize] Remove NUW flag from SCEV in getTripCount.
Based on the conversation in D147355.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D148170
2023-04-13 11:58:10 -07:00
Tim Northover
150595ab4b LoopIdiom: avoid patterned memset if constant is not relocatable.
The pattern we're using for the memset_pattern* call gets put into the
initializer of a static global variable, which means it has to be
representable with relocations on the target. Most `ConstantExpr` instances
do not satisfy this constraint, so avoid all of them for now.
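To make the constraint concrete, here is a C sketch of the transform, assuming a 4-byte stored value. The loop stores a value whose bytes differ, so a byte memset cannot express it. The patterned form instead reads the repeated value from a static constant. `portable_memset_pattern16` is a hypothetical portable stand-in for Darwin's `memset_pattern16`, not the real libc function.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Loop shape suited to a patterned memset: every element gets the same
 * 4-byte value, but its bytes differ, so a plain byte memset cannot do it. */
static void fill_loop(uint32_t *p, size_t n) {
    for (size_t i = 0; i < n; ++i)
        p[i] = 0x01020304u;
}

/* Hypothetical portable stand-in for memset_pattern16: fill len bytes at
 * dst by repeating a 16-byte pattern, truncating the final copy. */
static void portable_memset_pattern16(void *dst, const void *pattern16,
                                      size_t len) {
    unsigned char *d = dst;
    while (len >= 16) {
        memcpy(d, pattern16, 16);
        d += 16;
        len -= 16;
    }
    memcpy(d, pattern16, len); /* trailing partial copy */
}

/* Transformed form: the pattern lives in a static constant whose
 * initializer must be emittable (with relocations, if it held addresses). */
static void fill_pattern(uint32_t *p, size_t n) {
    static const uint32_t pattern[4] = {0x01020304u, 0x01020304u,
                                        0x01020304u, 0x01020304u};
    portable_memset_pattern16(p, pattern, n * sizeof *p);
}
```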
2023-01-12 18:53:07 +00:00
Nikita Popov
7a752e8108 [LoopIdiom] Convert tests to opaque pointers (NFC)
The differences here are due to SCEVExpander producing GEPs with
explicit offset calculation, a known difference with opaque pointers.
2023-01-06 11:36:37 +01:00
Nikita Popov
89f1876b61 [LoopIdiom] Name instructions in test (NFC)
2023-01-06 11:07:57 +01:00
Nikita Popov
055fb7795a [Transforms] Convert some tests to opaque pointers (NFC)
These are all tests where conversion worked automatically, and
required no manual fixup.
2023-01-05 12:43:45 +01:00
Roman Lebedev
45fcdaf6b6
[NFC] Port all LoopIdiom tests to -passes= syntax
2022-12-08 02:38:46 +03:00
Roman Lebedev
48c6b2729e
[NFC] Port all LoopIdiom tests to -passes= syntax
2022-12-07 23:15:16 +03:00
Arthur Eubanks
f3a928e233 [opt] Don't translate legacy -analysis flag to require<analysis>
Tests relying on this should explicitly use -passes='require<analysis>,foo'.
2022-10-07 14:54:34 -07:00
Simon Pilgrim
37dc4373aa [LoopIdiom] Add non-LZCNT target test coverage
2022-09-19 18:13:11 +01:00
Simon Pilgrim
6b4d409f69 [CostModel][X86] Add CostKinds handling for CTLZ_ZERO_UNDEF/CTTZ_ZERO_UNDEF instructions
This was achieved with the 'cost-tables vs llvm-mca' script D103695
2022-09-19 17:37:58 +01:00
Simon Pilgrim
95c2c9c5c5 [LoopIdiom][X86] Add non-LZCNT test coverage to 'rshift until zero' idiom tests
2022-09-16 17:23:54 +01:00