104 Commits

Author SHA1 Message Date
Yingwei Zheng
84b31581f8
Revert "[PatternMatch] Add m_[Shift]OrSelf matchers." (#152953)
Reverts llvm/llvm-project#152924
According to
f67668b586,
it is not an NFC change.
2025-08-11 09:35:16 +02:00
Yingwei Zheng
1c499351d6
[PatternMatch] Add m_[Shift]OrSelf matchers. (#152924)
Address the comment
https://github.com/llvm/llvm-project/pull/147414/files#r2228612726.
As they are usually used to match integer packing patterns, it is enough
to handle constant shamts.
2025-08-11 09:58:16 +08:00
David Green
8f968fe3ec
[AggressiveInstCombine] Make cttz fold more resiliant to non-array geps (#150896)
Similar to #150639 this fixes the AggressiveInstCombine fold for convert
tables to cttz instructions if the gep types are not array types. i.e
`gep i16 @glob, i64 %idx` instead of `gep [64 x i16] @glob, i64 0, i64 %idx`.
2025-07-31 16:53:55 +01:00
Nikita Popov
b17f4d3366
[AggressiveInstCombine] Use AA during store merge (#149992)
This is a small extension of #147540, resolving one of the FIXMEs.
Instead of bailing out on any instruction that may read/write memory,
use AA to check whether it can alias the stored parts. Do this using a
crude check based on the underlying object only.

This pattern occurs rarely in practice, but at the same time it also doesn't
seem to add any compile-time cost, so it's probably worth handling.
2025-07-23 10:09:58 +02:00
Nikita Popov
07527596f3
[AggressiveInstCombine] Support store merge with non-consecutive parts (#149807)
This is a minor extension of #147540, resolving one of the FIXMEs. If
the collected parts contain some non-consecutive elements, we can still
handle smaller ranges that *are* consecutive.

This is not common in practice and mostly shows up when the same value
is stored at two different offsets.
2025-07-22 10:15:04 +02:00
Nikita Popov
00d3b39f17
[AggressiveInstCombine] Implement store merge optimization (#147540)
Merge multiple small stores that were originally extracted from one
value into a single store.

This is the store equivalent of the load merge optimization that
AggressiveInstCombine already performs.

This implementation is something of an MVP, with various generalizations
possible.

Fixes https://github.com/llvm/llvm-project/issues/147456.
2025-07-21 10:03:26 +02:00
Jeremy Morse
9eb0020555
[DebugInfo][RemoveDIs] Remove a swathe of debug-intrinsic code (#144389)
Seeing how we can't generate any debug intrinsics any more: delete a
variety of codepaths where they're handled. For the most part these are
plain deletions, in others I've tweaked comments to remain coherent, or
added a type to (what was) type-generic-lambdas.

This isn't all the DbgInfoIntrinsic call sites but it's most of the
simple scenarios.

Co-authored-by: Nikita Popov <github@npopov.com>
2025-06-17 15:55:14 +01:00
Craig Topper
96336b2330
[AggressiveInstCombine] Improve popcount matching if the input has known zero bits (#142501)
If the input has known zero bits, InstCombine may have simplied one
of the expected And masks. Teach AggressiveInstCombine to use
MaskedValueIsZero to recover these missing bits.
    
Fixes #142042.
2025-06-03 12:56:53 -07:00
Ramkumar Ramachandra
b40e4ceaa6
[ValueTracking] Make Depth last default arg (NFC) (#142384)
Having a finite Depth (or recursion limit) for computeKnownBits is very
limiting, but is currently a load-bearing necessity, as all KnownBits
are recomputed on each call and there is no caching. As a prerequisite
for an effort to remove the recursion limit altogether, either using a
clever caching technique, or writing a easily-invalidable KnownBits
analysis, make the Depth argument in APIs in ValueTracking uniformly the
last argument with a default value. This would aid in removing the
argument when the time comes, as many callers that currently pass 0
explicitly are now updated to omit the argument altogether.
2025-06-03 17:12:24 +01:00
Ramkumar Ramachandra
111ac69876
[AggressiveInstCombine] Check GEP nusw, not inbounds (#139708) 2025-05-22 17:43:29 +01:00
David Green
671cef029f
[AggressiveInstcombine] Fold away shift in or reduction chain. (#137875)
If we have `icmp eq or(a, shl(b)), 0` then the shift can be removed so
long as it is nuw or nsw. It is still comparing that some bits are
non-zero.
https://alive2.llvm.org/ce/z/nhrBVX.

This is also true of ne, and true for longer or chains.
2025-05-13 10:33:38 +01:00
Stephen Tozer
19ee7ffdac
[AggrInstCombine][DebugInfo] Propagate DILocation for inlined memchr (#134808)
When AggressiveInstCombine replaces a memchr with a switch instruction,
it currently drops the DILocation for that memchr. This patch changes
this, propagating the memchr DILocation to all the generated
instructions that replace it.

Found using https://github.com/llvm/llvm-project/pull/107279.
2025-04-08 18:54:35 +01:00
Zhenyang Xu
a98707e285
[AggressiveInstCombine] Merge consecutive loads of mixed sizes (#129263)
Proof: https://alive2.llvm.org/ce/z/r7M-Sf
Closes: #128134
2025-03-05 18:56:00 +08:00
Yingwei Zheng
a77346bad0
[IRBuilder] Refactor FMF interface (#121657)
Up to now, the only way to set specified FMF flags in IRBuilder is to
use `FastMathFlagGuard`. It makes the code ugly and hard to maintain.

This patch introduces a helper class `FMFSource` to replace the original
parameter `Instruction *FMFSource` in IRBuilder. To maximize the
compatibility, it accepts an instruction or a specified FMF.
This patch also removes the use of `FastMathFlagGuard` in some simple
cases.

Compile-time impact:
https://llvm-compile-time-tracker.com/compare.php?from=f87a9db8322643ccbc324e317a75b55903129b55&to=9397e712f6010be15ccf62f12740e9b4a67de2f4&stat=instructions%3Au
2025-01-06 14:37:04 +08:00
Alex MacLean
56e944bede
[NFC] add anonymous namespace to a couple classes (#121511)
This ensures these classes are visible only to the appropriate
translation unit and allows for more optimizations.
2025-01-02 20:13:18 -08:00
Antonio Frighetto
f68b0e3699 [AggressiveInstCombine] Use APInt and avoid truncation when folding loads
A miscompilation issue has been addressed with improved handling.

Fixes: https://github.com/llvm/llvm-project/issues/118467.
2024-12-04 10:20:14 +01:00
Jay Foad
85c17e4092
[LLVM] Make more use of IRBuilder::CreateIntrinsic. NFC. (#112706)
Convert many instances of:
  Fn = Intrinsic::getOrInsertDeclaration(...);
  CreateCall(Fn, ...)
to the equivalent CreateIntrinsic call.
2024-10-17 16:20:43 +01:00
Rahul Joshi
fa789dffb1
[NFC] Rename Intrinsic::getDeclaration to getOrInsertDeclaration (#111752)
Rename the function to reflect its correct behavior and to be consistent
with `Module::getOrInsertFunction`. This is also in preparation of
adding a new `Intrinsic::getDeclaration` that will have behavior similar
to `Module::getFunction` (i.e, just lookup, no creation).
2024-10-11 05:26:03 -07:00
Jeremy Morse
96f37ae453
[NFC] Use initial-stack-allocations for more data structures (#110544)
This replaces some of the most frequent offenders of using a DenseMap that
cause a malloc, where the typical element-count is small enough to fit in
an initial stack allocation.

Most of these are fairly obvious, one to highlight is the collectOffset
method of GEP instructions: if there's a GEP, of course it's going to have
at least one offset, but every time we've called collectOffset we end up
calling malloc as well for the DenseMap in the MapVector.
2024-09-30 23:15:18 +01:00
Stephen Tozer
40d6497a97
[DebugInfo] Transfer strcmp DILocation to generated inline code (#108531)
When AggressiveInstCombine inlines a strcmp call, we currently copy the
strcmp's DILocation only to the br instruction that jumps to the inline
code. While this is roughly analogous to the original call, it leaves
the generated code without any source location, which is precarious for
a memory operation. This patch copies the strcmp call's DILocation to
all the generated code.

An alternative solution would be to generate a new DILocation with a
line 0 location and an inlinedAt pointing to the original call location,
but this would still give limited attribution to the generated code
without traversing the DIE, whereas the submitted solution allows
attribution with just the line table; even though it would be
technically more accurate, pragmatically I believe that copying the
call's location will be more useful for users.
2024-09-23 15:39:44 +01:00
Yingwei Zheng
62e9f40949
[PatternMatch] Use m_SpecificCmp matchers. NFC. (#100878)
Compile-time improvement:
http://llvm-compile-time-tracker.com/compare.php?from=13996378d81c8fa9a364aeaafd7382abbc1db83a&to=861ffa4ec5f7bde5a194a7715593a1b5359eb581&stat=instructions:u
baseline: 803eaf29267c6aae9162d1a83a4a2ae508b440d3
```
Top 5 improvements:
  stockfish/movegen.ll 2541620819 2538599412 -0.12%
  minetest/profiler.cpp.ll 431724935 431246500 -0.11%
  abc/luckySwap.c.ll 581173720 580581935 -0.10%
  abc/kitTruth.c.ll 2521936288 2519445570 -0.10%
  abc/extraUtilTruth.c.ll 1216674614 1215495502 -0.10%
Top 5 regressions:
  openssl/libcrypto-shlib-sm4.ll 1155054721 1155943201 +0.08%
  openssl/libcrypto-lib-sm4.ll 1155054838 1155943063 +0.08%
  spike/vsm4r_vv.ll 1296430080 1297039258 +0.05%
  spike/vsm4r_vs.ll 1312496906 1313093460 +0.05%
  nuttx/lib_rand48.c.ll 126201233 126246692 +0.04%
Overall: -0.02112308%
```
2024-07-29 10:04:06 +08:00
Yingwei Zheng
f58cfacfaf
[AggressiveInstCombine] Expand memchr with small constant strings (#98501)
This patch converts memchr with a small constant string into a switch.
It will reduce overhead of libcall and enable more folds (e.g.,
comparing the result with null).

References: https://en.cppreference.com/w/c/string/byte/memchr
2024-07-17 00:25:36 +08:00
Nikita Popov
74deadf196
[IRBuilder] Don't include Module.h (NFC) (#97159)
This used to be necessary to fetch the DataLayout, but isn't anymore.
2024-06-29 15:05:04 +02:00
Nikita Popov
9df71d7673
[IR] Add getDataLayout() helpers to Function and GlobalValue (#96919)
Similar to https://github.com/llvm/llvm-project/pull/96902, this adds
`getDataLayout()` helpers to Function and GlobalValue, replacing the
current `getParent()->getDataLayout()` pattern.
2024-06-28 08:36:49 +02:00
Stephen Tozer
d75f9dd1d2 Revert "[IR][NFC] Update IRBuilder to use InsertPosition (#96497)"
Reverts the above commit, as it updates a common header function and
did not update all callsites:

  https://lab.llvm.org/buildbot/#/builders/29/builds/382

This reverts commit 6481dc57612671ebe77fe9c34214fba94e1b3b27.
2024-06-24 18:00:22 +01:00
Stephen Tozer
6481dc5761
[IR][NFC] Update IRBuilder to use InsertPosition (#96497)
Uses the new InsertPosition class (added in #94226) to simplify some of
the IRBuilder interface, and removes the need to pass a BasicBlock
alongside a BasicBlock::iterator, using the fact that we can now get the
parent basic block from the iterator even if it points to the sentinel.
This patch removes the BasicBlock argument from each constructor or call
to setInsertPoint.

This has no functional effect, but later on as we look to remove the
`Instruction *InsertBefore` argument from instruction-creation
(discussed
[here](https://discourse.llvm.org/t/psa-instruction-constructors-changing-to-iterator-only-insertion/77845)),
this will simplify the process by allowing us to deprecate the
InsertPosition constructor directly and catch all the cases where we use
instructions rather than iterators.
2024-06-24 17:27:43 +01:00
Franklin Zhang
1241e7692a
[AggressiveInstCombine] Fix strncmp inlining (#91204)
Fix the issue that `char` constants are converted to `uint64_t` in the
wrong way when doing the inlining.
2024-05-06 22:04:35 +08:00
Franklin Zhang
6b948705a0
[AggressiveInstCombine] Inline strcmp/strncmp (#89371)
Inline calls to strcmp(s1, s2) and strncmp(s1, s2, N), where N and
exactly one of s1 and s2 are constant.

For example:

```c
int res = strcmp(s, "ab");
```

is converted to

```c
int res = (int)s[0] - (int)'a';
if (res != 0)
  goto END;
res = (int)s[1] - (int)'b';
if (res != 0)
  goto END;
res = (int)s[2] - (int)'\0';
END:
```

Ported from a similar gcc feature [Inline strcmp with small constant
strings](https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78809).
2024-05-03 13:24:38 +09:00
Yingwei Zheng
930996e9e4
[ValueTracking][NFC] Pass SimplifyQuery to computeKnownFPClass family (#80657)
This patch refactors the interface of the `computeKnownFPClass` family
to pass `SimplifyQuery` directly.
The motivation of this patch is to compute known fpclass with
`DomConditionCache`, which was introduced by
https://github.com/llvm/llvm-project/pull/73662. With
`DomConditionCache`, we can do more optimization with context-sensitive
information.

Example (extracted from
[fmt/format.h](e17bc67547/include/fmt/format.h (L3555-L3566))):
```
define float @test(float %x, i1 %cond) {
  %i32 = bitcast float %x to i32
  %cmp = icmp slt i32 %i32, 0
  br i1 %cmp, label %if.then1, label %if.else

if.then1:
  %fneg = fneg float %x
  br label %if.end

if.else:
  br i1 %cond, label %if.then2, label %if.end

if.then2:
  br label %if.end

if.end:
  %value = phi float [ %fneg, %if.then1 ], [ %x, %if.then2 ], [ %x, %if.else ]
  %ret = call float @llvm.fabs.f32(float %value)
  ret float %ret
}
```
We can prove the signbit of `%value` is always zero. Then the fabs can
be eliminated.
2024-02-06 02:30:12 +08:00
Nikita Popov
6c2fbc3a68
[IRBuilder] Add CreatePtrAdd() method (NFC) (#77582)
This abstracts over the common pattern of creating a gep with i8 element
type.
2024-01-12 14:21:21 +01:00
Mikael Holmen
ce0a750fe4 [AggressiveInstCombine] Ignore debug instructions when load combining (#70200)
We previously included debug instructions when counting instructions when
looking for loads to combine. This meant that the presence of debug
instructions could affect optimization, as shown in the updated testcase.

This fixes #69925.
2023-10-26 09:58:54 +02:00
Craig Topper
e9e458418f [AggressiveInstCombine] Improve line breaks in comment. NFC
The comments contain IR where some instructions don't fit in
80 columns. The extra part of the line was placed in front of
the next IR instruction instead of on its own line.
2023-08-25 10:08:23 -07:00
Alexander Kornienko
0b779b0daa Revert "[AggressiveInstCombine] Fold strcmp for short string literals"
This reverts commit 5dde755188e34c0ba5304365612904476c8adfda,
cbfcf90152de5392a36d0a0241eef25f5e159eef and
8981520b19f2d2fe3d2bc80cf26318ee6b5b7473 due to a miscompile introduced in
8981520b19f2d2fe3d2bc80cf26318ee6b5b7473 (see
https://reviews.llvm.org/D154725#4568845 for details)

Differential Revision: https://reviews.llvm.org/D157430
2023-08-08 22:53:45 +02:00
Maksim Kita
5dde755188 [AggressiveInstCombine][NFC] Fix typo
AggressiveInstCombine fix typo in expandStrcmp method.

Differential Revision: https://reviews.llvm.org/D156556
2023-08-07 21:51:44 +03:00
David Green
aa97f6b494 [AIC] Fix the sext cost operands in tryToFPToSat
As pointed out in D125755 the operand of a call to getCastInstrCost had the Src
and Dst the wrong way around.

Differential Revision: https://reviews.llvm.org/D154841
2023-08-07 09:33:18 +01:00
Maksim Kita
cbfcf90152 [AggressiveInstCombine] Fold strcmp for short string literals with size 2
Fold strcmp for short string literals with size 2.
Depends D155742.

Differential Revision: https://reviews.llvm.org/D155743
2023-07-27 18:45:21 +03:00
Maksim Kita
8981520b19 [AggressiveInstCombine] Fold strcmp for short string literals
Fold strcmp() against 1-char string literals.

This designates AggressiveInstCombine as the pass for libcalls
simplifications that may need to change the control flow graph.

Fixes https://github.com/llvm/llvm-project/issues/58003.

Differential Revision: https://reviews.llvm.org/D154725
2023-07-19 17:12:27 +02:00
Matt Arsenault
6640df94f9 ValueTracking: Remove CannotBeOrderedLessThanZero
Replace the last user of CannotBeOrderedLessThanZero with new
version. Makes assumes work in this case.
2023-07-11 20:42:18 -04:00
Youngsuk Kim
d22a236ae7 [llvm] Replace use of Type::getPointerTo() (NFC)
Partial progress towards replacing in-tree uses of
`Type::getPointerTo()`.

If `getPointerTo()` is used solely to support an unnecessary bitcast,
remove the bitcast.

Reviewed By: barannikov88, nikic

Differential Revision: https://reviews.llvm.org/D153307
2023-06-23 22:32:29 -04:00
bipmis
cbc50ba12e [AggressiveInstCombine] Handle the nested GEP/BitCast scenario in Load Merge.
This seems to be an issue currently where there are nested/chained GEP/BitCast Pointers.
The patch generates a new GEP for the wider load to avoid dominance problems.

Differential Revision: https://reviews.llvm.org/D150864
2023-05-24 10:36:11 +01:00
Matt Arsenault
86d0b524f3 ValueTracking: Expand signature of isKnownNeverInfinity/NaN
This is in preparation for replacing the implementation
with a wrapper around computeKnownFPClass.
2023-05-16 20:42:58 +01:00
khei4
39a0677784 [AggressiveInstCombine] folding load for constant global patterened arrays and structs by GEP-indices
Differential Revision: https://reviews.llvm.org/D146622
    Fixes https://github.com/llvm/llvm-project/issues/61615
    Reviewed By: nikic
2023-05-12 19:02:28 +09:00
Jordan Rupprecht
e08c397a88 Revert "[AggressiveInstCombine] folding load for constant global patterened arrays and structs by GEP-indices Differential Revision: https://reviews.llvm.org/D146622 Fixes https://github.com/llvm/llvm-project/issues/61615"
This reverts commit 0574a4be879e07b48ba9be8d63eebba49a04dfe8. It causes a compiler crash due to a div by zero.
2023-05-09 10:38:46 -07:00
khei4
0574a4be87 [AggressiveInstCombine] folding load for constant global patterened arrays and structs by GEP-indices Differential Revision: https://reviews.llvm.org/D146622 Fixes https://github.com/llvm/llvm-project/issues/61615 2023-05-09 23:22:21 +09:00
Arthur Eubanks
ed443d81d1 [AggressiveInstCombine] Only fold consecutive shifts of loads with constant shift amounts
This is what the code assumed but never actually checked.

Fixes https://github.com/llvm/llvm-project/issues/62509.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D149896
2023-05-04 13:52:25 -07:00
khei4
cde1c3c014 [AggressiveInstCombine] use m_Deferred on funnel shift(NFC)
Differential Revision: https://reviews.llvm.org/D146798
Reviewed By: nikic
2023-03-24 21:49:02 +09:00
khei4
434b0badb5 [AggressiveInstCombine] folding load for constant global patterened arrays and structs by alignment
Differential Revision: https://reviews.llvm.org/D144445
Reviewed By: nikic

fix: wrong arrow
2023-03-23 23:31:22 +09:00
chenglin.bi
76df706bca Revert "[LogicCombine 1/?] Implement a general way to simplify logical operations."
This reverts commit 97dcbea63e11d566cff0cd3a758cf1114cf1f633.
2023-03-14 09:00:06 +08:00
chenglin.bi
97dcbea63e [LogicCombine 1/?] Implement a general way to simplify logical operations.
This patch involves boolean ring to simplify logical operations. We can treat `&` as ring multiplication and `^` as ring addition.
So we need to canonicalize all other operations to `*` `+`. Like:
```
a & b -> a * b
a ^ b -> a + b
~a -> a + 1
a | b -> a * b + a + b
c ? a : b -> c * a + (c + 1) * b
```
In the code, we use a mask set to represent an expression. Every value that is not comes from logical operations could be a bit in the mask.
The mask itself is a multiplication chain. The mask set is an addiction chain.
We can calculate two expressions based on boolean algebras.

For now, the initial patch only enabled on and/or/xor,  Later we can enhance the code step by step.

Reference: https://en.wikipedia.org/wiki/Boolean_ring

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D142803
2023-03-02 20:46:16 +08:00
Paul Walker
f53234cbfd [AggressiveInstCombine] Fix invalid TypeSize conversion when combining loads.
Much of foldLoadsRecursive relies on knowing the size of loaded
data, which is not possible for scalable vector types.  However,
the logic of combining two small loads into one bigger load does
not apply for vector types so rather than converting the algorithm
to use TypeSize I've simply added an early exit for vectors.

Fixes #59510

Differential Revision: https://reviews.llvm.org/D140106
2022-12-17 15:34:27 +00:00