Rename the function to reflect its correct behavior and to be consistent
with `Module::getOrInsertFunction`. This is also in preparation of
adding a new `Intrinsic::getDeclaration` that will have behavior similar
to `Module::getFunction` (i.e, just lookup, no creation).
This replaces some of the most frequent offenders of using a DenseMap that
cause a malloc, where the typical element-count is small enough to fit in
an initial stack allocation.
Most of these are fairly obvious, one to highlight is the collectOffset
method of GEP instructions: if there's a GEP, of course it's going to have
at least one offset, but every time we've called collectOffset we end up
calling malloc as well for the DenseMap in the MapVector.
When AggressiveInstCombine inlines a strcmp call, we currently copy the
strcmp's DILocation only to the br instruction that jumps to the inline
code. While this is roughly analogous to the original call, it leaves
the generated code without any source location, which is precarious for
a memory operation. This patch copies the strcmp call's DILocation to
all the generated code.
An alternative solution would be to generate a new DILocation with a
line 0 location and an inlinedAt pointing to the original call location,
but this would still give limited attribution to the generated code
without traversing the DIE, whereas the submitted solution allows
attribution with just the line table; even though it would be
technically more accurate, pragmatically I believe that copying the
call's location will be more useful for users.
This patch converts memchr with a small constant string into a switch.
It will reduce overhead of libcall and enable more folds (e.g.,
comparing the result with null).
References: https://en.cppreference.com/w/c/string/byte/memchr
Uses the new InsertPosition class (added in #94226) to simplify some of
the IRBuilder interface, and removes the need to pass a BasicBlock
alongside a BasicBlock::iterator, using the fact that we can now get the
parent basic block from the iterator even if it points to the sentinel.
This patch removes the BasicBlock argument from each constructor or call
to setInsertPoint.
This has no functional effect, but later on as we look to remove the
`Instruction *InsertBefore` argument from instruction-creation
(discussed
[here](https://discourse.llvm.org/t/psa-instruction-constructors-changing-to-iterator-only-insertion/77845)),
this will simplify the process by allowing us to deprecate the
InsertPosition constructor directly and catch all the cases where we use
instructions rather than iterators.
Inline calls to strcmp(s1, s2) and strncmp(s1, s2, N), where N and
exactly one of s1 and s2 are constant.
For example:
```c
int res = strcmp(s, "ab");
```
is converted to
```c
int res = (int)s[0] - (int)'a';
if (res != 0)
goto END;
res = (int)s[1] - (int)'b';
if (res != 0)
goto END;
res = (int)s[2] - (int)'\0';
END:
```
Ported from a similar gcc feature [Inline strcmp with small constant
strings](https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78809).
This patch refactors the interface of the `computeKnownFPClass` family
to pass `SimplifyQuery` directly.
The motivation of this patch is to compute known fpclass with
`DomConditionCache`, which was introduced by
https://github.com/llvm/llvm-project/pull/73662. With
`DomConditionCache`, we can do more optimization with context-sensitive
information.
Example (extracted from
[fmt/format.h](e17bc67547/include/fmt/format.h (L3555-L3566))):
```
define float @test(float %x, i1 %cond) {
%i32 = bitcast float %x to i32
%cmp = icmp slt i32 %i32, 0
br i1 %cmp, label %if.then1, label %if.else
if.then1:
%fneg = fneg float %x
br label %if.end
if.else:
br i1 %cond, label %if.then2, label %if.end
if.then2:
br label %if.end
if.end:
%value = phi float [ %fneg, %if.then1 ], [ %x, %if.then2 ], [ %x, %if.else ]
%ret = call float @llvm.fabs.f32(float %value)
ret float %ret
}
```
We can prove the signbit of `%value` is always zero. Then the fabs can
be eliminated.
We previously included debug instructions when counting instructions when
looking for loads to combine. This meant that the presence of debug
instructions could affect optimization, as shown in the updated testcase.
This fixes#69925.
The comments contain IR where some instructions don't fit in
80 columns. The extra part of the line was placed in front of
the next IR instruction instead of on its own line.
This reverts commit 5dde755188e34c0ba5304365612904476c8adfda,
cbfcf90152de5392a36d0a0241eef25f5e159eef and
8981520b19f2d2fe3d2bc80cf26318ee6b5b7473 due to a miscompile introduced in
8981520b19f2d2fe3d2bc80cf26318ee6b5b7473 (see
https://reviews.llvm.org/D154725#4568845 for details)
Differential Revision: https://reviews.llvm.org/D157430
As pointed out in D125755 the operand of a call to getCastInstrCost had the Src
and Dst the wrong way around.
Differential Revision: https://reviews.llvm.org/D154841
Partial progress towards replacing in-tree uses of
`Type::getPointerTo()`.
If `getPointerTo()` is used solely to support an unnecessary bitcast,
remove the bitcast.
Reviewed By: barannikov88, nikic
Differential Revision: https://reviews.llvm.org/D153307
This seems to be an issue currently where there are nested/chained GEP/BitCast Pointers.
The patch generates a new GEP for the wider load to avoid dominance problems.
Differential Revision: https://reviews.llvm.org/D150864
This patch involves boolean ring to simplify logical operations. We can treat `&` as ring multiplication and `^` as ring addition.
So we need to canonicalize all other operations to `*` `+`. Like:
```
a & b -> a * b
a ^ b -> a + b
~a -> a + 1
a | b -> a * b + a + b
c ? a : b -> c * a + (c + 1) * b
```
In the code, we use a mask set to represent an expression. Every value that is not comes from logical operations could be a bit in the mask.
The mask itself is a multiplication chain. The mask set is an addiction chain.
We can calculate two expressions based on boolean algebras.
For now, the initial patch only enabled on and/or/xor, Later we can enhance the code step by step.
Reference: https://en.wikipedia.org/wiki/Boolean_ring
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D142803
Much of foldLoadsRecursive relies on knowing the size of loaded
data, which is not possible for scalable vector types. However,
the logic of combining two small loads into one bigger load does
not apply for vector types so rather than converting the algorithm
to use TypeSize I've simply added an early exit for vectors.
Fixes#59510
Differential Revision: https://reviews.llvm.org/D140106
This patch updates the load insert point of the merged load in AggressiveInstCombine().
This is done to handle the reported test breaks by handling Alias Analysis correctly.
Differential Revision: https://reviews.llvm.org/D137201
A target can return if a misaligned access is 'fast' as defined
by the target or not. In reality there can be different levels
of 'fast' and 'slow'. This patch changes the boolean 'Fast'
argument of the allowsMisalignedMemoryAccesses family of functions
to an unsigned representing its speed.
A target can still define it as it wants and the direct translation
of the current code uses 0 and 1 for current false and true. This
makes the change an NFC.
Subsequent patch will start using an actual value of speed in
the load/store vectorizer to compare if a vectorized access going
to be not just fast, but not slower than before.
Differential Revision: https://reviews.llvm.org/D124217
As part of legacy PM optimization pipeline removal.
This shouldn't be used in codegen pipelines so it should be ok to remove.
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D137116
This patch is to address the test cases in which the load has to be inserted at a right point. This happens when there is a store b/w the loads.
This patch reverts the loads merge in all cases when stores are present b/w loads and will eventually be replaced with proper fix and test cases.
Differential Revision: https://reviews.llvm.org/D137333
This patch extends the load merge/widen in AggressiveInstCombine() to handle reverse load patterns.
Differential Revision: https://reviews.llvm.org/D135137
In the case of non-opaque pointers, when combining consecutive loads,
need to bitcast the pointer source to the combined type size, otherwise
asserts are triggered.
Differential Revision: https://reviews.llvm.org/D135249
The patch simplifies some of the patterns as below
1. (ZExt(L1) << shift1) | (ZExt(L2) << shift2) -> ZExt(L3) << shift1
2. (ZExt(L1) << shift1) | ZExt(L2) -> ZExt(L3)
The pattern is indicative of the fact that the loads are being merged to a wider load and the only use of this pattern is with a wider load. In this case for a non-atomic/non-volatile loads reduce the pattern to a combined load which would improve the cost of inlining, unrolling, vectorization etc.
Fix the error reported on reverse load merge.
Differential Revision: https://reviews.llvm.org/D127392
The patch simplifies some of the patterns as below
1. (ZExt(L1) << shift1) | (ZExt(L2) << shift2) -> ZExt(L3) << shift1
2. (ZExt(L1) << shift1) | ZExt(L2) -> ZExt(L3)
The pattern is indicative of the fact that the loads are being merged to a wider load and the only use of this pattern is with a wider load. In this case for a non-atomic/non-volatile loads reduce the pattern to a combined load which would improve the cost of inlining, unrolling, vectorization etc.
Differential Revision: https://reviews.llvm.org/D127392
The bug reported on the [0] has been fixed.
The issue was we have not checked if the global variables that
represent cttz tables was constant.
There is a new negative test added in negative-lower-table-based-cttz.ll
that represents this.
[0] https://reviews.llvm.org/rGdf868edee561eb973edd85ec9df41c67aa0bff6b
This reverts commit 053841c5624ca7eacd108a26071d8a1cefe1bebd.
We faced a use-after-free after pushing the D113291, since the
foldSqrt() has a call to eraseFromParent(). The function
should be at the end of the main loop that folds the patterns.
This patch fixes that.