When AggressiveInstCombine inlines a strcmp call, we currently copy the
strcmp's DILocation only to the br instruction that jumps to the inline
code. While this is roughly analogous to the original call, it leaves
the generated code without any source location, which is precarious for
a memory operation. This patch copies the strcmp call's DILocation to
all the generated code.
An alternative solution would be to generate a new DILocation with a
line 0 location and an inlinedAt pointing to the original call location,
but this would still give limited attribution to the generated code
without traversing the DIE, whereas the submitted solution allows
attribution with just the line table; even though it would be
technically more accurate, pragmatically I believe that copying the
call's location will be more useful for users.
Most of these bring the costs in line with the code generation. The f16 costs
without FullFP16 are usually converted to f32. Extended v2f32->v2f64 vectors
similarly use fcvtl + fcvt. As a backup we use the costs similar to the target
independent code, which should give a relatively high cost.
This patch converts memchr with a small constant string into a switch.
It will reduce overhead of libcall and enable more folds (e.g.,
comparing the result with null).
References: https://en.cppreference.com/w/c/string/byte/memchr
This patch makes the final major change of the RemoveDIs project, changing the
default IR output from debug intrinsics to debug records. This is expected to
break a large number of tests: every single one that tests for uses or
declarations of debug intrinsics and does not explicitly disable writing
records.
If this patch has broken your downstream tests (or upstream tests on a
configuration I wasn't able to run):
1. If you need to immediately unblock a build, pass
`--write-experimental-debuginfo=false` to LLVM's option processing for all
failing tests (remember to use `-mllvm` for clang/flang to forward arguments to
LLVM).
2. For most test failures, the changes are trivial and mechanical, enough that
they can be done by script; see the migration guide for a guide on how to do
this: https://llvm.org/docs/RemoveDIsDebugInfo.html#test-updates
3. If any tests fail for reasons other than FileCheck check lines that need
updating, such as assertion failures, that is most likely a real bug with this
patch and should be reported as such.
For more information, see the recent PSA:
https://discourse.llvm.org/t/psa-ir-output-changing-from-debug-intrinsics-to-debug-records/79578
Inline calls to strcmp(s1, s2) and strncmp(s1, s2, N), where N and
exactly one of s1 and s2 are constant.
For example:
```c
int res = strcmp(s, "ab");
```
is converted to
```c
int res = (int)s[0] - (int)'a';
if (res != 0)
goto END;
res = (int)s[1] - (int)'b';
if (res != 0)
goto END;
res = (int)s[2] - (int)'\0';
END:
```
Ported from a similar gcc feature [Inline strcmp with small constant
strings](https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78809).
This is mostly NFC but some output does change due to consistently
inserting into poison rather than undef and using i64 as the index
type for inserts.
We previously included debug instructions when counting instructions when
looking for loads to combine. This meant that the presence of debug
instructions could affect optimization, as shown in the updated testcase.
This fixes#69925.
This reverts commit 5dde755188e34c0ba5304365612904476c8adfda,
cbfcf90152de5392a36d0a0241eef25f5e159eef and
8981520b19f2d2fe3d2bc80cf26318ee6b5b7473 due to a miscompile introduced in
8981520b19f2d2fe3d2bc80cf26318ee6b5b7473 (see
https://reviews.llvm.org/D154725#4568845 for details)
Differential Revision: https://reviews.llvm.org/D157430
As pointed out in D125755 the operand of a call to getCastInstrCost had the Src
and Dst the wrong way around.
Differential Revision: https://reviews.llvm.org/D154841
Using "eabi" for aarch64 targets is a common mistake and warned by Clang Driver.
We want to avoid it elsewhere as well. Just use the common "aarch64" without
other triple components.
This seems to be an issue currently where there are nested/chained GEP/BitCast Pointers.
The patch generates a new GEP for the wider load to avoid dominance problems.
Differential Revision: https://reviews.llvm.org/D150864
This is a follow-up to b71edfaa4ec3c998aadb35255ce2f60bba2940b0
since I forgot the lit.local.cfg files in that one.
Reformatting is done with `black`.
If you end up having problems merging this commit because you
have made changes to a python file, the best way to handle that
is to run git checkout --ours <yourfile> and then reformat it
with black.
If you run into any problems, post to discourse about it and
we will try to help.
RFC Thread below:
https://discourse.llvm.org/t/rfc-document-and-standardize-python-code-style
Reviewed By: barannikov88, kwk
Differential Revision: https://reviews.llvm.org/D150762
This patch involves boolean ring to simplify logical operations. We can treat `&` as ring multiplication and `^` as ring addition.
So we need to canonicalize all other operations to `*` `+`. Like:
```
a & b -> a * b
a ^ b -> a + b
~a -> a + 1
a | b -> a * b + a + b
c ? a : b -> c * a + (c + 1) * b
```
In the code, we use a mask set to represent an expression. Every value that is not comes from logical operations could be a bit in the mask.
The mask itself is a multiplication chain. The mask set is an addiction chain.
We can calculate two expressions based on boolean algebras.
For now, the initial patch only enabled on and/or/xor, Later we can enhance the code step by step.
Reference: https://en.wikipedia.org/wiki/Boolean_ring
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D142803
Address the dominating condition, the urem fold is benefit from the analytics improvements.
Fix https://github.com/llvm/llvm-project/issues/60546
NOTE: delete the calls in simplifyBinaryIntrinsic and foldICmpWithDominatingICmp
is used to reduce compile time.
Reviewed By: nikic, arsenm, erikdesjardins
Differential Revision: https://reviews.llvm.org/D144248
Much of foldLoadsRecursive relies on knowing the size of loaded
data, which is not possible for scalable vector types. However,
the logic of combining two small loads into one bigger load does
not apply for vector types so rather than converting the algorithm
to use TypeSize I've simply added an early exit for vectors.
Fixes#59510
Differential Revision: https://reviews.llvm.org/D140106
This patch updates the load insert point of the merged load in AggressiveInstCombine().
This is done to handle the reported test breaks by handling Alias Analysis correctly.
Differential Revision: https://reviews.llvm.org/D137201
This patch is to address the test cases in which the load has to be inserted at a right point. This happens when there is a store b/w the loads.
This patch reverts the loads merge in all cases when stores are present b/w loads and will eventually be replaced with proper fix and test cases.
Differential Revision: https://reviews.llvm.org/D137333
This patch extends the load merge/widen in AggressiveInstCombine() to handle reverse load patterns.
Differential Revision: https://reviews.llvm.org/D135137
The patch simplifies some of the patterns as below
1. (ZExt(L1) << shift1) | (ZExt(L2) << shift2) -> ZExt(L3) << shift1
2. (ZExt(L1) << shift1) | ZExt(L2) -> ZExt(L3)
The pattern is indicative of the fact that the loads are being merged to a wider load and the only use of this pattern is with a wider load. In this case for a non-atomic/non-volatile loads reduce the pattern to a combined load which would improve the cost of inlining, unrolling, vectorization etc.
Fix the error reported on reverse load merge.
Differential Revision: https://reviews.llvm.org/D127392
The patch simplifies some of the patterns as below
1. (ZExt(L1) << shift1) | (ZExt(L2) << shift2) -> ZExt(L3) << shift1
2. (ZExt(L1) << shift1) | ZExt(L2) -> ZExt(L3)
The pattern is indicative of the fact that the loads are being merged to a wider load and the only use of this pattern is with a wider load. In this case for a non-atomic/non-volatile loads reduce the pattern to a combined load which would improve the cost of inlining, unrolling, vectorization etc.
Differential Revision: https://reviews.llvm.org/D127392
The bug reported on the [0] has been fixed.
The issue was we have not checked if the global variables that
represent cttz tables was constant.
There is a new negative test added in negative-lower-table-based-cttz.ll
that represents this.
[0] https://reviews.llvm.org/rGdf868edee561eb973edd85ec9df41c67aa0bff6b