In `(icmp eq (and x,y), C)` all 1s in `C` must also be set in both
`x`/`y`.
In `(icmp eq (or x,y), C)` all 0s in `C` must also be clear in both
`x`/`y`.
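A hypothetical example of the kind of fold this enables (shown via a
dominating branch; the same applies to assumes):
```
define i8 @and_eq_sets_bits(i8 %x, i8 %y) {
  %and = and i8 %x, %y
  %cmp = icmp eq i8 %and, 12
  br i1 %cmp, label %t, label %f
t:
  ; On this edge (%x & %y) == 12, so bits 2 and 3 are known set in %x
  ; and %x & 12 folds to 12.
  %r = and i8 %x, 12
  ret i8 %r
f:
  ret i8 0
}
```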
Closes #87143
All the isKnownNonZero() handling for instructions should be inside
this function. This makes the structure more similar to
computeKnownBitsFromOperator() as well.
This may not be entirely NFC due to different depth handling.
If we had a comparison to a literal nan with a false predicate,
we were incorrectly treating it as an unordered compare. This was
correct for fcmp true, but not fcmp false. I noticed this in the
review for e44d3b3e503fa12fdaead2936b28844aa36237c1 but misdiagnosed
the reason. Also change the test for the fcmp true case to be more
useful; it wasn't wrong previously.
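For reference, a compare with a constant false predicate gives no
information about its operand's class (hypothetical snippet):
```
define i1 @no_class_info(float %x) {
  ; Always false, whatever %x is; in particular it does not tell us
  ; whether %x is a NaN the way an ordered/unordered compare would.
  %c = fcmp false float %x, 0x7FF8000000000000
  ret i1 %c
}
```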
In 2fe81edef6f ("[NFC][RemoveDIs] Insert instruction using iterators in
Transforms/") we changed
      if (*req_idx != *i)
        return FindInsertedValue(I->getAggregateOperand(), idx_range,
-                                InsertBefore);
+                                *InsertBefore);
    }
but there is no guarantee that InsertBefore is non-empty at that point,
as can be seen in the added testcase.
Instead, just pass on the optional InsertBefore in the recursive call to
FindInsertedValue, as we already do in several other places.
At the moment, getUnderlyingObjects simply continues for phis that do
not refer to the same underlying object in loops, without adding them to
the list of underlying objects, effectively ignoring those phis.
Instead of ignoring those phis, add them to the list of underlying
objects. This fixes a miscompile where LoopAccessAnalysis fails to
identify a memory dependence, because no underlying objects can be found
for a set of memory accesses.
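For illustration, a reduced hypothetical example (not the testcase from
the issue): the loop-header phi below changes its underlying object every
iteration, so previously no underlying object was reported for %p; with
this change the phi itself is reported.
```
define void @walk(ptr %start) {
entry:
  br label %loop

loop:
  ; %p's underlying object changes every iteration: the next pointer is
  ; loaded from memory and is unrelated to %start.
  %p = phi ptr [ %start, %entry ], [ %next, %loop ]
  store i8 0, ptr %p
  %next.addr = getelementptr inbounds i8, ptr %p, i64 8
  %next = load ptr, ptr %next.addr, align 8
  %cont = icmp ne ptr %next, null
  br i1 %cont, label %loop, label %exit

exit:
  ret void
}
```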
Fixes https://github.com/llvm/llvm-project/issues/82665.
PR: https://github.com/llvm/llvm-project/pull/84339
If a function only exits for certain input values, we can still derive
that an argument is "returned". We can also derive range metadata that
describes the possible value range returned by the function. However, it
turns out that those two analyses can result in conflicting information.
Example:
declare i16 @foo(i16 returned)
...
%A = call i16 @foo(i16 4095), !range !{i16 32, i16 33}
To avoid "Bits known to be one AND zero?" assertion failures we now
make sure to discard the known bits for this kind of scenario.
This is done so that we can benefit from some instcombine optimizations.
This PR contains two commits: the first is for adding tests and the
second is for the optimization.
As part of the RemoveDIs project we need LLVM to insert instructions using
iterators wherever possible, so that the iterators can carry a bit of
debug-info. This commit implements some of that by updating the contents of
llvm/lib/Transforms/Utils to always use iterator-versions of instruction
constructors.
There are two general flavours of update:
* Almost all call-sites just call getIterator on an instruction
* Several make use of an existing iterator (scenarios where the code is
actually significant for debug-info)
The underlying logic is that any call to getFirstInsertionPt or similar
APIs that identify the start of a block needs to have that iterator passed
directly to the insertion function, without being converted to a bare
Instruction pointer along the way.
Noteworthy changes:
* FindInsertedValue now takes an optional iterator rather than an
instruction pointer, as we need to always insert with iterators,
* I've added a few iterator-taking versions of some value-tracking and
DomTree methods -- they just unwrap the iterator. These are purely
convenience methods to avoid extra syntax in some passes.
* A few calls to getNextNode become std::next instead (in keeping with
the theme of using iterators for positions),
* SeparateConstOffsetFromGEP has its insertion-position field changed.
Noteworthy because it's not a purely localised spelling change.
All this should be NFC.
This helps cover some missing cases in both, and hopefully provides an
easier framework for extending general condition-based analysis.
Closes #83161
The false branch for `and` and the true branch for `or` provide less
information (an intersection as opposed to a union), but can still give
some useful information.
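A hypothetical example of the weaker fact still available on the false
edge of an `and`:
```
define i1 @false_edge_of_and(i8 %x) {
  %c1 = icmp ugt i8 %x, 15
  %c2 = icmp ugt i8 %x, 31
  %and = and i1 %c1, %c2
  br i1 %and, label %t, label %f
t:
  ret i1 true
f:
  ; Only one of %c1/%c2 needs to be false here, so we keep the
  ; intersection of the two facts: %x <= 31 (top three bits clear).
  ; That is still enough to fold this compare to false.
  %r = icmp ugt i8 %x, 40
  ret i1 %r
}
```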
Closes #82818
This patch extends `propagatesPoison` to handle more integer intrinsics.
It will turn more logical ands/ors into bitwise ands/ors.
See also https://reviews.llvm.org/D99671.
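A hypothetical example of the kind of pattern this helps with, using
llvm.umin as one such intrinsic:
```
define i1 @logical_to_bitwise(i8 %x) {
  %nz = icmp ne i8 %x, 0
  %umin = call i8 @llvm.umin.i8(i8 %x, i8 42)
  %cmp = icmp ult i8 %umin, 4
  ; Knowing that llvm.umin propagates poison, %cmp can only be poison
  ; when %x (and therefore %nz) is poison, so this logical and can be
  ; rewritten as a bitwise and.
  %r = select i1 %nz, i1 %cmp, i1 false
  ret i1 %r
}

declare i8 @llvm.umin.i8(i8, i8)
```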
This patch improves `computeKnownFPClass` by using context-sensitive
information from `DomConditionCache`.
The motivation of this patch is to optimize the following case found in
[fmt/format.h](e17bc67547/include/fmt/format.h (L3555-L3566)):
```
define float @test(float %x, i1 %cond) {
%i32 = bitcast float %x to i32
%cmp = icmp slt i32 %i32, 0
br i1 %cmp, label %if.then1, label %if.else
if.then1:
%fneg = fneg float %x
br label %if.end
if.else:
br i1 %cond, label %if.then2, label %if.end
if.then2:
br label %if.end
if.end:
%value = phi float [ %fneg, %if.then1 ], [ %x, %if.then2 ], [ %x, %if.else ]
%ret = call float @llvm.fabs.f32(float %value)
ret float %ret
}
```
We can prove the sign bit of %value is always zero. Then the fabs can be
eliminated.
This pattern also exists in cpython/duckdb/oiio/openexr.
Compile-time impact:
https://llvm-compile-time-tracker.com/compare.php?from=f82e0809ba12170e2f648f8a1ac01e78ef06c958&to=041218bf5491996edd828cc15b3aec5a59ddc636&stat=instructions:u
|stage1-O3|stage1-ReleaseThinLTO|stage1-ReleaseLTO-g|stage1-O0-g|stage2-O3|stage2-O0-g|stage2-clang|
|--|--|--|--|--|--|--|
|-0.00%|+0.01%|+0.00%|-0.03%|+0.00%|+0.00%|+0.02%|
This extends computeKnownBits() support for dominating conditions to
also handle and/or conditions. We'll look through `and` conditions on
the true edge and `or` conditions on the false edge.
This change is mainly for the sake of completeness, so we don't start
missing optimizations if SimplifyCFG decides to merge some branches.
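For example (hypothetical IR), on the true edge of a branch on an `and`
each conjunct can feed known bits:
```
define i1 @true_edge_of_and(i8 %x, i8 %y) {
  %c1 = icmp ult i8 %x, 8
  %c2 = icmp ult i8 %y, 8
  %and = and i1 %c1, %c2
  br i1 %and, label %t, label %f
t:
  ; Both %c1 and %c2 hold here, so %x u< 8 is known and this compare
  ; folds to true.
  %r = icmp ult i8 %x, 16
  ret i1 %r
f:
  ret i1 false
}
```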
This patch refactors the interface of the `computeKnownFPClass` family
to pass `SimplifyQuery` directly.
The motivation of this patch is to compute known fpclass with
`DomConditionCache`, which was introduced by
https://github.com/llvm/llvm-project/pull/73662. With
`DomConditionCache`, we can do more optimization with context-sensitive
information.
Example (extracted from
[fmt/format.h](e17bc67547/include/fmt/format.h (L3555-L3566))):
```
define float @test(float %x, i1 %cond) {
%i32 = bitcast float %x to i32
%cmp = icmp slt i32 %i32, 0
br i1 %cmp, label %if.then1, label %if.else
if.then1:
%fneg = fneg float %x
br label %if.end
if.else:
br i1 %cond, label %if.then2, label %if.end
if.then2:
br label %if.end
if.end:
%value = phi float [ %fneg, %if.then1 ], [ %x, %if.then2 ], [ %x, %if.else ]
%ret = call float @llvm.fabs.f32(float %value)
ret float %ret
}
```
We can prove the signbit of `%value` is always zero. Then the fabs can
be eliminated.
This patch merges the logic of `cannotBeOrderedLessThanZeroImpl` into
`computeKnownFPClass` to improve the signbit inference.
---------
Co-authored-by: Matt Arsenault <arsenm2@gmail.com>
The definition of the pointer of the memory location being queried is
always one such context. Even this conservative guess can be better than
no guess at all in some cases.
Fixes #64666
Co-authored-by: David Goldblatt <davidgoldblatt@meta.com>
Vectors are always bit-packed and don't respect the elements' alignment
requirements. This is different from arrays. This means offsets of
vector GEPs need to be computed differently than offsets of array GEPs.
This PR fixes many places that incorrectly computed the stride as
`DL.getTypeAllocSize(GTI.getIndexedType())`. We replace these with uses
of `GTI.getSequentialElementStride(DL)`, a new helper function added in
this PR.
This changes behavior for GEPs into vectors whose element types have
different (bit) sizes and alloc sizes. This includes two cases:
* Types with a bit size that is not a multiple of a byte, e.g. i1.
GEPs into such vectors are questionable to begin with, as some elements
are not even addressable.
* Overaligned types, e.g. i16 with 32-bit alignment.
Existing tests are unaffected, but a miscompile shown by a new test is fixed.
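A hypothetical sketch of the overaligned-i16 case (offsets follow the
bit-packed interpretation this PR implements):
```
target datalayout = "e-i16:32"

define ptr @vector_elt(ptr %p) {
  ; Element 3 of <8 x i16>: the stride is the packed 2-byte element
  ; size, giving an offset of 6 bytes, not 3 * 4 bytes = 12 as the
  ; alloc size of the overaligned i16 would suggest.
  %q = getelementptr <8 x i16>, ptr %p, i64 0, i64 3
  ret ptr %q
}
```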
---------
Co-authored-by: Nikita Popov <github@npopov.com>
Currently isGuaranteedNotToBeUndef() is the same as
isGuaranteedNotToBeUndefOrPoison(). This function is used in places
where we only care about undef (due to multi-use issues), not poison.
Make it more precise by only considering instructions that can create
undef (like loads or calls), and ignoring those that can only create
poison. In particular, we can ignore poison-generating flags.
This means that inferring more flags has less chance to pessimize other
transforms.
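As a hypothetical illustration of the distinction: the nuw flag below
can only introduce poison, never undef, whereas the load can produce
undef.
```
define i8 @example(ptr %p, i8 %x) {
  ; The nuw flag can only turn the result into poison (on overflow),
  ; never undef, so isGuaranteedNotToBeUndef() can ignore it.
  %a = add nuw i8 %x, 1
  ; A load can produce undef (e.g. from uninitialized memory), so it
  ; still counts as a potential undef source.
  %b = load i8, ptr %p, align 1
  %r = xor i8 %a, %b
  ret i8 %r
}
```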
Shufflevector semantics have changed so that poison mask elements
return poison rather than undef. Reflect this in the
canCreateUndefOrPoison() implementation.
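Hypothetical illustration:
```
define <2 x i8> @shuf(<2 x i8> %v) {
  ; Lane 0 takes element 0 of %v; lane 1 has a poison mask element and
  ; is therefore poison (it used to be undef).
  %r = shufflevector <2 x i8> %v, <2 x i8> poison, <2 x i32> <i32 0, i32 poison>
  ret <2 x i8> %r
}
```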
When the operands of a sub are ptrtoint casts, computeKnownBits() cannot
determine anything useful about them.
In the scalar case, a sub of two ptrtoints is generally converted into a
sub of the indexes.
However, in a loop with a recursive GEP/PHI whose sub operands are
ptrtoint casts, if we can determine that the difference between this GEP
and another pointer with the same base is known non-zero, we can return
that. This helps subsequent passes optimize the loop further.
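A hypothetical reduction of the kind of loop in question (the exact
shapes handled are determined by the patch's pattern matching):
```
define i64 @distance(ptr %base, ptr %end) {
entry:
  br label %loop

loop:
  %p = phi ptr [ %base, %entry ], [ %p.next, %loop ]
  %p.next = getelementptr inbounds i8, ptr %p, i64 4
  store i8 0, ptr %p
  %done = icmp eq ptr %p.next, %end
  br i1 %done, label %exit, label %loop

exit:
  ; Both sub operands are ptrtoint casts of pointers with the same
  ; base; the recursive GEP/PHI always adds a non-zero offset, so the
  ; difference can be shown to be non-zero.
  %pi = ptrtoint ptr %p.next to i64
  %bi = ptrtoint ptr %base to i64
  %diff = sub i64 %pi, %bi
  ret i64 %diff
}
```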