This patch implements the following two peephole optimisations:
1. ``` abs(X) u> K --> K >= 0 ? `X + K u> 2 * K` : `false` ```;
2. If `abs(INT_MIN)` is `poison`, ```abs(X) u< K --> K >= 1 ? `X + (K - 1) u<= 2 * (K - 1)` : K != 0```.
See the following Alive2 proofs:
[1](https://alive2.llvm.org/ce/z/J2SRSv) and
[2](https://alive2.llvm.org/ce/z/tfxTrU).
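As a quick sanity check alongside the Alive2 proofs, both rewrites can be brute-forced over `i8`. This is a sketch under our own assumptions: `abs` is modelled as wrapping (so `abs(INT_MIN)` is `INT_MIN`), the second identity skips `X == INT_MIN` to mirror the precondition that `abs(INT_MIN)` is `poison`, and the helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Wrapping abs on i8: abs(-128) wraps back to -128, i.e. 128 as unsigned.
static uint8_t absWrap(int8_t X) {
  return static_cast<uint8_t>(X < 0 ? -static_cast<int>(X) : X);
}

// Returns true if both rewrites agree with the original unsigned comparison
// for every i8 value of X and K.
bool checkAbsFoldsI8() {
  for (int XI = -128; XI <= 127; ++XI) {
    for (int KI = -128; KI <= 127; ++KI) {
      const int8_t X = static_cast<int8_t>(XI);
      const int8_t K = static_cast<int8_t>(KI);
      const uint8_t UK = static_cast<uint8_t>(K);

      // Fold 1: abs(X) u> K --> K >= 0 ? X + K u> 2 * K : false
      const bool Orig1 = absWrap(X) > UK;
      const bool New1 =
          K >= 0 ? static_cast<uint8_t>(X + K) > static_cast<uint8_t>(2 * K)
                 : false;
      if (Orig1 != New1) return false;

      // Fold 2 needs abs(INT_MIN) to be poison, so skip X == INT_MIN.
      if (X == INT8_MIN) continue;
      const bool Orig2 = absWrap(X) < UK;
      const bool New2 = K >= 1 ? static_cast<uint8_t>(X + (K - 1)) <=
                                     static_cast<uint8_t>(2 * (K - 1))
                               : K != 0;
      if (Orig2 != New2) return false;
    }
  }
  return true;
}
```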
Again, these fixes are trivial as we're creating new select instructions
with predicates from existing select instructions.
In this case, we create one select instruction from two existing select
instructions, but since both existing select instructions have the same
predicate, their profile data should be the same, so we can reuse the
profile data from either instruction. Therefore, we arbitrarily reuse
the profile data from the first select instruction.
Tracking issue: #147390
For every type other than i1, ssub.sat x, y = 0 implies x == y. But
ssub.sat.i1 0, -1 = 0 (because the result of 1 saturates to 0).
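The i1 corner case can be reproduced with a small model of `ssub.sat` on an N-bit two's-complement integer. The helpers below are hypothetical, not LLVM's implementation: they clamp the exact difference to [-2^(N-1), 2^(N-1) - 1], so for N = 1 the representable range is just [-1, 0].

```cpp
#include <cassert>
#include <cstdint>

// Saturating signed subtraction for an N-bit integer (1 <= N <= 32), with
// operands and result held in int64_t so the difference is exact.
int64_t ssubSat(int64_t X, int64_t Y, unsigned N) {
  const int64_t Min = -(int64_t{1} << (N - 1));
  const int64_t Max = (int64_t{1} << (N - 1)) - 1;
  const int64_t Diff = X - Y;
  if (Diff < Min) return Min;
  if (Diff > Max) return Max;
  return Diff;
}

// Does ssub.sat x, y == 0 imply x == y for every pair of N-bit values?
bool zeroImpliesEqual(unsigned N) {
  const int64_t Min = -(int64_t{1} << (N - 1));
  const int64_t Max = (int64_t{1} << (N - 1)) - 1;
  for (int64_t X = Min; X <= Max; ++X)
    for (int64_t Y = Min; Y <= Max; ++Y)
      if (ssubSat(X, Y, N) == 0 && X != Y) return false;
  return true;
}
```

For N >= 2 the range contains values on both sides of 0, so a result of 0 can only come from an exact difference of 0; for N = 1 the true difference 1 is clamped to the maximum, which is 0.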
The changes to instcombine are not strictly necessary. Instcombine
canonicalizes the ssub.sat.i1 before we arrive at these pattern-matches.
The real fix is in ValueTracking.
Nonetheless we agreed in review it makes sense to add these checks to
instcombine, even though they're currently unreachable:
https://github.com/llvm/llvm-project/pull/173742#issuecomment-3696631396
This was found by a fuzzer I'm working on!
instcombine can create srem X, 0 or icmp ult X, 0 mid-pass when
operands fold to zero, which trips assertions in foldICmpSRemConstant.
Bail out on zero divisors / zero ULT constants instead of asserting,
and add a regression test from the minimized reproducer.
This was found by a fuzzer I'm working on. The high-level design is to
randomly generate LLVM IR, run a pass on it, and then run the original
and new IR through the interpreter. They should produce the same
results. Right now I'm only fuzzing instcombine.
visitFCmp() previously bailed out when a following select matched a
clamp pattern. This blocks simplifications when the clamp is provably
redundant.
This PR allows simplification for clamp selects of flavor
SPF_FMAXNUM/SPF_FMINNUM when one arm is a constant and the other is a sitofp/uitofp
of an integer value, and the constant equals the exact min/max of that
integer domain:
* SPF_FMAXNUM (pattern max(X,C)): redundant if C is the minimum integer
mapped exactly to FP (e.g. X = sitofp i8, C = -128.0f).
* SPF_FMINNUM (pattern min(X,C)): redundant if C is the maximum integer
mapped exactly to FP (e.g. X = uitofp i8, C = 255.0f).
This fixes a regression in #173454
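The redundancy condition can be sketched as a check against the exact float image of the integer domain's bounds. This is a simplification under stated assumptions: the helper names are hypothetical, the source integer is at most 32 bits wide, the destination type is `float`, and only equality with the bound is tested, as described above.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Exact float image of an integer, or nullopt if the conversion rounds.
std::optional<float> exactFloat(int64_t V) {
  const float F = static_cast<float>(V);
  if (static_cast<int64_t>(F) != V) return std::nullopt;
  return F;
}

// Smallest / largest value of an N-bit integer (1 <= N <= 32) as seen by
// sitofp (IsSigned) or uitofp.
int64_t intMin(unsigned N, bool IsSigned) {
  return IsSigned ? -(int64_t{1} << (N - 1)) : 0;
}
int64_t intMax(unsigned N, bool IsSigned) {
  return IsSigned ? (int64_t{1} << (N - 1)) - 1 : (int64_t{1} << N) - 1;
}

// SPF_FMAXNUM: max(itofp(X), C) is redundant when C is the exact minimum.
bool fmaxClampRedundant(unsigned N, bool IsSigned, float C) {
  const std::optional<float> Lo = exactFloat(intMin(N, IsSigned));
  return Lo && *Lo == C;
}

// SPF_FMINNUM: min(itofp(X), C) is redundant when C is the exact maximum.
bool fminClampRedundant(unsigned N, bool IsSigned, float C) {
  const std::optional<float> Hi = exactFloat(intMax(N, IsSigned));
  return Hi && *Hi == C;
}
```

Requiring an exact conversion matters for wide integers: for `uitofp i32` the maximum 2^32 - 1 rounds to 2^32 in `float`, so the sketch refuses to call that clamp redundant.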
---------
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Yingwei Zheng <dtcxzyw@qq.com>
Both `decomposeBitTestICmp` and `decomposeBitTest` have a parameter
called `lookThroughTrunc`. This was spelled in full (i.e. `lookThroughTrunc`)
in the header. However, in the implementation, it's written as `lookThruTrunc`.
I opted to convert all instances of `lookThruTrunc` into
`lookThroughTrunc` to reduce surprise while reading the code and for
conformity.
---
The other change in this PR is the renaming of the wrapper around
`decomposeBitTest()`. Even though it was a wrapper around
`CmpInstAnalysis.h`'s `decomposeBitTest`, the function was called
`decomposeBitTestICmp`. This is quite confusing because such a function
_also_ exists in `CmpInstAnalysis.h`, but it is _not_ the one actually
being used in `InstCombineAndOrXor.cpp`.
Extends the `icmp(trunc(shl))` fold to handle any power of 2 constant as
the shift base, not just 1. This generalizes the following patterns by
adjusting the comparison offsets by `log2(Pow2)`.
```llvm
(trunc (1 << Y) to iN) == 0 --> Y u>= N
(trunc (1 << Y) to iN) != 0 --> Y u< N
(trunc (1 << Y) to iN) == 2**C --> Y == C
(trunc (1 << Y) to iN) != 2**C --> Y != C
; to
(trunc (Pow2 << Y) to iN) == 0 --> Y u>= N - log2(Pow2)
(trunc (Pow2 << Y) to iN) != 0 --> Y u< N - log2(Pow2)
(trunc (Pow2 << Y) to iN) == 2**C --> Y == C - log2(Pow2)
(trunc (Pow2 << Y) to iN) != 2**C --> Y != C - log2(Pow2)
```
Proof: https://alive2.llvm.org/ce/z/2zwTkp
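The generalized `==` patterns can also be brute-forced for a concrete width (the `!=` forms are their negations). The sketch assumes an `i16` shift truncated to `i8`, shift amounts `Y u< 16` (larger amounts are poison), and computes `C - log2(Pow2)` with `i16` wraparound as the IR would; the function name is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Exhaustively compares each == pattern with its folded form for
// trunc (Pow2 << Y) from i16 to i8. Requires Log2Pow2 <= 8.
bool checkTruncShlFolds(uint16_t Pow2, unsigned Log2Pow2) {
  const unsigned N = 8;  // destination width
  for (uint16_t Y = 0; Y < 16; ++Y) {
    const uint8_t T = static_cast<uint8_t>(static_cast<uint16_t>(Pow2 << Y));
    // == 0  -->  Y u>= N - log2(Pow2)
    if ((T == 0) != (Y >= N - Log2Pow2)) return false;
    for (unsigned C = 0; C < N; ++C) {
      const uint16_t Rhs = static_cast<uint16_t>(C - Log2Pow2);  // may wrap
      // == 2**C  -->  Y == C - log2(Pow2)
      if ((T == (1u << C)) != (Y == Rhs)) return false;
    }
  }
  return true;
}
```

When `C < log2(Pow2)` the wrapped right-hand side is a huge value that no valid `Y` can equal, which matches the fact that `Pow2 << Y` can never produce a smaller power of two.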
This is a small code size optimization that lets us avoid both shifting
and comparing to a constant if we need the shifted value anyway. On most
architectures the zero comparison is cheaper than a constant comparison
(or free if the shift sets flags).
Although this change appears to remove the optimization entirely, we
continue to do this transform when there is one use: the code below the
removed code transforms the shift into an and, and the PR10267 case in
InstCombinerImpl::foldICmpAndConstConst then transforms the and into a
ult/ugt. Added a test case to verify this explicitly.
Per [1], this reduces clang .text size by 0.09% and the dynamic
instruction count by 0.01%.
[1] https://llvm-compile-time-tracker.com/compare.php?from=1f38d49ebe96417e368a567efa4d650b8a9ac30f&to=0873787a12b8f2eab019d8211ace4bccc1807343&stat=size-text
Reviewers: nikic, dtcxzyw
Reviewed By: dtcxzyw
Pull Request: https://github.com/llvm/llvm-project/pull/168007
Fix #157315
alive2: https://alive2.llvm.org/ce/z/TEnuFV
The equality comparison of `min(max(X, Lo), Hi)` and `X` is actually a
range check on `X`. This PR folds this into an unsigned bound check
`(X - Lo) u< (Hi - Lo + 1)`.
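The equivalence can be brute-forced over `i8` as a sanity check. This sketch assumes signed `smin`/`smax` with `Lo <= Hi`, and it skips the full-range clamp `[-128, 127]`, where `Hi - Lo + 1` wraps to 0 in `i8`; that exclusion is our own choice for the check, not something stated in the PR.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Checks min(max(X, Lo), Hi) == X  <=>  (X - Lo) u< (Hi - Lo + 1) over i8.
bool checkClampRangeFold() {
  for (int Lo = -128; Lo <= 127; ++Lo) {
    for (int Hi = Lo; Hi <= 127; ++Hi) {
      if (Hi - Lo + 1 == 256) continue;  // full range: i8 bound wraps to 0
      const uint8_t Bound = static_cast<uint8_t>(Hi - Lo + 1);
      for (int X = -128; X <= 127; ++X) {
        const bool Orig = std::min(std::max(X, Lo), Hi) == X;
        const bool Folded = static_cast<uint8_t>(X - Lo) < Bound;
        if (Orig != Folded) return false;
      }
    }
  }
  return true;
}
```

Subtracting `Lo` shifts the interval to start at 0, so values below `Lo` wrap around to large unsigned numbers and fail the single unsigned comparison, just like values above `Hi`.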
---------
Co-authored-by: Yingwei Zheng <dtcxzyw@qq.com>
A common idiom is the usage of the PatternMatch match function within a
functional algorithm like all_of. Introduce a match functor to shorten
this idiom.
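Outside of LLVM, the shape of such a functor can be sketched with a toy pattern type. Everything here (`IsZeroPattern`, `match`, `matchFn`) is hypothetical and only mirrors the idea; it is not the PatternMatch API.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Toy pattern: matches an int equal to zero.
struct IsZeroPattern {
  bool match(int V) const { return V == 0; }
};

// Free function in the style of match(V, Pattern).
template <typename Pattern>
bool match(int V, const Pattern &P) { return P.match(V); }

// Functor that binds the pattern so it can be passed to algorithms directly.
template <typename Pattern>
auto matchFn(const Pattern &P) {
  return [P](int V) { return match(V, P); };
}

bool allZero(const std::vector<int> &Ops) {
  // Without the functor: std::all_of(..., [](int V) {
  //   return match(V, IsZeroPattern{}); });
  return std::all_of(Ops.begin(), Ops.end(), matchFn(IsZeroPattern{}));
}
```

The functor captures the pattern by value, so the algorithm call site no longer needs a hand-written lambda for each use.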
Co-authored-by: Luke Lau <luke@igalia.com>
Currently this fold only supports a single GEP. However, in ptradd
representation, it may be split across multiple GEPs. In particular, PR
#151333 will split off constant offset GEPs.
To support this, add a new helper decomposeLinearExpression(), which
decomposes a pointer into a linear expression of the form BasePtr +
Index * Scale + Offset.
I plan to also extend this helper to look through mul/shl on the index
and use it in more places that currently use collectOffset() to extract
a single index * scale. This will make sure such optimizations are not
affected by the ptradd migration.
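A toy version of the decomposition over a chain of GEP-like nodes might look as follows. The `GepNode` and `LinearExpr` types and `decomposeChain` are hypothetical stand-ins for LLVM IR; offsets are in bytes, and, as a simplifying assumption, the chain may contain at most one variable index.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// One GEP in a chain: adds ConstOffset bytes and optionally Index * Scale.
struct GepNode {
  std::optional<std::string> Index;  // name of a variable index, if any
  int64_t Scale = 0;                 // element size for Index
  int64_t ConstOffset = 0;
};

// BasePtr + Index * Scale + Offset
struct LinearExpr {
  std::string BasePtr;
  std::optional<std::string> Index;
  int64_t Scale = 0;
  int64_t Offset = 0;
};

// Folds a chain of GEPs (applied in order to BasePtr) into one linear
// expression; gives up if more than one variable index is present.
std::optional<LinearExpr> decomposeChain(const std::string &BasePtr,
                                         const std::vector<GepNode> &Chain) {
  LinearExpr E;
  E.BasePtr = BasePtr;
  for (const GepNode &G : Chain) {
    E.Offset += G.ConstOffset;
    if (!G.Index) continue;
    if (E.Index) return std::nullopt;  // second variable index: bail out
    E.Index = G.Index;
    E.Scale = G.Scale;
  }
  return E;
}
```

For example, a variable-index GEP `p + i * 4` followed by a split-off constant GEP `+ 8` decomposes to base `p`, index `i`, scale 4 and offset 8, the same result a single GEP would have produced.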
This patch addresses
https://github.com/llvm/llvm-project/pull/155216#discussion_r2297724663.
This patch adds a helper function that applies the inverse cast to
constants, optionally preserving the cast flags.
Follow-up patches will add trunc/ext handling on VectorCombine and flags
preservation on InstCombine.
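The core check behind inverting a cast on a constant can be sketched with plain integers: narrow the constant, then accept it only if re-applying the original extension reproduces it. The helper names, the restriction to `zext`/`sext`, and the 64-bit container are assumptions of this sketch; it does not model flag preservation.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

enum class ExtKind { ZExt, SExt };

// Mask with the low Bits bits set (1 <= Bits <= 64).
uint64_t lowMask(unsigned Bits) {
  return Bits == 64 ? ~uint64_t{0} : (uint64_t{1} << Bits) - 1;
}

// Sign-extend the low Bits bits of V to 64 bits.
uint64_t signExtend(uint64_t V, unsigned Bits) {
  const unsigned Shift = 64 - Bits;
  return static_cast<uint64_t>(static_cast<int64_t>(V << Shift) >> Shift);
}

// Given C in the DstBits-wide type, find C' in the SrcBits-wide type with
// ext(C') == C, where ext is the cast being inverted.
std::optional<uint64_t> invertExtOnConstant(ExtKind K, uint64_t C,
                                            unsigned SrcBits, unsigned DstBits) {
  const uint64_t Narrow = C & lowMask(SrcBits);
  const uint64_t Widened = K == ExtKind::ZExt
                               ? Narrow
                               : signExtend(Narrow, SrcBits) & lowMask(DstBits);
  if (Widened != (C & lowMask(DstBits))) return std::nullopt;
  return Narrow;
}
```

For instance, an `i32` constant `200` inverts through `zext` from `i8` (giving `200`), but not through `sext`, because sign-extending the `i8` value `0xC8` yields `0xFFFFFFC8`.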
foldCmpLoadFromIndexedGlobal() currently checks that the global type,
the GEP type and the load type match in certain ways. Replace this with
generic logic based on offsets.
This is a reboot of https://github.com/llvm/llvm-project/pull/67093.
This PR is less ambitious by requiring that the constant offset is
smaller than the stride, which avoids the additional complexity of that
PR.
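The offset-based view can be sketched by treating the global as raw bytes: a load at `Index * Stride + ConstOffset`, with `ConstOffset < Stride`, reads one value per element, so the comparison result for every index can be precomputed into a bitmask. The byte-sized load, the equality predicate, the 64-element cap, and the function name are assumptions for illustration only.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Bit I of the result is set iff Bytes[I * Stride + ConstOffset] == C.
// Returns nullopt when the access pattern does not fit this model.
std::optional<uint64_t> cmpMaskForIndexedLoad(const std::vector<uint8_t> &Bytes,
                                              uint64_t Stride,
                                              uint64_t ConstOffset, uint8_t C) {
  if (Stride == 0 || ConstOffset >= Stride) return std::nullopt;
  const uint64_t NumElts = Bytes.size() / Stride;
  if (NumElts > 64) return std::nullopt;  // mask must fit in 64 bits
  uint64_t Mask = 0;
  for (uint64_t I = 0; I < NumElts; ++I)
    if (Bytes[I * Stride + ConstOffset] == C) Mask |= uint64_t{1} << I;
  return Mask;
}
```

With the bytes `{1, 7, 2, 7, 3, 9}` (three two-byte elements), loading offset 1 of each element and comparing against 7 gives the mask `0b011`; the compare can then be replaced by testing bit `Index` of that mask.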
When matching integers, `m_ConstantInt` is a convenient alternative to
`m_APInt` for matching unsigned 64-bit integers, allowing one to
simplify
```cpp
const APInt *IntC;
if (match(V, m_APInt(IntC))) {
  if (IntC->ule(UINT64_MAX)) {
    uint64_t Int = IntC->getZExtValue();
    // ...
  }
}
```
to
```cpp
uint64_t Int;
if (match(V, m_ConstantInt(Int))) {
  // ...
}
```
However, this simplification is only valid if `V` is a scalar type.
Specifically, `m_APInt` also matches integer splats, but `m_ConstantInt`
does not.
This patch ensures that the matching behaviour of `m_ConstantInt`
parallels that of `m_APInt`, and also adopts it in a few obvious
places.
My understanding is that gep [n x i8] and gep i8 can be treated
equivalently - the array type conveys no extra information and could be
removed. This goes through foldCmpLoadFromIndexedGlobal and tries to
make it work for non-array gep types, so long as the index type still
matches the array being loaded.
Fold icmp between a chain of geps and its base pointer. Previously only
a single gep was supported.
This will be extended to handle the case of two gep chains with a common
base in a followup.
This helps to avoid regressions after #137297.
Having a finite Depth (or recursion limit) for computeKnownBits is very
limiting, but it is currently a load-bearing necessity: all KnownBits
are recomputed on each call and there is no caching. As a prerequisite
for an effort to remove the recursion limit altogether, either with a
clever caching technique or with an easily invalidated KnownBits
analysis, make the Depth argument in ValueTracking APIs uniformly the
last argument, with a default value. This would aid in removing the
argument when the time comes, as many callers that currently pass 0
explicitly are now updated to omit the argument altogether.
Match icmps of binops where both operands are select with constant arms,
i.e., `icmp pred (select A ? C1 : C2) binop (select B ? C3 : C4), C5`.
Fold such patterns by creating a truth table of the possible four
constant variants, and materialize back the optimal logic from it via
`createLogicFromTable` helper. This also generalizes an existing fold,
which has therefore been dropped.
Proofs: https://alive2.llvm.org/ce/z/NS7Vzu.
Fixes: https://github.com/llvm/llvm-project/issues/138212.
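The truth-table step can be sketched as follows: evaluate the comparison for all four combinations of the two select conditions, pack the results into a 4-bit table, and look up a logic expression for it. This is a hypothetical stand-in that only names the resulting expression as a string (the actual `createLogicFromTable` helper builds IR), and the bit layout `(A << 1) | B` is our own choice.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <string>

// Bit ((A << 1) | B) of the result holds the value of
//   ((A ? C1 : C2) binop (B ? C3 : C4)) pred C5.
unsigned buildTruthTable(int64_t C1, int64_t C2, int64_t C3, int64_t C4,
                         int64_t C5,
                         const std::function<int64_t(int64_t, int64_t)> &BinOp,
                         const std::function<bool(int64_t, int64_t)> &Pred) {
  unsigned Table = 0;
  for (unsigned A = 0; A < 2; ++A)
    for (unsigned B = 0; B < 2; ++B)
      if (Pred(BinOp(A ? C1 : C2, B ? C3 : C4), C5))
        Table |= 1u << ((A << 1) | B);
  return Table;
}

// One logic expression for each of the 16 possible tables.
std::string logicFromTable(unsigned Table) {
  static const char *const Names[16] = {
      "false",  "!(A | B)", "!A & B", "!A",       // 0b0000 .. 0b0011
      "A & !B", "!B",       "A ^ B",  "!(A & B)", // 0b0100 .. 0b0111
      "A & B",  "!(A ^ B)", "B",      "!A | B",   // 0b1000 .. 0b1011
      "A",      "A | !B",   "A | B",  "true"};    // 0b1100 .. 0b1111
  return Names[Table & 0xF];
}
```

For example, `icmp eq ((select A ? 1 : 0) + (select B ? 1 : 0)), 2` only holds when both conditions are true, so its table is `0b1000` and it becomes `A & B`.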
Logically it does not matter; getFreelyInvertedImpl doesn't
depend on the value for the m_ImmConstant case.
This use count logic should probably sink into getFreelyInvertedImpl:
every use of this appears to be just a hasOneUse or hasNUse count, so
this could become a simple use count threshold.
Fixes #133344
Proof: https://alive2.llvm.org/ce/z/X3Uh23
InstCombine couldn't optimize this pattern for `i1` because
`canonicalizeICmpBool()` was transforming the comparison into bitwise
operations before `foldICmpTruncWithTruncOrExt()` was called.
This PR solves the ordering issue by placing
`foldICmpTruncWithTruncOrExt()` before `canonicalizeICmpBool()`.
I believe this will not cause any regressions since all tests are
passing.
Relative to the previous attempt this includes two fixes:
* Adjust callCapturesBefore() to not skip captures(ret: address,
provenance) arguments, as these will not count as a capture
at the call-site.
* When visiting uses during stack slot optimization, don't skip
the ModRef check for passthru captures. Calls can both modref
and be passthru for captures.
------
This extends CaptureTracking to support inferring non-trivial
CaptureInfos. The focus of this patch is to support only FunctionAttrs;
other users of CaptureTracking will be updated in follow-ups.
The key API changes here are:
* DetermineUseCaptureKind() now returns a UseCaptureInfo where the UseCC
component specifies what is captured at that Use and the ResultCC
component specifies what may be captured via the return value of the
User. Usually only one or the other will be used (corresponding to
previous MAY_CAPTURE or PASSTHROUGH results), but both may be set for
call captures.
* The CaptureTracking::captures() extension point is passed this
  UseCaptureInfo as well and can then decide what to do with it by
  returning an Action, which is one of:
  * Stop: stop traversal.
  * ContinueIgnoringReturn: continue traversal but don't follow the
    instruction return value.
  * Continue: continue traversal and follow the instruction return value
    if it has additional CaptureComponents.
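A heavily simplified model of this extension point might look like the sketch below. The component bits, the toy use graph, and the traversal are assumptions for illustration and are much coarser than LLVM's real CaptureComponents; only the UseCC/ResultCC split and the three Action values follow the description above.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <vector>

// Toy capture components as bit flags.
enum : uint8_t { CC_None = 0, CC_Address = 1, CC_Provenance = 2 };

struct UseCaptureInfo {
  uint8_t UseCC = CC_None;     // captured at the use itself
  uint8_t ResultCC = CC_None;  // may flow out through the user's return value
};

enum class Action { Stop, ContinueIgnoringReturn, Continue };

// A use of a value: its capture info and the user (a node index).
struct Use {
  UseCaptureInfo Info;
  int User;
};

// Walks uses starting from node Root; Uses[N] lists the uses of node N.
// Returns the union of UseCC over all visited uses.
uint8_t trackCaptures(int Root, const std::vector<std::vector<Use>> &Uses,
                      const std::function<Action(const UseCaptureInfo &)> &Captures) {
  uint8_t Captured = CC_None;
  std::vector<int> Worklist{Root};
  std::vector<bool> Seen(Uses.size(), false);
  Seen[Root] = true;
  while (!Worklist.empty()) {
    const int N = Worklist.back();
    Worklist.pop_back();
    for (const Use &U : Uses[N]) {
      Captured |= U.Info.UseCC;
      const Action A = Captures(U.Info);
      if (A == Action::Stop) return Captured;
      // Follow the user's result only if it can carry captured components.
      if (A == Action::Continue && U.Info.ResultCC != CC_None && !Seen[U.User]) {
        Seen[U.User] = true;
        Worklist.push_back(U.User);
      }
    }
  }
  return Captured;
}
```

In a graph where a pointer passes through one instruction (non-empty ResultCC only) whose result is then stored (non-empty UseCC), returning Continue discovers the store's capture, while ContinueIgnoringReturn stops at the passthrough and reports nothing captured.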
For now, this patch retains the (unsound) special logic for comparison
of null with a dereferenceable pointer. I'd like to switch key code to
take advantage of address/address_is_null before dropping it.
This PR mainly intends to introduce the necessary API changes and basic
inference support; various possible improvements are marked with
TODOs.