The bug is that `IsAndVariant` is used to assume which arm in the
select the output `SelInner` should be placed but match the inner
select condition with `m_c_LogicalOp`. With fully simplified ops, this
works fine, but its possible if the select condition is not
simplified, for it match both `LogicalAnd` and `LogicalOr` i.e `select
true, true, false`.
In PR71330 for example, the issue occurs in the following IR:
```
define i32 @bad() {
%..i.i = select i1 false, i32 0, i32 3
%brmerge = select i1 true, i1 true, i1 false
%not.cmp.i.i.not = xor i1 true, true
%.mux = zext i1 %not.cmp.i.i.not to i32
%retval.0.i.i = select i1 %brmerge, i32 %.mux, i32 %..i.i
ret i32 %retval.0.i.i
}
```
When simplifying:
```
%retval.0.i.i = select i1 %brmerge, i32 %.mux, i32 %..i.i
```
We end up matching `%brmerge` as `LogicalAnd` for `IsAndVariant`, but
the inner select (`%..i.i`) condition which is `false` with
`LogicalOr`.
Closes#71489
Call slot optimization may introduce writes to the destination object
that occur earlier than in the original function. We currently already
check that that the destination is dereferenceable and aligned, but we
do not make sure that it is writable. As such, we might introduce a
write to read-only memory, or introduce a data race.
Fix this by checking that the object is writable. For arguments, this is
indicated by the new writable attribute. Tests using
sret/dereferenceable are updated to use it.
Close https://github.com/llvm/llvm-project/issues/56980.
This patch tries to introduce a light-weight optimization attribute for
coroutines which are guaranteed to only be destroyed after it reached
the final suspend.
The rationale behind the patch is simple. See the example:
```C++
A foo() {
dtor d;
co_await something();
dtor d1;
co_await something();
dtor d2;
co_return 43;
}
```
Generally the generated .destroy function may be:
```C++
void foo.destroy(foo.Frame *frame) {
switch(frame->suspend_index()) {
case 1:
frame->d.~dtor();
break;
case 2:
frame->d.~dtor();
frame->d1.~dtor();
break;
case 3:
frame->d.~dtor();
frame->d1.~dtor();
frame->d2.~dtor();
break;
default: // coroutine completed or haven't started
break;
}
frame->promise.~promise_type();
delete frame;
}
```
Since the compiler need to be ready for all the cases that the coroutine
may be destroyed in a valid state.
However, from the user's perspective, we can understand that certain
coroutine types may only be destroyed after it reached to the final
suspend point. And we need a method to teach the compiler about this.
Then this is the patch. After the compiler recognized that the
coroutines can only be destroyed after complete, it can optimize the
above example to:
```C++
void foo.destroy(foo.Frame *frame) {
frame->promise.~promise_type();
delete frame;
}
```
I spent a lot of time experimenting and experiencing this in the
downstream. The numbers are really good. In a real-world coroutine-heavy
workload, the size of the build dir (including .o files) reduces 14%.
And the size of final libraries (excluding the .o files) reduces 8% in
Debug mode and 1% in Release mode.
Fix the crash for the last land PR70542.
Note:
For '%add = add nuw i32 %x, 1', we can only infer the LowerBound is 1,
but the UpperBound is wrapped to 0 in computeConstantRange.
so we can't assume the UpperBound is valid bound when its value is 0.
Fix https://github.com/llvm/llvm-project/issues/71329.
Reviewed By: zmodem, nikic
This reverts commit 957efa4ce4f0391147cec62746e997226ee2b836.
Original commit message below -- in this follow up, I've shifted
un-necessary inclusions of DebugProgramInstruction.h into being forward
declarations (fixes clang-compile time I hope), and a memory leak in the
DebugInfoTest.cpp IR unittests.
I also tracked a compile-time regression in D154080, more explanation
there, but the result of which is hiding some of the changes behind the
EXPERIMENTAL_DEBUGINFO_ITERATORS compile-time flag. This is tested by the
"new-debug-iterators" buildbot.
[DebugInfo][RemoveDIs] Add prototype storage classes for "new" debug-info
This patch adds a variety of classes needed to record variable location
debug-info without using the existing intrinsic approach, see the rationale
at [0].
The two added files and corresponding unit tests are the majority of the
plumbing required for this, but at this point isn't accessible from the
rest of LLVM as we need to stage it into the repo gently. An overview is
that classes are added for recording variable information attached to Real
(TM) instructions, in the form of DPValues and DPMarker objects. The
metadata-uses of DPValues is plumbed into the metadata hierachy, and a
field added to class Instruction, which are all stimulated in the unit
tests. The next few patches in this series add utilities to convert to/from
this new debug-info format and add instruction/block utilities to have
debug-info automatically updated in the background when various operations
occur.
This patch was reviewed in Phab in D153990 and D154080, I've squashed them
together into this commit as there are dependencies between the two
patches, and there's little profit in landing them separately.
[0] https://discourse.llvm.org/t/rfc-instruction-api-changes-needed-to-eliminate-debug-intrinsics-from-ir/68939
Update addInfoForInductions to also check if the add-rec is for the
current loop. Otherwise we might add incorrect facts or crash.
Fixes a miscompile & crash introduced by 00396e6a1a0b.
Looking through inttoptr / ptrtoint intermixed with GEPs is very
questionable from a provenance perspective. We also don't seem to
have any test coverage that shows this is useful (apart from one
test I added to guard against a crash).
Use KnownBits to infer the nneg flag on zext instructions.
Currently we only set nneg when converting sext -> zext, but don't set
it when we have a zext in the first place. If we want to use it in
optimizations, we should make sure the flag inference is consistent.
The current code structure results in cases where if a) we can't clone
the IV user (because it's not in our whitelist) or b) can't prove the
SCEV expressions are identical, we'd sometimes leave both the original
unwiddened IV and the partially widdened IV in code. Instead, just
truncate thw wide IV to the use - same as what we'd do if we couldn't
find an addrec to start with.
Noticed this while playing with changing how we produce addrecs. The
current structure results in a very tight interlock between SCEVs
internal capabilities and indvars code.
An issue arose when handling shift amounts while performing
narrowed funnel shifts simplification. Specifically, shift
amounts were incorrectly truncated when their type was
narrower than the target bit width. This has been addressed
by zero-extending `ShAmt` in such cases.
Fixes: https://github.com/llvm/llvm-project/issues/71463.
Proof: https://alive2.llvm.org/ce/z/5draKz.
The hasAddressTaken() call in hasOnlyColdCalls() has quadratic
complexity if there are many cold calls to a function: We're going to
visit each call of the function, and then for each of them iterate all
the users of the function.
We've recently encountered a case where GlobalOpt spends more than an
hour in these hasAddressTaken() checks when full LTO is used.
Avoid this by moving the hasAddressTaken() check into hasChangeableCC()
and caching its result, so it is only computed once per function.
zext nneg was recently added to the IR in #67982. This patch teaches
demanded bits and known bits about the semantics of the instruction, and
adds a couple of test cases to illustrate basic functionality.
Currently tryToGatherExtractElements function analyzes the whole vector,
regrdless number of actual registers, used in this vector. It may
prevent some optimizations, because per-register analysis may allow to
simplify the final code by reusing more already emitted vectors and
better shuffles.
Differential Revision: https://reviews.llvm.org/D148855
This causes asserts:
llvm-project/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:10082:
Value *llvm::slpvectorizer::BoUpSLP::ShuffleInstructionBuilder::adjustExtracts(
const TreeEntry *, MutableArrayRef<int>, unsigned int, bool &):
Assertion `Part == 0 && "Expected firs part."' failed.
See comment on the code review.
> Currently tryToGatherExtractElements function analyzes the whole vector,
> regrdless number of actual registers, used in this vector. It may
> prevent some optimizations, because per-register analysis may allow to
> simplify the final code by reusing more already emitted vectors and
> better shuffles.
>
> Differential Revision: https://reviews.llvm.org/D148855
This reverts commit 9dfdbd788707edc8c39eb2bff16004aba1f3586b.
Added support for LLVM IR code generation which is used for handling omp
target parallel code. The call for __kmpc_parallel_51 is generated and
the parallel region is outlined to separate function.
The proper setup of kmpc_target_init mode is not included in the commit.
It is assumed that the SPMD mode for target initialization is properly
set by other codegen functions.
(icmp eq/ne (lshr x, C), (lshr y, C) gets optimized to `(icmp
ult/uge (xor x, y), (1 << C)`. This can cause the current equal by
parts detection to miss the high-bits as it may get optimized to the
new pattern.
This commit adds support for detecting / combining the ult/ugt
pattern.
Closes#69884
Loosen up the matching so that a missing leaf debug frame in the profile
does not prevent matching an allocation context if we can match further
up the inlined call context. This relies on the pre-inliner, which was
already the default when performing normal PGO feedback along with the
MemProf feedback, but to ensure matching is not affected by the presence
of PGO, enable the pre-inliner for MemProf feedback as well.
See RFC for details:
https://discourse.llvm.org/t/rfc-for-moving-swift-s-merge-function-pass-to-llvm/73778
We will need to refactor extension to FunctionComparator/FunctionHash to
StructuralHash. This patch adds a new pass which is ported from Swift,
and will need to discuss on how to migrate Swift’s pass over after we
land this in llvm.
Create this PR to get some early review on the patch.
---------
Co-authored-by: Manman Ren <mren@meta.com>
Currently tryToGatherExtractElements function analyzes the whole vector,
regrdless number of actual registers, used in this vector. It may
prevent some optimizations, because per-register analysis may allow to
simplify the final code by reusing more already emitted vectors and
better shuffles.
Differential Revision: https://reviews.llvm.org/D148855
IndVars has the existing notion of a narrow definition which is known to
positive and thus both sign and zero extension kinds are actually the
same operations. There's existing logic for forming a SCEV based on the
extension kind and the no-wrap flags. This change extends that logic to
form the opposite extension kind for a positive def if doing so is
allowed by the flags. Note that we already do something analogous for
the getWideRecurrence case as well.
The `skipPGO()` function was added in https://reviews.llvm.org/D137184.
Unfortunately, it also blocked functions from being annotated (PGOUse),
which I believe will cause confusion to users if a function has a
profile but it is not PGO'd.
The docs for `noprofile` and `skipprofile` only claim to block
instrumentation, not PGO optimization:
https://llvm.org/docs/LangRef.html
zext nneg was recently added to the IR in #67982. This patch teaches
SimplifyIndVars to prefer zext nneg over *both* sext and plain zext,
when a local SCEV query indicates the source is non-negative.
The choice to prefer zext nneg over sext looks slightly aggressive
here, but probably isn't so much in practice. For cases where we'd
"remember" the range fact, instcombine would convert the sext into
a zext nneg anyways. The only cases where this produces a different
result overall are when SCEV knows a non-local fact, and it doesn't
get materialized into the IR. Those are exactly the cases where
using zext nneg are most useful. We do run the risk of e.g. a
missing combine - since we haven't updated most of them yet - but
that seems like a manageable risk.
Note that there are much deeper algorithmic changes we could make
to this code to exploit zext nneg, but this seemed like a reasonable
and low risk starting point.