Simplify initial VPlan construction by not creating a separate
vector.latch block, which isn't needed and will get folded away later.
This has been suggested as independent clean-up multiple times.
Previously the inliner always produced a memcpy with alignment 1 for the source and destination, leading to potentially suboptimal codegen.
Since the Src pointer alignment is only available through the CallBase, it has to be passed to HandleByValArgumentInit. The Dst alignment is already known, so it doesn't have to be passed along.
If there is no specified Src alignment, my changes cause the pointer to have no align data attached instead of align 1 as before (see inline-tail.ll). I believe this is fine, but since I'm a first-time contributor, please confirm.
My changes are already covered by 4 existing regression tests, so I did
not add any additional ones.
The example from #45778 now results in:
```llvm
; opt -S -passes=inline,instcombine,sroa,instcombine test.ll
define dso_local i32 @test(ptr %t) {
entry:
%.sroa.0.0.copyload = load ptr, ptr %t, align 8 ; this used to be align 1 in the original issue
%arrayidx.i = getelementptr inbounds nuw i8, ptr %.sroa.0.0.copyload, i64 24
%0 = load i32, ptr %arrayidx.i, align 4
ret i32 %0
}
```
Fixes #45778.
If we use `CodeExtractor` to extract block1 into a new function,
```
define void @foo() !dbg !2 {
entry:
%1 = alloca i32, i64 1, align 4
%2 = alloca i32, i64 1, align 4
#dbg_declare(ptr %1, !8, !DIExpression(), !1)
br label %block1
block1:
store i32 1, ptr %1, align 4
store i32 2, ptr %2, align 4
#dbg_declare(ptr %2, !10, !DIExpression(), !1)
ret void
}
```
it will look like the extracted function shown below (with some irrelevant details removed).
```
define internal void @extracted(ptr %arg0, ptr %arg1) {
newFuncRoot:
br label %block1
block1:
store i32 1, ptr %arg0, align 4
store i32 2, ptr %arg1, align 4
ret void
}
```
You will notice that it has replaced the uses of values that lived in the parent function (%1 and %2) with the arguments of the new function. But it did not do the same for the `#dbg_declare`, which was simply dropped because its location pointed to a value outside of the new function. Similarly, %arg0 is left without any debug record, although the value it replaced had one and we could materialize one for it based on that.
This is not just a theoretical limitation. `CodeExtractor` is used to create the functions that implement many of the `OpenMP` constructs in `OMPIRBuilder`. As a result of these limitations, debug information is missing from the created functions.
This PR tries to address this problem. It iterates over the inputs of the extracted function and looks at their debug uses. If a debug use is present in the new function, its location is updated to point at the replacement value; otherwise a similar use is materialized in the new function.
Most of these changes are localized in `fixupDebugInfoPostExtraction`. The only other change is propagating the function inputs and their replacement values to it.
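For illustration, a rough sketch of what the extracted function could look like with this change; the exact metadata numbering and record placement are made up:
```llvm
define internal void @extracted(ptr %arg0, ptr %arg1) {
newFuncRoot:
br label %block1
block1:
#dbg_declare(ptr %arg0, !8, !DIExpression(), !1)
store i32 1, ptr %arg0, align 4
store i32 2, ptr %arg1, align 4
#dbg_declare(ptr %arg1, !10, !DIExpression(), !1)
ret void
}
```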
---------
Co-authored-by: Tim Gymnich <tim@gymni.ch>
Co-authored-by: Michael Kruse <llvm-project@meinersbur.de>
Optimize `(or disjoint (zext/sext a), (zext/sext b))` to `(zext/sext (or disjoint a, b))` without losing the disjoint flag.
Confirmed here: https://alive2.llvm.org/ce/z/kQ5fJv.
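A minimal before/after sketch of the zext case (function and value names are illustrative):
```llvm
; Before: the or is performed in the wide type.
define i32 @src(i8 %a, i8 %b) {
  %za = zext i8 %a to i32
  %zb = zext i8 %b to i32
  %or = or disjoint i32 %za, %zb
  ret i32 %or
}

; After: the or is narrowed and the disjoint flag is kept.
define i32 @tgt(i8 %a, i8 %b) {
  %or = or disjoint i8 %a, %b
  %ext = zext i8 %or to i32
  ret i32 %ext
}
```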
The candidate block for sinking must be dominated by the current location. This is true based on how the candidate block was selected, so the runtime check is not necessary and has been changed to an assertion.
---------
Signed-off-by: John Lu <John.Lu@amd.com>
Use replaceSuccessor/replacePredecessor in
insertBlockAfter/insertBlockBefore. This preserves the predecessor
order, which in turn is needed to avoid invalidating existing phi recipes.
At the moment this is NFC, but enables additional uses in the future.
ExtractFromEnd only has 2 uses, extracting the last and penultimate
elements. Replace it with 2 separate opcodes, removing the need to
materialize and handle a constant argument.
PR: https://github.com/llvm/llvm-project/pull/137030
This refactoring makes it easier to add the code that preserves `or disjoint` in https://github.com/llvm/llvm-project/pull/136815.
When folding a logic op of sext/zext casts whose source types differ, both casts must have a single use to avoid creating an extra instruction. If the source types of the casts are the same, only one of the casts needs to have a single use. This PR also adds more tests for the same-source-type case.
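An illustrative same-source-type case (names are made up): here the fold can still narrow the or even though `%za` has a second use, because no cast of a new width needs to be created:
```llvm
define i32 @same_src(i8 %a, i8 %b, ptr %p) {
  %za = zext i8 %a to i32
  %zb = zext i8 %b to i32
  store i32 %za, ptr %p      ; extra use of %za
  %r = or i32 %za, %zb       ; can still become zext (or i8 %a, %b)
  ret i32 %r
}
```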
This reverts commit a9d93ecf1f8d2cfe3f77851e0df179b386cff353.
Reverted because the commit included a config in LLVM headers that is not available outside of the llvm source tree.
This is part of a series of patches that tries to improve DILocation bug detection in Debugify; see the review for more details. This is the patch that adds the main feature: a set of `DebugLoc::get<Kind>` functions that can be used for instructions with intentionally empty DebugLocs to prevent Debugify from treating them as bugs, removing the currently-pervasive false positives and allowing us to use Debugify (in its original DI preservation mode) to reliably detect existing bugs and regressions. This patch does not add uses of these functions, except for one use in Clang before optimizations and one in `Instruction::dropLocation()`, since those are obvious cases that immediately remove a set of false positives.
It is better to preserve the original order of the alternate nodes to avoid inter-lane shuffling; select/insert-subvector patterns provide better performance.
Reviewers: RKSimon, hiraditya
Reviewed By: RKSimon
Pull Request: https://github.com/llvm/llvm-project/pull/136329
An existing transformation replaces invoke instructions with a call to
the invoked function and a branch to the destination; when this happens,
we propagate the invoke's source location to the call but not to the
branch. This patch updates this behaviour to propagate to the branch as
well.
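A sketch of the transformation in IR terms (metadata numbering is illustrative):
```llvm
; Before:
;   invoke void @f() to label %cont unwind label %lpad, !dbg !7
;
; After (the branch now carries the invoke's source location too):
;   call void @f(), !dbg !7
;   br label %cont, !dbg !7
```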
Found using https://github.com/llvm/llvm-project/pull/107279.
When converting a malloc stored to a global into a global, we will
introduce an i1 flag to track whether the global has been initialized.
In the case of atomic loads/stores, this will result in verifier failures, because atomic ops on i1 are illegal. Even if we changed this to i8, I don't think it is a good idea to change atomic types in that way. Instead, bail out of the transform if we encounter any atomic loads/stores of the global.
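For illustration, a reduced form of the kind of input that now makes the transform bail out (names are made up):
```llvm
@g = internal global ptr null

declare ptr @malloc(i64)

define void @init() {
  %p = call ptr @malloc(i64 4)
  ; Turning @g into a plain value plus an i1 "initialized" flag would
  ; require atomic ops on i1, so the atomic store makes us bail out.
  store atomic ptr %p, ptr @g seq_cst, align 8
  ret void
}
```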
Fixes https://github.com/llvm/llvm-project/issues/137152.
Some optimizations in globalopt replace uses of a global value with uses of a generated global bool value; in some cases where this happens, the newly generated instructions did not have the original source location(s) of the instructions they replaced propagated to them. This patch properly preserves those source locations.
Found using https://github.com/llvm/llvm-project/pull/107279.
Add a new helper to manage IR metadata that can be propagated to generated instructions for recipes.
This helps to remove a number of remaining uses of getUnderlyingInstr
during VPlan execution.
PR: https://github.com/llvm/llvm-project/pull/135272
This PR is based on my last PR #132752 (the first commit of this PR), but addresses a different issue.
This commit addresses a limitation in the `PointerMayBeCaptured` analysis when dealing with derived pointers (e.g. arr+1), as described in issue #132739.
The current implementation of `PointerMayBeCaptured` may miss captures of the underlying `alloca` when analyzing derived pointers, leading to some false negatives (FNs) in TSan, as in the following example:
```cpp
#include <pthread.h>

// 'barrier' and 'barrier_wait' are helpers from the TSan test harness.
void *Thread(void *a) {
  ((int *)a)[1] = 43;
  return 0;
}

int main() {
  int Arr[2] = {41, 42};
  pthread_t t;
  pthread_create(&t, 0, Thread, &Arr[0]);
  // Missed instrumentation here due to the FN of PointerMayBeCaptured.
  Arr[1] = 43;
  barrier_wait(&barrier);
  pthread_join(t, 0);
}
```
Refer to this [godbolt page](https://godbolt.org/z/n67GrxdcE) to see the TSan-instrumented compilation result.
Even when `PointerMayBeCaptured` works correctly, it first has to walk back to the original `alloca` during its analysis, duplicating the work already done by the outer `findAllocaForValue`:
```cpp
const AllocaInst *AI = findAllocaForValue(Addr);
// Instead of Addr, we should check whether its base pointer is captured.
if (AI && !PointerMayBeCaptured(Addr, true)) ...
```
Key change: directly analyze the capture status of the underlying `alloca` instead of the derived pointer, to ensure accurate capture detection:
```cpp
const AllocaInst *AI = findAllocaForValue(Addr);
// Instead of Addr, we should check whether its base pointer is captured.
if (AI && !PointerMayBeCaptured(AI, true)) ...
```
Remove legacy ILV sinkScalarOperands, which is superseded by the
sinkScalarOperands VPlan transforms.
There are a few cases that aren't handled by VPlan's sinkScalarOperands,
because the recipes don't support replicating. Those are pointer
inductions and blends.
We could probably improve this further, by allowing replication for more
recipes, but I don't think the extra complexity is warranted.
Depends on https://github.com/llvm/llvm-project/pull/136021.
PR: https://github.com/llvm/llvm-project/pull/136023
This patch teaches readMemprof to dump the number of frames for each
allocation site match. This information helps us analyze what part of
the call stack in the MemProf profile has matched the IR.
Aside from updating existing test cases, this patch adds one more test
case, memprof-dump-matched-alloc-site.ll, because none of the existing
test cases has a frame count greater than one.
Add incoming exit phi operands during the initial VPlan construction.
This ensures all users are added to the initial VPlan and is also needed
in preparation for retaining exiting edges during initial construction.
PR: https://github.com/llvm/llvm-project/pull/136455
willGenerateVectors switches on the opcodes of a recipe, but Histogram is missing from the switch statement, which could cause a crash in some
cases. The crash was initially observed when developing another patch.
This patch disables the fold for the logical is_finite test (i.e., `and (fcmp ord x, 0), (fcmp u* x, inf) -> fcmp o* x, inf`).
It is still possible to allow this fold in several logical cases (e.g., when `stripSignOnlyFPOps(RHS0)` does not strip any operations). Since this patch has no real-world impact, I decided to disable this fold for all logical cases.
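For reference, a minimal sketch of the logical (select-based) form the fold no longer applies to (constants and names are illustrative):
```llvm
define i1 @logical_is_finite(float %x) {
  %ord = fcmp ord float %x, 0.000000e+00
  %ninf = fcmp une float %x, 0x7FF0000000000000
  ; Logical and written as a select; the single-fcmp rewrite is no
  ; longer performed for this form.
  %r = select i1 %ord, i1 %ninf, i1 false
  ret i1 %r
}
```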
Alive2: https://alive2.llvm.org/ce/z/aH4LC7
Closes https://github.com/llvm/llvm-project/issues/136650.
When ranking operands for an expression tree, the reassociate pass also performs canonicalization, putting constants on the right-hand side. However, such transforms were not registered as modifying the IR, so at the end of the pass, if no other changes had been made, the pass reported that all analyses could be preserved.
With this patch we make sure to set MadeChange to true when modifying
the IR via canonicalizeOperands, ensuring that analyses such as DemandedBits are properly invalidated when instructions are modified.
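As a trivial sketch of the canonicalization itself (illustrative values):
```llvm
; canonicalizeOperands moves the constant to the right-hand side:
;   %a = add i32 1, %x   ==>   %a = add i32 %x, 1
```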
If we know that the initial GEP was inbounds, and we change it to a
sequence of GEPs from the same base pointer where every offset is
non-negative, then the new GEPs are inbounds.
We can also preserve inbounds if the inbounds GEP and the involved additions are NUW.
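An illustrative sketch of the first case (types and offsets are made up):
```llvm
; Original:
;   %p = getelementptr inbounds i8, ptr %base, i64 20
; Rewritten as a chain rooted at the same base where every offset is
; non-negative; each new GEP can keep the inbounds flag:
;   %p0 = getelementptr inbounds i8, ptr %base, i64 16
;   %p  = getelementptr inbounds i8, ptr %p0, i64 4
```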
For SWDEV-516125.