39630 Commits

Author SHA1 Message Date
Kazu Hirata
8210cdd764
[llvm] Use llvm::replace (NFC) (#137481) 2025-04-26 18:18:09 -07:00
Kazu Hirata
8ba3a232d1
[llvm] Use llvm::copy (NFC) (#137470) 2025-04-26 15:50:38 -07:00
Florian Hahn
826f237cb4
[VPlan] Don't add separate vector latch block (NFC).
Simplify initial VPlan construction by not creating a separate
vector.latch block, which isn't needed and will get folded away later.
This has been suggested as independent clean-up multiple times.
2025-04-26 22:03:18 +01:00
sallto
419a2cb218
[Inliner] Preserve alignment of byval arguments (#137455)
Previously the inliner always produced a memcpy with alignment 1 for src
and destination, leading to potentially suboptimal Codegen.

Since the Src ptr alignment is only available through the CallBase it
has to be passed to HandleByValArgumentInit. Dst Alignment is already
known so it doesn't have to be passed along.

If there is no specified Src Alignment my changes cause the ptr to have
no align data attached instead of align 1 as before (see
inline-tail.ll), I believe this is fine but since I'm a first time
contributor, please confirm.

My changes are already covered by 4 existing regression tests, so I did
not add any additional ones.

The example from #45778 now results in:
```llvm
; opt -S -passes=inline,instcombine,sroa,instcombine test.ll

define dso_local i32 @test(ptr %t) {
entry:
  %.sroa.0.0.copyload = load ptr, ptr %t, align 8       ; this used to be align 1 in the original issue
  %arrayidx.i = getelementptr inbounds nuw i8, ptr %.sroa.0.0.copyload, i64 24
  %0 = load i32, ptr %arrayidx.i, align 4
  ret i32 %0
}
```

Fixes #45778.
2025-04-26 21:38:58 +02:00
Abid Qadeer
58430692fc
[CodeExtractor] Improve debug info for input values. (#136016)
If we use `CodeExtractor` to extract the block1 into a new function,

```
define void @foo() !dbg !2 {
entry:
  %1 = alloca i32, i64 1, align 4
  %2 = alloca i32, i64 1, align 4
  #dbg_declare(ptr %1, !8, !DIExpression(), !1)
  br label %block1

block1:
  store i32 1, ptr %1, align 4
  store i32 2, ptr %2, align 4
  #dbg_declare(ptr %2, !10, !DIExpression(), !1)
  ret void
}
```

it will look like the extracted function shown below (with some
irrelevant details removed).

```
define internal void @extracted(ptr %arg0, ptr %arg1) {
newFuncRoot:
  br label %block1

block1:
  store i32 1, ptr %arg0, align 4
  store i32 2, ptr %arg1, align 4
  ret void
}
```

You will notice that it has replaced the usage of values that were in
the parent function (%1 and %2) with the arguments to the new function.
But it did not do the same thing with `#dbg_declare` which was simply
dropped because its location pointed to a value outside of the new
function. Similarly arg0 is without any debug record, although the value
that it replaced had one and we could materialize one for it based on
that.

This is not just a theoretical limitation. `CodeExtractor` is used to
create functions that implement many of the `OpenMP` constructs in
`OMPIRBuilder`. As a result of these limitations, the debug information
is missing from the created functions.

This PR tries to address this problem. It iterates over the inputs to the
extracted function and looks at their debug uses. If they are present
in the new function, it updates their location. Otherwise it materializes
a similar usage in the new function.

Most of these changes are localized in `fixupDebugInfoPostExtraction`.
The only other change is to propagate function inputs and the replacement
values to it.

---------

Co-authored-by: Tim Gymnich <tim@gymni.ch>
Co-authored-by: Michael Kruse <llvm-project@meinersbur.de>
2025-04-26 10:12:44 +01:00
LU-JOHN
571e024d00
[Sink][NFC] Move all checks for unsafe instructions into one function (#137398)
Move the checks for instructions that are unsafe to sink into the
isSafeToMove function.

Signed-off-by: John Lu <John.Lu@amd.com>
2025-04-26 10:10:27 +02:00
Yingwei Zheng
3e1e4062e1
[InstCombine] Preserve signbit semantics of NaN with fold to fabs (#136648)
As per the LangRef and IEEE 754-2008 standard, the sign bit of NaN is
preserved if there is no floating-point operation being performed.
See also
862e35e25a
for reference.

Alive2: https://alive2.llvm.org/ce/z/QYtEGj
Closes https://github.com/llvm/llvm-project/issues/136646
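
The sign-bit point above can be illustrated outside of LLVM. The following is a minimal standalone C++ sketch, not the InstCombine code; the helper names (`negativeNaN`, `signAfterFabs`, `signAfterCopy`) are hypothetical. It shows that `fabs` clears the sign bit of a NaN, whereas merely passing the value along leaves it set, so replacing a non-arithmetic pattern with `fabs` is observable on NaN inputs.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Returns a quiet NaN whose sign bit is set (copysign only touches the sign).
inline double negativeNaN() {
  double N = std::copysign(std::numeric_limits<double>::quiet_NaN(), -1.0);
  assert(std::isnan(N));
  return N;
}

// fabs is a sign-bit operation: it clears the sign bit, even for NaN.
inline bool signAfterFabs(double X) { return std::signbit(std::fabs(X)); }

// Copying a value performs no floating-point arithmetic, so the sign bit,
// including that of a NaN, is left as it was.
inline bool signAfterCopy(double X) {
  double Y = X;
  return std::signbit(Y);
}
```

Calling `signAfterFabs(negativeNaN())` and `signAfterCopy(negativeNaN())` gives different answers, which is the difference a fold to `fabs` has to account for.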
2025-04-26 14:03:12 +08:00
Jim Lin
12d1cb1347
[InstCombine] Preserve disjoint or after folding casted bitwise logic (#136815)
Optimize
`(or disjoint (zext/sext a), (zext/sext b))`
to
`(zext/sext (or disjoint a, b))`
without losing disjoint.

Confirmed here: https://alive2.llvm.org/ce/z/kQ5fJv.
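
Independently of the Alive2 proof, the property can be checked by brute force for a small width. This is a standalone C++ sketch under the assumption that `disjoint` means the operands share no set bits; the helpers `zext8`, `sext8`, and `disjointSurvivesExtension` are hypothetical names and not part of LLVM.

```cpp
#include <cassert>
#include <cstdint>

// Zero-extend an 8-bit value to 16 bits.
inline uint16_t zext8(uint8_t V) { return static_cast<uint16_t>(V); }

// Sign-extend an 8-bit value to 16 bits (two's complement assumed).
inline uint16_t sext8(uint8_t V) {
  return static_cast<uint16_t>(static_cast<int16_t>(static_cast<int8_t>(V)));
}

// For every pair of 8-bit values with no common set bits, checks that
//   ext(a) | ext(b) == ext(a | b)
// and that ext(a) and ext(b) are still disjoint, for both zext and sext.
inline bool disjointSurvivesExtension() {
  for (int I = 0; I < 256; ++I) {
    for (int J = 0; J < 256; ++J) {
      uint8_t A = static_cast<uint8_t>(I);
      uint8_t B = static_cast<uint8_t>(J);
      if ((A & B) != 0)
        continue; // Only disjoint operands are of interest.
      uint8_t Or = static_cast<uint8_t>(A | B);
      if ((zext8(A) & zext8(B)) != 0 || (sext8(A) & sext8(B)) != 0)
        return false;
      if (static_cast<uint16_t>(zext8(A) | zext8(B)) != zext8(Or))
        return false;
      if (static_cast<uint16_t>(sext8(A) | sext8(B)) != sext8(Or))
        return false;
    }
  }
  return true;
}
```

The check passes because both extensions distribute over bitwise operations, so operands that are disjoint before the cast remain disjoint after it.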
2025-04-26 12:35:04 +08:00
LU-JOHN
f9d4e7ef8b
[NFC][Sink] Change runtime checks to asserts (#137354)
The candidate block for sinking must be dominated by the current location.
This is true based on how the candidate block was selected. Runtime checks
are not necessary and have been changed to assertions.

---------

Signed-off-by: John Lu <John.Lu@amd.com>
2025-04-25 23:21:11 +02:00
Florian Hahn
c4d84e1b00
[VPlan] Use replaceSuccessor/replacePredecessor in insertBlock (NFC).
Use replaceSuccessor/replacePredecessor in
insertBlockAfter/insertBlockBefore. This preserves the predecessor
order, which in turn is needed to not invalidate existing phi recipes.

At the moment this is NFC, but enables additional uses in the future.
2025-04-25 20:46:10 +01:00
Matt Arsenault
91865ac9ba
Use isa instead of !dyn_cast (#137344) 2025-04-25 19:11:56 +02:00
Matt Arsenault
559a50c5f0
SimplifyIndVar: Use use_empty instead of hasNUses(0) (#137346) 2025-04-25 19:11:39 +02:00
Matt Arsenault
a214084ae4
BypassSlowDivision: Use use_empty instead of hasNUses(0) (#137345) 2025-04-25 19:01:45 +02:00
Matt Arsenault
cf766f5210
InlineFunction: Use use_empty instead of hasNUses(0) (#137347) 2025-04-25 19:01:20 +02:00
Florian Hahn
df21288247
[VPlan] Replace ExtractFromEnd with Extract(Last|Penultimate)Element (NFC). (#137030)
ExtractFromEnd only has 2 uses, extracting the last and penultimate
elements. Replace it with 2 separate opcodes, removing the need to
materialize and handle a constant argument.

PR: https://github.com/llvm/llvm-project/pull/137030
2025-04-25 16:27:29 +01:00
Matt Arsenault
4ea2278e39
SLPVectorizer: Use use_empty instead of hasNUses(0) (#137336) 2025-04-25 17:27:01 +02:00
Matt Arsenault
bdc523f31f
LowerMatrixIntrinsics: Use use_empty instead of hasNUses(0) (#137334) 2025-04-25 17:21:40 +02:00
Jim Lin
462bf4746f
[InstCombine] Refactor the code for folding logicop and sext/zext. NFC. (#137132)
This refactoring makes it easier to add the code that preserves disjoint
or in the PR https://github.com/llvm/llvm-project/pull/136815.

Both casts must have one use for folding logicop and sext/zext when the
src types differ, to avoid creating an extra instruction. If the src types
of the casts are the same, only one of the casts needs to have one use. This
PR also adds more tests for the same src type.
2025-04-25 10:59:01 +08:00
Stephen Tozer
fdbf073a86
Revert "[DLCov] Implement DebugLoc coverage tracking (#107279)"
This reverts commit a9d93ecf1f8d2cfe3f77851e0df179b386cff353.

Reverted due to the commit including a config in LLVM headers that is not
available outside of the llvm source tree.
2025-04-25 00:36:28 +01:00
Matt Arsenault
37b135cc8f
Attributor: Don't rely on use_empty for constants (#137218)
This allows inferring noalias on a null argument parameter. This
avoids a non-NFC diff in a future change.
2025-04-24 21:41:55 +02:00
Jeffrey Byrnes
1636f4af7b
[CmpInstAnalysis] Decompose icmp eq (and x, C) C2 (#136367)
This type of decomposition is used in multiple places already. Adding it
to `CmpInstAnalysis` reduces code duplication.
2025-04-24 12:40:26 -07:00
Florian Hahn
7cce38beea
[VPlan] Remove dead SE argument from handleUncountableEarlyExit (NFC).
ScalarEvolution is not used by the function, remove the dead arg.
2025-04-24 19:59:05 +01:00
Stephen Tozer
a9d93ecf1f
[DLCov] Implement DebugLoc coverage tracking (#107279)
This is part of a series of patches that tries to improve DILocation bug
detection in Debugify; see the review for more details. This is the patch
that adds the main feature, adding a set of `DebugLoc::get<Kind>`
functions that can be used for instructions with intentionally empty
DebugLocs to prevent Debugify from treating them as bugs, removing the
currently-pervasive false positives and allowing us to use Debugify (in
its original DI preservation mode) to reliably detect existing bugs and
regressions. This patch does not add uses of these functions, except for
once in Clang before optimizations, and in
`Instruction::dropLocation()`, since that is an obvious case that
immediately removes a set of false positives.
2025-04-24 19:41:25 +01:00
Alexey Bataev
a7a74b349d
[SLP]Improve reordering of the alternate nodes
It is better to preserve the original order of the alternate nodes to avoid
inter-lane shuffling; select/insert subvector patterns provide better
perf.

Reviewers: RKSimon, hiraditya

Reviewed By: RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/136329
2025-04-24 14:33:10 -04:00
Alexey Bataev
f427890a1d
[SLP]Fix PHI comparator to make it follow strict weak ordering restriction
Fixes #137164
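
For context, sorting routines require their comparator to be a strict weak ordering: irreflexive, asymmetric, transitive, and with a transitive "neither is less" relation. The sketch below is a standalone C++ illustration, not the SLP PHI comparator; `Node`, `lessNode`, `lessOrEqualNode`, `sampleNodes`, and `isStrictWeakOrdering` are hypothetical names, and the checker only tests the given sample rather than proving the property.

```cpp
#include <cassert>
#include <tuple>
#include <vector>

struct Node {
  int TypeID;
  int Opcode;
};

// Lexicographic comparison on a key tuple is a strict weak ordering.
inline bool lessNode(const Node &A, const Node &B) {
  return std::tie(A.TypeID, A.Opcode) < std::tie(B.TypeID, B.Opcode);
}

// Deliberately broken: "<=" is not irreflexive, so it is not a valid comparator.
inline bool lessOrEqualNode(const Node &A, const Node &B) {
  return A.TypeID <= B.TypeID;
}

inline std::vector<Node> sampleNodes() {
  return {{0, 1}, {0, 2}, {1, 0}, {1, 0}};
}

// Brute-force check of the strict weak ordering axioms over a sample.
template <typename Cmp>
bool isStrictWeakOrdering(const std::vector<Node> &S, Cmp Less) {
  auto Equiv = [&](const Node &A, const Node &B) {
    return !Less(A, B) && !Less(B, A);
  };
  for (const Node &A : S) {
    if (Less(A, A))
      return false; // Irreflexivity.
    for (const Node &B : S) {
      if (Less(A, B) && Less(B, A))
        return false; // Asymmetry.
      for (const Node &C : S) {
        if (Less(A, B) && Less(B, C) && !Less(A, C))
          return false; // Transitivity of "less".
        if (Equiv(A, B) && Equiv(B, C) && !Equiv(A, C))
          return false; // Transitivity of equivalence.
      }
    }
  }
  return true;
}
```

A checker of this kind accepts `lessNode` and rejects `lessOrEqualNode` on the sample, which is the sort of violation that can make a sort misbehave.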
2025-04-24 11:08:17 -07:00
Stephen Tozer
d6bb786705
[DebugInfo] Propagate source loc from invoke to replacement branch (#137206)
An existing transformation replaces invoke instructions with a call to
the invoked function and a branch to the destination; when this happens,
we propagate the invoke's source location to the call but not to the
branch. This patch updates this behaviour to propagate to the branch as
well.

Found using https://github.com/llvm/llvm-project/pull/107279.
2025-04-24 18:59:29 +01:00
Simon Pilgrim
f572a5951a
[VectorCombine] Ensure canScalarizeAccess handles cases where the index type can't represent all inbounds values
Fixes #132563
2025-04-24 14:17:55 +01:00
Nikita Popov
57530c23a5
[GlobalOpt] Do not promote malloc if there are atomic loads/stores (#137158)
When converting a malloc stored to a global into a global, we will
introduce an i1 flag to track whether the global has been initialized.

In case of atomic loads/stores, this will result in verifier failures,
because atomic ops on i1 are illegal. Even if we changed this to i8, I
don't think it is a good idea to change atomic types in that way.

Instead, bail out of the transform if we encounter any atomic
loads/stores of the global.

Fixes https://github.com/llvm/llvm-project/issues/137152.
2025-04-24 15:15:47 +02:00
Stephen Tozer
224cd50e00
[DebugInfo][GlobalOpt] Preserve source locs for optimized loads (#134828)
Some optimizations in globalopt simplify uses of a global value to uses
of a generated global bool value; in some cases where this happens, the
newly-generated instructions would not have the original source
location(s) of the instructions they replaced propagated to them; this
patch properly preserves those source locations.

Found using https://github.com/llvm/llvm-project/pull/107279.
2025-04-24 14:09:53 +01:00
Florian Hahn
06d4876982
[VPlan] Replace checking IR loop with checking VPlan predecessors (NFC).
Update check to use VPEarlyExitBlock's predecessors, which removes a
dependence on underlying IR and is more in line with the comment below.
2025-04-24 12:29:34 +01:00
Florian Hahn
5d136f90a9
[VPlan] Manage instruction metadata in VPlan. (#135272)
Add a new helper to manage IR metadata that can be propagated to generated
instructions for recipes.

This helps to remove a number of remaining uses of getUnderlyingInstr
during VPlan execution.

PR: https://github.com/llvm/llvm-project/pull/135272
2025-04-24 11:57:19 +01:00
Camsyn
59b26abbbe
[TSan, SanitizerBinaryMetadata] Analyze the capture status for alloca rather than arbitrary Addr (#132756)
This PR is based on my last PR #132752 (the first commit of this PR),
but addressing a different issue.

This commit addresses the limitation in `PointerMayBeCaptured` analysis
when dealing with derived pointers (e.g. arr+1) as described in issue
#132739.

The current implementation of `PointerMayBeCaptured` may miss captures
of the underlying `alloca` when analyzing derived pointers, leading to
some false negatives (FNs) in TSan, as follows:
```cpp
void *Thread(void *a) {
  ((int*)a)[1] = 43;
  return 0;
}

int main() {
  int Arr[2] = {41, 42};
  pthread_t t;
  pthread_create(&t, 0, Thread, &Arr[0]);
  // Missed instrumentation here due to the FN of PointerMayBeCaptured
  Arr[1] = 43;
  barrier_wait(&barrier);
  pthread_join(t, 0);
}
```
Refer to this [godbolt page](https://godbolt.org/z/n67GrxdcE) to get the
compilation result of TSan.

Even when `PointerMayBeCaptured` works correctly, it first has to backtrack
to the original `alloca` during analysis, which duplicates the work of the
outer `findAllocaForValue` call:
```cpp
    const AllocaInst *AI = findAllocaForValue(Addr);
    // Instead of Addr, we should check whether its base pointer is captured.
    if (AI && !PointerMayBeCaptured(Addr, true)) ...
```

Key changes:
Directly analyze the capture status of the underlying `alloca` instead
of derived pointers to ensure accurate capture detection
```cpp
    const AllocaInst *AI = findAllocaForValue(Addr);
    // Instead of Addr, we should check whether its base pointer is captured.
    if (AI && !PointerMayBeCaptured(AI, true)) ...
```
2025-04-24 10:48:07 +02:00
Luke Lau
3883b27ba8
[VPlan] Fix typo in assertion. NFC (#137009) 2025-04-24 16:36:32 +08:00
Florian Hahn
e268f71c59
[VPlan] Remove unneeded early continue. (NFC)
As suggested in
https://github.com/llvm/llvm-project/pull/136455, now unreachable exit
blocks won't have any phi nodes.
2025-04-24 08:59:30 +01:00
Florian Hahn
15bb1db4a9
[VPlan] Remove ILV::sinkScalarOperands. (#136023)
Remove legacy ILV sinkScalarOperands, which is superseded by the
sinkScalarOperands VPlan transforms.

There are a few cases that aren't handled by VPlan's sinkScalarOperands,
because the recipes don't support replicating. Those are pointer
inductions and blends.

We could probably improve this further, by allowing replication for more
recipes, but I don't think the extra complexity is warranted.

Depends on https://github.com/llvm/llvm-project/pull/136021.

PR: https://github.com/llvm/llvm-project/pull/136023
2025-04-24 08:37:49 +01:00
Kazu Hirata
cb96a3dc07
[memprof] Dump the number of matched frames (#137082)
This patch teaches readMemprof to dump the number of frames for each
allocation site match.  This information helps us analyze what part of
the call stack in the MemProf profile has matched the IR.

Aside from updating existing test cases, this patch adds one more test
case, memprof-dump-matched-alloc-site.ll, because none of the existing
test cases has the number of frames greater than one.
2025-04-23 21:29:16 -07:00
Arthur Eubanks
0547e84181
[FunctionAttrs] Bail if initializes range overflows 64-bit signed int (#137053)
Otherwise the range doesn't make sense since we interpret it as signed.

Fixes #134115
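
The idea of bailing out can be sketched in isolation. This is a minimal standalone C++ illustration under the assumption that a range is formed from a signed offset and an unsigned size; `makeSignedRange` is a hypothetical helper, and the actual check in FunctionAttrs may look different.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <optional>
#include <utility>

// Returns [Offset, Offset + Size) as signed 64-bit bounds, or std::nullopt
// when the end of the range cannot be represented as int64_t.
inline std::optional<std::pair<int64_t, int64_t>>
makeSignedRange(int64_t Offset, uint64_t Size) {
  constexpr int64_t Max = std::numeric_limits<int64_t>::max();
  // A size above INT64_MAX cannot be interpreted as a signed value at all.
  if (Size > static_cast<uint64_t>(Max))
    return std::nullopt;
  int64_t SignedSize = static_cast<int64_t>(Size);
  // Offset + SignedSize overflows only if Offset is positive and too large.
  if (Offset > 0 && SignedSize > Max - Offset)
    return std::nullopt;
  return std::make_pair(Offset, Offset + SignedSize);
}
```

Returning an empty optional lets the caller skip the inference instead of recording a range whose signed interpretation would be meaningless.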
2025-04-23 15:56:24 -07:00
Florian Hahn
71f2c1e204
[VPlan] Use early exit in ::extractLastLaneOfFirstOperand (NFC).
Reduce indent level, as suggested in
https://github.com/llvm/llvm-project/pull/136455.
2025-04-23 21:55:35 +01:00
Florian Hahn
ff36508d21
[VPlan] Remove redundant setting of parent in createLoopRegion (NFC).
The regions' parents will be set when the parents are set after creating
the parent region.
2025-04-23 21:45:15 +01:00
Florian Hahn
3fbbe9b8d0
[VPlan] Add exit phi operands during initial construction (NFC). (#136455)
Add incoming exit phi operands during the initial VPlan construction.
This ensures all users are added to the initial VPlan and is also needed
in preparation to retaining exiting edges during initial construction.

PR: https://github.com/llvm/llvm-project/pull/136455
2025-04-23 20:40:42 +01:00
Ramkumar Ramachandra
bdf21ca8ac
[LV] Fix missing entry in willGenerateVectors (#136712)
willGenerateVectors switches on opcodes of a recipe, but Histogram is
missing in the switch statement, which could cause a crash in some
cases. The crash was initially observed when developing another patch.
2025-04-23 19:06:38 +01:00
Yingwei Zheng
8abc917fe0
[InstCombine] Do not fold logical is_finite test (#136851)
This patch disables the fold for logical is_finite test (i.e., `and
(fcmp ord x, 0), (fcmp u* x, inf) -> fcmp o* x, inf`).
It is still possible to allow this fold for several logical cases (e.g.,
`stripSignOnlyFPOps(RHS0)` does not strip any operations). Since this
patch has no real-world impact, I decided to disable this fold for all
logical cases.

Alive2: https://alive2.llvm.org/ce/z/aH4LC7
Closes https://github.com/llvm/llvm-project/issues/136650.
2025-04-24 00:12:30 +08:00
Nikita Popov
eea1efed30
[InstrProfiling] Avoid unnecessary bitcast (NFC)
Not needed with opaque pointers.
2025-04-23 15:29:49 +02:00
Nikita Popov
208257f7e0
[CoroElide] Remove unnecessary bitcast (NFCI)
No longer needed with opaque pointers.
2025-04-23 15:21:52 +02:00
Nikita Popov
01ee03c262
[CoroElide] Avoid AA query on non-pointers (NFCI) 2025-04-23 15:21:52 +02:00
Nikita Popov
14dee0aeaa
[NewGVN] Avoid AA query on non-pointers (NFCI)
In order for the instruction result to alias with the pointer it
needs to be a pointer.
2025-04-23 15:21:52 +02:00
Nikita Popov
91e1922d45
[DSE] Skip non-pointer args in initializes handling (NFCI)
Avoid performing AA queries on non-pointers.
2025-04-23 15:21:52 +02:00
Nicholas Guy
1ce709cb84
[LV] Fix crash when building partial reductions using types that aren't known scale factors (#136680) 2025-04-23 13:19:18 +01:00
Björn Pettersson
2a9f77f6bd
[Reassociate] Invalidate analysis passes after canonicalizeOperands (#136835)
When ranking operands for an expression tree, the reassociate pass also
performs canonicalization, putting constants on the right hand side. Such
transforms were, however, not registered as modifying the IR. So at the end
of the pass, if it had not made any other changes, the pass reported
that all analyses should be kept.

With this patch we make sure to set MadeChange to true when modifying
the IR via canonicalizeOperands. This is to make sure analyses such as
DemandedBits are properly invalidated when instructions are modified.
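
The pattern can be sketched with a toy model. This is a standalone C++ illustration, not the Reassociate implementation: `Operand`, `BinOp`, and `runToyPass` are hypothetical, and the toy `canonicalizeOperands` only mirrors the name used above. The point is that every helper that mutates the IR reports it, and the pass accumulates that into its return value.

```cpp
#include <cassert>
#include <utility>
#include <vector>

struct Operand {
  bool IsConstant;
  int Id;
};

struct BinOp {
  Operand LHS;
  Operand RHS;
};

// Moves a constant left-hand operand to the right-hand side.
// Returns true if the operation was modified.
inline bool canonicalizeOperands(BinOp &Op) {
  if (Op.LHS.IsConstant && !Op.RHS.IsConstant) {
    std::swap(Op.LHS, Op.RHS);
    return true;
  }
  return false;
}

// Returns true if anything changed, i.e. cached analyses must be invalidated.
inline bool runToyPass(std::vector<BinOp> &Ops) {
  bool MadeChange = false;
  for (BinOp &Op : Ops)
    MadeChange |= canonicalizeOperands(Op); // Do not drop the result.
  return MadeChange;
}
```

Running the toy pass twice on the same input reports a change the first time and none the second time, since the operands are already canonical.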
2025-04-23 12:52:00 +02:00
Fabian Ritter
720a91183b
[SeparateConstOffsetFromGEP] Preserve inbounds flag based on ValueTracking and NUW (#130617)
If we know that the initial GEP was inbounds, and we change it to a
sequence of GEPs from the same base pointer where every offset is
non-negative, then the new GEPs are inbounds.

We can also preserve inbounds if the inbounds GEP and the involved additions are NUW.

For SWDEV-516125.
2025-04-23 12:38:41 +02:00