40045 Commits

Author SHA1 Message Date
Thurston Dang
d398f476c5
[msan] Rename '-msan-dump-strict-intrinsics' to '-msan-dump-heuristic-instructions' (#143186)
This updates the flag from https://github.com/llvm/llvm-project/pull/123381

Also expands the description of msan-dump-strict-*instructions*
2025-06-06 13:06:00 -07:00
Alex MacLean
107601ed06
[InstCombine] Allow min/max in constant BOp min/max folding (#142878)
Extend folding for `X Pred C2 ? X BOp C1 : C2 BOp C1` to `min/max(X, C2)
BOp C1` to allow min and max as `BOp`. This ensures a constant clamping
pattern is folded into a pair of min/max instructions. Here is a
simplified example of a case where this folding is not occurring
currently.

int clampToU8(int v) {
    if (v < 0) return 0;
    if (v > 255) return 255;
    return v;
}

https://godbolt.org/z/78jhKPWbv

Generic proof: https://alive2.llvm.org/ce/z/cdpLYy
2025-06-06 12:44:04 -07:00
Peter Collingbourne
e3c72e1075
LowerTypeTests: Shrink check size by 1 instruction on x86.
We currently generate code like this on x86 for a jump table with 5 elements,
assuming the call target is in rbx:

lea global_addr(%rip), %rax # initialize temporary rax with base address
mov %rbx, %rcx              # initialize another temporary rcx for index (rbx will be used for the call, so it is still live)
sub %rax, %rcx              # compute `address - base`
ror $0x3, %rcx              # compute `(address - base) ror 3` i.e. index
cmp $0x4, %rcx              # check index <= 4
ja .Ltrap
[...]
.Ltrap:
ud1

A more efficient instruction sequence, that only needs one temporary
register and one fewer instruction, is possible by subtracting the
address we are testing from the fixed address instead of vice versa:

lea (global_addr + 4*8)(%rip), %rax # initialize temporary rax with address of last element
sub %rbx, %rax                      # compute `last element - address`
ror $0x3, %rax                      # compute `(last element - address) ror 3` i.e. 4 - index
cmp $0x4, %rax                      # check 4 - index <= 4 (same as above)
ja .Ltrap
[...]
.Ltrap:
ud1

Change LowerTypeTests to generate that sequence. As a consequence, the
order of bits in the bitsets is reversed. Because it doesn't matter how we
do the subtraction on other architectures (to the best of my knowledge),
do so unconditionally.

Reviewers: fmayer, vitalybuka

Reviewed By: fmayer

Pull Request: https://github.com/llvm/llvm-project/pull/142887
2025-06-06 12:43:24 -07:00
vporpo
47d9473e49
[SandboxVec][BottomUpVec] Fix ownership of Legality (#143018)
Fix the ownership of `Legality` member variable of BottomUpVec. It
should get created in runOnFunction() and get destroyed when the
function returns.
2025-06-06 12:21:25 -07:00
Kazu Hirata
445974547d
[llvm] Ensure newline at the end of files (NFC) (#143061)
Without newlines at the end, git diff would display:

  No newline at end of file
2025-06-05 22:58:15 -07:00
Kazu Hirata
ad6631fb0d
[SCCP] Directly call SCCPSolver::isOverdefined (NFC) (#143059)
We don't need a lambda here.
2025-06-05 22:58:10 -07:00
Yingwei Zheng
4eac8daa38
[LoopPeel] Handle non-local instructions/arguments when updating exiting values (#142993)
Similar to
7e14161f49,
the exiting value may be a non-local instruction or an argument.

Closes https://github.com/llvm/llvm-project/issues/142895.
2025-06-06 12:56:28 +08:00
Teresa Johnson
b58b3e1d36
[MemProf] Add dot graph dumping immediately after stack node update (#143025)
To aid in debugging, (optionally) dump the dot graph immediately after
the stack update phase (which matches nodes to interior callsites) and
before we cleanup mismatched callee edges (either via tail call fixup,
indirect call fixup, or nulling otherwise).
2025-06-05 13:49:11 -07:00
Florian Hahn
01b9828a66
[VPlan] Remove unneeded friend classes from VPValue (NFC).
None of the removed classes makes use of the friendship relationship.
2025-06-05 21:40:21 +01:00
Peter Collingbourne
0a85b31a81 LowerTypeTests: Fix UAF. 2025-06-05 12:33:13 -07:00
Jon Roelofs
7b2ac8ff54
[Matrix] Pass ShapeInfo to Visit* methods (NFC). (#142487)
They all require it now.
2025-06-05 11:22:17 -07:00
Peter Collingbourne
b88e8cceb9
LowerTypeTests: Avoid zext of ptrtoint ConstantExpr.
In the LowerTypeTests pass we used to create IR like this:

  %3 = zext i8 ptrtoint (ptr @__typeid_allones7_align to i8) to i64
  %4 = lshr i64 %2, %3
  %5 = zext i8 sub (i8 64, i8 ptrtoint (ptr @__typeid_allones7_align to i8)) to i64
  %6 = shl i64 %2, %5
  %7 = or i64 %4, %6

This is because when this code was originally written there were no
funnel shifts and as I recall it was necessary to create an i8 and zext
to pointer width (instead of just having a ptrtoint of pointer width)
in order for the shl/shr/or to be pattern matched to ror. At the time
this caused no problems because there existed a zext ConstantExpr. But
after zext ConstantExpr was removed in #71040, the newly present zext
instruction can prevent pattern matching the rotate, for example if
the zext gets hoisted to a loop preheader or common ancestor of the
check. LowerTypeTests was made to use fshr in #141735 so now we can
ptrtoint to pointer width and stop creating the zext.

Reviewers: fmayer, nikic

Reviewed By: nikic

Pull Request: https://github.com/llvm/llvm-project/pull/142886
2025-06-05 11:10:35 -07:00
Peter Collingbourne
3fa231f47c
Add SimplifyTypeTests pass.
This pass figures out whether inlining has exposed a constant address to
a lowered type test, and remove the test if so and the address is known
to pass the test. Unfortunately this pass ends up needing to reverse
engineer what LowerTypeTests did; this is currently inherent to the design
of ThinLTO importing where LowerTypeTests needs to run at the start.

Reviewers: teresajohnson

Reviewed By: teresajohnson

Pull Request: https://github.com/llvm/llvm-project/pull/141327
2025-06-05 11:09:20 -07:00
Peter Collingbourne
d1b0b4bb44
Add -funique-source-file-identifier option.
This option complements -funique-source-file-names and allows the user
to use a different unique identifier than the source file path.

Reviewers: teresajohnson

Reviewed By: teresajohnson

Pull Request: https://github.com/llvm/llvm-project/pull/142901
2025-06-05 10:52:01 -07:00
Florian Hahn
eb83c43fe9
[Matrix] Don't update Changed based on Visit* return value (NFC). (#142417)
Visit* are always modifying the IR, remove the boolean result.

Depends on https://github.com/llvm/llvm-project/pull/142416.

PR: https://github.com/llvm/llvm-project/pull/142417
2025-06-05 17:54:06 +01:00
Vasileios Porpodas
79861d2db7 Reapply "[SandboxVec] Add a simple pack reuse pass (#141848)"
This reverts commit 31abf0774232735ad7a7d45e531497305bf99fae.
2025-06-05 09:14:17 -07:00
Snehasish Kumar
16c7b3c9f5
[MemProf] Split MemProfiler into Instrumentation and Use. (#142811)
Most of the recent development on the MemProfiler has been on the Use part. The instrumentation has been quite stable for a while. As the complexity of the use grows (with undrifting, diagnostics etc) I figured it would be good to separate these two implementations.
2025-06-05 07:36:50 -07:00
Ryotaro Kasuga
1e5f7f64b0
[LoopInterchange] Handle confused dependence correctly (#140709)
This patch fixes the handling of a confused `Dependence` object. Such an
object doesn’t contain any information about dependencies, so we must
process it conservatively. However, it was converted into a direction
vector like `[I I ... I]`. As a result, it was treated as if there are
no loop-carried dependencies, which can lead to illegal loop exchanges.

Fixes #140238
2025-06-05 16:44:02 +09:00
Florian Hahn
2e337349f4
[VPlan] Remove unnecessary DomTreeUpdater flush (NFC).
The current version does not need the explicit flush at this point.
2025-06-05 08:17:42 +01:00
Vasileios Porpodas
31abf07742 Revert "[SandboxVec] Add a simple pack reuse pass (#141848)"
This reverts commit 1268352656f81ea173860a8002aadb88844137e7.
2025-06-04 14:24:49 -07:00
vporpo
1268352656
[SandboxVec] Add a simple pack reuse pass (#141848)
This patch implements a simple pass that tries to de-duplicate packs. If
there are two packing patterns inserting the exact same values in the
exact same order, then we will keep the top-most one of them. Even
though such patterns may be optimized away by subsequent passes it is
still useful to do this within the vectorizer because otherwise the cost
estimation may be off, making the vectorizer over conservative.
2025-06-04 14:12:06 -07:00
Teresa Johnson
3ec2de2753
[MemProf] Optionally save context size info on largest cold allocations (#142837)
Reapply PR142507 with fix for test: add in the same x86_64-linux
requirement as other tests as the stack ids are currently computed
differently on big endian systems. This will be investigated separately.

In order to allow selective reporting of context hinting during the LTO
link, and in the future to allow selective more aggressive cloning, add
an option to specify a minimum percent of the max cold size in the
profile summary. Contexts that meet that threshold will get context size
info metadata (and ThinLTO summary information) on the associated
allocations.

Specifying -memprof-report-hinted-sizes during the pre-LTO compile step
will continue to cause all contexts to receive this metadata. But
specifying -memprof-report-hinted-sizes only during the LTO link will
cause only those that meet the new threshold and have the metadata to
get reported.

To support this, because the alloc info summary and associated bitcode
requires the context size information to be in the same order as the
other context information, 0s are inserted for contexts without this
metadata. The bitcode writer uses a more compact format for the context
ids to allow better compression of the 0s.

As part of this change several helper methods are added to query whether
metadata contains context size info on any or all contexts.
2025-06-04 13:08:56 -07:00
PiJoules
f32f048719
[llvm] Use ABI instead of preferred alignment for const prop checks (#142500)
We'd hit an assertion checking proper alignment for an i8 when building
chromium because we used the prefered alignment (which is 4 bytes)
instead of the ABI alignment (which is 1 byte). The ABI alignment should
be used because that's the actual alignment needed to load a constant
from the vtable.

This also updates the two `virtual-const-prop-small-alignment-*` to
explicitly give ABI alignments for i64s.
2025-06-04 10:57:59 -07:00
John Brawn
81d3189891
[LAA] Keep pointer checks on partial analysis (#139719)
Currently if there's any memory access that AccessAnalysis couldn't
analyze then all of the runtime pointer check results are discarded.
This patch makes this able to be controlled with the AllowPartial
option, which makes it so we generate the runtime check information
for those pointers that we could analyze, as transformations may still
be able to make use of the partial information.

Of the transformations that use LoopAccessAnalysis, only
LoopVersioningLICM changes behaviour as a result of this change. This is
because the others either:
* Check canVectorizeMemory, which will return false when we have partial
pointer information as analyzeLoop() will return false.
* Examine the dependencies returned by getDepChecker(), which will be
empty as we exit analyzeLoop if we have partial pointer information
before calling areDepsSafe(), which is what fills in the dependency
information.
2025-06-04 16:47:20 +01:00
Yingwei Zheng
519cb460f6
[SCCP] Remove masking operations (#142736)
CVP version:
2d5820cd72
Compile-time impact:
https://llvm-compile-time-tracker.com/compare.php?from=3ec0c5c7fef03985b43432c6b914c289d8a5435e&to=92b4df90695dd37defdabf8a30f0b0322b648a00&stat=instructions:u
2025-06-04 22:31:08 +08:00
clubby789
8ed3cb0e64
[DSE] Fix uninitialized variable (#142768)
Introduced by accident in #138299
(https://lab.llvm.org/buildbot/#/builders/164/builds/10604)
2025-06-04 15:00:28 +02:00
Yingwei Zheng
5e2dcfe42c
[InstCombine] Avoid infinite loop in foldSelectValueEquivalence (#142754)
Before this patch, InstCombine hung because it replaced a value with a
more complex one:
```
%sel = select i1 %cmp, i32 %smax, i32 0 ->
%sel = select i1 %cmp, i32 %masked, i32 0 ->
%sel = select i1 %cmp, i32 %smax, i32 0 ->
...
```
This patch makes this replacement more conservative. It only performs
the replacement iff the new value is one of the operands of the original
value.

Closes https://github.com/llvm/llvm-project/issues/142405.
2025-06-04 19:42:56 +08:00
Yingwei Zheng
e2c698c7e8
[InstCombine] Fix miscompilation in sinkNotIntoLogicalOp (#142727)
Consider the following case:
```
define i1 @src(i8 %x) {
  %cmp = icmp slt i8 %x, -1
  %not1 = xor i1 %cmp, true
  %or = or i1 %cmp, %not1
  %not2 = xor i1 %or, true
  ret i1 %not2
}
```
`sinkNotIntoLogicalOp(%or)` calls `freelyInvert(%cmp,
/*IgnoredUser=*/%or)` first. However, as `%cmp` is also used by `Op1 =
%not1`, the RHS of `%or` is set to `%cmp.not = xor i1 %cmp, true`. Thus
`Op1` is out of date in the second call to `freelyInvert`. Similarly,
the second call may change `Op0`. According to the analysis above, I
decided to avoid this fold when one of the operands is also a user of
the other.

Closes https://github.com/llvm/llvm-project/issues/142518.
2025-06-04 17:48:01 +08:00
Florian Hahn
370d01765c
[Matrix] Use shape info for StoreInst directly. (#142664)
ShapeInfo for the store operand may be dropped, e.g. because the operand
got folded by transpose optimizations to another instruction w/o shape
info. This was exposed by the assertion added in
https://github.com/llvm/llvm-project/pull/142416.

This updates VisitStore to use the shape-info directly from the
instruction, which is in line with the other Visit* functions and
ensures that we won't lose shape info.

PR: https://github.com/llvm/llvm-project/pull/142664
2025-06-04 09:15:57 +01:00
clubby789
c7c79d2590
[IR][DSE] Support non-malloc functions in malloc+memset->calloc fold (#138299)
Add a `alloc-variant-zeroed` function attribute which can be used to
inform folding allocation+memset. This addresses
https://github.com/rust-lang/rust/issues/104847, where LLVM does not
know how to perform this transformation for non-C languages.

Co-authored-by: Jamie <jamie@osec.io>
2025-06-04 09:35:20 +02:00
Nikita Popov
9a0197c3a4
[EarlyCSE] Check attributes for commutative intrinsics (#142610)
Commutative intrinsics go through a separate code path, which did not
check for attribute compatibility, resulting in a later assertion
failure.

Fixes https://github.com/llvm/llvm-project/issues/142462.
2025-06-04 09:04:27 +02:00
Yingwei Zheng
7e1fa09ce2
[SimplifyCFG] Bail out on vector GEPs in passingValueIsAlwaysUndefined (#142526)
Closes https://github.com/llvm/llvm-project/issues/142522.
2025-06-04 12:37:30 +08:00
Vitaly Buka
3531cc1cc7
[PromoteMem2Reg] Optimize memory usage in PromoteMem2Reg (#142474)
When BasicBlock has a large number of allocas, and
successors, we had to copy entire IncomingVals and
IncomingLocs vectors for successors.

Also updates to IncomingVals and
IncomingLocs are infrequent (only Load/Store into
alloca affect arrays).

Given the nature of DFS traversal, instead of copying
the entire vector, we can keep track of the changes
and undo all changes done by successors.

Fixes #142461

On the attached to issue #142461 IR RSS drops from 35Gb to 1.8Gb.

But it does not affect compile time on average

https://llvm-compile-time-tracker.com/compare.php?from=2e98ed8caa0b47ee79af4ad24b5436a89fe49dfa&to=effac6d1fd600e544f8bc21382c7e541973b1378&stat=instructions:u
2025-06-03 19:34:36 -07:00
Teresa Johnson
6c1091ea3f
Revert "[MemProf] Optionally save context size info on largest cold allocations" (#142688)
Reverts llvm/llvm-project#142507 due to buildbot failures that I will
look into tomorrow.
2025-06-03 16:05:16 -07:00
Vitaly Buka
ac4893dd77
[NFC][PromoteMem2Reg] Move IncomingVals, IncomingLocs, Worklist into class (#142468)
They are all DFS state related, as `Visited`. But `Visited` is already a
class member, so we make things more consistent and less
parameters to pass around.

By itself, the patch has little value, but it simplifies stuff in the
#142474.

For #142461
2025-06-03 14:47:45 -07:00
Teresa Johnson
f2adae5780
[MemProf] Optionally save context size info on largest cold allocations (#142507)
In order to allow selective reporting of context hinting during the LTO
link, and in the future to allow selective more aggressive cloning, add
an option to specify a minimum percent of the max cold size in the
profile summary. Contexts that meet that threshold will get context size
info metadata (and ThinLTO summary information) on the associated
allocations.

Specifying -memprof-report-hinted-sizes during the pre-LTO compile step
will continue to cause all contexts to receive this metadata. But
specifying -memprof-report-hinted-sizes only during the LTO link will
cause only those that meet the new threshold and have the metadata to
get reported.

To support this, because the alloc info summary and associated bitcode
requires the context size information to be in the same order as the
other context information, 0s are inserted for contexts without this
metadata. The bitcode writer uses a more compact format for the context
ids to allow better compression of the 0s.

As part of this change several helper methods are added to query whether
metadata contains context size info on any or all contexts.
2025-06-03 14:20:38 -07:00
Craig Topper
96336b2330
[AggressiveInstCombine] Improve popcount matching if the input has known zero bits (#142501)
If the input has known zero bits, InstCombine may have simplied one
of the expected And masks. Teach AggressiveInstCombine to use
MaskedValueIsZero to recover these missing bits.
    
Fixes #142042.
2025-06-03 12:56:53 -07:00
Vitaly Buka
3cb967a2cd
[NFCI][PromoteMem2Reg] Don't handle the first successor out of order (#142464)
Just for consistency, to avoid confusing conditions.

`reverse` helps to avoid tests updates as nothing is
changing for for successors count <=2.

For #142461
2025-06-03 10:26:55 -07:00
Vitaly Buka
b9dec5aa79
[NFC] Remove goto in PromoteMem2Reg::RenamePass (#142454)
'goto' is essentially a shortcut for push/pop for worklist.
It can be expensive if we copy vectors, but if we move them, it
should not be an issue.

Without 'goto' it's easier to reason about the
code, when `PromoteMem2Reg::RenamePass` processes
exactly one edge at a time.

There is out of order processing of the first
successor, I keep it just to make this patch pure NFC. I'll
remove this in follow up patches.

For #142461
2025-06-03 09:14:04 -07:00
Ramkumar Ramachandra
b40e4ceaa6
[ValueTracking] Make Depth last default arg (NFC) (#142384)
Having a finite Depth (or recursion limit) for computeKnownBits is very
limiting, but is currently a load-bearing necessity, as all KnownBits
are recomputed on each call and there is no caching. As a prerequisite
for an effort to remove the recursion limit altogether, either using a
clever caching technique, or writing a easily-invalidable KnownBits
analysis, make the Depth argument in APIs in ValueTracking uniformly the
last argument with a default value. This would aid in removing the
argument when the time comes, as many callers that currently pass 0
explicitly are now updated to omit the argument altogether.
2025-06-03 17:12:24 +01:00
Ramkumar Ramachandra
6716d4eaa8
[LV] Prefer DenseMap::lookup over find (NFC) (#141809)
Apart from the stylistic improvement, lookup has the nice property of
returning a default-constructed object on failure-to-find, while find
returns the end iterator, which cannot be dereferenced.
2025-06-03 14:37:19 +01:00
Florian Hahn
5520ab3d50
[VPlan] Add ComputeAnyOfResult VPInstruction (NFC) (#141932)
Add a dedicated opcode for any-of reduction, similar to
https://github.com/llvm/llvm-project/pull/132689 and
https://github.com/llvm/llvm-project/pull/132690.

The patch also explictly adds the start value to not require
RecurrenceDescriptor during execute. It also allows freezing the start
value to make it poison-safe.

PR: https://github.com/llvm/llvm-project/pull/141932
2025-06-03 14:33:53 +01:00
Luke Lau
ddfeecf4c5
[VPlan] Convert to concrete recipes before dissolving loop regions. NFCI (#141999)
After updating #118638 on tip of tree, expanding
VPWidenIntOrFpInductionRecipes fails because it needs the loop region to
get the latch to insert the increment into:

VPBasicBlock *ExitingBB =
Plan->getVectorLoopRegion()->getExitingBasicBlock();
Builder.setInsertPoint(ExitingBB,
ExitingBB->getTerminator()->getIterator());
    auto *Next = Builder.createNaryOp(AddOp, {Prev, Inc}, Flags,
WidenIVR->getDebugLoc(), "vec.ind.next");

However after #117506, the region is dissolved so it doesn't work.

This shuffles the dissolveLoopRegions steps to be after
convertToConcreteRecipes so we can use the region when expanding
VPWidenIntOrFpInductionRecipes
2025-06-03 12:05:13 +01:00
Weibo He
038dc2c63b
[CoroSplit] Always erase lifetime intrinsics for spilled allocas (#142551)
If the control flow between `lifetime.start` and `lifetime.end` is too
complex, it is acceptable to give up the optimization opportunity and
collect the alloca to the frame. However, storing to the frame will
lengthen the lifetime of the alloca, and the sanitizer will complain. I
propose we always erase lifetime intrinsics of spilled allocas.

Fix #124612

---------

Co-authored-by: Chuanqi Xu <yedeng.yd@linux.alibaba.com>
2025-06-03 18:52:41 +08:00
Florian Hahn
2eab83f618
[VPlan] Remove CanonicalIV when dissolving loop regions (NFC). (#142372)
Directly replace the canonical IV when we dissolve the containing
region. That ensures that it won't get removed before the region gets
removed, which would result in an invalid region.

This removes the current ordering constraint between
convertToConcreteRecipes and dissolving regions.

PR: https://github.com/llvm/llvm-project/pull/142372
2025-06-03 10:05:28 +01:00
Kazu Hirata
54d836a080
[llvm] Use *Set::insert_range (NFC) (#138237) 2025-06-02 19:48:13 -07:00
Usama Hameed
cc400d4417
[HWASan][bugfix] Fix kernel check in ShadowMapping::init (#142226)
The function currently checks for the command line argument only to
check if compiling for kernel. This is incorrect as the setting can also
be passed programatically.
2025-06-02 10:39:15 -07:00
Florian Hahn
adba40e188
[Matrix] Assert that there's shapeinfo in Visit* (NFC). (#142416)
We should only call Visit* for instructions with shape info. Turn early
exit into assert.

PR: https://github.com/llvm/llvm-project/pull/142416
2025-06-02 18:18:42 +01:00
Florian Hahn
11713e86b0
[LV] Move VPlan-based calculateRegisterUsage to VPlanAnalysis (NFC). (#135673)
Move VPlan-based calculateRegisterUsage from LoopVectorize
to VPlanAnalysis.cpp. It is a VPlan-based analysis and this helps
to reduce the size of LoopVectorize.

PR: https://github.com/llvm/llvm-project/pull/135673
2025-06-02 17:40:50 +01:00
Kazu Hirata
c261bb7649
[memprof] Deduplicate alloc site matches (#142334)
With:

  commit 2425626d803002027cbf71c39df80cb7b56db0fb
  Author: Kazu Hirata <kazu@google.com>
  Date:   Sun Jun 1 08:09:58 2025 -0700

we print out a lot of duplicate alloc site matches.

This patch partially reverts the patch above.  The core idea of using
a map to deduplicate entries remains the same, but details are
different.  Specifically:

- This PR uses the [FullStackID, MatchLength] as the key, where
  MatchLength is the length of an alloc site match.

- AllocMatchInfo in this PR no longer has Matched because we always
  report matches.

- AllocMatchInfo in this PR no longer has NumFramesMatched because it
  has become part of the key.

This deduplication roughly halves the amount of messages printed out.
2025-06-02 07:59:34 -07:00