638 Commits

Author SHA1 Message Date
Shoreshen
00ee53cc7b
[Attributor] Propagate alignment through ptrmask (#150158)
Propagate alignment through ptrmask based on potential constant values
of mask and align of ptr.
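
As a rough illustration of the idea (a standalone sketch, not the Attributor code; `alignFromMask` and `ptrmaskAlign` are hypothetical names), the alignment of a `ptrmask` result can be bounded by the trailing zero bits of a constant mask and by the alignment already known for the pointer:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Largest power of two that divides every address surviving the mask:
// 2^(number of trailing zero bits in Mask).
inline std::uint64_t alignFromMask(std::uint64_t Mask) {
  if (Mask == 0)
    return std::uint64_t(1) << 32; // hypothetical cap; the result is null
  std::uint64_t Align = 1;
  while ((Mask & 1) == 0) {
    Align <<= 1;
    Mask >>= 1;
  }
  return Align;
}

// Clearing bits cannot lower the alignment already known for the pointer,
// and the mask may raise it, so the larger of the two bounds holds.
inline std::uint64_t ptrmaskAlign(std::uint64_t PtrAlign, std::uint64_t Mask) {
  assert(PtrAlign != 0 && (PtrAlign & (PtrAlign - 1)) == 0 &&
         "alignment must be a power of two");
  return std::max(PtrAlign, alignFromMask(Mask));
}
```

With several potential constant masks, the sketch would presumably take the minimum of `alignFromMask` over all of them before combining it with the pointer alignment.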

---------

Co-authored-by: Shilei Tian <i@tianshilei.me>
2025-11-04 12:26:17 +08:00
Kazu Hirata
707bab651f
[llvm] Remove redundant typename (NFC) (#166087)
Identified with readability-redundant-typename.
2025-11-02 13:15:16 -08:00
Kazu Hirata
042ac912b1
[llvm] Add "override" where appropriate (NFC) (#165168)
Note that "override" makes "virtual" redundant.

Identified with modernize-use-override.
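
A minimal sketch of the pattern, using hypothetical `Pass`, `OldStyle`, and `NewStyle` types rather than anything from the patch:

```cpp
#include <cassert>
#include <string>

struct Pass {
  virtual ~Pass() = default;
  virtual std::string name() const { return "pass"; }
};

struct OldStyle : Pass {
  // "virtual" is repeated, and nothing checks that this overrides anything.
  virtual std::string name() const { return "old"; }
};

struct NewStyle : Pass {
  // "override" implies virtual and makes the compiler verify the signature.
  std::string name() const override { return "new"; }
};

// Dispatches through the base class to show both forms behave the same.
inline std::string nameOf(const Pass &P) { return P.name(); }
```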
2025-10-26 13:34:32 -07:00
Kazu Hirata
ae78957112
[Support] Rename CTLog2 to ConstantLog2 in MathExtras.h (#158006)
This patch renames CTLog2 to ConstantLog2 for readability.

This patch provides a forwarder under LLVM_DEPRECATED because CTLog2
is used downstream.
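
The rename-with-forwarder pattern might look roughly like the sketch below. It uses the standard `[[deprecated]]` attribute in place of the `LLVM_DEPRECATED` macro so that it compiles on its own, and the function bodies are assumptions rather than the MathExtras.h code:

```cpp
#include <cassert>
#include <cstddef>

// Compile-time log2 of a power of two (illustrative body only).
template <std::size_t Value> constexpr std::size_t ConstantLog2() {
  static_assert(Value > 0 && (Value & (Value - 1)) == 0,
                "Value must be a power of two");
  std::size_t Log = 0;
  for (std::size_t V = Value; V > 1; V >>= 1)
    ++Log;
  return Log;
}

// Forwarder kept so that downstream users of the old name still build,
// but get a deprecation warning pointing at the new name.
template <std::size_t Value>
[[deprecated("use ConstantLog2 instead")]] constexpr std::size_t CTLog2() {
  return ConstantLog2<Value>();
}

static_assert(ConstantLog2<1>() == 0, "log2(1) == 0");
static_assert(ConstantLog2<64>() == 6, "log2(64) == 6");
```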
2025-09-11 07:54:27 -07:00
Philip Reames
e6b4a21849
[IR] Add utilities for manipulating length of MemIntrinsic [nfc] (#153856)
Goal is simply to reduce direct usage of getLength and setLength so that
if we end up moving memset.pattern (whose length is in elements) there
are fewer places to audit.
2025-08-20 13:50:11 -07:00
Kazu Hirata
228e96b28a
[llvm] Use std::make_optional (NFC) (#151627)
std::make_optional<T> is a lot like std::make_unique<T> in that it
performs perfect forwarding of arguments for T's constructor.  As a
result, we don't have to repeat type names twice.
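
A small before/after sketch, with a hypothetical `Range` type standing in for whatever `T` a call site uses:

```cpp
#include <cassert>
#include <optional>

struct Range {
  int Lo, Hi;
  Range(int L, int H) : Lo(L), Hi(H) {}
};

// Before: the type name appears on both sides of the initialization.
inline std::optional<Range> beforeStyle(int L, int H) {
  std::optional<Range> R = std::optional<Range>(Range(L, H));
  return R;
}

// After: the arguments are forwarded to Range's constructor.
inline std::optional<Range> afterStyle(int L, int H) {
  auto R = std::make_optional<Range>(L, H);
  return R;
}
```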
2025-08-01 00:24:40 -07:00
Jeremy Morse
57a5f9c47e
[DebugInfo][RemoveDIs] Suppress getNextNonDebugInfoInstruction (#144383)
There are no longer debug-info instructions, thus we don't need this
skipping. Hooray!
2025-07-15 15:34:10 +01:00
Shoreshen
181b014c06
Attributor: Infer noalias.addrspace metadata for memory instructions (#136553)
Add noalias.addrspace metadata for store, load and atomic instructions in
the AMDGPU backend.
2025-07-08 09:50:31 +08:00
Andreas Jonson
0a067dc107
[Attributor] Swap range metadata to attribute for calls. (#108835) 2025-07-05 16:47:03 +02:00
zGoldthorpe
f393211454
[Reland][IPO] Added attributor for identifying invariant loads (#146584)
Patched and tested the `AAInvariantLoadPointer` attributor from #141800,
which identifies pointers whose loads are eligible to be marked as
`!invariant.load`.

The bug in the attributor was due to `AAMemoryBehavior` always
identifying pointers obtained from `alloca`s as having no writes. I'm
not entirely sure why `AAMemoryBehavior` behaves this way, but it seems
to be because it identifies the scope of an `alloca` to be limited to
only that instruction (and, certainly, no memory writes occur within the
`alloca` instruction). This patch just adds a check to disallow all loads
from `alloca` pointers from being marked `!invariant.load` (since any
well-defined program will have to write to stack pointers at some
point).
2025-07-01 17:46:19 -04:00
zGoldthorpe
00ae89a1cb
Revert "[IPO] Added attributor for identifying invariant loads" (#144808)
Reverts llvm/llvm-project#141800

The implementation critically misunderstands the `AAMemoryBehavior`
attributor, which it relies on heavily.

@shiltian, since I do not have commit permissions.
2025-06-18 18:35:01 -04:00
zGoldthorpe
25dcd231bf
[IPO] Added attributor for identifying invariant loads (#141800)
The attributor conservatively marks pointers whose loads are eligible to
be marked as `!invariant.load`.
It does so by identifying:
1. Pointers marked `noalias` and `readonly`
2. Pointers whose underlying objects are all eligible for invariant
loads.

The attributor then manifests this attribute at non-atomic non-volatile
load instructions.
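
The two rules and the manifest condition can be sketched on a toy model. `PtrModel`, `LoadModel`, and both functions below are hypothetical and much simpler than the real attributor; the sketch also assumes the underlying-object graph is acyclic:

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified model of a pointer; not an LLVM type.
struct PtrModel {
  bool NoAlias = false;
  bool ReadOnly = false;
  std::vector<const PtrModel *> UnderlyingObjects;
};

// Rule 1: noalias and readonly. Rule 2: every underlying object is eligible.
inline bool eligibleForInvariantLoad(const PtrModel &P) {
  if (P.NoAlias && P.ReadOnly)
    return true;
  if (P.UnderlyingObjects.empty())
    return false; // nothing known, so stay conservative
  for (const PtrModel *Obj : P.UnderlyingObjects)
    if (!eligibleForInvariantLoad(*Obj))
      return false;
  return true;
}

// Hypothetical model of a load instruction.
struct LoadModel {
  bool Atomic;
  bool Volatile;
  const PtrModel *Ptr;
};

// Only non-atomic, non-volatile loads from eligible pointers are marked.
inline bool canMarkInvariant(const LoadModel &L) {
  assert(L.Ptr && "load must have a pointer operand");
  return !L.Atomic && !L.Volatile && eligibleForInvariantLoad(*L.Ptr);
}
```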
2025-06-16 11:16:47 -05:00
Shilei Tian
f32b75658f
[Attributor] Use known non-flat AS before getAssumedAddrSpace (#143221)
If the underlying object already has a non-flat address space, we simply
use that before calling `getAssumedAddrSpace`.

Partially fixes SWDEV-536263.
2025-06-09 10:11:34 -04:00
Kazu Hirata
54d836a080
[llvm] Use *Set::insert_range (NFC) (#138237) 2025-06-02 19:48:13 -07:00
Shilei Tian
4d48673562 Reapply "Reapply "[AMDGPU] Make getAssumedAddrSpace return AS1 for pointer kernel arguments (#137488)""
This reverts commit 37ea3b32cdcb6c0dcecbcc4bf844f5190c7378dd.
2025-05-30 22:11:22 -04:00
Shilei Tian
37ea3b32cd Revert "Reapply "[AMDGPU] Make getAssumedAddrSpace return AS1 for pointer kernel arguments (#137488)""
This reverts commit 4efc13f8ff1eaf4f9fb1fcea8d4552b3eca052ca.
2025-05-30 22:06:16 -04:00
Shilei Tian
4efc13f8ff Reapply "[AMDGPU] Make getAssumedAddrSpace return AS1 for pointer kernel arguments (#137488)"
This reverts commit 3c6211c183885afb5d89259a53c4f4f46a6bf399.
2025-05-30 21:56:24 -04:00
Shilei Tian
3c6211c183 Revert "[AMDGPU] Make getAssumedAddrSpace return AS1 for pointer kernel arguments (#137488)"
This reverts commit 9bf6b2a8cb0467b62173659306e43a0346f063a2.
2025-05-30 21:15:25 -04:00
Shilei Tian
9bf6b2a8cb
[AMDGPU] Make getAssumedAddrSpace return AS1 for pointer kernel arguments (#137488) 2025-05-30 17:30:42 -04:00
Alex MacLean
3a84a4e55d
Reland "[NVPTX] Unify and extend barrier{.cta} intrinsic support" (#141143)
Note: This relands #140615 adding a ".count" suffix to the non-".all"
variants.

Our current intrinsic support for barrier intrinsics is confusing and
incomplete, with multiple intrinsics mapping to the same instruction and
intrinsic names not clearly conveying intrinsic semantics. Further, we
lack support for some variants. This change unifies the IR
representation to a single consistently named set of intrinsics.

- llvm.nvvm.barrier.cta.sync.aligned.all(i32)
- llvm.nvvm.barrier.cta.sync.aligned.count(i32, i32)
- llvm.nvvm.barrier.cta.arrive.aligned.count(i32, i32)
- llvm.nvvm.barrier.cta.sync.all(i32)
- llvm.nvvm.barrier.cta.sync.count(i32, i32)
- llvm.nvvm.barrier.cta.arrive.count(i32, i32)

The following Auto-Upgrade rules are used to maintain compatibility with
IR using the legacy intrinsics:

* llvm.nvvm.barrier0 --> llvm.nvvm.barrier.cta.sync.aligned.all(0)
* llvm.nvvm.barrier.n --> llvm.nvvm.barrier.cta.sync.aligned.all(x)
* llvm.nvvm.bar.sync --> llvm.nvvm.barrier.cta.sync.aligned.all(x)
* llvm.nvvm.barrier --> llvm.nvvm.barrier.cta.sync.aligned.count(x, y)
* llvm.nvvm.barrier.sync --> llvm.nvvm.barrier.cta.sync.all(x)
* llvm.nvvm.barrier.sync.cnt --> llvm.nvvm.barrier.cta.sync.count(x, y)
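
For illustration only, the name mapping in these rules can be written as a lookup table. This sketch covers just the renaming; the real auto-upgrade also rewrites operands (for example the constant 0 passed for `llvm.nvvm.barrier0`), and `upgradeBarrierName` is a hypothetical helper:

```cpp
#include <cassert>
#include <map>
#include <string>

// Legacy intrinsic name -> unified intrinsic name, copied from the rules above.
inline const std::map<std::string, std::string> &barrierUpgrades() {
  static const std::map<std::string, std::string> Table = {
      {"llvm.nvvm.barrier0", "llvm.nvvm.barrier.cta.sync.aligned.all"},
      {"llvm.nvvm.barrier.n", "llvm.nvvm.barrier.cta.sync.aligned.all"},
      {"llvm.nvvm.bar.sync", "llvm.nvvm.barrier.cta.sync.aligned.all"},
      {"llvm.nvvm.barrier", "llvm.nvvm.barrier.cta.sync.aligned.count"},
      {"llvm.nvvm.barrier.sync", "llvm.nvvm.barrier.cta.sync.all"},
      {"llvm.nvvm.barrier.sync.cnt", "llvm.nvvm.barrier.cta.sync.count"},
  };
  return Table;
}

// Returns the unified name, or the input unchanged if it is not a legacy name.
inline std::string upgradeBarrierName(const std::string &Name) {
  auto It = barrierUpgrades().find(Name);
  return It == barrierUpgrades().end() ? Name : It->second;
}
```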
2025-05-22 19:38:10 -07:00
Alex Maclean
e72d8b2553 Revert "[NVPTX] Unify and extend barrier{.cta} intrinsic support (#140615)"
This reverts commit 735209c0688b10a66c24750422b35d8c2ad01bb5.
2025-05-22 17:28:43 +00:00
Alex MacLean
735209c068
[NVPTX] Unify and extend barrier{.cta} intrinsic support (#140615)
Our current intrinsic support for barrier intrinsics is confusing and
incomplete, with multiple intrinsics mapping to the same instruction and
intrinsic names not clearly conveying intrinsic semantics. Further, we
lack support for some variants. This change unifies the IR
representation to a single consistently named set of intrinsics.

- llvm.nvvm.barrier.cta.sync.aligned.all(i32)
- llvm.nvvm.barrier.cta.sync.aligned(i32, i32)
- llvm.nvvm.barrier.cta.arrive.aligned(i32, i32)
- llvm.nvvm.barrier.cta.sync.all(i32)
- llvm.nvvm.barrier.cta.sync(i32, i32)
- llvm.nvvm.barrier.cta.arrive(i32, i32)

The following Auto-Upgrade rules are used to maintain compatibility with
IR using the legacy intrinsics:

* llvm.nvvm.barrier0 --> llvm.nvvm.barrier.cta.sync.aligned.all(0)
* llvm.nvvm.barrier.n --> llvm.nvvm.barrier.cta.sync.aligned.all(x)
* llvm.nvvm.bar.sync --> llvm.nvvm.barrier.cta.sync.aligned.all(x)
* llvm.nvvm.barrier --> llvm.nvvm.barrier.cta.sync.aligned(x, y)
* llvm.nvvm.barrier.sync --> llvm.nvvm.barrier.cta.sync.all(x)
* llvm.nvvm.barrier.sync.cnt --> llvm.nvvm.barrier.cta.sync(x, y)
2025-05-21 08:14:15 -07:00
Kazu Hirata
1ecba5bd62
[llvm] Use std::tie to implement operator< (NFC) (#139487) 2025-05-11 21:28:47 -07:00
Kazu Hirata
2e230f5685
[llvm] Use llvm::interleaved (NFC) (#137496) 2025-04-26 23:28:46 -07:00
Matt Arsenault
37b135cc8f
Attributor: Don't rely on use_empty for constants (#137218)
This allows inferring noalias on a null argument parameter. This
avoids a non-NFC diff in a future change.
2025-04-24 21:41:55 +02:00
Nikita Popov
d69ee885cc
[CaptureTracking] Remove dereferenceable_or_null special case (#135613)
Remove the special case where comparing a dereferenceable_or_null
pointer with null results in captures(none) instead of
captures(address_is_null).

This special case is not entirely correct. Let's say we have an
allocated object of size 2 at address 1 and have a pointer `%p` pointing
either to address 1 or 2. Then passing `gep p, -1` to a
`dereferenceable_or_null(1)` function is well-defined, and allows us to
distinguish between the two possible pointers, capturing information
about the address.
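
The scenario can be replayed with plain integers standing in for addresses (a toy model; `calleeSeesNull` and `callerPasses` are hypothetical names):

```cpp
#include <cassert>
#include <cstdint>

// The callee only compares its argument against null (address 0).
inline bool calleeSeesNull(std::uint64_t ArgAddr) { return ArgAddr == 0; }

// The caller passes `gep p, -1`, where p is address 1 or 2 inside the
// two-byte object that starts at address 1.
inline bool callerPasses(std::uint64_t P) {
  assert(P == 1 || P == 2);
  return calleeSeesNull(P - 1);
}
```

Both calls are well-defined for a `dereferenceable_or_null(1)` parameter: `P == 1` passes null, `P == 2` passes address 1, which is dereferenceable. The null check therefore reveals which of the two addresses `P` held.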

Now that we ignore address captures in alias analysis, I think we're
ready to drop this special case. Additionally, if there are regressions
in other places, the fact that this is inferred as address_is_null
should allow us to easily address them if necessary.
2025-04-17 12:44:57 +02:00
Matt Arsenault
34e8f00066
Attributor: Propagate align to cmpxchg instructions (#134838)
Fixes #134480
2025-04-08 22:15:50 +07:00
Matt Arsenault
66f0343609
Attributor: Propagate align to atomicrmw instructions (#134837)
Partially fixes #134480
2025-04-08 22:12:20 +07:00
Matt Arsenault
783201b184
Attributor: Don't follow uses of ConstantData (#134573)
These should not really have uselists, and it's not worth the compile
time of looking at all uses of trivial constants. The main observable
change of this is it no longer adds align attributes on constant null
uses, but those are not useful. Some of these cases should potentially
be more aggressive and not look at any Constant users.
2025-04-07 23:59:53 +07:00
Tim Gymnich
049f179606
[Analysis][NFC] Extract KnownFPClass (#133457)
- extract KnownFPClass for future use inside of GISelKnownBits

---------

Co-authored-by: Matt Arsenault <arsenm2@gmail.com>
2025-03-28 18:10:02 +01:00
Kazu Hirata
0dcc201ac4
[Transforms] Use *Set::insert_range (NFC) (#132056)
DenseSet, SmallPtrSet, SmallSet, SetVector, and StringSet recently
gained C++23-style insert_range.  This patch replaces:

  Dest.insert(Src.begin(), Src.end());

with:

  Dest.insert_range(Src);

This patch does not touch custom begin like succ_begin for now.
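
The shape of the API can be sketched with a hypothetical `SimpleSet` wrapper around `std::set`. This is not how the LLVM containers implement it; it only shows that `insert_range` forwards to the iterator-pair overload:

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <set>
#include <vector>

// Hypothetical stand-in for an LLVM set type.
template <typename T> class SimpleSet {
  std::set<T> Storage;

public:
  // Old call style: explicit iterator pair.
  template <typename It> void insert(It First, It Last) {
    Storage.insert(First, Last);
  }

  // New call style: pass the whole range.
  template <typename Range> void insert_range(const Range &R) {
    insert(std::begin(R), std::end(R));
  }

  std::size_t size() const { return Storage.size(); }
  bool contains(const T &V) const { return Storage.count(V) != 0; }
};
```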
2025-03-19 15:35:01 -07:00
Kazu Hirata
8789c0083d
[Transforms] Avoid repeated hash lookups (NFC) (#131554) 2025-03-17 07:42:21 -07:00
Johannes Doerfert
9f28621fae
[Attributor][NFC] Clang format (#129163) 2025-02-27 23:59:08 -05:00
Nikita Popov
e56a6a2683
Reapply [CaptureTracking][FunctionAttrs] Add support for CaptureInfo (#125880) (#128020)
Relative to the previous attempt this includes two fixes:
 * Adjust callCapturesBefore() to not skip captures(ret: address,
    provenance) arguments, as these will not count as a capture
    at the call-site.
 * When visiting uses during stack slot optimization, don't skip
    the ModRef check for passthru captures. Calls can both modref
    and be passthru for captures.

------

This extends CaptureTracking to support inferring non-trivial
CaptureInfos. The focus of this patch is to only support FunctionAttrs,
other users of CaptureTracking will be updated in followups.

The key API changes here are:

* DetermineUseCaptureKind() now returns a UseCaptureInfo where the UseCC
component specifies what is captured at that Use and the ResultCC
component specifies what may be captured via the return value of the
User. Usually only one or the other will be used (corresponding to
previous MAY_CAPTURE or PASSTHROUGH results), but both may be set for
call captures.
* The CaptureTracking::captures() extension point is passed this
UseCaptureInfo as well and then can decide what to do with it by
returning an Action, which is one of:
  * Stop: stop traversal.
  * ContinueIgnoringReturn: continue traversal but don't follow the
    instruction return value.
  * Continue: continue traversal and follow the instruction return
    value if it has additional CaptureComponents.

For now, this patch retains the (unsound) special logic for comparison
of null with a dereferenceable pointer. I'd like to switch key code to
take advantage of address/address_is_null before dropping it.

This PR mainly intends to introduce necessary API changes and basic
inference support, there are various possible improvements marked with
TODOs.
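
How the three Action values steer a traversal can be sketched on a toy use graph. Everything here (`UseModel`, `FirstCaptureTracker`, `onUse`, `walkUses`) is hypothetical and only mirrors the description above, not the CaptureTracking interface:

```cpp
#include <cassert>
#include <vector>

enum class Action { Stop, ContinueIgnoringReturn, Continue };

// Hypothetical use record; ResultUses lists the uses of the user's result.
struct UseModel {
  bool CapturesHere;        // stands in for a non-empty UseCC
  bool MayCaptureViaResult; // stands in for a non-empty ResultCC
  std::vector<UseModel> ResultUses;
};

// Hypothetical tracker: stop at the first direct capture.
struct FirstCaptureTracker {
  bool Captured = false;
  Action onUse(const UseModel &U) {
    if (U.CapturesHere) {
      Captured = true;
      return Action::Stop;
    }
    return Action::Continue;
  }
};

// Returns false once the tracker asked to stop the traversal.
inline bool walkUses(const std::vector<UseModel> &Uses, FirstCaptureTracker &T,
                     unsigned &Visited) {
  for (const UseModel &U : Uses) {
    ++Visited;
    switch (T.onUse(U)) {
    case Action::Stop:
      return false;
    case Action::ContinueIgnoringReturn:
      break; // keep going, but do not follow the return value
    case Action::Continue:
      if (U.MayCaptureViaResult && !walkUses(U.ResultUses, T, Visited))
        return false;
      break;
    }
  }
  return true;
}
```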
2025-02-27 09:38:29 +01:00
Nico Weber
e2ba1b6ffd Revert "Reapply [CaptureTracking][FunctionAttrs] Add support for CaptureInfo (#125880)"
This reverts commit 0fab404ee874bc5b0c442d1841c7d2005c3f8729.
Seems to break LTO builds of clang on Windows, see comments on
https://github.com/llvm/llvm-project/pull/125880
2025-02-19 11:32:57 -05:00
Nikita Popov
7e3735d1a1 Reapply [CaptureTracking][FunctionAttrs] Add support for CaptureInfo (#125880)
Relative to the previous attempt, this adjusts isEscapeSource()
to not treat calls with captures(ret: address, provenance) or similar
arguments as escape sources. This addresses the miscompile reported at:
https://github.com/llvm/llvm-project/pull/125880#issuecomment-2656632577

The implementation uses a helper function on CallBase to make this
check a bit more efficient (e.g. by skipping the byval checks) as
checking attributes on all arguments is fairly expensive.

------

This extends CaptureTracking to support inferring non-trivial
CaptureInfos. The focus of this patch is to only support FunctionAttrs,
other users of CaptureTracking will be updated in followups.

The key API changes here are:

* DetermineUseCaptureKind() now returns a UseCaptureInfo where the UseCC
component specifies what is captured at that Use and the ResultCC
component specifies what may be captured via the return value of the
User. Usually only one or the other will be used (corresponding to
previous MAY_CAPTURE or PASSTHROUGH results), but both may be set for
call captures.
* The CaptureTracking::captures() extension point is passed this
UseCaptureInfo as well and then can decide what to do with it by
returning an Action, which is one of:
  * Stop: stop traversal.
  * ContinueIgnoringReturn: continue traversal but don't follow the
    instruction return value.
  * Continue: continue traversal and follow the instruction return
    value if it has additional CaptureComponents.

For now, this patch retains the (unsound) special logic for comparison
of null with a dereferenceable pointer. I'd like to switch key code to
take advantage of address/address_is_null before dropping it.

This PR mainly intends to introduce necessary API changes and basic
inference support, there are various possible improvements marked with
TODOs.
2025-02-14 12:38:04 +01:00
Nikita Popov
1e64ea9914 Revert "[CaptureTracking][FunctionAttrs] Add support for CaptureInfo (#125880)"
This reverts commit ee655ca27aad466bcc54f6eba03f7e564940ad5a.

A miscompilation has been reported at:
https://github.com/llvm/llvm-project/pull/125880#issuecomment-2656632577
2025-02-13 14:56:12 +01:00
Nikita Popov
ee655ca27a
[CaptureTracking][FunctionAttrs] Add support for CaptureInfo (#125880)
This extends CaptureTracking to support inferring non-trivial
CaptureInfos. The focus of this patch is to only support FunctionAttrs,
other users of CaptureTracking will be updated in followups.

The key API changes here are:

* DetermineUseCaptureKind() now returns a UseCaptureInfo where the UseCC
component specifies what is captured at that Use and the ResultCC
component specifies what may be captured via the return value of the
User. Usually only one or the other will be used (corresponding to
previous MAY_CAPTURE or PASSTHROUGH results), but both may be set for
call captures.
* The CaptureTracking::captures() extension point is passed this
UseCaptureInfo as well and then can decide what to do with it by
returning an Action, which is one of:
  * Stop: stop traversal.
  * ContinueIgnoringReturn: continue traversal but don't follow the
    instruction return value.
  * Continue: continue traversal and follow the instruction return
    value if it has additional CaptureComponents.

For now, this patch retains the (unsound) special logic for comparison
of null with a dereferenceable pointer. I'd like to switch key code to
take advantage of address/address_is_null before dropping it.

This PR mainly intends to introduce necessary API changes and basic
inference support, there are various possible improvements marked with
TODOs.
2025-02-13 09:36:35 +01:00
Nikita Popov
8a43d0e873 [Attributor] Check correct IRPosition in AANoCapture::isImpliedByIR()
This case is intended to check the callee argument, not the call-site.

Fixes an issue introduced in #123181.
2025-01-29 17:34:10 +01:00
Nikita Popov
29441e4f5f
[IR] Convert from nocapture to captures(none) (#123181)
This PR removes the old `nocapture` attribute, replacing it with the new
`captures` attribute introduced in #116990. This change is
intended to be essentially NFC, replacing existing uses of `nocapture`
with `captures(none)` without adding any new analysis capabilities.
Making use of non-`none` values is left for a followup.

Some notes:
* `nocapture` will be upgraded to `captures(none)` by the bitcode
   reader.
* `nocapture` will also be upgraded by the textual IR reader. This is to
   make it easier to use old IR files and somewhat reduce the test churn in
   this PR.
* Helper APIs like `doesNotCapture()` will check for `captures(none)`.
* MLIR import will convert `captures(none)` into an `llvm.nocapture`
   attribute. The representation in the LLVM IR dialect should be updated
   separately.
2025-01-29 16:56:47 +01:00
Jeremy Morse
8e70273509
[NFC][DebugInfo] Use iterator moveBefore at many call-sites (#123583)
As part of the "RemoveDIs" project, BasicBlock::iterator now carries a
debug-info bit that's needed when getFirstNonPHI and similar feed into
instruction insertion positions. Call-sites where that's necessary were
updated a year ago; however, to ensure some type safety, we'd like to
have all calls to moveBefore use iterators.

This patch adds a (guaranteed dereferenceable) iterator-taking
moveBefore, and changes a bunch of call-sites where it's obviously safe
to change to use it by just calling getIterator() on an instruction
pointer. A follow-up patch will contain less-obviously-safe changes.

We'll eventually deprecate and remove the instruction-pointer
insertBefore, but not before adding concise documentation of what
considerations are needed (very few).
2025-01-24 10:53:11 +00:00
Mats Jun Larsen
416f1c465d
[IR] Replace of PointerType::get(Type) with opaque version (NFC) (#123617)
In accordance with https://github.com/llvm/llvm-project/issues/123569

In order to keep the patch at a reasonable size, this PR only covers
the llvm subproject, unittests excluded.
2025-01-21 00:32:56 +09:00
macurtis-amd
d1a6eaa478
[Attributor][NFC] Performance improvements (#122923)
`forallInterferingAccesses` is a hotspot and for large modules these
changes make a measurable improvement in compilation time.

For LTO kernel compilation of 519.clvleaf (SPEChpc 2021) I measured the
following:
```
                    |   Measured times (s)   | Average | speedup
--------------------+------------------------+---------+---------
Baseline            | 33.268  33.332  33.275 |  33.292 |      0%
Cache "kernel"      | 30.543  30.339  30.607 |  30.496 |    9.2%
templatize callback | 30.981  30.97   30.964 |  30.972 |    7.5%
Both changes        | 29.284  29.201  29.053 |  29.179 |   14.1%
```
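
The speedup column is consistent with dividing the baseline average by the variant average and subtracting one. That formula is inferred from the numbers rather than stated above; a quick check:

```cpp
#include <cassert>
#include <cmath>
#include <numeric>
#include <vector>

// Arithmetic mean of the measured times.
inline double average(const std::vector<double> &Times) {
  assert(!Times.empty());
  return std::accumulate(Times.begin(), Times.end(), 0.0) / Times.size();
}

// Speedup in percent relative to the baseline:
// (baseline average / variant average - 1) * 100.
inline double speedupPercent(const std::vector<double> &Baseline,
                             const std::vector<double> &Variant) {
  return (average(Baseline) / average(Variant) - 1.0) * 100.0;
}
```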
2025-01-14 12:51:25 -06:00
Jay Foad
f8559751fc
[llvm-project] Fix typo "propogate" (#114795) 2024-11-04 15:33:19 +00:00
Kazu Hirata
98ea1a81a2
[IPO] Remove unused includes (NFC) (#114716)
Identified with misc-include-cleaner.
2024-11-03 13:48:55 -08:00
Shilei Tian
5a74a4a667
[Attributor] Take the address space from addrspacecast directly (#108258)
Currently `AAAddressSpace` relies on identifying the address spaces of
all underlying objects. However, it might infer sub-optimal address
space when the underlying object is a function argument. In
`AMDGPUPromoteKernelArgumentsPass`, the promotion of a pointer kernel
argument is by adding a series of `addrspacecast` instructions (as shown
below), and hoping `InferAddressSpacePass` can pick it up and do the
rewriting accordingly.

Before promotion:

```
define amdgpu_kernel void @kernel(ptr %to_be_promoted) {
  %val = load i32, ptr %to_be_promoted
  ...
  ret void
}
```

After promotion:

```
define amdgpu_kernel void @kernel(ptr %to_be_promoted) {
  %ptr.cast.0 = addrspacecast ptr %to_be_promoted to ptr addrspace(1)
  %ptr.cast.1 = addrspacecast ptr addrspace(1) %ptr.cast.0 to ptr
  ; all the uses of %to_be_promoted will use %ptr.cast.1
  %val = load i32, ptr %ptr.cast.1
  ...
  ret void
}
```

When `AAAddressSpace` analyzes the code after promotion, it will take
`%to_be_promoted` as the underlying object of `%ptr.cast.1`, and use its
address space (which is 0) as its final address space, thus simply doing
nothing in `manifest`. The attributor framework will then eliminate the
address space cast from 0 to 1 and back to 0, and replace `%ptr.cast.1`
with `%to_be_promoted`, which basically reverts all changes by
`AMDGPUPromoteKernelArgumentsPass`.

IMHO I'm not sure if `AMDGPUPromoteKernelArgumentsPass` promotes the
argument in a proper way. To improve the handling of this case, this PR
adds an extra handling when iterating over all underlying objects. If an
underlying object is a function argument, it means it reaches a terminal
such that we can't deduce its underlying object any further. In this
case, we check all uses of the argument. If they are all `addrspacecast`
instructions and their destination address spaces are the same, we take the
destination address space.
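
The check described above could be sketched as follows, on a hypothetical `ArgUse` model rather than real IR uses:

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Hypothetical model of one use of a function argument.
struct ArgUse {
  bool IsAddrSpaceCast;
  unsigned DestAddrSpace; // meaningful only when IsAddrSpaceCast is true
};

// If every use is an addrspacecast to one common address space, return it.
inline std::optional<unsigned>
addrSpaceFromUses(const std::vector<ArgUse> &Uses) {
  std::optional<unsigned> Common;
  for (const ArgUse &U : Uses) {
    if (!U.IsAddrSpaceCast)
      return std::nullopt; // some other kind of use: give up
    if (Common && *Common != U.DestAddrSpace)
      return std::nullopt; // casts disagree on the destination
    Common = U.DestAddrSpace;
  }
  return Common; // empty when the argument has no uses
}
```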

Fixes: SWDEV-482640.
2024-10-09 22:51:07 -04:00
Johannes Doerfert
335e137267
[Attributor][FIX] Track returned pointer offsets (#110534)
If the pointer returned by a function is not "the base pointer" but has
an offset, we need to track the offset such that users can apply it to
their offset chain when they create accesses.
This was reported by @ye-luo and reduced test cases are included. The
OffsetInfo was moved and the container was replaced with a set to avoid
excessive growth. Otherwise, the patch just replaces the "returns
pointer" flag with the "returned offsets", and deals with the applying
to offsets at the call site.
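
Applying returned offsets at a call site can be pictured as combining two offset sets. This is a simplified sketch with hypothetical names (`OffsetSet`, `applyReturnedOffsets`), not the OffsetInfo code:

```cpp
#include <cassert>
#include <cstdint>
#include <set>

// Byte offsets a callee may add to a base pointer before returning it.
using OffsetSet = std::set<std::int64_t>;

// At the call site, every offset the caller already tracks is combined with
// every offset the callee may have applied to the returned pointer.
inline OffsetSet applyReturnedOffsets(const OffsetSet &CallerOffsets,
                                      const OffsetSet &Returned) {
  OffsetSet Result;
  for (std::int64_t C : CallerOffsets)
    for (std::int64_t R : Returned)
      Result.insert(C + R);
  return Result;
}
```

Using a set keeps duplicate sums from accumulating, which matches the stated motivation of avoiding excessive growth.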

---------

Co-authored-by: Johannes Doerfert <jdoerfert@llnl.gov>
2024-10-01 12:41:15 -05:00
Jeremy Morse
96f37ae453
[NFC] Use initial-stack-allocations for more data structures (#110544)
This replaces some of the most frequent offenders of using a DenseMap that
cause a malloc, where the typical element-count is small enough to fit in
an initial stack allocation.

Most of these are fairly obvious, one to highlight is the collectOffset
method of GEP instructions: if there's a GEP, of course it's going to have
at least one offset, but every time we've called collectOffset we end up
calling malloc as well for the DenseMap in the MapVector.
2024-09-30 23:15:18 +01:00
Shilei Tian
0b7a18bd4a
[Attributor] Use more appropriate approach to check flat address space (#108713) 2024-09-27 18:26:55 -04:00
macurtis-amd
72fd35b85b
[Attributor] Report change when updating ReachesReturn (#108965) 2024-09-19 11:10:18 -05:00