693 Commits

Author SHA1 Message Date
Jakub Kuderski
4c21d0cb14
[ADT] Prepare to deprecate variadic StringSwitch::Cases. NFC. (#166020)
Update all uses of variadic `.Cases` to use the initializer list
overload instead. I plan to mark variadic `.Cases` as deprecated in a
followup PR.

For more context, see https://github.com/llvm/llvm-project/pull/163117.
2025-11-02 00:12:33 +00:00
Pankaj Dwivedi
4d7093b806
[AMDGPU] Enable "amdgpu-uniform-intrinsic-combine" pass in pipeline. (#162819)
This PR enables AMDGPUUniformIntrinsicCombine pass in the llc pipeline.
Also introduces the "amdgpu-uniform-intrinsic-combine" command-line flag
to enable/disable the pass.

see the PR:https://github.com/llvm/llvm-project/pull/116953
2025-10-30 12:32:32 +05:30
Pankaj Dwivedi
20532c0aab
[AMDGPU] make AMDGPUUniformIntrinsicCombine a function pass (#165265)
There has been an issue(using function analysis inside the module pass
in OPM) integrating this pass into the LLC pipeline, which currently
lacks NPM support. I tried finding a way to get the per-function
analysis, but it seems that in OPM, we don't have that option.

So the best approach would be to make it a function pass.

Ref: https://github.com/llvm/llvm-project/pull/116953
2025-10-29 11:56:43 +05:30
Sam Clegg
7ebc3dbe8b
[llvm] Make getEffectiveRelocModel helper consistent across targets. NFC (#165121)
- On targets that don't require the Triple, don't pass it.
- Use `.value_or` to where possible.
2025-10-25 21:20:20 -07:00
paperchalice
f3df058b03
[Passes] Report error when pass requires target machine (#142550)
Fixes #142146
Do nullptr check when pass accept `const TargetMachine &` in
constructor, but it is still not exhaustive.
2025-10-23 12:57:03 +08:00
Carl Ritson
af6fa77a35
[AMDGPU] Add DAG mutation to improve scheduling before barriers (#142716)
Add scheduler DAG mutation to add data dependencies between atomic
fences and preceding memory reads. This allows some modelling of the
impact an atomic fence can have on outstanding memory accesses.

This is beneficial when a fence would cause wait count insertion, as
more instructions will be scheduled before the fence hiding memory
latency.
2025-10-21 13:28:52 +09:00
Pankaj Dwivedi
53aad35208
[AMDGPU] Introduce "amdgpu-uniform-intrinsic-combine" pass to combine uniform AMDGPU lane Intrinsics. (#116953)
This pass introduces optimizations for AMDGPU intrinsics by leveraging
the uniformity of their arguments. When an intrinsic's arguments are
detected as uniform, redundant computations are eliminated, and the
intrinsic calls are simplified accordingly.

By utilizing the UniformityInfo analysis, this pass identifies cases
where intrinsic calls are uniform across all lanes, allowing
transformations that reduce unnecessary operations and improve the IR's
efficiency.

These changes enhance performance by streamlining intrinsic usage in
uniform scenarios without altering the program's semantics.

For background, see PR #99878
2025-10-09 12:44:56 +05:30
Shilei Tian
4ddc0f3ffd
[AMDGPU] Add the missing enabling check of AMDGPUAttributor (#162420) 2025-10-08 04:21:04 +00:00
James Y Knight
783c1a7617
AMDGPU: skip AMDGPUAttributor pass on R600 some more. (#162418)
This is a follow-up for #162207, where I neglected to skip the second
use of AMDGPUAttributor for R600 targets. This use is covered by the
test lld/test/ELF/lto/r600.ll.
2025-10-08 03:40:30 +00:00
James Y Knight
2f4275b195
AMDGPU: skip AMDGPUAttributor and AMDGPUImageIntrinsicOptimizerPass on R600. (#162207)
These passes call `getSubtarget<GCNSubtarget>`, which doesn't work on
R600 targets, as that uses an `R600Subtarget` type, instead.
Unfortunately, `TargetMachine::getSubtarget<ST>` does an unchecked
static_cast to `ST&`, which makes it easy for this error to go
undetected.

The modifications here were verified by running check-llvm with an
assert added to getSubtarget. However, that asssert requires that RTTI
is enabled, which LLVM doesn't use, so I've reverted the assert before
sending this fix upstream.

These errors have been present for some time, but were detected after
#162040 caused an uninitialized memory read to be reported by asan/msan.
2025-10-07 16:17:59 +00:00
Gang Chen
640644d68a
[AMDGPU] Move LowerBufferFatPointers after LoadStoreVectorizer and remove the fixme (#161531)
Move LowerBufferFatPointers pass after CodegenPrepare and
LoadStoreVectorizer pass, and remove the fixme about that.
2025-10-01 17:52:15 -07:00
Reid Kleckner
f3efbce4a7
[llvm] Move data layout string computation to TargetParser (#157612)
Clang and other frontends generally need the LLVM data layout string in
order to generate LLVM IR modules for LLVM. MLIR clients often need it
as well, since MLIR users often lower to LLVM IR.

Before this change, the LLVM datalayout string was computed in the
LLVM${TGT}CodeGen library in the relevant TargetMachine subclass.
However, none of the logic for computing the data layout string requires
any details of code generation. Clients who want to avoid duplicating
this information were forced to link in LLVMCodeGen and all registered
targets, leading to bloated binaries. This happened in PR #145899,
which measurably increased binary size for some of our users.

By moving this information to the TargetParser library, we
can delete the duplicate datalayout strings in Clang, and retain the
ability to generate IR for unregistered targets.

This is intended to be a very mechanical LLVM-only change, but there is
an immediately obvious follow-up to clang, which will be prepared
separately.

The vast majority of data layouts are computable with two inputs: the
triple and the "ABI name". There is only one exception, NVPTX, which has
a cl::opt to enable short device pointers. I invented a "shortptr" ABI
name to pass this option through the target independent interface.
Everything else fits. Mips is a bit awkward because it uses a special
MipsABIInfo abstraction, which includes members with codegen-like
concepts like ABI physical registers that can't live in TargetParser. I
think the string logic of looking for "n32" "n64" etc is reasonable to
duplicate. We have plenty of other minor duplication to preserve
layering.

---------

Co-authored-by: Matt Arsenault <arsenm2@gmail.com>
Co-authored-by: Sergei Barannikov <barannikov88@gmail.com>
2025-09-11 11:05:29 -07:00
Stanislav Mekhanoshin
5901d896f4
[AMDGPU] Register amdgpu-lower-vgpr-encoding pass in npm (#156971) 2025-09-05 00:07:47 -07:00
Stanislav Mekhanoshin
1f0f3473e6
[AMDGPU] High VGPR lowering on gfx1250 (#156965) 2025-09-04 16:20:47 -07:00
Nicolai Hähnle
353b5e43c6
AMDGPU: Refactor lowering of s_barrier to split barriers (#154648)
Let's do the lowering of non-split into split barriers in a new IR pass,
AMDGPULowerIntrinsics. That way, there is no code duplication between
SelectionDAG and GlobalISel. This simplifies some upcoming extensions to
the code.
2025-08-28 07:01:20 -07:00
Ivan Kosarev
faca8c9ed4
[AMDGPU][NFC] Only include CodeGenPassBuilder.h where needed. (#154769)
Saves around 125-210 MB of compilation memory usage per source for
roughly one third of our backend sources, ~60 MB on average.
2025-08-22 10:05:06 +01:00
Shoreshen
04aebbfbe2
[AMDGPU] Delete AMDGPU Unify Metadata pass (#153548)
Fixes #153150
2025-08-14 16:16:32 +08:00
Vikram Hegde
e50bd78d54
Reapply "[CodeGen][NPM] Stitch up loop passes in codegen pipeline" (#151098)
Reapplies https://github.com/llvm/llvm-project/pull/148114
includes shared lib build failure fixes for AMDGPU and X86.
2025-07-30 13:29:15 +05:30
Alex Voicu
6bcff9eb13
[HIPSTDPAR] Add handling for math builtins (#140158)
When compiling in `--hipstdpar` mode, the builtins corresponding to the
standard library might end up in code that is expected to execute on the
accelerator (e.g. by using the `std::` prefixed functions from
`<cmath>`). We do not have uniform handling for this in AMDGPU, and the
errors that obtain are quite arcane. Furthermore, the user-space changes
required to work around this tend to be rather intrusive.

This patch adds an additional `--hipstdpar` specific pass which forwards
to the run time component of HIPSTDPAR the intrinsics / libcalls which
result from the use of the math builtins, and which are not properly
handled. In the long run we will want to stop relying on this and handle
things in the compiler, but it is going to be a rather lengthy journey,
which makes this medium term escape hatch necessary.

The paired change in the run time component is here
<https://github.com/ROCm/rocThrust/pull/551>.
2025-07-28 22:29:31 +01:00
Vikram Hegde
495774d6d5
Revert "[CodeGen][NPM] Stitch up loop passes in codegen pipeline" (#150883)
Reverts llvm/llvm-project#148114

will update with fixed PR.
2025-07-28 11:28:00 +05:30
Vikram Hegde
d35bf478a8
[CodeGen][NPM] Stitch up loop passes in codegen pipeline (#148114)
same as https://github.com/llvm/llvm-project/pull/133050

Co-authored-by : Oke, Akshat
<[Akshat.Oke@amd.com](mailto:Akshat.Oke@amd.com)>
2025-07-28 11:13:44 +05:30
Matt Arsenault
8f3e78f971
AMDGPU: Add pass to replace constant materialize with AV pseudos (#149292)
If we have a v_mov_b32 or v_accvgpr_write_b32 with an inline immediate,
replace it with a pseudo which writes to the combined AV_* class. This
relaxes the operand constraints, which will allow the allocator to
inflate the register class to AV_* to potentially avoid spilling.

The allocator does not know how to replace an instruction to enable
the change of register class. I originally tried to do this by changing
all of the places we introduce v_mov_b32 with immediate, but it's along
tail of niche cases that require manual updating. Plus we can restrict
this to only run on functions where we know we will be allocating AGPRs.
2025-07-18 17:15:38 +09:00
Vikram Hegde
72c61a6a25
[AMDGPU][NPM] Fill in addPreSched2 passes (#148112)
same as https://github.com/llvm/llvm-project/pull/139516

Co-authored-by : Oke, Akshat
<[Akshat.Oke@amd.com](mailto:Akshat.Oke@amd.com)>
2025-07-17 12:26:27 +05:30
Vikram Hegde
1ccd779324
[AMDGPU][NewPM] Port "AMDGPUResourceUsageAnalysis" to NPM (#130959) 2025-07-10 13:35:43 +05:30
Akshat Oke
edaf656d5e
[CodeGen][NPM] Differentiate pipeline-required and opt-required passes (#135752)
"Required" passes relate to actually running the pass on the IR,
regardless of whether they are in the pipeline.

CGPassBuilder was mistakenly still adding them to the pipeline.

The test `llc -stop-after=greedy -enable-new-pm` would still add
`greedy` to the pipeline otherwise.
2025-07-09 14:52:58 +05:30
Akshat Oke
415733bac4
[AMDGPU][NPM] Complete optimized regalloc pipeline (#138491)
Also fill in some other passes.
2025-07-09 14:14:55 +05:30
Akshat Oke
f78691630a
[CodeGen][NPM] Support CodeGenSCCOrder in pipeline (#136818)
Wrap passes into Post order CGSCC pass manager in codegen pass builder.

I am adding the pipeline test in this but it is not yet complete.
2025-07-09 11:32:21 +05:30
Matt Arsenault
c8ea114741
AMDGPU: Introduce a pass to replace VGPR MFMAs with AGPR (#145024)
In gfx90a-gfx950, it's possible to emit MFMAs which use AGPRs or VGPRs
for vdst and src2. We do not want to do use the AGPR form, unless
required by register pressure as it requires cross bank register
copies from most other instructions. Currently we select the AGPR
or VGPR version depending on a crude heuristic for whether it's possible
AGPRs will be required. We really need the register allocation to
be complete to make a good decision, which is what this pass is for.
    
This adds the pass, but does not yet remove the selection patterns
for AGPRs. This is a WIP, and NFC-ish. It should be a no-op on any
currently selected code. It also does not yet trigger on the real
examples of interest, which require handling batches of MFMAs at
once.
2025-06-27 21:05:03 +09:00
Matt Arsenault
2dcf436340
AMDGPU: Remove legacy pass manager version of AMDGPUAttributor (#145262) 2025-06-23 16:49:46 +09:00
Matt Arsenault
96493c514e
AMDGPU: Use reportFatalUsageError for regalloc flag error (#145198) 2025-06-22 22:41:05 +09:00
Matt Arsenault
5f2135df17
AMDGPU: Really delete AMDGPUAnnotateKernelFeatures (#145136) 2025-06-21 17:25:30 +09:00
Matt Arsenault
c361bffa50
AMDGPU: Remove legacy pass manager version of AMDGPUUnifyMetadata (#144985)
This is only run in the new pass manager now.
2025-06-20 16:44:06 +09:00
Matt Arsenault
1cae21da47
AMDGPU: Remove legacy PM version of AMDGPUPromoteAllocaToVector (#144986)
This is only run in the middle end with the new pass manager now,
so garbage collect the old PM version.
2025-06-20 16:43:39 +09:00
Andrew Rogers
19658d1474
[llvm] annotate interfaces in llvm/Target for DLL export (#143615)
## Purpose

This patch is one in a series of code-mods that annotate LLVM’s public
interface for export. This patch annotates the `llvm/Target` library.
These annotations currently have no meaningful impact on the LLVM build;
however, they are a prerequisite to support an LLVM Windows DLL (shared
library) build.

## Background

This effort is tracked in #109483. Additional context is provided in
[this
discourse](https://discourse.llvm.org/t/psa-annotating-llvm-public-interface/85307),
and documentation for `LLVM_ABI` and related annotations is found in the
LLVM repo
[here](https://github.com/llvm/llvm-project/blob/main/llvm/docs/InterfaceExportAnnotations.rst).

A sub-set of these changes were generated automatically using the
[Interface Definition Scanner (IDS)](https://github.com/compnerd/ids)
tool, followed formatting with `git clang-format`.

The bulk of this change is manual additions of `LLVM_ABI` to
`LLVMInitializeX` functions defined in .cpp files under llvm/lib/Target.
Adding `LLVM_ABI` to the function implementation is required here
because they do not `#include "llvm/Support/TargetSelect.h"`, which
contains the declarations for this functions and was already updated
with `LLVM_ABI` in a previous patch. I considered patching these files
with `#include "llvm/Support/TargetSelect.h"` instead, but since
TargetSelect.h is a large file with a bunch of preprocessor x-macro
stuff in it I was concerned it would unnecessarily impact compile times.

In addition, a number of unit tests under llvm/unittests/Target required
additional dependencies to make them build correctly against the LLVM
DLL on Windows using MSVC.

## Validation

Local builds and tests to validate cross-platform compatibility. This
included llvm, clang, and lldb on the following configurations:

- Windows with MSVC
- Windows with Clang
- Linux with GCC
- Linux with Clang
- Darwin with Clang
2025-06-17 13:28:45 -07:00
Diana Picus
40a7dce9ef
[AMDGPU] Remove duplicated/confusing helpers. NFCI (#142598)
Move canGuaranteeTCO and mayTailCallThisCC into AMDGPUBaseInfo instead
of keeping two copies for DAG/Global ISel.

Also remove isKernelCC, which doesn't agree with isKernel and doesn't
seem very useful.

While at it, also move all the CC-related helpers into AMDGPUBaseInfo.h and
mark them constexpr.
2025-06-05 11:19:20 +02:00
Pengcheng Wang
f393986b53
[MISched] Add templates for creating custom schedulers (#141935)
We rename `createGenericSchedLive` and `createGenericSchedPostRA`
to `createSchedLive` and `createSchedPostRA`, and add a template
parameter `Strategy` which is the generic implementation by default.

This can simplify some code for targets that have custom scheduler
strategy.
2025-06-03 11:37:40 +08:00
Shilei Tian
4d48673562 Reapply "Reapply "[AMDGPU] Make getAssumedAddrSpace return AS1 for pointer kernel arguments (#137488)""
This reverts commit 37ea3b32cdcb6c0dcecbcc4bf844f5190c7378dd.
2025-05-30 22:11:22 -04:00
Shilei Tian
37ea3b32cd Revert "Reapply "[AMDGPU] Make getAssumedAddrSpace return AS1 for pointer kernel arguments (#137488)""
This reverts commit 4efc13f8ff1eaf4f9fb1fcea8d4552b3eca052ca.
2025-05-30 22:06:16 -04:00
Shilei Tian
4efc13f8ff Reapply "[AMDGPU] Make getAssumedAddrSpace return AS1 for pointer kernel arguments (#137488)"
This reverts commit 3c6211c183885afb5d89259a53c4f4f46a6bf399.
2025-05-30 21:56:24 -04:00
Shilei Tian
3c6211c183 Revert "[AMDGPU] Make getAssumedAddrSpace return AS1 for pointer kernel arguments (#137488)"
This reverts commit 9bf6b2a8cb0467b62173659306e43a0346f063a2.
2025-05-30 21:15:25 -04:00
Shilei Tian
9bf6b2a8cb
[AMDGPU] Make getAssumedAddrSpace return AS1 for pointer kernel arguments (#137488) 2025-05-30 17:30:42 -04:00
Shilei Tian
84a69a0f8f
[AMDGPU] Move InferAddressSpacesPass to middle end optimization pipeline (#138604)
It will run twice in the non-LTO pipeline with `O1` or higher. In LTO post link pipeline, it will be run once with `O2` or higher, since inline and SROA don't run in `O1`.
2025-05-29 17:20:56 -04:00
Carl Ritson
e6b43bdde3
[AMDGPU] Cluster export instructions in PostRA Scheduler (#141399)
DAG mutation needs to be applied post-RA to maintain order established
during pre-RA scheduler.
2025-05-26 18:00:43 +09:00
Alexander Richardson
07e2ba445d
[AMDGPU] Set AS8 address width to 48 bits
Of the 128-bits of buffer descriptor only 48 bits are address bits, so
following the discussion on https://discourse.llvm.org/t/clarifiying-the-semantics-of-ptrtoint/83987/54,
the logic conclusion is to set the index width to 48 bits instead of
the current value of 128.

Most of the test changes are mechanical datalayout updates, but there
is one actual change: the ptrmask test now uses .i48 instead of .i128
and I had to update SelectionDAGBuilder to correctly extend the mask.

Reviewed By: krzysz00

Pull Request: https://github.com/llvm/llvm-project/pull/139419
2025-05-19 17:26:05 -07:00
Austin Kerbow
2c9a46cce3
[AMDGPU] Move kernarg preload logic to separate pass (#130434)
Moves kernarg preload logic to its own module pass. Cloned function
declarations are removed when preloading hidden arguments. The inreg
attribute is now added in this pass instead of AMDGPUAttributor. The
rest of the logic is copied from AMDGPULowerKernelArguments which now
only check whether an arguments is marked inreg to avoid replacing
direct uses of preloaded arguments. This change requires test updates to
remove inreg from lit tests with kernels that don't actually want
preloading.
2025-05-11 21:18:11 -07:00
Matthias Braun
675cb70641
Register assembly printer passes (#138348)
Register assembly printer passes in the pass registry.

This makes it possible to use `llc -start-before=<target>-asm-printer ...` in tests.

Adds a `char &ID` parameter to the AssemblyPrinter constructor to allow
targets to use the `INITIALIZE_PASS` macros and register the pass in the
pass registry. This currently has a default parameter so it won't break
any targets that have not been updated.
2025-05-06 18:01:17 -07:00
Shilei Tian
d6dbe7799e
[AMDGPU][Attributor] Add ThinOrFullLTOPhase as an argument (#123994) 2025-05-02 11:33:56 -04:00
Akshat Oke
e91cbd4f29
[CodeGen][NPM] Port VirtRegRewriter to NPM (#130564) 2025-04-30 14:10:46 +05:30
Sergei Barannikov
bb1765179e
[TTI] Simplify implementation (NFCI) (#136674)
Replace "concept based polymorphism" with simpler PImpl idiom.

This pursues two goals:
* Enforce static type checking. Previously, target implementations hid
base class methods and type checking was impossible. Now that they
override the methods, the compiler will complain on mismatched
signatures.
* Make the code easier to navigate. Previously, if you asked your
favorite LSP server to show a method (e.g. `getInstructionCost()`), it
would show you methods from `TTI`, `TTI::Concept`, `TTI::Model`,
`TTIImplBase`, and target overrides. Now it is two less :)

There are three commits to hopefully simplify the review.

The first commit removes `TTI::Model`. This is done by deriving
`TargetTransformInfoImplBase` from `TTI::Concept`. This is possible
because they implement the same set of interfaces with identical
signatures.

The first commit makes `TargetTransformImplBase` polymorphic, which
means all derived classes should `override` its methods. This is done in
second commit to make the first one smaller. It appeared infeasible to
extract this into a separate PR because the first commit landed
separately would result in tons of `-Woverloaded-virtual` warnings (and
break `-Werror` builds).

The third commit eliminates `TTI::Concept` by merging it with the only
derived class `TargetTransformImplBase`. This commit could be extracted
into a separate PR, but it touches the same lines in
`TargetTransformInfoImpl.h` (removes `override` added by the second
commit and adds `virtual`), so I thought it may make sense to land these
two commits together.

Pull Request: https://github.com/llvm/llvm-project/pull/136674
2025-04-26 15:25:40 +03:00
Christudasan Devadasan
940108b24d
[AMDGPU][NewPM] Make the pass flow consistent with the legacy pipeline. (#136551) 2025-04-21 15:34:15 +05:30