The "at construction" binop folds in SelectionDAG::getNode() has
different behaviour when compared to the equivalent LLVM IR. This PR
makes the behaviour consistent while also extending the coverage to
include signed/unsigned max/min operations.
This PR exposes the backend pass config to plugins via a callback.
Plugin authors can register a callback that is being triggered before
the target backend adds their passes to the pipeline. In the callback
they then get access to the `TargetMachine`, the `PassManager`, and the
`TargetPassConfig`. This allows plugins to call
`TargetPassConfig::insertPass`, which is honored in the subsequent
`addPass` of the main backend. We implemented this using the legacy pass
manager since backends still use it as the default.
This continues s-barannikov's work TableGen-erating SDNode descriptions.
This takes the initial patch from #119709 and moves documentation and the
rest of the AArch64ISD nodes to TableGen. Some issues were found by the
generated SDNode verification added in this patch. These issues have been
described and fixed in the following PRs:
- #140706
- #140711
- #140713
- #140715
---------
Co-authored-by: Sergei Barannikov <barannikov88@gmail.com>
In `X86MCInstLower::LowerMachineOperand`, a new `MCSymbol` can be
created in `GetSymbolFromOperand(MO)` where `MO.getType()` is
`MachineOperand::MO_ExternalSymbol`
```
case MachineOperand::MO_ExternalSymbol:
return LowerSymbolOperand(MO, GetSymbolFromOperand(MO));
```
at
725a7b664b/llvm/lib/Target/X86/X86MCInstLower.cpp (L196)
However, this newly created symbol will not be marked properly with its
`IsExternal` field since `Ctx.getOrCreateSymbol(Name)` doesn't know if
the newly created `MCSymbol` is for `MachineOperand::MO_ExternalSymbol`.
Looking at other backends, for example `Arch64MCInstLower` is doing for
handling `MC_ExternalSymbol`
14c36db16f/llvm/lib/Target/AArch64/AArch64MCInstLower.cpp (L366-L367)14c36db16f/llvm/lib/Target/AArch64/AArch64MCInstLower.cpp (L145-L148)
It creates/gets the MCSymbol from `AsmPrinter.OutContext` instead of
from `Ctx`. Moreover, `Ctx` for `AArch64MCLower` is the same as
`AsmPrinter.OutContext`.
8e7d6baf0e/llvm/lib/Target/AArch64/AArch64AsmPrinter.cpp (L100).
This applies to almost all the other backends except X86 and M68k.
```
$git grep "MCInstLowering("
lib/Target/AArch64/AArch64AsmPrinter.cpp💯 : AsmPrinter(TM, std::move(Streamer)), MCInstLowering(OutContext, *this),
lib/Target/AMDGPU/AMDGPUMCInstLower.cpp:223: AMDGPUMCInstLower MCInstLowering(OutContext, STI, *this);
lib/Target/AMDGPU/AMDGPUMCInstLower.cpp:257: AMDGPUMCInstLower MCInstLowering(OutContext, STI, *this);
lib/Target/AMDGPU/R600MCInstLower.cpp:52: R600MCInstLower MCInstLowering(OutContext, STI, *this);
lib/Target/ARC/ARCAsmPrinter.cpp:41: MCInstLowering(&OutContext, *this) {}
lib/Target/AVR/AVRAsmPrinter.cpp:196: AVRMCInstLower MCInstLowering(OutContext, *this);
lib/Target/BPF/BPFAsmPrinter.cpp:144: BPFMCInstLower MCInstLowering(OutContext, *this);
lib/Target/CSKY/CSKYAsmPrinter.cpp:41: : AsmPrinter(TM, std::move(Streamer)), MCInstLowering(OutContext, *this) {}
lib/Target/Lanai/LanaiAsmPrinter.cpp:147: LanaiMCInstLower MCInstLowering(OutContext, *this);
lib/Target/Lanai/LanaiAsmPrinter.cpp:184: LanaiMCInstLower MCInstLowering(OutContext, *this);
lib/Target/MSP430/MSP430AsmPrinter.cpp:149: MSP430MCInstLower MCInstLowering(OutContext, *this);
lib/Target/Mips/MipsAsmPrinter.h:126: : AsmPrinter(TM, std::move(Streamer)), MCInstLowering(*this) {}
lib/Target/WebAssembly/WebAssemblyAsmPrinter.cpp:695: WebAssemblyMCInstLower MCInstLowering(OutContext, *this);
lib/Target/X86/X86MCInstLower.cpp:2200: X86MCInstLower MCInstLowering(*MF, *this);
```
This patch makes `X86MCInstLower` and `M68KInstLower` to have their
`Ctx` from `AsmPrinter.OutContext` instead of getting it from
`MF.getContext()` to be consistent with all the other backends.
I think since normal use case (probably anything other than our
un-conventional case) only handles one llvm module all the way through
in the codegen pipeline till the end of code emission (AsmPrint),
`AsmPrinter.OutContext` is the same as MachineFunction's MCContext, so
this change is an NFC.
----
This fixes an error while running the generated code in ORC JIT for our
use case with
[MCLinker](https://youtu.be/yuSBEXkjfEA?si=HjgjkxJ9hLfnSvBj&t=813) (see
more details below):
https://github.com/llvm/llvm-project/pull/133291#issuecomment-2759200983
We (Mojo) are trying to do a MC level linking so that we break llvm
module into multiple submodules to compile and codegen in parallel
(technically into *.o files with symbol linkage type change), but
instead of archive all of them into one `.a` file, we want to fix the
symbol linkage type and still produce one *.o file. The parallel codegen
pipeline generates the codegen data structures in their own `MCContext`
(which is `Ctx` here). So if function `f` and `g` got split into
different submodules, they will have different `Ctx`. And when we try to
create an external symbol with the same name for each of them with
`Ctx.getOrCreate(SymName)`, we will get two different `MCSymbol*`
because `f` and `g`'s `MCContext` are different and they can't see each
other. This is unfortunately not what we want for external symbols.
Using `AsmPrinter.OutContext` helps, since it is shared, if we try to
get or create the `MCSymbol` there, we'll be able to deduplicate.
This patch attempts to reland
https://github.com/llvm/llvm-project/pull/120780 while addressing the
issues that caused the patch to be reverted.
Namely:
1. The patch had included code from the llvm/Passes directory in the
llvm/CodeGen directory.
2. The patch increased the backend compile time by 2% due to adding a
very expensive include in MachineFunctionPass.h
The patch has been re-structured so that there is no dependency between
the llvm/Passes and llvm/CodeGen directory, by moving the base class,
`class DroppedVariableStats` to the llvm/IR directory.
The expensive include in MachineFunctionPass.h has been changed to
contain forward declarations instead of other header includes which was
pulling a ton of code into MachineFunctionPass.h and should resolve any
issues when it comes to compile time increase.
This reverts commit 4307198d51487cc16f98eebb2113caf4a1905914.
Broke bot ppc64le-clang-multistage-test:
undefined reference to
`llvm::DroppedVariableStats::populateVarIDSetAndInlinedMap in
In function `llvm::DroppedVariableStatsIR::visitEveryInstruction
To get Dropped variable statistics for MIR, we need to move the base
class DroppedVariableStats code to the CodeGen library because we cannot
have CodeGen link against Passes.
Also moved the code for the virtual functions to the header because
clang/lib/CodeGen doesn't link against llvm/lib/CodeGen however it does
link against Passes which contains the `class StandardInstrumentations`
code but not the definition for the virtual functions leading to the
error about not finding vtable for `class DroppedVariableStatsIR`
This reverts commit 10ed7d94b52c21317a1e02ef1e2c3ff2b2d08301.
Revert "Reland 2de78815604e9027efd93cac27c517bf732587d2 (#119650)"
This reverts commit 0e80f9a1b51e0e068adeae1278d59cd7baacd5d8.
This is because the clang-ppc64le-linux-multistage bot breaks with error
undefined reference to `vtable for llvm::DroppedVariableStatsIR'
This reverts commit 37606b4c22654ab66eee8f89448a117f3534f2f4.
Broke the llvm-nvptx-nvidia-ubuntu bot with error: the vtable symbol may
be undefined because the class is missing its key function
This reverts commit acf3b1aa932b2237c181686e52bc61584a80a3ff.
Broke https://lab.llvm.org/buildbot/#/builders/76/builds/5002
tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/BackendUtil.cpp.o:(.toc+0x258): undefined reference to `vtable for llvm::DroppedVariableStatsIR'
This reverts commit 249755cedb17ffa707253edcef1a388f807caa35.
Broke https://lab.llvm.org/buildbot/#/builders/160/builds/9420
Note: This is test shard 99 of 154.
[==========] Running 2 tests from 2 test suites.
[----------] Global test environment set-up.
[----------] 1 test from DroppedVariableStatsMIR
[ RUN ] DroppedVariableStatsMIR.InlinedAt
--
exit: -11
Moved the IR unit test to the CodeGen folder to resolve linker errors:
`error: undefined reference to 'vtable for
llvm::DroppedVariableStatsIR'`
This patch is trying to reland
https://github.com/llvm/llvm-project/pull/115563
This reverts commit 0f8849349ae3d3f2f537ad6ab233a586fb39d375.
Resolve conflict in `MachinePostDominators.h` There is a conflict after
merging #96378, resolved in #96852. Both PRs modified
`MachinePostDominators.h` and triggered build failure.
This commit converts most of `DomTreeUpdater` into
`GenericDomTreeUpdater` class template, so IR and MIR can reuse some
codes.
There are some differences between interfaces of `BasicBlock` and
`MachineBasicBlock`, so subclasses still need to implement some
functions, like `forceFlushDeletedBB`.
Akin to `llvm::PatternMatch` and `llvm::MIPatternMatch`, the
`llvm::SDPatternMatch` introduced in this patch provides a DSL-alike
framework to match SDValue / SDNode with a more succinct syntax.
Add new pass manager support to `llc`. Users can use
`--passes=pass1,pass2...` to run mir passes, and use `--enable-new-pm`
to run default codegen pipeline.
This patch is taken from [D83612](https://reviews.llvm.org/D83612), the
original author is @yuanfang-chen.
---------
Co-authored-by: Yuanfang Chen <455423+yuanfang-chen@users.noreply.github.com>
In fact, there are several backends, e.g. AArch64, AMDGPU etc. add
module pass after function pass, this patch removes this constraint.
This patch also adds a simple unit test for `CodeGenPassBuilder`.
Re-landing the code that was reverted because of the buildbot failure
in https://lab.llvm.org/buildbot#builders/9/builds/27319.
Original commit message
======================
The class `ResourceSegments` is used to keep track of the intervals
that represent resource usage of a list of instructions that are
being scheduled by the machine scheduler.
The collection is made of intervals that are closed on the left and
open on the right (represented by the standard notation `[a, b)`).
These collections of intervals can be extended by `add`ing new
intervals accordingly while scheduling a basic block.
Unit tests are added to verify the possible configurations of
intervals, and the relative possibility of scheduling a new
instruction in these configurations. Specifically, the methods
`getFirstAvailableAtFromBottom` and `getFirstAvailableAtFromTop` are
tested to make sure that both bottom-up and top-down scheduling work
when tracking resource usage across the basic block with
`ResourceSegments`.
Note that the scheduler tracks resource usage with two methods:
1. counters (via `std::vector<unsigned> ReservedCycles;`);
2. intervals (via `std::map<unsigned, ResourceSegments> ReservedResourceSegments;`).
This patch can be considered a NFC test for existing scheduling models
because the tracking system that uses intervals is turned off by
default (field `bit EnableIntervals = false;` in the tablegen class
`SchedMachineModel`).
Reviewed By: andreadb
Differential Revision: https://reviews.llvm.org/D150312
Reverted because it produces the following builbot failure at https://lab.llvm.org/buildbot#builders/9/builds/27319:
/b/ml-opt-rel-x86-64-b1/llvm-project/llvm/unittests/CodeGen/SchedBoundary.cpp: In member function ‘virtual void ResourceSegments_getFirstAvailableAtFromBottom_empty_Test::TestBody()’:
/b/ml-opt-rel-x86-64-b1/llvm-project/llvm/unittests/CodeGen/SchedBoundary.cpp:395:31: error: call of overloaded ‘ResourceSegments(<brace-enclosed initializer list>)’ is ambiguous
395 | auto X = ResourceSegments({});
| ^
This reverts commit dc312f0331309692e8d6e06e93b3492b6a40989f.
The class `ResourceSegments` is used to keep track of the intervals
that represent resource usage of a list of instructions that are
being scheduled by the machine scheduler.
The collection is made of intervals that are closed on the left and
open on the right (represented by the standard notation `[a, b)`).
These collections of intervals can be extended by `add`ing new
intervals accordingly while scheduling a basic block.
Unit tests are added to verify the possible configurations of
intervals, and the relative possibility of scheduling a new
instruction in these configurations. Specifically, the methods
`getFirstAvailableAtFromBottom` and `getFirstAvailableAtFromTop` are
tested to make sure that both bottom-up and top-down scheduling work
when tracking resource usage across the basic block with
`ResourceSegments`.
Note that the scheduler tracks resource usage with two methods:
1. counters (via `std::vector<unsigned> ReservedCycles;`);
2. intervals (via `std::map<unsigned, ResourceSegments> ReservedResourceSegments;`).
This patch can be considered a NFC test for existing scheduling models
because the tracking system that uses intervals is turned off by
default (field `bit EnableIntervals = false;` in the tablegen class
`SchedMachineModel`).
Reviewed By: andreadb
Differential Revision: https://reviews.llvm.org/D150312
- Add some unittests for the findDebugLoc, rfindDebugLoc,
findPrevDebugLoc and rfindPrevDebugLoc helpers in MachineBasicBlock.
- Clean up code comments and code formatting related to the functions
mentioned above.
This was extracted as a pre-commit to D150577, adn some of the tests
are commented out since they would crash/assert in a rather
uncontrolled way.
Previously, `CCState::AllocateStack` always allocated stack space by increasing
offsets. For targets with stack growing up (away from zero) it is more
convenient to allocate arguments by decreasing offsets, so that the first
argument is at the top of the stack. This is important when calling a function
with variable number of arguments: the callee does not know the size of the
stack, but must be able to access "fixed" arguments. For that to work, the
"fixed" arguments should have fixed offsets relative to the stack top, i.e. the
variadic arguments area should be at the stack bottom (at lowest addresses).
The in-tree target with stack growing up is AMDGPU, but it allocates
arguments by increasing addresses. It does not support variadic arguments.
A drive-by change is to promote stack size/offset to 64-bit integer.
This is what MachineFrameInfo expects.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D149575
This reduces dependencies on `llvm-tblgen` so much.
`CodeGenTypes` depends on `Support` at the moment.
Be careful to append deps on this, since Targets' tablegens
depend on this.
Depends on D149024
Differential Revision: https://reviews.llvm.org/D148769
This is a fairly large changeset, but it can be broken into a few
pieces:
- `llvm/Support/*TargetParser*` are all moved from the LLVM Support
component into a new LLVM Component called "TargetParser". This
potentially enables using tablegen to maintain this information, as
is shown in https://reviews.llvm.org/D137517. This cannot currently
be done, as llvm-tblgen relies on LLVM's Support component.
- This also moves two files from Support which use and depend on
information in the TargetParser:
- `llvm/Support/Host.{h,cpp}` which contains functions for inspecting
the current Host machine for info about it, primarily to support
getting the host triple, but also for `-mcpu=native` support in e.g.
Clang. This is fairly tightly intertwined with the information in
`X86TargetParser.h`, so keeping them in the same component makes
sense.
- `llvm/ADT/Triple.h` and `llvm/Support/Triple.cpp`, which contains
the target triple parser and representation. This is very intertwined
with the Arm target parser, because the arm architecture version
appears in canonical triples on arm platforms.
- I moved the relevant unittests to their own directory.
And so, we end up with a single component that has all the information
about the following, which to me seems like a unified component:
- Triples that LLVM Knows about
- Architecture names and CPUs that LLVM knows about
- CPU detection logic for LLVM
Given this, I have also moved `RISCVISAInfo.h` into this component, as
it seems to me to be part of that same set of functionality.
If you get link errors in your components after this patch, you likely
need to add TargetParser into LLVM_LINK_COMPONENTS in CMake.
Differential Revision: https://reviews.llvm.org/D137838
This patch adds in instruction based features to the regalloc advisor
gated behind a flag so a user can decide at runtime whether or not they
want to enable the feature. The features are only enabled when LLVM is
compiled in MLGO develpment mode (LLVM_HAVE_TF_API) is set to true.
To extract the instruction features, I'm taking a list of segments from
each LiveInterval and noting the start and end SlotIndices. This list is then
sorted based on the start SlotIndex and I iterate through each SlotIndex
to grab instructions, making sure to check for overlaps. This results in
a vector of opcodes and binary mapping matrix that maps live ranges to the
opcodes of the instructions within that LR.
Reviewed By: mtrofin
Differential Revision: https://reviews.llvm.org/D131930
This review is extracted from D96035.
This patch adds possibility to keep not only DwarfStringPoolEntry, but also
pointer to it. The DwarfStringPoolEntryRef keeps reference to the string map entry.
String map keeps string data and corresponding DwarfStringPoolEntry
info. Not all string map entries may be included into the result,
and then not all string entries should have DwarfStringPoolEntry
info. Currently StringMap keeps DwarfStringPoolEntry for all entries.
It leads to extra memory usage. This patch allows to keep
DwarfStringPoolEntry info only for entries which really need it.
[reland] : make msan happy.
Reviewed By: JDevlieghere
Differential Revision: https://reviews.llvm.org/D126883
This review is extracted from D96035.
This patch adds possibility to keep not only DwarfStringPoolEntry, but also
pointer to it. The DwarfStringPoolEntryRef keeps reference to the string map entry.
String map keeps string data and corresponding DwarfStringPoolEntry
info. Not all string map entries may be included into the result,
and then not all string entries should have DwarfStringPoolEntry
info. Currently StringMap keeps DwarfStringPoolEntry for all entries.
It leads to extra memory usage. This patch allows to keep
DwarfStringPoolEntry info only for entries which really need it.
Reviewed By: JDevlieghere
Differential Revision: https://reviews.llvm.org/D126883
Delay reading global metadata until the first function or the end of
the file is emitted. That way, earlier module passes can set metadata
that is emitted in the ELF.
`emitStartOfAsmFile` gets called when the passes are initialized,
which prevented earlier passes from changing the metadata.
This fixes issues encountered after converting
AMDGPUResourceUsageAnalysis to a Module pass in D117504.
Differential Revision: https://reviews.llvm.org/D118492
Add the calculation of a score, which will be used during ML training. The
score qualifies the quality of a regalloc policy, and is independent of
what we train (currently, just eviction), or the regalloc algo itself.
We can then use scores to guide training (which happens offline), by
formulating a reward based on score variation - the goal being lowering
scores (currently, that reward is percentage reduction relative to
Greedy's heuristic)
Currently, we compute the score by factoring different instruction
counts (loads, stores, etc) with the machine basic block frequency,
regardless of the instructions' provenance - i.e. they could be due to
the regalloc policy or be introduced previously. This is different from
RAGreedy::reportStats, which accummulates the effects of the allocator
alone. We explored this alternative but found (at least currently) that
the more naive alternative introduced here produces better policies. We
do intend to consolidate the two, however, as we are actively
investigating improvements to our reward function, and will likely want
to re-explore scoring just the effects of the allocator.
In either case, we want to decouple score calculation from allocation
algorighm, as we currently evaluate it after a few more passes after
allocation (also, because score calculation should be reusable
regardless of allocation algorithm).
We intentionally accummulate counts independently because it facilitates
per-block reporting, which we found useful for debugging - for instance,
we can easily report the counts indepdently, and then cross-reference
with perf counter measurements.
Differential Revision: https://reviews.llvm.org/D115195
This patch shifts the InstrRefBasedLDV class declaration to a header.
Partially because it's already massive, but mostly so that I can start
writing some unit tests for it. This patch also adds the boilerplate for
said unit tests.
Differential Revision: https://reviews.llvm.org/D110165
- Distinct metadata needs generating in the codegen to attach correct
AAInfo on the loads/stores after lowering, merging, and other relevant
transformations.
- This patch adds 'MachhineModuleSlotTracker' to help assign slot
numbers to these newly generated unnamed metadata nodes.
- To help 'MachhineModuleSlotTracker' track machine metadata, the
original 'SlotTracker' is rebased from 'AbstractSlotTrackerStorage',
which provides basic interfaces to create/retrive metadata slots. In
addition, once LLVM IR is processsed, additional hooks are also
introduced to help collect machine metadata and assign them slot
numbers.
- Finally, if there is any such machine metadata, 'MIRPrinter' outputs
an additional 'machineMetadataNodes' field containing all the
definition of those nodes.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D103205
If the size of memory access is unknown, do not use it to analysis. One
example of unknown size memory access is to load/store scalable vector
objects on the stack.
Differential Revision: https://reviews.llvm.org/D91833
Added unittests. In the process, separated core construction - which just
needs the hits, order, and 'HardHints' values - from construction from
current register allocation state, to simplify testing.
Differential Revision: https://reviews.llvm.org/D88455
DW_FORM_sec_offset and DW_FORM_strp imply values of different sizes with
DWARF32 and DWARF64. The patch fixes DIE value classes to use correct
sizes when emitting their values. For DIELocList it ensures that the
requested DWARF form matches the current DWARF format because that class
uses a method that selects the size automatically.
Differential Revision: https://reviews.llvm.org/D87009
These methods are used to emit values which are 32-bit in DWARF32 and
64-bit in DWARF64. The patch fixes them so that they choose the length
automatically, depending on the DWARF format set in the Context.
Differential Revision: https://reviews.llvm.org/D87008
This relands commit 320eab2d558fde0b61437e9b9075bfd301c2c474.
The test failed because it was looking for x86-linux target
unconditionally. Now it gets the default target.