The operand lists for these opcodes require 1 byte per operand and are
usually small values that fit in 3-4 bits. This makes their storage
inefficient. In addition, many EmitNode/MorphNodeTo in the isel table
will use the same list of operand numbers.
This patch proposes to separate the operand lists into their own table
where they can be de-duplicated. The OPC_EmitNode/MorphNodeTo in the
main table will only store an index into this smaller table.
This is a reduced version of a suggestion from this very old FIXME.
d8d4096c0b/llvm/utils/TableGen/DAGISelMatcherGen.cpp (L1070)
For RISC-V this reduces the main table from 1437353 bytes to 1276015
bytes plus a 929 byte operand list table. A savings of about 11%.
For X86 this reduces the main table from 719237 bytes to 623612 bytes
plus a 1042 byte operand list table. A savings of about 13%.
I expect further savings could be had by moving more bytes over.
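
The de-duplication idea can be sketched as follows. This is only an
illustration: the class name OperandListTable and its methods are
hypothetical, and whether the real emitter stores an offset or an index,
and where it keeps the operand count, are assumptions.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical side table: stores each distinct operand list once and
// hands back the byte offset at which that list starts.
class OperandListTable {
  std::vector<uint8_t> Storage;                     // concatenated lists
  std::map<std::vector<uint8_t>, unsigned> Offsets; // list -> offset

public:
  // Returns the offset of Ops in the table, appending it on first use.
  unsigned getOrInsert(const std::vector<uint8_t> &Ops) {
    auto It = Offsets.find(Ops);
    if (It != Offsets.end())
      return It->second;
    unsigned Offset = static_cast<unsigned>(Storage.size());
    Storage.insert(Storage.end(), Ops.begin(), Ops.end());
    Offsets.emplace(Ops, Offset);
    return Offset;
  }

  // Reads NumOps operands starting at Offset.
  std::vector<uint8_t> get(unsigned Offset, unsigned NumOps) const {
    return std::vector<uint8_t>(Storage.begin() + Offset,
                                Storage.begin() + Offset + NumOps);
  }

  std::size_t sizeInBytes() const { return Storage.size(); }
};
```

In this sketch the main table would carry only the value returned by
getOrInsert, so two nodes with the same operand list share one copy.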
There are target intrinsics that logically require two MMOs, such as
llvm.amdgcn.global.load.lds, which is a copy from global memory to LDS,
so there's both a load and a store to different addresses.
Add an overload of getTgtMemIntrinsic that produces intrinsic info in a
vector, and implement it in terms of the existing (now protected)
overload.
GlobalISel and SelectionDAG paths are updated to support multiple MMOs.
The main part of this change is supporting multiple MMOs in
MemIntrinsicNodes.
Converting the backends to use the new overload is a fairly mechanical
step that is done in a separate change, in the hope that this reduces
merge pain during review and for downstreams. A later change will then
enable using multiple MMOs in AMDGPU.
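
A rough sketch of the overload arrangement described above, using
simplified stand-in types. IntrinsicInfoSketch, TargetLoweringSketch, the
two example targets and the intrinsic IDs 1 and 2 are all hypothetical;
the real signatures take more parameters.

```cpp
#include <vector>

// Hypothetical stand-in for the per-access description.
struct IntrinsicInfoSketch {
  bool IsLoad = false;
  bool IsStore = false;
};

class TargetLoweringSketch {
public:
  virtual ~TargetLoweringSketch() = default;

  // New entry point: fills in zero or more access descriptions. The
  // default implementation forwards to the single-result hook.
  virtual void getTgtMemIntrinsic(std::vector<IntrinsicInfoSketch> &Infos,
                                  unsigned IntrinsicID) const {
    IntrinsicInfoSketch Info;
    if (getTgtMemIntrinsic(Info, IntrinsicID))
      Infos.push_back(Info);
  }

protected:
  // Existing single-result hook, now protected.
  virtual bool getTgtMemIntrinsic(IntrinsicInfoSketch &Info,
                                  unsigned IntrinsicID) const {
    (void)Info;
    (void)IntrinsicID;
    return false;
  }
};

// A backend that has not been converted yet keeps overriding the old hook.
class LegacyTargetSketch : public TargetLoweringSketch {
protected:
  bool getTgtMemIntrinsic(IntrinsicInfoSketch &Info,
                          unsigned IntrinsicID) const override {
    if (IntrinsicID != 1)
      return false;
    Info.IsLoad = true;
    return true;
  }
};

// A converted backend can report both a load and a store.
class TwoAccessTargetSketch : public TargetLoweringSketch {
public:
  void getTgtMemIntrinsic(std::vector<IntrinsicInfoSketch> &Infos,
                          unsigned IntrinsicID) const override {
    if (IntrinsicID != 2)
      return;
    IntrinsicInfoSketch Load, Store;
    Load.IsLoad = true;
    Store.IsStore = true;
    Infos.push_back(Load);
    Infos.push_back(Store);
  }
};
```

Callers in this sketch only ever use the vector overload through the base
class, so unconverted and converted backends look the same to them.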
- Remove pass initialization calls from pass constructors.
- For some passes, add the initialization to `initializeCodeGen` or
`initializeGlobalISel`.
- Remove redundant initializations from llc and X86 target for some
passes.
Previously this was taking a duplicate copy of this information
from TargetLowering. This moves the bulk of libcall checks to use
the new analysis. There are still a few straggler uses in misc.
passes in a few backends (mainly AArch64 has some libcall emission
in FinalizeISel and PrologEpilogInserter).
Libcall lowering decisions should come from the LibcallLoweringInfo
analysis. Query this through the DAG, so eventually the source
can be the analysis. For the moment this is just a wrapper around
the TargetLowering information.
The way HwMode is currently implemented, tablegen duplicates each
pattern that is dependent on hardware mode. The HwMode predicate is
added as a pattern predicate on the duplicated pattern.
RISC-V uses HwMode on the GPR register class which means almost every
isel pattern is affected by HwMode. This results in the isel table
being nearly twice the size it would be if we only had a single GPR
size.
This patch proposes to do the expansion at instruction selection time
instead. To accomplish this new opcodes like OPC_CheckTypeByHwMode
are added to the isel table. The unique combinations of types and HwMode
are converted to an index that is the payload for the new opcodes.
TableGen emits a new virtual function getValueTypeByHwMode that uses
this index and the current HwMode to look up the type.
This reduces the size of the isel table on RISC-V from ~2.38 million
bytes to ~1.38 million bytes.
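
The lookup can be pictured roughly like this. It is a sketch under
assumptions: SimpleVT, NumHwModes, the table contents and
checkTypeByHwMode are made up for illustration, and the real
getValueTypeByHwMode is a TableGen-emitted virtual function rather than a
free function taking the mode as an argument.

```cpp
#include <array>
#include <cstdint>

// Hypothetical stand-ins; real value types and HwMode IDs are generated.
enum class SimpleVT : uint8_t { i32, i64 };
constexpr unsigned NumHwModes = 2; // e.g. a 32-bit and a 64-bit GPR mode

// One row per unique (type per HwMode) combination. The isel table stores
// only the row index as the payload of OPC_CheckTypeByHwMode and friends.
constexpr std::array<std::array<SimpleVT, NumHwModes>, 2> VTByHwModeTable = {{
    {{SimpleVT::i32, SimpleVT::i64}}, // index 0: type differs per mode
    {{SimpleVT::i32, SimpleVT::i32}}, // index 1: same type in every mode
}};

// Resolves a table index to a concrete type for the current HwMode.
SimpleVT getValueTypeByHwMode(unsigned Index, unsigned HwMode) {
  return VTByHwModeTable[Index][HwMode];
}

// What a matcher step for OPC_CheckTypeByHwMode might boil down to.
bool checkTypeByHwMode(SimpleVT NodeVT, unsigned Index, unsigned HwMode) {
  return NodeVT == getValueTypeByHwMode(Index, HwMode);
}
```

With this shape a single pattern in the table serves every hardware mode,
because the type is resolved when the check runs instead of when the
table is generated.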
I did not add an OPC_SwitchTypeByHwMode opcode yet. If the VT requires a
hardware mode, we emit an OPC_Scope+OPC_CheckTypeByHwMode instead. I
expect adding an OPC_SwitchTypeByHwMode could further reduce the table
size. I will investigate this as a follow up.
Many of the matcher classes in tablegen now use ValueTypeByHwMode
instead of MVT. This may have an impact on the memory usage and runtime of
tablegen. We can mitigate some of this by splitting the matchers into MVT and
ValueTypeByHwMode versions. We can also explore alternate data
structures for ValueTypeByHwMode instead of a std::map. Maybe a sorted vector.
A similar change can be made to GlobalISel as a follow up.
Instead emit this as an OPC_EmitInteger, but print the string
when the value is known to be 0..63 (when we don't need a VBR).
Also print the string into a comment when comments are not omitted
so it isn't lost when a VBR is needed.
Previously, we used a VBR that stored the sign bit in bit 0 followed
by the absolute value in subsequent bits.
This patch changes it to use SLEB128 which discards redundant sign
bits, but keeps the bits in the same positions. This uses the same
number of bytes to encode values so doesn't change the table size.
My goal is to remove OPC_EmitStringInteger as a special opcode type.
Instead, we can print the string directly with OPC_EmitInteger for
any string that has an enum value of 0..63.
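
To make the two encodings concrete, here is a small standalone sketch.
The function names are hypothetical, the old scheme is reconstructed only
from the description above, and the sign-magnitude helper does not handle
INT64_MIN.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Old scheme as described: sign in bit 0, absolute value in the bits
// above it, emitted 7 bits per byte with a continuation bit.
std::vector<uint8_t> encodeSignMagnitudeVBR(int64_t Value) {
  uint64_t Abs = Value < 0 ? 0 - static_cast<uint64_t>(Value)
                           : static_cast<uint64_t>(Value);
  uint64_t Bits = (Abs << 1) | (Value < 0 ? 1u : 0u);
  std::vector<uint8_t> Out;
  do {
    uint8_t Byte = static_cast<uint8_t>(Bits & 0x7f);
    Bits >>= 7;
    if (Bits != 0)
      Byte |= 0x80; // more bytes follow
    Out.push_back(Byte);
  } while (Bits != 0);
  return Out;
}

// SLEB128: two's complement, 7 bits per byte, stopping once the remaining
// bits are all copies of the sign bit already emitted.
std::vector<uint8_t> encodeSLEB128(int64_t Value) {
  std::vector<uint8_t> Out;
  bool More;
  do {
    uint8_t Byte = static_cast<uint8_t>(Value & 0x7f);
    Value >>= 7; // assumes arithmetic shift (guaranteed since C++20)
    bool SignBitSet = (Byte & 0x40) != 0;
    More = !((Value == 0 && !SignBitSet) || (Value == -1 && SignBitSet));
    if (More)
      Byte |= 0x80;
    Out.push_back(Byte);
  } while (More);
  return Out;
}

int64_t decodeSLEB128(const std::vector<uint8_t> &Bytes) {
  uint64_t Result = 0;
  unsigned Shift = 0;
  std::size_t I = 0;
  uint8_t Byte;
  do {
    Byte = Bytes[I++];
    Result |= static_cast<uint64_t>(Byte & 0x7f) << Shift;
    Shift += 7;
  } while (Byte & 0x80);
  if (Shift < 64 && (Byte & 0x40))
    Result |= ~static_cast<uint64_t>(0) << Shift; // sign-extend
  return static_cast<int64_t>(Result);
}
```

In SLEB128 the low bits of a small non-negative value sit in the same
positions as in its plain binary form, which is what lets an enum value
in 0..63 be emitted as a single byte with no continuation bit.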
This reverts commit 3ff2637d867a6cc23ea5d5127b065efb8299d196.
I accidentally merged another PR into this during a rebase. Reverting
to commit it correctly.
Previously, we used a VBR that stored the sign bit in bit 0 followed by
the absolute value in subsequent bits.
This patch changes it to use SLEB128 which discards redundant sign bits,
but keeps the bits in the same positions. This uses the same number of
bytes to encode values so doesn't change the table size.
My goal is to remove OPC_EmitStringInteger as a special opcode type.
Instead, we can print the string directly with OPC_EmitInteger for any
string that has an enum value of 0..63.
At the moment the MIR tests are somewhat redundant. The waitcnt
one is needed to ensure we actually have a load, given we are
currently just emitting an error on ExternalSymbol. The asm printer
one is more redundant for the moment, since it's stressed by the IR
test. However I am planning to change the error path for the IR test,
so it will soon not be redundant.
This was discovered while looking at the codegen for x64 when Control
Flow Guard is enabled.
When using `SelectionDAG`, LLVM would generate the following sequence
for a CF guarded indirect call:
```
leaq target_func(%rip), %rax
rex64 jmpq *__guard_dispatch_icall_fptr(%rip) # TAILCALL
```
However, when Fast ISel was used the following was generated:
```
leaq target_func(%rip), %rax
movq __guard_dispatch_icall_fptr(%rip), %rcx
rex64 jmpq *%rcx # TAILCALL
```
This was happening despite Fast ISel aborting and falling back to
`SelectionDAG`.
The root cause for this code gen is that `SelectionDAGISel` has a
special case when Fast ISel aborts when lowering a `CallInst` where it
tries to lower the instruction as its own basic block, which for such a
CF Guard call means that it is lowering an indirect call to
`__guard_dispatch_icall_fptr` without observing that the function was
being loaded into a pointer in the preceding (and bundled) instruction.
The fix for this is to not use the special case when a `CallInst` has
bundled instructions: it's better to allow the call and its bundled
instructions to be lowered together by `SelectionDAG` instead.
In the new test, we're trying to fold a load and an X86ISD::CALL. The
call has a CopyToReg glued to it. The load and the call have different
input chains so they need to be merged. This results in a TokenFactor
that gets put between the CopyToReg and the final CALLm instruction. The
DAG scheduler can't handle that.
The load here was created by legalization of the extract_element using a
stack temporary store and load. A normal IR load would be chained into
the call sequence by SelectionDAGBuilder. This would usually have the
load chained in before the CopyToReg. The store/load created by
legalization don't get chained into the rest of the DAG.
Fixes #63790
This intrinsic emits a BFD_RELOC_NONE relocation at the point of call,
which allows optimizations and languages to explicitly pull in symbols
from static libraries without there being any code or data that has an
effectual relocation against such a symbol.
See issue #146159 for context.
Branch probabilities from PGO profile data were not preserved during
instruction selection at -O0 because BranchProbabilityInfo was only
requested when OptLevel != None.
`shouldUseDebugInstrRef` can return a different value than
`useDebugInstrRef`, since the first depends on opt level which can
change. Inconsistent usage can lead to errors later.
I believe that using `should...` instead of `use...` here is a result of
a minor error during this:
https://github.com/llvm/llvm-project/pull/94149/files#diff-8ec547e1244562c5837ed180dd9bed61b3cd960ef90bb6002ea2db41a67ed693
Notice how before the change `InstrRef` is assigned the value from
`should...` *before* the opt change. Now, it's done after -- opt change
happens here:
```cpp
bool SelectionDAGISelLegacy::runOnMachineFunction(MachineFunction &MF) {
...
// Decide what flavour of variable location debug-info will be used, before
// we change the optimisation level.
MF.setUseDebugInstrRef(MF.shouldUseDebugInstrRef());
....
return Selector->runOnMachineFunction(MF);
}
```
Then `runOnMachineFunction` uses `should...`, which after the opt change
may return a different value than it did previously.
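
The hazard can be reproduced with a toy model. Everything here is a
simplified assumption: MachineFunctionSketch is hypothetical and the real
predicate depends on more than the optimisation level.

```cpp
// Hypothetical model of the cached decision versus the live predicate.
struct MachineFunctionSketch {
  int OptLevel = 2;
  bool CachedUseInstrRef = false;

  // Recomputed on every call, so it tracks the current opt level.
  bool shouldUseDebugInstrRef() const { return OptLevel != 0; }

  // Records the decision once; later opt-level changes do not affect it.
  void setUseDebugInstrRef(bool Use) { CachedUseInstrRef = Use; }
  bool useDebugInstrRef() const { return CachedUseInstrRef; }
};
```

If the decision is cached while OptLevel is 2 and the level is then
lowered to 0, useDebugInstrRef() still returns true while
shouldUseDebugInstrRef() returns false, so code that mixes the two
queries sees inconsistent answers.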
An alternative approach to #149732, which sorts the DAG before dumping
it. That approach runs a risk of altering the codegen result as we don't
know if any of the downstream DAG users relies on the node ID, which was
updated as part of the sorting.
The new method proposed by this PR does not update the node ID or any of
the DAG's internal states: the newly added
`SelectionDAG::getTopologicallyOrderedNodes` is a const member function
that returns a list of all nodes in their topological order.
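
As an illustration of ordering nodes without touching their state, here
is a standalone sketch using Kahn's algorithm. NodeSketch and
getTopologicalOrder are hypothetical, and whether the real
`SelectionDAG::getTopologicallyOrderedNodes` uses this algorithm is an
assumption.

```cpp
#include <cstddef>
#include <vector>

struct NodeSketch {
  int Id;                            // never modified by the ordering
  std::vector<std::size_t> Operands; // indices of the nodes this one uses
};

// Returns node indices such that every operand precedes its users. The
// graph is taken by const reference, so no node state can change.
std::vector<std::size_t>
getTopologicalOrder(const std::vector<NodeSketch> &Nodes) {
  std::vector<std::size_t> Pending(Nodes.size());
  std::vector<std::vector<std::size_t>> Users(Nodes.size());
  for (std::size_t I = 0; I < Nodes.size(); ++I) {
    Pending[I] = Nodes[I].Operands.size();
    for (std::size_t Op : Nodes[I].Operands)
      Users[Op].push_back(I);
  }

  std::vector<std::size_t> Order;
  for (std::size_t I = 0; I < Nodes.size(); ++I)
    if (Pending[I] == 0)
      Order.push_back(I); // nodes with no operands are ready immediately

  // Order doubles as the work list: each emitted node releases its users.
  for (std::size_t Next = 0; Next < Order.size(); ++Next)
    for (std::size_t User : Users[Order[Next]])
      if (--Pending[User] == 0)
        Order.push_back(User);
  return Order;
}
```

All bookkeeping lives in local vectors, which is the property that makes
this kind of ordering safe to call from a dump routine.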
This patch adds a new `TargetLowering` hook `lowerEHPadEntry()` that is
called at the start of lowering EH pads in SelectionDAG. This allows the
insertion of target-specific actions on entry to exception handlers.
This is used on AArch64 to insert SME streaming-mode switches at landing
pads. This is needed as exception handlers are always entered with
PSTATE.SM off, and the function needs to resume the streaming mode of
the function body.
llvm/llvm-project#147560 changed the legacy SelectionDAG pass to always
need TargetTransformInfoWrapperPass (rather than only when assertions
are enabled). `SelectionDAGISelLegacy::getAnalysisUsage` was not updated
in that PR, which caused crashes on assertions-disabled builds that are
hard to track down.
This makes the required update, which should avoid crashes being seen on
some buildbots and by some users.
This reverts commit 8ac7210b7f0ad49ae7809bf6a9faf2f7433384b0.
This breaks building the AArch64 backend, e.g. see
https://github.com/llvm/llvm-project/pull/144947
Revert to unbreak the build.
Also reverts follow-up commit 1e76f012db3ccfaa05e238812e572b5b6d12c17e.
If a kernel is known to be executing only a single lane, IR
UniformityAnalysis will take note of that (via
GCNTTIImpl::hasBranchDivergence) and report that all values are uniform.
SelectionDAG's built-in divergence tracking should do the same.
Seeing how we can't generate any debug intrinsics any more: delete a
variety of codepaths where they're handled. For the most part these are
plain deletions, in others I've tweaked comments to remain coherent, or
added a type to (what was) type-generic-lambdas.
This isn't all the DbgInfoIntrinsic call sites but it's most of the
simple scenarios.
Co-authored-by: Nikita Popov <github@npopov.com>
This patch optimizes the Windows security cookie check mechanism by
moving the comparison inline and only calling __security_check_cookie
when the check fails. This reduces the overhead of making a DLL call
for every function return.
Previously, we implemented this optimization through a machine pass
(X86WinFixupBufferSecurityCheckPass) in PR #95904 submitted by
@mahesh-attarde. We have reverted that pass in favor of this new
approach. We have also abandoned the AArch64-specific implementation
of the same pass in PR #121938 in favor of this more general solution.
The old machine instruction pass approach:
- Scanned the generated code to find __security_check_cookie calls
- Modified these calls by splitting basic blocks
- Added comparison logic and conditional branching
- Required complex block management and live register computation
The new approach:
- Implements the same optimization during instruction selection
- Directly emits the comparison and conditional branching
- No need for post-processing or basic block manipulation
- Disables the optimization at -Oz.
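
In source-level terms, the difference between the two epilogue shapes
looks roughly like this. It is a sketch only: GlobalCookie,
SlowPathCalls and securityCheckCookieSketch are hypothetical stand-ins,
and the real __security_check_cookie terminates the process on a
mismatch instead of counting calls.

```cpp
#include <cstdint>

// Hypothetical stand-ins for the runtime pieces.
static uint64_t GlobalCookie = 0x2B992DDFA232ull;
static int SlowPathCalls = 0;

// Stand-in for the out-of-line check that lives in another DLL.
void securityCheckCookieSketch(uint64_t StackCookie) {
  (void)StackCookie;
  ++SlowPathCalls;
}

// Shape without the optimization: every return pays for the call.
void epilogueAlwaysCall(uint64_t StackCookie) {
  securityCheckCookieSketch(StackCookie);
}

// Shape with the optimization: compare inline, call only on mismatch.
void epilogueInlineCheck(uint64_t StackCookie) {
  if (StackCookie != GlobalCookie)
    securityCheckCookieSketch(StackCookie);
}
```

On the common path where the cookie is intact, the inline version costs a
compare and a not-taken branch and never reaches the call.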
Thanks @tamaspetz, @efriedma-quic and @arsenm for their help.
These are identified by misc-include-cleaner. I've filtered out those
that break builds. Also, I'm staying away from llvm-config.h,
config.h, and Compiler.h, which likely cause platform- or
compiler-specific build failures.