Renames the current lowering scheme to "module" and introduces two new
ones, "kernel" and "table", plus a "hybrid" that chooses between those three
on a per-variable basis.
Unit tests are set up to pass with the default lowering set to either "module" or
"hybrid". This patch defaults to "module", which is a less dramatic codegen
change relative to the current behavior. This reflects the sparsity of test coverage for
the table lowering method. Hybrid is better than module in every respect and
will become the default in a subsequent patch.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D139433
DXIL shader bitcode is hashed and the hash is placed into the final
output object file in its own data part.
This change modifies the DXContainerGlobals pass to compute the shader
hash (just an MD5 of the bitcode) and put the shader hash data into a
global for the HASH part.
This also sets the hash flag according to whether the hashed shader
contains debug information. There is additional handling required to
get debug information in shaders working correctly with our tooling,
but that will be addressed in subsequent patches.
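
A minimal sketch of what the HASH part payload could look like. The layout
(a little-endian 32-bit flags word followed by the 16-byte MD5 digest) and the
flag names and values are assumptions made for illustration; only "MD5 of the
bitcode" and "a flag for debug information" come from the description above.

```python
import hashlib
import struct

# Hypothetical flag values, chosen for illustration only.
HASH_FLAG_NONE = 0
HASH_FLAG_INCLUDES_SOURCE = 1


def build_hash_part(bitcode: bytes, has_debug_info: bool) -> bytes:
    """Return an assumed 20-byte payload: u32 flags + 16-byte MD5 digest."""
    flags = HASH_FLAG_INCLUDES_SOURCE if has_debug_info else HASH_FLAG_NONE
    digest = hashlib.md5(bitcode).digest()  # the shader hash is a plain MD5
    return struct.pack("<I", flags) + digest
```

For example, `build_hash_part(bitcode, False)` yields four zero bytes followed
by the digest of `bitcode`.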
Reviewed By: python3kgae
Differential Revision: https://reviews.llvm.org/D139357
Summary:
According to the DWARF5 specification and the GNU specification for DWARF4, the offset
entry in the CU/TU Index is 32 bits. This presents a problem when
.debug_info.dwo in a DWP file grows beyond 4GB: the CU Index becomes partially
corrupted.
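
The corruption can be illustrated with a small model of a 32-bit index field.
The helper names are hypothetical; the sketch only shows that an offset past
4GB loses its high bits when stored in 32 bits.

```python
U32_MAX = 0xFFFFFFFF


def store_offset_u32(offset: int) -> int:
    """Model writing a section offset into a 32-bit index field."""
    return offset & U32_MAX  # bits above bit 31 are silently dropped


def fits_u32(offset: int) -> bool:
    """True if the offset can be represented in the 32-bit field."""
    return 0 <= offset <= U32_MAX


four_gb = 4 * 1024**3
wrapped = store_offset_u32(four_gb + 16)  # a contribution just past 4GB
```

Here `wrapped` is 16, so the index entry points near the start of the section
instead of at the real contribution.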
This diff adds manual parsing of .debug_info.dwo/.debug_abbrev.dwo to
reconstruct the CU index in general, and the TU index for DWARF5. This is a workaround
until the DWARF6 spec is finalized.
Next patch will change internal CU/TU struct to 64 bit, and change uses as
necessary. The plan is to land all the patches in one go after all are approved.
This patch originates from the discussion in: https://discourse.llvm.org/t/dwarf-dwp-4gb-limit/63902
Differential Revision: https://reviews.llvm.org/D137882
Summary:
Changed the contribution data structure to 64 bit. I added 32-bit and 64-bit
accessors to make it explicit where we use 32 bit and where we use 64 bit, and to
make sure we catch all the cases where this data structure is used.
This reverts commit d24915207c631b7cf637081f333b41bc5159c700.
Thinking about this more, this probably chewed up 100+ bytes of stack
for each recursive call, so it needs more thought. The
code simplification wasn't that much.
This is not a correctness fix because the set is only used for debug
output. However, it helps avoid noise when looking at diffs between
compiler runs.
The set is only maintained with debug output enabled, so the added cost
should be acceptable.
Differential Revision: https://reviews.llvm.org/D139465
On 64-bit target, when doing i64 SELECT_CC where one of the comparison operands
is a constant zero, try to fold the compare and MOVcc into a MOVr instruction.
For all integers, EQ and NE comparisons are available; additionally, for signed
integers, GT, GE, LT, and LE are also available.
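
A rough model of the fold, for illustration. The mnemonics are the SPARC V9
MOVr family as I understand it; the description above does not list them, so
treat the table and the helper name as assumptions.

```python
import operator

# Condition code -> (assumed MOVr mnemonic, predicate on "register vs. zero").
MOVR_BY_CONDITION = {
    "EQ": ("movrz", operator.eq),
    "NE": ("movrnz", operator.ne),
    "GT": ("movrgz", operator.gt),
    "GE": ("movrgez", operator.ge),
    "LT": ("movrlz", operator.lt),
    "LE": ("movrlez", operator.le),
}


def fold_select_cc_zero(cond, lhs, true_val, false_val):
    """Model `lhs <cond> 0 ? true_val : false_val` as a single MOVr.

    Returns (mnemonic, selected value), or None when the condition has no
    register-condition form (e.g. unsigned orderings).
    """
    entry = MOVR_BY_CONDITION.get(cond)
    if entry is None:
        return None
    mnemonic, predicate = entry
    return mnemonic, (true_val if predicate(lhs, 0) else false_val)
```

Unsigned orderings such as "UGT" return None in this model, matching the
restriction to signed GT/GE/LT/LE.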
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D138922
After https://reviews.llvm.org/D137653 named sub-operands can be used
in the auto-generated instruction decoders. This allows the
auto-generated decoders to work properly, so all the hand-coded
decoders in the sparc target can be removed.
In some instances, a manually-written decoder had not been implemented
for an instruction, and thus that instruction was not decoded
properly. These have been fixed (and tests added).
Differential Revision: https://reviews.llvm.org/D137727
Commit a538d1f13a13 first added support for named sub-operands in
CodeEmitterGen. We now add a few more features to that, enabling
further target cleanups.
1. Adds support for handling an EncoderMethod in a sub-operand in
CodeEmitterGen. Previously, the specified encoder of a sub-operand was
ignored, and only the default used.
2. Adds support for sub-operands in DecoderEmitter, along with support
for tied sub-operands.
The changes to the decoder required a few minor tweaks to a few
targets, where existing brokenness was exposed. In order to keep this
patch small, I left FIXMEs which will be addressed in upcoming
patches. (Except MIPS16, since its object file emission/decoding is
totally broken).
Differential Revision: https://reviews.llvm.org/D137653
Most "ord" checks need two real-world compares to implement, but this is the
canonical form of a "!isnan" check, which is equivalent to comparing the input
for equality against itself.
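
The equivalence can be checked directly: under IEEE 754 a NaN compares unequal
to everything, including itself, so a single self-comparison is a "!isnan"
test. The function name is hypothetical.

```python
import math


def is_not_nan(x: float) -> bool:
    """Single-compare form of an ordered check of x against itself."""
    return x == x  # False only when x is NaN


samples = [0.0, -0.0, 1.5, float("inf"), float("-inf"), float("nan")]
agrees_with_isnan = all(is_not_nan(v) == (not math.isnan(v)) for v in samples)
```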
Materialize zeros by copying from %g0, which is now marked as constant.
This makes it possible for some common operations (like integer negation) to be
performed in fewer instructions.
This continues @arichardson's patch at D132561.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D138887
OMod was disabled if OpSel was enabled, but that restriction is more
specific than necessary. Any VOP3 with float operands can use OMod.
On GFX11, FMAC_F16_e64 can use op_sel.
Previously, SIFoldOperands and convertToThreeAddress were accidentally correct when
they reinterpreted the zero OMod operand on V_FMAC_F16_e64 as the OpSel operand on
V_FMA_F16_gfx9_e64. Now we explicitly add op_sel if required.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D139469
The global constant arguments could be in a different address space
than the first argument, so we have to add another overloaded argument.
This patch was originally made for CHERI LLVM (where globals can be in
address space 200), but it also appears to be useful for in-tree targets
as can be seen from the test diffs.
Differential Revision: https://reviews.llvm.org/D138722
We currently have a bug where the legalizer, when dealing with phi operands,
may create instructions in the phi's incoming blocks at points which are effectively
dead due to a possible exception throw.
Say we have:
throwbb:
  EH_LABEL
  x0 = %callarg1
  BL @may_throw_call
  EH_LABEL
  B returnbb

bb:
  %v = phi i1 %true, throwbb, %false....
When legalizing we may need to widen the i1 %true value, and to do that we need
to create new extension instructions in the incoming block. Our insertion point
currently is the MBB::getFirstTerminator() which puts the IP before the unconditional
branch terminator in throwbb. These extensions may never be executed if the call
throws, and therefore we need to emit them before the call (but not too early, since
our new instruction may need values defined within throwbb as well).
throwbb:
  EH_LABEL
  x0 = %callarg1
  BL @may_throw_call
  EH_LABEL
  %true = G_CONSTANT i32 1 ; <<<-- ruh'roh, this never executes if may_throw_call() throws!
  B returnbb

bb:
  %v = phi i32 %true, throwbb, %false....
To fix this, I've added two new instructions. The main idea is that G_INVOKE_REGION_START
is a terminator, which tries to model the fact that in the IR, the original invoke inst
is actually a terminator as well. By using that as the new insertion point, we
make sure to place new instructions on always executing paths.
Unfortunately we still need to make the legalizer use a new insertion point API
that I've added, since the existing `getFirstTerminator()` method does a reverse
walk up the block, and any non-terminator instructions cause it to bail out. To
avoid impacting compile time for all `getFirstTerminator()` uses, I've added a new
method that does a forward walk instead.
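
A toy model of the two walks, to show why the reverse walk misses the new
terminator. The block contents and the position of G_INVOKE_REGION_START
(placed before the first EH_LABEL here) are assumptions for illustration, and
the function names are hypothetical rather than the real API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Instr:
    name: str
    is_terminator: bool = False


def first_terminator_reverse(block: List[Instr]) -> int:
    """Walk up from the end; stop at the first non-terminator."""
    i = len(block)
    while i > 0 and block[i - 1].is_terminator:
        i -= 1
    return i  # index of the first instruction in the trailing terminator run


def first_terminator_forward(block: List[Instr]) -> int:
    """Walk down from the top; return the first terminator seen."""
    for i, instr in enumerate(block):
        if instr.is_terminator:
            return i
    return len(block)


THROW_BB = [
    Instr("G_INVOKE_REGION_START", is_terminator=True),
    Instr("EH_LABEL"),
    Instr("x0 = %callarg1"),
    Instr("BL @may_throw_call"),
    Instr("EH_LABEL"),
    Instr("B returnbb", is_terminator=True),
]
```

In this model the reverse walk returns the index of `B returnbb`, after the
call, while the forward walk returns the index of G_INVOKE_REGION_START, so
anything inserted there runs before the call can throw.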
Differential Revision: https://reviews.llvm.org/D137905
By definition, if the AVL of the scalar move is the same value as the prior AVL, the two are either both zero or both non-zero. This generalizes the existing code to the case where the scalar move has a register AVL which is unknown, but unchanged from the preceding instruction.
This doesn't cause any interesting diffs on its own, but another patch makes this case much more common. Split off to reduce a future diff.
The MC layer instructions have the correct register classes, and
the pseudos don't have any additional operands. So there doesn't
seem to be any reason for them to exist.
The pseudos were incorrectly going through code in RISCVMCInstLower
that converted LMUL>1 register classes to LMUL1 register class.
This makes the MCInst technically malformed, and prevented the
vl2r.v, vl4r.v, and vl8r.v InstAliases from matching. This accounts
for all of the .ll test diffs.
Differential Revision: https://reviews.llvm.org/D139511
We no longer need to increase the vector size to 16 for intrinsics that use more
than 8 vgprs for addr. There is no image intrinsic that needs more than 12,
so all currently existing cases will be covered. Using an incorrect size was
causing an error in instruction selection because instructions were updated
to require the new types (9x32, 10x32, 11x32, 12x32).
Differential Revision: https://reviews.llvm.org/D139546
Before performing this change, I checked that `ByteAlignment` was never `0` inside `MCAsmStreamer::emitZeroFill` and `MCAsmStreamer::emitLocalCommonSymbol`.
I believe it is NFC, as `0` values are illegal in `emitZeroFill` anyway: `Log2(ByteAlignment)` would be undefined.
And currently, all calls to `emitLocalCommonSymbol` pass a `ByteAlignment` that is provably `>0`.
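
The reason `0` cannot be a legal byte alignment can be sketched as follows.
This is an illustrative helper, not the LLVM `Log2` implementation: it rejects
inputs for which no exact base-2 logarithm exists.

```python
def log2_alignment(byte_alignment: int) -> int:
    """Return log2 of a power-of-two byte alignment; reject everything else."""
    is_power_of_two = byte_alignment > 0 and (byte_alignment & (byte_alignment - 1)) == 0
    if not is_power_of_two:
        # log2(0) has no value, which is why a zero alignment is illegal.
        raise ValueError(f"not a valid alignment: {byte_alignment}")
    return byte_alignment.bit_length() - 1
```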
Differential Revision: https://reviews.llvm.org/D139439
This breaks Windows bots with
`warning C4334: '<<': result of 32-bit shift implicitly converted to 64 bits (was 64-bit shift intended?)`
Some shift operators lack a proper literal suffix (they should use '1ULL' instead of
'1'). Will reland once fixed.
This reverts commit c621c1a8e81856e6bf2be79714767d80466e9ede.
Before performing this change, I checked that `ByteAlignment` was never `0` inside `MCAsmStreamer::emitZeroFill` and `MCAsmStreamer::emitLocalCommonSymbol`.
I believe it is NFC, as `0` values are illegal in `emitZeroFill` anyway: `Log2(ByteAlignment)` would be undefined.
And currently, all calls to `emitLocalCommonSymbol` pass a `ByteAlignment` that is provably `>0`.
Differential Revision: https://reviews.llvm.org/D139439
SOPK_Pseudo was not inheriting from SOP_Pseudo at all, and some other
Pseudo classes were needlessly redefining things that were already
defined by SOP_Pseudo.
Differential Revision: https://reviews.llvm.org/D139527
Move some code that checks if an instruction is a waitcount into a separate
function, mainly to aid readability in the logic where it is used.
Differential Revision: https://reviews.llvm.org/D139522
The old code didn't bother to memoize blocks for which the exact exit count is not
known. As a result, when the exact count isn't known but the symbolic count is, this
info was lost. This patch fixes that: we now memoize whenever the symbolic count is
known (exact always implies symbolic, so this is a strict superset of what was memoized before).
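
A simplified sketch of the caching policy, assuming a hypothetical cache keyed
by block name. The class and field names are invented for illustration and do
not mirror the real data structures.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass(frozen=True)
class ExitCount:
    exact: Optional[str]      # None when the exact count is not known
    symbolic: Optional[str]   # None when not even a symbolic count is known


class ExitCountCache:
    def __init__(self, compute: Callable[[str], ExitCount]):
        self._compute = compute
        self._cache: Dict[str, ExitCount] = {}

    def get(self, block: str) -> ExitCount:
        if block in self._cache:
            return self._cache[block]
        result = self._compute(block)
        # Old policy (as described above): memoize only if result.exact is known.
        # New policy: memoize whenever a symbolic count is known.
        if result.symbolic is not None:
            self._cache[block] = result
        return result
```

With the new condition, a block whose count is only known symbolically is
computed once and then served from the cache.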
Differential Revision: https://reviews.llvm.org/D139515
Reviewed By: nikic
The machine combiner supports generic reassociation only for associative and
commutative instructions, for example (A + X) + Y => (X + Y) + A. However, we
can extend this generic support to handle patterns like
(X + A) - Y => (X - Y) + A, where `-` is the inverse of `+`.
This patch adds interface functions to process reassociation patterns of
associative/commutative instructions and their inverse variants with minimal
changes in backends.
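
The inverse-operation pattern can be sanity-checked on 64-bit wrap-around
integers. This only demonstrates that the rewrite preserves the value for
modular integer arithmetic; it says nothing about when the combiner considers
the rewrite profitable.

```python
MASK64 = (1 << 64) - 1


def before(x: int, a: int, y: int) -> int:
    """(X + A) - Y with 64-bit wrap-around."""
    return ((x + a) - y) & MASK64


def after(x: int, a: int, y: int) -> int:
    """(X - Y) + A with 64-bit wrap-around."""
    return ((x - y) + a) & MASK64


values = [0, 1, 2, MASK64, 1 << 63, 12345]
all_equal = all(
    before(x, a, y) == after(x, a, y)
    for x in values for a in values for y in values
)
```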
Differential Revision: https://reviews.llvm.org/D136754
The LLVM C bindings currently offer no way to query the version string
dynamically. This is a useful feature in situations where a program
isn't compiled against a specific version of LLVM but rather loads it
dynamically (e.g. using dlopen()).
In situations where the shared library filename doesn't reveal the
version (e.g. LLVM-C.dll) and to adapt to version-specific API
differences, it is then useful to be able to query the version string by
calling the proposed LLVMGetVersion function.
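
A sketch of the dynamic-loading use case via Python's ctypes. It assumes
LLVMGetVersion takes three `unsigned *` out-parameters (major, minor, patch);
that signature is not stated above, so treat it as an assumption. The wrapper
name and the library path passed in are hypothetical.

```python
import ctypes
from typing import Optional, Tuple


def query_llvm_version(library_path: str) -> Optional[Tuple[int, int, int]]:
    """Load an LLVM shared library and ask it for its version, if possible."""
    try:
        lib = ctypes.CDLL(library_path)
    except OSError:
        return None  # library not found or not loadable
    try:
        get_version = lib.LLVMGetVersion
    except AttributeError:
        return None  # older library without the symbol
    # Assumed signature: void LLVMGetVersion(unsigned *, unsigned *, unsigned *)
    get_version.argtypes = [ctypes.POINTER(ctypes.c_uint)] * 3
    get_version.restype = None
    major, minor, patch = ctypes.c_uint(), ctypes.c_uint(), ctypes.c_uint()
    get_version(ctypes.byref(major), ctypes.byref(minor), ctypes.byref(patch))
    return major.value, minor.value, patch.value
```

A caller can branch on the returned tuple to adapt to version-specific API
differences, and fall back when the result is None.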
Differential Revision: https://reviews.llvm.org/D139381
Given the significant commonality between the bfmlal* and fmlal*
instructions, it makes sense to use just a single class for both.
We can do this now that the bfmlal* lane intrinsics take an i32
index.
Differential Revision: https://reviews.llvm.org/D138906
Update spill code to account for new vector types with
bit widths: 288, 320, 352, 384.
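
For reference, these widths correspond to vectors of 9 to 12 32-bit elements,
assuming each element is one 32-bit register:

```python
ELEMENT_BITS = 32  # assumed width of one vector element


def vector_bit_width(num_elements: int) -> int:
    """Total bit width of a vector of 32-bit elements."""
    return num_elements * ELEMENT_BITS


new_widths = [vector_bit_width(n) for n in range(9, 13)]  # [288, 320, 352, 384]
```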
Related to D138205.
Differential Revision: https://reviews.llvm.org/D139203