Renames the current lowering scheme to "module" and introduces two new
ones, "kernel" and "table", plus a "hybrid" that chooses between those three
on a per-variable basis.
Unit tests are set up to pass with the default lowering of "module" or "hybrid"
with this patch defaulting to "module", which will be a less dramatic codegen
change relative to the current. This reflects the sparsity of test coverage for
the table lowering method. Hybrid is better than module in every respect and
will be default in a subsequent patch.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D139433
DXIL shader bitcode is hashed and the hash is placed into the final
output object file in its own data part.
This change modifies the DXContainerGlobals pass to compute the shader
hash (just an MD5 of the bitcode) and put the shader hash data into a
global for the HASH part.
This also sets the hash flag as appropriate for if the hashed shader
contained debug information. There is additional handling required to
get debug information in shaders working correctly with our tooling,
but that will be addressed in subsequent patches.
Reviewed By: python3kgae
Differential Revision: https://reviews.llvm.org/D139357
This reverts commit d24915207c631b7cf637081f333b41bc5159c700.
Thinking about this more this probably chewed up 100+ bytes of stack
for each recursive call. So this probably needs more thought. The
code simplification wasn't that much.
On 64-bit target, when doing i64 SELECT_CC where one of the comparison operands
is a constant zero, try to fold the compare and MOVcc into a MOVr instruction.
For all integers, EQ and NE comparison are available, additionally for signed
integers, GT, GE, LT, and LE is also available.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D138922
After https://reviews.llvm.org/D137653 named sub-operands can be used
in the auto-generated instruction decoders. This allows the
auto-generated decoders to work properly, so all the hand-coded
decoders in the sparc target can be removed.
In some instances, a manually-written decoder had not been implemented
for an instruction, and thus that instruction was not decoded
properly. These have been fixed (and tests added).
Differential Revision: https://reviews.llvm.org/D137727
Commit a538d1f13a13 first added support for named sub-operands in
CodeEmitterGen. We now add a few more features to that, enabling
further target cleanups.
1. Adds support for handling an EncoderMethod in a sub-operand in
CodeEmitterGen. Previously, the specified encoder of a sub-operand was
ignored, and only the default used.
2. Adds support for sub-operands in DecoderEmitter, along with support
for tied sub-operands.
The changes to the decoder required a few minor tweaks to a few
targets, where existing brokeness was exposed. In order to keep this
patch small, I left FIXMEs which will be addressed in upcoming
patches. (Except MIPS16, since its object file emission/decoding is
totally broken).
Differential Revision: https://reviews.llvm.org/D137653
Most "ord" checks need two real-world compares to implement, but this is the
canonical form of a "!isnan" check, which is equivalent to comparing the input
for equality against itself.
Materialize zeros by copying from %g0, which is now marked as constant.
This makes it possible for some common operations (like integer negation) to be
performed in fewer instructions.
This continues @arichardson's patch at D132561.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D138887
OMod was disabled if OpSel was enabled, but that restriction is more
specific than necessary. Any VOP3 with float operands can use OMod.
On GFX11, FMAC_F16_e64 can use op_sel.
Previously, SIFoldOperands and convertToThreeAddress were accidentally correct when
they reinterpreted the zero OMod operand on V_FMAC_F16_e64 as the OpSel operand on
V_FMA_F16_gfx9_e64. Now we explicitly add op_sel if required.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D139469
By definition, the AVL of the scalar move is equally zero to the prior AVL if they are the same value. This generalizes the existing code to the case where the scalar move has a register AVL which is unknown, but unchanged from the preceeding instruction.
This doesn't cause any interesting diffs on its own, but another patch makes this case much more common. Split off to reduce a future diff.
The MC layer instructions have the correct register classes, and
the pseudos don't have any additional operands. So there doesn't
seem to be any reason for them to exist.
The pseudos were incorrectly going through code in RISCVMCInstLower
that converted LMUL>1 register classes to LMUL1 register class.
This makes the MCInst technically malformed, and prevented the
vl2r.v, vl4r.v, and vl8r.v InstAliases from matching. This accounts
for all of the .ll test diffs.
Differential Revision: https://reviews.llvm.org/D139511
We no longer need to increase vector size to 16 for intrinsics that use more
than 8 vgprs for addr. There is no image intrinsic that needs more than 12
so all currently existing cases will be covered. Using incorrect size was
causing an error in instruction selection because instructions were updated
to require new types (9x32, 10x32, 11x32, 12x32).
Differential Revision: https://reviews.llvm.org/D139546
SOPK_Pseudo was not inheriting from SOP_Pseudo at all, and some other
Pseudo classes were needlessly redefining things that were already
defined by SOP_Pseudo.
Differential Revision: https://reviews.llvm.org/D139527
Move some code that checks if an instruction is a waitcount into a separate
function, mainly to aid readability in the logic where it is used.
Differential Revision: https://reviews.llvm.org/D139522
Machine combiner supports generic reassociation only of associative and
commutative instructions, for example (A + X) + Y => (X + Y) + A. However, we
can extend this generic support to handle patterns like
(X + A) - Y => (X - Y) + A), where `-` is the inverse of `+`.
This patch adds interface functions to process reassociation patterns of
associative/commutative instructions and their inverse variants with minimal
changes in backends.
Differential Revision: https://reviews.llvm.org/D136754
Given the significant commonality between the bfmlal* and fmlal*
instructions it makes sense to use just a single class for both.
We can do this now that the bfmlal* lane intrinsics take a i32
index.
Differential Revision: https://reviews.llvm.org/D138906
Update spill code to account for new vector types with
bit widths: 288, 320, 352, 384.
Related to D138205.
Differential Revision: https://reviews.llvm.org/D139203
Almost all of the other SVE LLVM IR intrinsics take i32 values
for lane indices or other immediates. We should bring the bfloat
intrinsics in line with that. It will also make it easier to
add support for the SVE2.1 float intrinsics in future, since
they reuse the same underlying instruction classes.
I've maintained backwards compatibility with the old i64 variants
and used the autoupgrade mechanism.
Differential Revision: https://reviews.llvm.org/D138788
We've exploited test data class instructions introduced in ISA 3.0.
This change unifies the scalar intrinsics into ppc_test_data_class
and add support for 128-bit precision float values using xststdcqp.
Vector versions of the intrinsic can't be unified because they return
vector int instead of int.
Reviewed By: shchenz
Differential Revision: https://reviews.llvm.org/D138105
s_endpgm is a special SOPP instruction in that its operand is optional
and if it is not present then we don't want to print a space after the
mnemonic.
Previously this was handled by defaulting real_name to the mnemonic with
a trailing space, and having s_endpgm override it to be the mnemonic
with no trailing space.
This patch implements a different approach where the separator between
Mnemonic and AsmOperands defaults to a space, but s_endpgm overrides it
to be the empty string.
Differential Revision: https://reviews.llvm.org/D139412
This reverts commit 7883e5b061bdbbe8bee5f479ebe911db5045b7e9.
The original commit was reverted that it didn't update test files after D136263
landed. The recommit fixed those.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D139509
Correct the pseudo atomic instruction size for branch
relaxation and branch folding passes.
Inspired by D118175, D118009 and D117970.
Depends on D138481
Reviewed By: SixWeining, gonglingqin, xen0n
Differential Revision: https://reviews.llvm.org/D138469
We were only allowing these med3 patterns if the operands were known
to not be NaN, but we should also allow it if the calls to max/min
have the `nnan` or `fast` flags.
Differential Revision: https://reviews.llvm.org/D139506
The Zfhmin subset only has FLH, FSH, FMV.X.H, FMV.H.X, FCVT.S.H, and FCVT.H.S.
If the D extension is present, the FCVT.D.H and FCVT.H.D instructions are also included.
Since most instructions are not included for Zfhmin, so most operations are promoted.
The patch primarily about making f16 a legal type.
RISC-V ISA info:
https://wiki.riscv.org/display/HOME/Recently+Ratified+Extensions
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D139391
The patch made VectorLegalizer expand ISD::VP_FSHL and ISD::VP_FSHR to
achieve the codegen.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D138379
Since changes to account for LMUL in scheduler model existed over patches, we had to keep
both LMUL specific and all LMUL classes around. Now that only the LMUL specific
classes are used, we can remove the old ones.
It is likley that subtargets act differently for a vector load store instructions based on the LMUL.
This patch creates seperate SchedRead, SchedWrite, WriteRes, ReadAdvance for each relevant LMUL.
Differential Revision: https://reviews.llvm.org/D137429
It is likley that subtargets act differently for vector fixed-point arithmetic instructions
based on the LMUL. This patch creates seperate SchedRead, SchedWrite, WriteRes, ReadAdvance
for each relevant LMUL.
Differential Revision: https://reviews.llvm.org/D137428
We need a scratch GPR to increment the base pointer for each subsequent
register. We currently reuse the input GPR for the base pointer without
declaring it as a Def of the pseudo.
We can't add it as a Def of the pseudo at creation time because it doesn't
get register allocated. This was tried in D109405.
Seems the only choice we have is to scavenge the GPR. This patch
moves the expansion to eliminateFrameIndex where we can create
virtual registers that will be scavenged. This also eliminates the
extra operand for passing vlenb from frame lowering to expand pseudos.
I need to do more testing on real world code, but wanted to get this
up for early review.
I hope this will fix the issue reported in D123394, but I haven't
checked yet.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D139169
This patch adds the following NFC fixes to PPCInstrInfo.cpp when getting the DefMI:
- Fix documentation error to state that we want to flag a use of register
between the def and the MI (in post-RA)
- Setting the DefMI to null if the DefMI is neither an LI or and ADDI
(while still being in SSA form).
In terms of setting the DefMI to null, this change aims to account for the
scenario of when we end up going through all operands on the machine instruction
MI and updating OpNoForForwarding accordingly once an ADDI is found as the DefMI.
It is possible that once an ADDI is found, we will continue to go through all
operands in attempts to find an LI, but end up looking at every operand until
we reach the end if we have not yet found an LI. In the case where the end is
reached and we never end up finding an LI/ADDI, DefMI would be pointing to the
last operand of MI while OpNoForForwarding would still be pointing at the
previous ADDI operand found. We reset DefMI to avoid having DefMI point to an
instruction that differs from the one represented by OpNoForForwarding.
Differential Revision: https://reviews.llvm.org/D137483