Fixes #82659
There are some functions, such as `findRegisterDefOperandIdx` and `findRegisterDefOperand`, that have too many default parameters. As a result, we have encountered issues caused by the TRI parameter being omitted, as shown in issue #82411.
Following @RKSimon's suggestion, this patch refactors 9 functions, including `{reads, kills, defines, modifies}Register`, `registerDefIsDead`, and `findRegister{UseOperandIdx, UseOperand, DefOperandIdx, DefOperand}`, moving the TRI parameter forward in the parameter list and making it required. All call sites have been updated accordingly to ensure there is no additional impact.
After this, callers of these functions must explicitly decide whether to pass the `TargetRegisterInfo` or just a `nullptr`.
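For illustration, a hypothetical call site after this change (the helper and register choice are mine, not from the patch):

```cpp
// Sketch only. Before the patch, TRI was a defaulted trailing
// parameter, so it was easy to omit it and silently skip alias-aware
// matching (the root cause of issue #82411).
bool clobbersA0(const MachineInstr &MI, const TargetRegisterInfo *TRI) {
  // TRI is now required: pass a real TargetRegisterInfo to include
  // aliases/subregisters in the query, or nullptr to match the exact
  // register only.
  return MI.modifiesRegister(RISCV::X10, TRI);
}
```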
This opens the door to reusing reassociation optimizations on
target-specific binary operations with non-standard operand lists.
This is effectively an NFC.
CASE_VFMA_OPCODE_VV and CASE_VFMA_CHANGE_OPCODE_VV need to match up if
we are to avoid "Unexpected opcode" errors, but in CASE_VFMA_CHANGE_OPCODE_VV,
CASE_VFMA_CHANGE_OPCODE_LMULS_MF2 had mistakenly been used instead of
CASE_VFMA_CHANGE_OPCODE_LMULS_MF4.
This PR includes:
* vadd.vv/vand.vv/vor.vv/vxor.vv
* vmseq.vv/vmsne.vv
* vmin.vv/vminu.vv/vmax.vv/vmaxu.vv
* vmul.vv/vmulh.vv/vmulhu.vv
* vwadd.vv/vwaddu.vv
* vwmul.vv/vwmulu.vv
* vwmacc.vv/vwmaccu.vv
* vadc.vvm
There are no test changes; I may add some later.
Fixes part of #64422
Reviewers: michaelmaitland, preames, lukel97, topperc, asb
Reviewed By: topperc, lukel97
Pull Request: https://github.com/llvm/llvm-project/pull/88379
We split target-dependent MachineCombiner patterns into their
respective target folders.
This makes MachineCombiner much more target-independent.
Reviewers:
davemgreen, asavonic, rotateright, RKSimon, lukel97, LuoYuanke, topperc, mshockwave, asi-sc
Reviewed By: topperc, mshockwave
Pull Request: https://github.com/llvm/llvm-project/pull/87991
This improves a pattern that occurs in 531.deepsjeng_r, reducing the
dynamic instruction count by 0.5%.
This may be possible to improve in SelectionDAG, but given the special
cases around shXadd formation, it's not obvious it can be done in a
robust way without adding multiple special cases.
I've used a GEP with 2 indices because that most closely resembles the
motivating case. Most of the test cases are the simplest GEP case. One
test has a logical right shift on an index which is closer to the
deepsjeng code. This requires special handling in isel to reverse a
DAGCombiner canonicalization that turns a pair of shifts into (srl (and
X, C1), C2).
This restructures the code to make it more obvious that most of
getVLENFactoredAmount is just a generic multiply by an immediate, and
to prepare for a couple of upcoming enhancements to this code.
Note that I plan to switch mulImm to early return, but decided I'd do
that as a separate commit to keep this diff readable.
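As a rough sketch, the intended shape after the restructure (signatures simplified; only `mulImm` is named by this commit, the rest of the body is illustrative):

```cpp
// Illustrative only: getVLENFactoredAmount reads VLENB once, and
// everything after that is a generic "multiply a register by a
// constant", which is what the factored-out mulImm helper performs.
void RISCVInstrInfo::getVLENFactoredAmount(MachineBasicBlock &MBB,
                                           MachineBasicBlock::iterator II,
                                           const DebugLoc &DL,
                                           Register DestReg,
                                           int64_t Amount) const {
  // DestReg = VLENB (the vector register length in bytes).
  BuildMI(MBB, II, DL, get(RISCV::PseudoReadVLENB), DestReg);
  // Scale by the number of vector registers; mulImm handles the
  // power-of-two shifts, MUL, or a shift-and-add fallback.
  mulImm(MBB, II, DL, DestReg, Amount / 8);
}
```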
---------
Co-authored-by: Luke Lau <luke_lau@icloud.com>
This TSFlags mechanism was introduced by https://reviews.llvm.org/D108767.
A base class for all RISC-V register classes is added, and we store
IsVRegClass/VLMul/NF in TSFlags, with helpers to retrieve them.
This reduces some lines of code, and I think there will be more uses.
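A sketch of what the helpers look like (the field layout and exact names are approximations, not the exact patch):

```cpp
// Illustrative: register class properties are packed into TSFlags and
// decoded with small static helpers.
namespace RISCVRI {
enum : uint64_t {
  IsVRegClassShift = 0, // 1 bit: class holds RVV registers
  VLMulShift = 1,       // 3 bits: encoded LMUL of the class
  NFShift = 4,          // 3 bits: NF (tuple size) minus one
};

static inline bool isVRegClass(uint64_t TSFlags) {
  return (TSFlags >> IsVRegClassShift) & 1;
}
static inline unsigned getLMul(uint64_t TSFlags) {
  return (TSFlags >> VLMulShift) & 7;
}
static inline unsigned getNF(uint64_t TSFlags) {
  return ((TSFlags >> NFShift) & 7) + 1;
}
} // namespace RISCVRI
```

Usage then becomes e.g. `RISCVRI::isVRegClass(RC->TSFlags)` instead of comparing against a hardcoded list of vector register classes.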
Reviewers: preames, topperc
Reviewed By: topperc
Pull Request: https://github.com/llvm/llvm-project/pull/84894
When the encodings of register tuples are aligned, we can use a copy
with a larger LMUL to reduce the number of copies.
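For illustration, the kind of check this enables (simplified; the real selection logic in copyPhysReg handles more tuple sizes and LMULs):

```cpp
// Illustrative: if the first registers of both tuples have even
// encodings (e.g. v10_v11 starts at v10), one vmv2r.v can replace two
// vmv1r.v copies.
static unsigned pickTupleCopyOpcode(const TargetRegisterInfo &TRI,
                                    Register DstFirst, Register SrcFirst) {
  if (TRI.getEncodingValue(DstFirst) % 2 == 0 &&
      TRI.getEncodingValue(SrcFirst) % 2 == 0)
    return RISCV::VMV2R_V; // one whole-register move at LMUL=2
  return RISCV::VMV1R_V;   // otherwise copy register by register
}
```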
Reviewers: preames, topperc, lukel97
Reviewed By: topperc, lukel97
Pull Request: https://github.com/llvm/llvm-project/pull/84455
This is part of #70452 that changes the type used for the external
interface of MMO to LocationSize as opposed to uint64_t. This means the
constructors take LocationSize, and convert ~UINT64_C(0) to
LocationSize::beforeOrAfter(). The getSize methods return a
LocationSize.
This allows us to be more precise with unknown sizes, not accidentally
treating them as unsigned values, and in the future it should allow us
to add proper scalable vector support, but none of that is included in
this patch. It should mostly be an NFC.
Global ISel is still expected to use the underlying LLT as it needs, and
is not expected to see unknown sizes for generic operations. Most of
the changes are hopefully fairly mechanical, adding a lot of getValue()
calls and protecting them with hasValue() where needed.
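For illustration, the caller-side pattern this introduces (a sketch; the helper name is mine):

```cpp
// Illustrative: callers must now check hasValue() before using the
// numeric size, instead of treating ~UINT64_C(0) as a huge known size.
static bool hasKnownSizeAtMost(const MachineMemOperand &MMO,
                               uint64_t Limit) {
  LocationSize Size = MMO.getSize(); // now LocationSize, not uint64_t
  // Unknown sizes (LocationSize::beforeOrAfter) fail hasValue().
  return Size.hasValue() && Size.getValue() <= Limit;
}
```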
This is another part of #70452 which makes getMemOperandsWithOffsetWidth
use a LocationSize for Width, as opposed to the unsigned it currently
uses. The advantages on its own are not super high if
getMemOperandsWithOffsetWidth usually uses known sizes, but if the
values can come from an MMO it can help be more accurate in case they
are Unknown (and in the future, scalable).
Instead of initializing the accumulator to 0, initialize it on the
first assignment with a mv from the register that holds
VLENB << ShiftAmount. Also fix a missing kill flag on the final add.
I have no real interest in this case, just an easy optimization I
noticed.
This was trying to rewrite a branch that uses X0 to a branch that uses a
register produced by LI of 1 or -1. Using X0 is free so there is no
reason to rewrite it. Doing so would just extend the live range of the
LI register increasing register pressure.
In practice this might not have triggered often because we were calling
MRI.hasOneUse on X0. I'm not sure what that returns for a physical
register.
If it isn't virtual, we may extend the live range of the physical
register past where it is valid, for example across a call.
Found while trying to enable -riscv-enable-sink-fold which enables some
copy propagation in machine sink that led to ADDIs with physical
register destinations.
When a branch target is too far away, we need to emit an indirect branch.
We scavenge a register for this since we don't know we need one until
after register allocation.
Jumps using X1 and X5 as the source are hints to the hardware to pop the
return-address stack. We should avoid using them for jumps that
aren't a return or tail call.
This patch adds basic TLSDESC support in the RISC-V backend.
Specifically, we add new relocation types for TLSDESC, as prescribed in
https://github.com/riscv-non-isa/riscv-elf-psabi-doc/pull/373, and add a
new pseudo instruction to simplify code generation.
This patch does not try to optimize the local dynamic case, which can be
improved in separate patches.
Linker side changes will also be handled separately.
The current implementation is only enabled when passing the new
`-enable-tlsdesc` codegen flag.
Make Candidate's front() and back() functions return references to
MachineInstr and introduce begin() and end() returning iterators, the
same way it is usually done in other container-like classes.
This makes it possible to iterate over the instructions contained in
Candidate the same way one can iterate over MachineBasicBlock (note that
begin() and end() return bundled iterators, just like MachineBasicBlock
does, but no instr_begin() and instr_end() are defined yet).
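For example, a sketch of the new usage (the counting helper is made up):

```cpp
// Illustrative: with begin()/end(), an outliner::Candidate can be
// traversed like a MachineBasicBlock (bundled iterators, as noted).
static unsigned countCalls(outliner::Candidate &C) {
  unsigned NumCalls = 0;
  for (MachineInstr &MI : C) // front()/back() likewise now give
    if (MI.isCall())         // MachineInstr& rather than iterators
      ++NumCalls;
  return NumCalls;
}
```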
Ensure that getVLENFactoredAmount does not fail when the scale amount
requires the use of a non-trivial multiplication but the M extension is
not enabled. In such a case, perform the multiplication using shifts and
adds.
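A minimal sketch of the decomposition in plain C++ (the real code emits SLLI/ADD machine instructions rather than computing a value; the function name is mine):

```cpp
#include <cstdint>

// Multiply X by a constant using only shifts and adds, as done when
// the M extension (and thus MUL) is unavailable. Conceptually, one
// shift plus one add is needed per set bit of the constant:
// e.g. x * 10 = (x << 1) + (x << 3).
uint64_t mulByConstant(uint64_t X, uint64_t Amount) {
  uint64_t Acc = 0;
  for (unsigned Bit = 0; Bit < 64; ++Bit)
    if (Amount & (1ULL << Bit))
      Acc += X << Bit;
  return Acc;
}
```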
This helper function handles common cases where we can determine a
constant value is being defined in a register. Although it looks like
codegen changes are possible due to this being called in
PeepholeOptimizer, my main motivation is to use this in
describeLoadedValue.
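Assuming the helper in question is `getConstValDefinedInReg`, a minimal sketch of the obvious RISC-V case (the in-tree version handles more opcodes):

```cpp
// Illustrative: ADDI rd, x0, imm materializes a known constant in rd.
static bool getConstValDefinedInRegSketch(const MachineInstr &MI,
                                          Register Reg, int64_t &ImmVal) {
  if (MI.getOpcode() == RISCV::ADDI && MI.getOperand(0).getReg() == Reg &&
      MI.getOperand(1).isReg() && MI.getOperand(1).getReg() == RISCV::X0 &&
      MI.getOperand(2).isImm()) {
    ImmVal = MI.getOperand(2).getImm();
    return true;
  }
  return false;
}
```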
-Rename to GPRPair.
-Rename registers to be named like X10_X11 instead of X10_PD, except
X0, which is now X0_Pair since it is not paired with X1.
-Use unknown size and offset for the subreg indices. This might
be a functional change, but does not affect any lit tests.
sifive-p450 supports a very restricted version of the short forward
branch optimization from the sifive-7-series.
For sifive-p450, a branch over a single c.mv can be macrofused as a
conditional move operation. Due to encoding restrictions on c.mv, we
can't conditionally move from X0. That would require c.li instead.
Since #72467, `@plt` in assembly output "call foo@plt" is omitted. We
can trivially merge MO_PLT and MO_CALL without any functional change to
assembly/relocatable file output.
Earlier architectures use different call relocation types depending on
whether a PLT is potentially needed: R_386_PLT32/R_386_PC32,
R_68K_PLT32/R_68K_PC32, R_SPARC_WDISP30/R_SPARC_WPLT30. However, as the PLT property is
per-symbol instead of per-call-site and linkers can optimize out a PLT,
the distinction has been confusing.
Arm picked good names with R_ARM_CALL/R_AARCH64_CALL. Let's use MO_CALL instead
of MO_PLT.
As follow-ups, we can merge fixup_riscv_call/fixup_riscv_call_plt and
VK_RISCV_CALL/VK_RISCV_CALL_PLT.
-Rename sub_32_hi to sub_gpr_odd.
-Add dedicated sub_gpr_even.
-Rename sub_32 and sub_16 to sub_fpr32 and sub_fpr16.
-Remove start offset from sub_gpr_odd. AArch64 doesn't use a non-zero
offset for GPR tuples, so I don't think we need to either.
This is preparation for a RV64 GPRPair for Zacas.
Split out from #73789, so as to leave that PR just for flipping load
clustering to on by default. Clusters if the operations are within a
cache line of each other (as AMDGPU does in shouldScheduleLoadsNear).
X86 does something similar, but does `((Offset2 - Offset1) / 8 > 64)`.
I'm not sure if that's intentionally set to 512 bytes or if the division
is in error.
Adopts the suggestion from @wangpc-pp to query the cache line size and
use it if available.
We also cap the maximum cluster size to limit the potential register
pressure impact (which may lead to additional spills).
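Putting that together, a sketch of the heuristic (the constants are illustrative, not the committed values):

```cpp
#include <cstdlib>

// Illustrative: cluster two memory ops when their offsets fall within
// one cache line and the cluster has not grown past the cap.
static bool shouldClusterSketch(int64_t Offset1, int64_t Offset2,
                                unsigned ClusterSize,
                                unsigned CacheLineSize) {
  if (CacheLineSize == 0)
    CacheLineSize = 64; // assumed fallback if the subtarget reports none
  const unsigned MaxClusterSize = 4; // assumed register-pressure cap
  return ClusterSize <= MaxClusterSize &&
         std::abs(Offset1 - Offset2) < int64_t(CacheLineSize);
}
```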
These are picked up from getMemOperandsWithOffsetWidth but weren't then
being passed through to shouldClusterMemOps, which forces backends to
collect the information again if they want to use the kind of heuristics
typically used for the similar shouldScheduleLoadsNear function (e.g.
checking the offset is within 1 cache line).
This patch just adds the parameters, but doesn't attempt to use them.
There is potential to use them in the current PPC and AArch64
shouldClusterMemOps implementation, and I intend to use the offset in
the heuristic for RISC-V. I've left these for future patches in the
interest of being as incremental as possible.
As noted in the review and in an inline FIXME, an ElementCount-style abstraction may later be used to condense these two parameters to one argument. ElementCount isn't quite suitable as it doesn't support negative offsets.
I noted AArch64 happily accepts a FrameIndex operand as well as a
register. This doesn't cause any changes outside of my C++ unit test for
the current in-tree state, but it will cause additional test
changes if #73789 is rebased on top of it.
Note that the returned Offset doesn't seem nearly as meaningful if you
have a FrameIndex base, though the approach taken here follows AArch64
(see D54847). This change won't harm the approach taken in
shouldClusterMemOps because memOpsHaveSameBasePtr will only return true
if the FrameIndex operand is the same for both operations.
This adds minimal support for load clustering, but disables it by
default. The intent is to iterate on the precise heuristic and the
question of turning this on by default in a separate PR. Although
previous discussion indicates hope that the MachineScheduler would
replace most uses of the SelectionDAG scheduler, it does seem most
targets aren't using MachineScheduler load clustering right now:
PPC+AArch64 seem to just use it to help with paired load/store formation
and although AMDGPU uses it for general clustering it also implements
ShouldScheduleLoadsNear for the SelectionDAG scheduler's clustering.
This hook is called by the default implementation of
getMemOperandWithOffset and by the load/store clustering code in the
MachineScheduler though this isn't enabled by default and is not yet
enabled for RISC-V. Only return true for queries on scalar loads/stores
for now (this is a conservative starting point, and vector load/store
can be handled in a follow-on patch).
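For illustration, the rough shape of such a conservative implementation (names and details are approximate, and Width is shown with the pre-LocationSize type for simplicity):

```cpp
// Illustrative: handle only scalar loads/stores of the reg+imm form
// "op rd, imm(rs1)"; anything else (e.g. vector memory ops) is
// rejected so callers fall back to conservative behavior.
static bool getMemOperandWithOffsetWidthSketch(
    const MachineInstr &LdSt, const MachineOperand *&BaseOp,
    int64_t &Offset, unsigned &Width, const TargetRegisterInfo *TRI) {
  if (!LdSt.mayLoadOrStore() || LdSt.getNumExplicitOperands() != 3)
    return false;
  if (!LdSt.getOperand(1).isReg() || !LdSt.getOperand(2).isImm())
    return false;
  // A memory operand is needed to know the access size.
  if (!LdSt.hasOneMemOperand())
    return false;
  Width = (*LdSt.memoperands_begin())->getSize();
  BaseOp = &LdSt.getOperand(1);
  Offset = LdSt.getOperand(2).getImm();
  return true;
}
```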
Don't blindly copy the original flags from the pre-reassociated
instructions.
This copied the integer poison flags which are not safe to preserve
after reassociation.
For the FP flags, I think we should only keep the intersection of
the flags. Override setSpecialOperandAttr to do this.
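A sketch of the intersection approach (simplified; the real override lives on RISCVInstrInfo):

```cpp
// Illustrative: instead of copying one instruction's flags wholesale,
// keep only the flags common to both original instructions.
void setSpecialOperandAttrSketch(MachineInstr &OldMI1, MachineInstr &OldMI2,
                                 MachineInstr &NewMI1, MachineInstr &NewMI2) {
  uint32_t IntersectedFlags = OldMI1.getFlags() & OldMI2.getFlags();
  NewMI1.setFlags(IntersectedFlags);
  NewMI2.setFlags(IntersectedFlags);
}
```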
Fixes #72777.
This hook is called by the target-independent implementation of
TargetInstrInfo::describeLoadedValue. I've opted to test it via a C++
unit test, which although fiddly to set up seems the right way to test a
function with such clear intended semantics (rather than testing the
impact indirectly).
isAddImmediate will never recognise ADDIW as an add immediate, which I
_think_ is conservatively correct, as the caller may not understand its
semantics vs ADDI.
Note that although the doc comment for isAddImmediate specifies its
behaviour solely in terms of physical registers, none of the current
in-tree implementations (including this one) bail out on virtual
registers (see #72357).
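For context, a minimal sketch of the shape of such an implementation (simplified and approximate):

```cpp
#include <optional>

// Illustrative: only plain ADDI is reported as an add-immediate.
// ADDIW is deliberately skipped because its sign-extension semantics
// differ from a plain add, which a generic caller may not understand.
static std::optional<RegImmPair>
isAddImmediateSketch(const MachineInstr &MI, Register Reg) {
  if (MI.getOpcode() == RISCV::ADDI && MI.getOperand(0).isReg() &&
      MI.getOperand(0).getReg() == Reg && MI.getOperand(1).isReg() &&
      MI.getOperand(2).isImm())
    return RegImmPair(MI.getOperand(1).getReg(),
                      MI.getOperand(2).getImm());
  return std::nullopt;
}
```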