These are picked up from getMemOperandsWithOffsetWidth but weren't then
being passed through to shouldClusterMemOps, which forces backends to
collect the information again if they want to use the kind of heuristics
typically used for the similar shouldScheduleLoadsNear function (e.g.
checking the offset is within 1 cache line).
This patch just adds the parameters, but doesn't attempt to use them.
There is potential to use them in the current PPC and AArch64
shouldClusterMemOps implementation, and I intend to use the offset in
the heuristic for RISC-V. I've left these for future patches in the
interest of being as incremental as possible.
As noted in the review and in an inline FIXME, an ElementCount-style abstraction may later be used to condense these two parameters to one argument. ElementCount isn't quite suitable as it doesn't support negative offsets.
I noted AArch64 happily accepts a FrameIndex operand as well as a
register. This doesn't cause any changes outside of my C++ unit test for
the current state of in-tree, but this will cause additional test
changes if #73789 is rebased on top of it.
Note that the returned Offset doesn't seem at all as meaningful if you
have a FrameIndex base, though the approach taken here follows AArch64
(see D54847). This change won't harm the approach taken in
shouldClusterMemOps because memOpsHaveSameBasePtr will only return true
if the FrameIndex operand is the same for both operations.
This adds minimal support for load clustering, but disables it by
default. The intent is to iterate on the precise heuristic and the
question of turning this on by default in a separate PR. Although
previous discussion indicates hope that the MachineScheduler would
replace most uses of the SelectionDAG scheduler, it does seem most
targets aren't using MachineScheduler load clustering right now:
PPC+AArch64 seem to just use it to help with paired load/store formation
and although AMDGPU uses it for general clustering it also implements
ShouldScheduleLoadsNear for the SelectionDAG scheduler's clustering.
This hook is called by the default implementation of
getMemOperandWithOffset and by the load/store clustering code in the
MachineScheduler though this isn't enabled by default and is not yet
enabled for RISC-V. Only return true for queries on scalar loads/stores
for now (this is a conservative starting point, and vector load/store
can be handled in a follow-on patch).
Don't blindly copy the original flags from the pre-reassociated
instrutions.
This copied the integer poison flags which are not safe to preserve
after reassociation.
For the FP flags, I think we should only keep the intersection of
the flags. Override setSpecialOperandAttr to do this.
Fixes#72777.
This hook is called by the target-independent implementation of
TargetInstrInfo::describeLoadedValue. I've opted to test it via a C++
unit test, which although fiddly to set up seems the right way to test a
function with such clear intended semantics (rather than testing the
impact indirectly).
isAddImmediate will never recognise ADDIW as an add immediate which I
_think_ is conservatively correct, as the caller may not understand its
semantics vs ADDI.
Note that although the doc comment for isAddImmediate specifies its
behaviour solely in terms of physical registers, none of the current
in-tree implementations (including this one) bail out on virtual
registers (see #72357).
DAGCombiner, as well as InstCombine, tend to canonicalize GE/LE into
GT/LT, namely:
```
X >= C --> X > (C - 1)
```
Which sometime generates off-by-one constants that could have been CSE'd
with surrounding constants.
Instead of changing such canonicalization, this patch tries to swap
those branch conditions post-isel, in the hope of resurfacing more
constant CSE opportunities. More specifically, it performs the following
optimization:
For two constants C0 and C1 from
```
li Y, C0
li Z, C1
```
To remove redundnat `li Y, C0`,
1. if C1 = C0 + 1 we can turn:
(a) blt Y, X -> bge X, Z
(b) bge Y, X -> blt X, Z
2. if C1 = C0 - 1 we can turn:
(a) blt X, Y -> bge Z, X
(b) bge X, Y -> blt Z, X
This optimization will be done by PeepholeOptimizer through
RISCVInstrInfo::optimizeCondBranch.
Call this method directly from each vector case with the correct
arguments. This allows us to treat each type of copy as its own
special case and not pass variables to a common merge point. This
is similar to how AArch64 is structured.
I think I can reduce the number of operands to this new method, but
I'll do that as a follow up.
This duplicates the BuildMI from the 3 FP copy types, but separates them
from the VR code and gets rid of the IsScalableVector flag.
I need to add FPR<->GPR copies for GISel which needs another variation
of BuildMI. So it seems cleaner to start enumerating each case as its
own if.
This patch adds stackmap support for RISC-V without targets (i.e. the nop patchable forms).
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D123496
This uses the recently introduced sink-and-fold support in MachineSink.
https://reviews.llvm.org/D152828
This enables folding ADDI into load/store addresses.
Enabling by default will be a separate PR.
The handling for vector pseudos in hasAllNBitUsers is duplicated across
RISCVISelDAGToDAG and RISCVOptWInstrs. This deduplicates it between the
two,
with the common denominator between the two call sites being the opcode
and
SEW: We need to handle extracting these separately since one operates at
the
SelectionDAG level and the other at the MachineInstr level.
This adds the shifts and the immediate forms of the instructions that
were already supported.
There are still more instructions that can be predicated, but this is
the rest of what we had in our downstream.
Spill/reload instructions are artificially generated by the compiler and
have no relation to the original source code. So the best thing to do is
not attach any debug location to them (instead of just taking the next
debug location we find on following instructions).
Refered to https://reviews.llvm.org/rG3e081703c349dd00b8ef6991c2d15964915dd8f4
Reviewed By: asb, kito-cheng, benshi001
Differential Revision: https://reviews.llvm.org/D129173
We need a pseudo for each scalar FP register class. Previously
we distinquished the pseudos by naming them with F16, F32, F64, or
BF16 in place of the F in the normal instruction name.
Because these strings can appear in other parts of the name we had
to do things like matching "_VBF16" to "_VF".
This patch replaces the F16, F32, F64 strings with FPR16, FPR32, and
FPR64. We also use FPR16 for BF16 since that is the scalar register
class for bf16.
Since the FPR16/32/64 string does not anywhere else in the pseudo
names, we can use this to simplify the string replacements. This
also allows us to simplify some BF16 related code.
Reviewed By: wangpc
Differential Revision: https://reviews.llvm.org/D157749
These test cases previously caused an error. RISCVInstrInfo::copyPhysReg also needed a tweak in order to account for copying bf16 values in FPR16 registers.
Differential Revision: https://reviews.llvm.org/D156883
Instead of checking '!Zfh && Zhfmin' first, handle Zfh. Then assert
that the other case is F+Zfhmin. The F+Zfhmin check will need to be
relaxed for bfloat16 support. As it was written before there would
be now error to catch that. Instead it would just silently create
fsgnj.h instructions.
Depends on D154628
For the cover letter of the patch-set, please checkout D154628.
This is the 2nd patch of the patch-set.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D154629
If the operands to the mul have other uses we may be extending their
live range past a kill flag.
Reviewed By: asb, asi-sc
Differential Revision: https://reviews.llvm.org/D155046
This change continues with the line of work discussed in https://discourse.llvm.org/t/riscv-transition-in-vector-pseudo-structure-policy-variants/71295.
This change handles most of the binary pseudos. I excluded pseudos which _TIED variants, and those that produce mask results. Both a bit different in functionality, and deserve their own change and review. As with previous changes in the series, we replace the existing TA and TU forms with a single unified pseudo with a passthru (which may be implicit_def) and a policy operand.
As before, we see codegen changes (some improvements and some regressions) due to scheduling differences caused by the extra implicit_def instructions.
Differential Revision: https://reviews.llvm.org/D154245
This change continues with the line of work discussed in https://discourse.llvm.org/t/riscv-transition-in-vector-pseudo-structure-policy-variants/71295. In D153155, we started removing the legacy distinction between unsuffixed (TA) and _TU pseudos. This patch continues that effort for the unary instruction families.
The change consists of a few interacting pieces:
* Adding a vector policy operand to VPseudoUnaryNoMaskTU.
* Then using VPseudoUnaryNoMaskTU for all cases where VPseudoUnaryNoMask was previously used and deleting the unsuffixed form.
* Then renaming VPseudoUnaryNoMaskTU to VPseudoUnaryNoMask, and adjusting the RISCVMaskedPseudo table to use the combined pseudo.
* Fixing up two places in C++ code which manually construct VMV_V_* instructions.
Normally, I'd try to factor this into a couple of changes, but in this case, the table structure is tied to naming and thus we can't really separate the otherwise NFC bits.
As before, we see codegen changes (some improvements and some regressions) due to scheduling differences caused by the extra implicit_def instructions.
Differential Revision: https://reviews.llvm.org/D153899
We were only checking for the previous insructions to write exactly
the register or a super register. We ignored writes to a subregister
and continued searching for the producing instruction. We need to
abort instead.
There's another check inside the if body to abort if the registers
don't match exactly. So we just need to check for overlap so we
enter the if body.
Reviewed By: fakepaper56
Differential Revision: https://reviews.llvm.org/D153490
With `-fsanitize=kcfi` (Kernel Control-Flow Integrity), Clang emits
"kcfi" operand bundles to indirect call instructions. Similarly to
the target-specific lowering added in D119296, implement KCFI operand
bundle lowering for RISC-V.
This patch disables the generic KCFI pass for RISC-V in Clang, and
adds the KCFI machine function pass in `RISCVPassConfig::addPreSched`
to emit target-specific `KCFI_CHECK` pseudo instructions before calls
that have KCFI operand bundles. The machine function pass also bundles
the instructions to ensure we emit the checks immediately before the
calls, which is not possible with the generic pass.
`KCFI_CHECK` instructions are lowered in `RISCVAsmPrinter` to a
contiguous code sequence that traps if the expected hash in the
operand bundle doesn't match the hash before the target function
address. This patch emits an `ebreak` instruction for error handling
to match the Linux kernel's `BUG()` implementation. Just like for X86,
we also emit trap locations to a `.kcfi_traps` section to support
error handling, as we cannot embed additional information to the trap
instruction itself.
Relands commit 62fa708ceb027713b386c7e0efda994f8bdc27e2 with fixed
tests.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D148385
With `-fsanitize=kcfi` (Kernel Control-Flow Integrity), Clang emits
"kcfi" operand bundles to indirect call instructions. Similarly to
the target-specific lowering added in D119296, implement KCFI operand
bundle lowering for RISC-V.
This patch disables the generic KCFI pass for RISC-V in Clang, and
adds the KCFI machine function pass in `RISCVPassConfig::addPreSched`
to emit target-specific `KCFI_CHECK` pseudo instructions before calls
that have KCFI operand bundles. The machine function pass also bundles
the instructions to ensure we emit the checks immediately before the
calls, which is not possible with the generic pass.
`KCFI_CHECK` instructions are lowered in `RISCVAsmPrinter` to a
contiguous code sequence that traps if the expected hash in the
operand bundle doesn't match the hash before the target function
address. This patch emits an `ebreak` instruction for error handling
to match the Linux kernel's `BUG()` implementation. Just like for X86,
we also emit trap locations to a `.kcfi_traps` section to support
error handling, as we cannot embed additional information to the trap
instruction itself.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D148385
We can mostly get this from the operand info in MCInstrDesc.
The exception is the _TIED pseudos so I've added a new flag for those.
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D152313
Sometimes an developer would like to have more control over cmov vs branch. We have unpredictable metadata in LLVM IR, but currently it is ignored by X86 backend. Propagate this metadata and avoid cmov->branch conversion in X86CmovConversion for cmov with this metadata.
Example:
```
int MaxIndex(int n, int *a) {
int t = 0;
for (int i = 1; i < n; i++) {
// cmov is converted to branch by X86CmovConversion
if (a[i] > a[t]) t = i;
}
return t;
}
int MaxIndex2(int n, int *a) {
int t = 0;
for (int i = 1; i < n; i++) {
// cmov is preserved
if (__builtin_unpredictable(a[i] > a[t])) t = i;
}
return t;
}
```
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D118118