When generating unwind tables for code which uses return-address
signing, we need to toggle the RA_SIGN_STATE DWARF register around any
tail-calls, because these require the return address to be authenticated
before the call, and could throw an exception. This is done using the
.cfi_negate_ra_state directive before the call, and .cfi_restore_state
at the start of the next basic block.
However, since D153098, the .cfi_restore_state isn't being inserted,
because the CFIFixup pass isn't being run. This re-enables that pass
when return-adress signing is enabled.
Reviewed By: ikudrin, MaskRay
Differential Revision: https://reviews.llvm.org/D156428
Backwards frame index elimination uses backwards register scavenging,
which is preferred because it does not rely on accurate kill flags.
Differential Revision: https://reviews.llvm.org/D156691
Similar to D156350, if we were going to create 2 x i16 insertions (MOVD+PINSRW), try to merge them into a single MOVD to reduce the amount of GPR<->VEC traffic
Previously we used the number of registers needed saved and pushable as the
number of pushed registers. We also use pushed register number to caculate
the stack size. It is not correct because Zcmp pushes registers from $ra to the
max register needed saved and there is no gurantee that the needed saved
registers are a sequenced list from $ra.
There is an example about that. PushPopRegs should be 6 (ra,s0 - s4)= instead of 1.
```
; llc -mtriple=riscv32 -mattr=+zcmp
define void @foo() {
entry:
; Old: .cfi_def_cfa_offset 16
; New: .cfi_def_cfa_offset 32
tail call void asm sideeffect "li s4, 0", "~{s4}"()
ret void
}
```
Reviewed By: Jim, kito-cheng
Differential Revision: https://reviews.llvm.org/D156407
This doesn't bring us to parity with the test/CodeGen/RISCV/half-* test
cases, it simply picks off an initial set that can be supported
especially easy. In order to make the review more manageable, I'll
follow up with other cases.
There is zero innovation in the test cases - they simply take the
existing half/float cases and replace f16->bf16 and half->bfloat.
Differential Revision: https://reviews.llvm.org/D156895
isOperationLegalOrCustomOrPromote returns true only if VT is other or legal
and operation action is Legal, Custom or Promote.
Permit a vector binary operation can be converted to scalar binary operation which is custom lowered with illegal type.
One of cases is i32 isn't a legal type on RV64 and its ALU operations is set to custom lowering,
so vadd for element type i32 can be converted to addw.
Reviewed By: jacquesguan, craig.topper
Differential Revision: https://reviews.llvm.org/D156692
fp-imm.ll and zfh-imm.ll test 0.0 and -0.0 while float/double/half-imm.ll
tested other non-zero constants. It seems like they should all be
tested together.
There are slight coverage changes due to different command lines,
but I'm not sure its meaningful. For example, we now don't test
double 0.0 and -0.0 with only the F extension.
Reviewed By: asb
Differential Revision: https://reviews.llvm.org/D156929
D155929 teach lowerScalarInsert to handl start value (extractelement scalable_vector, 0)
and specifically converts fixed extracted vectors to scalable vectors when
lowering vector reduction. It's not enough because there is another way to
create (extractelement fixed_vector, 0) as a start value of lowerScalarInsert
like #64327.
#64327: https://github.com/llvm/llvm-project/issues/64327.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D156863
The first trivial example I tried failed to merge due to the user scan
logic. Remove the complicated scan of users handling with distance
thresholds, with a same block restriction. The actual expansion of
sincos is basically the same size as sin or cos individually. Copy the
technique the generic optimization uses, which is to just use the
input instruction as the insert point or just insert at the start of
the entry block.
https://reviews.llvm.org/D156706
Ran across this when making a change to RISCV memset lowering. Seems very odd that manually merging a store into a vector prevents it from being further merged.
Differential Revision: https://reviews.llvm.org/D156349
These test cases previously caused an error. RISCVInstrInfo::copyPhysReg also needed a tweak in order to account for copying bf16 values in FPR16 registers.
Differential Revision: https://reviews.llvm.org/D156883
For G_ABS with type v2s16 and sgpr inputs break down into two s32 G_ABS
instructions.
Patch by: Acim Maravic
Differential Revision: https://reviews.llvm.org/D155867
There is no need to increase the size of odd sized vectors if they are
going to be scalarized by a different rule.
Patch by: Acim Maravic
Differential Revision: https://reviews.llvm.org/D155865
As noted in <https://github.com/llvm/llvm-project/issues/64090>, it's
more efficient to lower a partword 'atomicrmw xchg a, 0` to and amoand
with appropriate mask. There are a range of possible ways to go about
this - e.g. writing a combine based on the
`llvm.riscv.masked.atomicrmw.xchg` intrinsic, or introducing a new
interface to AtomicExpandPass to allow target-specific atomics
conversions, or trying to lift the conversion into AtomicExpandPass
itself based on querying some target hook. Ultimately I've gone with
what appears to be the simplest approach - just covering this case in
emitMaskedAtomicRMWIntrinsic. I perhaps should have given that hook a
different name way back when it was introduced.
This also handles the `atomicrmw xchg a, -1` case suggested by Craig
during review.
Fixes https://github.com/llvm/llvm-project/issues/64090
Differential Revision: https://reviews.llvm.org/D156801
MachineVerifier does not check that DBG_VALUE, DBG_VALUE_LIST and
DBG_INSTR_REF have the expected number of operands, so printing them
(e.g. with -print-after-all) should not crash.
Differential Revision: https://reviews.llvm.org/D156226
Issue mentioned: https://github.com/riscv/riscv-code-size-reduction/issues/182
The order of callee-saved registers stored by Zcmp push in memory is reversed.
Pseudo code for cm.push in https://github.com/riscv/riscv-code-size-reduction/releases/download/v1.0.4-1/Zc.1.0.4-1.pdf
```
if (XLEN==32) bytes=4; else bytes=8;
addr=sp-bytes;
for(i in 27,26,25,24,23,22,21,20,19,18,9,8,1) {
//if register i is in xreg_list
if (xreg_list[i]) {
switch(bytes) {
4: asm("sw x[i], 0(addr)");
8: asm("sd x[i], 0(addr)");
}
addr-=bytes;
}
}
```
The placement order for push is s11, s10, ..., ra.
CFI offset should be calculed as reversed order for correct stack unwinding.
Reviewed By: fakepaper56, kito-cheng
Differential Revision: https://reviews.llvm.org/D156437
It's not a libcall so doesn't really belong here to begin
with. Relying on checking the target name and explicit features isn't
particularly sound either. The library doesn't use the intrinsic
anymore, so it doesn't matter anyway.
The purpose of this code is to restrict overlap between source and destination registers. The tied input register is conceptually part of the destination. I can't see any reason why we need to prevent a partial undef tied source here, and skipping it reduces register pressure slightly.
Differential Revision: https://reviews.llvm.org/D156709
This patch implements the getOptimalMemOpType callback which is used by the generic mem* lowering in SelectionDAG to pick the widest type used. This patch only changes the behavior when vector instructions are available, as the default is reasonable for scalar.
Without this change, we were emitting either XLEN sized stores (for aligned operations) or byte sized stores (for unaligned operations.) Interestingly, the final codegen was nowhere near as bad as that would seem to imply. Generic load combining and store merging kicked in, and frequently (but not always) produced pretty reasonable vector code.
The primary effects of this change are:
* Enable the use of vector operations for memset of non-constant. Our generic store merging logic doesn't know how to merge a broadcast store, and thus we were seeing the generic (and awful) byte expansion lowering for unaligned memset.
* Enable the generic misaligned overlap trick where we write to some of the same bytes twice. The alternative is to either a) use an increasing small sequence of stores for the tail or b) use VL to restrict the vector store. The later is not implemented at this time, so the former is what previously happened. Interestingly, I'm not sure that changing VL (as opposed to the overlap trick) is even obviously profitable here.
Differential Revision: https://reviews.llvm.org/D156249
This handles logical ops of setccs and optimizes when the true or
false value is -1.
Reviewed By: asb, wangpc
Differential Revision: https://reviews.llvm.org/D156810
Reapplying after revert due to sanitizer failure. Includes fix to avoid querying dead lanes for vreg introduced by previous transform.
The code was written with the implicit assumption that each IMPLICIT_DEF either a) the tied operand, or b) an untied source, but not both. This is true right now, but an upcoming change may allow CSE of IMPLICIT_DEFs in some cases, so let's rewrite the code to handle that possibility.
I added an MIR case which demonstrates the multiple use IMPLICIT_DEF. To my knowledge, this is not a reachable configuration from IR right now.
As an aside, this makes the structure a much closer match with the sub-reg liveness case, and we can probably just merge these routines. (Future work.)
Differential Revision: https://reviews.llvm.org/D156477
When dealing with the subunits of a resource group, we should reset
the subunits availability at the first avaiable cycle of the resource
that contains the subunits. Previously, the reset operation was
returning cycle 0, effectively erasing the booking history of the
subunits.
Without this change, when using intervals for models have make use of
subunits, the erasing of resource booking for subunits can raise the
assertion "A resource is being overwritten" in
`ResourceSegments::add`. The test added in the patch is one of such
cases.
Reviewed By: andreadb
Differential Revision: https://reviews.llvm.org/D156530
Adds support for SPV_INTEL_optnone.
Currently still in draft form but I wanted to open this revision
to ask some questions.
Differential Revision: https://reviews.llvm.org/D156297
Previously when llvm.reduce.* lowered, riscv backend created scalar vector with
netural element as start value. For llvm.reduce.and/or/min/max/fmax/fmin, we
could use the first element of source as the start value. It's benefit for RVV
since we could just use source vector as start vector.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D155929
Currentlt, bf16 operations are automatically supported by promoting to float. This patch adds bf16 support by ensuring that load extension / truncate store operations are properly expanded.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D156646
This reverts commit a496c8be6e638ae58bb45f13113dbe3a4b7b23fd.
The workaround in c26dfc81e254c78dc23579cf3d1336f77249e1f6 should work
around the underlying problem with SUBREG_TO_REG.
This reverts commit c56514f21b2cf08eaa7ac3a57ba4ce403a9c8956. This
commit adds global state that is shared between clang driver and clang
cc1, which is not correct when clang is used with `-fno-integrated-cc1`
option (no integrated cc1). The -march and -mtune option needs to be
properly passed through cc1 command-line and stored in TargetInfo.
This patch contains a number of uncontroversial changes:
- Replace all uses of
`errs`, `assert`, `llvm_unreachable` with `report_fatal_error` with
informative error strings.
- Replace calls to `fail` in loops with at most one call per error
instance. Previously a function with 19 arguments would log "too many
args" 14 times. This was not helpful.
- Change one `if (..) switch ...` to `if (..) { switch ...`. The added
brace is consistent with a near-identical switch immediately above.
- Elide one `SDValue` copy by using a reference rather than value. This
is consistent with a variable declared immediately before it.
Reviewed By: yonghong-song
Differential Revision: https://reviews.llvm.org/D156136