If a VL is zero then it's known to be less than or equal to every other
VL.
This looks weird on its own since a VL of zero isn't that common. The
test diffs come from a type being split, resulting in a VP intrinsic's
EVL being zero.
The motivation is to split off part of an upcoming patch I plan to
submit for RISCVVLOptimizer, which generalizes it to handle recurrences
and needs to reason about an initial state where the demanded VLs are
set to zero.
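A minimal sketch of the rule, assuming a VL-comparison helper of roughly this shape (the name isVLKnownLEZeroAware and the surrounding cases are illustrative, not the in-tree code):

```cpp
#include "llvm/CodeGen/MachineOperand.h"
using namespace llvm;

// Illustrative only: a conservative "is LHS's VL known to be <= RHS's VL?"
// query over two VL operands. The rule added here is that an immediate VL of
// zero is trivially <= every other VL, so RHS doesn't need to be inspected.
static bool isVLKnownLEZeroAware(const MachineOperand &LHS,
                                 const MachineOperand &RHS) {
  if (LHS.isImm() && LHS.getImm() == 0)
    return true;                // a VL of zero is <= any other VL
  if (LHS.isIdenticalTo(RHS))
    return true;                // identical operands trivially compare equal
  // The real helper also reasons about VLMAX and constant VLs; anything
  // else is conservatively treated as unknown here.
  return false;
}
```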
Adding a RISCVCC value for every vendor branch instruction wasn't
scalable and made the enum effectively just a different way of spelling
the branch opcodes.
This patch reduces RISCVCC back down to 6 enum values. The primary
users are the select pseudoinstructions, which now share the same
encoding across all vendor extensions. The select opcode and the
condition code are used to determine the branch opcode when expanding
the pseudo.
The Cond SmallVector returned by analyzeBranch now contains the branch
opcode instead of the RISCVCC. reverseBranchCondition now works
directly on opcodes. getOppositeBranchCondition is also retained.
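As a sketch of what working directly on opcodes can look like for the base branches (getOppositeBranchOpcode is an illustrative name, not necessarily the in-tree helper; vendor branch opcodes would be handled the same way):

```cpp
#include "MCTargetDesc/RISCVMCTargetDesc.h"
#include "llvm/Support/ErrorHandling.h"
using namespace llvm;

// Illustrative sketch: with the Cond vector carrying the branch opcode,
// reversing a condition is just a matter of flipping that opcode.
static unsigned getOppositeBranchOpcode(unsigned Opc) {
  switch (Opc) {
  case RISCV::BEQ:  return RISCV::BNE;
  case RISCV::BNE:  return RISCV::BEQ;
  case RISCV::BLT:  return RISCV::BGE;
  case RISCV::BGE:  return RISCV::BLT;
  case RISCV::BLTU: return RISCV::BGEU;
  case RISCV::BGEU: return RISCV::BLTU;
  default:
    llvm_unreachable("unexpected branch opcode");
  }
}
```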
Stacked on #145622
We don't support any of the immediate branches in this function yet, so
explicitly exclude them rather than relying on isReg to return false.
Remove the use of AnalyzeBranch. It doesn't help us much: part of the
code was already getting the operands directly, and it just complicated
creating a new branch.
I also inlined the modifyBranch function so we could use addReg on
BuildMI.
As far as I know, binutils does not have a similar option, and I don't
know of a reason we shouldn't accept the RVC hint instructions.
The wording in the spec previously suggested that these might not be
valid instruction names, but that has been modified recently.
We were incorrectly changing -1 to 0 for unsigned compares in case 1.
The comment incorrectly said UINT64_MAX is bigger than INT64_MAX, but
we were doing a signed compare, and UINT64_MAX reinterpreted as a
signed value is -1, which is smaller than INT64_MAX.
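A standalone illustration of the signed-versus-unsigned confusion, independent of the patch's actual code:

```cpp
#include <cstdint>
#include <cstdio>

int main() {
  // As an unsigned value UINT64_MAX is the largest 64-bit number, but
  // reinterpreted as a signed value it is -1, which is *smaller* than
  // INT64_MAX under a signed comparison.
  uint64_t U = UINT64_MAX;
  int64_t S = static_cast<int64_t>(U); // S == -1 (two's complement)
  std::printf("unsigned: %d\n", U > static_cast<uint64_t>(INT64_MAX)); // 1
  std::printf("signed:   %d\n", S > INT64_MAX);                        // 0
  return 0;
}
```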
Prevent changing 0 constants since we can use x0. The test cases
for these are contrived to use addi rd, x0, 0. We're more likely
to have a COPY from x0, which we already don't optimize for other
reasons.
Check if registers are virtual before calling hasOneUse. The use count
for physical registers is meaningless.
As of 20b5728b7b1ccc4509a316efb270d46cc9526d69, C always enables Zca, so
the check `C || Zca` is equivalent to just checking for `Zca`.
This replaces any uses of `HasStdExtCOrZca` with a new `HasStdExtZca`
(with the same assembler description, to avoid changes in error
messages), and simplifies every place where C++ code needed to check
for either C or Zca.
The Subtarget function is just deprecated for the moment.
Merge some macros that are only used once by another macro.
Rename macros to remove _MF4 where not needed.
I suspect these are artifacts from FP being split from integer in the
past.
In #90049 we removed the side effect flag on the vleNff pseudos with the
reasoning that we modelled the effect of setting vl as an output
operand.
This patch extends that further by removing the implicit def of vl and
inserting it back in RISCVInsertVSETVLI when we also emit the
PseudoReadVL.
The motivation for this is to make it easier to handle vleff in more
places in RISCVVectorPeephole in a follow-up patch, which in turn will
make migrating the last vmerge peephole over from RISCVISelDAGToDAG
easier.
Some of these tests claim that the vleff shouldn't be deleted when none
of its values are used, but these date from the initial commit in
3b5430eb0dad5. I'm not sure whether those assumptions still hold today.
This also moves the fault-only-first predicate to
RISCVInstrPredicates.td since we can't rely on the implicit vl operand
anymore.
This covers similar ground as #139086. Although I think it makes sense
to land both, the practical motivation for #139086 is significantly
reduced here.
This canonicalisation makes no difference to compressibility (we have
compress patterns for the operands in either order), but it does mean
that easier-to-read assembly is printed, as we don't have aliases
defined for beq/bne with x0 in either position.
We can end up with bltu/bgeu with the zero register as the 0th operand
after tail duplication and then machine copy propagation. Canonicalising
in simplifyInstruction (called from MachineCopyPropagation) both
produces easier-to-read assembly output and makes it possible for the
compressed c.beqz/c.bnez forms to be produced.
Similar to #141261.
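A rough sketch of the kind of rewrite this enables (canonicalizeZeroBranch, the hook shape, and the operand handling are simplified for illustration; they are not the in-tree code):

```cpp
#include "MCTargetDesc/RISCVMCTargetDesc.h"
#include "llvm/CodeGen/MachineInstr.h"
#include "llvm/CodeGen/TargetInstrInfo.h"
using namespace llvm;

// Illustrative sketch only: with x0 as the first source operand these
// branches are really equality tests against zero,
//   bltu x0, rs, L  ->  bne rs, x0, L   (0 <u rs  <=>  rs != 0)
//   bgeu x0, rs, L  ->  beq rs, x0, L   (0 >=u rs <=>  rs == 0)
// and the beq/bne forms can then be compressed to c.bnez/c.beqz.
static void canonicalizeZeroBranch(MachineInstr &MI,
                                   const TargetInstrInfo &TII) {
  unsigned NewOpc;
  switch (MI.getOpcode()) {
  case RISCV::BLTU: NewOpc = RISCV::BNE; break;
  case RISCV::BGEU: NewOpc = RISCV::BEQ; break;
  default: return;
  }
  // Branch operands are (rs1, rs2, target); only handle rs1 == x0 here.
  if (MI.getOperand(0).getReg() != RISCV::X0)
    return;
  Register Rs2 = MI.getOperand(1).getReg();
  MI.setDesc(TII.get(NewOpc));
  MI.getOperand(0).setReg(Rs2);       // the compared register becomes rs1
  MI.getOperand(1).setReg(RISCV::X0); // x0 becomes rs2
}
```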
These aren't easy to test without writing MIR tests in areas where we
don't currently have tests. I'm not sure we use X0_Pair anywhere today.
I'm going to try to migrate RISCVMakeCompressible to use copyToReg so we
can share that code instead of basically duplicating it.
I'm still concerned that target independent code may fold an
extract_subreg operation and get an incorrect register if we do start
using X0_Pair. We may have to special case dummy_reg_pair_with_x0 in the
encoder and printer to be safe.
isLoadFromStackSlot/isStoreToStackSlot/getMemOperandsWithOffsetWidth
The first two probably require spills/reloads, which we don't use
LD_RV32/SD_RV32 for yet.
I think getMemOperandsWithOffsetWidth is mainly used for load/store
clustering. I think we can assume this just works.
Add ISel patterns for generating the Xqcibi branch immediate
instructions. Similar to #135771, this adds new CondCodes for the
various branch instructions and uses them to return the appropriate
instruction.
PR #136875 was posted as a draft PR that handled a subset of these
cases using the CompressPat mechanism. The consensus from that
discussion (and a conclusion I agree with) is that it would be
beneficial to do this optimisation earlier on, and in a way that isn't
limited just to cases that can be handled by instruction compression.
The most common source for instructions that can be
optimized/canonicalized in this way is through tail duplication in
MachineBlockPlacement followed by machine copy propagation. For RISC-V,
choosing a more canonical instruction allows it to be compressed when it
couldn't be before. There is the potential that it would make other
MI-level optimisations easier.
This modifies ~910 instructions across an llvm-test-suite build
including SPEC2017, targeting rva22u64. Looking at the diff, it seems
there's room for eliminating more instructions or propagating further
after this.
Coverage of instructions is based on observations from a script written
to find redundant or improperly canonicalized instructions (though I aim
to support all instructions in a 'group' at once, e.g. MUL* even if I
only saw some variants of MUL in practice).
This change removes the uint64_t constructor on LocationSize,
preventing implicit conversion, and fixes up the affected APIs to adapt
to the change. Note that I'm adding a couple of explicit conversion
points in routines where passing in a fixed offset as an integer seems
likely to have well-understood semantics.
We had an unfortunate case that arose if you tried to pass a TypeSize
value to a parameter of LocationSize type. We'd find the implicit
conversion path through TypeSize -> uint64_t -> LocationSize, which
works just fine for fixed values but loses information and fails
assertions if the TypeSize was scalable. This change breaks the first
link in that implicit conversion chain, since that seemed to be the
easier one.
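For illustration, a toy model of the failure mode using stand-in types (ToySize/ToyLocation are not the real LLVM classes): a scalable quantity squeezed through a plain integer asserts, while a fixed quantity passes through unchanged.

```cpp
#include <cassert>
#include <cstdint>

// Toy stand-ins for illustration only; the real TypeSize/LocationSize differ.
struct ToySize {
  uint64_t Quantity;
  bool Scalable;
  // The lossy step: a scalable quantity has no single integer value.
  operator uint64_t() const {
    assert(!Scalable && "narrowing a scalable size to a plain integer");
    return Quantity;
  }
};

struct ToyLocation {
  uint64_t Value;
  ToyLocation(uint64_t V) : Value(V) {} // the implicit raw-value constructor
};

// An integer-typed API boundary of the kind being fixed up.
ToyLocation describe(uint64_t RawSize) { return ToyLocation(RawSize); }

int main() {
  ToySize Fixed{16, /*Scalable=*/false};
  ToySize Scalable{16, /*Scalable=*/true};
  describe(Fixed);    // fine: a fixed 16-byte size survives the conversion
  describe(Scalable); // asserts: the scalable bit cannot survive uint64_t
  return 0;
}
```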
There is another branch instruction that also has an immediate operand,
but its immediate specifies which bit to test for being set or clear.
Here we only check whether operand 2 is an immediate, so there is no
way to distinguish between the two.
So add new CondCodes COND_CV_BEQIMM/COND_CV_BNEIMM so that we know
which kind of immediate branch instruction was matched in the Select_*
pseudos.
This patch adds support for parsing symbols in the Xqcili load large
immediate instructions. The 32-bit `qc.li` instruction uses the
`R_RISCV_QC_ABS20_U` relocation, while the 48-bit `qc.e.li` instruction
uses the `R_RISCV_QC_E_32` relocation and the `InstFormatQC_EAI`
instruction format.
Vendor relocation support will be added in a later patch.
Spimm in the spec refers to the 2-bit encoded value, while all of the
code uses the 0, 16, 32, or 48 adjustment value.
Also remove decodeZcmpSpimm, as it's identical to the default behavior
when there is no custom DecoderMethod.
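For reference, the relationship between the two representations is just a scale by 16; encodeSpimm/decodeSpimm below are illustrative names, not the in-tree helpers:

```cpp
#include <cassert>

// Illustrative helpers only: the spec's 2-bit spimm field encodes an extra
// stack adjustment of 0, 16, 32, or 48 bytes.
static unsigned encodeSpimm(unsigned AdjustmentBytes) {
  assert(AdjustmentBytes % 16 == 0 && AdjustmentBytes <= 48 &&
         "adjustment must be 0, 16, 32, or 48");
  return AdjustmentBytes / 16; // the 2-bit encoded value
}

static unsigned decodeSpimm(unsigned Spimm) {
  assert(Spimm <= 3 && "spimm is a 2-bit field");
  return Spimm * 16; // the adjustment value used throughout the code
}
```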
As with #132002, these do show up in a compilation of llvm-test-suite
(including SPEC 2017). We remove 30-40 static instances, so this isn't
anything earth-shattering.
rs2 is always added to the other shifted (and potentially extended)
operand unmodified, so rs1==zero is equivalent to a copy.
This adds coverage for additional instructions in isCopyInstrImpl, for
now picking just those where I can observe that there is a codegen
difference for SPEC.
This allows MachineCopyPropagation to successfully eliminate no-op moves in this form.
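A rough sketch of the kind of case being added, using the Zba sh*add instructions as the example (the exact instruction list is an assumption, and the standalone helper below is only a stand-in for the isCopyInstrImpl case):

```cpp
#include "MCTargetDesc/RISCVMCTargetDesc.h"
#include "llvm/CodeGen/MachineInstr.h"
#include "llvm/CodeGen/TargetInstrInfo.h"
#include <optional>
using namespace llvm;

// Illustrative sketch: sh1add rd, x0, rs2 computes (0 << 1) + rs2, so it is
// simply a copy of rs2 into rd; the same reasoning applies to the siblings.
static std::optional<DestSourcePair> asCopy(const MachineInstr &MI) {
  switch (MI.getOpcode()) {
  case RISCV::SH1ADD:
  case RISCV::SH2ADD:
  case RISCV::SH3ADD:
    // Operands are (rd, rs1, rs2); rs1 is the shifted operand.
    if (MI.getOperand(1).getReg() == RISCV::X0)
      return DestSourcePair{MI.getOperand(0), MI.getOperand(2)};
    return std::nullopt;
  default:
    return std::nullopt;
  }
}
```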
The `qc.cm.pushfp` instruction is like `cm.pushfp` except in one
important way: `qc.cm.pushfp {ra}, -N*16` is not a valid encoding,
because this would update `s0`/`fp`/`x8` without saving it.
This change now correctly rejects this variant of the instruction, both
during parsing and during disassembly. I also implemented validation for
immediates that represent register lists (both kinds), which may help to
catch bugs in the future.
This patch is an alternative to PRs #117060, #131684, and #131728.
The patch adds a late optimization pass that replaces conditional
branches that can be statically evaluated with an unconditional branch.
Adding Michael as a co-author, as most of the code that evaluates the
condition comes from #131684.
Co-authored-by: Michael Maitland michaeltmaitland@gmail.com
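A minimal sketch of the sort of static evaluation involved (evaluateCondBranch is an illustrative stand-in, not the pass's actual code): given a branch opcode and two known constant operand values, the outcome can be computed directly and the branch replaced with an unconditional jump (or removed) accordingly.

```cpp
#include "MCTargetDesc/RISCVMCTargetDesc.h"
#include "llvm/Support/ErrorHandling.h"
#include <cstdint>
using namespace llvm;

// Illustrative only: evaluate whether a RISC-V conditional branch with two
// statically known operand values would be taken.
static bool evaluateCondBranch(unsigned Opc, int64_t LHS, int64_t RHS) {
  switch (Opc) {
  case RISCV::BEQ:  return LHS == RHS;
  case RISCV::BNE:  return LHS != RHS;
  case RISCV::BLT:  return LHS < RHS;
  case RISCV::BGE:  return LHS >= RHS;
  case RISCV::BLTU: return static_cast<uint64_t>(LHS) < static_cast<uint64_t>(RHS);
  case RISCV::BGEU: return static_cast<uint64_t>(LHS) >= static_cast<uint64_t>(RHS);
  default:
    llvm_unreachable("unexpected conditional branch opcode");
  }
}
```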
The primary reason is that if you pass a TypeSize without explicitly
converting to LocationSize, you otherwise implicitly convert to
uint64_t to call the respective LocationSize constructor. This means
that any scalable value becomes a runtime assertion failure.
By replacing uint64_t with TypeSize in this API, we avoid the implicit
conversion for TypeSize. uint64_t callers implicitly convert to
LocationSize (via the raw constructor), which should leave behavior
unchanged.
optimizeCondBranch isn't allowed to modify the CFG, but it can rewrite
the branch condition freely. However, if we could fold a conditional
branch to an unconditional one (setting aside that restriction), we can
also rewrite it into some canonical conditional branch instead.
Looking at the diffs, the only cases this catches in tree tests are
cases where we could have constant folded during lowering from IR, but
didn't. This is inspired by trying to salvage code from
https://github.com/llvm/llvm-project/pull/131684 which might be useful.
Given the test impact, it's of questionable merit. The main advantage
over only the late cleanup pass is that it kills off the LIs for the
constants early, which can help e.g. register allocation.
The primary effect of this is that we get proper scalable sizes printed
by the assembler, but this may also enable proper aliasing analysis. I
don't see any test changes resulting from the latter.
Getting the size is slightly tricky as we store the scalable size as a
non-scalable quantity in the object size field for the frame index. We
really should remove that hack at some point...
For the synthetic tuple spills and fills, I dropped the size from the
split loads and stores to avoid incorrect (overly large) sizes. We could
also divide by the NF factor if we felt like writing the code to do so.
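If we did want per-part sizes, it would presumably amount to dividing the tuple's scalable size by NF; a hypothetical sketch, not code from the patch:

```cpp
#include "llvm/Support/TypeSize.h"
using namespace llvm;

// Hypothetical sketch: derive the size of a single register within an
// NF-register tuple spill/fill from the tuple's (possibly scalable) size.
static TypeSize getTuplePartSize(TypeSize TupleSize, unsigned NF) {
  // e.g. a (vscale x 32)-byte tuple with NF=4 gives vscale x 8 bytes per part.
  return TupleSize.divideCoefficientBy(NF);
}
```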