The insertion point determined by RA while attempting spills and
live-range splits at the beginning of a block is sometimes wrong: the
newly inserted vector instructions are placed before the exec-mask
restore instruction. This occurs mainly because isBasicBlockPrologue
doesn't account for instructions inserted early during RA (spills and
splits), which breaks the block prolog.
A better approach for deciding the insertion point should be worked out.
For now, improve the helper function to consider all possible early
insertions. This patch includes the spill instructions. The copies
associated with live-range splits should also be included in the block
prolog.
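The failure mode can be sketched with a toy model, written in Python
rather than as LLVM code. The instruction kinds (`spill`,
`exec_restore`, `valu`) and both functions are hypothetical stand-ins,
not the real isBasicBlockPrologue; the sketch only assumes that the
insertion point is the first instruction the helper does not treat as
part of the prolog.

```python
# Toy model: a block is a list of instruction-kind strings (all hypothetical).
def is_prologue(kind, count_spills):
    """Return True if an instruction of this kind belongs to the block prolog."""
    if kind == "exec_restore":
        return True
    # The fix described above: spills inserted early by RA also count.
    return count_spills and kind == "spill"


def insertion_point(block, count_spills):
    """Index of the first instruction that is not part of the prolog."""
    index = 0
    while index < len(block) and is_prologue(block[index], count_spills):
        index += 1
    return index


# RA already placed a spill at the top of the block, ahead of the exec restore.
block = ["spill", "exec_restore", "valu"]
old_point = insertion_point(block, count_spills=False)  # stops at the spill
new_point = insertion_point(block, count_spills=True)   # skips past the restore
```

With spills ignored, the insertion point lands before the exec-mask
restore; once spills count as prolog, it lands after it.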
For a future patch, it is important that the lowered SGPR spills are
still recognized as spill instructions during regalloc.
Directly lowering them into V_WRITELANE/V_READLANE won't allow
us to attach the SPILL flag to those instructions.
This patch introduces the pseudo instructions with the SGPRSpill
flag set in their Desc. They will get lowered to equivalent
instructions later during post RA pseudo expansion.
* Enhanced the logic of the ExpandMemCmp pass to merge contiguous
  subsequences in LoadSequence, based on sizes allowed in
  `AllowedTailExpansions`.
* This enhancement seeks to minimize the number of basic blocks and
  produce optimized code when using memcmp with non-register aligned
  sizes.
* Enable this feature for AArch64 with memcmp sizes modulo 8 equal to
  3, 5, and 6.
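A minimal sketch of the tail-merging idea, in Python and under stated
assumptions: the function names are hypothetical, the load sequence is
assumed to be a greedy power-of-two decomposition with a maximum load
size of 8 bytes, and only the loads smaller than that maximum are
merged. The real pass may choose its subsequences differently.

```python
def greedy_load_sizes(size, max_load=8):
    """Split `size` bytes into descending power-of-two loads (assumed scheme)."""
    sizes = []
    load = max_load
    while size > 0:
        while load > size:
            load //= 2
        sizes.append(load)
        size -= load
    return sizes


def merge_tail(sizes, max_load=8, allowed_tail_expansions=(3, 5, 6)):
    """Merge the sub-register tail loads into one if their total is allowed."""
    head = [s for s in sizes if s >= max_load]
    tail = [s for s in sizes if s < max_load]
    if len(tail) >= 2 and sum(tail) in allowed_tail_expansions:
        return head + [sum(tail)]
    return list(sizes)
```

For a memcmp of 11 bytes, the assumed sequence of 8, 2, and 1 byte
loads becomes an 8 byte load followed by a single 3 byte tail, which
removes one compare block.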
For instructions like vmv.s.x and friends where we don't care about LMUL
or the SEW/LMUL ratio, we can change the LMUL in its state so that it
has the same SEW/LMUL ratio as the previous state. This allows us to
avoid more VL toggles later down the line (i.e. use vsetvli zero, zero,
which requires that the SEW/LMUL ratio must be the same).
This is an alternative approach to the idea in #69259, but note that
they don't catch exactly the same test cases.
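The arithmetic behind picking such an LMUL can be sketched as follows.
This is a Python illustration, not the pass itself: the function name is
hypothetical, and the sketch only checks that the result is one of the
architecturally encodable LMUL values, ignoring any further constraints
the real pass applies.

```python
from fractions import Fraction

# LMUL values that vsetvli can encode: 1/8, 1/4, 1/2, 1, 2, 4, 8.
VALID_LMULS = {Fraction(1, 8), Fraction(1, 4), Fraction(1, 2),
               Fraction(1), Fraction(2), Fraction(4), Fraction(8)}


def lmul_matching_ratio(prev_sew, prev_lmul, new_sew):
    """Return the LMUL that keeps SEW/LMUL unchanged, or None if not encodable."""
    ratio = Fraction(prev_sew) / prev_lmul   # SEW/LMUL of the previous state
    candidate = Fraction(new_sew) / ratio    # solve new_sew / LMUL == ratio
    return candidate if candidate in VALID_LMULS else None
```

For example, if the previous state is e32, m2 (ratio 16) and the
instruction wants e8, choosing mf2 keeps the ratio at 16.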
There are many tests that specify a target triple/CPU flags but no
DataLayout which can lead to IR being generated that has unusual
behaviour. This commit attempts to use the default DataLayout based
on the relevant flags if there is no explicit override on the command
line or in the IR file.
One thing that is not currently possible is differentiating a missing
datalayout from an explicit `target datalayout = ""` in the IR file,
since the current APIs don't allow detecting this case. If it is
considered useful to support this case (instead of passing
"-data-layout=" on the command line), I can change the IR parsers to
track whether they have seen such a directive and change the callback
type.
Differential Revision: https://reviews.llvm.org/D141060
The pre-index matcher just needs some small heuristics to make sure it
doesn't cause regressions. Apart from that it's a simple change, since
the only difference is an immediate operand of '1' vs '0' in the
instruction.
Some textual editing errors got through this pull request, which was
merged a few weeks ago: https://github.com/llvm/llvm-project/pull/65876
This patch removes the unintentionally duplicated line and the trailing
whitespace at the end of lines.
Noticed whilst working on #69494. VP intrinsics whose functional
equivalent is an intrinsic were having their lanes marked as
non-speculatable, even if the underlying intrinsic was speculatable.
This meant that
```llvm
%1 = call <4 x i32> @llvm.vp.umax(<4 x i32> %x, <4 x i32> %y, <4 x i1> %mask, i32 %evl)
```
would be expanded out to
```llvm
%.splatinsert = insertelement <4 x i32> poison, i32 %evl, i64 0
%.splat = shufflevector <4 x i32> %.splatinsert, <4 x i32> poison, <4 x i32> zeroinitializer
%1 = icmp ult <4 x i32> <i32 0, i32 1, i32 2, i32 3>, %.splat
%2 = and <4 x i1> %1, %mask
%3 = call <4 x i32> @llvm.umax.v4i32(<4 x i32> %x, <4 x i32> %y)
```
instead of
```llvm
%1 = call <4 x i32> @llvm.umax.v4i32(<4 x i32> %x, <4 x i32> %y)
```
The cause of this was isSafeToSpeculativelyExecuteWithOpcode checking
the function attributes for the VP instruction itself, not the
functional intrinsic. Since isSafeToSpeculativelyExecuteWithOpcode
expects an already materialized instruction, we can't use it directly
for the intrinsic case. So this fixes it by manually checking the
function attributes on the intrinsic.
Fix a bug where the code expects just a single MI, but a series of
bundled MIs needs to be handled instead.
The semi-formed bundles are created by SplitKit for the case where
not all lanes are live (buildSingleSubRegCopy). Then the remat
kicks in, and since the values that are copied in the bundle do not
need to be preserved due to the remat (dead defs), all instructions
in the bundle should be marked as dead.
However, only the first one gets marked as dead, which causes the
verifier to complain later with error: "Live range continues after
dead def flag".
Differential Revision: https://reviews.llvm.org/D156999
Extend the list of instructions that can be rematerialized in
SIInstrInfo::isReallyTriviallyReMaterializable() to support scalar
loads.
Try shrinking instructions to remat only the part needed for current
context. Add SIInstrInfo::reMaterialize target hook, and handle
shrinking of S_LOAD_DWORDX16_IMM to S_LOAD_DWORDX8_IMM as a proof of
concept.
The (MF.size() == 0) assertion is triggering when building at -O0.
Reverting this while I work out what is going wrong.
This reverts commit 7e8eccd990d37d2771ca5ad7a84f54c3cfc4a5e1.
This patch is a follow up on https://reviews.llvm.org/D155299. This
patch combines add+lsr to rshrnb when 'B' in:
C = A + B
D = C >> Shift
is equal to (1 << (Shift-1)), and the bits in the top half of each vector
element are zeroed or ignored, such as in a truncating masked store.
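The reason this particular constant matters is that adding
1 << (Shift-1) before the shift turns a truncating shift into a
round-half-up shift. The Python sketch below checks that identity on
plain unbounded integers; the function names are hypothetical, and the
sketch deliberately ignores element widths and the top-half condition
described above.

```python
def rounding_shift_right(a, shift):
    """The add+lsr pattern: (a + (1 << (shift - 1))) >> shift."""
    return (a + (1 << (shift - 1))) >> shift


def round_half_up_divide(a, shift):
    """Reference: floor(a / 2**shift + 1/2), computed with exact integers."""
    return (2 * a + (1 << shift)) // (1 << (shift + 1))
```

For instance, with Shift = 2 the value 7 (7/4 = 1.75) rounds to 2,
while 5 (5/4 = 1.25) rounds to 1.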
A literal like 0xffff8000 is valid to be used as KIMM in a SOPK
instruction, but at the moment our checks expect it to be fully sign
extended to a 64-bit signed integer. This is not required since all
cases which are being shrunk only accept 32-bit operands.
We still need to sign extend the operand to 64 bits, though, so that it
passes the verifier and is printed properly.
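The check can be illustrated with a small Python sketch. It assumes the
KIMM field is a signed 16-bit immediate and that the literal should be
interpreted as a 32-bit value before range checking; the function names
are hypothetical and do not correspond to the actual helpers in the
backend.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of `value` as a two's complement number."""
    value &= (1 << bits) - 1
    sign_bit = 1 << (bits - 1)
    return (value ^ sign_bit) - sign_bit


def fits_simm16_as_32bit(literal):
    """True if the literal, viewed as a 32-bit value, fits a signed 16-bit field."""
    return -(1 << 15) <= sign_extend(literal, 32) <= (1 << 15) - 1
```

Viewed as a 32-bit value, 0xffff8000 is -32768 and fits, whereas a check
that treats it as the unsigned 64-bit value 4294934528 rejects it.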
We have VP tests for vmacc but not vmadd. This copies the vmacc tests
but swaps the false operand of vp.merge to be the multiplicand instead
of the addend.
This shows how we could fold the vmerge into the vmadd's mask if we
commuted %a and %b.
This patch prevents argument promotion from promoting pointers to
fixed-length vector types larger than 128 bits like `<8 x float>` into
the values of the pointees.
Such vector types are used for SVE VLS but there is no ABI for SVE VLS
arguments and the backend cannot lower such value arguments.
Fixes #69147
Rematerialization during register allocation is currently limited to a
single instruction with no inputs.
This patch introduces a pseudoinstruction that represents the
materialization of a constant. I've started with a sequence of 2
instructions for now, which covers at least the common LUI+ADDI(W) case.
This instruction will be expanded into real instructions immediately
after register allocation using a new pass. This gives the post-RA
scheduler a chance to separate the 2 instructions to improve ILP.
I believe this matches the approach used by AArch64.
Unfortunately, this loses some CSE opportunities when an LUI value is used
by multiple constants with different LSBs.
This feature is off by default and a new backend command line option is
added to enable it for testing.
This avoids the spill and reloads reported in #69586.
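For reference, the usual LUI+ADDI split of a 32-bit constant can be
sketched in Python as below. The function names are hypothetical and the
sketch assumes 32-bit constants only; it is meant to show why two
constants can share an LUI value while differing in their low bits,
which is the CSE opportunity mentioned above.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of `value` as a two's complement number."""
    value &= (1 << bits) - 1
    sign_bit = 1 << (bits - 1)
    return (value ^ sign_bit) - sign_bit


def split_hi20_lo12(imm):
    """Split a 32-bit constant into an LUI immediate and an ADDI immediate."""
    imm &= 0xFFFFFFFF
    lo = sign_extend(imm, 12)            # ADDI adds a signed 12-bit value
    hi = ((imm - lo) >> 12) & 0xFFFFF    # LUI supplies the upper 20 bits
    return hi, lo


def materialize(hi, lo):
    """Recombine the pair the way LUI followed by ADDI would (mod 2**32)."""
    return ((hi << 12) + lo) & 0xFFFFFFFF
```

The constants 0x12345678 and 0x12345004 both split to the LUI value
0x12345 with different ADDI immediates, so as separate instructions the
LUI could be shared between them.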
MipsIncomingValueHandler::assignCustomValue should return 1 instead of
2. The return value is the number of additional ArgLocs being consumed.
It's assumed that at least 1 is consumed.
Correct the LocVT used for the spill when there are no registers left.
It should be f64 instead of i32. This allows a workaround to be removed
in the SelectionDAG path.