Zcmp push/pop can reduce more code size then save-restore calls. There
are two reasons,
1. Call for save-restore calls needs 4-8 bytes, but Zcmp push/pop only
needs 2 bytes.
2. Zcmp push/pop can also handles small shift of sp.
LEA_ADDri and LEAX_ADDri are printed / encoded the same way as ADDri. I
had to change the type of simm13Op so that it can be used in both 32-
and 64-bit modes. This required the changes in operands of some
InstAliases.
This patch is the first in a series that adds support for pre-loading
kernel arguments into SGPRs. The command-line argument
'amdgpu-kernarg-preload-count' is used to specify the number of
arguments sequentially from the first that we should attempt to preload,
the default is 0.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D156852
We were losing the function entry count, which is useful to check profile quality. For the original cases where we want
entrypoint-relative MBB frequencies, the user would just need to divide these values by the entrypoint (first MBB, with ID=0) value.
This change matches a masked.stride.load from a mgather node whose index
operand is a strided sequence. We can reuse the VID matching from
build_vector lowering for this purpose.
Note that this duplicates the matching done at IR by
RISCVGatherScatterLowering.cpp. Now that we can widen gathers to a wider
SEW, I don't see a good way to remove this duplication. The only obvious
alternative is to move thw widening transform to IR, but that's a no-go
as I want other DAGs to run first. I think we should just live with the
duplication - particularly since the reuse is isSimpleVIDSequence means
the duplication is somewhat minimal.
This fixes a bug in my 928564caa5de8b07cede51e45499934777b9938c that didn't get noticed in review. I found it when looking at the strided load case (upcoming patch), and realized the previous commit was buggy too.
p.s. Sorry for the slightly confusing test diff. I'd apparently used the wrong mask for the aligned positive test; it was actually unaligned. Didn't seem worthy of a separate precommit.
This adds the shifts and the immediate forms of the instructions that
were already supported.
There are still more instructions that can be predicated, but this is
the rest of what we had in our downstream.
When AMOs are used to implement parallel reduction operations, typically the return value would be discarded.
This patch adds a peephole pass `RISCVDeadRegisterDefinitions`. It rewrites `rd` to `x0` when `rd` is marked as dead.
It may improve the register allocation and reduce pipeline hazards on CPUs without register renaming and OOO.
Comparison with GCC: https://godbolt.org/z/bKaxnEcec
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D158759
Calling isPlainlyKilled instead of directly checking for a kill flag
should make processTiedPairs behave the same with LiveIntervals
(i.e. when compiling with -early-live-intervals) as it does with
LiveVariables.
Add a DAG combine to form a masked.store from a masked_strided_store intrinsic
with stride equal to element size. This is the store analogy to PR #65674.
As seen in the tests, this does pickup a few cases that we'd previously missed
due to selection ordering. We match strided stores early without going through
the recently added generic mscatter combines, and thus weren't recognizing the
unit strided store.
Add Int16, Int64 and Float64 capabilities as always available for Vulkan
(since 1.0), and add tests covering most of the basic types from
clang/test/CodeGenHLSL/basic_types.hlsl except for half floats.
Depends on D156049
Lookup extended instruction numbers in the given instruction set so that
correct names are now emitted for GLSL.std.450 instructions as well as
OpenCL.std.
Add a single test to verify correct abs intrinsic names are emitted when
targetting logical SPIR-V.
Depends on D156424
Differential Revision: https://reviews.llvm.org/D159227
We should not optimize it in D158062. This adds the test coverage.
And unneeded attributes `nonnull` and `inbounds` are removed.
Reviewed By: asb
Differential Revision: https://reviews.llvm.org/D159530
RegAllocGreedy uses SlotIndexes::getApproxInstrDistance to approximate
the length of a live range for its heuristics. Renumbering all slot
indexes with the default instruction distance ensures that this estimate
will be as accurate as possible, and will not depend on the history of
how instructions have been added to and removed from SlotIndexes's maps.
This also means that enabling -early-live-intervals, which runs the
SlotIndexes analysis earlier, will not cause large amounts of churn due
to different register allocator decisions.
If there are copy instructions between uaddlv with v4i16/v8i16 and dup
for transfer from gpr to fpr, try to remove them with duplane. It is a
follow-up patch of https://reviews.llvm.org/D159267
The CHECK2 test in code_placement_ext_tsp_large.ll now has the same
result as
the CHECK test: when chain(0,2,3,4,1) is merged with chain(8), the
result is now
chain(0,2,3,4,8,1).
Ideally we should have test coverage for
-ext-tsp-chain-split-threshold=1, but
it seems challenging to craft one. Perhaps the default value of
-ext-tsp-chain-split-threshold can be decreased as the
-ext-tsp-enable-chain-split-along-jumps heuristic is now more powerful.
After commit cedf2ea, `RISCVMergeBaseOffset` can handle `BlockAddress`
currently. But we didn't handle it in `PrintAsmMemoryOperand` so we
get `invalid operand in inline asm` error.
This patch fixes the error.
If we have a gather load whose indices correspond to valid offsets for a
gather with element type twice that our source, we can reduce the number
of indices and perform the operation at the larger element type.
This is generally profitable since we half VL - and these operations are
linear in VL. This may require some additional VL/VTYPE toggles, but
this appears to be worthwhile on the whole.
Doing so allows the use of smaller constants overall, and may allow (for some small vector constants) avoiding the constant pool entirely. This can result in extra VTYPE toggles if we get unlucky.
This was reviewed under PR #66405.
When checking to see if our index expressions can be converted into strided
operations, we previously gave up if the index type wasn't an exact match for
the intptrty for the address. Per gep semantics, this mismatch implies a sext
or trunc cast to the respective index type. For constants, go ahead and
evaluate that cast instead of giving up.
Note that the motivation of this is mostly test cleanup. We canonicalize at IR
such that the gep index will match the intptrty. This is mostly useful so that
we can write both RV32 and RV64 tests from the same source. Its also helpful in
preventing confusion - I've stumbled across this at least four times now and
wasted time each one.
Note: The test change for scatters unit stride cases contains a minor
regression for rv32 and 64 bit indices. This is an artifact of order in which
changes are landing. This will be addressed in a near future change for all
configurations.
The Z#_HI register definitions were created during the very early SVE
enablement work and before the SVE calling convention was locked in.
As they look entirely unused, they need to go.
If the SRL for Hi constant folds, but we don't remoe those bits from
the Lo, we can end up with strange constant folding through DAGCombine later.
I've only seen this with constants being lowered to constant pools
during lowering on RISC-V.
On the first split we create two i32 trunc stores and a srl to shift
the high part down. The srl gets constant folded, but to produce
a new i32 constant. But the truncstore for the low store still uses
the original constant.
This original constant then gets converted to a constant pool
before we revisit the stores to further split them. The constant
pool prevents further constant folding of the additional srls.
After legalization is done, we run DAGCombiner and get some constant
folding of srl via computeKnownBits which can peek through the constant
pool load. This can create new constants that also need a constant pool.
The intrinsic uses ImmArg so TargetConstant would be consistent
with how other intrinsics are handled.
This hides the constants from type legalization so we can remove
the promotion support.
isel patterns are updated accordingly.
This reverts commit 47324cfd7d8ca1a2a5cbb9f948ecff66a28ee6bc.
This exposed incorrect debuginfo in rustc. Revert the verification
until this has been fixed.
Update LiveIntervals after rewriting:
%reg = INSERT_SUBREG undef %reg, %subreg, subidx
to:
undef %reg:subidx = COPY %subreg
D113044 implemented this for the non-undef case.
Following on from D139466 which added support for dead vreg defs, this
patch adds support for "undef" defs of subregs.
Use this to regenerate checks for amx-greedy-ra-spill-shape.ll which
previously required manual tweaks to the autogenerated checks to fix an
EXPENSIVE_CHECKS failure; see commit
8b7c1fbd9647a5a6ef246a6b5b2543ea0f5a2337
It maybe emit the .option directive without any follow up. Only emit the .option push/pop when there are supported extension difference between function and module.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D159399
We can get `BlockAddress` in user code via `Labels as Values` so
we should be able to merge the access to `BlockAddress`.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D159429
For RVV, If we want to perform an i8 or i16 element-wise vector
arithmetic right shift in the upper C/C++ program, the value to be
shifted would be first sign extended to i32, and the shift amount would
also be zero_extended to i32 to perform the vsra.vv instruction, and
followed by a truncate to get the final calculation result, such pattern
will later expanded to a series of "vsetvli" and "vnsrl" instructions
later, this is because the RVV spec only support 2 * SEW -> SEW
truncate. But for vector, the shift amount can also be determined by
smin (Y, ScalarSizeInBits(Y) - 1)). Also, for the vsra instruction, we
only care about the low lg2(SEW) bits as the shift amount.
- Alive2: https://alive2.llvm.org/ce/z/u3-Zdr
- C++ Test cases : https://gcc.godbolt.org/z/q1qE7fbha