512bit vpbroadcastw is available only with AVX512BW. Avoid lowering
BUILD_VEC into vbroard_cast node when the condition is not met. This
fixed a crash (see the added new test).
Try to pick the FP register bank based on surrounding use/defs. Code is
basically copied from AArch64.
Need legalizer changes to make this more useful. Right now we're stuck
with only being able to FP select types less than or equal to XLen.
When hoisting an invariant load, we should not combine it with an
existing load through common subexpression elimination (CSE). This is
because there might be memory-changing instructions between the existing
load and the end of the block entering the loop.
Fixes https://github.com/llvm/llvm-project/issues/72855
#73092 supported the encoding/decoding for PUSHP/POPP
#73233 supported the encoding/decoding for PUSH2[P]/POP2[P]
In this patch, we teach frame lowering to spill/reload registers w/
these instructions.
1. Use PPX for balanced spill/reload
2. Use PUSH2/POP2 for continuous spills/reloads
3. PUSH2/POP2 must be 16B-aligned on the stack, so pad when necessary
If we are loading the same ptr at different vector widths, then reuse the largest load and just extract the low subvector.
Unlike the equivalent VBROADCAST_LOAD/SUBV_BROADCAST_LOAD folds which can occur in DAG, we have to wait until DAGISel otherwise we can hit infinite loops if constant folding recreates the original constant value.
This is mainly useful for better constant sharing.
This changes the fadd legalization to handle fp16 types, and treats more types
as legal so that the backend can produce the correct patterns. This is
currently a missing identity fold for `fadd x -0.0 -> x`
The string pooling pass was incorrectly pooling global varables that
were part of llvm.used or llvm.compiler.used. This patch fixes the pass
to prevent that by checking each candidate to make sure that it is not
in either of those lists.
This fixes cases when SizeInBits is a multiple of the narrow size.
If SizeBits is equal to NarrowTy size, the first block would create an
illegal G_SEXT_INREG where the the extension size is equal to the type.
I tried to turn it into G_TRUNC+G_SEXT, but that just turned back into
G_SEXT_INREG causing an infinite loop. So punt to the splitting case.
In the for loop we should copy when the part ends on SizeInBits. In that
case there is no G_SEXT_INREG needed for partial. But we should note
that register in PartialExtensionReg for the first full part to use.
If the part starts on SizeInBits then we should do an AShr of
PartialExtensionReg.
We should only get to the G_SEXT_INREG case if the SizeInBits is in the
middle of the part.
Manually add clobbers for various register combinations to tests. This
highlights incorrectly performing shrink-wrapping, with
StoreSwiftAsyncContext expansion clobbering a live register.
Update regex to _explicitly_ show which exp versions are added.
The previous regex used `exp[^e]` to avoid matching calls like:
`@llvm.experimental.stepvector`.
Note: ArmPL Mappings for scalable types are not yet utilized
(eg, `llvm.exp10.nxv2f64`, `llvm.exp10.nxv4f32`), as `replace-with-veclib`
pass needs improvements.
Add support for emitting GFX11.5 s_singleuse_vdst instructions. This is
a power saving feature whereby the compiler can annotate VALU
instructions whose results are known to have only a single use, so the
hardware can in some cases avoid writing the result back to VGPR RAM.
To begin with the pass is disabled by default because of one missing
feature: we need an exclusion list of opcodes that never qualify as
single-use producers and/or consumers. A future patch will implement
this and enable the pass by default.
---------
Co-authored-by: Scott Egerton <scott.egerton@amd.com>
In the StackColoring pass we first scan for possible stack slot merges.
A SlotRemap map is setup with the remappings that should be performed.
Then the main work is done by calling remapInstructions and providing
that map.
Most of the work in remapInstructions would just be a waste of time in
situations when the SlotRemap map is empty, but it turns out that the
part that adjusts Alias Analysis information could end up dropping AA
metadata even when there are no stack slot merges being done. This
happens since all instruction's machine memory operands are considered,
and if we can't determine the underlying object that is accessed (using
getUnderlyingObjectsForCodeGen) then we conservatively drop AA metadata.
This patch simply avoids calling remapInstructions if we don't intend to
do any remappings (i.e. if SlotRemap is empty). That avoids touching AA
metadata when all we do is to remove lifetime markers. That seems like a
safe thing to do, as it is the same thing as happens when we bail out
early due to other reasons (e.g. when only having one lifetime marker).
For targets that do not care about Alias Analysis information after the
StackColoring pass this shouldn't have any impact, except that it might
improve compile time slightly as we now skip spending time in
remapInstructions when not doing any stack merges.
If we are loading the same ptr at different vector widths, then reuse the larger load and just extract the low subvector.
Unlike the equivalent VBROADCAST_LOAD/SUBV_BROADCAST_LOAD folds which can occur in DAG, we have to wait until DAGISel otherwise we can hit infinite loops if constant folding recreates the original constant value.
This is mainly useful for better constant sharing.
The patch is based on a reverted patch:
https://reviews.llvm.org/D103597. It was trying to rename registers
before alias check, which is not safe and causes miscompiles. This patch
does 2 things:
1. Do the renaming with necessary checks passed, including alias check.
2. Rename the register for the instructions between the pairs and
combine the second load into the first. By doing so we can just check
the renamability between the pairs and avoid scanning unknown amount of
instructions before/after the pairs.
Necessary refactoring has been made in order to reuse as much code
possible with STR renaming.
Don't blindly copy the original flags from the pre-reassociated
instrutions.
This copied the integer poison flags which are not safe to preserve
after reassociation.
For the FP flags, I think we should only keep the intersection of
the flags. Override setSpecialOperandAttr to do this.
Fixes#72777.
This patch extends AArch64FrameLowering::emitProglogue to check if the
inserted prologue clobbers live registers.
It updates `llvm/test/CodeGen/AArch64/framelayout-scavengingslot.mir`
with an extra load to make x9 live before the store, preserving the
original test.
It uses the original
`llvm/test/CodeGen/AArch64/framelayout-scavengingslot.mir` as
`llvm/test/CodeGen/AArch64/emit-prologue-clobber-verification.mir`,
because there x9 is marked as live on entry, but used as scratch reg as
it is not callee saved.
The new assertion catches a mis-compile in
`store-swift-async-context-clobber-live-reg.ll` on
https://github.com/apple/llvm-project/tree/next