Summary:
The `__nvvm_reflect` function is used to guard invalid code that varies
between architectures. One problem with this feature is that if it is
used without optimizations, it will leave invalid code in the module
that will then make it to the backend. The `__nvvm_reflect` pass is
already mandatory, so it should do some trivial branch removal to ensure
that constants are handled correctly. This dead branch elimination only
works in the trivial case of a compare on a branch and does not touch
any conditionals that were not related to the `__nvvm_reflect` call in
order to preserve `O0` semantics as much as possible. This should allow
the following to work on NVPTX targets:
```c
int foo() {
  if (__nvvm_reflect("__CUDA_ARCH") >= 700)
    asm("valid;\n");
}
```
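As a rough illustration of the trivial case described above, the fold can be modelled as evaluating one compare against the constant that replaces the `__nvvm_reflect` call and keeping only the taken successor. This is a minimal standalone sketch, not the pass itself; `Pred`, `Successor`, `foldCompare` and `foldBranch` are hypothetical names introduced here.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the comparison predicates the fold understands.
enum class Pred { EQ, NE, LT, LE, GT, GE };

// Which successor of the conditional branch survives the fold.
enum class Successor { Then, Else };

// Evaluate `reflected <pred> rhs` once the reflect call is a known constant.
bool foldCompare(Pred pred, int64_t reflected, int64_t rhs) {
  switch (pred) {
  case Pred::EQ: return reflected == rhs;
  case Pred::NE: return reflected != rhs;
  case Pred::LT: return reflected < rhs;
  case Pred::LE: return reflected <= rhs;
  case Pred::GT: return reflected > rhs;
  case Pred::GE: return reflected >= rhs;
  }
  assert(false && "unhandled predicate");
  return false;
}

// A branch on a folded compare keeps exactly one successor; the other
// block becomes dead and never reaches the backend.
Successor foldBranch(Pred pred, int64_t reflected, int64_t rhs) {
  return foldCompare(pred, reflected, rhs) ? Successor::Then : Successor::Else;
}
```

With a reflected value of 600, `foldBranch(Pred::GE, 600, 700)` keeps the else successor, so the guarded `asm` block is dropped.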
Relanding after fixing a bug.
This reverts commit 9211e67da36782db44a46ccb9ac06734ccf2570f.
Summary:
This seemed to crash on one of the CUDA math tests. Revert until it can
be fixed.
Summary:
The `__nvvm_reflect` function is used to guard invalid code that varies
between architectures. One problem with this feature is that if it is
used without optimizations, it will leave invalid code in the module
that will then make it to the backend. The `__nvvm_reflect` pass is
already mandatory, so it should do some trivial branch removal to ensure
that constants are handled correctly. This dead branch elimination only
works in the trivial case of a compare on a branch and does not touch
any conditionals that were not related to the `__nvvm_reflect` call in
order to preserve `O0` semantics as much as possible. This should allow
the following to work on NVPTX targets:
```c
int foo() {
  if (__nvvm_reflect("__CUDA_ARCH") >= 700)
    asm("valid;\n");
}
```
The current implementation of aliases tries to remove all the aliases in
the module to prevent the generic version of `AsmPrinter` from emitting
them incorrectly. Unfortunately, if the aliases are used this will fail.
Instead let's override the function to print aliases directly.
In addition, the declarations of the alias functions must occur before
the uses. To fix this we emit alias declarations as part of
`emitDeclarations` and only emit the `.alias` directives at the end
(where we can assume the aliasee has also already been declared).
Otherwise we will crash since target intrinsics don't have their types
legalized. Let the mgather get legalized first, then do the combine on
the legal type.
Fixes #81088
Co-authored-by: Craig Topper <craig.topper@sifive.com>
This adjusts the isSubVectorExtractCheap callback to consider any
extract which fits entirely within the first VLEN bits of the src vector
(and uses a 5 bit immediate for the slide) as cheap. These can be done
via a single m1 vslidedown.vi instruction.
This allows our generic DAG combine logic to kick in and recognize a few
more cases where shuffle source is longer than the dest, but that using
a wider shuffle is still profitable. (Or as shown in the test diff, we
can split the wider source and do two narrower shuffles.)
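The cheapness condition described above can be sketched as two checks: the extracted elements must end within the first VLEN bits of the source, and the start index must fit a 5 bit unsigned immediate. This is a simplified standalone model, not the actual callback; the function name and its parameters are assumptions for illustration.

```cpp
#include <cassert>

// Hypothetical model of the check: `index` is the first extracted element,
// `numElts` the number of extracted elements, `eltBits` the element width
// and `vlen` the (minimum) VLEN in bits.
bool isCheapFirstVLENExtract(unsigned index, unsigned numElts,
                             unsigned eltBits, unsigned vlen) {
  assert(numElts > 0 && eltBits > 0 && "expected a non-empty extract");
  // The whole extracted range must sit inside the first VLEN bits.
  bool fitsInFirstVLEN = (index + numElts) * eltBits <= vlen;
  // The slide amount must be encodable as a 5 bit unsigned immediate.
  bool fitsImm5 = index < 32;
  return fitsInFirstVLEN && fitsImm5;
}
```

For example, with VLEN of 128 and 8 bit elements, extracting elements 8 through 15 satisfies both checks, while extracting elements 12 through 19 does not because the range crosses the first VLEN bits.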
Fixes https://github.com/llvm/llvm-project/issues/80910.
Per the documentation in ISDOpcodes.h, for BUILD_VECTOR "The types of
the operands must match the vector element type, except that integer
types are allowed to be larger than the element type, in which case the
operands are implicitly truncated."
This transform was assuming that the scalar operand type matched the
result type. This resulted in essentially performing a truncate before a
binop, instead of after. As demonstrated by the test case changes, this
is often not legal.
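To see why the order of truncation matters, consider an i8 element built from an i32 operand that is shifted right first. The sketch below is a standalone illustration with hypothetical function names; it only models the scalar arithmetic, not the DAG combine.

```cpp
#include <cassert>
#include <cstdint>

// Original semantics: the binop runs at the wide operand type and the
// BUILD_VECTOR implicitly truncates the result to the element type.
uint8_t truncAfterShift(uint32_t operand, unsigned amount) {
  assert(amount < 8 && "shift amount must be valid for the narrow type too");
  return static_cast<uint8_t>(operand >> amount);
}

// What the transform effectively did: truncate the operand to the element
// type first, then run the binop at the narrow type.
uint8_t truncBeforeShift(uint32_t operand, unsigned amount) {
  assert(amount < 8 && "shift amount must be valid for the narrow type");
  uint8_t narrow = static_cast<uint8_t>(operand);
  return static_cast<uint8_t>(narrow >> amount);
}
```

For the operand `0x100` and a shift of 4, truncating after the shift gives `0x10`, while truncating before it gives `0x00`, so the two orders are not interchangeable for this binop.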
Fixes #81136 - we might be loading from a constant pool entry wider than the destination register bitwidth, affecting the vextload scale calculation.
ConvertToBroadcastAVX512 doesn't yet set an explicit bitwidth (it will default to the constant pool bitwidth) due to difficulties in looking up the original register width through the fold tables, but as we only use rebuildSplatCst this shouldn't cause any miscompilations, although it might prevent folding to broadcast if only the lower bits match a splatable pattern.
When using -mbranch-protection=pac-ret+pc, x16 is used in the function
epilogue to hold the address of the signing instruction. This is used by
a HINT instruction which can only use x16, so we can't change this. This
means that we can't use it to hold the function pointer for an indirect
tail-call.
There is existing code to force indirect tail-calls to use x16 or x17
when BTI is enabled, so there are now 4 combinations:
  bti   pac-ret+pc   Valid function pointer registers
  off   off          Any non callee-saved register
  on    off          x16 or x17
  off   on           Any non callee-saved register except x16
  on    on           x17
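The four combinations above can be written as a single predicate. This is a minimal sketch under stated assumptions: the function name is hypothetical, registers are identified by their x-register number, and the caller supplies whether the register is callee-saved.

```cpp
#include <cassert>

// Hypothetical predicate: may x<reg> hold the target of an indirect
// tail-call, given the BTI and pac-ret+pc settings?
bool canHoldTailCallTarget(unsigned reg, bool calleeSaved, bool bti,
                           bool pacRetPC) {
  assert(reg <= 30 && "expected an AArch64 general-purpose register number");
  if (calleeSaved)
    return false;                   // never valid in any combination
  if (bti && pacRetPC)
    return reg == 17;               // x16 is taken by the epilogue
  if (bti)
    return reg == 16 || reg == 17;  // BTI restricts to x16 or x17
  if (pacRetPC)
    return reg != 16;               // anything but x16
  return true;                      // any non callee-saved register
}
```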
Metadata is still 1.2, not 1.3 after V6.
I thought that amdhsa.version mapped to the COV version but it's
separate, and there are no MD changes in V6, hence it doesn't need to be
updated.
This handles two cases where we can work out some known-zero bits for
ISD::STEP_VECTOR.
The first case handles when we know the low bits are zero because the step
amount is a power of two. This is taken from https://reviews.llvm.org/D128159,
and even though the original patch didn't end up landing this case due to it
not having any test difference, I've included it here for completeness's sake.
The second case handles the case when we have an upper bound on vscale_range.
We can use this to work out the upper bound on the number of elements, and thus
what the maximum step will be. From the maximum step we then know which high
bits are zero.
On its own, computing the known high bits results in some small improvements for
RVV with -mrvv-vector-bits=zvl across the llvm-test-suite. However I'm hoping
to be able to use this later to reduce the LMUL in index calculations for
vrgather/indexed accesses.
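Both cases can be sketched on plain integers. The model below assumes 64 bit elements, a non-negative step, and no overflow when computing the largest element; `KnownZero` and `stepVectorKnownZero` are hypothetical names, and this is not the computeKnownBits code itself.

```cpp
#include <cassert>
#include <cstdint>

// Number of low and high bits known to be zero in every element.
struct KnownZero {
  unsigned low;
  unsigned high;
};

// Elements of the step vector are 0, step, 2*step, ... The element count is
// at most minElts * maxVscale, where maxVscale is the vscale_range bound.
KnownZero stepVectorKnownZero(uint64_t step, uint64_t minElts,
                              uint64_t maxVscale) {
  assert(minElts >= 1 && maxVscale >= 1 && "need at least one element");
  KnownZero known{0, 0};

  // Case 1: a power-of-two step makes every element a multiple of it, so
  // the trailing zero bits of the step are zero in all elements.
  if (step != 0 && (step & (step - 1)) == 0)
    for (uint64_t s = step; (s & 1) == 0; s >>= 1)
      ++known.low;

  // Case 2: the largest element is (maxElts - 1) * step; every bit above
  // its most significant set bit is zero in all elements.
  uint64_t maxElement = (minElts * maxVscale - 1) * step;
  for (uint64_t bit = 1ULL << 63; bit != 0 && (maxElement & bit) == 0;
       bit >>= 1)
    ++known.high;

  return known;
}
```

For a step of 4 with at most 8 elements, the largest element is 28, so the low 2 bits and the high 59 bits are known to be zero.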
---------
Co-authored-by: Philip Reames <preames@rivosinc.com>
* they all do stack adjustments, so they all use and def x2.
* popret and popretz also return
* popretz also defines x10
This adds that to the TD file and updates the PushPopOptimizer to
preserve the extra implicit operands added during frame lowering when
converting to popret(z).
Implement the following assembly format flags, which are already
supported by GCC:
'A': On z14 or higher: If operand is a mem print the alignment
hint usable with vl/vst prefixed by a comma.
'O': print only the displacement of a memory reference or address.
'R': print only the base register of a memory reference or address.
Implement 'A' conservatively, since the memory operand alignment
information is not available for INLINEASM at the moment.
Reland the original patch with additional commit containing fix for two
issues:
1. Attempting to bitcast using MVTs with no corresponding LLVM type.
getDWordFromOffset now works directly with the original vector to get
the corresponding elements given the DWordOffset.
2. Improper bit tracking in CalculateByteProvider for vector types using
certain ops. Previously, bit tracking for certain ops (e.g.
ISD::TRUNCATE) assumed operands were scalar types, which is not correct
since these ops have different semantics depending on vector / scalar.
CalculateByteProvider / CalculateSrcByte now exit on vector types,
handling which is a TODO.
OpFlag and WrapperKind should be chosen consistently with each other in
regards to PIC, otherwise we hit asserts later on.
Broken by c04a05d8.
Fixes #80831.
We disabled these extra-special RUN lines due to unexpected interactions
between the various things we've been fixing. Re-enable them (they'll run
on the llvm-new-debug-iterators buildbot) as they all now pass.
If part of a register (lowered from REG_SEQUENCE) is undefined then we
should propagate undef flags to uses of those lanes. This is only
performed when live intervals are present as it requires live intervals
to correctly match uses to defs, and the primary goal is to allow
precise computation of subrange intervals.
Global Instruction Selector could not select the code:
%0:gprb(s32) = G_CONSTANT i32 -1
In the DAG selector, similar code is selected to the MVNi instruction
using the custom operand `mod_imm_not`. Changing its definition from
`PatLeaf` to `ImmLeaf` and providing a counterpart for `imm_not_XFORM`
makes the relevant rule available for GlobalISel too.
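For context, an ARM modified immediate is an 8 bit value rotated right by an even amount, and an MVN with such an immediate materializes its bitwise complement. The sketch below models the kind of check `mod_imm_not` stands for; `isModImm` and `isModImmNot` are hypothetical helper names, not the in-tree predicates.

```cpp
#include <cstdint>

// True if `value` is an 8 bit constant rotated right by an even amount.
bool isModImm(uint32_t value) {
  for (unsigned rot = 0; rot < 32; rot += 2) {
    // Rotating left by `rot` undoes a rotate right by `rot`.
    uint32_t unrotated =
        rot == 0 ? value : (value << rot) | (value >> (32 - rot));
    if (unrotated <= 0xFF)
      return true;
  }
  return false;
}

// True if the complement of `value` is encodable, i.e. `value` itself can
// be produced by a single MVN with a modified immediate.
bool isModImmNot(uint32_t value) { return isModImm(~value); }
```

The constant -1 (`0xFFFFFFFF`) passes this check because its complement is 0, which is why it can be selected as a single MVN.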
Registers that are pushed/popped by Zcmp or libcalls have pre-defined
frame indices that are never allocated in MachineFrameInfo. They're
being used throughout PEI, but the rest of codegen doesn't work that way
and expects each frame index to be a valid index in MFI.
This patch keeps it local to PEI and removes them from the
CalleeSavedInfo list at the end of the pass.
Before this patch, any MIR testing post-PEI is broken and asserts (see
issue #79491).
In github PR #78731 it looks like I added test coverage for RemoveDIs to
either the wrong test, or not enough. Adding
--try-experimental-debuginfo-iterators to this particular test is enough to
restore some coverage it seems.
The comment and code here seem to match getTypeForExtReturn. The
history shows that at the time this code was added, similar code existed
in SelectionDAGBuilder. The SelectionDAGBuilder code has since been
refactored into getTypeForExtReturn.
This patch makes FastISel match SelectionDAGBuilder.
The test changes are because X86 has customization of
getTypeForExtReturn. So now we only extend returns to i8.
Stumbled onto this difference by accident.
FastISel may create a redundant BGTZ terminator which falls through.
```
BGTZ %2:gpr32, %bb.1, implicit-def $at
bb.1.bb1:
; predecessors: %bb.0
```
The `!I->isBarrier()` check in
MipsAsmPrinter::isBlockOnlyReachableByFallthrough will incorrectly not
print a label, leading to an `Undefined temporary symbol` error when we
try assembling the output assembly file. See the updated
`Fast-ISel/pr40325.ll` and
https://github.com/rust-lang/rust/issues/108835
In addition, the `SwitchInst` condition is too conservative and prints
many unneeded labels (see the updated tests).
Just use the generic isBlockOnlyReachableByFallthrough, updated by
commit 1995b9fead62f2f6c0ad217bd00ce3184f741fdb for SPARC, which also
handles MIPS.
There was an error where a dividend of type i64 that actually uses 32
bits fell into the path that assumes only 24 bits are used. Check
that the AtLeast field is used correctly when using computeNumSignBits
and add the necessary extend/trunc for the 32-bit path.
Regolden and update testcases.
@jrbyrnes @bcahoon @arsenm @rampitec
PAL Metadata 3.0 introduces an explicit structure in metadata for the
programmable registers written out by the compiler backend.
The previous approach used opaque registers which can change between different
architectures and required encoding the bitfield information in the backend,
which may change between versions.
This change is an extension of the previously added support - which only handled
entry functions. This adds support for all functions.
The change also includes some re-factoring to separate common code.
C_FILE symbols. To match the behavior of the assembler and the legacy
compiler, this includes using the generic ".file" name for the C_FILE
symbol and generating the actual file name in an auxiliary entry.
If we have a `SETCC (SETCC), 0, NE` and ZeroOrOneBooleanContent, we can remove
the outer setcc as it will produce the same value as the inner. This can be
generalized to anything where the top bits are known to be 0, as the value will
remain as 1 or 0.
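The identity behind this fold is easy to check on plain integers. The sketch below uses hypothetical function names and models only the scalar values, with the inner setcc result restricted to 0 or 1 as ZeroOrOneBooleanContent guarantees.

```cpp
#include <cassert>

// The outer `SETCC x, 0, NE` under ZeroOrOneBooleanContent.
int outerSetccNeZero(int inner) { return inner != 0 ? 1 : 0; }

// The folded form: just reuse the inner value. Only valid when the inner
// value is already 0 or 1, i.e. all of its top bits are known to be zero.
int foldedSetcc(int inner) {
  assert((inner == 0 || inner == 1) && "top bits must be known zero");
  return inner;
}
```

For both possible inner values the outer compare returns the inner value unchanged. A value with other bits set, such as -1, would not satisfy the identity, which is why the known-zero top bits are required.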
In IR or C code, shift amount larger than value size is undefined
behavior. But in practice, backend lowering for shift_parts produces
add/sub of shift amounts, thus constant shift amounts might be
negative or larger than value size, which depends on ISA definition.
PowerPC ISA says, the lowest 7 bits (6 bits for 32-bit instruction)
will be taken, and if the highest among them is 1, result will be
zero, otherwise the low 6 bits (or 5 on 32-bit) are used as shift
amount.
This commit emulates the behavior and avoids array overflow in bit
permutation's value bits calculator.
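The rule quoted above can be emulated directly for a logical left shift. This is a minimal sketch of the described behaviour, not the bit permutation code; `ppcShiftLeft` is a hypothetical name, and `bits` selects the 64-bit or 32-bit form.

```cpp
#include <cassert>
#include <cstdint>

// Emulate a left shift where only the low 7 bits (6 for the 32-bit form)
// of the amount are read. If the highest of those bits is set the result
// is zero; otherwise the remaining low bits are the shift amount.
uint64_t ppcShiftLeft(uint64_t value, int64_t amount, unsigned bits) {
  assert((bits == 32 || bits == 64) && "only 32-bit and 64-bit forms");
  unsigned fieldBits = bits == 64 ? 7 : 6;
  uint64_t field = static_cast<uint64_t>(amount) & ((1ULL << fieldBits) - 1);
  uint64_t topBit = 1ULL << (fieldBits - 1);
  if (field & topBit)
    return 0; // amount is out of range, including negative amounts
  uint64_t result = value << (field & (topBit - 1));
  return bits == 32 ? (result & 0xFFFFFFFFULL) : result;
}
```

A negative constant amount such as -1 has its top field bit set and so yields zero, instead of being used as an out-of-range index.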
This shufflevector gets scalarized into a build_vector of extract_vector_elts
because the output type doesn't match the input vector type.
Normally this is combined back into a vector_shuffle in DAGCombine, but this
one fails because we don't consider an extract_subvector to be cheap,
specifically because it's at an index > 31.
This should be canonicalized back into a vector_shuffle at some point so we can
lower it as a vrgather.vv.
Add opcodes for different store instructions to the target hook that can
enable more STP pairs. This is split off from the patch that does the
same for some load instructions (#79003).
Patch co-authored by Cameron McInally.
When a value is promoted, it is pointless to copy it from one register
to another register of the same type.
This PR adds an additional check for this case to reduce code size.
Fixes: #80053.
Fixes https://github.com/llvm/llvm-project/issues/80744. This transform
doesn't handle vectors at all. The fixed length ones pass the first
check, but would fail the constant operand checks which immediately follow.
This patch takes the simplest approach, and just guards the transform
for scalar integers.