There was an error in decoding shift type, which permitted shift types
other than LSL to be (incorrectly) folded into the addressing mode of a
load/store instruction.
If a function has an odd number of registers of the same type to save,
and the calling convention also requires an odd number of CSRs of that
type, an FP register would be accidentally marked as saved when
producePairRegisters returns true.
This patch also fixes the AArch64LowerHomogeneousPrologEpilog pass not
handling AArch64::NoRegister. That pass must be fixed together with the
register pairing so that I can write a test for it.
This patch commits tests that can be optimized by improving
performCONCAT_VECTORCombine to do a better job at decomposing the base
pointer and recognizing a constant offset.
V3 has been deprecated for a while as well, so it can safely be removed
like V2 was removed.
- [Clang] Set minimum code object version to 4
- [lld] Fix tests using code object v3
- Remove code object V3 from the AMDGPU backend, and delete or port v3
tests to v4.
- Update docs to make it clear V3 can no longer be emitted.
`M68k_RTD` is very similar to X86's stdcall, in which the callee pops
the arguments from the stack. In LLVM IR it can be written as
`m68k_rtdcc`.
This patch also improves how the ExpandPseudo pass handles popping the
stack at function returns in the absence of the RTD instruction.
Differential Revision: https://reviews.llvm.org/D149864
The current lowering of statepoints does not take into account return
attributes present on the `gc.result` leading to different code being
generated than if one were to not use statepoints. These return
attributes can affect the ABI which is why it is important that they are
applied in the lowering.
This moves the legalization of G_FMA to the action builder that can handle more
types. The existing arm64-vfloatintrinsics.ll has been removed as its tests are
covered in other test files.
Update `LegalizerHelper::widenScalarMulo` to not create a mulo if we aren't going to use the overflow flag. This prevents needing to legalize the widened operation. This generates better code when we need to make a libcall for multiply.
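The idea can be sketched in Python. This is not the `LegalizerHelper` code: the helper name `widened_umulo` and the 8-to-16-bit widths are assumptions chosen for illustration. It models an unsigned narrow multiply done as a plain multiply in a type twice as wide, deriving the overflow flag only when the caller asks for it.

```python
def widened_umulo(a, b, narrow_bits=8, need_overflow=True):
    """Model an unsigned narrow multiply performed in a type twice as wide.

    Hypothetical helper for illustration; it is not part of LLVM.
    """
    narrow_mask = (1 << narrow_bits) - 1
    wide_mask = (1 << (2 * narrow_bits)) - 1

    # Zero-extend the operands and use a plain multiply in the wide type.
    # The product of two narrow values always fits in twice the width, so
    # no overflow-producing multiply is needed at the wide width.
    wide = ((a & narrow_mask) * (b & narrow_mask)) & wide_mask
    result = wide & narrow_mask  # truncate back to the narrow type

    if not need_overflow:
        return result, None

    # Overflow happened if truncating and re-extending changes the value.
    return result, wide != result
```

When `need_overflow` is false the function does no overflow work at all, which mirrors the motivation above: nothing overflow-related is left to legalize.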
When narrowing logic ops (OR/XOR) with a constant RHS, `DAGCombiner` fixes up the constant RHS node.
This is incorrect when the LHS is also a constant. For example, we will incorrectly replace `xor OpaqueConstant:i64<8191>, Constant:i64<-1>` with `xor (and OpaqueConstant:i64<8191>, Constant:i64<65535>), Constant:i64<-1>`.
Fixes #68855.
This re-applies commit a9d0ab2ee572f179f80483f3ebbbcdd03c3b4481, which
was reverted by 8abb2ace888bdd04a1bdb4ac2f2fc25d57a5760a.
The issue was fixed by 7510f32f906ab4e583542eae2611b020f88629af.
The optimization in CodeGenPrepare that unmerges GEPs across indirect
branches must respect the types of both GEPs and their sizes when
adjusting the indices.
The sample here shows the bug:
https://godbolt.org/z/8e9o5sYPP
The value `%elementValuePtr` addresses the second field of the
`%struct.Blub`. It is therefore a GEP with index 1 and type i8.
The value `%nextArrayElement` addresses the next array element. It is
therefore a GEP with index 1 and type `%struct.Blub`.
Both values point to completely different addresses, even if the indices
are the same, due to the types being different.
However, after CodeGenPrepare has run, `%nextArrayElement` is a bitcast
from `%elementValuePtr`, meaning both were treated as equal.
The cause for this is that the unmerging optimization does not take
types into consideration.
It sees both GEPs have `%currentArrayElement` as source operand and
therefore tries to rewrite `%nextArrayElement` in terms of
`%elementValuePtr`.
It changes the index to the difference of the two GEPs. As both indices
are `1`, the difference is `0`. As the indices are `0` the GEP is later
replaced with a simple bitcast in CodeGenPrepare.
Before adjusting the indices, the types of the GEPs would have to be
aligned and the indices scaled accordingly for the optimization to be
correct.
Due to the size of the struct being `16` and the `%elementValuePtr`
pointing to offset `1`, the correct index for the unmerged
`%nextArrayElement` would be 15.
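The arithmetic can be checked with a small byte-offset model. This is a sketch in Python rather than LLVM code: the 16-byte struct size and the offset of 1 come from the description above, while the helper name `gep_byte_offset` is hypothetical.

```python
def gep_byte_offset(index, element_size):
    """Byte offset of a single-index GEP: the index scaled by the element size."""
    return index * element_size

STRUCT_BLUB_SIZE = 16  # size of %struct.Blub, as stated above
I8_SIZE = 1

# %elementValuePtr: GEP with index 1 over i8.
element_value_off = gep_byte_offset(1, I8_SIZE)
# %nextArrayElement: GEP with index 1 over %struct.Blub.
next_element_off = gep_byte_offset(1, STRUCT_BLUB_SIZE)

# What the buggy unmerging computes: a difference of raw indices.
buggy_index = 1 - 1
# What a type-aware rewrite in terms of %elementValuePtr (i8 steps) needs.
correct_index = (next_element_off - element_value_off) // I8_SIZE
```

The raw index difference is 0, while the byte-accurate index relative to `%elementValuePtr` is 15.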
I assume this bug emerged from the opaque pointer change, as GEPs like
`%elementValuePtr` that access a struct field through an i8-typed GEP
did not naturally occur before.
In light of future migration to ptradd, simply not performing the
optimization if the types mismatch should be sufficient.
When an sreg sub-register of a q register was spilled,
AArch64InstrInfo::foldMemoryOperandImpl would emit a spill of a d
register, which gives the wrong result when the target is big-endian as
the following q register fill will put the value in the top half.
Fix this by greatly simplifying the existing code for widening the spill
to only handle wzr to xzr widening, as the default result we get if the
function returns nullptr is already that a widened spill will be
emitted.
This custom combine currently converts `and(anyext(x),c)` into
`anyext(and(x,c))`. This is not correct, because the original expression
guaranteed that the high bits are zero, while the new one sets them to
undef.
Emit `zext(and(x,c))` instead.
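A brute-force check in Python illustrates why `zext` is the right replacement. It assumes an 8-bit `x` extended to 16 bits and a mask `c` that fits in the narrow type; `anyext` is modeled as leaving the high bits to an arbitrary caller-chosen value. All function names are made up for this sketch.

```python
NARROW_BITS = 8
WIDE_BITS = 16
NARROW_MASK = (1 << NARROW_BITS) - 1
WIDE_MASK = (1 << WIDE_BITS) - 1

def anyext(x, high_bits):
    """Any-extend: low bits are x, high bits are unspecified (caller picks)."""
    return ((high_bits << NARROW_BITS) | (x & NARROW_MASK)) & WIDE_MASK

def zext(x):
    """Zero-extend: low bits are x, high bits are zero."""
    return x & NARROW_MASK

def original(x, c, high_bits):
    # and(anyext(x), c): the mask clears whatever the high bits were.
    return anyext(x, high_bits) & c

def fixed(x, c):
    # zext(and(x, c)): high bits are zero by construction.
    return zext(x & c)

def broken(x, c, high_bits):
    # anyext(and(x, c)): high bits are unspecified again.
    return anyext(x & c, high_bits)
```

For every choice of high bits, `original` and `fixed` agree, whereas `broken` can leak the unspecified bits into the result.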
Fixes https://github.com/llvm/llvm-project/issues/68783.
These are treated as DBAR 0 on older uarchs, so we can start to
unconditionally emit the new hints right away.
Co-authored-by: WANG Rui <wangrui@loongson.cn>
Following up on the prior RFC
(https://lists.llvm.org/pipermail/llvm-dev/2020-September/145357.html),
we can now improve upon our highly-optimized basic-block-sections
binary (e.g., 2% for clang) by applying path cloning. Cloning can
improve performance by reducing taken branches.
This patch prepares the profile format for applying cloning actions.
The basic block cloning profile format extends the basic block sections
profile in two ways.
1. Specifies the cloning paths with a 'p' specifier. For example, `p 1 4
5` specifies that blocks with BB ids 4 and 5 must be cloned along the
edge 1 --> 4.
2. Each cloned block appears in the cluster info as
`<bb_id>.<clone_id>` where `clone_id` is the id associated with this
clone.
For example, the following profile specifies one cloned block (2) and
determines its cluster position as well.
```
f foo
p 1 2
c 0 1 2.1 3 2 5
```
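To make the format concrete, here is a small Python sketch that parses the three line kinds shown above. It is not the parser used by LLVM: the function name `parse_profile`, the dictionary layout, and the use of clone id 0 for a non-cloned block are assumptions made for illustration.

```python
def parse_profile(text):
    """Parse 'f', 'p' and 'c' lines into a dict keyed by function name."""
    functions = {}
    current = None
    for raw in text.splitlines():
        fields = raw.split()
        if not fields:
            continue
        kind, args = fields[0], fields[1:]
        if kind == "f":
            # Start of a new function entry.
            current = functions.setdefault(args[0], {"paths": [], "clusters": []})
        elif current is None:
            raise ValueError("line appears before any 'f' line: " + raw)
        elif kind == "p":
            # Cloning path: the first id is the predecessor block and the
            # remaining ids are the blocks cloned along that path.
            current["paths"].append([int(a) for a in args])
        elif kind == "c":
            # Cluster: each entry is '<bb_id>' or '<bb_id>.<clone_id>'.
            # Assumption of this sketch: a missing clone id is recorded as 0.
            cluster = []
            for entry in args:
                bb_id, _, clone_id = entry.partition(".")
                cluster.append((int(bb_id), int(clone_id) if clone_id else 0))
            current["clusters"].append(cluster)
        else:
            raise ValueError("unknown specifier: " + kind)
    return functions
```

Applied to the example profile, the sketch yields one path `[1, 2]` for `foo` and a single cluster in which the entry `2.1` becomes `(2, 1)`.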
This patch keeps backward-compatibility (retains the behavior for old
profile formats). This feature is only introduced for profile version >=
1.
Splitting a virtual register with a hint may generate COPY instructions
in multiple cold basic blocks and increase code size, so disable this
split when the function is optimized for size.
Instructions that take immediate addresses sign-extend their operands, so cannot be used when we actually need zero extension. Use indirect addressing to avoid problems.
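The difference is easy to see numerically. The Python sketch below assumes a 32-bit immediate widened to a 64-bit address; the helper names and the sample address are hypothetical.

```python
def sign_extend_32_to_64(value):
    """Interpret a 32-bit pattern as signed and widen it to 64 bits."""
    value &= 0xFFFFFFFF
    if value & 0x80000000:
        value |= 0xFFFFFFFF00000000
    return value

def zero_extend_32_to_64(value):
    """Widen a 32-bit pattern to 64 bits with zeros in the upper half."""
    return value & 0xFFFFFFFF

ADDR = 0x80000000  # sample address with bit 31 set (assumed for illustration)
```

The two extensions agree for addresses below `0x80000000` and diverge as soon as bit 31 is set, which is the case where an immediate operand would produce the wrong address.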
The functions in the test are modified versions of the functions by the same names in large-constants.ll, with i64 types changed to i32.
Fixes #55061
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D124406
G_TRUNC will get lowered into trunc(merge(trunc(unmerge),
trunc(unmerge))) if the source is larger than 128 bits or the truncation
is more than half of the current bit size.
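As a rough model of the shape of this lowering, the following Python sketch treats a vector as a list of integers. The helper names are hypothetical, and the choice of half the source width as the intermediate width is an assumption of the sketch, not a statement about the actual legalizer rule.

```python
def trunc_elements(vec, bits):
    """Truncate every element to the given bit width."""
    mask = (1 << bits) - 1
    return [v & mask for v in vec]

def split_trunc(vec, src_bits, dst_bits):
    """Model trunc(merge(trunc(unmerge), trunc(unmerge))) on a list of ints."""
    mid_bits = src_bits // 2                 # assumed intermediate width
    half = len(vec) // 2
    lo, hi = vec[:half], vec[half:]          # unmerge into two halves
    merged = trunc_elements(lo, mid_bits) + trunc_elements(hi, mid_bits)  # merge
    return trunc_elements(merged, dst_bits)  # outer trunc to the final width
```

Splitting and truncating in two steps gives the same elements as a single direct truncation, provided the destination width is no larger than the intermediate width.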
Now mirrors ZEXT/SEXT code more closely for vector types.
When performing a tail call, check the value of the LR register after
authentication to prevent the callee from signing and spilling an
untrusted value. This commit implements a few variants of the check;
more can be added later.
If it is safe to assume that executable pages are always readable,
LR can be checked just by dereferencing the LR value via LDR.
As an alternative, LR can be checked as follows:
    ; lowered AUT* instruction
    ; <some variant of check that LR contains a valid address>
    b.cond break_block
  ret_block:
    ; lowered TCRETURN
  break_block:
    brk 0xc471
As the existing methods either break the compatibility with execute-only
memory mappings or can degrade the performance, they are disabled by
default and can be explicitly enabled with a command line option.
Individual subtargets can opt-in to use one of the available methods
by updating AArch64FrameLowering::getAuthenticatedLRCheckMethod().
Reviewed By: kristof.beyls
Differential Revision: https://reviews.llvm.org/D156716
The BUFFER_ATOMIC_CSUB and GLOBAL_ATOMIC_CSUB instructions have
encodings for non-value-returning forms, although actually using them
isn't supported by hardware. However, these encodings aren't supported
by the backend, meaning that they can't even be assembled or
disassembled.
Add support for the non-returning encodings, but gate actually using
them in instruction selection behind a new feature
FeatureAtomicCSubNoRtnInsts, which no target uses. This does allow the
non-returning instructions to be tested manually and
llvm.amdgcn.atomic.csub.ll is extended to cover them.
The feature does not gate assembling or disassembling them; doing so is
no longer an error, and encoding and decoding tests have been adapted
accordingly.
This is an attempt at rebooting https://reviews.llvm.org/D28990
I've included AutoUpgrade changes to modify the data layout to satisfy the compatible layout check. This does mean that allocas, loads, stores, etc. in old IR will automatically get this new alignment.
This should fix PR46320.
Reviewed By: echristo, rnk, tmgross
Differential Revision: https://reviews.llvm.org/D86310
Update test/CodeGen/AMDGPU/remat-smrd.mir:
* Convert a negative case of a non-dereferenceable invariant load to a positive one.
* Add new cases for subreg.
This patch adds stackmap support for RISC-V without targets (i.e. the nop patchable forms).
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D123496
This is what AArch64 has done in https://reviews.llvm.org/D20762.
Tests are added in macro fusion tests, which uncover a bug that
DAG mutations don't take effect.
This PR improves memory barriers generated by atomic operations.
Memory barrier semantics of LL/SC:
```
LL: <memory-barrier> + <load-exclusive>
SC: <store-conditional> + <memory-barrier>
```
Changes:
* Remove unnecessary memory barriers before LL and between LL/SC.
* Fix acquire semantics. (If the SC instruction is not executed, acquire
semantics cannot be guaranteed. Therefore, an acquire barrier needs to
be generated when the memory ordering includes an acquire operation.)
This reverts commit b5ff71e261b637ab7088fb5c3314bf71d6e01da7. As described in
https://github.com/llvm/llvm-project/issues/68730, this appears to have exposed
an existing liveness issue. Revert to green until we can figure out how to
address the root cause.
Note: This was not a clean revert. I ended up doing it by hand.