After commit 18c5f3c3 ("[RegisterScavenger][RISCV] Don't search for
FrameSetup instrs if we were searching from Non-FrameSetup instrs"), we
can revert the `sp` adjustment 4e2364a2 ("[LoongArch] Add emergency
spill slot for GPR for large frames") to generate better code, as the
issue with `RegScavenger` has been resolved.
Fixes#88109
We try clamp the index to be within the bounds of the stack object
we create, but if we don't freeze it, poison can propagate into the
clamp code. This can cause the access to leave the bounds of the
stack object.
We have other instances of this issue in type legalization and extract_elt/subvector,
but posting this patch first for direction check.
Fixes#86717
The `MSB` must not be greater than `GRLen`. Without this patch, newly
added test cases will crash with LoongArch32, resulting in a 'cannot
select' error.
The SelectionDAG scheduling preference now becomes source order
scheduling (machine scheduler generates better code -- even without
there being a machine model defined for LoongArch yet).
Most of the test changes are trivial instruction reorderings and
differing register allocations, without any obvious performance impact.
This is similar to commit: 3d0fbafd0bce43bb9106230a45d1130f7a40e5ec
This patch aims to solve Firefox issue:
https://bugzilla.mozilla.org/show_bug.cgi?id=1882301
Similar to 616289ed2922. Currently LoongArch uses an ll.[wd]/sc.[wd]
loop for ATOMIC_CMP_XCHG. Because the comparison in the loop is
full-width (i.e. the `bne` instruction), we must sign extend the input
comparsion argument.
Note that LoongArch ISA manual V1.1 has introduced compare-and-swap
instructions. We would change the implementation (return `ANY_EXTEND`)
when we support them.
For instruction xvpermi.q, only [1:0] and [5:4] bits of operands[3] are
used. The unused bits in operands[3] need to be set to 0 to avoid
causing undefined behavior.
When using Greedy Register Allocation, there are times where
early-clobber values are ignored, and assigned the same register. This
is illeagal behaviour for these intructions. To get around this, using
Pseudo instructions for early-clobber registers gives them a definition
and allows Greedy to assign them to a different register. This then
meets the ARM Architecture Reference Manual and matches the defined
behaviour.
This patch takes the existing RISC-V patch and makes it target
independent, then adds support for the ARM Architecture. Doing this will
ensure early-clobber restraints are followed when using the ARM
Architecture. Making the pass target independent will also open up
possibility that support other architectures can be added in the future.
This commit updates the pattern matching logic for the `AddLike`
predicate in `LoongArchInstrInfo.td` to use the
`isBaseWithConstantOffset` function provided by `CurDAG`. This
optimization aims to improve the efficiency of pattern matching by
identifying cases where the operation can be represented as a base
address plus a constant offset, which can lead to more efficient code
generation.
When both SHF_LINK_ORDER | SHF_GROUP flags are set, GNU assembler from
2.35 onwards (https://sourceware.org/PR25381https://sourceware.org/binutils/docs/as/Section.html) parses the
SHF_LINK_ORDER argument before section group name, different from us.
This is unfortunate, but does not matter because the `.section` flag `o`
is a niche feature only used by compiler instrumentations, not adopted
by hand-written assembly, and using both flags is extremely rare. Let's
just match GNU assembler. There is another benefit: we now support
zero-flag section group with the SHF_LINK_ORDER flag, while previously
there isn't a syntax.
While here, print 'G' after 'o' to be clear that the 'G' argument is
parsed after the 'o' argument. To make the diff smaller, we don't print
'G' after 'w' in the absence of 'o' for now.
This patch fixes the crash issue in the test:
CodeGen/LoongArch/can-not-realign-stack.ll
Register allocator may spill virtual registers to the stack, which
introduces stack alignment requirements (when the size of spilled
registers exceeds the default alignment size of the stack). If a
function does not have stack alignment requirements before register
allocation, registers used for stack alignment will not be preserved.
Therefore, we should implement `canRealignStack()` to inform the
register allocator whether it is allowed to perform stack realignment
operations.
This patch gets the code model from global variable attribute if it has,
otherwise the target's will be used.
---------
Signed-off-by: WANG Rui <wangrui@loongson.cn>
According to the description of the psABI v2.30:
https://github.com/loongson/la-abi-specs/releases/tag/v2.30, moved the
expansion of relevant pseudo-instructions from
`LoongArchPreRAExpandPseudo` pass to `LoongArchExpandPseudo` pass, to
ensure that the code sequences of `PseudoLA*_LARGE` instructions and
Medium code model's function call are not scheduled.
According to the description of the psABI v2.20:
https://github.com/loongson/la-abi-specs/releases/tag/v2.20, adjustments
are made to the function call instructions under the medium code model.
At the same time, AsmParser has already supported parsing the call36 and
tail36 macro instructions.
By default, `isShuffleMaskLegal` always returns true, which can result
in the expansion of `BUILD_VECTOR` into a `VECTOR_SHUFFLE` node in
certain situations. Subsequently, the `VECTOR_SHUFFLE` node is expanded
again into a `BUILD_VECTOR`, leading to an infinite loop.
To address this, we always return false, allowing the expansion of
`BUILD_VECTOR` through the stack.
```
when a=c=-0.0, b=0.0:
-(a * b + (-c)) = -0.0
-a * b + c = 0.0
(fneg (fma a, b (-c))) != (fma (fneg a), b ,c)
```
See https://reviews.llvm.org/D90901 for a similar discussion on X86.
Previously, bolt could not get FixupKind for BL correctly, because bolt
cannot get target-flags for BL. Here just add support in MCCodeEmitter.
Fixes https://github.com/llvm/llvm-project/pull/72826.
This patch is used to test whether fixupkind for bl can be returned
correctly. When BL has target-flags(loongarch-call), there is no error.
But without this flag, an assertion error will appear. So the test is
just tagged as "Expectedly Failed" now until the following patch fix it.
PR #67391 improved atomic codegen by handling memory ordering specified
by the `cmpxchg` instruction. An acquire barrier needs to be generated
when memory ordering includes an acquire operation. This PR improves the
codegen further by only handling the failure ordering.